The objection which will occur to those, Lord help them, who have had some statistical training is that “increased” means a combination of “linear increase” and “significance.” These objections, as we’ll next see, are chimeras, but the fault that they are made lies with me. Mea culpa! I hereby accept blame for the poor statistical education most people receive. We statisticians often do a terrible job teaching our subject to outsiders (ask any student and they will agree with this). We know we do poorly because scarcely anybody remembers what these and other statistical concepts mean once they leave the classroom (however, their ignorance rarely affects their confidence). For my penance, this article of clarification.
Suppose you say that “increased” meant that the data did not decrease in a statistically significant linear fashion. That is, you are willing to allow that the actual data “had a downward trend”, but that this trend—by which you mean a straight line drawn through the data—was not “significant,” and that therefore an “increase” of some kind was still a possibility. The data is shown below, with a regression line drawn through.
First we have to focus on eq. (13). We must start, continue, and end with the idea firmly in mind that the actual data did not in fact “increase” (by our definition).
Second, the regression line is a model: call it Mr. It sounds as if we want to compute
(18) Pr(X decreased | Mr),
but this is not what people want when they think about classical statistics. If we mean by “decreased” the opposite of “increased”—that is, that X went down more often than it increased or stayed the same—we can calculate (18) (or any other function of the observed data), but nobody does. They instead calculate one of two different things, depending on whether they are a frequentist or Bayesian.
Before we get to that, we first have to understand what Mr means. We don’t have to get overly specific; all we have to know is that Mr is indexed by unobservable parameters, only one of which (for this simple regression) has to do with the “trend”: call this parameter θ. It helps to think of it as the slope of the line we drew. High school geometry tells us that if θ > 0, then the line will go up, and that if θ < 0 then the line will go down. If θ = 0 then the line will be flat.
A frequentist will calculate
(19) Pr( F(X) > f(X) | Mr, θ = 0),
where f(X) is an ad hoc function of the observed data, F(X) is the same function over data never seen, and where both are subjectively chosen from a very large supply of functions (usually the absolute values of the functions are taken). The probability assumes that the “experiment” that gave rise to X will be repeated indefinitely, and that for each repetition a new F(X) will be calculated. (19) is thus the probability of seeing a larger F(X) than the actual f(X) in these repetitions, assuming Mr is true but with its “slope” parameter set equal to 0. If (19) is “small” θ is said to be “not zero”; if instead (19) is “large”, θ is said to be 0 and the trend “not statistically significant.”
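For readers who want to see (19) as an actual computation, here is a minimal simulation sketch. It assumes several things the text does not specify: f is taken to be the absolute value of the least-squares slope, the noise in Mr is taken to be normal with the residual standard deviation plugged in as if it were known, and the series `toy` is made up for illustration (it is not the data in the plot).

```python
import math
import random
import statistics


def ols_slope(y):
    """Least-squares slope of y against the time index 1..n."""
    n = len(y)
    t_bar = (n + 1) / 2
    y_bar = statistics.fmean(y)
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y, start=1))
    sxx = sum((t - t_bar) ** 2 for t in range(1, n + 1))
    return sxy / sxx


def residual_sd(y):
    """Standard deviation of the residuals around the fitted line."""
    n = len(y)
    slope = ols_slope(y)
    t_bar = (n + 1) / 2
    y_bar = statistics.fmean(y)
    resid = [v - (y_bar + slope * (t - t_bar)) for t, v in enumerate(y, start=1)]
    return math.sqrt(sum(r * r for r in resid) / (n - 2))


def simulated_p_value(y, n_sims=2000, seed=1):
    """Estimate Pr(F(X) > f(X) | Mr, theta = 0) by repeating the 'experiment'."""
    rng = random.Random(seed)
    f_obs = abs(ols_slope(y))      # f(X): the statistic on the data we saw
    sd = residual_sd(y)            # plug-in noise level (an assumption)
    exceed = 0
    for _ in range(n_sims):
        # Data never seen: a flat line (theta = 0) plus normal noise.
        fake = [rng.gauss(0.0, sd) for _ in range(len(y))]
        if abs(ols_slope(fake)) > f_obs:   # F(X) > f(X)
            exceed += 1
    return exceed / n_sims


# Hypothetical series with a built-in downward slope.
_rng = random.Random(0)
toy = [1.0 - 0.01 * t + _rng.gauss(0.0, 0.1) for t in range(1, 61)]
print(simulated_p_value(toy))
```

Pick a different f, or a different noise assumption, and the number changes, which is the point made above about the "very large supply of functions."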
A Bayesian will calculate1
(20) Pr( θ < 0 | Mr & X & E),
which is the probability that the slope is less than 0, but still assuming that the model is true and given the old data and something called “E”, which is the evidence we need to tell us about the parameters before we see any data. We call this information the “prior”, but we needn’t spend any time on it, because happily for simple regression models like Mr the frequentist and Bayesian will agree about θ. For when (20) is “large”, (19) will be “small”, and θ will be declared not to be 0; and when (20) is “small”, (19) will be “large”, and θ will be declared to be 0.
It turns out that for this data, (19) is about 10^-16, which is “small”, and (20) is about 1 – (19)/2, which is “large.” For this data, classical statisticians would announce, “X did in fact decrease” or “There was a statistically significant decrease in X.” A person ignorant of any statistics will have calculated (13) long ago and concluded that, yes, the data did in fact decrease, because it did.
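The relation between (19) and (20) quoted above can be checked numerically. The sketch below rests on assumptions not spelled out in the text: a flat prior on θ and a normal approximation, so that the posterior for θ is centered at the estimated slope `b` with spread equal to its standard error `se`. The values of `b` and `se` are hypothetical.

```python
from statistics import NormalDist


def p_value_two_sided(b, se):
    """(19) under a normal approximation: chance of a more extreme slope if theta = 0."""
    z = abs(b) / se
    return 2.0 * (1.0 - NormalDist().cdf(z))


def posterior_prob_negative(b, se):
    """(20) under a flat prior: posterior theta ~ Normal(b, se), so Pr(theta < 0)."""
    return NormalDist(mu=b, sigma=se).cdf(0.0)


b, se = -0.004, 0.002   # hypothetical estimated slope and its standard error
p19 = p_value_two_sided(b, se)
p20 = posterior_prob_negative(b, se)
print(p19, p20, 1.0 - p19 / 2.0)
```

Under these assumptions (20) equals 1 – (19)/2 only because the estimated slope is negative; with a positive estimate, (20) would instead equal (19)/2.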
But suppose instead that (19) was “large” and (20) “small”, but that (13) still holds. Then the statistician would say, “The decrease in X was not statistically significant.” Unfortunately for the statistician, this is not equivalent to “X did not decrease”, because we have already agreed that it did. This situation is thus somewhat akin to Congresspersons who say “The budget is being cut” when what they mean is “We are reducing the amount of increase, but there is still an increase.” Well, that’s statistics for you.
Now, as stated earlier, we could have computed (18) and said something about the probability of the actual observable X itself decreasing given the model2. (18) is not (19) nor is it (20) (but all assume the model is true), and in general these probabilities won’t match. It turns out that (18) is easy to calculate, but in order to do so we must first supply a guess of the parameters of Mr or, if you are a “predictive” Bayesian, guess the parameters and then “integrate them out.” That is, Mr = Mr(parameters), so before we can compute (18) we need to plug in guesses of the parameters. Bayesians can actually integrate out all uncertainty in the parameters; frequentists do something else. It doesn’t matter which method you choose: pick maximum likelihood if you’re a frequentist, or say a BLUP estimator, or on and on, all the way to frequentist predictive techniques, which sometimes mimic the Bayesian predictive techniques. All we need understand is that a guess for parameters has been made and that the uncertainty in these guesses has been accounted for. (18) can then be calculated.
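A rough sketch of how (18) might be computed by simulation follows. It is the crudest version: the parameters of Mr are simply plugged in, and the uncertainty in those guesses is not accounted for, so it falls short of what the paragraph above asks. The normal noise, the parameter values, and the function names are all assumptions made for illustration.

```python
import random


def decreased(x):
    """'Decreased': went down more often than it went up or stayed the same."""
    downs = sum(1 for a, b in zip(x, x[1:]) if b < a)
    return downs > (len(x) - 1) - downs


def prob_decreased_given_model(intercept, slope, sd, n=156, n_sims=2000, seed=2):
    """Monte Carlo estimate of (18), Pr(X decreased | Mr), with plug-in parameters."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # One new series from the straight-line model plus normal noise.
        x = [intercept + slope * t + rng.gauss(0.0, sd) for t in range(1, n + 1)]
        if decreased(x):
            hits += 1
    return hits / n_sims


# Hypothetical plug-in guesses for the parameters of Mr.
p18 = prob_decreased_given_model(intercept=1.0, slope=-0.01, sd=0.1)
print(p18)
```

The result is a statement about the observable X, not about θ, so there is no reason for it to match either (19) or (20).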
Suppose, after you’ve done this, (18) is “small”. We earlier saw that (18) was like (14) or (15), so just because (18) is “small” (or “large”), this does not change (13), which states that, given the observations and our definition of “decrease”, the data did in fact decrease. (18) is conditional on a model, which we assume is true. (13) is conditional on the observations, which we assumed were error free.
If you’re unhappy about this, you have two statistical options. The first is to change the model. We pulled Mr out of a hat anyway, so why not try different M? You’re bound to find one that agrees with what you wanted, which was to say that the data did not decrease. That is, you will surely, if you search hard enough, find an M which gives pleasing results for (18) — (20). After all, who said Mr was true? Nobody. We just assumed it. We’ll talk later about how to tell how good Mr is. All we have to understand here is that we can’t talk about “significance” or “trend” without assuming a model: you can’t have one without the others. It is an impossibility.
The second option is to include more data. Reject the original question, which was “Did X increase from time 1 to 156?”, or say it wasn’t really what you meant, and that you instead meant, “X increased over the longer term.” That’s certainly vague enough, and gives you room to play because it frees you from saying exactly what you mean by “longer term.”
But, invariably, there will be somebody who pins you to the wall and insists that you define, exactly, precisely what you mean by “over the longer term.” At this point you’re stuck3, for when you pick an exact start date, say X-n where negative indexes indicate time before 1, all of what was outlined above still holds. That is, all we need do is glance at the data and compute the new (13) and even the new (18) — (20) for this X-n. Depending on the start date, these four numbers will either agree or not: they will anyway change with every new start date. And for every new model M, including physical models. And all the while, (13) remains fixed and unbudgeable.
Oh, my. We still haven’t gotten to what to do if X is measured with error, or what the physical models mean, or what is a good or bad model. Stick around.
————————————————————————————-
1A Bayesian might calculate a “Bayes factor” instead of (20), but this difference does not matter here, because the conclusion would be the same. I mean, the interpretation (the meaning of what follows) would be the same.
2We might have to modify the notation of (18) to indicate whether we’re computing this probability before seeing X or after it.
3Good joke!
Before us are the observations X1 to X156. Recall we are assuming that each of these X has been measured without any error. Given that we observe X1 = 0.43, the probability X1 = 0.43 is 1 or 100%. Now let’s have some fun and ask some more complicated questions of our data. We can ask anything we like.
How about this? What is
(9) Pr(X156 > X1| Observations)?
Well, all we have to do is look. It appears from the plot, and a glance at the data confirms, that this probability is 0, or 0%. Why? Well, because X156 was not larger than X1. How could that cause any controversy?
But notice that (9) is an entirely different question than this one:
(10) Pr(X156 > X1| M1)
or this
(11) Pr(X156 > X1| M2).
It could very well be that (10) is greater than (say) 90% but (11) is less than 10%. It could even be, depending how rigid the models are, that (10) equals 1 and (11) equals 0; that is, the models could say the exact opposite of one another. And this is not a rare situation: indeed, we saw it last week when M1 was a climate model and M2 was a probability model. Don’t forget that we already agreed with eq. (3): different models produce different probabilities for Xi in general.
How about this question?
(12) What is the probability that temperatures (as measured by X) increased?
I don’t know and neither do you. Why? Because this question is ambiguous. Just what exactly does “increased” mean? Is it asking if X156 > X1? That’s a kind of increase. If so, then we can compute an answer (given our observations, it is 0, or 0%). Does “increased” mean that X increased at least once? If so, then we can compute an answer, which in this case is 1, or 100%, the opposite of the first definition. If you say that “increased” means “increased generally” then you have to supply the definition of “generally.”
Suppose “increased generally” means that X increased more often than it decreased or stayed the same. We can easily compute this, given our observations. And this is a different probability than if “increased generally” means that X increased or stayed the same more often than it decreased. If you do not see this, stop here and ponder until you do.
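For those who would rather compute than ponder, here is a small sketch of the two definitions side by side. The series `toy` is hypothetical (it is not the series in the plot), and the function names are mine.

```python
def count_steps(x):
    """Count step-to-step moves that went up, went down, or stayed the same."""
    ups = sum(1 for a, b in zip(x, x[1:]) if b > a)
    downs = sum(1 for a, b in zip(x, x[1:]) if b < a)
    ties = len(x) - 1 - ups - downs
    return ups, downs, ties


def increased_strict(x):
    """X increased more often than it decreased or stayed the same."""
    ups, downs, ties = count_steps(x)
    return ups > downs + ties


def increased_lenient(x):
    """X increased or stayed the same more often than it decreased."""
    ups, downs, ties = count_steps(x)
    return ups + ties > downs


# Hypothetical toy series.
toy = [0.4, 0.5, 0.5, 0.3, 0.3, 0.6, 0.2]
print(count_steps(toy))        # (2, 2, 2)
print(increased_strict(toy))   # False
print(increased_lenient(toy))  # True
```

Same data, two definitions, opposite answers. Given the observations, each answer is held with probability 1.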
We could go on and on. “Increased” could be taken to mean increased by a certain amount, or never decreased more than another fixed number. There are more possibilities, too, but we’ll skip them. All we need remember is that the probability of these questions can all be different. There is just no evading these kinds of definitions (even if you want to). So before you find your blood boiling and you hear yourself shouting “Denier!” make sure you understand what question you and your enemy are answering.
Let’s pick one of these definitions so that we can move on: let’s say that “increased generally” means “X increased or stayed the same more often than it decreased.” Now, what is (12) with this definition in mind? Well, all we have to do is glance at our observations: the probability, given these observations, is 0, or 0%. Let’s write this out in notation, to make it crystal clear:
(13) Pr( X increased | Observations) = 0.
Of course, (13) assumes our definition of “increased”, which to be precise we should write inside the parentheses; however, we’ll let it slide for ease of reading.
We seem to be done. We wanted to know if X “increased”, we agreed on a definition of what “increased” meant, we looked at the observations, and we knew the answer. We even agreed that each X is measured without error. Except for disagreements about the word “increased”, there would seem to be no room whatsoever for controversy.
There is, of course, plenty of space, but only because people confuse just what probability is being computed. Because some people, when hearing “What is the probability X increased”, instead of computing (13), compute this
(14) Pr(X increased | M1)
or this
(15) Pr(X increased | M2),
etc. There is no contradiction for using the past tense here, for we can just assume, as people do, that “increased” means “would have increased”. Now it is perfectly possible, as before, that (14) could be greater than, say, 90%, and that (15) could be less than, say, 10%—even though, in fact, X did not increase (by our definition). And we don’t have to settle for just these two models: we have an infinite supply of models, so that we can compute
(16) Pr(X increased | Mj)
for j = 1, 2, … Each (or most) of these probabilities will be different. Which is the correct one?
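To make (14) through (16) concrete, here is a simulation sketch with two invented models. Both are random walks with normal steps, differing only in the direction of drift; neither is a climate model, and the drift and noise values are assumptions chosen only to show that two models can give very different answers to the same question.

```python
import random


def increased(x):
    """The definition adopted in the text: up or same more often than down."""
    downs = sum(1 for a, b in zip(x, x[1:]) if b < a)
    return (len(x) - 1) - downs > downs


def prob_increased(drift, sd=0.1, n=156, n_sims=2000, seed=3):
    """Estimate Pr(X increased | Mj) for a random walk with the given drift."""
    rng = random.Random(seed)
    hits = 0
    for _sim in range(n_sims):
        x = [0.43]                       # start every simulated series at X1
        for _step in range(n - 1):
            x.append(x[-1] + drift + rng.gauss(0.0, sd))
        if increased(x):
            hits += 1
    return hits / n_sims


p_m1 = prob_increased(drift=+0.02)   # hypothetical M1: slight upward drift
p_m2 = prob_increased(drift=-0.02)   # hypothetical M2: slight downward drift
print(p_m1, p_m2)
```

Neither number says anything about what X actually did; that is (13), and it comes from looking at the observations.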
Well, that is a separate question. For real temperatures, I have no idea which model is best. Many people, however, claim they know exactly which model is the best and truest and most pure. Well, maybe they are right. They say that they have a model which takes into account, in just the proper way, forced versus unforced feedback, the effects of variable sunlight, the thermodynamics of this and that gas, and on and on. Suppose the folks making this claim are right and that their model is the “best”, where we can let the notion “best” float for now. Further suppose that
(17) Pr(X increased | Mbest) > 90%;
that is, given the best model, the chance that X would have increased over this series is 90%. Given Mbest, this probability is true. But, alas, so is (13). It is still the case, no matter what (17) tells us, that (13) holds, that temperatures did not, in actual fact, increase. This is just too bad for Mbest. Since we are supposing that Mbest is best, all we have learned is that our model has rather severe limits and that X is not predictable to the extent we would like it to be. Tough luck for the model! Sometimes the universe is predictable and sometimes she isn’t.
But then, it is unlikely that Mbest really is best (just as it is even more unlikely that the model you, the reader, have in mind is best). I said I don’t know what model really is best, but I do claim to know, and I will shortly prove, that the way to think about temperatures is flawed, because we don’t consider that X is measured with error. We’ll tackle that next time. We’ll also talk about “significant” and “linear” increases, etc.
We’ve gone on and on about how to think about time series, but we are having trouble grasping some very simple ideas. The discussion here, and on other blogs, demonstrates there is a lot of confusion and plenty of misunderstanding. Also a complete lack of humor. Who would have guessed that something as banal as statistics could get so many people so excited?
The true test of an honest mind is how seriously it considers arguments that produce uncomfortable conclusions. This is not to say that uncomfortable conclusions are always right; clearly they are not. But in the case of how to think about time series, I am right and my enemies are wrong.
I rarely ask this, but you’d be doing us all a favor if you passed this series on to those in need of it. I’ll answer questions after the series is completed. I’ll Latex this up when it’s finished, sans asides, for easy and portable reading. Remember: be nice.
——————————————————————–
Below is pictured a time series. Imagine it is something to do with climate, say, monthly temperature anomalies. Let’s first suppose that each of the points on the picture is measured without error. That is, we are 100% sure that each point is what it is. The first value is X1 = 0.43. Given our observations, what is the probability that X1 = 0.43? It is 1, or 100%. And so on for each data point. If you find yourself disagreeing with me at this point, well, there is nothing I can do for you: we must remain forever at odds.
Now, something caused that data to take the values it did. Call this cause T. (Something causes every observation to take the values it does.) You must agree with this, too, or all is lost. T will be more or less complicated depending on what X is. If X is, say, global average temperature (anomaly), then T will contain everything that can change the temperature, even down to butterflies flapping their wings. T is the earth and sun, etc.
In real life, we rarely (if ever) exactly, precisely, down-to-each photon know what T is. But suppose we did. Then we can answer questions like this: what is
(1) Pr(X1 = 0.43 | T)?
It is 1, or 100%. T says, after all, exactly what causes each X, therefore if we know T we know before taking any observations what each X will be with certainty.1 Equation (1) is different than
(2) Pr(X1 = 0.43 | Observations),
which also equals 1, or 100%. In other words, we know (1) before we take observations, but we know (2) after. This is an important distinction. Okay so far?
Again, we hardly ever know T precisely in real life; we surely do not know it if X is any kind of atmospheric or oceanic temperature. We might guess, or use evidence compiled from various sources, to say that, although we cannot know T exactly, we can approximate it, i.e. we can model it. To be clear: no scientist claims to know T precisely, but all believe (I do, too) that we can approximate T by a model.
One person will say that the best model is M1, another will claim that it is M2, and so on. It will usually be the case that
(3) Pr(Xi = x | Mj) n.e. Pr(Xi = x | Mk)
where “n.e.” means “not equal”, x is some value, i is for the i-th value of X, and j and k are indexes over our collection of posited models. This should be no surprise: if instead in (3) there was equality for all i, j, and k, then there would be no difference in the models.
A sticking point for you might be using the language of probability to speak of physical models. It shouldn’t be. For one, probability is the language of uncertainty, and don’t forget that we don’t know T, we only guess that M is a good approximation of T, so we have to speak not in terms of certainty, but uncertainty.
Let’s take a fully deterministic model as an example:
(4) M = “Yi+1 = Yi + 2.”
From this we can ask this (or any other question of the Ys):
(5) Pr( Y17 > Y12 | M )
which is 1, or 100%. There is no problem, therefore, using probability even though M itself has no probabilistic components. Again, if you fail to agree with this, we must part ways.
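A tiny sketch shows why (5) equals 1 whatever the starting value. The list of starting values is arbitrary and is mine, since M as written does not say where Y begins.

```python
def simulate_y(y1, n):
    """Generate Y1..Yn under M: Y[i+1] = Y[i] + 2."""
    ys = [y1]
    for _ in range(n - 1):
        ys.append(ys[-1] + 2)
    return ys


def prob_y17_gt_y12(starts):
    """Fraction of starting values for which Y17 > Y12 under M."""
    hits = 0
    for y1 in starts:
        ys = simulate_y(y1, 17)
        if ys[16] > ys[11]:   # ys[16] is Y17, ys[11] is Y12
            hits += 1
    return hits / len(starts)


print(prob_y17_gt_y12([-100, -1, 0, 3.5, 1e6]))  # 1.0
```

Under M, Y17 is always Y12 plus 10, so nothing in the computation is uncertain, and yet the answer is still a perfectly good probability.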
Let’s get back to X, which we are imagining has something to do with temperature. What we cannot ask is this: what is
(6) Pr(X1 = 0.43)?
There is no answer because we are not considering how X1 came about. We’re missing the stuff that comes after the vertical bar “|”. If we say X was caused as T said it was, then we have eq. (1). If we mean (6) to implicitly incorporate the observations, i.e. given we have already seen X1, then we have eq. (2). We must first supply a “premise” of how X1 came about: eq. (6) is therefore incomplete. We can ask, for instance, this:
(7) Pr(X1 = 0.43 | M1),
or similar questions for every different model we are considering.
The only other point you must understand, before we move on, is that usually
(8) Pr(Xi = x | Observations) n.e. Pr(Xi = x | M),
for most (or even all) i and for any M which is not T. That is, once we have seen what Xi is, the probability of Xi taking the value it took given we see Xi is 1 or 100%, but the probability the model predicted this value is in general something less than 100%. With me? I hope so, else we will have troubles with what follows.
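Here is one way to see (8) with numbers. The model below is invented for illustration: it says X1 is normal with mean 0.30 and standard deviation 0.20, and "X1 = 0.43" is read as "the reading rounds to 0.43 at two decimals." None of these choices comes from the text.

```python
from statistics import NormalDist


def prob_given_observation(observed, x):
    """Pr(Xi = x | Observations): 1 if that is what we saw, otherwise 0."""
    return 1.0 if observed == x else 0.0


def prob_given_model(model, x, half_width=0.005):
    """Pr(Xi rounds to x | M) for a model expressed as a normal distribution."""
    return model.cdf(x + half_width) - model.cdf(x - half_width)


m = NormalDist(mu=0.30, sigma=0.20)   # hypothetical model M for X1
print(prob_given_observation(0.43, 0.43))
print(prob_given_model(m, 0.43))
```

The first number is 1; the second is some small positive fraction that depends entirely on which M was chosen.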
—————————————————————————————-
1If your objection is that T might contain “randomness” (quantum or “normal”), wait until Part IV.
Update Link to the data (CSV file), for those who like to touch.
Somebody attributed to Max Planck, a constant1 source of wisdom, the saying that science advances funeral by funeral. This is a pithy condensation of his more famous quotation:
A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die and a new generation grows up that is familiar with it.
Every scientist laughs along with Planck, thinking to himself how silly were those people of yore who refused to believe what was now so obvious. When scientists gather they often tell each other cautionary tales about the simpleminded stubbornness of their forefathers. Things were different then, always then. Not for a moment does it cross their minds that Planck’s wisdom could possibly apply to them.
Yet Planck was wrong: at least, if he meant that nobody ever changes his mind. Some do. But only very, very few. Einstein famously did not change his. And it is no refutation to say that perhaps Einstein will be right after all, because that would imply that Einstein’s intellectual enemies were wrong not to have changed their minds.
It is also clear that Planck had in mind foundational questions. The more a new idea conforms to whatever the current consensus in science is, the more likely it will be accepted. The new idea says, “What you believe is indeed so”, which is comforting. But the stronger a novel philosophy thumps the base of the Old Way, the more vociferous the opposition. It is saying, “You are wrong,” words few can stomach. The Wegeners and Semmelweises who arise occasionally must expect their thrashings.
Once more we have Planck:
The man who cannot occasionally imagine events and conditions of existence that are contrary to the causal principle as he knows it will never enrich his science by the addition of a new idea.
It is true that new foundational ideas are radical departures, as Planck suggests, but then so is the flood of crank theories that washes over science. An idea’s novelty is thus not an argument in its favor. To think it is is to employ what philosopher David Stove called the Columbus argument. They did all laugh at old Christopher, and laugh wrongly, but they were right to dismiss the vast majority of novel thought.
We often hear—it is part of the standard propaganda folder—that science is self-correcting. Is it? Well, this statement is either always true, or it is always false, or somewhere in between. If we claim it is always true, we claim too much, because it is to claim all wrong ideas will always be corrected, and where is the proof for that? If you believe this, you do so on faith and in opposition to history. After all, science has at times not progressed but actively regressed. So how do we know that some of our beliefs will never be challenged successfully? It is logically possible that we hold certain ideas that are false but we can never prove false.
Then it cannot always be false that science is self-correcting, because, as is obvious, science has often progressed. So it must be somewhere in between: science often but not always and not in all places self corrects. And this says nothing about the rate at which science self corrects. For trivial, small facts, the correction is quick, as any working scientist will tell you. Yet as Planck told us, self correction is painfully, even fatally, slow for foundational ideas.
The test for regression, the opposite of self-correction, appears to be how closely aligned a science is to politics. Trofim Lysenko leaps to mind as the man who halted biology and ordered it to about face, and then marched it along the path dictated by his socialist masters. On a smaller scale, there was that infamous bill proposed (but not passed) in Indiana which would make it the law of the land that the circle could be squared (and thus the value of pi should be changed). In recent years, we have had a spate of frauds which otherwise would have been caught had the results the frauds put forth not been what their audience wanted to hear.
Science is like the branch of a tree that twists and grows in the direction of the strongest sunlight and nutrients, i.e. money. But not only that. Scientists are just people and they like to get along with others, especially colleagues. They will thus often hold an idea more strongly just because others hold it, too. Which brings us right back to Planck.
——————————————————————–
That title is lifted from Popular Science’s brief article. The idea is that scientists—as philosopher Christopher Essex reminded me, just like doctors and accountants and businessmen and engineers and everybody else who offers opinions for consideration already do—should put their money where their pronouncements are.
Pop Sci reminds us of how Bernardo De Bernardinis, the vice-director of Italy’s Department of Civil Protection
told reporters that citizens should not worry, and even agreed with a journalist who suggested that people should relax with a glass of wine.
Six days later, a major earthquake struck L’Aquila, a city in Abruzzo, killing more than 300 people. Soon after, citizens requested an investigation into the panelists’ findings, and the public prosecutor obliged. De Bernardinis and the panelists were charged with manslaughter and now face up to 15 years in prison. The L’Aquila judge who determined that the case could go to court said the defendants provided “imprecise, incomplete and contradictory information” and effectively “thwarted the activities designed to protect the public.”
That’s a lotta years! All for a busted forecast. (We covered the story here; it’s more complicated than the quotation suggests.)
And then, “In 1989 a scientific advisory group reported that it was unlikely that BSE could be transmitted to humans. Through the early 1990s, government ministries reassured the public that it was safe to eat homegrown beef.” It wasn’t. People died, grief ensued, but suing lawyers were not unleashed. (Government bureaucrats were instead.)
South African lawmakers are proposing to make forecasting the weather illegal unless one has a license to do so (link). Easy to scoff at this one, since as Mark Twain said, etc. Then, think how seriously weather forecasts are taken in, say, Oklahoma. Somebody there says a tornado’s a comin’, and people take action. Expensive action, too. So the forecaster better be damn sure of himself. And woe betide the weatherman who says all is well when it isn’t.
But sue him for a mistake? Well, why not? Much as we hate to encourage unslakeable lawyers and their symbiotic gelatinous bottom feeders like JG Wentworth who will buy out (among other things) structured settlements won in law suits, these scavengers (sometimes) provide the useful function of eating the necrotic flesh of capitalism.
Instilling the fear of buzzards into scientists might sharpen their wits. It might for instance stem the flow of purple, hyperbolic prose and Chicken Little-ing from the environmentalist crowd if they knew that their words were going to be checked against the facts.
But how do we decide who gets sued? Should we sue those guys who a few months ago predicted, via statistical modeling, that the neutrino had no mass? Should we pull to the bar the group who got us all excited, via statistical modeling, that the Higgs was finally found, but who were (probably) premature? What about all the champagne that was bought and probably consumed after the first happy but ultimately wrong announcement? Should the crew who manned the accelerator be legally responsible for the bill?
Can we sue those fellows who swore that brief exposure to a 72 × 45 pixels American flag turned people into Republicans? They learned this “fact” via statistical modeling. And what about all those especially earnest folks at the EPA who will protect us no matter what, who create statistical models aplenty proving that exposure to some barely detectable chemical will increase our risk of cancer from 20% to 20.001%? Can we haul them off to jail after it proves that their fretting was false?
How about the climatologists who swore, by golly and by gum, that the temperatures now should have been warmer than they are? Tar and feather them, litigationally speaking? After all, lots of money was spent believing these forecasts were accurate. Who should pay now that we have learned that they weren’t? We needn’t arrest James Hansen, incidentally, because he’s developed the habit of hauling himself off to jail from time to time.
Why, if we were allowed to sue scientists for failed predictions, the courts would have to run twenty-four hours a day, every day of the year, even Christmas, and that would be just for the sociologists, like those guys who claimed, via statistical modeling, that brief exposure to a 4th of July parade turned people into Republicans. We’d have to build special holding pens for the climatologists.
Scientists have a special right to be wrong, don’t they? They’re better than ordinary people somehow, aren’t they? If we held them accountable, they might be too scared to think of new theories. And the world always needs new theories. And, hey!, somebody might even sue me!
————————————————-
Thanks to an anonymous reader for suggesting this topic.
See this real-life possibility of a meteorologist being sued from Brazil, a case where lots of money was involved.
It is a sure sign that Sanity has packed her bags and headed for the door when otherwise sober scientists begin slinging around terms like “denier” and “denialist.” Language like this displays willful, pretended, or real ignorance of the historical context of these words. Anybody who talks like this makes himself an ass. They’s fightin’ words which start any discussion on an angry footing, their presence a certain indication we are dealing with zealotry, not science.
Let’s look again at the claim made by the scientists at the Wall Street Journal, over which many have popped their corks:
The lack of warming for more than a decade—indeed, the smaller-than-predicted warming over the 22 years since the U.N.’s Intergovernmental Panel on Climate Change (IPCC) began issuing projections—suggests that computer models have greatly exaggerated how much warming additional CO2 can cause.
There are two claims made here. Given the observational evidence we have, both claims appear true. The first (A) is that for the last ten years it has not grown warmer. Since it has grown warmer in some places and colder in others, this is evidently a claim about some global average and not any individual station. The second claim (B) says that the IPCC forecasts have been systematically too large: it is also concerned with some global average.
Both of these claims are quantitative and subject to easy verification. A person’s politics surely has no bearing on whether they are true or false claims. Now, the “global average” referenced is not a static thing, in the sense that, say, measurements from identical (and identically situated) thermometers at fixed locations are averaged together and called (arbitrarily, of course), the global average. Instead, the global average as it is operationally defined mixes sources and locations freely each year (and even within years). Therefore, when the “average” is computed there will be some uncertainty in it. Further, the uncertainty is larger in times historical than in times present. (There is even some uncertainty at individual locations, because no measurement apparatus is perfect, but this is generally small, though not always, especially in the past or when using proxies: see this series.)
The BEST people, for instance, recognized this and attempted to account for measurement uncertainty by speaking not just of averages, but of averages plus-or-minus. We can, and I did, argue over the better way to calculate and display this uncertainty. All we need to understand here is that some techniques underestimate this uncertainty. Actually, we don’t even need to agree about that: but we do need to see that some uncertainty is present, however small.
This is necessary because if we make claim (A), as the WSJ fellows did, we need to take uncertainty over the global average into account or we cannot know whether the claim is true or false. It is at this point when a lack of understanding of statistics can become a real hindrance. Sloppy language also hurts immeasurably. Let’s work through this slowly.
Suppose we have ten years of uncertainty-free global average temperature measurements. We can line them up and ask questions of this series. Was the temperature ten years ago warmer or colder than the temperature this year? All we have to do is look: it will be true or false at a glance. Was the temperature nine years ago warmer or colder than this year? True or false at a glance. And so on.
What does this mean in the context of claim (A)? Well, (A) says that temperatures have not gone up over the last decade. To verify this, all we need do is look to see if any of the temperatures of the last decade are lower than they are this year. If any are, the claim is false. If none are, the claim is true.
Maybe. Because claim (A) can also be taken to mean that at no time over the last decade have the temperatures increased (they could have stayed constant from year-to-year). Again, we can verify this claim with a glance at the data.
Which of these definitions is right? Evidently neither, because we all understand that the temperatures have some uncertainty in them. Because of that, we cannot just look at the data to say whether it has gone up or down; we instead have to speak of changes in probabilistic terms. And that means hauling in some kind of model.
The simplest (but not so good) model is to imagine each year’s data is irrelevant to knowing any other year’s data. That is, we take this year’s data and display it as an average with a plus-or-minus attached to indicate our uncertainty in it. That plus-or-minus can only come from some kind of probability model, meaning that the range of uncertainty will change when the model changes. Which is the best and most proper model? Nobody knows. But let’s imagine we all agree on one, such that displayed before us is a temperature series of averages and plus-and-minuses.
Now, if claim (A) means that temperatures this year are less than or equal to temperatures ten years ago, then we can make a comparison as before, but our comparison will be accompanied by a measure of uncertainty. Using predictive techniques (yes, this is the proper word: see this series), we can ask questions like, “Given the data and assuming our model is true, what is the probability this year’s temperature is less than or equal to temperatures ten (or nine, etc.) years ago?” Notice that this is not the same as a “t-test” or any other kind of statement about parameters of probability models: it is a statement about observable temperatures.
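A minimal sketch of that predictive question, under assumptions that are mine and not part of the original analysis: each year's true value is treated as an independent normal centered on its published average, with the plus-or-minus read as a standard deviation. The function name and the numbers are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_not_warmer(mean_now, sd_now, mean_then, sd_then):
    """Pr(T_now <= T_then) under an ASSUMED model: each year's true value
    is an independent normal centered on its estimate."""
    # The difference of two independent normals is itself normal.
    diff = NormalDist(mean_now - mean_then, sqrt(sd_now ** 2 + sd_then ** 2))
    # Probability that the difference (now minus then) is at most zero.
    return diff.cdf(0.0)

# Hypothetical anomalies (deg C) and plus-or-minus values; not real data.
p = prob_not_warmer(mean_now=0.45, sd_now=0.10, mean_then=0.40, sd_then=0.10)
print(f"Pr(this year <= ten years ago) = {p:.2f}")
```

The answer is a statement about the observable temperatures themselves, and it moves whenever the plus-or-minus (that is, the model) moves.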
Or, if claim (A) means that temperatures did not increase even once over ten years, then we can get the probability of this just as simply. In support of either version of claim (A), I said that we cannot know with probability greater than 90% that temperatures have increased (over this last decade). In other words, it is likely that claim (A) is true.
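The second reading of claim (A) can be sketched by simulation. This again assumes independent normal uncertainty around each year's average, which is an illustrative choice and not necessarily the model behind the 90% figure quoted above; the series, the function name, and the seed are hypothetical.

```python
import random

def prob_never_increased(means, sds, draws=20000, seed=1):
    """Monte Carlo estimate of Pr(no year-to-year increase) under an
    ASSUMED independent-normal model for each year's true value."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(draws):
        # Draw one plausible "true" series consistent with the estimates.
        series = [rng.gauss(m, s) for m, s in zip(means, sds)]
        # Count the draw if every year is no warmer than the one before.
        if all(later <= earlier for earlier, later in zip(series, series[1:])):
            hits += 1
    return hits / draws

# Hypothetical ten-year series of averages with a common plus-or-minus.
means = [0.50, 0.48, 0.49, 0.47, 0.46, 0.47, 0.45, 0.44, 0.45, 0.43]
sds = [0.10] * 10
print(prob_never_increased(means, sds))
```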
This is so using the probability model I indicated. But what if we instead change the model to a linear regression—i.e. a straight line—drawn through the data? Well, we could go through the same steps and ascertain claim (A) in light of this model. But before we can begin we have several things to decide. Why a straight line? Just because it’s easy? Lazy, that. From what year do we start? See this post for the ways that choice can lead you wrong. Do we start with a date (as I joked) in the Jurassic? Or, for fun, in 1973? Every different start date will give a different answer. I will repeat that: every different start date will give a different answer. It is also a stretch, to say the least, to assume temperature always has been increasing in a straight line from whatever start date we pick. (Before the politicization of this subject, every physical scientist would have agreed with that last statement.)
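To see the start-date problem in miniature, here is a sketch that fits an ordinary least-squares line to the same invented series from several different starting years. The series is one slow cycle with no trend built in; it is not temperature data, and the function names are mine.

```python
from math import sin, pi

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def slopes_by_start(years, values, starts):
    """Fit a separate straight line beginning at each start year."""
    out = {}
    for start in starts:
        i = years.index(start)
        out[start] = ols_slope(years[i:], values[i:])
    return out

# Invented series: a single slow cycle, no real temperatures involved.
years = list(range(1960, 1980))
values = [sin(2 * pi * k / 20) for k in range(20)]

for start, slope in slopes_by_start(years, values, [1960, 1965, 1975]).items():
    print(start, round(slope, 3))
```

Starting at the top of the cycle gives a falling line; starting at the bottom gives a rising one. Same data, different "trend."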
But suppose we do agree on a date: 1964, say, a very fine year. Are we done? No, because we cannot forget that the data that goes into the straight-line model is still measured with uncertainty. We must, just as we did in the first model, account for this uncertainty. That means drawing any kind of naive line (even bold red ones) guarantees over-certainty.
Even if we were to agree on a date—in real life we do not—we could use a model of the measurement error, incorporate that into the model of straight-line change, and then assess claim (A): it is still probably true.
The best thing to do is to model the data in an intelligent way, taking into account the correlations of year-to-year (both auto-regressive and moving average), the measurement error, etc., etc. Hard work! As Doug Keenan has pointed out (often), it’s too much like work for anybody to do. I’d do it myself, but my check from Big Oil hasn’t yet arrived.
Whatever else you do in life, you must not, you must never, look at the pretty red (or blue, etc.) straight line you have just drawn and claim it is, or think of it as, the real data. (It is only in climatology where I have seen scientists forget error bars, and then pitch a fit when somebody points out the omission. You at least have to put predictive, and not parameters-based, error bars on the line, even ignoring measurement uncertainty of the data.)
What about claim (B)? Also likely true, as is generally recognized. We still have to incorporate the uncertainty in the global temperature measurements—there is no or little uncertainty in the forecasts—but this is no different than before.
What about the counter-claim (C) that the 2000s were the “warmest years on record” or the like? It is trivially false. The 2000s simply were not the warmest. Four billion years ago, Earth was much hotter. “Wait! It’s obvious we weren’t talking about billions of years ago. Cheater! Denier!” Well, it isn’t obvious. What years did you have in mind as comparators? Ah, that’s the real question, isn’t it.
Did we mean just the last century? The last 1000 years? The last 10,000? What? You must supply a starting year. To make the claim (C) that it’s hotter now than before, you must tell us what you mean by before. If you say “before” means the last ten years, then claim (C) is identical with claim (A). If you say the last 200 years, then you have to do what BEST tried and incorporate the non-parameter error bars, otherwise there is no way to compare what happened a century ago with what happened last year. Obviously, the further you go back, the larger those uncertainty bars become, therefore the more difficult it becomes to claim (with any certainty) that now was hotter than then.
As I often say, over-certainty abounds in this field. People speak of models (statistical and physical) as if they were truth, as if the data that goes into them were granted some kind of special immunity from ordinary criticism. And when the critiques come, that’s when the asinine language breaks out. All sense of humor evaporates.
You would think that because both claims (A) and (B) are likely true (and claim (C) is unproved or likely false) that we have found a reason to celebrate! Perhaps our worst fears won’t be realized after all. This is good news! Wouldn’t it be great if we really did over-emphasize feedback in climate models and that whatever changes we do make to the climate are easily mitigated and not as horrific as posited?
Why so glum that things are so good?
Update See this cartoon which shows that the IPCC has been known to employ the technique of variable start dates.
Update It is imperative that all read this series, where I describe just how so many people make mistakes. Those below who have been shouting the loudest are most in need.
Forget climatology. It’s time for something really controversial. How to fill in those grid squares on the Super Bowl office pool. An example of one is shown below.
The full details are over at Edgehogs. Be sure to also make your picks for the game itself. The most especially prescient personage will pull home an electronic gizmo gratis.
Super Bowl Grid
Sometimes you get to see the labels on the grid rows and columns first, but sometimes you don’t and those labels are written in after everybody has bought their box. If you get to see them first, then use the strategy outlined. If you don’t see them until after, then at least you will be able to help figure your chances before the game.
Not surprisingly, it turns out some squares were seen more often than others. The green X’s show the best locations, and the red O’s show the worst.
Perhaps we should have some sort of poll: how many will watch the game? how many watch more for the commercials? how many genuinely follow the game? how many remember that spring training, and therefore a return to real sports, is only a month away?
Or maybe we should have a game-day temperature forecast contest?
Remember when I said how you shouldn’t draw straight lines in time series and then speak of the line as if the line was the data itself? About how the starting point made a big difference in the slope of the line, and how not accounting for uncertainty in the starting date translates into over-certainty in the results?
If you can’t recall, refresh your memory: How To Cheat, Or Fool Yourself, With Time Series: Climate Example.
Well, not everybody read those warnings. As an example of somebody who didn’t do his homework, I give you Phil Plait, a fellow who prides himself on exposing bad astronomy and blogs at Discover magazine. Well, Phil, old boy, I am the Statistician to the Stars—get it? get it?1—and I’m here to set you right.
The Wall Street Journal on 27 January 2012 published a letter from sixteen scientists entitled, No Need to Panic About Global Warming, the punchline of which was:
Every candidate should support rational measures to protect and improve our environment, but it makes no sense at all to back expensive programs that divert resources from real needs and are based on alarming but untenable claims of “incontrovertible” evidence.
Plait in response to these seemingly ho-hum words took the approach apoplectic, and fretted that “denialists” were reaching lower. Reaching where he never said. He never did say what a “denialist” was, either; but we can guess it is defined as “Whoever disagrees with Phil Plait.”
The WSJ’s crew said, “Perhaps the most inconvenient fact is the lack of global warming for well over 10 years now.” This allowed Plait to break out the italics and respond, “What the what?” I would’ve guessed that the scientists’ statement was fairly clear and even true. But Plait said, “That statement, to put it bluntly, is dead wrong.” Was it?
Plait then slipped in a picture, one which he thought was a devastating touché. He was so exercised by his effort that he broke out into triumphal clichés like “crushed to dust” and “scraping the bottom of the barrel.” You know what they say about astronomers. Anyway, here’s the picture:
See that red line? It’s drawn on a time series—wait! No it isn’t. Those dots are not what Plait thinks they are. They are not—they most certainly are not—global temperatures. Each dot instead is an estimate of global temperature: worse, most dots are also different kinds of estimates from each other. That is, the first dot was estimated using data X and method A, and the second dot was estimated using data Y and method B, and so forth. Well, maybe the first and second dot were the same, but older dots are different than the newer ones.
With me so far? All you have to remember is these dots are estimates, results from statistical models. The dots are not raw data. That means the dots are uncertain. At the least, Plait should have shown us some “error bars” around those dots; some kind of measure of uncertainty.
Now—here’s the real tricky part—we do not want the error bars from the estimates, but from the predictions. Remember, the models that gave these dots tried to predict what the global temperature was. When we do see error bars, researchers often make the mistake of showing us the uncertainty of the model parameters, about which we do not care, we cannot see, and are not verifiable. Since the models were supposed to predict temperature, show us the error of the predictions.
I’ve done this (on different but similar data) and I find that the parameter uncertainty is plus or minus a tenth of a degree or less. But the prediction uncertainty is (in data like this) anywhere from 0.1 to 0.5 degrees, plus or minus. That is, prediction uncertainty can be as much as five times larger.
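The gap between the two kinds of uncertainty shows up even in the simplest textbook case. The sketch below assumes a plain normal model for a sample and uses plug-in standard deviations; it is not the calculation behind the figures quoted above, and the sample values are invented.

```python
from math import sqrt
from statistics import stdev

def uncertainties(sample):
    """Return (parameter uncertainty, predictive uncertainty) for a sample,
    under an ASSUMED normal model with a plug-in standard deviation."""
    n = len(sample)
    s = stdev(sample)
    param_se = s / sqrt(n)         # spread of the unobservable mean
    pred_sd = s * sqrt(1 + 1 / n)  # spread of the next observable value
    return param_se, pred_sd

# Hypothetical anomalies, for illustration only.
sample = [0.31, 0.45, 0.28, 0.52, 0.40, 0.36, 0.49, 0.33]
param_se, pred_sd = uncertainties(sample)
print(f"parameter: +/- {param_se:.3f}   prediction: +/- {pred_sd:.3f}")
```

In this toy model the predictive spread is larger than the parameter spread by a factor of sqrt(n + 1), so the more data you pile on, the more misleading the parameter-based error bar becomes as a statement about actual observations.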
I don’t know what the prediction uncertainty is for Plait’s picture. Neither does he. I’d be willing to bet it’s large enough so that we can’t tell with certainty greater than 90% whether temperatures in the 1940s were cooler than in the 2000s. And also such that, just as the WSJ’s scientists claim, we can’t say with any certainty that the temperatures have been increasing this past decade.
In other words, the scientists were right and Plait was wrong. Or, as he might phrase it, he blatantly misinterpreted long term trends. Notice old Phil (his source, actually) starts, quite arbitrarily, with 1973, a point which is lower than the years preceding this date. If he had read the post linked above, he would have known this is a common way that cheaters cheat. Not saying you cheated, Phil, old thing. But you didn’t do yourself any favors.
Somewhat amusingly, Plait ends his semi-random venting by telling us that Michael Mann has been “tweeting furiously” about this. Good grief! This isn’t helping his case. Mann’s understanding of statistics may be likened to an overly enthusiastic undergraduate who left the lecture early.
———————————————————————————
P.S. Hey, Phil. Since you brought it up: the total consideration I’ve received for my work in global warming from Big Oil (or anybody) is a number so small that dividing by it is forbidden. How much do you get for your blog or other environmental work, including government funds?
P.P.S. I didn’t forget about that “warmest years on record” stuff. Those “warmest years” are still estimates and have to be compared to the old data, which itself must be accompanied by uncertainty measures. And anyway, it has been much hotter in the past than it is now. Jurassic anybody?
Update Thanks for all the comments, everybody! 100+ and no signs of slowing. I will read them all, in time, but for now, since many of them repeat odd claims and misunderstandings of statistical methods, let me point you to the BEST project posts (here and here). BEST had parameter-based error bars, but not predictive ones. But some acknowledgment of uncertainty is better than none! Also look under the Start Here tab and pay attention to the smoothing time series posts, the homogenization of temperature series posts, and read this week’s All of Statistics series. You may also read, inter alia, the Probability Leakage post which describes the Bayesian predictive approach I am using. A lot of confusion and frank unfamiliarity for some of you.
Update to the Update See this brand-spanking new post that clarifies some of the statistics some of you couldn’t be troubled to look up.
(B) New data
It might surprise you, but in classical (both frequentist and Bayesian) practice, if we expect to see new X, the procedure is almost always no different than the procedure when we expected no new data. That is, an M = Mθ is proposed, calculations are done, certain θ are set to 0, and Mθ’ is then said to describe X, finis. In the vast majority of cases of statistical analyses, Mθ’ is just assumed true; discussion centers around the parameters, and uncertainty all but disappears.
Contrast this with the procedure physics, chemistry, or even mathematics usually follows. Some evidence E is used to propose a limited set of M—usually a historical M0 and one or more new theories, M1, M2, etc. These are all, as in classical statistical practice, assessed in light of the historical X. These M also sometimes have unobservable parameters (think of Planck’s constant, etc.) which are guessed using statistical methods. Discussion occurs over these parameters, but only when M has been verified (to some extent) by its “closeness” to historical X.
In many of the physical sciences, the analysis does not stop at discussing the models’ closeness to historical data, nor is the focus just on the parameters (usually). These sciences instead use the models to predict new data: these predictions will say that new X, given each M, will take certain values at such-and-such probability. It is usually the case that the probabilities of these new observations differ for each model (if they did not, the models cannot be distinguished).
Time passes, new data is collected, and the models are assessed in light of the predictions which were made. The models are then ordered by how well they predicted this new data. “How well” is a subjective measure: it can and does differ, meaning that models might be useful to some but not to others. Verification can be done formally, as in statistics, by calculating the probability each model in the set is true, in light of the new X, old X, and the given E. But usually, this ordering is done informally (this informality does not invalidate the findings; when I opened I claimed not all probabilities can—nor should—be quantified).
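The formal version of this ordering can be sketched as follows. Everything specific here is an assumption made for illustration: two rival models that each predict new observations with a normal distribution, equal probability for each model before the new data arrive, and invented data. The function names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

def log_score(model, new_x):
    """Log of the probability density the model gave to the new data."""
    return sum(log(model.pdf(x)) for x in new_x)

def model_probs(models, new_x):
    """Pr(each model | new data), ASSUMING the models were equally
    probable beforehand and that one of them is true."""
    scores = {name: log_score(m, new_x) for name, m in models.items()}
    top = max(scores.values())
    # Subtract the best score before exponentiating, for numerical stability.
    weights = {name: exp(s - top) for name, s in scores.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Two hypothetical rival predictions and some invented new observations.
models = {"M1": NormalDist(0.0, 1.0), "M2": NormalDist(1.0, 1.0)}
new_x = [0.9, 1.2, 0.8]
print(model_probs(models, new_x))
```

The model that put more probability on what actually happened rises to the top. Note that the ranking is conditional on the set of models offered; a better model not in the set gets no credit.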
These new models are not always accepted; often they are rejected (even mathematical proofs are sometimes found to have flaws). Perhaps newer still models arise from the ashes of these rejects, but these phoenixes are subject to the same pitiless confirmation process. This procedure has worked out rather well for these fields (excepting climatology, for its lack of verified forecasts). We are not certain sure each physical model is true, but most of them are very probably sure.
Now consider the so-called softer sciences like sociology where the situation is markedly different; classical statistical procedure (both frequentist and Bayesian) is used as if no new data were expected, as explained. Because the models are never tested to make predictions, the models proposed by individuals are taken as true. The data is used, at best, to say something about the unobservable parameters of M. Over-certainty abounds.
The conjectures in these fields are rarely put to the test of verification. When new data is anticipated or is collected, the statistical procedure begins anew, as if the old data did not exist. The form of the model is the same, and discussion again centers on parameters. Worst of all, the certainty that is felt to lie in the parameters is said to lie in any new data that is expected. If new data is sought it is often collected only to confirm the M. This search is usually rewarded, not necessarily because the M assumed true are true, but more because of the wisdom in the saying, “Seek and ye shall find.” Confirmation bias creeps in and sticks to everything.
Contrast again the situation in the physical sciences. New data is sought that will confirm the M, but also sought is data that would disconfirm or invalidate the M. I need only say the words “cold fusion” to show how rigorous and routine this process is. This search does not happen, or happens rarely, in the soft sciences: people there are comfortable sticking with their preconceptions. Because they expose their models to new data, the physical sciences are usually (a word which implies “but not always”) trustworthy: ships float, cameras take pictures, lasers cut, and so on. The soft sciences do not have such a fund of success to point to.
The one area of statistics in which future data is considered is time series, where it is acknowledged from the start that X is part of a stream of data. Unfortunately, the procedure differs little from ordinary statistics except that it is acknowledged that the models belong to a more limited class than in ordinary statistics. Discussion still centers on (and ends with) the parameters (see this post for what can happen). The models can be, and are to a greater extent than usual, put to the test, but still not often. The models are just assumed true, the parameters are said to be “it.”
All statistical procedure should be seen as “time series;” at least, when new data is expected, but in the way the physical sciences treat old and new data. Models should be put to rigorous, unforgiving tests of validation. Except when absolutely necessary (which will be rare times indeed), discussion should move away from parameters and focus on uncertainty of actual observables (or testable conjectures). This is the only way to eliminate over-certainty.
(A) No new data (cont.)
If we want to know how that data arose, and we are not satisfied by X itself, we need to propose a model M, which may be anywhere from fully causal to fully probabilistic. This puts us in a jam because, for any X, there will not exist a unique model which explains X. That is, for any X, we can always create any number of M which explain X; for any X, we can always invent an explanation M (from fully to partially causal) for why X took the values it did. It matters not how fanciful M is compared to evidence not in X—in relation to some E not used to infer M—it only matters that such M exist (you could always say M = “Venusians caused X”, which to many E is absurd).
Anyway, in classic (frequentist and Bayesian) statistics, an M is proposed. We now have a problem, because if our model is indexed by parameters, M = Mθ, we have to supply a guess for the θ (possibly multidimensional). We usually provide this guess by using the X itself; but this is not necessary and a guess can be supplied via external evidence or subjectively.
Frequentist theory often begins (and ends) with a “plug in” guess of the parameters. The truth of the model is assumed, and inference about X is made indirectly by discussing the parameter guesses as if the guesses were certain. More often, a subset of the parameters is set to a subjectively chosen predetermined value; usually at least one of the θ = 0, but any number besides 0 may be (subjectively) chosen. The frequentist then computes
(2) Pr( T(X) | Mθ[0] ),
where Mθ[0] indicates the model with the predetermined value of the parameter(s) supplied and T() is any function of the data (T(X) is also a proposition). The function T() is subjectively chosen and is not unique; for any given Mθ and X, there are any number of T() that can be used, with each T() giving different answers to (2). This equation is called the “p-value”; thus p-values are not unique and are a function of the base model M, the values substituted into the parameters, and the “statistic” T().
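To make the non-uniqueness concrete, here is a small sketch. The setup is my own illustration: the model is a coin with the parameter fixed in advance at "fair," X is an invented run of ten flips, and the p-value is taken (by the usual convention) as the probability of a statistic at least as large as the one observed. Two different T() are applied to the same X under the same model.

```python
from itertools import product

def p_value(observed, statistic):
    """Exact Pr(T(X') >= T(observed)) over all equally likely sequences,
    i.e. under a fair-coin model with the parameter fixed in advance."""
    n = len(observed)
    t_obs = statistic(observed)
    count = sum(1 for seq in product((0, 1), repeat=n)
                if statistic(seq) >= t_obs)
    return count / 2 ** n

def n_heads(seq):
    """First choice of T(): the number of heads (1s)."""
    return sum(seq)

def longest_run(seq):
    """Second choice of T(): the longest run of identical outcomes."""
    best = cur = 0
    prev = None
    for v in seq:
        cur = cur + 1 if v == prev else 1
        prev = v
        best = max(best, cur)
    return best

# Hypothetical observed flips (1 = heads).
x = (1, 1, 1, 1, 1, 0, 0, 1, 0, 1)
print(p_value(x, n_heads), p_value(x, longest_run))
```

Same data, same model, same fixed parameter, two different p-values. Which T() is "right" is not something the data can tell you.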
Now, if (2) is (subjectively) thought “too large”, the guess of θ is then “confirmed” and then formally substituted into Mθ. Usually this means setting the relevant θ = 0 (but again, any number may be used). Surprisingly, this setting of parameters (in the fixed M) to the pre-chosen values is the end result or goal of frequentist analysis. The result of this operation is said to explain X; that is, the discussion focuses on whether the unobservable θ were set to 0 or not.
Bayesian statistics inverts (1) and computes
(3) Pr(Θ | X & M),
where the M is taken to be fixed except for the value of the parameters, and Θ = “θ takes a specified value.” This is called the “posterior” and it may be derived in a formal way.
It is at this point that the typical Bayesian analysis matches frequentist procedure. That is, if in (3) some of the Pr ( |θ| > c | X & M) (where c = 0, typically) are “small”, then these θ are set to some (subjectively chosen) predefined level c (0 usually). Needless to say, what is “small” is subjectively chosen.
Once again, the M is taken as fixed and the goal is to say which of the θ should be set to their predefined levels (usually 0). The slight advantages the Bayesian analysis enjoys are two: (one) it eliminates the arbitrary step of choosing a T(); and (two) it allows probability language in discussing the parameters. But, in practice, at least for common problems, the Bayesian and frequentist end result is the same or similar, an Mθ whittled down to some Mθ’ where cardinality(θ) > cardinality(θ’).
To clean up loose ends, both theories will sometimes “tack on” a guess of the remaining θ, but this is usually a half-hearted effort. Probably because these guesses can never be checked (parameters cannot be observed). Anyway, X is said to be “explained” by Mθ’.
Recall that we are still in the case that we expect that no new X will obtain. We are using M to say how the only X we will have arose. We subjectively pick an M and then, if it is indexed by parameters, we go through a procedure to set some of these parameters to predefined levels, usually 0. We then announce to ourselves that our theory of how X arose is true or false depending on whether certain θ are set to 0 or not. Again, the Bayesian theory enjoys a slight advantage because it allows us to say with what probability these θ are near the predefined levels. Frequentist theory just states they are zero, period.
These analyses both assume the truth of M, which you might recall was what we wanted to know in the first place. Remember we already knew X and we were after the “best” M which explains X. But since there is no unique M that is “best”, we just have to (subjectively) pick some M, and we are left playing with its parameters. We picked an M and set some of its parameters to 0. The Mθ’ we are left with is said to be true. Since we will see no new data, we will never be able to confirm this.
Now this conclusion would be the same if we had started with a different model (necessarily with different parameters). This new model with a reduced set of parameters would also be claimed as the true explanation of X. There would be no way to check this claim, either.
We could go on ad infinitum, claiming each new model is the “true” explanation of X. Remember: we can’t use how well any M from this inexhaustible list explains X, because we can always find many M which explain X perfectly, or to any level of closeness we desire.
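One way to see that perfect "explanations" are cheap: for any finite set of observations there is a polynomial passing exactly through every point, and adding any term that vanishes at the observed points produces another. The sketch below uses Lagrange interpolation with exact fractions; the data and function names are invented, and the polynomials are stand-ins for "models," not a claim about how anyone actually models data.

```python
from fractions import Fraction

def exact_model(xs, ys):
    """Lagrange polynomial: a 'model' that reproduces every (x, y) exactly.
    Assumes integer inputs and distinct xs."""
    def m(t):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = Fraction(yi)
            for j, xj in enumerate(xs):
                if j != i:
                    term *= Fraction(t - xj, xi - xj)
            total += term
        return total
    return m

def rival_model(xs, ys, c=1):
    """A second perfect fit: add a term that is zero at every observed x."""
    base = exact_model(xs, ys)
    def m(t):
        bump = Fraction(c)
        for xj in xs:
            bump *= (t - xj)
        return base(t) + bump
    return m

# Hypothetical observations.
xs = [0, 1, 2, 3]
ys = [5, 3, 8, 1]
m1, m2 = exact_model(xs, ys), rival_model(xs, ys)
print(all(m1(x) == y and m2(x) == y for x, y in zip(xs, ys)))
print(m1(4), m2(4))
```

Both models match the observed X without error, yet they disagree about any point not yet seen. Changing `c` yields as many further rivals as you like, which is why closeness to the old X alone cannot pick a winner.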
So unless we are in a “jury trial”-type situation, where we have a strong E which delineates the set of rival models in advance, if we do not expect new data, there is no solution to finding “the” model which best explains X. Or, rather, the solution is to fix E (independently of X) so that the set of models is fixed in advance. But even then, unless we coalesce on one model which, given X, is true “beyond a reasonable doubt” there will always exist, well, reasonable doubt about which model is true.
Next time: new data.
Statistics is the collection and modeling of data. By “modeling” I mean using probability to describe our uncertainty in values that data may take. Statistics, then, is applied probability. Probability is the quantitative branch of epistemology. Data are propositions of the sort, “We observe X to take the value x,” where X is usually some tangible, real-world object. We use probability to quantify the chance these propositions are true, i.e. that X takes the values x.
When we observe data, we assume that something caused this data to take the values it did. We have from no to full knowledge of this causality, depending on the circumstance. We call this knowledge a model, which may be anywhere from purely mathematical-logical to completely probabilistic. If our model is purely mathematical-logical, the values the data will take are rigidly determined; there is no uncertainty. If our model is completely probabilistic, the values the data will take are unknown to a specified extent. Most models are somewhere in between. This is general and applies to models of everything from electrons to electorates.
Call the model for your data (X) at hand M, where X = “The data takes a specified value x.” Probability is used to say things like this:
(1) Pr(X | M);
that is, given the model, this is the probability the data takes certain values. If a rival model is proposed, it is not guaranteed that Pr(X | M1) equals Pr(X | M2) for all the possible values X can take, but even if these probabilities do match it could be that M1 and M2 are not logically equivalent.
It is extremely important to understand that the choice of the model is subjective. That is, there may be external evidence about X (call it E; evidence which is not X) which dictates a form or partial form of M, but in practice people are free to choose whatever M they wish. This is because the E that (supposedly) gives credence to M is also subjectively chosen. That is, we usually reason Pr(M | E) = 1. Nevertheless, however M is decided, the probability statements (1) are fixed, true, and are not subjective.
Now, it is the case that in formal statistics models are usually indexed by unobservable parameters. Values have to be supplied for these parameters before equations like (1) can be calculated. That is, for a fixed M, indexed by parameters, equation (1) takes different values for every different value of the parameters.
There are now two situations: (A) no new data will ever be taken, and (B) new data will be taken. By “new” I do not necessarily mean data that will arise in the future, though this is the most usual case; “new” is data that was not used before.
(A) No new data
If no new data will ever be taken, we again have two possibilities. We might want to know how the data arose, or we might have competing models that we want to assess in light of X.
Now it might make sense to ask how X arose. But it might not, either. After all, if no new data is coming, then everything we need to know about X is in X itself. If we want to know how many of the X are less than this number, or greater than that, all we need do is look. Was X increasing? Just look. Was it decreasing? Just look. This approach has been greatly underused.
It is often the case that we have to decide which of a set of competing models is most likely given X. For example, a jury trial. We have two competing models, M0 = “The guy didn’t do it” and M1 = “The guy did it”. We use X (the trial data, evidence, and arguments) to compute
(2) Pr( M1 | X ),
with the assumption that Pr( M1 | X ) + Pr( M0 | X ) = 1 (probabilities of models sum to one over any set of models). Notice that the set of M is chosen by us in advance, supplied by external evidence E (such as “There is a man in the dock who is on trial, and either he did the deed or did not”).
The usual mistake here (in science, not courtrooms) is to assume that exact quantifications of (2) always exist. They surely sometimes do exist, but not always. Now, if there are an infinite number of Mi, then it is possible that (2) will equal 0 (the usual case). Thus, in order to make sense of the world, we need to impose finiteness and select from a limited number of explanations for any X.
Next: classical statistical procedure; clarifications, because I’m not entirely happy with the language.
Mariano Grinbank is a Judeo-Christian apologist who knows when to say he’s not sorry. See this video.
Lack of free will is a trope that is growing in importance in scientific circles. Men like Sam Harris travel to lectures and announce, “I have no choice but to tell you that you have no choice. If you don’t believe this, you’re foolish.” Grinbank identifies what he sees are flaws in arguments like this.
Skeptics of free will are welcome to submit rebuttals (if they choose to do so).
In March of 2012 AD Sam Harris will publish a new book titled, Free Will. He and Jerry Coyne have been stoking the fires of polemics in anticipation.
Sam Harris is known for his Atheist activism and is also a biased neuroscientist1. Jerry Coyne is also an Atheist activist and professor of ecology and evolution at the University of Chicago. These gentlemen represent the deleterious effects of turning Darwinism, and science in general, into world-views. Darwinism is supposed to be a theory about biology and science is a tool with which we explore the material realm. However, some turn these into world-views and thereby construct blinders.
The effect of these blinders is a restriction of thought: the opposite of freethinking. This is because anything that goes against, or is outside of the world-view parameters, is simply a priori ruled out. Thus, Sam Harris and Jerry Coyne take an Atheistic, materialistic, mechanistic, reductionist (by any other name) view of life, the universe and everything. On their collective views we and our brains can do none but blindly follow the dictates of the laws of thermodynamics.
Jerry Coyne refers to our brains as “meat computers” and in this case, you do not get a choice as to whether you get yours rare, well done, or anything in between. Now, there is a notable distinction between Coyne and Harris. They both claim that we do not have free will but Coyne claims that we do not make choices whilst Harris claims that we do.
Coyne emphasizes that even though we do not make free will choices, we are still morally culpable and judicially accountable. He notes that we already make provision for personages who commit crimes whilst being categorized as temporarily insane or having some such mental incapacity (or, as per a Seinfeld episode, “differently advantaged”). Yet, the most interesting, and potentially troubling, conclusion is that while we, ourselves, cannot change our minds, as it were, outside influences can change them.
Jerry Coyne claims that we cannot “step outside of our brain’s structure and modify how it works” because “‘we’ are simply constructs of our brain” and that “We can’t impose a nebulous ‘will’ on the inputs to our brain that can affect its output of decisions and actions.”
However, environment can accomplish it. The incarceration of criminals, for instance, “makes it less likely you’ll behave badly in the future.” So, we cannot change ourselves but environment, other people a.k.a. society and/or the government, can change us. How other meat computers can change our meat computers when we cannot change our own meat computer is something which does not compute.
How does incarceration make it “less likely you’ll behave badly in the future”? It would still come down to the individual as they would instigate or otherwise elicit some change in the meat computer that resides within its cranium—perhaps a touch of “Mrs. Dash” would do the trick.
Sam Harris concludes that our subconscious brain merely spits out “determined” data (determined, or predetermined, by the laws of thermodynamics) about which we then make choices via our conscious brain. But how is making conscious choices about unconscious data not free will?
We may experience an instinct to do this or that but we then choose the course of action. But what about instincts, in and of themselves, and/or reflexes? Well, some of these are learned, such as when you burn yourself and seek to not do it again. Others appear to be more foisted upon us, such as recoiling from a hot object. Just where is the line between the rapid reaction of a reflex and a forced action? After all, you can keep your hand over a flame, if you so choose.
There seems to be a vast difference between a reaction such as a reflex, on the one hand, and purposing to ponder a course of action, on the other hand. We may sift through options but we have the experience of choosing between them. The Coyne/Harris retort of claiming that this perceived choice is a mere illusion is merely begging the question. As Sam Harris puts it when referring to the idea of lacking free will, “Most people find that idea intolerable, so powerful is our illusion that we really do make choices.”
This is tantamount to asking someone whether they have ever been abducted by aliens. If they respond that they have not, you then tell them that of course they have but the aliens erased their memory.
The concept of lacking free will is certainly not new nor is it exclusive to the Atheists who hold to it. However, Harris and Coyne are appealing to “science”; particularly, neuroscience. Not surprisingly, they conclude that what we think of as free will is brain stuff because we can see how segments of the brain light up in MRIs when we engage in what we think of as making decisions.
Well, neuroscience is a soft enough science so as to allow for malleable interpretations such as:
As William Briggs rightly noted whilst reviewing Sam Harris’ research “The Neural Correlates of Religious and Nonreligious Belief”:
Ignore religion and answer this: do the brains of the affronted and angry operate differently in those heightened states of emotion than in those who are placid, smug, or contented?
Could it not be that the “emotion centers” of the brain light up for Christians in this experiment not because they are Christians but because they have just been repeatedly poked by a sharp rhetorical stick?
The “sharp rhetorical stick” refers to the questions posed during Harris’ pseudo-experiment.
Now, of course, Harris’ and Coyne’s neuroscientific conclusions are tantamount to concluding that color and shape are merely brain stuff that does not exist out there in the real world because segments of our brains light up when we view color and shape.
What reason, really, is there to deny our common knowledge, our common experience and well, our common sense conclusion that we have free will? In this case, it is that some Atheists are interpreting lights flashing on a screen. Moreover, their interpretations are based upon materialism, mechanism, reductionism in short: based upon their particular, and peculiar, Atheistic world-views. But why should we believe that their world-view is accurate? After all, they claim that it cannot be proven and since they are making extraordinary claims they must provide evidence that is more extraordinary than expecting us to believe their personal interpretations of “data.”
———————————————————–
1He is referred to as biased because before becoming a neuroscientist he was asked “What do you believe is true even though you cannot prove it?” by Edge – The World Question Center and his response was:
What I believe, though cannot yet prove, is that belief is a content-independent process. Which is to say that beliefs about God—to the degree that they are really believed—are the same as beliefs about numbers, penguins, tofu, or anything else…
What I do believe, however, is that the neural processes that govern the final acceptance of a statement as ‘true’ rely on more fundamental, reward-related circuitry in our frontal lobes—probably the same regions that judge the pleasantness of tastes and odors…
Once the neurology of belief becomes clear, and it stands revealed as an all-purpose emotion arising in a wide variety of contexts (often without warrant), religious faith will be exposed for what it is: a humble species of terrestrial credulity. We will then have additional, scientific reasons to declare that mere feelings of conviction are not enough when it comes time to talk about the way the world is.
The only thing that guarantees that (sufficiently complex) beliefs actually represent the world, are chains of evidence and argument linking them to the world…Understanding belief at the level of the brain may hold the key to new insights into the nature of our minds, to new rules of discourse, and to new frontiers of human cooperation.
Thus, he comes into science already believing that which he seeks to prove, “What I believe, though cannot yet prove.”
Watch out Sam Harris, Gordon Hodson and Michael A. Busseri of Brock University are giving you competition for the worst use of statistics in an original paper.
Their “Bright Minds and Dark Attitudes: Lower Cognitive Ability Predicts Greater Prejudice Through Right-Wing Ideology and Low Intergroup Contact” published in Psychological Science1—headlined in the press as Low IQ & Conservative Beliefs Linked to Prejudice—is a textbook example of confused data, unrecognized bias, and ignorance of statistics.
Hodson and Busseri are on track to beat out Harris’s magnificent effort, and they might also triumph over the paper which “proved” brief exposure to the American flag turns one into a Republican and the peer-reviewed work “proving” exposure to a 4th of July parade turns one into a Republican.
Let’s see how they did it.
The authors intimate that “individuals with lower cognitive abilities may gravitate toward more socially conservative right-wing ideologies that maintain the status quo and provide psychological stability and a sense of order”. They say that this “is consistent with findings that less intelligent children come to endorse more socially conservative ideologies as adults”.
How did they prove that idiots and conservatives are racists? They gathered two large data sets from the UK, one started in 1958 (NCDS), the other in 1970 (BCS); about 16,000 individuals in total, roughly equal numbers of males and females. They quizzed the groups when they reached 11 and 10 years old on their “intelligence”; they then came back to these individuals when they were 33 and 30 and asked them about their “socially conservative ideology and racism.”
The authors do not say how many people they used in their analysis; how many individuals were lost in the 20 years between surveys is not noted in their paper. My read of the NCDS website (pdf) makes the loss about 30%. That leaves about 11,000.
Intelligence was defined in one database as scoring well on matching the similarity between 40 pairs of words, and on matching the similarity between 40 pairs of shapes and symbols. On the other database, this changed to drawing 28 missing shapes, recalling digits from 34 number series, identifying the definitions of 37 words, and “generating words that are semantically consistent with presented words” 42 times.
Thus the two samples measure similar but different abilities. The NCDS (pdf) also had available the Peabody Individual Achievement Test Math and Reading sub-scales which were not used as intelligence measures. Why?
When the kids became 33 and 30 year olds, they were asked whether they agreed with 13 or 16 questions like, “Schools should teach children to obey authority”, “Family life suffers if mum is working full-time.”
Another was, “People who break the law should be rehabilitated.” Just kidding! It’s actually, “People who break the law should be given stiffer sentences.” The bias in the question wording is ignored.
Another question was, “None of the political parties would do anything to benefit me.” Is agreeing or disagreeing with that a “conservative” position? What would the Occupy people say? Another, “Being single provides more time to experience life and find out about yourself.” Conservative or liberal?
According to the NCDS (pdf), there were about 50 questions, of which only 13 were used. A “conservative”, then, is whatever Hodson and Busseri say it is. The same thing goes for what a “racist” is.
For these questions “reliabilities ranged from .63 to .68.” This means the questions are imprecise and imperfect, so that if you use the raw results in subsequent analysis, you must “carry forward” the uncertainty in reliability. Did Hodson and Busseri do this? No.
One would have guessed from the title, that the authors looked at how the scores on the intelligence questions correlated with the scores on the attitude and racism questions, taking into account the uncertainty in the reliability. You would be wrong.
They first modeled the intelligence questions to create one “latent” (unobserved) measure, called “g”. The uncertainty in creating “g” is then ignored in all subsequent analysis. They did the same for the attitude questions, creating a “latent” (actually unobserved) variable called “conservative ideology.” Uncertainty in its creation is also ignored. Then the individuals’ education and socioeconomic status and separately their parents’ socioeconomic status (which again were the results of models) were put into a model with “g” and “conservative ideology” to predict “racism” (the uncertainty of which, as was already said, was ignored). The picture below summarizes their findings.
Lo, they found small p-values. The authors appear unaware that samples of this size are practically guaranteed to spit out small p-values.
What makes the study ludicrous, even ignoring the biases, manipulations, and qualifications just outlined, is that by the authors’ own admission the direct effect size for “g” on “racism” is only -0.01 for men and 0.02 for women. Utterly trivial; close enough to no effect to be no effect, their results statistically “significant” only because of the massive sample size.
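To see how a massive sample manufactures “significance” out of a trivial effect, here is a rough Python sketch. It treats 0.02 as if it were a simple correlation (an assumption; the paper's figure is a path coefficient from a structural model) and uses a normal approximation to the usual t test. The function name is hypothetical, and the sample sizes other than the roughly 11,000 estimated above are illustrative.

```python
import math

def two_sided_p_for_correlation(r, n):
    """Approximate two-sided p-value for H0: correlation = 0.

    Uses the usual statistic t = r*sqrt(n-2)/sqrt(1-r^2) with a normal
    approximation to the t distribution, which is adequate for large n.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return math.erfc(abs(t) / math.sqrt(2))

r = 0.02  # a trivial effect, the magnitude quoted above
for n in (100, 1_000, 11_000, 100_000):
    print(n, two_sided_p_for_correlation(r, n))
```

The effect never changes; only n does. The same 0.02 that is nowhere near “significant” with a hundred people slips under the 0.05 line at eleven thousand.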
The effect size for “conservative ideology” directly predicting “racism” is higher (0.69 and 0.51). But all that means is that the questions the authors picked for these two attitudes are roughly correlated with one another. In other words, “None of the political parties would do anything to benefit me” is crudely correlated with “I wouldn’t mind working with people from other races” and so forth.
Yet the authors have the temerity to conclude, “These results from large, nationally representative data sets provide converging evidence that lower g in childhood predicts greater prejudice in adulthood and, furthermore, that socially conservative ideology mediates much of this effect.”
Truly, statistics can “prove” anything.
—————————————————————————
1doi:10.1177/0956797611421206
Thanks to reader Jonathan Woolley who suggested this study.
Update I saw, on one website which linked to my criticism, a criticism of my criticism (get it?): “The subjects in the test were given a fifty question questionnaire and only 13 questions are used, and this jackass is complaining about that?” I am the “jackass.”
This articulate person (language warning on the link) says that social scientists mix in red herring questions with “real” ones so that interviewees can’t figure out what’s going on. This person also says that I was unaware of this. Not true. But even if I was, it would have been irrelevant.
The point I made was we do not know how the questions the authors did use—it doesn’t matter how many others were rejected and why these were chosen—were used to create “conservative” and “racist” indexes. I have given examples of two questions which are at least ambiguous; there are more. “Conservative” and “racist” are defined as how the authors see them, and not necessarily how civilians and other scientists would see them.
See also my comments below: the models fit by the authors result in very small effects. These effects mostly have small p-values, but as I said above, small p-values are practically guaranteed in large samples (> 1000). And remember, none of the uncertainty in creating the latent “g” and other indexes is carried forward in their models: if it were, the effect sizes would decrease further (and p-values would increase).
And for the real kicker, if we then “integrated out” the parameters (the βs) and tried to predict whether a person with a low “g” would be “racist”—the reason given for the study—the effects would be lower still, probably negligible. The “direct effect” was already trivial, the “total effect” barely marginal.
Incidentally, if you don’t know, “latent” means unobservable (and uncheckable). Social scientists love using these kinds of models (structural equation models, factor analysis, etc.) because they are so fertile. Sprinkle a little data on them and publishable p-values aplenty will sprout instantly.
Update This post is of such importance that it remains on top today. See below for more comments.
Presented for your satisfaction, a way to cheat either yourself or others using time series. The patter below is only a suggestion.
Presentation
Just look at these anomalies, which are related to rampant, deadly climate change. Higher anomalies are worse for all of mankind in every imaginable way.
The anomalies are presented as monthly measures, over roughly a 10-year time period. A regression was fit to them and is plotted. The increase in this not-good anomaly is 0.87 per decade. Why, after 20 years, the anomaly will be almost twice as large as it is now!
The 95% confidence interval for the decadal change is 0.44 to 1.3. That means that the anomalies are surely heading up!
But ignore these sorrowful facts, because there is good news to be had. Here are some more anomalies.
These anomalies are on the way down, thus our spirits should be on the rise. In fact, the anomalies will drop at 0.57 over the next 10 years. And after 20 years, they’ll be down more than one full point!
The 95% confidence interval for the decadal change is -1.1 to 0. That means that the anomalies are surely heading down!
How the trick works
The pictures are the same!
Even if you flash up both pictures, the audience members will never notice that they are seeing the same anomalies. Yes, it’s true. You’ll worry that somebody will catch on, but they won’t! I have seen this done many times and nobody ever notices that the pictures are identical—except, of course, for those colorful straight lines. And the starting date.
Now take a look at these anomalies, which are the same as above, and see if you can spot the difference.
Instead of one regression line, there are 24. The first one is drawn using the entire time series. The second one is drawn using the entire time series except for time point number 1. The third removes time points 1 and 2, and so on. There are 24 lines in total, showing anything from a large increase to a large decrease, and each drawn by choosing a new starting point.
Do you get it? This is the whole trick! Nobody ever asks why you chose a particular starting point. You can tell any story you like and people will never think to ask what would happen if you were to use a slightly different data set.
Of course, very clever magicians will manipulate both starting and end points, but it’s best not to meddle with the end points until you become a master. People will (or should) naturally ask why you haven’t included the most up-to-date data, but they will absolutely never ask why you only used some of the history and not all of it.
Statistics
The time series above was generated by the R arima.sim() function, using a mean 0, standard deviation 1, AR(0.64,0,0,0,0,0,0,0,0,0,0,0.35) process, which mimics many different real-world monthly time series. But try your own model. It works for models of any kind. And it’s fun!
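For those who want to try the trick at home, here is a minimal sketch in Python rather than R, so it is not the code that made the figures. It assumes the innovations are the part with standard deviation 1, uses 120 months, and picks an arbitrary seed; the helper names `simulate_ar` and `ols_slope` are hypothetical.

```python
import random

def simulate_ar(n, phi, burn_in=500, rng=None):
    """Simulate a zero-mean AR process; phi[k-1] is the coefficient on lag k."""
    rng = rng or random.Random()
    p = len(phi)
    x = [0.0] * p
    for _ in range(n + burn_in):
        shock = rng.gauss(0.0, 1.0)  # innovations with standard deviation 1
        x.append(sum(phi[k] * x[-1 - k] for k in range(p)) + shock)
    return x[-n:]  # discard the warm-up values

def ols_slope(y):
    """Least-squares slope of y against the time index 0, 1, 2, ..."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = sum(y) / n
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    return sxy / sxx

phi = [0.64] + [0.0] * 10 + [0.35]  # lags 1 and 12, as described above
series = simulate_ar(120, phi, rng=random.Random(1))  # seed is arbitrary

# One line per starting point: drop the first 0, 1, ..., 23 observations.
# Multiplying the monthly slope by 120 expresses it per decade.
slopes_per_decade = [120 * ols_slope(series[start:]) for start in range(24)]
print(min(slopes_per_decade), max(slopes_per_decade))
```

Change the seed and the spread between the smallest and largest slope changes with it, which is the point: the “trend” depends on where you choose to start.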
Next thing is to show how reliable this trick is. The true answer—given our evidence E that the model is mean 0, etc.—is that the anomalies neither increase nor decrease over a decade. The slope of any regression line, in other words, should be 0. Or the confidence intervals of any line drawn should include 0. Of course the actual results will vary.
It’s your confidence intervals which are the real convincers in the trick. Did you notice that both confidence intervals (for the first two figures) confirm the hypothesis that things are getting better and things are getting worse? Isn’t that great!
To show the reliability of this, suppose your funding depends on things getting worse: you need the anomalies to increase. Therefore, you’ll pick a starting date which gives you the best evidence. Not every time series that is truly unchanging (as our E says it is) will cooperate such that you can definitely show an increase. But you can limit the damage against yourself by showing the smallest possible decrease.
I simulated 1000 different time series, each time picking the best starting point (to show the largest possible increase). Remember: if no cheating occurred, the mean of these samples should be 0. It isn’t. It’s much higher at 0.21—with a 95% confidence interval of -0.88 to 1.31.
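The experiment can be sketched as follows, again in Python and again only as an approximation of what was done: the number of simulations is cut to 200 to keep it quick, the 24 candidate starting points and the seed are assumptions, and the function names are hypothetical. It makes no attempt to reproduce the 0.21 figure.

```python
import random
import statistics

def simulate_ar(n, phi, rng, burn_in=200):
    """Simulate a zero-mean AR process; phi[k-1] is the coefficient on lag k."""
    p = len(phi)
    x = [0.0] * p
    for _ in range(n + burn_in):
        x.append(sum(phi[k] * x[-1 - k] for k in range(p)) + rng.gauss(0.0, 1.0))
    return x[-n:]

def ols_slope(y):
    """Least-squares slope of y against the time index 0, 1, 2, ..."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = sum(y) / n
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    return sxy / sxx

def honest_and_cherry_picked(n_sims=200, n=120, max_drop=24, seed=0):
    """Per-decade slopes: full series versus the most favourable start."""
    rng = random.Random(seed)
    phi = [0.64] + [0.0] * 10 + [0.35]
    honest, cherry = [], []
    for _ in range(n_sims):
        y = simulate_ar(n, phi, rng)
        slopes = [120 * ols_slope(y[s:]) for s in range(max_drop)]
        honest.append(slopes[0])    # whole series, no choice of start
        cherry.append(max(slopes))  # start chosen to show the largest increase
    return honest, cherry

honest, cherry = honest_and_cherry_picked()
print(statistics.mean(honest), statistics.mean(cherry))
```

Because the cheater always has the honest slope among his options, his average can never be lower than the honest one. How much higher it is depends on the model and on how many starting points he allows himself.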
Notice how much wider this (better) interval is. It’s better because it takes into account cheating.
What if you don’t want to cheat? Well, your interval will still be wider than if you just ran the regression on the data at hand. Unless the data at hand is all the data that will ever occur (and if it is, there is no reason to run a time series), the arbitrariness of the starting (and ending) point must be accounted for. If it isn’t, then you will go away too confident of yourself.
The lesson is, of course, that straight lines should not be fit to time series.
Update More comments.
Question: why fit a straight (or any shaped) line to a time series like this? There are three reasons: (1) to discover whether there was a trend, (2) to predict the future, and (3) to use the analysis as part of a larger analysis.
(2) is a respectable goal, and should be encouraged. Most who fit lines to time series have this goal in mind, at least tacitly; that is, they at least imply that the line they have fitted will “continue” into the future. Therein lies a problem. For that line is an all-too-sure guess of what the future will be.
Notice that we stated specifics of the line in terms of the “trend”, i.e. the unobservable parameter of the model. The confidence interval was also for this parameter. It most certainly was not a confidence interval on the actual anomalies we expect to see.
If we use the confidence interval to supply a guess of the certainty in future values, we will be about 5 to 10 times too sure of ourselves. That is, the actual, real, should-be-used confidence interval should be the interval on the anomalies themselves, not the parameter.
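Where might a figure like “5 to 10 times” come from? Under textbook assumptions (ordinary least squares, equally spaced times, independent normal errors, none of which is claimed to match the series above), the interval for a new observation is wider than the interval for the fitted line by a factor that depends only on the number of points and where you look. A small Python sketch, with a hypothetical function name:

```python
import math

def predictive_to_parameter_ratio(n, x0):
    """Width of the interval for a NEW observation at time x0 divided by the
    width of the interval for the fitted line at x0.

    Assumes OLS on times 0..n-1 with independent normal errors.
    """
    x_bar = (n - 1) / 2
    sxx = n * (n * n - 1) / 12           # sum of squared deviations of 0..n-1
    h = 1 / n + (x0 - x_bar) ** 2 / sxx  # variance factor for the fitted line
    return math.sqrt((1 + h) / h)        # the extra 1 is the noise in the new value

n = 120  # ten years of monthly data
for x0 in (59.5, 119, 131):  # middle of the record, last month, one year ahead
    print(x0, predictive_to_parameter_ratio(n, x0))
```

For ten years of monthly data the factor is exactly 11 in the middle of the record and about 5 a year past the end. Correlation in time, which this sketch ignores, would change the numbers.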
In statistical parlance, we say that the parameter(s) should be “integrated out.” So when you see a line fit to time series, and words about the confidence interval, the results will be too certain. This is an inescapable fact.
(1) is also a goal, but a shady one. If we want to know if there has been a change from the start to the end dates, all we have to do is look! I’m tempted to add a dozen more exclamation points to that sentence, it is that important. We do not have to model what we can see. No statistical test is needed to say whether the data has changed. We can just look.
I have to stop, lest I become exasperated. We statisticians have pointed out this fact until we have all, one by one, turned blue in the face and passed out, the next statistician in line taking the place of his fallen comrade.
It is true that you can look at the data and ponder a “null hypothesis” of “no change” and then fit a model to kill off this straw man. But why? If the model you fit is any good, it will be able to skillfully predict new data (see point (2)). And if it’s a bad model, why clutter up the picture with spurious, misleading lines?
Why should you trust any statistical model (by “any” I mean “any”) unless it can skillfully predict new data?
Again, if you want to claim that the data has gone up, down, did a swirl, or any other damn thing, just look at it!
(3) If you fit a line and then use the parameter estimates of that line as input into other analysis (as was done in our sample paper, referenced below), your results will be too certain. We all know the dangers of smoothing time series. If you’ve forgotten, I, II, III.
———————————————————————————
This post was inspired by an actual paper—where I do not accuse the authors of cheating; but they do use time series with different starting and ending dates and then combine those time series to make a conclusion. We can see now that they will be too sure of themselves.
Update See this cartoon which shows that the IPCC has been known to employ the technique of variable start dates.
The paper is “Observed changes in top-of-the-atmosphere radiation and upper-ocean heating consistent within uncertainty” by Norman Loeb and others in the journal Nature Geoscience. I’m pressed for time, so for background on this paper, surf to Roger Pielke Senior’s place.
From the abstract:
We combine satellite data with ocean measurements to depths of 1,800 m, and show that between January 2001 and December 2010, Earth has been steadily accumulating energy at a rate of 0.50 +/- 0.43 Wm-2 (uncertainties at the 90% confidence level). We conclude that energy storage is continuing to increase in the sub-surface ocean.
Most curiously, the authors choose the “90% confidence interval” instead of the usual 95%. Why? Skip the discussion of the meaninglessness of confidence intervals and interpret this interval in its Bayesian sense. Then this means that the coefficient of the regression associated with time is estimated at 0.5 Wm-2 with a 90% chance of being anywhere in the interval 0.07 to 0.93 Wm-2.
This is an unobservable coefficient in a model, mind. It is not an amount of “energy.” To get to the actual energy, we’d have to integrate out the uncertainty we have in the coefficients.
Anyway, the change from the usual certainty level also means (I’m estimating here) that the coefficient of the regression associated with time is estimated at 0.5 Wm-2 with a 95% chance of being anywhere in the interval -(some-number) to one-point-something Wm-2. In other words, the intervals have to be widened, and probably such that the lower portion of the interval is negative: it is almost certainly near 0. Like I say, I’m guessing, but with enough gusto to be willing to bet on this. Any takers?
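The guess can be checked with a little arithmetic if we assume the interval is Gaussian, so that its half-width scales with the normal quantile. That is an assumption, not something the paper states, and the function name is hypothetical.

```python
from statistics import NormalDist

def rescale_interval(estimate, half_width, from_level, to_level):
    """Rescale a symmetric Gaussian interval from one coverage level to another."""
    z_from = NormalDist().inv_cdf(0.5 + from_level / 2)  # about 1.645 for 90%
    z_to = NormalDist().inv_cdf(0.5 + to_level / 2)      # about 1.960 for 95%
    new_half = half_width * z_to / z_from
    return estimate - new_half, estimate + new_half

low, high = rescale_interval(0.50, 0.43, 0.90, 0.95)
print(round(low, 2), round(high, 2))  # -0.01 1.01
```

So on this reading the 95% interval just barely dips below 0 and just barely clears 1, before any of the other sources of uncertainty are counted.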
We have to know about the regression. Details? The authors put details in tiny print and in a supplement. Here’s the small print (bolding mine):
Global annual mean net TOA fluxes for each calendar year from 2001 through 2010 are computed from CERES monthly regional mean values. In CERES_EBAF – TOA_Ed2.6r, the global annual mean values are adjusted such that the July 2005–June 2010 mean net TOA flux is 0.58 +/- 0.38 Wm-2 (uncertainties at the 90% confidence level). The uptake of heat by the Earth for this period is estimated from the sum of: (1) 0.47 +/- 0.38 Wm-2 from the slope of weighted linear least square fit to OHCA to a depth of 1,800 m analysed following ref. 26; (2) 0.07 +/- 0.05 Wm-2 from ocean heat storage at depths below 2,000 m using data from 1981 to 2010 (ref. 22), and (3) 0.04 +/- 0.02 Wm-2 from ice warming and melt, and atmospheric and lithospheric warming1,27 . After applying this adjustment, Earth’s energy imbalance for the period from January 2001 to December 2010 is 0.50 +/- 0.43 Wm-2 . The +/-0.43 Wm-2 uncertainty is determined by adding in quadrature each of the uncertainties listed above and a +/-0.2 Wm-2 contribution corresponding to the standard error (at the 90% confidence level) in the mean CERES net TOA flux for January 2001–December 2010. The one standard deviation uncertainty in CERES net TOA flux for individual years (Fig. 3) is 0.31 Wm-2 , determined by adding in quadrature the mean net TOA flux uncertainty and a random component from the root-mean-square difference between CERES Terra and CERES Aqua global annual mean net TOA flux values.
The same 90% intervals are used, notice. The weights mentioned are hidden in another paper (ref. 26; I didn’t track this down). There is no word on whether the authors (or the others they cite) recognized the correlation in time and thus realized that the estimates of the coefficients, especially their confidence limits, will be suboptimal (too certain). In other words, a straight line regression is not the best model, but it is a model (no probability leakage, anyway! under the evidence that these indexes have no natural boundary). The final uncertainty is “determined by adding in quadrature” some other numbers.
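“Adding in quadrature” means taking the square root of the sum of squares, which is only appropriate if the component errors are independent. One reading of the small print, sketched in Python below, does recover the quoted numbers: the three uptake terms combine to 0.58 +/- 0.38, and adding the 0.2 CERES term gives 0.43. Which uncertainties the authors actually combined is my assumption; the small print is not explicit.

```python
import math

def add_in_quadrature(*uncertainties):
    """Root-sum-of-squares; assumes the component errors are independent."""
    return math.sqrt(sum(u * u for u in uncertainties))

# Values quoted in the small print (Wm-2): upper ocean, deep ocean, ice etc.
estimates = (0.47, 0.07, 0.04)
components = (0.38, 0.05, 0.02)

uptake = sum(estimates)
uptake_unc = add_in_quadrature(*components)
total_unc = add_in_quadrature(*components, 0.2)  # plus the CERES term

print(round(uptake, 2), round(uptake_unc, 2), round(total_unc, 2))  # 0.58 0.38 0.43
```

Note how the largest single term, 0.38, dominates: the final 0.43 is barely larger, and any dependence between the pieces would change it.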
What a complex procedure! The supplementary paper is little help in reproducing the exact steps taken. That is, it is doubtful that anybody could read this paper and use it as a recipe to reproduce the results (joyfully, the authors do make the data available).
But from a scan of the procedure, and given my comments thus far, it would appear the interval is too narrow. Adding all those different sources together and properly taking into account the uncertainty in each individual procedure is enough to boost the overall uncertainty by an appreciable amount. How much is “appreciable” is unknown. The amount one would have to add to the overall uncertainty is greater than 0. This implies that the final estimate of the coefficient of the regression associated with time should be about 0.5 Wm-2 with a 95% chance of being anywhere in the interval minus-something to just-over-one Wm-2. Consistent with uncertainty indeed.
There is still another source of uncertainty not noticed by Loeb, or indeed by nearly all authors who use time-series regression: the arbitrariness of the starting and ending points. I am sure Loeb did not purposely do this, but it is possible to shift the start or stop point in a time-series regression to get any result you want. For example, in their main paper Loeb et al. show plots from 2001 until 2010. But in the supplement, the data is from mid-2002 through all of 2010. Changing dates like this can booger you up. I’ll prove this in another post.
——————————————————————————-
Thanks to reader Dan Hughes for helping me find the papers.
Update Be sure to see this post on how to cheat with time series.
Today, a technical interregnum, a necessary pause for proof of that claim that each of us must come equipped with knowledge that cannot be learned. Stuff that is known to be true only through introspection, via what we call intuition or, sometimes, faith; philosophers usually settle on the technical term a priori (or on phrases more technical still).
Here is one (of many) proofs given by David Stove in his The Rationality of Induction1. He made this argument in support of a priori knowledge in his larger work showing induction is reasonable2. All you need know about Bolzano (named below), is that he disputed the idea that we all of us come with built-in knowledge. The formula numbers are as they appeared in the book.
Reading this passage, as with reading any proof, requires some sophistication. This cannot be avoided. But if you are comfortable with the idea of built-in knowledge, then you can skip this and start after the quote. Careful readers will recognize that Stove’s simple argument is also a proof that empiricism—the belief that all knowledge comes from observation—is false.
First, as to our knowledge of validity. Bolzano says that the validity of barbara, or rather, that the barbara schema always preserves truth, is a hypothesis reasonably believed by us, just because of the extensive experience we have had of never finding a counter-example to it. That is, our grounds for believing (149), or rather, for believing
(166) For all x, all F, all G, either ‘x is F and all F are G’ is false, or ‘x is G’ is true,
consist just of observations we have made, such as
(151) Abe is black and Abe is a person now in this room and all persons now in this room are black.
That is putting it starkly; still it is, in essence, what Bolzano believes. We learn deductive logic by inductive inference.
But now, this is tacitly to concede, to certain propositions of non-deductive logic, precisely the intuitive status which Bolzano expressly denies to any proposition of deductive logic. Our putative logic learner is supposed to be devoid of all intuitive logical knowledge. Yet Bolzano is evidently crediting him with knowing, straight off, at least this much: that
(167): (151) confirms (149).
Of course, he need not be supposed to know that he knows (167); still, he is evidently being supposed to know it. But to know (167) is to have some logical knowledge, even if only non-deductive logical knowledge.
And Bolzano must suppose that (167) is known by our logic learner intuitively. Otherwise he would have to have learnt it, as he is supposed to be learning (166), by experience. And how would he accomplish this?
It must at any rate be from some observation-statements. I do not know what kind of observation-statements Bolzano would regard as confirming (167): let us just call these observation-statements
(168) O1.
But even if our logic learner has found by experience that O1 he will be no further advanced. To learn (167), he needs to know, not only that O1, but that
(169): (168) confirms (167).
But this is a proposition of logic too. If he does not know (169) intuitively, as by hypothesis he does not, then he will have to learn it, too, from experience. No doubt from some observations
(170) O2.
But that is not enough. He will also need to know that
(171): (170) confirms (169);
and so on.
Obviously, he is never going to make it. Experience is not enough.
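The shape of this regress can be pictured with a toy sketch. It is only an illustration under stated assumptions, not anything found in Stove or Bolzano: propositions are numbered by level, learning level k from experience is assumed to require already knowing level k + 1 (the claim that the observations confirm level k), and the names `grounded` and `intuitive` are hypothetical.

```python
def grounded(start, intuitive, max_steps):
    """Toy model of the regress (an assumption, not Stove's formalism).

    Level k stands for a proposition of logic. To learn level k from
    experience the learner must already know level k + 1, the claim that
    his observations confirm level k. The chain stops only at a level
    that is known intuitively.
    """
    level = start
    for _ in range(max_steps):
        if level in intuitive:
            return True   # the chain bottoms out in intuition
        level += 1        # otherwise one more "confirms" claim is needed
    return False          # still ungrounded after max_steps attempts


# A learner with no intuitive logical knowledge is never done, however long he tries.
no_intuition = grounded(0, intuitive=set(), max_steps=10_000)

# A learner who knows just one "confirms" claim intuitively can stop there.
some_intuition = grounded(0, intuitive={1}, max_steps=10_000)

print(no_intuition, some_intuition)
```

A finite step budget cannot itself show that the chain never ends; the sketch only displays the structure of the argument, in which each level waits on the next unless some level is simply known.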
Especially careful readers (especially those convinced by this proof, as I am) will recognize that interpreting this proof, assimilating it and following it, requires precisely the kind of built-in knowledge of which the proof speaks. We must have a priori knowledge.
We are finally ready to tackle the notion that some propositions are “just true.” Propositions of this sort are usually the kinds of truths spoken of above, but there are moral or ethical truths, too (of these, another day). The claim that there exist “universal truths” (propositions which we say are just plain true) is consistent with the claim that all truth is conditional, because whatever a priori knowledge we have is conditional on our intuition, or is taken “on faith.” To speak of these truths (in a technical sense) we must first affix the condition, “Given my faith or intuition, this proposition is just true.”
This seems to open the way to relativism because, as direct experience tells us, different people will claim a certain proposition true or false just because their intuitions or faith direct them oppositely. Many times, of course, these differences are mistakes in reasoning, or there are other pieces of evidence that are assumed (as in French speaking chickens) that the speaker is not aware of or does not acknowledge. Skip these cases and focus on just those claims where nothing is assumed except intuition or faith.
The claim is that we must have some shared beliefs about what is true, and it is these beliefs which we speak of when we say “there are truths.” One is that “Other minds exist.” Or, more properly, “Given our intuitions or faith, other minds exist.” The only possible escape from believing this shared truth is solipsism, which is “Given my intuition, only I exist.” Deep, or metaphysical solipsists stop here. But other solipsists (there are varieties of the breed) might allow that “Given my intuition, it is true I exist; but I believe it is logically possible that others might exist.”
But to acknowledge “logical possibility” is to acknowledge at least the truth that logical propositions are true. So if those other minds exist, they must have this knowledge too, because that is what a mind is. And if one really is a metaphysical solipsist (good luck finding one) it is still true that “all” share beliefs about what is true; it is just the case that “all” is one person. (How many solipsists will jump in and tell me, “Briggs, you don’t exist! Stop claiming you do!”)
In practice, of course, we all really do admit the truth (given our intuitions) that other minds exist. This, then, is just one of many truths that exist. The pieces of logical knowledge spoken of by Stove are others. Our task is now clear: since truths, via shared faith or intuition, exist, we must identify what they are; we must also identify falsehoods, and that which is only probable. More on this later.
————————————————————————————-
1p. 162-163. This book, especially the second half, is a treasure that all statisticians, probabilists, and logicians should read.
2Yes, some people think it isn’t. Bolzano was not one of these: he thought all (as in all) knowledge was known empirically.
We often say of a proposition that it is self evident, or use some similar variant. But we often use the term incorrectly. We say for instance, “It is obvious that Mr Obama is a bad president,” when what we really mean is, “Given a certain collection of evidence which I am holding and which I assume you also hold, we infer that Mr Obama is a bad president.”
Error creeps in when the inference doesn’t lead to a certain conclusion, and it is instead only probable, or when the two parties do not agree on the same set of probative evidence. Hence arises politics, of which I assume the reader is familiar with many examples.
Sometimes we use the phrase correctly, as in “We hold these truths to be self evident, that all men are created equal (where men means all human beings)…” Or in the proposition, “For all natural numbers x and y, if x = y, then y = x.” Or in, “Obviously I exist.” Or in a host of other propositions which we say are true.
But notice the difference in these two phrases: “It is true I exist” and “It is self evident that I exist.” It is strictly a mistake to use the first phrase, but fine to use the second. The difference is subtle here and the mistake passes unnoticed, but only because of the example: you will agree that it is true that you exist, so the two phrases might seem equal, but they are not.
The first fails because it is like Alice coming up to you and announcing, “It is true that some chickens are creatures understanding French.” It most certainly is not true—unless we first accept the conditions Alice heard but we did not, but which would have allowed us to deduce this proposition as true.
The reason “It is self evident that I exist” works is because it carries with it the evidence we need to judge the truth of the proposition “I exist.” The evidence, which all arguments need, comes from saying “It is self evident”, which is just a translation of “Given my intuition” or “Given my most fundamental thoughts.”
Thus the second phrase is equivalent to “Given my intuition, it is true that I exist.” This allows us to recast our mathematical axiom “Given my intuition, it is true that for all natural numbers x and y, if x = y, then y = x.”
Now, we often as a harmless shorthand skip the qualifier and just say that “It is true that for all natural numbers x and y, if x = y, then y = x.” We get away without the qualifier in cases like this because the truth is so obvious and non-controversial. And it would be tedious to bring along qualifiers for every item which we assert is true. For example, “It is true the car is in the garage.” Well, to be perfectly clear, you must say at least, “Given my observation and assuming my senses have not failed me and nothing has left the garage since last I’ve seen it, it is true the car is in the garage.” What a chore! Much easier to state, “The car is in the garage.”
No harm is done in the vast majority of these cases because the evidence which is required to make these propositions true is in fact shared by speaker and listener. But, as in the cases of politics, ethics, and metaphysics, it can be a positive menace to fail to specify the conditioning evidence.
Hold on a minute, did we make a slip? Yes, and a big one. Can you spot it?
Re-examine the “full” garage example. To make the proposition “The car is in the garage” true we had to carry a lot of baggage. Perhaps our slip arises because we did not fully specify the exact conditions which make the proposition true? If so, we can fix it up; after all, we use statements like these all the time without running into difficulties. This isn’t our mistake, so let’s just assume we do have the right evidence.
Our slip is something deeper, more fundamental. Ready? We had to know that, “given the evidence, the proposition is deduced.” We had to have built-in knowledge that lets us take statements of evidence and tie them to propositions. In other words, the operation of going from the evidence to the conclusion is a logical step, the validity of which we must take for granted. We don’t just need the evidence and conclusion, we need the logical glue that binds them.
This “glue” is also in our intuition. Alice needed it, and so do we, for every inference we make.
Next time: a proof of this last claim. It will also be the case that relativism is false.
Statistics show (and as we know by now, these cannot be wrong) that Saturday is the day on which people are least likely to ponder statistics. Traffic on this blog is at its lowest ebb on this day. (Wednesday is the most likely.)
Still, a millennium of poor souls stop by to say hi. For this most loyal group, I offer what we used to call the “Polish Cheer” (our town had a high proportion of people, like your author in fractional part, of Polish descent):
Ooh-sa-sa-sa
Ooh-sa-sa-sa
Hit ‘em in the head
With a big kielbasa
M
I
L
K
Milk ‘em!
Mooooooooo.
This was sung by members of the high school pep band when the instruments went silent. I was in this band. (“What? This math-computer-logic geek was in the band? Who would have guessed?”)
You might laugh at this seemingly fangless cheer, but I put it to you: how would you like to be milked? Our cry surely struck fear into the hearts of our rivals.
I’d give the stats on wins, and thus the effectiveness of our fight song, but I was so busy blowing my bass sax and giggling at witty ditties like these, that I can’t remember whether our team won any games.
What I can recall with clarity is that we had to wear uniforms, just as the sporting fellows did. This separated us and announced our purpose. Uniforms were merely an extension of the school’s policy on dress. No jeans, shirts had to have collars, etc. Those in violation were sent home, parents were informed. As a consequence, the hallways, classrooms, and bleachers were pleasant, cheerful places to be.
The Wall Street Journal reports that the latest (de)trend in fashion is for teens to wear pajamas everywhere.
Greedy businesses, like Abercrombie & Fitch, giving no thought to their responsibilities, are encouraging this behavior by manufacturing sloppy-on-purpose clothes (for sale at a high price). “A wide neck is key, says Jennifer Foyle, chief merchandising officer, because ‘girls are wanting to show their bra straps.’” Charming, charming. How nice to see what fathers have become. Yes, fathers.
The paper continues:
As with a lot of teen behavior, some adults are annoyed. In Louisiana’s Caddo Parish, which encompasses Shreveport, Commissioner Michael Williams is getting national attention for taking a stand. He plans to propose an ordinance outlawing the wearing of pajamas in public.
“The moral fiber in America is dwindling away,” Mr. Williams says. “It’s pajamas today; what is it going to be tomorrow? Walking around in your underwear?”
“Stay off my lawn! What an old man. Parents have been saying this kind of thing about their kids forever. Even the Sumerians complained about teens slacking off.” (I have a quote from an ancient Sumerian text that I can’t lay my hands on at the moment, which proves that, yes, this ancient society did in fact complain about their offspring. Once I find it, I’ll revisit this topic.)
To which I answer: where are these Sumerians? Where are the Romans? O tempora o mores! Where are all the other civilizations who complained about encroaching decadence? They are no more, that’s where. They each and every one of them failed to heed their prophets. They succumbed.
It is no counter-argument to say that because your parents complained, just as their parents before them complained, that the state of dress1 is therefore not declining. It could be, and obviously is, growing worse with every generation.
There are some areas of this fine country where all that is worn is shorts, a sloppy t-shirt, and some sort of plastic, funky foot gear. Men augment this ensemble with baseball caps. Women, depending on the state of their journey toward senescence, either cinch the t-shirt tight or let it billow like a tent. The only reason this style has not triumphed completely is that winter forces extra layers.
Meaning jeans. Here is the truth of jeans. After eighteen, you look ugly in them. You might assume you have a certain je ne sais quoi. You do not. You look sloppy. Even the most expensive jeans (which are designed to look as close to non-jeans as jeans can be, so why bother) do not look nearly as good on you as you think they do.
People have forgotten one purpose of dress: to please others. You are not wearing clothing solely for yourself. You are doing it for your neighbor. Be kind to your neighbor and wear something nice.
————————————————————————
1Or music, architecture, art, literature, movies, public discourse, etc., etc., etc.
“All cats are creatures understanding French,” said Alice’s father. “And some chickens are cats.”
“Wait, I know!” said Alice, chirruping. “That means that some chickens are creatures understanding French.”1
“What you said is true, my dear,” said Alice’s father, his voice full of pride.
What Alice said was true. As true as any other truth, too. True as true can be. But it would still be a mistake for Alice, even if she ventured through the looking glass, to announce triumphantly that “It is true that some chickens are creatures understanding French!” That would be to say what is false, or rather it would be to say a nonsensical thing.
Which was Alice’s father’s specialty. Nonsense of a special sort, that is. For if you haven’t guessed, Alice’s father is Charles Dodgson, a.k.a. Lewis Carroll. Dodgson published several—
—Wait! Hold on. Skip the biography. Didn’t I just say that what Alice first said was true? How can it be that her second phrase, identical to the first, is not true?
Well, her second phrase was not identical, was it? The first time Alice spoke she said, “That means…” and it is that “that” that makes all the difference. The second time she skipped this all-important phrase. One simple word separated truth from falsity. Let’s see why.
Dodgson’s example came from his Symbolic Logic (p. 57). He said that his propositions were
so related that, if the first two were true, the third would be true. (The first two are, as it happens, not strictly true in our planet. But there is nothing to hinder them from being true in some other planet, say Mars or Jupiter—in which case the third would also be true in that planet, and its inhabitants would probably engage chickens as nursery-governesses. They would thus secure a singular contingent privilege, unknown in England, namely, that they would be able, at any time when provisions ran short, to utilise the nursery-governess for the nursery-dinner!)
This distinction is crucial, so I will repeat it. What Alice said the first time was true but only because she accepted the first two statements, the things her father said. She brought those first two phrases along with her when she said, “That means…” She left them out in the second instance, where her audience could not be expected to know that all cats, etc.
The first statement was true because of the provisos she accepted. The second statement was nonsensical, because it was not anchored, it was left floating. The audience could not say why “some chickens are creatures understanding French” without some kind of evidence.
Those in the audience were free to supply their own evidence, of course. One person might have said to himself, “I know of no chickens who can understand French, but I’ll allow the possibility.” Given that, this person would not say Alice’s statement was exactly true, but he would also not claim that it was exactly false. A second person might have said, “Chickens don’t have lips, which are needed to speak French,” and, given that, he would say Alice was speaking a falsehood.
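The difference between the anchored and the floating statement can be checked mechanically in a small way. What follows is a minimal sketch, assuming tiny imaginary worlds of two creatures, each of which may or may not be a cat, a chicken, or a creature understanding French; the function names are invented for the illustration and none of it comes from Dodgson. It lists the truth values Alice’s conclusion takes across every such world that agrees with the evidence supplied.

```python
from itertools import product


def worlds(n=2):
    """Yield every assignment of (cat, chicken, french) to n creatures."""
    for bits in product([False, True], repeat=3 * n):
        yield [
            {"cat": bits[3 * i], "chicken": bits[3 * i + 1], "french": bits[3 * i + 2]}
            for i in range(n)
        ]


def all_cats_understand_french(world):
    return all(c["french"] for c in world if c["cat"])


def some_chickens_are_cats(world):
    return any(c["chicken"] and c["cat"] for c in world)


def some_chickens_understand_french(world):
    return any(c["chicken"] and c["french"] for c in world)


def no_chicken_understands_french(world):
    return not some_chickens_understand_french(world)


def verdicts(conclusion, evidence):
    """Truth values the conclusion takes over all worlds consistent with the evidence."""
    return {conclusion(w) for w in worlds() if all(e(w) for e in evidence)}


# Alice's first statement: the conclusion given her father's two premises.
anchored = verdicts(
    some_chickens_understand_french,
    [all_cats_understand_french, some_chickens_are_cats],
)

# Alice's second statement: the same conclusion with no evidence attached.
floating = verdicts(some_chickens_understand_french, [])

# The second listener, who supplies his own evidence about chickens.
denied = verdicts(some_chickens_understand_french, [no_chicken_understands_french])

print(anchored == {True}, floating == {True, False}, denied == {False})
```

Given the father’s premises the conclusion comes out true in every world examined; with no evidence it comes out true in some and false in others; given the second listener’s evidence it comes out false in all. A check over two-creature worlds is not a proof for all worlds, only a picture of how the verdict moves with the evidence.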
We have learnt two things from this example that we should never forget. We can’t speak of truth or falsity without reference to evidence, and logic is not the study of propositions but the study of connections between propositions.
A careful reader will have paused over this last sentence and said to himself, “If we can’t speak of truth or falsity without reference to evidence, does that apply to the claim that ‘we can’t speak of truth or falsity without reference to evidence’?” I am, after all, claiming it is true that “we can’t speak of truth or falsity without reference to evidence.” What evidence do I offer?
Well, in order not to go too far afield and not burden us in technicalities, I will let you yourself supply that evidence as homework. Can you think of any claim of truth (or falsity) which does not refer to evidence? If you can, then you have refuted my claim. If you cannot, my claim is not necessarily proved, of course, because just because you can’t think of a counter example doesn’t mean a counter example doesn’t exist. Nevertheless, I do make the claim.
———————————————————————————
1I was reminded of this example by reader Scott Bury.
The EPA came to Idaho and said in a booming voice Stop! to Mike and Chantell Sackett, who were building a home on a plot zoned for residences. The EPA shouted because it had determined that a small portion of the Sacketts’ half-acre was (are you ready?) a wetland. And therefore sacred.
Not only did the Sacketts have to cease building; if they didn’t return the property to the EPA’s vision of purity, they would be fined $37,500 per day. This was stated in the EPA’s compliance order.
And, oh yes, they would be fined an additional $37,500 per day for violating the Clean Water Act, an Act which gave EPA power to issue fines for both violating the Act and for violating compliance orders written under the authority of the Act. Got it?
For our less mathematically gifted readers, that’s seventy-five big ones. A day. How many days? Ah, here is where the story gains interest. Forever: that’s how many days.
Evidently, some slick, unelected, unaccountable, uncharitable and foolish bureaucrat thought that this double-dosing of fines would be a good solution to eliminate our deficit. But the joke’s on them, because even if the EPA gets away with cheating the Sacketts out of their money, it would still take over five-hundred-thousand years of daily fines to pay off the deficit—made large in part by paying the salaries of the windy minds who run the EPA.
The story grows in hilarity when you learn that the Sacketts have no recourse. No one to turn to. They cannot ask their mayor, they cannot appeal to Congress. The police won’t help them. They may not even ask a judge for relief, because the EPA has decided that its compliance orders are subject to judicial review only when the EPA says they are.
They can’t even ask the EPA! That is, they can and did ask, but the EPA did not deign to answer. And it is not required to. Pay the fine, sucker.
Well, this was too much for the Sacketts, who decided to sue anyway. Not just about themselves, but about the EPA’s ability to lord it over all people. The Sacketts found themselves at the Ninth Circuit court. Which, reasoning that the EPA would scarcely make a mistake because it is a branch of the government, told the Sacketts to go packing. The Ninth Circuit, if you haven’t guessed from the evidence, is based in California.
The Sacketts would not settle and pressed their case even unto the Supreme Court. Which heard oral arguments on Monday, 16 January 2012.
Said Malcolm Stewart, attorney for the EPA, regarding the greedy double-fine:
The compliance order is intended to specify the violation that EPA believes to have occurred and the measures that EPA believes are necessary in order to achieve prospective compliance. And the statute does provide separately for penalties for violating the statute and penalties for violating the compliance order.
As an exercise of our duty of candor to the Court, we acknowledged in our brief that the government reads the statute to allow the legal possibility of double penalties, that is up to $37,500 per day for violating the statute, up to 37,500 per day for violating the compliance order. I think that’s really a theoretical rather than a practical -–
He was interrupted by Justice Breyer who had to point out that distinctions of these kinds were irrelevant.
Following so far? Because it’s about to turn strange.
It turns out (how is another question) that the Army Corps of Engineers could travel to the Sacketts’ would-be homestead and, if the Corps decided that their land was a wetland, could grant the Sacketts a permit to fill in the wetness so they could build. But if the Corps said that their wee chunk of land was not a wetland, it would not issue a permit. Even if they got the permit, the EPA might not honor it and still fine the Sacketts.
The EPA can issue compliance orders whenever it likes and does not need probable cause. Further, the Sacketts—or you, dear reader—always stand in danger of the EPA swooping down even if your house is already built. There is no statute of limitations under the EPA’s theory of “continuing violation.” If the EPA says it’s a wetland, by golly, it’s a wetland, or was, and pay or please the EPA you must.
The Sacketts’ lawyer is only asking for the Supreme Court to allow the EPA’s actions to be subjected to judicial review just as all other actions by the government are. Such a meager request, a pittance! Yet the EPA is fighting hard so that it may remain arbitrary and aloof.
Even if this is granted, Justice Scalia made the valid point that “the factual questions that go to whether these are wetlands or not are going to be decided giving substantial deference to the agency’s determination of the facts.”
How can two small people of limited funds battle an array of bureaucrats who have the full majesty and purse of the government behind them? Answer: they cannot.
More to come. This post inspired by HotAir.