a list compiled by Alex Kasman (College of Charleston)
A statistician sells her soul to the devil in exchange for guaranteed tenure, but redeems herself by creating a cleverly useless confidence interval.
I like the part about the realization during her Ph.D. thesis defense that there were some things she forgot to check. Indeed, that is the way a thesis defense works and feels. The similar scene of her receiving tenure, however, seemed odd to me, as I do not think there are any universities where tenure is assigned in that way. (Correct me if I'm mistaken. At a Ph.D. defense, the committee that has just finished asking you questions comes out into the hall after a brief private meeting and tells you their decision. But the tenure process is nothing like that. At every university I know, tenure is achieved by submitting a huge "packet" of documents which has to go through several different committees and levels of administration before you are notified -- usually by letter -- about the outcome months later.)
The concept of a confidence interval is apparently very difficult. I've never understood why, because it seems quite simple to me, but many of my students and even authors of research papers that use them seem to misinterpret what they mean. The way I would describe it is as follows: the significance of a confidence interval (a, b) with confidence level C is that some population parameter, like the average of some quantity over all members of the population, has a probability of C of being between the numbers a and b. (See note * below.)
I'm sure the author (and the character) understands what a confidence interval actually is, but here the definition is intentionally twisted, to humorous effect. The alternative definition adopted in this fantasy to save the protagonist's soul is certainly closer to the truth than many of the misconceptions I've encountered. However, I'm not sure the world needed another way to misunderstand this fundamental tool of inferential statistics. Plus, I'm afraid the whole scenario in this story struck me as far too contrived.
Personally, I preferred Dawson's similar but more recent story "Ladies' Night" in which another female statistician (almost) outsmarts a con artist. But, I'm sure that's just a matter of taste.
This short story was published on LabLit.com in two parts in August 2011. I am grateful to Larry Lesser for bringing Dawson's fiction to my attention.
* I received an e-mail from emeritus professor Furman Smith complaining that my description of the significance of a confidence interval above is not correct. After a few e-mails were exchanged, we both eventually agreed that the way I put it above is both understandable to most people, including non-experts, and accurate, as long as it is interpreted appropriately. (However, he added that the term "credible interval" should be used in place of "confidence interval" when the unknown parameter is treated as a random variable.) But interpreting it correctly involves some subtle points that are usually ignored, as I will attempt to explain below.
A population parameter -- like the average length of all adult, female rattlesnakes -- is a fixed number, but usually one that we cannot know exactly. In those circumstances we often construct a confidence interval (a,b) to give us an idea of where it is. Once the interval has been constructed, the endpoints a and b are also fixed numbers. It may therefore seem strange for me to say that there is a probability of 0.95 that the parameter is between a and b. Since those are all fixed numbers, it either is or isn't between them! That, in essence, is what Furman Smith claims is wrong with my description above.
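To make the "all fixed numbers" point concrete, here is a minimal sketch of how one common kind of 95% interval for a mean is built. It is a z-interval, and it rests on two simplifying assumptions: the population is normal, and its standard deviation `sigma` is known. The function name `z_interval` is hypothetical. The values `mu = 19.0` and `sigma = 4.0` are made-up snake lengths in inches, and we fix them only so the result can be checked. Once `a` and `b` come back, `a < mu < b` is just `True` or `False`.

```python
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, level=0.95):
    """Confidence interval for a population mean, assuming a normal
    population whose standard deviation sigma is known (an assumption)."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    half_width = z * sigma / n ** 0.5
    xbar = fmean(sample)
    return xbar - half_width, xbar + half_width

# Hypothetical population of snake lengths (inches). In a real study mu is
# unknown; it is fixed here only so we can check the interval afterwards.
mu, sigma = 19.0, 4.0
rng = random.Random(2011)
sample = [rng.gauss(mu, sigma) for _ in range(30)]

a, b = z_interval(sample, sigma)
print(f"95% CI: ({a:.1f}, {b:.1f})")
# a, b and mu are now all fixed numbers, so this is simply True or False:
print("contains mu:", a < mu < b)
```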
One way to handle this is to adopt a different way of thinking about probability altogether. The Bayesian interpretation of probability is one that many professional statisticians and mathematicians prefer for reasons that go well beyond the difficulty in explaining what a confidence interval is. It involves emphasizing the concept of a prior probability and allows one to interpret even a fixed number as a random variable whose probability distribution may only reflect one individual's lack of complete knowledge and hence can even vary from person to person. I have nothing against this approach to probability and statistics, though I am also not convinced it is really objectively better than the alternative (frequentist) interpretation.
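As an illustration of the Bayesian reading, here is a sketch of a credible interval, which is the term Professor Smith recommends for this case. It uses a standard textbook model that is not taken from the source: a normal prior on the mean, normal data, and a known `sigma`. The function name, the priors and the sample are all hypothetical and made up for illustration. The two calls at the end use different priors on the same data. They show how the resulting interval can "vary from person to person."

```python
from statistics import NormalDist, fmean

def normal_credible_interval(sample, sigma, prior_mean, prior_sd, level=0.95):
    """Central credible interval for a mean under a conjugate normal model:
    prior mu ~ Normal(prior_mean, prior_sd**2), data ~ Normal(mu, sigma**2),
    with sigma treated as known (a simplifying assumption)."""
    n = len(sample)
    prior_prec = 1 / prior_sd ** 2          # precision = 1 / variance
    data_prec = n / sigma ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * fmean(sample)) / post_prec
    posterior = NormalDist(post_mean, post_prec ** -0.5)
    tail = (1 - level) / 2
    return posterior.inv_cdf(tail), posterior.inv_cdf(1 - tail)

# Made-up snake lengths (inches) and two hypothetical people's priors.
sample = [18.2, 21.5, 17.9, 20.4, 19.8, 22.1, 16.7, 20.0]
print(normal_credible_interval(sample, sigma=4.0, prior_mean=20.0, prior_sd=5.0))
print(normal_credible_interval(sample, sigma=4.0, prior_mean=30.0, prior_sd=2.0))
```

With a very vague prior (huge `prior_sd`), this interval numerically approaches the known-sigma z-interval, even though its interpretation is different.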
Another way to address the problem of interpreting a confidence interval is to state it as I have above and to interpret it correctly within the frequentist approach. What does it mean to say that "the probability is C that the population parameter is between a and b"? The frequentist approach requires us to imagine repeating something over and over. One might mistakenly think that what we mean is to repeatedly check whether that population parameter is in that interval (a,b), but that doesn't make sense: it either is or isn't, and there is no point in checking that repeatedly. Rather, for it to make sense you have to imagine going back to the start of the study that constructed that confidence interval, generating new sample data, and using them to make a new confidence interval (a',b'). If you did that over and over, the population parameter would not be in the interval every time, but the fraction of times that it was in the interval would get closer and closer to C as you repeated that process more and more times.
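The thought experiment above can be simulated directly. This is a minimal sketch under the same hypothetical assumptions as before: a normal population with known `sigma`, and made-up values `mu = 19.0`, `sigma = 4.0`, `n = 30`. Each trial redoes the whole study. It draws a fresh sample, builds a new interval (a', b'), and records whether it contains `mu`. The function name `coverage` is illustrative.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=19.0, sigma=4.0, n=30, level=0.95, trials=10_000, seed=1):
    """Repeat the study `trials` times: draw a new sample, build a new
    known-sigma interval (a', b'), and return the fraction that contain mu.
    mu, sigma and n are hypothetical values chosen for illustration."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

for trials in (100, 1_000, 20_000):
    print(trials, coverage(trials=trials))
```

The printed fractions should settle near 0.95 as `trials` grows. The exact values depend on the seed. Equivalently, roughly 5% of the simulated intervals miss `mu`, even though every one of them was constructed correctly.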
Or, to put it in very practical terms, if you could check all of the 95% confidence intervals that were ever properly constructed using random samples, then you would find that in about 5% of them, the population parameter was not in the stated confidence interval. (For instance, a paper might have concluded that the average length of adult female rattlesnakes is between 16.2 inches and 22.7 inches, but that might not actually be true.) This does not represent a mistake on the part of the people who made the confidence intervals or a flaw in the foundations of statistics. That's just how confidence intervals work, and anyone who wants to use them should understand it.
Finally, a third way to address it (which was mentioned by Professor Smith in his e-mail, though it was clear that he preferred the Bayesian approach) is to say that the procedure used to make the confidence interval had a probability of C of producing an interval containing the parameter when it produced the interval (a,b). By putting it in the past tense, it addresses the point that once the interval has been created the fixed parameter either is or is not inside it.
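In symbols, the past-tense reading might be written as follows. The notation is introduced here and is not the author's. X is the sample before it is drawn, and x is the data actually observed. A(X) and B(X) are the endpoints the procedure will produce, and θ is the fixed parameter. The last line is the known-σ normal-mean example used in the sketches above. For that procedure the first equation holds exactly, for every value of θ.

```latex
\begin{aligned}
&\text{Before sampling:} && \Pr_\theta\big(A(X) < \theta < B(X)\big) = C \\
&\text{After sampling:}  && a = A(x),\ b = B(x),\ \text{and } a < \theta < b \text{ is simply true or false} \\
&\text{Example (normal mean, known } \sigma\text{):} && A(X) = \bar X - z_{(1+C)/2}\,\frac{\sigma}{\sqrt{n}},\qquad B(X) = \bar X + z_{(1+C)/2}\,\frac{\sigma}{\sqrt{n}}
\end{aligned}
```

Here z_{(1+C)/2} is the (1+C)/2 quantile of the standard normal distribution, which is about 1.96 when C = 0.95.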
Wow, it sure takes a lot of words to try to explain this! But, I still think "The significance of a confidence interval (a, b) with confidence level C is that the population parameter has a probability of C of being between the numbers a and b" is a reasonable way to summarize it and that most people will get a basic understanding of it from that statement.
More information about this work can be found at www.lablit.com.