
Continuing on in my How To Lie With Statistics series, let’s talk about “The Little Figures That Are Not There.”
We’ll subdivide this part into two sections: sample size and range vs. average.
Sample Size
Very often we see claims about success rates for all sorts of things: “80% of doctors agree that this medication will work better than this other one,” “90% of all consumers prefer detergent X to detergent Y,” and so on. These claims are used to promote a product or an idea, but they can be very misleading. If only 5 doctors were sampled in the first example, is it really fair to say that 80% of doctors agree about anything? The point is that the headline percentage is important, but the sample size behind it is equally important.
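To put a rough number on how little 4 out of 5 tells us, here is a minimal sketch that computes an approximate 95% confidence interval for a proportion. The Wilson score interval is my choice for illustration (it behaves reasonably for small samples), and the 800-out-of-1000 comparison is a made-up example, not data from any real survey.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Same headline figure (80%), very different amounts of evidence.
small = wilson_interval(4, 5)        # 4 of 5 doctors
large = wilson_interval(800, 1000)   # hypothetical: 800 of 1000 doctors

print(f"4 of 5:      {small[0]:.0%} to {small[1]:.0%}")
print(f"800 of 1000: {large[0]:.0%} to {large[1]:.0%}")
```

With 5 doctors the plausible range runs from well under half to nearly everyone. With 1000 it narrows to a few points either side of 80%.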
Let’s take a recent example. I was listening to The Naked Scientist podcast the other day and heard an interesting story about Megan Sykes, a doctor and researcher at Harvard who found a new way to prevent rejection of organ transplants. She found that 80% of the patients she tried the procedure with were able to stop using immunosuppressive drugs much sooner than normal and rejection rates were amazingly low (20% actually). This is great news!
There’s a catch, though. If we look up the paper that this was published in, we find that her sample size was a whopping 5 patients. Now don’t get me wrong, I think her research is incredible and could lead to wonderful things, but I do think it’s a bit early to be praising this new technique. A larger sample size would be required for that.
This also brings up the idea of statistical significance. Had she reported the appropriate statistical test (chi-square in this case), we would have learned that the p-value was just .18. Roughly speaking, that means that even if the procedure made no real difference, a result at least this lopsided would turn up by chance about 18% of the time. For someone looking to get a transplant, that’s a pretty large number. By convention, a p-value less than .05 is considered “good” (this is a debate for another post). The point here is that we should all be wary of statistics that don’t include a sample size!
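For the curious, here is one way a p-value near .18 can come out of these numbers. This is a sketch under an assumption, since the setup of the test isn't spelled out above: I'm treating it as a chi-square goodness-of-fit test of 4 successes vs. 1 failure against a 50/50 split. The function names are mine, and the exact binomial version is included only as a cross-check.

```python
import math

def chi_square_gof_p(observed, expected):
    """Chi-square goodness-of-fit statistic and p-value for two categories (df = 1)."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df = 1, the chi-square tail probability equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

def exact_binomial_two_sided_p(successes, n):
    """Two-sided exact binomial p-value against a 50/50 null (double the larger tail)."""
    k = max(successes, n - successes)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Assumed setup: 4 successes and 1 failure, expected 2.5 of each under the null.
stat, p_chi = chi_square_gof_p([4, 1], [2.5, 2.5])
p_exact = exact_binomial_two_sided_p(4, 5)

print(f"chi-square statistic: {stat:.2f}, p-value: {p_chi:.2f}")
print(f"exact binomial p-value: {p_exact:.3f}")
```

With expected counts this small the chi-square approximation is rough, and the exact binomial p-value comes out larger still. Either way, 5 patients is nowhere near enough to clear the .05 bar.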
Range vs. Average
According to the 2000 US Census, the average US household size is 2.59 persons. Last I checked, I don’t know what a .59 person looks like. Obviously the Census is reporting an average, but wouldn’t a range be much more useful?
If I’m a housing developer and I see this figure, I start building homes for 2.59 people. What I miss out on are all the single-person households (about 26% in the US) and the multi-family households (about 4%). This is similar to the issue of a distribution I talked about in the previous post. Again, the point here is that while averages are quick and simple, ranges are far more informative.
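Here is a small sketch of the same idea with numbers. The two neighborhoods below are invented for illustration (they are not Census data): both average 2.6 people per household, but a developer would want to build very different homes in each.

```python
from collections import Counter
from statistics import mean

# Hypothetical household sizes for two made-up neighborhoods of 10 households each.
neighborhood_a = [2, 3, 3, 2, 3, 3, 2, 3, 2, 3]
neighborhood_b = [1, 1, 1, 1, 2, 2, 3, 4, 5, 6]

def summarize(sizes):
    """Return the average, the range, and the share of single-person households."""
    counts = Counter(sizes)
    return {
        "mean": mean(sizes),
        "range": (min(sizes), max(sizes)),
        "single_share": counts[1] / len(sizes),
    }

summary_a = summarize(neighborhood_a)
summary_b = summarize(neighborhood_b)

for name, s in (("A", summary_a), ("B", summary_b)):
    print(f"{name}: mean={s['mean']:.2f}, range={s['range']}, "
          f"single-person share={s['single_share']:.0%}")
```

The averages match, but only the range and the breakdown by size reveal that neighborhood B is full of single-person households and a few very large ones.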
Part 1 - The Sample with the Built-in Bias
Part 2 - The Well-Chosen Average
Part 4 - Much Ado About Practically Nothing