How To Lie With Statistics - Part 3

Continuing on in my How To Lie With Statistics series, let’s talk about “The Little Figures That Are Not There.”

We’ll subdivide this part into two sections: sample size and range vs. average.

Sample Size

Very often we see claims about success rates for all sorts of things: “80% of doctors agree that this medication will work better than this other one,” “90% of all consumers prefer detergent X to detergent Y,” and so on. These claims are often used to promote a product or an idea, but they can be very misleading. If only 5 doctors were sampled in the first example, is it really fair to say that 80% of doctors agree about anything? The point is that a percentage or an average is important, but the sample size behind it is equally important.
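To put a rough number on how much the sample size matters, here is a minimal sketch that computes an approximate 95% confidence interval for a proportion. The Wilson score interval is my choice of method for illustration, and the 400-out-of-500 comparison is a made-up contrast, not a figure from any real survey.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Both samples show "80% agree", but the sample sizes differ (hypothetical numbers).
for successes, n in [(4, 5), (400, 500)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n}: observed 80%, plausible range {low:.0%} to {high:.0%}")
```

With 5 doctors the plausible range runs from well under half to nearly everyone; with 500 it stays within a few points of 80%. Same headline figure, very different amounts of information.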

Let’s take a recent example. I was listening to The Naked Scientists podcast the other day and heard an interesting story about Megan Sykes, a doctor and researcher at Harvard who found a new way to prevent rejection of organ transplants. She found that 80% of the patients she tried the procedure with were able to stop using immunosuppressive drugs much sooner than normal, and rejection rates were amazingly low (20% actually). This is great news!

Too bad she only tried this with 5 patients. That's right: if we look up the paper this was published in, we find a sample size of a whopping 5. Now don’t get me wrong, I think her research is incredible and could lead to wonderful things, but I do think it’s a bit early to be praising this new technique. A larger sample size would be required for that.

This also brings up the idea of statistical significance. Had she reported the appropriate statistical test (chi-square in this case), we would have learned that the p-value was just .18. That means that if the procedure made no real difference, a result at least this lopsided would still turn up about 18% of the time by chance alone. For someone looking to get a transplant, that’s a pretty large number. Generally, a p-value less than .05 is considered “good” (this is a debate for another post). The point here is that we should all be wary of statistics that don’t include a sample size!
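For anyone who wants to check the arithmetic, here is a minimal sketch of one way to arrive at a p-value of about .18. It assumes a chi-square goodness-of-fit test of the 4-versus-1 split against a 50/50 null hypothesis; that null is my assumption for illustration, not something taken from the paper. The helper name is hypothetical, and the formula used for the p-value only holds for two categories (1 degree of freedom).

```python
import math

def chi_square_gof_two_categories(observed, expected):
    """Chi-square goodness-of-fit statistic and p-value for two categories (df = 1)."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

observed = [4, 1]      # 4 of 5 patients came off the drugs, 1 did not
expected = [2.5, 2.5]  # assumed null hypothesis: an even split
statistic, p_value = chi_square_gof_two_categories(observed, expected)
print(f"chi-square = {statistic:.2f}, p = {p_value:.2f}")
```

Keep in mind that the chi-square approximation is rough when expected counts are this small, which is one more reason to treat a 5-patient result with caution.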

Range vs. Average

According to the 2000 US Census, the average US household size is 2.59 persons. Last I checked, I don’t know what a .59 person looks like. Obviously the Census is reporting an average, but wouldn’t a range be much more useful?

If I’m a housing developer and I see this figure, I start building homes for 2.59 people. What I miss out on are all the single-person households (about 26% in the US) and the multi-family households (about 4%). This is similar to the issue of distributions I talked about in the previous post. Again, the point here is that while averages are quick and simple, ranges are far more informative.
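Here is a small sketch of what the average hides. The household sizes below are made up for illustration (they are not Census data); I picked them only so that the mean lands near the Census figure.

```python
from collections import Counter
from statistics import mean

# Made-up household sizes for illustration only; these are NOT Census data.
household_sizes = [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 6]

print(f"mean size: {mean(household_sizes):.2f}")
print(f"range: {min(household_sizes)} to {max(household_sizes)}")

# Share of households at each size: the picture the mean alone cannot give.
counts = Counter(household_sizes)
for size in sorted(counts):
    share = counts[size] / len(household_sizes)
    print(f"{size}-person households: {share:.0%}")
```

The mean comes out around 2.58, yet not one household in the list is that size, and a quarter of them are single-person homes a developer would miss by building for the average.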

Part 1 - The Sample with the Built-in Bias
Part 2 - The Well-Chosen Average
Part 4 - Much Ado About Practically Nothing

