How To Lie With Statistics – Part 1

Sorry for the lack of posts, but I have been away at a conference. Now that I’m back, I’m ready to get right back into it with a series of posts dedicated to a wonderful book: “How to Lie With Statistics” by Darrell Huff. This introduction to the many ways that statistics can be, and are, used to manipulate reality was first published in 1954 and is still as useful as ever.

The book is broken down into the following 10 chapters:
1. The Sample with the Built-in Bias
2. The Well-Chosen Average
3. The Little Figures That Are Not There
4. Much Ado about Practically Nothing
5. The Gee-Whiz Graph
6. The One-Dimensional Picture
7. The Semi-attached Figure
8. Post Hoc Rides Again
9. How to Statisticulate
10. How to Talk Back to a Statistic

As great as the book is, the examples, as you can imagine, are a bit dated. My goal is to go through each of these topics and freshen them up with new examples. I also plan to include some suggestions for how to avoid the problems from the point of view of the compiler and the consumer of statistics.

This post will be dedicated only to chapter 1, since it is so critically important to statistics.

1. The Sample with the Built-in Bias

Sampling is wonderful. Rather than ask everyone how much they, for example, like something, we ask a small group and, assuming we did our job right, infer the population’s attitudes based on the sample’s responses. This saves time and money. But what happens when that sample is biased? In other words, what happens when the group of people we select from the population doesn’t really represent the population at large?

This problem is quite apparent in any type of consumer satisfaction (or opinion) research. Let’s say, for example, that Apple is interested in how satisfied their customers are with their iPhones. Rather than asking every iPhone user, they could sample a subset of them and derive their conclusions based on this group. Seems simple, right? Wrong.

Let’s think through the logistics for a second. There are several million iPhone users (yours truly is one of them). Are all of these users equally likely to respond to a survey about product satisfaction? Probably not. For example, the executive who barely has time to check his e-mail probably won’t respond to such a survey. In contrast, the Apple Fan Boys who rave and rant about Apple products might be more willing to do so. If this is true, then any sampling technique that relies on voluntary responses will under-represent the opinions of business people and over-represent the opinions of fan boys.
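To make this concrete, here is a minimal simulation sketch in Python. Every number in it (the 20% fan share, the satisfaction rates, the response rates) is made up for illustration and does not come from any real survey. It only shows how unequal response rates can pull a survey estimate away from the true figure.

```python
import random

def simulate_survey(n_users=100_000, fan_share=0.2, seed=0):
    """Simulate a satisfaction survey in which fans respond far more often.

    All rates below are hypothetical and chosen only to illustrate the effect.
    """
    rng = random.Random(seed)
    groups = {
        "fan":       {"p_satisfied": 0.95, "p_respond": 0.30},
        "executive": {"p_satisfied": 0.60, "p_respond": 0.03},
    }
    population, respondents = [], []
    for _ in range(n_users):
        group = "fan" if rng.random() < fan_share else "executive"
        satisfied = rng.random() < groups[group]["p_satisfied"]
        population.append(satisfied)
        # Only some users answer the survey, and fans answer more often.
        if rng.random() < groups[group]["p_respond"]:
            respondents.append(satisfied)
    true_rate = sum(population) / len(population)
    observed_rate = sum(respondents) / len(respondents)
    return true_rate, observed_rate

true_rate, observed_rate = simulate_survey()
print(f"True satisfaction:     {true_rate:.1%}")
print(f"Survey-based estimate: {observed_rate:.1%}")
```

Under these assumed rates the survey estimate comes out well above the true satisfaction level, even though nobody tampered with the data.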

Let’s look at a real example. In late 2007, ChangeWave released their results of a customer satisfaction survey of 3,654 consumers. They found that 82% of iPhone respondents reported that they were “very satisfied with their current cell phone” compared with only 51% of RIM (Blackberry) users. On the face of it, these results are compelling. But we should think before we preach the greatness of the iPhone. Let’s take this one step at a time.

Let’s start by looking at the sample size: 3,654. That’s pretty impressive. But wait, in the chart they list 9 different manufacturers, which means that the number of iPhone users must be quite a bit less than 3,654. In fact, based on the article it seems like they only have about 73 iPhone users (2% share * sample size). Do 73 people speak for all iPhone users? I doubt it. If Apple fan boys are overrepresented, then satisfaction may be inflated. I’m not trying to say that ChangeWave intentionally manipulated their results, but they certainly didn’t do much to make them transparent.
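The arithmetic is easy to check. The sketch below also computes a rough 95% margin of error for the 82% figure using the usual normal approximation. That formula assumes a simple random sample, which is exactly the thing in doubt here, so treat the interval as a best case and not as a property of the ChangeWave data.

```python
import math

sample_size = 3654    # total respondents reported for the survey
iphone_share = 0.02   # iPhone share of the sample, as quoted above
p_satisfied = 0.82    # share of iPhone respondents who were "very satisfied"

n_iphone = round(sample_size * iphone_share)

# Normal-approximation margin of error for a proportion.
# Assumption: respondents are a simple random sample of iPhone users.
std_error = math.sqrt(p_satisfied * (1 - p_satisfied) / n_iphone)
margin = 1.96 * std_error

print(f"iPhone respondents: {n_iphone}")
print(f"95% interval: {p_satisfied - margin:.1%} to {p_satisfied + margin:.1%}")
```

Even in that best case the uncertainty is roughly nine percentage points in either direction, before any bias from who chose to respond.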

As a quick aside (this is a topic of discussion for a later post, but worth mentioning here), we should be very wary of the comparisons that are being made. Because the chart above represents manufacturers, we can’t even really make a true comparison between the iPhone and the other products. RIM produces several models of the Blackberry and who knows how many models LG, Motorola or Sony/Ericsson have. The chart is comparing a single phone, the iPhone, with the average of all phones from other manufacturers. I’m sure some of the respondents who use RIM products are using dated Blackberries and might not be so happy. Likewise, some respondents may be using the latest RIM product and be just as satisfied as iPhone users. Because the data are averaged, we’ll never know.
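Here is a small sketch of how a manufacturer-level average can hide very different models. The two models and their counts are invented for the example and are not taken from the survey.

```python
# Hypothetical per-model results for one manufacturer:
# model name -> (number of respondents, number "very satisfied")
models = {
    "older model":  (300, 120),
    "newest model": (100, 85),
}

total_n = sum(n for n, _ in models.values())
total_satisfied = sum(s for _, s in models.values())
pooled = total_satisfied / total_n

for name, (n, s) in models.items():
    print(f"{name}: {s / n:.0%} very satisfied")
print(f"manufacturer-level figure: {pooled:.0%}")
```

With these made-up counts, the newest model scores about as well as the iPhone did, yet the pooled figure sits near 51% because most respondents own the older model.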

So what should market researchers do to avoid such biases? Unfortunately, as they (hopefully) know, there is no easy fix. Obtaining a truly representative sample from any population is nearly impossible. The best anyone can do is attempt to identify the different types of respondents that exist in the universe in question and sample equally from each group. This approach, called stratified sampling, has its pros and cons. The biggest pro is that if the stratification is done correctly (BIG ‘if’), then some of the issues I mentioned before can be avoided. The con is that, compared to simple random sampling, an incorrect stratification can introduce a bias of its own, again under- and over-representing different parts of the population. In short, there is no perfect way to sample, but striving for perfection is a must.
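A minimal sketch of stratified sampling in Python follows. It uses proportional allocation (each group gets sample slots in proportion to its size), which is one common variant and not the equal allocation described above. The `stratified_sample` helper, the user records, and the 20/80 split between fans and executives are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, sample_size, seed=0):
    """Draw a proportionally allocated stratified sample.

    population:  list of items to sample from
    stratum_of:  function mapping an item to its stratum label
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)

    sample = []
    for members in strata.values():
        # Give each stratum slots in proportion to its population share,
        # keeping at least one member so no group disappears entirely.
        k = max(1, round(sample_size * len(members) / len(population)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# Hypothetical population: every fifth user is a fan, the rest are executives.
users = [{"id": i, "type": "fan" if i % 5 == 0 else "executive"}
         for i in range(10_000)]
sample = stratified_sample(users, lambda u: u["type"], sample_size=500)
fans = sum(u["type"] == "fan" for u in sample)
print(f"{len(sample)} sampled, {fans} of them fans")
```

This only controls who gets invited. It does nothing about who actually responds, and it is only as good as the strata you define.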

As for the consumer of statistics, it’s critically important to always ask questions like: “who are the respondents in this research?” and “are all members of the population accurately represented?” If you can’t answer those two questions with any sense of conviction, make sure you read the statistics with a big grain of salt.

One final note on sampling: there is a large debate regarding the US Census and sampling. On the one hand, Article I, Section 2 of the US Constitution requires that citizens be counted to determine the number of Representatives to the House of Representatives from a given state. A strict interpretation of the Constitution (and one backed by the Supreme Court) suggests that sampling cannot be used to determine the US population. On the other hand, conducting a census (i.e. counting everyone) has some major problems. I won’t go into all the details, but the short version is that many groups of citizens, and especially minorities, are massively under-represented by census taking. For an excellent discussion as to why, I’ll direct you to this article by Ivars Peterson in Science Magazine. If you have a few minutes, it’s a worthwhile read.

Next time I’ll discuss the pitfalls of means, medians, and modes (chapter 2) and the need for more information about statistical figures (chapter 3).

Part 2 - The Well-Chosen Average

Part 3 - The Little Figures That Are Not There

Part 4 - Much Ado About Practically Nothing

