How To Lie With Statistics – Part 2

I’ve realized that I have way too much to say on each topic to combine several of them in one post. As such, from now on I’ll focus on a single topic per post.

Continuing from the previous post, let’s get right into it.

2. The Well-Chosen Average

Huff talks about three different ways of representing an average: mean, median, and mode. I’ve never heard someone refer to the mode as an average, but who knows, maybe we’ve all gotten a bit smarter since the ‘50s. The mode, or the number most represented in a distribution, is certainly relevant, but for all practical purposes, the mean and the median are the two statistics that I’d like to talk about.

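First, the definitions.

Mean: the arithmetic average of a series of numbers, i.e. their sum divided by how many there are.
Median: the middle value in a series of numbers once they are sorted (with an even count, the average of the two middle values).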

We should care about these because people can manipulate data by reporting a mean when they really should be reporting a median.

First, it’s important to note why means are used so much more than medians: many popular statistical tests work best with data that are normally distributed, and when this is the case, the mean equals the median…so why complicate things? Well, the reason is that data often are not normal.
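To make that concrete, here is a minimal sketch in Python using the standard `statistics` module. The two lists are made-up toy numbers (not from Huff's book), chosen so that one is symmetric and the other has a single large outlier.

```python
from statistics import mean, median

# Hypothetical toy data: same first four values, different last value.
symmetric = [1, 2, 3, 4, 5]   # evenly spread around 3
skewed = [1, 2, 3, 4, 50]     # one large outlier drags the mean upward

sym_mean, sym_median = mean(symmetric), median(symmetric)
skew_mean, skew_median = mean(skewed), median(skewed)

print("symmetric: mean =", sym_mean, "median =", sym_median)
print("skewed:    mean =", skew_mean, "median =", skew_median)
```

With the symmetric list the two statistics agree, so it doesn't matter which one gets reported. With the skewed list the outlier moves the mean but leaves the median alone, which is exactly the gap someone can exploit.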

Imagine that at a company 95% of the employees earn $25,000/year (we’ll call them assembly line workers) and the remaining 5% (we’ll call them management) earn $5,000,000/year. In that case (we’ll assume they have 100 employees), the mean salary is $273,750 while the median is $25,000. Clearly, a big difference. If someone told you that you should work for this company because their average salary is north of a quarter million dollars/year you’d be excited. You’d be less excited when you learned the reality.
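The arithmetic is easy to check yourself. The sketch below rebuilds the 100-person company from the example (the variable names are mine) and computes both statistics with Python's standard `statistics` module.

```python
from statistics import mean, median

# 95 assembly line workers at $25,000 and 5 managers at $5,000,000,
# as in the example above (100 employees assumed).
salaries = [25_000] * 95 + [5_000_000] * 5

mean_salary = mean(salaries)      # total payroll divided by headcount
median_salary = median(salaries)  # middle of the sorted salaries

print(f"mean:   ${mean_salary:,.0f}")
print(f"median: ${median_salary:,.0f}")
```

Five salaries out of a hundred are enough to pull the mean more than ten times above what 95% of the employees actually earn.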

Sometimes, however, neither of these statistics is all that useful. Imagine that you have a distribution like the one below.

A bimodal distribution like this one makes means and medians somewhat useless, since both are measures of central tendency. With a bimodal (or multimodal) distribution, the central tendency isn’t that interesting: both values will give you something that sits in between the two masses. You often find such distributions in count data, where there is a roughly normal cluster around some mean and a second spike at 0 or 1 (often just one of the two).

Imagine if we looked at the number of times per month that a person from a community visits a drug store. There will likely be some people who go semi-regularly (say 4-5 times/month) and they will generally fall into a nice normal distribution. However, there will also likely be a large group of people who never go and will all mass on 0. In this case, means and medians are meaningless since they combine two distinct groups.
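Here is a rough sketch of that situation in Python. The counts are invented purely for illustration (50 people who never go, 50 regulars clustered around 4 to 5 visits); they are not real drug store data.

```python
from statistics import mean, median

# Hypothetical community of 100 people (made-up numbers).
non_visitors = [0] * 50                             # never go
regulars = [3] * 5 + [4] * 20 + [5] * 20 + [6] * 5  # go semi-regularly
visits = non_visitors + regulars

overall_mean = mean(visits)
overall_median = median(visits)
regular_mean = mean(regulars)
share_zero = len(non_visitors) / len(visits)

print("overall mean:  ", overall_mean)
print("overall median:", overall_median)
print("share who never go:", share_zero)
print("mean among regulars:", regular_mean)
```

In this toy data set the overall mean and median both land between 0 and 3, a range where not a single person actually falls. Describing the two groups separately (the share who never go, plus the average among those who do) says far more than either overall number.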

The lesson here is simple: make sure you know what the statistics you are looking at represent. If you can, always examine the entire distribution. If you see a skew in the data, dig a bit deeper.

Part 1 - The Sample with the Built-in Bias

Part 3 - The Little Figures That Are Not There
Part 4 - Much Ado About Practically Nothing


