Pop quiz! What’s the difference between the mean and the median in a data set? You probably learned this long ago while learning percentages and decimals (Math-U-See covers these topics in the Zeta level). But there’s a good chance you haven’t had to calculate a median since you were in school, so here is a quick refresher. Let’s say we have the following list of numbers: 1, 2, 3, 4, 5. To find the mean, we add up all the numbers in the list (15) and divide by how many numbers there are (in this case, 5): 15 / 5 = 3, so the mean is 3. To find the median, we sort the list in ascending or descending order and locate the number directly in the middle: in this example, the median is also 3. (If a list has an even number of entries, the median is the average of the two middle numbers.) Okay, so why does the median matter? And how can any of this help us in our daily life?
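If you like to check arithmetic with a few lines of code, here is a minimal Python sketch of the refresher. The variable names are only illustrative; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

numbers = [1, 2, 3, 4, 5]

# Mean: add up all the numbers, then divide by how many there are.
manual_mean = sum(numbers) / len(numbers)  # 15 / 5

# Median: sort the list and take the middle entry.
# (This shortcut only works when the list has an odd number of entries.)
ordered = sorted(numbers)
manual_median = ordered[len(ordered) // 2]

print(manual_mean, manual_median)
# The standard library functions should agree with the manual results.
print(mean(numbers), median(numbers))
```

Both approaches should report a mean of 3 and a median of 3 for this list.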
Imagine you see the following headline: “Economist Projects Higher Wages for Typical American Worker.” You read the article, curious if this means that your boss is preparing to offer you a raise. A year later, you notice another article from the same paper, this one announcing that “As Predicted, Typical American Worker Earned More.” At this point, you’re feeling grumpy. Your company didn’t give you a raise, and when you talk to your friends, they tell you that their bosses didn’t give them a raise either! You decide to do some research and soon you see a headline from a different newspaper that says “For Most US Workers, Wages Remained The Same.” Hmm, you think to yourself, one of these news sources has to be lying!
Is My Newspaper Lying To Me? Not Exactly.
While it is possible that one of the news sources is trying to deceive you, or that one of the studies is factually incorrect, there is a far more likely reason for these discrepancies. It’s very possible that the first newspaper (call it Paper A) is basing its understanding of the typical worker on the mean (average) of all earned wages, while the second (Paper B) is basing its understanding on the median (the center point). But wait, aren’t the mean and the median the same number? They were in the earlier refresher example, but often the mean and the median are dramatically different numbers. Let’s explore why this is the case, and why the median can provide us with a more helpful understanding of our data.
Your Coffee Shop Friends Meet Your Wealthy Uncle
Imagine that you are in a coffee shop with 9 of your friends, and you decide to calculate the group’s average annual income. You add up the incomes and divide by the number of people in the group (10). Let’s say the incomes are as follows: 40K, 40K, 30K, 50K, 60K, 70K, 50K, 30K, 20K, and 70K. The total is 460K, so the average, or mean, is 46K. While some of your friends make more than that, and some less, 46K seems to be a pretty straightforward indicator of the typical income.
But now suppose that your really wealthy uncle decides to join the group. Your uncle makes a whopping 500K a year. Let’s determine the new average (mean) income of your group, which is now composed of 11 people. The total rises from 460K to 960K, and 960K / 11 is about 87K. Suddenly the average income in your group is 87K, which is a higher amount than anyone in your group actually earns, except for your uncle! In statistics terminology, we could call your uncle an outlier, because his income is dramatically different from the rest of the incomes in the data set.
If you told me that the average annual income of your coffee shop group is 87K, you would technically be telling the truth. And yet that number would be misleadingly high. This is where determining the median can provide us with much better insight. To calculate the median, we list the incomes from lowest to highest and identify the income that is exactly in the middle (in thousands): 20, 30, 30, 40, 40, 50, 50, 60, 70, 70, 500. The middle value is the sixth of the eleven, which is 50. Notice that unlike when we calculated the mean, this median income of 50K is not dramatically affected by your uncle’s income (an outlier). If you told me that the median income of your coffee shop group is 50K, that’s a much more helpful metric to aid my understanding of the typical income in your group!
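The coffee shop numbers are easy to verify yourself. The sketch below uses the incomes from the example (in thousands of dollars) and Python's standard `statistics` module; the variable names are just for illustration.

```python
from statistics import mean, median

# Incomes in thousands of dollars, taken from the coffee shop example.
friends = [40, 40, 30, 50, 60, 70, 50, 30, 20, 70]
uncle = 500
with_uncle = friends + [uncle]

mean_before = mean(friends)        # 460 / 10 = 46
mean_after = mean(with_uncle)      # 960 / 11, about 87.27

# With 10 people there is no single middle value, so the median is the
# average of the two middle values (40 and 50).
median_before = median(friends)    # 45
median_after = median(with_uncle)  # 50

print(f"Mean:   {mean_before} -> {mean_after:.2f}")
print(f"Median: {median_before} -> {median_after}")
```

The uncle moves the mean by roughly 41K, while the median shifts by only 5K.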
Reading The Newspapers
Now that we’ve seen why median matters for helping make sense of data, we’re ready to answer the question of why those headlines are so different. Paper A tells us that there have been increases in average income, as determined by the mean. But we should be asking ourselves if the “wealthy uncle” is included in their data. In other words, are there outliers pulling the mean higher, as we saw in the coffee shop group? In this case, if Bill Gates (and other millionaires and billionaires) had a dramatic increase in earnings, that would raise the mean, even if no other income bracket experienced an increase in earnings! Paper A is not lying, but the headline is nevertheless not as helpful in understanding the health of the economy or the experience of the typical worker.
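To see how a raise at the very top can lift the mean while the median stays put, here is a small sketch that reuses the coffee shop group. The uncle's jump from 500K to 800K is a made-up figure chosen only for illustration; everyone else's income is left unchanged.

```python
from statistics import mean, median

# Incomes in thousands of dollars; the last entry is the wealthy uncle.
last_year = [40, 40, 30, 50, 60, 70, 50, 30, 20, 70, 500]

# Hypothetical: only the top earner gets a raise (500K -> 800K).
this_year = last_year[:-1] + [800]

mean_change = mean(this_year) - mean(last_year)        # 300 / 11, about 27.27
median_change = median(this_year) - median(last_year)  # 0

print(f"Mean changed by {mean_change:.2f}K")
print(f"Median changed by {median_change}K")
```

A headline built on the mean could honestly report higher earnings here, even though ten of the eleven people earned exactly what they did before.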
On the other hand, Paper B based its claims on the median worker’s income, and told us that the median worker’s income didn’t rise. Remember that the median is the number directly in the middle of the data set, so it does not move when only the highest incomes rise. Paper B’s headline is just as truthful as Paper A’s, but is more helpful in our effort to understand the typical American worker’s experience.
Credible researchers will let you know what they are basing their information on. If using the mean, they will also tell you about standard deviation (telling you if most of the numbers in the data set are close to the mean or not). And good journalists will pay attention to those details and report them in easy-to-understand ways. Of course, the more that you can learn to ask questions about the underlying data, the more you can determine which news articles are more insightful in their reporting.
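As a rough illustration of what standard deviation tells you, the sketch below compares the coffee shop group with and without the wealthy uncle. It uses `pstdev` (population standard deviation) from Python's standard `statistics` module; a real study might report a sample standard deviation instead, so treat the choice here as an assumption.

```python
from statistics import mean, pstdev

# Incomes in thousands of dollars, from the coffee shop example.
friends = [40, 40, 30, 50, 60, 70, 50, 30, 20, 70]
with_uncle = friends + [500]

# A small standard deviation means most values sit close to the mean.
spread_friends = pstdev(friends)        # roughly 16
spread_with_uncle = pstdev(with_uncle)  # roughly 131

print(f"Friends only: mean {mean(friends)}, std dev {spread_friends:.1f}")
print(f"With uncle:   mean {mean(with_uncle):.1f}, std dev {spread_with_uncle:.1f}")
```

When the standard deviation is larger than the mean itself, as it is once the uncle joins, that is a strong hint that the mean alone does not describe the typical person in the group.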
Next time you see an article claiming something about the “average” or “typical,” ask yourself if the report is based on the mean or median, and see if you can find out from the study that is being reported. Once you have that information, you’ll be in a much better position to judge the accuracy of the claims in the article.
Evaluating claims based on statistics is a skill set within data literacy. Our goal is to learn how to think more robustly about complex information, and to sift through that information in a way that can aid our decision making.