Say you run across a number in the news. Sixty percent prefer blue over green. And let’s say you trust this news source not to outright lie to you: someone did measure color choices, and they did get a result of 60% in favor of blue. Here’s a short, no-math guide to figuring out what that means and what it doesn’t.
The most common catch to numbers in the news is selection bias. A lot of the time there’s some reason why people who got asked in the first place were more likely to give the same answer. If this was a survey of people who happen to be wearing blue right now, then of course more of them like blue than green.
In January 2014, two-thirds of all known planets were bigger than Jupiter. Does this mean small planets are weird exceptions, and most of them are big? No, it just means large planets are easier to find. In February 2014, astronomers using a new and improved telescope announced a few hundred new planets, almost all of them on the small side. It's not that the earlier technology was wrong: those planets did in fact exist, exactly as described. The earlier numbers were just biased, because only the large planets could be found in the first place.
If you can think of some reason why "of course" the number would look like this, you can't trust it. People who drink moderate amounts of alcohol live longer than people who never drink? Well, of course they do: absolute teetotalers often have health-related reasons why they can't drink. The people with health concerns are bringing down the average. Students at private schools get better test scores? Of course they do: private schools have richer students, who are more likely to have advantages like private tutoring or a more academic home environment. Some private schools only admit students who do well on tests, and of course those schools are going to push up the average test result.
None of this proves that alcohol or school has anything to do with health or grades. It also doesn’t mean they don’t. It just means you can’t take the number at face value. It is possible to correct for this kind of thing, by comparing to similar-ish populations. But this is hard and no one does it perfectly. And news articles usually don’t describe how hard their sources tried. So you might just have to assume the worst.
The exception is randomized trials. If someone randomly decided which students went to what school, then you know the two groups of students are pretty much the same. You can probably assume the school really is responsible for differences between the two groups—but remember all they’re studying is people who sign up for random trials in the first place. If parents only sign up for that lottery when they’ve already decided they don’t like public schools, then of course private schools will end up looking better.
In the United States in 2017, 60,000 police officers were assaulted, and 46 were killed. Therefore…what? Is there a War on Cops?
Well, is 46 a lot? Depends how many officers there were. If there are 30,000, then it’s an insanely dangerous job where the homicide rate is over one in a thousand and everyone gets assaulted twice a year. If there are 300,000 officers, then being a police officer is ten times safer than that.
Turns out the US had about 600,000 police officers in 2017. That's a lot. 46 out of a lot is less risky than 46 out of a few. The homicide-against-officers rate works out to roughly 7.7 per 100,000, worse than the homicide-against-everyone-else rate of 5.3 per 100,000 that year, but in the same ballpark. Add in a roughly equal number of accidental deaths, and police work actually is in the top twenty most dangerous jobs.
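To see how much the baseline matters, here is the arithmetic as a short sketch. The 46 deaths, the 600,000 officers, and the 5.3-per-100,000 general rate come from the figures above; the 30,000-officer force is the hypothetical one, and everything else is plain division.

```python
def rate_per_100k(events, population):
    """Convert a raw count into a rate per 100,000 people."""
    return events / population * 100_000

officer_rate = rate_per_100k(46, 600_000)      # ~7.7 per 100,000
everyone_rate = 5.3                            # 2017 US homicide rate

print(f"officers: {officer_rate:.1f} per 100k")
print(f"everyone: {everyone_rate:.1f} per 100k")

# Same 46 deaths against the hypothetical 30,000-officer force:
small_force_rate = rate_per_100k(46, 30_000)   # ~153 per 100,000
print(f"if only 30,000 officers: {small_force_rate:.0f} per 100k")
```

Same numerator, wildly different risk, all depending on the denominator.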
If you read a number with no context, stop. Maybe it's obvious: if someone says that over 100 members of Parliament have been arrested, you might have a pretty good guess how many members of Parliament there are, since most national parliaments have a few hundred people. If it's "every two minutes, someone dies of Example-itis," that's not obvious. You don't know how many deaths per year "every two minutes" adds up to, or how that compares to every other cause of death.
Hopefully, the article you’re reading tells you what the baseline is. If they don’t tell you, or if they don’t even realize that they should tell you, run.
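If you do want to supply the baseline yourself, the conversion is quick. Example-itis is the made-up disease from above; the total-deaths figure is a rough US-sized baseline I'm assuming purely for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60          # 525,600

deaths_per_year = MINUTES_PER_YEAR / 2    # one death every two minutes
print(f"{deaths_per_year:,.0f} deaths per year")   # 262,800

# Ballpark annual deaths in a US-sized country (assumed, for scale):
total_deaths = 2_800_000
print(f"roughly {deaths_per_year / total_deaths:.0%} of all deaths")
```

"Every two minutes" sounds constant and terrifying; "about a quarter-million a year, out of a few million total deaths" is something you can actually weigh against other causes.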
Chocolate helps weight loss, blared the headlines.
Chocolate does not help weight loss. This was a prank. The prankster conducted a study measuring 18 different things. Cholesterol, weight loss, protein, sodium…you get the idea.
When conducting scientific studies, researchers generally aim at reaching a significance level of 5%. Think of that as meaning a 5% risk that their result is actually wrong. (It does not actually mean that; it means a more complicated statistics thing. But if you think of it this way you’ll have the right general idea.) Assume every study has a 5% chance of being wrong.
The chocolate study looked for 18 values. Eighteen tries at a 5% chance each gives better-than-even odds of at least one hit, so it's not too surprising that they "found" something. And then it went viral.
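The odds work out like this. This sketch treats the 18 measurements as independent, which is a simplification, but it gives the right general idea.

```python
n_tests = 18   # cholesterol, weight loss, protein, sodium, ...
alpha = 0.05   # the usual 5% significance threshold

# Probability that at least one test comes up "significant" by pure
# chance: one minus the probability that all 18 come up clean.
p_at_least_one = 1 - (1 - alpha) ** n_tests
print(f"{p_at_least_one:.0%}")   # about 60%
```

In other words, a study that measures 18 things and reports the one that "worked" is more likely than not to have found nothing at all.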
The point of the prank was that you can’t just stop at the headline. Or even the full news article. You might have to look at the source itself for what else they looked for. If it could just as easily have been “chocolate helps cholesterol,” it’s probably random chance.
Note that if you're reading it, it's because someone at the newspaper thought it was interesting enough to publish. Did they skip over 19 other surveys with boring, obvious results? Maybe! You kind of have to assume this is what happened any time you see "new study says" anything really interesting. If a claim is surprising and exciting enough to be worth reporting, there are probably enough studies out there that one or two support the interesting version just by chance, even if it's false. And those are the ones you're seeing.
The safe way to avoid this is just ignore anything from exciting new studies, and assume you can get the consensus opinion from Wikipedia. If you want to stay up to date, that’s harder. News articles won’t get you there. You’d have to learn to read the studies they cite, and there is no math-free guide to doing that.
Which Number Counts
In 2015, there were about 400 million people in sub-Saharan Africa living in extreme poverty. That’s a lot— per capita it’s more than any other region of the world.
Also in 2015, this number was going up: it crossed 400 million several years earlier and was still rising. That’s more bad news: it means more and more people in extreme poverty. (If you think to ask compared to what, it’s not quite so bad: the population was increasing even faster. People weren’t getting poorer, there were just more people across the board.)
But the number of people in extreme poverty was going up more slowly than it used to. In fact, it had almost stopped. It went from 300 million to 400 million in ten years, and then for the next eight it was barely moving at all. It was starting to turn around. And in most other regions of the world, it already had turned around: the number of people in absolute poverty was plunging. The fact that the growth was stalling in sub-Saharan Africa meant the same thing was about to happen there. The number of people in absolute poverty was about to go down.
You could describe this in a lot of ways: “A lot of people.” “It’s getting worse.” “It’s turning around.” If you have a point to make, you can pick whichever one sounds the best. If the article you’re reading has a point to make, it probably did pick whichever one sounds the best. You need to figure out which one matters.
There’s no clear-cut rule. But in general, comparisons to last year are more useful than raw numbers. How fast the change is speeding up usually won’t be as important. But once in a while it really really matters. The number of people living on less than $1.25 a day is headed toward zero, and this is some of the best news in the world. In this case, you really do have to look at “it’s turning around.”
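The three readings of the same data can be made concrete. The series below is made up to mimic the shape described above (rising fast, then leveling off); it is not real poverty data.

```python
# Hypothetical people-in-poverty counts, in millions, one per year:
millions = [300, 325, 350, 375, 400, 405, 408, 410]

# Year-over-year change: how much worse it got each year.
change = [b - a for a, b in zip(millions, millions[1:])]

# Change in the change: is the worsening itself slowing down?
change_in_change = [b - a for a, b in zip(change, change[1:])]

print("level:           ", millions)          # "a lot of people"
print("change:          ", change)            # "it's getting worse"
print("change in change:", change_in_change)  # "it's turning around"
```

All three headlines are true of the same eight numbers. "A lot of people," "still getting worse," and "turning around" are just the level, the speed, and the change in the speed.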
When talking about coronavirus statistics, it really really matters. You have the total number of cases. You have a speed: the number of new cases per day. And it’s the change in the speed that newspapers usually pay attention to. This is for pretty good reasons: they want to be able to say whether some new plan is helping, so they look at whether new cases slowed down afterward.
(Note that “afterward” has to mean long enough that new cases start being detected. This usually means two weeks, since that’s how long it takes to show symptoms. Without a two-week lag, you’re seeing cases contracted before the new plan. Don’t trust that article.)
There are good reasons, but it also makes it hard to compare different places. If one city had 100 new cases and another had 200, that sounds like the second one is doing worse. But if the second city had five times as many cases, “only” increasing by twice as much is pretty good! At least if all you care about is how they’re doing right now.
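Dividing new cases by existing cases makes the comparison fair. The two city totals below are hypothetical, chosen so the second city has five times the existing cases, as in the scenario above.

```python
# Hypothetical case counts for two cities:
city_a = {"existing": 1_000, "new": 100}
city_b = {"existing": 5_000, "new": 200}   # twice the new cases, but...

for name, c in [("A", city_a), ("B", city_b)]:
    growth = c["new"] / c["existing"]      # new cases relative to baseline
    print(f"city {name}: {c['new']} new cases, {growth:.0%} growth")
```

City B added twice as many cases in absolute terms but is growing at less than half the rate, which is the better sign for how it's doing right now.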
In general: the total number of cases isn’t what you’re worried about. A high number could mean you’re in the middle of an outbreak or that your area got hit and has already recovered. But when the pandemic is over, this will be the number that ultimately matters. This is the one that directly tracks real human beings.
The number of new cases per day in your area tells you how dangerous it is to go outside right now.
Whether that number is going up or down tells you if what people were doing two weeks ago was helping or hurting.
All of these are important numbers. But you want them for different things.
How to Lie with Statistics is a short book covering this and more. It’s available as a pdf here. It does not promise no math, but keeps it pretty understandable. Note that it’s from 1954 and contains a lot of casual racist stereotypes.
Sources for specific examples: sizes of known planets, alcohol and longevity, number of U.S. police and assaults on police, number of homicides against US police, the chocolate study/prank.
And the number of people in extreme poverty, with sources, in some of the most optimistic graphs in history: