After nine seasons, 76 million people tuned in to watch the last episode of the TV show Seinfeld. That episode was a profound middle finger to the audience of Seinfeld, basically telling every dedicated fan that after all those years the show was pointless. It was an extremely polarizing episode: some loved it, some hated it. But 76 million viewers! Do you ever wonder how they came up with that number? Did NBC have some magical way of tracking every TV with a wire or antenna that tuned in for the finale? No, they did not. Instead they used statistical sampling to come up with that number. This is how most ratings and viewership “estimates” work: a company like Nielsen polls a set of people who are representative of the total viewership, and then they multiply it out. If you sample 1,000 people who are supposed to represent 100 million human beings, you’re going to multiply whatever you count in the sample by 100,000 to get your best guess at the actual number of viewers. This sort of estimation in statistical analysis is hugely important in understanding what’s going on today with numbers about viruses like the flu and the novel coronavirus.
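If it helps to see the multiplication written out, here is a minimal sketch in Python. The sample size and population are the round numbers from the paragraph above; the count of 760 sampled viewers is a made-up figure chosen only so the arithmetic lands on 76 million, and the function name is mine. It is not a description of Nielsen's actual method.

```python
def extrapolate(sample_count, sample_size, population):
    """Scale a count observed in a sample up to the whole population."""
    # Each sampled person stands in for this many people.
    weight = population / sample_size
    return sample_count * weight


# Round numbers from the example above.
SAMPLE_SIZE = 1_000
POPULATION = 100_000_000

# Hypothetical: 760 of the 1,000 sampled people watched the finale.
estimate = extrapolate(760, SAMPLE_SIZE, POPULATION)
print(f"{estimate:,.0f}")  # 76,000,000
```

Nobody counted 76 million viewers here. Someone counted 760 and multiplied.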
Let’s take a look at influenza. You might have heard some of these numbers: during the 2018 - 2019 season, 34,200 people died from influenza in the United States. Meanwhile, as I write this on April 19, 2020, Johns Hopkins is reporting 39,090 deaths from the novel coronavirus in the United States. OK, so at first glance perhaps the novel coronavirus is marginally worse than the flu, but is that an accurate statement?
A number is only as good to you as your understanding of where it came from. It’s like Seinfeld and the 76 million people who tuned in for what many (not me) have argued is one of the worst finales to a TV show ever aired. We already know that 76 million people were not polled about Seinfeld, and the same is actually true of the 34,200 deaths from the flu in the United States. This 34,200 number comes from the Centers for Disease Control and Prevention’s website, and a lot of people miss the methodology the CDC used to come up with it. If you click a little deeper into the site and read further, you’ll see that the CDC uses a tool to estimate total deaths from reported deaths.[1] Here’s their summary of the method:
We used routinely collected surveillance data, outbreak field investigations, and proportions of people seeking health care from survey results to estimate the number of illnesses, medical visits, hospitalizations, and deaths due to influenza during six influenza seasons (2010‐2011 through 2015‐2016). (emphasis mine)
Let’s take a closer look at these numbers for the state I live in, Indiana. For the 2018 - 2019 season Indiana had 1,118 deaths from influenza according to the CDC. It’s important to remember that the CDC does not actually do these tests in each state; rather, it gets its data from state health departments like the Indiana State Department of Health (ISDH). ISDH reports actual test results to the CDC each week, and you can view these reports online. If you look at the last report of the flu season, which stretches from October to May, you’ll see that there were 113 reported influenza deaths. These are deaths with actual lab tests confirming that the individual who died had influenza. At the same time, the CDC reports an estimate of 1,118 flu deaths in Indiana, which is derived from the 113 tested deaths reported by Indiana’s health department using the method described above. Remember, 113 is the confirmed test number (like Nielsen’s sampling of the viewership for the Seinfeld finale), and 1,118 is an estimate (like the 76 million viewers who tuned in to the Seinfeld finale).
Most news sources are relying on data from Johns Hopkins (which uses CDC data) to report the number of deaths from the novel coronavirus in the United States. These numbers come from confirmed lab tests of people who have died from the novel coronavirus. They are not estimates. So when you want to compare the flu and the novel coronavirus in Indiana, the 1,118 estimated deaths from the 2018 - 2019 flu season is not the right number to compare to Indiana’s current 562 confirmed deaths from the novel coronavirus.[2] The 113 number is. Comparing 113 flu deaths to 562 novel coronavirus deaths is comparing confirmed lab tests against confirmed lab tests.
But of course it’s more complicated than that. Notice I’m not referencing the 2019 - 2020 flu numbers. The reason is that this year’s flu season is not actually over. Just like the novel coronavirus season, we’re in the middle of it, so our confirmed tests are incomplete. They’re going to grow tomorrow and the next day and the day after that until the season for this virus concludes. Right now we don’t know how long that is, but we know that the 113 confirmed deaths spanned a 7-month window while the 562 novel coronavirus deaths only span about 2 months.
The estimated deaths from the flu, 1,118, were just under 10 times the confirmed deaths of 113. I have no way of knowing if that multiple will hold true for the novel coronavirus. I hope not. We’re certainly testing far more people for the novel coronavirus than we do for the flu. We won’t know estimated deaths until this virus has run its full course and the CDC can work up a methodology and complete its research. For the time being all we can do is compare confirmed tests, and those are certainly higher. The important thing here is to remember that you have to be comparing the right numbers. If those numbers are not sourced the same way, then you can’t simply say one is worse because the number is higher.
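For anyone who wants to check the arithmetic, here is a small Python sketch using only the Indiana figures quoted in this article. The variable names are mine, and the ratios ignore the caveat that the flu count covers about 7 months while the coronavirus count covers about 2.

```python
# Indiana figures quoted in this article.
flu_confirmed = 113     # lab-confirmed flu deaths, 2018 - 2019 season (ISDH)
flu_estimated = 1_118   # CDC estimate for the same season
covid_confirmed = 562   # lab-confirmed deaths as of 4/19/2020

# How far the CDC estimate sits above the confirmed flu count.
estimate_multiple = flu_estimated / flu_confirmed

# Mismatched comparison: confirmed coronavirus deaths vs. estimated flu deaths.
mismatched = covid_confirmed / flu_estimated

# Like-for-like comparison: confirmed vs. confirmed.
like_for_like = covid_confirmed / flu_confirmed

print(f"estimate / confirmed (flu):   {estimate_multiple:.1f}x")  # 9.9x
print(f"covid confirmed / flu est.:   {mismatched:.1f}x")         # 0.5x
print(f"covid confirmed / flu conf.:  {like_for_like:.1f}x")      # 5.0x
```

The same 562 deaths look like half the flu or five times the flu depending on which flu number sits in the denominator. The sketch deliberately does not multiply 562 by 9.9, since there is no way of knowing whether that multiple applies to the novel coronavirus.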
None of this is intended to scare you. It’s intended to help you understand why statements like “we lose more people from the seasonal flu”[3] are, at best, a misunderstanding of the underlying numbers and how we got them. I think it’s fair to say that the confirmed test numbers show that the novel coronavirus has a higher mortality rate than the flu. How much higher remains to be seen, and we probably won’t know until this is all over. Until that day, stay safe, please.
This mathematical “revelation” is not new or original, but I chose to write this after seeing friends and close family compare the flu and novel coronavirus. If you want a longer and more sophisticated analysis of the number conundrum check out this piece from late March in The Washington Post titled, “Those covid-19 death figures are incomplete”. Just make sure you read the article in its entirety, please.
Disclaimer: I am not an epidemiologist, nor a health care worker, nor a scientist. My only qualifications are that I can read and I once took a statistics class in high school. The best thing you, dear reader, can do is to do your own research and learn for yourself how these numbers are made.
[1] If you’re wondering why the CDC reports an estimate rather than just raw test data, think back to the last time you had the flu, or the time before that. Did you make it to the doctor? If you did, did they test you? The flu is common enough that many of us have had it a few times, and we didn’t necessarily get tested each time to confirm it. Doctor visits and tests both cost money, and that cost can shape the decisions we make, especially during seasons when the flu may not be as bad. Factor in those without health insurance or even those who are naturally averse to visiting the doctor (like my friend Jon) and the confirmed tests really only show us a portion of the picture. This is ultimately why the CDC does research beyond confirmed tests to come up with a more accurate representation of the impact of the flu on the United States. Even if the CDC wanted a 100% confirmed number, they couldn’t pull it off. It’s just too hard to do in a country of 328 million people. It’s the same challenge a company like Nielsen faces when trying to figure out how many people tuned in for the Seinfeld finale.
[2] This number is current as of 4/19/2020, and if you read this anytime after the day I published this article it will be wrong. The pace of these updates is really amazing when you stop and think about it. Prior to the novel coronavirus, the state of Indiana reported stats for the flu weekly in a PDF. They had a fancy table, but that was really it. The novel coronavirus has forced state health departments like ISDH to step up their analytics game in a big way, and they now give daily updates with all sorts of data segments to analyze. This is great, but it also factors into the challenge we face with data right now, as so many media outlets have historically reported estimates rather than raw data.