You may have seen something going around in the wake of the 2020 US Presidential Election where people are claiming they can prove there was fraud in the votes using math.
They are wrong. This is a post you can share to explain why.
ETA: RadioLab did a really excellent episode on this which includes a lot of what’s below, including interviews with me and Dr. Mebane, who did the election analysis at the University of Michigan. If you don’t want to read all this, go check that episode here: https://www.wnycstudios.org/podcasts/radiolab/articles/breaking-benford
I am an expert with published research on Benford’s Law, the statistical pattern they are talking about. I’m going to tell you why they are doing it wrong and why, even if they did it right, it wouldn’t indicate fraud.
Benford’s law basically says that the first digit of numbers in some naturally occurring systems follows a pattern. You may intuitively think that numbers that start with 1 are just as common as numbers that start with 9, but in lots of systems, around 30% of numbers start with 1 and the frequency declines to where only about 5% of numbers start with 9. This is seen ALL OVER! I showed that it applied in social networks to friend counts and that it could be used to detect bots. It’s used in financial and accounting investigations and can even be used in court as evidence of fraud. The lengths of rivers on earth follow this pattern. Atomic weights. JPEG coefficients. It’s mind-blowing!
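If you want to see where the 30% and 5% come from: the standard statement of Benford’s law is that the expected share of numbers with first digit d is log10(1 + 1/d). Here is a minimal Python sketch of that formula (the function name is just mine, for illustration):

```python
import math

def benford_first_digit(d: int) -> float:
    """Expected share of numbers whose first significant digit is d (1-9)."""
    return math.log10(1 + 1 / d)

# Expected share for each possible first digit.
expected = {d: benford_first_digit(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"first digit {d}: {p:.1%}")
```

The shares fall steadily from about 30% for a leading 1 down to under 5% for a leading 9, and they add up to 100%.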
If you want to know more about it, Netflix has a series out called Connected and episode 4 (Digits) is all about it. I’m in that documentary, so say hi when I come across your screen.
Probably because of that documentary, lots of people are saying “I can take the election counts from precincts and look at their distribution of first digits and see if there is fraud!”
THIS DOES NOT WORK.
Whether Benford can be used to detect election fraud has been studied for decades. What everyone who studies this knows is that analyzing first digits absolutely DOES NOT WORK! Why?
First, there’s not a big spread of orders of magnitude in precinct sizes. Most places Benford is applied, you have numbers in the 10s, the 100s, the 1,000s, the 10,000s, etc. Precincts don’t have that much variation in them because we don’t want them to be so giant that we can’t count all the votes. That’s one strike against Benford working.
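You can get a feel for why the spread matters with a quick simulation. This is only a sketch with made-up numbers, not real precinct data: I’m assuming a lognormal distribution centered on 500, once with a huge spread and once with a narrow, precinct-like spread. The center, the spreads, and the sample size are all arbitrary choices for illustration.

```python
import math
import random

def first_digit(x: float) -> int:
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{x:e}"[0])

def share_starting_with_one(values) -> float:
    """Fraction of the values whose first significant digit is 1."""
    values = list(values)
    return sum(first_digit(v) == 1 for v in values) / len(values)

random.seed(0)  # fixed seed so the sketch is repeatable
N = 20_000      # arbitrary sample size

# Hypothetical "wide" data: values spread over many orders of magnitude.
wide = [random.lognormvariate(math.log(500), 3.0) for _ in range(N)]
# Hypothetical "narrow" data: precinct-like sizes clustered near 500.
narrow = [random.lognormvariate(math.log(500), 0.3) for _ in range(N)]

wide_share = share_starting_with_one(wide)
narrow_share = share_starting_with_one(narrow)
print(f"wide spread:   {wide_share:.1%} start with 1")
print(f"narrow spread: {narrow_share:.1%} start with 1")
```

With the wide spread, the share of leading 1s lands close to the 30% Benford expects. With the narrow spread, almost nothing starts with a 1, simply because almost none of the values fall between 100 and 199 or between 1,000 and 1,999. Nothing fraudulent is going on in either list.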
Next, and this is really important, votes in a precinct are (basically) split between 2 candidates in this election. (3rd party candidates make up such a small percentage that they don’t matter for this point). If Trump gets X votes, Biden gets (basically) TOTAL - X.
Say every precinct has 1,000 people. If Trump follows Benford, Biden COULD NOT follow it. If ~30% of Trump’s vote counts start with a 1, then Biden gets the rest in those precincts, so his numbers there would start with 8s or 9s (e.g. Trump gets 175 votes, Biden gets 1000 - 175 = 825). Even though precincts aren’t all the same size, the fact that one number is dependent on the other violates the basis of Benford (it requires independence). Intuitively, I think we can see that splitting totals means it would be tricky to have both candidates follow the pattern.
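The arithmetic in that example is easy to check yourself. Here is a small Python sketch using the same hypothetical 1,000-voter precinct and ignoring third parties, as above (the variable names are mine):

```python
def first_digit(n: int) -> int:
    """First digit of a positive whole number."""
    return int(str(n)[0])

TOTAL = 1000  # hypothetical precinct size from the example above

# A few sample splits: one candidate gets x, the other gets TOTAL - x.
for trump in (175, 130, 199):
    biden = TOTAL - trump
    print(f"Trump {trump} (starts with {first_digit(trump)}), "
          f"Biden {biden} (starts with {first_digit(biden)})")

# Every three-digit count that starts with 1 (100-199) pushes the
# other candidate's count into the range 801-900.
biden_digits = {first_digit(TOTAL - x) for x in range(100, 200)}
print(sorted(biden_digits))
```

In this toy setup, every time one candidate’s count starts with a 1, the other candidate’s count starts with an 8 or a 9. Both columns can’t have 30% leading 1s and under 5% leading 9s at the same time.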
Third, we’ve studied this. We know it doesn’t work. People may share some data from past elections, but there are decades of research looking at elections around the world and it’s extremely well established that first-significant-digit Benford analysis does not work here. Full stop.
All the people who read a Wikipedia article and put some numbers in Excel are doing the thing I outlined above. We know this doesn’t work. They are lying — not just misinformed. Many of us have been tirelessly correcting their methods over the past 5 days, but they keep coming. They know it doesn’t work. The papers are all public and available. They do not care. It looks good for their argument and they are trying to trick you.
Ok, so if Benford doesn’t work, why are there decades of research about it? That research looks at the distribution of second digits, which also happens to follow an expected pattern in a lot of cases. Does that work well for detecting election fraud?
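For reference, the expected second-digit pattern comes from the same formula as the first-digit one: you add up the Benford probability of every two-digit start (10 through 99) that has the second digit you care about. A minimal Python sketch (the function name is mine, for illustration):

```python
import math

def benford_second_digit(d2: int) -> float:
    """Expected share of numbers whose second significant digit is d2 (0-9)."""
    # Sum over every possible first digit d1, using the two-digit start 10*d1 + d2.
    return sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))

second = {d: benford_second_digit(d) for d in range(10)}

for d, p in second.items():
    print(f"second digit {d}: {p:.2%}")
```

The second-digit pattern is much flatter than the first-digit one: it runs from about 12% for a 0 down to about 8.5% for a 9. The differences you are looking for are small, which is part of why this kind of analysis takes care to do properly.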
Eh, no. It’s not great. It can do it in some cases, but it also fails a lot. Here’s a quote from a paper on the topic:
“Benford’s Law is problematical at best as a forensic tool when applied to elections…Its ‘success rate’ either way is essentially equivalent to a toss of a coin, thereby rendering it problematical at best as a forensic tool and wholly misleading at worst.”
source: Deckert, Joseph, Mikhail Myagkov, Peter C. Ordeshook. “Benford’s Law and the detection of election fraud.” Political Analysis 19.3 (2011)
The research on it is always full of exceptions. There are places where we know there wasn’t any systematic fraud, but the numbers don’t look like a second digit Benford analysis would expect. We know about this and then look into the reasons it fails. That’s not really interesting for this discussion, but I bring it up just to say that you can’t just do a second digit analysis and then have a totally reliable technique.
But say you don’t care about these exceptions. You’re going to charge ahead with a second-digit Benford-based analysis! Great! You don’t have to! Walter Mebane, one of the foremost experts on using Benford to analyze elections, already did!
His paper is at http://www-personal.umich.edu/~wmebane/inapB.pdf
The title? “Inappropriate Applications of Benford’s Law Regularities to Some Data from the 2020 Presidential Election in the United States”
He does a proper second-significant-digit Benford analysis of the data and shows there is no evidence of fraud. You can go read it for yourself. (It’s a nice paper and pretty accessible if you skim over the more advanced stats.)
So in conclusion: what people are doing analyzing first digits is known not to work. Second digit analysis may have some promise, but it is not reliable. And even if you’re convinced it will get it right, it shows nothing suspicious in this election.
Do not be fooled by people who read an article online and made a chart in Excel. Statistics are complex and you need to be informed to use them correctly. The people pushing this Benford’s Law conspiracy are not only ignorant of the correct way to use it, they are intentionally dishonest. Listen to scientists and statisticians. Our method does not provide any evidence of fraud in this election.