A new CNN/ORC poll of probable Democratic caucus-goers in Iowa had TV talking heads all excited and Bernie Sanders supporters all atwitter (literally), because it indicated that good ole Bernie had moved from 8 points behind Hillary Clinton to 8 points ahead, almost overnight. Suddenly it appeared that Sanders had a real possibility of posting a clean sweep in Iowa and New Hampshire, giving him much-needed momentum going into Nevada, South Carolina, and the so-called Southeastern Conference primaries on March 1st.
However, if you listened closely to the background conversations, you would have noticed that both campaigns were downplaying that poll as an outlier, and with good reason. Take a look at the table below from RealClearPolitics.com showing the most recent Iowa polls and follow along. Click here: Iowa Polls
First, note the numbers for each of the eight polls in the column labeled "Sample." That column shows the number of interviews used in each poll. (The letters "LV" indicate that the samples were drawn from likely voters.) Note that 4 of the 5 polls showing Hillary ahead were based on more than 500 interviews, and the 5th was based on 461 interviews. The recent CNN/ORC poll showing Sanders ahead by 8 points, by contrast, was conducted with only 280 interviews. All other factors being equal, polls based on more interviews are generally considered more accurate.
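To see why the interview count matters so much, here is a quick sketch using the standard 95% margin-of-error formula for a proportion, evaluated at the worst case p = 0.5. (This is the textbook formula; real pollsters apply weighting on top of it, so their published MoE figures can differ slightly.)

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Worst-case 95% margin of error for a proportion estimated from n interviews."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (280, 461, 500):
    print(f"n = {n}: MoE = +/- {margin_of_error(n) * 100:.1f} points")
```

With 280 interviews the formula gives roughly ±5.9 points, close to the 6.0 CNN/ORC reported, while 500 interviews bring it down to about ±4.4 points.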
Next, notice the three latest polls, including the CNN/ORC poll. That poll and the other two (KBR and Loras College) were all canvassed during the same time frame, 1/13-1/20. Yet two of them show Clinton leading by substantial margins (+9 and +29), while the CNN/ORC poll shows Sanders ahead by 8. Logically, at least one of those polls has to be highly inaccurate, because their margins of error, or "MoE" (CNN/ORC – 6.0, KBR – 4.1, Loras College – 3.1), cannot account for such divergent results.
The disparate results of these three polls could have occurred for two reasons: 1) one or more of the polls could have been improperly designed, i.e., giving improper weight to sub-populations, or 2) one or more of the polls drew an unlucky sample. A seldom-mentioned fact about polling is that the MoE, or "margin of error," of a poll does not by itself describe how inaccurate a poll can be statistically. Taking the CNN/ORC poll, the pollsters would describe the result as "Sanders ahead by 8%, plus or minus 6%, 95% of the time." When the media report poll results they conveniently leave off the phrase "95% of the time" – I guess they don't want to confuse the reading public by giving us too much information – but that "95% of the time" phrase is important.
That phrase is important because, no matter how well a poll is designed, the pollsters could, through no fault of their own, draw a bad sample – though the larger the sample, the less likely that is to happen. For illustration, let's take an example. To make it simple, say we have a barrel filled with 1,000 marbles, of which 900 are green and 100 are red, all thoroughly mixed together. However, in our role as pollster we know nothing about the contents of the barrel except that there are 1,000 marbles in it. To estimate the contents without counting every marble, we decide to blindly and randomly pick a sample of 100 marbles out of the barrel. A good sample of 100 marbles would be 90 green and 10 red, leading us to accurately conclude that 10% of the marbles are red. If we did this many times, most results would be close to that. However, it is possible, though very unlikely, that we could pick all green marbles, leading us to improperly conclude that there are no red marbles in the barrel. It is also possible, but extremely unlikely, that we could pick 50 green marbles and 50 red marbles, resulting in an extremely bad sample.
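The marble experiment is easy to simulate. This sketch repeatedly draws 100 marbles from the 900-green/100-red barrel and records how the red counts spread around the true 10%:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible
barrel = ["green"] * 900 + ["red"] * 100

# Draw 100 marbles without replacement, many times over, counting reds each time.
red_counts = [sum(m == "red" for m in random.sample(barrel, 100))
              for _ in range(10_000)]

near_truth = sum(8 <= r <= 12 for r in red_counts) / len(red_counts)
print(f"Samples with 8-12 red marbles: {near_truth:.0%}")
print(f"Most extreme samples drawn: min={min(red_counts)}, max={max(red_counts)}")
```

Most samples land near 10 reds, but the minimum and maximum show that individual draws can stray well away from the truth, which is exactly the "unlucky sample" the text describes.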
That’s what the “95% of the time” phrase describes – the probability of a poll being accurate rather than way off. I won’t go into the reason why, but in the case of the CNN/ORC poll, it means that the result will fall in a range from 6 percentage points high to 6 percentage points low 95% of the time. However, it also means that 5% of the time the poll result will be more than 6 percentage points off. The real lesson here is that, while it may not happen very often, a particular poll can be more inaccurate than the poll’s margin of error.
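That "5% of the time" claim can be checked directly by simulation. Assuming a hypothetical true support level of 50% and the CNN/ORC sample size of 280 interviews, this sketch counts how often a simulated poll lands outside its own 95% margin of error:

```python
import math
import random

random.seed(0)
p_true = 0.50   # assumed true support level (an assumption for the demo)
n = 280         # interviews per poll, matching the CNN/ORC sample size
moe = 1.96 * math.sqrt(p_true * (1 - p_true) / n)  # ~0.059

def one_poll():
    """Interview n random voters and return the observed support share."""
    hits = sum(random.random() < p_true for _ in range(n))
    return hits / n

trials = 10_000
misses = sum(abs(one_poll() - p_true) > moe for _ in range(trials))
print(f"Polls landing outside their own MoE: {misses / trials:.1%}")
```

The miss rate comes out close to 5%, as the confidence level promises: the margin of error bounds most polls, not all of them.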
Inaccurate polls can best be spotted by comparing their results to those of other polls taken in roughly the same time period, which minimizes the possibility that the preferences of the polled population changed in between. If one poll seems all out of kilter with the others, it is probably an outlier, and trust in its results should be limited. I think that is the case with the CNN/ORC poll. Its results may be inaccurate because of its smaller sample size, a bad poll design, a bad sample, or any combination of these causes. It is far easier to believe that this one particular poll is an outlier than that all of the other recent polls are displaying inaccurate results.
When multiple polls of a particular population are available, all taken within a relatively small time period – like the eight polls of probable Iowa Democratic caucus-goers compiled by RealClearPolitics.com – it is best to average them. Averaging essentially combines the interview samples of all of the polls into a much bigger, more accurate sample, and it tends to balance out the less accurate individual poll results. So in this case I would put my trust in the average of the last seven polls and conclude that Hillary Clinton is up by approximately 6% over Bernie Sanders one week out from the caucuses. It will be interesting to see the results of the next poll.
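RealClearPolitics itself publishes a simple average, but the "combine the samples" logic described above corresponds to weighting each poll by its sample size. Here is a sketch with made-up figures (illustrative only, not the actual RCP table):

```python
# Hypothetical polls: (Clinton's lead in points, number of interviews).
# A Sanders lead is recorded as a negative Clinton lead.
polls = [
    (+9, 500),   # Clinton +9
    (+29, 600),  # Clinton +29
    (-8, 280),   # Sanders +8
]

# Weight each poll by its sample size: a bigger sample earns more trust.
total_n = sum(n for _, n in polls)
weighted_lead = sum(lead * n for lead, n in polls) / total_n

print(f"Combined sample: {total_n} interviews")
print(f"Sample-weighted Clinton lead: {weighted_lead:+.1f} points")
```

Note how the small 280-interview outlier pulls the weighted average down far less than it would pull a simple three-way mean, which is the balancing-out effect the paragraph above describes.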