Tomorrow night, somewhere near half of the country will undoubtedly be crying foul, because their predictions about the winner, and their own votes, fell short of reality. Before you Democrats scream bloody murder about how Barack Obama could not possibly have lost the election when everyone you knew voted for him and all of those polls said otherwise, perhaps a small lesson in how those models are put together, and why some are more predictive than others, is in order.
This year has been one of the most egregious in my memory as far as pollsters using their publications to sway public opinion rather than record it. It became so blatant this year that the Obama Administration actually filed suit against Gallup for using an unapproved demographic model in one of the polls it published.
When you read the top line on a poll, you will see something like this:
Based on a survey of 750 likely voters, x percent support Barack Obama, y percent support Mitt Romney, z percent support another candidate, and t percent are undecided.
What you may not know is how that top line is constructed. First things first: it was not put together from a mere 750 interviews. For that model populace of 750 likely voters, our polling company probably conducted between 1,500 and 2,250 interviews, and to collect those interviews, it probably logged around 12,500 phone calls. From those results, a sample population was literally constructed, based on the polling company's educated guess as to what the demographic makeup of the electorate would eventually be.
What is uncanny is that, within the 20 or so subgroups that make up the eventual demographic model, the polling companies will be in very close agreement. That is to say, to take a much simpler version of one of their subgroups, they all pretty much agree that 86% of people who self-identify as Republicans will vote for Mitt Romney, and that 85% of those who self-identify as Democrats will vote for Barack Obama. To be certain, those groups are further splintered into categories delineated by age, gender, education, income, profession, and anything else the pollster feels is pertinent.
Where the pollsters disagree is in what the population that finally makes it to the polling booth will ultimately look like. Something you may have noticed in some of the criticisms is the D/R/I split. This is the easiest piece of information that will, within seconds, give you some idea as to which way a polling company may be skewing its results. During the 2008 elections, the final model had a partisan split that favored Democrats by 8 points: a D/R/I split of 39/31/30. For the midterm elections of 2010, that model had changed to 36/35/29.
So we can see that relatively small changes in the demographics can lead to vastly differing results. The Democrat wave election of 2008 gave way to the historic Republican wave election of 2010 with the model's makeup shifting as follows: a 3-point smaller representation of Democrats, a 4-point increase in the representation of Republicans, and a 1-point decrease in the percentage of Independents.
When the pollster contacts a respondent and asks those important questions about support and intentions, he will also ask for some demographic information. At the end of the day, he already knows that he wants his model to contain x number of 18-to-29-year-old Republican men, y number of 29-to-40-year-old Democrat women, and so forth. The raw respondents will definitely not match those numbers exactly; what he will have is some representation from each of the groups included in his sample demographic.
This is how that sample is constructed. The model's makeup is, within most polls, predetermined before the first interview is conducted. So we'll give our example model a breakdown of 40/40/20, just as an example. The pollsters may have contacted, quite by accident, more females than males, more Republicans than Democrats, more younger people than older, or any other possibility. For ease of use, our example is only worried about total Democrats, total Republicans, and total Independents. Out of 750 likely voters, we have determined that there will be 300 Democrats, 300 Republicans, and 150 Independents, regardless of who was actually contacted. If 86% of Republicans responded that they plan on voting for Mitt Romney, then the total added to his top line from that group will be 258; we'll give Barack Obama the remaining 42. From the Democrats who responded, 255 will be added to Obama's top line, with 45 credited to Mitt Romney. Amongst the Independents, let's assume they split straight down the middle, for 75 each. This gives us a made-up poll that is a statistical tie, almost exactly 50/50. What happens to our example if we change the model?
Applying the same interview results to the 2008 demographic split of 39/31/30, we get a result that gives Barack Obama a decisive victory of roughly 52.5% to 47.5%. If, however, the demographics of the voting populace look like the 2010 model of 36/35/29, the exact same poll finishes in a dead heat, 50.0% to 50.0%, a much different result.
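The whole exercise above is simple enough to check yourself. Here is a short sketch that applies the example's subgroup splits (85% of Democrats for Obama, 86% of Republicans for Romney, Independents even) to any D/R/I partisan model; the function name and structure are mine, not from any polling company:

```python
# Sketch: how a poll's top line follows from the chosen D/R/I partisan model.
# Subgroup preferences are this article's illustrative numbers, not real data.

SAMPLE = 750  # the example's population of likely voters

# Share of each partisan subgroup going to (Obama, Romney)
SUBGROUP_SPLIT = {
    "D": (0.85, 0.15),
    "R": (0.14, 0.86),
    "I": (0.50, 0.50),
}

def top_line(d_pct, r_pct, i_pct):
    """Return (Obama %, Romney %) for a given D/R/I partisan model."""
    model = {"D": d_pct, "R": r_pct, "I": i_pct}
    obama = romney = 0.0
    for party, pct in model.items():
        voters = SAMPLE * pct / 100          # seats in the weighted sample
        o_share, r_share = SUBGROUP_SPLIT[party]
        obama += voters * o_share
        romney += voters * r_share
    return round(100 * obama / SAMPLE, 2), round(100 * romney / SAMPLE, 2)

print(top_line(40, 40, 20))  # example model: Obama ~49.6, Romney ~50.4
print(top_line(39, 31, 30))  # 2008-style split: Obama ~52.5, Romney ~47.5
print(top_line(36, 35, 29))  # 2010-style split: dead even, 50.0 apiece
```

Same interviews, three different headlines; the only thing that changed is the pollster's guess about who shows up.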
Bear in mind that in every poll which exists in the real world, Romney is winning the independents by huge double-digit margins. In the last 7 Presidential elections, the demographic split in the voting model reflected the previous midterm election almost exactly, not the previous Presidential election. The polling companies today are trying their level best to pretend that 2010 plainly did not happen, and so, by the way, is our President. It is therefore hard to understand on what basis anyone with any sense of history can look at this year's election and determine not only that this old and tested rule will be thrown out completely, but that the percentage makeup of Democrats will exceed the record-breaking makeup of the 2008 elections by a ridiculous margin.
The top line is therefore easy to manipulate, based on the ultimate makeup of the partisan split. For instance, if the sample chosen reflects something that looks like the latest piece of fiction as published by ABC, with a partisan split predicting a D/R/I breakdown of 40/29/31, their numbers within the subgroups may be dead-on accurate, but the question any sane person would ask is: where on Earth did you come up with that sample?
During the election of 2004, all of the major networks put together exit-polling data based on unbelievably optimistic partisan models and announced around noon on Election Day that John Kerry would be inaugurated as our 44th President. So believing in the less-than-perfect science of polling were the Democrats at the time that cries of shenanigans were shouted throughout the land. Court challenges were filed, and subsequently thrown out by judges who are still laughing. (By the way, will somebody please tell the good people at NBC that they can call Ohio for George W. Bush already; it is now 8 full years past the election.) You usually need to read to page 29 or 30 of a published poll to see how the top line numbers were derived, and those methods of construction are usually more important than the headline the polling company gives its product.
Normally, I wouldn’t bother with an essay designed to prove to an unsuspecting public just exactly how wonkish I am, but when I see people on my side react with the same Eeyoreish hysteria every time one of these polls is released, it makes me want to scream: wake up, we’ve all seen this movie before.
On Wednesday, the rest of us will know what the internal pollsters of each campaign have already told their respective candidates: that Mitt Romney will be our 45th President, and Barack Obama will have two years on the lecture circuit campaigning to win his party’s nomination for 2016. Every Democrat and his brother will be screaming bloody murder that the polls could not possibly have been that wrong, even though they’ve never actually gotten it right.