My confidence level on holding the Senate is beginning to get just a little wobbly, so I am pleased to see Sam Wang explain his model. I have a lot of confidence in his ability to predict election outcomes. I also have a lot of confidence in Nate Silver. The difference between their models is in how they treat so-called “fundamentals,” which are non-polling factors like the historic behavior of a state, the prior political experience of candidates, and the state of the economy, as well as polling factors unrelated to a specific race, like right track/wrong track, congressional ballot preference, and presidential job approval.
Factoring in these tangential metrics moves election estimates more favorably in the Republicans’ direction. Sam Wang estimates that they give the GOP a two-point bump in Silver’s model, which is enough to flip his prediction of control from blue to red. Wang doesn’t use fundamentals because he thinks they are too unpredictable to be helpful.
I have already written about the real skewed polls. This is an effect of the likely voter screens that pollsters use to try to make their polls more accurate. In 2012, all the likely voter screens were too optimistic, and the most accurate poll was an Ipsos/Reuters poll of registered voters. There are different ways to construct likely voter screens: you can ask respondents how likely they are to vote or how enthusiastic they are, or you can weight the responses you use to mirror the expected demographic makeup of the electorate. If you screw this up, as every polling outfit did in 2012, then you miss the mark. Miss your mark badly enough and you’ll make the wrong call, as Gallup did in predicting that Romney would win the popular vote.
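To make the demographic-weighting version of a screen concrete, here is a toy sketch (my own illustration with invented numbers, not any pollster’s actual method): each respondent is weighted so that group shares match an assumed electorate, and if that assumption is wrong, every estimate built on it is wrong too.

```python
from collections import Counter

def reweight(responses, expected_shares):
    """Weight each respondent so group shares match an assumed
    electorate makeup -- one common likely-voter approach. If the
    assumed shares are wrong, every estimate is wrong with them."""
    counts = Counter(r["group"] for r in responses)
    n = len(responses)
    # Weight = assumed share of the electorate / share of the raw sample.
    weights = {g: expected_shares[g] / (counts[g] / n) for g in counts}
    return sum(weights[r["group"]] * (1 if r["vote"] == "D" else 0)
               for r in responses) / n

# A hypothetical 100-person sample:
sample = (
    [{"group": "young", "vote": "D"}] * 30 +
    [{"group": "young", "vote": "R"}] * 10 +
    [{"group": "old",   "vote": "D"}] * 25 +
    [{"group": "old",   "vote": "R"}] * 35
)
# Raw Dem share is 55/100; assume the electorate skews older than the sample:
print(round(reweight(sample, {"young": 0.30, "old": 0.70}), 3))  # 0.517
```

The screen moves the Dem share from .55 to roughly .517 on the strength of a single assumption about turnout, which is exactly how a whole field of polls can miss in the same direction.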
In an election in which even Wang predicts the most likely outcome as a 50-50 split, the more assumptions you make, the worse you’re likely to do. This is because the more factors you have, the more likely you are to miss badly on one of them.
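The compounding-error intuition can be sketched numerically. This toy simulation (not either modeler’s actual code) assumes each added “fundamentals” adjustment carries its own independent, zero-mean error; the combined error then grows roughly with the square root of the number of factors, and correlated or biased factor errors would be worse still.

```python
import random
import statistics

random.seed(0)

def forecast_error(n_factors, factor_sd=1.0, trials=20000):
    """Simulate a forecast built from n_factors independent adjustments,
    each with its own error of standard deviation factor_sd.
    Returns the standard deviation of the combined error."""
    errors = [sum(random.gauss(0, factor_sd) for _ in range(n_factors))
              for _ in range(trials)]
    return statistics.pstdev(errors)

# The combined error grows roughly as sqrt(n_factors):
for k in (1, 2, 4, 8):
    print(k, round(forecast_error(k), 2))
```

In a race expected to land near 50-50, even a modest widening of that error band is enough to flip the model’s call.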
And, in this election, nearly all of these extra factors work in the same direction, improving the Republicans’ chances in the models. Yet the actual poll numbers, despite the likely voter screens, have persistently shown the Democrats over-performing expectations.
So, my confidence level has gone down somewhat over the last several weeks, and I think we’re in toss-up territory at this point. When I get the chance, I will sit down and go over the field of play, race by race. In the meantime, Professor Wang is keeping me sane.
Okay, I’m confused. You write that the likely voter screens were off in 2012, then you write that the likely voter polling favors the Dems thus far. What am I missing?
Booman Tribune ~ A Progressive Community
The key word is “despite.”
Registered voter polls presumably show Dems even further ahead. In addition, likely voter screens are even more unreliable this far out from an election.
Thank you. I knew something was getting by me.
Sam Wang’s methodology is surprisingly simple. Yet the simplicity is deceptive. It reminds me of some really great user interfaces I’ve seen created – they seem so simple, but the developers will tell you that the hardest work was choosing which 95% of the lines of code they had written to remove from the final product.
Oversimplifying somewhat, Wang basically takes the polls as they are, throwing out the ones with obvious designed bias (yes, that means you, Rasmussen) but keeping those which may have unintentional methodological bias. In presidential contests he ignores the national polls as noise – part of a general philosophy of just dumping data that has either limited value or may even mislead. Ditto for weighting by historical factors.
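A rough sketch of the take-the-polls-as-they-are idea (my own simplification, not Wang’s actual code): take the median margin of a race’s recent polls, estimate its uncertainty robustly so one outlier poll can’t drag the snapshot around, and convert that into a win probability.

```python
import statistics
from math import erf, sqrt

def median_snapshot(margins):
    """Take a race's recent poll margins (Dem minus Rep, in points) and
    return the median margin plus an estimated standard error of that
    median -- the core of a take-the-polls-as-they-are model."""
    med = statistics.median(margins)
    # Estimate spread from the median absolute deviation, which is
    # robust to a single wild outlier poll.
    mad = statistics.median(abs(m - med) for m in margins)
    sem = 1.4826 * mad / sqrt(len(margins))
    return med, sem

def win_probability(med, sem):
    """Probability the true margin is above zero, assuming the
    snapshot error is roughly normal."""
    if sem == 0:
        return 1.0 if med > 0 else 0.0
    return 0.5 * (1 + erf(med / (sem * sqrt(2))))

polls = [2, 4, -1, 3, 5]   # hypothetical margins from five recent polls
med, sem = median_snapshot(polls)
print(med, round(win_probability(med, sem), 3))
```

Note that nothing here tries to correct individual pollsters or fold in outside factors; the median itself does the outlier control.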
The problems with historical factors are multiple. First, polling has changed dramatically over the past 20 years – polling data (as presented after the pollsters apply their filters and weights) from 1994 is radically different in nature from data from 2014, just in terms of the methods used and the sheer quantity of polls.
Second, the electorate has changed dramatically in the same time. How people respond to questions like party identity – and how people think about party identity – is very different now. Partly this is people changing with their culture, partly this is the result of one generation dying off and another taking its place.
Third, and finally, elections are discrete events with many unique factors in each. For mid-term elections, we have only five previous elections from which to gather data over the past 20 years – each of which had dozens of different variables in play, including economic conditions, wars, and scandals. If you were doing social science stats and had a data set with five occurrences, you’d be extremely hesitant to draw conclusions – and if dozens of variables were in play, you’d stop before even bothering to run a regression.
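The small-n point can be made concrete: with five observations, a model with five free parameters fits the past perfectly no matter what. This toy interpolation (illustrative numbers, not a real analysis) “explains” five midterm seat swings exactly, which is precisely why a perfect fit on five data points proves nothing about the sixth.

```python
def lagrange_fit(xs, ys):
    """Return a function passing exactly through every (x, y) pair.
    With as many parameters as observations, a perfect fit is
    guaranteed -- and tells you nothing about the next election."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

# Five midterms and illustrative House seat swings for the president's party:
years  = [1994, 1998, 2002, 2006, 2010]
swings = [-54, 5, 8, -30, -63]
model = lagrange_fit(years, swings)

# The model reproduces every past election exactly:
print([round(model(y)) for y in years])
```

Any five numbers in `swings` would have been fit just as perfectly, so the fit itself carries no predictive information.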
Which is where Silver has screwed up, when he has screwed up – adding in all those special factors, sometimes literally by considering polling data dating back to 1932! I guess it’s a personality style: if data exists, he feels he has to use it (similarly, he keeps the biased polls but adds complicated factors to assess and correct for house effects). This results in a bigger, not smaller, error rate for Silver – which is why in the last two presidential elections he was forecasting an Obama victory at around 75% while Wang was over 99%.
The closer we get to the actual election day, the better Wang does and the poorer Silver does, because of all his built-in fudge factors.
OTOH, I had Obama at 99% by 11/11 because of all the non-quantitative stuff and the polls over the subsequent year never gave me enough pause to reconsider that 99% until 10/12 when I upped it to 100%.