People's Democracy

(Weekly Organ of the Communist Party of India (Marxist))


Vol. XXVIII, No. 22, May 30, 2004

The Fiasco Of Opinion Polls: Bad Psephology Or Bad Psephologists?

Prabir Purkayastha

 

THE elections to the 14th Lok Sabha have been disastrous not only for the BJP-led NDA but also for the pollsters. The huge number of opinion polls and exit polls had all predicted that the NDA would be well ahead of the Congress alliance, even if some of the cagier ones had hedged their bets and predicted a hung parliament. The question arises: how could all of them go so dismally wrong? Is it that opinion polls and exit polls are very much like astrology? Incidentally, the astrologers fared no better: they had obviously been deeply influenced by the opinion polls! With the fiasco of poll predictions, it is worthwhile examining whether there is a science to such polling. Is psephology --- defined as the scientific analysis of political elections and polls --- itself dubious, or do we just have bad pollsters?

 

BAD PREDICTIONS

In retrospect, what was amazing about the opinion polls was that while they did predict that the NDA would do very badly in Tamil Nadu and certainly worse than last time in Andhra, they were still willing to predict a gain for the NDA of between 30 and 50 seats. This means they were predicting that the NDA would not only compensate for the 30-50 seats it would lose in these two states but also gain another 50-odd seats --- a gain of about 100 seats --- from the rest of the country. Looking at the poll arithmetic, these additional 100 seats were always a tall order. So it was not only that the pollsters went wrong in their predictions; they were willing to suspend all critical faculties in making them. I can understand that they failed to predict the extent of the NDA defeat; what is not explicable is their prediction of 300+ seats for the NDA, which they all made in one poll or another.

 

What are opinion polls and how do they compare to the real thing? Opinion polls are based on the belief that if we sample a population “scientifically”, then the results of the sample will tell us something about the population. Most data collection in real life is done using sampling methods, as it is too costly in terms of time and money to do an exhaustive survey. But not all: the Census, performed every 10 years, does not use sampling but collects data for the entire population.

 

KEY ISSUES IN SAMPLING METHOD

 

The two key issues in using sampling as a method to tell us properties of the entire population are how random the sample is and how large it is. Obviously, if we collect data from, say, only our neighbours, it is not a random sample. We may be staying in a locality where people are of one community, have the same income levels and may be influenced by purely local factors. Extrapolating this over the nation would “bias” the results: it might tell us how the neighbourhood would vote, or at best how people of this income level would vote, but not the country as a whole. The randomness of the sample is therefore crucial to the correctness of the results.
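
To see how far a non-random sample can mislead, consider a small simulation sketch. All the numbers --- the locality size, the support levels, the sample size --- are invented for illustration, not taken from any actual poll:

```python
# A minimal sketch, with invented numbers, of why randomness matters.
import random

random.seed(1)

# 1 = would vote for party A, 0 = would not (assumed support levels)
locality = [1 if random.random() < 0.80 else 0 for _ in range(10_000)]
rest = [1 if random.random() < 0.35 else 0 for _ in range(90_000)]
population = locality + rest

true_share = sum(population) / len(population)  # close to 39.5%

# a random sample drawn from the whole population...
random_sample = random.sample(population, 1_000)
# ...versus "polling only our neighbours" in the one locality
neighbour_sample = random.sample(locality, 1_000)

print(f"true support:            {true_share:.1%}")
print(f"random sample estimate:  {sum(random_sample) / 1_000:.1%}")
print(f"neighbour-only estimate: {sum(neighbour_sample) / 1_000:.1%}")
```

The random sample lands within a point or so of the true support; the neighbourhood-only sample reports roughly double the true figure, however large we make it.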

 

The second issue is how large the sample should be. If the population is highly homogeneous --- the same variation is found anywhere in the country (or in any group in the population) --- a small sample will still tell us fairly accurately how the population would vote. The problem occurs when there are different population groups that behave quite differently. The sample size then has to be bigger to catch all the variations occurring in the population.
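
How the margin of error shrinks with sample size can be seen from the textbook formula for a simple random sample; the figures below are purely illustrative, and real polling designs are more involved than this:

```python
# The textbook 95 per cent margin of error for a share estimated
# from a simple random sample of n respondents.
from math import sqrt

def margin_of_error(p: float, n: int) -> float:
    """95 per cent margin of error for an estimated share p from n respondents."""
    return 1.96 * sqrt(p * (1 - p) / n)

for n in (500, 2_000, 10_000):
    print(f"n = {n:>6}: +/- {margin_of_error(0.40, n):.1%}")
# n =    500: +/- 4.3%
# n =   2000: +/- 2.1%
# n =  10000: +/- 1.0%

# With several groups that behave differently, EACH group needs a
# sample roughly this size, so the total sample must grow accordingly.
```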

 

In the case of poll predictions, it is possible to have random samples and even large sample sizes and yet go wrong. Even if we sample the population correctly, if some sections of the population vote in smaller numbers than others, the sample becomes biased. It might be a representative sample of the population, but not of the actual voting population.

 

An example will make this clear. If we have a population of 100 with 20 of them from the middle and upper classes, then a sample of 25 should have 5 from the middle and upper classes and 20 from the rest to make the sample representative of the population. However, if only 2 of the 20 actually vote while 64 of the remaining 80 do, then the sample of 5 represents 250 per cent of the actual middle and upper class voters, while the sample of 20 represents only 31.25 per cent of the rest of the actual voters. Of course, in real life we do not use such large samples relative to the population. The conclusion however still remains valid: if a group votes in much smaller numbers than its strength in the population, sampling based on its strength in the population will bias the results.
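
Working the same figures through step by step shows both the distortion and one standard correction --- weighting each group's responses by its expected turnout. This is a sketch of the general idea, not any agency's actual procedure:

```python
# The article's own figures, worked through: a sample that mirrors the
# population still misrepresents the people who actually vote.
population = {"upper": 20, "rest": 80}        # population of 100
sample = {"upper": 5, "rest": 20}             # representative sample of 25
turnout = {"upper": 2 / 20, "rest": 64 / 80}  # 2 of 20 vote; 64 of 80 vote

voters = {g: population[g] * turnout[g] for g in population}  # upper: 2, rest: 64

for g in population:
    share = sample[g] / voters[g]
    print(f"{g}: sample of {sample[g]} = {share:.2%} of actual voters")
# upper: sample of 5 = 250.00% of actual voters
# rest: sample of 20 = 31.25% of actual voters

# one standard correction (a sketch, not any agency's actual method):
# weight each group's responses by its expected turnout before projecting
weights = {g: voters[g] / sample[g] for g in sample}  # upper: 0.4, rest: 3.2
print(weights)
```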

 

So the first question we need to address is whether the problem with the pollsters was a common fault in the methodology used, or their own subjective desire for an NDA victory that coloured their predictions. Or was it that the opinion and exit polls were fixed? It is difficult to believe that all the 35-40 polls were fixed and their predictions rigged by the NDA. Undoubtedly some of the pollsters were making predictions to cosy up to the party they thought was winning anyway. But to accept that all the pollsters were fixed would stretch credulity. So where did they all go wrong?

 

ASSUMPTIONS OF THE EXIT POLLS

Before we go into an analysis of the possible errors, we will have to make some assumptions. These assumptions are borne out by the exit polls, even though the exit poll samples themselves are biased, as can be seen from their results. What we are using here are the trends that the exit polls show in order to draw some conclusions.

 

The first assumption is that different segments of the population voted differently: the rich and upper middle class were more inclined to vote for the NDA, while the poor and the underclasses voted against it. This is not a rural versus urban divide --- the NDA has done badly in most urban areas --- but a rich versus poor divide. The second assumption, again borne out by past data, is that while the poor vote in large numbers in India, the middle and upper classes are voting less and less: their turnout has been declining in each successive election. It is too much of an effort for them to go out and cast their votes; it is much easier to talk about it. In the upper middle class and richer colonies in Delhi, the polling figures are now less than 10 per cent. The third assumption is that different areas of the country showed different swings: there were large variations from region to region.

 

SYSTEMIC BIAS IN THE SURVEY

The most obvious reason why a survey gives wrong results is that the sample is not representative of the population. This undoubtedly holds good for the pollsters in this election also. Agencies such as Marg and Nielsen probably sampled the middle and upper middle classes far more heavily than the poor. If the initial sample was bad, this was compounded even further because the rich vote considerably less than their numbers in the population: if the samples were biased to start with, they were even more so when we take into account the actual voting population and not just the population.

 

The second reason --- the sample size --- is also an issue. Traditionally, pollsters fare better if there are a few parties with a relatively even distribution of strength. In India, there are strong regional parties and a number of smaller parties that have strength in pockets. To compensate for this, the sample size needs to be much bigger than the pollsters are prepared to pay for. If we know in advance where the smaller parties have pockets of influence, perhaps the sampling can be improved to reduce the sample size. But this presupposes stable voting patterns, which are unlikely in any election. Further, if parties such as the BSP and SP put up candidates in new areas, the pollsters may completely miss their impact on the votes. Poll predictions always tend to underestimate the strength of smaller parties: they are generally predicted to win fewer seats than they actually do. It is the smaller sample size that leads to this underestimation of the strength of the smaller parties.
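
A rough calculation, with invented figures, shows how little an evenly spread national sample reveals about such a party:

```python
# A sketch, with invented figures, of why an evenly spread national
# sample says little about a party whose strength sits in pockets.
pocket_seats, other_seats = 25, 518      # 543 Lok Sabha seats in all
pocket_share, other_share = 0.40, 0.005  # assumed local vote shares

total_seats = pocket_seats + other_seats
national_share = (pocket_seats * pocket_share
                  + other_seats * other_share) / total_seats
print(f"national vote share: {national_share:.1%}")  # about 2.3%

n = 2_000  # a plausible national sample
print(f"expected supporters in sample: {n * national_share:.0f}")  # ~46
print(f"expected supporters per pocket seat: "
      f"{n / total_seats * pocket_share:.1f}")  # ~1.5
# With one or two respondents per pocket constituency, seat-level
# predictions for such a party are guesswork, and it gets underestimated.
```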

 

Generally, statistical mistakes tend to cancel each other out. When all the pollsters err in only one direction, a statistical error is not the culprit; it is a systematic bias in the survey that is at fault. We have identified one such systemic fault: the over-representation of the rich and the upper middle class and the under-representation of the poor. But this is not enough to explain the huge difference between the seats that were predicted and those finally secured by the NDA. So apart from the systemic bias in the survey, what else went wrong?
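
The distinction is easy to demonstrate with a small simulation; the figures are again purely illustrative:

```python
# Random error scatters polls around the truth; a shared systemic
# bias pushes every poll the same way. All numbers are illustrative.
import random

random.seed(7)
TRUTH, POLLS = 0.36, 40

noisy = [TRUTH + random.gauss(0, 0.02) for _ in range(POLLS)]          # unbiased
biased = [TRUTH + 0.05 + random.gauss(0, 0.02) for _ in range(POLLS)]  # common bias

print(f"truth:                  {TRUTH:.1%}")
print(f"mean of unbiased polls: {sum(noisy) / POLLS:.1%}")   # close to 36%
print(f"mean of biased polls:   {sum(biased) / POLLS:.1%}")  # near 41%
print(f"biased polls overestimating: {sum(s > TRUTH for s in biased)} of {POLLS}")
```

The unbiased polls average out near the truth even though each one errs; the biased polls err in the same direction almost every time --- which is exactly the pattern this election produced.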

 

SUBJECTIVE FACTORS

All pollsters convert sampled vote shares and swings into seats using various subjective factors. Some will call it the lying factor: in Gujarat, Muslims are unlikely to say they are voting against the BJP, and Dalits quite often hide their real choice. The ability to use the right subjective factors --- how to weight the sample by past experience and an understanding of the elections --- is the key parameter that is not talked about, but it lies at the heart of good poll predictions. The good pollsters simply use better guesstimates than the bad ones, who mechanically translate samples into votes. It is here that the pollsters went completely haywire. Coming from the segment that was shining, even if the rest of India was not, they translated their prejudices into seats. While they perhaps did try to correct this later, the initial high NDA predictions made it difficult to effect a drastic change in their final results. The net result was egg on their collective faces. Even worse, they had nowhere to hide: they were on all the 24x7 channels explaining why they were wrong as the results came streaming in. If the NDA leaders had glum faces, with their shine having come off, the media was not shining either.
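
To see how decisive such subjective factors can be, here is a sketch of the share-to-seat conversion using a crude uniform-swing rule. The rule, the numbers and the "adjustment" knob are generic illustrations, not any agency's actual model:

```python
# A sketch of the judgment call described above: turning vote shares
# into seats. Uniform swing and the "adjustment" knob are generic
# illustrations, not any agency's actual model.
def project_seats(sampled_share: float, last_share: float,
                  last_seats: int, seats_per_point: float,
                  adjustment: float = 0.0) -> int:
    """Project seats from a sampled vote share via a crude uniform swing.

    adjustment: the pollster's subjective correction in share points,
    e.g. +0.02 if a bloc's supporters are believed to under-report.
    """
    swing = (sampled_share + adjustment) - last_share
    return round(last_seats + swing * 100 * seats_per_point)

# illustrative inputs: at ~7 seats per point of swing, a 2-point
# subjective "correction" alone moves the forecast by 14 seats
print(project_seats(0.38, 0.37, 270, 7.0))                   # 277
print(project_seats(0.38, 0.37, 270, 7.0, adjustment=0.02))  # 291
```

The point of the sketch is that the adjustment term sits entirely outside the sampled data: two pollsters with identical samples can publish forecasts dozens of seats apart, purely on the strength of their prejudices.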

 

Our analysis above is of course based purely on conjectures about the kind of mistakes the pollsters might have made. While it is difficult to quantify the impact that opinion and exit polls have on elections, there is little doubt that they have one. Even if such polls are not banned, there is a need to examine the basis of such predictions. Given the massive failure we have witnessed, there are legitimate grounds for demanding that all the pollsters submit their raw data and methodology for independent examination. Was it just mistakes, or were more sinister forces at work? The Election Commission should take steps in this direction; otherwise shoddy pollsters can vitiate future elections.