People's Democracy
(Weekly Organ of the Communist Party of India (Marxist))
Vol. XXVIII, No. 22, May 30, 2004
The Fiasco Of Opinion Polls: Bad Psephology Or Bad Psephologists?
THE elections to the 14th Lok Sabha have been disastrous not only for the BJP-led NDA but also for the pollsters. The huge number of opinion polls and exit polls had all predicted that the NDA would be well ahead of the Congress alliance, even if some of the cagier ones had hedged their bets and predicted a hung parliament. The question arises: how could all of them go so dismally wrong? Is it that opinion polls and exit polls are very much like astrology? Incidentally, the astrologers fared no better: they had obviously been deeply influenced by the opinion polls! With the fiasco of poll predictions, it is worthwhile examining whether there is a science to such polling. Is psephology --- defined as the scientific analysis of political elections and polls --- itself dubious, or do we just have bad pollsters?
In retrospect, what was amazing about the opinion polls was that while they did predict that the NDA would do very badly in Tamil Nadu, and certainly worse than last time in Andhra, they were still willing to predict a gain for the NDA of between 30 and 50 seats. This means they were predicting that the NDA would not only compensate for the 30-50 seats it would lose in these two states but also gain another 50-odd seats --- a gain of about 100 seats --- from the rest of the country. Looking at the poll arithmetic, these additional 100 seats were always a tall order. So the pollsters did not merely go wrong in their predictions; they were willing to suspend all critical faculties in making them. I can understand that they failed to predict the extent of the NDA defeat; what is not explicable is their prediction of 300+ seats for the NDA, which they all made in one poll or another.
What are opinion polls and how do they compare to the real thing? Opinion polls are based on the belief that if we sample a population “scientifically”, the results from the sample will tell us something about the population. Most data collection in real life is done using sampling methods, as an exhaustive survey is too costly in terms of time and money. But not all: the Census, conducted every 10 years, does not use sampling but collects data for the entire population.
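The underlying idea is easy to demonstrate. Below is a minimal sketch in Python, with entirely invented numbers --- a notional electorate of one million, 40 per cent of whom back a party --- showing how a random sample of just 2,000 recovers the population figure to within a couple of percentage points:

```python
import random

random.seed(1)
# A notional electorate of one million voters, 40 per cent of whom (an
# invented figure) intend to vote for party A: 1 = for A, 0 = against.
electorate = [1] * 400_000 + [0] * 600_000

# A simple random sample of 2,000 voters.
sample = random.sample(electorate, 2_000)
estimate = sum(sample) / len(sample)

# Standard error of a sampled proportion: sqrt(p * (1 - p) / n).
se = (estimate * (1 - estimate) / len(sample)) ** 0.5
print(f"true share: 40.0%, sampled estimate: {estimate:.1%} (+/- {2 * se:.1%})")
```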
KEY ISSUES
The two key issues in using sampling to tell us the properties of an entire population are how random the sample is and how large it is. Obviously, if we collect data from, say, only our neighbours, it is not a random sample. We may be staying in a locality where people are of one community, have the same income levels and may be influenced by purely local factors. Extrapolating this over the nation would “bias” the results: it might tell us how the neighbourhood would vote, or at best how people belonging to this income level would vote, but not the country as a whole. The randomness of the sample is therefore crucial to the correctness of the results.
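A toy simulation illustrates the point. The localities, their sizes and their support levels below are all assumptions for illustration: a poll of 1,000 “neighbours” drawn from one rich colony lands far from the truth, while a genuinely random poll of the same size lands close to it.

```python
import random

random.seed(2)
# Hypothetical electorate: support for party A is assumed to vary sharply
# by locality (80 per cent in rich colonies, 30 per cent everywhere else).
rich = [1 if random.random() < 0.8 else 0 for _ in range(10_000)]
rest = [1 if random.random() < 0.3 else 0 for _ in range(90_000)]
electorate = rich + rest

# "Sampling our neighbours": 1,000 respondents, all from one locality.
neighbour_sample = random.sample(rich, 1_000)
# A genuinely random sample of the same size from the whole electorate.
random_sample = random.sample(electorate, 1_000)

print(f"true support:       {sum(electorate) / len(electorate):.1%}")
print(f"neighbourhood poll: {sum(neighbour_sample) / 1_000:.1%}")  # badly biased
print(f"random poll:        {sum(random_sample) / 1_000:.1%}")     # close to truth
```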
The second issue is how large the sample size should be. If the population is highly homogeneous --- it shows the same variation anywhere in the country (or in any group within the population) --- even a small sample would tell us fairly accurately how the population would vote. The problem occurs when there are different population groups that behave quite differently. The sample size then has to be bigger to capture all the variations occurring in the population.
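Again a sketch, with assumed numbers: ten regions whose support for a party ranges from 15 to 78 per cent. A handful of respondents per region produces wildly swinging regional estimates; a much larger sample settles near the true values.

```python
import random

random.seed(3)
# Assumed setup: 10 regions with very different support levels (15% .. 78%).
regions = {f"region_{i}": 0.15 + 0.07 * i for i in range(10)}

def poll(n_per_region: int) -> dict:
    """Estimate each region's support from n_per_region respondents."""
    return {r: sum(random.random() < p for _ in range(n_per_region)) / n_per_region
            for r, p in regions.items()}

# Small per-region samples give noisy regional pictures; large ones do not.
for n in (30, 1_000):
    errs = [abs(est - regions[r]) for r, est in poll(n).items()]
    print(f"n = {n:5d} per region: average regional error {sum(errs) / len(errs):.1%}")
```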
In the case of poll predictions, it is possible to have random samples and even large sample sizes and yet go wrong. Even if we sample the population correctly, if some sections of the population vote in smaller numbers than others, the sample becomes biased: it might be a representative sample of the population but not of the actual voting population.
An example will make this clear. If we have a population of 100 with 20 of them from the middle and upper classes, then a sample of 25 should have 5 from the middle and upper classes and 20 from the rest to be representative of the population. However, if only 2 of the 20 actually vote while 64 of the remaining 80 do, then the sample of 5 represents 250 per cent of the actual upper-class voters, while the sample of 20 represents only 31.25 per cent of the rest of the voting population. Of course, in real life we do not use such large sampling fractions. The conclusion however remains valid: if a group votes in much smaller numbers than its strength in the population, sampling based on its strength in the population will bias the results.
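The arithmetic of this example can be checked directly. The 80 per cent and 30 per cent support figures in the second half of the sketch are invented, purely to show how the turnout mismatch translates into a wrong estimate:

```python
# Reproducing the worked example above: a population of 100, of whom 20 are
# middle/upper class, and a "representative" sample of 25 (5 + 20).
upper_sample, rest_sample = 5, 20

# Turnout from the example: only 2 of the 20 upper-class members vote,
# against 64 of the remaining 80.
upper_voters, rest_voters = 2, 64

print(f"upper-class sample vs actual voters: {upper_sample / upper_voters:.0%}")  # 250%
print(f"rest-of-population sample vs voters: {rest_sample / rest_voters:.2%}")    # 31.25%

# Suppose (an invented figure) 80 per cent of the upper classes back the NDA
# and 30 per cent of the rest. The unweighted sample then overstates the
# NDA's share among the people who actually vote:
unweighted = (upper_sample * 0.8 + rest_sample * 0.3) / (upper_sample + rest_sample)
actual = (upper_voters * 0.8 + rest_voters * 0.3) / (upper_voters + rest_voters)
print(f"the poll says {unweighted:.0%}; actual voters deliver {actual:.1%}")
```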
So the first question that we need to address is whether the problem with the pollsters was a common fault in the methodology used, their own subjective desire for an NDA victory colouring their predictions, or whether the opinion and exit polls were simply fixed. It is difficult to believe that all the 35-40 polls were fixed and their predictions rigged by the NDA. Undoubtedly some of the pollsters were making predictions to cosy up to the party that they thought was winning anyway. But to accept that all the pollsters were fixed would stretch credulity. So where did they all go wrong?
ASSUMPTIONS OF THE ANALYSIS
Before we go into an analysis of the possible errors, we will have to make some assumptions. These assumptions are borne out by the exit polls, even though the exit poll samples were themselves biased, as can be seen from their results. What we are using here are the trends that the exit polls show, in order to draw some conclusions.
The first assumption is that different segments of the population voted differently: the rich and upper middle class were more inclined to vote for the NDA, while the poor and the underclasses voted against it. This is not a rural versus urban divide --- the NDA did badly in most urban areas too --- but a rich versus poor divide. The second assumption, again borne out by past data, is that while the poor vote in large numbers in India, the middle and upper classes vote less and less: the votes they cast have declined in each successive election. It is too much of an effort for them to go out and vote; it is much easier to talk about it. In the upper middle class and richer colonies in Delhi, polling figures are now less than 10 per cent. The third assumption is that different areas of the country showed different swings: there were large variations from region to region.
SYSTEMIC BIAS
The most obvious reason why a survey gives wrong results is that the sample is not representative of the population. This undoubtedly holds good for the pollsters in this election as well. Agencies such as Marg and Nielsen probably sampled the middle and upper middle classes far more heavily than the poor. If the initial sample was bad, the error was compounded further because the rich vote considerably less than their numbers in the population: if the samples were biased to start with, they were even more so once we take into account the actual voting population rather than the population as a whole.
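One standard remedy is to weight each class of respondents by its expected turnout. The sketch below is illustrative only; the sample sizes, stated support levels and turnout rates are assumptions, not actual 2004 polling data:

```python
# A minimal sketch of turnout weighting (post-stratification). Every number
# here is an assumption for illustration.
sample = {
    "rich": {"n": 500, "nda_share": 0.65},   # respondents and their stated choice
    "poor": {"n": 500, "nda_share": 0.30},
}
turnout = {"rich": 0.10, "poor": 0.60}       # assumed chance each group votes

# Unweighted estimate: every respondent counted as an equally likely voter.
unweighted = (sum(g["n"] * g["nda_share"] for g in sample.values())
              / sum(g["n"] for g in sample.values()))

# Weighted estimate: each group scaled by how many of them will actually vote.
voters = {c: sample[c]["n"] * turnout[c] for c in sample}
weighted = (sum(voters[c] * sample[c]["nda_share"] for c in sample)
            / sum(voters.values()))

print(f"unweighted NDA estimate:   {unweighted:.1%}")  # 47.5%
print(f"turnout-weighted estimate: {weighted:.1%}")    # 35.0%
```

The correction only works, of course, if the pollster's turnout assumptions are themselves sound.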
The second reason --- the sample size --- is also an issue. Traditionally, pollsters fare better when there are a few parties with a relatively even distribution of strength. In India, there are strong regional parties and a number of smaller parties whose strength lies in pockets. To compensate for this, the sample size needs to be much bigger than what the pollsters are prepared to pay for. If we know in advance where the smaller parties have pockets of influence, the sampling can perhaps be improved to reduce the required sample size. But this presupposes stable voting patterns, which are unlikely in any election. Further, if parties such as the BSP and SP put up candidates in new areas, the pollsters may completely miss their impact on the votes. Poll predictions always tend to underestimate the strength of smaller parties: they are generally predicted to win fewer seats than they actually get. It is the small sample size that leads to this underestimation.
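A rough simulation shows why the pockets get missed. Assume, purely for illustration, that a small party is strong in 10 of 200 constituencies and that a pollster can only credit the party in constituencies its sample actually covered:

```python
import random

random.seed(6)
SEATS = 200
POCKETS = set(random.sample(range(SEATS), 10))  # the party is strong in 10 seats

def pockets_detected(seats_polled: int) -> int:
    """A pollster can only credit the party in constituencies it sampled."""
    polled = set(random.sample(range(SEATS), seats_polled))
    return len(polled & POCKETS)

# With a budget covering only 20 of 200 constituencies, the party's pockets
# are mostly invisible; covering 100 finds about half of them.
for k in (20, 100):
    runs = [pockets_detected(k) for _ in range(10_000)]
    print(f"{k:3d} seats polled: {sum(runs) / len(runs):.1f} of 10 pockets detected on average")
```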
Generally, statistical mistakes tend to cancel each other out. When all the pollsters err in only one direction, a statistical error is not the culprit; it is a systematic bias in the survey that is at fault. We have identified one such systemic fault: the over-representation of the rich and the upper middle class and the under-representation of the poor. But this is not enough to explain the huge difference between the seats that were predicted and those finally secured by the NDA. So apart from the systemic bias in the survey, what else went wrong?
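The distinction is easy to simulate. In the sketch below the true share and the 8-point tilt are invented; the point is only that forty unbiased polls scatter around the truth, while forty polls sharing the same flaw all miss it in the same direction:

```python
import random

random.seed(7)
TRUE_SHARE = 0.35  # assumed true vote share, for illustration

def poll(n: int, bias: float = 0.0) -> float:
    """One poll of n respondents; `bias` shifts every respondent's odds."""
    return sum(random.random() < TRUE_SHARE + bias for _ in range(n)) / n

# Forty unbiased polls: their individual errors point both ways and cancel.
unbiased = [poll(2_000) for _ in range(40)]
# Forty polls sharing the same flaw (an assumed 8-point tilt in the sample):
biased = [poll(2_000, bias=0.08) for _ in range(40)]

print(f"true share:             {TRUE_SHARE:.1%}")
print(f"mean of unbiased polls: {sum(unbiased) / 40:.1%}")  # ~35%, errors cancel
print(f"mean of biased polls:   {sum(biased) / 40:.1%}")    # ~43%, all one way
```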
SUBJECTIVE FACTORS
All pollsters convert sampled vote shares and swings into seats using various subjective factors. Some will call it the factor of lying: in Gujarat, Muslims are unlikely to say they are voting against the BJP, and Dalits quite often hide their real choice. The ability to use the right subjective factors --- how to weight the sample by past experience and an understanding of the elections --- is the key parameter, rarely talked about but at the heart of good poll prediction. The good pollsters simply use better guesstimates than the bad ones, who mechanically translate samples into votes.
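To see how sensitive this translation is, here is a minimal uniform-swing sketch; the four seats and their previous margins are invented. The same sampled vote share, read as slightly different swings, produces very different seat counts:

```python
# A minimal uniform-swing sketch. The four seats and their previous margins
# (winner's lead for the incumbent alliance; negative = opposition-held)
# are invented for illustration.
last_margin = {"seat_A": +0.04, "seat_B": +0.01, "seat_C": -0.02, "seat_D": +0.09}

def seats_after_swing(swing: float) -> int:
    """Seats the incumbent alliance holds if every seat swings uniformly."""
    return sum(margin + swing > 0 for margin in last_margin.values())

# Small differences in the assumed swing produce large differences in seats:
for swing in (+0.01, -0.02, -0.05):
    print(f"swing {swing:+.0%}: {seats_after_swing(swing)} of 4 seats")
```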
It is here that the pollsters went completely haywire. Coming from the segment that was shining, even if the rest of India was not, they translated their prejudices into seats. While they perhaps did try to correct this later, the initial high NDA predictions made it difficult to effect a drastic change in their final results. The net result was egg on their collective faces. Even worse, they had nowhere to hide: they were on all the 24x7 channels explaining why they were wrong as the results came streaming in. If the NDA leaders had glum faces, their shine having come off, the media was not shining either.
Our analysis above is of course based purely on conjecture about the kind of mistakes the pollsters might have made. While it is difficult to quantify the impact that opinion and exit polls have on elections, there is little doubt that they have one. Even if such polls are not banned, there is a need to examine the basis of such predictions. Given the massive failure we have witnessed, there are legitimate grounds for demanding that all the pollsters submit their raw data and methodology for independent examination. Was it just mistakes, or were more sinister forces at work? The Election Commission should take steps in this direction; otherwise shoddy pollsters can vitiate future elections.