“47.6% of statistics are made up on the spot.”
Even if they are nothing more than a bunch of numbers, data sets are supposed to be something sacred. They are at the heart of scientific progress: Nicolaus Copernicus’ observations of the heavens helped substantiate his heliocentric theory; clinical trials determine the viability of a potential cure or treatment; and large data sets on individuals can help pinpoint correlations between variables like wages and education. Every major study and every finding is backed by data, which are meant to reveal new truths about the subject in question.
Data sets play an important role in discovery, so being able to collect, analyze and interpret data properly is a necessity. In data collection, having a representative sample is paramount. In data analysis, awareness of problems like omitted variable bias or outliers is key. In interpretation, one must always be wary of alternative explanations. Indeed, there is a very strict protocol for this long, long process of data excavation, preparation, and interpretation, and things can get ugly fast when the protocol is breached. By presenting misleading or faulty data, one can advocate for unsubstantiated claims. Combined with the power of social media, which can spread misinformation like a deadly virus, these mistakes can do irreparable damage.
Obviously, data must be legitimate. This requirement is stressed often in experimental design. My school teachers always warned us not to “fudge the data” just because the measurements we made didn’t match the theory. They told us that even the best experiments have some error, but that does not stop us from making inferences. Yet it may be tempting, and certainly easier, to just make stuff up. For students, academics, and even professional researchers, careful data collection isn’t always a priority. One of my best friends faked his data in a psychology paper (because he waited until the last minute to complete the project), and he received a better mark than I did.
This leads us to the question of truth. If done well, fake data can pass for real data. Our psychology professor didn’t spend much time verifying our data, but one imagines that researchers, unlike students, are held more accountable, considering the funding that goes into collecting the data.
The issues of truth and ethics get harder when data manipulation is deliberate or systematic. For the past decade, economists have noticed a consistent discrepancy between Argentina’s government data and private estimates. In 2013, Argentina became the first country to be censured by the International Monetary Fund (IMF) for reporting inaccurate data. Between 2004 and 2015, various statistics were misreported, including inflation, GDP growth, and currency valuation, and the new government subsequently revised its data. A report in Bloomberg noted that, at times, the numbers were staggeringly different: in 2009, for example, the economy contracted by 6%, in contrast to the previously reported 0.1% growth. Similarly, the government reported 70% inflation from 2007 to 2012, while private companies estimated 200%.
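To get a feel for how far apart the two inflation figures are, it can help to turn each cumulative number into an average yearly rate. The sketch below is only an illustration: it assumes the 2007 to 2012 window spans five compounding years (with six, both annual rates come out somewhat lower), and the function name `annualized_rate` is hypothetical.

```python
# Convert a cumulative inflation figure into an equivalent constant annual rate.
# Assumption: "2007 to 2012" is treated as five compounding years.

def annualized_rate(cumulative: float, years: int) -> float:
    """Return the yearly rate that compounds to `cumulative` over `years`.

    `cumulative` is a fraction, so 0.70 means prices rose 70% in total.
    """
    return (1.0 + cumulative) ** (1.0 / years) - 1.0

YEARS = 5  # assumed length of the 2007-2012 window

official = annualized_rate(0.70, YEARS)  # government figure cited in the text
private = annualized_rate(2.00, YEARS)   # private estimate cited in the text

print(f"official: {official:.1%} per year")
print(f"private:  {private:.1%} per year")

# Gap between the reported and revised 2009 growth figures, in percentage points.
reported_2009, revised_2009 = 0.1, -6.0
print(f"2009 revision: {reported_2009 - revised_2009:.1f} percentage points")
```

Under that five-year assumption, the official figure works out to roughly 11% a year and the private one to roughly 25% a year, a difference that compounds quickly for anyone signing wage or loan contracts.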
These inaccuracies have serious consequences for consumers and investors. Unreliable statistics introduce uncertainty into the economy, increasing, for example, the variance of returns on investment. Interest rates then rise because lenders have to charge a premium to cover the chance that a borrower might default, and this hurts investors even more. Consequently, output decreases, which depresses real wages. This cycle, exacerbated by the faulty data, calls Argentina’s long-term growth potential into serious question. Now Argentina’s new government, with help from the IMF, is working to provide reliable data and rebuild trust in the economy.
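The link between default risk and interest rates can be made concrete with a textbook simplification, which is an assumption here rather than a model of Argentina’s economy: the lender is risk-neutral, the borrower defaults with probability p and repays nothing in that case, and the lender wants the same expected return as a risk-free asset. Setting (1 - p)(1 + r) = 1 + r_free gives the break-even rate. The function name and all numbers below are hypothetical.

```python
# Break-even lending rate under a textbook simplification (assumptions):
# - the lender is risk-neutral,
# - the borrower defaults with probability p_default and repays nothing,
# - the lender wants the same expected return as a risk-free asset.
# Solving (1 - p) * (1 + r) = 1 + r_free for r gives the rate below.

def break_even_rate(r_free: float, p_default: float) -> float:
    """Interest rate at which a risky loan matches the risk-free return."""
    if not 0.0 <= p_default < 1.0:
        raise ValueError("p_default must be in [0, 1)")
    return (1.0 + r_free) / (1.0 - p_default) - 1.0

r_free = 0.03  # hypothetical risk-free rate
for p in (0.00, 0.02, 0.05, 0.10):  # hypothetical perceived default probabilities
    r = break_even_rate(r_free, p)
    print(f"p={p:.0%}: rate={r:.2%}, premium={r - r_free:.2%}")
```

When lenders cannot trust official figures, one plausible reaction is to assume a higher default probability than they otherwise would; in this simplified setting, moving from a 2% to a 10% perceived default probability pushes the required rate from about 5% to about 14%.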
Yet even when numbers are not fabricated, we need to be careful that we understand what they mean. For instance, gross domestic product (GDP) represents the output of an entire country, so for a fairer comparison between countries we need a measure that accounts for population size. A plausible choice is GDP per capita. For example, according to the CIA World Factbook, Luxembourg’s GDP is about $58 billion, ranked 107th in the world. However, its GDP per capita (at purchasing power parity, PPP) is $102,000, making it the second-wealthiest country on the list in per capita terms. The latter number better reflects the reality in Luxembourg, but be aware that a per capita measure is an average over the whole population. This means that a handful of very rich people can skew the figure, making poorer people appear better off than they really are. The United States boasts 540 billionaires, according to Forbes, and this will certainly affect its impressive position as the 18th-wealthiest country in terms of GDP per capita.
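A per capita figure is an arithmetic mean: a total divided by the number of people. The sketch below uses made-up incomes for a tiny hypothetical economy to show how a single extreme value drags the mean far above what a typical person earns, while the median barely notices it. GDP per capita measures output rather than personal income, so this only illustrates how averages behave, not how GDP is computed.

```python
from statistics import mean, median

# Hypothetical annual incomes (thousands of dollars) in a tiny economy:
# five ordinary earners and one very rich resident.
incomes = [20, 25, 30, 35, 40, 1000]

per_capita = mean(incomes)   # the "per capita" average
typical = median(incomes)    # the middle of the distribution

print(f"mean (per capita): {per_capita:.1f}")
print(f"median:            {typical:.1f}")

# How many people earn less than the "average" person?
below = sum(x < per_capita for x in incomes)
print(f"earners below the mean: {below} of {len(incomes)}")
```

Here the mean is about 192 while the median is 32.5, and five of the six residents earn less than the "average". Reporting the median alongside the mean is one common way to expose this kind of skew.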
This idea may seem obvious to some, but how much a single figure can hide is subtle and rarely disclosed in everyday reporting. For example, during the transition of power after the recent US election, one of Donald Trump’s major talking points was Barack Obama’s “disastrous” effect on unemployment. According to CNN, the US unemployment rate was 4.9% last October, below the target threshold. On its face, this number is very strong but, as always, we should be wary of the statistic. The Bureau of Labor Statistics, which publishes these data, rigorously defines who counts as employed, unemployed, and not in the labour force. The unemployment rate is the share of the labour force that is unemployed. However, a person without a job must be actively looking for one to count as part of the labour force. If someone is not employed and hasn’t looked for a job in the past four weeks, he or she is not counted in the labour force and is left out of the calculation entirely. As a result, other measures, such as the size of the labour force, the employment-population ratio, or the labour force participation rate, offer important comparisons. And even then, we should consider the quality of the jobs to see whether workers are well off, especially where underemployment (where a worker is technically employed, but with either very low wages or very few hours) is concerned.
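The effect of people dropping out of the labour force can be checked with a few lines of arithmetic. The sketch below uses invented head counts and a hypothetical `LabourSurvey` class; the three categories are a simplified version of the BLS definitions (the official survey also restricts the population by age and excludes institutionalized people, for example).

```python
from dataclasses import dataclass

@dataclass
class LabourSurvey:
    """Hypothetical head counts, simplified from BLS-style categories."""
    employed: int
    unemployed: int           # no job, but actively looked for one recently
    not_in_labour_force: int  # no job and not looking (includes discouraged workers)

    @property
    def labour_force(self) -> int:
        return self.employed + self.unemployed

    @property
    def population(self) -> int:
        return self.labour_force + self.not_in_labour_force

    def unemployment_rate(self) -> float:
        return self.unemployed / self.labour_force

    def participation_rate(self) -> float:
        return self.labour_force / self.population

    def employment_population_ratio(self) -> float:
        return self.employed / self.population

def report(label: str, s: LabourSurvey) -> None:
    print(f"{label}: unemployment {s.unemployment_rate():.2%}, "
          f"participation {s.participation_rate():.2%}, "
          f"employment-population {s.employment_population_ratio():.2%}")

before = LabourSurvey(employed=600, unemployed=40, not_in_labour_force=360)
# 20 unemployed people give up searching: nobody finds work,
# they simply move out of the labour force.
after = LabourSurvey(employed=600, unemployed=20, not_in_labour_force=380)

report("before", before)
report("after ", after)
```

In this made-up example the unemployment rate falls from 6.25% to about 3.2% even though not a single person found a job; the employment-population ratio stays at 60% and the participation rate drops from 64% to 62%, which is exactly why those companion measures are worth checking.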
In short, even when the numbers are all there, different interpretations of them can tell different stories, which calls into question whether there exists a concrete, definable truth in data. Of course, this raises the delicate, endlessly debated philosophical question of what “truth” is but, at the end of the day, statistics and data are tools for finding relationships or trends, nothing more.