We are seeing a rapid expansion in the use of metrics (quantitative methods) for assessing the performance and directing the behaviour of states and public institutions. This is not a new practice, but it is one with many adverse effects. Metrics can (1) encroach on matters (such as human values) not properly assessable by quantitative means, and can (2) be misused even in traditionally quantitative matters (as in accounting fraud). Metrics can thus be a source of growing disinformation, can facilitate deception and can undermine democratic processes.
Much has been written about use, overuse and misuse of metrics. There are books and articles about impacts on education, policing and other public services in the US, UK and Canada. But the trend continues, devouring resources and damaging the public interest.
Quantitative and qualitative analyses can of course be combined in appropriate contexts. Gunnar Myrdal’s study of race relations in the US was an exemplary instance. The effects of American racism went far beyond economic, demographic or other appropriately quantifiable matters. Myrdal’s study included essentially qualitative matters, such as moral issues, justice and social values. 
Among the disciplines prominently involved in metric assessments are accounting, finance, economics and statistics. They may overlap in some applications, but statistics is often prominent in quantitative analyses, despite being of more recent origin. Salomon Bochner wrote: “Statistics founded on probability is perhaps the most exclusive characteristic of our civilization since 1600; and it would be difficult to find even a trace of it anywhere before.” The two mathematical disciplines of probability and statistics emerged in the seventeenth century from diverse motivations such as games of chance, demography and logic, and developed in mutually reinforcing ways.
The resulting combination, mathematical statistics, was developed as a method of inference from quantitative data, and the number and variety of its successes in the natural sciences, social sciences, technology and public policy continue to increase rapidly. Nevertheless, there have been scandals, tragedies and crises associated with misuse of statistics and other quantitative disciplines.
The global crisis of democracy has been developing since the late 1970s along with the ascendancy of neoliberalism. In the early 1990s Alain Desrosières identified a central feature by clarifying why quantitative disciplines are not applicable to important aspects of the social sciences and public policy. Expanding this work, Alain Supiot observed: “In our attempts to transform every singular quality into a measurable quantity … belief in quantitative representations gradually supplants any contact with the realities to which these representations are supposed to refer.” Specifically:
Desrosières’s ground-breaking work has shown that economic and social statistics do not measure a pre-existing reality, unlike statistics in the natural sciences, but construct a new reality by positing equivalence between heterogeneous beings and forces.
Supiot elaborated: “Evaluation is not simply measurement, since it refers measurement to a value judgment which gives it meaning. There is necessarily a dogmatic dimension to how this meaning is defined, since our categories of thought are not a gift of nature, but rather a means of comprehending it.” Consequently, “If one treats systems of values as quantifiable objects, one will make the measuring instruments give false readings and claim a scientific objectivity for one’s system of values which it cannot possibly possess.”
Such evidentiary unreliability has major consequences, not least that “public policy is driven by quantitative targets rather than concrete results” thus undermining democratic decision-making processes and contributing to the democracy crisis. There is an ever-growing number and variety of instances, such as those cited here.
Even in properly quantitative matters, measuring instruments can be made to give misleading readings by the device of altering definitions in government policy. Whether for political or financial purposes, the results can contribute to the creation of national crises, and then international crises, as in 2007-2008.
A second type of crisis concerns methodological issues intrinsic to statistical analysis. Because of the widespread and necessary use of mathematical statistics, the resulting problems are not confined to social sciences and public policy, but extend to fields such as biomedical sciences. This is a “crisis in science,” as Roger Peng and others explained. Its causes include: researcher, sponsor or publisher bias; faulty experimental design; simplistic use of data analysis methods; expert use of inadequate methods; or omission of significant subsets of the data from the analysis.
Any of these factors can be present in a particular study and generate disinformation, with or without intention to deceive. Of special note: “articles that report a positive new discovery have a great competitive advantage over articles with a negative result. Thus there is a strong selection bias in favour of positive results, even before a paper sees print.”
The methodological issues have a long history, but beginning around 2000 such difficulties have been compounded by technological advances that facilitate rapid collection and inexpensive storage of very large amounts of data (“big data”). Inadequacies and controversies pertaining to existing methods of statistical inference are thus amplified. Controversies often occur in intellectual disciplines, but the extent and depth of the current crisis in statistics can be seen from commentaries cited below.
Statistics aids inference from data by quantifying uncertainty, and there are different methods (“models”) for this. Prominent are those originating with Ronald Fisher (1920s), Jerzy Neyman and Egon Pearson (1930s), and Thomas Bayes (mid-1700s) with their subsequent developments. Scientific rivalry and controversy between Fisher and Neyman-Pearson began in the 1930s and continued with their respective supporters. The Bayes approach has been in and out of favour; for example, dismissed by Ian Hacking in 1965 but later successfully revived by others. The differences among these models with their advantages and disadvantages were explained by Ronald Christensen in a 2005 expository article, where he performed the relevant probability calculations in an elementary example.
In a 2015 Opinion article Jeffrey Leek and Roger Peng explained the methodological crisis has two aspects, “reproducibility” and “replicability.” The former refers to the ability of a researcher “to recompute data analytic results” from the data of a particular study, the latter to the prospects “an independent experiment targeting the same scientific question will produce a consistent result.” The first aspect underlies the second, so the article’s title, “Reproducible research can still be wrong” is apt.
This serves as a reminder of Desrosières’s warning that the use of “statistical reasoning as a mode of abstraction” in social or political questions is a technique that “has become virtually synonymous with proof, with almost uncontestable standards of reference.” As a result, “statistical tools have helped to fashion a ‘public sphere’” in which ostensibly open “collective debate” may be misinformed and thus policies developed on an unsound basis.
In November 2017 Nature published a Comment article on the methodological crisis by five statisticians (American, British and Dutch), titled “Five ways to fix statistics.” Jeffrey Leek opened his contribution with the observation: “the literature is now full of papers that use outdated statistics, misapply statistical tests and misinterpret results.”
The contributors to the 2017 Nature article addressed both aspects of the methodological crisis, commenting on causes and giving suggestions for relieving it. Several are repeated here to illustrate the challenges. Jeffrey Leek emphasized “data analysis is not purely computational and algorithmic—it is a human behaviour.” He suggested better training could “help researchers to explore data without introducing bias,” while better explanations could help the public understand the implications and limitations of a research finding.
Other contributors discussed the widespread simplistic use of the Fisher model, wherein a single number, the “P value,” is produced in order to decide between two outcomes, for example, in a clinical trial whether a drug has “an effect” or “no effect.” Although the Fisher approach “was supposed to protect researchers from over-interpreting noisy data,” it now “has the opposite effect” so that “in practice, this often amounts to uncertainty laundering.” Thus “researchers must accept uncertainty” so as to “move beyond the alchemy of binary statements” that are based on the P value.
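The binary use of the P value can be made concrete with a small sketch. The numbers below are invented for illustration (a hypothetical trial in which 14 of 20 patients improve); the code computes an exact one-sided binomial P value against the null hypothesis that the drug has no effect, then applies the conventional 0.05 cutoff.

```python
from math import comb

def binom_p_value(k: int, n: int, p0: float = 0.5) -> float:
    """One-sided exact P value: the probability of observing k or more
    successes in n trials if the null success rate is p0 (drug has no effect)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

# Hypothetical trial: 14 of 20 patients improve on the drug.
p = binom_p_value(14, 20)                      # ≈ 0.058
verdict = "effect" if p < 0.05 else "no effect"
```

Here P ≈ 0.058 just misses the 0.05 threshold, so the binary rule reports “no effect,” even though the evidence is almost indistinguishable from a “significant” result. That collapse of a continuous measure of uncertainty into a yes/no verdict is the information loss the contributors describe as uncertainty laundering.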
They also noted one way to reduce the risk of “false-positive” findings would be to augment the Fisher model with Bayesian techniques, and another would be to ensure “openness” by means of a publicly accessible register of study plans and data so as to facilitate independent review. An additional problem is that scientists who use statistics in other fields are reluctant to allow statisticians to teach students anything other than “standard methods.” Thus “norms of practice must be changed from within” the relevant fields.
In a January 2018 survey article on this crisis, Susan Holmes reviewed the issues in more detail, giving technical discussions and providing worked examples. She made several recommendations to address challenges, some of them similar to those in the 2017 Nature article but in more specific terms. For instance, noting “there is more to the data than P values,” she outlined ways in which “more of the available data” can be used to supplement the standard Fisher approach. She also illustrated how “a clear Bayesian formulation” can enable “clarification of the correct procedures” for an inference problem.
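The contrast with “a clear Bayesian formulation” can be sketched in the same hypothetical setting (again with invented numbers, not drawn from any study cited here): instead of a binary verdict, a Bayesian analysis reports a posterior probability that the drug’s success rate exceeds chance, retaining the uncertainty in the data.

```python
from math import gamma

def prob_theta_above(threshold: float, a: float, b: float,
                     steps: int = 100_000) -> float:
    """P(theta > threshold) under a Beta(a, b) distribution,
    computed by midpoint-rule integration of the density."""
    norm = gamma(a + b) / (gamma(a) * gamma(b))
    width = (1 - threshold) / steps
    total = 0.0
    for i in range(steps):
        x = threshold + (i + 0.5) * width
        total += norm * x ** (a - 1) * (1 - x) ** (b - 1)
    return total * width

# Same hypothetical data: 14 successes, 6 failures.
# With a uniform Beta(1, 1) prior, the posterior is Beta(1 + 14, 1 + 6) = Beta(15, 7).
posterior_prob = prob_theta_above(0.5, 15, 7)   # ≈ 0.96
```

Rather than “no effect,” this analysis says there is roughly a 96% posterior probability that the drug does better than chance, a statement that invites further study rather than closing the question. The uniform prior is one choice among many; the point is only that the output is a graded probability, not a binary declaration.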
Holmes went beyond the Nature article in proposing a more rigorous approach to statistical research and publication. Noting that “mathematicians have set an example and a publication record where transparency is key and theorems are proved,” she wrote “mathematicians show their work and so should statisticians,” to the extent feasible. Thus in statistics, “publishing code, data, and complete workflows should be standard practice.”
Holmes closed by noting her proposed approaches addressed “errors that are made in good faith” and cannot work in the presence of “intent to dupe.” She added “No statistical procedure can counter data manipulations done in bad faith.” On this basis she concluded “true scientific replication by independent groups and funded by science foundations and institutes should find their place in mainstream publication.” In the words of Onora O’Neill, this would assist in “limiting deception.”
In a 2015 commentary, Roger Peng illustrated the “epidemic of poor data analysis” with examples of studies re-analyzed and exposed by other investigators after publication, one in biomedical science and the other in economics. In each case, the study authors made significant claims and thus attracted attention. In each, the independent investigators had access to the study data and attempted to reproduce the published analysis, but in the course of doing so discovered that the study authors had committed significant errors. Thus in each case the study findings had no basis in the data and hence were not replicable.
In the first case, a research group at Duke University led by Anil Potti claimed in a 2006 Nature Medicine article “that they had built an algorithm using genomic microarray data that predicted which cancer patients would respond to chemotherapy” (Peng’s summary). Two statisticians at the University of Texas conducted analyses from the original data and in 2009 published a description of the errors. In 2011 the article by Potti and his group was retracted from the journal, and in 2015 the US Office of Research Integrity issued a finding of research misconduct against Potti.
In the second case, Kenneth Rogoff (Harvard University) and Carmen Reinhart (then at University of Maryland, now at Harvard) published a 2011 paper (with National Science Foundation financial support), suggesting “countries with very high debt-to-GDP ratios suffer from low growth” (Peng’s summary). Such “findings” in their paper were widely cited by politicians and others in the US, UK and Europe to support austerity agendas: the authors’ “most influential claim was that rising levels of government debt are associated with much weaker rates of economic growth, indeed negative ones.” Three economists at the University of Massachusetts Amherst attempted to reproduce the analysis by Rogoff and Reinhart but found “serious errors” and concluded these “contradict” their central claim.
Governance by Numbers
During the past decade Supiot developed a framework for understanding and contending with the crisis of democracy, using law and justice as his lens. His work illuminated historical trends in culture, politics, law, economics, accounting and statistics underlying current transformations in the way states are governed and international order controlled. Among the causes of democracy’s crisis is the ever-growing prestige of, and reliance on, quantitative bases for decisions. Among quantification instruments, statistics is “the most important.” Supiot emphasized that quantitative methods “give immense power to those who construct the figures, because this is conceived as a technical exercise which need not be exposed to open debate.”
The lack of open debate is due in part to weak government regulation of corporate conduct and products in fields such as finance, banking and manufacturing, and to insufficiently rigorous professional standards in disciplines such as statistics, economics and accounting—common phenomena in the neoliberal era. Changes in definitions of government statistical categories also weaken the basis for open debate. Supiot mentioned the Enron accounting scandal (2001) and the Libor banking scandal (2015) as two such instances. My previous CFE post discussed instances in pharmaceutical and other chemical manufacturing with their related health and environmental risks. Issues regarding professional standards in mathematical statistics are discussed above and, as noted, these influence standards in other disciplines.
In her 2003 talk on US accounting and finance scandals, Mary Poovey opened with the question “Can Numbers Ensure Honesty?” Observing that for much of the public, “numbers carry an aura of precision simply because they are numbers,” she explained the answer through discussion of a series of detailed examples: numbers cannot by themselves ensure honesty.
In addition, proliferating resort to private consulting firms by governments and public institutions such as universities increasingly distorts public priorities. Private consultants are in a position to construct the figures in order to suit the agendas of the administrators or managers hiring them—agendas that may not be in the public interest. For example, in describing a 2017 accounting scandal involving an international consulting firm, Eric Reguly asked “Who’s auditing the auditors?” and wrote “there is no doubt that KPMG South Africa was too close to the hands that fed them.”
Central to Supiot’s framework is the distinction he drew between government and governance, with the latter taking the specific form of “governance by numbers” and now increasingly supplanting the former. The resulting adverse redirection of public policy is mentioned above. Supiot’s explanations of the two terms and their operational differences are briefly summarized here.
“Government implies a commanding position above those governed, and the obligation for individual freedoms to observe certain limits.” Canada is an example: a democratically elected government, laws and constitutional protections for rights and freedoms, and specification of limits. In contrast, “Governance starts out from individual freedoms, not to limit them but rather to program them.” Supiot means that the human individual becomes internally programmed by ideology in a manner analogous to the biological regulatory system of an organism, or the computer program controlling a machine.
For people the internal program is the now widely inculcated neoliberal ideology. In this system: quantification is paramount; human values are reflexively subordinated to economic “utility”; “economic policy is not the outcome of political debate, but of scientific calculation;” and “laws figure merely as legislative products which compete in a global market of norms.” The move from government by laws to governance by numbers is reflected in common discourse where, for example, “freedom” is replaced by “flexibility,” “justice” by “efficiency,” and “worker” by “human capital.”
A subsequent post will discuss additional effects and implications of governance by numbers. It will include Supiot’s contention that allegiance to the state is being supplanted by feudal-like allegiances, his exhortation that social solidarity be revived, and a summary of other writers’ perspectives.
I thank Charlene Mayes, Vladimir Tasić and the late Donald C. Savage for discussions and references.
 The Concise Oxford Dictionary (1990) says the word “value” is derived ultimately from the Latin word “valēre”, the meanings of which include: be strong, be healthy, prevail (http://www.latin-dictionary.net/definition/38320/valeo-valere-valui-valitus)
 Nicholas Eberstadt, The Tyranny of Numbers: Mismeasurement and Misrule (Washington: American Enterprise Institute, 1995); Theodore M. Porter, Trust in Numbers: The Pursuit of Objectivity in Science and Public Life (Princeton: Princeton University Press, 1995); David Boyle, The Tyranny of Numbers (London: Flamingo, 2001); William Bruneau and Donald C. Savage, Counting Out the Scholars (Toronto: Lorimer, 2002); Jerry Z. Muller, The Tyranny of Metrics (Princeton: Princeton University Press, 2018)
 Gunnar Myrdal, An American Dilemma: The Negro Problem and Modern Democracy (New York: Harper & Row, 1962).
 Salomon Bochner, The Role of Mathematics in the Rise of the Sciences (Princeton: Princeton University Press, 1966), 352
 Sheldon S. Wolin, Democracy Incorporated: Managed Democracy and the Specter of Inverted Totalitarianism (Princeton: Princeton University Press, 2010); Philip Mirowski, Never Let a Serious Crisis Go to Waste: How Neoliberalism Survived the Financial Meltdown (New York: Verso, 2013); Wolfgang Streeck, Buying Time: The Delayed Crisis of Democratic Capitalism (New York: Verso, 2014); Wendy Brown, Undoing the Demos: Neoliberalism’s Stealth Revolution (Brooklyn: Zone Books, 2015)
 Alain Desrosières, The Politics of Large Numbers: A History of Statistical Reasoning (Cambridge, MA: Harvard University Press, 1998 [published in French in 1993]), 323-337
 Alain Supiot, The Spirit of Philadelphia: Social Justice and the Total Market (New York: Verso, 2012 [first published in French, 2010]), 61
 Supiot (2012), Ibid.
 Supiot (2012), Ibid. 62, 64
 Supiot (2012), Ibid. 63
 Ian Hacking, Logic of Statistical Inference (Cambridge: Cambridge University Press, 2009 [first published 1965]), Chapter XII
 Desrosières, op. cit., 324
 In the context of a drug study, the P value is a number computed from the data: the probability that data at least as extreme as those observed would occur under the assumption that the “null hypothesis” is true (i.e., the hypothesis that the drug has no effect). If P is smaller than a number pre-assigned for the particular study, then the data are considered “significant” and the null hypothesis is “rejected as improbable.” The resulting inference is that the drug under study probably has an effect. (http://www.ams.org/journals/bull/2018-55-01/S0273-0979-2017-01597-2/S0273-0979-2017-01597-2.pdf, page 32)
 Alain Supiot, Governance by Numbers: The Making of a Legal Model of Allegiance (Oxford: Hart Publishing, 2017 [first published in French, 2015]), 78, 163
 Supiot (2017), op. cit., 29, 116; see also Supiot (2012), op. cit., 57, 58
 Supiot (2017), op. cit., 109, 121, 29