[ABSTRACT FOR A PAPER IN PREPARATION: comments welcome...]
In this paper I look at data science through the historical lens of phrenology. I take data science seriously in its claim to be a science, and examine its parallels with the methodological and social trajectories of phrenology as a scientific discourse. My aim is not to dismiss data science as pseudo-science but to explore the interplay of empirical and social factors in both phrenology and data science, as ways of making meaning about the world. By staying close to the practical techniques while also reading them within their historical contexts, I attempt some grounded speculations about the political choices facing data science and machine learning.
In contrast to the philosophy and anatomy of the early nineteenth century, phrenology offered a plausible account of the connection between the mind and the brain by asserting that 'the brain is the organ of the mind'. Phrenologists believed that the brain is made up of a number of separate organs, each related to a distinct mental function, and that the size of each organ is a measure of the power of its associated faculty. There were understood to be thirty-seven faculties, including Amativeness, Philoprogenitiveness, Veneration and Wit. The practice of phrenology was based on assessing the correlation between the topography of the skull and the underlying faculties, whose influence was thought to correspond to their size and therefore to the specific shape of the head. It was used as a predictive empirical tool, for example to assist in the choice of servants.
The data science that is emerging in the second decade of the twenty-first century offers a plausible connection between the flood of big data and models that can say something meaningful about the world. The most widely used methods in data science can be grouped under the broad label of machine learning. In machine learning, algorithmic clustering and correlation are used to find patterns in the data that are 'interesting' in that they are both novel and potentially useful. This discovery of a functional fit to existing data, involving an arbitrary number of variables, enables the predictive work that data science is doing in the world. While data mining was initially applied to tasks such as predicting patterns of supermarket purchases, the potential to pre-empt risk factors is leading to the wide application of data science across areas such as health, social policy and anti-terrorism.
The newly developed technique of phrenology was most actively studied in Britain in the years 1810-1840. One of the factors that made it popular was the accessibility of the method to non-experts. For leading exponents such as George Combe it was a key principle that people were able to learn the methods and test them in practice: 'observe nature for yourselves, and prove by your own repeated observations the truth or falsehood of phrenology'. Some historians, such as Steven Shapin, have interpreted British phrenology as a social challenge to the elitist control of knowledge generation, with a corresponding commitment to broadening the base of participation. Shapin saw this as evidence that social factors, as well as intrinsic intellectual factors, help explain the work done by early phrenology, which 'enabled the participation in scientific culture of previously excluded social groups'.
A stronghold of historical phrenology in Britain was Edinburgh, where it was strongly associated with a social reformist agenda. Phrenologists there believed that the assessment of character from the shape of the skull was not the final word but a starting point for self-improvement and social improvement, because 'environmental influences could be brought to bear to stir one faculty into greater activity or offset the undesirable hyper-development of another. Not just the size but the tone of the organ was responsible for the degree to which its possessor manifested that behaviour'. Advocates of phrenology such as Mackenzie asserted that 'until mental philosophy improves, society will not improve', and many felt that their science should influence policies on broad social issues such as penal reform and the education of the working classes.
As it stands now, data science is a highly specialised activity restricted to a narrow group of participants. The fact that data science is seen as a strategic expertise, combined with the small number of trained practitioners, has led to demand for data scientists far outstripping supply, and to the Harvard Business Review describing data scientist as 'the sexiest job of the 21st Century'. Most data scientists outside of academia are employed either by large corporations and financial institutions or by entrepreneurial start-ups. In terms of its social and cultural positioning, data science as we know it is a hegemonic activity.
Using the predictions of data science to drive pre-emptive interventions is also seen as having a social role. However, the form of these social interventions is shaped by the actors who are in a position to deploy data science. The characterisation of data science as a tool of the powerful derives not only from the algorithmic determination of parole conditions or predictive policing, but also from its embedding within a hegemonic world view. The forms of algorithmic regulation promoted by figures such as Tim O'Reilly have become algorithmic governance. Predictive filtering dovetails with the 'fast policy' of behavioural insight teams, as they craft policy changes to the choice architecture of everyday life.
In the 1840s phrenology ran into problems, with increasingly successful empirical challenges to its validity. In particular, critics questioned whether the external surface of the skull faithfully represented the shape of the brain underneath. If it did not, as came to be accepted, phrenology could no longer claim a correspondence between observations of the skull and the faculties of the individual. Supporters continued to defend phrenology on the basis of its utility rather than using measurement as a criterion: 'we have often said that Phrenology is either the most practically useful of sciences or it is not true'. But by the mid-nineteenth century both specific objections and the general advance of the scientific method had left phrenology discredited.
Unfortunately, phrenology underwent a revival in the late nineteenth and early twentieth centuries as part of a broad set of ideas known as scientific racism. This field of activity used scientific techniques such as craniometry (volumetric measurements of the skull) to support a belief in racial superiority, 'proposing anthropologic typologies supporting the classification of human populations into physically discrete human races, that might be asserted to be superior or inferior'. It was used to justify racism and other narratives of racial difference in the service of European colonialism; for example, during the 1930s Belgian colonial authorities in Rwanda used phrenology to explain the so-called superiority of Tutsis over Hutus.
In 1950 the UNESCO statement on race formally denounced scientific racism, saying "For all practical social purposes 'race' is not so much a biological phenomenon as a social myth. The myth of 'race' has created an enormous amount of human and social damage." However, the concept of race has been re-mobilised inside genomics, one of the crucibles of data science. Rather than the Human Genome Project closing the door on the idea that race has a biological foundation, as many had hoped, some studies suggest that 'racial population difference became a key variable in studying the existence and meaning of difference and variation at the genetic level'.
The jury is still out on the long-term validity of data science as an empirical method of understanding the world. Certainly there is a growing critique, largely based on privacy and ethics but also on the substitution of correlation for causation and the over-arching idea that metrics can be a proxy for meaning. I have written elsewhere about the potential already immanent in algorithmic governance to produce multiple states of exception. However, my purpose here is a different one: to see the unfolding path of data science as propelled by both methodological and social factors, and to use the completed trajectory of phrenology as a heuristic comparison.
Instead of being disheartened that, despite the bigness of data and the sophistication of machine learning algorithms, empirical activity is still imbricated with social values, we should recognise this as a continuing historical dynamic. This dynamic can be mobilised explicitly to offer a more hopeful future for data science and machine learning than one shaped only by financial or governmental hegemony. Like the phrenologists of nineteenth-century Edinburgh, we can choose to see in the methodologies of machine learning the opportunity to increase participation and social fairness. This can be imagined, for example, through the application of participatory action research to the process of data science. As Mackenzie wrote of phrenology, "the most effectual method" [of checking for error] was "to multiply, as far as possible, the number of those who can observe and judge". It remains a largely unexplored research question how data science can be democratic, and how we can develop a machine learning for the people.
Han, Jiawei, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques. Elsevier, 2011.
Shapin, Steven. "Phrenological knowledge and the social structure of early nineteenth-century Edinburgh." Annals of Science 32.3 (1975): 219-243.
Cantor, Geoffrey N. "The Edinburgh phrenology debate: 1803–1828." Annals of Science 32.3 (1975): 195-218.
McQuillan, Dan. "Algorithmic States of Exception." European Journal of Cultural Studies 18.4-5 (2015): 564-576.