Women at the Table

Enabling the development of inclusive standards – Understanding the role of data and data analysis.

We are thrilled that Women at the Table’s CEO was selected as the technical author for the British Standards Institution (BSI) and the UK Department for Business, Energy & Industrial Strategy (BEIS) Office for Product Safety & Standards, supported by an extraordinary Advisory Group consisting of the Ada Lovelace Institute; Association of Convenience Stores; Centre for Data Ethics & Innovation; Consumer and Public Interest Network; Data2X, United Nations Foundation; Engineering Design Centre, University of Cambridge; Kingston University; Market Research Society; Open Data Institute; Oxford Internet Institute, University of Oxford; Prospect Union; and the Women’s Engineering Society.

Background

All standards are based on some form of data. The analogue and digital data we collect and/or use – whether memory, numbers, text, images, audio, their associated processing, or lived experience – the models we build from that data, and the standards we make all have a profound impact on the lives of individual people and groups. Understanding the linkages and impacts between humans, data, code and systems, and the role of data and data analysis, helps us to create more inclusive standards that better reflect our shared values.

The first steps in this understanding begin with questions. Who defines the problem that the data is intended to solve? Who decides what data to collect and/or use? Who collects the data, and how? Who analyses and questions the data? Who uses the data, and for what purposes? Indeed, data is an integral part of almost every step of modern processes. Data is embedded everywhere in the new economy, and is the foundation for analogue protocols and guidelines as well as for newer machine learning models, ranging from automated decision-making systems to neural networks. However, the data upon which this new economy is based is not neutral, nor does it stand alone as “reality”. Much, if not most, of it is historical data, gathered from often small, homogeneous subsets of the global population.

This historical data is incomplete, and socially and technically biased as a result of that incompleteness. Bias can occur intentionally or unintentionally. Scarcity of useful, representative data is a major issue: it has been found to compromise the quality of health information available to women, for example, as well as the healthcare they receive. Critical healthcare provision and safety are made precarious by the consistent lack of representative data for members of minority groups.

Older, unrepresentative data forms the basis of many guidelines that continue to drive data decisions, from metabolic rates, airplane cockpit design safety and radiology protocols to the newer, larger, yet also unrepresentative benchmark data used in the accelerating world of algorithmic decision-making and artificial intelligence (AI). These decisions range from low- to high-risk applications with the ability to harm, from transport and trade to finance, health, medicine and criminal justice. Data works at a velocity and scale in this century that touches all corners of modern life. Part of AI’s utility lies in the machine’s ability to perceive patterns in data and then derive a set of rules from those patterns. However, if the original data excludes all but a small, generally homogeneous subset of the population (e.g. young, white, male, educated, heterosexual), the AI perceives that data to be the only reality that exists, because it is the only reality the machine has been exposed to and trained on. This exclusion at scale has dangerous consequences for the entire population.

While high-risk applications are of greatest concern, low-risk applications can also have a profound social impact on the quality of life and well-being of individuals, beginning with forms of access to information, from targeted advertisements to search algorithms to financial access. Moreover, data, even if “debiased”, behaves differently in different machine learning models, and identical machine learning models behave differently from one another in real-world situations “in the wild”. Nor is this exclusive to machines: debiased data is interpreted differently in different human decision-making processes, and who or what does the debiasing influences the outcome of any such exercise.
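As a minimal sketch of how this plays out (the data, the two groups and the model below are purely synthetic illustrations built with scikit-learn, not drawn from any real application), consider a classifier trained almost entirely on one group. It can look accurate overall while performing close to chance on the group it has rarely seen:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)

    def make_group(n, label_column):
        # Hypothetical subpopulation: the outcome depends on a different
        # feature for each group, standing in for differing real-world contexts.
        X = rng.normal(size=(n, 2))
        y = (X[:, label_column] > 0).astype(int)
        return X, y

    # Training data: 2,000 examples from group A, but only 50 from group B.
    X_a, y_a = make_group(2000, label_column=0)
    X_b, y_b = make_group(50, label_column=1)
    model = LogisticRegression().fit(np.vstack([X_a, X_b]),
                                     np.concatenate([y_a, y_b]))

    # Evaluate on fresh samples from each group separately.
    X_a_test, y_a_test = make_group(1000, label_column=0)
    X_b_test, y_b_test = make_group(1000, label_column=1)
    print("group A accuracy:", accuracy_score(y_a_test, model.predict(X_a_test)))
    print("group B accuracy:", accuracy_score(y_b_test, model.predict(X_b_test)))

A single accuracy figure over the pooled test set would mask the failure on group B; only disaggregated, per-group evaluation makes it visible.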

A growing body of research has shown how the exclusion of representative or inclusive data has had negative consequences for the development of hiring algorithms, the allocation of university places, credit scoring, medical decision-making, bail decisions and facial recognition. Similarly, data can affect the standards we develop, from the selection, assessment and use of incomplete data to create provisions and requirements, to the unquestioned use of received wisdom, unchallenged assumptions and unconscious bias. AI technologies built on such data are already widely used in real-world applications. Given the velocity of AI uptake and the risk of using non-inclusive, incomplete, discriminatory and unrepresentative datasets, there is a danger of bias being replicated at scale and embedded in the systems of our daily lives and governance. In addition, our notion of what is “representative” is itself evolving, so data always lags behind our evolving understanding and application. Finally, the processes of standards bodies determine whether standards are developed, and so the representative nature of these groups can determine whether a standard is even considered for development.

Good practices relating to data in the development and implementation of standards alone are not enough – they need to be embedded in broader good practice, including shared human judgement on how to achieve societal objectives.

Aims and objectives

With the use of inclusive data in standards development, there is an opportunity not only to mitigate, but to correct for, historic inequities buried in previous data. The opportunity, as new systems proliferate and are designed, and as new inclusive standards are created, is to revisit old assumptions and to conceive of standards, datasets and data models with inclusion, efficacy and equity at the core. It is also to expand mono-notions of good practice into ways of making the data life cycle – including the presentation of data, the interpretation of data at the point of decision, and its utility – more effective, inclusive and transformational, including the way we represent and share standards.

BSI Flex 236 is intended to build the capability of standards developers to work with data with inclusion in mind, to understand the limitations of the data they rely on, and to apply that understanding meaningfully to standards. This BSI Flex suggests a process to help standards-makers use a series of critical questions and analyses aimed at understanding and interacting with data and the data life cycle. Understanding the role of data and the data life cycle is critical so that data, and the standards we produce, become representative and include communities traditionally excluded from benchmark datasets or disregarded during the standards development process. Representative datasets, and consideration of a wider range of communities and individuals, can help to minimize harm, produce more robust products and services, and serve a more diverse population more inclusively.
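As one illustration of how such critical questions might be made operational (the column names, reference shares and tolerance below are hypothetical placeholders, not provisions of BSI Flex 236), a single question – who is represented in the data, and in what proportion? – can be turned into a simple, repeatable check using pandas:

    import pandas as pd

    def representation_audit(df, column, reference_shares, tolerance=0.10):
        # Compare each group's share of the dataset against a reference share
        # for the population the standard is meant to serve, flagging groups
        # that fall more than `tolerance` below that reference.
        observed = df[column].value_counts(normalize=True)
        rows = []
        for group, expected in reference_shares.items():
            share = float(observed.get(group, 0.0))
            rows.append({
                "group": group,
                "expected_share": expected,
                "observed_share": round(share, 3),
                "under_represented": share < expected * (1 - tolerance),
            })
        return pd.DataFrame(rows)

    # Hypothetical example: a dataset in which women are under-represented.
    data = pd.DataFrame({"gender": ["man"] * 800 + ["woman"] * 200})
    print(representation_audit(data, "gender", {"woman": 0.5, "man": 0.5}))

Checks of this kind can be attached to each stage of the data life cycle – collection, analysis, use and presentation – so that the questions above are asked repeatedly rather than once.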


Last modified: May 15, 2022
