HAYNES, David. A View of the Interface between Ethics and Metadata. Brazilian Journal of Information
Science: Research trends. vol. 17, Dossiê: Transversalidade e Verticalidade na Ciência da Informação,
publicação contínua, 2023, e023053. DOI: 10.36311/1981-1640.2023.v17.e023053.
A VIEW OF THE INTERFACE BETWEEN
ETHICS AND METADATA
David Haynes (1)
(1) Edinburgh Napier University, Scotland, d.haynes@napier.ac.uk
Abstract
Despite its advantages in improving access to information, metadata has the potential to cause harm, if
used inappropriately. For instance, mass surveillance of phone calls can be used by intelligence
agencies to target individuals. Language use can reinforce prejudices; poor language control
undermines the efficiency of subject searches; and the opacity of discovery systems reduces the
effectiveness of retrieval systems. There are also concerns about who owns the intellectual property
associated with metadata creation. The talk concluded with a description of two proposed initiatives:
by ISKO to investigate ways of improving metadata use in information discovery systems; and the
recruitment for a fully-funded PhD studentship at Edinburgh Napier University to investigate the ethics
of metadata.
Keywords: Metadata; Ethics; Privacy; State Surveillance; Information Retrieval; Controlled
Languages; Information Discovery System.
1 Introduction
Talk given at the Transversalidade e Verticalidade na Ciência da Informação
symposium (Transversality and Verticality in Information Science) held at São Paulo State
University - UNESP, Marília city, 10-11 August 2023.
It is a great pleasure to be back in Marília for the second time. My first visit was in 2016
and it was a great opportunity to meet colleagues and engage in a dialogue about research,
particularly in Knowledge Organization. Thank you for inviting me to join you for this exciting
event. I have spent the last few days working with some amazing students as well as with
respected research colleagues exploring avenues for joint research and other academic
engagements.
The interface between metadata and ethics has been a recurrent theme in my
professional and academic life. There were several excellent papers at ISKO 2016 in Rio
(Nascimento et al., 2016), ISKO UK in 2019, and others since that have highlighted some of
the issues I wish to address today (Haynes, 2018; Haynes & Vernau, 2019). Today’s talk will
look at the interface between ethics and metadata by considering some specific scenarios. Some
ethical responses will be considered before pointing towards current and proposed future
research.
I’ll start by stating some of the problems associated with metadata.
2 Mass Surveillance
“We kill people based on metadata.”
This quote from the former director of the United States of America Central Intelligence
Agency (CIA), General Michael Hayden, was widely reported in the press and was the starting
point for a panel discussion that I participated in at a Dublin Core Metadata Initiative (DCMI)
virtual meeting (Haynes et al., 2021). This quote makes us think about the role of metadata and
how it intersects with privacy concerns.
The invasion of privacy by state institutions is not new. Journalists and human rights
campaigners have been warning the public for some time. As far back as 1980, Duncan
Campbell, an investigative journalist, raised the alarm about mass surveillance by the British
intelligence community (Campbell, 2015). In 2013, Amnesty International published a report
on the US drone strikes in Pakistan that resulted in scores of civilian deaths (Amnesty
International, 2013). Edward Snowden revealed the extent of surveillance of US citizens’
communications by the US National Security Agency (NSA) (Greenwald, 2013).
Using metadata is more efficient than monitoring the content of calls. Metadata is
structured and provides information on the identities of the callers, who their contacts are, their
location, and the timing of their activities. If someone is a person of interest, it is relatively
easy to spread the net to capture data about their associates and their contacts. Once an
association has been made between an individual and a device, the device’s location can be
used to target that individual. They might be a terrorist or freedom fighter, depending on your
perspective, or an uninvolved bystander. There is a big difference between arresting a suspect
and extra-judicial killing.
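To illustrate why metadata alone is so revealing, the following is a minimal sketch, not a description of any real system. The call records, the field layout, and the function name `contacts_within` are all invented for this example. It shows that a handful of structured records, with no call content at all, is enough to trace a person's contacts and their contacts' contacts, and that people with no involvement are swept in simply by being connected.

```python
from collections import deque

# Invented call records: (caller, callee, timestamp, cell id).
# Note that no call content is stored, only metadata.
CALLS = [
    ("A", "B", "2023-08-01T09:00", "cell-17"),
    ("B", "C", "2023-08-01T09:30", "cell-04"),
    ("C", "D", "2023-08-02T18:15", "cell-04"),
    ("E", "A", "2023-08-03T07:45", "cell-22"),
]


def contacts_within(start, max_hops):
    """Breadth-first search over the call graph.

    Returns a dict mapping each reachable party to its distance
    (in calls) from `start`, up to `max_hops`.
    """
    # Build an undirected graph: a call links both parties.
    neighbours = {}
    for caller, callee, _when, _cell in CALLS:
        neighbours.setdefault(caller, set()).add(callee)
        neighbours.setdefault(callee, set()).add(caller)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in sorted(neighbours.get(person, ())):
            if other not in hops:
                hops[other] = hops[person] + 1
                queue.append(other)

    del hops[start]  # report only the contacts, not the starting party
    return hops


print(contacts_within("A", 2))
```

In this toy data, party C never called A, yet appears at two hops because of a single call with B. Nothing in the records says whether C is involved in anything or is an uninvolved bystander.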
3 Embedding Prejudice
The controlled languages used to describe information resources reflect specific
perspectives and worldviews. Holstrom (2022) describes how classification language can be
discriminatory and highlights the power imbalance between authorities and user communities.
I started my career as an information scientist abstracting and indexing. I am
particularly interested in the way that indexing languages evolve. For instance, when we look
at the terminology used to describe indigenous peoples in the Americas (including some of my
ancestors), we rapidly get into difficulties:
AMERICAN INDIANS is used as a term in older literature;
FIRST NATIONS is widely used as a term in Canada;
FNMI (First Nations, Metis, Inuit) is also referred to in Canadian literature;
INDIGENOUS PEOPLE is used as a term around the world, including the United States and Australia;
NATIVE AMERICANS is still used as a term in the United States of America; and
AMERINDS is also used as a term in the Americas.
Words change their meaning depending on the geographical region and evolving views
about what is acceptable. Some of these terms overlap and are used interchangeably. There is
also a transition between different usages as older terms fall out of favor. Another example is
the distinction in Brazil between índios, indigenous peoples (related to the Amerinds in the
Caribbean), and indianos, people from India. Our use of language reflects a particular
worldview. Recognizing diversity is good, but labels can also alienate or “other” people.
4 Problems with Searches
Metadata permits access to resources based on different characteristics. I’ll focus on
subject access. One of the principles of a fair society should be citizens’ access to information.
That allows people to make informed decisions about their daily lives, what products or
services to use, or who they wish to elect to govern their country.
The International Society for Knowledge Organization (ISKO) recently set up a
working group to consider some of the issues of lack of access to metadata and its effect on
subject retrieval. The premise of that group is that the current generation of discovery systems
used in academic and research libraries largely ignores the very rich subject metadata that has
been created by specialists. The working group is currently consulting academic and research
librarians and it invites participation by members of the knowledge organization community
and the wider library and information science community (Haynes et al., 2023).
Controlled vocabularies play an important role in retrieval. Together with applied
indexing, they help to address well-known problems of
spelling variations, ambiguity, and changing use of terminology. In English, there are spelling
differences between some words in British and North American usages. If there is no controlled
vocabulary, we find that the following examples could give different search results:
“SULPHUR DIOXIDE” and “SULFUR DIOXIDE”;
“ALUMINIUM” and “ALUMINUM”.
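The effect of these spelling variants can be sketched in a few lines of code. This is an illustration only: the records, the `PREFERRED_TERM` table, and both search functions are invented for the example, and the choice of which spelling counts as the preferred term is arbitrary here.

```python
# Hypothetical mini-vocabulary: every variant spelling maps to one preferred term.
# Which spelling is "preferred" is an arbitrary choice made for this sketch.
PREFERRED_TERM = {
    "sulphur dioxide": "SULFUR DIOXIDE",
    "sulfur dioxide": "SULFUR DIOXIDE",
    "aluminium": "ALUMINIUM",
    "aluminum": "ALUMINIUM",
}

# Invented records, each indexed with the spelling its author happened to use.
RECORDS = [
    {"id": 1, "title": "Sulphur dioxide emissions from shipping", "subject": "sulphur dioxide"},
    {"id": 2, "title": "Sulfur dioxide and asthma", "subject": "sulfur dioxide"},
    {"id": 3, "title": "Recycling aluminum cans", "subject": "aluminum"},
]


def search_uncontrolled(query):
    """Match the query literally against the subject field."""
    q = query.strip().lower()
    return [r["id"] for r in RECORDS if r["subject"] == q]


def search_controlled(query):
    """Map both the query and each subject to a preferred term, then compare."""
    q = query.strip().lower()
    target = PREFERRED_TERM.get(q, q)
    return [
        r["id"]
        for r in RECORDS
        if PREFERRED_TERM.get(r["subject"], r["subject"]) == target
    ]


print(search_uncontrolled("SULPHUR DIOXIDE"))  # finds only the British spelling
print(search_controlled("SULPHUR DIOXIDE"))    # finds both spellings
```

Without the vocabulary, a search for the British spelling misses the record indexed under the North American one; with it, either spelling retrieves both.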
The other purpose of controlled vocabularies is to disambiguate words, so that there is
a distinction, for instance, between the following usages of the word “train”:
TRAIN (teaching) treinamento;
TRAIN (railway vehicle) trem;
TRAIN (process) processo.
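A minimal sketch of how such qualifiers might be represented follows. The `HOMOGRAPHS` table and the `resolve` function are hypothetical names introduced for this example, not part of any particular thesaurus standard or software package.

```python
# Hypothetical vocabulary entries: one ambiguous word, three qualified terms,
# each carrying the Portuguese equivalent given in the list above.
HOMOGRAPHS = {
    "train": [
        {"term": "TRAIN (teaching)", "pt": "treinamento"},
        {"term": "TRAIN (railway vehicle)", "pt": "trem"},
        {"term": "TRAIN (process)", "pt": "processo"},
    ],
}


def resolve(word, qualifier=None):
    """Return the qualified terms for a word.

    With no qualifier, all senses are returned so the searcher can choose.
    With a qualifier, only the matching sense is returned.
    """
    senses = HOMOGRAPHS.get(word.strip().lower(), [])
    if qualifier is None:
        return [s["term"] for s in senses]
    wanted = f"({qualifier.strip().lower()})"
    return [s["term"] for s in senses if s["term"].lower().endswith(wanted)]


print(resolve("Train"))                     # all three senses
print(resolve("train", "railway vehicle"))  # the single qualified term
```

Returning every sense when no qualifier is supplied is a design choice for this sketch: it prompts the searcher to pick a meaning rather than letting the system guess one silently.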
5 Opacity of information retrieval systems
Another aspect of metadata usage that can limit access to information is the opacity of
search systems. Algorithms for selecting and ranking items tend to be proprietary. They are
also deliberately kept hidden to prevent overt manipulation of results. The figure below shows
a simplified schematic of a discovery system. We do not know what the criteria are for the
selection and ranking of the search results. The lack of transparency makes it difficult to refine
and improve searches.