RODRIGUES, Fernando de Assis; SANT'ANA, Ricardo Cesar Gonçalves. Privacy and Online Social Network: a
model for analysis of collecting personal data. Brazilian Journal of Information Science: research trends, vol.
17, publicação contínua, 2023, e023005. DOI: 10.36311/1981-1640.2023.v17.e023005
PRIVACY AND ONLINE SOCIAL NETWORK:
a model for analysis of collecting personal data
Privacidade e Rede Social Online: um modelo para análise de coleta de dados pessoais
Fernando de Assis Rodrigues (1), Ricardo Cesar Gonçalves Sant'Ana (2)
(1) Federal University of Pará (UFPA), Brazil, deassis@ufpa.br
(2) São Paulo State University (UNESP-Tupã), Brazil, ricardo.santana@unesp.br
Abstract
The popularization of Information and Communication Technologies allowed new forms of interaction,
including Online Social Network (OSN) services. These services can share personal data with third-party
organizations through Application Programming Interfaces (APIs), making it a complex task to observe the
data flow. This research proposes a model to identify levels and elements available in the flow of personal
data via API between organizations holding OSN services and external agents. Based on a systematized
reading of the technical documentation, it consolidated the recurrent mechanisms by which OSN services
provide data sets to external agents and established a database model, built with the Entity-Relationship
Model, to store in a unified way information about the levels at which the APIs provide personal data to
external agents. Subsequently, it presents a Data Mart as a proof of concept for the proposed database
model, intended to compare the ways of accessing the personal data attributes stored and subsequently
shared by the OSN services with external agents. The results showed that the model could clarify the
elements inherent in the data flow, allowing a more structured analysis, including the possibility of
monitoring changes made to the APIs by organizations over time.
Keywords: Data models; Database models; Privacy; Social networking
Resumo
A popularização das Tecnologias de Informação e Comunicação permitiu novas formas de interação,
incluindo os serviços de Rede Social Online (RSO). Esses serviços podem compartilhar dados pessoais
com organizações terceirizadas por meio de Interfaces de Programação de Aplicativos (API), tornando uma
tarefa complexa observar o fluxo de dados. Esta pesquisa propõe um modelo para identificar níveis e
elementos disponíveis no fluxo de dados pessoais via API entre organizações detentoras de serviços RSO
e agentes externos. Convergiu os mecanismos recorrentes em fornecer conjuntos de dados dos serviços
RSO para agentes externos, a partir de uma leitura sistematizada da documentação técnica para estabelecer
um modelo de banco de dados para buscar de forma conjunta a possibilidade de armazenar informações
sobre os níveis em que as API forneceram dados pessoais a agentes externos, aplicando o Modelo Entidade-
Relacionamento. Posteriormente, apresenta um Data Mart como prova de conceito para o modelo de banco
de dados proposto, pretendendo comparar as formas de acesso aos atributos de dados pessoais armazenados
e, em seguida, compartilhados pelos serviços RSO para agentes externos. Demonstrou que o modelo pode
esclarecer os elementos inerentes ao fluxo de dados, permitindo uma análise mais estruturada, incluindo a
possibilidade de monitoramento de mudanças na API pelas organizações ao longo do tempo.
Palavras-chave: Modelos de Dados; Modelos de Banco de Dados; Privacidade; Redes Sociais
Resumen
La popularización de las Tecnologías de la Información y la Comunicación permitió nuevas formas de
interacción, incluidos los servicios de Redes Sociales en Línea (RSO). Estos servicios pueden compartir
datos personales con organizaciones de terceros a través de interfaces de programación de aplicaciones
(API), lo que hace que sea una tarea compleja observar el flujo de datos. Esta investigación propone un
modelo para identificar niveles y elementos disponibles en el flujo de datos personales vía API entre
organizaciones que poseen servicios de RSO y agentes externos. Convergieron los mecanismos recurrentes
en la prestación de conjuntos de datos de servicios de RSO a agentes externos, a partir de una lectura
sistemática de la documentación técnica para establecer un modelo de base de datos para buscar en conjunto
la posibilidad de almacenar información sobre los niveles en los que las APIs proporcionaron datos
personales a agentes externos, aplicando el Modelo Entidad-Relación. Posteriormente, presenta un Data
Mart como prueba de concepto para el modelo de base de datos propuesto, con la intención de comparar
las formas de acceder a los atributos de los datos personales almacenados y luego compartidos por los
servicios de RSO a agentes externos. Demostró que el modelo puede aclarar los elementos inherentes al
flujo de datos, lo que permite un análisis más estructurado, incluida la posibilidad de monitorear los cambios
en la API por parte de las organizaciones a lo largo del tiempo.
Palabras clave: Modelos de datos; Modelos de bases de datos; Privacidad; Redes sociales
1 Introduction
The popularization of Information and Communication Technologies has enabled new
forms of interaction between individuals, communities, and public and private organizations, such as
those mediated by Online Social Network (OSN) services. These complex services are an integral
part of reflections on the new characteristics of society and the functioning of the social fabric,
with new possibilities of interaction and relationship between different types of entities, mediated
by applications, computers, and networks, as verified in the concepts of Cyberculture (Lévy 2001)
and Network Society (Castells 2009).
The OSN services are available globally (with few exceptions) as an integral part of a new
business model in which organizations usually offer these services free of charge to users but with
multiple trade-offs that enable profitability. Therein lies the duality and the antagonism of this
environment: the OSN services are free and accessible, with great potential to reduce geographic
barriers to interrelationships between individuals (Lengyel et al. 2015), but they are developed and
maintained as products that depend on profitability to ensure the viability of the business model of
the organizations that support them (Zhang et al. 2016).
The OSN organizations compete for public attention in a highly competitive and profitable global
market, which drives constant innovation in the services offered. A total of 4.2
billion people accessed an OSN service at least once in January 2021, representing 53.6% of the
world population (Statista 2021), an increase of 490 million users between January 2020 and January
2021 (We Are Social 2021). It is also important to emphasize that the largest organizations holding
OSN services are listed in the NASDAQ Top 100 Stock Index, except for some of the Chinese
organizations (NASDAQ 2021), and that their services are among the most accessed applications and
websites (Alexa 2021).
Based on news coverage in the media, part of society understands that one source
of profitability for organizations holding OSN services is the sale of advertising space in
their services (Wall Street Journal 2021). However, one of the main success factors that set OSN
services apart from other advertising spaces is that they can more precisely
delimit the target audience expected by the advertiser, since individuals exchange information
about their personal and professional activities in the OSN services, including natural and artificial
attributes, entertainment options, and cultural options, among other details. In short, individuals
tend to share personal data with the services (Rodrigues and Sant’Ana 2016).
These personal data are collected and stored in the Database Management Systems of the
organizations holding the OSN services. The collected data are strictly systematized, applying data
structure models and transforming them into data sets that, among other activities, allow a
systematic, continuous, uniform, and customizable recovery process for each type of informational
demand required (Rodrigues et al. 2018). It is also important to highlight the existence of
data access channels for partners, called external agents (Rodrigues et al. 2018; Rodrigues
2017), which represent a vital part of the revenue of OSN services (Lanier 2018). The OSN services
extend access to personal data to a set of organizations: the OSN service offers a dataset
supply channel (OSN service → External Agent) and determines which datasets will be available
to external agents, with no human interaction required in the process. The organizations
provide data through interfaces designed specifically for these exchange operations, widely adopting a
concept from Computer Science, developed in the 1960s, the Application Programming Interface
(API) (Manikas 2016). The APIs have become an integral component of the software that manages
OSN services and part of the model for building large software integrator solutions (Manikas
2016).
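The supply channel described above (OSN service → External Agent) can be illustrated with a minimal sketch in Python. It is only a toy simulation under stated assumptions: the records, the agent names, the `GRANTS` policy, and the `fetch_dataset` function are all hypothetical and do not correspond to any real OSN API. It shows only the idea that the service, not a human operator, decides which attributes each external agent receives.

```python
from typing import Dict, List, Set

# Hypothetical personal data held by an OSN service (made-up records).
USERS: List[Dict[str, str]] = [
    {"id": "u1", "name": "User One", "email": "one@example.org", "city": "CityX"},
    {"id": "u2", "name": "User Two", "email": "two@example.org", "city": "CityY"},
]

# Hypothetical policy: the service determines which attributes
# each external agent is allowed to read.
GRANTS: Dict[str, Set[str]] = {
    "ad_partner": {"id", "city"},
    "login_app": {"id", "name", "email"},
}


def fetch_dataset(agent: str) -> List[Dict[str, str]]:
    """Return the data set available to an external agent.

    The filtering is automatic: no human interaction takes place
    between the request and the response.
    """
    allowed = GRANTS.get(agent, set())
    if not allowed:
        # Unknown agents, or agents without grants, receive nothing.
        return []
    return [{k: v for k, v in user.items() if k in allowed} for user in USERS]
```

In this sketch, `fetch_dataset("ad_partner")` returns only the `id` and `city` of each user, while an agent that is absent from `GRANTS` receives an empty list.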
The OSN services have been a complex research domain since the 1990s, analyzed from the
most varied perspectives and bringing new opportunities and challenges to researchers (Baatarjav
and Dantu 2011; Boyd and Ellison 2007). In this context, it is essential to build analysis structures
that make it possible to clarify the elements of the data flow between OSN services and external
agents and to reduce users' lack of awareness about the personal data shared among these entities.
The motivating problem of this research is the difficulty in perceiving the levels at which the OSN
services deal with access to the personal data of their users and, consequently, in supporting the
diagnosis of possible actions that may lead to breaches of users' privacy involving personal data that
is collected, stored, and shared with external agents.
Therefore, this research proposes a model to identify levels and elements available in the
flow of personal data via API between organizations holding OSN services and external agents.
2 Literature Review
One of the leading research fields on the use of OSN services concerns
data access actions that may affect personal data privacy. The discussions on the subject of
personal data privacy are plural and addressed from different perspectives, such as the absence of
privacy guarantees in the data transferred between two nodes of the network, as in the sharing of
characteristics of the use of OSN services by men and women (Fogel and Nehmad 2009; Schneider
et al. 2009), by teenagers (Barnes 2006; Boyd 2013), and by students (Acquisti and Gross 2006;
Ellison et al. 2007; Tufekci 2007); the lack of (or limited) knowledge about how
OSN services work and their relationship with the privacy of personal data (Krasnova et al. 2009);
self-disclosure and publicity of personal or professional activities in OSN services (Trepte and
Reinecke 2011; Young and Quan-Haase 2009); the classification of activities with potential harm
to the privacy of personal data in OSN services (Rodrigues and Sant’Ana 2016); the ethical use of
personal data from OSN services in scientific research (Zimmer 2010); the exposure and the
invasion of personal data stored in OSN services (Boyd 2008); and the leakage of personal data
(Krishnamurthy and Wills 2009).
However, the context for this research is related to scientific studies that propose
analysis models to identify the data transmitted on the network, as well as the relationships between
content and actors, including service users, OSN service maintainers, and external agents. The
dynamics between content and actors integrated into the digital social fabric make it complex
to analyze what happens to personal data, essentially because data and functionalities
change constantly (Watts 2004). Notable research includes proposals for the analysis of
Uniform Resource Identifiers (URIs) used for the formation of the digital social fabric (users and
groups/communities) and the relationships between entities and contents transferred in the OSN
services (Mislove et al. 2007); the nucleation model for clustering users with common interests
through personal data (Zhang et al. 2017); the privacy aspects of personal data when shared with
OSN service partner platforms via third-party servers (Krishnamurthy and Wills 2008); the
analysis of user behavior and different uses of OSN services (Penni 2017); the data accessibility
model to elucidate the privacy and security risks and concerns in using OSN services (Creese et
al. 2012; Lankton et al. 2020); the content structuring in OSN services via the semantic web (Mika
2007; Rodrigues et al. 2018); and the separation of OSN service spaces into user, social, and
technological domains (Caviglione et al. 2014).
Hence, the leading global OSN services form the OSN supernetworks (Donath 2007),
overcoming the geographic barriers (Lengyel et al. 2015), bringing new marketing and
community-building possibilities (Zhang et al. 2016; Weber 2007), implying new ways of
obtaining social capital (Ellison et al. 2007), enabling new sentiment analysis techniques based on
the conveyed content (Khan et al. 2016; Pozzi et al. 2017), and personal data mining for behavior
analysis (Singla and Richardson 2008). In addition, OSN supernetworks bring new concerns, such as
security aspects of the information stored in these services (Altshuler 2013) and the development
of defense systems against web crawling in OSN interfaces (Mondal et al. 2011).
3 Material and Methods
The method adopted consists of an exploratory analysis of OSN services through direct,
non-participant observation, of a quantitative and qualitative nature, using combined and
convergent methods (Sandelowski 2000; Brannen 2005), based on the exploration of the technical
characteristics of their APIs and the reading of the available document collections. It was divided into
three perspectives: a study of the steps described in the OSN documentation (called
levels in this research) required to provide personal data from the OSN services to external agents;
a detailing of the characteristics of personal attributes at the moment of data collection by
external agents; and a proposed database model to support query processes.
This research studied the APIs of three OSN services, Facebook, Twitter, and LinkedIn:
respectively, the Graph API (Meta, Inc. 2021), the Twitter API (Twitter, Inc. 2021), and the
LinkedIn API (Microsoft, Inc. 2021). Three factors established the eligibility criteria: the
availability of technical material in English, services free of charge to API users, and no access
limitation. In addition, these OSN services have the highest monthly access (Alexa 2021; Statista
2021). Other OSN services offered by the same organization were excluded from the analysis, and
only the service with the highest number of monthly users was selected (e.g., Meta, Inc. owns more than
one service).
It adopted a systematized reading of the technical documentation on the operation of
the APIs that OSN services make available to software developers. Afterward, it consolidated the
recurrent mechanisms used to provide data sets from the OSN services to external agents. These two
steps are interrelated and described in the fourth section of this article. In the third stage, a
database model was established to store in a unified way information about the
levels at which APIs provided personal data to external agents, using the
specialization/generalization process and applying the Entity-Relationship Model (Silberschatz et al. 2011).
Subsequently, it presents a proof of concept for the proposed database model, intended to compare
the ways of accessing the personal data attributes stored and subsequently shared by the OSN
services with external agents.
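As a rough illustration of what a database model of this kind might look like, the following minimal sketch uses Python's standard `sqlite3` module. It is not the authors' schema: the table names (`service`, `level`, `attribute`), their columns, and the placeholder rows (`ServiceA`, `ServiceB`, and the level and attribute names) are assumptions made only for this example, and the specialization/generalization step is omitted for brevity.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the entities defined in the article.
SCHEMA = """
CREATE TABLE service (
    service_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE level (
    level_id   INTEGER PRIMARY KEY,
    service_id INTEGER NOT NULL REFERENCES service(service_id),
    position   INTEGER NOT NULL,  -- order of the step in the documentation
    name       TEXT NOT NULL,
    UNIQUE (service_id, position)
);
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    level_id     INTEGER NOT NULL REFERENCES level(level_id),
    name         TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO service VALUES (?, ?)",
        [(1, "ServiceA"), (2, "ServiceB")],
    )
    conn.executemany(
        "INSERT INTO level VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "permission"), (2, 1, 2, "endpoint"), (3, 2, 1, "endpoint")],
    )
    conn.executemany(
        "INSERT INTO attribute VALUES (?, ?, ?)",
        [(1, 2, "email"), (2, 2, "birthday"), (3, 3, "email")],
    )
    conn.commit()
    return conn


def attributes_per_service(conn: sqlite3.Connection) -> dict:
    """Count the personal data attributes recorded for each service."""
    rows = conn.execute(
        """
        SELECT s.name, COUNT(a.attribute_id)
        FROM service s
        LEFT JOIN level l ON l.service_id = s.service_id
        LEFT JOIN attribute a ON a.level_id = l.level_id
        GROUP BY s.service_id
        ORDER BY s.name
        """
    ).fetchall()
    return dict(rows)
```

Storing every service under the same three tables is what allows a single query to compare them. With the placeholder rows above, `attributes_per_service(build_demo_db())` yields two attributes for `ServiceA` and one for `ServiceB`.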
Regarding the material, the following were used: i) electronic spreadsheets to systematize the readings of
technical documents, including the collection of information about the components that are part of