ONTOFORINFOSCIENCE: A DETAILED METHODOLOGY FOR CONSTRUCTION OF ONTOLOGIES AND ITS APPLICATION IN THE BLOOD DOMAIN

Ontologies are instruments of knowledge organization that have been developed using several methodologies. These methodologies are well known and established, but their steps are often not well explained. Thus, only knowledge engineers, generally with a background in computer science, are able to perform all the steps required in ontology development. In the present paper, we describe a methodology that details each step of the ontology development cycle. The goal of this methodology, which is called OntoForInfoScience, is to overcome issues that arise from the use of technical jargon and from the application of logical-philosophical principles in ontology development. In order to identify these issues, our methodology was produced by information scientists during the development of an ontology in the domain of human blood. This domain ontology, called Hemonto, is part of an ongoing scientific biomedical project. In this paper, we present a brief description of OntoForInfoScience, as well as some results of its application in the development of the blood ontology. Finally, we conclude that the new methodology is useful for information scientists creating formal ontological representations. In addition to the methodology per se, we also provide partial results of the Hemonto development.


Introduction
Ontologies have been widely employed in knowledge representation and knowledge organization within several scientific fields. In biomedicine, ontologies are used to improve medical vocabularies, insofar as they make use of logical formalisms that provide greater expressivity in representing knowledge. In addition, ontologies enable the creation of accurate models of reality, which is a desirable characteristic in biomedicine (Freitas and Schulz 2009).
The growing use of biomedical ontologies can be noticed, for example, in the emergence of specialized repositories and in the use of ontologies to improve vocabularies in relevant medical domains. The former is illustrated by initiatives such as the Open Biomedical Ontologies (OBO) (Smith et al. 2007), a public online repository of biomedical ontologies, which aims to foster integration and standardization of medical terminology. The latter is illustrated by efforts to formalize fundamental medical vocabularies such as the Gene Ontology (GO) (Smith, Williams and Schulze-Kremer 2003) and the Foundational Model of Anatomy (FMA) (Rosse and Mejino 2003).
The present paper describes a new methodology for ontology construction, called OntoForInfoScience, which provides a higher level of detail concerning the steps of the ontological development cycle (Mendonça 2015). The goal of OntoForInfoScience is to enable experts in Knowledge Organization, often from the Information Science field, to overcome the difficulties posed by both the technical jargon and the logical and philosophical principles involved in ontology development. In order to identify these difficulties, the methodology was created by Information Science researchers during the development of an ontology for human blood components. This domain ontology, called Hemonto, is part of a scientific project called the Blood Project (1), which aims to create a formal language about blood components in the hematology and blood transfusion subdomains.
In using OntoForInfoScience to build Hemonto, we obtained a formalization of technical terms in the domain of blood. While creating the new methodology, we managed to provide more details about several steps and issues, for example: the use of top-level ontologies, the characterization of different ontological relations, the creation of formal definitions, and so forth. The detailed steps proved to be useful in guiding ontology developers in the creation of domain ontologies.
Brazilian Journal of Information Studies: Research Trends. 10:1 (2016) 12-19. ISSN 1981-1640
The remainder of this paper is organized as follows. The second section presents biomedical domain ontologies that represent and describe concepts of the blood domain. The third section presents well-known methodologies for ontology construction currently in use, in addition to a brief description of the OntoForInfoScience methodology. The fourth section describes the construction of Hemonto using the new methodology. The fifth section presents partial results of the domain ontology. Finally, the sixth section offers final remarks and expectations for future work.

Biomedical ontologies and the blood domain
Surveying repositories of vocabularies for knowledge representation in the domain of human blood, one can notice that the coverage of current instruments, such as terminologies and even ontologies, is still insufficient. Indeed, entities and properties of human blood are still underrepresented in these instruments, so the development of ontologies about blood components is a defensible enterprise. Even though current initiatives are not fully suitable, the existing ontologies and terminologies about blood were useful in the construction of Hemonto, since the reuse of terms is a well-recommended principle in ontological engineering.
In this section, we present a literature review of biomedical ontologies that deal with aspects of human blood. These ontologies were used as sources for the reuse of terms during the Hemonto construction. It is noteworthy that the literature review is not exhaustive and lists only the main and best-known biomedical ontologies rather than the large number of them available on the web.

Chemical Entities of Biological Interest (ChEBI)
Ontology of chemical entities emphasizing chemical components or "small molecules", as opposed to "macromolecules" (nucleic acids and proteins), which are not represented (Degtyarenko et al. 2008).

GALEN
Clinical ontology containing definitions for the field of anatomy, in addition to contents related to human physiology, pathology and symptoms (Rector et al. 2003). It provides logical formalisms for anatomy, physiology and pathology related to human blood.

Gene Ontology (GO)
Ontology describing gene products and their functions in any organism (Smith, Williams and Schulze-Kremer 2003).

OntoForInfoScience stages
In addition to the biomedical ontologies, another topic is essential to our theoretical background: methodologies for ontology construction. We therefore present here some references to the best-known and most cited methodologies currently employed in the development of domain ontologies. We also describe results concerning the new methodology, called OntoForInfoScience, which aims to provide more details about the activities involved in the ontology development cycle. OntoForInfoScience uses simpler language, free of jargon, suitable for non-experts in Logic, Computer Science and Philosophy (Mendonça 2015).
Even though a significant number of methodologies are available to ontology developers, many difficulties remain. A common difficulty is that methodologies do not explain in detail the steps to be followed during the activity. In general, methodologies assume that developers are experienced knowledge engineers who already know how to perform all tasks (Mendonça 2015). For example, some methodologies state that one needs to create formal definitions, but they do not explain how to accomplish this task. In general, building ontologies requires some background in logic, philosophy and computer science. The presence of technical terms without a suitable explanation of their meanings has posed barriers to professionals who deal with knowledge organization in Information Science. Examples of such terms are: multiple inheritance, data types, abstract class, and disjoint class, to mention a few.
OntoForInfoScience is a methodology that explains in detail all the steps required for ontology development, in simple language suitable for a variety of professionals. To do this, the methodology reuses and refines phases of existing methodologies (for example, Methontology, the 101 Method and NeOn) with the aim of overcoming some of their limitations. OntoForInfoScience consists of nine phases, depicted in Figure 1. Owing to space limitations, each phase cannot be fully described here. A complete description of each phase can be found in Mendonça (2015).
In phase 0, the developer performs a preliminary evaluation of the need to construct an ontology. On the one hand, if the goal is indexing and information retrieval from documents, one would rather develop a thesaurus. On the other hand, if the goal is the representation of real-world entities by using a richer set of relations, the construction of an ontology should be considered.
Once the need for an ontology is confirmed, the developer begins the steps that comprise the ontology lifecycle. In phase 1, the developer specifies the ontology using a template, which contains at least the following information: the ontology domain and scope, the general purpose, the audience, the applications and the degree of formality. In addition, the developer establishes the ontology coverage by delimiting it through competency questions.
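The phase 1 template can be pictured as a simple record. The following is a minimal sketch in Python, assuming hypothetical field names and illustrative values; it is not the template defined in Mendonça (2015), only one way to hold the listed items and to check which of them are still unfilled.

```python
from dataclasses import dataclass, field


@dataclass
class OntologySpecification:
    """Hypothetical phase 1 template; field names are illustrative assumptions."""
    domain: str
    scope: str
    purpose: str
    audience: list[str]
    applications: list[str]
    formality: str
    competency_questions: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of template fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Illustrative values only; the competency question is an invented example.
spec = OntologySpecification(
    domain="human blood",
    scope="blood components and derivatives",
    purpose="support hematology and blood transfusion professionals",
    audience=["hematologists", "blood bank staff"],
    applications=[],
    formality="formal",
    competency_questions=["Which components are part of a portion of blood?"],
)

print(spec.missing_fields())
```

Keeping the competency questions inside the same record makes it easy to revisit them later, when the coverage of the ontology is checked against the specification.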

Figure 1. Phases of the OntoForInfoScience methodology
Phase 2 involves knowledge acquisition. It encompasses the selection of materials about the domain to be studied and the selection of methods to extract knowledge. Within OntoForInfoScience, these activities combine different methods, such as textual analysis of books and papers, automatic term extraction, and semi-automatic methods for concept identification, to mention a few. The intermediary artifacts obtained in phase 2 are glossaries of concepts, of verbs and of relations. All of them are employed in phase 3.
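To give an idea of what automatic term extraction can look like, the sketch below counts word frequencies in a text and returns the most frequent words as candidate terms for the glossary of concepts. It is a deliberately simple stand-in under stated assumptions: the stopword list and the sample sentence are invented for illustration, and real tools apply far richer linguistic analysis.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "is", "in", "to", "are", "by", "from"}


def candidate_terms(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top_n most frequent non-stopword words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


# Invented sample text, used only to exercise the function.
sample = (
    "Plasma is the liquid part of blood. Platelets and red cells are "
    "separated from plasma by centrifugation of whole blood."
)

terms = candidate_terms(sample, top_n=2)
print(terms)
```

The resulting list is only a starting point: each candidate still has to be reviewed by a person before it enters a glossary.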
Phase 3 concerns conceptualization. This involves the identification and analysis of concepts that are candidates to become classes. In addition, developers perform knowledge organization to obtain relations, properties and constraints. In OntoForInfoScience, these processes occur through the transformation of the glossaries (built in phase 2) into conceptual artifacts, for example: i) a concepts and properties table containing the identified concepts, their definitions, synonyms, allowed values and properties; ii) a verbs dictionary containing the identified verbs that are candidates to become relationships, as well as their textual definitions; iii) a graphical conceptual model that represents conceptual relations between concepts using graphs and similar structures.
Phase 4 favors reuse and integration through the activity of ontological grounding. Developers analyze top-level ontologies that can be used as a starting point. Then, they choose the most suitable top-level ontology, considering the philosophical approach that underlies its modeling decisions. From the operational point of view, the top-level ontology is imported into an ontology editor tool for implementation.
In phase 5, the formal representation of the ontology is created using a logical language. Starting from the conceptualization (phase 3), developers produce formal descriptions. The domain knowledge is then organized according to ontological-formal principles, which implies refining the prior conceptual structures in order to meet ontological and logical constraints. The main activities in phase 5 are: i) to construct a general taxonomy based on the selected top-level ontology; ii) to define descriptive properties of classes, which involves describing textual attributes such as names, synonyms, definitions and annotations; iii) to create formal definitions of classes, which are derived from the textual definitions; iv) to define class properties, which involves describing attributes such as data types, cardinalities, and existential and universal quantifiers; v) to create instances of ontological classes; vi) to specify ontological relations by applying a set of rules and principles that transform conceptual relations into formal relations (for example, the constraint of including only relations that connect entities in reality, and the characterization of the relationship types is_a and part_of); vii) to define properties of ontological relationships, which include the relation name, its semi-formal definition, its logical properties, and its domain and range, to mention a few.
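The role of a relation's domain and range can be illustrated with a small sketch. The Python code below assumes a toy is_a taxonomy and a hypothetical signature for part_of, both invented for illustration and not taken from Hemonto or BFO; it checks whether the two classes linked by a relation fall under the declared domain and range.

```python
from dataclasses import dataclass
from typing import Optional

# Toy is_a taxonomy (child -> parent); class names are illustrative only.
IS_A = {
    "portion of plasma": "portion of body substance",
    "portion of blood": "portion of body substance",
    "portion of body substance": "material entity",
    "blood collection": "process",
}


@dataclass(frozen=True)
class Relation:
    """A relation with the descriptive properties named in activity vii."""
    name: str
    domain_cls: str
    range_cls: str
    transitive: bool = False


def is_subclass_of(cls: str, ancestor: str) -> bool:
    """Follow is_a links upward until the ancestor or the root is reached."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = IS_A.get(current)
    return False


def respects_signature(rel: Relation, subject: str, obj: str) -> bool:
    """True when subject falls under the domain and obj under the range."""
    return is_subclass_of(subject, rel.domain_cls) and is_subclass_of(obj, rel.range_cls)


# Assumed signature for part_of, chosen only for this example.
part_of = Relation("part_of", "material entity", "material entity", transitive=True)

ok = respects_signature(part_of, "portion of plasma", "portion of blood")
bad = respects_signature(part_of, "blood collection", "portion of blood")
print(ok, bad)
```

In an ontology editor the same check is carried out by a reasoner; the sketch only makes visible what declaring a domain and a range commits the developer to.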
Phase 6 involves ontology evaluation. OntoForInfoScience applies a set of criteria to perform both ontological validation (the correspondence between the ontology and the real world) and ontological verification (an analysis of whether the ontology was built correctly). Examples of validation criteria are: non-recursivity in definitions, specification of different types of part_of relations, definition of inverse relations, and creation of cardinalities.
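Two of these criteria lend themselves to simple automated checks. The sketch below, in Python, assumes that each definition is reduced to the set of terms it mentions and that each relation records its inverse (or nothing); both representations and the sample data are assumptions made for illustration, not the evaluation procedure of the methodology.

```python
from typing import Optional


def recursive_definitions(definitions: dict[str, set[str]]) -> set[str]:
    """Return terms whose definition refers, directly or indirectly, to itself."""

    def reaches(start: str, target: str, seen: set[str]) -> bool:
        for used in definitions.get(start, set()):
            if used == target:
                return True
            if used not in seen:
                seen.add(used)
                if reaches(used, target, seen):
                    return True
        return False

    return {term for term in definitions if reaches(term, term, set())}


def missing_inverses(inverse_of: dict[str, Optional[str]]) -> list[str]:
    """Return, in alphabetical order, the relations with no declared inverse."""
    return sorted(name for name, inverse in inverse_of.items() if inverse is None)


# Invented sample data: "plasma" and "blood" are defined in terms of each other.
definitions = {"plasma": {"blood"}, "blood": {"plasma", "cell"}, "cell": set()}
relations = {"part_of": "has_part", "contains": None}

circular = recursive_definitions(definitions)
no_inverse = missing_inverses(relations)
print(sorted(circular), no_inverse)
```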
In phase 7, all activities of the life cycle are documented and organized. Documentation is produced along with the ontology construction. Documentation encompasses the specification document (phase 1), reference documents related to the domain (phase 2), set of conceptual models (phase 3), reused ontologies (phases 4 and 5), ontological and formal content (phase 5), and other useful notes.
Finally, in phase 8 the developer makes the ontological artifact available to be downloaded and visualized by a community of users. The steps of this phase are: i) to export the ontology content from the editor to a web repository; ii) to present the ontology in a graphical format, which can help users understand its structure.

OntoForInfoScience in Hemonto construction
With the aim of both obtaining a formal representation of human blood components and testing OntoForInfoScience, we developed Hemonto following the phases established by OntoForInfoScience. In this section, we describe the Hemonto construction as a test bed for the new methodology.
In OntoForInfoScience, phase 0 consists of evaluating the real need for an ontology to represent a domain. Three main aspects justified the construction of Hemonto: i) the lack of formal representations in the blood domain; ii) the need for a single, unambiguous, specialized vocabulary; and iii) the need for this vocabulary to reflect reality.
The next phase, phase 1, is the phase of specification. The relevant elements defined in this phase concerned the purpose of the ontology: Hemonto is an instrument to support professionals of hematology and blood transfusion in their clinical practice. This support aims to provide means of creating interoperable information systems, generating a digital knowledge repository about products derived from blood, describing processes employed in extracting, manipulating and storing blood components.
In addition to this general purpose, in phase 1 we also delimited the scope of Hemonto: it begins with both BFO and FMA top-level classes. Then we defined blood components and derivatives by employing a middle-out strategy. The limit of the ontology coverage was defined by terms related to proteins and enzymes, which are part of other biomedical ontologies. Finally, we referenced these other biomedical ontologies, which do not belong to the blood domain, such as the Gene Ontology, the Cell Ontology, Chemical Entities of Biological Interest, and so forth. Finding both the suitable scope and the suitable granularity of the ontology is an issue that always poses difficulties in ontology construction.
In phase 2, knowledge acquisition, we consulted technical documents related to the blood domain in order to extract knowledge. Examples of these documents are: i) the guidelines on hemocomponents from the Brazilian Ministry of Health; ii) the terminological standard ISBT 128 (ICCBBA 2010); and iii) a well-known textbook on clinical hematology called Wintrobe's Clinical Hematology (Greer et al. 2009). These documents were indicated as relevant references by hematology and blood transfusion experts. With respect to the activity of knowledge extraction, we employed both automatic and manual methods: i) Sketch Engine (2), a linguistic analysis tool used to identify and extract the most frequent terms found in documents; ii) a collaborative framework called Conceptualization Modeling Environment (ConceptME) (Sousa, Pereira and Soares 2013), which semi-automatically extracts concepts and relations; iii) manual analysis of texts to identify candidate terms and relationships.
In phase 3, conceptualization, the glossary of concepts created in phase 2 was first turned into a dictionary of concepts, and then assumed the form of a table of concepts and properties. This table comprises elements at the conceptual level, namely: identified concepts, their definitions, synonyms, allowed values and restrictions. Also in this phase, the glossary of verbs was turned into a dictionary of verbs, which contains the identified verbs and their textual definitions; the glossary of relationships was turned into a set of conceptual relations. All of these tasks were carried out together with experts in hematology and blood transfusion. Finally, the outcome of phase 3 was used to create graphical conceptual models. For this task, we employed tools such as ConceptME (3), DiagramEditor (4) and OmniGraffle (5).
In phase 4, ontological grounding, we followed the tenets of two top-level ontologies: i) Basic Formal Ontology (BFO) (Grenon and Smith 2004), which acted as the starting point for Hemonto by providing most of the general classes; ii) Relation Ontology (RO) (Smith et al. 2005), which provides the meaning of ontological relationships between classes. Both ontologies are designed according to the principles of what has been called "scientific ontological realism" (Smith and Ceusters 2010). To model Hemonto, we applied the top-level ontologies by importing their code into the ontology editor and maintaining their formal principles.
The formalization of the ontological content of Hemonto, the activity planned for phase 5, was performed following these steps: i) construction of the general taxonomy of the ontology from BFO entities, by classifying blood-related entities under more general entities; ii) definition of descriptive properties of classes and the equivalent textual attributes, such as ID, label, imported_from, has_Synonym, definition, comments, etc.; iii) creation of formal definitions of classes in logical language (for example, the class "portion of plasma" in Figure 2); iv) definition of class properties to represent the characteristics of blood components, for example: proper volume, percentage of hematocrits, temperature of storage, to mention a few; v) creation of class instances from a set of rules that involves the identification of instances through competency questions; vi) specification of ontological relationships from a set of rules and ontological principles, with the aim of turning conceptual relations into ontological-formal relationships, for example: kinds of is_a and kinds of part_of relations; vii) definition of descriptive and logical properties of relationships from the equivalent attributes in Protégé (label, definition, characteristics, inverse_of, domains, ranges, etc.).

Portion of Plasma: formal definition
Portion of Plasma is_a portion of body substance and (has_quality liquid) and (part_of portion of blood) and (contains blood cell)

Figure 2. Example of class definition in the ontology
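The definition in Figure 2 has a regular shape: one parent class followed by a conjunction of relation restrictions. As a minimal sketch, assuming hypothetical Python class names (Restriction, ClassDefinition) that are not part of Hemonto or Protégé, the same definition can be stored as structured data and rendered back into the textual form shown in the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restriction:
    """One parenthesized condition: a relation and the class it points to."""
    relation: str
    filler: str


@dataclass(frozen=True)
class ClassDefinition:
    """A parent class plus a conjunction of restrictions."""
    name: str
    parent: str
    restrictions: tuple[Restriction, ...]

    def render(self) -> str:
        """Render the definition in the textual style used in Figure 2."""
        parts = [self.parent] + [f"({r.relation} {r.filler})" for r in self.restrictions]
        return f"{self.name} is_a " + " and ".join(parts)


# Content copied from Figure 2; only the data structure is an assumption.
plasma = ClassDefinition(
    name="Portion of Plasma",
    parent="portion of body substance",
    restrictions=(
        Restriction("has_quality", "liquid"),
        Restriction("part_of", "portion of blood"),
        Restriction("contains", "blood cell"),
    ),
)

print(plasma.render())
```

Storing the parent and the restrictions separately makes it straightforward to list, for instance, every class that is defined through part_of.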
In phase 6, we evaluated the ontological content of Hemonto according to a set of criteria defined for validation and verification. These criteria cover aspects related to ontological commitments, ontology specification, expansibility, correctness, integrity, consistency, accuracy, validation by experts and documentation. The Hemonto documentation was created during the process of ontology development and finished in phase 7 with a final document.
Finally, in phase 8, we exported the ontology from Protégé to an OWL/XML format, uploaded it to a repository and made it available. Regarding availability on the web, we used a Protégé plug-in called OWLDoc, which allows the creation of a general view of the ontology.
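For readers unfamiliar with OWL/XML, the sketch below builds a very small document of that kind with Python's standard library. It is only an approximation of what an editor export contains: the ontology IRI and the class names are hypothetical, only class declarations and subclass axioms are written, and details that a real export would include (such as a base IRI and annotations) are left out.

```python
import xml.etree.ElementTree as ET

OWL_NS = "http://www.w3.org/2002/07/owl#"


def to_owl_xml(ontology_iri: str, subclass_axioms: list[tuple[str, str]]) -> str:
    """Serialize (child, parent) pairs as OWL/XML declarations and SubClassOf axioms."""
    ET.register_namespace("", OWL_NS)
    root = ET.Element(f"{{{OWL_NS}}}Ontology", {"ontologyIRI": ontology_iri})
    # Declare every class mentioned in the axioms, in alphabetical order.
    for cls in sorted({c for pair in subclass_axioms for c in pair}):
        declaration = ET.SubElement(root, f"{{{OWL_NS}}}Declaration")
        ET.SubElement(declaration, f"{{{OWL_NS}}}Class", {"IRI": f"#{cls}"})
    # Write one SubClassOf axiom per pair: child first, then parent.
    for child, parent in subclass_axioms:
        axiom = ET.SubElement(root, f"{{{OWL_NS}}}SubClassOf")
        ET.SubElement(axiom, f"{{{OWL_NS}}}Class", {"IRI": f"#{child}"})
        ET.SubElement(axiom, f"{{{OWL_NS}}}Class", {"IRI": f"#{parent}"})
    return ET.tostring(root, encoding="unicode")


# Hypothetical IRI and class identifiers, used only for illustration.
xml_text = to_owl_xml(
    "http://example.org/hemonto-sketch",
    [("PortionOfPlasma", "PortionOfBodySubstance")],
)
print(xml_text)
```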

Hemonto content
The application of OntoForInfoScience in the construction of Hemonto generated a formal and expressive representation of human blood components, to be used in healthcare and medical attendance.
In this section, we describe the part of Hemonto that was developed with the aim of creating and testing OntoForInfoScience.
Hemonto is a biomedical domain ontology, which in the current version has 113 terms and 54 relations. Following BFO's nomenclature (Grenon and Smith 2004), there are 113 continuants and 41 occurrents. The origin of the classes and relationships is presented in Table 2.

Table 2. Origin of the classes and relationships of Hemonto (columns: Origin, Class, Relationships)
The classes and relationships of Hemonto, presented in Table 2, are described in the class dictionary and relation dictionary, which were produced along with the Hemonto development. Both instruments included formalisms and logical axioms for entities defined in the ontology. Table 3 shows examples of those entities extracted from Hemonto's dictionaries.

Table 3. Formalisms and axioms of Hemonto entities
In addition to these dictionaries, the content of Hemonto encompasses structures for graphical representation depicting the relation between entities. Those structures are represented through taxonomies, partonomies or a hybrid structure, which involves more than one relation. Taxonomies, partonomies and other structures were developed in the conceptualization and formalization phases.
The structures defined in the conceptualization phase include: i) a blood partonomy; ii) a leukocytes taxonomy; iii) a partonomy of products derived from blood; iv) a partonomy of basic processes to obtain blood samples; v) a taxonomy of pathological blood cells; to mention a few. The structures defined in the formalization phase comprise: i) a continuants taxonomy; ii) an occurrents taxonomy; iii) an anticoagulants taxonomy; iv) a taxonomy of basic parameters of blood components; v) a representation of the venous puncture process; vi) a representation of the processes of obtaining red blood cell and platelet components; vii) a representation of the processes of obtaining plasmatic components; viii) a representation of the process of obtaining concentrated granulocyte components; to mention a few.
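A taxonomy or a partonomy of this kind can be stored as a list of triples and displayed as an indented outline. The sketch below, in Python, uses a small leukocyte fragment chosen for illustration; it is not the content of Hemonto, and the function name outline is an assumption. The same function works for a partonomy if part_of is passed as the relation.

```python
from collections import defaultdict

# (child, relation, parent) triples; an illustrative fragment, not Hemonto content.
TRIPLES = [
    ("granulocyte", "is_a", "leukocyte"),
    ("lymphocyte", "is_a", "leukocyte"),
    ("neutrophil", "is_a", "granulocyte"),
    ("eosinophil", "is_a", "granulocyte"),
]


def outline(root: str, relation: str, triples: list[tuple[str, str, str]]) -> list[str]:
    """Return the hierarchy under root as indented lines, following one relation."""
    children = defaultdict(list)
    for child, rel, parent in triples:
        if rel == relation:
            children[parent].append(child)

    lines: list[str] = []

    def walk(node: str, depth: int) -> None:
        lines.append("  " * depth + node)
        for child in sorted(children[node]):
            walk(child, depth + 1)

    walk(root, 0)
    return lines


tree = outline("leukocyte", "is_a", TRIPLES)
print("\n".join(tree))
```

Because the relation is a parameter, a hybrid structure can be inspected one relation at a time from the same set of triples.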

Final Remarks
One of the most striking difficulties that professionals and researchers of Information Science face in developing ontologies is the presence of technical terms from Computer Science, Logic and Philosophy. This paper presented OntoForInfoScience, a methodology that aims to provide more details about phases of ontology development. We intended to contribute to the expansion of ontology development beyond the expertise of knowledge engineers. In order to create and test OntoForInfoScience, we developed part of an ontology in the hematology and blood transfusion domain.
The use of OntoForInfoScience in the construction of ontologies proved to be useful with respect to the purpose of obtaining a proper formal representation. Indeed, the methodology uses ordinary and simple language, which facilitates the understanding of the activities performed in the construction of ontologies while maintaining the high expressivity of logical languages. Some features of OntoForInfoScience that are useful for ontology development are: i) the explanation of the meaning of logical and descriptive properties of classes and relations; ii) the use of top-level ontologies as a starting point for ontology development; iii) the provision of details about the process of creating formal and textual definitions; iv) the explanation of the meaning of ontological relations and of how the kinds of ontological relations can be characterized; v) the promotion of collaboration as a key aspect of the conceptualization phase; vi) the explanation of how to find terms related to the domain and how to import them into the ontology. Considering its features and practical applicability, OntoForInfoScience proved to be useful for the purpose of building Hemonto. Owing to space limitations, the most important characteristic of OntoForInfoScience, namely the provision of details about each activity, could not be properly described here. For complete coverage of the new methodology, see Mendonça (2015).
Considering the lack of formal representations in the domain of blood and its components, we believe that our research on Hemonto is fully justified. The ontology contributes to the description of blood components and to a better understanding of how these components are extracted, manipulated and stored for healthcare purposes. Hemonto was developed using OntoForInfoScience in order to generate a formal representation of the blood domain. Accordingly, one of the goals of this paper was to show how the process of ontological development occurred in Hemonto.
Regarding the Hemonto content, it is noteworthy that the developed ontology presents a more comprehensive formal representation of the blood domain than other currently available resources, such as FMA, GALEN, UMLS and GO, to mention a few. Even so, Hemonto is an ongoing project, since the blood domain is large and complex. We are aware that ontology development is an interactive process, and that Hemonto needs additional validation in different communities composed of specialists, physicians, and other healthcare professionals.