Knowledge Graph Application in Education: a Literature Review

In modern and developing economic systems, Knowledge Management (KM) is considered to be one of the most important activities of almost any organisation. The KM process at universities includes didactic processes, among which we distinguish the process of individualisation of education. Such a process requires a large amount of information to be processed both by university workers and students. This paper suggests that Knowledge Graphs are a technology that facilitates and enhances KM processes at universities and provides an extended analysis of a Knowledge Graph phenomenon.


Introduction
The importance of Knowledge Management (KM) technologies has been growing swiftly in recent years, along with the volumes of data processed by organisations. Modern KM approaches often suggest new data organisation architectures and promote their implementation in organisations. One such architecture leverages semantic technologies, i.e. the technologies the Semantic Web¹ is based on, in order to allow machines to understand the meaning of the data they work with (Galkin et al., 2017). One formal model for the embodiment of semantic technologies is the Knowledge Graph (KG).
In general, graph-based representations can be used to solve a wide variety of problems in diverse fields, such as: large network systems, semantic search and knowledge discovery, natural language processing, cyber security, social networks, chemical compounds, etc. (Velampalli, Jonnalagedda, 2017). Compared to other knowledge-oriented information systems, the distinctive features of Knowledge Graphs lie in their special combination of knowledge representation structures, information management processes and search algorithms (Gomez-Perez et al., 2017).
The objective of the paper is twofold. Firstly, the author analyses the phenomenon of the Knowledge Graph and the process of its construction. Secondly, the author considers a KG to be an efficient tool for assisting and facilitating the individualisation of education at universities. By individualisation the author means giving students the possibility of selecting the courses they would like to study based on their personal preferences and, therefore, of forming an individual set of courses necessary for their future career. In accordance with these two objectives, the paper is divided into 4 major parts (hereafter P.), followed by conclusions:
1. P. 1 generally describes the phenomenon of a Knowledge Graph and a Knowledge Schema.
2. P. 2 contains a derived aggregated definition of a KG. This part also includes a set of research questions on KGs that require further study.
3. P. 3 is dedicated to one of the debatable issues connected with the topic: Knowledge Graph embedding. Some related works are discussed.
4. P. 4 presents an example of a Knowledge Schema constructed for a university didactic process.
5. Conclusions are drawn on the application and construction of a KG.

¹ The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries (Wikipedia).

Knowledge Graph phenomenon
The phenomenon of a Knowledge Graph first became known worldwide in 2012, when Google (Singhal, 2012) started to use such a Graph in its search engine, allowing users to search for things, people or places (Gomez-Perez et al., 2017). Inspired by Google, the world's leading information companies are developing Knowledge Graphs of their own. One example is DBpedia (wiki.dbpedia.org/about), an open knowledge graph available to anyone on the Web. According to the statistics (DBpedia, 2017), the latest DBpedia release (2016) consists of 13 billion pieces of information (RDF triples), of which 1.7 billion were extracted from the English edition of Wikipedia, 6.6 billion from other language editions and 4.8 billion from Wikipedia Commons (2019) and Wikidata (2018).
A Knowledge Graph is a model of the interrelations between information entities. It is a database which stores knowledge in accordance with a particular Knowledge Schema (KS). Such a Knowledge Schema, in turn: 1) constructs the meta-layer of a KG and provides its internal structure; 2) defines classes as abstract containers for similar types of entities; 3) contains a set of potentially available describing elements and potential relationships between classes; 4) serves as a reference point for integrating new data or constructing new queries; 5) contains only structural information; 6) does not contain data about real units of the chosen domain of knowledge; 7) can be considered as a visual representation of a Knowledge Graph.
One concept for storing information in a Schema that has been widely used recently is publishing and interlinking data on the Web in relational form, using the Resource Description Framework (RDF; W3C, 2014) developed by the World Wide Web Consortium (W3C, 2019).
In accordance with the RDF standard, information is represented in so-called "triples", or "triplets" (subject-predicate-object), where the two entities (subject and object, also called "head" and "tail" entities - Wu et al., 2017) are related to each other by a predicate. Each triple indicates a particular fact, showing interrelations of two selected entities.
To build a Knowledge Schema (and then a Knowledge Graph), the triples should be combined together into an actual multi-graph which will have entities as nodes, relations as edges and predicates as edge labels for each particular relation.
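The combination of triples into a multi-graph can be sketched in a few lines of Python. The sketch is illustrative only: the triples, the entity names and the helper `build_multigraph` are assumptions made for this example and are not taken from the reviewed literature.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; all names are invented.
triples = [
    ("Course_A", "taughtBy", "Lecturer_X"),
    ("Course_A", "partOf", "Programme_P"),
    ("Course_B", "partOf", "Programme_P"),
    ("Course_A", "recommendedBy", "Lecturer_X"),  # second edge between the same pair
]


def build_multigraph(triples):
    """Return (nodes, edges); edges maps (head, tail) to a list of predicate labels."""
    nodes = set()
    edges = defaultdict(list)
    for head, predicate, tail in triples:
        nodes.update((head, tail))             # entities become nodes
        edges[(head, tail)].append(predicate)  # each predicate labels one edge
    return nodes, dict(edges)


nodes, edges = build_multigraph(triples)
print(sorted(nodes))
print(edges[("Course_A", "Lecturer_X")])
```

Keeping a list of predicates per node pair is what makes the structure a multi-graph: the same two entities may be connected by several differently labelled edges.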

Definition of a Knowledge Graph
Presently, a variety of definitions for Knowledge Graphs can be found in the literature. Table 1 summarises the definitions most important for this work, on the basis of which the author has formulated the following integrated KG definition: A Knowledge Graph is a knowledge base that 1) replicates the model of information flow in an organisation, 2) stores complex structured and unstructured knowledge, 3) is presented in the form of entities and relations between them, 4) covers a multitude of topical domains, 5) acquires and integrates knowledge, and 6) enables interrelation of arbitrary entities.

Table 1

| Author(s) | Definition |
|---|---|
| | KGs could be envisaged as a network of all kinds of things which are relevant to a specific domain or to an organisation. They are not limited to abstract concepts and relations but can also contain instances of things such as documents and datasets. |
| Masuch, Muszynski, Raethlein, 2014 | An [enterprise] KG is a disruptive platform that combines emerging Big Data and graph technologies to reinvent knowledge management inside organisations […] replicates the unique network of an organisation with most of its relevant entities into a graph database. |
| Wang et al., 2014 | A KG is a multi-relational graph composed of entities as nodes and relations as different types of edges. |
| Xu et al., 2014 | KG is a kind of knowledge base […] used to store complex structured and unstructured knowledge. It is usually in the form of a directed or undirected graph that leverages vertices and edges to represent entities […] and their relationships, respectively. |
| Lin et al., 2015 | KGs encode structured information of entities and their rich relations. |
| Nickel et al., 2015 | Knowledge graphs model information in the form of entities and relationships between them. |
| Pujara et al., 2015 | KGs are structured knowledge bases where nodes represent entities and edges represent relationships between these entities. |
| Paulheim, 2016 | A KG (i) mainly describes real world entities and their interrelations organised in a graph, (ii) defines possible classes and relations of entities in a schema, (iii) allows for potentially interrelating arbitrary entities with each other and (iv) covers various topical domains. |
| Socher, 2013 | A knowledge graph consists of a set of interconnected typed entities and their attributes […] is a knowledge representation technique […] has an ontology as its schema defining the vocabulary used in the knowledge graph. |
| Wu et al., 2017 | KGs constitute a new knowledge representation method and data management model […] useful for natural language processing, question answering, information retrieval […]. In the KG, knowledge is represented by a directed graph where the nodes represent entities or concepts, while the directed edges represent the relations. |
It can be stated that the phenomenon of a Knowledge Graph has been studied in considerable detail and is applied in different branches of science. Yet recent research has revealed some controversial issues (the so-called "bottlenecks") in KG construction and application that may require specification and finalisation. The issues that the author considers most valuable for further discussion are presented in Table 2.

Table 2

| Author(s) | Research question(s) |
|---|---|
| Socher et al., 2013 | Prediction of the likely truth of additional facts based on existing facts in the knowledge base. |
| Nickel et al., 2015 | a) Link prediction in KGs: probability of existence or correctness of relations in a graph. b) Entity resolution (object identification): identification of an object that actually refers to a particular entity (especially in the case of objects that have similar names). |
| Pirrò, 2015 | a) Relatedness in KGs: how relations between entities are suggested. b) KG's querying (search) capabilities. |
| Wang et al., 2014; Xu et al., 2014; Cui et al., 2015; Pujara et al., 2015 | Approaches to knowledge embedding: filling a KG with the necessary data. |
| Tian et al., 2016; Wu et al., 2017 | Heterogeneity of subjects and objects: the fact that subjects are usually more concrete, while object entities are more abstract. |
| Chowdhury et al., 2017 | Speed and accuracy of large graphs search. |
| Gomez-Perez et al., 2017 | a) Consistency of entities in a KG. b) KG correctness. c) Scalability of the service based on a KG. |
| Jetschni, Meister, 2017 | Creation of a knowledge schema which represents the structure of knowledge in the KG. |
The sample of research questions (Table 2) allows us to draw a conclusion that one of the major aspects of Knowledge Graph construction which requires further (re)work and improvement is Knowledge Graph embedding. In the studied sample of research, four authors have made knowledge embedding improvement their primary objective.

Knowledge Graph embedding
When talking about embedding a KG, we mean the process of inserting components of a Graph (entities and their relations) into vector spaces² which are supposed to simplify further data manipulations.
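To make the idea of placing entities and relations in a vector space more tangible, the following minimal sketch uses the translation principle behind TransE, one of the models named in the timeline in this section: a triple (head, relation, tail) is treated as plausible when head + relation lies close to tail. The three-dimensional vectors are hand-picked assumptions for illustration, not learned embeddings, and the function name `transe_distance` is hypothetical.

```python
import math

# Hypothetical embeddings, chosen by hand for illustration (not trained).
entity_vec = {
    "Course_A": [0.9, 0.1, 0.0],
    "Lecturer_X": [1.0, 0.6, 0.0],
    "Programme_P": [0.0, 0.2, 0.9],
}
relation_vec = {
    "taughtBy": [0.1, 0.5, 0.0],
}


def transe_distance(head, relation, tail):
    """Euclidean norm of head + relation - tail; smaller means more plausible."""
    h = entity_vec[head]
    r = relation_vec[relation]
    t = entity_vec[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


plausible = transe_distance("Course_A", "taughtBy", "Lecturer_X")
implausible = transe_distance("Course_A", "taughtBy", "Programme_P")
print(plausible < implausible)
```

In a real system the vectors would be learned from the triples of the Graph; here they are fixed so that the distance computation itself is easy to follow.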
The issue of Knowledge Graph embedding has been widely studied in recent years. The timeline of research on this topic can be presented as follows:
1. In 2011, in the work (Bordes et al., 2011), the authors suggested a flexible and compact embedding model based on a neural network.
2. The same year, in (Nickel, Tresp, Kriegel, 2011), the authors presented the RESCAL model, which performed collective learning on multi-relational data.
3. In 2013, in (Mikolov et al., 2013a; 2013b), the word representation model (Skip-gram model), which allows large amounts of unstructured text data to be processed, was introduced.
4. In 2014, two studies were based on the idea of (Mikolov et al., 2013a; 2013b):
- Neural networks form the basis for the work of (Xu et al., 2014), where relational and categorical knowledge is modelled as regularisation³ functions which are further incorporated into the Skip-gram representation model.
- The authors of (Yu, Dredze, 2014) proposed a new learning objective that incorporated both a neural language model and prior knowledge from semantic resources to learn improved lexical semantic embedding.
5. Also in 2014, a mechanism was suggested for embedding both entities and words into the same vector space: applying a knowledge model to embed fact triplets, a word model for pairs of words, and an alignment model to bring the two models together.
6. In 2014, the work (Chang et al., 2014) presented TRESCAL, an improved (based on Nickel et al., 2011) knowledge base embedding model for relation extraction.
7. In 2017, the work (Chang et al., 2014) inspired a study whose authors suggested improvements to the previously developed embedding models: TransE (Bordes et al., 2013), TransH (2014), TransR and TransD (Lin et al., 2015; Ji et al., 2015). The TransC model predefines the sets of head and tail entities for each relation to ensure that the constructed triplets are consistent with the semantic relation type.
8. The same year, another version of the Trans model, TransT, was suggested. Being a type-based multiple embedding model, TransT fully utilises the entity type information which represents the categories of entities in most KGs.

Despite the variety of suggested solutions, Knowledge Graph embedding remains a pressing topic for research, as embedding models are still considered imperfect and require enhancement. It can be expected that existing research will inspire many more works on this issue.

² A vector space is a collection of objects called vectors which may be added together and multiplied by numbers (Wikipedia).
³ Regularisation is a process of introducing additional information in order to solve an ill-posed problem (Wikipedia).

Knowledge Graph in education
It should be highlighted that each of the above-mentioned embedding models refers, to a certain extent, to the Knowledge Schema as the basic structure of any Knowledge Graph. That is why, in order to visualise the initial step of KG creation and the application of the RDF structure, the author uses an example from the educational sphere. Figure 1 illustrates the Knowledge Schema built to describe the concept of a university didactic course. The application of the Graph developed on the basis of this particular Schema is supposed to be a step towards individualisation of education at universities, enabling students to make a conscious, thoughtful choice of the courses they would like to study (it should be especially efficient for free-choice courses).
Thus, the Schema should contain all the possible data about courses (e.g.: assessment terms, ECTS points, the main topics and objectives, etc.) and enable the Graph to provide brief and valuable search results to students.
The Course Concept KS was built using the Cmap Tools software. When creating triples for any piece of information (as well as for the whole database of a KG), it is necessary to have a defined vocabulary of entities and relations. It is recommended to use standard vocabularies, since they have already solved certain problems over many iterations and will be advantageous in integration with other systems and data sources. Where no suitable vocabularies are found, new elements must be defined, which should, if possible, be set in a semantic relation to elements from the existing standard vocabularies (Jetschni, Meister, 2017). Following this rule, the vocabularies from Schema.org were applied to build the Schema (Figure 1). In addition, Table 3 contains a sample of triples extracted from the Course Concept KS.

[Table 3. Source: own elaboration based on the idea of Nickel, Tresp, Kriegel, 2015]

It is also necessary to add that all the knowledge in a KG can be divided into two forms (Xu et al., 2014):
1) relational knowledge (e.g. partOf, hasType) encodes the relationship between entities so as to differentiate word pairs with analogy relationships;
2) categorical knowledge (e.g. gender, location, profession) encodes the attributes and properties of entities according to which similar words can be grouped into meaningful categories.
Table 4 represents the knowledge extracted from the Course Concept KS (in Figure 1) and divided into groups of the two above-mentioned forms.

[Table 4. Source: Xu et al., 2014]

When developing an actual Knowledge Graph for practical purposes, the author recommends taking into consideration the following procedure:
1. The vocabulary for the whole Graph should be formed on the basis of existing vocabularies and, if necessary, some specific terms should be added.
2. A draft structure of RDF sets (triplets) should be built in order to obtain a clear picture of all the entities and their relations in the Graph. Table 3 can be a version of such a draft structure.
3. A Knowledge Schema containing all the triplets should be constructed (similar to the one in Figure 1). It will serve as the basis for the future KG.
4. The Knowledge Graph should be developed with the help of an information system and presented, for instance, as an application or a web page.
The construction of such a Graph for the purposes of a university didactic process is the major objective of the author's further research.
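The four steps of the procedure can be sketched, in a heavily simplified form, as follows. Since Figure 1 and Table 3 are not reproduced here, everything concrete in the sketch is an assumption: the two courses and their values are invented, the property names are assumed to match the Schema.org Course vocabulary, and the assignment of properties to the relational and categorical forms is this sketch's own reading of the distinction made by Xu et al. (2014).

```python
# Step 1: vocabulary. Property names are assumed to exist in the Schema.org
# Course vocabulary; the relational/categorical split is this sketch's choice.
RELATIONAL = {"schema:coursePrerequisites", "schema:provider"}
CATEGORICAL = {"schema:name", "schema:courseCode", "schema:numberOfCredits"}

# Step 2: draft triples for two hypothetical courses (all values are invented).
triples = [
    ("course:KM101", "schema:name", "Knowledge Management Basics"),
    ("course:KM101", "schema:courseCode", "KM101"),
    ("course:KM101", "schema:numberOfCredits", 5),
    ("course:KM101", "schema:provider", "org:ExampleUniversity"),
    ("course:KG201", "schema:name", "Knowledge Graphs"),
    ("course:KG201", "schema:courseCode", "KG201"),
    ("course:KG201", "schema:numberOfCredits", 3),
    ("course:KG201", "schema:coursePrerequisites", "course:KM101"),
    ("course:KG201", "schema:provider", "org:ExampleUniversity"),
]

# Step 3: check every triple against the vocabulary before it enters the Graph.
unknown = [t for t in triples if t[1] not in RELATIONAL | CATEGORICAL]

# Divide the knowledge into the two forms.
relational = [t for t in triples if t[1] in RELATIONAL]
categorical = [t for t in triples if t[1] in CATEGORICAL]


# Step 4 (greatly simplified): a search a student might run over the Graph.
def courses_with_max_credits(triples, limit):
    """Return identifiers of courses whose credit value does not exceed limit."""
    return sorted(
        s for s, p, o in triples
        if p == "schema:numberOfCredits" and o <= limit
    )


print(unknown)
print(len(relational), len(categorical))
print(courses_with_max_credits(triples, 4))
```

A production Graph would normally rely on an RDF store and a query language rather than Python lists; the plain structures are used only to keep the example self-contained.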

Conclusions
Knowledge Management is considered to be one of the key activities of almost any organisation in modern economies. That is why the importance of technologies assisting knowledge management processes has been growing quickly. One such technology, or rather tool, is a Knowledge Graph. A Knowledge Graph is a database which stores complex structured and unstructured knowledge of an organisation in the form of entities and relations between them (called "triplets"), covers a multitude of topical domains and replicates the model of the information flow in an organisation. The knowledge in such a Graph is stored in accordance with a particular model, a Knowledge Schema, which is a meta-layer of a Knowledge Graph (and also its visual representation), provides its internal structure, defines classes as abstract containers for similar types of entities, and serves as a reference point for integrating new data or constructing new queries.
The Knowledge Graph phenomenon has been studied in considerable detail and is applied in different branches of science, yet recent research has revealed so-called "bottlenecks" in its construction and application. The major aspect of Graph construction that requires further improvement is Knowledge Graph embedding: the process of inserting components of a Graph (entities and their relations) into vector spaces which are supposed to simplify further data manipulations.
As an example of Knowledge Graph initial embedding, the Knowledge Schema describing a concept of a university didactic course was built. The Graph, developed on the basis of this particular Schema, is supposed to be a step towards individualisation of education at universities, which means giving students the possibility of selecting the courses they would like to study based on their personal preferences and their future career plans. The presented Schema contains some of the possible data about courses (e.g.: assessment terms, ECTS points, the main topics and objectives, etc.) and should enable the Graph to provide brief and valuable search results to students.
The major objective of the author's further research is to construct a Knowledge Graph for the purpose of university education individualisation, applying some information technologies to present the Graph, for instance, as an application or a web-page.