ABSTRACT. Big Data technologies have made significant progress in addressing problems related to the volume and velocity of data, but they are less effective at dealing with data variety and heterogeneity; this so-called “variety challenge” is the main barrier to effective data access in many industry applications. Semantic Technologies offer a potential solution to the variety challenge, and in the Ontology Based Data Access (OBDA) approach they do so in a way that layers on top of existing infrastructure and exploits its scalability. In this talk I will explain the OBDA approach, and show how it is being used to address the variety challenge in two large companies: Siemens and Statoil. I will also highlight some of the problems and limitations of OBDA, discuss how these can be mitigated, and present some recent research that shows how semantic data access can go beyond what is possible with OBDA.
ABSTRACT. Billions of short texts are produced every day, in the form of search queries, ad keywords, tags, tweets, messenger conversations, social network posts, etc. Unlike documents, short texts have some unique characteristics which make them difficult to handle. First, short texts, especially search queries, do not always observe the syntax of a written language, which means traditional NLP techniques, such as syntactic parsing, do not always apply. Second, short texts contain limited context: the majority of search queries contain fewer than 5 words, and tweets can have no more than 140 characters. For these reasons, short texts give rise to a significant amount of ambiguity, which makes them extremely difficult to handle. On the other hand, many applications, including search engines, online advertising, automatic question answering, and recommendation systems, rely on short text understanding. In this talk, I will go over various techniques in knowledge acquisition, representation, and inferencing that have been proposed for text understanding, and will describe the massive structured and semi-structured data made available in the recent decade that directly or indirectly encode human knowledge, turning knowledge representation into a computational grand challenge with feasible solutions in sight.
A Joint Embedding Method for Entity Alignment of Knowledge Bases (Full Paper)
SPEAKER: unknown
ABSTRACT. We propose a model which jointly learns the embeddings of multiple knowledge bases (KBs) in a uniform vector space in order to align entities across KBs. Instead of relying on content-similarity-based methods, we argue that the structural information of KBs is also important for KB alignment: in cross-lingual settings, or when the KBs use different encodings, the structure of the two KBs is all we can leverage. We utilize seed entity alignments whose embeddings are constrained to be identical during the joint learning process. We perform experiments on two datasets: a subset of Freebase comprising 15 thousand selected entities, and a dataset we construct from two real-world large-scale KBs, Freebase and DBpedia. The results show that the proposed approach, which utilizes only the structural information of the KBs, also works well.
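A minimal sketch of the seed-alignment idea, assuming a TransE-style score ||h + r - t|| over toy two-dimensional vectors; all entity names and numbers are illustrative, not the paper's data:

```python
import math

def transe_score(h, r, t):
    # Plausibility of triple (h, r, t): ||h + r - t||, lower is better.
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Hypothetical toy embeddings for two KBs (illustrative values only).
kb1 = {"Berlin": [0.1, 0.2], "Germany": [0.4, 0.6], "capitalOf": [0.3, 0.4]}
kb2 = {"Berlin_de": [0.9, 0.9], "Deutschland": [0.4, 0.6]}

# Seed alignment: aligned entities are forced to share one embedding,
# which is what ties the two KBs into a uniform vector space.
for e1, e2 in [("Berlin", "Berlin_de")]:
    kb2[e2] = kb1[e1]

# After sharing, a KB1 triple scores the same with KB2's aligned entity.
same = transe_score(kb2["Berlin_de"], kb1["capitalOf"], kb1["Germany"])
```

Once seed pairs share vectors, unaligned entities from both KBs that play similar structural roles end up close in the joint space and can be matched by nearest-neighbour search.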
A Multi-dimension Weighted Graph-based Path Planning with Avoiding Hotspots (Full Paper)
SPEAKER: unknown
ABSTRACT. With the rapid development of industrialization, vehicles have become an important part of people's lives. However, with a large population, transportation systems around the world are becoming more and more complicated, and a core problem in transportation is how to avoid hotspots. Current path planning systems, because they plan paths over a one-dimensional weighted graph, cannot always describe hotspots precisely. In this paper, we present a graph model based on a multi-dimensional weighted graph for path planning that avoids hotspots. Firstly, we extend the one-dimensional weighted graph to a multi-dimensional weighted graph, where multi-dimensional weights characterize more features of transportation. Secondly, we develop a framework equipped with aggregate functions for transforming multi-dimensional weighted graphs into one-dimensional weighted graphs, thereby reducing path planning over multi-dimensional weighted graphs to the shortest path problem over one-dimensional weighted graphs. Finally, we implement the proposed framework and evaluate our system on some practical examples. The experiments show that our approach can provide "optimal" paths while avoiding hotspots.
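The reduction described above, from multi-dimensional weights to an ordinary shortest-path problem, can be sketched as follows. The (distance, hotspot level) weight layout and the penalty coefficient are illustrative assumptions, not the paper's actual aggregate functions:

```python
import heapq

def aggregate(w, alpha=5.0):
    # Collapse a multi-dimensional weight (distance, hotspot_level)
    # into one scalar; alpha is an assumed hotspot penalty coefficient.
    distance, hotspot = w
    return distance + alpha * hotspot

def shortest_path(graph, src, dst, agg=aggregate):
    # Ordinary Dijkstra on the aggregated one-dimensional weights.
    pq = [(0.0, src, [src])]
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, w in graph.get(node, []):
            if nbr not in seen:
                heapq.heappush(pq, (cost + agg(w), nbr, path + [nbr]))
    return float("inf"), []

# Toy road network: edge weights are (distance, hotspot_level).
roads = {
    "A": [("B", (1.0, 1.0)), ("C", (2.0, 0.0))],
    "C": [("B", (1.0, 0.0))],
}
cost, path = shortest_path(roads, "A", "B")
# The direct road A->B is shorter but congested, so the planner detours via C.
```

Swapping in a different aggregate function (e.g. a max over dimensions, or time-dependent penalties) changes the planner's trade-off without touching the search algorithm.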
Position Paper: The Unreliability of Language - A Common Issue for Knowledge Engineering and Buddhism (Short Paper)
SPEAKER: unknown
ABSTRACT. The core of knowledge engineering is to apply different kinds of formal languages (or models) to represent and manage human languages (or knowledge). However, according to the studies of Kurt Gödel and Ludwig Wittgenstein, both formal languages and human languages are unreliable. This finding inherently influences the development of artificial intelligence and knowledge engineering. On the other hand, their finding, i.e., the unreliability of languages, was discussed much earlier by Gautama Buddha, the founder of Buddhism. In this paper, we discuss the issue of the unreliability of language by bridging the perspectives of Gödel, Wittgenstein and Gautama. Based on this discussion, we further offer some philosophical reflections from the perspective of knowledge engineering.
Construction of Domain Ontology for Engineering Equipment Maintenance Support (Short Paper)
SPEAKER: unknown
ABSTRACT. To address the problems in the domain of engineering equipment maintenance support, such as numerous knowledge points, broad scope, complex relationships, and difficulty in sharing and reuse, this paper delineates the categories and professional scope of engineering equipment maintenance support, analyzes the knowledge sources, extracts eight core concepts (case, product, function, damage, environment, phenomenon, disposal and resource), and builds a concept hierarchy model from them. It then analyzes the data properties and object properties of the core concepts and constructs the engineering equipment maintenance domain ontology with Protégé 4.3, laying a solid foundation for the knowledge base and for engineering equipment maintenance ontology applications.
Knowledge Representation Learning based on Chinese Knowledge Graph in Vegetable Domain (Full Paper)
SPEAKER: unknown
ABSTRACT. A knowledge graph is stored as a graph in which each node represents an entity and each edge represents a relation between entities. Due to the high complexity of graph algorithms and severe data sparsity, learning effective representations of entities and relations on top of the constructed knowledge graph is very important for knowledge graph research and applications. In this paper, we use the vegetable entries of Baidu encyclopedia and HDwiki as data sources to study knowledge representation learning models over a constructed vegetable knowledge graph. We first adopt the TransE model to represent vegetable triples, embedding the entities and relations into a continuous low-dimensional vector space. Then, to handle the complex 1-N, N-1 and N-N attribute relations, we propose the PTA model, which constructs relation paths by combining attribute relations with the hyponymy relation and embeds the relation paths into the vector space as well. The results show that, without taking relation classification into account, the link prediction results of the PTA model are better than those of TransE, and its overall Hits@10 value is higher than that of TransE.
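The path idea behind such models can be sketched in TransE style: a relation path is represented by the sum of its relation vectors, and a triple is scored by an L1 distance. The vectors and names below are toy values, not learned embeddings:

```python
def l1_score(h, r, t):
    # TransE-style plausibility: ||h + r - t||_1, lower is better.
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def path_embedding(relations):
    # Represent a relation path (r1, r2, ...) by the sum of its
    # relation vectors, in the spirit of path-based models.
    dim = len(relations[0])
    return [sum(r[i] for r in relations) for i in range(dim)]

# Hypothetical toy vectors, for illustration only.
tomato     = [0.2, 0.1]
solanaceae = [0.5, 0.4]
is_a       = [0.1, 0.2]   # hyponymy relation
family_of  = [0.2, 0.1]   # attribute relation

path = path_embedding([is_a, family_of])
score = l1_score(tomato, path, solanaceae)  # small score = plausible path
```

In training, path embeddings enter the margin-based loss alongside direct relations, which is what lets the model exploit multi-hop evidence for 1-N, N-1 and N-N relations.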
Space Projection and Relation Path based Representation Learning for Construction of Geography Knowledge Graph (Full Paper)
SPEAKER: unknown
ABSTRACT. Human-like intelligence has developed rapidly, benefiting from complete knowledge graphs, especially primary-education knowledge graphs exemplified by geography. Traditional knowledge graphs are represented as network knowledge, which has high computational complexity and cannot effectively measure or exploit the semantic associations between entities. This paper puts forward a new knowledge representation learning algorithm, PTransW (Path-based TransE Considering Relation Type by Weight), which combines space projection with the semantic information of relation paths, and further exploits the semantic information of relation types. Experimental results on the FB15K and GEOGRAPHY data sets show that PTransW greatly improves the ability to deal with complex relations in a knowledge graph. For small data sets, training the low-complexity TransE and TransR models is sufficient; however, the PTransE and PTransW models utilize the semantic information of relation paths and inverse relations, and perform considerably better in relation prediction than TransE and TransR.
Boosting to Build a Large-scale Cross-lingual Ontology (Full Paper)
SPEAKER: unknown
ABSTRACT. Global knowledge sharing makes large-scale multi-lingual knowledge bases an extremely valuable resource in the Big Data era. However, current mainstream Wikipedia-based multi-lingual ontologies still face the following problems: the scarcity of non-English knowledge, noise in the multi-lingual ontology schema relations, and the limited coverage of cross-lingual "owl:sameAs" relations. Building a cross-lingual ontology from other large-scale heterogeneous online wikis is a promising solution to these problems. In this paper, we propose a cross-lingual boosting approach that iteratively reinforces the performance of ontology building and instance matching. Experiments on English Wikipedia and Hudong Baike yield an ontology containing over 3,520,000 English instances and 800,000 Chinese instances. The F1-measure improvement for Chinese "instanceOf" relation prediction reaches 32% at its highest. Finally, over 150,000 cross-lingual instance "owl:sameAs" relations are constructed.
Large Scale Semantic Relation Discovery: Toward Establishing the Missing Link between Wikipedia and Semantic Network (Full Paper)
SPEAKER: unknown
ABSTRACT. Wikipedia is the largest knowledge repository on the web. However, most of the semantic knowledge in Wikipedia is documented in natural language, which is human-readable but largely incomprehensible to computer processing. To establish the missing link from Wikipedia to a semantic network, this paper proposes a relation discovery method which can: 1) discover and characterize a large collection of relations from Wikipedia by exploiting relation pattern regularity, relation distribution regularity and relation instance redundancy; and 2) annotate the hyperlinks between Wikipedia articles with the discovered semantic relations. In total, we discover 14,299 relations, 105,661 relation patterns and 5,214,175 relation instances from Wikipedia, which will be a valuable resource for many NLP and AI tasks.
Biomedical Event Trigger Detection Based on Hybrid Methods Integrating Word Embeddings (Full Paper)
SPEAKER: unknown
ABSTRACT. Trigger detection, as a preceding task, is of great importance in biomedical event extraction. To date, most state-of-the-art systems have been based on single classifiers, and one-hot word encodings are unable to represent semantic information. In this paper, we utilize hybrid methods integrating word embeddings to achieve higher performance. In the hybrid methods, multiple single classifiers are first constructed from rich manual features, including dependency and syntactic parsing results. Their predictions are then integrated by set operations, voting and stacking. Hybrid methods exploit the differences among classifiers and compensate for their individual deficiencies, thus improving performance. Word embeddings are learnt from large-scale unlabeled texts and integrated as unsupervised features with the other rich features based on dependency parse graphs, so that much more semantic information can be represented. Experimental results show that our method outperforms the state-of-the-art systems.
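Among the integration strategies above, voting is the simplest to illustrate. A minimal sketch, with invented classifier outputs and trigger labels:

```python
from collections import Counter

def majority_vote(predictions):
    # predictions: one label sequence per base classifier;
    # combine them token-by-token by simple majority voting.
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*predictions)]

# Hypothetical outputs of three trigger classifiers on four tokens.
clf_a = ["O", "Phosphorylation", "O", "Binding"]
clf_b = ["O", "Phosphorylation", "Binding", "Binding"]
clf_c = ["O", "O", "O", "Binding"]

combined = majority_vote([clf_a, clf_b, clf_c])
```

Stacking goes one step further by training a meta-classifier on the base classifiers' outputs instead of counting votes, which lets it learn which classifier to trust for which trigger type.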
GRU-RNN based Question Answering over Knowledge Base (Full Paper)
SPEAKER: unknown
ABSTRACT. Building systems that can answer questions in natural language is one of the most important natural language processing applications. Recently, the rise of large-scale open-domain knowledge bases has provided a new possible approach. Some existing systems conduct question answering relying on hand-crafted features and rules, while other works try to extract features with popular neural networks. In this paper, we adopt a recurrent neural network to understand questions and to find the corresponding answer entities in knowledge bases, based on word embeddings and knowledge base embeddings. Question-answer pairs are used to train our multi-step system. We evaluate our system on FREEBASE and WEBQUESTIONS. The experimental results show that our system achieves performance comparable to the baseline method with a more straightforward structure.
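For reference, a single step of the GRU unit that drives such a system, with the standard update gate z and reset gate r; the 2-dimensional identity weight matrices below are placeholders, not trained parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in W]

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    # One GRU update: the update gate z and reset gate r decide
    # how much of the previous hidden state h is kept or rewritten.
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, rh))]
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]

# Toy 2-dimensional example with identity weights (illustrative only).
I = [[1.0, 0.0], [0.0, 1.0]]
h = gru_step([0.5, -0.5], [0.0, 0.0], I, I, I, I, I, I)
```

Running this step over each word embedding of a question yields a fixed-size question vector, which can then be compared with knowledge base embeddings to rank candidate answer entities.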
An Initial Ingredient Analysis of Drugs Approved by China Food and Drug Administration (Short Paper)
SPEAKER: unknown
ABSTRACT. Drugs are an important part of medicine. Drug knowledge bases that organize and manage drugs have attracted considerable attention and have been widely used in human health care in many countries and regions, and a large number of electronic drug knowledge bases are publicly available. In China, however, there is hardly any publicly available well-structured drug knowledge base, possibly due to the coexistence of two different types of medicine: Chinese traditional medicine (CTM) and modern medicine. In order to analyse the components of drugs approved by the China Food and Drug Administration (CFDA), we developed a preliminary drug ingredient analysis system. The system collects all drug names from the CFDA website, obtains their manuals from three medical websites, extracts the ingredients of the drugs, and analyses the distribution of the extracted ingredients. In total, 12,918 out of 19,490 drug manuals were collected. Evaluation on 50 randomly selected drug manuals shows that the system achieves an F-score of 95.46% on ingredient extraction. According to the distribution of the extracted ingredients, we find that ingredient multiplexing is very common, especially in herbal medicine, which may provide a clue for drug safety: taking more than one drug that contains partially the same ingredients may lead to an overdose of those ingredients.
ABSTRACT. Forgetting is a useful tool for tailoring ontologies by reducing the number of concepts and roles. The issue of forgetting for general ontologies in more expressive description logics, such as ALCQ and SHIQ, is largely unexplored. In particular, the problems of characterizing forgetting-based reasoning and of computing the result of forgetting are still open. In this paper, we develop a decidable, sound and complete tableau-based algorithm to implement forgetting-based reasoning. Our tableau algorithm can feasibly be extended to explore forgetting in more expressive ontology languages. Furthermore, we employ the rolling-up technique to compute the result of forgetting from the complete forest obtained after forgetting.
ABSTRACT. The huge amounts of linked data available on the web are a valuable resource for the development of semantic applications. However, these applications often face the challenges posed by flawed or incomplete schemas, which can lead to the loss of meaningful facts. Association rule mining, a successful way to discover implicit knowledge in RDF data, has been applied to learn many types of axioms. In this paper, we first employ a statistical approach based on association rule mining to enrich OWL ontologies. We then propose some improvements to this approach. Finally, we assess the quality of the automatically acquired axioms through evaluations on DBpedia datasets.
A Mixed Method for Building the Uyghur and Chinese Domain Ontology (Short Paper)
SPEAKER: unknown
ABSTRACT. With the increasing demand for multilingual semantic query on the World Wide Web, research on multilingual ontologies has gradually become a hot spot. However, studies of multilingual ontologies for professional fields are relatively rare, and the few that exist mostly concern the public domain. This paper describes and designs a mixed method for building a new multilingual ontology. Using this mixed method, we construct a Uyghur-Chinese bilingual ontology for the field of university management, aligning and mapping the concepts and relations between the ontologies in the different languages and then merging them into one multilingual ontology. Finally, we preliminarily realize semantic queries over the multilingual ontology using SPARQL, which will provide basic support for cross-lingual information retrieval in minority languages from the perspective of professional fields.
Link Prediction via Mining Markov Logic Formulas to Improve Social Recommendation (Full Paper)
SPEAKER: unknown
ABSTRACT. Social networks have become a main way to obtain information in recent years, but the huge amount of information available prevents people from finding what they are really interested in. Social recommendation systems were introduced to solve this problem and bring the new challenge of predicting people's preferences. From a graph point of view, social recommendation can be viewed as a link prediction task on the social graph, so link prediction techniques can be applied to social recommendation. In this paper, we propose a novel approach that brings logic formulas into a social recommendation system and improves the accuracy of recommendations. The approach consists of two parts: (1) it treats the whole social network, with its various attributes, as a semantic network and finds frequent structures as logic formulas via random graph algorithms; (2) it builds a Markov Logic Network to model the logic formulas, attaches a weight to each of them to measure its contribution, and then learns the weights discriminatively from training data. In addition, the weighted formulas can be viewed as the reason why people should accept a specific recommendation, and presenting them to users may increase the probability that a recommendation is accepted. We carry out several experiments to explore and analyze how various factors of our method affect the recommendation results, and compare the final method with baselines.
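For context, a Markov Logic Network scores a possible world by the log-linear form exp(sum_i w_i * n_i), where n_i counts the true groundings of formula i and w_i is its learned weight. A toy sketch, with invented formula names and weights:

```python
import math

def mln_unnormalized(formula_counts, weights):
    # Markov Logic: unnormalized probability exp(sum_i w_i * n_i),
    # where n_i is the number of true groundings of formula i.
    return math.exp(sum(weights[f] * n for f, n in formula_counts.items()))

# Hypothetical learned weights for two mined formulas.
weights = {"friends_share_interest": 1.5, "likes_popular_item": 0.3}

# A world containing a predicted link satisfies more groundings
# of the first formula than the world without it.
with_link = mln_unnormalized({"friends_share_interest": 2,
                              "likes_popular_item": 1}, weights)
without_link = mln_unnormalized({"friends_share_interest": 1,
                                 "likes_popular_item": 1}, weights)
```

The ratio of the two scores is what ranks candidate links; the satisfied formulas themselves can then be shown to the user as the explanation for the recommendation.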
Graph-based Jointly Modeling Entity Detection and Linking in Domain-Specific Area (Full Paper)
SPEAKER: unknown
ABSTRACT. Current state-of-the-art Entity Detection and Linking (EDL) systems are geared towards general corpora and cannot be applied directly and effectively to specific domains, because domain-specific texts are often noisy and contain ambiguous phrases that traditional EDL methods easily recognize as entity mentions but that should not actually be linked to real entities (i.e., false entity mentions (FEMs)). Moreover, in most current EDL literature, Entity Detection (ED) and Entity Linking (EL) are treated as equally important but separate problems and are typically performed in a pipeline architecture, without considering the mutual dependency between the two tasks. To rigorously address the domain-specific EDL problem, we propose an iterative graph-based algorithm that jointly models the ED and EL tasks in a specific domain by capturing the local dependency of mention-to-entity and the global interdependency of entity-to-entity. We extensively evaluated the proposed algorithm on a data set of real-world movie comments; the experimental results show that it significantly outperforms the state-of-the-art baselines, achieving an 82.7% F1 score for ED and 89.0% linking accuracy for EL.
LD2LD: Integrating, Enriching and Republishing Library Data as Linked Data (Full Paper)
SPEAKER: unknown
ABSTRACT. The development of digital libraries increases the need to integrate, enrich and republish library data as linked data. Linked library data can provide high-quality and more tailored services for researchers as well as for the public. However, even though there are many data sets containing metadata about publications and researchers, it is cumbersome to integrate and analyze them, since collection is still a manual process and the sources are not connected to each other upfront. In this paper, we present an approach for integrating, enriching and republishing library data as linked data. In particular, we first adopt duplicate detection and disambiguation techniques to reconcile researcher data, and then connect researcher data with publication data such as papers, patents and monographs using entity linking methods. After that, we use simple reasoning to predict missing values and enrich the library data with external data. Finally, we republish the integrated data and predicted values as linked data.
Object Clustering in Linked Data using Centrality (Full Paper)
SPEAKER: unknown
ABSTRACT. The volume of linked data is growing continuously, and large-scale linked data sets such as DBpedia are becoming a challenge for many Semantic Web tasks. While graph clustering has been deeply researched in network science and machine learning, little research has been carried out on clustering in linked data. To identify this meta-structure in large-scale linked data, the scalability of clustering must be considered. In this paper, we propose a scalable approach to centrality-based clustering, which works on an Object Graph model derived from the RDF graph. The centrality of objects is calculated as an indicator for clustering, and both relational and linguistic closeness among objects are considered in order to produce coherent clusters.
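A toy sketch of the idea using degree centrality alone (the paper also factors in linguistic closeness; the graph below is invented):

```python
def degree_centrality(adj):
    # Fraction of other nodes each node is connected to.
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def cluster_by_centrality(adj, k=2):
    # Pick the k most central objects as cluster centers, then
    # attach every other object to its most central neighbouring center.
    cent = degree_centrality(adj)
    centers = sorted(cent, key=cent.get, reverse=True)[:k]
    clusters = {c: {c} for c in centers}
    for v in adj:
        if v in centers:
            continue
        best = max((c for c in centers if c in adj[v]),
                   key=cent.get, default=centers[0])
        clusters[best].add(v)
    return clusters

# Toy object graph: two communities bridged by the edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
clusters = cluster_by_centrality(adj)
```

Because centrality can be computed per node and assignment is a local decision, this scheme parallelizes naturally, which is the scalability property the approach relies on.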
Research on Knowledge Fusion Connotation and Process Model (Full Paper)
SPEAKER: unknown
ABSTRACT. The emergence of big data brings diversified structures and constant growth of knowledge. The objective of knowledge fusion (KF) research is to integrate, discover and exploit valuable knowledge from distributed, heterogeneous and autonomous knowledge sources, which is a necessary prerequisite and an effective approach for implementing knowledge services. To put KF into practice, this paper first discusses the connotations of KF by analyzing the relations and differences among various notions, i.e., knowledge fusion, knowledge integration, information fusion and data fusion. Then, based on ontology-based knowledge representation methods, it investigates several KF implementation patterns and provides two types of dimensional KF process models oriented to the demands of knowledge services.
E-SKB: A Semantic Knowledge Base for Emergency (Short Paper)
SPEAKER: unknown
ABSTRACT. Although the number of knowledge bases in Linked Open Data has grown explosively, there are few knowledge bases about emergencies, an important issue in the area of social management. In this paper, we introduce a semantic knowledge base of emergencies extracted from an authoritative website. Based on the characteristics of the website, we propose a framework to convert its web pages into RDF. To help researchers acquire more knowledge, we follow the publishing rules of Linked Open Data: we not only use URIs to label the objects in the semantic knowledge base, but also provide links to DBpedia. Finally, we employ Sesame to store and publish the semantic knowledge base, and develop a query interface to retrieve it with SPARQL.
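To illustrate the retrieval side, the following stdlib-only sketch mimics SPARQL-style triple-pattern matching; the actual system uses Sesame, and the triples below are invented:

```python
# Toy triple store; None plays the role of a SPARQL variable.
triples = {
    ("ex:Flood2016", "rdf:type", "ex:Emergency"),
    ("ex:Flood2016", "ex:location", "ex:Wuhan"),
    ("ex:Flood2016", "owl:sameAs", "dbpedia:2016_China_floods"),
}

def match(pattern, store):
    # Return every triple consistent with the (s, p, o) pattern.
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Find all links into DBpedia, as published under the LOD rules.
links = match((None, "owl:sameAs", None), triples)
```

A real SPARQL engine adds joins over multiple patterns, filters and named graphs on top of exactly this pattern-matching core.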