Data and Knowledge Management


Data and knowledge are two essential cornerstones for building AI systems. On the one hand, the growing complexity, sophistication, and pervasiveness of information technology in everyday life demand IT applications that adapt smoothly to many previously unseen situations. As a consequence, applications require an accurate representation of knowledge about “the world” in which they are supposed to operate. On the other hand, today's pervasive Internet of Things, in combination with the World Wide (Semantic) Web, makes available an extraordinary amount of data and knowledge in a multitude of forms, spanning from completely unstructured content such as text, images, audio, and video to well-structured data such as databases, linked data, ontologies, and knowledge bases.

The main objective of the DKM research unit is to develop well-founded approaches to data interpretation, knowledge acquisition, knowledge representation, knowledge integration, and reasoning, and to implement them in realistic applications that support the management of knowledge and data in real IT systems.

Contributions to Smart Cities and Communities

The DKM unit contributes to the City Sensing project of the SCC high impact initiative by extending the technologies for analyzing multimedia documents that describe what is happening in the community. The main outcome of this analysis is an event graph, i.e., a graph where nodes correspond to events with their attributes (participants, roles, locations, time, …) and arcs describe the relations between events (temporal, causal, etc.). The event graph of a community is a structured description of the main behaviour of that community and constitutes the basis for developing algorithms for community analysis and decision support.
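
As a rough illustration of the data structure described above, the following Python sketch models an event graph with events as nodes and labelled relations as arcs. It is only a minimal sketch: the class names, field names, relation labels, and sample events are assumptions made for this example and are not taken from the City Sensing project.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Event:
    """A node of the event graph: an event with its attributes."""
    event_id: str
    label: str
    participants: Dict[str, str] = field(default_factory=dict)  # participant -> role
    location: Optional[str] = None
    time: Optional[str] = None  # kept as plain text for simplicity


@dataclass
class EventGraph:
    """Events plus labelled arcs (source_id, relation, target_id)."""
    events: Dict[str, Event] = field(default_factory=dict)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_event(self, event: Event) -> None:
        self.events[event.event_id] = event

    def relate(self, source_id: str, relation: str, target_id: str) -> None:
        # Arcs may only connect events that are already in the graph.
        if source_id not in self.events or target_id not in self.events:
            raise KeyError("both events must be added before relating them")
        self.relations.append((source_id, relation, target_id))

    def successors(self, event_id: str, relation: str) -> List[Event]:
        """Events reachable from event_id through one arc with this label."""
        return [
            self.events[target]
            for source, rel, target in self.relations
            if source == event_id and rel == relation
        ]


# Made-up sample data, used only to exercise the structure.
graph = EventGraph()
graph.add_event(Event("e1", "road closure", {"City Council": "organizer"},
                      "Main Street", "2015-06-01"))
graph.add_event(Event("e2", "traffic jam", {"commuters": "affected"},
                      "Ring Road", "2015-06-01"))
graph.relate("e1", "causes", "e2")
caused = [event.label for event in graph.successors("e1", "causes")]
```

Keeping relations as labelled triples lets temporal and causal arcs live in the same list, so community analysis code can filter by relation type without a separate structure per relation.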

Contributions to Artificial Intelligence

Knowledge-based Multimedia Semantic Interpretation. We develop systems that can read text and look at images and video, and that provide an integrated semantic representation of their content.

Integration of Logical Reasoning and Machine Learning. It is widely held that full cognitive capability in artificial agents can be obtained only by integrating inference with learning. We are interested in developing approaches that integrate neural networks with fuzzy logic, and Bayesian reasoning with logical reasoning.

Multi-Context Systems and Defeasible Reasoning. The knowledge an artificial agent needs cannot be represented in a single coherent knowledge base; it is more appropriate to represent it as a set of interconnected modules (also known as contexts). In this area we are investigating the theory and implementation of Multi-Context Systems (MCS) and defeasible reasoning within MCS.
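
To make the idea of interconnected contexts more concrete, here is a small Python sketch in which each context is a set of facts and information flows between contexts through bridge rules. It is a toy illustration under strong assumptions: the `BridgeRule` class, the `propagate` function, and the sample contexts are hypothetical. It covers only monotonic propagation and does not attempt defeasible reasoning.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class BridgeRule:
    """If every (context, fact) premise holds, add `fact` to context `target`."""
    premises: Tuple[Tuple[str, str], ...]
    target: str
    fact: str


def propagate(contexts: Dict[str, Set[str]],
              rules: List[BridgeRule]) -> Dict[str, Set[str]]:
    """Apply bridge rules repeatedly until no context changes."""
    # Work on a copy so the caller's contexts are left untouched.
    result = {name: set(facts) for name, facts in contexts.items()}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            holds = all(fact in result[ctx] for ctx, fact in rule.premises)
            if holds and rule.fact not in result[rule.target]:
                result[rule.target].add(rule.fact)
                changed = True
    return result


# Made-up contexts and rules, for illustration only.
contexts = {
    "weather": {"raining"},
    "traffic": set(),
    "planner": set(),
}
rules = [
    BridgeRule((("weather", "raining"),), "traffic", "slow_roads"),
    BridgeRule((("traffic", "slow_roads"),), "planner", "leave_early"),
]
final = propagate(contexts, rules)
```

Each context stays a separate module with its own facts, and the only coupling between modules is the explicit list of bridge rules.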

Key projects and results

  • PIKES is a framework for extracting knowledge from text that adopts a 2-phase approach. First, an RDF graph of mentions is built by distilling the output of several state-of-the-art NLP tools for tokenization, part-of-speech tagging, lemmatization, named entity recognition and classification, temporal expression recognition and normalization, parsing, coreference resolution, word sense disambiguation, entity linking, and semantic role labelling. Then, the mention graph is processed to distill, using SPARQL-like mapping rules, a knowledge graph representing the content conveyed by the text.
  • The KnowledgeStore is a scalable, fault-tolerant storage system, grounded in Semantic Web technologies, to jointly store, manage, retrieve, and semantically query both structured and unstructured data. The KnowledgeStore plays a central role in the NewsReader EU project: it stores all content that has to be processed and produced in order to extract knowledge from news, and it provides a shared data space through which NewsReader components cooperate.
  • RDFpro (RDF Processor) is a public-domain Java command-line tool and library for RDF processing. RDFpro offers a suite of stream-oriented, highly optimized RDF processors for common tasks that can be assembled into complex pipelines to process RDF data efficiently in one or more passes. RDFpro originated from the need for a tool supporting typical Linked Data integration tasks involving datasets of up to a few billion triples.
  • The Contextualized Knowledge Repository (CKR) is a knowledge representation and reasoning framework that builds on Semantic Web technologies to represent, store, query, and reason with contextualized knowledge, i.e., knowledge that holds under specific circumstances or contexts. The CKR addresses an emerging need in the Semantic Web: as large amounts of Linked Data are published on the Web, it is becoming apparent that the validity of published knowledge is not absolute, but often depends on time, location, topic, and other contextual attributes.
  • KnowPic is a framework for semantic image interpretation that leverages ontological knowledge to understand the content of pictures. KnowPic extracts structured information from images and represents it as an RDF graph (i.e., a set of triples). The RDF graph produced by KnowPic contains nodes that refer to objects detected in the picture (bounding boxes), their types, and their semantic relations. Object types and object relations are semantically described by the domain ontology. Automatically describing picture content as an RDF graph, as KnowPic does, opens the possibility of using standard, well-developed Semantic Web techniques for semantic image processing.
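
The kind of RDF graph described in the KnowPic entry above can be sketched in a few lines of Python. This is not KnowPic's actual output format or API: the `Detection` class, the `to_triples` function, the `http://example.org/` namespace, and the `boundingBox` and `rides` properties are all assumptions made for illustration. Only the `rdf:type` URI is the standard one.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

EX = "http://example.org/"  # hypothetical namespace for this example
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Detection:
    """An object detected in a picture, with its bounding box in pixels."""
    object_id: str
    object_type: str
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def to_triples(detections: Iterable[Detection],
               relations: Iterable[Tuple[str, str, str]]) -> Set[Triple]:
    """Describe detections and their relations as a set of triples."""
    triples: Set[Triple] = set()
    for det in detections:
        subject = EX + det.object_id
        # One triple for the object's type, one for its bounding box.
        triples.add((subject, RDF_TYPE, EX + det.object_type))
        triples.add((subject, EX + "boundingBox", ",".join(map(str, det.box))))
    for source_id, predicate, target_id in relations:
        # One triple per semantic relation between two detected objects.
        triples.add((EX + source_id, EX + predicate, EX + target_id))
    return triples


# Made-up detections, for illustration only.
detections = [
    Detection("obj1", "Person", (10, 20, 110, 220)),
    Detection("obj2", "Horse", (5, 90, 300, 400)),
]
relations = [("obj1", "rides", "obj2")]
triples = to_triples(detections, relations)
```

In a real setting the type and relation names would come from the domain ontology, and the triples would be handed to an RDF library or triple store rather than kept as plain tuples.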

Head of Unit