A Novel Knowledge-Based Architecture for Concept Mining on Italian and English Texts
DEGL'INNOCENTI, Dante; DE NART, Dario; TASSO, Carlo
2015-01-01
Abstract
Manually annotating unstructured texts to find significant concepts is a knowledge-intensive process and, given the amount of data now available on the Web and in digital libraries, it is not cost-effective. Automatic annotators capable of performing like human experts are therefore extremely desirable. State-of-the-art systems already offer good performance, but they are often limited to one language and one application domain, and they cannot infer concepts that do not appear in the text but are logically or semantically implied by it. To overcome these shortcomings, we propose a novel knowledge-based, language-independent, unsupervised approach to keyphrase generation. We developed DIKpE-G, an experimental prototype system that integrates different kinds of knowledge: linguistic, statistical, meta/structural, social, and ontological. DIKpE-G is capable of extracting, evaluating, and inferring meaningful concepts from a natural language text. The prototype performs well on both Italian and English texts.