Information Extraction and Mapping for Populating Ontologies in Product Lifecycle Management
Beera, Damayanthi Jesudas
Computer-based tools used to manage a product's lifecycle from the cradle to the grave need to share product information across phases. If this information is well structured, these tools can access and manipulate it across the lifecycle effectively and efficiently. Ontologies are used to capture the semantics of a domain and to model its information. The primary challenge in creating and using ontologies is populating them with instances and their attributes: done manually, this requires domain experts to spend considerable time and effort identifying and structuring the information. This thesis addresses the problem of populating ontologies automatically. Automating this task requires both acquiring the information and structuring it. In this thesis, the sources of information are texts containing domain knowledge, and the domain of interest is design and manufacturing in product lifecycle management (PLM). Since the input is plain text, natural language processing (NLP) techniques are used to analyze it. The input text is first parsed using shallow NLP, namely part-of-speech tagging. The thesis then follows a two-step process to populate the ontology: the first step is information extraction from the parsed text, and the second is mapping the new instances identified in the text to a reference ontology. The objective of the first step is to process the output of the parsing step and extract the information contained in the text in a structured form based on the reference ontology. The thesis describes three broad approaches to this problem. First, potential concepts for an ontology are extracted through semantic role labeling. Second, concepts from the reference ontology are used as seed words to identify instances of those concepts in the part-of-speech tagged text.
Finally, instances of concepts in the reference ontology are obtained from the part-of-speech tagged text using rules developed specifically for the ontology's domain. In the second step, the extracted concepts or instances are mapped to the reference ontology to establish semantic equivalence. Based on the level of semantic equivalence established, the mapping step decides whether an extracted item represents a new entry, is a synonym of an existing entity in the ontology, or does not belong to the ontology. If the match is at the instance level, the item is added to the ontology as a synonym of the existing instance; if the match is at the concept level, it is added as a new instance; and if there is no match, it is ignored. The results of semantic mapping for an assembly ontology are validated using a confusion matrix and Tversky's index. The thesis concludes by summarizing its contributions and identifying a few avenues for further research.
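The seed-word and rule-based extraction ideas can be illustrated with a minimal sketch. It is not the thesis's implementation; the abstract does not give its rules. The sketch assumes that the text has already been part-of-speech tagged with Penn Treebank tags. The example is hand-tagged, so no tagger is needed. A hypothetical `SEED_CONCEPTS` table stands in for concept names from the reference ontology. One illustrative rule is applied: a run of two or more adjective or noun tokens whose head noun matches a seed word is taken as an instance of that concept. The names `SEED_CONCEPTS`, `PHRASE_TAGS`, `singular`, and `extract_instances` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical seed words: concept names from a reference ontology, keyed by head noun.
SEED_CONCEPTS: Dict[str, str] = {"bolt": "Bolt", "gear": "Gear", "shaft": "Shaft"}

# Penn Treebank tags allowed inside a candidate noun phrase (illustrative rule only).
PHRASE_TAGS = {"JJ", "NN", "NNS"}


def singular(word: str) -> str:
    """Naive plural stripping, applied only when the singular form is a seed word."""
    if word.endswith("s") and word[:-1] in SEED_CONCEPTS:
        return word[:-1]
    return word


def extract_instances(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (instance phrase, concept) pairs found in POS-tagged tokens.

    Assumed rule: a run of two or more JJ/NN/NNS tokens whose last token
    (the head) is a seed word is an instance of that seed's concept.
    """
    found: List[Tuple[str, str]] = []
    run: List[str] = []
    for word, tag in tagged + [("", ".")]:  # sentinel token flushes the final run
        if tag in PHRASE_TAGS:
            run.append(word.lower())
            continue
        if len(run) >= 2:
            head = singular(run[-1])
            if head in SEED_CONCEPTS:
                found.append((" ".join(run[:-1] + [head]), SEED_CONCEPTS[head]))
        run = []
    return found


# Hand-tagged example sentence (hypothetical data).
TAGGED_EXAMPLE = [
    ("The", "DT"), ("bracket", "NN"), ("is", "VBZ"), ("fastened", "VBN"),
    ("with", "IN"), ("two", "CD"), ("hex", "JJ"), ("bolts", "NNS"),
    ("and", "CC"), ("a", "DT"), ("spur", "NN"), ("gear", "NN"), (".", "."),
]

if __name__ == "__main__":
    print(extract_instances(TAGGED_EXAMPLE))
    # [('hex bolt', 'Bolt'), ('spur gear', 'Gear')]
```

Requiring at least two tokens keeps a bare concept name such as "bolt" from being reported as an instance of itself. A domain-specific rule set would replace `PHRASE_TAGS` and the head-noun test with patterns written for the ontology's domain.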
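The three-way mapping decision (synonym of an existing instance, new instance of a concept, or ignored) can be sketched as follows. This is a minimal sketch, not the thesis's method. The ontology is a plain nested dictionary that maps each concept to its instances and each instance to a set of synonyms. Semantic equivalence is replaced by a deliberately crude, hypothetical `normalize` comparison, which lowercases a label and keeps only letters and digits. A head-noun test decides concept-level matches. The abstract does not specify the actual equivalence measure. The names `Ontology`, `normalize`, and `map_term` are hypothetical.

```python
import re
from typing import Dict, Optional, Set, Tuple

# Hypothetical ontology layout: concept -> {instance label -> set of synonyms}.
Ontology = Dict[str, Dict[str, Set[str]]]


def normalize(label: str) -> str:
    """Crude stand-in for semantic equivalence: lowercase, keep only a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def map_term(term: str, ontology: Ontology) -> Tuple[str, Optional[str]]:
    """Decide where `term` belongs and update `ontology` in place.

    Returns ("synonym", instance), ("instance", concept) or ("ignore", None).
    """
    key = normalize(term)
    # Instance-level match: equivalent to an existing instance or one of its synonyms.
    for instances in ontology.values():
        for instance, synonyms in instances.items():
            known = {normalize(instance)} | {normalize(s) for s in synonyms}
            if key and key in known:
                if term != instance:
                    synonyms.add(term)
                return ("synonym", instance)
    # Concept-level match: the head (last) token names a concept.
    tokens = term.lower().split()
    head = normalize(tokens[-1]) if tokens else ""
    for concept, instances in ontology.items():
        if head and head == normalize(concept):
            instances[term] = set()
            return ("instance", concept)
    # No match: the term is treated as not belonging to this ontology.
    return ("ignore", None)


if __name__ == "__main__":
    onto: Ontology = {"Bolt": {"hex bolt": set()}, "Gear": {"spur gear": set()}}
    for t in ["Hex-Bolt", "carriage bolt", "torque wrench"]:
        print(t, "->", map_term(t, onto))
```

With this toy ontology, "Hex-Bolt" becomes a synonym of "hex bolt", and "carriage bolt" is added as a new instance of `Bolt`. "torque wrench" is ignored. The instance-level check runs first so that a term equivalent to a known instance is never duplicated as a new instance.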
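The validation step can be sketched with two standard measures. The Tversky index of sets X and Y is S(X, Y) = |X ∩ Y| / (|X ∩ Y| + α|X − Y| + β|Y − X|). With α = β = 1 it reduces to the Jaccard index, and with α = β = 0.5 it reduces to the Dice coefficient. The abstract does not say how the thesis applies either measure. This sketch therefore assumes that the confusion matrix is taken over the three mapping decisions. It also assumes that the Tversky index compares predicted and gold mappings as sets of (term, decision) pairs. The evaluation data and the function names `confusion_matrix`, `accuracy`, `tversky`, and `print_matrix` are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Set

# Assumed classes of the confusion matrix: the three mapping decisions.
LABELS = ("synonym", "instance", "ignore")


def confusion_matrix(gold: List[str], pred: List[str]) -> Counter:
    """Count (gold label, predicted label) pairs; absent pairs read as 0."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return Counter(zip(gold, pred))


def accuracy(cm: Counter) -> float:
    """Fraction of items on the diagonal of the confusion matrix."""
    total = sum(cm.values())
    correct = sum(n for (g, p), n in cm.items() if g == p)
    return correct / total if total else 0.0


def tversky(x: Set[Hashable], y: Set[Hashable],
            alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky index |X & Y| / (|X & Y| + alpha*|X - Y| + beta*|Y - X|)."""
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    return common / denom if denom else 1.0  # two empty sets count as identical


def print_matrix(cm: Counter) -> None:
    """Print rows as gold labels and columns as predicted labels."""
    print("gold/pred".ljust(10) + "".join(label.ljust(10) for label in LABELS))
    for g in LABELS:
        print(g.ljust(10) + "".join(str(cm[(g, p)]).ljust(10) for p in LABELS))


if __name__ == "__main__":
    # Hypothetical data: five extracted terms with gold and predicted decisions.
    terms = ["t1", "t2", "t3", "t4", "t5"]
    gold = ["synonym", "instance", "instance", "ignore", "ignore"]
    pred = ["synonym", "instance", "ignore", "ignore", "instance"]
    cm = confusion_matrix(gold, pred)
    print_matrix(cm)
    print("accuracy:", accuracy(cm))  # 0.6
    predicted, reference = set(zip(terms, pred)), set(zip(terms, gold))
    print("Tversky, alpha=beta=0.5 (Dice):", tversky(predicted, reference))  # 0.6
    print("Tversky, alpha=beta=1 (Jaccard):", tversky(predicted, reference, 1, 1))
```

Choosing α ≠ β makes the index asymmetric. For example, a larger α penalizes predicted mappings that are absent from the gold set more heavily than gold mappings that were missed.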