|Title||Semantic component selection|
Locating information quickly and efficiently is a growing area of research. However, the real challenge is not locating pieces of information but finding those that are relevant. Relevant information resides within unstructured ‘natural’ text, and understanding natural text and judging the relevance of information is itself difficult. This challenge is partially addressed by semantic models and reasoning approaches that allow categorisation and (in a limited fashion) provide understanding of this information. Nevertheless, many such methods depend on expert input and, consequently, are expensive to produce and do not scale.
Although automated solutions exist, they have not so far approached the accuracy levels achievable with expert input.
This thesis presents SemaCS, a novel, non-domain-specific automated framework for categorising and searching natural text. SemaCS does not rely on expert input; it is based on the actual data being searched and on statistical semantic distances between words. These semantic distances are used to perform basic reasoning and semantic query interpretation. The approach was tested through a feasibility study and two case studies. Based on reasoning and analysis of the data collected through these studies, it can be concluded that SemaCS provides a domain-independent approach to semantic model generation and query interpretation without expert input. Moreover, SemaCS can be further extended to provide a scalable solution applicable to large datasets (e.g. the World Wide Web).
This thesis contributes to the current body of knowledge by establishing, adapting, and using novel techniques to define a generic selection/categorisation framework.
Implementing the framework outlined in the thesis improves an existing algorithm for semantic distance acquisition. Finally, because a novel approach to the extraction of semantic information is proposed, the work has a positive impact on the Information Retrieval domain and, specifically, on Natural Language Processing, word disambiguation and