Abstract | This paper studies the Bankruptcy Prediction Computational Model (BPCM), a comprehensive methodology for evaluating companies’ bankruptcy level that combines semantic methods for storing, structuring and pre-processing raw financial data with machine learning analysis techniques. Raw financial data are interconnected, diverse, potentially inconsistent, and prone to duplication. The main goal of our research is to develop data pre-processing techniques in which ontologies play a central role. We show how ontologies are used to extract and integrate information from different sources, prepare data for further processing, and enable communication in natural language. Using an ontology, we give meaning to disparate raw business data, build logical relationships between data in various formats and from various sources, and establish relevant context. Our Ontology of Bankruptcy Prediction (OBP Ontology), which provides a conceptual framework for companies’ financial analysis, is built in the widely established Protégé environment. The OBP Ontology can be effectively described with a graph database. A graph database extends the capabilities of traditional databases by handling the interconnected nature of economic data and by providing graph-based structures for storing information, which allows effective selection of the most relevant input features for the machine learning algorithm. To create and manage the BPCM Graph Database (Graph DB), we use the Neo4j environment, and we use its query language, Cypher, to perform feature selection on the structured data. The selected key features are fed to the Machine Learning Engine, a supervised MLP neural network with a sigmoid activation function. This component is programmed in Python. We illustrate the approach and the advantages of semantic data pre-processing by applying it to a representative use case. |
---|
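
As an illustration of the last stage of the pipeline, the following is a minimal sketch of a supervised MLP with sigmoid activations, written in plain Python. It is not the authors' implementation: the abstract does not state the network size, the training procedure, or the library used, so the class name `TinyMLP`, the single hidden layer of three units, the learning rate, and the cross-entropy loss are all assumptions made for this example. The two input features (a liquidity-like and a leverage-like ratio) and the six training rows are synthetic placeholders, not data from the paper.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """Hypothetical one-hidden-layer perceptron with sigmoid units (SGD-trained)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed so the run is reproducible
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return (hidden activations, output probability) for one sample."""
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, output

    def predict_proba(self, x):
        return self.forward(x)[1]

    def loss(self, samples, labels):
        """Mean binary cross-entropy over a dataset."""
        eps = 1e-12  # guards against log(0)
        total = 0.0
        for x, y in zip(samples, labels):
            p = self.predict_proba(x)
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total / len(samples)

    def fit(self, samples, labels, epochs=2000, lr=0.5):
        """Plain stochastic gradient descent with backpropagation."""
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                hidden, p = self.forward(x)
                # For sigmoid output + cross-entropy, dL/dz_out = p - y.
                delta_out = p - y
                # Hidden deltas use the output weights before they are updated.
                delta_hidden = [delta_out * w * h * (1 - h)
                                for w, h in zip(self.w2, hidden)]
                self.w2 = [w - lr * delta_out * h
                           for w, h in zip(self.w2, hidden)]
                self.b2 -= lr * delta_out
                for j, dh in enumerate(delta_hidden):
                    self.w1[j] = [w - lr * dh * xi
                                  for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * dh


# Synthetic placeholder data: [liquidity-like ratio, leverage-like ratio],
# both scaled to [0, 1]. Label 1 marks a (made-up) bankrupt company.
samples = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1],
           [0.2, 0.9], [0.1, 0.8], [0.3, 0.7]]
labels = [0, 0, 0, 1, 1, 1]

model = TinyMLP(n_inputs=2, n_hidden=3)
loss_before = model.loss(samples, labels)
model.fit(samples, labels)
loss_after = model.loss(samples, labels)

print(f"cross-entropy before training: {loss_before:.4f}")
print(f"cross-entropy after training:  {loss_after:.4f}")
print(f"P(bankrupt | low liquidity, high leverage) = "
      f"{model.predict_proba([0.15, 0.85]):.3f}")
```

In a full pipeline, the feature vectors would come from the Cypher-based feature selection step rather than from a hard-coded list, and a library implementation of the MLP would normally replace this hand-written one.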