ETL for disease indicators using brute force rule-based NLP algorithm and metadata exploration

Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE) (Vol.9, No. 90)

Publication Date:

Authors : ; ;

Page : 644-662

Keywords : PDF scraping; Unstructured data; Diagnostic lab reports; Heuristics; Brute force; Natural language processing; Metadata; Information extraction

Abstract

Because data-driven decisions rest on facts, data collection lays a foundation for decision-making in any industry. Using data from digital medical records, doctors can reach a precise diagnosis and provide adequate treatment by fitting together fundamentally different disease symptoms. This data manuscript describes how a diabetes dataset was prepared from liver and lipid profile panels. The data were collected from a medical center in Srinagar, Jammu and Kashmir, in the form of unstructured reports. The unstructured data are first extracted into an intermediate file on the basis of the source document's metadata; the required field values of the different tests are then extracted from that intermediate file using brute-force pattern-matching heuristics and integrated to populate a relational database. The database can be used for further descriptive, exploratory, and predictive data analysis, and can help in diagnosing and predicting diabetes from the liver and lipid panels. This paper presents a novel concept, predicting and detecting one disease from the markers of other related disease(s), as a way to fill a theoretical research gap. The proposed brute-force rule-based natural language processing (NLP) algorithm achieved a detection rate of 98.44%.
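The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the field names, label variants, sample report text, and table layout below are all hypothetical, chosen only to show what brute-force rule-based matching followed by a relational load might look like. Every label variant is tried against every line of the intermediate text, and the first numeric value found for each field is kept.

```python
import re
import sqlite3

# Hypothetical field names and label variants (assumptions for illustration;
# the abstract does not list the paper's actual rules).
FIELD_LABELS = {
    "total_cholesterol": ["Total Cholesterol", "Cholesterol Total"],
    "triglycerides": ["Triglycerides", "TG"],
    "sgpt_alt": ["SGPT (ALT)", "SGPT", "ALT"],
}

# An integer or decimal number.
NUMBER = r"([0-9]+(?:\.[0-9]+)?)"

# Hypothetical intermediate text, as it might look after PDF scraping.
SAMPLE_REPORT = """Patient ID: 0001
LIPID PROFILE
Total Cholesterol : 182 mg/dL
Triglycerides - 150.5 mg/dL
LIVER FUNCTION TEST
SGPT (ALT) 34 U/L
"""


def extract_fields(text):
    """Brute force: try every label variant of every field on every line."""
    found = {}
    for line in text.splitlines():
        for field, labels in FIELD_LABELS.items():
            if field in found:
                continue  # keep the first value seen for each field
            for label in labels:
                # Label, an optional ':' or '-' separator, then the value.
                pattern = r"\b" + re.escape(label) + r"\s*[:\-]?\s*" + NUMBER
                match = re.search(pattern, line, flags=re.IGNORECASE)
                if match:
                    found[field] = float(match.group(1))
                    break
    return found


def store(conn, report_id, fields):
    """Insert one report's values as a row; missing fields become NULL."""
    columns = list(FIELD_LABELS)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lab_results (report_id TEXT PRIMARY KEY, "
        + ", ".join(f"{c} REAL" for c in columns)
        + ")"
    )
    placeholders = ", ".join("?" for _ in range(len(columns) + 1))
    conn.execute(
        f"INSERT INTO lab_results VALUES ({placeholders})",
        [report_id] + [fields.get(c) for c in columns],
    )


fields = extract_fields(SAMPLE_REPORT)
conn = sqlite3.connect(":memory:")
store(conn, "report-1", fields)
```

Trying every rule on every line is simple and tolerant of layout changes between reports, at the cost of speed and of possible false matches when labels overlap; a real rule set would need labels specific enough to avoid, for example, confusing total cholesterol with HDL cholesterol.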

Last modified: 2022-07-01 22:02:21