ETL for disease indicators using brute force rule-based NLP algorithm and metadata exploration
Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE), Vol. 9, No. 90
Publication Date: 2022-05-30
Authors: Ifra Altaf; Muheet Ahmed Butt; Majid Zaman
Pages: 644-662
Keywords: PDF scraping; Unstructured data; Diagnostic lab reports; Heuristics; Brute force; Natural language processing; Metadata; Information extraction
Abstract
Because data-driven decisions are based on facts, data collection can lay the foundation for decision-making in any industry. With the decision-making capability provided by data from digital medical records, doctors can make a precise diagnosis and prescribe adequate treatment by piecing together fundamentally different disease symptoms. This data manuscript describes how a diabetes dataset was prepared from liver and lipid profile panels. The data were collected as unstructured lab reports from a medical center in Srinagar, Jammu and Kashmir. The unstructured data are first extracted on the basis of the source document's metadata; the required field values of the different tests are then extracted from the resulting intermediate file using brute-force pattern-matching heuristics and integrated to populate a relational database. The database can be used for further descriptive, exploratory, and predictive analysis, and can help in diagnosing and predicting diabetes from liver and lipid panel markers. To fill a theoretical research gap, this paper presents a novel concept: predicting and detecting one disease from the markers of other, related diseases. The proposed brute-force rule-based natural language processing (NLP) algorithm achieves a detection rate of 98.44%.
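The field-extraction and loading steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. It assumes the PDF text has already been dumped line by line into an intermediate file. The rule table `RULES`, its field names and name variants, the value regex, the `lab_results` schema, and the `detection_rate` definition (extracted fields divided by expected fields) are all hypothetical. Python's built-in `sqlite3` stands in for the relational database.

```python
import re
import sqlite3

# Hypothetical rule table: each target field maps to the name variants that
# might appear in a lab report. The authors' actual rules are not published here.
RULES = {
    "fasting_glucose": ["fasting blood sugar", "fbs", "glucose fasting"],
    "total_cholesterol": ["total cholesterol", "cholesterol total"],
    "triglycerides": ["triglycerides", "tg"],
    "sgpt": ["sgpt", "alt"],
}

# A numeric result (integer or decimal), optionally followed by a unit token.
VALUE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z/%]+)?")


def extract_fields(lines):
    """Brute force: try every variant of every rule against every line.

    Returns {field: (value, unit)}, keeping the first hit per field.
    """
    found = {}
    for line in lines:
        for field, variants in RULES.items():
            if field in found:
                continue
            for name in variants:
                # Match the test name as a whole phrase, ignoring case.
                m = re.search(r"\b" + re.escape(name) + r"\b", line, re.IGNORECASE)
                if m:
                    # Look for the first number after the test name.
                    v = VALUE_RE.search(line, m.end())
                    if v:
                        found[field] = (float(v.group(1)), v.group(2))
                    break
    return found


def load(conn, report_id, fields):
    """Insert one report's extracted fields into a (hypothetical) relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lab_results ("
        "report_id TEXT, field TEXT, value REAL, unit TEXT, "
        "PRIMARY KEY (report_id, field))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO lab_results VALUES (?, ?, ?, ?)",
        [(report_id, f, v, u) for f, (v, u) in fields.items()],
    )
    conn.commit()


def detection_rate(found, expected):
    """Assumed metric: percentage of expected fields that were extracted."""
    return 100.0 * len(found) / len(expected) if expected else 0.0


if __name__ == "__main__":
    report = [
        "Fasting Blood Sugar 126 mg/dL 70-110",
        "Total Cholesterol 182 mg/dL <200",
        "Triglycerides 150 mg/dL <150",
        "SGPT (ALT) 45 U/L 0-40",
    ]
    conn = sqlite3.connect(":memory:")
    fields = extract_fields(report)
    load(conn, "R001", fields)
    print(detection_rate(fields, RULES))  # 100.0
```

Trying every name variant against every line is what makes this approach brute force. It is simple and transparent, but its cost grows with the number of lines times the number of rules, and its accuracy depends entirely on how complete the variant lists are.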
Last modified: 2022-07-01 22:02:21