Free Form Document Based Extraction Using ML
Journal: International Journal of Science and Research (IJSR) (Vol. 8, No. 6)
Publication Date: 2019-06-05
Authors: Mona Deshmukh; Shruti Maheshwari
Pages: 2165-2169
Keywords: spaCy; POS tagging; tokenization; OCR engine; open NLP
Abstract
Information extraction is concerned with applying natural language processing to automatically extract required information from free-form text documents. Several machine learning techniques have been applied to make information extraction systems more portable. The challenge is not just to extract data from scanned documents but to extract it accurately. This paper describes a general method for building an information extraction system using techniques such as tokenization, POS tagging, entity detection and dependency parsing, along with supervised learning algorithms. In this method, the extraction decisions are led by a set of classifiers rather than by sophisticated linguistic analyses. A major problem faced by many businesses today is the inability to leverage data from scanned documents and images. Whenever a business relies on data that must be captured from paper documents, manual data entry can reduce the efficiency and speed of business operations and increase system vulnerability. In such cases, data entry automation is needed to extract data from scanned documents and to automate document-based business processes.
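The idea of letting classifiers, rather than hand-written linguistic rules, make the extraction decisions can be illustrated with a minimal sketch. The abstract does not specify the features, the classifier or the target fields, so everything below is an assumption made for illustration: the field labels, the three training lines (invented stand-ins for OCR output), the regex tokenizer (used in place of a library such as spaCy so the sketch needs only the standard library), and the choice of a simple multiclass perceptron over token-shape and previous-token features.

```python
import re

# Hypothetical field labels; "O" marks tokens that belong to no field.
LABELS = ["O", "INVOICE_NO", "DATE", "AMOUNT"]

# Invented, hand-labeled lines standing in for OCR output (one label per token).
TRAIN = [
    ("Invoice No : INV-1042", ["O", "O", "O", "INVOICE_NO"]),
    ("Date : 2019-06-05", ["O", "O", "DATE"]),
    ("Total : 1500.00", ["O", "O", "AMOUNT"]),
]


def tokenize(text):
    """Split text into word-like tokens (keeping internal - . /) and single punctuation marks."""
    return re.findall(r"\w+(?:[-./]\w+)*|[^\w\s]", text)


def shape(token):
    """Map a token to a coarse shape: digits -> d, upper -> X, lower -> x, runs collapsed."""
    out = []
    for ch in token:
        if ch.isdigit():
            c = "d"
        elif ch.isalpha():
            c = "X" if ch.isupper() else "x"
        else:
            c = ch
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)


def features(tokens, i):
    """Surface features for token i: a bias, its shape, and the previous token."""
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return ["bias", "shape=" + shape(tokens[i]), "prev=" + prev]


def predict(weights, feats):
    """Return the label with the highest summed feature weight (ties go to the earlier label)."""
    scores = {label: sum(weights.get((f, label), 0) for f in feats) for label in LABELS}
    return max(LABELS, key=scores.get)


def train(examples, max_epochs=10):
    """Multiclass perceptron: on each mistake, reward the gold label and penalize the guess."""
    weights = {}
    for _ in range(max_epochs):
        mistakes = 0
        for text, gold in examples:
            tokens = tokenize(text)
            for i, gold_label in enumerate(gold):
                feats = features(tokens, i)
                guess = predict(weights, feats)
                if guess != gold_label:
                    mistakes += 1
                    for f in feats:
                        weights[(f, gold_label)] = weights.get((f, gold_label), 0) + 1
                        weights[(f, guess)] = weights.get((f, guess), 0) - 1
        if mistakes == 0:  # stop once a full pass over the data is error-free
            break
    return weights


def extract(weights, text):
    """Label every token and return the tokens assigned to a field."""
    tokens = tokenize(text)
    fields = {}
    for i, tok in enumerate(tokens):
        label = predict(weights, features(tokens, i))
        if label != "O":
            fields[label] = tok
    return fields


if __name__ == "__main__":
    model = train(TRAIN)
    print(extract(model, "Date : 2021-01-15"))
```

In a fuller pipeline, POS tags, entity labels and dependency relations from an NLP library would be added to the feature list in the same way, and a real labeled corpus would replace the three toy lines.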