Search Engine For Ebook Portal
Journal: International Journal of Scientific & Technology Research (Vol. 6, No. 5)
Publication Date: 2017-05-15
Authors : Prashant Kanade; Aishwarya Sadasivan; Komal Dhuri; Manaswini Muralidaran; Meghna Mohan
Page : 77-81
Keywords : similarity modeling; clustering; elastic search; vector space model; term frequency-inverse document frequency (tf-idf) matrix; language modeling; spell checking; text segmentation
Abstract
This paper describes the text analytics involved in developing a search engine for an ebook portal. The dataset was extracted from Project Gutenberg using a robot harvester. Text analytics is used to make search retrieval efficient. The entire dataset is represented using the vector space model, in which each document is a vector in the vector space. For computational purposes, the dataset is further represented as a term frequency-inverse document frequency (tf-idf) matrix. The first step is to obtain the most coherent sequence of words from the search query entered. The query is processed using front-end algorithms: spell checking, text segmentation, and language modeling. Back-end processing comprises similarity modeling, clustering, indexing, and retrieval. The relationship between documents and words is established using cosine similarity, measured between the documents and words in the vector space. Clustering is used to suggest books similar to the search query entered by the user. Lastly, the Lucene-based Elasticsearch engine is used to index the documents, which allows faster retrieval of data. Elasticsearch returns a dictionary and creates a tf-idf matrix. The processed query is compared with the dictionary obtained, and the tf-idf matrix is used to calculate a score for each match so that the most relevant results are returned.
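The vector space representation and cosine-similarity scoring described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not reproduce Elasticsearch's scoring. It assumes a whitespace tokenizer, raw term counts, and a smoothed idf of the form log((1 + N) / (1 + df)) + 1. The function names (`tokenize`, `build_tfidf`, `vectorize`, `cosine`, `search`) and the sample titles are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def vectorize(tokens, idf):
    """Weight raw term counts by idf; terms outside the vocabulary are dropped."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def build_tfidf(docs):
    """Return (idf, vectors); each vector is a sparse dict mapping term -> tf-idf weight."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (an assumption of this sketch, not taken from the paper).
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [vectorize(tokens, idf) for tokens in tokenized]
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, docs):
    """Rank documents against the query; returns (doc_index, score), best match first."""
    idf, vectors = build_tfidf(docs)
    query_vector = vectorize(tokenize(query), idf)
    scored = [(index, cosine(query_vector, vector)) for index, vector in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative titles only; a real index would use the full text of each book.
    sample_docs = [
        "the adventures of sherlock holmes",
        "pride and prejudice",
        "the hound of the baskervilles",
    ]
    for index, score in search("sherlock holmes", sample_docs):
        print(f"{score:.3f}  {sample_docs[index]}")
```

In this sketch the query is projected onto the vocabulary learned from the documents, so query words that never appear in the collection contribute nothing to the score. A production system would delegate indexing and scoring to the search engine rather than recompute the matrix for every query.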
Last modified: 2017-06-11 23:01:23