Generative Topic Modeling in Taxonomic Structure of Genomic Data using LDA
Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.3, No. 7)
Publication Date: 2014-07-30
Authors : Dnyati.S.Randhave; S.N.Deshmukh;
Page : 832-840
Keywords : Data mining; Bioinformatics (genome or protein) databases; Language models; Metagenomics;
Abstract
Probabilistic topic models have been developed for applications in various domains such as text mining and information retrieval. In this work, we focus on probabilistic topic models based on Latent Dirichlet Allocation (LDA); specifically, a probabilistic topic model is proposed for data analysis and functional analysis using a homology-based approach and a composition-based approach. We aim to develop a new method that can analyze the genome-level composition of DNA sequences in order to characterize a set of common genomic features shared by the same species and identify their functional roles. To this end, we first show that a generative topic model can be used to model the taxon abundance information obtained by the homology-based approach and to study the microbial core. The model treats each sample as a ‘document’ that contains a mixture of functional groups, while each functional group (also known as a ‘latent topic’) is a weighted mixture of species. Estimating the generative topic model for taxon abundance data therefore uncovers the distribution over latent functions (latent topics) in each sample. Second, we use a composition-based approach to break DNA sequences into sub-reads called ‘N-mers’ and represent the sequences by their N-mer frequencies. We then introduce the LDA model to study the genome-level statistical patterns (a.k.a. latent topics) of the N-mer features. Each estimated latent topic represents a certain component of the whole genome.
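The composition-based representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (nmer_counts, nmer_matrix, to_frequencies), the use of overlapping windows, and the choice to skip windows containing symbols other than A, C, G, T are all assumptions made for illustration. Each sequence plays the role of a ‘document’ and each N-mer the role of a ‘word’, so the resulting count matrix has the shape that an LDA implementation typically expects as input.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: only unambiguous nucleotides are counted


def nmer_counts(sequence, n):
    """Count overlapping N-mers in one DNA sequence.

    Windows containing any symbol outside A, C, G, T (for example 'N')
    are skipped; this is an illustrative choice, not taken from the paper.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if set(window) <= set(ALPHABET):
            counts[window] += 1
    return counts


def nmer_matrix(sequences, n):
    """Build a document-term count matrix: one row per sequence,
    one column per possible N-mer (4**n columns, in lexicographic order)."""
    vocab = ["".join(p) for p in product(ALPHABET, repeat=n)]
    rows = []
    for seq in sequences:
        counts = nmer_counts(seq, n)
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def to_frequencies(row):
    """Normalize one row of counts to relative frequencies."""
    total = sum(row)
    if total == 0:
        return [0.0] * len(row)
    return [count / total for count in row]


if __name__ == "__main__":
    # Hypothetical toy sequences, used only to show the shape of the output.
    vocab, rows = nmer_matrix(["ACGTAC", "ACNGT"], n=2)
    for row in rows:
        nonzero = {w: c for w, c in zip(vocab, row) if c}
        print(nonzero)
```

The count rows, rather than the normalized frequencies, are what a count-based LDA implementation usually consumes; the frequencies are convenient for comparing sequences of different lengths.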
Last modified: 2014-07-30 23:40:41