Automatic Inference Multiple Topics and Trend Analysis using DSTM
Pages: 217-220
Abstract
In a variety of domains, large numbers of documents are generated every day. Mining text documents and extracting useful information from them is a challenging task. The topic discussed in a document is described by a group of words in that document, and a lot of work has been done on mining topics from document sets. This work focuses on the analysis of time-series documents, such as collections of news articles, series of scientific papers, and posts or tweets on social media sites. Topics evolve over time and are correlated with one another. The system identifies how topics evolve over time and builds a topic hierarchy. In addition to topic modeling, the system also forecasts topic trends. The document data is pre-processed using machine learning techniques, and the important words are extracted. These words are then used for topic modeling, which is performed using Latent Dirichlet Allocation (LDA) with a Gibbs sampler. The important topic words are treated as a topic dictionary. The accuracy of the system will be compared with that of existing approaches.
Keywords: Text mining, Topic forecast, Topic discovery, Cluster labeling, Topic modeling, Label identification
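The abstract states that topic modeling uses Latent Dirichlet Allocation with a Gibbs sampler but gives no further detail. The following is a minimal sketch, assuming the standard collapsed Gibbs sampler for plain LDA over documents that have already been pre-processed into word tokens. It does not model time, topic hierarchy, or trend forecasting, and it is not the authors' DSTM implementation. The function names `lda_gibbs` and `top_words`, and the hyperparameters `alpha`, `beta`, and `iterations`, are hypothetical choices for illustration. `top_words` simply takes the highest-count words per topic as a stand-in for a "topic dictionary".

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative sketch).

    docs: list of documents, each a list of pre-processed word tokens.
    Returns (n_dk, n_kw, vocab): document-topic counts, topic-word counts,
    and the sorted vocabulary that indexes the columns of n_kw.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]    # times word w is assigned to topic k
    n_k = [0] * K                         # total tokens assigned to topic k
    z = []                                # topic assignment of every token

    # Random initial assignment of a topic to each token.
    for d, doc in enumerate(docs):
        z_doc = []
        for word in doc:
            w, k = word_id[word], rng.randrange(K)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, k = word_id[word], z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | rest), up to a normalising constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw, vocab


def top_words(n_kw, vocab, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in n_kw
    ]


docs = [
    ["stock", "market", "price", "stock"],
    ["goal", "match", "team", "goal"],
    ["market", "price", "trade"],
    ["team", "match", "score"],
]
n_dk, n_kw, vocab = lda_gibbs(docs, num_topics=2, iterations=50, seed=1)
print(top_words(n_kw, vocab, n=2))
```

Integrating out the topic-word and document-topic distributions ("collapsing") means only integer counts are stored and updated, which keeps each sampling step cheap. A temporal model would additionally need to condition these counts on each document's time slice, which this sketch does not attempt.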