
The main aim of the present study is to compare the keywords extracted from the abstracts and from the full-length text of scientific research papers. In addition, we compare Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) to identify which method performs better at keyword extraction. The comparative study is divided into three stages. In the first stage, scientific research articles on topics such as Indian economic growth, GDP and the economic slowdown were collected. Their abstracts and full-length text were extracted and pre-processed to remove words and characters that do not help to recover the semantic structures or patterns needed to build a meaningful corpus. In the second stage, the pre-processed data were converted into a bag of words, and the numerical statistic TF-IDF (Term Frequency – Inverse Document Frequency) was used to assess how relevant each word is to a document in the corpus. In the third stage, to study the feasibility of these Natural Language Processing (NLP) techniques, LSA and LDA were applied to the resulting corpus.
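The three stages described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation. The stop-word list and the four-sentence corpus are hypothetical. TF-IDF uses one common smoothed idf variant. LSA is computed as a truncated SVD with NumPy. LDA is fitted with collapsed Gibbs sampling, which is one standard inference method and not necessarily the one used in the study. The helper names (`preprocess`, `tfidf_matrix`, `lsa_keywords`, `lda_keywords`) are illustrative only.

```python
import math
import re
from collections import Counter

import numpy as np

# Hypothetical stop-word list: the study's exact pre-processing rules are not given.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "is", "was", "for", "on", "by", "with", "as", "an"}

# Hypothetical toy corpus standing in for the collected abstracts or full texts.
CORPUS = [
    "India GDP growth rate and the economic growth outlook",
    "GDP growth slowed as the economic slowdown hit India",
    "The economic slowdown reduced consumption and investment",
    "Investment and consumption drive GDP growth",
]


def preprocess(text):
    """Stage 1: lower-case, keep alphabetic tokens, drop stop words and tokens under 3 letters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def tfidf_matrix(docs):
    """Stage 2: bag of words -> documents x terms TF-IDF matrix.

    Uses raw term counts and the smoothed idf log((1 + N) / (1 + df)) + 1.
    """
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: j for j, t in enumerate(vocab)}
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    X = np.zeros((n_docs, len(vocab)))
    for i, doc in enumerate(docs):
        for t, tf in Counter(doc).items():
            X[i, index[t]] = tf * (math.log((1 + n_docs) / (1 + df[t])) + 1)
    return X, vocab


def lsa_keywords(X, vocab, n_topics=2, top_n=3):
    """Stage 3a: LSA as a truncated SVD of the TF-IDF matrix.

    Each right singular vector is a latent concept; terms are ranked by the
    absolute value of their loading because SVD signs are arbitrary.
    """
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return [[vocab[j] for j in np.argsort(-np.abs(vt[k]))[:top_n]] for k in range(n_topics)]


def lda_keywords(docs, n_topics=2, top_n=3, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Stage 3b: LDA on raw token counts, fitted with collapsed Gibbs sampling."""
    rng = np.random.default_rng(seed)
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: j for j, t in enumerate(vocab)}
    n_vocab = len(vocab)
    words = [[index[t] for t in doc] for doc in docs]
    ndk = np.zeros((len(docs), n_topics))  # tokens per (document, topic)
    nkw = np.zeros((n_topics, n_vocab))    # tokens per (topic, word)
    nk = np.zeros(n_topics)                # tokens per topic
    z = [rng.integers(n_topics, size=len(doc)) for doc in words]  # random initial topics
    for d, doc in enumerate(words):
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(words):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current assignment, then resample from the full conditional.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    # Top words per topic, ranked by how many tokens of each word the topic holds.
    return [[vocab[j] for j in np.argsort(-nkw[k])[:top_n]] for k in range(n_topics)]


if __name__ == "__main__":
    docs = [preprocess(text) for text in CORPUS]
    X, vocab = tfidf_matrix(docs)
    print("LSA:", lsa_keywords(X, vocab))
    print("LDA:", lda_keywords(docs))
```

In this sketch LDA runs on raw token counts instead of TF-IDF weights, because its generative model is defined over discrete word occurrences. LSA works directly on the weighted matrix. To compare abstracts with full texts, the same functions would be run separately on each corpus and the resulting keyword lists compared.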
