Keyword Extraction – Comparison of Latent Dirichlet Allocation and Latent Semantic Analysis

Bhuvaneshwari Kondeti; Jyothirani S. A; Haragopal V. V

doi:10.24018/ejmath.2022.3.3.119

Research Article

Bhuvaneshwari Kondeti

Osmania University, India

* Corresponding author

Jyothirani S. A

Osmania University, India

Haragopal V. V

BITS-Pilani, India

Submitted 2022-04-27
Published 2022-06-13

Abstract

The main aim of the present study is to compare the keywords extracted from the abstracts and from the full-length texts of scientific research papers. In addition, we compare Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) to identify which performs better for keyword extraction. The comparative study is divided into three levels. In the first level, scientific research articles on topics such as Indian economic growth, GDP and economic slowdown were collected. Their abstracts and full-length texts were extracted and pre-processed to remove words and characters that do not help in recovering the semantic structures or patterns needed to build a meaningful corpus. In the second level, the pre-processed data were converted into a bag of words, and the TF-IDF (Term Frequency – Inverse Document Frequency) statistic was used to assess how relevant each word is to a document in the corpus. In the third level, to study the feasibility of these Natural Language Processing (NLP) techniques, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) were applied to the resulting corpus.
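The three levels described in the abstract can be illustrated with a minimal, standard-library-only Python sketch: pre-processing, a bag of words weighted by TF-IDF, then keyword extraction with LSA and with LDA. Everything here is an assumption made for illustration, not the authors' implementation: the tiny stop-word list, the function names (`preprocess`, `tf_idf`, `lsa_keywords`, `lda_keywords`), the plain `tf * log(N / df)` TF-IDF variant, power iteration used in place of a full SVD, collapsed Gibbs sampling for LDA, all parameter values, and the toy corpus of made-up sentences.

```python
import math
import random
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption, not the paper's).
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "is", "was", "for", "on", "by", "with"}


def preprocess(text):
    """Lower-case, keep alphabetic tokens, drop stop words and tokens of <= 2 letters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def tf_idf(docs):
    """Bag of words -> one {term: tf * log(N / df)} dict per tokenised document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def lsa_keywords(weights, vocab, n_components=2, top_n=3, iters=100, seed=0):
    """LSA sketch: leading left singular vectors of the term-document TF-IDF
    matrix A, found by power iteration on A A^T; terms ranked by |loading|."""
    A = [[w.get(t, 0.0) for w in weights] for t in vocab]  # terms x documents
    n_terms, n_docs = len(vocab), len(weights)
    rng = random.Random(seed)
    components, keywords = [], []
    for _ in range(n_components):
        u = [rng.random() for _ in range(n_terms)]
        for _ in range(iters):
            v = [sum(A[i][j] * u[i] for i in range(n_terms)) for j in range(n_docs)]  # A^T u
            u = [sum(A[i][j] * v[j] for j in range(n_docs)) for i in range(n_terms)]  # A v
            for prev in components:  # deflation: stay orthogonal to earlier components
                dot = sum(a * b for a, b in zip(u, prev))
                u = [a - dot * b for a, b in zip(u, prev)]
            norm = math.sqrt(sum(x * x for x in u)) or 1.0
            u = [x / norm for x in u]
        components.append(u)
        ranked = sorted(range(n_terms), key=lambda i: -abs(u[i]))
        keywords.append([vocab[i] for i in ranked[:top_n]])
    return keywords


def lda_keywords(docs, n_topics=2, top_n=3, iters=200, alpha=0.1, beta=0.01, seed=0):
    """LDA sketch via collapsed Gibbs sampling; returns the top words of each topic."""
    rng = random.Random(seed)
    vocab = sorted({t for doc in docs for t in doc})
    word_id = {t: i for i, t in enumerate(vocab)}
    n_vocab = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]            # document-topic counts
    n_kw = [[0] * n_vocab for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                             # tokens assigned to each topic
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]  # random initial topics
    for d, doc in enumerate(docs):
        for t, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][word_id[t]] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, t in enumerate(doc):
                w, k = word_id[t], z[d][i]
                n_dk[d][k] -= 1  # remove the token from the counts
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # full conditional p(z = j | all other assignments), up to a constant
                p = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + n_vocab * beta)
                     for j in range(n_topics)]
                k = rng.choices(range(n_topics), weights=p)[0]
                z[d][i] = k
                n_dk[d][k] += 1  # add it back under the sampled topic
                n_kw[k][w] += 1
                n_k[k] += 1
    return [[vocab[i] for i in sorted(range(n_vocab), key=lambda w: -n_kw[k][w])[:top_n]]
            for k in range(n_topics)]


if __name__ == "__main__":
    # Toy, made-up sentences standing in for abstracts; not the study's data.
    corpus = [
        "India GDP growth slowed as investment and consumption declined.",
        "Economic slowdown in India reduced GDP growth and investment.",
        "Inflation and interest rates shaped monetary policy decisions.",
        "Monetary policy raised interest rates to contain inflation.",
    ]
    docs = [preprocess(text) for text in corpus]
    vocab = sorted({t for d in docs for t in d})
    print("LSA keywords per component:", lsa_keywords(tf_idf(docs), vocab))
    print("LDA keywords per topic:", lda_keywords(docs))
```

Terms are ranked by the absolute value of their LSA loading because the sign of a singular vector is arbitrary. On a real corpus, library implementations such as scikit-learn's `TfidfVectorizer`, `TruncatedSVD` and `LatentDirichletAllocation` would normally replace these loops; the sketch only makes each step explicit.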

Keywords: Keyword Extraction, Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), Natural Language Processing

How to Cite

Keyword Extraction – Comparison of Latent Dirichlet Allocation and Latent Semantic Analysis. (2022). European Journal of Mathematics and Statistics, 3(3), 40-47. https://doi.org/10.24018/ejmath.2022.3.3.119

Issue

Vol. 3 No. 3 (2022)

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
