Research Article | Peer-Reviewed

Vector Representation of Amharic Idioms for Natural Language Processing Applications Using Machine Learning Approach

Received: 21 November 2023     Accepted: 14 December 2023     Published: 22 December 2023
Abstract

Idiomatic phrases are natural components of all languages, and their meaning cannot be understood directly from the words that compose them. Vector representations are a key method for bridging human understanding of language and that of machines, and they help solve many NLP problems. Representing idiomatic expressions is therefore necessary for machine learning, deep learning, and natural language processing applications. In the previous literature, machine learning and deep learning techniques have not processed raw text directly as input for natural language processing applications. As such, studying natural language processing with machine learning and deep learning methods requires vector (numeric) representations of idiomatic expressions. This research therefore proposes a vector representation of Amharic idioms for NLP applications, built with vector representation models. Researchers in natural language processing use this format as input and then apply machine learning and deep learning techniques for classification or regression. NLP application research on Amharic idioms thus first requires a vector or numeric representation produced by suitable methods. We used five hundred two-word idiomatic expressions from the Amharic Idioms book as the dataset for this representation. To evaluate performance, we employed accuracy, precision, recall, and F-score. On this dataset, the approach achieved an accuracy of 95.5%.
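
The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes a binary labelling (1 = idiom, 0 = non-idiom), which the abstract does not specify; the function name `classification_metrics` and the toy labels are hypothetical and are not the paper's data or results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumption: labels are binary and `positive` marks the idiom class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (hypothetical, not taken from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can look high on an imbalanced dataset, which is presumably why precision, recall, and F-score are reported alongside it.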

Published in Machine Learning Research (Volume 8, Issue 2)
DOI 10.11648/j.mlr.20230802.11
Page(s) 17-22
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2023. Published by Science Publishing Group

Keywords

Amharic Idiom, Machine Learning, Vector Representation, Word2vector

Cite This Article
  • APA Style

    Abebe Fenta, A. (2023). Vector Representation of Amharic Idioms for Natural Language Processing Applications Using Machine Learning Approach. Machine Learning Research, 8(2), 17-22. https://doi.org/10.11648/j.mlr.20230802.11

    ACS Style

    Abebe Fenta, A. Vector Representation of Amharic Idioms for Natural Language Processing Applications Using Machine Learning Approach. Mach. Learn. Res. 2023, 8(2), 17-22. doi: 10.11648/j.mlr.20230802.11

    AMA Style

    Abebe Fenta A. Vector Representation of Amharic Idioms for Natural Language Processing Applications Using Machine Learning Approach. Mach Learn Res. 2023;8(2):17-22. doi: 10.11648/j.mlr.20230802.11

  • @article{10.11648/j.mlr.20230802.11,
      author = {Anduamlak Abebe Fenta},
      title = {Vector Representation of Amharic Idioms for Natural Language Processing Applications Using Machine Learning Approach},
      journal = {Machine Learning Research},
      volume = {8},
      number = {2},
      pages = {17-22},
      doi = {10.11648/j.mlr.20230802.11},
      url = {https://doi.org/10.11648/j.mlr.20230802.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.mlr.20230802.11},
     year = {2023}
    }
    

  • TY  - JOUR
    T1  - Vector Representation of Amharic Idioms for Natural Language Processing Applications Using Machine Learning Approach
    AU  - Anduamlak Abebe Fenta
    Y1  - 2023/12/22
    PY  - 2023
    N1  - https://doi.org/10.11648/j.mlr.20230802.11
    DO  - 10.11648/j.mlr.20230802.11
    T2  - Machine Learning Research
    JF  - Machine Learning Research
    JO  - Machine Learning Research
    SP  - 17
    EP  - 22
    PB  - Science Publishing Group
    SN  - 2637-5680
    UR  - https://doi.org/10.11648/j.mlr.20230802.11
    VL  - 8
    IS  - 2
    ER  - 

Author Information
  • Department of Computer Science, Gafat Institute of Technology, Debre Tabor University, Debre Tabor, Ethiopia
