Text Vectorization Techniques Based on Wordnet

Dávid Držík; Kirsten Šteflovič

Open Access

Published Online: Dec 25, 2023

Journal of Linguistics/Jazykovedný časopis

Volume 74 (2023): Issue 1 (June 2023)

Page range: 310 - 322

DOI: https://doi.org/10.2478/jazcas-2023-0048

Keywords
word embedding, Word2Vec, GloVe, synsets, text data augmentation, semantic similarity

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Text vectorization techniques have become essential for many classification tasks in present-day natural language processing. Commonly used word embedding methods, such as Word2Vec and GloVe, are based on the semantic similarity of words. WordNet, a lexical database, provides a rich source of semantic information. In this article, we propose a text vectorization technique that uses text data extended through data augmentation, specifically by replacing words with their synonyms obtained from WordNet. Results from text classification tasks with multiple classifiers show that expanding the corpus in this way leads to improved vector representations of words.
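The augmentation step described in the abstract, replacing words with synonyms to expand the corpus, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the dictionary `TOY_SYNONYMS` is a hand-written stand-in for synonyms that would be looked up in WordNet synsets, and the function names, the `replace_prob` parameter, and the number of augmented copies are hypothetical. The abstract does not state how many words are replaced or how synonyms are selected.

```python
import random

# Hypothetical stand-in for WordNet synsets: in practice these synonym
# lists would come from a WordNet lookup, not a hand-written dictionary.
TOY_SYNONYMS = {
    "film": ["movie", "picture"],
    "good": ["fine", "decent"],
    "big": ["large"],
}


def augment_sentence(tokens, synonyms, rng, replace_prob=0.5):
    """Return a copy of tokens with some words swapped for a synonym."""
    augmented = []
    for token in tokens:
        candidates = synonyms.get(token.lower(), [])
        # Replace only words that have synonyms, and only with the
        # given probability (an assumed knob, not taken from the paper).
        if candidates and rng.random() < replace_prob:
            augmented.append(rng.choice(candidates))
        else:
            augmented.append(token)
    return augmented


def expand_corpus(corpus, synonyms, copies=1, seed=0, replace_prob=0.5):
    """Keep every original sentence and append augmented variants of it."""
    rng = random.Random(seed)
    expanded = [list(sentence) for sentence in corpus]
    for sentence in corpus:
        for _ in range(copies):
            expanded.append(
                augment_sentence(sentence, synonyms, rng, replace_prob)
            )
    return expanded


corpus = [["a", "good", "film"], ["a", "big", "surprise"]]
expanded = expand_corpus(corpus, TOY_SYNONYMS, copies=2, seed=42)
```

The expanded list of tokenized sentences would then be passed to an embedding trainer such as Word2Vec or GloVe in place of the original corpus, so that synonyms appear in the same contexts as the words they replace.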

eISSN: 1338-4287
Language: English
Publication timeframe: 2 times per year
Journal Subjects: Linguistics and Semiotics, Theoretical Frameworks and Disciplines, Linguistics, other

© 2023 Dávid Držík et al., published by Sciendo
