Volume 72 (2021): Issue 2 (December 2021)
    NLP, Corpus Linguistics and Interdisciplinarity
Journal Details
License
Format
Journal
eISSN
1338-4287
First Published
05 Mar 2010
Publication timeframe
2 times per year
Languages
English
Access type: Open Access

Sharing Data Through Specialized Corpus-Based Tools: The Case of GramatiKat

Published Online: 30 Dec 2021
Page range: 531 - 544
Abstract

This paper presents GramatiKat, a specialized corpus tool, in the context of Open Science principles, specifically data sharing, which opens opportunities for original research, makes research easier to verify, and allows others to build on previous work. The tool is designed primarily for the quantitative examination of grammatical categories. It offers grammatical profiles of individual lemmas (currently 14 thousand Czech nouns) and the proportions of individual grammatical categories within a part of speech, i.e., the standard behavior of a word class. The data in GramatiKat are pre-processed, statistically evaluated, and presented in charts and tables for clarity, and they are available to other linguists, especially those working in morphology and lexicography. The article aims to inspire and support corpus and non-corpus linguists both in making fuller use of existing tools and in creating new specialized tools available to other users.
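
To make the two notions from the abstract concrete, a grammatical profile of a lemma and the proportion of a category within a word class can each be read as a relative-frequency distribution. The Python sketch below is only an illustration under stated assumptions: the token list, the function names, and the restriction to the category of case are all hypothetical, the counts are invented, and nothing here reflects how GramatiKat actually processes or evaluates its data.

```python
from collections import Counter

# Hypothetical toy data: (lemma, case) pairs standing in for tagged noun tokens.
# The counts are invented for illustration; they do not come from GramatiKat
# or from any corpus.
TOKENS = [
    ("hrad", "nominative"), ("hrad", "nominative"),
    ("hrad", "genitive"), ("hrad", "locative"),
    ("voda", "nominative"), ("voda", "accusative"),
    ("voda", "accusative"), ("voda", "genitive"),
]


def proportions(counts):
    """Turn absolute counts into relative frequencies (empty input gives {})."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


def lemma_profile(tokens, lemma):
    """Distribution of case values over the tokens of a single lemma."""
    return proportions(Counter(case for lem, case in tokens if lem == lemma))


def word_class_profile(tokens):
    """Distribution of case values over all tokens, i.e. the whole word class."""
    return proportions(Counter(case for _, case in tokens))


if __name__ == "__main__":
    # Compare one lemma against the baseline behavior of the word class.
    print(lemma_profile(TOKENS, "hrad"))
    print(word_class_profile(TOKENS))
```

Comparing the output of `lemma_profile` with that of `word_class_profile` shows, in miniature, how a single lemma can deviate from the standard behavior of its word class; any statistical evaluation of such deviations is outside the scope of this sketch.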
