Volume 72 (2021): Issue 2 (December 2021)
    NLP, Corpus Linguistics and Interdisciplinarity
Journal details
Format: Journal
eISSN: 1338-4287
First published: 05 Mar 2010
Publication frequency: 2 issues per year
Languages: English
Access type: Open access

Using a parallel corpus to adapt the Flesch Reading Ease formula to Czech

Published online: 30 Dec 2021
Volume & Issue: Volume 72 (2021), Issue 2 (December 2021), NLP, Corpus Linguistics and Interdisciplinarity
Pages: 477-487
Abstract

Text readability metrics assess how much effort a reader must put into comprehending a given text. They are used, for example, to choose appropriate readings for students at different proficiency levels, or to make sure that crucial information is conveyed efficiently (e.g., in an emergency). Flesch Reading Ease is so widely used that it is even built into Microsoft Word. However, its constants are language-dependent: the original formula was created for English, and so far it has been adapted to several European languages, Bangla, and Hindi. This paper describes the Czech adaptation, whose language-dependent constants were optimized by a machine-learning algorithm working on parallel corpora pairing Czech with English, Russian, Italian, and French.
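For reference, the English Flesch Reading Ease score is FRE = 206.835 - 1.015 × ASL - 84.6 × ASW, where ASL is the average sentence length in words and ASW is the average number of syllables per word; these three numbers are the language-dependent constants mentioned above. The minimal Python sketch below computes the score and illustrates one way such constants could be re-estimated from aligned text pairs. It assumes (this is our assumption, not necessarily the authors' procedure) that a translation should score the same as its English source, and it fits the constants by ordinary least squares with NumPy. All function names are hypothetical, and the syllable counter is a crude English-only heuristic.

```python
import re

import numpy as np

# Constants of the original English formula (Flesch, 1948):
# FRE = 206.835 - 1.015 * ASL - 84.6 * ASW
ENGLISH_CONSTANTS = (206.835, 1.015, 84.6)

_SENTENCE_END = re.compile(r"[.!?]+")
_WORD = re.compile(r"[^\W\d_]+")  # runs of letters (Unicode-aware)
_VOWEL_RUN = re.compile(r"[aeiouy]+", re.IGNORECASE)  # English-only heuristic


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels; every word has at least one."""
    return max(1, len(_VOWEL_RUN.findall(word)))


def text_stats(text: str) -> tuple[float, float]:
    """Return (ASL, ASW): words per sentence and syllables per word."""
    sentences = [s for s in _SENTENCE_END.split(text) if s.strip()]
    words = _WORD.findall(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(words) / len(sentences), syllables / len(words)


def flesch_reading_ease(asl: float, asw: float,
                        constants: tuple[float, float, float] = ENGLISH_CONSTANTS) -> float:
    """Compute a - b * ASL - c * ASW for the given language constants (a, b, c)."""
    a, b, c = constants
    return a - b * asl - c * asw


def fit_constants(asl: np.ndarray, asw: np.ndarray,
                  target: np.ndarray) -> tuple[float, float, float]:
    """Least-squares estimate of (a, b, c) so that a - b*ASL - c*ASW ~ target.

    Hypothetical setup: ASL/ASW come from texts in the new language, and
    `target` holds the English FRE scores of their aligned source texts.
    """
    # FRE is linear in (a, b, c), so each text contributes one row [1, -ASL, -ASW].
    design = np.column_stack([np.ones_like(asl), -asl, -asw])
    solution, *_ = np.linalg.lstsq(design, target, rcond=None)
    a, b, c = solution
    return float(a), float(b), float(c)


if __name__ == "__main__":
    asl, asw = text_stats("The cat sat. The dog ran.")
    print(asl, asw, round(flesch_reading_ease(asl, asw), 2))  # 3.0 1.0 119.19
```

Because the score is linear in its three constants, a plain linear least-squares fit is the simplest estimator to illustrate; a real Czech adaptation would also need a Czech syllable counter that handles vowels such as á, é, ě, í, ó, ú, ů, ý and syllabic r and l, which this sketch does not attempt.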

