Volume 70 (2019): Issue 2 (December 2019)
Journal Details
Format: Journal
First Published: 05 Mar 2010
Publication timeframe: 2 times per year
Languages: English
Access type: Open Access

Relevant Criteria for Selection of Spoken Data: Theory Meets Practice

Published Online: 21 Dec 2019
Page range: 324–335

The present paper seeks to review relevant criteria used in classifying speech events (SEs) from the perspective of spoken corpus design. The primary goal is to survey the landscape of possible types of spoken language in order to assess in which directions the coverage of spoken Czech offered by the corpora of the Czech National Corpus can be expanded in the future. We approach the problem from both theoretical and practical points of view, examining what the theoretical literature has to say as well as the approaches implemented in practice by existing spoken corpora of various languages. We then synthesize this information into a pragmatically motivated set of SE classification criteria which does not aspire to be universal or definitive but aims to serve as a useful guiding principle and conceptual framework for understanding and promoting SE diversity when collecting spoken data.


