Building English – Punjabi Aligned Parallel Corpora of Nouns from Comparable Corpora

eISSN: 2255-8691
Language: English

Frequency: 2 times per year
Journal subjects: Computer Sciences, Artificial Intelligence, Information Technology, Project Management, Software Development

Published online: 29 Jan. 2024

Pages: 245–251

DOI: https://doi.org/10.2478/acss-2023-0024

Keywords: Aligned corpora, comparable corpora, English-Punjabi, parallel corpora

© 2023 Dilshad Kaur et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
