Supporting secondary research in early drug discovery process through a Natural Language Processing based system

Alina Popa

Open Access

Published online: May 31, 2021

Proceedings of the International Conference on Applied Statistics

Volume 2 (2020): Issue 1 (December 2020)

About this article

Page range: 209 - 222

DOI: https://doi.org/10.2478/icas-2021-0019

Keywords
Natural Language Processing, Dynamic Topic Modelling, Research Trends, Drug Discovery, Named Entity Recognition

© 2021 Alina Popa, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

Recent decades have been characterised by a steady decline in the productivity of pharmaceutical companies' research and development activities. This decline stems from the intrinsic risk of the drug discovery process, a risk that must be managed efficiently. Within this process, early-phase projects could be streamlined by doing more secondary research, that is, by integrating chemical and biological knowledge from the scientific literature to obtain an overview of a given research area and of how it has evolved. This overview would then help refine research and development operations.

Given the vast number of publications in pharmaceutical research, identifying the important information is not easy. Several projects have tackled this task by exploiting the open pharmacological space with state-of-the-art technologies, the most popular being Knowledge Graph methods. Although extremely useful, this technology requires a substantial investment of time and human resources. An alternative is to build a system from Natural Language Processing blocks. However, no defined framework or reusable code template yet exists for the compound development use case.

This study presents the design and development of a system that uses Dynamic Topic Modelling and Named Entity Recognition modules to extract meaningful information from a large volume of unstructured text. Moreover, the dynamic character of the topic modelling technique makes it possible to analyse how different subject areas evolve over time. The system was validated on a collection of articles from the Pharmaceutical Research Journal.
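To make the idea of following subject areas over time concrete, the sketch below computes, for each year, the share of documents that mention each topic. It is a deliberately simplified illustration and not the Dynamic Topic Modelling used in the study: the topics are fixed, hand-picked keyword lists rather than learned from the data, and the names `TOPICS`, `topic_share_by_year` and the sample corpus are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, hand-picked keyword lists. A real dynamic topic model
# would learn the topics (and their drift over time) from the corpus.
TOPICS = {
    "solid_state": {"crystalline", "amorphous"},
    "metabolic": {"insulin", "glucose"},
}


def topic_share_by_year(corpus):
    """Return {year: {topic: share of that year's documents mentioning it}}.

    corpus is an iterable of (year, text) pairs.
    """
    doc_totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for year, text in corpus:
        # Crude tokenisation: split on whitespace, strip punctuation, lowercase.
        tokens = {tok.strip(".,;:()").lower() for tok in text.split()}
        doc_totals[year] += 1
        for topic, keywords in TOPICS.items():
            if tokens & keywords:
                hits[year][topic] += 1
    return {
        year: {topic: hits[year][topic] / total for topic in TOPICS}
        for year, total in sorted(doc_totals.items())
    }


# Invented example documents, for illustration only.
corpus = [
    (2001, "Crystalline forms of the compound were studied."),
    (2001, "Insulin resistance in obese patients."),
    (2020, "Amorphous solid dispersions improve solubility."),
    (2020, "Stability of amorphous systems."),
]
shares = topic_share_by_year(corpus)
```

Comparing `shares[2001]` with `shares[2020]` shows how the weight of each subject area shifts between time slices, which is the kind of trend a dynamic topic model surfaces without needing predefined keywords.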

Our results show that the system is able to identify the main research areas of the last 20 years, namely crystalline and amorphous systems, insulin resistance, and paracellular permeability. Additionally, the evolution of these subjects is a highly valuable resource that can be used to gain an in-depth understanding of the shifts that have occurred in a specific domain.

However, a limitation of this system is that it cannot detect an association between two concepts or entities unless they appear in the same document.
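The limitation can be illustrated with a minimal sketch of document-level co-occurrence counting. The code is an assumption about how such associations might be tallied, not the system's actual implementation: `LEXICON` is a hypothetical dictionary lookup standing in for a trained Named Entity Recognition model, and the documents are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical entity lexicon standing in for a trained NER model.
LEXICON = {"insulin", "metformin", "amorphous", "permeability"}


def extract_entities(text):
    """Return the lexicon terms found in a text (toy stand-in for NER)."""
    tokens = {tok.strip(".,;:()").lower() for tok in text.split()}
    return tokens & LEXICON


def cooccurrence_counts(documents):
    """Count how many documents mention each unordered pair of entities."""
    counts = Counter()
    for doc in documents:
        entities = sorted(extract_entities(doc))
        counts.update(combinations(entities, 2))
    return counts


# Invented example documents, for illustration only.
docs = [
    "Metformin improves insulin sensitivity.",
    "Amorphous dispersions alter permeability.",
    "Insulin transport depends on permeability.",
]
counts = cooccurrence_counts(docs)
```

Here "metformin" and "permeability" are each linked to "insulin" in separate documents, yet `counts[("metformin", "permeability")]` is zero because the two never share a document. Capturing such indirect links would require chaining associations across documents, which is what graph-based approaches are designed to do.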

eISSN: 2668-6309
Language: English

Publication timeframe: Volume Open
Journal Subjects: Computer Sciences, Artificial Intelligence, Business and Economics, Political Economics, Macroeconomics, Mathematics and Statistics for Economists, Statistics, Econometrics
