<abstract xmlns="http://www.w3.org/1999/xhtml">

<sec><h3>Purpose</h3>
<p>We propose a method that represents scientific papers as a complex network, combining approaches from neural networks and complex networks.</p>
</sec>
<sec><h3>Design/methodology/approach</h3>
<p>The novelty of the method lies in representing a paper by a word branch, which carries the sequential structure of words in sentences. The branches are generated by the attention mechanism in deep learning models. We connect those branches at the positions of their common words to generate networks, called word-attention networks, and then detect the communities of these networks, which we define as topics.</p>
</sec>
<sec><h3>Findings</h3>
<p>The detected topics carry the sequential structure of words in sentences, represent the intra- and inter-sentential dependencies among words, and reveal, through network indexes, the roles that words play in them.</p>
</sec>
<sec><h3>Research limitations</h3>
<p>The parameter settings of our method may depend on the data at hand, so finding proper settings requires human experience.</p>
</sec>
<sec><h3>Practical implications</h3>
<p>Our method is applied to papers published in PNAS, where the discipline designations provided by authors are used as the gold-standard labels of the papers’ topics.</p>
</sec>
<sec><h3>Originality/value</h3>
<p>This empirical study shows that the proposed method outperforms Latent Dirichlet Allocation and is more stable.</p>
</sec>
</abstract>

A Topic Detection Method Based on Word-attention Networks

National University of Defense Technology

Journal of Data and Information Science

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
