Scientific Value Weights more than Being Open or Toll Access: An analysis of the OA advantage in Nature and Science


 
 
 We attempt to find out whether OA or TA really affects the dissemination of scientific discoveries.
 
 
 
 We design two indicators, hot-degree and R-index, to indicate whether a topic has an OA or a TA advantage. First, according to the OA classification of the Web of Science (WoS), we collect data from the WoS by downloading OA and TA articles, letters, and reviews published in Nature and Science during 2010–2019. These papers are divided into three broad disciplines, namely biomedicine, physics, and others. Then, for each discipline in each journal, we use the classical Latent Dirichlet Allocation (LDA) to cluster the OA and TA papers into 100 topics each, apply the Pearson correlation coefficient to match the OA and TA topics, and calculate the hot-degree and R-index of every OA-TA topic pair. Finally, the characteristics of each discipline are presented. For the qualitative comparison, we choose high-quality papers from the Nature remarkable papers and the Science breakthroughs and analyze the relation between OA/TA and citation counts.
 
 
 
 The results show that the OA hot-degree in biomedicine is significantly greater than that of TA, whereas in physics it is significantly less than that of TA. Based on the R-index, we find that OA advantages exist in biomedicine and TA advantages exist in physics. Therefore, the dissemination of average scientific discoveries is not necessarily affected by OA or TA in all fields. However, OA promotes the spread of important scientific discoveries reported in high-quality papers.
 
 
 
 We lost some citations by ignoring other open sources such as arXiv and bioRxiv. Another limitation is that Nature employs some strong measures to promote access to subscription-based articles, which makes the boundary between OA and TA fuzzy.
 
 
 
 The hot-degree index is useful for selecting hot topics in a set of publications. The findings comprehensively reflect the differences between OA and TA across disciplines, which is a useful reference for researchers choosing whether to publish OA or TA.
 
 
 
 We propose a new method, including two indicators, to explore and measure OA or TA advantages.



Introduction with Brief Review
Open Science (OS) is one of the hottest topics in the scientific world. According to UNESCO, OS allows scientific information, data, and outputs to be more widely accessible (Open Access, OA) and more reliably harnessed (Open Data) with the active engagement of all stakeholders (Open to Society). Since open-source software launched the open movement, OA, together with Open Data (OD) and Open Peer Review (OPR), has pushed forward and benefited OS, and OA has attracted the most attention.
Open access means that accessing, downloading, and reading scientific literature is free to the entire population of Internet users (Craig, et al., 2007).
There are two routes to open access: open-access journals and e-print repositories (i.e., pre-prints or post-prints) (Antelman, 2004). Through tracking and predicting growth areas (Small, 2006), OA has been shown to possess advantages in scholarly communication (Wang et al., 2015), and free online availability substantially increases the impact of research (Lawrence, 2001). Researchers who adopt open practices can gain advantages including more citations, media attention, potential collaborators, job opportunities, and funding opportunities (McKiernan et al., 2016).
Researchers have conducted many studies on the OA advantage. Some studies have shown that authors who made their work open access received more downloads (Davis, 2011) or citations (Antelman, 2004; Eysenbach, 2006; Lawrence, 2001; McKiernan et al., 2016; Norris, Oppenheim, & Rowland, 2008; Wang et al., 2015) than authors whose articles remained behind a subscription wall. A comparison of OA and TA (toll access) versions of 4,633 articles in ecology, applied mathematics, sociology, and economics reveals that the 2,280 (49%) OA articles achieved a mean citation count of 9.04, whereas the mean for TA articles was 5.76 (Eysenbach, 2006). However, other researchers argue that there is no OA citation advantage. The observed citation differences between OA and TA articles can be explained by three postulates: the open access postulate, the selection bias postulate, and the early view postulate (Craig et al., 2007). Davis (2011) states that OA articles receive more downloads from a broader group of readers but no more citations than TA articles. A study of working papers in economics shows that the impact of working papers is relatively low and provides no evidence of an OA advantage (Frandsen, 2009). OA advantages might be topic-dependent; that is, authors tend to select citation-attractive topics, especially for OA outlets that are more likely to attract citations (Sotudeh, 2019). However, it remains unclear whether OA benefits hot topics or TA cools cold topics.
Research Paper
Journal of Data and Information Science
Therefore, we designed the hot-degree and R-index indicators to verify the OA-TA relations quantitatively, following the studies mentioned above (Craig et al., 2007; Eysenbach, 2006; Lawrence, 2001; Norris et al., 2008; Small, 2006; Sotudeh, 2019; Wang et al., 2015). Our unexpected results show that OA or TA does not affect the normal spread of real scientific discoveries, although the dissemination of scientific discoveries does vary with OA or TA.

Methodology
The designed indicators and the applied clustering method are described as follows.

Indicators
To measure hot topics (Banks, 2006; Ye, 2013), we follow the definition of the original h-index to define the hot-degree of a topic (including subjects, keywords, and so on) in the sciences, as follows.
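The formal definition is not reproduced here. Since the hot-degree follows the original h-index, a minimal sketch under that reading is: a topic has hot-degree h if h of the papers assigned to it have at least h citations each. The function name `hot_degree` and its input (a plain list of the citation counts of a topic's papers) are hypothetical, chosen only for illustration.

```python
def hot_degree(citations):
    """Hypothetical sketch: h-index-style hot-degree of one topic.

    `citations` holds the citation counts of the papers assigned to the topic.
    Returns the largest h such that h papers have at least h citations each.
    """
    ranked = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


# Illustrative citation counts for one topic's papers (not data from the study).
print(hot_degree([25, 8, 5, 3, 3, 1]))  # 3
```

As with the h-index, a single highly cited paper cannot raise the hot-degree on its own, so the index rewards topics with many well-cited papers.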


Clustering
To clarify the topic-related properties of published papers, we used the classical Latent Dirichlet Allocation (LDA; cf. Blei, Ng, & Jordan, 2003) to cluster the papers of Nature and Science into topics. We divided the Nature dataset into three parts according to discipline, namely biomedicine, physics, and others, and performed the same operation on the Science dataset. Taking the Nature biomedicine dataset as an example, we show the data processing flowchart in Figure 1.

The algorithm includes the following steps.
• We divided the Nature biomedicine dataset into two parts, OA and TA.
• Based on the title and abstract fields, we used the LDA model to extract 100 topics from each of the OA and TA datasets, denoted $Topic_{OA}$ and $Topic_{TA}$. Each paper in the OA or TA dataset consists of a set of topics, each with a probability greater than 0.01. For example, a paper in the OA dataset can be denoted $p_{oi}$, and its topics can be retrieved from the LDA model trained on the OA dataset. Conversely, each topic $t_{oi}$ in $Topic_{OA}$ is associated with the set of papers in the OA dataset that contain it.
• For the topics in $Topic_{OA}$ and $Topic_{TA}$, we calculated the hot-degree of each topic from the citations of the papers belonging to it, denoted $H_{OA}$ and $H_{TA}$ respectively:
$$H_{OA} = \{h_{o1}, h_{o2}, \ldots, h_{o100}\}, \qquad H_{TA} = \{h_{t1}, h_{t2}, \ldots, h_{t100}\},$$
where $h_{oi}$ is the hot-degree of the $i$th topic in $Topic_{OA}$ and $h_{tj}$ is the hot-degree of the $j$th topic in $Topic_{TA}$.
• We assumed that the research topics of OA and TA during 2010-2019 are similar, so the topics can be matched. We used the Pearson correlation coefficient to match the OA and TA topics, so an OA-TA topic pair can be represented as
$$Pair_{OA\text{-}TA} = (t_{oi}, t_{tj}),$$
where $t_{oi} \in Topic_{OA}$ and $t_{tj} \in Topic_{TA}$.
The OA topic $t_{oi}$ and the TA topic $t_{tj}$ represent the same topic, so the R-index can be calculated from $Pair_{OA\text{-}TA}$.
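The R-index formula is not reproduced in this text. Purely for illustration, the sketch below assumes that the R-index of a matched pair is the ratio r = h_oi / h_tj of the OA and TA hot-degrees, so that r > 1 would read as an OA advantage; the paper's actual formula should be substituted. It then computes the share of pairs with r > 1, r = 1, and r < 1, the three classes summarized in the results. `r_index` and `r_shares` are hypothetical names.

```python
from fractions import Fraction


def r_index(h_oa, h_ta):
    """HYPOTHETICAL R-index of one OA-TA topic pair: ratio of hot-degrees.

    This ratio is an illustrative assumption, not the paper's stated formula.
    Fraction keeps r exact, so the r == 1 class is not lost to rounding.
    """
    if h_ta == 0:
        raise ValueError("TA hot-degree is zero; ratio undefined")
    return Fraction(h_oa, h_ta)


def r_shares(pairs):
    """Share of topic pairs with r > 1, r = 1 and r < 1.

    `pairs` is a non-empty list of (h_oa, h_ta) integer hot-degree tuples.
    """
    counts = {"r>1": 0, "r=1": 0, "r<1": 0}
    for h_oa, h_ta in pairs:
        r = r_index(h_oa, h_ta)
        key = "r>1" if r > 1 else "r=1" if r == 1 else "r<1"
        counts[key] += 1
    total = len(pairs)
    return {k: v / total for k, v in counts.items()}


# Toy hot-degree pairs (h_oa, h_ta); not values from the study.
print(r_shares([(12, 9), (7, 7), (4, 10), (15, 11)]))
# {'r>1': 0.5, 'r=1': 0.25, 'r<1': 0.25}
```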

Data description
We collected data from the Web of Science (WoS) by searching and downloading both OA and TA articles, letters, and reviews published in Nature and Science for the period of 2010-2019. In our research, we only chose papers published in Nature or Science; sub-journals of Nature or Science, such as Nature Communications and Scientific Reports, were not taken into account. As the most prominent comprehensive scientific magazines, both Nature and Science receive significant recognition among colleagues and may open the way to greater media visibility. Science and Nature publish only 10 percent of received papers, thus creating what has been called a situation of "artificial scarcity" (a paradox in the age of digital information). Under the OA classification of the WoS, the data from both magazines are comparable.
As more and more academic journals offer an open access option, the OA status of each paper is also provided across the WoS platform as one of the following legal OA versions: Gold, Bronze (free content on a publisher's website), and Green (e.g., self-archived by the author in a repository). In our research, papers with any of these open access statuses were classified into the OA dataset; all other papers were classified into the TA dataset. Since WoS does not provide a disciplinary classification for Nature and Science, we collected the disciplinary data directly from www.nature.com and www.sciencemag.org and classified the papers into three broad disciplines: biomedicine, physics, and others. Table 1 shows the characteristics of the dataset. As the data were unevenly distributed, we calculated median values rather than average values. Counting OA and TA items together, 56% of the papers in Nature belong to biomedicine, 19% to physics, and 25% to others, while in Science 43% are in biomedicine, 24% in physics, and 33% in others. That is, publications in Nature and Science are skewed toward biomedicine.
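The OA/TA split and the median summaries described above can be sketched as follows. The record layout (dicts with `oa_status` and `citations` keys) and the lowercase status labels are illustrative assumptions, not the actual WoS export format. Any Gold, Bronze, or Green status places a paper in the OA dataset, and everything else goes to TA.

```python
from statistics import median

# Assumed labels for the WoS OA statuses named in the text (not actual WoS field values).
OA_STATUSES = {"gold", "bronze", "green"}


def split_oa_ta(records):
    """Split records into OA and TA lists according to their OA status labels."""
    oa, ta = [], []
    for rec in records:
        # A paper may carry several statuses; any OA status makes it OA.
        statuses = {s.lower() for s in rec["oa_status"]}
        (oa if statuses & OA_STATUSES else ta).append(rec)
    return oa, ta


def median_citations(records):
    """Median citation count; more robust than the mean for skewed citation data."""
    return median(rec["citations"] for rec in records)


# Toy records, not data from the study.
records = [
    {"oa_status": ["Gold"], "citations": 120},
    {"oa_status": ["Bronze", "Green"], "citations": 45},
    {"oa_status": [], "citations": 300},
    {"oa_status": [], "citations": 30},
    {"oa_status": ["Green"], "citations": 60},
]
oa, ta = split_oa_ta(records)
print(len(oa), len(ta))                            # 3 2
print(median_citations(oa), median_citations(ta))  # 60 165.0
```

In this toy example the single 300-citation TA paper would dominate the TA mean (165 vs. about 75 for OA), which is why medians are the safer summary for skewed citation counts.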
Furthermore, we used classical LDA to cluster all papers (representing average scientific discoveries) quantitatively into 100 topics, and compared them qualitatively with the qualitative dataset, consisting of the 2011-2019 Nature remarkable papers and the 2010-2019 Science breakthroughs (representing important scientific discoveries).

For comparative analysis, we selected three topics that appear in both Nature and Science: CRISPR/Cas9, quantum computing & quantum entanglement, and astrophysics. CRISPR/Cas9 and quantum computing & entanglement are hot topics in biology and physics, respectively, and astronomy is a frontier topic of physics. CRISPR/Cas9 was selected as one of the Science breakthroughs in 2013 (Genetic Microsurgery for the Masses, 2013), 2015 (Travis, 2015), and 2017 (Stokstad et al., 2017), and was among the Nature remarkable papers in 2019 (Robots, hominins, and superconductors: 10 remarkable papers from 2019, 2019). Quantum computing was among the 2019 Science breakthroughs (Pennisi et al., 2019) and a 2016 Nature remarkable paper (Editors' choice, 2016), and quantum entanglement was a 2015 Science breakthrough (Runners-up, 2015) and a 2015 Nature remarkable paper (Editors' choice, 2015). There are 5 Nature remarkable papers and 9 Science breakthroughs in astronomy, mainly involving celestial mechanics, astrometry, and astrophysics.

Results and discussion
We present the results and discussion in two parts: quantitative results and qualitative discussion.

Quantitative results
Using the designed indicators, hot-degree and R-index, we summarize the representative results in Table 2, including the biomedical, physical, and other records as well as the p-values of the OA-TA related-samples t-test, where p < 0.05 indicates a significant difference. Except for the Science "other" category, the related t-test shows a significant difference between the hot-degree of OA and TA at the level of p < 0.01 (highly significant). The hot-degree of OA and TA differs across disciplines. For example, the hot-degree of OA in biomedicine is greater than that of TA, whereas the hot-degree of OA in physics is less than that of TA. This conclusion holds for both the Nature and Science datasets. For the other disciplines, the difference in hot-degree between OA and TA is significant in the Nature dataset but not in the Science dataset, although the hot-degree of OA is less than that of TA in both datasets. The combined dataset shows a significant difference between the hot-degree of OA and TA, with the hot-degree of OA greater than that of TA.
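Because the OA and TA topics are matched in pairs, the related-samples t-test can be read as a paired t-test over those pairs; treating each matched topic pair as one paired observation is our interpretation, not something the text spells out. This stdlib sketch computes only the t statistic and degrees of freedom. A p-value needs the t distribution, for example via `scipy.stats.ttest_rel`. `paired_t` is a hypothetical name.

```python
import math
from statistics import mean, stdev


def paired_t(h_oa, h_ta):
    """Paired (related-samples) t statistic for matched hot-degree lists.

    Returns (t, degrees_of_freedom); positive t means OA hot-degrees are higher.
    """
    if len(h_oa) != len(h_ta) or len(h_oa) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(h_oa, h_ta)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1


# Toy matched hot-degrees for five topic pairs (not data from the study).
t, df = paired_t([12, 9, 15, 11, 8], [10, 8, 12, 9, 8])
print(round(t, 3), df)  # 3.138 4
```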
Clearly, disciplinary differences exist. Figure 2 shows the annual distribution curves of total and median citations for biomedicine and physics in Nature and Science. Since OA > TA indicates an OA advantage and TA > OA a TA advantage, we cannot claim that OA is advantageous over TA in every field and under every condition. According to Figure 2, median citations show OA advantages in both biomedicine and physics. In terms of total citations, the disciplinary difference between biomedicine and physics is somewhat more evident. The total citations of biomedical OA papers are greater than those of TA papers, but in physics the total citations of TA papers are at a disadvantage compared with those of OA papers.
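Annual curves like those in Figure 2 amount to grouping papers by publication year and access type, then taking totals and medians. A minimal sketch, assuming each paper is a hypothetical `(year, access, citations)` tuple:

```python
from collections import defaultdict
from statistics import median


def annual_citation_stats(papers):
    """Total and median citations per (year, access) group.

    `papers` is an iterable of (year, access, citations) tuples, where access is
    "OA" or "TA" (assumed layout). Returns {(year, access): (total, median)}.
    """
    groups = defaultdict(list)
    for year, access, cites in papers:
        groups[(year, access)].append(cites)
    return {key: (sum(c), median(c)) for key, c in sorted(groups.items())}


# Toy papers, not data from the study.
papers = [
    (2015, "OA", 80), (2015, "OA", 20), (2015, "TA", 200),
    (2015, "TA", 10), (2015, "TA", 30), (2016, "OA", 50),
]
for key, (total, med) in annual_citation_stats(papers).items():
    print(key, total, med)
```

In this toy data, 2015 TA papers lead on total citations (240 vs. 100) while OA papers lead on median citations (50.0 vs. 30). This shows how total and median curves can point in different directions, as they do in the discussion above.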
After clustering all publications of Nature and Science with LDA, extracting 100 topics over the ten years, we obtained the hot-degree distributions of the biomedical and physical topics for OA and TA, shown in Figure 3. Figure 3 shows that different topics have different hot-degree distributions. The OA hot-degree distribution in biomedicine is significantly higher than that of TA, while physics shows the opposite: there, the OA hot-degree distribution is significantly lower than that of TA.
Figure 4 shows the R-index distribution over the LDA topics. The pie chart in each distribution plot depicts the proportions of topics with r > 1, r = 1, and r < 1 in each dataset.

Figure 4 clearly shows that OA advantages exist in biomedicine, whereas TA advantages account for most topics in physics.
Therefore, whether a paper in Science or Nature is published OA or TA does not necessarily affect the dissemination of average scientific discoveries in all fields. The above analysis also indicates that OA may benefit and promote hot topics in biomedicine. We therefore selected typical qualitative evidence for further discussion.

Qualitative discussion
Quantum computing & quantum entanglement and CRISPR/Cas9 are two hot sub-fields of physics and biology, respectively, which can be used for comparison. The former has a total of 3 papers in the qualitative dataset, so we selected the three most important CRISPR/Cas9 papers. Jennifer Doudna and Emmanuelle Charpentier first demonstrated that the CRISPR/Cas9 system could be used to edit genes accurately in 2012 (Jinek et al., 2012). Zhang Feng's team demonstrated the ease of programming and wide applicability of RNA-guided nuclease technology in human and mouse cells (Cong et al., 2013). George Church's team engineered the type II bacterial CRISPR system to function with custom guide RNA (gRNA) in human cells (Mali et al., 2013). These are pioneering papers on CRISPR/Cas9. Since there are 17 astronomy papers (7 OA and 10 TA, accessible at https://www.nature.com and https://www.sciencemag.org) covering several branches, we used them separately to further compare OA and TA.
From these representative qualitative papers, we found that CRISPR/Cas9 papers were more likely to take the OA route, quantum computing papers the TA route, and astronomy papers both the OA and TA routes. These papers were all published in 2010-2019.

In Figure 5, part (a) shows the citation trends of the CRISPR/Cas9 and quantum computing & quantum entanglement papers, and part (b) shows the median citation trends of the OA and TA astronomy papers. In Figure 5(a), the three curves with higher citations are the CRISPR/Cas9 papers, and the three curves with lower citations are the quantum computing & quantum entanglement papers. CRISPR/Cas9 papers prefer the OA route and quantum papers the TA route, which is consistent with the above quantitative analysis of the disciplinary differences between biomedicine and physics. Looking more closely at quantum computing & quantum entanglement, we further found that it includes both OA and TA papers. Although the one OA paper (the green curve) was published as late as 2019, it received more citations than the other two TA papers. In Figure 5(b), the OA median citation curve always stays above the TA median curve, which might indicate an OA advantage in astronomy. However, Table 1 shows that in both Nature and Science the median citation of physics TA papers is higher than that of OA papers. This result indicates that OA can promote the spread of important scientific discoveries more than TA.
These findings reveal that important scientific discoveries may be disseminated through either the open or the closed route, but scientific breakthroughs are never determined by that route. OA or TA may affect the citations a paper receives, leading to either OA or TA advantages; the fact that OA papers may receive high citations while TA papers may not merely suggests that OA promotes the spread of scientific discoveries. However, real scientific discoveries do not become known because of the channel through which the papers reporting them are published.
More and more academic journals have become hybrid journals, in which authors can choose the OA or TA publishing route. We therefore used the OA and TA records in the WoS database as the research sample in this article. Because we ignored other open information sources such as arXiv and bioRxiv, we may have missed some citations, which is a limitation. Another limitation is that Nature employs a number of strong measures to promote access to subscription-based articles. For example, since December 2, 2014, all research papers from Nature have been free to read in a proprietary screen-view format that can be annotated but not copied, printed, or downloaded (Nature, 2014), so the boundary between OA and TA articles may become fuzzy.

Conclusion
We conclude that OA or TA does not necessarily affect the academic dissemination of average scientific discoveries, but OA promotes the spread of important scientific discoveries. A real scientific discovery spreads not because it takes the OA or TA route but because of its significance, which is the paramount factor in its dissemination. OA or TA is only the means of dissemination; the key is the scientific finding in the published content. Scientific discovery itself outweighs its means of dissemination, and a general OA advantage does not exist. We hope that this research will stimulate further exploration.