ZB MED - Information Centre for Life Sciences
Data science for the life sciences and specialist information services – comprehensive, digital and research-based.

ZB MED has collected the literature from repositories that have been opened up to support research on the current COVID-19 crisis. Use this link to retrieve the latest literature on COVID-19 / SARS-CoV-2 from our literature search engine LIVIVO.

Literature Resources for Text Mining

Below you will find a list of all available literature sets and an overview of the overlap between the different datasets.


Figure: Overlap between the databases based on the DOIs of the included documents. Provided by our cooperation partner Fraunhofer SCAI.
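
A DOI-based overlap of this kind can be sketched in a few lines of Python. This is only an illustration, not the method Fraunhofer SCAI used to produce the figure: the function names normalize_doi and pairwise_overlap are hypothetical, the example DOIs are made up, and it assumes each dataset has already been reduced to a plain list of DOI strings.

```python
from itertools import combinations

# Resolver and scheme prefixes that are commonly prepended to a bare DOI.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Lower-case a DOI and strip a leading resolver prefix so equal DOIs compare equal."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def pairwise_overlap(datasets):
    """Return {(name_a, name_b): number of shared DOIs} for every pair of datasets."""
    normalized = {
        name: {normalize_doi(d) for d in dois if d and d.strip()}
        for name, dois in datasets.items()
    }
    return {
        (a, b): len(normalized[a] & normalized[b])
        for a, b in combinations(sorted(normalized), 2)
    }


if __name__ == "__main__":
    # Made-up DOIs for illustration only; they do not refer to real articles.
    example = {
        "CORD-19": ["10.1000/a1", "https://doi.org/10.1000/B2", "10.1000/c3"],
        "WHO": ["doi:10.1000/b2", "10.1000/c3", "10.1000/d4"],
        "LitCovid": ["10.1000/C3 ", ""],
    }
    for (a, b), shared in pairwise_overlap(example).items():
        print(f"{a} / {b}: {shared} shared DOIs")
```

Normalizing before comparing matters because DOIs are case-insensitive and are often stored with a resolver prefix; documents without a DOI are simply ignored here, so such an overlap is a lower bound.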

COVID-19 Open Research Dataset Challenge (CORD-19):
“In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.” (Text source and Link)

WHO Global research on coronavirus disease (COVID-19):
“WHO is gathering the latest scientific findings and knowledge on coronavirus disease (COVID-19) and compiling it in a database. We update the database daily from searches of bibliographic databases, hand searches of the table of contents of relevant journals, and the addition of other relevant scientific articles that come to our attention. The entries in the database may not be exhaustive and new research will be added regularly.” (Text source and Link)

COVID-19 SARS-CoV-2 preprints from medRxiv and bioRxiv:
Preprints from both medRxiv and bioRxiv can be found under this link.

Elsevier's Novel Coronavirus Information Center:
“includes the latest early-stage and peer-reviewed research on COVID-19 from journals including The Lancet and Cell Press. In addition nearly 20,000 related articles are free to access on ScienceDirect. These articles are also available to download with rights for full text and data mining, re-use and analyses for as long as needed.” (Text source and Link) This link also leads to information for clinicians and patients. Click to view articles.

To download, log in to the sftp server: sftp (password: beat_corona)

COVID-19 Research Pass
In collaboration with leading scholarly publishers, the COVID-19 Research Pass enables instant full text access to the latest COVID-19 research literature from participating publishers and via Open Access. Additionally, the program offers API access to millions of full text research articles for text and data mining purposes.

Literature Search Engines

LIVIVO COVID-19 collection
More than 50,000 entries about COVID-19 / SARS-CoV-2 from various scientific sources as well as current relevant preprints from bioRxiv and medRxiv. It also covers articles included in the COVID-19 Open Research Dataset (CORD-19) and other sources.

LitCovid, a literature retrieval platform developed by NIH, provides curated SARS-CoV-2 / COVID-19 literature that is manually sorted into 6 categories: general information, mechanism, transmission, treatment, case report and epidemic forecasting.
The documents (unfortunately without their categories) can be downloaded at LitCovid.

CORD-19 Explorer
The CORD-19 Explorer is a full-text search engine for the COVID-19 Open Research Dataset.

provides a full text search engine for Covid-19 publications

Text Mining Services

A search engine with terminology annotations based on text mining, which we set up with our cooperation partner Fraunhofer SCAI, can be found here.


SciBite provides an annotated version of the CORD-19 corpus. It is tagged with, among others, MeSH terms, genes (HGNC) and drugs (ChEMBL) and is provided as JSON files.

PubTator contains annotated COVID-19-related documents based on LitCovid.

Covid-19 Preprint index
Covid-19 bioRxiv and medRxiv preprints are clustered according to the similarity of their topics.

Covid-19 interactome miner
provides access to a database of interactions between genes / proteins, chemicals and biological processes related to the SARS-CoV-2 (COVID-19) virus. The interactions have been extracted automatically using text mining and are based on the CORD-19 dataset and Covid-19 bioRxiv and medRxiv preprints.

follows the TREC model for building IR text collections through community evaluations of search systems. It is based on the CORD-19 dataset. The topics of the first challenge can be found here.