Bat βCoV Watch: the PREDICT data

Jun 28

Into the PREDICT-verse!

The USAID PREDICT program was a ten-year international scientific collaboration to discover novel viruses in wildlife that could someday pose a threat to humans. From over 164,000 animal and human samples collected in over 30 countries, the PREDICT program was able to discover nearly 1,000 new viruses, including hundreds of novel coronaviruses. Since 2009, the findings of the PREDICT program have been published in hundreds of scientific studies and released in thousands of GenBank accessions.

Many of the findings of the PREDICT program are therefore already incorporated into the source databases we use in our project. However, this year, the full testing data have been released in USAID’s data library, providing an exciting opportunity to check for novel hosts. A preliminary version of those data, taken from the HealthMap interface, is integrated into Verena’s VIRION database, but unfortunately, the records mostly lack granular taxonomy. For example, we know that “PREDICT CoV-67” was found in bats, and is in the coronavirus family (Coronaviridae), but we can’t necessarily tell whether PREDICT CoV-67 is in the genus Betacoronavirus if the sequence isn’t available and/or hasn’t been analyzed by a scientific study yet. This leaves us a bit in the dark - there are hundreds of bat-coronavirus associations in these data, but up to this point, it’s been difficult to make full use of them.

One possible solution to that problem was identified this week by Clif McKee, a brilliant scientist studying bat diseases at Johns Hopkins, who you should absolutely follow on Twitter. Clif points out that the Spillover Risk Ranking Database tells us that PREDICT CoV-67 is a Betacoronavirus, which means we can go back to VIRION and sort out the new hosts. But there’s actually another, more direct solution!

If you go to the full testing database released by the PREDICT program, you can see each individual animal, what it was tested for, whether it came up positive, and sometimes even the virus’s genome sequence. There’s also a helpful column “Interpretation” with information that helps us here. For example, our bat friend IDAB0148.RST, one animal from the species Pteropus alecto, was swabbed rectally, and the sample was tested using a conventional PCR test for coronaviruses. This animal came back positive for PREDICT_CoV-67, and the Interpretation column helpfully explains:

This is a new coronavirus found in bats belonging to the betacoronavirus genus. The genus Betacoronavirus includes viruses that are of significance to public health such as SARS and MERS however this virus is not considered to be closely related to either of these viruses. Therefore at this time there is no evidence to suggest this virus poses a threat to human health.

This is great! Data and science communication rolled into one.

A quick search of this column - and a cross-check against HealthMap + spillover.global to make sure we haven’t missed anything - reveals that there are actually 10 new hosts in these data. This is an astonishing quantity of data, bringing our total from 30 new hosts up to a nice round 40. Please welcome to the halls of betacoronavirus hosts: Acerodon celebensis, Chaerephon pumilus, Epomops buettikoferi, Glauconycteris variegata, Hipposideros cervinus, Hipposideros fuliginosus, Megaerops ecaudatus, Myonycteris torquata, Neoromicia somalicus*, and Pteropus conspicillatus*. (*For full transparency: the relevant samples for these two species carry a record of uncertainty in the usually-morphological identification of the animals in the field, so while we’re happy to incorporate these data, it’s worth flagging that up front.)
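For readers who want to try a search like this themselves, here is a minimal sketch of the idea, not the exact workflow we used. It assumes the testing table has been exported to CSV; the column names (sample_id, species, virus, result, interpretation) are hypothetical stand-ins for whatever the real export uses. Only the first row reflects the example described above, and the second row is a made-up negative result included to show the filter working.

```python
import csv
import io

# Illustrative stand-in for the testing table. Column names are assumptions;
# the first row mirrors the example in the post, the second row is made up.
SAMPLE = """sample_id,species,virus,result,interpretation
IDAB0148.RST,Pteropus alecto,PREDICT_CoV-67,Positive,This is a new coronavirus found in bats belonging to the betacoronavirus genus.
HYPOTHETICAL-0001,Example species,,Negative,
"""


def betacov_hosts(rows):
    """Return species with a positive test whose interpretation mentions betacoronavirus."""
    hosts = set()
    for row in rows:
        is_positive = row["result"].strip().lower() == "positive"
        mentions_betacov = "betacoronavirus" in row["interpretation"].lower()
        if is_positive and mentions_betacov:
            hosts.add(row["species"].strip())
    return hosts


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
found = betacov_hosts(rows)

# Hosts already on the running list would be subtracted to leave only new ones.
known_hosts = set()  # placeholder: empty here, so nothing is removed
new_hosts = sorted(found - known_hosts)
print(new_hosts)
```

A plain text match like this is only a first pass: it would miss interpretations that name the genus differently, which is one reason a cross-check against other sources is still worthwhile.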

With these 10 new hosts, that brings the running stats for the top models to:

60% success by the ensemble (24/40)
92.5% success by Trait-1 (37/40)
95% success by Network-1 (20/21) - that’s right, Myonycteris torquata finally broke this one!
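The percentages above are simple ratios of correct predictions to hosts scored. As a quick check, this sketch recomputes them from the counts given in the list; the dictionary and function names are just for illustration, and the counts (including Network-1 being scored on 21 hosts rather than 40) are taken as reported.

```python
# Tallies as reported in the post: (correct predictions, hosts scored).
tallies = {
    "ensemble": (24, 40),
    "Trait-1": (37, 40),
    "Network-1": (20, 21),
}


def success_rate(correct, total):
    """Percentage of hosts predicted correctly, rounded to one decimal place."""
    return round(100 * correct / total, 1)


rates = {model: success_rate(c, n) for model, (c, n) in tallies.items()}
for model, rate in rates.items():
    print(f"{model}: {rate}%")
```

Run as written, this gives 60.0% for the ensemble, 92.5% for Trait-1, and 95.2% for Network-1, which rounds to the 95% quoted above.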

Of the ten new hosts, seven were identified correctly by the ensemble and all were correctly identified by Trait-1 and Trait-3. This is a big set of wins for our models, and shows just how useful they might be in the hands of future programs like PREDICT if we want to target sampling more efficiently and at a lower cost.

This concludes perhaps the most exciting Bat βCoV Watch of all time. Thank you for your continued support of the Viral Emergence Research Initiative.

Dr. Colin Carlson

June 28, 2021

VERENA Consortium

The Verena program is supported by a Biology Integration Institute grant from the U.S. National Science Foundation (NSF BII 2021909 and NSF BII 2213854).