Name Uploaded Size
3dbeacon.tar.index Sat, 14 Dec 2024 14:27:49 GMT 8.0 MB
3dbeacon.tar Sat, 14 Dec 2024 14:27:47 GMT 340.0 MB
bfvd.tar.gz Thu, 17 Oct 2024 06:22:47 GMT 8.5 GB
bfvd.version Fri, 06 Sep 2024 09:41:50 GMT 8 B
bfvd_foldcompdb.tar.gz Thu, 17 Oct 2024 06:24:51 GMT 985.1 MB
bfvd_foldseekdb.tar.gz Sat, 09 Nov 2024 03:06:54 GMT 535.4 MB
bfvd_indexed.tar.index Sat, 19 Oct 2024 18:52:39 GMT 9.1 MB
bfvd_indexed.tar Sat, 19 Oct 2024 18:21:17 GMT 8.7 GB
bfvd_metadata.tsv Fri, 01 Nov 2024 08:59:14 GMT 31.2 MB
bfvd_msa.tar.index Tue, 19 Nov 2024 13:39:32 GMT 9.1 MB
bfvd_msa.tar Tue, 19 Nov 2024 11:46:52 GMT 21.1 GB
bfvd_taxid.tsv Mon, 25 Nov 2024 14:46:19 GMT 24.5 MB
bfvd_taxid_rank_scientificname_lineage.tsv Tue, 12 Nov 2024 16:42:44 GMT 83.8 MB
cif.tar.index Fri, 29 Nov 2024 00:41:00 GMT 9.0 MB
cif.tar Fri, 29 Nov 2024 00:46:48 GMT 7.8 GB
uniref30_2302_virus-rep_mem.tsv Sun, 08 Sep 2024 08:23:10 GMT 60.8 MB

Readme

The Big Fantastic Virus Database (BFVD) is a repository of 351,242 protein structures predicted by applying ColabFold to the viral sequence representatives of the UniRef30 clusters. BFVD holds a unique repertoire of protein structures, spanning major viral clades.

Kim R, Levy Karin E, Steinegger M. BFVD - a large repository of predicted viral protein structures. Nucleic Acids Research, doi.org/10.1093/nar/gkae1119 (2024)

Updates

  • 2024-09-04: First distribution of BFVD.
  • 2024-11-01: The predicted structures and the metadata were updated based on the Logan-MSA.

Availability

  • BFVD is browsable by UniProt accession through the website
  • BFVD is searchable through the Foldseek webserver
  • Scripts for BFVD analyses are available on Zenodo
  • PDB files of BFVD are also available on Zenodo

Data description

1-bfvd.tar.gz: 351,242 predicted structures of BFVD.
2-bfvd.version: version file.
3-bfvd_foldcompdb.tar.gz: Foldcomp-compressed version of the Foldseek database.
  Includes only 347,481 structures, none of which are discontinuous.
4-bfvd_foldseekdb.tar.gz: Foldseek database of the 351,242 predicted structures of BFVD.
5-bfvd_metadata.tsv: General information on each model.

  1. UniRef100: UniRef100 identifier of the sequence
  2. model: File name of the predicted protein structure
  3. avg_pLDDT: Average pLDDT score of the predicted protein structure
  4. pTM: pTM score of the predicted protein structure
  5. splitted: Whether the protein sequence of the UniRef100 entry was split into multiple models
    Protein sequences longer than 1500 residues were split. (0 = not split, 1 = split)
6-bfvd_msa.tar: MSAs for each BFVD entry.
7-bfvd_taxid.tsv: BFVD entries and their taxonomic identifiers.
  1. model: File name of the BFVD model.
  2. taxId: Taxonomy identifier of the protein.
    The protein ID, the portion before the first underscore in model, was used to retrieve the taxonomy ID.
8-bfvd_taxid_rank_scientificname_lineage.tsv: BFVD entries and their taxonomic information.
  1. model: File name of the BFVD model.
  2. taxId: Taxonomy identifier of the protein.
    The protein ID, the portion before the first underscore in model, was used to retrieve the taxonomy ID.
  3. rank: Taxonomic rank of the taxonomy identifier.
  4. scientific name: Scientific name corresponding to the taxonomy identifier.
  5. lineage: Taxonomic lineage of the taxonomy identifier.
9-uniref30_2302_virus-rep_mem.tsv: UniRef30 virus clusters.
  1. repId: Cluster representatives used for structure prediction
  2. memId: Member corresponding to the representative
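
The column layouts above can be read with a few lines of Python. The following is a minimal sketch, not an official BFVD tool. It assumes the .tsv files are tab-separated with columns in the order listed above, and it skips any row whose score fields are not numeric (for example a header row, whose presence is not confirmed here). The names ModelInfo, read_metadata, protein_id and group_clusters are hypothetical helpers introduced for illustration.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple


class ModelInfo(NamedTuple):
    """One row of bfvd_metadata.tsv (column order assumed from the README)."""
    uniref100: str
    model: str
    avg_plddt: float
    ptm: float
    splitted: bool


def read_metadata(lines: Iterable[str]) -> Iterator[ModelInfo]:
    """Yield one ModelInfo per data row of tab-separated metadata lines."""
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed rows
        try:
            plddt, ptm = float(row[2]), float(row[3])
        except ValueError:
            continue  # assumed to be a header row (non-numeric scores)
        yield ModelInfo(row[0], row[1], plddt, ptm, row[4] == "1")


def protein_id(model: str) -> str:
    """Return the portion of a model file name before the first underscore."""
    return model.split("_", 1)[0]


def group_clusters(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each cluster representative (repId) to its members (memId)."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) >= 2:
            clusters[row[0]].append(row[1])
    return dict(clusters)


if __name__ == "__main__":
    # With the real file, pass an open handle instead of a list of lines:
    #   with open("bfvd_metadata.tsv", newline="") as handle:
    #       models = list(read_metadata(handle))
    # The rows below are made-up placeholders, not real BFVD entries.
    sample = [
        "UniRef100_EX1\tEX1_model.pdb\t85.5\t0.70\t0",
        "UniRef100_EX2\tEX2_1_model.pdb\t40.0\t0.20\t1",
    ]
    for info in read_metadata(sample):
        if info.avg_plddt >= 70.0:  # arbitrary example threshold
            print(protein_id(info.model), info.avg_plddt, info.ptm)
```

The functions take any iterable of lines, so the same code works on an open file handle or on an in-memory list. The pLDDT threshold of 70 is only an example value for filtering, not a recommendation from BFVD.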

License

All files are available under a Creative Commons Attribution 4.0 International License.