How the search works

  1. When you select a genome, it is processed into a sourmash signature locally. Your genome never leaves your computer.
  2. The signature is used to query over 1 million metagenomes in the SRA using the mastiff implementation of branchwater. Match quality is reported as containment (the fraction of the query found in a metagenome) and containment-based Average Nucleotide Identity (cANI).
  3. Selected SRA metadata are pulled for each matching accession from a curated database and used for the plots.
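
To make the two match-quality numbers concrete, here is a minimal sketch of how containment and cANI relate. It is an illustration only, not the sourmash or branchwater code: it uses plain Python sets of exact k-mers, whereas sourmash works on subsampled hashes of k-mers. The function names, the k-mer size of 21, and the point estimate cANI = containment^(1/k) are assumptions made for this sketch.

```python
def kmers(seq, k=21):
    """Return the set of all k-mers in seq (no canonicalization, no subsampling)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query_kmers, target_kmers):
    """Fraction of the query's k-mers that also occur in the target."""
    if not query_kmers:
        return 0.0
    return len(query_kmers & target_kmers) / len(query_kmers)


def containment_ani(c, k=21):
    """Point estimate of ANI from containment c (assumed form: c ** (1 / k))."""
    return c ** (1.0 / k)


if __name__ == "__main__":
    # Toy k-mer sets: 3 of the 4 query k-mers are present in the "metagenome".
    query = {"AAA", "AAC", "ACG", "CGT"}
    metagenome = {"AAA", "AAC", "ACG", "TTT"}
    c = containment(query, metagenome)
    print(f"containment={c:.2f} cANI={containment_ani(c, k=3):.3f}")
```

Because cANI is a k-th root of containment, even a modest containment corresponds to a high per-base identity estimate, so the two values are best read together.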

Search considerations

For best results, search with sequences of 50 kb or longer. Queries smaller than 50 kb may not perform well, and queries smaller than 10 kb are likely to produce no results at all. These limitations may be addressed in the future; we welcome feedback and requests on the branchwater-web issue tracker on GitHub.
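
If you want to check a query before searching, the length guidance above can be sketched as a small helper. This is a hypothetical illustration, not part of branchwater: the function names and the returned labels are invented here, and "kb" is assumed to mean 1,000 bases.

```python
def total_bases(fasta_text):
    """Sum the sequence length over all records in FASTA-formatted text."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def classify_query_length(n_bases):
    """Map a query length in bases to the guidance given above."""
    if n_bases >= 50_000:
        return "recommended"
    if n_bases >= 10_000:
        return "may not perform well"
    return "likely to produce no results"


if __name__ == "__main__":
    example = ">contig1\nACGTACGT\nACGT\n>contig2\nGGCC\n"
    n = total_bases(example)
    print(n, classify_query_length(n))
```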

SRA metadata options

The SRA metadata options and their definitions are a subset of the more than 900 potential metadata options listed in the SRA biosample attribute list and the SRA cloud-based metadata tables. The options presented in the branchwater web query were selected based on a >4.5% availability cutoff. Some metadata names are similar but represent distinct options, such as the sampling date, because some are derived from information submitted for the SRA run and others from information submitted for the SRA sample (denoted with the '_sam' suffix). Other than minor data reformatting, the metadata are presented exactly as they appear in the SRA, including any missing information or errors in the SRA submission.
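
As an illustration of what an availability cutoff means, the sketch below keeps only the metadata fields that are filled in for more than 4.5% of records. It is not the procedure used to build the curated database. The function name, the field names, and the choice to treat empty strings and None as missing are all assumptions made for this example.

```python
def select_available_fields(records, cutoff=0.045):
    """Return field names whose non-missing fraction across records exceeds cutoff."""
    if not records:
        return []
    counts = {}
    for record in records:
        for field, value in record.items():
            # Treat None and empty strings as missing (an assumption of this sketch).
            if value not in (None, ""):
                counts[field] = counts.get(field, 0) + 1
    n = len(records)
    return sorted(field for field, count in counts.items() if count / n > cutoff)


if __name__ == "__main__":
    # 100 hypothetical records: one field is filled in 5 times, the other 4 times.
    records = [{"example_date": "", "example_date_sam": None} for _ in range(100)]
    for i in range(5):
        records[i]["example_date"] = "2020-01-01"
    for i in range(4):
        records[i]["example_date_sam"] = "2020-01-01"
    print(select_available_fields(records))
```

In this toy data the first field is available in 5% of records and passes the cutoff, while the second is available in 4% and is dropped.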

Who we are

This web search is powered by sourmash and is the result of a collaboration between the Data Intensive Biology lab at UC Davis, the USDA-ARS Genomics and Bioinformatics Research Unit, and the Joint Genome Institute. Any issues, questions, or comments about the search can be directed to the branchwater-web issue tracker on GitHub.