Libraries are fundamental to good science, and today they are far more diverse than buildings with rows of book-laden shelves. Genetic libraries, for example, are databases of DNA sequence data, and are a critical resource for scientists using molecular techniques to survey species. As with any library, it is essential to have a system for organizing and sifting through mountains of data. New software is helping to achieve this, making it possible for scientists to evaluate how well existing genetic libraries represent the full community of species in biodiverse regions like the Mekong River Basin. Understanding this coverage is vital for strategically planning studies that analyze environmental DNA (eDNA), the genetic material shed by animals into their environment. By leveraging this new software, researchers working with the Wonders of the Mekong project evaluated available genetic sequence data for fish in the hyper-diverse Mekong River Basin. Their recent publication in the journal Water (Jerde et al. 2021) explains that although a significant amount of data is available, much work remains to maximize the utility of eDNA in the Mekong.
Environmental DNA analysis is a rapidly advancing field that shows particular promise for studying aquatic biodiversity. However, the main challenge to implementing this remarkable technology is the need for genetic reference libraries, which are critical for identifying the species present in a field sample by matching DNA sequences between the sample and the library. The recently developed GAPeDNA website is a centralized interface that makes it easier to assess existing genetic data in current libraries. Researchers can use this website to efficiently look up the genetic sequence data available for fish species by river basin and region. They can also look up which species have reference sequences for various primers, which are the short DNA segments that bind to the target section of the fish’s genome in a sample and allow it to be amplified and analyzed.
To date, only a single published study (Gillet et al. 2018) has performed eDNA metabarcoding in the Mekong. Metabarcoding is a type of eDNA analysis that seeks to identify DNA from many different species within a single sample, and it requires reference sequences from each species. The results of this study suggested that current genetic libraries have substantial gaps when it comes to Mekong species. To determine the extent of the gaps, the Wonders of the Mekong project scientists used the GAPeDNA database to assess available sequence data for the Mekong fish community. They compiled multiple species lists for the Mekong Basin based on various sources, including the default list provided by the GAPeDNA database, and evaluated the proportion of species in each list for which reference DNA sequences exist. They also evaluated which primers and primer combinations were capable of identifying the most fish species. Finally, they more closely analyzed several fish genera to evaluate whether the sequenced genetic regions in libraries are different enough to readily distinguish among species.
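The coverage assessment described above can be illustrated with a minimal Python sketch. It is not the method used by GAPeDNA or by the study authors. The species and primer names are hypothetical placeholders, and the functions `coverage` and `best_combination` are invented here purely for illustration. The sketch computes the share of a species list that has a reference sequence for each primer, then searches exhaustively for the primer combination whose pooled references cover the most species.

```python
from itertools import combinations


def coverage(species_list, sequenced):
    """Fraction of species in species_list that have a reference sequence."""
    species = set(species_list)
    if not species:
        return 0.0
    return len(species & set(sequenced)) / len(species)


def best_combination(species_list, primer_refs, k):
    """Try every k-primer set and return the one with the highest pooled coverage."""
    best_names, best_cov = (), -1.0
    for names in combinations(sorted(primer_refs), k):
        # Pool the species that have a reference for any primer in the set.
        pooled = set().union(*(primer_refs[n] for n in names))
        cov = coverage(species_list, pooled)
        if cov > best_cov:  # ties keep the first combination found
            best_names, best_cov = names, cov
    return best_names, best_cov


# Toy data only: hypothetical species and primer names, not real Mekong records.
basin_list = ["sp_a", "sp_b", "sp_c", "sp_d", "sp_e"]
primer_refs = {
    "primer_1": {"sp_a", "sp_b", "sp_c"},
    "primer_2": {"sp_c", "sp_d"},
    "primer_3": {"sp_a", "sp_b"},
}

for name, refs in primer_refs.items():
    print(name, coverage(basin_list, refs))
print(best_combination(basin_list, primer_refs, 2))
```

With the toy data, primer_1 alone covers three of five species, while pairing it with primer_2 covers four of five. An exhaustive search is only practical for small numbers of primers; with many primers, a greedy or other approximate search would be a more realistic choice.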
Across all 23 of the assessed primers, a total of 782 Mekong fish species were found to have existing reference sequences, which represents 58% of the 1,345 total fish species believed to be present in the basin. The researchers were also able to identify the primer with the best coverage in the system (the 16S marker developed by McInnes et al. 2017), as well as multi-primer combinations that would optimize coverage. Among the fish species whose sequence data they examined, the team found that certain primers would be unable to distinguish among some closely related species, and others may even fail to distinguish among species in different genera. Moreover, there were indications that some of the available sequences may have been incorrectly identified, and a dearth of sequence data was revealed for diverse genera such as the stone loaches (Schistura), of which only two of 77 species have been sequenced.
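To illustrate why a primer can fail to separate species, the sketch below groups species whose amplified sequences are identical for a single marker. This is the simplest possible criterion and an assumption made here for illustration; the study's own analysis may well have used a different measure, such as a sequence similarity threshold. The species names and sequences are invented toy data, and `indistinguishable_groups` is a hypothetical helper.

```python
from collections import defaultdict


def indistinguishable_groups(amplicons):
    """Group species whose amplicon sequences are identical for one marker."""
    by_sequence = defaultdict(list)
    for species, seq in amplicons.items():
        # Normalize case so "acgt" and "ACGT" count as the same sequence.
        by_sequence[seq.upper()].append(species)
    # Only groups of two or more species are a problem for identification.
    return [sorted(group) for group in by_sequence.values() if len(group) > 1]


# Toy data only: hypothetical species names and made-up sequences.
toy = {
    "genus1_sp_a": "ACGTTGCA",
    "genus1_sp_b": "ACGTTGCA",
    "genus1_sp_c": "ACGTAGCA",
    "genus2_sp_d": "acgttgca",
}

print(indistinguishable_groups(toy))
```

In this toy case, two species in the same genus and one species from a different genus share the same sequence, so a sample containing that sequence could not be assigned to any one of them.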
GAPeDNA is a valuable tool for informing the development of eDNA research, but the results of this analysis suggest that more work is needed to fill in genetic sequence gaps for Mekong species. Because complete species lists are hard to obtain, existing primers have limited capacity to distinguish among species, and reliable sequence data are lacking for many species, watersheds with very high biodiversity have always been difficult for eDNA methodologies. However, assessments like this one can help to inform targeted research to overcome these obstacles in the future. Decreasing costs of sequencing mean that it will only become easier to make genetic libraries more complete, and thereby better facilitate eDNA metabarcoding to study and monitor ecologically, socially, and economically important aquatic biodiversity in regions like the Mekong.