Data Release
Dimensions of Biodiversity: An Interdisciplinary Study of Hyperdiverse Endophytic Fungi and their Function in Boreal Forests
Aim 5: Data sharing and release
For additional information, contact Ignazio Carbone (ignazio_carbone@ncsu.edu)
Data management and release. Our goal is to share protocols, data, results, and tools with widely distributed researchers and the public, with a focus on integration among project participants, quality control, and fair and rapid public release. Our efforts will be facilitated by our web-based system, Mobyle SNAP Workbench, which will provide the infrastructure for data storage, analysis, and tool development. The workbench is accessible through the Resources tab on EnDoBiodiversity.org, which will provide a front end for data entry and visualization, analysis workflows, and both within-project and public use of bioinformatics tools that operate seamlessly on group-generated data.
Data management. In addition to providing the home and portal to our analytical tools, SNAP will manage data and analyses securely and efficiently through (1) shared workspaces in which group-generated data (e.g., DNA sequences and ESTs) can be accessed and analyzed by all participants, and (2) specialized storage architecture for data that will seamlessly link steps in our biodiversity informatics pipeline.
The broad data set of cultures, clones, and sequences generated by Aim 1, complemented by host records, environmental parameters, and associated metadata, will be linked via SNAP for ready access in achieving Aims 2-4. Multi-locus sequence data will be aligned automatically to existing alignments stored in SNAP using WASABI. An online portal for the efficient phylogenetic placement of unknown Ascomycota will be opened to the public in year 4, allowing automatic placement of ITS-LSU or ITS-only sequences within our most comprehensive ITS-LSU/ITS tree and providing access to host, geographic, and ecological information. All strains isolated for this project and deposited at ARIZ and/or CBS will be clearly marked so that researchers can request strains of interest. Taxonomic status of all species will be shown on the online ITS-LSU/ITS tree with links to (a) publications for newly described species and (b) MycoBank, GBIF, Encyclopedia of Life, and GenBank. It will also be possible to download alignments for the eight loci targeted by this project by clicking on specific nodes of the multilocus tree generated by our work.
Data release. The SNAP portal and database infrastructure will be made available to the public as part of the open-source community. Data and tool release will occur through several specific outlets. First, EnDoBiodiversity.org will provide a visible entry point and public access to information on project objectives, personnel, protocols, reports, workshops and training modules, data, results, and the SNAP environment. Second, we will make the tools in our portal immediately accessible to project participants and to the public via guest user login and registration. Third, we will deposit complete genome sequence data, single-locus data, and ecological metadata in GenBank and other public databases.
In accordance with NSF principles, we will publicly release all data generated under this award as rapidly as possible, as follows:
Chromatogram files – We will submit all sequence and trace files, with associated templates and quality values, to the Trace Archive at NCBI. ITS-LSU and multilocus sequencing data will be released after phylogenetic validation.
454 genome assemblies – All assemblies will be made available via the NCBI Whole Genome Shotgun (WGS) site after internal validation. Assuming no significant errors are detected during the validation process, intermediate and final assemblies will be released within 45 calendar days of generation.
Illumina short reads – All genome and transcriptome data will be submitted to the NCBI Short Read Archive (SRA) and NCBI Gene Expression Omnibus (GEO) database.
Genome annotation – Automated annotation data will be made available via GenBank and our project website within 45 days of internal validation.
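As an illustration of the 45-day release windows stated above, the following minimal Python sketch computes the latest release date from a starting date (assembly generation, or completion of internal validation for annotations). The function name, the constant, and the example date are hypothetical and not part of the project's tooling; the sketch simply counts calendar days.

```python
from datetime import date, timedelta

# Release window taken from the policy text above (calendar days).
RELEASE_WINDOW_DAYS = 45


def release_deadline(start: date, window_days: int = RELEASE_WINDOW_DAYS) -> date:
    """Return the latest public-release date, counting calendar days from `start`.

    `start` is the date the window opens, e.g. the day an assembly was
    generated. This helper is a hypothetical illustration only.
    """
    return start + timedelta(days=window_days)


if __name__ == "__main__":
    # Hypothetical example: an assembly generated on 1 March 2013.
    generated = date(2013, 3, 1)
    print(release_deadline(generated).isoformat())
```

Counting calendar days rather than business days follows the wording of the assembly policy; the annotation policy says only "45 days", so treating it the same way is an assumption.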