GigaScience is an online, open-access journal whose publishing activities include the database GigaDB. It is co-published by BGI and BioMed Central to meet the needs of a new generation of biological and biomedical research as it enters the era of “big data.” The journal’s scope covers studies from across the life sciences that produce and use large-scale data as the center of their work. Data from these articles are hosted in GigaDB, where they can be cited to provide a direct link between the study and the data supporting it, and where relevant tools for reproducing or reusing these data are also available. The journal also publishes commentaries and reviews to provide a forum for discussing best practices and issues in handling large-scale data. See http://www.gigasciencejournal.com/ for additional information about the journal and prospective article submission.
GigaDB primarily serves as a repository to host data and tools associated with articles in GigaScience; however, it also includes a subset of datasets that are not associated with GigaScience articles (see below). GigaDB defines a dataset as a group of files (e.g., sequencing data, analyses, imaging files, software programs) that are related to and support an article or study. Through our association with DataCite, each dataset in GigaDB is assigned a DOI that can be used as a standard citation when these data are reused in other articles, whether by the authors or by other researchers. Every dataset in GigaDB requires a title that is specific to the dataset, an author list, and an abstract that provides information specific to the data included within the set. We encourage creators to submit detailed information about the data we host in ISA-Tab, a format used by the BioSharing and ISA Commons communities, with whom we work to maintain the highest data and metadata standards in our journal. To maximize their utility to the research community, all datasets in GigaDB are placed under a CC0 waiver (for more information on the issues surrounding CC0 and data see Hrynaszkiewicz and Cockerill, 2012).
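As a rough illustration of the dataset requirements described above, the following Python sketch models a dataset record with its required title, author list, and abstract, plus its DOI. The class name `Dataset`, its field names, and the placeholder DOI are assumptions made for this example; they do not reflect GigaDB's actual schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record mirroring the required fields described above."""

    title: str
    authors: List[str]
    abstract: str
    doi: str  # placeholder values such as "10.1234/example" are not real DOIs

    def __post_init__(self) -> None:
        # Title, author list, and abstract are all required for a dataset.
        if not self.title.strip():
            raise ValueError("title is required")
        if not self.authors:
            raise ValueError("at least one author is required")
        if not self.abstract.strip():
            raise ValueError("abstract is required")

    def doi_url(self) -> str:
        # DOIs resolve through the doi.org proxy, which gives a citable link.
        return "https://doi.org/" + self.doi


example = Dataset(
    title="Example dataset title",
    authors=["A. Author", "B. Author"],
    abstract="Short description specific to the files in this dataset.",
    doi="10.1234/example",
)
print(example.doi_url())
```

Validating the required fields at construction time means an incomplete record fails early rather than at citation time.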
Datasets that are not affiliated with a GigaScience article are approved for inclusion by the Editors of GigaScience. The majority of such datasets are from internal projects at BGI, given its sponsorship of GigaDB. Many of these datasets either lack a discipline-specific repository able to host them or have been released rapidly, ahead of any publication, for use by the research community, while still enabling their producers to obtain credit through data citation. The GigaScience Editors may also consider including particularly interesting, previously unpublished datasets in GigaDB, especially if they meet our criteria for inclusion as Data Note articles in the journal (see our author instructions).
GigaDB has been included in the DataCite search engine and the Thomson Reuters Data Citation Index (DCI) to aid data discovery. DataCite also exposes the metadata through its metadata store, where it can be harvested using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The DCI makes data discoverable to researchers around the world by indexing a significant number of the world’s leading data repositories of critical interest to the scientific community. The records for the datasets, which include authors, institutions, keywords, citations and other metadata, are connected to related peer-reviewed literature indexed in the Web of Knowledge database.
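For readers who want to harvest this metadata programmatically, the sketch below shows how an OAI-PMH `ListRecords` request might be built and how a response could be parsed. The verb, the `metadataPrefix` and `set` parameters, the `oai_dc` format, and the XML namespaces come from the OAI-PMH 2.0 specification. The base URL `https://example.org/oai`, the set name `HYPOTHETICAL.SET`, and the sample response are placeholders invented for illustration, since the actual endpoint and set identifier are not given here.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
) -> str:
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # restrict harvesting to one set
    return base_url + "?" + urlencode(params)


def parse_records(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Extract the identifier and first dc:title from each record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter("{%s}record" % OAI_NS):
        identifier = record.findtext(
            "{%s}header/{%s}identifier" % (OAI_NS, OAI_NS)
        )
        title_el = next(record.iter("{%s}title" % DC_NS), None)
        title = title_el.text if title_el is not None else None
        records.append({"identifier": identifier, "title": title})
    return records


# Hand-written placeholder response, not real repository output.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:record-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(build_list_records_url("https://example.org/oai", set_spec="HYPOTHETICAL.SET"))
print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also need to fetch the URL and follow the `resumptionToken` element that OAI-PMH uses to page through large result lists; both are omitted here to keep the sketch short.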