Scientific communication has traditionally relied on publications and presentations. Millions of publications are estimated to appear worldwide each year; PubMed alone now grows at a rate of one paper per minute. The results described in these articles are often backed by large amounts of diverse data produced by complex experiments, computer simulations, and observations of physical phenomena. Because of this avalanche of data, it is increasingly hard to validate, reproduce, reuse, and leverage scientific data. In addition, although publications, methods, and datasets are closely related, they are not easily accessible or interlinked. The notable exception is omics research, where journals require the deposit of sequences in databanks as a condition of publication. Even where data is discoverable and accessible, significant challenges remain in reusing and sharing it, and in facilitating the necessary correlation, integration, and synthesis of data across levels of theory, techniques, and disciplines.
At the 2nd International Workshop on Linked Science (LISC2012), we will present and discuss new ways of publishing, sharing, and linking scientific data, and of reasoning over such data to discover interesting new links to validate research. This year's workshop will focus on research addressing these issues with respect to big data. Big data is loosely characterized by the size and/or number of individual files, the number of represented variables, a range of physical scales, a range of scientific disciplines, and heterogeneous metadata and data formats; in short, data that cannot easily be accessed and manipulated from a thumb drive.
- Line C. Pouchard (Scientific Data Group, Oak Ridge National Laboratory). Keynote title: "Semantic Challenges and Solutions in DataONE"
- Tomi Kauppinen (University of Münster)
- Line C. Pouchard (Oak Ridge National Laboratory)
- Carsten Keßler (University of Münster)