
Ebook: The Semantic Web in Earth and Space Science. Current Status and Future Directions

The geosciences are one of the fields leading the way in advancing semantic technologies. This book continues the dialogue and feedback between the geoscience and semantic web communities.
Increasing data volumes within the geosciences make it no longer practical to copy data and perform local analysis. Hypotheses are now being tested through online tools that combine and mine pools of data. This evolution in the way research is conducted is commonly referred to as e-Science. As e-Science has flourished, the barriers to free and open access to data have been lowered and the need for semantics has been heightened.
As the volume, complexity, and heterogeneity of data resources grow, geoscientists are creating new capabilities that rely on semantic approaches. Geoscience researchers are actively working toward a research environment of software tools and interfaces to data archives and services, and the goal of full-scale semantic integration is beginning to take shape. The members of this emerging semantic e-Science community are increasingly in need of semantic-based methodologies, tools and infrastructure. A feedback system between the geo- and computational sciences is forming. Advances in knowledge modeling, logic-based hypothesis checking, semantic data integration, and knowledge discovery are leading to advances in scientific domains, which in turn are validating semantic approaches and pointing to new research directions. We present mature semantic applications within the geosciences and aim to stimulate discussion on emerging challenges and new research directions.
In the field of oceanographic data management, there is a long history of using controlled vocabularies to define metadata fields and describe data channels in files. These controlled vocabularies were originally published in book form, but are now available as Linked Data resources on the Semantic Web. Because the controlled vocabularies are published in a manner conforming to best practices, their uptake has been wide. This has led to the inception of the Linked Ocean Data concept as a micro-cloud of Linked Data resources. The creation of the Linked Ocean Data cloud and some of the applications that have come from developing this concept are described in this chapter. Also discussed are some emerging methods of using Linked Data techniques to add impact to datasets, allowing data managers and data scientists to tell new stories around their data and encouraging new funding opportunities.
Linked Science is the practice of inter-connecting scientific assets by publishing, sharing and linking scientific data and processes in end-to-end loosely coupled workflows that allow the sharing and re-use of scientific data. Much of this data does not live in the cloud or on the Web, but rather in multi-institutional data centers that provide tools and add value through quality assurance, validation, curation, dissemination, and analysis of the data. In this paper, we make the case for the use of scientific scenarios in Linked Science. We propose a scenario in river-channel transport that requires biogeochemical experimental data and global climate-simulation model data from many sources. We focus on the use of ontologies (formal machine-readable descriptions of the domain) to facilitate search and discovery of this data. Mercury, developed at Oak Ridge National Laboratory, is a tool for distributed metadata harvesting, search and retrieval. Mercury currently provides uniform access to more than 100,000 metadata records; 30,000 scientists use it each month. We augmented search in Mercury with ontologies, such as those in the Semantic Web for Earth and Environmental Terminology (SWEET) collection, by prototyping a component that provides access to the ontology terms from Mercury. We evaluate the coverage of SWEET for the ORNL Distributed Active Archive Center (ORNL DAAC).
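As a rough illustration of how ontology terms can augment a metadata search, the following minimal Python sketch expands a query with related terms before matching records. It is not Mercury's implementation and the terms are not taken from SWEET: the vocabulary, the records, and the names `TOY_TERMS`, `expand_query` and `search` are all hypothetical.

```python
# Hypothetical mini-vocabulary (NOT taken from SWEET): a broader term
# mapped to narrower or related terms that a search should also match.
TOY_TERMS = {
    "precipitation": ["rainfall", "snowfall"],
    "carbon flux": ["net ecosystem exchange", "soil respiration"],
}


def expand_query(query, vocabulary=TOY_TERMS):
    """Return the normalized query term followed by its related terms."""
    key = query.strip().lower()
    return [key] + vocabulary.get(key, [])


def search(records, query, vocabulary=TOY_TERMS):
    """Keep records whose text contains the query or any related term."""
    terms = expand_query(query, vocabulary)
    return [r for r in records if any(t in r.lower() for t in terms)]


# Hypothetical metadata record titles, for illustration only.
records = [
    "Daily rainfall totals, 1990-2000",
    "Leaf area index",
    "Precipitation chemistry",
]
hits = search(records, "Precipitation")
```

A plain keyword search for "precipitation" would miss the rainfall record; the expanded search returns it because the vocabulary links the two terms.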
Within the GeoKnow project, various tools are being developed and integrated to simplify the management of geospatial Linked Data on the web. In this article, we summarize the state of the art and describe the status of open geospatial data on the web. We continue by presenting the Linked Data Stack as the technical underpinning of GeoKnow and give a first presentation of the platform, which provides a light-weight integration of those tools.
One of the continuing challenges in any Earth science investigation is the discovery and access of useful science content from the increasingly large volumes of Earth science data and related information available online. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. Those who know exactly which data sets they need can obtain the specific files using these systems. However, researchers who are interested in studying a particular event must manually assemble a variety of relevant data sets by searching the different distributed data systems. Consequently, there is a need to design and build specialized search and discovery tools in Earth science that can filter through large volumes of distributed online data and information and aggregate only the relevant resources needed to support climatology and case studies.
This paper presents a specialized search and discovery tool that automatically creates curated Data Albums. The tool was designed to enable key elements of the search process such as dynamic interaction and sense-making. The tool supports dynamic interaction via different modes of interactivity and visual presentation of information. The compilation of information and data into a Data Album is analogous to a shoebox within the sense-making framework. This tool automates most of the tedious information/data gathering tasks for researchers. Data curation by the tool is achieved via an ontology-based, relevancy-ranking algorithm that filters out non-relevant information and data. The curation enables better search results as compared to the simple keyword searches provided by existing data systems in Earth science.
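To make the idea of ontology-based relevancy ranking concrete, here is a minimal Python sketch under stated assumptions. It is not the tool's actual algorithm: the toy ontology, its weights, the threshold, and the names `TOY_ONTOLOGY`, `Resource`, `relevance` and `curate` are hypothetical. Each resource is scored by the weighted ontology terms it mentions, low-scoring resources are filtered out, and the rest are ranked.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: for each concept, related terms and an
# assumed weight expressing how strongly the term signals the concept.
TOY_ONTOLOGY = {
    "hurricane": {
        "hurricane": 1.0,
        "tropical cyclone": 0.9,
        "storm surge": 0.6,
        "wind speed": 0.4,
    },
}


@dataclass
class Resource:
    title: str
    text: str


def relevance(resource, concept, ontology=TOY_ONTOLOGY):
    """Sum the weights of the concept's terms that occur in the resource."""
    haystack = f"{resource.title} {resource.text}".lower()
    return sum(w for term, w in ontology[concept].items() if term in haystack)


def curate(resources, concept, threshold=0.5):
    """Drop resources scoring below the threshold; rank the rest by score."""
    scored = [(relevance(r, concept), r) for r in resources]
    kept = [(s, r) for s, r in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in kept]


# Hypothetical candidate resources, for illustration only.
candidates = [
    Resource("Coastal flooding", "storm surge heights"),
    Resource("Soil moisture", "field campaign"),
    Resource("Hurricane track", "tropical cyclone wind speed"),
]
album = curate(candidates, "hurricane")
```

Unlike a single-keyword match, the weighted term set lets a resource that never uses the word "hurricane" still be retained when it mentions closely related terms, while unrelated resources are dropped.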
The interdisciplinary research and application fields of solar, solar-terrestrial and space physics encompass a wide variety of physical and chemical phenomena. Increasingly, there is a tremendous need for cross-disciplinary collaboration, very often underpinned by the ability to find appropriate data and information resources, use them, and share outputs. On the provider side, there is a strong need to have the data they produce found and used by many stakeholder groups, i.e. not just researchers. With such diverse areas of study, and the emphasis on multi-disciplinarity, the challenges presented in finding appropriate data sources in these solar-terrestrial studies mimic the challenges that emerge out of almost any interdisciplinary project, especially in terms of the computer science, informatics, and information technology needed to address key data management capabilities, e.g. data discovery/dissemination. This contribution addresses some specific approaches to the “search problem” in solar-terrestrial sciences, i.e. the search for data and related resources using semantics, that have been undertaken over the last ~15 years. Back then the task seemed straightforward: metadata extracted from mostly individual files was organized into catalogs and made available via the Web. However, the homonym problem quickly appeared, as did confusion over which metadata meant what and where they should reside in the catalog and the interface. At the heart of this combination of multiple disciplines, heterogeneous data, and jargon was that almost no one had used science-terminology-based use cases for exploring and formalizing semantics. Semantics potentially allowed a return to the “science query”, or that was the vision in the early 2000s. Along the way, a process was developed for determining what semantics were required and how they could be implemented, or how the search infrastructure should be modified.
Herein we recap key elements of the progress in using semantics for search, drawing from two projects. The first is the Virtual Solar-Terrestrial Observatory (VSTO), which became a production, interdisciplinary virtual observatory covering solar and solar-terrestrial physics resources in two primary communities. The particular semantic Web methods and technologies that were used to design, develop and deploy VSTO are presented and discussed. As VSTO evolved, we added capabilities based on use cases and demonstrated measurable benefits to the intended and unintended users. In regard to domain semantics for VSTO (and later projects), an extensible, reusable ontology for solar-terrestrial physics was a key output. While VSTO was underway, other projects beyond VSTO's domain were providing clarity as to where different levels of semantics are needed in the architecting of platforms for collaborative research, built upon semantic data frameworks. Thus, the second project, the Semantic eScience Framework (SeSF), was conceived. SeSF aimed at identifying what needed to be in a core framework, i.e. domain independent, so that domain dependencies could be plugged in and applications deployed. One of the key applications was (and still is) search. The evolution of semantic search via faceted browsing into a cohesive user environment had implications for ontology and interface (widget) developments. Across these two projects, plus ~6 others, many lessons were learned. The ones relevant to semantic search are presented in this contribution. Two important ones stand out. The first was the balance between expressivity, implementability and maintainability/extensibility of the semantics, i.e. between the level and depth of knowledge representation and what the current and evolving software and tools could support. The second was that the use of semantics demanded a similar balance among query, inference and rules. The relative combinations were based on the particular stage and iteration of the methodology.
Since search using semantics is still evolving, musings on current and future development are presented at the end.
The widespread propagation of networked computers in Earth and space science, and especially the representation of scientific data in formats that can be shared, analyzed and visualized, has given rise to exciting new opportunities for exploration and discovery. Data are representations of facts. We can see data of various subjects, types and dimensions, such as a geologic map of New York State, records of sulfur dioxide concentration in the plume of Turrialba Volcano, Costa Rica, and the global mean sea level time series derived from radar altimeter records of the TOPEX and Jason satellite series. A distinctive feature of most data in Earth and space science is that they are georeferenced, which means they contain positional values below, on, and/or above the surface of the Earth. Information is the meaning of data as interpreted by human beings. For example, a geologist may discover some initial clues for shale gas exploration by using a geologic map, a volcanologist may detect a few abnormal sulfur dioxide concentration values in the plume of a volcano, and a climatologist may find that the global mean sea level has been rising in the past twenty years. People use their knowledge to discover information from data. That knowledge is their expertise in, or familiarity with, the subjects they work on. The process of deriving information from data may in turn contribute to people's knowledge.
People not only possess knowledge implicitly in their brains, they also encode some of their knowledge in models, algorithms and programs that can be run by computers and reused by other people. Data visualization plays an increasingly important role in the interactions between humans and computers, and the Semantic Web provides a broader space for people to share both data and knowledge. This chapter will introduce research topics and applications associated with data visualization in the Semantic Web, with a focus on subjects in Earth and space science. The text is organized in two parts: the first part is about concepts and theories and the second is about technologies and applications.
Geospatial information is of interest to a wide range of scientific analyses and is used to support interoperable solutions as well as the vision of “geo-enabling” the Web. Formalization of geospatial semantics is therefore of interest to a wide set of communities in support of a richer, more semantic Web with support services. The Spatial Ontology Community of Practice (SOCoP) has been engaged in a range of activities and collaborations that support this goal of a geo-enabled Web. Of particular interest is the development of small, modular geospatial ontologies that can serve as building blocks for larger, publicly available ontologies. These small ontologies are developed in 2–3 day long workshops called VoCamps. Workgroups are formed around defined topics and start a structured process of vocabulary and conceptual model development aimed at producing usable and reusable ontology design patterns (ODPs) that address recurring problems. The ODPs are grounded in real, existing data, such as those available for the interrelated patterns on Motion, Path and Trajectories. Geo-patterns express domain ideas in simple and intuitive ways using a handful of concepts and relations for spatiotemporal properties and geographic knowledge. The overall organizing vision of the geospatial VoCamps is to create a Core of related, modular ODPs over a series of VoCamps and related meetings. The target ontological content of this so-called Descartes-Core would be a constellation of related, extensible, aligned ODPs which can serve as composable building blocks for a range of purposes. To date, 7 workshops with a geospatial focus have been held and a range of ODPs produced, as illustrated by a semantic (annotation) trajectory pattern which is built on prior patterns and demonstrates applicability to a range of data.
Like other patterns, it is also extensible and can be made more specific for particular purposes, and it can be aligned to a range of external ontologies to support interoperability based on the extant semantics in other ontologies.
EarthCube is a major effort of the National Science Foundation to establish a next-generation knowledge architecture for the broader geosciences. Data storage, retrieval, access, and reuse are central parts of this new effort. Currently, EarthCube is organized around several building blocks and research coordination networks. OceanLink is a semantics-enabled building block that aims at improving data retrieval and reuse via ontologies, Semantic Web technologies, and Linked Data for the ocean sciences. Cruises, in the sense of research expeditions, are central events for ocean scientists. Consequently, information about these cruises and the vessels involved is of primary interest to oceanographers and thus needs to be shared and made retrievable. In this paper, we report on the use of a design pattern-centric strategy to model Cruise for OceanLink data integration. We provide a formal axiomatization of the introduced pattern using the Web Ontology Language, explain design choices, and discuss the planned deployment and application scenarios of our model.