
Ebook: Advances in Semantic Authoring and Publishing

Dissemination can be seen as a communication process between scientists. Over the course of several publications, they expose and support their findings, while discussing stated claims. Such discourse structures are trapped within the content of the publications, thus making the semantics discoverable only by humans. In addition, the lack of advances in scientific publishing, where electronic publications are still used as simple projections of paper documents, combined with the current growth in the amount of scientific research being published, transforms the process of finding relevant literature into a cumbersome task.
The work presented in this publication proposes a solution that takes full advantage of the support provided by electronic publications and of current Semantic Web technologies to expose and crystallize the different discourse structures. The goal is to pave the way towards a Semantic Publishing Ecosystem that will alleviate, at least partly, the information overload problem. Our solution relies on enriching scientific publications with explicit rhetorical and argumentation discourse structures, in addition to explicit linear structures for identification and localization, and bibliographic information. Embedding these structures within the publication documents (as semantic metadata) enables the creation of semantic publications, i.e., foundational artefacts of the Semantic Publishing Ecosystem and linked resources that are part of the current Web of Data.
Dissemination can be seen as a communication process between scientists. Over the course of several publications, they expose and support their findings, while discussing claims stated in these publications. Unfortunately, such discourse structures are trapped within the content of the publications, thus making the semantics discoverable only by humans, and only by reading the publications. In addition, the lack of advances in scientific publishing, where electronic publications are still used as simple projections of paper documents, combined with the current growth in the amount of scientific research being published, transforms the process of finding relevant literature into a cumbersome task.
The solution lies in taking full advantage of the support provided by electronic publications and making the different discourse structures explicit. Consequently, the resulting knowledge becomes crystallised and can be shared with and by others. From a technological perspective, Semantic Web technologies provide viable ways of representing this knowledge in a machine-understandable form, as semantic metadata, and of transforming simple electronic publications into semantic publications.
The work in this thesis paves the way towards a Semantic Publishing Ecosystem by developing Semantic Authoring and Publishing mechanisms, with the generic goal of alleviating, at least partly, the information overload problem. More concretely, Semantic Authoring is about enriching scientific publications, while they are being authored, with explicit rhetorical and argumentation discourse structures, in addition to explicit linear structure for identification and localisation, and bibliographic information. At the same time, Semantic Publishing is about creating semantic publications by embedding these structures, encoded as semantic metadata, into the publication documents. Semantic Publishing also includes the publishing, use and retrieval of semantic publications on the Web.
Our hypothesis is that the Semantic Authoring and Publishing processes bring added value to researchers and improve their daily activities by enabling new functionalities for structuring, retrieving and browsing scientific publications. Furthermore, based on Semantic Authoring and Publishing, the rhetorical and argumentation discourse structures can be formalised and made machine-interpretable using knowledge representation technology. We devise solutions that: capture information present in scientific publications according to its structural, rhetorical and argumentation roles; acquire such information through manual and automatic approaches, the latter with satisfactory efficiency; and store, publish and expose the resulting semantic publications in a machine- and human-processable way.