Ebook: Mining the Digital Information Networks
Electronic publishing is continuously changing; new technologies open new ways for individuals, scholars, communities and networks to establish contacts, exchange data, produce information and share knowledge on a variety of devices, from personal computers to mobile media. There is an urgent need to rethink electronic publishing in order to develop and use new communication paradigms and technologies, and to devise a truly digital format for the future.
This book presents the proceedings of the ELPUB 2013 conference, held in Karlskrona, Sweden, in June 2013. The main theme of the conference is the extraction and processing of data from the vast wealth of digital publishing, and the ways this information can be used and reused in innovative social contexts in a sustainable way. The conference brings together researchers and practitioners to discuss data mining, digital publishing and social networks, along with their implications for scholarly communication, information services, e-learning, e-businesses, the cultural heritage sector and other areas where electronic publishing is imperative.
The book is divided into three sections: full research articles, full professional articles and extended abstracts. Each section is further subdivided into Data Mining and Intelligent Computing, Publishing and Access, and Social Computing and Practices.
Focusing on key issues surrounding the development of methods for gathering and processing information, and on the means for making these data useful and accessible, this book will be of interest to the whole digital community.
The main theme of the 17th International Conference on Electronic Publishing (ELPUB) concerns different ways to extract and process data from the vast wealth of digital publishing and how to use and reuse this information in innovative social contexts in a sustainable way. We bring together researchers and practitioners to discuss data mining, digital publishing and social networks along with their implications for scholarly communication, information services, e-learning, e-businesses, the cultural heritage sector, and other areas where electronic publishing is imperative.
ELPUB 2013 received 36 paper submissions. The peer review process resulted in the acceptance of 16 papers. From the accepted papers, 8 were submitted as full papers and 8 as extended abstracts. These papers were grouped into sessions based on the following topics: Data Mining and Intelligent Computing, Publishing and Access, and Social Computing and Practices.
James MacGregor and Karen Meijer-Kline from the Public Knowledge Project (Simon Fraser University Library, Canada) will lead the pre-conference workshop on June 12. The workshop is entitled “The Future of E-publishing: An Introduction to Open Journal Systems & Open Monograph Press”.
The main program on June 13–14 features two keynotes. Stephan Shakespeare (YouGov, UK) will deliver a keynote entitled “Getting value out of our digital trace: a strategy for unleashing the economic and justice potential of data sharing”. Professor Felix S. Wu (University of California at Davis, USA) will deliver a keynote entitled “Social computing leveraging online social informatics”. ELPUB 2013 also features a panel discussion entitled “Setting research data free – problems and solutions”. The panel consists of the aforementioned keynote speakers as well as Professor David Rosenthal (Stanford University, USA) and Hans Jörgen Marker (Swedish National Data Service, Sweden).
We believe that the topics featured in the program of this year's ELPUB conference are diverse and exciting. Firstly, we would like to thank the members of the ELPUB Executive Committee who, together with the Local Advisory Committee, provided valuable advice and assistance throughout the organization process. Secondly, we would like to thank our colleagues on the Program Committee who helped assure the quality of the conference throughout the peer-reviewing process. Lastly, we acknowledge the Local Organization team for making sure that all efforts materialized into a very interesting scientific event. Thank you all for helping us maintain the quality of ELPUB and earn the trust of our authors and attendees.
We wish you all a good conference and we say farewell hoping to see you again in Greece for the next installment of the conference in 2014!
Niklas Lavesson, Peter Linde, and Panayiota Polydoratou (editors)
We address the problem of automatically improving the usability of a large online document. We propose an adaptive hypertext approach based on splitting the document into components smaller than the page or screen, called noogramicles, and creating each page as a new assemblage of noogramicles each time it is accessed. The adaptation comes from learning the navigation patterns of the usors (authors and readers), and is manifested in the assemblage of pages. We test this model across a number of configurations, including chance and non-adaptive systems. We evaluate the model through simulation, using a simulator we designed on the basis of established findings about the behaviour of hypertext users, and we carry out a quantitative evaluation based on hypertext usability measures adapted to the problem: session size and session cost.
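To make the idea of adaptive page assembly concrete, the following is a minimal Python sketch, not the authors' algorithm: it treats fragments ("noogramicles") as named units, counts which fragments readers visit after which, and assembles a page from the most co-visited ones. The fragment names, scoring and page budget are illustrative assumptions.

```python
from collections import defaultdict

class AdaptiveAssembler:
    """Illustrative sketch: assemble pages from sub-page fragments
    ("noogramicles") ranked by how often readers visited them together."""

    def __init__(self, page_budget=3):
        self.page_budget = page_budget      # fragments per assembled page
        self.co_visits = defaultdict(int)   # (fragment_a, fragment_b) -> count

    def record_session(self, visited_fragments):
        """Learn from one navigation session (an ordered list of fragment ids)."""
        for a, b in zip(visited_fragments, visited_fragments[1:]):
            self.co_visits[(a, b)] += 1

    def assemble_page(self, entry_fragment, all_fragments):
        """Build a page: the entry fragment plus the fragments most often
        visited right after it, up to the page budget."""
        ranked = sorted(
            (f for f in all_fragments if f != entry_fragment),
            key=lambda f: self.co_visits[(entry_fragment, f)],
            reverse=True,
        )
        return [entry_fragment] + ranked[: self.page_budget - 1]

# Usage: learn from two simulated sessions, then assemble a page for "intro".
assembler = AdaptiveAssembler(page_budget=3)
assembler.record_session(["intro", "methods", "results"])
assembler.record_session(["intro", "methods", "discussion"])
print(assembler.assemble_page("intro", ["intro", "methods", "results", "discussion"]))
```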
This paper describes an approach to interaction with 3D representations of large document collections. The goal was to provide the user with a highly dynamic environment in which even the mapping strategy used to position documents in space can be adjusted by the user depending on the specific task at hand, on their preferences, or on the context. A modification to the FDP algorithm is proposed, as well as a new gesture-based interaction paradigm in which the user can explore and search information in the collection with simple hand movements. An experimental user evaluation was carried out to investigate the impact of the proposed approach on the precision of the mental model built by users through exploration, on the effectiveness of information search tasks, and on general user satisfaction and perceived utility.
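Assuming FDP here refers to force-directed placement, the following is a minimal 2D Python sketch of the baseline technique rather than the authors' 3D modification; the similarity weights and constants are made up for illustration.

```python
import random

def force_directed_layout(similarity, iterations=200, step=0.05):
    """Minimal force-directed placement sketch: similar documents attract,
    all documents repel, and positions settle into a spatial map.
    `similarity` maps (doc_a, doc_b) pairs to a weight in [0, 1]."""
    docs = sorted({d for pair in similarity for d in pair})
    pos = {d: [random.uniform(-1, 1), random.uniform(-1, 1)] for d in docs}
    for _ in range(iterations):
        for a in docs:
            fx = fy = 0.0
            for b in docs:
                if a == b:
                    continue
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = max((dx * dx + dy * dy) ** 0.5, 1e-6)
                w = similarity.get((a, b), similarity.get((b, a), 0.0))
                force = w * dist - 0.1 / dist   # attraction minus repulsion
                fx += force * dx / dist
                fy += force * dy / dist
            pos[a][0] += step * fx
            pos[a][1] += step * fy
    return pos

# Usage with toy similarities between three documents.
layout = force_directed_layout({("d1", "d2"): 0.9, ("d1", "d3"): 0.1, ("d2", "d3"): 0.1})
print(layout)
```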
The spread of the cover sheet is a divisive phenomenon. Its appearance is geographically bound and its content situated in the local political and financial context. In this article we discuss the arguments for and against the cover sheet in its guise as a fixture on institutional repository preprints, exploring the issue through statistical information gathered from survey material. We lay out the reasoning behind the use of cover sheets in the United Kingdom and discuss their prevalence and the underlying trends.
Recent studies show that there is no established method for developing a Dublin Core Application Profile (DCAP). A DCAP is a very important construct for implementing interoperability; it is therefore essential to have a method for developing such a construct, in order to give DCAP developers a common ground of work. This paper presents the first version of a method to develop Dublin Core Application Profiles (Me4DCAP V0.1), developed in a PhD project following a Design Science Research (DSR) approach. Me4DCAP takes the Singapore Framework for DCAP as its starting point and shows the way through DCAP development. It encompasses a group of pre-defined interconnected activities, explicitly stating when they should take place, which techniques could be used to execute them and which artifacts should result from their execution.
Organizing the peer review process for a scientific conference can be a cumbersome task. Electronic conference management systems support chairs and reviewers in managing the huge number of submissions. These systems implement the complete workflow of a scientific conference. We present a new approach to such systems: by providing an open API framework instead of a closed system, it enables external programs to harvest and utilize open information sources available on the Internet today.
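As an illustration of the open-API idea, the sketch below exposes a toy list of submissions as JSON over HTTP using only the Python standard library; the endpoint, fields and data are assumptions, not the framework described in the paper.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative in-memory store; a real system would draw on the conference
# database and on open information sources harvested from the web.
SUBMISSIONS = [
    {"id": 1, "title": "Example paper A", "status": "under review"},
    {"id": 2, "title": "Example paper B", "status": "accepted"},
]

class ConferenceAPI(BaseHTTPRequestHandler):
    """Expose submissions as JSON so external programs can harvest them."""

    def do_GET(self):
        if self.path == "/submissions":
            body = json.dumps(SUBMISSIONS).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), ConferenceAPI).serve_forever()
```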
Open access (OA) is a way of providing unrestricted access via the Internet to peer-reviewed journal articles as well as theses, monographs and book chapters. Many open access repositories have been created in the last decade, and a number of registry websites index these repositories. This article analyzes the repositories indexed by the Open Archives Initiative (OAI) organization in terms of record duplication. Based on a sample of 958 metadata files containing records modified in 2012, we provide an estimate of the number of duplicates in the entire collection of repositories indexed by OAI. In addition, this work describes several open source tools that form a generic workflow suitable for deduplication of bibliographic records.
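The deduplication workflow itself is only named here, but a common building block is fingerprinting records on normalized metadata fields. The following Python sketch shows that idea under assumed Dublin Core-style keys (title, creator); it is not one of the open source tools referred to above.

```python
import re
from collections import defaultdict

def fingerprint(record):
    """Reduce a metadata record to a rough duplicate-detection key:
    lower-cased title and creator with punctuation and whitespace stripped."""
    def norm(value):
        return re.sub(r"[^a-z0-9]", "", value.lower())
    return norm(record.get("title", "")), norm(record.get("creator", ""))

def find_duplicates(records):
    """Group records sharing a fingerprint; groups larger than one are
    candidate duplicates that would need closer inspection."""
    groups = defaultdict(list)
    for rec in records:
        groups[fingerprint(rec)].append(rec)
    return [grp for grp in groups.values() if len(grp) > 1]

# Usage with two records harvested from different repositories.
records = [
    {"title": "Open Access in 2012", "creator": "Doe, J."},
    {"title": "Open access in 2012.", "creator": "Doe, J"},
]
print(find_duplicates(records))
```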
The European Library provides access to research materials from the collections of Europe's national and research libraries, representing members from 46 countries. This paper presents the current status, on-going work, and future plans of the resource dissemination services provided by The European Library, covering resources such as national bibliographies, digital collections, full text collections, its access portal and API, open linked data publication, and integration in digital humanities infrastructures. In the coming years, The European Library will work to provide the means and tools for digital humanities researchers to easily use research materials from libraries in their research activities.
This case study explores alternative science metrics on grant-supported research publications. The study is based on plosOpenR, a software package for the statistical computing environment R. plosOpenR facilitates access to the application programming interfaces (APIs) provided by the Open Access publisher Public Library of Science (PLOS) and by OpenAIRE – Open Access Infrastructure for Research in Europe.
We report on 1,166 PLOS articles that acknowledge grant support from 624 different research projects funded by the European Union's 7th Framework Programme (FP7). plosOpenR allows the exploration of PLOS Article-Level Metrics (PLOS ALM), including citations, usage and social media events, as well as collaboration patterns on these articles. The findings reveal the potential of reusing data that are made openly and automatically available by publishers, funders and the repository community.
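plosOpenR itself is an R package; as a language-neutral illustration of the kind of exploration described, the Python sketch below assumes article-level metrics have already been retrieved and exported to a CSV with hypothetical columns (doi, fp7_project, citations, views, shares) and simply aggregates them per FP7 project.

```python
import csv
from collections import defaultdict

def summarize_by_project(path):
    """Aggregate article-level metrics per funding project from a CSV export.
    Expected (hypothetical) columns: doi, fp7_project, citations, views, shares."""
    totals = defaultdict(lambda: {"articles": 0, "citations": 0, "views": 0, "shares": 0})
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            t = totals[row["fp7_project"]]
            t["articles"] += 1
            for key in ("citations", "views", "shares"):
                t[key] += int(row[key])
    return dict(totals)

# Usage: print projects ranked by total article views.
if __name__ == "__main__":
    summary = summarize_by_project("plos_alm_fp7.csv")
    for project, t in sorted(summary.items(), key=lambda kv: -kv[1]["views"]):
        print(project, t)
```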
Problem and Objectives. ANACOM (the Portuguese National Authority on Communications) is the regulator, supervisor and representative of the communications sector in Portugal. Over the years, ANACOM has developed a kind of flat controlled vocabulary: a list of controlled terms, without any structure and without specific a priori relations to the DC properties in use. One of the requirements of the project now at hand is to organize that list into one or more controlled vocabularies, relate them to other LOD vocabularies, encode them in SKOS and make them open in at least two languages (Portuguese and English).
Methodological Approach. We will adopt a hybrid approach, combining the best features of two methodologies: Ontology Development 101 (1) and the “Process and Methodology for Developing Core Vocabularies” used by the European ISA Programme (2). We will use qualitative research techniques, such as interviews and focus groups, to analyse and validate the relations between the terms.
Expected Results. This article presents a work in progress comprising the whole process, starting with the organization of the terms and ending with the first version of the controlled vocabularies encoded in SKOS.
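As a sketch of what the SKOS encoding step might look like, the following Python fragment uses rdflib to express one hypothetical concept with Portuguese and English labels; the namespace, scheme and term are invented placeholders, since the real vocabulary will emerge from the analysis described above.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

# Hypothetical namespace for the ANACOM vocabulary; the real URIs, terms and
# structure will come out of the interviews and focus groups described above.
ANACOM = Namespace("http://example.org/anacom/vocab#")

g = Graph()
g.bind("skos", SKOS)
g.bind("anacom", ANACOM)

scheme = ANACOM["communicationsVocabulary"]
g.add((scheme, RDF.type, SKOS.ConceptScheme))

concept = ANACOM["spectrumLicensing"]
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.inScheme, scheme))
g.add((concept, SKOS.prefLabel, Literal("Licenciamento de espectro", lang="pt")))
g.add((concept, SKOS.prefLabel, Literal("Spectrum licensing", lang="en")))

print(g.serialize(format="turtle"))
```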
The universities of the world are ranked by several different research institutions. In this study we take a closer look at the Ranking Web of Universities (RWU), because it claims to be more extensive than the others and because it focuses on the academic web presence of universities. RWU has analyzed over 21 000 universities, and the current ranking covers 12 000 of them. This paper examines the internet visibility of the University of Jyväskylä and of eight European countries, and how the openness of universities has developed over the last two editions of the Ranking Web of Universities. The Academy of Finland published the report The state of scientific research in Finland 2012 in October 2012, in which it analyzes the relative citation impact of eight European countries; the analysis shows strong values for Switzerland, the Netherlands and Denmark. If we analyze the statistics in the RWU and in the Ranking Web of Repositories, we may find connections between the success of universities, their ranking in openness, and the repositories in Switzerland, the Netherlands and Denmark. From the comparison of eight European countries we find that, at least in the Netherlands and Sweden, there is a deep connection between the ranking of universities, repositories and the openness of the universities. The most interesting developments, however, appear when we look at the trends behind these numbers and figures. If we compare the top five universities of these countries, we can see that Finland, Denmark and Norway have improved their placing in openness substantially compared to Switzerland and even to the RWU as a whole. These same countries also seem to be on their way up if we look at the development of relative citation impact in recent years. One explanatory factor could be the Open Access policies of these countries and universities. Whether this is due to a proper national Open Access policy, to coincidence, or to unreliability in the methodology of the Ranking Web of Universities is still hard to say.
The University of Southern Mississippi's School of Library and Information Science (SLIS) publishes SLIS Connecting in the university's digital repository, Aquila Digital Community (http://aquila.usm.edu/), which is hosted through Digital Commons. The purpose of SLIS Connecting is “to share news, information and research with future students, current students, alumni, faculty, and the general population through selected faculty publications, invited student publications, refereed publications, and through regular columns” [1]. The first issue was published electronically in February 2012 and the second in October 2012. The third issue was published in February 2013 and contains the first paper submitted by an author not affiliated with SLIS. SLIS Connecting is currently indexed in Google Search and in Google Scholar. From the first issue in February 2012 through March 2013, there were over 7,000 page views, eighty-three percent of them from the U.S. While the patterns within the United States align closely with the student and alumni distribution, as expected, the international pattern at first glance seems surprising; there are, however, several possible explanations for this international reach. This extended abstract presents a spatial analysis of the usage data of SLIS Connecting in the United States and abroad.
The sharing of the data generated by research projects is increasingly being recognised as an academic priority by funders and researchers. For example, out of 110 listed funders on the JULIET 2 service, 32 have data policies of some form. The topic has been discussed by national and international organisations, for example ICSU (the International Council for Science), the OECD (Organisation for Economic Co-operation and Development) and the UK's Royal Society. The public statements that emerge from these scientific bodies call for both research transparency and freely available access to research data created with public funding, for possible reuse. The rights associated with the sharing of data, and the environment in which it can be done, are also of interest to publishers. This interest can be attributed to two motivating factors: to support the academic function of data, such as the corroboration of research findings and the facilitation of the reuse of data; and to respond to a strategic, commercial development, for instance an engagement with the rights, process and environment of data sharing. Currently some publishers are introducing contractual policies on the archiving and sharing of data, in addition to policies governing the deposit and sharing of research articles through repositories. The issue of journals' policies on sharing has been raised by scientific organisations such as the US National Academy of Sciences, which urges journals to make clear statements of their sharing policies. On the other hand, the publishing community, whilst broadly supporting the principle of open and accessible research data, expresses concerns over the intellectual property implications of archiving shared data.
Currently the trend in digital publishing of Humanities data seems to be moving towards openness and interoperability. In this abstract I will examine to what extent and in what way current digital publications are open and accessible. My hypothesis is that while many digital publications are currently made available online and can be searched and viewed by the general public, very few are available to researchers in a meaningful way. By meaningful I mean that external researchers can search and export data for reuse and are possibly even encouraged to define their own search criteria. I believe that this is the true essence of data sharing. Following this, I will propose one approach, using XML and Web Services, to creating a digital publication of Humanities data that would be open to the research community in a meaningful way, as defined above.
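As a minimal illustration of the export side of such an approach, the Python sketch below serializes toy records as XML using the standard library; the record fields are invented, and no particular Web Services stack or schema from the proposal is implied.

```python
import xml.etree.ElementTree as ET

def records_to_xml(records):
    """Serialize research records as XML so external researchers can export
    and reuse them; the fields here are illustrative, not a fixed schema."""
    root = ET.Element("records")
    for rec in records:
        node = ET.SubElement(root, "record", id=str(rec["id"]))
        for field, value in rec.items():
            if field != "id":
                ET.SubElement(node, field).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Usage: two toy records from a hypothetical Humanities dataset.
print(records_to_xml([
    {"id": 1, "title": "Letter, 1843", "place": "Uppsala"},
    {"id": 2, "title": "Diary fragment", "place": "Lund"},
]))
```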
Recently, the technological and organizational infrastructures of institutional repositories have been questioned. For example, the British so-called Finch report from last summer argued that further development and higher standards of accessibility are needed to make repositories better integrated and interoperable, and ultimately to bring greater use by both authors and readers. Not only the technical frameworks and presumably low usage levels are criticized, but also the lack of “clear policies on such matters as the content they will accept, the uses to which it may be put, and the role that they will play in preservation”. The report concludes: “In practice patterns of deposit are patchy”.
As in the UK, all universities and university colleges in Sweden today, except a couple of very small and specialized ones, have an institutional repository. A majority (around 80%) work together on a co-operative basis within the DiVA Publishing System, with the Electronic Publishing Centre at Uppsala University Library acting as the technical and organizational hub. Because the system is jointly funded, with members contributing according to their size, it has been possible even for smaller institutions with limited resources to run a repository with exactly the same functionality as the biggest universities.
In this presentation we want to demonstrate the ever-increasing importance of institutional repositories in Sweden. Having started more than a decade ago, the DiVA Consortium has for some time been addressing, in a number of areas, the problems now raised by the Finch report.
Introduction. Many surveys have been promoted and/or supported by SPARC (the Scholarly Publishing and Academic Resources Coalition) and ARL (the Association of Research Libraries) to gain an understanding of the initiatives and projects undertaken by universities and/or research institutions in the development of value-added services to manage in-house publications. These surveys considered the advantages of using open source software to manage digital contents and pointed out critical issues in the development of these services, stressing the necessity of integrating them with other digital repositories. Case studies of successful strategies are also reported, highlighting the research context and the types of products and/or collections to be managed with e-publishing services.
Aim. This paper describes a methodology used to analyze the editorial production of the CNR Institutes belonging to the Department of Humanities and Social Sciences (HSS). This analysis is considered a prerequisite for designing a feasibility study aimed at developing an e-publishing service tailored to HSS characteristics. To this end, the paper describes in particular the characteristics of the editorial products, defining a set of quality criteria for current production. The result of this analysis can provide insight into the weak and strong points that have to be addressed when developing a new and sustainable e-publishing service.
Survey design. To gain insight into the characteristics of the editorial products, we identified a set of variables that express stability (start date, number of years and frequency of publication), editorial quality (presence of standardized bibliographic elements and codes; attribution of copyright/Creative Commons licences; peer-review process) and visibility (indexing in national/international catalogues and/or archives; access modes). For the purpose of our analysis, the results of the survey are described distinguishing between the editorial products entirely managed in-house and those published and/or distributed by commercial publishers. Results are also reported by type of editorial product (monograph series, journals and e-journals, and report series), considering that each type of product has specific modes of publishing and editorial processes.
Results. The CNR Institutes in HSS produce different types of editorial products in a stable way and with continuity over time. A considerable number of series have been published for more than 20 years, and editorial activity keeps pace with newly created products that also include e-journals. No major differences emerged in the editorial quality of in-house and external products, especially with respect to formal editorial aspects. The selection of content depends on the type of product, while content evaluation of in-house publications is not yet widespread, nor is the attribution of copyright/Creative Commons licences. The introduction of an e-publishing service could support the peer-review process more efficiently and also improve visibility, thanks to additional services embedded in the platform that support all the activities connected with content exposure and retrieval in indexing and abstracting services. This is particularly important in HSS.
In the future we intend to further analyze the organizational context in which editorial activities are managed, carrying out a questionnaire-based survey to explore the role of libraries and/or other stakeholders involved in this process, as well as researchers' needs when publishing their results.
Recently, online social networks (OSNs) have gained significant popularity and are among the most popular ways to use the Internet. Additionally, researchers have become more interested in using social interaction networks (SINs) [1] to further enhance and personalize their services [2]. OSNs are also redefining roles within the publishing industry, allowing publishers and authors to reach and engage with readers directly [3]. However, SINs are not easily available today through the current APIs provided by most OSNs, so applications would have to spend a tremendous amount of time gathering the required SINs for their services. Our research problem is therefore how to design a system that makes social interactions in OSNs accessible. This also raises the question of how to crawl OSNs in a structured way, which is the focus of this short paper.
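As a rough illustration of structured crawling, the Python sketch below performs a breadth-first traversal over a hypothetical fetch_connections(user) call that stands in for whatever OSN API or scraper supplies the interaction data; it is not the system proposed in the paper.

```python
from collections import deque

def fetch_connections(user):
    """Hypothetical stand-in for an OSN API call or scraper that returns the
    users this user has interacted with (comments, likes, messages, ...)."""
    toy_graph = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": ["dave"]}
    return toy_graph.get(user, [])

def crawl_interactions(seed, max_users=1000):
    """Breadth-first crawl of a social interaction network starting from a
    seed user, returning an adjacency list of observed interactions."""
    sin = {}
    queue = deque([seed])
    seen = {seed}
    while queue and len(sin) < max_users:
        user = queue.popleft()
        neighbours = fetch_connections(user)
        sin[user] = neighbours
        for other in neighbours:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return sin

# Usage: crawl the toy graph starting from one seed user.
print(crawl_interactions("alice"))
```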