Ebook: Knowledge of the Law in the Big Data Age
The changes brought about by digital technology and the consequent explosion of information known as Big Data have brought opportunities and challenges in all areas of society, and the law is no exception.
This book, Knowledge of the Law in the Big Data Age, contains a selection of the papers presented at the conference ‘Law via the Internet 2018’, held in Florence, Italy, on 11-12 October 2018. This annual conference of the ‘Free Access to Law Movement’ (http://www.fatlm.org) hosted more than 60 international speakers from universities, government and research bodies as well as EU institutions.
Topics covered range from free access to law and Big Data and data analytics in the legal domain, to policy issues concerning access, publishing and the dissemination of legal information, tools to support democratic participation and opportunities for digital democracy. The book is divided into 3 sections: Part I provides an introductory background, covering aspects such as the evolution of legal science and models for representing the law; Part II addresses the present and future of access to law and to various legal information sources; and Part III covers updates in projects, initiatives, and concrete achievements in the field.
The book provides an overview of the practical implementation of legal information systems and the tools to manage this special kind of information, as well as some of the critical issues which must be faced, and will be of interest to all those working at the intersection of law and technology.
Free online access to information is approaching maturity and is evolving in line with the Big Data ecosystem: data volumes are continuing to grow and so are the possibilities of what can be done with so much raw data available. The major challenges of the Big Data age are of course well known (volume of ever-increasing data, variety of data types and structures, contribution of big data to evidence-based decision making). Added to these is not only the increasing number of disciplines and problem domains where Big Data is having effects, but also the consequent challenges and opportunities for Big Data to have a major impact on science, business, and government. In recent times, the legal domain and in particular legal information management has started to embrace this trend for better accessing, disseminating and understanding law, for improved decision making, and so much more.
Big Data represents both the greatest innovation and peril of these times and promises to provide a scientific and empirical approach to law. The digital paradigm has revolutionized the ability to communicate legal information: physical and geographical barriers no longer matter. Legal information must be available on the Internet and should be freely available: nobody should have to pay to access information essential to one’s rights and obligations in a working democracy. A specific issue also involves the distribution or redistribution of legal information and the way in which it is accomplished. In this regard, obligations fall not only on governments but also on those who are responsible for the distribution of and access to legal information.
The topics addressed in this volume are situated in such a context, and range from free access to law, Big Data, data analytics in the legal domain, to policy issues for accessing, publishing and disseminating legal information, as well as tools to support democratic participation and opportunities for digital democracy.
These aspects, together with other related issues, were specifically tackled and discussed at the international conference ‘Law via the Internet 2018’, which was organized by the Institute of Legal Information Theory and Techniques of the National Research Council of Italy (CNR-ITTIG) on 11–12 October 2018 in Florence. This was the annual conference of the ‘Free Access to Law Movement’ (http://www.fatlm.org), which brings together over 60 Legal Information Institutes (LIIs) from all over the world. The Florentine Conference hosted more than 60 speakers from universities, government and research bodies as well as EU institutions, who animated a lively and wide-ranging debate on the main theme that gave the Conference its title and now gives this book its title as well: ‘Knowledge of the Law in the Big Data Age’.
The volume collects a selection of papers presented in Florence by Italian and foreign experts who accepted our invitation to contribute. The structure of the book partially reflects the sections designed for the Conference. Part I, entitled ‘Encountering Big Data and Law’, provides an introductory background, covering foundational issues such as legal epistemology, the future of the machine-driven evolution of legal science and (semi)formal models for representing the law.
Part II, dedicated to ‘Challenges and Opportunities in Disseminating and Accessing Legal Information’, addresses the present and future of access to law, with particular reference to ‘Rules, Policies and Publication Models’ for disseminating the increasingly rich and varied legal information sources and to the analysis of ‘Standards and interoperability’ issues.
Part III, ‘Experiences, Good Practices and Critical Issues’, is about essential updates in projects, initiatives, and concrete achievements in the field. The result is a picture that, far from being exhaustive, provides an overview of practical implementation of legal information systems, tools to manage this special kind of information and some critical issues to face.
The volume arises from the idea of reflecting on the actual methods and strategies for accessing and knowing the law as it is today, and of comparing the ways in which it is distributed and made accessible. The intent is to present the current state of the discussion and to offer new perspectives for reflection on issues central to the current debate on the relationship between law and technology. As editors, our hope, in keeping with our research interests, is that by going through the chapters of this volume the reader may realize how successful initiatives in the direction of free access to law mainly depend on at least two factors: the strong potential of interdisciplinary collaboration in today’s web environment, and the capacity of legal culture to understand and meet the challenges of the Big Data age.
We want to express many thanks to all the contributors, who make this volume precious well above the merits of the editors. They deserve our gratitude for having patiently waited for the completion of this collective work. Moreover, we are glad to publish the volume with open access: this is in line with the desire to protect free access to legal knowledge and confirms our firm belief in, and strong support for, the philosophy of open access.
Finally, please note that from 1 June 2019 the Institute of Legal Information Theory and Techniques (CNR-ITTIG), of which we are part as researchers, will change its name to Institute of Legal Informatics and Judicial Systems (CNR-IGSG). We are particularly happy to have been able to close with this last precious volume, which leads us to a new and exciting chapter of our Institute’s life, full of new expectations, stimuli and challenges.
Florence, 30 May 2019
Ginevra Peruginelli and Sebastiano Faro
Acknowledgement: We would like to express our special thanks to Giuseppina Sabato and Simona Binazzi from CNR-ITTIG for their valuable editing work on this volume.
This Chapter presents some of the main challenges posed to lawyers by the growing Big Data environment. In particular, it points out the consequences of moving from a causal logic to an inferential logic.
Thirty years after the advent of the World Wide Web, information and communication technologies keep triggering deep changes in the way we access, produce and use knowledge. The convergence between data warehouse facilities and computational science heuristics is populating the Internet with cloud infrastructures designed to manage and process information in completely new ways. We are facing the emergence of a new generation of online platforms integrating knowledge management, data analytics, visualization and collaboration tools for purposes that gradually move from information retrieval to scientific research. This Chapter introduces the dawning of the platform era in the legal world, showing, also by means of concrete examples, how these tools can be used to make the most of the growing amount of legal information accessible online today. The analysis becomes an opportunity to dwell on how computational tools can give rise to new perspectives in legal research and practice.
The Chapter aims to examine the legal remedies – both judicial and non-judicial – available in the area of electronic communication, adopting as the main comparison parameter the problem of the legal status of digital information. The infocentric structure of today’s society on the one hand does not allow for the advance identification of a clear and generalized correspondence between a subjective legal situation and digital information; on the other hand, protection mechanisms tend to converge both from a classification and a technical profile. In other words, the consolidated subjective right vs. remedy model – understood as a system of subjective situations that are pre-established by the law from which owners derive their faculty or powers and which puts the obligation to do (or also not do) in the hands of individuals or the rest of the community, and alongside which a range of protection instruments can be found that can be invoked before the courts in the case of violations (ubi jus, ibi remedium) – is often diminished and becomes more typically an action-reaction model. In a multi-subject context marked by a post-industrial, cognitive economic model, it is possible that at the operational level the administration of one type of remedy implies a different consequence for all the other subjects involved in the information flow. While respecting the diversity of the experiences analysed, the regulatory trend seems to be that of the parcelling up of behavioral standards in a preventive and collaborative key.
Some of the ordinary activities of IT practitioners require a certain degree of knowledge of IT law. Assuming these professionals will acquire legal knowledge better if it is expressed in terms familiar to them, this Chapter explores different ways of organising and presenting legal knowledge so that IT professionals can understand it better. The proposal features data models and knowledge organisation rooted in the specific legal theory of critical legal positivism of Kaarlo Tuori. It has been evaluated with an experiment in which BSc students in Computer Science were provided with models and reference material describing the EU legislation on cookies and were asked specific questions. In light of the new theoretical framework and the experiment results, we postulate that models and ontologies can bridge the knowledge gap and serve as a lingua franca between the legal and the IT professions.
Electronic democracy is still far from being realized, and several issues must be solved in order to make it possible. The quantitative problem of popular participation is one of them, but it can be mitigated through automation. This Chapter proposes two main applications that may help build a multilevel digital agora where demos, lawmakers, governments, and public administration may cooperate. The first relates to the integration of specific decision support systems in each platform used for this purpose. The second concerns the use of IT tools that, integrated into a digital agora, make it possible to transform the multiplicity of individual contributions into a general will.
The development of legal reasoning using decidable fragments of knowledge modeling languages is essential in the Semantic Web, given the huge number of triples now available as Linked Open Data. This Chapter introduces a framework for legal knowledge representation and reasoning based on the distinction between the concepts of provision and norm, which are suited to different kinds of legal reasoning: accessibility of legal provisions and norm compliance, respectively. The proposed framework allows the addressed types of reasoning to be implemented using OWL 2 decidable profiles and reasoners. Examples of decidable reasoning within the proposed framework are presented and tested.
The topic of public access to national court decisions has never been explicitly on the agenda of the Council of the European Union, since it has been viewed as a national responsibility. Recent developments though – like the introduction of the European Case Law Identifier (ECLI) and the go-live of the ECLI search engine, operated by the European Commission and containing millions of national court decisions – have raised awareness about the importance of online accessibility of national case law for the European legal order. This has set the stage for the adoption, on 8 March 2018, of the ‘Conclusions of the Council and the Representatives of the Governments of the Member States Meeting within the Council on Best Practices regarding the Online Publication of Court Decisions’. This Chapter discusses various aspects of these Conclusions. First, the character of such Council conclusions as a soft law instrument is explained. Secondly, the document is reviewed in the broader context of recent policy developments and other (semi-) legal instruments. Finally, the substantive contents of the document are examined. Although most of the best practices prescribe what is already common practice in all or most EU Member States, some provisions call upon governments and judiciaries to implement strategies that are not commonplace yet, e.g. to supply some kind of importance qualification indicating which court decisions, and to what extent, are of relevance to others than the parties to the case.
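As an illustration of how an identifier such as ECLI lends itself to machine processing, the following minimal Python sketch splits an ECLI-style string into its parts. The five-part, colon-separated layout (prefix, country code, court code, year, ordinal) and the length limits in the pattern are assumptions about the identifier's general shape and are not stated in this abstract; the sample identifier used below is hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Ecli(NamedTuple):
    """Parts of an ECLI-style identifier (field names are illustrative)."""
    country: str
    court: str
    year: int
    ordinal: str


# Assumed shape: ECLI:<country>:<court>:<year>:<ordinal>
# The character classes and length limits are assumptions for this sketch.
_ECLI_RE = re.compile(
    r"^ECLI:([A-Z]{2}):([A-Z][A-Z0-9]{0,6}):(\d{4}):([A-Z0-9.]{1,25})$",
    re.IGNORECASE,
)


def parse_ecli(text: str) -> Optional[Ecli]:
    """Return the parsed parts, or None if the text does not match the assumed shape."""
    match = _ECLI_RE.match(text.strip())
    if match is None:
        return None
    country, court, year, ordinal = match.groups()
    return Ecli(country.upper(), court.upper(), int(year), ordinal.upper())


if __name__ == "__main__":
    # Hypothetical identifier, used only to exercise the parser.
    print(parse_ecli("ECLI:XX:COURT:2018:123"))
```

A parser of this kind is the sort of building block a search engine or citation linker could use to normalise references before indexing them; a production system would follow the official specification rather than the simplified pattern assumed here.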
The proper functioning of any legal system requires people to know the law. Our knowledge of the law, however, depends on how legal information is communicated. Currently, legal information is communicated rather poorly. We are still missing opportunities that Big Data and algorithms offer in relation to how the law is published, disseminated, and accessed. This Chapter focuses on the dissemination of legal information. It argues that we should strive for personalised dissemination. By highlighting and analysing examples from the history of legal publication, it argues that the shift to personalised dissemination of legal information does not pose a threat to the existing legal systems. Instead, it could enhance the overall efficiency and sustainability of our legal communication and increase our knowledge of the law, while reducing the total costs. The Chapter therefore makes a case for a new era in publication and communication of the law – the era of personalised dissemination of legal information.
The Chapter addresses, from an international/EU law perspective, the issue of the dissemination of legal research. The international legal order defines the right to science in Article 27 of the Universal Declaration of Human Rights; the same right is cited in acts adopted by many international organizations and is included in binding instruments, mainly in the form of the principle of sharing the benefits of scientific research. Having affirmed the existence of a right to science in contemporary international law, the Chapter reconstructs its nature and content: some authors conceive it as an independent right that deserves autonomous protection, as it aims at increasing the quality of life of individuals and collectivities; other scholars construe it as an instrument for implementing ‘classic’ fundamental rights. Among its applications, the one related to the free dissemination of research results, promoted by the Open Access movement, is pivotal, especially with reference to publicly funded research. In this perspective, the Chapter mainly focuses on three issues: 1) the international law rules on the right to science as legal precursors for open access; 2) the international intellectual property rights regime as a limitation to the right to science and, through it, to open access; 3) artificial intelligence, fed by open access, as a means for reconstructing State practice and customary international law.
The evolution of Open Science in France is almost completely the result of constant friction with the business models that drive major international publishing houses, where each party has adapted to developments introduced by the other, but also of practical steps taken to ensure that shared documents are efficiently collected and made accessible. This Chapter provides several examples of the development of Open Science in France, such as the platform http://dissem.in. How are Open Science principles effectively implemented in the area of legal knowledge in France? What can be done to encourage law scholars to publish their work on a single common platform? And which platform should that be? Should it be improved, and, if so, in what way? Will dialogue resolve conflicts and pave the way for Open Science in a viable economic context?
Discovery tools are specialized portals for bibliographic research widely used in libraries with heterogeneous collections of electronic and digital resources. The Chapter provides an overview of the library resource discovery environment, explaining how these technologies, methodologies, and products might be able to adapt to changes in the evolving information landscape in scholarly communications. This Chapter also explores the effects of discovery tools on legal research.
The European Legislation Identifier initiative (ELI) aims at bringing legislation into the global Web of data, to facilitate the access, sharing and interconnection of legal information. It proposes the creation of URI identifiers for legislation based on common components and the description of their metadata based on an ontology relying on FRBRoo; the ELI ontology includes in particular the description of the FRBR levels of abstraction, the date properties needed to describe legislation and links to relate legislative acts. Legislation metadata is thus viewed as a global graph of interconnected entities. While ELI tries to lower the entry barrier for legal publishers to disseminate structured metadata and currently counts 13 implementations, it also faces challenges in progressing towards its full potential: data quality, description of ELI datasets, alignment of thematic vocabularies and granular description of text subdivisions. ELI has the potential to facilitate access to legal information by enabling unambiguous mark-up of legal citations, giving legislation more visibility in major web search engines, describing early legislation drafts and facilitating the exchange of data between legal information systems. ELI is tightly connected to novel legal information system architectures based on legal knowledge graphs; this style of architecture encourages legal publishers to move from a document-centric perspective towards a data-centric perspective, as exemplified by the Casemates in Luxembourg and the Cellar at the Publications Office of the European Union.
In this paper we describe Linkoln, an open framework for the automatic detection and linking of legal references contained in legal texts. The problem was tackled by providing a modular and extensible approach in order to efficiently cover the wide variability and specific peculiarities of legal citation practices. The project was initiated in collaboration with the Italian Senate with the aim of making available to Italian legislative authorities and official publishing bodies a robust and extensible automatic tool to improve access to published legislation. The result of this effort is Linkoln, which was recently integrated successfully into the application serving documents on the institutional website of the Italian Senate to activate hyperlinks to cited legislation.
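The idea of building URI identifiers from common components can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component names (`jurisdiction`, `doc_type`, `year`, `natural_id`, `language`), their order, the `eli` path prefix and the `example.org` base are hypothetical choices made for this sketch and do not reproduce the official ELI template.

```python
from typing import Optional
from urllib.parse import quote


def build_eli_uri(
    base: str,
    jurisdiction: str,
    doc_type: str,
    year: int,
    natural_id: str,
    language: Optional[str] = None,
) -> str:
    """Compose a legislation URI from common components.

    The component set and ordering are assumptions for illustration only.
    """
    parts = ["eli", jurisdiction, doc_type, str(year), natural_id]
    if language:
        # An optional trailing component selects a language version.
        parts.append(language)
    # Percent-encode each component so it is safe inside a URI path segment.
    path = "/".join(quote(part, safe="") for part in parts)
    return base.rstrip("/") + "/" + path


if __name__ == "__main__":
    # Hypothetical base and components.
    print(build_eli_uri("https://example.org/", "xx", "act", 2018, "42", "en"))
```

Because every component occupies a predictable position, two publishers that agree on the components can construct and resolve each other's identifiers without a lookup service, which is what makes unambiguous citation mark-up feasible.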
Akoma Ntoso is an international legal XML standard, whose technical specifications are now approved by the OASIS body. The standard has been developed to model legislative, parliamentary, and judicial documents using Semantic Web design principles. However, other types of normative and regulatory documents can benefit from being represented in Akoma Ntoso, making it possible to formally describe their structure, their components (e.g., attachments), their references to and from other documents, the semantic annotation of some peculiar parts of regulatory language (e.g., actions, purposes), the workflow of the creation process, and modifications over time. This Chapter presents a legal analysis of FAO Resolutions and how to apply Akoma Ntoso to interoperate with other UN documents (e.g., resolutions of the UN General Assembly). We also present the identifier naming convention for managing multilingual interconnection between documents (e.g., the UN manages six official languages). Finally, we present the ALLOT ontology application for improving semantic annotation in light of Linked Open Data. The combination of Akoma Ntoso and the ALLOT ontology makes it possible to enhance searching capacity and presentation accessibility.
This Chapter describes a four-stage methodology to generate Linguistic Linked Data for the legal domain: identification, creation, transformation (to RDF) and linking. The goal of this process is to enhance the presence of legal language resources in the Linguistic Linked Open Data cloud. Since this Chapter is framed within the H2020 LYNX project, aimed at creating a Legal Knowledge Graph, a parallel objective is to employ the resources generated as a linguistic foundation to annotate, classify and translate the legal resources represented in this graph.
The use of Artificial Intelligence (AI) in law has again become of great interest to lawyers and government. Legal Information Institutes (LIIs) have played a significant role in the provision of legal information via the Web. The concept of ‘free access to law’ is not static, and its principles now require a LII response to the renewed prominence of AI, possibly to include improving and expanding free access to legal advice. This overview of one approach, from justification to implementation, considers the potential for AI-aided free legal advice, its likely providers, and its importance to legal professionalism. The constraints that ‘free’ imposes lead to the potential roles LIIs may realistically play, and suggested guidelines for development of sustainable systems by free access providers. The AI-related services and tools that the Australasian Legal Information Institute (AustLII) is providing (the ‘DataLex’ platform) are outlined. Finally, ethical (or governance) issues LIIs need to address are discussed.
Governments publish legislation and case law widely in print and on the Web. Such legal information is provided for human consumption, but the information is usually not available as data for algorithmic analysis and applications to use. However, this would be beneficial in many use cases, such as building more intelligent juridical online services and conducting research into legislation and legal practice. To address these needs, this Chapter presents Semantic Finlex, a national in-use data resource and service for publishing Finnish legislation and related case law as Linked Open Data for legal applications to use. The system transforms and interlinks on a regular basis data from the legacy legal database Finlex of the Ministry of Justice into Linked Open Data, based on the European standards ECLI and ELI. The published data is hosted on the ‘7-star’ Linked Data Finland service and SPARQL endpoint with a variety of related services available that ease data re-use. Rich Internet Applications using SPARQL for data access are presented as application demonstrators of the data service. In addition, this Chapter presents methods and tools under development to automatically annotate legal texts and to anonymize case law documents prior to their publication on the Web. Anonymization is necessary due to issues of data protection and privacy, and annotation is needed for semantic search and interlinking the documents. The automated approaches could significantly speed up the process and minimize costs of publishing legal documents as Linked Open Data.
If free access to the law and to the case law is often considered one of the ways to develop e-justice, access to information about legal publications is somewhat off the radar. This is very strange if we consider the role of legal doctrine in the process of creating law and in the process of applying (juris-dicere) the law. The free online catalogue of the Library of the Court of Justice of the European Union gives access to all its bibliographical records and allows everyone who has Internet access to research EU law and other fields of law effectively. This Chapter analyses the tasks of the Library and the collections in order to find out how this could facilitate access to legal information in certain fields and then examine the changes that the library is facing, due to technological development, to the use of electronic resources and to the more and more stringent constraints on financial and human resources.
This Chapter discusses the motivations and processes developed at Rutgers Law Library for digitizing its print collection of United States Congressional hearings and committee prints dating from 1967 to 2000. Both the technical and collection goals of the project and the important practical details of how it is being accomplished are described. The main theoretical goal was to show how a large-scale digitization project could result in a usable, good-quality, and sustainable collection while keeping costs at a scale that many institutions might consider affordable. The collection consists of over 25,000 documents. They are committee hearings and other print material generated as part of the U.S. Congress’ legislative and oversight roles. Although the materials have been unbound, scanned, and checked for quality by hand, most other processes have been automated to minimize cost. Equipment and other expenses have also been kept to a minimum, but without compromising overall readability and archival quality.