Ebook: Legal Knowledge and Information Systems
The field of legal knowledge and information systems has traditionally been concerned with legal knowledge representation and engineering, computational models of legal reasoning, and the analysis of legal data. Recent years, however, have also seen increasing interest in applying machine learning methods to ease and empower the everyday activities of legal experts.
This book presents the proceedings of the 33rd International Conference on Legal Knowledge and Information Systems (JURIX 2020), organised this year as a virtual event on 9–11 December 2020 due to restrictions resulting from the Covid-19 pandemic. For more than three decades, the annual JURIX international conference, which now also includes demo papers, has provided a platform for academics and practitioners to exchange knowledge about theoretical research and applications in concrete legal use cases. A total of 85 submissions by 255 authors from 28 countries were received for the conference, and after a rigorous review process, 20 were selected for publication as full papers, 14 as short papers, and 5 as demo papers. This selection process resulted in a total acceptance rate of 40% (full and short papers) and a competitive 23.5% acceptance rate for full papers. Topics range from computational models of legal argumentation, case-based reasoning, legal ontologies, smart contracts, privacy management and evidential reasoning, through information extraction from different types of text in legal documents, to ethical dilemmas.
Providing a state-of-the-art overview of developments in the field, this book will be of interest to all those working with legal knowledge and information systems.
We are delighted to present the proceedings volume of the 33rd International Conference on Legal Knowledge and Information Systems (JURIX 2020). For more than three decades, JURIX has organized an annual international conference for academics and practitioners, recently also including demos. The intention is to foster a virtuous exchange of knowledge between theoretical research and applications in concrete legal use cases. Traditionally, this field has been concerned with legal knowledge representation and engineering, computational models of legal reasoning, and analyses of legal data. However, recent years have witnessed an increasing interest in applying machine learning tools to relevant tasks in order to ease and empower legal experts’ everyday activities. JURIX is also a community in which different skills work together to advance research through cross-fertilisation between law and computing technologies.
The JURIX conferences have been held under the auspices of the Dutch Foundation for Legal Knowledge Based Systems (www.jurix.nl). They have been hosted in a variety of European locations, extending the borders of their action and becoming a truly international event by virtue of the various nationalities of their participants and attendees.
The 2020 edition of JURIX, which runs from December 9 to 11, is co-hosted by the Institute of Law and Technology (Faculty of Law, Masaryk University, Brno) and the Knowledge-based Software Systems Group (Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University, Prague). Due to the Covid-19 health crisis, the conference is organised in a virtual format.
For this edition we received 85 submissions by 255 authors from 28 countries; 20 of these submissions were selected for publication as full papers (ten pages each) and 14 as short papers (four pages each), for a total of 34 presentations. In addition, 5 submissions were selected for publication as demo papers (four pages each). We were inclusive in making our selection, but the competition was stiff and the submissions were put through a rigorous review process, with a total acceptance rate (full and short papers) of 40% and a competitive 23.5% acceptance rate for full papers. Borderline submissions, including those that received widely divergent marks, were accepted as short papers or demo papers only. The accepted papers cover a broad array of topics, from computational models of legal argumentation, case-based reasoning, legal ontologies, smart contracts, privacy management and evidential reasoning, through information extraction from different types of text in legal documents, to ethical dilemmas.
Two invited speakers honored JURIX 2020 by kindly agreeing to deliver keynote lectures: Katie Atkinson and Raja Chatila. Katie Atkinson is full professor of Computer Science and Dean of the School of Electrical Engineering, Electronics and Computer Science at the University of Liverpool. She also served as President of the International Association for Artificial Intelligence and Law in 2016–2017. She is one of the most significant representatives of the computational argumentation research community and of AI and Law, where she has focused on case-based reasoning and the implementation of such models in real-world applications. Raja Chatila is Professor Emeritus at Sorbonne Université. He is the former Director of the Institute of Intelligent Systems and Robotics (ISIR) and of the Laboratory of Excellence “SMART” on human-machine interaction. He is co-chair of the Responsible AI Working Group of the Global Partnership on AI (GPAI), and he was a member of the European Commission’s High-Level Expert Group on AI (HLEG-AI). He is one of the main research scientists studying the ethical issues surrounding Artificial Intelligence applications. We are very grateful to them for having accepted our invitation and for their interesting and inspiring talks.
Traditionally, the main JURIX conference is accompanied by co-located events comprising workshops and tutorials. This year’s edition welcomes five workshops: EXplainable & Responsible AI in Law (XAILA 2020), Artificial Intelligence and Patent Data, Artificial Intelligence in JUrisdictional Logistics (JULIA 2020), the Fourth Workshop on Automated Detection, Extraction and Analysis of Semantic Information in Legal Texts (ASAIL 2020), and the Workshop on Artificial Intelligence and the Complexity of Legal Systems (AICOL 2020). One tutorial, titled “Defeasible Logic for Legal Reasoning”, is also planned in this edition of JURIX. The continuation of well-established events and the organization of entirely new ones provide a great added value to the JURIX conference, enhancing its thematic and methodological diversity and attracting members of the broader community. Since 2013, JURIX has also hosted the Doctoral Consortium, now in its eighth edition. This initiative aims to attract and promote Ph.D. researchers in the area of AI & Law so as to enrich the community with original and fresh contributions.
Organizing this edition of the conference would not have been possible without the support of many people and institutions. Special thanks are due to the local organizing team chaired by Jakub Harašta and Petr Křemen. We would like to thank the workshops’ and tutorials’ organizers for their excellent proposals and for the effort involved in organizing the events. We owe our gratitude to Monica Palmirani, who kindly assumed the function of the Doctoral Consortium Chair.
This year, we are particularly grateful to the 74 members of the Program Committee for their excellent work in the rigorous review process and for their participation in the discussions concerning borderline papers. Their work has been all the more appreciated given the complex situation we are experiencing due to the pandemic. Finally, we would like to thank the former and current JURIX executive committee and steering committee members, not only for their support and advice but also, more generally, for taking care of all the JURIX initiatives.
Last but not least, this year’s conference was supported by AK Janoušek, a law firm based in Prague, Czechia (www.janousekadvokat.cz), and by the Artificial Intelligence Center ARIC, based in Hamburg, Germany (www.aric-hamburg.de).
Serena Villata, JURIX 2020 Program Chair
Jakub Harašta, JURIX 2020 Organization Co-Chair
Petr Křemen, JURIX 2020 Organization Co-Chair
Automatically assessing driving behaviour against traffic rules is a challenging task for improving the safety of Automated Vehicles (AVs). There are no AV-specific traffic rules against which AV behaviour can be assessed. Moreover, current traffic rules can be imprecisely expressed and are sometimes conflicting, making it hard to validate AV driving behaviour. Therefore, in this paper we propose a driving behaviour assessment methodology for AVs based on Defeasible Deontic Logic (DDL). DDL is used to handle rule exceptions effectively and to resolve conflicts between rule norms. A data-driven experiment demonstrates the effectiveness of the proposed methodology.
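To give a flavour of the machinery involved, the following is a minimal, self-contained sketch of defeasible rules with an exception resolved by a superiority relation. The toy traffic norms, rule names, and conflict table are illustrative assumptions, not the formalisation used in the paper.

```python
# Minimal sketch of defeasible rule evaluation with exceptions and a
# superiority relation, in the spirit of Defeasible Deontic Logic.
# The toy traffic norms below are illustrative only.

from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    premises: frozenset   # facts that must hold for the rule to fire
    conclusion: str       # e.g. "O(stop)" = "obliged to stop"

# Toy rule base: stop at a red light, unless waved through by police.
rules = [
    Rule("r1", frozenset({"red_light"}), "O(stop)"),
    Rule("r2", frozenset({"red_light", "police_waves_through"}), "P(proceed)"),
]

# Superiority: the exception r2 defeats the general rule r1.
superior = {("r2", "r1")}

# Pairs of conclusions that clash (obligation to stop vs. permission to proceed).
conflicts = {("O(stop)", "P(proceed)"), ("P(proceed)", "O(stop)")}

def conclusions(facts):
    """Conclusions of applicable rules not defeated by a superior
    applicable rule with a conflicting conclusion."""
    fired = [r for r in rules if r.premises <= facts]
    return [
        r.conclusion
        for r in fired
        if not any((s.name, r.name) in superior and
                   (s.conclusion, r.conclusion) in conflicts
                   for s in fired)
    ]

print(conclusions({"red_light"}))                          # ['O(stop)']
print(conclusions({"red_light", "police_waves_through"}))  # ['P(proceed)']
```

The point is that the exception defeats the general rule only when both are applicable, mirroring how DDL handles rule exceptions without deleting the general norm.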
This work provides a formal model for the burden of persuasion in legal proceedings. The model shows how the allocation of the burden of persuasion may induce a satisfactory outcome in contexts in which the assessment of conflicting arguments would, without such an allocation, remain undecided. The proposed model is based on an argumentation setting in which arguments may be accepted or rejected according to whether the burden of persuasion falls on the conclusions of such arguments or on their complements. Our model merges two ideas that have emerged in the debate on the burden of persuasion: the idea that the allocation of the burden of persuasion makes it possible to resolve conflicts between arguments, and the idea that its satisfaction depends on the dialectical statuses of the arguments involved. Our model also addresses cases in which the burden of persuasion is inverted, and cases in which burdens of persuasion are inferred through arguments.
In interacting with digital apps and services, users create digital identities and generate massive amounts of associated personal data. The relationship between the user and the service provider in such cases is, inter alia, a principal-agent relationship governed by a ‘contract’. This contract is provided mostly in natural language text, however, and remains opaque to users. The need of the hour is multi-faceted documentation represented in machine-readable, natural language and graphical formats, to enable tools such as smart contracts and privacy assistants which could assist users in negotiating and monitoring agreements.
In this paper, we develop a Taxonomy for the Representation of Privacy and Data Control Signals. We focus on ‘signals’ because they play a crucial role in communicating how a service provider distinguishes itself in a market. We follow the methodology for developing taxonomies proposed by Nickerson et al. We start with a grounded analysis of the documentation of four smartphone-based fitness activity trackers, and compare these to insights from the literature. We present the results of the first two iterations of the design cycle. Validation shows that the Taxonomy fully answers 10 of the 14 relevant questions from Perera et al.’s requirements for the knowledge-modelling of privacy policies, partially answers 2, and fails to answer 2. It also covers signals not identified by the checklist. We further validate the Taxonomy by applying it to extracts from documentation, and argue that it shows potential for the annotation and evaluation of privacy policies as well.
The analysis of court decisions and associated events is part of the daily life of many legal practitioners. Unfortunately, since court decision texts can often be long and complex, putting all the events relating to a case in order so as to understand their connections and durations is a time-consuming task. Automated court decision timeline generation could provide a visual overview of what happened throughout a case by representing the main legal events together with relevant temporal information. Tools and technologies to extract events from court decisions, however, are still underdeveloped. To this end, in the current paper we compare the effectiveness of three different extraction mechanisms, namely deep learning, conditional random fields, and a rule-based method, to facilitate the automated extraction of events and their components (i.e., the event type, who was involved, and when it happened). In addition, we provide a corpus of manually annotated decisions of the European Court of Human Rights, which shall serve as a gold standard not only for our own evaluation, but also for the research community for comparison and further experiments.
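As an illustration of the conditional-random-fields variant, the sketch below tags tokens with BIO event labels using the sklearn-crfsuite package; the feature functions, label set, and toy sentence are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of CRF-based event extraction: tokens are tagged with
# BIO labels for event components (type, participant, date).

import sklearn_crfsuite

def token_features(sent, i):
    word = sent[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "prev.lower": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Tiny hand-made training example; real training would use the
# annotated ECtHR decisions described in the paper.
sents = [["The", "applicant", "was", "arrested", "on", "4", "May", "1998", "."]]
labels = [["O", "B-PARTICIPANT", "O", "B-EVENT", "O", "B-DATE", "I-DATE", "I-DATE", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, labels)
print(crf.predict(X)[0])
```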
Witness testimonies are important constituents of a court case description and play a significant role in the final decision. We propose two techniques to identify sentences representing witness testimonies. The first technique employs linguistic rules, whereas the second applies distant supervision, where the training set is constructed automatically using the output of the first technique. We then represent the identified witness testimonies in a more meaningful structure: the event verb (predicate) along with its arguments corresponding to the semantic roles A0 and A1 [1]. We demonstrate the effectiveness of this representation in retrieving semantically similar prior relevant cases. To the best of our knowledge, this is the first paper to apply NLP techniques to extract witness information from court judgements and use it for retrieving prior court cases.
Since legal rules cannot be perfect, we have proposed an approach called Legal Debugging for handling counterintuitive consequences caused by imperfections in the law. Legal Debugging consists of two steps. First, it interacts with a judge, as an oracle giving the intended interpretation of the law, to collaboratively identify a legal rule called a culprit, which is determined to be the root cause of the counterintuitive consequences. Second, it determines possible resolutions for the culprit. The resolution we have proposed uses extra facts, not previously considered in the legal rules, to describe the exceptional situation of the case. Nevertheless, the result of this resolution is usually too specific, and no generalization of the resolution is provided. Therefore, in this paper we introduce a rule generalization step into Legal Debugging. Specifically, we reorganize Legal Debugging into four steps: culprit detection, exception invention, fact-based induction, and rule-based induction. Across these four steps, a newly introduced rule is specific at first and then becomes more generalized. The new step allows a user to use existing legal concepts from the background knowledge to revise and generalize legal rules.
Determining whether a court has applied a bright-line or a totality-of-the-circumstances rule in Fourth Amendment cases poses a difficult problem even for human lawyers and justices, yet determining the type of test that governs an issue is essential to answering a legal question. Modern natural language processing (NLP) tools, such as transformers, can extract relevant features from unlabelled text. This study demonstrates the effectiveness of the BERT, RoBERTa, and ALBERT transformer models in classifying Fourth Amendment cases by bright-line or totality-of-the-circumstances rule. Two approaches are considered, in which models are trained either with positive language extracted by a domain expert or with the full texts of cases. Transformers attain up to 92.31% accuracy on full texts, further demonstrating the capability of NLP techniques on domain-specific tasks even without handcrafted features.
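For readers unfamiliar with the setup, the following sketch shows the bare bones of such a transformer-based case classifier using the Hugging Face Transformers library; the checkpoint name and label assignment are assumptions, and a real run would fine-tune on labelled cases first.

```python
# Minimal sketch of binary classification of case texts with a BERT
# checkpoint. Labels: 0 = bright-line, 1 = totality-of-the-circumstances
# (an assumed encoding). The model here is untrained for the task.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

texts = ["The court applied a clear rule requiring a warrant in all cases."]
batch = tokenizer(texts, truncation=True, max_length=512,
                  padding=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**batch).logits
print(logits.softmax(dim=-1))  # class probabilities (meaningful after fine-tuning)
```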
To date, the effort made by existing vocabularies to provide a shared representation of the data protection domain has not been fully exploited. Different natural language processing (NLP) techniques have been applied to the text of privacy policies without, however, taking advantage of existing vocabularies to provide those documents with a shared semantic superstructure. In this paper we show how a recently released domain-specific vocabulary, the Data Privacy Vocabulary (DPV), can be used to discover, in privacy policies, the information that is relevant with respect to the concepts modelled in the vocabulary itself. We also provide a machine-readable representation of this information to bridge the unstructured text and the formal taxonomy modelled in the vocabulary. This is the first approach to the automatic processing of privacy policies that relies on the DPV, fuelling further investigation into the applicability of existing semantic resources to promote the reuse of information and interoperability between systems in the data protection domain.
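The following sketch illustrates the general idea of surfacing vocabulary concepts in policy text and emitting a machine-readable record. The keyword table is a small, hand-picked illustration loosely styled after DPV term names; it is not the paper's matching method, and real DPV coverage is far richer.

```python
# Minimal sketch: flag policy passages relevant to (assumed) vocabulary
# concepts via keyword matching, then emit a machine-readable record.

import json
import re

# Hypothetical concept-to-keyword table for illustration only.
CONCEPT_KEYWORDS = {
    "dpv:Location": ["location", "gps"],
    "dpv:HealthData": ["health", "heart rate", "fitness"],
    "dpv:ThirdPartyDisclosure": ["share with third parties", "disclose"],
}

def annotate(policy_text):
    found = {}
    for concept, keywords in CONCEPT_KEYWORDS.items():
        hits = [kw for kw in keywords
                if re.search(r"\b" + re.escape(kw) + r"\b", policy_text, re.I)]
        if hits:
            found[concept] = hits
    return found

policy = "We collect GPS location and heart rate data, which we may disclose."
print(json.dumps(annotate(policy), indent=2))
```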
One advantage of using formal deontic logic to represent and reason about normative texts is that one can analyse such texts in a precise and incontrovertible manner. Conflict analysis is one such analysis technique — assessing whether a number of contracts, or more generally normative texts, are internally consistent, in that they may not lead to a situation in which active norms conflict or even contradict each other. In this paper we extend existing techniques from the literature to address conflicts in the context of environmental constraints on actions regulated by the contract, and which the parties involved can carry out. The approach is logic-agnostic and we show how it can be applied to a service provision contract written in 𝒞ℒ.
Free Choice Permission is one of the challenges for the formalisation of norms. In this paper, we follow a novel approach that accepts Free Choice Permission in a restricted form. The intuition behind the guarded form is strongly aligned with the idea of defeasibility. Accordingly, we investigate how to model the guarded form in Defeasible Deontic Logic extended with disjunctive permissions.
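For orientation, the unrestricted principle at stake can be stated as follows; the guard conditions the paper places on this inference are its contribution and are not reproduced here.

```latex
% Unrestricted free choice permission:
% permitting a disjunction permits each disjunct.
P(a \lor b) \rightarrow P(a) \land P(b)
```

Unrestricted, the principle is problematic: from P(a ∨ b) together with a prohibition of b one can derive the conflicting P(b), which is why a guarded, defeasible reading is attractive.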
As autonomous vehicles (AVs) enter shared roads, the challenge of designing and implementing a completely autonomous vehicle remains open. Aside from the technological issues involved in managing the complexity of the environment, AVs raise difficult legal issues and ethical dilemmas, especially in unavoidable accident scenarios. In this context, a substantial body of speculation about moral dilemmas has developed in recent years. A new perspective has been proposed: an “Ethical Knob” (EK) enabling passengers to ethically customise their AVs, namely to choose between different settings corresponding to different moral approaches or principles. In this contribution we explore how an AV can automatically learn to set its “Ethical Knob” so as to achieve a trade-off between the ethical preferences of passengers and social values, learning from experienced collision instances. To this end, we propose a novel approach based on a genetic algorithm that optimizes a population of neural networks. We report a detailed description of simulation experiments as well as possible applications.
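The core learning technique, evolving neural-network weights with a genetic algorithm, can be sketched in a few lines; the network shape, fitness function, and GA parameters below are toy assumptions rather than the paper's experimental setup.

```python
# Minimal neuroevolution sketch: a population of small networks mapping
# scenario features to a knob value in (0, 1), evolved by selection
# and Gaussian mutation. Fitness is a toy stand-in for the trade-off
# between passenger preferences and social values.

import numpy as np

rng = np.random.default_rng(0)
IN, HID, OUT = 4, 8, 1
N_WEIGHTS = IN * HID + HID * OUT

def forward(w, x):
    w1 = w[:IN * HID].reshape(IN, HID)
    w2 = w[IN * HID:].reshape(HID, OUT)
    return 1 / (1 + np.exp(-(np.tanh(x @ w1) @ w2)))  # knob value in (0, 1)

def fitness(w, scenarios, targets):
    preds = np.array([forward(w, x)[0] for x in scenarios])
    return -np.mean((preds - targets) ** 2)  # negative squared error

scenarios = rng.normal(size=(16, IN))   # toy collision scenarios
targets = rng.uniform(size=16)          # toy "desired" knob settings

pop = rng.normal(size=(50, N_WEIGHTS))
for gen in range(100):
    scores = np.array([fitness(w, scenarios, targets) for w in pop])
    elite = pop[np.argsort(scores)[-10:]]           # keep the 10 best
    parents = elite[rng.integers(0, 10, size=50)]   # resample parents
    pop = parents + rng.normal(scale=0.1, size=parents.shape)  # mutate

print("best fitness in last evaluated generation:", scores.max())
```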
The present work proposes the use of Latent Dirichlet Allocation to model Extraordinary Appeals received by Brazil’s Supreme Court. The data consist of a corpus of 45,532 lawsuits manually annotated by the Court’s experts with theme labels, a multi-class and multi-label classification task. We initially train models with 10 and 30 topics and analyze their semantics by examining each topic’s most relevant words and most representative texts, aiming to evaluate model interpretability and quality. We also train models with 30, 100, 300 and 1,000 topics, and quantitatively evaluate their potential by using the topics to generate feature vectors for each appeal. These vectors are then used to train a lawsuit theme classifier. We compare traditional bag-of-words approaches (word counts and tf-idf values) with the topic-based text representation to assess topic relevancy. Our semantic analysis of the topics demonstrates that the models with 10 and 30 topics captured some of the legal matters discussed by the Court. In addition, our experiments show that the model with 300 topics was the best text vectoriser and that the interpretable, low-dimensional representations it generates achieve good classification results.
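The topic-based text representation can be sketched with Gensim as follows; the miniature corpus and the 30-topic setting are illustrative stand-ins for the Court's 45,532 annotated lawsuits.

```python
# Minimal sketch: train an LDA model and use each document's topic
# distribution as a dense feature vector for a downstream classifier.

from gensim import corpora, models

docs = [
    ["tax", "income", "exemption", "revenue"],
    ["criminal", "sentence", "appeal", "prison"],
    ["tax", "revenue", "appeal"],
]
dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]

lda = models.LdaModel(bow, num_topics=30, id2word=dictionary, random_state=0)

def topic_vector(doc_bow, k=30):
    # Dense K-dimensional representation of one appeal.
    vec = [0.0] * k
    for topic_id, prob in lda.get_document_topics(doc_bow, minimum_probability=0.0):
        vec[topic_id] = prob
    return vec

print(topic_vector(bow[0])[:5])
```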
This paper presents a multilingual legal information retrieval system for mapping recitals to articles in European Union (EU) directives and to normative provisions in national legislation. Such a system could be useful for the purposive interpretation of norms. Previous work on mapping recitals and normative provisions was limited to EU legislation in English and to a single lexical text-similarity technique. In this paper, we develop state-of-the-art text similarity models to investigate the interplay between directive recitals, directive (sub-)articles and provisions of national implementing measures (NIMs) on a multilingual corpus (from Ireland, Italy and Luxembourg). Our results indicate that directive recitals do not have a direct influence on NIM provisions, but they sometimes contain additional information that is not present in the transposed directive sub-article, and can therefore facilitate purposive interpretation.
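A minimal sketch of one such text-similarity model, using multilingual sentence embeddings, is shown below; the specific checkpoint is one publicly available option and is an assumption here, as the paper compares several models.

```python
# Minimal sketch: score similarity between an (English) directive
# recital and candidate national provisions in different languages.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

recital = "Consumers should be protected against unfair contract terms."
provisions = [
    "Le clausole vessatorie nei contratti con i consumatori sono nulle.",
    "The register of companies shall be kept by the Minister.",
]

emb_r = model.encode(recital, convert_to_tensor=True)
emb_p = model.encode(provisions, convert_to_tensor=True)
print(util.cos_sim(emb_r, emb_p))  # one similarity score per provision
```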
Predicting the outcome of a legal process has recently gained considerable research attention. Numerous attempts have been made to predict the exact outcome, judgment, charge, and fines of a case given the textual description of its facts and metadata. However, most of the effort has been focused on Chinese and European law, for which there exist annotated datasets. In this paper, we introduce CASELAW4 — a new dataset of 350k common law judicial decisions from the U.S. Caselaw Access Project, of which 250k have been automatically annotated with binary outcome labels of AFFIRM or REVERSE by our hybrid learning system. To our knowledge, it is the first attempt to perform outcome extraction (a) on such a large volume of English-language judicial opinions, (b) on the Caselaw Access Project data, and (c) on US State Courts of Appeal cases, and it paves the way to large-scale outcome prediction and advanced legal analytics using U.S. Case Law. We set up baseline results for the outcome extraction task on the new dataset, achieving an F-measure of 82.32%.
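For context, a minimal outcome-extraction baseline of the kind reported could look as follows; the two toy opinions and the tf-idf/logistic-regression pipeline are illustrative assumptions, not the paper's hybrid system.

```python
# Minimal sketch of a baseline classifier over AFFIRM/REVERSE outcome
# labels, using tf-idf features and logistic regression.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "For the foregoing reasons, the judgment of the trial court is affirmed.",
    "The judgment below is reversed and the cause remanded.",
]
labels = ["AFFIRM", "REVERSE"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["We therefore affirm the decision of the district court."]))
```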
This paper presents Open Knowledge Extraction (OKE) tools combined with natural language analysis of sentences in order to enrich the semantics of the legal knowledge extracted from legal text. The use case concerns international private law, with specific regard to the Rome I Regulation EC 593/2008, the Rome II Regulation EC 864/2007, and the Brussels I bis Regulation EU 1215/2012. A Knowledge Graph (KG) is built using OKE and Natural Language Processing (NLP) methods jointly with the main ontology design patterns defined for the legal domain (e.g., event, time, role, agent, right, obligation, jurisdiction). Using critical questions highlighted by legal experts in the domain, we have built a question-answering tool capable of supporting information retrieval and answering these queries. The system should help legal experts retrieve the relevant legal information connected with topics, concepts, entities, and normative references in order to support their search activities.
Processing case-law content for electronic publishing purposes is a time-consuming activity that encompasses several sub-tasks and usually involves adding annotations to the original text. At the same time, recent trends in Artificial Intelligence and Natural Language Processing enable the automatic and efficient analysis of big textual data. In this paper we present our Machine Learning solutions to three specific business problems regularly met by a real-world Italian publisher in their day-to-day work: recognition of legal references in text spans, ranking of new content by relevance, and text classification according to a given tree of topics. Different approaches based on the BERT language model were experimented with, together with alternatives typically based on bag-of-words. The optimal solution, deployed in a controlled production environment, was in two of the three cases based on fine-tuned BERT (for the extraction of legal references and for text classification), while for relevance ranking a Random Forest model with hand-crafted features was preferred. We conclude by discussing the concrete impact of the developed prototypes as perceived by the publisher.
Human-performed annotation of sentences in legal documents is an important prerequisite to many machine learning based systems supporting legal tasks. Typically, the annotation is done sequentially, sentence by sentence, which is often time consuming and, hence, expensive. In this paper, we introduce a proof-of-concept system for annotating sentences “laterally.” The approach is based on the observation that sentences that are similar in meaning often have the same label in terms of a particular type system. We use this observation in allowing annotators to quickly view and annotate sentences that are semantically similar to a given sentence, across an entire corpus of documents. Here, we present the interface of the system and empirically evaluate the approach. The experiments show that lateral annotation has the potential to make the annotation process quicker and more consistent.
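The retrieval step at the heart of lateral annotation can be sketched with sentence embeddings as follows; the embedding checkpoint and corpus are assumptions, and the paper's actual system and interface may differ.

```python
# Minimal sketch: given one sentence, surface the most semantically
# similar sentences across a corpus so they can be labelled together.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "The contract may be terminated with thirty days notice.",
    "Either party can end the agreement upon 30 days written notice.",
    "The court awarded damages to the plaintiff.",
]
query = "The agreement is terminable on one month's notice."

corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

hits = util.semantic_search(query_emb, corpus_emb, top_k=3)[0]
for h in hits:
    print(round(h["score"], 3), corpus[h["corpus_id"]])
```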
Judgment prediction is the task of predicting various outcomes of legal cases, among which sentencing prediction is one of the most important yet most difficult challenges. We study the applicability of machine learning (ML) techniques to predicting the prison terms of drug trafficking cases. In particular, we study how legal domain knowledge can be integrated with ML models to construct highly accurate predictors. We illustrate how our criminal sentence predictors can be applied to address four important issues in legal knowledge management: (1) discovery of model drifts in legal rules, (2) identification of critical features in legal judgments, (3) fairness in machine predictions, and (4) explainability of machine predictions.
Argument mining, a subfield of natural language processing and text mining, is a process of extracting argumentative text portions and identifying the role the selected texts play. Legal argument mining targets the argumentative parts of a legal text. In order to better understand how to apply legal argument mining as a step toward improving case summarization, we have assembled a sizeable set of cases and human-expert-prepared summaries annotated in terms of legal argument triples that capture the most important skeletal argument structures in a case. We report the results of applying multiple machine learning techniques to demonstrate and analyze the advantages and disadvantages of different methods to identify sentence components of these legal argument triples.
The theory of formal argumentation distinguishes and unifies various notions of attack, support and preference among arguments, and principles are used to classify the semantics of various kinds of argumentation frameworks. In this paper, we consider the case in which we know that an argument is supporting another one, but we do not know yet which kind of support it is. Most common in the literature is to classify support as deductive, necessary, or evidentiary. Alternatively, support is characterized using principles. We discuss the interpretation of support using a legal divorce action. Technical results and proofs can be found in an accompanying technical report.
Legislative drafters use plain language drafting techniques to increase the readability of statutes in several Anglo-American jurisdictions. Existing readability metrics, such as Flesch-Kincaid, however, are a poor proxy for how effectively drafters incorporate these guidelines. This paper proposes a rules-based operationalization of the literature’s readability measures and tests them on legislation that underwent plain language rewriting. The results suggest that our readability metrics provide a more holistic representation of a statute’s readability compared to traditional techniques. Future machine-learning classifications promise to further improve the detection of complex features, such as nominalizations.
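To make the contrast concrete, the sketch below computes the traditional Flesch-Kincaid grade level alongside one rule-based signal from the plain-language drafting literature, a crude nominalization counter; the syllable heuristic and suffix list are rough assumptions, and the paper's operationalization is broader.

```python
# Minimal sketch of two readability signals: the standard
# Flesch-Kincaid grade level and a simple nominalization detector.

import re

def syllables(word):
    # Very rough heuristic: count groups of vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    sents = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syl = sum(syllables(w) for w in words)
    # Standard FK grade formula.
    return 0.39 * len(words) / sents + 11.8 * syl / len(words) - 15.59

def nominalizations(text):
    # Suffixes commonly flagged in plain-language drafting guides.
    return re.findall(r"\b\w+(?:tion|ment|ance|ence|sion)s?\b", text, re.I)

statute = ("The determination of the commencement of the agreement "
           "shall be made by the Minister.")
print(round(flesch_kincaid_grade(statute), 1))
print(nominalizations(statute))
```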
Data privacy and protection have been trending topics in recent years, and the COVID-19 pandemic has brought additional challenges and tensions. For example, sharing health data across several organizations is crucial for significantly controlling and reducing the risks of massive infection and death. This implies the need to broadly collect and use personal and sensitive data, which raises the complexity of data protection and privacy challenges. Permissioned blockchain technology is one way to empower users to control how their data flow through the net, in a transparent and secure way, through an immutable, unified, and distributed database ruled by smart contracts. Against this background, we developed a second-layer data governance model for permissioned blockchains, based on the principles of the Governance Analytical Framework, to be applied in pandemic situations. The model is designed to organize the relationship between data subjects, data controllers, and data processors. Regarding privacy concerns, our proposal complies with the Brazilian General Data Protection Law.