Ebook: Legal Knowledge and Information Systems
As with almost every other part of our daily lives, information technology is now indispensable in the legal sphere. The variety of applications has grown, keeping pace with developments in the wider field of artificial intelligence: logic and argument have been joined by statistical methods and data, and knowledge engineering has been enriched by machine learning.
This book presents the papers delivered at the 29th International Conference on Legal Knowledge and Information Systems – JURIX 2016, held in Nice, France, in December 2016. Of the 56 submissions received for the conference, 11 were selected for publication as full papers, 10 as short papers, and 10 as posters, which are included in the proceedings for the first time. The papers address a wide range of topics at the interface of Artificial Intelligence (AI) and Law, such as argumentation, norms and evidence, network science, information retrieval, and natural language processing. Many of the theories and technologies explored in the papers are applied to real-life materials, including cases brought before the European Court of Human Rights, Dutch and Greek legal texts, and international investment agreements and contracts.
Reflecting the many facets and the interdisciplinary character of AI and Law, the book will be of interest to all those whose work involves them in these fields.
I am delighted to present to you the proceedings of the 29th International Conference on Legal Knowledge and Information Systems – JURIX 2016. For nearly three decades the JURIX conferences have been held under the auspices of the Dutch Foundation for Legal Knowledge Based Systems (www.jurix.nl). JURIX has far outgrown its humble beginnings as a local, Dutch conference, with editions in every corner of Europe, from west (Belgium, France, the Netherlands and the UK) to east (Austria, Germany, Poland) and south (Italy, Portugal). The number of topics has also grown, keeping pace with developments in the wider field of artificial intelligence: logic and argument have been joined by statistical methods and data, and knowledge engineering has been enriched with machine learning.
The 2016 edition of JURIX, which runs from 14 to 16 December, takes place on the beautiful French Riviera, at the University of Nice Sophia Antipolis. We received 56 submissions for this edition, 11 of which were selected for publication as full papers (10 pages in the proceedings), 10 as short papers (six pages in the proceedings), and 10 as poster papers, each allotted four pages in the proceedings for the first time. As always, the JURIX conference aims to be inclusive rather than exclusive, with a total acceptance rate of 54% and again a Doctoral Consortium aimed at helping young researchers enter the JURIX community. In addition to being an open conference, JURIX also promotes research of the highest quality: the full-paper acceptance rate is only 23%, and all papers have undergone a rigorous reviewing process in which borderline or weakly acceptable papers were accepted as short papers only. The papers address a wide range of topics in AI & Law, such as argumentation, norms and evidence (theory), and network science, information retrieval and natural language processing (technologies). Many of these theories and technologies have been applied to real legal materials, such as cases brought before the European Court of Human Rights, Dutch and Greek legal texts, and international investment agreements and contracts.
This year, we have the honour of welcoming two ERC grant recipients as invited speakers. Jan Broersen of Utrecht University has received an ERC Consolidator grant for his project on Responsible Intelligent Systems, in which he is investigating how to automate responsibility, liability, and risk checking for autonomous systems using logical specifications and related model checkers. Norman Fenton of Queen Mary University of London has received an ERC Advanced grant for his project Bayes Knowledge, which aims to use Bayesian network techniques to improve evidence-based decision-making in areas where there is little or no statistical data, such as complex legal cases. These high-profile projects demonstrate that the interdisciplinary combination of Artificial Intelligence and Law is a fruitful one, with exciting possibilities for the future.
The interdisciplinary character of AI & Law is also evident in the various workshops at the conference. The first ever MIREL workshop aims to bridge the gap between researchers working on legal ontologies and NLP parsers on the one hand, and researchers working on reasoning methods and formal logic on the other. The seventh edition of the AICOL workshop welcomes research in AI, political and legal theory, jurisprudence, philosophy and the social sciences to address the ways in which the current information revolution affects the basic pillars of today's legal and political systems. The fourth NaiL workshop aims to bring together researchers from computational social science, computational legal theory, network science, data science and related disciplines to discuss the use and usefulness of network analysis and data mining in the legal domain. Finally, the third CEILI LDA workshop will focus on representation, analysis and reasoning with legal data in information systems from the lawyer's and citizen's perspectives.
It only remains for me to thank the various people who have helped to make JURIX 2016 a success: Serena Villata, who with her team of local organisers has made this year's edition possible; Monica Palmirani, who together with her committee has tirelessly assisted the students who submitted to the Doctoral Consortium; the 49 reviewers and sub-reviewers who conducted thorough reviews and took part in the lively discussions that ensured a strict but fair reviewing process; the 117 authors who submitted papers, demos and posters; the workshop organisers who have expanded the JURIX conference beyond the boundaries of the central programme; and finally, the members of the JURIX Steering Committee and of the former and current JURIX board for taking care of all things JURIX all year round.
Floris Bex
JURIX 2016 Programme Chair
In this paper we present an overview of the process of argumentation with legal cases, from evidence to verdict. We identify the different types of statement involved at the various stages and describe how these types relate to one another. In particular, we show how the legally accepted facts, which form the basis for consideration of the law governing a case, can be obtained from facts about the world. We also explain how we can determine which particular facts are relevant. In so doing we bring together several important pieces of AI and Law research and clarify their relationships.
This paper contributes to the formal research on legal interpretation by presenting a structure of normative agents. Each normative agent consists of a knowledge base, a set of preferences, and certain procedures related to the interpretation conducted by this agent. A partial typology of normative agents is presented. The investigations are illustrated with a model of a real-life example.
Natural language techniques have been employed in attempts to automatically translate legal texts, and specifically contracts, into formal models that allow automatic reasoning. However, such techniques suffer from incomplete coverage, typically leaving parts of the text uninterpreted, which may in turn cause the formal models to miss potential problems arising from these unknown parts. In this paper we present a formal approach to deal with partiality, by syntactically and semantically permitting unknown subcontracts in an action-based deontic logic, with accompanying formal analysis techniques to enable reasoning under incomplete knowledge.
We report on prototype experiments expanding on prior work [2] in retrieving and ranking vaccine injury decisions using semantic information and classifying sentences as legal rules or findings about vaccine-injury causation. Our positive results include that query element coverage features and aggregate citation information using a BM25-like score can improve ranking results, and that larger amounts of annotated sentence data improve classification performance. Negative observations include that LUIMA-specific sentence features do not impact sentence classification, and that synthetic oversampling improves classification only for the sparser of the two predicted sentence types.
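To make the "BM25-like" score concrete, the sketch below implements the standard Okapi BM25 weighting in Python. It is a minimal illustration only: the function and parameter names are ours, and the paper's actual ranking signals (query element coverage features, aggregated citation information, LUIMA sentence types) are not reproduced here.

```python
import math

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len,
               k1=1.5, b=0.75):
    """Score one document against a query with the classic BM25 formula.

    query_terms: list of query tokens
    doc_terms:   list of document tokens
    doc_freqs:   dict mapping term -> number of documents containing it
    """
    doc_len = len(doc_terms)
    tf = {t: doc_terms.count(t) for t in set(query_terms)}
    score = 0.0
    for term in query_terms:
        df = doc_freqs.get(term, 0)
        if df == 0:
            continue
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        freq = tf.get(term, 0)
        denom = freq + k1 * (1 - b + b * doc_len / avg_doc_len)
        score += idf * freq * (k1 + 1) / denom
    return score
```

In a retrieval pipeline of the kind described, a score like this would be combined with the other features as one signal among several when ranking candidate decisions.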
There is an increasing need for norms to be embedded in technology as the widespread deployment of applications such as autonomous driving and warfare becomes ever closer. Current approaches to norms in multi-agent systems tend either to simply make prohibited actions unavailable, or to provide a set of rules (principles) which the agent is obliged to follow. We argue that both these approaches are inadequate: in order to meet unexpected situations agents must be capable of violating norms, when it is appropriate to do so. This in turn requires that agents be able to reason about what they should do without reference to the norms. One way to achieve this is to conduct value based reasoning using an argumentation scheme designed for practical reasoning. Such reasoning requires that agents have an acceptable set of values and an acceptable ordering on them. We discuss what might count as an acceptable ordering on values, and how such an ordering might be determined. Law breaking is illustrated through a simple road traffic example.
What is a case decided by the European Court of Human Rights about? The Court's own case database, HUDOC, lists in its metadata all the articles mentioned in a specific case. It also supplies a number of keywords, but these keywords for the most part merely repeat phrases from the relevant articles. In order to enhance information retrieval about case content, without relying on manual labor and subjective judgment, we propose in this paper a quantitative method that gives a better indication of case content in terms of which articles a given case is most closely associated with. To do so, we rely on the network structure induced by existing case-to-case and case-to-article citations and propose two computational approaches (referred to as MAININ and MAINOUT) which assign one representative article to each case. We validate the approach by selecting a sample of important cases and comparing a manual investigation of the real content of those cases with the MAININ and MAINOUT articles. Results show that MAININ in particular is able to infer the real content correctly in most of the cases.
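The abstract does not spell out how MAININ and MAINOUT are computed, so the sketch below is only an illustrative guess at the general idea: aggregate article citations over a case's incoming or outgoing citation neighbourhood and pick the most frequent article as the representative one. All names and the aggregation rule are hypothetical placeholders.

```python
from collections import Counter

def representative_article(case, case_to_articles, case_citations, direction="in"):
    """Assign one representative Convention article to a case.

    case_to_articles: dict case_id -> list of article ids cited by that case
    case_citations:   dict case_id -> list of case ids it cites (outgoing edges)
    direction:        "in"  -> aggregate over cases citing `case` (MAININ-like)
                      "out" -> aggregate over cases cited by `case` (MAINOUT-like)
    """
    counts = Counter(case_to_articles.get(case, []))
    if direction == "out":
        neighbours = case_citations.get(case, [])
    else:
        neighbours = [c for c, cited in case_citations.items() if case in cited]
    for n in neighbours:
        counts.update(case_to_articles.get(n, []))
    return counts.most_common(1)[0][0] if counts else None
```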
Bayesian models are a useful tool to propagate the rational implications of human beliefs expressed as probabilities. They can yield surprising, counterintuitive and, if based on valid models, useful results. However, human users can be reluctant to accept these results if they cannot find explanations giving clear reasons for how and why the results were arrived at, something existing explanation methods struggle to provide. This is particularly important in the legal domain, where explanatory justifications are as important as the result and where the use of Bayesian models is controversial. This paper presents a novel approach to explaining how the outcome of a query of a Bayesian network was arrived at. In the process, it augments the recently developed support graph methodology and shows how support graphs can be integrated with qualitative probabilistic reasoning approaches. The usefulness of the approach is illustrated by means of a small case study, demonstrating how a seemingly counterintuitive Bayesian query result can be explained with qualitative arguments.
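As a point of reference for the kind of Bayesian network query whose outcome needs explaining, here is a minimal sketch using the pgmpy library (assuming a recent version). The toy network, variable names and probabilities are invented; the paper's support-graph explanation method itself is not reproduced.

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# A toy two-node network: Guilt -> Evidence.
model = BayesianNetwork([("Guilt", "Evidence")])
model.add_cpds(
    TabularCPD("Guilt", 2, [[0.99], [0.01]]),          # prior on guilt
    TabularCPD("Evidence", 2,
               values=[[0.95, 0.20],                   # P(Evidence=absent  | Guilt)
                       [0.05, 0.80]],                  # P(Evidence=present | Guilt)
               evidence=["Guilt"], evidence_card=[2]),
)

# Query: updated belief in Guilt after observing the evidence.
posterior = VariableElimination(model).query(["Guilt"], evidence={"Evidence": 1})
print(posterior)
```

The explanatory challenge the paper addresses is precisely to give qualitative reasons for why such a posterior comes out the way it does, rather than just printing the numbers.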
Traditional full text search allows fast search for exact matches. However, full text search is not optimal for dealing with synonyms or semantically related terms and phrases. In this paper we explore a novel method that provides the ability to find not only exact matches, but also semantically similar parts for search queries of arbitrary length. We achieve this without the application of ontologies, but base our approach on Word Embeddings. Recently, Word Embeddings have been applied successfully to many natural language processing tasks. We argue that our method is well suited for legal document collections and examine its applicability in two different use cases. First, we conduct a case study on a stand-alone law, in particular the EU Data Protection Directive 95/46/EC (EU-DPD), in order to extract obligations. Secondly, from a collection of publicly available templates for German rental contracts we retrieve similar provisions.
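A minimal sketch of the underlying idea, under our own simplifying assumptions: represent the query and each candidate passage by the average of their word embeddings and rank passages by cosine similarity. The paper's actual handling of arbitrary-length queries may differ; `vectors` stands for any pre-trained embedding lookup (for example word2vec vectors keyed by token).

```python
import numpy as np

def embed(text, vectors, dim=300):
    """Average the vectors of all in-vocabulary tokens (zero vector if none)."""
    toks = [t for t in text.lower().split() if t in vectors]
    return np.mean([vectors[t] for t in toks], axis=0) if toks else np.zeros(dim)

def rank_by_similarity(query, passages, vectors):
    """Rank candidate passages by cosine similarity of averaged embeddings."""
    q = embed(query, vectors)

    def cos(a, b):
        n = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / n) if n else 0.0

    return sorted(passages, key=lambda p: cos(q, embed(p, vectors)), reverse=True)
```

Because matching happens in the embedding space rather than on surface strings, provisions phrased with synonyms or related terms can still surface near the top of the ranking.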
In this paper we extend a formal framework presented in [6] to model reasoning across legal systems. In particular, we propose a logical system that encompasses the various interpretative interactions occurring between legal systems in the context of private international law. This is done by introducing meta-rules to reason with interpretive canons.
We present a novel approach to detecting syntactic structures that are inadequate for their domain context. We define writing style in terms of the choices between alternatives and conduct an experiment in the legislative domain on the syntactic choice of nominalization in German, i.e. complex noun phrase vs. relative clause. In order to infer the stylistic choices that are conventional in the domain, we capture the contexts that affect the syntactic choice. Our results show that a data-driven binary classifier can be a viable method for modelling syntactic choices in a style-checking tool.
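As an illustration of what such a data-driven binary classifier could look like, the sketch below trains a simple bag-of-words model that predicts the conventional syntactic choice from the surrounding context. The feature representation and classifier are our own placeholders, assuming scikit-learn, not the paper's actual setup.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_choice_classifier(contexts, choices):
    """contexts: textual context around each choice site;
    choices: the observed alternative, e.g. 'noun_phrase' or 'relative_clause'."""
    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),   # unigram/bigram context features
        LogisticRegression(max_iter=1000),
    )
    clf.fit(contexts, choices)
    return clf

# A style checker could then flag sites where the drafter's choice disagrees
# with the classifier's prediction for that context.
```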
Today's AI applications are so successful that they inspire renewed concerns about AI systems becoming ever more powerful. Addressing these concerns requires AI systems that are designed as ethical systems, in the sense that their choices are context-dependent, value-guided and rule-following. It is shown how techniques connecting qualitative and quantitative primitives recently developed for evidential argumentation in the law can be used for the design of such ethical systems. In this way, AI and Law techniques are extended to the theoretical understanding of intelligent systems guided by embedded values.
The ANGELIC (ADF for kNowledGe Encapsulation of Legal Information from Cases) project provided a methodology for implementing a system to predict the outcome of legal cases based on a theory of the relevant domain constructed from precedent cases and other sources. The method has been evaluated in several domains, including US Trade Secrets Law. Previous systems in this domain were based on factors, which are either present or absent in a case, and favour one of the parties with the same force for every factor. Evaluations have, however, suggested that the ability to represent different degrees of presence and absence, and different strengths, could improve performance. Here we extend the methodology to allow for different degrees of presence and support, by using dimensions as a bridge between facts and factors. This new program is evaluated using a standard set of test cases.
Negotiating international investment agreements is costly, complex, and prone to power asymmetries. Would it then not make sense to let computers do part of the work? In this contribution, we train a character-level recurrent neural network (RNN) to write international investment agreements. Benefitting from the formulaic nature of treaty language, the RNN generates texts of lawyer-like quality at the article level, but fails to compose treaties in a legally sensible manner. By embedding RNNs in a user-controlled pipeline we overcome this problem. First, users can specify the treaty content categories ex ante on which the RNN is trained. Second, the pipeline allows a filtering of output ex post by identifying output that corresponds most closely to a user-selected treaty design benchmark. The result is an improved system that produces meaningful texts with legally sensible composition. We test the pipeline by comparing predicted treaties to actually concluded ones and by verifying that our filter captures latent policy preferences by predicting the outcome of current investment treaty negotiations between China and the United States.
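For readers unfamiliar with character-level RNNs, the sketch below shows the core of such a model in PyTorch: an embedding, an LSTM, and a sampling loop that generates text one character at a time. It is a generic illustration under our own assumptions; the training loop, the ex ante conditioning on treaty content categories, and the ex post filtering against a treaty design benchmark are omitted.

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    """Minimal character-level language model: embed -> LSTM -> next-char logits."""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.out(h), state

def sample(model, stoi, itos, seed="Article 1", length=500, temperature=0.8):
    """Generate treaty-like text one character at a time from a seed string."""
    model.eval()
    idx = torch.tensor([[stoi[c] for c in seed]])
    state, text = None, seed
    with torch.no_grad():
        logits, state = model(idx, state)
        for _ in range(length):
            probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
            nxt = torch.multinomial(probs, 1).item()
            text += itos[nxt]
            logits, state = model(torch.tensor([[nxt]]), state)
    return text
```

Because the model only learns local character statistics, it can produce fluent article-level text while lacking any notion of a treaty's overall composition, which is exactly the gap the user-controlled pipeline is designed to close.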
This paper presents a theoretical account of legal validity. We begin with a very simple criterial account of validity and discuss the possibility of eliminating this concept by means of the procedure described in Ross's paper Tû-Tû. We then discuss more ambitious theoretical proposals concerning validity, advocated by Grabowski and Sartor, and make an attempt to reconcile and further generalize these accounts. Finally, we focus on the broadest view, encompassing the role of institutions with regard to validity. The notion of intermediate anchoring institutions is key in the new social scenarios created through linked-data systems. Some examples are provided.
Since the OpenLaws portal is envisioned as an open environment for collaboration between legal professionals, recommendation will eventually become a collaborative filtering problem. This paper addresses the cold start problem for such a portal, where initial recommendations have to be given while collaborative filtering data is still too sparse to produce recommendations. We implemented and critically evaluated a hybrid recommendation approach, starting with a latent Dirichlet allocation topic model and progressing to collaborative filtering. Our tentative conclusion is that giving recommendations, even bad ones, will influence user selections.
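A minimal sketch of the content-based, cold-start side of such a hybrid recommender, assuming scikit-learn: fit an LDA topic model over the document collection and recommend documents whose topic mixture is closest to the documents a user has already viewed. The profile construction and similarity measure here are our own illustrative choices, not necessarily those used for the OpenLaws portal.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def topic_based_recommendations(corpus, viewed_idx, n_topics=50, top_k=5):
    """Recommend documents whose LDA topic mix is closest to the user's viewed docs."""
    vec = CountVectorizer(max_df=0.8, min_df=2, stop_words="english")
    X = vec.fit_transform(corpus)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    theta = lda.fit_transform(X)                  # document-topic matrix
    profile = theta[viewed_idx].mean(axis=0)      # user profile = mean of viewed docs
    sims = theta @ profile / (np.linalg.norm(theta, axis=1)
                              * np.linalg.norm(profile) + 1e-12)
    ranked = [i for i in np.argsort(-sims) if i not in set(viewed_idx)]
    return ranked[:top_k]
```

Once enough user-item interactions accumulate, such content-based scores can be blended with, or replaced by, collaborative filtering signals.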
Section 16(b) of the Securities Exchange Act of 1934 allows for the recovery of profits realized by certain insiders from trading in a corporation's stock within a period of less than six months. For more than seventy years, U.S. courts and corporate attorneys have calculated this liability following the greedy algorithm described in Smolowe v. Delendo Corp. (2nd Cir. 1943), which the Securities and Exchange Commission proposed as a method of maximizing the recovery in that case. Even though Dantzig's simplex algorithm (1947) subsequently provided a more accurate method for calculating the maximum recovery as the solution to a linear programming problem, the legal community to date has resisted its adoption. This paper provides (1) a brief introduction to Section 16(b) and the Smolowe algorithm; (2) a review of the caselaw that has enshrined the Smolowe algorithm in legal precedent; (3) a proof that the Smolowe algorithm's worst-case error is 50%; (4) a description of a new Web-based liability calculator for the legal community's use; and (5) a historically important case where the new calculator yields a larger recovery than the amount actually sought and obtained by the plaintiffs.
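The linear-programming alternative to the Smolowe greedy matching can be sketched as a small transportation problem: the decision variables are the numbers of shares matched between each admissible purchase-sale pair, constrained by the quantities actually traded, with total profit maximised. The sketch below, using scipy, is an illustrative simplification (the statutory six-month window and other matching rules are only approximated) and is not the paper's Web-based calculator.

```python
import numpy as np
from scipy.optimize import linprog

def max_recovery(purchases, sales, window_days=182):
    """Maximum 16(b) recovery as a transportation-style LP (simplified sketch).

    purchases, sales: lists of (day, price_per_share, shares).
    """
    pairs, profit = [], []
    for i, (p_day, p_price, _) in enumerate(purchases):
        for j, (s_day, s_price, _) in enumerate(sales):
            if abs(s_day - p_day) < window_days and s_price > p_price:
                pairs.append((i, j))
                profit.append(s_price - p_price)
    if not pairs:
        return 0.0
    # Variables: shares matched for each admissible (purchase, sale) pair.
    A_ub, b_ub = [], []
    for i, (_, _, qty) in enumerate(purchases):   # cannot match more than bought
        A_ub.append([1.0 if p == i else 0.0 for p, _ in pairs]); b_ub.append(qty)
    for j, (_, _, qty) in enumerate(sales):       # cannot match more than sold
        A_ub.append([1.0 if s == j else 0.0 for _, s in pairs]); b_ub.append(qty)
    res = linprog(c=[-pr for pr in profit], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * len(pairs))
    return -res.fun
```

The greedy Smolowe matching commits to pairings in a fixed order and can therefore miss the globally optimal assignment that the LP finds, which is the source of the worst-case error analysed in the paper.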
This paper investigates the application of text similarity techniques to automatically detect the transposition of European Union (EU) directives into national law. Currently, the European Commission (EC) resorts to time-consuming and expensive manual methods like conformity checking studies and legal analysis for identifying national transposition measures. We utilize both lexical and semantic similarity techniques and supplement them with knowledge from EuroVoc to identify transpositions. We then evaluate our approach by comparing the results with the correlation tables (gold standard). Our results indicate that both similarity techniques proved to be effective in detecting transpositions. Such systems could be used by both the EC and legal professionals to identify transposed provisions.
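A lexical-similarity baseline for transposition detection can be sketched as follows, assuming scikit-learn: score each directive article against every national provision with TF-IDF cosine similarity and keep pairs above a threshold. The semantic similarity techniques and the EuroVoc knowledge the paper adds on top are not included, and the threshold is an arbitrary placeholder.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def candidate_transpositions(directive_articles, national_provisions, threshold=0.35):
    """Return, per directive article, national provisions above a TF-IDF cosine threshold."""
    vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    X = vec.fit_transform(directive_articles + national_provisions)
    d = X[:len(directive_articles)]
    n = X[len(directive_articles):]
    sims = cosine_similarity(d, n)
    return {i: [j for j in sims[i].argsort()[::-1] if sims[i, j] >= threshold]
            for i in range(len(directive_articles))}
```

Candidate pairs produced this way could then be checked against correlation tables or reviewed manually, which is how the paper evaluates its results.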
It is well recognised that it is difficult to make the semantic content of legal texts machine readable. We propose a systematic methodology to begin to render a sample legal text into LegalRuleML, which is a proposed markup for legal rules. We propose three levels – coarse, medium, and fine-grained analyses – each of which is compatible with LegalRuleML and facilitates development from text to formal LegalRuleML. This paper provides guidelines for a coarse-grained analysis, highlighting some of the challenges to address even at this level.
Tagging court decisions as to their importance is indispensable for the accessibility of voluminous case law repositories, but such an attribute has so far been implemented in only a few databases. In this paper some of these are briefly discussed and a lowest common denominator is proposed. This could be especially useful in federated search solutions like the ECLI Search Engine.
Legal scholars study international courts by analyzing only a fraction of available material, which leaves doubts as to whether their accounts correctly capture the dynamics of international law. In this paper we use dynamic topic modeling, a family of unsupervised machine learning techniques, to gauge the shifts in the content of the case-law of international courts over longer time spans. Our results indicate that dynamic topic modeling is a powerful and reliable tool to systematically and accurately track legal change over time and enhance our understanding of courts and their influence on the law.
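One common implementation of dynamic topic modelling is the Blei-Lafferty model as provided by gensim's LdaSeqModel; the sketch below shows how judgments grouped into yearly time slices could be fed to it. This is a generic illustration under our own assumptions, not necessarily the exact variant or toolkit used in the paper.

```python
from gensim.corpora import Dictionary
from gensim.models.ldaseqmodel import LdaSeqModel

def fit_dynamic_topics(docs_by_year, num_topics=15):
    """docs_by_year: list of lists of tokenised judgments, one inner list per year."""
    docs = [doc for year in docs_by_year for doc in year]
    time_slice = [len(year) for year in docs_by_year]   # documents per time slice
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]
    return LdaSeqModel(corpus=corpus, id2word=dictionary,
                       time_slice=time_slice, num_topics=num_topics)

# model.print_topics(time=0) shows topic-word distributions in the first slice;
# comparing slices reveals how a court's themes shift over time.
```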
A growing number of Dutch court judgments are openly distributed on Rechtspraak.nl. Currently, many documents are not marked up, or are marked up only very sparsely, hampering our ability to process them automatically. In this paper, we explore the problem of automatically assigning a section structure to these texts. We experiment with linear-chain Conditional Random Fields to label text elements with their roles in the document (text, title or numbering). In this subtask, we report F1 scores of around 0.91 for tagging section titles, and around 1.0 for the other types. Given a list of labels, we experiment with Probabilistic Context-Free Grammars to generate a parse tree which represents the section hierarchy of a document. In this task, we report an F1 score of 0.92.
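A minimal sketch of the element-labelling subtask with a linear-chain CRF, assuming the sklearn-crfsuite package; the features shown are illustrative placeholders rather than the paper's actual feature set, and the subsequent PCFG step that builds the section hierarchy is not shown.

```python
import sklearn_crfsuite

def element_features(elements, i):
    """Simple surface features for one text element within its document."""
    el = elements[i]
    feats = {
        "is_short": len(el.split()) < 8,
        "starts_with_digit": el[:1].isdigit(),
        "is_upper": el.isupper(),
        "ends_with_period": el.endswith("."),
    }
    if i > 0:
        feats["prev_starts_with_digit"] = elements[i - 1][:1].isdigit()
    return feats

def train_crf(documents, labels):
    """documents: list of documents, each a list of text elements (strings);
    labels: the corresponding per-element tags, e.g. 'title', 'nr', 'text'."""
    X = [[element_features(doc, i) for i in range(len(doc))] for doc in documents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
    crf.fit(X, labels)
    return crf
```

The predicted label sequence for a document would then serve as the terminal symbols for the grammar that reconstructs the nested section structure.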