
Ebook: Legal Knowledge and Information Systems

Like every other walk of modern life, the law has embraced digital technology, and is increasingly reliant on information systems for its efficient functioning.
This book presents papers from the 30th International Conference on Legal Knowledge and Information Systems (JURIX 2017), held in Luxembourg City, Luxembourg, in December 2017. In the three decades since they began, the JURIX conferences have been held under the auspices of the Dutch Foundation for Legal Knowledge Based Systems, and have become a fully European conference series which addresses familiar topics and extends known techniques, as well as exploring newer topics such as question answering and the use of data mining and machine learning.
Of the 42 submissions received for this edition, 12 have been selected for publication as full papers and 13 as short papers, with an acceptance rate of around 59%. The papers address a wide range of topics in artificial intelligence and law, such as argumentation, norms, evidence, belief revision, citations, case-based reasoning and ontologies. Diverse techniques such as information retrieval and extraction, machine learning, semantic web, and network analysis were applied, among others, and textual sources include legal cases, bar examinations, and legislative/regulatory documents.
The book will be of interest to all those working in the legal system who wish to keep abreast of the latest developments in information systems.
We are pleased to present to you the proceedings of the 30th International Conference on Legal Knowledge and Information Systems – JURIX 2017. For three decades, the JURIX conferences have been held under the auspices of the Dutch Foundation for Legal Knowledge Based Systems (www.jurix.nl). In that time, it has become a European conference in terms of its diverse venues throughout Europe and the nationalities of its participants. The conference continues to address familiar topics and extend known techniques, as well as reaching out to newer topics such as question answering and the use of data mining and machine learning.
The 2017 edition of JURIX, which runs from 13–15 December, takes place in Luxembourg City, Luxembourg, on the Kirchberg Campus of the University of Luxembourg. We received 42 submissions for this edition, 12 of which were selected for publication as full papers (10 pages in the proceedings) and 13 as short papers (six pages in the proceedings), for an overall acceptance rate of around 59%. All papers were rigorously reviewed. The strongest papers were accepted as full papers, for an acceptance rate of 28.6%, while borderline or weakly acceptable papers were accepted as short papers, for an acceptance rate of around 31%. The papers address a wide range of topics in Artificial Intelligence and Law, such as argumentation, norms, evidence, belief revision, citations, case-based reasoning, and ontologies; diverse techniques were applied, such as information retrieval and extraction, machine learning, semantic web, and network analysis, amongst others; the textual sources included legal cases, bar examinations, and legislative/regulatory documents.
This year, our invited speakers lead AI and Law research and development in industry and academia. One speaker was Tonya Custis, a Research Director at Thomson Reuters, where she leads a team of research scientists performing applied research in artificial intelligence technologies; she is currently leading projects that explore question answering and natural language understanding in the legal domain. Our other speaker was Monica Palmirani, a professor in Computer Science and Law and Legal Informatics at the University of Bologna, School of Law. Amongst other activities, she has led efforts to develop the OASIS standards LegalDocML and LegalRuleML, which aim to make the structure and content of legal documents machine-readable. Our speakers highlight the impact of current work in Artificial Intelligence and Law on real-world practice.
In addition to the main conference program, the workshops added opportunities for work focussed on research beyond the usual JURIX scope. The First Workshop on Technologies for Regulatory Compliance provided a forum for discussion of research on technologies for regulatory compliance which use semantic resources or Artificial Intelligence techniques. The Fourth Workshop on Legal Data Analysis of the Central European Institute of Legal Informatics (LDA: CEILI) focussed on the representation, analysis, and reasoning with legal data in information systems from a lawyer's and citizen's perspective. The Ninth Workshop on Artificial Intelligence and the Complexity of Legal Systems (AICOL) welcomed research in AI, political and legal theory, jurisprudence, philosophy of technology and the law, social intelligence, and normative multi-agent systems to address the ways in which the current information revolution affects the basic pillars of today's legal and political systems. Also, the Doctoral Consortium attracted additional papers and aimed to help young researchers enter the JURIX community.
Finally, we have the honour of thanking the people who contributed to making JURIX 2017 a success: the colleagues who supported local organisation; Tom van Engers and his Doctoral Consortium committee, who worked with doctoral students on their submissions; the reviewers and sub-reviewers, who ensured a strict but fair reviewing process; the authors who submitted papers; the workshop organisers, who added auxiliary meetings beyond the central programme of the main conference; and last but not least, the members of the JURIX Steering Committee as well as the current JURIX board, who guide JURIX over the years.
Adam Wyner – JURIX 2017 Programme Chair
Giovanni Casini – JURIX 2017 Local Organisation
In this paper, we propose a proof of concept for the ontological representation of normative requirements as Linked Data on the Web. Starting from the LegalRuleML ontology, we present an extension of this ontology to model normative requirements and rules. Furthermore, we define an operational formalization of deontic reasoning over these concepts on top of Semantic Web languages.
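As a rough illustration of the Linked Data side of this idea, the sketch below publishes a single obligation as RDF triples with rdflib; the namespace and property names are invented stand-ins, not the actual LegalRuleML ontology IRIs or the authors' extension.

```python
# Toy sketch: a normative requirement as Linked Data with rdflib.
# The namespace and terms below are hypothetical stand-ins, not the
# real LegalRuleML ontology IRIs.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

LRML = Namespace("http://example.org/legalruleml#")  # hypothetical
EX = Namespace("http://example.org/norms#")

g = Graph()
g.bind("lrml", LRML)
g.add((EX.payTaxes, RDF.type, LRML.Obligation))
g.add((EX.payTaxes, LRML.hasBearer, EX.Taxpayer))
g.add((EX.payTaxes, LRML.hasDeadline, Literal("2017-12-31")))

print(g.serialize(format="turtle"))
```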
This paper describes an extended machine learning approach to classifying legal norms in German statutory texts. We implemented an active machine learning (AML) framework based on open-source software. Within the paper, we discuss different query strategies for optimizing the selection of instances during the learning phase, so as to reduce the amount of training data required.
The approach was evaluated within the domain of tenancy law, for which we manually labeled 532 sentences with eight different functional types, achieving an average F1 score of 0.74. Comparing three different classifiers and four query strategies, the classification performance (F1) varies from 0.60 to 0.93. We show that, in norm classification tasks, AML is more efficient than conventional supervised machine learning approaches.
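As a rough sketch of one common query strategy, the snippet below implements pool-based least-confidence sampling with scikit-learn; the toy sentences, labels, and classifier are illustrative assumptions, not the framework or strategies evaluated in the paper.

```python
# Pool-based active learning with least-confidence sampling (a sketch;
# the paper's AML framework and query strategies may differ).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def least_confidence_query(clf, X_pool, k=1):
    """Indices of the k pool instances the model is least confident about."""
    top_prob = clf.predict_proba(X_pool).max(axis=1)
    return np.argsort(top_prob)[:k]

# toy labeled seed set and unlabeled pool (invented examples)
seed_texts = ["The tenant must pay the deposit.", "The landlord may enter."]
seed_labels = ["duty", "permission"]
pool_texts = ["The tenant must not sublet.", "Notice may be given in writing."]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(seed_texts), seed_labels)

# pick the next sentence to hand to the human annotator
print(pool_texts[least_confidence_query(clf, vec.transform(pool_texts))[0]])
```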
In this paper we present the results of a study into whether automatically generated concept maps help users of legal information systems understand the topics of the documents they retrieve. A small formative evaluation with novice users is presented. We did not find a significant difference between the topic-cloud and concept-map approaches in users' ability to connect the correct visualisation to a document. Topic clouds are probably a little easier to understand quickly in a superficial way.
We build on two recent attempts to formalise reasoning with dimensions which effectively map dimensions into factors. These enable propositional reasoning, but sometimes a balance between dimensions needs to be struck, and to permit trade-offs we need to keep the magnitudes and so reason more geometrically. We discuss dimensions and values, arguing that values can play several distinct roles, both explaining preferences between factors and indicating the purposes of the law.
Although many real-life contracts include time constraints, for instance explicitly specifying deadlines by when actions must be performed, or for how long certain behaviour is prohibited, the literature formalising such notions is surprisingly sparse. Furthermore, one of the major challenges is that compliance is typically computed with respect to timed event traces whose event timestamps are assumed to be perfect. In this paper we present an approach for evaluating compliance under the effect of imperfect timing information, giving a semantics with which to analyse the likelihood of contract violation.
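To make the idea concrete, here is a toy calculation of violation likelihood under one simple error model: a timestamp assumed uniformly distributed in an interval around the observed value. This uniform model is our illustrative assumption, not necessarily the semantics defined in the paper.

```python
# Toy violation likelihood under timestamp uncertainty: the true event
# time is assumed uniform in [observed - err, observed + err], so the
# likelihood of missing deadline d is the fraction of that interval
# falling after d. (Illustrative model only.)
def violation_likelihood(observed, err, deadline):
    lo, hi = observed - err, observed + err
    if hi <= deadline:
        return 0.0          # certainly on time
    if lo > deadline:
        return 1.0          # certainly late
    return (hi - deadline) / (hi - lo)

# payment observed at t=10 with +/-3 time units of error, deadline t=11
print(violation_likelihood(10, 3, 11))  # 0.333...
```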
Case law analysis is a significant component of research on almost any legal issue, and understanding which agents are involved and mentioned in a decision is an integral part of that analysis. In this paper we present a first experiment in automatically detecting mentions of different agents in court decisions. We defined a lightweight and easily extensible hierarchy of agents that play important roles in the decisions. We used the types from the hierarchy to annotate a corpus of US court decisions. The resulting data set enabled us to test the hypothesis that mentions of agents in the decisions can be detected automatically. Conditional random fields models trained on the data set were shown to be very promising in this respect. To support research in automatic case-law analysis, we release the agent mentions data set with this paper.
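A minimal sequence-labelling sketch in the same spirit, using the sklearn-crfsuite package; the token features and the BIO agent tags are invented for illustration and are not the paper's annotation scheme.

```python
# CRF token tagging with sklearn-crfsuite (a sketch; features and the
# BIO agent tag set are illustrative, not the paper's scheme).
import sklearn_crfsuite

def features(tokens, i):
    return {
        "lower": tokens[i].lower(),
        "istitle": tokens[i].istitle(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

sent = ["The", "witness", "testified", "before", "Judge", "Smith", "."]
tags = ["O", "B-WITNESS", "O", "O", "B-JUDGE", "I-JUDGE", "O"]

X = [[features(sent, i) for i in range(len(sent))]]
y = [tags]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])
```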
This paper presents a belief revision operator for legal systems that considers time intervals. This model relates techniques from belief revision formalisms and time intervals with temporalised rules for legal systems. Our goal is to formalise a temporalised belief base and the corresponding timed derivation, together with a proper revision operator. This operator may remove rules when needed, or adapt time intervals when contradictory norms are added to the system.
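The interval side of this can be pictured with a toy conflict check and a naive repair that trims the older rule's validity interval; this simplistic strategy is our illustration, not the revision operator defined in the paper.

```python
# Toy temporalised conflict and repair: two rules with opposite
# conclusions clash when their validity intervals overlap; here the
# older rule's interval is trimmed. (A naive stand-in for the paper's
# revision operator.)
def overlaps(a, b):
    """Intervals are (start, end); end=None means open-ended."""
    (a0, a1), (b0, b1) = a, b
    return (a1 is None or a1 >= b0) and (b1 is None or b1 >= a0)

r_old = {"conclusion": "permitted", "interval": (2000, 2010)}
r_new = {"conclusion": "forbidden", "interval": (2008, None)}

if overlaps(r_old["interval"], r_new["interval"]):
    # end the older rule just before the newer one takes effect
    r_old["interval"] = (r_old["interval"][0], r_new["interval"][0] - 1)

print(r_old)  # {'conclusion': 'permitted', 'interval': (2000, 2007)}
```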
In this article we propose a novel methodology which uses text similarity techniques to infer precise citations from the judgments of the Court of Justice of the European Union (CJEU), including their content. We construct a complete network of citations to judgments at the level of individual text units, or paragraphs. In contrast to previous literature, which takes into account only explicit citations of entire judgments, we also infer implicit citations, meaning repetitions of legal arguments stemming from past judgments without explicit reference. On this basis we can differentiate between different categories and modes of citation, which is crucial for assessing the actual legal importance of judgments in the citation network. Our study is an important methodological step towards integrating citation network analysis into legal studies, significantly enhancing our understanding of European Union law and the decision-making of the CJEU.
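A highly simplified sketch of the paragraph-level matching idea: score paragraph pairs by TF-IDF cosine similarity and flag high-scoring pairs as candidate implicit citations. The representation and threshold are our assumptions, not the authors' exact method.

```python
# Candidate implicit citations via TF-IDF cosine similarity between
# paragraphs of two judgments (threshold and texts are illustrative).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

cited = ["National law must be interpreted in the light of the directive."]
citing = ["The national provisions are to be interpreted in the light of "
          "the directive.",
          "The applicant bears the costs of the proceedings."]

vec = TfidfVectorizer().fit(cited + citing)
sims = cosine_similarity(vec.transform(citing), vec.transform(cited))

THRESHOLD = 0.5  # assumed cut-off
for i, row in enumerate(sims):
    for j, s in enumerate(row):
        if s >= THRESHOLD:
            print(f"citing paragraph {i} restates cited paragraph {j} "
                  f"(similarity {s:.2f})")
```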
In this paper, two discussions between experts about Bayesian modellings of complex criminal cases are analysed with respect to their argumentation structure. The usefulness of several recognised argument schemes is confirmed, two new schemes for interpretation arguments and for arguments from statistics are proposed, and an analysis is given of debates about the validity of arguments. From a practical point of view, the case study yields insights into the design of support software for discussions about Bayesian modellings of complex criminal cases.
We describe the use of the ANGELIC methodology, developed to encapsulate knowledge of particular legal domains, to build a full-scale practical application for internal use by a firm of legal practitioners. We describe the application, the sources used, and the stages of development. Some evaluation of the project and its potential for further development is given. The project represents an important step in demonstrating that academic research can prove useful to legal practitioners confronted by real legal tasks.
In Brazil, all legal professionals must demonstrate their knowledge of the law and its application by passing the OAB exams, the national bar exams. This article describes the construction of a new data set and some preliminary experiments on it, treating the problem of finding the justification for the answers to questions. The results provide a baseline performance measure against which to evaluate future improvements. We discuss the reasons for the poor performance and propose next steps.
The availability of large collections of digitized legal texts presents an opportunity for new methodologies in legal scholarship. Analysis of citation networks of case law gives insight into which cases are related and helps determine their relevance. Software tools that provide a graphical interface to case-law networks are required to enable non-technical researchers to use network analysis methodologies. In this study, we present open source software for the analysis and visualization of networks of Dutch case law, aimed at legal scholars. This technology assists in answering legal research questions, including determining relevant precedents, comparing those precedents with the ones identified in the literature, and determining clusters of related cases. The technology was used to analyze a network of cases related to employer liability.
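The network side of such a tool can be sketched with networkx: build a directed citation graph and use in-degree or PageRank as relevance proxies. The ECLI identifiers and edges below are invented for illustration.

```python
# Toy case-law citation network with networkx; in-degree and PageRank
# serve as simple relevance proxies. Identifiers are invented.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("ECLI:NL:HR:2012:0001", "ECLI:NL:HR:2005:0007"),  # newer cites older
    ("ECLI:NL:HR:2014:0003", "ECLI:NL:HR:2005:0007"),
    ("ECLI:NL:HR:2014:0003", "ECLI:NL:HR:2012:0001"),
])

rank = nx.pagerank(G)
for case in G.nodes:
    print(case, "cited by", G.in_degree(case), "pagerank", round(rank[case], 3))
```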
LegalRuleML is a developing standard for representing the fine-grained semantic contents of legal texts. Such a representation would be highly useful for Semantic Web applications, but deriving formal rules from the textual source is problematic; there is currently little in the way of methodology to systematically transform language to LegalRuleML. To address this, we outline the purposes, processes, and outputs of a pilot study on the annotation of the contents of Scottish legal instruments, using key LegalRuleML elements as annotations. The resulting annotated corpus is assessed in terms of how well it answers the users' queries.
The paper presents a general formal framework representing the role of the balancing of values in the interpretation of statutory rules. The model developed here is an extension of the model of teleological interpretation, in which a given interpretive outcome is justified if it satisfies a given goal (or set of goals). Herein, a richer argumentative structure is discussed: an interpretive proposition concerning the interpretation of a statutory condition is justified if it is in accordance with the proper balance of applicable, legally relevant values.
In this paper we present the BO-ECLI Parser, an open framework for the extraction of legal references from case law issued by the judicial authorities of European member states. The problem of automatic extraction of legal links from text is tackled for multiple languages and jurisdictions by providing a common stack that is customizable through pluggable extensions, in order to cover the linguistic diversity and specific peculiarities of national legal citation practices. The aim is to increase the availability in the public domain of machine-readable reference metadata for case law by sharing common services, a guided methodology, and efficient solutions to recurrent problems in legal reference extraction, thereby reducing the effort needed by national data providers to develop their own extraction solutions.
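The pluggable-extension idea can be pictured as a registry of per-jurisdiction recognisers; the ECLI pattern below is simplified, and the plugin mechanism is our sketch rather than the BO-ECLI Parser's actual API.

```python
# Sketch of a pluggable reference extractor: each extension registers a
# recogniser for one citation style. The ECLI regex is simplified.
import re

EXTRACTORS = {}

def extension(name):
    def register(fn):
        EXTRACTORS[name] = fn
        return fn
    return register

@extension("ecli")
def extract_ecli(text):
    return re.findall(r"ECLI:[A-Z]{2}:[A-Z0-9]+:\d{4}:[A-Z0-9]+", text)

text = "See ECLI:NL:HR:2015:1234 and, by analogy, ECLI:EU:C:2013:105."
for name, extract in EXTRACTORS.items():
    print(name, extract(text))
```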
Legal professionals generally need to investigate a large number of items to make their decisions. However, the frameworks they use are often limited to simple full-text search. In this paper, we propose to score the results of such searches, investigating both ontological and non-ontological solutions. We examine their applicability in a real use case dealing with the jurisprudence of regional federal courts in Brazil.
We introduce a semi-supervised approach to developing a semantic search system capable of finding, in a large legal corpus, legal cases whose fact-asserting sentences are similar to a given query. First, an unsupervised word embedding model learns the meaning of legal words from a large immigration-law corpus. This knowledge is then used to initiate the training of a fact-detecting classifier with a small set of annotated legal cases. We achieved 90% accuracy in detecting fact sentences with only 150 annotated documents available. The hidden layer of the trained classifier is used to vectorize sentences and calculate the cosine similarity between fact-asserting sentences and the given queries. We reached a 78% mean average precision score in searching for semantically similar sentences.
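The retrieval step reduces to vector comparison; in the toy sketch below, averages of made-up word vectors stand in for the classifier's hidden-layer sentence encoding, and fact sentences are ranked by cosine similarity to the query. Vectors and sentences are invented.

```python
# Toy semantic search: sentence vectors (averages of invented word
# vectors, standing in for the classifier's hidden layer) ranked by
# cosine similarity to a query.
import numpy as np

EMB = {  # invented 3-d word vectors
    "applicant": np.array([0.9, 0.1, 0.0]),
    "entered":   np.array([0.1, 0.9, 0.2]),
    "canada":    np.array([0.0, 0.2, 0.9]),
    "visa":      np.array([0.2, 0.7, 0.8]),
    "refused":   np.array([0.8, 0.3, 0.1]),
}

def sent_vec(sentence):
    vecs = [EMB[w] for w in sentence.lower().split() if w in EMB]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

facts = ["The applicant entered Canada", "The visa was refused"]
query = "applicant entered the country"
ranked = sorted(facts, key=lambda s: cosine(sent_vec(s), sent_vec(query)),
                reverse=True)
print(ranked)
```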
We present an approach for constructing a legal knowledge-base that is sufficiently scalable to allow for large-scale corpus-level analyses. We do this by creating a polymorphic knowledge representation that includes hybrid ontologies, semistructured representations of sentences, and unsupervised statistical extraction of topics. We apply our approach to over one million judicial decision documents from Henan, China. Our knowledge-base allows us to make corpus-level queries that enable discovery, retrieval, and legal pattern analysis that shed new light on everyday law in China.
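The unsupervised topic layer of such a representation can be sketched with scikit-learn's LatentDirichletAllocation; the miniature corpus and parameters are illustrative only.

```python
# Unsupervised topic extraction over a toy set of case summaries using
# scikit-learn's LDA (corpus and settings are illustrative).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "loan repayment interest default judgment",
    "divorce custody child support maintenance",
    "loan interest guarantee default",
    "custody visitation child welfare",
]
vec = CountVectorizer()
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    print("topic", k, [terms[i] for i in weights.argsort()[-3:][::-1]])
```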
Consumer contracts too often present clauses that are potentially unfair to the subscriber. We present an experimental study where machine learning is employed to automatically detect such potentially unfair clauses in online contracts. Results show that the proposed system could provide a valuable tool for lawyers and consumers alike.
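A minimal version of such a detector is clause-level binary text classification; the sketch below uses a TF-IDF plus linear SVM pipeline with cross-validation. The clauses, labels, and model choice are invented and may differ from the system in the paper.

```python
# Clause-level unfair-term detection as binary text classification
# (toy clauses and labels; the paper's features and models may differ).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

clauses = [
    "the provider may terminate the service at any time without notice",
    "users may cancel their subscription at any time",
    "any dispute shall be settled by an arbitrator chosen by the provider",
    "this agreement is governed by the law of the user's residence",
] * 3  # repeat the toy data so 3-fold cross-validation is possible
labels = [1, 0, 1, 0] * 3  # 1 = potentially unfair (invented labels)

pipeline = make_pipeline(TfidfVectorizer(), LinearSVC())
print(cross_val_score(pipeline, clauses, labels, cv=3))
```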
We explore how deep learning methods can be used for contract element extraction. We show that a BILSTM operating on word, POS tag, and token-shape embeddings outperforms the linear sliding-window classifiers of our previous work, without any manually written rules. Further improvements are observed by stacking an additional LSTM on top of the BILSTM, or by adding a CRF layer on top of the BILSTM. The stacked BILSTM-LSTM misclassifies fewer tokens, but the BILSTM-CRF combination performs better when methods are evaluated for their ability to extract entire, possibly multi-token contract elements.
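A skeletal PyTorch version of the base architecture is below: a bidirectional LSTM over token embeddings with a per-token output layer. The stacked LSTM and CRF variants the abstract mentions are omitted, and all sizes are arbitrary.

```python
# Skeletal BiLSTM token tagger in PyTorch (the paper's stronger models
# stack an extra LSTM or a CRF layer on top; sizes here are arbitrary).
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, n_tags):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, n_tags)  # fwd + bwd states

    def forward(self, token_ids):
        states, _ = self.lstm(self.emb(token_ids))
        return self.out(states)  # per-token scores over the tag set

model = BiLSTMTagger(vocab_size=1000, emb_dim=50, hidden_dim=64, n_tags=5)
scores = model(torch.randint(0, 1000, (1, 12)))  # one 12-token span
print(scores.shape)  # torch.Size([1, 12, 5])
```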
Regulations and legislation are updated regularly, burdening lawyers and compliance officers with a firehose of changes. However, not all changes are significant, and only a fraction of them are of legal importance; this fraction can vary across different types of regulations. This paper focuses on the automatic detection and ranking of meaningful legal changes, and presents a preliminary machine learning approach in the domain of regulatory documents related to the Internal Revenue Code (IRC). Such a system would provide users with a means to quickly identify significant legal changes.
Responsibility, as referred to in everyday life, as explored in moral philosophy and debated in jurisprudence, is a multiform, ill-defined but inescapable notion for reasoning about actions. Its presence in all social constructs suggests the existence of an underlying cognitive base. Following this hypothesis, and building upon simplicity theory, the paper proposes a novel computational approach.
In this paper we present initial results from our effort to automatically detect references in decisions of courts in the Czech Republic and link these references to their content. We focus on references to case law and legal literature. To deal with the wide variety in how references are expressed, we use a novel distributed approach to reference recognition: instead of attempting to recognize references as a whole, we focus on their lower-level constituents. We assembled a corpus of 350 decisions and annotated it with more than 50,000 annotations corresponding to different reference constituents. Here we present our first attempt to detect these constituents automatically.
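The constituent-level idea can be illustrated with a few toy patterns that tag parts of a reference separately rather than matching whole citations; the patterns below are drastic simplifications of real Czech citation practice, written for illustration only.

```python
# Toy constituent-level tagging: recognise parts of a reference (court,
# docket number, collection citation) separately rather than as a whole.
import re

CONSTITUENTS = {
    "court": r"Ústavního soudu|Nejvyššího soudu",
    "docket": r"sp\. zn\. [IVX]+\. ÚS \d+/\d{2}",
    "collection": r"č\. \d+/\d{4} Sb\.",
}

text = "Srov. nález Ústavního soudu sp. zn. II. ÚS 2121/14 a č. 89/2012 Sb."
for name, pattern in CONSTITUENTS.items():
    for match in re.finditer(pattern, text):
        print(name, ":", match.group())
```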
Vector Space Models (VSMs) represent documents as points in a vector space derived from term frequencies in the corpus. This level of abstraction provides a flexible way to represent complex semantic concepts through vectors, matrices, and higher-order tensors. In this paper we utilize a number of VSMs on a corpus of judicial decisions in order to classify cases in terms of legal factors, stereotypical fact patterns that tend to strengthen or weaken a side's argument in a legal claim. We apply different VSMs to a corpus of trade secret misappropriation cases and compare their classification results. The experiment shows that simple binary VSMs work better than previously reported techniques, but that more complex VSMs, including dimensionality reduction techniques, do not improve performance.
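A baseline instance of the simplest such model is a binary bag-of-words classifier per factor; in the sketch below, TF-IDF vectors feed a linear SVM for one hypothetical trade secret factor. The case snippets, labels, and factor are invented.

```python
# One-factor classification over a TF-IDF vector space model; the case
# snippets, labels, and factor are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

cases = [
    "plaintiff disclosed the formula during public negotiations",
    "defendant signed a nondisclosure agreement before the talks",
    "the process was demonstrated openly at a trade show",
    "employees were bound by strict confidentiality restrictions",
]
labels = [1, 0, 1, 0]  # 1 = factor applies, e.g. disclosure in negotiations

X = TfidfVectorizer().fit_transform(cases)
clf = LinearSVC().fit(X, labels)
print(clf.predict(X))
```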