Ebook: Legal Knowledge and Information Systems
Traditionally concerned with computational models of legal reasoning and the analysis of legal data, the field of legal knowledge and information systems has seen increasing interest in the application of data analytics and machine learning tools to legal tasks in recent years.
This book presents the proceedings of the 34th annual JURIX conference which, due to pandemic restrictions, was hosted online in a virtual format on 8–10 December 2021 from Vilnius, Lithuania. Since its inception as a mainly Dutch event, the JURIX conference has become truly international and now, as a platform for the exchange of knowledge between theoretical research and applications, attracts academics, legal practitioners, software companies, governmental agencies and the judiciary from around the world. A total of 65 submissions were received for this edition, and after rigorous review, 30 of these were selected for publication as long or short papers, an overall acceptance rate of 46%. The papers are divided into six sections: Visualization and Legal Informatics; Knowledge Representation and Data Analytics; Logical and Conceptual Representations; Predictive Models; Explainable Artificial Intelligence; and Legal Ethics. They cover a wide range of topics, from computational models of legal argumentation, case-based reasoning, legal ontologies, smart contracts, privacy management and evidential reasoning, through information extraction from different types of text in legal documents, to ethical dilemmas.
Providing an overview of recent advances and the cross-fertilization between law and computing technologies, this book will be of interest to all those working at the interface between technology and law.
For more than 30 years, the Dutch Foundation for Knowledge Based Systems JURIX (https://jurix.nl) has organised annual conferences on artificial intelligence & law. Starting as a mostly Dutch event, it has spread across Europe, taking place in many countries (inter alia Malta, Austria, Belgium, France, Poland, and the Czech Republic). This year, the 34th International Conference on Legal Knowledge and Information Systems (JURIX 2021) takes place in Vilnius, Lithuania. Geographically, Lithuania is the heart of Europe; it is not yet, but may yet become, so in people's minds as well, reminding us of the rich diversity of the "old continent". Judging by its participants and speakers, JURIX 2021 is now a truly European conference on artificial intelligence & law, with strong outreach to the Americas and Australasia.
This annual international conference has always been open to all, in particular academics, legal practitioners, software companies, administrations, parliaments and the judiciary. It is now a forum for a fruitful exchange of knowledge between theoretical research and applications in artificial intelligence & law. Traditionally, this field has been concerned with legal knowledge representation and engineering, computational models of legal reasoning, and analyses of legal data. Recent years, however, have witnessed an increasing interest in the application of data analytics and machine learning tools to relevant legal tasks.
The 2021 edition of JURIX, which runs from December 8 to 10, is hosted by Mykolas Romeris University in Vilnius. Due to the Covid-19 health crisis, the conference is organised in a virtual format. For this edition, we received 65 submissions, of which 13 were selected for publication as long papers (10 pages each) and 17 as short papers (6–8 pages each), for a total of 30 presentations. We aimed to be inclusive in our selection, but the competition was stiff and the submissions went through a rigorous review process, resulting in a total acceptance rate (long and short papers) of 46% and a competitive 20% acceptance rate for long papers.
The accepted papers cover a broad array of topics, from computational models of legal argumentation, case-based reasoning, legal ontologies, smart contracts, privacy management and evidential reasoning, through information extraction from different types of text in legal documents, to ethical dilemmas.
Two invited speakers, Friedrich Lachmayer and Vytautas Čyras, honored JURIX 2021 by kindly agreeing to deliver a keynote lecture. Friedrich Lachmayer is a retired senior lawyer of the Austrian administration (the legal service of the Federal Chancellery), an honorary professor at the University of Innsbruck, and a well-known expert on legal theory and legal visualization. Vytautas Čyras is a professor at Vilnius University and has worked on these topics for more than 15 years.
We are very grateful to them for having accepted our invitation and for their interesting and inspiring talks.
Traditionally, the main JURIX conference is accompanied by co-located events comprising workshops and tutorials. This year’s edition welcomes six workshops and one tutorial:
- 1st Workshop in Agent-based Modeling & Policy-Making (AMPM 2021)
- AI Approaches to the Complexity of Legal Systems (AICOL 2021)
- CEILI Workshop on Legal Data Analysis (LDA21)
- EXplainable & Responsible AI in Law (XAILA 2021)
- The First International Workshop on Intelligent Regulatory Systems (IRS 2021)
- Use of Information Technology in Judicial Processes (MRU 2021)
- Tutorial on Legal Informatics Topics: Legal Tech & Privacy Impact Assessment (TLIT2021)
We would like to thank the workshop and tutorial organizers for their excellent proposals and for the effort involved in organizing these events.
The continuation of well-established events and the organization of entirely new ones provide a great added value to the JURIX conference, enhancing its thematic and methodological diversity and attracting members of the broader community.
Since 2013, JURIX has also hosted the Doctoral Consortium, now in its ninth edition. This initiative aims to attract and promote Ph.D. researchers in the area of AI & Law so as to enrich the community with original and fresh contributions. We owe our gratitude to Monica Palmirani, who started the Doctoral Consortium.
Organizing this conference would not have been possible without the support of many people and institutions. Special thanks are due to the local organizing team chaired by Lyra Jakulevičienė and Paulius Pakutinskas of the Legal Tech Centre and Law School, Mykolas Romeris University (Lithuania).
Thanks are also due to the University of Vienna, Arbeitsgruppe Rechtsinformatik, Juridicum and its related organisations, in particular the Wiener Zentrum für Rechtsinformatik (WZRI) and IRI§-Conferences. These efforts were also sponsored by Cybly, Wien/Salzburg and Weblaw, Bern.
This year, we are particularly grateful to the members of the Program Committee for their excellent work in the rigorous review process and for their participation in the discussions concerning borderline papers. Senior members provided additional support, and sub-reviewers rigorously checked a number of papers. Their work has been all the more appreciated given the complex situation we are experiencing due to the pandemic.
Last but not least, this year’s conference was organized in partnership with GO Vilnius, Lithuanian Bar Association and Amberlo.
Finally, we would like to thank the former and current JURIX executive committee and steering committee members.
Erich Schweighofer
JURIX 2021 Programme Chair
This paper explores the subject matter of legal informatics. The life-long work of the first author on the visualization and coding of statutes is generalized. Besides positive law and customary law, the emergence of machine law is a current focus in the literature. In machine law, legal acts are posited by machines, not by humans (primarily in a situational context). The transformation of a legal act into a legal document can happen in two ways: first, as a transformation of the legal act into an explicit written form, for example for promulgation in the case of laws or for written execution in the case of judgments, and second, as a trend towards electronic documents. Legal theory forms a meta-level to the law, and similarly legal informatics forms a meta-level to legal information. Legal informatics in Austria is based on the work of Ota Weinberger, Ilmar Tammelo and Leo Reisinger, and has been developed by Erich Schweighofer in the framework of the IRIS conferences. Legal informatics is distinguished from legal information, whereas legal logic and meta-theories sit on top of legal informatics. In terms of syntax, machine culture is characterized by formal notations. Notations of legal logic are just the beginning; the target is a technical notation as a basis for programming. Visualizations sit in the middle: on the one hand, they serve human understanding by breaking away from the textual; on the other hand, by emphasizing the formal, they form a bridge to machines. Legal text can be translated directly into formal languages, but visualizations can facilitate this task as an intermediate methodological step. Hans-Georg Fill's metamodeling can be seen as a meta-meta-level.
In this paper, we attempt to identify eviction judgements within all case law published by Dutch courts, in order to automate data collection that was previously conducted manually. To do so, we performed two experiments: the first focused on identifying judgements related to eviction, the second on identifying the outcome of the cases (eviction vs. dismissal of the landlord's claim). In the process of conducting these experiments, we created a manually annotated dataset of eviction-related judgements and their outcomes.
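The abstract does not name the models used, so the following is only a minimal sketch of how the first experiment could be framed as binary text classification; the pipeline (TF-IDF plus logistic regression) and the toy judgement snippets are illustrative assumptions, not the authors' actual setup.

```python
# Hedged sketch: flagging eviction-related judgements as a binary
# classification task. Model choice and data are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "the landlord claims eviction due to persistent rent arrears",
    "the tenant is ordered to vacate the property within one month",
    "dispute over the boundary between two adjacent parcels",
    "the employment contract was terminated without notice",
]
labels = [1, 1, 0, 0]  # 1 = eviction-related, 0 = other case law

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["the court grants the claim to vacate the dwelling"]))
```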
Tools must be developed to help draft, consult, and explore textual legal sources. Between statistical information retrieval and the formalization of textual rules for automated legal reasoning, we defend a more pragmatic third way that enriches legal texts with a coarse-grained, interpretation-neutral, semantic annotation layer. The aim is for legal texts to be enriched on a large scale at a reasonable cost, paving the way for new search capabilities that will facilitate the mining of legal sources. This approach is illustrated by a proof-of-concept experiment that consisted of semantically annotating a significant part of the French version of the GDPR. The paper presents the design methodology of the annotation language, a first version of a Core Legal Annotation Language (CLAL) together with its formalization in XML, the gold standard resulting from the annotation of the GDPR, and examples of user questions that can be better answered by semantic than by plain-text search. This experiment demonstrates the potential of the proposed approach and provides a basis for further development. All resources developed for the GDPR experiment are language independent and publicly available.
In this paper, we treat sentence annotation as a classification task. We employ sequence-to-sequence models to take sentence position information into account in identifying case law sentences as issues, conclusions, or reasons. We also compare the legal domain specific sentence embedding with other general purpose sentence embeddings to gauge the effect of legal domain knowledge, captured during pre-training, on text classification. We deployed the models on both summaries and full-text decisions. We found that the sentence position information is especially useful for full-text sentence classification. We also verified that legal domain specific sentence embeddings perform better, and that meta-sentence embedding can further enhance performance when sentence position information is included.
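The paper's models are sequence-to-sequence; as a rough illustration of the position-information idea alone, the sketch below appends each sentence's relative position in the decision to its embedding before feeding a flat classifier. The random vectors stand in for legal-domain sentence encoders, and the label scheme follows the paper's issue/conclusion/reason roles.

```python
# Hedged sketch: position-augmented sentence classification. Random
# placeholder embeddings stand in for a legal-domain sentence encoder.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_sentences, dim = 200, 768
embeddings = rng.normal(size=(n_sentences, dim))       # placeholder encoder output
positions = (np.arange(n_sentences) / n_sentences)[:, None]  # relative position
labels = rng.choice(["issue", "reason", "conclusion"], size=n_sentences)

X = np.hstack([embeddings, positions])  # embedding + position feature
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```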
Various online databases exist to make judgments accessible in the digital age. Before a legal practitioner can use state-of-the-art information retrieval features to retrieve relevant court rulings, the textual document must be processed. More importantly, many verdicts lack crucial semantic information that could be utilized within the search process. One piece of information that is frequently missing, as the judge does not add it during the publication process within the court, is the so-called norm chain: the list of the most relevant norms for the underlying decision.
This paper therefore investigates the feasibility of automatically extracting the most relevant norms of a court ruling. A dataset comprising over 42k labeled court rulings was used to train different classifiers. Our models achieve F1 scores of up to 0.77 and can thus be used within the editorial publication process to provide helpful suggestions.
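The classifiers compared in the paper are not specified in the abstract; one natural framing of norm-chain extraction is multi-label classification over a fixed norm inventory, sketched below with an illustrative TF-IDF one-vs-rest setup and hypothetical German norm labels.

```python
# Hedged sketch: norm-chain prediction as multi-label text classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

rulings = [
    "the dismissal violated the statutory notice requirements",
    "damages are awarded for the defective purchased goods",
]
norm_chains = [["§ 622 BGB"], ["§ 433 BGB", "§ 437 BGB"]]  # hypothetical labels

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(norm_chains)
X = TfidfVectorizer().fit_transform(rulings)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
```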
Machine learning research typically starts with a fixed data set created early in the process. The focus of the experiments is finding a model and training procedure that result in the best possible performance in terms of some selected evaluation metric. This paper explores how changes in a data set influence the measured performance of a model. Using three publicly available data sets from the legal domain, we investigate how changes to their size, the train/test splits, and the human labelling accuracy impact the performance of a trained deep learning classifier. Our experiments suggest that analyzing how data set properties affect performance can be an important step in improving the results of trained classifiers, and leads to better understanding of the obtained results.
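A minimal sketch of this style of study, under the assumption of a synthetic stand-in for the legal data sets: retrain the same classifier while varying the training-set size and injecting label noise, then observe how the test metric moves.

```python
# Hedged sketch: measuring how data-set size and labelling noise shift
# a classifier's F1 score. Data and model are synthetic placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=100, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
for frac in (0.1, 0.5, 1.0):          # training-set size
    for noise in (0.0, 0.1):          # fraction of flipped labels
        n = int(frac * len(y_tr))
        yn = y_tr[:n].copy()
        flip = rng.random(n) < noise
        yn[flip] = 1 - yn[flip]       # simulate human labelling errors
        clf = LogisticRegression(max_iter=1000).fit(X_tr[:n], yn)
        print(f"size={frac:.0%} noise={noise:.0%} "
              f"F1={f1_score(y_te, clf.predict(X_te)):.3f}")
```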
To contribute to the ongoing debate around advances in legal Natural Language Processing, we highlight an interesting trend. Recently, the focus for most legal text classification tasks has shifted towards large pre-trained deep learning models such as BERT. In this paper, we show that a more traditional approach based on Support Vector Machine classifiers reaches performance competitive with deep learning models. We also highlight that the error reduction obtained by using specialised BERT-based models over baselines is noticeably smaller in the legal domain than in general language tasks. We discuss some hypotheses for these results to support future discussions.
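To make the error-reduction argument concrete, here is a worked example with invented accuracy figures, showing why the same baseline can yield very different relative error reductions.

```python
# Worked example: relative error reduction of a model over a baseline.
# The accuracy figures are invented for illustration.
def relative_error_reduction(acc_baseline: float, acc_model: float) -> float:
    """Fraction of the baseline's errors that the model eliminates."""
    err_base, err_model = 1 - acc_baseline, 1 - acc_model
    return (err_base - err_model) / err_base

print(relative_error_reduction(0.90, 0.95))  # 0.5  -> half the errors gone
print(relative_error_reduction(0.90, 0.91))  # ~0.1 -> only a tenth gone
```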
We present a study testing the CLAUDETTE system's ability to generalise the concept of unfairness in consumer contracts across diverse market sectors. The data set includes 142 terms of service grouped into five subsets: travel and accommodation, games and entertainment, finance and payments, health and well-being, and the more general others. Preliminary results show that the classifier performs satisfactorily across all sectors.
This paper presents an AI use-case developed in the project "Study on legislation in the era of artificial intelligence and digitization" promoted by the EU Commission Directorate-General for Informatics. We propose a hybrid technical framework in which AI techniques, data analytics, Semantic Web approaches and LegalXML modelling benefit legal drafting activity. The paper aims to classify the corrigenda of EU legislation with the goal of detecting criteria that could prevent errors during the drafting or publication process. We use a pipeline of different techniques combining AI, NLP, data analytics, semantic annotation and LegalXML instruments to enrich non-symbolic AI tools with legal knowledge interpretation and offer them to legal experts.
Legal case summarization is an important problem, and several domain-specific summarization algorithms have been applied for this task. These algorithms generally use domain-specific legal dictionaries to estimate the importance of sentences. However, none of the popular summarization algorithms use document-specific catchphrases, which provide a unique amalgamation of domain-specific and document-specific information. In this work, we assess the performance of two legal document summarization algorithms, when two different types of catchphrases are incorporated in the summarization process. Our experiments confirm that both the summarization algorithms show improvement across all performance metrics, with the incorporation of document-specific catchphrases.
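The abstract does not give the scoring details; the sketch below conveys the general idea in a simple form: score sentences by term frequency, then boost any sentence containing a document-specific catchphrase. The scoring scheme and boost weight are assumptions for illustration.

```python
# Hedged sketch: catchphrase-boosted extractive summarization.
from collections import Counter

def summarize(sentences, catchphrases, k=3, boost=2.0):
    tf = Counter(w for s in sentences for w in s.lower().split())

    def score(sentence):
        words = sentence.lower().split()
        base = sum(tf[w] for w in words) / max(len(words), 1)  # avg term freq
        has_catchphrase = any(c.lower() in sentence.lower() for c in catchphrases)
        return base * (boost if has_catchphrase else 1.0)

    return sorted(sentences, key=score, reverse=True)[:k]
```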
The Indian court system regularly generates huge amounts of data relating to administration, pleadings, litigant behaviour, and court decisions, but the existing judiciary is incapable of managing these vast troves of data efficiently, causing delays and a large backlog of pending cases in the courts. Among the time-consuming tasks involved are case briefing and examining the legal issues, facts, legal principles, observations, and other significant aspects submitted by the contending parties. In other words, computational methods that capture the underlying structure of a case document would directly aid lawyers in performing these tasks efficiently and improve the overall efficiency of the justice delivery system. Computational techniques (such as Natural Language Processing) can help to gather and sift through these vast troves of information, identify patterns, extract document structure, draft documents and make the information available online.
Traditionally, lawyers are trained to examine cases using the case law analysis approach to case briefing. In this article, the authors aim to establish the importance and relevance of the automated case analysis problem in the legal domain. They introduce a novel case analysis structure for Supreme Court judgment documents and define twelve case law labels that legal professionals use to identify that structure. Finally, the authors propose a method for automated case analysis which will directly aid lawyers in preparing speedy and efficient case briefs and drastically reduce the time they spend in litigation.
Automatic summarization of legal case documents is an important and challenging problem, where algorithms attempt to generate summaries that match well with expert-generated summaries. This work takes the first step in analyzing expert-generated summaries and algorithmic summaries of legal case documents. We try to uncover how law experts write summaries for a legal document, how various generic as well as domain-specific extractive algorithms generate summaries, and how the expert summaries vary from the algorithmic summaries. We also analyze which important sentences of a legal case document are missed by most algorithms while generating summaries, in terms of the rhetorical roles of the sentences and the positions of the sentences in the legal document.
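One standard way to quantify how far an algorithmic summary sits from an expert one is ROUGE overlap; the snippet below uses Google's rouge-score package with placeholder summaries, purely to illustrate the comparison step.

```python
# Hedged sketch: ROUGE comparison of an expert vs. an algorithmic summary.
from rouge_score import rouge_scorer  # pip install rouge-score

expert = "the appeal is dismissed because the notice period was not observed"
algorithmic = "the court dismisses the appeal citing the notice period"

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
for name, score in scorer.score(expert, algorithmic).items():
    print(name, round(score.fmeasure, 3))
```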
Online legal document libraries, such as WorldLII, are indispensable tools for legal professionals to conduct legal research. We study how topic modeling techniques can be applied to such platforms to facilitate searching of court judgments. Specifically, we improve search effectiveness by matching judgments to queries at the semantics level rather than the keyword level. Also, we design a system that summarizes a retrieved judgment by highlighting a small number of paragraphs that are semantically most relevant to the user query. This summary serves two purposes: (1) It explains to the user why the machine finds the retrieved judgment relevant to the user's query, and (2) it helps the user quickly grasp the most salient points of the judgment, which significantly reduces the amount of time needed by the user to go through the returned search results. We further enhance our system by integrating domain knowledge provided by legal experts. The knowledge includes the features and aspects that are most important for a given category of judgments. Users can then view a judgment's summary focusing on particular aspects only. We illustrate the effectiveness of our techniques with a user evaluation experiment on the HKLII platform. The results show that our methods are highly effective.
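As an illustration of semantics-level matching (not the authors' actual system), the sketch below embeds a query and judgment paragraphs in a small LDA topic space with gensim and ranks paragraphs by topic-distribution similarity; the corpus and topic count are toy values.

```python
# Hedged sketch: ranking judgment paragraphs by topic-space similarity.
from gensim.corpora import Dictionary
from gensim.matutils import cossim
from gensim.models import LdaModel

paragraphs = [
    "the tenant failed to pay rent for six months".split(),
    "custody of the child is awarded to the mother".split(),
    "the landlord may terminate the tenancy agreement".split(),
]
dictionary = Dictionary(paragraphs)
corpus = [dictionary.doc2bow(p) for p in paragraphs]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, random_state=0)

def topics(bow):
    return lda.get_document_topics(bow, minimum_probability=0.0)

query = topics(dictionary.doc2bow("termination of tenancy for unpaid rent".split()))
ranked = sorted(range(len(paragraphs)),
                key=lambda i: cossim(query, topics(corpus[i])), reverse=True)
print(ranked)  # most query-relevant paragraph indices first
```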
Legal definitions are an integral part of legal drafting practice, making legal documents easier to understand and preventing ambiguity. This research describes how legal definitions are used across regulations in the domain of the Indonesian Treasury and Budget. Simple text mining techniques are used to extract and analyse them. We extracted definitions from more than 1,362 related regulations enacted over the period 2003–2020, and found that legal definitions were used in many variations, which may lead to inconsistencies.
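Indonesian regulations conventionally define terms in their general-provisions article with the formula "<Term> adalah <definition>" ("<Term> is <definition>"); the regex sketch below, including the two sample sentences, is a simplified illustration of that drafting style rather than the authors' actual extraction code.

```python
# Hedged sketch: extracting "<Term> adalah <definition>" patterns.
import re

DEF_PATTERN = re.compile(
    r"^(?P<term>[A-Z][^.]{0,80}?)\s+adalah\s+(?P<definition>.+?)\.$",
    re.MULTILINE,
)

text = """Anggaran adalah rencana keuangan tahunan pemerintah.
Bendahara adalah pejabat yang ditunjuk untuk menerima dan membayarkan uang."""

for m in DEF_PATTERN.finditer(text):
    print(m.group("term"), "->", m.group("definition"))
```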
In this paper, we introduce BART2S, a novel framework based on BART pre-trained models to generate high-quality terms of service. The framework contains two parts: a generator fine-tuned on multiple tasks and a discriminator fine-tuned to distinguish fair from unfair terms. Beyond the novelty of its design and the implementation contributions, the proposed framework can support the drafting of terms of service, a growing need in the digital age. Our approach allows the system to strike a balance between automation and the expression of the service provider's will. Through experiments, we demonstrate the effectiveness of the method and discuss potential future directions.
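A rough sketch of the two-part architecture with Hugging Face BART: one model drafts a clause and a second scores it as fair or unfair. Both checkpoints below are the generic facebook/bart-base weights and the fair/unfair label convention is assumed; the paper's actual fine-tuning on terms-of-service data is omitted.

```python
# Hedged sketch: generator + discriminator over BART, without fine-tuning.
from transformers import (BartForConditionalGeneration,
                          BartForSequenceClassification, BartTokenizer)

tok = BartTokenizer.from_pretrained("facebook/bart-base")
generator = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
discriminator = BartForSequenceClassification.from_pretrained(
    "facebook/bart-base", num_labels=2)  # assumed: 0 = fair, 1 = unfair

prompt = "The provider may terminate the account"
ids = tok(prompt, return_tensors="pt").input_ids
draft = generator.generate(ids, max_length=40, num_beams=4)
clause = tok.decode(draft[0], skip_special_tokens=True)

logits = discriminator(**tok(clause, return_tensors="pt")).logits
print(clause, logits.softmax(-1))
```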
This work investigates information retrieval methods to address existing difficulties in the Preliminary Search, part of the law-making process of the Brazilian Chamber of Deputies. To this end, different preprocessing approaches, stemmers, language models, and BM25 variants were compared. Two legislative corpora from the Chamber were used to build and validate the pipeline. All texts were converted to lowercase and had stopwords, accentuation, and punctuation removed. Words were represented by their stems, combined with unigram and bigram language models. When retrieving the bill that originated from a specific job request, BM25L with the Savoy stemmer reached an R@20 of 0.7356. After removing queries with inconsistencies or which referred exclusively to attachments, to other job requests, or to bills, the R@20 increased to 0.94.
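The sketch below mirrors the described pipeline with off-the-shelf parts: fold case and accents, drop stopwords and punctuation, stem, add bigrams, and rank with BM25L from the rank-bm25 package. NLTK's Portuguese Snowball stemmer stands in for the Savoy stemmer used in the paper, and the two bill snippets are invented.

```python
# Hedged sketch: preprocessing + BM25L ranking for Portuguese legislative text.
import re
import unicodedata

from nltk.corpus import stopwords            # requires nltk.download("stopwords")
from nltk.stem.snowball import SnowballStemmer
from rank_bm25 import BM25L                  # pip install rank-bm25

def fold(text):                              # lowercase + strip accents
    text = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in text if not unicodedata.combining(c))

STOP = {fold(w) for w in stopwords.words("portuguese")}
stem = SnowballStemmer("portuguese").stem

def tokenize(text):
    stems = [stem(w) for w in re.findall(r"\w+", fold(text)) if w not in STOP]
    return stems + [f"{a}_{b}" for a, b in zip(stems, stems[1:])]  # uni+bigrams

bills = ["Dispõe sobre a proteção de dados pessoais.",
         "Altera o código tributário nacional."]
bm25 = BM25L([tokenize(b) for b in bills])
print(bm25.get_scores(tokenize("proteção de dados")))
```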
NLP-based techniques can help improve the understanding of legal text documents. In this work we present a semi-automatic framework to extract signal phrases from legislative texts in an arbitrary European language. Through a case study using Dutch legislation, we demonstrate that it is feasible to extract these phrases reliably with a small number of supporting domain experts. Finally, we argue how, in future work, our framework could be combined with existing methods and applied to different languages.
In this article, I present the results of a human evaluation experiment covering three methods commonly used in legal information retrieval and a new "multilayered" approach. I use the doc2vec model, citation network analysis and two topic modelling algorithms to retrieve Czech Supreme Court decisions and evaluate their performance. To improve the accuracy of the results, I combine the methods in a "multilayered" way and perform a subsequent evaluation. Both evaluation experiments are conducted with a group of legal experts to assess the applicability and usability of the methods for legal information retrieval. The combination of doc2vec and citations is found to be sufficiently accurate for practical use in retrieving Czech court decisions.
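One way to picture the "multilayered" combination (the exact weighting is not given in the abstract, so the blend below is an assumption): mix doc2vec similarity with a citation-overlap score between decisions.

```python
# Hedged sketch: blending a doc2vec layer with a citation layer.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = {"D1": "the court weighs the proportionality of the sanction".split(),
        "D2": "proportionality review of administrative sanctions".split(),
        "D3": "custody and maintenance of minor children".split()}
cites = {"D1": {"N1", "N2"}, "D2": {"N2", "N3"}, "D3": {"N4"}}  # cited norms

model = Doc2Vec([TaggedDocument(words, [tag]) for tag, words in docs.items()],
                vector_size=50, min_count=1, epochs=40)

def combined(a, b, alpha=0.7):               # alpha is an assumed weight
    semantic = model.dv.similarity(a, b)                       # doc2vec layer
    overlap = len(cites[a] & cites[b]) / max(len(cites[a] | cites[b]), 1)
    return alpha * semantic + (1 - alpha) * overlap            # citation layer

print(sorted(((combined("D1", d), d) for d in ("D2", "D3")), reverse=True))
```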
Inspired by Kelsen’s view that norms establish causal-like connections between facts and sanctions, we develop a deontic logic in which a proposition is obligatory iff its complement causes a violation. We provide a logic for normative causality, define non-contextual and contextual notions of illicit and duty, and show that the logic of such duties is well-behaved and solves the main deontic paradoxes.
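One possible rendering of the paper's core definition, in an Anderson/Kanger-style reduction with a causal conditional (written \leadsto) and a violation constant V; the symbols are our paraphrase, not necessarily the authors' notation:

```latex
% Obligation as the complement causing a violation; prohibition analogously.
\[
  O\varphi \;:\equiv\; (\neg\varphi \leadsto V)
  \qquad
  F\varphi \;:\equiv\; (\varphi \leadsto V)
\]
```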
This paper presents a Semantic Web-based model for detecting contradictions in regulations. We introduce a conceptual model of contradictions and, on its basis, a knowledge-representation model able to capture the semantics of provision types and their related properties. The usefulness of the model is shown through an example.
This paper describes a tool that uses an extended Data Privacy Vocabulary (DPV) to audit and monitor GDPR compliance of international transfers of personal data. New terms were identified and have been proposed as extensions to the DPV W3C Working Group. A prototype software tool was built based on the model plus a set of validation rules, and synthetic use-cases were created to test the capabilities of the model and tool (together, a compliance framework). This framework was created because the rules around international-transfer compliance are complex and changing, there is no common approach to ensuring compliance, few tools exist to assist, and those that do lack interoperability. Evaluation results demonstrate that the proposed model improves compliance identification and standardisation. The tool received positive feedback from the data protection practitioners who participated in the evaluation, and an initial version of it is now in use in one financial services organisation. While the tool currently addresses only international transfers, the framework can in principle be extended to the broader area of compliance with other aspects of the GDPR.
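As a flavour of what such rule-based checking can look like (the shape and data below are illustrative, not the authors' actual rule set), one can express a transfer-compliance rule in SHACL over DPV-style RDF, here demanding that every transfer record a legal basis:

```python
# Hedged sketch: SHACL validation of a DPV-style transfer record.
from pyshacl import validate       # pip install pyshacl
from rdflib import Graph

data = Graph().parse(data="""
@prefix dpv: <https://w3id.org/dpv#> .
@prefix ex:  <https://example.org/> .
ex:t1 a dpv:Transfer .             # no legal basis recorded -> should fail
""", format="turtle")

shapes = Graph().parse(data="""
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix dpv: <https://w3id.org/dpv#> .
dpv:TransferShape a sh:NodeShape ;
    sh:targetClass dpv:Transfer ;
    sh:property [ sh:path dpv:hasLegalBasis ; sh:minCount 1 ] .
""", format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms)  # False: the transfer lacks a legal basis
print(report)
```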