Ebook: Legal Knowledge and Information Systems
Artificial intelligence as applied to the legal domain has gained momentum thanks to the large annotated corpora of legal and case-law collections, human chats, and social media information now available as open data. Often represented in XML or other Semantic Web technologies, these resources now make it possible to apply the AI theory developed by the JURIX community over more than thirty years of research. Innovative machine- and deep-learning techniques for classifying legal texts and detecting terms, principles, concepts, evidence, named entities, and rules are also emerging, and the last five years have seen a gradual increase in their practical application.
This book presents papers from the 31st International Conference on Legal Knowledge and Information Systems (JURIX 2018), held in Groningen, the Netherlands, in December 2018. The support of the Dutch Foundation for Legal Knowledge Based Systems for the JURIX conference has transformed a domestic workshop into an international event, with theoretical contributions, applied work, demo prototypes, a hackathon, and a doctoral consortium. Of the 72 submissions received, 17 were selected for publication as full papers and 11 as short papers, representing an acceptance rate of approximately 39%.
Machine learning for the legal domain prevails in the JURIX 2018 program, alongside the traditional research mainstreams of legal reasoning and argumentation, natural-language processing, legal-text retrieval, and legal semantic modelling. An emerging topic is blockchain, which has graduated from the workshop area to the main program. The book offers an overview of the ways in which innovative information technologies are merging with legal theory, argumentation, and practice.
We are delighted to present the proceedings volume of the 31st International Conference on Legal Knowledge and Information Systems (JURIX 2018). For more than three decades, JURIX has organized an annual international conference for academics and practitioners, recently also including demos and a hackathon. The intention is to foster a fruitful exchange of knowledge between theoretical research and applications in concrete legal use-cases. JURIX is also a community in which different skill sets work together to advance research through cross-fertilisation between law and computer technologies.
The JURIX conferences have been held under the auspices of the Dutch Foundation for Legal Knowledge Based Systems (www.jurix.nl). They have been hosted in a variety of European locations, extending the borders of the community's action and becoming a genuinely international conference by virtue of the various nationalities of its participants and attendees.
The 2018 edition of JURIX, which runs from December 12 to 14, is hosted by the Faculty of Law and the Department of Artificial Intelligence of the Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, Faculty of Science and Engineering, University of Groningen. JURIX 2018 is organised in cooperation with the Dutch Research School for Information and Knowledge Systems (SIKS). Special thanks go to Jeanne Mifsud Bonnici, Henry Prakken, and Bart Verheij and to their team for inviting us, hosting the event, and making this conference possible (http://jurix2018.ai.rug.nl).
For this edition we received 72 submissions by 221 authors from 26 countries; 17 of these submissions were selected for publication as full papers (10 pages each) and 11 as short papers (five pages each), for a total of 28 presentations. We were inclusive in making our selection, but the competition was stiff and the submissions were put through a rigorous review process, with an acceptance rate of 38.9%. Borderline submissions, including those that received widely divergent marks, were accepted as short papers only.
The accepted papers have been grouped under six headings: (i) Machine Learning for the Legal Domain, a session presenting different methodologies and theoretical models applied to legislative texts and case-law (seven full papers and two short papers); (ii) Legal Reasoning and Argumentation, a session that ranges from theoretical aspects to demonstrations (three full papers and four short papers); (iii) Legal Knowledge Extraction, a session on natural-language processing of text for detecting terms, principles, concepts, evidence, rules, and named entities, as well as speech in chatbots (two full papers and two short papers); (iv) Legal Knowledge Retrieval, a session focused on the answer-and-query approach (two full papers); (v) Legal Knowledge Modelling and Visualization, devoted to Semantic Web techniques such as legal thesauri and ontologies (three full papers and one short paper); and (vi) Legal Blockchain, a session on a topic that has been growing in significance for several years in the workshop area and now gains entry into the main conference (two full papers, one short paper).
This year we are honoured to have as an invited speaker Marie-Francine Moens, full professor in the Department of Computer Science at Katholieke Universiteit Leuven, director of the Language Intelligence and Information Retrieval (LIIR) Research Lab, member of the Human Computer Interaction Group, and head of the Informatics section. She is a well-known researcher who experiments with novel methods for automated content recognition in text and multimedia, using statistical machine learning and exploiting insights from linguistic and cognitive theory. She has also successfully applied these techniques to the legal domain.
Also noteworthy is that ethical aspects are increasingly relevant in big data and AI applications. For this reason we have asked Jeroen van den Hoven to provide us with an overview of the ethical issues that arise in connection with the use of emerging technologies. He is full professor of Ethics and Technology at the Delft University of Technology and editor-in-chief of Ethics and Information Technology.
We are very grateful to them for having accepted our invitation and for their interesting and inspiring talks.
Since 2013, JURIX has also hosted the Doctoral Consortium, now in its sixth edition. This initiative aims to attract and promote Ph.D. researchers in the area of AI & Law so as to enrich the community with original and fresh contributions. Many thanks are owed to Pompeu Casanovas, Ugo Pagallo, and Giovanni Sartor for organising the consortium this year, helped by other senior scholars.
As in previous editions, this year the conference is enriched by six co-located workshops. Alongside long-running workshops like AICOL and LDA, we are continuing the TeReCom event and are hosting four new initiatives: XAILA, Legal Design, ManyLaws, and the Legal Data Analytics Hackathon.
The Workshop on Artificial Intelligence and the Complexity of Legal Systems (AICOL), now in its tenth edition, is a stable event whose aim is to cut across multiple disciplines so as to examine the complexity of legal systems. The LDA workshop on Legal Data Analysis of the Central European Institute of Legal Informatics (CEILI), now in its fifth edition, is devoted to the representation and analysis of legal data and documents, and to reasoning over such data and documents, using corpora and information systems.
The second edition of the Workshop on Technologies for Regulatory Compliance provides a forum for discussion of research on technologies for regulatory compliance on the basis of Semantic Web and Artificial Intelligence techniques. This workshop is supported by the LYNX European project: Building the Legal Knowledge Graph for Smart Compliance Services in Multilingual Europe (http://lynx-project.eu/).
The first-ever EXplainable AI in Law (XAILA) workshop aims to investigate the intersection of law and AI in order to provide a conceptual framework for ethical concepts and values in AI systems.
The first-ever ManyLaws workshop focuses on the semantic annotation of Big Legal Open Data, making it easily searchable and exploitable by means of text-mining tools and algorithms offered through proper visualization techniques.
In this regard, the Legal Design workshop complements the previous ones with interdisciplinary and human-centered design principles for preventing or solving legal problems.
This year JURIX is also hosting a new challenge, the Legal Data Analytics Hackathon (LeDAH), which aims to create applications and concrete projects using data-analytics methods applied to legal documents and data, with a specific focus on bringing out cognitive biases and providing visualization to support the transparency of legal information.
The JURIX 2018 conference was supported by IOS Press, BNVKI (the Benelux Association for Artificial Intelligence), and the OASIS LegalXML Steering Committee: many thanks to them, whose help made it possible to organise this event and whose technical support contributed to attracting many participants from around the world.
Finally, we want to thank the program committee and sub-reviewers for reviewing the submissions with a professional and scientific attitude, enriched by active discussions that have ensured a fair reviewing process; the 221 authors who submitted papers; the workshop organisers, who have enhanced the JURIX conference with emerging topics; the hackathon organizers for their applied approach; and, finally, the members of the JURIX Steering and Executive Committees for supporting the conference year after year.
Monica Palmirani
CIRSFID, University of Bologna, Italy
We investigate named entity recognition in Greek legislation using state-of-the-art deep neural network architectures. The recognized entities are used to enrich the Greek legislation knowledge graph with more detailed information about persons, organizations, geopolitical entities, legislation references, geographical landmarks, and public document references. We also interlink the textual references of the recognized entities with the corresponding entities represented in other open public datasets, thereby enabling new, sophisticated ways of querying Greek legislation. Relying on the results of these methods, we generate and publish a new dataset of geographical landmarks mentioned in Greek legislation. We make all datasets and other resources used in our study publicly available. Our work is the first of its kind for the Greek language in such an extended form and one of the few that examine legal text across the full spectrum of both entity recognition and linking.
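As a rough illustration of such a pipeline (not the authors' architecture), the sketch below runs an off-the-shelf Greek NER model from spaCy (el_core_news_sm) and links each mention to a knowledge-graph URI through a lookup table; the KG_INDEX mapping and its URIs are invented placeholders.

```python
# Minimal sketch of a recognition-and-linking pipeline, assuming spaCy's
# off-the-shelf Greek model; the paper's own deep architectures differ.
import spacy

nlp = spacy.load("el_core_news_sm")  # generic Greek NER, stand-in for the paper's models

# Hypothetical index from surface forms to knowledge-graph URIs (placeholders).
KG_INDEX = {
    "Υπουργείο Οικονομικών": "http://example.org/kg/org/ministry-of-finance",
}

def recognize_and_link(text):
    doc = nlp(text)
    for ent in doc.ents:
        uri = KG_INDEX.get(ent.text)  # None when the mention is not yet in the graph
        yield ent.text, ent.label_, uri
```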
We discuss the lessons learned from implementing a CATO-style system using factors with magnitude. In particular, we identify that giving factors magnitudes enables a diversity of reasoning styles and arguments. We distinguish a variety of ways in which factors combine to determine abstract factors, and we discuss several different roles for values. Finally, we identify the additional value-related information required to produce a working program: thresholds and weights, as well as a simple preference ordering.
Smart contracts have been proposed as executable implementations enforcing real-life contracts. Unfortunately, the semantic gap between the two allows the smart contract to diverge from its intended deontic behaviour. In this paper we show how a deontic contract can be used for real-time monitoring of smart contracts specifically, and of request-based interactive systems in general, allowing for the identification of any violations. The deontic logic of actions we present takes into account the possibility of action failure (which we can observe in smart contracts), allowing us to consider novel monitorable semantics for deontic norms. For example, taking a rights-based view of permissions allows us to detect the violation of a permission when a permitted action is not allowed to succeed. A case study is presented showing this approach in action for Ethereum smart contracts.
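The rights-based reading of permissions can be illustrated with a toy trace monitor (a simplification, not the paper's formal semantics): a permission is violated when a permitted action is attempted but made to fail, and an obligation is violated when its action never succeeds in the observed trace.

```python
# Toy violation monitor over an event trace; Event and both checks are
# illustrative simplifications of the monitorable semantics described above.
from dataclasses import dataclass

@dataclass
class Event:
    action: str
    succeeded: bool  # smart contracts let us observe whether an action failed

def permission_violated(trace, permitted_action):
    # Rights-based view: a permitted action must be allowed to succeed.
    return any(e.action == permitted_action and not e.succeeded for e in trace)

def obligation_violated(trace, obliged_action):
    # The obliged action must succeed somewhere in the trace.
    return not any(e.action == obliged_action and e.succeeded for e in trace)

trace = [Event("withdraw", succeeded=False)]   # a permitted withdrawal was blocked
print(permission_violated(trace, "withdraw"))  # True: the permission was violated
```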
Representation of and reasoning over legal rules is an important application domain, and a number of related approaches have been developed. In this work, we investigate legal reasoning in practice on the basis of three use cases of increasing complexity. We consider three representation and reasoning approaches: (a) Answer Set Programming, (b) Argumentation, and (c) Defeasible Logic. The approaches are evaluated with respect to semantics, expressiveness, efficiency, complexity, and support.
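To give a flavour of the first of these approaches, the fragment below encodes a simple default-style legal rule in Answer Set Programming via the clingo Python API (assuming the clingo package is installed); the rule and facts are invented for illustration, not taken from the paper's use cases.

```python
# Illustrative ASP encoding of "a defendant is presumed innocent unless
# proven guilty", solved with clingo.
import clingo

program = """
defendant(alice). defendant(bob).
proven_guilty(bob).
presumed_innocent(X) :- defendant(X), not proven_guilty(X).
"""

ctl = clingo.Control()
ctl.add("base", [], program)
ctl.ground([("base", [])])
ctl.solve(on_model=print)  # the answer set contains presumed_innocent(alice)
```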
In this work, we outline an approach for question answering over regulatory documents. In contrast to traditional means of accessing information in the domain, the proposed system attempts to deliver an accurate and precise answer to user queries. This is accomplished by a two-step approach which first selects relevant paragraphs given a question, and then compares the selected paragraph with the user query to predict a span in the paragraph as the answer. We employ neural-network-based solutions for each step and compare them with existing and alternative baselines. We perform our evaluations with a gold-standard benchmark comprising over 600 questions on the MaRisk regulatory document. In our experiments, we observe that our proposed system outperforms the other baselines.
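The first step of such a pipeline can be approximated with a simple lexical retriever; the sketch below uses TF-IDF cosine similarity as a stand-in for the paper's neural paragraph selector, with invented example paragraphs.

```python
# Paragraph selection sketch: rank paragraphs by TF-IDF cosine similarity to
# the question. The paper's system uses neural models for this step instead.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_paragraph(question, paragraphs):
    matrix = TfidfVectorizer().fit_transform(paragraphs + [question])
    sims = cosine_similarity(matrix[-1], matrix[:-1])  # question vs. each paragraph
    return paragraphs[sims.argmax()]

paragraphs = ["Institutions shall establish a risk management function.",
              "Outsourcing requires a prior risk analysis."]
print(select_paragraph("What must precede outsourcing?", paragraphs))
```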
Two years after its entry into force, the EU General Data Protection Regulation became applicable on the 25th May 2018. Despite the long time for preparation, privacy policies of online platforms and services still often fail to comply with information duties and the standard of lawfulness of data processing. In this paper we present a new methodology for processing privacy policies under GDPR's provisions, and a novel annotated corpus, to be used by machine learning systems to automatically check the compliance and adequacy of privacy policies. Preliminary results confirm the potential of the methodology.
Legal contract analysis is an important research area. The classification of clauses or sentences enables valuable insights, such as the extraction of rights and obligations. However, datasets consisting of contracts are quite rare, particularly for the German language.
This paper therefore examines the portability of machine learning (ML) models across different document types. We trained different ML classifiers on the tenancy law of the German Civil Code (BGB) and then applied the resulting models to a set of rental agreements. The performance of our models varies on the contract set: some models perform significantly worse, while certain settings prove portable. Additionally, we trained and evaluated the same classifiers on a dataset consisting solely of contracts, so as to observe a reference performance. We show that the performance of ML models may depend on the document type used for training, while certain setups result in portable models.
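Such a cross-document-type evaluation can be sketched in a few lines; the toy sentences and labels below are invented English stand-ins for the BGB tenancy-law sentences and rental-agreement clauses, and the pipeline is a generic baseline rather than the paper's classifiers.

```python
# Portability sketch: train a classifier on statutory language, then apply it
# to contract language and inspect whether the model transfers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

statute_texts = ["The tenant shall pay the rent at the start of each month.",
                 "The landlord may enter the premises after giving notice."]
statute_labels = ["obligation", "permission"]       # invented label scheme
contract_texts = ["The lessee shall return all keys upon termination."]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(statute_texts, statute_labels)  # train on statutory sentences...
print(model.predict(contract_texts))      # ...evaluate on contract clauses
```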
The paper presents three experimental platforms for legal analytics: online environments integrating heterogeneous computational heuristics, information processing, and visualization techniques to extract actionable knowledge from legal data. Our goal is to explore innovative approaches to issues ranging from information retrieval to the quantitative analysis of legal corpora and the study of criminal organizations for research and investigative purposes. After a brief introduction to the e-science paradigm and to the role played in it by research platforms, we focus on visual analytics as a viable way to interact with legal data. We then present the tools, their main features, and the results obtained so far. The paper ends with some considerations about the computational turn of science and its role in promoting a much-needed interdisciplinary and empirical evolution of legal research.
GDPR-abiding blockchain systems are feasible, and jurists, programmers, and other experts are increasingly working towards this aim. Still, the manifold blockchain networks already in operation suggest a new generation of data protection issues brought about by this technology. Some of these issues will likely concern the right to erasure set up by Art. 17 of the EU data protection regulation (‘GDPR’). These cases will soon be discussed before national authorities and courts, and will likely test the technical solutions explored in this paper, such as hashing-out methods, key destruction, chameleon hash functions, and more. By taking into account matters of design and the complex architecture of blockchains, we distinguish between blockchains expressly designed to meet the requirements of the EU regulation and blockchains that, for one reason or another, e.g. because they were designed before the GDPR, trigger some sort of clash with the legal order, that is, (i) matters of principle regarding e.g. political decentralization; (ii) standards on security and data protection; (iii) a mix of them; and (iv) a social clash. It is still unclear how the interplay of legal regulation, technological constraints, social norms, and market interests will play out in this context. Rulings and court orders will be instructive. It is a clash foretold, after all.
This paper introduces PrOnto, a privacy ontology that models the GDPR's main conceptual cores: data types and documents, agents and roles, processing purposes, legal bases, processing operations, and deontic operations for modelling rights and duties. The explicit goal of PrOnto is to support legal reasoning and compliance checking by employing defeasible logic theory (i.e., the LegalRuleML standard and the SPINdle engine).
In the last fifteen years, Semantic Web technologies have been successfully applied to the legal domain. Composing these techniques and theoretical methods, we propose an integrated framework for modelling legal documents and legal knowledge to support legal reasoning, in particular compliance checking. This paper presents a proof of concept applied to the GDPR domain, with the aim of detecting infringements of compulsory privacy norms or preventing possible violations, using BPMN and the Regorous engine.
In common law jurisdictions, legal research often involves an analysis of relevant case law. Court opinions comprise several high-level parts with different functions, and a statement's membership in one of these parts is a key factor influencing how the statement should be understood. In this paper we present a number of experiments in automatically segmenting court opinions into functional and issue-specific parts. We defined a set of seven types, including Background, Analysis, and Conclusions, and used them to annotate a sizable corpus of US trade secret and cybercrime decisions. We used the data set to investigate the feasibility of recognizing the parts automatically. The proposed framework, based on conditional random fields, proved very promising in this respect. To support research in automatic case law analysis, we plan to release the data set to the public.
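A minimal version of such a sentence-level segmenter can be assembled with the sklearn-crfsuite package (an assumption here, not necessarily the authors' toolkit); the features, documents, and labels below are simplified stand-ins for the paper's seven-type scheme.

```python
# Sentence-sequence labelling with conditional random fields: each document is
# a sequence of sentences, and each sentence gets one functional type.
import sklearn_crfsuite

def sentence_features(sentence, position, total):
    return {"prefix": sentence.lower()[:40],   # crude lexical cue
            "relative_pos": position / total}  # position within the opinion

docs = [["The plaintiff filed suit in 2010.", "We affirm the judgment below."]]
labels = [["Background", "Conclusions"]]       # two of the seven types

X = [[sentence_features(s, i, len(d)) for i, s in enumerate(d)] for d in docs]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))
```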
In this paper an automated solution for finding cases for analysing the impact of legal change is proposed, and the results are analysed with the help of a legal expert. It focuses on the automatic classification of 15,000 judgments within civil law. We investigated to what extent several machine learning algorithms were able to classify cases ‘correctly’, achieving accuracies around 0.85. However, the data were scarce and the initial labelling imperfect, so further research should focus on these aspects to improve the analysis of the impact of legal change.
In this paper we report on the experience gathered in producing two gold-standard alignment datasets between the European Union thesaurus EuroVoc and two other notable resources adopted in legal environments: TESEO, the thesaurus of the Italian Senate, and IATE, the European terminological resource. The two resources were realized in the context of the PMKI project, a European Commission action aiming to create a Public Multilingual Knowledge management Infrastructure to support e-commerce solutions in a multilingual environment. Given the numerous lexical and terminological resources involved in this project, ontology and thesaurus alignment, and consequently the evaluation of automatically generated alignments, play a pivotal role in the success of the project.
This paper is concerned with the task of finding the majority opinion (MO) in UK House of Lords (UKHL) case law by analysing agreement statements (AS) that explicitly express the appointed judges' acceptance of each other's reasoning. We introduce a corpus of 300 UKHL cases in which the relevant AS and MO have been annotated by three legal experts, and we introduce an AI system that automatically identifies these AS and MO with performance comparable to humans.
In this paper, we propose a structured approach for transforming legal arguments into a Bayesian network (BN) graph. Our approach automatically constructs a fully specified BN graph by exploiting causality information present in legal arguments. Moreover, we demonstrate that causality information additionally provides for constraining some of the probabilities involved. We show that for undercutting attacks it is necessary to distinguish between causal and evidential attacked inferences, which extends a previously proposed solution to modelling undercutting attacks in BNs. We illustrate our approach by applying it to part of an actual legal case, the Sacco and Vanzetti case.
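The core construction step, directing BN edges along the causal reading of each argument whatever the direction of the inference, can be pictured with a small directed graph; the propositions below are invented, not taken from the Sacco and Vanzetti analysis.

```python
# Illustrative only: build the BN graph by orienting every edge from cause to
# effect; an evidential argument ("residue found, so the suspect fired") reuses
# the same causal edge but is read against the arrow.
import networkx as nx

bn = nx.DiGraph()
bn.add_edges_from([
    ("suspect fired the gun", "gunshot residue on hands"),  # causal inference
    ("suspect fired the gun", "witness heard a shot"),      # causal inference
])
assert nx.is_directed_acyclic_graph(bn)  # a BN graph must be acyclic
print(list(bn.edges))
```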
We propose a method that assists legislation officers in finding inappropriate legal terms in Japanese statutory sentences and suggests corrections. In particular, we focus on sets of similar legal terms whose usages are defined in legislation drafting rules. Our method predicts suitable legal terms in statutory sentences using Random Forest classifiers, each optimized for one set of similar legal terms. Our experiment shows that our method outperformed existing modern word prediction methods based on neural language models.
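One such per-term-set classifier might look as follows; this is an invented English stand-in in which the pair {"promptly", "immediately"} plays the role of a set of similar Japanese legal terms, and the context features are deliberately simplistic.

```python
# Toy per-term-set predictor: given the context around a term slot, a Random
# Forest picks the suitable member of one set of similar legal terms.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

contexts = ["the notice shall be given <TERM> after the event",
            "the permit shall be revoked <TERM> upon discovery"]
correct_terms = ["promptly", "immediately"]   # invented gold labels

vec = CountVectorizer(ngram_range=(1, 2))
clf = RandomForestClassifier(n_estimators=100)
clf.fit(vec.fit_transform(contexts), correct_terms)
print(clf.predict(vec.transform(["the report shall be filed <TERM> after inspection"])))
```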
The decision whether to accept or reject a new case is a well-established task in legal work. This task frequently necessitates domain knowledge and is consequently resource-intensive. In this paper it is proposed that early rejection or acceptance of at least a proportion of new cases can be effectively achieved without significant human intervention. The paper proposes, and evaluates, five different AI techniques whereby early case accept-reject decisions can be made. The results suggest that it is possible for at least a proportion of cases to be processed in this way.
In this work we enrich a formalism for argumentation with a formal characterization of knowledge-related features, in order to capture proper reasoning in legal domains. We add meta-data to the arguments in the form of labels representing quantitative and qualitative information about them. These labels are propagated through an argumentative graph according to the relations of support, conflict, and aggregation between arguments.
This work investigates legal concepts and their expression in Portuguese, concentrating on the bar exam of the Order of Attorneys of Brazil (OAB). Using a corpus formed by a collection of multiple-choice questions, three norms related to the Ethics part of the OAB exam, language resources (Princeton WordNet and OpenWordNet-PT), and tools (AntConc and Freeling), we began to investigate the concepts and words missing from our repertory of concepts and words in Portuguese, the knowledge base OpenWordNet-PT. We add these concepts and words to OpenWordNet-PT and hence obtain a representation of these texts that is mostly “contained” in the lexical knowledge base.
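Coverage gaps of this kind can be probed programmatically; the sketch below assumes NLTK's Open Multilingual Wordnet (which distributes Portuguese wordnet data under the 'por' language code) and a few invented candidate terms, and is not the authors' actual workflow.

```python
# Look up Portuguese legal terms via NLTK's multilingual wordnet interface;
# terms with no synsets are candidates to be added to OpenWordNet-PT.
import nltk
nltk.download("wordnet")
nltk.download("omw-1.4")
from nltk.corpus import wordnet as wn

for term in ["advogado", "jurisprudência", "petição"]:   # invented examples
    synsets = wn.synsets(term, lang="por")
    print(term, "->", [s.name() for s in synsets] or "missing from the wordnet")
```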
We demonstrate CISpaces.org, a tool to support situational understanding in intelligence analysis that complements but does not replace human expertise, applied for the first time to a judicial context. The system combines argumentation-based reasoning and natural language generation to support the creation of analysis and summary reports, and to record the process of forming hypotheses from relationships among items of information.
Legal standards for suspicion involve seemingly limitless possible factors, leaving them vague and subject to concerns about illegitimate biases on the part of decision makers. Beginning with the relatively small number of factors present in drug interdiction stops, a model can be developed that predicts not only judicial behavior but also the odds of discovering drugs. This technology will require legislatures or judges to begin the process of determining what numerical threshold of suspicion justifies investigatory detentions and searches.
Governments across the world are testing different uses of the blockchain for the delivery of their public services. Blockchain hashing, or the insertion of data in the blockchain, is one of the potential applications of the blockchain in this space. With this method, users can apply special scripts to add their data to blockchain transactions, ensuring both immutability and publicity. Blockchain hashing also secures the integrity of the original data stored on central governmental databases. The paper starts by analysing possible scenarios of hashing on the blockchain and assesses in which cases it may work and in which it is less likely to add value to a public administration. Second, the paper compares this method with traditional digital signatures using PKI (Public Key Infrastructure) and discusses standardisation in each domain. Third, it addresses issues related to concepts such as “distributed ledger technology” and “permissioned blockchains.” Finally, it raises the question of whether blockchain hashing is an effective solution for electronic governance, and concludes that its value is controversial, even when improved by PKI and other security measures. In this regard, we claim that governments need first to identify pain points in governance, and then consider the trade-offs of the blockchain as a potential solution versus other alternatives.
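The mechanism itself is simple to sketch: only a fixed-size digest of the document goes on-chain, and integrity is later verified by recomputing it. The code below shows just this hashing step; broadcasting the digest in a transaction (e.g. in a Bitcoin OP_RETURN output) is chain-specific and omitted.

```python
# Hash-anchoring sketch: store only the SHA-256 digest of a record on-chain,
# then verify the off-chain original against the immutably recorded digest.
import hashlib

def document_digest(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def verify(path, onchain_digest):
    # True iff the local document still matches what was anchored on-chain.
    return document_digest(path) == onchain_digest
```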
The growing amount of textual data in the legal domain leads to a demand for better text analysis tools adapted to legal domain specific use cases. Semantic Text Matching (STM) is the general problem of linking text fragments of one or more document types. The STM problem is present in many legal document analysis tasks, such as argumentation mining. A common solution approach to the STM problem is to use text similarity measures to identify matching text fragments. In this paper, we recapitulate the STM problem and a use case in German tenancy law, where we match tenancy contract clauses and legal comment chapters. We propose an approach similar to local interpretable model-agnostic explanations (LIME) to better understand the behavior of text similarity measures like TFIDF and word embeddings. We call this approach eXplainable Semantic Text Matching (XSTM).
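The LIME-like idea can be pictured with a word-deletion probe (a simplified sketch, not the paper's exact XSTM procedure): remove one word at a time from a contract clause and record how much the TFIDF cosine similarity to a legal comment chapter drops.

```python
# Word-level influence on a TFIDF similarity score via single-word deletions;
# larger drops indicate words that drive the semantic text match.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def word_influences(clause, comment):
    vec = TfidfVectorizer().fit([clause, comment])
    base = cosine_similarity(vec.transform([clause]), vec.transform([comment]))[0, 0]
    words = clause.split()
    for i, w in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        sim = cosine_similarity(vec.transform([perturbed]), vec.transform([comment]))[0, 0]
        yield w, base - sim

for word, drop in word_influences("the landlord may increase the rent annually",
                                  "rules on rent increases in tenancy law"):
    print(f"{word}: {drop:+.3f}")
```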