Natural Language Processing Applications in Case-Law Text Publishing

Tarasconi, Francesco; Botros, Milad; Caserio, Matteo; Sportelli, Gianpiero; Giacalone, Giuseppe; Uttini, Carlotta; Vignati, Luca; Zanetta, Fabrizio

doi:10.3233/FAIA200859

Abstract

Processing case-law content for electronic publishing is a time-consuming activity that encompasses several sub-tasks and usually involves adding annotations to the original text. Meanwhile, recent trends in Artificial Intelligence and Natural Language Processing enable the automatic and efficient analysis of large volumes of text. In this paper we present our Machine Learning solutions to three specific business problems that a real-world Italian publisher regularly meets in its day-to-day work: recognition of legal references in text spans, ranking of new content by relevance, and text classification according to a given tree of topics. We experimented with several approaches based on the BERT language model, together with alternatives typically based on Bag-of-Words. The optimal solution, deployed in a controlled production environment, was based on fine-tuned BERT in two of the three cases (extraction of legal references and text classification), while for relevance ranking a Random Forest model with hand-crafted features was preferred. We conclude by discussing the concrete impact of the developed prototypes, as perceived by the publisher.
