Unsupervised Keyword Extraction for Japanese Legal Documents

Le, Tho Thi Ngoc; Nguyen, Minh Le; Shimazu, Akira

doi:10.3233/978-1-61499-359-9-97

Abstract

This study proposes a novel unsupervised approach to extracting keywords from Japanese legal documents that draws on knowledge of Japanese syntax. Because Japanese keywords usually occur within chunks, keyword extraction is treated as the task of finding the chunks that carry a document's important content. To find them, every chunk in a given document is assigned a weight indicating its importance. Highly weighted chunks are taken as candidate keywords, which are then post-processed to obtain the final keywords. Although the proposed method uses simple techniques, experimental results on Japanese legal documents show that the chunk-based approach outperforms the graph-based ranking approach, the most popular unsupervised method, with an F1-score 10.5% higher.
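The pipeline described in the abstract (weight every chunk, keep the highest-weighted chunks as candidates, post-process them into keywords) can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme or the post-processing rules, so everything concrete here is an assumption made for illustration only: the input is taken to be already segmented into chunks of tokens, the weight is the average document frequency of a chunk's tokens, and post-processing simply strips trailing particles and merges duplicates. The names `PARTICLES`, `strip_trailing_particles`, and `rank_chunks` are hypothetical and do not come from the paper.

```python
from collections import Counter

# Hypothetical particle list, used only for this illustration.
PARTICLES = {"は", "が", "を", "に", "の", "で", "と", "も"}


def strip_trailing_particles(chunk):
    """Post-processing assumption: drop particles at the end of a chunk."""
    tokens = list(chunk)
    while tokens and tokens[-1] in PARTICLES:
        tokens.pop()
    return tokens


def rank_chunks(chunks, top_k=3):
    """Return up to top_k candidate keywords, highest weight first.

    chunks: list of chunks, each a list of token strings (pre-segmented).
    Weight (an assumed stand-in, not the paper's scheme): the mean
    frequency of the chunk's tokens across the whole document.
    """
    cleaned = [strip_trailing_particles(c) for c in chunks]
    freq = Counter(token for chunk in cleaned for token in chunk)

    best = {}  # candidate string -> highest weight seen
    for tokens in cleaned:
        if not tokens:
            continue  # chunk consisted only of particles
        weight = sum(freq[t] for t in tokens) / len(tokens)
        candidate = "".join(tokens)
        best[candidate] = max(weight, best.get(candidate, 0.0))

    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:top_k]]


if __name__ == "__main__":
    document_chunks = [
        ["契約", "は"],
        ["当事者", "の"],
        ["契約", "を"],
        ["解除", "する"],
    ]
    print(rank_chunks(document_chunks, top_k=1))
```

In practice the chunks would come from a Japanese dependency or chunk parser, and the frequency-based weight would be replaced by whatever importance measure the method actually defines; the sketch only shows the overall shape of a chunk-ranking extractor.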
