A semi-supervised approach is introduced for developing a semantic search system that finds, in a large legal corpus, the legal cases whose fact-asserting sentences are similar to a given query. First, an unsupervised word embedding model learns the meaning of legal words from a large immigration law corpus. This knowledge is then used to initialize the training of a fact-detecting classifier on a small set of annotated legal cases. We achieved 90% accuracy in detecting fact sentences with only 150 annotated documents available. The hidden layer of the trained classifier is used to vectorize sentences and to compute the cosine similarity between fact-asserting sentences and the given queries. We reached a mean average precision of 78% in retrieving semantically similar sentences.
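The retrieval step described above, ranking fact-asserting sentences by cosine similarity to a query and scoring the ranking with average precision, can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors are toy placeholders standing in for the classifier's hidden-layer activations, and the names `cosine_similarity`, `rank_sentences`, and `average_precision` are hypothetical helpers chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sentences(query_vec, sentence_vecs):
    """Return (sentence_id, score) pairs sorted by decreasing similarity to the query."""
    scored = [
        (sid, cosine_similarity(query_vec, vec))
        for sid, vec in sentence_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, sid in enumerate(ranked_ids, start=1):
        if sid in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


# Assumption: toy vectors in place of real hidden-layer sentence representations.
sentence_vecs = {
    "s1": [0.9, 0.1, 0.0],
    "s2": [0.0, 1.0, 0.2],
    "s3": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]

ranking = rank_sentences(query_vec, sentence_vecs)
ranked_ids = [sid for sid, _ in ranking]
print(ranked_ids)
print(average_precision(ranked_ids, {"s1", "s2"}))
```

Mean average precision is the mean of `average_precision` over all evaluation queries; the sketch scores a single query only.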
IOS Press, Inc.