Representing texts as term vectors whose element values are calculated by a TF-IDF method yields significant results in text similarity problems, such as finding related documents in bibliographic or full-text databases, identifying MeSH concepts in medical texts by a lexical approach, harmonizing journal citations in ISI/SciELO references, and normalizing author affiliations in MEDLINE. Our work uses “trigrams” as the terms (elements) of the vector representing a text, following the Trigram Phrase Matching method published by the NLM's Indexing Initiative and its logarithmic Term Frequency-Inverse Document Frequency (TF-IDF) method for term weighting. Trigrams are overlapping three-character strings extracted from a text according to a small set of rules, and a trigram matching method may improve the probability of identifying synonymous phrases or similar texts. The matching process was implemented as a simple algorithm and requires a certain amount of computing resources, so we adopted an efficiency-focused implementation in C. In addition, some heuristic rules improved the efficiency of the method and made feasible a regular “find your scientific production in SciELO collection” information service. We describe an implementation of the Trigram Matching method, the software tool we developed, and a set of experimental parameters for the above results.
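As an illustration of the general idea only, the following minimal Python sketch builds trigram vectors with a logarithmic TF-IDF weighting and compares them by cosine similarity. It is not the authors' C tool and does not reproduce the NLM Trigram Phrase Matching extraction rules. The sliding-window extraction, the weight formula `(1 + ln tf) * (ln((1 + N) / (1 + df)) + 1)`, the cosine measure, and the function names `trigrams`, `tfidf_vectors`, and `cosine` are all assumptions made for this example.

```python
import math
from collections import Counter


def trigrams(text):
    """Overlapping 3-character strings from lower-cased text.

    Simplified rule (an assumption): collapse whitespace, then slide a
    3-character window over the whole string, spaces included.
    """
    norm = " ".join(text.lower().split())
    return [norm[i:i + 3] for i in range(len(norm) - 2)]


def tfidf_vectors(texts):
    """Return one sparse {trigram: weight} dict per input text.

    The weighting is an assumed logarithmic TF-IDF variant with a
    smoothed IDF, so trigrams found in every text keep a small weight.
    """
    counts = [Counter(trigrams(t)) for t in texts]
    n_docs = len(texts)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each text counts once per trigram
    vectors = []
    for c in counts:
        vec = {}
        for gram, tf in c.items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1.0
            vec[gram] = (1.0 + math.log(tf)) * idf
        vectors.append(vec)
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical example phrases, not data from the paper.
    phrases = [
        "myocardial infarction",
        "infarction of the myocardium",
        "journal citation",
    ]
    vecs = tfidf_vectors(phrases)
    print("related phrases:  ", cosine(vecs[0], vecs[1]))
    print("unrelated phrases:", cosine(vecs[0], vecs[2]))
```

Because the two medical phrases share most of their trigrams despite the different word order, the first score comes out higher than the second, which is the property that makes trigram matching useful for spotting synonymous phrases. Sparse dictionaries are used here for readability; an efficiency-focused implementation would likely choose a more compact representation.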