lucene-java-user mailing list archives

From teko <>
Subject Re: How to locate a Phrase inside text (like a Browser text searcher)
Date Fri, 16 May 2014 17:53:32 GMT
Emanuel Buzek,

Well, I first tried the 'ShingleFilter' approach and thought it worked
well, but in the end it still did not behave the way I want.
So I tried NGram instead. I created a new analyzer that uses it and ran a
test. It works, but I still need to validate each result manually. Now it
returns exactly what I want, but it is a bit...

Let me explain what I do...

1 - First, I create a custom Analyzer that uses NGramTokenizer as the
tokenizer. The class also contains a StopFilter and a LowerCaseFilter (the
defaults).

2 - I run the query, but I don't use a 'literal' phrase (like "\"John
Mayer\""). I search this way:
     -> to locate 'John Mayer' I do: queryParser.parse("John Mayer");
  2.1 - This returns all documents that contain occurrences of these two
words, but that is still not what I want.
  2.2 - I get the Document, read the text inside it, and then search
(using regex) for the exact combination:
     -> to locate 'John Mayer' I do: "John( |)Mayer"  -> regex
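
To make step 2.2 concrete, here is a rough sketch of the regex check in plain
Java (no Lucene involved). The class and method names (PhraseVerifier,
compilePhrase, findOffsets) are made up for illustration and are not part of
my real code. It builds the same kind of pattern as "John( |)Mayer" from a
phrase, writing the optional space as " ?", and returns where each match
starts in the text. It is case-sensitive, like the hand-written regex above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseVerifier {

    // Builds a pattern equivalent to the hand-written "John( |)Mayer":
    // the words of the phrase, in order, separated by an optional single space.
    // Each word is quoted so regex metacharacters in it are matched literally.
    public static Pattern compilePhrase(String phrase) {
        String[] words = phrase.trim().split("\\s+");
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append(" ?"); // same meaning as "( |)"
            }
            regex.append(Pattern.quote(words[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the start offset of every match of the pattern in the text.
    public static List<Integer> findOffsets(Pattern pattern, String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            offsets.add(matcher.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        Pattern pattern = compilePhrase("John Mayer");
        String text = "John Mayer and JohnMayer, but not John A. Mayer.";
        System.out.println(findOffsets(pattern, text)); // prints [0, 15]
    }
}
```

In this sketch the Pattern is compiled once per phrase and then reused for
every document text returned by the query.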

The problem is the processing time. I ran a test with a thousand phrases to
locate, and it takes a bit more than 40 minutes.

That is far too long. Now I'm trying to solve this. Do you have any tips?

Note: I edited the title and removed 'SOLVED'.
