lucene-java-user mailing list archives

From teko <tec...@gmail.com>
Subject Re: How to locate a Phrase inside text (like a Browser text searcher)
Date Fri, 16 May 2014 17:53:32 GMT
Emanuel Buzek,


Well, I tried 'ShingleFilter' first, and I thought it
worked well, but in the end it still did not work the way I want.
So I tried NGram. I created a new analyzer that uses it and ran a
test. It works, but I still need to validate each result manually.
Now it returns exactly what I want, but it is a bit
slow.

Let me explain what I do...

1 - First, I create a custom Analyzer that uses NGramTokenizer as the tokenizer.
This class also contains a StopFilter and a LowerCaseFilter, the default ones.

2 - I run the query, but I don't use a 'literal' phrase (like "\"John
Mayer\""). I search this way:
     -> to locate 'John Mayer' I do: queryParser.parse("John Mayer");
  2.1 - It returns all documents that contain occurrences of these
two words, but that is still not what I want.
  2.2 - I get the Document, read the text inside it, and then search
(using regex) for the exact combination:
     -> to locate 'John Mayer' I do: "John( |)Mayer"  -> regex
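
To make step 2.2 concrete, here is a minimal sketch of the regex check in plain Java. It is only an illustration, not my real code: the class name PhraseCheck and the methods phrasePattern and containsPhrase are made up for this example, and the stored document text is assumed to be available already as a String. The pattern it builds is equivalent to "John( |)Mayer": the words in order, with at most one space between them, and case-sensitive.

```java
import java.util.regex.Pattern;

// Hypothetical helper for step 2.2; the names are illustrative only.
final class PhraseCheck {

    // Builds a pattern matching the words of the phrase in order, with
    // zero or one space between them (" ?" is the same as "( |)").
    static Pattern phrasePattern(String phrase) {
        String[] words = phrase.trim().split("\\s+");
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append(" ?");
            }
            // Quote each word so characters like '.' are matched literally.
            regex.append(Pattern.quote(words[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns true if the document text contains the exact combination.
    static boolean containsPhrase(String text, String phrase) {
        return phrasePattern(phrase).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Last night I saw John Mayer on stage.";
        System.out.println(containsPhrase(text, "John Mayer"));
        System.out.println(containsPhrase("John and Mayer", "John Mayer"));
    }
}
```

Each candidate Document returned by the query in step 2.1 would be passed through containsPhrase, and only the ones that return true are kept.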

The problem is the processing time. I ran a test with a thousand phrases to
locate, and it took a bit more than 40 minutes.

That is too long. Now I'm trying to solve this. Do you have
any tips?

Thanks...
Note: I edited the title and I removed: 'SOLVED'