lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: How to locate a Phrase inside text (like a Browser text searcher)
Date Thu, 15 May 2014 11:32:37 GMT
True, for the first two use cases, but as I indicated, the third use case is 
problematic since the token needs to be split. The n-gram solution does seem 
to cover it though, sort of.

The n-gram solution doesn't cover "good morning, john" or "good morning - 
john", but that could be handled by having a tokenizer that simply 
ignored punctuation and whitespace and generated one big original token and 
then n-grammed it based on some maximal query phrase size. And... the 
original requirement spec didn't list that as a use case anyway.
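
For what it's worth, here is a minimal plain-Java sketch of that idea. It is 
not a Lucene tokenizer and uses no Lucene classes; the class name 
SquashedNGrams, the method names, and the maximal size of 16 characters are 
all made up for illustration. It drops punctuation and whitespace, lowercases 
what is left, n-grams the single resulting token, and looks up the squashed 
query phrase as one term.

```java
import java.util.HashSet;
import java.util.Set;

// Illustration only: a hypothetical stand-in for a "squash, then n-gram" analyzer.
class SquashedNGrams {

    // Keep letters and digits only, lowercased: "Good morning - John" -> "goodmorningjohn".
    static String squash(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    // Index side: every substring of the squashed text, from 1 up to maxGram characters.
    static Set<String> nGrams(String text, int maxGram) {
        String s = squash(text);
        Set<String> grams = new HashSet<>();
        for (int start = 0; start < s.length(); start++) {
            int limit = Math.min(s.length(), start + maxGram);
            for (int end = start + 1; end <= limit; end++) {
                grams.add(s.substring(start, end));
            }
        }
        return grams;
    }

    // Query side: squash the phrase and look it up as a single term.
    // Phrases longer than maxGram cannot match, since no such gram was indexed.
    static boolean matches(Set<String> grams, String phrase, int maxGram) {
        String q = squash(phrase);
        return !q.isEmpty() && q.length() <= maxGram && grams.contains(q);
    }

    public static void main(String[] args) {
        int maxGram = 16; // assumed maximal query phrase size, in characters
        String[] docs = {
            " Good Morning John Mail how are you? ",
            " Good MorningJohn Mail how are you? ",
            " GoodMorning John Mailhow are you? ",
            "good morning - john"
        };
        for (String doc : docs) {
            Set<String> grams = nGrams(doc, maxGram);
            System.out.println(doc + " | John Mail: " + matches(grams, "John Mail", maxGram)
                    + " | good morning john: " + matches(grams, "good morning john", maxGram));
        }
    }
}
```

The trade-off is the one already noted in this thread: the number of terms 
grows with the text length times the maximal size, and the match is purely 
character-based, so it will also hit in the middle of words.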

-- Jack Krupansky

-----Original Message----- 
From: Michael Sokolov
Sent: Monday, May 12, 2014 8:39 PM
To: java-user@lucene.apache.org
Subject: Re: How to locate a Phrase inside text (like a Browser text 
searcher)

ShingleFilter can help with this; it concatenates neighboring tokens.
So a search for "good morning john" becomes a search for

"goodmorning john" OR
"good morningjohn" OR
"good morning john"

It makes your index much bigger because of all the extra terms, but you may
find it's worth the cost.
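
To make the expansion above concrete, here is a rough plain-Java sketch that
produces those three variants from a query phrase. It only illustrates the
idea of gluing neighboring words together; it does not call ShingleFilter or
any other Lucene class, and the class name PhraseVariants and its method are
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: builds the phrase variants listed above, without Lucene.
class PhraseVariants {

    // Returns one variant per adjacent word pair (that pair concatenated),
    // followed by the original phrase with single spaces between words.
    static List<String> variants(String phrase) {
        String[] words = phrase.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            List<String> parts = new ArrayList<>();
            for (int j = 0; j < words.length; j++) {
                if (j == i) {
                    parts.add(words[j] + words[j + 1]);
                    j++; // skip the word that was just glued on
                } else {
                    parts.add(words[j]);
                }
            }
            out.add(String.join(" ", parts));
        }
        out.add(String.join(" ", words));
        return out;
    }

    public static void main(String[] args) {
        // Each variant would be run as its own phrase query and the results OR'ed.
        for (String v : variants("good morning john")) {
            System.out.println("\"" + v + "\"");
        }
    }
}
```

Note that only one pair is joined at a time here, as in the list above, so
"goodmorningjohn" is not generated; covering that would need longer
concatenations and even more terms.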

-Mike

On 5/11/2014 9:46 PM, Jack Krupansky wrote:
> The word delimiter filter can help for "MorningJohn" by setting its option 
> to split on case change.
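
As a rough illustration of what splitting on case change means, here is a
small plain-Java sketch. It is not the word delimiter filter and uses no
Lucene classes; the class name CaseChangeSplit is hypothetical. It breaks a
token wherever a lowercase letter is directly followed by an uppercase one.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hand-rolled split on lower-to-upper case transitions.
class CaseChangeSplit {

    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < token.length(); i++) {
            boolean lowerThenUpper = Character.isLowerCase(token.charAt(i - 1))
                    && Character.isUpperCase(token.charAt(i));
            if (lowerThenUpper) {
                parts.add(token.substring(start, i));
                start = i;
            }
        }
        if (start < token.length()) {
            parts.add(token.substring(start)); // trailing part (or the whole token)
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("MorningJohn"));
        System.out.println(split("GoodMorning"));
        System.out.println(split("Mailhow")); // no case change, so nothing to split on
    }
}
```

"Mailhow" comes back as a single part, which is why this approach does not
help with the third sample.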
>
> You might be able to handle "Mailhow" using the 
> DictionaryCompoundWordTokenFilter, but that requires that you create a 
> complete dictionary of the terms that can be split off, which is not very 
> practical. In truth, Lucene/Solr doesn't have a good out-of-the-box 
> solution for this use case.
>
> -- Jack Krupansky
>
> -----Original Message----- From: teko
> Sent: Thursday, May 8, 2014 9:03 AM
> To: java-user@lucene.apache.org
> Subject: How to locate a Phrase inside text (like a Browser text searcher)
>
> Hi, can someone help me with this?
> I need to run a search that locates a phrase inside text, and it has to
> find the phrase in texts like these:
> 'John Mail' <- the phrase I want to locate
> ' Good Morning John Mail how are you? ' < I need to find the phrase here
> ' Good MorningJohn Mail how are you? ' < here too
> ' GoodMorning John Mailhow are you? ' < and here
>
> I tried 'WhitespaceAnalyzer' with 'QueryParser', but it does not work
> (it finds the phrase only in the first sample above, not in the others).
>
> Please, I really need help with this!
> Thanks (note: sorry for my English!! xD)
>
>
>
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-locate-a-Phrase-inside-text-like-a-Browser-text-searcher-tp4135075.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

