lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Why PhraseQuery translate stopwords to "?"
Date Tue, 10 Dec 2013 12:05:05 GMT
In theory, the query with holes (position increments) for stop words should 
work... unless you originally indexed the data without the stop word filter. 
Any time you change the filters, you typically need to reindex the data.
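Jack's point can be illustrated with a small, Lucene-free Java sketch (Java 16+ for records). It is not Lucene's implementation. The class `PhraseMatchSketch`, its `QueryTerm` record and the in-memory term-to-positions map are hypothetical stand-ins for an inverted index. The sketch assumes a phrase matches when every query term occurs at the same offset relative to the first query term. It shows that a query with a hole (java@0, lucene@2) matches a document indexed with the same gap, but not one indexed with collapsed positions. A collapsed index is one way the index-time and query-time analysis can fall out of step.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, Lucene-free sketch of positional phrase matching.
 * A phrase matches when every query term occurs in the document at
 * (start + its query position) for some common start offset.
 */
public class PhraseMatchSketch {

    /** A query term together with its position inside the phrase. */
    record QueryTerm(String term, int position) {}

    /**
     * @param index term -> positions at which the term was indexed in one document
     * @param query phrase terms with their query-time positions (holes allowed)
     */
    static boolean matches(Map<String, List<Integer>> index, List<QueryTerm> query) {
        QueryTerm first = query.get(0);
        for (int firstPos : index.getOrDefault(first.term(), List.of())) {
            // Offset that aligns the first query term with this occurrence.
            int start = firstPos - first.position();
            boolean allFound = true;
            for (QueryTerm qt : query) {
                List<Integer> positions = index.getOrDefault(qt.term(), List.of());
                if (!positions.contains(start + qt.position())) {
                    allFound = false;
                    break;
                }
            }
            if (allFound) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "java and lucene" parsed with position increments: the removed "and" leaves a hole.
        List<QueryTerm> gappedQuery = List.of(new QueryTerm("java", 0), new QueryTerm("lucene", 2));
        // The same phrase parsed with increments disabled: contiguous positions.
        List<QueryTerm> contiguousQuery = List.of(new QueryTerm("java", 0), new QueryTerm("lucene", 1));

        // Document indexed with increments: gap preserved.
        Map<String, List<Integer>> withGaps = Map.of("java", List.of(0), "lucene", List.of(2));
        // Document indexed with increments disabled: positions collapsed.
        Map<String, List<Integer>> collapsed = Map.of("java", List.of(0), "lucene", List.of(1));

        System.out.println("gapped query, gapped index: " + matches(withGaps, gappedQuery));        // true
        System.out.println("gapped query, collapsed index: " + matches(collapsed, gappedQuery));    // false
        System.out.println("contiguous query, collapsed index: " + matches(collapsed, contiguousQuery)); // true
    }
}
```

Either way the fix is to make both sides agree: reindex with the same filter settings the query analyzer uses, or configure the query analyzer to produce the same positions the existing index already has.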

-- Jack Krupansky

-----Original Message----- 
From: Jean-Claude Dauphin
Sent: Tuesday, December 10, 2013 4:21 AM
To: java-user@lucene.apache.org
Subject: Re: Why PhraseQuery translate stopwords to "?"

Thanks a lot Jack for this explanation!

I changed the custom query analyzer so that it no longer increments the
position of the subsequent term for each stop word, as follows:
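        // stop words removal
        StopFilter stopFilter = new StopFilter(Lucene.MATCH_VERSION,
                result,
                stopSet);
        // Needed to get rid of question mark placeholders for stopwords:
        // with position increments disabled, the tokens that remain after
        // stop-word removal get consecutive positions (no holes).
        stopFilter.setEnablePositionIncrements(false);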

And now the translation of stopwords to "?" is no longer done, and it works
as I expected, i.e.:
"Biology of fresh, brackish and saline water as it contributes to tropical
delta formation" is translated to:
Title:"BIOLOGY FRESH BRACKISH SALINE WATER CONTRIBUTES TROPICAL DELTA
FORMATION"

The problem with the stopword placeholder "?" query is that the search does
not find any results, while the query without "?" gives the correct results.
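The effect of that setting on positions can be sketched without Lucene. This is a minimal illustration, not Lucene's `StopFilter`. The class `StopPositionsSketch`, its `analyze` method and the stop set (OF, AND, AS, IT, TO) are hypothetical, and the upper-casing only assumes an analyzer like the one that produced the upper-case query above. With increments enabled, each removed stop word adds one to the next kept token's position increment. With them disabled, kept tokens get consecutive positions. The example is the sentence from this message (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical, Lucene-free sketch of stop-word removal with and without
 * position increments. Not Lucene's StopFilter; it only mimics the positions.
 */
public class StopPositionsSketch {

    record Token(String term, int position) {}

    // Hypothetical stop set, upper-cased to match the upper-casing below.
    static final Set<String> STOP_WORDS = Set.of("OF", "AND", "AS", "IT", "TO");
    static final String EXAMPLE =
        "Biology of fresh, brackish and saline water as it contributes to tropical delta formation";

    static List<Token> analyze(String text, Set<String> stopWords, boolean enablePositionIncrements) {
        List<Token> out = new ArrayList<>();
        int position = -1;  // position of the last emitted token
        int skipped = 0;    // stop words removed since the last emitted token
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;   // split yields a leading empty string if text starts with a separator
            }
            String term = raw.toUpperCase(Locale.ROOT);
            if (stopWords.contains(term)) {
                skipped++;
                continue;
            }
            // With increments enabled, every skipped stop word widens the gap.
            position += enablePositionIncrements ? 1 + skipped : 1;
            out.add(new Token(term, position));
            skipped = 0;
        }
        return out;
    }

    static String format(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.term()).append('@').append(t.position());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // BIOLOGY@0 FRESH@2 BRACKISH@3 SALINE@5 WATER@6 CONTRIBUTES@9 TROPICAL@11 DELTA@12 FORMATION@13
        System.out.println(format(analyze(EXAMPLE, STOP_WORDS, true)));
        // BIOLOGY@0 FRESH@1 BRACKISH@2 SALINE@3 WATER@4 CONTRIBUTES@5 TROPICAL@6 DELTA@7 FORMATION@8
        System.out.println(format(analyze(EXAMPLE, STOP_WORDS, false)));
    }
}
```

The second line has no holes, which is why the rewritten query contains no "?" and lines up with an index whose positions were collapsed the same way.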

Thanks again, I had been struggling with this issue for the last two days.

Jean-Claude Dauphin




On Mon, Dec 9, 2013 at 11:02 PM, Jack Krupansky
<jack@basetechnology.com> wrote:

> The analyzer is generating holes for the stop words - the position of the
> subsequent term is incremented an extra time for each stop word so that
> their positions are maintained.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Jean-Claude Dauphin
> Sent: Monday, December 09, 2013 4:15 PM
> To: java-user@lucene.apache.org
> Subject: Why PhraseQuery translate stopwords to "?"
>
>
> Hi,
>
> My application uses an analyzer with a StopFilter. PhraseQuery
> translates queries containing stopwords by replacing each stopword with a
> "?" character.
> For example, "Java and Lucene" is rendered as "Java ? Lucene" and "to
> contribute" as "? contribute". Sequences of terms are indexed
> without stopwords. Searching works when the stopword starts the
> phrase, but returns no results when the "?" is not at the beginning.
>
> Searching for phrases without stopwords works well.
>
> Any explanation/FAQ/user-list-message that explains why PhraseQuery
> translates stopwords to "?" would be appreciated.
>
> Thank you in advance
>
> Jean-Claude Dauphin
>
> --
> Jean-Claude Dauphin
>
> jc.dauphin@gmail.com
> jc.dauphin@afus.unesco.org
>
> http://kenai.com/projects/j-isis/
> http://www.unesco.org/isis/
> http://www.unesco.org/idams/
> http://www.greenstone.org
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> 
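The "?" in the question appears to be only how the parsed PhraseQuery is printed, not a literal term being searched for: each "?" marks a position that has no term, i.e. a hole left by a removed stop word. The sketch below is a hypothetical illustration, not Lucene's `toString` code. The class `PhraseRenderSketch` and its `Term` record are made up, and it assumes at most one term per position. It reproduces the two renderings reported in the question (Java 16+ for records).

```java
import java.util.List;

/**
 * Hypothetical sketch: render a positional phrase as text, printing "?"
 * for every position that has no term. Assumes one term per position.
 */
public class PhraseRenderSketch {

    record Term(String text, int position) {}

    static String render(List<Term> terms) {
        int maxPosition = -1;
        for (Term t : terms) {
            maxPosition = Math.max(maxPosition, t.position());
        }
        // One slot per position; slots left null are holes.
        String[] slots = new String[maxPosition + 1];
        for (Term t : terms) {
            slots[t.position()] = t.text();
        }
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < slots.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(slots[i] == null ? "?" : slots[i]);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // "and" removed, so Lucene sits at position 2: prints "Java ? Lucene"
        System.out.println(render(List.of(new Term("Java", 0), new Term("Lucene", 2))));
        // "to" removed, so contribute sits at position 1: prints "? contribute"
        System.out.println(render(List.of(new Term("contribute", 1))));
    }
}
```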

