lucene-java-user mailing list archives

From Jean-Claude Dauphin <jc.daup...@gmail.com>
Subject Re: Why PhraseQuery translate stopwords to "?"
Date Tue, 10 Dec 2013 09:21:11 GMT
Thanks a lot Jack for this explanation!

I changed the custom query analyzer so that it no longer increments the
position of the subsequent term for each stop word, as follows:
        // Stop word removal
        StopFilter stopFilter = new StopFilter(Lucene.MATCH_VERSION,
                result,
                stopSet);
        // Disable position increments so that removed stop words do not
        // leave "?" placeholders in the phrase query
        stopFilter.setEnablePositionIncrements(false);

Now stop words are no longer translated to "?" and the query works
as I expected, i.e.:
"Biology of fresh, brackish and saline water as it contributes to tropical
delta formation" is translated to:
Title:"BIOLOGY FRESH BRACKISH SALINE WATER CONTRIBUTES TROPICAL DELTA
FORMATION"

The problem with the "?" stop word placeholders is that the phrase search
does not find any results, while the same query without "?" gives the
correct results.
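
For anyone following the thread, here is a minimal plain-Java sketch of the
position bookkeeping Jack describes. It does not use Lucene at all: the class
name StopwordHoles, the stop set, and the helpers analyze, render and
phraseMatches are hypothetical and only simulate the behaviour under the
assumption that the index was built without position gaps. It is not Lucene's
implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java simulation of stop word position increments; not Lucene code.
class StopwordHoles {
    // Hypothetical stop set, for illustration only.
    static final Set<String> STOP =
            new HashSet<>(Arrays.asList("and", "to", "of", "as", "it"));

    static final class Token {
        final String term;
        final int position;
        Token(String term, int position) { this.term = term; this.position = position; }
    }

    // Drops stop words. With keepHoles, each removed stop word pushes the
    // position of the next kept term forward by one extra step.
    static List<Token> analyze(String text, boolean keepHoles) {
        List<Token> out = new ArrayList<>();
        int position = -1;
        int skipped = 0;
        for (String word : text.trim().toLowerCase().split("\\s+")) {
            if (STOP.contains(word)) {
                if (keepHoles) skipped++;
                continue;
            }
            position += 1 + skipped;
            skipped = 0;
            out.add(new Token(word, position));
        }
        return out;
    }

    // Renders the tokens as a phrase, printing "?" for every unused position.
    static String render(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        int expected = 0;
        for (Token t : tokens) {
            for (; expected < t.position; expected++) sb.append("? ");
            sb.append(t.term).append(' ');
            expected = t.position + 1;
        }
        return sb.toString().trim();
    }

    static boolean contains(List<Token> indexed, String term, int position) {
        for (Token t : indexed) {
            if (t.position == position && t.term.equals(term)) return true;
        }
        return false;
    }

    // Exact phrase match: only the offsets relative to the first query term
    // matter, so a hole at the start of the phrase changes nothing.
    static boolean phraseMatches(List<Token> indexed, List<Token> query) {
        if (query.isEmpty()) return false;
        Token first = query.get(0);
        for (Token start : indexed) {
            if (!start.term.equals(first.term)) continue;
            boolean all = true;
            for (Token q : query) {
                int wanted = start.position + (q.position - first.position);
                if (!contains(indexed, q.term, wanted)) { all = false; break; }
            }
            if (all) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> indexed = analyze("Java and Lucene", false);   // no gaps in the index
        List<Token> withHoles = analyze("Java and Lucene", true);  // query keeps the gap
        System.out.println(render(withHoles));
        System.out.println(phraseMatches(indexed, withHoles));
        System.out.println(phraseMatches(indexed, analyze("Java and Lucene", false)));
    }
}
```

In this simulation the query with a hole in the middle does not match an
index built without holes, while a query whose only hole is at the start
still matches, because only the relative offsets between terms are compared.
That is consistent with what I observed, though the real matching logic in
Lucene may differ in its details.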

Thanks again, I had been struggling with this issue for the last two days.

Jean-Claude Dauphin




On Mon, Dec 9, 2013 at 11:02 PM, Jack Krupansky <jack@basetechnology.com> wrote:

> The analyzer is generating holes for the stop words - the position of the
> subsequent term is incremented an extra time for each stop word so that
> their positions are maintained.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Jean-Claude Dauphin
> Sent: Monday, December 09, 2013 4:15 PM
> To: java-user@lucene.apache.org
> Subject: Why PhraseQuery translate stopwords to "?"
>
>
> Hi,
>
> My application uses an analyzer with a StopWordFilter. PhraseQuery
> translates queries containing stop words by replacing each stop word with
> a "?" character. For example, "Java and Lucene" becomes "Java ? Lucene" and
> "to contribute" becomes "? contribute". Sequences of terms are indexed
> without stop words. Searching works when the stop word starts the
> phrase, but returns no results when the "?" is not at the beginning.
>
> Searching for phrases without stopwords works well.
>
> Any explanation/FAQ/user-list-message that explains why PhraseQuery
> translates stop words to "?" would be appreciated.
>
> Thank you in advance
>
> Jean-Claude Dauphin
>
> --
> Jean-Claude Dauphin
>
> jc.dauphin@gmail.com
> jc.dauphin@afus.unesco.org
>
> http://kenai.com/projects/j-isis/
> http://www.unesco.org/isis/
> http://www.unesco.org/idams/
> http://www.greenstone.org
>
