lucene-dev mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Help in resolving the below retrieval issue
Date Tue, 10 Sep 2013 13:39:39 GMT
Try adding &debug=query to the URL. I think you'll find that you're running
into a common issue: the difference between query parsing and analysis.

When you submit anything with whitespace in it, the query parser breaks it
up _before_ it gets to the analysis part. You should see something in the
debug section of the response like
field:rahul field:kumar and possibly even field:-
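
For illustration, here is a minimal Python sketch of building such a request URL. The host, port, core name (collection1) and the wt=json parameter are assumptions for the example, not details from the original question; only debug=query comes from the advice above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port, and core name to your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_debug_url(query, base_url=SOLR_SELECT_URL):
    """Return a select URL that asks Solr to include the parsed query in its debug output."""
    params = {
        "q": query,        # the raw user query, URL-encoded by urlencode
        "debug": "query",  # request the query-parsing section of the debug output
        "wt": "json",      # assumed response format, chosen only for readability
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_debug_url("Rahul - kumar"))
```

Opening the printed URL in a browser should show how the parser split the input before analysis ran.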

These are searched as separate tokens. By specifying KeywordTokenizer, at
index time you'll have exactly one token, rahul-kumar, in the index, which
will not match any of the separated tokens.
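
The ordering problem can be shown with a toy Python model. This is a deliberate simplification and not Solr's actual code: parser_split and keyword_analyze are hypothetical helpers, and the model covers only whitespace splitting, KeywordTokenizer and lowercasing. It leaves out the charFilter, WordDelimiterFilter and EdgeNGram filters from the quoted configuration, so it does not predict which documents really match.

```python
def parser_split(query):
    """Toy stand-in for the query parser: split unescaped, unquoted input on whitespace."""
    return query.split()


def keyword_analyze(text):
    """Toy stand-in for KeywordTokenizer plus lowercasing: one token for the whole input."""
    return [text.lower()]


# Index time: the whole field value goes through analysis as a single unit.
indexed = keyword_analyze("Rahul - kumar")

# Query time: the parser splits first, then each chunk is analyzed on its own.
queried = [tok for chunk in parser_split("Rahul - kumar") for tok in keyword_analyze(chunk)]

print(indexed)                                 # ['rahul - kumar']
print(queried)                                 # ['rahul', '-', 'kumar']
print(any(tok in indexed for tok in queried))  # False
```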

Try escaping the spaces with a backslash. You could also try quoting the
input, although that has some phrase implications.
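
Both options can be sketched in Python as follows. The helper names escape_whitespace and quote_phrase are made up for this example. Note that '-' is itself a special character in Lucene query syntax, so depending on where it appears it may need a backslash as well; this sketch only handles whitespace and quoting.

```python
def escape_whitespace(text):
    """Backslash-escape each whitespace character so the parser keeps the input as one term."""
    return "".join("\\" + ch if ch.isspace() else ch for ch in text)


def quote_phrase(text):
    """Wrap the input in double quotes, escaping embedded backslashes and quotes first."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


if __name__ == "__main__":
    print(escape_whitespace("Rahul - kumar"))  # Rahul\ -\ kumar
    print(quote_phrase("Rahul - kumar"))       # "Rahul - kumar"
```

Either result is what would be sent as the q parameter. The escaped form reaches analysis as a single term, while the quoted form is treated as a phrase.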

Do you really want this search to fail on just searching "rahul", though?
Perhaps KeywordTokenizer isn't the best choice here; it depends on your
use case...

Best,
Erick


On Tue, Sep 10, 2013 at 8:10 AM, Prathik Puthran <
prathik.puthran87@gmail.com> wrote:

> Hi,
>
> I am facing an issue wherein Solr does not retrieve the indexed word in
> some cases.
>
> This happens whenever the indexed word contains the string " - " (quotes
> for clarity) as a substring, i.e. a prefix followed by a space, then '-',
> then another space, then the rest of the word.
> When I search with that exact string as the query, Solr returns no
> results.
>
> Example:
> Indexed word --> "Rahul - kumar"  (quotes for clarity)
> If I search with the search query as below Solr gives no results
> Search query --> "Rahul - kumar"  (quotes for clarity)
>
> However the below search query returns the results
> Search query --> "Rahul kumar"
>
> Can you please let me know what I am doing wrong here and what should I do
> to ensure the first query i.e. "Rahul - kumar" returns the documents
> indexed using it.
>
> Below are the analyzers I am using:
> Index time analyzer components:
> 1) <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="([^A-Za-z0-9 ])" replacement=""/>
>  2) <tokenizer class="solr.KeywordTokenizerFactory"/>
>  3) <filter class="solr.LowerCaseFilterFactory"/>
>  4) <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> preserveOriginal="1"/>
>  5) <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
> maxGramSize="50" side="front"/>
>  6) <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
> maxGramSize="50" side="back"/>
>
> Query time analyzer components:
>  1) <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="([^A-Za-z0-9 ])" replacement=""/>
>  2) <tokenizer class="solr.KeywordTokenizerFactory"/>
>  3) <filter class="solr.LowerCaseFilterFactory"/>
>  4) <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> preserveOriginal="1"/>
>
>
> Can you please let me know how I can fix this?
>
> Thanks,
> Prathik
>
>
