lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: Searching for tokens does not return any results
Date Thu, 01 May 2014 14:19:12 GMT
Hi Yetkin,

You are on the right track by examining the analysis page. How is your query analyzed by the query
analyzer?

According to what you pasted, q=CRD should return your example document.

Did you change something in schema.xml and forget to restart Solr and re-index?

By the way, the lowercase tokenizer, which is a simple letter tokenizer that also lowercases, seems
a better fit for your use case. With it you don't have to deal with WDF's parameters.

https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-LowerCaseTokenizer
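
A minimal sketch of what such a field type could look like, assuming a Solr version that still ships
solr.LowerCaseTokenizerFactory. The type name "text_letters_lower" is hypothetical and not part of
the original schema; the field would need to reference it and the data would need to be re-indexed.

```xml
<!-- Sketch only: "text_letters_lower" is a hypothetical name, not from the original schema. -->
<fieldType name="text_letters_lower" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer applies at both index and query time, so both sides emit the same tokens. -->
  <analyzer>
    <!-- Splits on every non-letter character (including "_") and lowercases each token. -->
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With this analysis, CRD_PROD would be indexed as the tokens crd and prod. Note that a letter-based
tokenizer also discards digits, so it only fits values made of letters separated by other characters.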

Ahmet





On Thursday, May 1, 2014 5:04 PM, Yetkin Ozkucur <Yetkin.Ozkucur@asg.com> wrote:
Hello everyone,

I am new to Solr and this is my first post to this list.
I have been working on this problem for a couple of days. I tried everything I could find
on Google, but it looks like I am missing something.

Here is my problem:
I have a field called: DBASE_LOCAT_NM_TEXT
It contains values like: CRD_PROD
The goal is to be able to search this field either by putting the exact string "CRD_PROD"
or part of it (tokenized by "_")  like "CRD" or "PROD"

Currently: 
This query returns results: q=DBASE_LOCAT_NM_TEXT:CRD_PROD
But this does not: q=DBASE_LOCAT_NM_TEXT:CRD
I want to understand why the second query does not return any results.

Here is how I configured the field:
<field name="DBASE_LOCAT_NM_TEXT" type="text_general" indexed="true" stored="true"
       required="false" multiValued="false"/>

And here is how I configured the field type:
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="1"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"
                splitOnCaseChange="1"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

I am also using the analysis panel in the SOLR admin console. It shows this:
WT    CRD_PROD

WDF    CRD_PROD
    CRD
    PROD
    CRDPROD

SF    CRD_PROD
    CRD
    PROD
    CRDPROD

LCF    crd_prod
    crd
    prod
    crdprod

SKMF    crd_prod
    crd
    prod
    crdprod

RDTF    crd_prod
    crd
    prod
    crdprod


I am not sure whether it is related, but this index was created by a Java program using the
Lucene API. It used StandardAnalyzer for writing, and the field was configured as tokenized,
indexed and stored. Does this affect the Solr configuration?
    
Can you please help me understand what I am missing and how I can debug it?

Thanks,
Yetkin 
