lucene-solr-user mailing list archives

From Koji Sekiguchi <k...@r.email.ne.jp>
Subject Re: Searching for tokens does not return any results
Date Thu, 01 May 2014 14:31:12 GMT
Hi Yetkin, welcome!

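I think the problem you are facing is Lucene's StandardAnalyzer. You wrote the index with it, and as far as I know it keeps CRD_PROD as a single token, so a search for CRD alone has nothing to match. The analysis page shows what text_general would produce, not what is actually in your index.

Why don't you add another field that uses StandardAnalyzer and check how it tokenizes CRD_PROD
in the Solr admin GUI?

I forget the details, but you can use a Lucene Analyzer directly in schema.xml, something like this: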

<fieldType ...>
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</fieldType>
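
To illustrate the mismatch, here is a minimal sketch in plain Java with no Lucene dependency. It is only a rough model: the regex assumes that StandardAnalyzer treats letters, digits and "_" as one word and then lowercases it, which is a simplification of the real tokenizer. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnderscoreTokenSketch {
    // Assumption: a simplified stand-in for StandardAnalyzer, NOT the real
    // Lucene tokenizer. Letters, digits and '_' stay together in one token.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}_]+");

    // Returns the lowercased tokens this simplified model would index.
    public static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> indexed = standardLike("CRD_PROD");
        System.out.println("indexed terms: " + indexed);
        // The query analyzer turns CRD into "crd"; that term is not in the index.
        System.out.println("crd matches: " + indexed.contains("crd"));
        // The preserved original "crd_prod" is in the index, so CRD_PROD matches.
        System.out.println("crd_prod matches: " + indexed.contains("crd_prod"));
    }
}
```

Under this model the index holds only the term crd_prod, which would explain why q=DBASE_LOCAT_NM_TEXT:CRD_PROD returns results and q=DBASE_LOCAT_NM_TEXT:CRD does not. The admin analysis page on a StandardAnalyzer field is still the authoritative check.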

Koji
-- 
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html

(2014/05/01 23:04), Yetkin Ozkucur wrote:
> Hello everyone,
>
> I am new to SOLR and this is my first post in this list.
> I have been working on this problem for a couple of days. I tried everything which I found in Google but it looks like I am missing something.
>
> Here is my problem:
> I have a field called: DBASE_LOCAT_NM_TEXT
> It contains values like: CRD_PROD
> The goal is to be able to search this field either by putting the exact string "CRD_PROD" or part of it (tokenized by "_") like "CRD" or "PROD"
>
> Currently:
> This query returns results: q=DBASE_LOCAT_NM_TEXT:CRD_PROD
> But this does not: q=DBASE_LOCAT_NM_TEXT:CRD
> I want to understand why the second query does not return any results
>
> Here is how I configured the field:
> <field name="DBASE_LOCAT_NM_TEXT" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/>
>
> And here is how I configured the field type:
>      <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>        <analyzer type="index">
>          <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>        </analyzer>
>        <analyzer type="query">
>          <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>        </analyzer>
>      </fieldType>
>
> I am also using the analysis panel in the SOLR admin console. It shows this:
> WT	CRD_PROD
>
> WDF	CRD_PROD
> 	CRD
> 	PROD
> 	CRDPROD
>
> SF	CRD_PROD
> 	CRD
> 	PROD
> 	CRDPROD
>
> LCF	crd_prod
> 	crd
> 	prod
> 	crdprod
>
> SKMF	crd_prod
> 	crd
> 	prod
> 	crdprod
>
> RDTF	crd_prod
> 	crd
> 	prod
> 	crdprod
>
>
> I am not sure if it is related or not but this index was created using a Java program using Lucene interface. It used StandardAnalyzer for writing and the field was configured as tokenized, indexed and stored.  Does this affect the SOLR configuration?
> 	
> Can you please help me understand what I am missing and how I can debug it?
>
> Thanks,
> Yetkin
>



