lucene-solr-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: WordDelimiterFilterFactory - tokenizer question
Date Sun, 05 Apr 2015 13:23:10 GMT
You have to tell the filter what types of tokens to generate - words,
numbers. You told it to generate... nothing. You did tell it to preserve
the original, unfiltered token though, which is fine.
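
As a rough illustration of the point above, the index-time filter could be declared along these lines. This is only a sketch: it reuses the attribute names from the quoted fieldType below and flips generateWordParts and generateNumberParts to "1". Whether any of the catenate options should also be enabled depends on the matching behavior wanted, and the query analyzer would presumably need the same change.

```xml
<!-- Sketch only: same filter as in the quoted definition,
     but asked to emit word parts and number parts -->
<filter class="solr.WordDelimiterFilterFactory"
        splitOnNumerics="1" preserveOriginal="1"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0"/>
```

With these settings a lowercased token such as us100a should come out as us, 100 and a, plus the preserved original us100a. The Analysis screen in the Solr admin UI is a convenient way to check what the filter actually emits.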

-- Jack Krupansky

On Sun, Apr 5, 2015 at 3:39 AM, Mike L. <javaone123@yahoo.com.invalid>
wrote:

> Solr User Group,
> I have a non-multivalued field that contains stored values similar to
> this:
>
> US100A
> US100B
> US100C
> US100-D
> US100BBA
>
> My assumption was that tokenizing with the fieldType definition below,
> specifically the WDF (splitOnNumerics) and the LowerCaseFilterFactory,
> would have given me Solr matches for the following queries:
>
> ?q=US 100
> ?q=US100
>
> across all of the field values. In other words, US100A, US100B, US100C and
> US100-D would all have matched and been scored against my qf weights.
> However, I'm not seeing that behavior. I have tried various combinations
> and am starting to question my assumptions about the tokenizer.
>
> Ideally, I would like to return all values (US100A, US100B, US100C,
> US100-D) when, for example, q=US100A is searched on this field.
>
> I know I should probably provide the debugQuery results, but I was hoping
> this would be a quick one for somebody, and I'm also reindexing at the
> moment. WordDelimiterFilterFactory doesn't seem to be working as expected.
> I'm hoping for some clarification, or for someone to point out anything
> that sticks out here.
>
> Below is the field type definition being used:
> <fieldType name="field_tokenized" class="solr.TextField" omitNorms="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnNumerics="1" preserveOriginal="1"
>             generateWordParts="0" generateNumberParts="0"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"/>
>   </analyzer>
>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnNumerics="1"
>             generateWordParts="0" generateNumberParts="0"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"/>
>   </analyzer>
> </fieldType>
>
>
> Thanks in advance.
> Mike
>
