lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: matching "starts with" only
Date Wed, 09 Oct 2013 19:45:43 GMT
On 10/9/2013 12:57 PM, adm1n wrote:
> My index contains documents which could be a single word or a short sentence
> which contains up to 4-5 words. I need to return documents, which "starts
> with" only from the searched pattern.
> in regex it would be [^my_query].
>
> for example, for a docs:
>
> black
> beautiful black cat
> cat
> cat is black
> black cat
>
> and for the query: "black"
>
> only "black" and "black cat" should be returned.
>
> The text field I'm using is as follows:
> <fieldType name="text_general_aa" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer type="index">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.NGramFilterFactory" minGramSize="4"
> maxGramSize="15" side="front"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>        </analyzer>
>        <analyzer type="query">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.NGramFilterFactory" minGramSize="4"
> maxGramSize="15" side="front"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>        </analyzer>
>      </fieldType>
> Solr version is 4.2
>
> thanks!

The presence of either the whitespace tokenizer or the NGram filter
makes this impossible, because each of them breaks the indexed value
into smaller pieces.  Together, they *really* break things up.  Matching
is done on a per-term basis, and these two components in your analysis
chain ensure that "black" will be a term for all of those input
documents that contain it, whether it appears at the beginning, middle,
or end.

If you set up a copyField to a new field whose fieldType uses the
Keyword tokenizer (which treats the entire string as a single token) and
the lowercase filter, you would be able to use the regex support in Solr
4.x and have this as your query string:

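newfield:/black.*/

Lucene regular expressions are always matched against the whole term and
do not support the ^ and $ anchors, so the pattern is written as
black.* rather than ^black.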
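
A minimal sketch of the schema.xml additions described above might look
like the following.  The fieldType name "text_exact_lc" and the source
field name "your_text_field" are placeholders I made up for illustration;
"newfield" is the field name used in the query.  Adjust the indexed and
stored attributes to suit your setup, and reindex after making the change.

```xml
<!-- Hypothetical fieldType: keeps the whole value as ONE lowercased token -->
<fieldType name="text_exact_lc" class="solr.TextField">
  <analyzer>
    <!-- KeywordTokenizer emits the entire input string as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercase so that "Black cat" is indexed as "black cat" -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The new field that the regex query runs against -->
<field name="newfield" type="text_exact_lc" indexed="true" stored="false"/>

<!-- "your_text_field" is a placeholder for the existing field being searched -->
<copyField source="your_text_field" dest="newfield"/>
```

With this in place, "beautiful black cat" is indexed in newfield as the
single term "beautiful black cat", so a pattern that has to begin with
"black" cannot match it, while "black" and "black cat" can.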

Thanks,
Shawn