lucene-solr-user mailing list archives

From Peter S <pete...@hotmail.com>
Subject RE: Non-leading wildcard search
Date Mon, 04 Jan 2010 23:47:48 GMT

FYI:

 

I have found the root of this behaviour. It is caused by a test patch I've been working on
to work 'round the pre-SOLR-219 behaviour (SOLR-219 covers case-insensitive wildcard searching).

With the test patch switched out, it works as expected, although case-insensitive wildcard
search reverts to pre-SOLR-219 behaviour.

 

I believe I can work 'round this by using a copyField that holds the lower-case text for wildcarding.
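
For illustration, a minimal sketch of that copyField approach might look like the following. The names appname_lc and text_lc_verbatim are hypothetical, and the "string" type is assumed to be defined as a solr.StrField elsewhere in the schema; appname is simply the field mentioned in the quoted message below.

```xml
<!-- Sketch only: appname_lc and text_lc_verbatim are hypothetical names,
     and "string" is assumed to be a solr.StrField type defined elsewhere. -->

<!-- Whole value kept as a single term, lower-cased at index time -->
<fieldType name="text_lc_verbatim" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Original field: case preserved, usable for display and faceting -->
<field name="appname" type="string" indexed="true" stored="true"/>

<!-- Lower-case copy: indexed only, used for wildcard queries -->
<field name="appname_lc" type="text_lc_verbatim" indexed="true" stored="false"/>

<copyField source="appname" dest="appname_lc"/>
```

With a layout like this, a wildcard query would target the copy (e.g. appname_lc:some*), with the client lower-casing the wildcard term itself, on the assumption that wildcard terms are not run through the query analyzer. Faceting on appname rather than appname_lc should then show values with their original case.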

 

Many thanks, Yonik, for your help.

 

Peter

 


 
> From: peter4u@hotmail.com
> To: solr-user@lucene.apache.org
> Subject: RE: Non-leading wildcard search
> Date: Mon, 4 Jan 2010 23:29:04 +0000
> 
> 
> Hi Yonik,
> 
> 
> 
> Thanks for your quick reply.
> 
> No, the queries themselves aren't in quotes.
> 
> 
> 
> Since I sent the initial email, I have managed to get non-leading wildcard queries to work with this, but by unexpected means (for me at least :-).
> 
> 
> 
> If I add a LowerCaseFilterFactory to the fieldType, queries like s* (or S*) work as expected.
> 
> 
> 
> So the fieldType schema element now looks like:
> 
> <fieldType name="text_verbatim" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>   </analyzer>
> </fieldType>
> 
> 
> 
> I wasn't expecting this, as I would have thought this would change only the case behaviour, not the wildcard behaviour (or at least not just the non-leading wildcard behaviour). Perhaps I'm just not understanding how the terms (a single term in this case, as the field is not tokenized) are indexed and subsequently matched.
> 
> 
> 
> What I've noticed is that with the LowerCaseFilterFactory in place, document queries return results with case intact, but facet queries show the results in lower-case
> 
> (e.g. document->appname=Something facet.field.appname=something). (I kind of expected the document->appname field to be lower case as well)
> 
> 
> 
> Does this sound like correct behaviour to you?
> 
> If it's correct, that's ok, I'll manage to work 'round it (maybe there's a way to map the facet field back to the document field?), but if it sounds wrong, perhaps it warrants further investigation.
> 
> 
> 
> Many thanks,
> 
> Peter
> 
> 
> 
> 
> 
> > Date: Mon, 4 Jan 2010 17:42:30 -0500
> > Subject: Re: Non-leading wildcard search
> > From: yonik@lucidimagination.com
> > To: solr-user@lucene.apache.org
> > 
> > On Mon, Jan 4, 2010 at 5:38 PM, Peter S <peter4u@hotmail.com> wrote:
> > > When I query: "Something" or "Something Else" or "*thing" or "*omething*", I get back the expected results.
> > > If, however, I query: "Some*" or "S*" or "s*" etc, I get no results (although this type of non-leading wildcard works fine with other fieldType schema elements that don't use KeywordTokenizer).
> > 
> > Is your query string actually in quotes? Wildcards aren't currently
> > supported in quotes.
> > So text_verbatim:Some* should work.
> > 
> > -Yonik
> > http://www.lucidimagination.com
> 