lucene-solr-user mailing list archives

From hermida <>
Subject Re: how to do auto-suggest case-insensitive match and return original case field values
Date Tue, 08 Dec 2009 21:29:35 GMT


Thanks for the reply (see below)

hossman wrote:
> The type of approach you are describing (doing a prefix based query for 
> autosuggest) probably won't work very well unless your index is 100% 
> designed just for the autosuggest ... if it's an index about products, and 
> you're just using one of the fields for autosuggest, you aren't going to 
> get good autosuggest results because the same word is going to appear in 
> multiple products.  what you need is an index of *words* that you want to 
> autosuggest, with fields indicating how important those words are that you 
> can use in a function query (this replaces the term freq that 
> TermsComponent would use)
> the fact that your "test" field is multivalued and stores wildly different 
> things in each doc is an example of what i mean.

I am using Solr to index biological annotations about proteins (which are my
documents). There is no tokenization or special analysis of the annotation
text strings because they are not free text; each annotation is a single
token.  Also, for the purposes of my auto-suggest and search there are
actually no different types of annotations, which is why they all go into the
same multivalued field for each protein document.  I want to use the
auto-suggest and search to help biologists (who know the annotation
terminology) find all the protein documents with the annotation they are
thinking of, and to suggest what is available as they type.  The thing is
that in my field letter case can be important in defining the meaning of an
annotation, but the biologist might not remember the exact case.  Therefore I
want them to be able to type in whatever case they like and have the
auto-suggest pull up, as they type, annotations with the correct case.

Let's just take the fundamental question, independent of any example:  is it
possible to do a case-insensitive prefix search using faceting (to get the
term suggestions) that also returns the original mixed-case forms of *all*
the terms listed in lowercase in the facet list?  In the only other post I
saw in this forum on this topic, a user seemed to think this was easily
doable, but I don't think they actually tried it, because it doesn't seem
possible with a faceted search; you run into all these problems.  It just
isn't something Solr/Lucene can actually do the way it is organized.

hossman wrote:
> Have you considered the possibility of just indexing the lowercase value 
> concatenated with the regular case value using a special delimiter, and 
> then returning to your TermsComponent based solution?  index "PowerPoint" 
> as "powerpoint|PowerPoint" and just split on the "|" character when you 
> get the data back from your prefix based term lookup.

I think this is a good workaround, will definitely try it!
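
For reference, here is a minimal sketch of that workaround in Python, not a
tested Solr setup. It assumes the "|" delimiter never occurs inside an
annotation, and it simulates the prefix-based term lookup with a sorted
in-memory list instead of calling Solr. The function names and the sample
annotation values are made up for illustration.

```python
from bisect import bisect_left

DELIMITER = "|"  # assumption: never occurs inside an annotation


def encode_term(original):
    """Build the value to index: lowercase key, delimiter, original case."""
    return original.lower() + DELIMITER + original


def decode_term(indexed):
    """Recover the original-case value from an indexed term."""
    return indexed.split(DELIMITER, 1)[1]


def suggest(sorted_terms, typed, limit=10):
    """Simulate a prefix term lookup over sorted, encoded terms.

    The typed text is lowercased so the match is case-insensitive, and
    each hit is decoded back to its original case before it is returned.
    """
    prefix = typed.lower()
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    hits = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(hits) >= limit:
            break
        hits.append(decode_term(term))
    return hits


# Hypothetical sample annotations; case variants are distinct terms.
annotations = ["PowerPoint", "powerpoint", "POU5F1", "Pou5f1", "p53"]
index = sorted({encode_term(a) for a in annotations})

print(suggest(index, "POU"))  # ['POU5F1', 'Pou5f1']
```

Because the lowercase key comes first in each indexed value, all case
variants of an annotation sort next to each other, so a single lowercase
prefix finds them all and the part after the delimiter gives back the
original case.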

