lucene-java-user mailing list archives

From Michael Sokolov <msoko...@safaribooksonline.com>
Subject Re: Case Insensitive Matching in Solr/Lucene
Date Tue, 25 Nov 2014 12:44:41 GMT
The index size will not increase as quickly as you might think, and is 
not an issue in most cases.  An alternative to two fields, though, is to 
index both upper- and lower-case tokens at the same position in a single 
field, and then to perform no case folding at query time.  There is no 
standard analysis component that does this, but see LUCENE-5620 for more 
discussion; the ticket describes a component that will get you there.
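
To make the single-field idea concrete, here is a minimal sketch in plain Java. It is a toy in-memory inverted index, not Lucene's API and not the component from LUCENE-5620; the class name DualCaseIndex and its methods are made up for illustration. Each token is posted in its original form and, when different, in its lowercased form at the same position, and lookups do no case folding.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Toy inverted index (hypothetical, not a Lucene class): original and lowercased tokens share a position. */
final class DualCaseIndex {
    // term -> (docId -> positions at which the term occurs in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Splits on whitespace and posts each token as-is plus its lowercased variant. */
    public void add(int docId, String text) {
        String[] tokens = text.trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            String original = tokens[pos];
            String lower = original.toLowerCase(Locale.ROOT);
            post(original, docId, pos);
            if (!lower.equals(original)) {
                // Same position as the original token (a position increment of 0).
                post(lower, docId, pos);
            }
        }
    }

    private void post(String term, int docId, int pos) {
        postings.computeIfAbsent(term, t -> new HashMap<>())
                .computeIfAbsent(docId, d -> new ArrayList<>())
                .add(pos);
    }

    /** Exact term lookup with no case folding; returns matching doc ids in ascending order. */
    public List<Integer> search(String term) {
        Map<Integer, List<Integer>> docs = postings.get(term);
        if (docs == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(new TreeSet<>(docs.keySet()));
    }

    /** Positions of a term within one document, in indexing order. */
    public List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> docs = postings.get(term);
        if (docs == null || !docs.containsKey(docId)) {
            return new ArrayList<>();
        }
        return new ArrayList<>(docs.get(docId));
    }
}
```

With the two documents from the original question ("The quick brown fox jumped over the lazy dog" as doc 1 and "Quick brown foxes leap over lazy dogs in summer" as doc 2), search("quick") returns both documents while search("Quick") returns only doc 2. In this sketch a lowercase query therefore behaves case-insensitively and a capitalized query matches only that exact form. The cost is one extra posting per token that contains an uppercase letter, rather than a full second field.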

-Mike

On 11/25/14 6:52 AM, Apurv Verma wrote:
> Hi Ahmet,
>   Thanks for your reply. Creating two separate fields, where one contains
> the original value and the other contains the lowercased value, is a viable
> solution. But this bloats the index (~2x).
> I am looking for alternative solutions.
>
>
> --
> Regards,
> Apurv Verma
>
>
>
> On Tue, Nov 25, 2014 at 5:15 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
> wrote:
>
>> Hi Apurv,
>>
>> You can create an additional field for case sensitive search, and then you
>> can switch at query time. You will have two fields (text_ci and text_lower)
>> with different analysers populated with copyField.
>>
>> Ahmet
>>
>>
>> On Tuesday, November 25, 2014 1:39 PM, Apurv Verma <apurv@bloomreach.com>
>> wrote:
>> Hey all,
>> The standard solution for case-insensitive matching in Lucene is to
>> use a lowercase filter at index and query time. However, this does not
>> preserve the content of the original document. For example, suppose my
>> inverted index is:
>>
>> Term    | Doc_1 | Doc_2
>> ------------------------
>> Quick   |       |  X
>> The     |   X   |
>> brown   |   X   |  X
>> dog     |   X   |
>> dogs    |       |  X
>> fox     |   X   |
>> foxes   |       |  X
>> in      |       |  X
>> jumped  |   X   |
>> lazy    |   X   |  X
>> leap    |       |  X
>> over    |   X   |  X
>> quick   |   X   |
>> summer  |       |  X
>> the     |   X   |
>> ------------------------
>>
>> Is it possible to choose between case-insensitive and case-sensitive
>> matching at query time? The index is stored in memory in Solr. My question
>> is: if it is stored as a hash map with string keys, can I override hashCode
>> so that "Quick" and "quick" return the same hash value?
>>
>> Has anyone attempted this before? Is my assumption about the index right?
>> Which classes and code flow should I look at?
>>
>> --
>> Regards,
>> Apurv
>>



