lucene-java-user mailing list archives

From "Alex Wang" <AW...@sharedvision.com>
Subject RE: Lucene sorting case-sensitive by default?
Date Mon, 14 Jan 2008 16:48:18 GMT
Thanks a lot Erick for the great tip! I do need to display all the fields
and allow the users to sort by each field as they wish. My index is
currently about 200 MB.

Your suggestion about storing (but not indexing) the cased version, and
indexing (but not storing) the lower-case version is an excellent
solution for me.

Is it possible to do it in the same field or do I have to do it in two
separate fields? If I do it in one field, which Lucene classes/methods
would I need to override?

Thanks again for your help!

Alex

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Monday, January 14, 2008 11:24 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene sorting case-sensitive by default?

Several things:

1> Do you need to display all the fields? Would just storing them
lower-case work? The only time I've needed to store fields
case-sensitive is when I'm showing them to the user. If the user is
just searching on them, I can store them any way I want and she'll
never know.

2> You might very well be surprised at how little extra it takes to
index (but not store) the lower-case version. How big is your index
anyway? And be warned that the size increase is not linear, so
just comparing the index sizes for, say, 10 documents is misleading.
If your index is 10M, there's no reason at all not to store twice.
If it's 10G........

3> You could store (but not index) the cased version. You could
index (but not store) the lower-case version. The total size of
your index is (I believe) about the same as indexing AND storing
the fields. That gives you a way to search caselessly and display
case-sensitively.
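
To make the idea in 3> concrete, here is a minimal sketch in plain Java.
It deliberately uses no Lucene classes: the Entry class and the
sortForDisplay method are hypothetical names invented for illustration,
standing in for a document that carries a cased value for display and a
lower-cased copy used only for ordering. How this maps onto actual
Lucene fields depends on the Lucene version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class CaselessSortSketch {
    /** Hypothetical stand-in for one document: display value plus sort key. */
    static final class Entry {
        final String stored;   // cased original, kept only for display
        final String sortKey;  // lower-cased copy, used only for ordering

        Entry(String value) {
            this.stored = value;
            this.sortKey = value.toLowerCase(Locale.ROOT);
        }
    }

    /** Orders by the lower-cased key, but returns the cased values. */
    static List<String> sortForDisplay(List<String> values) {
        List<Entry> entries = new ArrayList<>();
        for (String v : values) {
            entries.add(new Entry(v));
        }
        // List.sort is stable, so equal keys keep their input order.
        entries.sort(Comparator.comparing((Entry e) -> e.sortKey));

        List<String> out = new ArrayList<>();
        for (Entry e : entries) {
            out.add(e.stored);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("banana", "Apple", "cherry", "Banana");
        System.out.println(sortForDisplay(input));
        // prints [Apple, banana, Banana, cherry]
    }
}
```

The point of the sketch is only that the value the user sees and the
value the sort compares do not have to be the same string.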

Best
Erick

On Jan 14, 2008 10:58 AM, Alex Wang <AWang@sharedvision.com> wrote:

> Thanks everyone for your replies! Guess I did not fully understand the
> meaning of "natural order" in the Lucene Java doc.
>
> To add another all-lower-case field for each sortable field in my
> index is a little too much, since the app requires sorting on pretty
> much all fields (over 100).
>
> Toke, you mentioned "Using a Collator works but does take a fair
> amount of memory", can you please elaborate a little more on that.
> Thanks.
>
> Alex
>
> -----Original Message-----
> From: Toke Eskildsen [mailto:te@statsbiblioteket.dk]
> Sent: Monday, January 14, 2008 3:13 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene sorting case-sensitive by default?
>
> On Fri, 2008-01-11 at 11:40 -0500, Alex Wang wrote:
> > Looks like Lucene is separating upper case and lower case while
> > sorting.
>
> As Tom points out, default sorting uses natural order. It's worth
> noting that this implies that default sorting does not produce usable
> results as soon as you use non-ASCII characters. Using a Collator
> works but does take a fair amount of memory.
>
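
The difference Toke describes can be seen in plain Java, assuming that
"natural order" here behaves like String.compareTo (comparison by
UTF-16 code unit). The sketch below is not Lucene code; the class and
method names are made up for illustration. It sorts the same list once
in natural order and once with a java.text.Collator.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SortOrderSketch {
    /** Sorts a copy using String.compareTo (code unit order). */
    static List<String> naturalOrder(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(null); // null comparator means the elements' natural ordering
        return copy;
    }

    /** Sorts a copy using the locale's collation rules. */
    static List<String> collated(List<String> values, Locale locale) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> ascii = Arrays.asList("apple", "Zebra", "banana");
        System.out.println(naturalOrder(ascii));             // [Zebra, apple, banana]
        System.out.println(collated(ascii, Locale.ENGLISH)); // [apple, banana, Zebra]

        // With a non-ASCII first letter (U+00C4), code unit order pushes the
        // word past every ASCII letter, which is the problem described above.
        List<String> mixed = Arrays.asList("banana", "\u00C4rger", "apple");
        System.out.println(naturalOrder(mixed));
        System.out.println(collated(mixed, Locale.GERMAN));
    }
}
```

Upper-case ASCII letters have smaller code units than lower-case ones,
which is why "Zebra" sorts ahead of "apple" in natural order.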
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
