lucene-dev mailing list archives

From "Steven Rowe (JIRA)" <>
Subject [jira] Commented: (LUCENE-1435) CollationKeyFilter: convert tokens into CollationKeys encoded using IndexableBinaryStringTools
Date Tue, 11 Nov 2008 18:29:44 GMT


Steven Rowe commented on LUCENE-1435:

Hi Mike,

bq. Could we, alternatively, push this change into DocumentsWriter, such that on writing a
segment it uses a per-field Collator (FieldInfo would be extended to record this) to sort
the terms dict?
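The alternative quoted above stores no collation keys in the index. Instead it sorts the raw terms dictionary with a per-field Collator. A minimal sketch of that idea follows, and it is not DocumentsWriter code. The `fieldCollators` map is hypothetical and stands in for a Collator recorded per field (for example in an extended FieldInfo). A `TreeSet` stands in for a segment's terms dictionary.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

public class PerFieldCollatorSketch {

    // Hypothetical per-field Collator registry, standing in for a Collator
    // recorded in an extended FieldInfo. Not a real Lucene API.
    static final Map<String, Collator> fieldCollators = new HashMap<>();

    // Builds a sorted "terms dictionary" for one field: raw terms are kept as-is
    // and ordered by the field's Collator, or by plain String order if none is set.
    static TreeSet<String> buildTermsDict(String field, List<String> terms) {
        Collator collator = fieldCollators.get(field);
        TreeSet<String> dict;
        if (collator == null) {
            dict = new TreeSet<String>();
        } else {
            // Every insertion and lookup in this set calls collator.compare.
            dict = new TreeSet<String>(collator);
        }
        dict.addAll(terms);
        return dict;
    }

    public static void main(String[] args) {
        fieldCollators.put("title", Collator.getInstance(Locale.US));
        List<String> terms = Arrays.asList("cherry", "Banana", "apple");
        System.out.println(buildTermsDict("title", terms)); // [apple, Banana, cherry]
        System.out.println(buildTermsDict("id", terms));    // [Banana, apple, cherry]
    }
}
```

With this design the raw terms stay readable in the index. The cost is that every binary-search step during lookup invokes `Collator.compare`, and that is the trade-off weighed in the quoted message.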

Are you suggesting to not store collation keys in the index?

bq. I haven't fully thought through the tradeoffs... but it seems like this'd be simpler to
use? i.e. rather than putting a CollationKeyFilter in your analyzer chain, and then doing the
reverse of this for all searches at search time, you simply set the Collator on the fields
(at indexing & searching time, since I agree we should for now not try to serialize into
the index which field has which Collator)?

The query-time process in this patch is not the reverse; it is exactly the same as the index-time
process.  The String-encoded collation keys stored in the index are compared directly with the
String-encoded collation keys produced from query terms.  Neither the String encoding nor the
CollationKey ever needs to be reversed.
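As an illustration, query terms can go through the same token-to-key-to-String transformation as indexed tokens. The encoded terms can then be compared with plain `String.compareTo`, and nothing is ever decoded. This is a minimal sketch under one assumption. The `encode` helper is a hypothetical byte-per-char encoder used in place of IndexableBinaryStringTools, which packs bits more densely. Both encoders preserve unsigned byte order, and that order is the property that matters here.

```java
import java.text.Collator;
import java.util.Locale;

public class QueryTimeSketch {

    // Hypothetical order-preserving stand-in for IndexableBinaryStringTools:
    // one char (0x00-0xFF) per key byte, so String order == unsigned byte order.
    static String encode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // The same function is applied at index time and at query time.
    static String toTerm(Collator collator, String text) {
        return encode(collator.getCollationKey(text).toByteArray());
    }

    // Range check on encoded terms only: no Collator call, no decoding.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        String indexed = toTerm(collator, "Banana");  // as stored in the index
        String lower = toTerm(collator, "apple");     // query bounds, same transform
        String upper = toTerm(collator, "cherry");
        System.out.println(inRange(indexed, lower, upper)); // true
        // Comparing the raw text instead would put "Banana" before "apple" ('B' < 'a').
        System.out.println("Banana".compareTo("apple") >= 0); // false
    }
}
```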

bq. I guess there is a performance cost to using the Collator to do live binary search (during
searching) and sorting (during indexing) vs doing Unicode String comparisons but in practice
at search time this is probably a tiny part of the net cost of searching?

In the current code base, a range search on a collated field has to compare every single term
in the field with the search terms using the Collator.  Because the encoded collation keys sort
in collation order, this patch allows skipTo to function when using collation, potentially
providing a significant speedup.
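The difference between scanning and seeking can be sketched with a `TreeSet` that stands in for a sorted terms dictionary. This stand-in is an assumption for illustration, and Lucene's actual TermEnum/skipTo API is not shown. Without collation keys in the index, terms are kept in String order, so a collated range has to visit every term. With encoded keys, index order already matches collation order, so the range is one contiguous run that starts at the lower bound. The `encode` helper is the same hypothetical byte-per-char stand-in for IndexableBinaryStringTools.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

public class RangeSeekSketch {

    // Hypothetical order-preserving encoder (one char per key byte).
    static String encode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    static String toTerm(Collator collator, String text) {
        return encode(collator.getCollationKey(text).toByteArray());
    }

    // Without collation keys: raw terms are in String order, so every term
    // must be checked against both endpoints with the Collator.
    static List<String> scanRange(Iterable<String> rawTerms, Collator collator,
                                  String lower, String upper) {
        List<String> hits = new ArrayList<>();
        for (String term : rawTerms) {
            if (collator.compare(term, lower) >= 0 && collator.compare(term, upper) <= 0) {
                hits.add(term);
            }
        }
        return hits;
    }

    // With encoded keys: String order equals collation order, so the range is
    // a contiguous sub-view positioned by one O(log n) seek to the lower bound.
    static NavigableSet<String> seekRange(NavigableSet<String> encodedTerms, Collator collator,
                                          String lower, String upper) {
        return encodedTerms.subSet(toTerm(collator, lower), true, toTerm(collator, upper), true);
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        List<String> docs = Arrays.asList("apple", "Banana", "cherry", "date", "Elderberry");

        TreeSet<String> rawTerms = new TreeSet<>(docs);
        TreeSet<String> encodedTerms = new TreeSet<>();
        for (String d : docs) {
            encodedTerms.add(toTerm(collator, d));
        }

        System.out.println(scanRange(rawTerms, collator, "b", "date"));           // [Banana, cherry, date]
        System.out.println(seekRange(encodedTerms, collator, "b", "date").size()); // 3
    }
}
```

Both calls select the same three terms. The scan calls the Collator for every term in the field, while the seek only encodes the two endpoints and walks the matching run.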

> CollationKeyFilter: convert tokens into CollationKeys encoded using IndexableBinaryStringTools
> ----------------------------------------------------------------------------------------------
>                 Key: LUCENE-1435
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>    Affects Versions: 2.4
>            Reporter: Steven Rowe
>            Priority: Minor
>             Fix For: 2.9
>         Attachments: LUCENE-1435.patch, LUCENE-1435.patch
> Converts each token into its CollationKey using the provided collator, and then encodes
> the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term.
> This will allow for efficient range searches and Sorts over fields that need collation
> for proper ordering.
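The transformation described in the issue (token, then CollationKey, then encoded String term) can be sketched without Lucene's TokenStream API. The method `filter` below is a hypothetical simplification that maps a list of tokens to index terms, and it is not the patch's CollationKeyFilter. The `encode` helper is again a hypothetical byte-per-char stand-in for IndexableBinaryStringTools. The check at the end shows the property the issue relies on: plain String order of the encoded terms matches the Collator's order of the original tokens.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CollationKeyFilterSketch {

    // Hypothetical stand-in for IndexableBinaryStringTools: one char (0x00-0xFF)
    // per key byte. It preserves unsigned byte order under String.compareTo but
    // is less compact than Lucene's real encoder.
    static String encode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Simplified "filter": token -> CollationKey -> byte[] -> String term.
    static List<String> filter(Collator collator, List<String> tokens) {
        List<String> terms = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            CollationKey key = collator.getCollationKey(token);
            terms.add(encode(key.toByteArray()));
        }
        return terms;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        List<String> tokens = Arrays.asList("cherry", "Banana", "apple");

        // Sort the encoded terms the way an index would: plain String order.
        List<String> terms = filter(collator, tokens);
        Collections.sort(terms);

        // Sort the original tokens with the Collator, then encode them in that order.
        List<String> byCollator = new ArrayList<>(tokens);
        byCollator.sort(collator);
        System.out.println(byCollator);                                 // [apple, Banana, cherry]
        System.out.println(terms.equals(filter(collator, byCollator))); // true
    }
}
```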

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
