lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Compressing docValues with variable length bytes[] by block of 16k ?
Date Sun, 09 Aug 2015 12:30:18 GMT
That makes no sense at all; it would make it slow as shit.
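
The cost of 16k block compression for per-document access can be seen in a toy model. This is a sketch under stated assumptions, not Lucene's codec: the class `BlockCompressedValues` is hypothetical, and `java.util.zip.Deflater` stands in for whatever compressor a real format would use. Values are packed into ~16 KB raw blocks and each block is compressed independently. Reading one document's value then requires inflating its whole block first.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy model only (not Lucene's codec): variable-length values are packed into
// ~16 KB raw blocks, and each block is DEFLATE-compressed independently.
class BlockCompressedValues {
    static final int BLOCK_SIZE = 16 * 1024;

    private final List<byte[]> compressedBlocks = new ArrayList<>();
    private final List<Integer> rawBlockLengths = new ArrayList<>();
    // Per document: {block index, offset inside raw block, value length}.
    private final List<int[]> locations = new ArrayList<>();

    BlockCompressedValues(List<byte[]> values) {
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        int valuesInBlock = 0;
        for (byte[] v : values) {
            // Start a new block when this value would overflow the current one.
            if (valuesInBlock > 0 && current.size() + v.length > BLOCK_SIZE) {
                flush(current);
                valuesInBlock = 0;
            }
            locations.add(new int[] {compressedBlocks.size(), current.size(), v.length});
            current.write(v, 0, v.length);
            valuesInBlock++;
        }
        if (valuesInBlock > 0) {
            flush(current);
        }
    }

    // Compresses the pending raw block and resets the buffer for the next one.
    private void flush(ByteArrayOutputStream raw) {
        byte[] input = raw.toByteArray();
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        compressedBlocks.add(out.toByteArray());
        rawBlockLengths.add(input.length);
        raw.reset();
    }

    int blockCount() {
        return compressedBlocks.size();
    }

    // Random access to ONE value still requires inflating its WHOLE block.
    byte[] get(int doc) {
        int[] loc = locations.get(doc);
        byte[] raw = new byte[rawBlockLengths.get(loc[0])];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressedBlocks.get(loc[0]));
            int filled = 0;
            while (filled < raw.length && !inflater.finished()) {
                filled += inflater.inflate(raw, filled, raw.length - filled);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        byte[] value = new byte[loc[2]];
        System.arraycopy(raw, loc[1], value, 0, loc[2]);
        return value;
    }

    public static void main(String[] args) {
        List<byte[]> values = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            values.add(("value-" + (i % 100)).getBytes(StandardCharsets.UTF_8));
        }
        BlockCompressedValues store = new BlockCompressedValues(values);
        // Returning these few bytes first inflates an entire ~16 KB block.
        System.out.println(new String(store.get(1234), StandardCharsets.UTF_8));
        System.out.println("blocks: " + store.blockCount());
    }
}
```

The trade-off shows up in `get`. Each per-document read pays for a full block decompression. That is tolerable when fetching stored fields for a page of hits. It is typically not tolerable for doc values, which may be read for every matching document during sorting or faceting.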

I am tired of repeating this:
Don't use BINARY docvalues
Don't use BINARY docvalues
Don't use BINARY docvalues

Use types like SORTED/SORTED_SET, which will compress the term
dictionary, and make use of ordinals in your application instead.
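
The ordinal idea behind SORTED doc values can be illustrated with a small in-memory model. This is a sketch for illustration, not Lucene's on-disk format or API, and `SortedOrdinalsSketch` is a hypothetical class. Each distinct value is stored once in a sorted dictionary, and each document keeps only an int ordinal. The method names `getOrd`, `lookupOrd` and `getValueCount` are borrowed from Lucene 5.x's `SortedDocValues` for readability, but this class uses `String` rather than `BytesRef`.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Toy model of the idea behind SORTED doc values (not Lucene's format):
// distinct values live once in a sorted dictionary; docs store int ordinals.
class SortedOrdinalsSketch {
    private final String[] dictionary; // sorted, de-duplicated values
    private final int[] docToOrd;      // per-document ordinal into dictionary

    SortedOrdinalsSketch(String[] perDocValues) {
        TreeSet<String> unique = new TreeSet<>(Arrays.asList(perDocValues));
        dictionary = unique.toArray(new String[0]);
        docToOrd = new int[perDocValues.length];
        for (int doc = 0; doc < perDocValues.length; doc++) {
            // binarySearch is valid because the dictionary is sorted
            docToOrd[doc] = Arrays.binarySearch(dictionary, perDocValues[doc]);
        }
    }

    int getOrd(int doc) { return docToOrd[doc]; }

    String lookupOrd(int ord) { return dictionary[ord]; }

    int getValueCount() { return dictionary.length; }

    public static void main(String[] args) {
        String[] values = {"red", "blue", "red", "green", "blue"};
        SortedOrdinalsSketch dv = new SortedOrdinalsSketch(values);
        // Dictionary is [blue, green, red]; docs store these ordinals:
        System.out.println(Arrays.toString(dv.docToOrd)); // prints [2, 0, 2, 1, 0]
        // Ordinal order matches value order: "green" < "red".
        System.out.println(dv.getOrd(3) < dv.getOrd(0)); // prints true
        System.out.println(dv.lookupOrd(dv.getOrd(3)));  // prints green
    }
}
```

Ordinals sort in the same order as the values they stand for. An application can therefore sort, group, or count on plain ints and call `lookupOrd` only for the few values it actually needs to display.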



On Sat, Aug 8, 2015 at 10:19 AM, Olivier Binda <olivier.binda@wanadoo.fr> wrote:
> Greetings
>
> Are there any plans to implement compression of the variable-length byte[]
> BINARY doc values,
> say in blocks of 16k as is done for stored fields?
>
> My .cfs file shrinks from 2 MB to about 400 KB when I zip it.
>
> Best regards,
> Olivier
>
>
>
> On 08/08/2015 02:32 PM, jamie wrote:
>>
>> Greetings
>>
>> Our app primarily uses Lucene for its intended purpose, i.e., to search
>> across large amounts of unstructured text. Recently, however, our
>> requirements expanded to include looking up specific documents in the index
>> by associated custom-defined unique keys. For our purposes, a unique key is
>> the string representation of a 128-bit murmur hash, stored in a Lucene field
>> named uid. We are currently using a TermsFilter to look up documents in
>> the Lucene index as follows:
>>
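>> import java.util.ArrayList;
>> import java.util.List;
>> import org.apache.lucene.index.Term;
>> import org.apache.lucene.queries.TermsFilter;
>>
>> // One Term per unique key; "ids" holds the uid strings to look up.
>> List<Term> terms = new ArrayList<>();
>> for (String id : ids) {
>>     terms.add(new Term("uid", id));
>> }
>> TermsFilter idFilter = new TermsFilter(terms);
>> // ... search logic ...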
>>
>> At any time we may need to look up, say, a couple of thousand documents.
>> Our problem is performance: on very large indexes with 30 million records
>> or more, the lookup can be excruciatingly slow. At this stage, it's not
>> practical for us to move the data to a fit-for-purpose database or to
>> change the uid field to a numeric type. I fully appreciate that Lucene
>> is not designed to be a database; however, is there anything we can
>> do to improve the performance of these lookups?
>>
>> Much appreciated
>>
>> Jamie
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


