lucene-java-user mailing list archives

From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: Re: Why Two Levels of Indirection in BytesRefHash class ?
Date Mon, 09 May 2016 08:47:10 GMT
You could try to implement this refactoring, which would combine
linear storage of values (without the need to save the length of each
key explicitly) with their incremental addition order.

https://issues.apache.org/jira/browse/LUCENE-5854

The outcome may or may not be faster in practice (due to locality of
reference and JVM optimizations of array index ops), but it'd
certainly be worth investigating.

Dawid
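
For readers of the archive, here is a minimal sketch of the storage half of that idea. It is one reading of "linear storage without explicit lengths", not the patch attached to LUCENE-5854. The class name LinearKeyStore and all of its methods are hypothetical. Because ids are handed out in insertion order, the start offsets only ever grow, so the length of key i can be recovered as starts[i + 1] - starts[i] instead of being stored next to the bytes.

```java
import java.util.Arrays;

/** Hypothetical append-only key store; lengths are implied by adjacent start offsets. */
final class LinearKeyStore {
    private byte[] data = new byte[16];
    // starts[id] .. starts[id + 1] delimits the bytes of key `id`.
    private int[] starts = new int[4];
    private int count = 0;

    /** Appends a key and returns its id, which equals its insertion order. */
    int append(byte[] key) {
        int begin = starts[count];
        int end = begin + key.length;
        if (end > data.length) {
            data = Arrays.copyOf(data, Math.max(end, data.length * 2));
        }
        // Need room for one sentinel entry past the last key.
        if (count + 2 > starts.length) {
            starts = Arrays.copyOf(starts, starts.length * 2);
        }
        System.arraycopy(key, 0, data, begin, key.length);
        starts[++count] = end;
        return count - 1;
    }

    /** Length of key `id`, derived from two neighbouring offsets. */
    int length(int id) {
        return starts[id + 1] - starts[id];
    }

    /** Copy of the bytes of key `id`. */
    byte[] get(int id) {
        return Arrays.copyOfRange(data, starts[id], starts[id + 1]);
    }

    int size() {
        return count;
    }
}
```

The sketch leaves out the hash table itself and any bounds checking on ids. Whether this layout is actually faster than a length-prefixed pool is exactly the open question raised above.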

On Mon, May 9, 2016 at 10:05 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> E.g. see FreqProxTermsWriterPerField.FreqProxPostingsArray, which stores 5
> parallel arrays indexed by that counter (called "term id" in the code,
> sometimes) to hold meta-data about each term until we can write it to the
> index.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sun, May 8, 2016 at 10:20 PM, shanghaihyj <shanghaihyj@163.com> wrote:
>
>> I see.
>> Yes, if a logical mapping of "byte[] ---> (offset and arbitrary data)" is
>> required, this indirection is necessary.
>>
>> Thanks.
>> Yijian Huang
>>
>>
>> At 2016-05-08 23:06:14,"Adrien Grand" <jpountz@gmail.com> wrote:
>> >That would work if you are only interested in using BytesRefHash as a hash
>> >set for byte[]. However these incremental ids are useful if you want to
>> >associate data with each byte[]: you can create parallel arrays and use the
>> >ids returned by the BytesRefHash as indices in these arrays.
>> >
>> >On Sun, May 8, 2016 at 14:45, shanghaihyj <shanghaihyj@163.com> wrote:
>> >
>> >> I'm studying the BytesRefHash class, a mapping from bytes to a generated
>> >> ID for the bytes.
>> >>
>> >> In the BytesRefHash class, there are two levels of reference:
>> >> (1) ids[bytes' hash code] ---> count, where count is the self-incrementing
>> >> size of this hash map.
>> >> (2) bytesStart[count] ---> offset in the ByteBlockPool, where the original
>> >> bytes are stored.
>> >>
>> >>
>> >> My question is, can the above two references be collapsed into one, as
>> >> follows ?
>> >> ids[bytes' hash code] ---> offset in the ByteBlockPool.
>> >>
>> >>
>> >> I've searched the code, but I cannot see what the benefit is of having
>> >> another indirection via bytesStart.
>> >>
>> >>
>> >> P.S. Regarding such questions about Lucene source code, should I ask on
>> >> dev@lucene.apache.org instead? These questions may be too easy and thus
>> >> a bother to the developers...
>>
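
To make the two levels from the thread concrete, here is a toy model in plain Java. It is a sketch under stated assumptions, not Lucene's BytesRefHash: the class TwoLevelHashSketch, its one-byte length prefix, the fixed-size slot table and the countFrequencies helper are all made up for illustration, and a single growable byte[] stands in for ByteBlockPool. Unlike the real class, add() here returns the same id whether or not the key was already present.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy model of the two levels of indirection; not Lucene's BytesRefHash. */
final class TwoLevelHashSketch {
    private final int[] ids;            // level 1: hash slot -> id, or -1 if empty
    private int[] bytesStart = new int[8]; // level 2: id -> offset into pool
    private byte[] pool = new byte[64];    // stand-in for ByteBlockPool
    private int poolUsed = 0;
    private int count = 0;

    /** slots must be a power of two and larger than the number of keys (no rehash here). */
    TwoLevelHashSketch(int slots) {
        ids = new int[slots];
        Arrays.fill(ids, -1);
    }

    /** Returns the id of key, assigning the next sequential id if it is new. */
    int add(byte[] key) {
        if (key.length > 127) throw new IllegalArgumentException("sketch supports keys up to 127 bytes");
        int mask = ids.length - 1;
        int slot = Arrays.hashCode(key) & mask;
        while (ids[slot] != -1) {           // linear probing
            if (sameBytes(ids[slot], key)) return ids[slot];
            slot = (slot + 1) & mask;
        }
        if (count == bytesStart.length) bytesStart = Arrays.copyOf(bytesStart, count * 2);
        while (poolUsed + 1 + key.length > pool.length) pool = Arrays.copyOf(pool, pool.length * 2);
        bytesStart[count] = poolUsed;
        pool[poolUsed++] = (byte) key.length; // explicit one-byte length prefix
        System.arraycopy(key, 0, pool, poolUsed, key.length);
        poolUsed += key.length;
        ids[slot] = count;
        return count++;
    }

    private boolean sameBytes(int id, byte[] key) {
        int start = bytesStart[id];
        if (pool[start] != key.length) return false;
        for (int i = 0; i < key.length; i++) {
            if (pool[start + 1 + i] != key[i]) return false;
        }
        return true;
    }

    int size() {
        return count;
    }

    /** Parallel-array use: freq[id] holds data about the key with that id. */
    static int[] countFrequencies(String[] terms) {
        int slots = Integer.highestOneBit(Math.max(terms.length, 1)) * 2;
        TwoLevelHashSketch hash = new TwoLevelHashSketch(slots);
        int[] freq = new int[terms.length];
        for (String t : terms) {
            freq[hash.add(t.getBytes(StandardCharsets.UTF_8))]++;
        }
        return Arrays.copyOf(freq, hash.size());
    }
}
```

The point of the sketch is the last method: because ids are small and dense (0, 1, 2, ...), they can index freq directly. If the slot table stored pool offsets instead, those offsets would be sparse and could not serve as array indices for per-key data.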


