lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Re: Why Two Levels of Indirection in BytesRefHash class ?
Date Mon, 09 May 2016 08:05:09 GMT
E.g. see FreqProxTermsWriterPerField.FreqProxPostingsArray, which stores 5
parallel arrays indexed by that counter (called "term id" in the code,
sometimes) to hold meta-data about each term until we can write it to the
index.
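
To make the idea concrete, here is a minimal, self-contained sketch of the two-level scheme plus one parallel array. It is a toy model, not Lucene's code: the class `TermIdSketch`, its fields `pool` and `termFreq`, and the fixed capacities are all hypothetical, and a flat byte[] stands in for ByteBlockPool. Unlike the real BytesRefHash.add, which returns -(id+1) when the bytes are already present, this toy add always returns the plain id.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model of the two levels of indirection (hypothetical, NOT Lucene's BytesRefHash).
// Assumptions: fixed capacity, no rehashing, terms shorter than 128 bytes.
class TermIdSketch {
    final int[] ids = new int[64];         // level 1: hash slot -> term id (-1 = empty)
    final int[] bytesStart = new int[32];  // level 2: term id -> offset into pool
    final byte[] pool = new byte[1024];    // stand-in for ByteBlockPool: [len][bytes]...
    final int[] termFreq = new int[32];    // parallel array indexed by term id
    int poolUpto = 0;
    int count = 0;                         // next term id to hand out

    TermIdSketch() {
        Arrays.fill(ids, -1);
    }

    // Returns the dense term id for these bytes, assigning a new one if absent.
    int add(byte[] term) {
        int slot = (Arrays.hashCode(term) & 0x7fffffff) % ids.length;
        while (ids[slot] != -1) {              // linear probing
            if (sameBytes(ids[slot], term)) {
                return ids[slot];
            }
            slot = (slot + 1) % ids.length;
        }
        int id = count++;
        ids[slot] = id;
        bytesStart[id] = poolUpto;
        pool[poolUpto++] = (byte) term.length; // one-byte length prefix
        System.arraycopy(term, 0, pool, poolUpto, term.length);
        poolUpto += term.length;
        return id;
    }

    // Copies the bytes for a term id back out of the pool.
    byte[] get(int id) {
        int start = bytesStart[id];
        int len = pool[start];
        return Arrays.copyOfRange(pool, start + 1, start + 1 + len);
    }

    private boolean sameBytes(int id, byte[] term) {
        return Arrays.equals(get(id), term);
    }

    public static void main(String[] args) {
        TermIdSketch hash = new TermIdSketch();
        for (String t : new String[] {"lucene", "index", "lucene"}) {
            int id = hash.add(t.getBytes(StandardCharsets.UTF_8));
            hash.termFreq[id]++;               // per-term data lives outside the hash
        }
        for (int id = 0; id < hash.count; id++) {
            String term = new String(hash.get(id), StandardCharsets.UTF_8);
            System.out.println(id + " " + term + " " + hash.termFreq[id]);
        }
        // prints:
        // 0 lucene 2
        // 1 index 1
    }
}
```

Because the ids are dense (0, 1, 2, ...), termFreq needs only one entry per distinct term. If the hash slots pointed straight at pool offsets, there would be no small integer to index such arrays with.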

Mike McCandless

http://blog.mikemccandless.com

On Sun, May 8, 2016 at 10:20 PM, shanghaihyj <shanghaihyj@163.com> wrote:

> I see.
> Yes, if a logical mapping of "byte[] ---> (offset and arbitrary data)" is
> required, this indirection is necessary.
>
> Thanks.
> Yijian Huang
>
>
> At 2016-05-08 23:06:14,"Adrien Grand" <jpountz@gmail.com> wrote:
> >That would work if you are only interested in using BytesRefHash as a hash
> >set for byte[]. However these incremental ids are useful if you want to
> >associate data with each byte[]: you can create parallel arrays and use the
> >ids returned by the BytesRefHash as indices in these arrays.
> >
> >On Sun, May 8, 2016 at 14:45, shanghaihyj <shanghaihyj@163.com> wrote:
> >
> >> I'm studying the BytesRefHash class, a mapping from bytes to a generated
> >> ID for the bytes.
> >>
> >> In the BytesRefHash class, there are two levels of reference:
> >> (1) ids[bytes' hash code] ---> count, where count is the
> >> auto-incrementing size of this hash map.
> >> (2) bytesStart[count] ---> offset in the ByteBlockPool, where the
> >> original bytes are stored.
> >>
> >>
> >> My question is: can the above two references be collapsed into one, as
> >> follows?
> >> ids[bytes' hash code] ---> offset in the ByteBlockPool.
> >>
> >>
> >> I've searched the code, but I cannot work out what the benefit of
> >> having another indirection via bytesStart is.
> >>
> >>
> >> P.S. Regarding such questions about the Lucene source code, should I ask
> >> on dev@lucene.apache.org instead? These questions may be too easy and
> >> thus a bother to the developers...
>
