lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Iterating TermsEnum for Long field produces zero values at the end
Date Mon, 17 Nov 2014 18:30:35 GMT
It is expected: those are the "prefix" terms, which come after all the
full-precision numeric terms.

But I'm not sure why you see 0s ... the bytes should be unique for
every term you get back from the TermsEnum.
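
One plausible reason for the 0s is in how a prefix term decodes. The sketch below is a simplified stand-alone model, not Lucene's actual code. The names PrefixTermSketch, Term, encode and decode are made up for illustration, and the precision step of 16 is an assumption (it would give exactly three prefix terms for a 64-bit value, which matches the three extra lines). A prefix term keeps only the high bits of the value, so decoding it for a small id such as 1 to 5 gives 0, and because all five ids share the same high bits there is only one such term per shift. Its first document is doc 0, hence "0 0".

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixTermSketch {
    // Flipping the sign bit makes signed longs sort correctly as unsigned bits.
    static final long SIGN_FLIP = 0x8000000000000000L;

    /** One simulated indexed term: the shift applied and the high bits that were kept. */
    public static final class Term {
        public final int shift;
        public final long bits;

        Term(int shift, long bits) {
            this.shift = shift;
            this.bits = bits;
        }
    }

    /** Produces the full-precision term (shift 0) followed by the lower-precision prefix terms. */
    public static List<Term> encode(long value, int precisionStep) {
        long sortable = value ^ SIGN_FLIP;
        List<Term> terms = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            terms.add(new Term(shift, sortable >>> shift));
        }
        return terms;
    }

    /** Decodes a term back to a long; the low bits dropped by a prefix term come back as zeros. */
    public static long decode(Term t) {
        return (t.bits << t.shift) ^ SIGN_FLIP;
    }

    public static void main(String[] args) {
        // Assumed precision step of 16: shifts 0, 16, 32 and 48.
        for (Term t : encode(5L, 16)) {
            System.out.println("shift=" + t.shift + " decoded=" + decode(t));
        }
    }
}
```

In the real loop it may be enough to stop iterating once NumericUtils.getPrefixCodedLongShift(bytesRef) returns a non-zero shift, or to wrap the enum with NumericUtils.filterPrefixCodedLongs(...). Both are worth checking against the 4.10 javadocs before relying on them.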

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 17, 2014 at 10:39 AM, Barry Coughlan <b.coughlan2@gmail.com> wrote:
> Hi all,
>
> I'm using 4.10.2. I have a Long "id" field. Each document has one "id"
> value. I am creating a look-up between Lucene's internal document id and my
> "id" values by enumerating the inverted index:
>
>     private long[] cacheDocIds() throws IOException {
>         long[] ourIds = new long[reader.maxDoc()];
>
>         Bits liveDocs = MultiFields.getLiveDocs(reader);
>         Fields fields = MultiFields.getFields(reader);
>         Terms terms = fields.terms("id");
>
>         TermsEnum iterator = terms.iterator(null);
>         BytesRef bytesRef = null;
>         while ((bytesRef = iterator.next()) != null) {
>             DocsEnum docsEnum = iterator.docs(liveDocs, null, DocsEnum.FLAG_NONE);
>
>             int luceneId = docsEnum.nextDoc();
>             long ourId = NumericUtils.prefixCodedToLong(bytesRef);
>             System.out.println(luceneId + " " + ourId);
>             ourIds[luceneId] = ourId;
>         }
>
>         return ourIds;
>     }
>
> With 5 documents (1, 2, 3, 4, 5) I get this output from the above code:
>
> 0 1
> 1 2
> 2 3
> 3 4
> 4 5
> 0 0
> 0 0
> 0 0
>
> I don't understand why there are three zeroes at the end.
>
> - reader.maxDoc is 5 and no documents have been deleted.
> - I have tried this with a varying number of documents and there are always
> three zeroes at the end.
> - I tried changing version to Lucene 4.10.0 and Lucene 4.9 and the same
> behavior occurs.
>
> I can work around this, but I'm just curious whether this behavior is
> expected.
>
> Regards,
> Barry


