lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: getting Lucene Docid from inside score()
Date Fri, 09 Mar 2018 20:04:59 GMT
You almost certainly do _not_ want this unless you are absolutely and
totally sure that your index does not change between the time you ask
for the internal Lucene doc ID and the time you use it. No docs
may be added. No forceMerges may be run. In fact, I'd go so far as to
say you shouldn't open any new searchers.

Here's the reason. Say I have a single segment index with internal doc
IDs 1, 2, 3, 4, 5. Say I delete docs 2 and 3. Now say I optimize: the
new segment has IDs 1, 2, 3. This is a simplification to illustrate that
_whenever_ a segment gets rewritten for any reason, internal Lucene
doc IDs may change. All this goes on in the background and you have no
control over when.
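
The delete-then-optimize example above can be mimicked with a toy model. This is only a sketch in plain Java, not Lucene code: the class name DocIdRenumbering, the rewrite() helper, and the keys "a" through "e" are all hypothetical, and real merges are more involved. It shows only that surviving docs are renumbered densely once deleted docs are dropped.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: no Lucene classes are used and this is not how Lucene merges segments.
class DocIdRenumbering {

    // Rewrites a "segment": live keys keep their relative order and are
    // numbered densely starting at firstId; deleted keys are dropped.
    static Map<String, Integer> rewrite(List<String> keysInOrder,
                                        List<String> deletedKeys,
                                        int firstId) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        int next = firstId;
        for (String key : keysInOrder) {
            if (!deletedKeys.contains(key)) {
                ids.put(key, next++);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Five docs with internal IDs 1..5, as in the example above.
        List<String> segment = Arrays.asList("a", "b", "c", "d", "e");

        Map<String, Integer> before = rewrite(segment, Arrays.asList(), 1);
        // Delete the docs that held IDs 2 and 3, then rewrite the segment.
        Map<String, Integer> after = rewrite(segment, Arrays.asList("b", "c"), 1);

        System.out.println(before); // {a=1, b=2, c=3, d=4, e=5}
        System.out.println(after);  // {a=1, d=2, e=3}
    }
}
```

Doc "d" held ID 4 before the rewrite and holds ID 2 afterwards, so any ID cached before the rewrite now points at a different document.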

Docs may even get renumbered relative to each other. Let's claim that
your Solr ID is doc1 and its associated internal ID is 1. doc100 has
internal id 100. Segment merging could assign doc1 an id of 200 and
doc100 an id of 150. You just don't know.

Luke and the like are using a point-in-time snapshot of the index.

If you still want to get the internal ID, just specify the
pseudo-field [docid], as: "fl=id,[docid]"
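
For the Lucene-level case raised earlier in the thread (a per-segment docid plus a segment offset), the arithmetic can be sketched as below. This is a plain-Java illustration under stated assumptions, not Lucene code: GlobalDocId, docBases() and toGlobal() are hypothetical names, and the segment sizes are made up. In Lucene itself the offset should correspond to the docBase field of the LeafReaderContext for the segment being scored, so the index-wide ID would be that docBase plus the docid seen inside score().

```java
// Toy model only: no Lucene classes are used.
class GlobalDocId {

    // Computes an assumed starting offset for each segment: the sum of the
    // sizes of all segments that come before it.
    static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int running = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = running;
            running += segmentMaxDocs[i];
        }
        return bases;
    }

    // Index-wide ID = segment offset + per-segment ID.
    static int toGlobal(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {4, 3, 5}); // offsets 0, 4, 7
        // Per-segment docid 1 in the third segment.
        System.out.println(toGlobal(bases[2], 1)); // 8
    }
}
```

The caveat above still applies: the result is only meaningful against the same point-in-time reader that produced it.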

Best,
Erick

On Fri, Mar 9, 2018 at 3:50 AM, dwaipayan.roy@gmail.com
<dwaipayan.roy@gmail.com> wrote:
> Thank you very much for your reply. Yes, I really want this (for
> implementing a retrieval function that extends the LMDir function).
> Precisely, I want the same document numbering that we see in
> Lucene index viewers like Luke.
>
> I am not sure what you meant by "segment offset, held by a leaf reader"..
> Can you please explain a little, exactly when and what I need to do?
>
> Many thanks.
>
> On 2018/03/09 11:25:44, Michael Sokolov <msokolov@gmail.com> wrote:
>> Are you sure you want this? Lucene docids aren't generally useful outside a
>> narrow internal context. They can change over time for example.
>>
>> But if you do, it sounds like maybe what you are seeing is the per segment
>> docid. To get a global one you have to add the segment offset, held by a
>> leaf reader.
>>
>> On Mar 9, 2018 5:06 AM, "Dwaipayan Roy" <dwaipayan.roy@gmail.com> wrote:
>>
>> > While searching, I want to get the Lucene-assigned docid (ranging from
>> > 0 to the number of documents - 1) of a document having a particular query
>> > term.
>> >
>> > From inside the score(), printing 'doc' or calling docId() is returning a
>> > docid which, I think, is the internal docid of a segment in which the
>> > document is indexed. However, I want to have the lucene assigned docid. How
>> > to do that?
>> >
>> > Dwaipayan..
>> >
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
