lucene-java-user mailing list archives

From Ziqi Zhang
Subject offsets of a term in a document
Date Mon, 21 Sep 2015 15:08:32 GMT

Given a document in a Lucene index, I would like to get a list of the terms
in that document and their offsets. I suppose IndexReader.getTermVector is
the place to start. I have some code below (Lucene 5.3), about which I have
some questions:

IndexReader reader = ....
Terms termVector = reader.getTermVector(docId, "content");
//now iterate through the terms
TermsEnum ti = termVector.iterator();
BytesRef luceneTerm = ti.next();
while (luceneTerm != null) {
         String tString = luceneTerm.utf8ToString();

         //each term can have >1 occurrence, so I need to get each occurrence
         PostingsEnum postingsEnum = ti.postings(???, PostingsEnum.OFFSETS);
         int totalOccurrence = postingsEnum.freq();
         //api says calling "nextPosition" more than "freq()" times is undefined, so...
         for (int i = 0; i < totalOccurrence; i++) {
                 postingsEnum.nextPosition();             //move cursor to next occurrence
                 int start = postingsEnum.startOffset();  //get the start offset
                 int end = postingsEnum.endOffset();      //get the end offset
         }
         luceneTerm = ti.next();
}

The first question is whether the code makes sense.
The second question is what I should put in place of "???". The API
says "pass a prior PostingsEnum for possible reuse", but I don't see how
to create an instance of it.

Many thanks!
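
For reference, here is a minimal sketch of how the loop in the question might be completed against Lucene 5.3. It rests on two assumptions: that null is an acceptable reuse argument on the first call (with the returned enum passed back in on later calls), and that nextDoc() has to be called once before freq() and nextPosition(), since a term vector behaves like a one-document index. The class name TermOffsets and the method collect are hypothetical names chosen for illustration. Offsets should only be available if the field was indexed with term vector offsets enabled; otherwise startOffset() and endOffset() are expected to return -1.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper, not part of Lucene.
final class TermOffsets {

    /** Maps each term of one field of one document to its {startOffset, endOffset} pairs. */
    static Map<String, List<int[]>> collect(IndexReader reader, int docId, String field)
            throws IOException {
        Map<String, List<int[]>> result = new LinkedHashMap<>();

        Terms termVector = reader.getTermVector(docId, field);
        if (termVector == null) {
            return result; // no term vector was stored for this field
        }

        TermsEnum ti = termVector.iterator();
        PostingsEnum postings = null; // assumption: null is fine when there is nothing to reuse
        BytesRef term;
        while ((term = ti.next()) != null) {
            // Hand the previous enum back so the implementation may recycle it.
            postings = ti.postings(postings, PostingsEnum.OFFSETS);
            postings.nextDoc(); // advance to the single document held by the term vector

            int freq = postings.freq();
            List<int[]> spans = new ArrayList<>(freq);
            for (int i = 0; i < freq; i++) {
                postings.nextPosition(); // must be called before reading offsets
                spans.add(new int[] {postings.startOffset(), postings.endOffset()});
            }
            result.put(term.utf8ToString(), spans);
        }
        return result;
    }
}
```

A call such as TermOffsets.collect(reader, docId, "content") would then return one entry per distinct term, each holding as many offset pairs as the term has occurrences in the document.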
