lucene-java-user mailing list archives

From Vidya Kanigiluppai Sivasubramanian <>
Subject How to get the term offsets for wild card queries?
Date Thu, 10 Nov 2011 11:31:33 GMT

I am using Lucene 2.9.2.
For my project I need to find the positions of the matched terms in each document so that they can be highlighted
in the display.
This works fine for normal queries, but for wildcard queries no offset information is available.
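To make the exact-match behaviour concrete, here is a minimal plain-Java model of a document's term vector. It is not the Lucene API; TermVectorModel and its members are hypothetical. It keeps the distinct indexed terms in sorted order, each with the start/end character offsets of its occurrences. Lucene's TermFreqVector.indexOf is documented to return -1 for a term that is not in the vector, and this model looks terms up the same way. So, assuming the raw query string is passed to the lookup unchanged, a literal pattern such as "luc*" finds nothing because no stored term is spelled that way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/**
 * Plain-Java model (NOT the Lucene API) of a per-document term vector:
 * sorted distinct terms plus, for each term, its {start, end} character offsets.
 * All names here are hypothetical.
 */
public class TermVectorModel {
    final String[] terms;     // sorted, distinct terms
    final int[][][] offsets;  // offsets[i] = {start, end} pairs for terms[i]

    TermVectorModel(String text) {
        // Toy "analyzer": lowercase, split on anything that is not a letter or digit.
        TreeMap<String, List<int[]>> map = new TreeMap<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase();
            map.computeIfAbsent(term, k -> new ArrayList<>()).add(new int[]{start, i});
        }
        terms = map.keySet().toArray(new String[0]);
        offsets = new int[terms.length][][];
        int t = 0;
        for (List<int[]> occurrences : map.values()) { // same key order as terms
            offsets[t++] = occurrences.toArray(new int[0][]);
        }
    }

    /** Exact-term lookup by binary search; -1 if the string is not a stored term. */
    int indexOf(String term) {
        int idx = Arrays.binarySearch(terms, term);
        return idx >= 0 ? idx : -1;
    }

    public static void main(String[] args) {
        TermVectorModel tv = new TermVectorModel("Lucene highlights lucene terms");
        for (int[] o : tv.offsets[tv.indexOf("lucene")]) {
            System.out.println("offsets : " + o[0] + " " + o[1]);
        }
        System.out.println("luc* -> " + tv.indexOf("luc*"));
    }
}
```

Running main prints the offsets 0 6 and 18 24 for "lucene", then -1 for "luc*".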
This is my code:
    QueryParser qp = new QueryParser("contents", analyzer);
    Query query = qp.parse(searchTerm);
    TopDocs hits = searcher.search(query, 8);

    QueryScorer queryScorer = new QueryScorer(query, reader, "contents");
    for (int i = 0; i < hits.scoreDocs.length; i++) {
        SearchResult sr = new SearchResult();
        int docId = hits.scoreDocs[i].doc;
        Document doc = searcher.doc(docId);

        // Assumes "contents" was indexed with Field.TermVector.WITH_POSITIONS_OFFSETS,
        // so that both positions and offsets are stored in the term vector.
        TermFreqVector tfvector = reader.getTermFreqVector(docId, "contents");
        TermPositionVector tpvector = (TermPositionVector) tfvector;

        // indexOf() finds an exact indexed term; it returns -1 if searchTerm
        // is not one of the terms stored for this document.
        int termidx = tfvector.indexOf(searchTerm);
        if (termidx >= 0) {
            int[] termposx = tpvector.getTermPositions(termidx);
            TermVectorOffsetInfo[] tvoffsetinfo = tpvector.getOffsets(termidx);

            for (int j = 0; j < termposx.length; j++) {
                System.out.println("termpos : " + termposx[j]);
            }
            for (int j = 0; j < tvoffsetinfo.length; j++) {
                int offsetStart = tvoffsetinfo[j].getStartOffset();
                int offsetEnd = tvoffsetinfo[j].getEndOffset();
                System.out.println("offsets : " + offsetStart + " " + offsetEnd);
            }
        }
        System.out.println((i + 1) + "." + doc.get("filepath"));
    }

Is it because the search term contains a partial word followed by *?
I have tried various solutions from the forums, but none of them worked.
Please help!
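One way to explore this is to expand the wildcard against the terms actually stored for the document and collect the offsets of every matching term. Below is a minimal plain-Java sketch of that idea. WildcardOffsets and its methods are hypothetical helpers, not part of Lucene. In Lucene 2.9 the parallel arrays would presumably come from TermFreqVector.getTerms() and TermPositionVector.getOffsets(int), assuming the field stores offsets. The sketch treats only * (any sequence) and ? (one character) as special and does no analysis, so it may differ in detail from Lucene's own wildcard matching.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Lucene): expand a wildcard pattern against
 * a document's stored terms and collect the offsets of every matching term.
 * Assumes only '*' and '?' are special characters.
 */
public class WildcardOffsets {

    /** Converts a wildcard pattern such as "luc*" into an equivalent regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(sb.toString());
    }

    /**
     * terms[i] is a stored term and offsets[i] holds its {start, end} pairs.
     * Returns the offsets of all matching terms, sorted by start offset.
     */
    static List<int[]> matchingOffsets(String[] terms, int[][][] offsets, String wildcard) {
        Pattern p = toRegex(wildcard);
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            if (p.matcher(terms[i]).matches()) {
                for (int[] o : offsets[i]) {
                    result.add(o);
                }
            }
        }
        result.sort(Comparator.comparingInt(o -> o[0])); // document order
        return result;
    }

    public static void main(String[] args) {
        // Terms and offsets for the sample text "lucene and lucid lucene".
        String[] terms = {"and", "lucene", "lucid"};
        int[][][] offsets = {
            {{7, 10}},
            {{0, 6}, {17, 23}},
            {{11, 16}}
        };
        for (int[] o : matchingOffsets(terms, offsets, "luc*")) {
            System.out.println("offsets : " + o[0] + " " + o[1]);
        }
    }
}
```

For the sample text this prints the offsets 0 6, 11 16 and 17 23. Sorting by start offset returns the matches in document order rather than grouped by term, which is usually what a highlighter needs when it walks the text from left to right.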

Thanks & Regards
Vidya K S



