lucene-dev mailing list archives

From "Ivan Dimitrov Vasilev (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-4490) TermPositions misses some terms in some cases
Date Thu, 18 Oct 2012 15:38:02 GMT
Ivan Dimitrov Vasilev created LUCENE-4490:
---------------------------------------------

             Summary: TermPositions misses some terms in some cases
                 Key: LUCENE-4490
                 URL: https://issues.apache.org/jira/browse/LUCENE-4490
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/search
    Affects Versions: 3.6.1, 3.4
            Reporter: Ivan Dimitrov Vasilev


I have the following code:

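import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public static void main(String[] args) throws Exception {
        // Index a single document whose "name" field has the value "a".
        RAMDirectory dir = new RAMDirectory();
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_34, new StandardAnalyzer(Version.LUCENE_34));
        org.apache.lucene.index.IndexWriter iw = new org.apache.lucene.index.IndexWriter(dir, iwc);
        Document doc = new Document();
        doc.add(new Field("name", "a", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));
        iw.addDocument(doc);

        iw.close();

        // Look up the postings for the term <name:a>.
        IndexReader ir = IndexReader.open(dir);
        Term t = new Term("name", "a");
        TermPositions tp = ir.termPositions();
        tp.seek(t);
        boolean flag = false;
        while (tp.next()) {
            System.out.println(tp.doc());
            flag = true;
        }
        if (!flag) { System.out.println("Missing term"); }

        // Print the stored fields of document 0.
        System.out.println(ir.document(0));

        tp.close();
        ir.close();
}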

The output is:
Missing term
Document<stored,indexed,tokenized,omitNorms<name:a>>

So the document contains the term <name:a>, but TermPositions cannot find it.

If I replace the line:
doc.add(new Field("name", "a", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));

with the line:
doc.add(new Field("name", "b", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));

and the line:
Term t = new Term("name", "a");

with the line:
Term t = new Term("name", "b");

then everything is OK. The output is:
0
Document<stored,indexed,tokenized,omitNorms<name:b>>

I did some debugging and found that, while executing tp.seek(t), the code reaches line 68
in the constructor of SegmentTermEnum:

size = input.readLong();                    // read the size

For the term <name:b>, size was assigned 1, while for the term <name:a>
it was assigned 0.
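
The size read at that point is the number of terms in the segment's term dictionary, so a value of 0 suggests that no term was written at index time, rather than that the lookup skipped an existing term. Note also that ir.document(0) prints stored field values, which are kept regardless of what the analyzer emits. One possible cause, stated here as an assumption and not confirmed in this report: StandardAnalyzer applies an English stop-word filter by default, and "a" is on that list while "b" is not. The following is a minimal toy model of that effect using only the Java standard library. The class name, method name and abbreviated stop list are hypothetical and are not Lucene APIs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Toy model only: nothing here is a Lucene class or method.
final class StopWordSketch {

    // Hypothetical, abbreviated stop list; a real analyzer's default set is larger.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "and", "the", "of"));

    // Lowercases the text, splits it on non-alphanumeric characters and
    // drops stop words. Returns the distinct terms that would be indexed.
    static Set<String> indexedTerms(String text) {
        Set<String> terms = new LinkedHashSet<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // The term count plays the role of "size" in the report above.
        System.out.println("a -> " + indexedTerms("a").size() + " term(s)");
        System.out.println("b -> " + indexedTerms("b").size() + " term(s)");
    }
}
```

In this model the field value "a" yields zero indexed terms and "b" yields one, which matches the sizes observed in the debugger. To check the assumption against the real index, one could print the tokens the analyzer actually emits for the field value, or index with an analyzer configured with an empty stop set.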

