lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: TermsEnum.docFreq() returns 0
Date Tue, 14 May 2013 10:01:36 GMT
On Tue, May 14, 2013 at 3:03 AM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
> We ran CheckIndex and a simple test case, and both pass. I had assumed the
> problem was in Lucene, but it was actually an issue with our custom codec.

Phew, thanks for bringing closure!

> I do not know how to confirm whether a new codec works correctly. Are there
> any tools or existing test cases available for validation?

One really healthy way to test your new codec is to run all Lucene
tests against it (assuming your codec is general, i.e. implements
everything).

You just need to 1) get your codec onto the test classpath and 2) pass
-Dtests.codec=YourCodecName to force tests to use it.

I'm not certain about step 1) ... maybe passing -lib to ant does that?
But I'm not sure that will propagate to the classpath when ant runs the
tests ...
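Alongside running the whole test suite with -Dtests.codec, it can help to compute the statistics you expect independently of any codec and compare them with what TermsEnum reports. That is the kind of check asked for further down the thread ("getting 0 when it should be 1"). Below is a minimal plain-Java sketch. ExpectedTermStats is a hypothetical helper, not part of Lucene, and it assumes lowercase-and-split-on-whitespace tokenization, which only approximates StandardAnalyzer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical oracle for per-term statistics (NOT part of Lucene).
 * ASSUMPTION: tokens are produced by lowercasing and splitting on whitespace,
 * a simplification of what StandardAnalyzer actually does.
 */
public class ExpectedTermStats {

    /** docFreq = documents containing the term; totalTermFreq = total occurrences. */
    public static final class Stats {
        public int docFreq;
        public long totalTermFreq;
    }

    public static Map<String, Stats> compute(List<String> docs) {
        Map<String, Stats> stats = new TreeMap<>(); // sorted, like a terms dictionary
        for (String doc : docs) {
            // Count occurrences of each token within this one document.
            Map<String, Integer> freqsInDoc = new TreeMap<>();
            for (String token : doc.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty()) {
                    freqsInDoc.merge(token, 1, Integer::sum);
                }
            }
            // Fold the per-document counts into the global statistics.
            for (Map.Entry<String, Integer> e : freqsInDoc.entrySet()) {
                Stats s = stats.computeIfAbsent(e.getKey(), k -> new Stats());
                s.docFreq += 1;                  // counted once per document
                s.totalTermFreq += e.getValue(); // every occurrence counts
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        // The single document indexed later in this thread.
        List<String> docs = Arrays.asList("one two two three");
        for (Map.Entry<String, Stats> e : compute(docs).entrySet()) {
            System.out.println(e.getKey() + " docFreq=" + e.getValue().docFreq
                    + " totalTermFreq=" + e.getValue().totalTermFreq);
        }
    }
}
```

For "one two two three" this prints docFreq=1 for one, three and two, and totalTermFreq=2 for two. A codec reporting docFreq=0 for these terms, or totalTermFreq=-1 (Lucene's value for "statistic not stored") on a field indexed with frequencies, disagrees with it.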

Mike McCandless

http://blog.mikemccandless.com



> --
> Ravi
>
>
>
> On Mon, May 13, 2013 at 9:19 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> That code looks correct.
>>
>> But can you tie it all together into a runnable test case? I.e., add in
>> the terms enum, calling docFreq and getting 0 when it should be 1.
>>
>> Also, if you run CheckIndex on the index produced by the code below,
>> how many terms/freqs/positions does it report?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Mon, May 13, 2013 at 9:25 AM, Ravikumar Govindarajan
>> <ravikumar.govindarajan@gmail.com> wrote:
>> > Indexing code below. Looks very simple. Is this correct?
>> >
>> >     IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_42,
>> >             new StandardAnalyzer(Version.LUCENE_42));
>> >     conf.setOpenMode(OpenMode.CREATE_OR_APPEND);
>> >     String indexPath = "<some-file-path>";
>> >     Directory dir = FSDirectory.open(new File(indexPath));
>> >     IndexWriter writer = new IndexWriter(dir, conf);
>> >     FieldType type = new FieldType();
>> >     type.setTokenized(true);
>> >     type.setIndexed(true);
>> >     type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
>> >     Document luceneDoc = new Document();
>> >     Field field = new Field("content", "one two two three", type);
>> >     luceneDoc.add(field);
>> >     writer.addDocument(luceneDoc);
>> >     writer.close();
>> >
>> > Reading docFreq and totalTermFreq through the terms enum returns 0 and -1
>> > for all terms.
>> >
>> > --
>> > Ravi
>> >
>> >
>> > On Fri, May 10, 2013 at 10:19 PM, Michael McCandless <
>> > lucene@mikemccandless.com> wrote:
>> >
>> >> It should not be 0, as long as TermsEnum.next() does not return null
>> >> ... can you make a small test case?  Thanks.
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >>
>> >> On Fri, May 10, 2013 at 8:26 AM, Ravikumar Govindarajan
>> >> <ravikumar.govindarajan@gmail.com> wrote:
>> >> > I should add that the code above is wrong.
>> >> >
>> >> > It should be
>> >> >
>> >> >     while ((ref = tEnum.next()) != null) {
>> >> >         ref = tEnum.term();
>> >> >         tEnum.docFreq(); // Even here VAL=0
>> >> >     }
>> >> >
>> >> > Apologies for the mistake, but the problem remains.
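A TermsEnum reports docFreq() for the term it is currently positioned on, and it starts out unpositioned. Statistics should therefore be read only after next() (or a seek) has succeeded, which is what the corrected loop does. The sketch below is a toy stand-in written for illustration. ToyTermsEnum is hypothetical and is not Lucene's TermsEnum; it enforces the same rule by throwing instead of returning a misleading value. If a loop of the correct shape still yields 0, as happened here, the fault lies in what the codec returns rather than in the calling code.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Toy stand-in for a terms enumeration (hypothetical; NOT Lucene's TermsEnum).
 * It mimics one rule: statistics belong to the *current* term, and there is no
 * current term until next() has returned non-null.
 */
public class ToyTermsEnum {
    private final Iterator<Map.Entry<String, Integer>> it;
    private Map.Entry<String, Integer> current; // null while unpositioned or exhausted

    public ToyTermsEnum(NavigableMap<String, Integer> docFreqs) {
        this.it = docFreqs.entrySet().iterator();
    }

    /** Advances to the next term in sorted order; returns null once exhausted. */
    public String next() {
        current = it.hasNext() ? it.next() : null;
        return current == null ? null : current.getKey();
    }

    /** Fails loudly instead of returning a misleading value when unpositioned. */
    public int docFreq() {
        if (current == null) {
            throw new IllegalStateException("not positioned on a term; call next() first");
        }
        return current.getValue();
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> docFreqs = new TreeMap<>();
        docFreqs.put("one", 1);
        docFreqs.put("three", 1);
        docFreqs.put("two", 1);

        ToyTermsEnum te = new ToyTermsEnum(docFreqs);
        // Correct pattern: read statistics only inside the next() != null loop.
        String term;
        while ((term = te.next()) != null) {
            System.out.println(term + " docFreq=" + te.docFreq());
        }
    }
}
```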
>> >> >
>> >> >
>> >> >
>> >> > On Fri, May 10, 2013 at 5:54 PM, Ravikumar Govindarajan <
>> >> > ravikumar.govindarajan@gmail.com> wrote:
>> >> >
>> >> >> We have the following code
>> >> >>
>> >> >> SegmentInfos segments = new SegmentInfos();
>> >> >> segments.read(luceneDir);
>> >> >> for (SegmentInfoPerCommit sipc : segments) {
>> >> >>     String name = sipc.info.name;
>> >> >>     SegmentReader reader = new SegmentReader(sipc, 1, new IOContext());
>> >> >>     Terms terms = reader.terms("content");
>> >> >>     TermsEnum tEnum = terms.iterator(null);
>> >> >>     tEnum.docFreq();       // VAL=0
>> >> >>     tEnum.totalTermFreq(); // VAL=-1
>> >> >> }
>> >> >>
>> >> >> The field "content" is indexed with DOCS_AND_FREQS_AND_POSITIONS.
>> >> >>
>> >> >> Why is docFreq returned as 0 for all terms? Is this expected, or am I
>> >> >> doing something wrong?
>> >> >>
>> >> >> --
>> >> >> Ravi
>> >> >>
>> >> >>
>> >> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>>
>>
>>


