lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: getting number of terms in a document/field
Date Fri, 06 Feb 2015 09:07:03 GMT
How will you know how large to allocate that array?  The within-doc
term freq can in general be arbitrarily large...
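
One way around the sizing problem is to count into a map keyed by frequency instead of a fixed-size array. The following is only a sketch under assumptions: the class name FreqHistogram and the simulated frequency values are hypothetical, and in the real loop each value would come from docsEnum.freq().

```java
import java.util.Map;
import java.util.TreeMap;

public class FreqHistogram {
    // Maps a within-document term frequency to the number of documents that have it.
    // A map grows on demand, so no upper bound on the frequency is needed up front.
    private final Map<Integer, Long> counts = new TreeMap<>();

    // Record one document whose within-doc frequency for the term is `freq`.
    public void record(int freq) {
        counts.merge(freq, 1L, Long::sum);
    }

    // Number of recorded documents with exactly this frequency (0 if none).
    public long docsWithFreq(int freq) {
        return counts.getOrDefault(freq, 0L);
    }

    public Map<Integer, Long> asMap() {
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the values returned by docsEnum.freq();
        // note the single very large frequency that would overflow a small array.
        int[] simulatedFreqs = {1, 3, 1, 250000, 3, 1};
        FreqHistogram histogram = new FreqHistogram();
        for (int freq : simulatedFreqs) {
            histogram.record(freq);
        }
        System.out.println(histogram.asMap());
    }
}
```

A sorted map keeps the output ordered by frequency; a plain HashMap would do equally well if ordering does not matter.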

Lucene does not directly store the total number of terms in a
document, but it does store it approximately in the doc's norm value.
Maybe you can use that?  Alternatively, you can store this statistic
yourself, e.g. as a doc value.
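
To illustrate why the norm gives only an approximate count, here is a standalone sketch that does not call Lucene at all. It assumes the default similarity in 4.x, which (as far as I recall) computes the length norm as 1/sqrt(numTerms) and squeezes it into a single byte. The encode/decode methods below are a reimplementation from memory of that one-byte float encoding, so treat them as an assumption and check them against SmallFloat in your Lucene version. Index-time boosts and custom similarities would change the result.

```java
public class NormSketch {
    // Offset that maps the float exponent range onto the one-byte range (assumed).
    private static final int ZERO_POINT = (63 - 15) << 3;

    // Lossy float -> byte encoding: keeps only the top bits of the float.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ZERO_POINT) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative, or underflow
        }
        if (small >= ZERO_POINT + 0x100) {
            return (byte) -1;                          // overflow: largest value
        }
        return (byte) (small - ZERO_POINT);
    }

    // Inverse of encode, up to the precision that was thrown away.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Round-trips a term count through the byte-sized norm and inverts it.
    static double approxLength(int numTerms) {
        float norm = (float) (1.0 / Math.sqrt(numTerms));
        float decoded = decode(encode(norm));
        return 1.0 / ((double) decoded * decoded);
    }

    public static void main(String[] args) {
        for (int n : new int[]{1, 100, 110}) {
            System.out.println(n + " terms -> approx " + approxLength(n));
        }
    }
}
```

Under these assumptions, documents of 100 and 110 terms end up with the same byte, so the recovered length is identical for both. If exact counts matter, storing the count yourself is the safer route.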

Mike McCandless

http://blog.mikemccandless.com


On Thu, Feb 5, 2015 at 7:24 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid> wrote:
> Hello Lucene Users,
>
> I am traversing all documents that contain a given term with the following code:
>
> Term term = new Term(field, word);
> Bits bits = MultiFields.getLiveDocs(reader);
> DocsEnum docsEnum = MultiFields.getTermDocsEnum(reader, bits, field, term.bytes());
>
> while (docsEnum.nextDoc() != DocsEnum.NO_MORE_DOCS) {
>     array[docsEnum.freq()]++;
>
>     // how to retrieve term count for this document?
>     xxxxx(docsEnum.docID(), field);
> }
>
> How can I get the per-field term counts for these documents using Lucene 4.10.3?
>
> Is the above code OK for traversing the posting list of a term?
>
> Thanks,
> Ahmet
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


