lucene-java-user mailing list archives

From Ziqi Zhang <ziqi.zh...@sheffield.ac.uk>
Subject Re: get frequency of each term from a document
Date Sun, 20 Sep 2015 14:28:12 GMT
Thanks, but TermsEnum has two methods that return frequency-related 
info, and both are corpus-level, not document-specific:

- docFreq(): returns the number of documents containing the current term.
- totalTermFreq(): returns the total number of occurrences of this term 
  across all documents (the sum of the freq() for each doc that has this 
  term).

However, I need the document-specific frequency, i.e., the frequency of 
term A in each of Doc 1, 2, ... N.

Thanks
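
[Editor's illustration] The distinction drawn above, between the two corpus-level statistics and the per-document frequency being asked for, can be sketched in plain Java without Lucene. This is only a toy model under stated assumptions: the class TermStatsSketch, its method names, and the three-document corpus are hypothetical, and none of it is Lucene's implementation. As far as I know, a TermsEnum obtained from a single document's term vector behaves like a one-document index, so its totalTermFreq() would correspond to termFreqs() below; that is worth checking against the Lucene 5.3 Javadoc before relying on it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the three statistics discussed in this thread (not Lucene code).
public class TermStatsSketch {

    // Document-level: how often each term occurs in ONE document.
    static Map<String, Integer> termFreqs(List<String> docTokens) {
        Map<String, Integer> tf = new TreeMap<>();
        for (String token : docTokens) {
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    // Corpus-level: number of documents containing the term at least once.
    static int docFreq(List<List<String>> corpus, String term) {
        int count = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }

    // Corpus-level: sum of the per-document frequencies of the term.
    static long totalTermFreq(List<List<String>> corpus, String term) {
        long total = 0;
        for (List<String> doc : corpus) {
            total += termFreqs(doc).getOrDefault(term, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical, already-tokenized corpus of three documents.
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("a", "c"),
                Arrays.asList("b", "b", "b"));

        System.out.println("docFreq(a) = " + docFreq(corpus, "a"));
        System.out.println("totalTermFreq(a) = " + totalTermFreq(corpus, "a"));
        for (int i = 0; i < corpus.size(); i++) {
            System.out.println("doc " + i + " term freqs = " + termFreqs(corpus.get(i)));
        }
    }
}
```

In this toy corpus, term "a" appears in two documents and three times overall, while its per-document counts (2, 1, and absent) are exactly the values that neither corpus-level number reveals.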

On 20/09/2015 15:07, Uwe Schindler wrote:
> Hi,
>
> With the terms enum you can iterate over all terms. Each one returns its term frequency.
> Of course, you need to enable term vectors during indexing. The pattern for using the terms
> enum can be found in various places in the Lucene source code. It's a very expert API, but
> it is the way to go here.
>
> Uwe
>
> On 20 September 2015 15:35:40 CEST, Ziqi Zhang <ziqi.zhang@sheffield.ac.uk> wrote:
>> Hi
>>
>> Is it possible to get a list of the terms within a document, and also the TF of
>> each of these terms *in that document only*? (Lucene 5.3)
>>
>> IndexReader has a method "Terms getTermVector(int docID, String field)",
>> which gives me a "Terms" object, from which I can get a TermsEnum. But I
>> do not know where to go from there.
>>
>> thanks
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de


-- 
Ziqi Zhang
Research Associate
Department of Computer Science
University of Sheffield



