lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Total of term frequencies
Date Tue, 18 Apr 2017 21:00:50 GMT
Ahh I see.

Term vectors are actually an inverted index for a single document, and they
also have the same postings API as the whole index (including
TermsEnum.totalTermFreq), but that method likely always returns -1 for term
vectors because it's not implemented?  Maybe Lucene's default codec should
be improved to store this; maybe open an issue?

In the meantime you could make your own codec that does store it.
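
If a custom codec is more than you need, the count can also be summed on the
client side. The sketch below is plain Java, not Lucene API: it assumes the
per-term frequencies of one document have already been read from its term
vector into a map (how they are read is left out here), and the class and
method names are made up for illustration. It adds the frequencies up to get
the document's total term count, duplicates included, and divides each
frequency by that total to get the normalized term frequency.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
public class NormalizedTermFrequency {

    // Total number of term occurrences in the document, counting duplicates.
    static long totalTermCount(Map<String, Long> termFreqs) {
        long total = 0;
        for (long freq : termFreqs.values()) {
            total += freq;
        }
        return total;
    }

    // Normalized term frequency: each term's frequency divided by the total.
    // Returns an empty map when the document has no terms.
    static Map<String, Double> normalize(Map<String, Long> termFreqs) {
        long total = totalTermCount(termFreqs);
        Map<String, Double> normalized = new LinkedHashMap<>();
        if (total == 0) {
            return normalized;
        }
        for (Map.Entry<String, Long> entry : termFreqs.entrySet()) {
            normalized.put(entry.getKey(), (double) entry.getValue() / total);
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Assumed input: term -> frequency within a single document.
        Map<String, Long> termFreqs = new LinkedHashMap<>();
        termFreqs.put("lucene", 3L);
        termFreqs.put("index", 2L);
        termFreqs.put("codec", 1L);

        System.out.println("total terms: " + totalTermCount(termFreqs));
        System.out.println("normalized:  " + normalize(termFreqs));
    }
}
```

The map size alone gives only the number of distinct terms, which is why the
frequencies have to be summed.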

Mike McCandless

http://blog.mikemccandless.com

On Tue, Apr 18, 2017 at 9:12 AM, Manjula Wijewickrema <manjula53@gmail.com>
wrote:

> Hi Mike,
>
> Thanks for the answer. I think this returns the total number of
> occurrences of a specified term across all the documents in the corpus,
> right?
>
> But I need the total number of terms (including multiple occurrences of
> the same term) in each document of the corpus. Any suggestion?
>
> Thanks!
>
> On Tue, Apr 18, 2017 at 2:53 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> I think you want to use the TermsEnum.totalTermFreq method?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Sun, Apr 16, 2017 at 11:36 AM, Manjula Wijewickrema <
>> manjula53@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is there any way to get the total count of terms in the Term Frequency
>>> Vector (tvf)? I need to calculate the normalized term frequency of each
>>> term in my tvf. I know how to obtain the length of the tvf, but it
>>> doesn't work since I need to count duplicate occurrences as well.
>>>
>>> Highly appreciate your kind response.
>>>
>>
>>
>
