lucene-java-user mailing list archives

From Michael Sokolov
Subject Re: Total Freq for Bigrams, Trigrams, etc.
Date Tue, 02 Dec 2014 22:31:18 GMT
If you index the n-grams in their own field using ShingleFilter, you can
get statistics from the same term API on that field, where the terms
*are* n-grams. Queries work the same way against that field.
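
To make the idea concrete, here is a rough plain-Java model of what a shingle field holds and how a per-million figure could be derived from it. This is not Lucene code: in a real index the shingling would be done by ShingleFilter and the count read with `TermsEnum.totalTermFreq()` on the shingle field. The class name `ShingleCounts`, the whitespace tokenization, and the choice of total unigram tokens as the denominator are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of a shingle (word n-gram) field; not Lucene code.
final class ShingleCounts {

    // Lowercase and split on whitespace (a stand-in for a real analyzer).
    static String[] tokenize(String doc) {
        String trimmed = doc.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Total occurrences of every n-word shingle across all documents.
    // Shingles never span a document boundary.
    static Map<String, Long> countShingles(List<String> docs, int n) {
        Map<String, Long> counts = new HashMap<>();
        for (String doc : docs) {
            String[] tokens = tokenize(doc);
            for (int i = 0; i + n <= tokens.length; i++) {
                String shingle = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
                counts.merge(shingle, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Total number of single tokens in the corpus (assumed denominator).
    static long totalTokens(List<String> docs) {
        long total = 0;
        for (String doc : docs) {
            total += tokenize(doc).length;
        }
        return total;
    }

    // Occurrences per million tokens; returns 0 for an empty corpus.
    static double perMillion(long occurrences, long totalTokens) {
        if (totalTokens == 0) {
            return 0.0;
        }
        return occurrences * 1_000_000.0 / totalTokens;
    }
}
```

With this model, `ShingleCounts.countShingles(docs, 2).get("new york")` plays the role of the total term frequency of the bigram "new york", and `perMillion` normalizes it against the corpus size. Whether to divide by the number of unigram tokens or by the number of shingles in the field is a design choice that the thread does not settle.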


On 12/02/2014 03:38 PM, Peter Organisciak wrote:
> Is it possible to get a total corpus frequency for bigram queries or
> higher, i.e. how many times the query occurs in the corpus?
> I'm looking to implement a count of occurrences per million terms. I know
> that for a single term I can use `TermsEnum.totalTermFreq()`. Is there any
> comparable way to do so for a bigram or other simple query?
> Thank you,
> Peter
