lucene-java-user mailing list archives

From Carsten Schnober
Subject Term Statistics for MultiTermQuery
Date Tue, 12 Mar 2013 17:00:09 GMT
Here's another question involving MultiTermQuery. My aim is to get a
frequency count for a MultiTermQuery without having to execute the
query. The naive approach would be to create the Query, rewrite it,
extract the terms, and get each term's frequency, approximately as follows:

IndexSearcher searcher = ...;
PrefixQuery query = new PrefixQuery(new Term("field", "abc"));
Query rewritten = searcher.rewrite(query);
Set<Term> terms = new HashSet<Term>();
rewritten.extractTerms(terms); // fills the given set; the method returns void

The term frequencies would then have to be read for each term. However,
this seems rather costly for a large number of terms, and I am actually
only interested in the total frequency, so there should be no need for a
term-by-term analysis.
My use case is that I have an index containing part-of-speech tags in
the form <tag>:<token>, and I may want to look up <tag> frequencies.
My alternative solution would be to create a dedicated index in which
the original tokens are completely replaced by their tags, so that I
would have documents of the form "DET NN ..." and corresponding tokens.
Would you recommend this instead?
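
The total count asked about above can be pictured as a single range scan
over a sorted term dictionary: seek to the first term carrying the prefix,
then add up frequencies until the prefix no longer matches. The following
is only a minimal sketch of that idea in plain Java. The TreeMap is a
stand-in for the index's term dictionary, and the class name, method name,
and sample frequencies are made up for illustration; no Lucene API is used.
In Lucene 4.x the analogous operations would presumably be
TermsEnum.seekCeil and TermsEnum.totalTermFreq, but that mapping is an
assumption, not something confirmed in this thread.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class PrefixFrequencySketch {

    // Sums the frequencies of all terms that start with the given prefix.
    // The sorted map stands in for a sorted term dictionary (hypothetical model).
    static long totalFrequency(SortedMap<String, Long> termFreqs, String prefix) {
        long total = 0L;
        // tailMap jumps to the first term >= prefix, similar to seeking in a sorted term list.
        for (Map.Entry<String, Long> e : termFreqs.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // terms are sorted, so no later term can carry the prefix
            }
            total += e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sample data in the <tag>:<token> form described above.
        SortedMap<String, Long> termFreqs = new TreeMap<String, Long>();
        termFreqs.put("DET:a", 5L);
        termFreqs.put("DET:the", 7L);
        termFreqs.put("NN:dog", 3L);
        termFreqs.put("NN:house", 2L);
        System.out.println(totalFrequency(termFreqs, "DET:"));
        System.out.println(totalFrequency(termFreqs, "NN:"));
    }
}
```

The scan still touches every matching term once, but it avoids building
and rewriting a query and keeps only a running sum rather than a set of
Term objects.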


Institut für Deutsche Sprache
Projekt KorAP
Tel. +49-(0)621-43740789
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
