lucene-java-user mailing list archives

From renanmach
Subject Document frequency with multiple fields
Date Wed, 18 Nov 2015 13:56:59 GMT
Hello everyone,

I am indexing a collection of XML files. I select a few tags, and each
selected tag of an XML file is indexed in its own field of the document.

I need to get the document frequency (the number of documents that have the
term) of each term. The problem is that I am getting a TermVector for each
field. If I sum the document frequency of each term in each field, the
documents that have the same term in different fields (tags) will be counted
more than once.

Is there any (efficient) way to get the document frequency without counting
one document more than once? 
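
To make the double counting concrete, here is a minimal sketch in plain Java.
It does not use any Lucene API: the class name DocFreqSketch, the field names
("title", "abstract") and the doc IDs are all made up for illustration, and
the documents matching one term in each field are simply modelled as a BitSet
of doc IDs. It only shows the number I get now versus the number I want.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class DocFreqSketch {

    // What I get today: the sum of the per-field document frequencies.
    // A document holding the term in two fields is counted twice.
    static int summedDocFreq(Map<String, BitSet> docsByField) {
        int sum = 0;
        for (BitSet docs : docsByField.values()) {
            sum += docs.cardinality();
        }
        return sum;
    }

    // What I want: the size of the union of the per-field document sets,
    // so each document is counted once no matter how many fields match.
    static int distinctDocFreq(Map<String, BitSet> docsByField) {
        BitSet union = new BitSet();
        for (BitSet docs : docsByField.values()) {
            union.or(docs);
        }
        return union.cardinality();
    }

    // Hypothetical helper: build a BitSet from a list of doc IDs.
    static BitSet docs(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        // Made-up example: docs 0 and 2 have the term in both fields.
        Map<String, BitSet> docsByField = new LinkedHashMap<>();
        docsByField.put("title", docs(0, 2));
        docsByField.put("abstract", docs(0, 1, 2));

        System.out.println(summedDocFreq(docsByField));   // 5
        System.out.println(distinctDocFreq(docsByField)); // 3
    }
}
```

Only three documents contain the term here, but summing the per-field counts
gives five.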

I can't add one extra field at index time that holds the content of every
tag I want to index, because I use a different set of filters for each tag.

Thanks in advance.
