lucene-java-user mailing list archives

From "Diego Ceccarelli (BLOOMBERG/ LONDON)" <dceccarel...@bloomberg.net>
Subject Re: Multi-IDF for a single term possible?
Date Tue, 03 Dec 2019 13:00:00 GMT
Hi Ravi,
Can you give more details on how you store an entity in Lucene? What is a doc type?
What fields do you have?

Cheers

From: java-user@lucene.apache.org At: 12/03/19 12:50:40 To: java-user@lucene.apache.org
Subject: Multi-IDF for a single term possible?

Hello,

We are using TF-IDF for scoring (we have yet to migrate to BM25). Different
entities (DOC_TYPES) are processed & stored together in a single index.

When it comes to IDF, I find that a single value is computed across all
documents & stored as part of TermStats, whereas our documents are not
homogeneous. So, a single IDF value doesn't work for us.

We would like to compute IDF for each <Term/DOC_TYPE> pair, store it &
later use the paired IDF values at query time. Is something like this
possible via Codecs or other mechanisms?
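
To make the request concrete, here is a minimal sketch of the arithmetic only,
assuming the classic TF-IDF form idf = 1 + ln((docCount + 1) / (docFreq + 1)),
applied per DOC_TYPE instead of across the whole index. The function name
per_type_idf and the sample documents are hypothetical, and nothing here uses
the Lucene API; how such values would be stored and picked up at query time is
exactly the open question.

```python
import math
from collections import defaultdict


def per_type_idf(docs):
    """Compute one IDF value per (term, doc_type) pair.

    docs: iterable of (doc_type, terms) tuples. Hypothetical input shape,
    chosen for illustration only.
    """
    doc_count = defaultdict(int)  # doc_type -> number of documents of that type
    doc_freq = defaultdict(int)   # (term, doc_type) -> documents containing term

    for doc_type, terms in docs:
        doc_count[doc_type] += 1
        for term in set(terms):   # count each term at most once per document
            doc_freq[(term, doc_type)] += 1

    # Assumed formula: classic TF-IDF style idf, with counts scoped to one type.
    return {
        (term, doc_type): 1.0 + math.log((doc_count[doc_type] + 1) / (df + 1))
        for (term, doc_type), df in doc_freq.items()
    }


# Hypothetical sample data: "apple" is rare among products, common in articles.
sample_docs = [
    ("product", ["apple", "phone"]),
    ("product", ["phone", "case"]),
    ("product", ["laptop"]),
    ("article", ["apple", "pie"]),
    ("article", ["apple", "tree"]),
]

idf = per_type_idf(sample_docs)
print(idf[("apple", "product")], idf[("apple", "article")])
```

With counts scoped this way, the same term gets a higher IDF in a DOC_TYPE where
it is rare than in one where it is common, which a single index-wide value
cannot express.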

Any help is much appreciated.

--
Ravi

