lucene-java-user mailing list archives

From Grdan Eenc <erdan.g...@googlemail.com>
Subject Payload TFIDF Similarity in Lucene 7.1.0
Date Tue, 13 Mar 2018 08:58:11 GMT
Hi there,

I want to extend the TFIDFSimilarity class so that the term frequency is
ignored and the value stored in the payload is used instead. To do this, I
basically do the following:

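    // Needs: org.apache.lucene.analysis.payloads.PayloadHelper,
    //        org.apache.lucene.util.BytesRef

    // Ignore the raw term frequency: every matching term counts once.
    @Override
    public float tf(float freq) {
        return 1f;
    }

    // Use the float stored in the payload as the term's weight;
    // fall back to a neutral factor of 1 when there is no payload.
    @Override
    public float scorePayload(int doc, int start, int end, BytesRef payload) {
        if (payload != null) {
            return PayloadHelper.decodeFloat(payload.bytes, payload.offset);
        } else {
            return 1f;
        }
    }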

The complete class can be found here:

https://gist.github.com/nadre/66be2a2a32214f2c5ec1ec1f6edcef08
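For reference, PayloadHelper.decodeFloat reads the four bytes at the given offset as a big-endian IEEE 754 float, which is the layout PayloadHelper.encodeFloat writes. Below is a minimal, stdlib-only sketch (no Lucene dependency) of that decoding plus the intended per-term weight, assuming the payloads were indexed with PayloadHelper.encodeFloat. The class and method names (PayloadWeightSketch, payloadFactor, termWeight) are hypothetical, and termWeight is a toy model of "payload instead of tf", not the formula Lucene actually uses to score documents.

```java
import java.nio.ByteBuffer;

/**
 * Stdlib-only sketch of the payload handling described above.
 * Assumption: payloads are 4-byte big-endian IEEE 754 floats,
 * i.e. the layout PayloadHelper.encodeFloat produces.
 * All names here are hypothetical, not Lucene API.
 */
public class PayloadWeightSketch {

    // Encode a float as 4 big-endian bytes (ByteBuffer's default byte order).
    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    // Read a float from the 4 bytes starting at offset,
    // mirroring what PayloadHelper.decodeFloat(bytes, offset) does.
    static float decodeFloat(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).getFloat(offset);
    }

    // Same fallback as scorePayload in the question:
    // no payload means a neutral factor of 1.
    static float payloadFactor(byte[] payload, int offset) {
        return payload != null ? decodeFloat(payload, offset) : 1f;
    }

    // Toy per-term weight: tf is fixed at 1 (as in the overridden tf),
    // and the payload value takes its place. NOT Lucene's real formula.
    static float termWeight(byte[] payload, int offset, float idf) {
        float tf = 1f;
        return tf * payloadFactor(payload, offset) * idf;
    }

    public static void main(String[] args) {
        byte[] payload = encodeFloat(2.5f);
        System.out.println(decodeFloat(payload, 0));      // 2.5
        System.out.println(termWeight(payload, 0, 2.0f)); // 5.0
        System.out.println(termWeight(null, 0, 2.0f));    // 2.0
    }
}
```

The offset parameter matters because a BytesRef usually points into a larger shared byte array, so decoding must start at payload.offset rather than at index 0.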

Unfortunately, scorePayload never gets called and I end up with the wrong
scores. I know that scorePayload is deprecated in Lucene 7.2.1, but it
should still work in 7.1.0, or am I missing something?

I implemented the same thing by directly extending the base Similarity
class and iterating through the document's terms using the
LeafReaderContext, based on the code in this repo:

https://github.com/sdauletau/elasticsearch-position-similarity

This works but is horribly slow, which is why I would prefer the first approach.

Any idea why scorePayload doesn't get called? I really couldn't find any
resources on this online.

Best, Erdan.
