lucene-java-user mailing list archives

From Erdan Genc <erdan.g...@googlemail.com>
Subject Re: Payload TFIDF Similarity in Lucene 7.1.0
Date Tue, 13 Mar 2018 13:36:16 GMT
@Erik: I didn't know that. How can I figure out which query types support
payload scoring? The class I described is wrapped in an Elasticsearch
plugin, so I don't have full control over this. Currently I'm using
SpanTermQuery; maybe another available query type will do, so that I don't
need to implement a custom query parser as well. Thank you!

@Michael: This was my first thought as well, but I couldn't find any
resources when I first searched for it. I just discovered LUCENE-7854
<https://issues.apache.org/jira/browse/LUCENE-7854>, the
DelimitedTermFrequencyTokenFilter, but it can't handle floating-point
values, right? Thanks!
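
A minimal sketch of one possible workaround for the floating-point question, assuming DelimitedTermFrequencyTokenFilter only accepts an integer after its delimiter (assumed here to be `|`): scale each float weight by a fixed factor and round it before indexing. The class name, method names, and the factor of 100 are hypothetical. Weights that would round to 0 are clamped to 1, since Lucene rejects custom term frequencies below 1.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ScaledTermFrequencies {

    // Hypothetical scale factor: keeps two decimal places of each weight.
    public static final int SCALE = 100;

    // Converts a float weight into a positive integer term frequency.
    public static int toFrequency(float weight) {
        int freq = Math.round(weight * SCALE);
        // Custom term frequencies must be at least 1, so clamp tiny weights.
        return Math.max(1, freq);
    }

    // Builds whitespace-separated "term|freq" tokens, the input format the
    // delimited term frequency filter is assumed to expect.
    public static String toDelimitedText(Map<String, Float> weights) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, Float> e : weights.entrySet()) {
            joiner.add(e.getKey() + "|" + toFrequency(e.getValue()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new LinkedHashMap<>();
        weights.put("lucene", 0.75f);
        weights.put("payload", 1.5f);
        weights.put("tfidf", 0.001f);
        System.out.println(toDelimitedText(weights)); // lucene|75 payload|150 tfidf|1
    }
}
```

With scaled frequencies, the overridden `tf(float freq)` would presumably divide by the same factor (or keep the default behavior) instead of returning `1f`; that adaptation is likewise an assumption.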

2018-03-13 12:14 GMT+01:00 Michael Sokolov <msokolov@gmail.com>:

> Also, if you are no longer using the term frequency at all, you might
> consider wiring your score (the one you are currently wiring into payloads)
> in there, in place of the term frequency.
>
> On Mar 13, 2018 6:57 AM, "Erik Hatcher" <erik.hatcher@gmail.com> wrote:
>
> > Payloads are only scored by certain query types. What query are you
> > executing?
> >
> > > On Mar 13, 2018, at 04:58, Grdan Eenc <erdan.genc@googlemail.com> wrote:
> > >
> > > Hi there,
> > >
> > > I want to extend the TFIDFSimilarity class so that the term frequency is
> > > ignored and the value stored in the payload is used instead. To do this,
> > > I basically have the following:
> > >
> > >    @Override
> > >    public float tf(float freq) {
> > >        // Ignore the actual term frequency.
> > >        return 1f;
> > >    }
> > >
> > >    @Override
> > >    public float scorePayload(int doc, int start, int end, BytesRef payload) {
> > >        // Use the float stored in the payload, or a neutral 1 if none.
> > >        if (payload != null) {
> > >            return PayloadHelper.decodeFloat(payload.bytes, payload.offset);
> > >        } else {
> > >            return 1f;
> > >        }
> > >    }
> > >
> > > The complete class can be found here:
> > >
> > > https://gist.github.com/nadre/66be2a2a32214f2c5ec1ec1f6edcef08
> > >
> > > Unfortunately, scorePayload never gets called, and I end up with the
> > > wrong scores. I know that scorePayload is deprecated in Lucene 7.2.1, but
> > > it should still work in 7.1.0, or am I missing something?
> > >
> > > I implemented the same thing by extending the base Similarity class
> > > directly and iterating through the document's terms using the
> > > LeafReaderContext, based on the code in this repo:
> > >
> > > https://github.com/sdauletau/elasticsearch-position-similarity
> > >
> > > This works but is horribly slow, which is why I would prefer the first
> > > approach.
> > >
> > > Any idea why scorePayload doesn't get called? I really couldn't find any
> > > resources on the net.
> > >
> > > Best, Erdan.
> >
>
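
For reference, the value read in scorePayload above is a 4-byte float. The stdlib-only sketch below assumes the same big-endian IEEE 754 layout that Lucene's PayloadHelper.encodeFloat/decodeFloat use (via Float.floatToIntBits), and illustrates why payload.offset must be passed: a BytesRef may point into a larger shared array. The class and method names are hypothetical stand-ins, not Lucene API.

```java
import java.nio.ByteBuffer;

public class FloatPayloadCodec {

    // Encodes a float as 4 big-endian bytes (layout assumed to match PayloadHelper.encodeFloat).
    public static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(value).array();
    }

    // Decodes 4 big-endian bytes starting at offset, analogous to PayloadHelper.decodeFloat(bytes, offset).
    public static float decodeFloat(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).getFloat(offset);
    }

    public static void main(String[] args) {
        // Simulate a BytesRef whose payload starts at offset 3 of a shared buffer.
        byte[] shared = new byte[8];
        byte[] payload = encodeFloat(2.5f);
        System.arraycopy(payload, 0, shared, 3, payload.length);

        System.out.println(decodeFloat(shared, 3)); // 2.5
        // Reading from offset 0 instead would decode an unrelated value.
        System.out.println(decodeFloat(shared, 0));
    }
}
```

Note that this only covers decoding; whether scorePayload is invoked at all still depends on the query type, as discussed above.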
