lucene-java-user mailing list archives

From "Ramkumar R. Aiyengar" <>
Subject Re: Indexing Weighted Tags per Document
Date Tue, 28 Oct 2014 08:31:45 GMT
There are a few possible approaches here. We had a similar use case and
went for the second one below. I primarily deal with Solr, so I don't know
of any Lucene-only examples, but hopefully you can dig some up.

(1) You can attach a payload to each occurrence of the tag, and modify the
scoring to use the payload.
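
As a rough illustration of (1): Lucene ships a DelimitedPayloadTokenFilter
(exposed in Solr as DelimitedPayloadTokenFilterFactory) that splits tokens
such as "tree|0.8" into the term "tree" and a payload, which can be decoded as
a float when the float encoder is configured. The helper below only builds the
field text in that form. The class name PayloadTagFormatter, the "|"
delimiter, and the two-decimal rounding are assumptions for this sketch, and
it assumes tag names contain neither whitespace nor "|". Scoring on the
payload still needs a payload-aware query and similarity, which is not shown.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: formats tag weights as "tag|weight" tokens, the input
// shape expected by a delimiter-based payload token filter.
final class PayloadTagFormatter {

    private PayloadTagFormatter() {}

    // Returns e.g. "tree|0.80 forest|0.25" for the given tag -> weight map.
    // Iteration order follows the map, so pass a LinkedHashMap for stable output.
    static String format(Map<String, Double> tagWeights) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, Double> entry : tagWeights.entrySet()) {
            double weight = entry.getValue();
            // Rejects NaN as well as values outside [0, 1].
            if (!(weight >= 0.0 && weight <= 1.0)) {
                throw new IllegalArgumentException(
                        "weight must be in [0,1]: " + entry.getKey() + "=" + weight);
            }
            // Locale.ROOT keeps the decimal separator a dot on every machine.
            joiner.add(entry.getKey() + "|" + String.format(Locale.ROOT, "%.2f", weight));
        }
        return joiner.toString();
    }
}
```

The resulting string would be indexed into a single "tag" field whose analyzer
tokenizes on whitespace and then applies the delimited payload filter.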

(2) Use term frequency as a proxy. You could scale the probability by a
constant factor and add the term to the field that many times (so that the
scaled value becomes its term frequency). Scoring will now account for
this. The advantage is that tag scores are also normalised against keyword
scores and amongst the tags themselves, and you can also filter results by
probability.
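
A minimal sketch of the scaling step in (2), in plain Java with no Lucene
dependency. The class name TagTermFrequency, the rounding rule, and the scale
factor are assumptions for illustration; the original setup may have differed.
With a scale of 10, a probability of 0.8 becomes a term frequency of 8.

```java
import java.util.Collections;

// Hypothetical helper: turns a tag probability into a repeated-term string so
// that the term frequency in the indexed field reflects the probability.
final class TagTermFrequency {

    private TagTermFrequency() {}

    // Maps a probability in [0, 1] to an integer term frequency in [0, scale].
    static int scaledFrequency(double probability, int scale) {
        // Rejects NaN as well as values outside [0, 1].
        if (!(probability >= 0.0 && probability <= 1.0)) {
            throw new IllegalArgumentException("probability must be in [0,1]: " + probability);
        }
        if (scale < 1) {
            throw new IllegalArgumentException("scale must be >= 1: " + scale);
        }
        return (int) Math.round(probability * scale);
    }

    // Builds the field text: the tag repeated scaledFrequency times, separated
    // by spaces, for use with a whitespace-tokenized field.
    static String repeat(String tag, double probability, int scale) {
        int frequency = scaledFrequency(probability, scale);
        return String.join(" ", Collections.nCopies(frequency, tag));
    }
}
```

For example, repeat("tree", 0.8, 10) yields "tree" eight times, which would be
added to the "tag" field. Two caveats: a probability that rounds to zero (such
as 0.04 at scale 10) drops the tag entirely, and TF-IDF style similarities
damp the term frequency (the classic one uses its square root), so the score
contribution is not linear in the probability.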

(3) There is potentially also a solution using child documents and block
join, but I may be mistaken; I haven't explored this much.
On 27 Oct 2014 16:10, "Ralf Bierig" <> wrote:

> I want to index documents together with a list of tags (usually between
> 10 and 30) that represent meta information about the document. Normally, I
> would create an extra field "tag", store each tag by name in an instance of
> that field, and add those 10-30 fields to the document before adding the
> document to the index and writing the index.
> However, I have the following extra requirements:
> a) I need a weight in the range [0,1] associated with each tag,
> representing the probability that the tag is true.
> b) These tags must be associated with the document and not with the terms
> of the document.
> c) I must be able to associate many tags to a document instance.
> d) I must be able to use the weight in the weighting process of the search
> engine.
> e) The weight must be for the document instance, as the weight represents
> the probability for that tag for that particular document. E.g.
> fieldname: tag
> fieldvalue: tree
> fieldweight: 0.8
> meaning that this particular document is about trees with a probability
> of 0.8.
> What is the best way to do that?
> Can somebody point me to an example or something quite similar that
> captures such a problem?
> Best,
> Ralf