lucene-solr-user mailing list archives

From Michael Nilsson <mnilsson2...@gmail.com>
Subject Re: extract multi-features for one solr feature extractor in solr learning to rank
Date Tue, 18 Apr 2017 17:12:40 GMT
Hi Jianxiong,

What you say is true.  If you want 100 different feature values extracted,
you need to specify 100 different features in the features.json config, so
that there is a direct mapping from features in to features out.  However,
you most likely only need to implement 1 feature class and reuse it for all
100 feature values: you can pass different params in features.json for each
feature, even though they share the same feature class.

In some cases you might instead be able to have 1 feature output a single
value that changes per document, if you can collapse those features
together.  This 2nd option may or may not work for you, depending on your
data, what you are trying to bucket, and the algorithm you are trying to
use, because not all algorithms handle this case easily.  To illustrate:


*A) Multiple binary features using the same 1 class*
{
    "name" : "isProductCheap",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "fq": [ "price:[0 TO 100]" ]
    }
},{
    "name" : "isProductExpensive",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "fq": [ "price:[101 TO 1000]" ]
    }
},{
    "name" : "isProductCrazyExpensive",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "fq": [ "price:[1001 TO *]" ]
    }
}
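
Because option A repeats the same class with different params, a list like
the one above can be generated rather than written by hand.  Here is a
minimal sketch in Python of how such a generator might look.  The function
name build_price_features and the BUCKETS table are hypothetical, chosen
only for illustration; the feature names, class, and fq ranges are the ones
from the example above.

```python
import json

# Feature class taken from the option A example above.
FEATURE_CLASS = "org.apache.solr.ltr.feature.SolrFeature"

# Hypothetical bucket table: (feature name, lower bound, upper bound).
# An upper bound of None means the range is open-ended ("*").
BUCKETS = [
    ("isProductCheap", 0, 100),
    ("isProductExpensive", 101, 1000),
    ("isProductCrazyExpensive", 1001, None),
]


def build_price_features(buckets):
    """Return one SolrFeature definition per price bucket."""
    features = []
    for name, lower, upper in buckets:
        upper_text = "*" if upper is None else str(upper)
        features.append({
            "name": name,
            "class": FEATURE_CLASS,
            "params": {"fq": [f"price:[{lower} TO {upper_text}]"]},
        })
    return features


if __name__ == "__main__":
    # Emit the list as a JSON array, ready to save as features.json.
    print(json.dumps(build_price_features(BUCKETS), indent=2))
```

Extending BUCKETS to 100 entries (or building it in a loop) then yields 100
feature definitions without any hand editing.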


*B) 1 feature that outputs different values (some algorithms don't handle
discrete features well)*
{
    "name" : "productPricePoint",
    "class" : "org.apache.solr.ltr.feature.MyPricePointFeature",
    "params" : {

      // Either hard code the price map in MyPricePointFeature.java, or
      // pass it in through params for flexible customization,
      // and return different values for cheap, expensive, and
      // crazyExpensive

    }
}
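
To make the "collapse into 1 value" idea in option B concrete, here is a
small sketch of the bucketing logic only, written in Python for brevity.
It is not a Solr feature: a real MyPricePointFeature would be a Java class
in the LTR plugin.  The bucket boundaries mirror option A, and the output
values 1.0, 2.0, and 3.0 are assumptions picked for illustration.

```python
from bisect import bisect_left

# Inclusive upper bounds of the "cheap" and "expensive" buckets,
# mirroring the ranges in option A. Anything above the last bound
# falls into the final, open-ended bucket.
PRICE_UPPER_BOUNDS = [100, 1000]

# Hypothetical output values: cheap, expensive, crazy expensive.
PRICE_POINT_VALUES = [1.0, 2.0, 3.0]


def price_point(price):
    """Map a price to a single discrete feature value."""
    return PRICE_POINT_VALUES[bisect_left(PRICE_UPPER_BOUNDS, price)]
```

Note that the single output is an ordered, discrete value, which is exactly
the kind of feature that some algorithms do not handle well.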

The 2 options above satisfy most use cases, which is what we were targeting.
In my specific use case, I opted for option A and wrote a simple script that
generates the features.json, so I wouldn't have to write 100 similar
features by hand.

You also mentioned that you want to extract features sparsely.  You can
change the configuration of the Feature Transformer
<http://lucene.apache.org/solr/6_5_0/solr-ltr/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.html>
to return only the features that actually triggered, in a sparse format
<https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank#LearningToRank-Advancedoptions>.

Your performance point about 100 features vs 1 feature is true, and pull
requests to improve the plugin's performance and usability would be more
than welcome!

-Michael



On Fri, Apr 14, 2017 at 12:51 PM, Jianxiong Dong <jdongca2003@gmail.com>
wrote:

> Hi,
>     I found that solr learning-to-rank (LTR) supports only ONE feature
> for a given feature extractor.
>
> See interface:
>
> https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/Feature.java
>
> Line (281, 282) (in FeatureScorer)
> @Override
>       public abstract float score() throws IOException;
>
> I have a use case: given a <query, doc>, I would like to extract multiple
> features (e.g. 100 features).  In the current framework, I have to
> define 100 features in feature.json, which also costs more scored-doc
> iterations.
>
> I would like to have an interface:
>
> public abstract Map<String, Float> score() throws IOException;
>
> It would help support sparse feature vectors.
>
> Can anybody provide an insight?
>
> Thanks
>
> Jianxiong
>
