lucene-solr-user mailing list archives

From Jianxiong Dong <jdongca2...@gmail.com>
Subject Re: extract multi-features for one solr feature extractor in solr learning to rank
Date Wed, 19 Apr 2017 04:56:36 GMT
Hi, Michael,
     Thanks for the very valuable feedback.

> You can pass in different params in the
> features.json config for each feature, even though they use the same
> feature class.
I used this idea to extract some of the features described in this paper
(https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/letor3.pdf).
For example, features 1-15 in Table 2 are all <query, doc> term features
in various forms:

{
    "store" : "MyFeatureStore",
    "name" : "term_count_1",
    "class" : "com.apache.solr.ltr.feature.TermCountFeature",
    "params" : {
       "field" : "a_text",
       "terms" : "${user_terms}",
       "method"  : "1"
    }
  },

{
    "store" : "MyFeatureStore",
    "name" : "term_count_2",
    "class" : "com.apache.solr.ltr.feature.TermCountFeature",
    "params" : {
       "field" : "a_text",
       "terms" : "${user_terms}",
       "method"  : "2"
    }
  },

where the method id corresponds to one of the features in Table 2 (1-15).
These features all share the same class and differ only in minor ways.
In a production deployment, the overhead of defining one entry per
feature may not be an issue: after feature selection, probably only a
small number of features will turn out to be useful.
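
Entries like the two above do not have to be written by hand. Here is a
minimal Python sketch that generates one entry per method id. It assumes
the custom TermCountFeature class accepts the "field", "terms" and
"method" params exactly as shown above; the function name
build_term_count_features is hypothetical and only for illustration.

```python
import json


def build_term_count_features(
    store,
    field,
    n_methods,
    feature_class="com.apache.solr.ltr.feature.TermCountFeature",
):
    """Return one features.json entry per method id (1..n_methods)."""
    return [
        {
            "store": store,
            "name": f"term_count_{method}",
            "class": feature_class,
            "params": {
                "field": field,
                # Placeholder left as-is; it is filled in at query time.
                "terms": "${user_terms}",
                # The method id is passed as a string, as in the entries above.
                "method": str(method),
            },
        }
        for method in range(1, n_methods + 1)
    ]


# Fifteen entries, one for each of the Table 2 features (1-15).
features = build_term_count_features("MyFeatureStore", "a_text", 15)
features_json = json.dumps(features, indent=2)
```

The resulting features_json string could then be written to a file and
uploaded to the feature store in the usual way.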

Another use case:
use a convolutional neural network or an LSTM to extract an embedded
feature vector for both the query and the document, where the embedded
vectors would have roughly 50-100 dimensions. Those features are then
fed into learning-to-rank models.
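
To make the embedding case concrete: with one value per feature, every
vector dimension becomes its own named feature. A rough Python sketch of
that flattening follows. The function name and the query_emb_N /
doc_emb_N naming scheme are assumptions for illustration, not part of the
LTR plugin, and the vectors are toy values rather than real model output.

```python
def embedding_features(query_vec, doc_vec):
    """Flatten query and document embeddings into named scalar features."""
    features = {}
    for i, value in enumerate(query_vec):
        features[f"query_emb_{i}"] = float(value)
    for i, value in enumerate(doc_vec):
        features[f"doc_emb_{i}"] = float(value)
    return features


# Toy 3-dimensional vectors; real embeddings would have 50-100 dimensions,
# which means 100-200 separate feature entries for a single <query, doc>.
example = embedding_features([0.1, 0.0, -0.3], [0.2, 0.5, 0.0])
```

This is the situation where returning several values from one extractor
would save the most configuration and repeated iteration.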

> Your performance point about 100 features vs 1 feature is true,
> and pull requests to improve the plugin's performance and usability would
I will run some performance benchmarks on a few use cases to determine
whether supporting multiple features per feature class is worthwhile.
If so, I will share the results and create a pull request.

Thanks

Jianxiong

On 4/18/17, Michael Nilsson <mnilsson2323@gmail.com> wrote:
> Hi Jianxiong,
>
> What you say is true.  If you want 100 different feature values extracted,
> you need to specify 100 different features in the
> features.json config so that there is a direct mapping of features in and
> features out.  However, you more than likely need
> to only implement 1 feature class that you will use for those 100 feature
> values.  You can pass in different params in the
> features.json config for each feature, even though they use the same
> feature class.  In some cases you might be able to
> just have 1 feature output 1 value that changes per document, if you can
> collapse those features together.  This 2nd option
> may or may not work for you depending on your data, what you are trying to
> bucket, and what algorithm you are trying to
> use because not all algorithms can easily handle this case.  To illustrate:
>
>
> *A) Multiple binary features using the same 1 class*
> {
>     "name" : "isProductCheap",
>     "class" : "org.apache.solr.ltr.feature.SolrFeature",
>     "params" : {
>       "fq": [ "price:[0 TO 100]" ]
>     }
> },{
>     "name" : "isProductExpensive",
>     "class" : "org.apache.solr.ltr.feature.SolrFeature",
>     "params" : {
>       "fq": [ "price:[101 TO 1000]" ]
>     }
> },{
>     "name" : "isProductCrazyExpensive",
>     "class" : "org.apache.solr.ltr.feature.SolrFeature",
>     "params" : {
>       "fq": [ "price:[1001 TO *]" ]
>     }
> }
>
>
> *B) 1 feature that outputs different values (some algorithms don't handle
> discrete features well)*
> {
>     "name" : "productPricePoint",
>     "class" : "org.apache.solr.ltr.feature.MyPricePointFeature",
>     "params" : {
>
>       // Either hard code price map in MyPricePointFeature.java, or
>       // pass it in through params for flexible customization,
>       // and return different values for cheap, expensive, and
>       // crazyExpensive
>
>     }
> }
>
> The 2 options above satisfy most use cases, which is what we were
> targeting.
> In my specific use case, I opted for option A,
> and wrote a simple script that generates the features.json so I wouldn't
> have to write 100 similar features by hand.  You
> also mentioned that you want to extract features sparsely.  You can change
> the configuration of the Feature Transformer
> <http://lucene.apache.org/solr/6_5_0/solr-ltr/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.html>
>
> to return features that actually triggered in a sparse format
> <https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank#LearningToRank-Advancedoptions>.
> Your performance point about 100 features vs 1 feature is true,
> and pull requests to improve the plugin's performance and usability would
> be more than welcome!
>
> -Michael
>
>
>
> On Fri, Apr 14, 2017 at 12:51 PM, Jianxiong Dong <jdongca2003@gmail.com>
> wrote:
>
>> Hi,
>>     I found that solr learning-to-rank (LTR) supports only ONE feature
>> for a given feature extractor.
>>
>> See interface:
>>
>> https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/Feature.java
>>
>> Line (281, 282) (in FeatureScorer)
>> @Override
>>       public abstract float score() throws IOException;
>>
>> I have a use case: given a <query, doc>, I would like to extract
>> multiple features (e.g. 100 features). In the current framework, I
>> have to define 100 features in feature.json, which also means more
>> cost for scored doc iterations.
>>
>> I would like to have an interface:
>>
>> public abstract Map<String, Float> score() throws IOException;
>>
>> It helps support sparse vector feature.
>>
>> Can anybody provide an insight?
>>
>> Thanks
>>
>> Jianxiong
>>
>
