mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Solr-recommender
Date Wed, 02 Oct 2013 16:19:56 GMT
Excellent. From Ellen's description, the first Music use may be an implicit-preference-based
recommender using synthetic data? I'm quickly discovering how flexible Solr is in many
of these cases.

Here's another use you may have thought of:

Shopping cart recommenders, as the intuition goes, are best modeled as recommending from similar
item-sets. If you store all shopping carts (or play lists, watch lists, etc.) as your training
data, then as a user adds things to their cart you query for the most similar past carts.
Combine the results intelligently and you'll have an item-set recommender. Solr is built to
do this kind of item-set similarity. We tried to do this for an ecom site with pure Mahout but the
real-time similarity calculation stymied us. We knew we'd need Solr but couldn't devote the resources
to spin it up.
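
To make the item-set idea concrete, here is a minimal in-memory sketch in Python. It is not Solr or Mahout code. It assumes carts are plain sets of item ids, uses cosine similarity over binary item vectors, and "combines the results" by summing the similarities of the carts each candidate item appears in. The function names, the k/n parameters and the sample data are all made up for illustration.

```python
from collections import defaultdict
from math import sqrt


def cart_cosine(a, b):
    """Cosine similarity between two carts treated as binary item vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend_from_carts(current, past_carts, k=2, n=3):
    """Recommend up to n items drawn from the k past carts most similar to `current`."""
    scored = [(cart_cosine(current, cart), cart) for cart in past_carts]
    scored = [(sim, cart) for sim, cart in scored if sim > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Assumed combination rule: an item's score is the summed similarity
    # of the neighbouring carts that contain it.
    totals = defaultdict(float)
    for sim, cart in scored[:k]:
        for item in cart - current:  # skip items already in the cart
            totals[item] += sim

    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


# Hypothetical training data: each past cart is a set of item ids.
past = [
    {"guitar", "strings", "tuner"},
    {"guitar", "strings", "capo"},
    {"drums", "sticks"},
]
recs = recommend_from_carts({"guitar", "strings"}, past)
```

In a Solr setup the neighbour search would be done by the index rather than by a linear scan, with each past cart stored as one document; only the combination step would stay in application code.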

On the con side, Solr has a lot of stuff you have to work around. It also does not have the
ideal similarity measure for many uses (cosine is OK but LLR would probably be better). You
don't want stop word filtering, stemming, whitespace-based tokenizing or n-grams. You would
like explicit weighting. A good thing about Solr is how well it integrates with virtually
any doc store, independent of the indexing and query. Overall, a bit of an oval peg for a round hole.
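
For reference, the log-likelihood ratio (LLR) score mentioned above can be computed from a 2x2 contingency table of counts. The sketch below uses the entropy-based formulation commonly used for cooccurrence scoring. The function names are chosen here for illustration, and this is not a Solr similarity plugin.

```python
from math import log


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table.

    k11: both events together      k12: first event without the second
    k21: second without the first  k22: neither event
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating point round-off.
    return max(0.0, 2.0 * (row + col - mat))


independent = llr(10, 10, 10, 10)  # counts consistent with independence
associated = llr(10, 0, 0, 10)     # the two events always occur together
```

Unlike cosine, the score stays near zero when the counts are consistent with independence and grows with the amount of evidence, which is why it is attractive for deciding which cooccurrences to keep.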

It looks like the similarity code is replaceable, if not pluggable. Much of the rest could
be trimmed away by config or adherence to conventions, I suspect. In the demo site I'm working
on I've had to adopt some slightly hacky conventions that I'll describe some day.

On Oct 1, 2013, at 10:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:


Pat,

Ellen and some folks in Britain have been working with some data I produced from synthetic
music fans.


On Tue, Oct 1, 2013 at 2:22 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
Hi Ellen,


On Oct 1, 2013, at 12:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:


As requested, 

Pat, meet Ellen.

Ellen, meet Pat.




On Tue, Oct 1, 2013 at 8:46 AM, Pat Ferrel <pat.ferrel@gmail.com> wrote:
Tunneling (rat-holing?) into the cross-recommender and Solr+Mahout version.

Things to note:
1) The pure Mahout XRecommenderJob needs a cross-LLR or a cross-similarity job. Currently
there is only cooccurrence for sparsification, which is far from optimal. This might take
the form of a cross RSJ with two DRMs as input. I can't commit to this but would commit to
adding it to the XRecommenderJob.
2) Output to Solr needs a lot of options implemented and tested. The hand-run test should
be made into some JUnit tests. I'm slowly doing this.
3) the Solr query API is unimplemented unless someone else is working on that. I'm building
one in a demo site but it looks to me like a static recommender API is not going to be all
that useful and maybe a document describing how to do it with the Solr query interface would
be best, especially for a first step. The reasoning here is that it is so tempting to mix
in metadata to the recommendation query that a static API is not so obvious. For the demo
site the recommender API will be prototyped in a bunch of ways using models and controllers
in Rails. If I'm the one to do a Java Solr-recommender query API it will be after experimenting
a bit.
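
As a rough illustration of what a cross-LLR step in item 1 could compute, here is a small single-machine Python sketch. It is not the XRecommenderJob or RowSimilarityJob code. It assumes two kinds of interaction per user (a primary action such as purchases and a secondary action such as views, both hypothetical here), counts how many users link a primary item to a secondary item, scores each pair with LLR, and keeps only the pairs above a threshold as the sparsification step. All names, the threshold and the data are invented for the example.

```python
from collections import Counter
from math import log


def x_log_x(x):
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def cross_llr(primary, secondary, threshold=0.0):
    """Score (primary item, secondary item) pairs by LLR over users.

    primary, secondary: dict mapping user id -> set of item ids.
    Returns {(a, b): score} for pairs scoring above `threshold`.
    """
    users = set(primary) | set(secondary)
    n = len(users)
    a_count = Counter(i for items in primary.values() for i in items)
    b_count = Counter(i for items in secondary.values() for i in items)
    # Cross-cooccurrence: users who did `a` in primary and `b` in secondary.
    pair_count = Counter(
        (a, b)
        for u in users
        for a in primary.get(u, ())
        for b in secondary.get(u, ())
    )
    scores = {}
    for (a, b), k11 in pair_count.items():
        k12 = a_count[a] - k11      # users with a but not b
        k21 = b_count[b] - k11      # users with b but not a
        k22 = n - k11 - k12 - k21   # users with neither
        score = llr(k11, k12, k21, k22)
        if score > threshold:
            scores[(a, b)] = score
    return scores


# Hypothetical data: purchases as the primary action, views as the secondary.
purchases = {"u1": {"guitar"}, "u2": {"guitar"}, "u3": {"drums"}, "u4": {"drums"}}
views = {"u1": {"amp"}, "u2": {"amp"}, "u3": {"sticks"}, "u4": {"sticks", "amp"}}

all_pairs = cross_llr(purchases, views)
kept = cross_llr(purchases, views, threshold=2.0)
```

Note that LLR measures the strength of an association but not its direction, so a pair that cooccurs less often than chance can also score above zero. A real job would also need the distributed matrix machinery that this sketch ignores.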

Can someone introduce me to Ellen and Tim?

On Sep 28, 2013, at 10:59 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

The one large-ish feature that I think would find general use would be a high performance
classifier trainer.

For a cleanup sort of thing, it would be good to fully integrate the streaming k-means into
the normal clustering commands while revamping the command line API.

Dmitriy's recent Scala work would help quite a bit before 1.0. Not sure it can make 0.9.

For recommendations, I think that the demo system that Pat started, with the elaborations by
Ellen and Tim, would be very good to have.

I would be happy to collaborate with somebody on these but am not at all likely to have time
to actually do them end to end.

Sent from my iPhone

On Sep 28, 2013, at 12:40, Grant Ingersoll <gsingers@apache.org> wrote:

> Moving closer to 1.0, removing cruft, etc.  Do we have any more major features planned
> for 1.0?  I think we said during 0.8 that we would try to follow pretty quickly w/ another
> release.
>
> -Grant
>
> On Sep 28, 2013, at 12:33 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
>> Sounds right in principle but perhaps a bit soon.
>>
>> What would define the release?
>>
>> Sent from my iPhone
>>
>> On Sep 27, 2013, at 7:48, Grant Ingersoll <gsingers@apache.org> wrote:
>>
>>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>>
>>> -Grant
>
> --------------------------------------------
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com
>
>
>
>
>





