mahout-user mailing list archives

From Pat Ferrel <>
Subject Re: Solr-recommender for Mahout 0.9
Date Thu, 07 Nov 2013 17:44:23 GMT
One difference is that a “text” field has analyzers such as Porter stemming applied. I had
to take these out of schema.xml. I think TF-IDF is also applied to the terms in a “text” field
but may not be applied to MV (multivalued) fields. I think TF-IDF is good in this application. The idea is that if
everyone likes a movie, it isn’t much of a differentiator. Also, changing to MV fields is
simply a matter of applying a different type to the field in the schema, I think, so it is trivial to try out.
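
For intuition on the differentiator argument, here is a minimal sketch of an inverse document frequency weight. It assumes the classic Lucene-style formula idf = 1 + ln(numDocs / (docFreq + 1)); whether the Solr setup discussed here applies exactly this weighting to the id fields is not confirmed by the thread. The class name and the document frequencies are made up for illustration; only the 27,000 item count comes from the sample data mentioned later in the thread.

```java
// Illustrative sketch only: a classic Lucene-style idf weight.
// IdfSketch and the document frequencies below are hypothetical.
final class IdfSketch {

    // idf = 1 + ln(numDocs / (docFreq + 1)); lower for ids that occur in many docs.
    static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long numDocs = 27_000; // item count quoted in the thread
        // An id that appears in most similarity docs ("everyone likes it").
        System.out.printf("popular id weight: %.3f%n", idf(numDocs, 20_000));
        // An id that appears in only a handful of similarity docs.
        System.out.printf("niche id weight:   %.3f%n", idf(numDocs, 10));
    }
}
```

An id that shows up in almost every document gets a weight close to 1, while a rare id gets a much larger one, which is the "not much of a differentiator" effect in numeric form.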

At this point the only test is an eyeball test, so measuring differences is problematic. If
anyone has intuition, fire away.

On Nov 7, 2013, at 9:23 AM, Dominik Hübner <> wrote:

Does anyone know what the difference is between keeping the ids in a space-delimited string
and indexing a multivalued field of ids? I recently tried the latter since ... it felt right,
however I am not sure what the advantages of each approach are.

On 07 Nov 2013, at 18:18, Pat Ferrel <> wrote:

> I have dismax (no edismax) but am not using it yet; I am using the default query parser, which does
use ‘AND’. I had much the same thought as I slept on it. Changing to OR is now working
much, much better. So obvious it almost bit me, not good in this case...
> With only a trivially small amount of testing I’d say we have a new recommender on
the block.
> If anyone would like to help eyeball test the thing let me know off-list. There are a
few instructions I’ll need to give. And it can’t handle much load right now due to intentional
design limits.
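
A minimal sketch of why the operator matters for this kind of query. It does not use Solr at all: it models each indexed similarity doc as a set of item ids and compares "all terms must match" (AND) with "any term may match" (OR). The class name, doc names and ids are hypothetical, and real Solr would also rank the OR hits by score, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: AND versus OR matching over sets of item ids.
final class QueryOperatorSketch {

    // Returns the names of docs that match the query ids under the given operator.
    static List<String> search(Map<String, Set<String>> index,
                               List<String> queryIds,
                               boolean requireAll) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> doc : index.entrySet()) {
            Set<String> terms = doc.getValue();
            boolean match = requireAll
                    ? terms.containsAll(queryIds)                   // AND: every id must be present
                    : queryIds.stream().anyMatch(terms::contains);  // OR: one id is enough
            if (match) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    // Three hypothetical similarity docs, each listing a few similar item ids.
    static Map<String, Set<String>> sampleIndex() {
        Map<String, Set<String>> index = new LinkedHashMap<>();
        index.put("docA", new HashSet<>(Arrays.asList("v1", "v2")));
        index.put("docB", new HashSet<>(Arrays.asList("v2", "v3")));
        index.put("docC", new HashSet<>(Arrays.asList("v4")));
        return index;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = sampleIndex();
        List<String> oneLike = Arrays.asList("v1");
        List<String> severalLikes = Arrays.asList("v1", "v3", "v4");
        System.out.println("AND, one id:     " + search(index, oneLike, true));
        System.out.println("AND, several ids: " + search(index, severalLikes, true));
        System.out.println("OR, several ids:  " + search(index, severalLikes, false));
    }
}
```

With AND, every additional liked id narrows the result set, so a user with several likes can easily end up with no hits; with OR, every additional id widens it.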
> On Nov 7, 2013, at 6:11 AM, Dyer, James <> wrote:
> Pat,
> Can you give us the query it generates when you enter "vampire werewolf zombie" (q/qt/defType)?
> My guess is you're using the default query parser with "q.op=AND", or you're using
dismax/edismax with a high "mm" (minimum-should-match) value.
> James Dyer
> Ingram Content Group
> (615) 213-4311
> -----Original Message-----
> From: Pat Ferrel [] 
> Sent: Wednesday, November 06, 2013 5:53 PM
> To: Schelter;
> Subject: Re: Solr-recommender for Mahout 0.9
> Done,
> BTW I have the thing running on a demo site but am getting very poor results that I think
are related to the Solr setup. I'd appreciate any ideas.
> The sample data has 27,000 items and something like 4000 users. The preference data is
fairly dense since the users are professional reviewers and the items videos.
> 1) The number of item-item similarities that are kept is 100. Is this a good starting
point? Ted, do you recall how many you used before?
> 2) The query is a simple text query made of space delimited video id strings. These are
the same ids as are stored in the item-item similarity docs that Solr indexes.
> Hit thumbs up on one video and you get several recommendations. Hit thumbs up on several
videos and you get no recs. I'm either using the wrong query type or have it set up to be too
restrictive. As I read through the docs, if someone has a suggestion or pointer I'd appreciate it.
> BTW the same sort of thing happens with Title search. Search for "vampire werewolf zombie"
you get no results, search for "zombie" you get several.
> On Nov 6, 2013, at 2:18 PM, Sebastian Schelter <> wrote:
> Hi Pat,
> can you create issues for 1) and 2) ? Then I will try to get this into
> trunk asap.
> Best,
> Sebastian
> On 06.11.2013 19:13, Pat Ferrel wrote:
>> Trying to integrate the Solr-recommender with the latest Mahout snapshot. The project
uses a modified RecommenderJob because it needs SequenceFile output and to get the location
of the preparePreferenceMatrix directory. If #1 and #2 are addressed I can remove the modified
Mahout code from the project and rely on the default implementations in Mahout 0.9. #3 is
a longer term issue related to the creation of a CrossRowSimilarityJob. 
>> I have dropped the modified code from the Solr-recommender project and have a modified
build of the current Mahout 0.9 snapshot. If the following changes are made to Mahout I can
test and release a Mahout 0.9 version of the Solr-recommender.
>> 1. Option to change RecommenderJob output format
>> Can someone add an option to output a SequenceFile? I modified the code to do the
following; note the SequenceFileOutputFormat.class as the last parameter, but this should really
be determined by an option, I think.
>>   Job aggregateAndRecommend = prepareJob(
>>           new Path(aggregateAndRecommendInput), outputPath, SequenceFileInputFormat.class,
>>           PartialMultiplyMapper.class, VarLongWritable.class, PrefAndSimilarityColumnWritable.class,
>>           AggregateAndRecommendReducer.class, VarLongWritable.class, RecommendedItemsWritable.class,
>>           SequenceFileOutputFormat.class);
>> 2. Visibility of preparePreferenceMatrix directory location
>> The Solr-recommender needs to find where the RecommenderJob is putting its output.

>> Mahout 0.8 RecommenderJob code was:
>> public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";
>> Mahout 0.9 RecommenderJob code just puts "preparePreferenceMatrix" inline:
>> Path prepPath = getTempPath("preparePreferenceMatrix");
>> This change to Mahout 0.9 works:
>> public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";
>> and
>> Path prepPath = getTempPath(DEFAULT_PREPARE_DIR);
>> You could also make this a getter method on the RecommenderJob Class instead of using
a public constant.
>> 3. Downsampling
>> The downsampling for maximum prefs per user has been moved from PreparePreferenceMatrixJob
to RowSimilarityJob. The XRecommenderJob uses matrix math instead of RSJ so it will no longer
support downsampling until there is a hypothetical CrossRowSimilarityJob with downsampling
in it.
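
The getter alternative mentioned in item 2 could look roughly like the sketch below. This is not Mahout code: RecommenderJobSketch is a hypothetical stand-in, its getTempPath returns a plain String rather than a Hadoop Path, and the method names getPrepareDirName and getPreparePath are invented for illustration. Only the constant name and its value come from the message above.

```java
// Illustrative sketch only: exposing the prepare directory through a getter.
// RecommenderJobSketch is a hypothetical stand-in, not the real RecommenderJob.
final class RecommenderJobSketch {

    // Constant name and value as quoted in the message.
    public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";

    private final String tempDir;

    RecommenderJobSketch(String tempDir) {
        this.tempDir = tempDir;
    }

    // Simplified stand-in for a getTempPath helper; returns a String, not a Path.
    String getTempPath(String directory) {
        return tempDir + "/" + directory;
    }

    // Hypothetical getter so callers need not know the constant.
    String getPrepareDirName() {
        return DEFAULT_PREPARE_DIR;
    }

    // Hypothetical getter for the full location of the prepare output.
    String getPreparePath() {
        return getTempPath(getPrepareDirName());
    }

    public static void main(String[] args) {
        RecommenderJobSketch job = new RecommenderJobSketch("/tmp/job");
        System.out.println(job.getPreparePath());
    }
}
```

A getter keeps the directory name an implementation detail, so a downstream project would keep working if the name changed, whereas a public constant is simpler but becomes part of the public API.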
