mahout-user mailing list archives

From Pat Ferrel <>
Subject Re: Solr-recommender for Mahout 0.9
Date Thu, 07 Nov 2013 19:28:16 GMT
Yes, you are correct, but my integration framework treats non-text fields as scalars, so it is
easier to neuter text than to implement full-text searching on strings. I would do what you suggest
if I were using raw Solr. My understanding was that a string field also does not get TF-IDF applied,
which is not what I had intended.

On Nov 7, 2013, at 9:57 AM, Andrew Psaltis <> wrote:

Perhaps I am missing something here, but why not use a String field if you
do not need any of the analysis? Seems like from your previous email "The
query is a simple text query made of space delimited video id strings" - -
that you basically have a keyword style query which would seem to fit
better with a String field and not a neutered Text field.


On 11/7/13 10:44 AM, "Pat Ferrel" <> wrote:

> One difference is that a "text" field has analyzers like Porter stemming
> applied. I had to take these out of the schema.xml. I think TFIDF is also
> applied to the terms in "text" but may not be to MV fields. I think TFIDF
> is good in the application. The idea is that if everyone likes a movie,
> it isn't much of a differentiator. Also changing to MV fields is simply
> applying a different type to the field in the schema I think, so trivial
> to try out.
> At this point the only test is an eyeball test so measuring differences
> is problematic. If anyone has intuition fire away.
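
The point above about TFIDF (an item everyone likes is a weak differentiator) can be illustrated with a minimal sketch of inverse document frequency. It assumes the classic Lucene-style formulation idf = 1 + ln(N / (df + 1)); whether the Solr setup in this thread used exactly this similarity is an assumption, and the document-frequency values are hypothetical.

```java
public class IdfSketch {
    // Classic Lucene-style inverse document frequency: 1 + ln(N / (df + 1)).
    // numDocs = total indexed docs, docFreq = docs containing the term (here, an item ID).
    public static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        long numDocs = 27_000;   // item count mentioned in the thread
        long popular = 20_000;   // hypothetical: ID appears in most similarity docs
        long niche = 50;         // hypothetical: ID appears in only a few docs
        System.out.println("popular idf = " + idf(numDocs, popular));
        System.out.println("niche idf   = " + idf(numDocs, niche));
    }
}
```

The widely liked item gets the smaller weight, so a match on it contributes less to the score than a match on a niche item.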
> On Nov 7, 2013, at 9:23 AM, Dominik Hübner <> wrote:
> Does anyone know what the difference is between keeping the ids in a
> space delimited string and indexing a multivalued field of ids? I
> recently tried the latter since ... it felt right, however I am not sure
> which of both has which advantages.
> On 07 Nov 2013, at 18:18, Pat Ferrel <> wrote:
>> I have dismax (no edismax) but am not using it yet, using the default
>> query, which does use 'AND'. I had much the same thought as I slept on
>> it. Changing to OR is now working much much better. So obvious it almost
>> bit me, not good in this case...
>> With only a trivially small amount of testing I'd say we have a new
>> recommender on the block.
>> If anyone would like to help eyeball test the thing let me know
>> off-list. There are a few instructions I'll need to give. And it can't
>> handle much load right now due to intentional design limits.
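
The AND-versus-OR behavior described above can be sketched as follows. This is a minimal illustration, not the project's actual query code: it joins the liked item IDs with an explicit `OR`, so the result does not depend on the `q.op` default. The class name, method name, and sample IDs are hypothetical, and the IDs are assumed to contain no characters that need escaping in the Solr query syntax.

```java
import java.util.Arrays;
import java.util.List;

public class OrQuerySketch {
    // Joins item IDs with an explicit OR, so a document matching any one ID
    // is a candidate and documents matching more IDs score higher.
    public static String buildOrQuery(List<String> itemIds) {
        return String.join(" OR ", itemIds);
    }

    public static void main(String[] args) {
        // Hypothetical video IDs standing in for a user's thumbs-up history.
        List<String> liked = Arrays.asList("v101", "v202", "v303");
        System.out.println(buildOrQuery(liked));
    }
}
```

With AND semantics, a similarity doc would have to contain every liked ID to match, which is consistent with one thumbs-up returning results and several returning none.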
>> On Nov 7, 2013, at 6:11 AM, Dyer, James <>
>> wrote:
>> Pat,
>> Can you give us the query it generates when you enter "vampire werewolf
>> zombie", q/qt/defType ?
>> My guess is you're using the default query parser with "q.op=AND" , or,
>> you're using dismax/edismax with a high "mm" (min-must-match) value.
>> James Dyer
>> Ingram Content Group
>> (615) 213-4311
>> -----Original Message-----
>> From: Pat Ferrel []
>> Sent: Wednesday, November 06, 2013 5:53 PM
>> To: Schelter;
>> Subject: Re: Solr-recommender for Mahout 0.9
>> Done,
>> BTW I have the thing running on a demo site but am getting very poor
>> results that I think are related to the Solr setup. I'd appreciate any
>> ideas.
>> The sample data has 27,000 items and something like 4000 users. The
>> preference data is fairly dense since the users are professional
>> reviewers and the items videos.
>> 1) The number of item-item similarities that are kept is 100. Is this a
>> good starting point? Ted, do you recall how many you used before?
>> 2) The query is a simple text query made of space delimited video id
>> strings. These are the same ids as are stored in the item-item
>> similarity docs that Solr indexes.
>> Hit thumbs up on one video and you get several recommendations. Hit
>> thumbs up on several videos and you get no recs. I'm either using the wrong
>> query type or have it set up to be too restrictive. As I read through
>> the docs if someone has a suggestion or pointer I'd appreciate it.
>> BTW the same sort of thing happens with Title search. Search for
>> "vampire werewolf zombie" you get no results, search for "zombie" you
>> get several.
>> On Nov 6, 2013, at 2:18 PM, Sebastian Schelter <> wrote:
>> Hi Pat,
>> can you create issues for 1) and 2) ? Then I will try to get this into
>> trunk asap.
>> Best,
>> Sebastian
>> On 06.11.2013 19:13, Pat Ferrel wrote:
>>> Trying to integrate the Solr-recommender with the latest Mahout
>>> snapshot. The project uses a modified RecommenderJob because it needs
>>> SequenceFile output and to get the location of the
>>> preparePreferenceMatrix directory. If #1 and #2 are addressed I can
>>> remove the modified Mahout code from the project and rely on the
>>> default implementations in Mahout 0.9. #3 is a longer term issue
>>> related to the creation of a CrossRowSimilarityJob.
>>> I have dropped the modified code from the Solr-recommender project and
>>> have a modified build of the current Mahout 0.9 snapshot. If the
>>> following changes are made to Mahout I can test and release a Mahout
>>> 0.9 version of the Solr-recommender.
>>> 1. Option to change RecommenderJob output format
>>> Can someone add an option to output a SequenceFile? I modified the
>>> code to do the following, note the SequenceFileOutputFormat.class as
>>> the last parameter but this should really be determined with an option
>>> I think.
>>>  Job aggregateAndRecommend = prepareJob(
>>>      new Path(aggregateAndRecommendInput), outputPath, SequenceFileInputFormat.class,
>>>      PartialMultiplyMapper.class, VarLongWritable.class, PrefAndSimilarityColumnWritable.class,
>>>      AggregateAndRecommendReducer.class, VarLongWritable.class, RecommendedItemsWritable.class,
>>>      SequenceFileOutputFormat.class);
>>> 2. Visibility of preparePreferenceMatrix directory location
>>> The Solr-recommender needs to find where the RecommenderJob is putting
>>> its output.
>>> Mahout 0.8 RecommenderJob code was:
>>> public static final String DEFAULT_PREPARE_DIR =
>>> "preparePreferenceMatrix";
>>> Mahout 0.9 RecommenderJob code just puts "preparePreferenceMatrix"
>>> inline in the code:
>>> Path prepPath = getTempPath("preparePreferenceMatrix");
>>> This change to Mahout 0.9 works:
>>> public static final String DEFAULT_PREPARE_DIR =
>>> "preparePreferenceMatrix";
>>> and
>>> Path prepPath = getTempPath(DEFAULT_PREPARE_DIR);
>>> You could also make this a getter method on the RecommenderJob class
>>> instead of using a public constant.
>>> 3. Downsampling
>>> The downsampling for maximum prefs per user has been moved from
>>> PreparePreferenceMatrixJob to RowSimilarityJob. The XRecommenderJob
>>> uses matrix math instead of RSJ so it will no longer support
>>> downsampling until there is a hypothetical CrossRowSimilarityJob with
>>> downsampling in it.
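
Requests 1 and 2 above can be sketched together in a small stand-in class. This is not Mahout code: `RecommenderJobOptionsSketch`, `OutputKind`, the option strings, and the text default are all hypothetical, chosen only to show a named constant with a getter and an output format selected by an option rather than hard-coded.

```java
import java.util.Locale;

public class RecommenderJobOptionsSketch {
    // Request 2: one named constant instead of an inline string literal.
    public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";

    // Request 1: hypothetical output choices; these are not Mahout names.
    public enum OutputKind { TEXT, SEQUENCE_FILE }

    private final String tempDir;
    private final OutputKind outputKind;

    public RecommenderJobOptionsSketch(String tempDir, String outputOption) {
        this.tempDir = tempDir;
        this.outputKind = parseOutputKind(outputOption);
    }

    // Getter alternative to reading the public constant directly.
    public String getPreparePath() {
        return tempDir + "/" + DEFAULT_PREPARE_DIR;
    }

    public OutputKind getOutputKind() {
        return outputKind;
    }

    // Maps an option string to an output kind; a missing option falls back
    // to TEXT (an assumed default for this sketch).
    static OutputKind parseOutputKind(String option) {
        if (option == null || option.isEmpty()) {
            return OutputKind.TEXT;
        }
        switch (option.toLowerCase(Locale.ROOT)) {
            case "text":
                return OutputKind.TEXT;
            case "sequencefile":
                return OutputKind.SEQUENCE_FILE;
            default:
                throw new IllegalArgumentException("Unknown output format: " + option);
        }
    }
}
```

A caller such as the Solr-recommender could then ask the job object for the prepare path instead of duplicating the directory name.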
