mahout-user mailing list archives

From Andrew Psaltis <Andrew.Psal...@Webtrends.com>
Subject Re: Solr-recommender for Mahout 0.9
Date Thu, 07 Nov 2013 17:57:53 GMT
Pat,
Perhaps I am missing something here, but why not use a String field if you
do not need any of the analysis? From your previous email ("The
query is a simple text query made of space delimited video id strings") it
seems that you basically have a keyword-style query, which would fit
better with a String field than with a neutered Text field.

Thanks,
Andrew





On 11/7/13 10:44 AM, "Pat Ferrel" <pat.ferrel@gmail.com> wrote:

>One difference is that a "text" field has analyzers like Porter stemming
>applied. I had to take these out of the schema.xml. I think TFIDF is also
>applied to the terms in "text" but may not be to MV fields. I think TFIDF
>is good in the application. The idea is that if everyone likes a movie,
>it isn't much of a differentiator. Also changing to MV fields is simply
>applying a different type to the field in the schema I think, so trivial
>to try out.
>
>At this point the only test is an eyeball test so measuring differences
>is problematic. If anyone has intuition fire away.
>
>On Nov 7, 2013, at 9:23 AM, Dominik Hübner <contact@dhuebner.com> wrote:
>
>Does anyone know what the difference is between keeping the ids in a
>space delimited string and indexing a multivalued field of ids? I
>recently tried the latter since ... it felt right, however I am not sure
>what advantages each approach has.
>
>On 07 Nov 2013, at 18:18, Pat Ferrel <pat.ferrel@gmail.com> wrote:
>
>> I have dismax (no edismax) but am not using it yet, using the default
>>query, which does use 'AND'. I had much the same thought as I slept on
>>it. Changing to OR is now working much much better. So obvious it almost
>>bit me, not good in this case...
>> 
>> With only a trivially small amount of testing I'd say we have a new
>>recommender on the block.
>> 
>> If anyone would like to help eyeball test the thing let me know
>>off-list. There are a few instructions I'll need to give. And it can't
>>handle much load right now due to intentional design limits.
>> 
>> 
>> On Nov 7, 2013, at 6:11 AM, Dyer, James <James.Dyer@ingramcontent.com>
>>wrote:
>> 
>> Pat,
>> 
>> Can you give us the query it generates when you enter "vampire werewolf
>>zombie", q/qt/defType ?
>> 
>> My guess is you're using the default query parser with "q.op=AND" , or,
>>you're using dismax/edismax with a high "mm" (min-must-match) value.
>> 
>> James Dyer
>> Ingram Content Group
>> (615) 213-4311
>> 
>> 
>> -----Original Message-----
>> From: Pat Ferrel [mailto:pat.ferrel@gmail.com]
>> Sent: Wednesday, November 06, 2013 5:53 PM
>> To: ssc@apache.org Schelter; user@mahout.apache.org
>> Subject: Re: Solr-recommender for Mahout 0.9
>> 
>> Done,
>> 
>> BTW I have the thing running on a demo site but am getting very poor
>>results that I think are related to the Solr setup. I'd appreciate any
>>ideas.
>> 
>> The sample data has 27,000 items and something like 4000 users. The
>>preference data is fairly dense since the users are professional
>>reviewers and the items videos.
>> 
>> 1) The number of item-item similarities that are kept is 100. Is this a
>>good starting point? Ted, do you recall how many you used before?
>> 2) The query is a simple text query made of space delimited video id
>>strings. These are the same ids as are stored in the item-item
>>similarity docs that Solr indexes.
>> 
>> Hit thumbs up on one video and you get several recommendations. Hit
>>thumbs up on several videos and you get no recs. I'm either using the wrong
>>query type or have it set up to be too restrictive. As I read through
>>the docs if someone has a suggestion or pointer I'd appreciate it.
>> 
>> BTW the same sort of thing happens with Title search. Search for
>>"vampire werewolf zombie" you get no results, search for "zombie" you
>>get several.
>> 
>> On Nov 6, 2013, at 2:18 PM, Sebastian Schelter <ssc@apache.org> wrote:
>> 
>> Hi Pat,
>> 
>> can you create issues for 1) and 2) ? Then I will try to get this into
>> trunk asap.
>> 
>> Best,
>> Sebastian
>> 
>> On 06.11.2013 19:13, Pat Ferrel wrote:
>>> Trying to integrate the Solr-recommender with the latest Mahout
>>>snapshot. The project uses a modified RecommenderJob because it needs
>>>SequenceFile output and to get the location of the
>>>preparePreferenceMatrix directory. If #1 and #2 are addressed I can
>>>remove the modified Mahout code from the project and rely on the
>>>default implementations in Mahout 0.9. #3 is a longer term issue
>>>related to the creation of a CrossRowSimilarityJob.
>>> 
>>> I have dropped the modified code from the Solr-recommender project and
>>>have a modified build of the current Mahout 0.9 snapshot. If the
>>>following changes are made to Mahout I can test and release a Mahout
>>>0.9 version of the Solr-recommender.
>>> 
>>> 1. Option to change RecommenderJob output format
>>> 
>>> Can someone add an option to output a SequenceFile. I modified the
>>>code to do the following, note the SequenceFileOutputFormat.class as
>>>the last parameter but this should really be determined with an option
>>>I think.
>>> 
>>>   Job aggregateAndRecommend = prepareJob(
>>>           new Path(aggregateAndRecommendInput), outputPath, SequenceFileInputFormat.class,
>>>           PartialMultiplyMapper.class, VarLongWritable.class, PrefAndSimilarityColumnWritable.class,
>>>           AggregateAndRecommendReducer.class, VarLongWritable.class, RecommendedItemsWritable.class,
>>>           SequenceFileOutputFormat.class);
>>> 
>>> 2. Visibility of preparePreferenceMatrix directory location
>>> 
>>> The Solr-recommender needs to find where the RecommenderJob is putting
>>>its output. 
>>> 
>>> Mahout 0.8 RecommenderJob code was:
>>> public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";
>>> 
>>> Mahout 0.9 RecommenderJob code just puts "preparePreferenceMatrix"
>>>inline in the code:
>>> Path prepPath = getTempPath("preparePreferenceMatrix");
>>> 
>>> This change to Mahout 0.9 works:
>>> public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";
>>> and
>>> Path prepPath = getTempPath(DEFAULT_PREPARE_DIR);
>>> 
>>> You could also make this a getter method on the RecommenderJob Class
>>>instead of using a public constant.
>>> 
>>> 3. Downsampling
>>> 
>>> The downsampling for maximum prefs per user has been moved from
>>>PreparePreferenceMatrixJob to RowSimilarityJob. The XRecommenderJob
>>>uses matrix math instead of RSJ so it will no longer support
>>>downsampling until there is a hypothetical CrossRowSimilarityJob with
>>>downsampling in it.
>>> 
>>> 
>> 
>> 
>> 
>> 
>> 
>
>
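
The AND-versus-OR behaviour that resolved the "no recs" problem in the thread above can be illustrated without Solr. What follows is a minimal Java sketch, not Solr's implementation: the class name, the in-memory index, and the ids v1..v4 are all hypothetical. It shows that when every query id is required (AND), adding more ids can only shrink the result set, while with OR each extra id can only widen it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is not how Solr stores or scores documents.
class QueryOperatorSketch {

    // True if the document's id set satisfies the query under the chosen operator.
    static boolean matches(Set<String> docTerms, List<String> queryTerms, boolean requireAll) {
        if (requireAll) {
            return docTerms.containsAll(queryTerms); // AND: every query id must be present
        }
        for (String term : queryTerms) {             // OR: any single query id is enough
            if (docTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // Splits a space-delimited id query and returns the keys of all matching documents.
    static List<String> search(Map<String, Set<String>> index, String query, boolean requireAll) {
        List<String> terms = Arrays.asList(query.trim().split("\\s+"));
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : index.entrySet()) {
            if (matches(entry.getValue(), terms, requireAll)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Each "document" stands in for one item's list of similar-item ids.
        Map<String, Set<String>> index = new LinkedHashMap<>();
        index.put("docA", new HashSet<>(Arrays.asList("v1", "v2")));
        index.put("docB", new HashSet<>(Arrays.asList("v2", "v3")));
        index.put("docC", new HashSet<>(Arrays.asList("v4")));

        System.out.println("AND: " + search(index, "v1 v3 v4", true));  // no document holds all three ids
        System.out.println("OR:  " + search(index, "v1 v3 v4", false)); // every document holds at least one
    }
}
```

With Solr's standard query parser the corresponding switch is the q.op parameter James mentions; ordering the OR hits is then left to the scorer.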


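Pat's remark that a movie everyone likes "isn't much of a differentiator" is the intuition behind the IDF half of TFIDF. A minimal sketch of the textbook formula log(N / df) follows; Lucene's own similarity uses a smoothed variant, so the numbers here are illustrative only. The class name and the document-frequency values are assumptions; only the 27,000 item count comes from the thread.

```java
// Hypothetical illustration of textbook inverse document frequency, not Lucene's exact formula.
class IdfSketch {

    // idf = ln(totalDocs / docsContainingTerm); a term present in every document scores 0.
    static double idf(int totalDocs, int docsContainingTerm) {
        if (docsContainingTerm <= 0 || docsContainingTerm > totalDocs) {
            throw new IllegalArgumentException("document frequency must be in 1..totalDocs");
        }
        return Math.log((double) totalDocs / docsContainingTerm);
    }

    public static void main(String[] args) {
        int totalDocs = 27000; // item count quoted in the thread; one indexed doc per item is assumed

        // Assumed document frequencies, chosen only to show the trend.
        System.out.println("id in every doc:  " + idf(totalDocs, 27000));
        System.out.println("id in 2,700 docs: " + idf(totalDocs, 2700));
        System.out.println("id in 27 docs:    " + idf(totalDocs, 27));
    }
}
```

The rarer an id is across the indexed similarity docs, the larger its weight, so a match on a niche video counts for more than a match on one that nearly everyone liked.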