mahout-user mailing list archives

From Daniel Zohar <disso...@gmail.com>
Subject Re: Mahout performance issues
Date Fri, 02 Dec 2011 15:26:20 GMT
Manuel, I started running the evaluation as proposed, but it seems it will
take forever to complete. It runs the evaluation for each user, and each one
takes well over a minute. What am I doing wrong?
This is my code:

RecommenderBuilder itemBasedBuilder = new RecommenderBuilder() {
    public Recommender buildRecommender(DataModel model) {
        // build and return the Recommender to evaluate here
        try {
            ItemSimilarity itemSimilarity = new CachingItemSimilarity(
                    new LogLikelihoodSimilarity(model), model);
            CandidateItemsStrategy candidateItemsStrategy =
                    new OptimizedItemStrategy(20, 2, 100);
            MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy =
                    new OptimizedItemStrategy(20, 2, 100);
            ItemBasedRecommender recommender = new GenericBooleanPrefItemBasedRecommender(
                    dataModel, itemSimilarity, candidateItemsStrategy,
                    mostSimilarItemsCandidateItemsStrategy);
            return recommender;
        } catch (TasteException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
            return null;
        }
    }
};

RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

try {
    // Arguments: at = 3, relevanceThreshold = 0, evaluationPercentage = 1.0
    IRStatistics stats = evaluator.evaluate(itemBasedBuilder, null,
            this.dataModel, null, 3, 0, 1.0);
    logger.info("Evaluate returned: " + stats.toString());
} catch (TasteException e) {
    // TODO Auto-generated catch block
    logger.error("", e);
}
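
For reference, here is a minimal sketch in plain Java (no Mahout classes) of what precision and recall at N measure for boolean data, together with a way to evaluate only a random fraction of the users. The class and method names are made up for illustration and are not Mahout API. As far as I can tell, the last argument in the evaluate() call above is the evaluation percentage, so 1.0 asks for every user to be evaluated; the sampling helper below only illustrates the idea of using a smaller fraction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Mahout.
final class IrStatsSketch {

    // Precision at n: share of the first n recommended items that are relevant.
    static double precisionAt(List<Long> recommended, Set<Long> relevant, int n) {
        int limit = Math.min(n, recommended.size());
        if (limit == 0) {
            return 0.0;
        }
        return (double) hits(recommended, relevant, limit) / limit;
    }

    // Recall at n: share of the relevant items that show up in the first n recommendations.
    static double recallAt(List<Long> recommended, Set<Long> relevant, int n) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int limit = Math.min(n, recommended.size());
        return (double) hits(recommended, relevant, limit) / relevant.size();
    }

    // Counts relevant items among the first 'limit' recommendations.
    private static int hits(List<Long> recommended, Set<Long> relevant, int limit) {
        int count = 0;
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(recommended.get(i))) {
                count++;
            }
        }
        return count;
    }

    // Keeps each user independently with probability 'fraction' (0.0 to 1.0).
    static List<Long> sampleUsers(Collection<Long> userIds, double fraction, long seed) {
        Random random = new Random(seed);
        List<Long> sampled = new ArrayList<>();
        for (Long userId : userIds) {
            if (random.nextDouble() < fraction) {
                sampled.add(userId);
            }
        }
        return sampled;
    }

    public static void main(String[] args) {
        List<Long> recommended = Arrays.asList(1L, 2L, 3L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(2L, 9L));
        System.out.println("precision@3 = " + precisionAt(recommended, relevant, 3));
        System.out.println("recall@3    = " + recallAt(recommended, relevant, 3));
        System.out.println("sampled     = "
                + sampleUsers(Arrays.asList(1L, 2L, 3L, 4L), 0.5, 42L));
    }
}
```

In this toy case one of the three recommended items is relevant, so precision at 3 is 1/3, and one of the two relevant items was found, so recall at 3 is 1/2. Evaluating a sampled subset of users trades noisier statistics for a shorter run.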

On Fri, Dec 2, 2011 at 1:29 PM, Daniel Zohar <dissoman@gmail.com> wrote:

> Hello Manuel,
> I will run the tests as requested and post the results later.
>
>
> On Fri, Dec 2, 2011 at 1:20 PM, Manuel Blechschmidt <
> Manuel.Blechschmidt@gmx.de> wrote:
>
>> Hello Daniel,
>>
>> On 02.12.2011, at 12:02, Daniel Zohar wrote:
>>
>> > Hi guys,
>> >
>> > ...
>> > I just ran the fix I proposed earlier and I got great results! The query
>> > time was reduced to about a third for the 'heavy users'. Before it was
>> > 1-5 secs and now it's 0.5-1.5. The best part is that the accuracy level
>> > should remain exactly the same. I also believe it should reduce memory
>> > consumption, as the GenericBooleanPrefDataModel.preferenceForItems gets
>> > significantly smaller (in my case at least).
>>
>> It would be great if you could measure your run time performance and your
>> accuracy with the provided Mahout tools.
>>
>> In your case, because you only have boolean feedback, precision and
>> recall would make sense.
>>
>> https://cwiki.apache.org/MAHOUT/recommender-documentation.html
>>
>> RecommenderIRStatsEvaluator evaluator = new
>> GenericRecommenderIRStatsEvaluator();
>> IRStatistics stats = evaluator.evaluate(builder, null, myModel, null, 3,
>>      RecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
>>
>>
>> Here is some example code from me:
>>
>> public void testEvaluateRecommender() {
>>                try {
>>                        DataModel myModel = new MyModelImplementationDataModel();
>>
>>                        // Users: 12858
>>                        // Items: 5467
>>                        // MaxPreference: 85850.0
>>                        // MinPreference: 50.0
>>                        System.out.println("Users: " + myModel.getNumUsers());
>>                        System.out.println("Items: " + myModel.getNumItems());
>>                        System.out.println("MaxPreference: " + myModel.getMaxPreference());
>>                        System.out.println("MinPreference: " + myModel.getMinPreference());
>>
>>                        RecommenderBuilder randomBased = new RecommenderBuilder() {
>>                                public Recommender buildRecommender(DataModel model) {
>>                                        // build and return the Recommender to evaluate here
>>                                        try {
>>                                                return new RandomRecommender(model);
>>                                        } catch (TasteException e) {
>>                                                // TODO Auto-generated catch block
>>                                                e.printStackTrace();
>>                                                return null;
>>                                        }
>>                                }
>>                        };
>>
>>                        RecommenderBuilder genericItemBased = new RecommenderBuilder() {
>>                                public Recommender buildRecommender(DataModel model) {
>>                                        // build and return the Recommender to evaluate here
>>                                        try {
>>                                                return new GenericItemBasedRecommender(model,
>>                                                                new PearsonCorrelationSimilarity(model));
>>                                        } catch (TasteException e) {
>>                                                // TODO Auto-generated catch block
>>                                                e.printStackTrace();
>>                                                return null;
>>                                        }
>>                                }
>>                        };
>>
>>                        RecommenderBuilder genericItemBasedCosine = new RecommenderBuilder() {
>>                                public Recommender buildRecommender(DataModel model) {
>>                                        // build and return the Recommender to evaluate here
>>                                        try {
>>                                                return new GenericItemBasedRecommender(model,
>>                                                                new UncenteredCosineSimilarity(model));
>>                                        } catch (TasteException e) {
>>                                                // TODO Auto-generated catch block
>>                                                e.printStackTrace();
>>                                                return null;
>>                                        }
>>                                }
>>                        };
>>
>>                        RecommenderBuilder genericItemBasedLikely = new RecommenderBuilder() {
>>                                public Recommender buildRecommender(DataModel model) {
>>                                        // build and return the Recommender to evaluate here
>>                                        return new GenericItemBasedRecommender(model,
>>                                                        new LogLikelihoodSimilarity(model));
>>                                }
>>                        };
>>
>>
>>                        RecommenderBuilder genericUserBasedNN3 = new RecommenderBuilder() {
>>                                public Recommender buildRecommender(DataModel model) {
>>                                        // build and return the Recommender to evaluate here
>>                                        try {
>>                                                return new GenericUserBasedRecommender(
>>                                                                model,
>>                                                                new NearestNUserNeighborhood(
>>                                                                                3,
>>                                                                                new PearsonCorrelationSimilarity(model),
>>                                                                                model),
>>                                                                new PearsonCorrelationSimilarity(model));
>>                                        } catch (TasteException e) {
>>                                                // TODO Auto-generated catch block
>>                                                e.printStackTrace();
>>                                                return null;
>>                                        }
>>                                }
>>                        };
>>
>>                        RecommenderBuilder genericUserBasedNN20 = new RecommenderBuilder() {
>>                                public Recommender buildRecommender(DataModel model) {
>>                                        // build and return the Recommender to evaluate here
>>                                        try {
>>                                                return new GenericUserBasedRecommender(
>>                                                                model,
>>                                                                new NearestNUserNeighborhood(
>>                                                                                20,
>>                                                                                new PearsonCorrelationSimilarity(model),
>>                                                                                model),
>>                                                                new PearsonCorrelationSimilarity(model));
>>                                        } catch (TasteException e) {
>>                                                // TODO Auto-generated catch block
>>                                                e.printStackTrace();
>>                                                return null;
>>                                        }
>>                                }
>>                        };
>>
>>                        RecommenderBuilder slopeOneBased = new RecommenderBuilder() {
>>                                public Recommender buildRecommender(DataModel model) {
>>                                        // build and return the Recommender to evaluate here
>>                                        try {
>>                                                return new SlopeOneRecommender(model);
>>                                        } catch (TasteException e) {
>>                                                // TODO Auto-generated catch block
>>                                                e.printStackTrace();
>>                                                return null;
>>                                        }
>>                                }
>>                        };
>>
>>                        RecommenderBuilder svdBased = new RecommenderBuilder() {
>>                                public Recommender buildRecommender(DataModel model) {
>>                                        // build and return the Recommender to evaluate here
>>                                        try {
>>                                                return new SVDRecommender(model, new ALSWRFactorizer(
>>                                                                model, 100, 0.3, 5));
>>                                        } catch (TasteException e) {
>>                                                // TODO Auto-generated catch block
>>                                                e.printStackTrace();
>>                                                return null;
>>                                        }
>>                                }
>>                        };
>>
>>                        // Data Set Summary:
>>                        // 12858 users
>>                        // 121304 preferences
>>
>>                        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
>>
>>                        double evaluation = evaluator.evaluate(randomBased, null, myModel,
>>                                        0.9, 1.0);
>>                        // Evaluation of randomBased (baseline): 43045.380570443434
>>                        // (RandomRecommender(model))
>>                        System.out.println("Evaluation of randomBased (baseline): "
>>                                        + evaluation);
>>
>>                        // evaluation = evaluator.evaluate(genericItemBased, null, myModel,
>>                        // 0.9, 1.0);
>>                        // Evaluation of ItemBased with Pearson Correlation:
>>                        // 315.5804958647985 (GenericItemBasedRecommender(model,
>>                        // PearsonCorrelationSimilarity(model))
>>                        // System.out
>>                        // .println("Evaluation of ItemBased with Pearson Correlation: "
>>                        // + evaluation);
>>
>>                        // evaluation = evaluator.evaluate(genericItemBasedCosine, null,
>>                        // myModel, 0.9, 1.0);
>>                        // Evaluation of ItemBase with uncentered Cosine: 198.25393235323375
>>                        // (GenericItemBasedRecommender(model,
>>                        // UncenteredCosineSimilarity(model)))
>>                        // System.out
>>                        // .println("Evaluation of ItemBased with Uncentered Cosine: "
>>                        // + evaluation);
>>
>>                        evaluation = evaluator.evaluate(genericItemBasedLikely, null,
>>                                        myModel, 0.9, 1.0);
>>                        // Evaluation of ItemBase with log likelihood: 176.45243607278724
>>                        // (GenericItemBasedRecommender(model,
>>                        // LogLikelihoodSimilarity(model)))
>>                        System.out
>>                                        .println("Evaluation of ItemBased with LogLikelihood: "
>>                                                        + evaluation);
>>
>>
>>
>>                        // User based is slow and inaccurate
>>                        // evaluation = evaluator.evaluate(genericUserBasedNN3, null,
>>                        // myModel, 0.9, 1.0);
>>                        // Evaluation of UserBased 3 with Pearson Correlation:
>>                        // 1774.9897130330407 (GenericUserBasedRecommender(model,
>>                        // NearestNUserNeighborhood(3, PearsonCorrelationSimilarity(model),
>>                        // model), PearsonCorrelationSimilarity(model)))
>>                        // took about 2 minutes
>>                        // System.out.println("Evaluation of UserBased 3 with Pearson Correlation: "+evaluation);
>>
>>                        // evaluation = evaluator.evaluate(genericUserBasedNN20, null,
>>                        // myModel, 0.9, 1.0);
>>                        // Evaluation of UserBased 20 with Pearson
>>                        // Correlation:1329.137324225053 (GenericUserBasedRecommender(model,
>>                        // NearestNUserNeighborhood(20, PearsonCorrelationSimilarity(model),
>>                        // model), PearsonCorrelationSimilarity(model)))
>>                        // took about 3 minutes
>>                        // System.out.println("Evaluation of UserBased 20 with Pearson Correlation: "+evaluation);
>>
>>                        // evaluation = evaluator.evaluate(slopeOneBased, null, myModel,
>>                        // 0.9, 1.0);
>>                        // Evaluation of SlopeOne: 464.8989330869532
>>                        // (SlopeOneRecommender(model))
>>                        // System.out.println("Evaluation of SlopeOne: "+evaluation);
>>
>>                        // evaluation = evaluator.evaluate(svdBased, null, myModel, 0.9,
>>                        // 1.0);
>>                        // Evaluation of SVD based: 378.9776153202042
>>                        // (ALSWRFactorizer(model, 100, 0.3, 5))
>>                        // took about 10 minutes to calculate on a Mac Book Pro
>>                        // System.out.println("Evaluation of SVD based: "+evaluation);
>>
>>                } catch (TasteException e) {
>>                        // TODO Auto-generated catch block
>>                        e.printStackTrace();
>>                 }
>>
>>        }
>>
>> >
>> > The fix is merely adding two lines of code to one of
>> > the GenericBooleanPrefDataModel constructors. See
>> > http://pastebin.com/K5PB68Et, the lines I added are #11, #22.
>> >
>> > The only problem I see at the moment, is that the similarities
>> > implementations are using the num of users per item in the
>> > item-item similarity calculation. This _can_ be mitigated by creating an
>> > additional Map in the DataModel which maps itemID to numUsers.
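
A rough sketch of the extra itemID-to-numUsers map mentioned above, in plain Java. It assumes the counts are taken from the full user data before any users are pruned, so the similarity code could still look up the original number of users per item. The class name, method name, and the Map/Set types are hypothetical stand-ins; this is not the pastebin patch and does not use Mahout's own collection classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Mahout.
final class ItemUserCountSketch {

    // Counts, for every item, how many users interacted with it.
    // itemsByUser maps a user ID to the set of item IDs that user touched.
    static Map<Long, Integer> countUsersPerItem(Map<Long, Set<Long>> itemsByUser) {
        Map<Long, Integer> numUsersByItem = new HashMap<>();
        for (Set<Long> items : itemsByUser.values()) {
            for (Long itemId : items) {
                // Add 1 for this user, starting from 1 if the item is new.
                numUsersByItem.merge(itemId, 1, Integer::sum);
            }
        }
        return numUsersByItem;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> itemsByUser = new HashMap<>();
        itemsByUser.put(1L, new HashSet<>(Arrays.asList(10L, 20L)));
        itemsByUser.put(2L, new HashSet<>(Arrays.asList(10L)));
        itemsByUser.put(3L, new HashSet<>(Arrays.asList(30L)));
        // Item 10 has two users; items 20 and 30 have one each.
        System.out.println(countUsersPerItem(itemsByUser));
    }
}
```

The map costs one integer per item, which is small next to the per-item user lists it stands in for.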
>> >
>> > What do you think about the proposed solution? Perhaps I am missing some
>> > other implications?
>> >
>> > Thanks!
>> >
>> >
>> > On Fri, Dec 2, 2011 at 12:51 AM, Sean Owen <srowen@gmail.com> wrote:
>> >
>> >> (Agree, and the sampling happens at the user level now -- so if you
>> >> sample one of these users, it slows down a lot. The spirit of the
>> >> proposed change is to make sampling more fine-grained, at the
>> >> individual item level. That seems to certainly fix this.)
>> >>
>> >> On Thu, Dec 1, 2011 at 10:46 PM, Ted Dunning <ted.dunning@gmail.com>
>> >> wrote:
>> >>
>> >>> This may or may not help much.  My guess is that the improvement will
>> >>> be very modest.
>> >>>
>> >>> The most serious problem is going to be recommendations for anybody
>> >>> who has rated one of these excessively popular items.  That item will
>> >>> bring in a huge number of other users and thus a huge number of items
>> >>> to consider.  If you down-sample ratings of the prolific users and
>> >>> kill super-common items, I think you will see much more improvement
>> >>> than simply eliminating the singleton users.
>> >>>
>> >>> The basic issue is that cooccurrence based algorithms have run-time
>> >>> proportional to O(n_max^2) where n_max is the maximum number of items
>> >>> per user.
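
To make the quadratic term concrete: a user with n items contributes n(n-1)/2 cooccurring item pairs, so capping the number of items kept per user bounds that cost. The sketch below, in plain Java, shows the pair count and one simple way to down-sample a prolific user's items. The names and the uniform random sampling are assumptions for illustration; Mahout's own sampling may work differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not part of Mahout.
final class CooccurrenceCostSketch {

    // Number of unordered item pairs produced by one user with n items.
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    // Keeps at most maxItems items, chosen uniformly at random.
    static List<Long> downSample(List<Long> items, int maxItems, long seed) {
        if (items.size() <= maxItems) {
            return new ArrayList<>(items);
        }
        List<Long> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, new Random(seed));
        return new ArrayList<>(shuffled.subList(0, maxItems));
    }

    public static void main(String[] args) {
        System.out.println(pairCount(1000)); // 499500 pairs for a 1000-item user
        System.out.println(pairCount(100));  // 4950 pairs after capping at 100 items

        List<Long> items = new ArrayList<>();
        for (long id = 1; id <= 1000; id++) {
            items.add(id);
        }
        System.out.println(downSample(items, 100, 42L).size()); // 100
    }
}
```

Cutting a 1000-item user down to 100 items reduces that user's pairs by roughly a factor of 100, which is why down-sampling the heavy users should matter more than dropping users with a single interaction.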
>> >>>
>> >>> On Thu, Dec 1, 2011 at 2:35 PM, Daniel Zohar <dissoman@gmail.com> wrote:
>> >>>
>> >>>> This is why I'm looking now into improving GenericBooleanPrefDataModel
>> >>>> to not take into account users which made one interaction under the
>> >>>> 'preferenceForItems' Map. What do you think about this approach?
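
A minimal sketch of that idea in plain Java, assuming the item-to-users map is built from a user-to-items map and that users below a minimum number of interactions are simply skipped. The class, method, and parameter names are hypothetical; this is neither the actual patch nor Mahout's internal representation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Mahout.
final class PruneSingletonUsersSketch {

    // Builds item -> users, ignoring users with fewer than minItemsPerUser items.
    static Map<Long, Set<Long>> buildUsersByItem(Map<Long, Set<Long>> itemsByUser,
                                                 int minItemsPerUser) {
        Map<Long, Set<Long>> usersByItem = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> entry : itemsByUser.entrySet()) {
            Set<Long> items = entry.getValue();
            if (items.size() < minItemsPerUser) {
                continue; // a user with too few interactions adds no cooccurrences
            }
            for (Long itemId : items) {
                usersByItem.computeIfAbsent(itemId, k -> new HashSet<>())
                        .add(entry.getKey());
            }
        }
        return usersByItem;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> itemsByUser = new HashMap<>();
        itemsByUser.put(1L, new HashSet<>(Arrays.asList(10L, 20L)));
        itemsByUser.put(2L, new HashSet<>(Arrays.asList(10L)));      // single interaction
        itemsByUser.put(3L, new HashSet<>(Arrays.asList(20L, 30L)));
        // With a minimum of 2 items, user 2 no longer appears under item 10.
        System.out.println(buildUsersByItem(itemsByUser, 2));
    }
}
```

A user with exactly one item never forms an item pair, so leaving that user out of this map does not change which items cooccur; it does change the per-item user counts, though.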
>> >>>>
>> >>>
>> >>
>>
>> --
>> Manuel Blechschmidt
>> Dortustr. 57
>> 14467 Potsdam
>> Mobil: 0173/6322621
>> Twitter: http://twitter.com/Manuel_B
>>
>>
>
