mahout-user mailing list archives

From "F.Ozgur Catak" <f.ozgur.ca...@gmail.com>
Subject Re: ItemBasedRecommender
Date Fri, 11 Dec 2009 14:47:17 GMT
I tried both log-likelihood similarity and Euclidean distance. My input file
uses string IDs:

CustomerNo,Part No
TR433;SPTBY-1711
TR433;SPTBL-1711
TR433;SPTKP-1711
TR746;TDTBY-861
TR746;TDTBL-861
TR746;TDTKP-861

and I converted the IDs to long values using MemoryIDMigrator, like this:

1903325046098094985,5192157078505275458,-3162216497309240828
2276278324672472631,496035984324855953,-3162216497309240828
2276278324672472631,2666580089560192147,-3162216497309240828
2276278324672472631,-3436879215117796241,-3162216497309240828
7260913912542566719,8688228931167592947,-3162216497309240828
7260913912542566719,5860894063367472580,-3162216497309240828
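
The conversion step above can be sketched roughly as follows. This is a minimal, self-contained illustration, not Mahout's own code: the class StringIdMapper and its methods are hypothetical names, and the hashing scheme (MD5, first 8 bytes packed into a long) is an assumption about how an in-memory ID migrator might behave. Mahout's actual MemoryIDMigrator may differ in detail.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an in-memory string-to-long ID mapper. */
class StringIdMapper {
    // Remembers the original string for each long ID so results can be mapped back.
    private final Map<Long, String> longToString = new HashMap<>();

    /** Assumed scheme: MD5 the UTF-8 bytes and pack the first 8 bytes into a long. */
    static long toLongId(String value) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (md5[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Converts a string ID to a long and stores the reverse mapping. */
    long store(String value) {
        long id = toLongId(value);
        longToString.put(id, value);
        return id;
    }

    /** Returns the original string for a long ID, or null if it was never stored. */
    String toStringId(long id) {
        return longToString.get(id);
    }

    public static void main(String[] args) {
        StringIdMapper mapper = new StringIdMapper();
        String[][] rows = {
            {"TR433", "SPTBY-1711"},
            {"TR746", "TDTBY-861"},
        };
        for (String[] row : rows) {
            // Prints one "userID,itemID" line of long values per input row.
            System.out.println(mapper.store(row[0]) + "," + mapper.store(row[1]));
        }
    }
}
```

The reverse map matters because the recommender only ever sees long IDs; recommended item IDs have to be translated back to part numbers before they are shown to anyone.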


When I used Euclidean distance I got no recommendations, but log-likelihood-based
item similarity gives me results that seem very good. So, if I use string-based
input data for recommendation, do I have to use log-likelihood-based item
similarity?
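
For context, a log-likelihood similarity needs only counts of which customers bought which parts, not a numeric rating per row. Below is a minimal Java sketch of the log-likelihood ratio on a 2x2 co-occurrence table. The class and method names are hypothetical, and the final mapping 1 - 1/(1 + LLR) is my understanding of how the ratio can be turned into a similarity, so treat that part as an assumption rather than as Mahout's confirmed behavior.

```java
/** Hypothetical sketch of a log-likelihood ratio similarity between two items. */
class LogLikelihoodSketch {

    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized Shannon entropy of a set of counts. */
    static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    /**
     * k11: customers who bought both items; k12: item A only;
     * k21: item B only; k22: customers who bought neither.
     */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp at zero to absorb tiny negative values from floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    /** Assumed mapping of the ratio into [0, 1): higher means more strongly associated. */
    static double similarity(long k11, long k12, long k21, long k22) {
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }

    public static void main(String[] args) {
        // Items always bought together versus items bought independently.
        System.out.println(similarity(10, 0, 0, 10));
        System.out.println(similarity(10, 10, 10, 10));
    }
}
```

Because the inputs are pure counts, the preference value column plays no part in this calculation.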

Thanks

Ozgur CATAK

Ph.D. Student
Istanbul University, Informatics

On Fri, Dec 11, 2009 at 12:13 PM, Sean Owen <srowen@gmail.com> wrote:

> You probably want a user-based recommender since you have very few
> users, relatively. Performance should not be a problem given the size
> of your input -- probably can compute recommendations in tens of
> milliseconds.
>
> You will need to use RecommenderEvaluator to find which of many
> possible implementations produces the best results on your input. For
> example, experiment with a nearest-n user neighborhood with small
> values of n, and try Euclidean distance-based and log-likelihood-based
> similarity metrics. Try several variations and see which produces the
> lowest evaluation score.
>
> On Fri, Dec 11, 2009 at 6:43 AM, F.Ozgur Catak <f.ozgur.catak@gmail.com> wrote:
> > approx. 100.000 rows and 2000 users
> >
> > On Fri, Dec 11, 2009 at 2:25 AM, Sean Owen <srowen@gmail.com> wrote:
> >
> >> The best algorithm really depends on your data.
> >>
> >> How many items and how many users do you have? That will determine
> >> which algorithms will perform better.
> >>
> >> Which algorithms will produce the best recommendations is hard to
> >> tell. Usually you have to use RecommenderEvaluator with lots of
> >> implementations and your data to find which seems to work best.
> >>
> >> If you can say more about your data, maybe I can guess about the best
> >> implementations to try.
> >>
> >> On Thu, Dec 10, 2009 at 9:56 PM, F.Ozgur Catak <f.ozgur.catak@gmail.com> wrote:
> >> > Hi again,
> >> >
> >> > Finally I understand the item similarity :). In our B2B project we need
> >> > to develop a recommendation system. I want to use Mahout. Are there any
> >> > best practices? And another question: is Mahout mature enough to use in
> >> > our production environment?
> >> >
> >> > thanks
> >> >
> >> > On Thu, Dec 10, 2009 at 9:31 PM, Sean Owen <srowen@gmail.com> wrote:
> >> >
> >> >> No, the similarity metric is passed in as an ItemSimilarity metric.
> >> >> There is no implementation based on a model, if that's what you mean.
> >> >> What else?
> >> >>
> >> >> On Thu, Dec 10, 2009 at 7:27 PM, F.Ozgur Catak <f.ozgur.catak@gmail.com> wrote:
> >> >> > Yes, I read the Javadoc, but I need the algorithms. For example, does
> >> >> > the recommendation system use the Apriori algorithm to find similar
> >> >> > values, etc.?
> >> >> >
> >> >> > Maybe it is my problem, because I'm also a newbie at data mining.
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >>
> >> >
> >>
> >
>
