I used Log Likelihood Similarity and Euclidean distance. My input file contains
string values:
CustomerNo,Part No
TR433;SPTBY1711
TR433;SPTBL1711
TR433;SPTKP1711
TR746;TDTBY861
TR746;TDTBL861
TR746;TDTKP861
and I converted them to long values using MemoryIDMigrator:
1903325046098094985,5192157078505275458,3162216497309240828
2276278324672472631,496035984324855953,3162216497309240828
2276278324672472631,2666580089560192147,3162216497309240828
2276278324672472631,3436879215117796241,3162216497309240828
7260913912542566719,8688228931167592947,3162216497309240828
7260913912542566719,5860894063367472580,3162216497309240828
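
For readers unfamiliar with this kind of conversion, here is a minimal sketch of an in-memory string-to-long ID mapping. It is not Mahout's code: the class name `StringIdMapper` and its methods are hypothetical, and the hashing scheme (the first 8 bytes of an MD5 digest) is an assumption, so it is not expected to reproduce the exact values listed above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of Mahout).
class StringIdMapper {
    // Remembers the original string for each generated long ID.
    private final Map<Long, String> longToString = new HashMap<>();

    // Assumption: derive the long from the first 8 bytes of the MD5 digest.
    static long hash(String value) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            long h = 0L;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (md5[i] & 0xFFL);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Converts a string ID to a long and stores the reverse mapping.
    long toLongID(String stringID) {
        long id = hash(stringID);
        longToString.put(id, stringID);
        return id;
    }

    // Returns the original string, or null if the ID was never stored.
    String toStringID(long longID) {
        return longToString.get(longID);
    }

    public static void main(String[] args) {
        StringIdMapper mapper = new StringIdMapper();
        long customer = mapper.toLongID("TR433");
        long part = mapper.toLongID("SPTBY1711");
        System.out.println(customer + "," + part);
        System.out.println(mapper.toStringID(customer));
    }
}
```

The reverse map is what lets recommended long IDs be turned back into part numbers after the recommender has run.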
When I used Euclidean distance there were no recommendations, but Log
likelihood Based Item Similarity gives me results which seem very good.
So, if I use string-based input data for recommendation, do I have to use
"Log likelihood Based Item Similarity"?
Thanks
Ozgur CATAK
Ph.D. Student
Istanbul University, Informatics
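
A note on the two metrics: the sketch below shows a log-likelihood ratio computed from a 2x2 co-occurrence table (Dunning's G-squared statistic), next to a plain Euclidean distance. It is a minimal illustration, not Mahout's implementation. The class name `LogLikelihoodSketch`, the method names, and the mapping `1 - 1/(1 + llr)` to a similarity are assumptions. The point it illustrates is that the log-likelihood ratio depends only on who bought what, while Euclidean distance depends on preference values; if every preference has the same value, that distance is always zero and carries no information. That may be one reason for the difference reported above, but it has not been verified against this data.

```java
// Hypothetical illustration, not Mahout code.
class LogLikelihoodSketch {
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy: xLogX(sum of counts) - sum of xLogX(count).
    static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    // k11: users with both items, k12: only item A,
    // k21: only item B, k22: neither item.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Assumed mapping of the ratio into the range [0, 1).
    static double similarity(long k11, long k12, long k21, long k22) {
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }

    // Euclidean distance over preference values for the same items.
    static double euclideanDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Items that always occur together score high.
        System.out.println(similarity(10, 0, 0, 10));
        // Items that occur together only by chance score zero.
        System.out.println(similarity(1, 1, 1, 1));
        // Identical preference values always give distance zero.
        System.out.println(euclideanDistance(new double[] {1, 1, 1},
                                             new double[] {1, 1, 1}));
    }
}
```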
On Fri, Dec 11, 2009 at 12:13 PM, Sean Owen <srowen@gmail.com> wrote:
> You probably want a user-based recommender since you have relatively
> few users. Performance should not be a problem given the size
> of your input; it can probably compute recommendations in tens of
> milliseconds.
>
> You will need to use RecommenderEvaluator to find which of many
> possible implementations produces the best results on your input. For
> example, experiment with a nearest-n user neighborhood with small
> values of n, and try Euclidean distance-based and log-likelihood-based
> similarity metrics. Try several variations and see which produces the
> lowest evaluation score.
>
> On Fri, Dec 11, 2009 at 6:43 AM, F.Ozgur Catak <f.ozgur.catak@gmail.com>
> wrote:
> > approx. 100.000 rows and 2000 users
> >
> > On Fri, Dec 11, 2009 at 2:25 AM, Sean Owen <srowen@gmail.com> wrote:
> >
> >> The best algorithm really depends on your data.
> >>
> >> How many items and how many users do you have? That will determine
> >> which algorithms will perform better.
> >>
> >> Which algorithms will produce the best recommendations is hard to
> >> tell. Usually you have to use RecommenderEvaluator with lots of
> >> implementations and your data to find which seems to work best.
> >>
> >> If you can say more about your data, maybe I can guess about the best
> >> implementations to try.
> >>
> >> On Thu, Dec 10, 2009 at 9:56 PM, F.Ozgur Catak <f.ozgur.catak@gmail.com>
> >> wrote:
> >> > Hi again,
> >> >
> >> > Finally I understand the item similarity :). In our B2B project we need
> >> > to develop a recommendation system. I want to use Mahout. Is there any
> >> > best practice? Also, another question: is Mahout mature enough to use
> >> > in our production environment?
> >> >
> >> > thanks
> >> >
> >> > On Thu, Dec 10, 2009 at 9:31 PM, Sean Owen <srowen@gmail.com> wrote:
> >> >
> >> >> No, the similarity metric is passed in as an ItemSimilarity metric.
> >> >> There is no implementation based on a model, if that's what you mean.
> >> >> What else?
> >> >>
> >> >> On Thu, Dec 10, 2009 at 7:27 PM, F.Ozgur Catak <f.ozgur.catak@gmail.com>
> >> >> wrote:
> >> >> > Yes, I read the javadoc but I need the algorithms. For example, does
> >> >> > the recommendation system use the Apriori algorithm to find similar
> >> >> > values, etc.?
> >> >> >
> >> >> > Maybe it is my problem, because I'm also a newbie at data mining.
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >>
> >> >
> >>
> >
>
