From: Dmitriy Lyubimov
To: user@mahout.apache.org
Date: Mon, 24 Jun 2013 14:07:34 -0700
Subject: Re: Consistent repeatable results for distributed ALS-WR recommender

On Mon, Jun 24, 2013 at 1:35 PM, Michael Kazekin wrote:

> I agree with you, I should have mentioned earlier that it would be good to
> separate "noise from data" and deal with only what is separable. Of course
> there is no truly deterministic implementation of any algorithm,

I am pretty sure "2.0 + 2.0" is pretty deterministic :)

> but I would expect to see "credible" results on a macro-level (in our case
> it would be nice to see the same order of recommendations given the fixed
> seed). It seems important for experiments (and for testing, as mentioned),
> isn't it?

Yes for unit tests you usually would want to fix the seed if it means that
assertion may fail with a non-zero probability. There are definitely a lot
of such cases in Mahout.

> Another question is that afaik ALS-WR is deterministic by its inception, so
> I'm trying to understand the reasons (and I'm assuming there are some) for
> the specific implementation design.
>
> Thanks for a free lunch! ;)
> Cheers, Mike.
>
> > Date: Mon, 24 Jun 2013 13:13:20 -0700
> > Subject: Re: Consistent repeatable results for distributed ALS-WR recommender
> > From: dlieu.7@gmail.com
> > To: user@mahout.apache.org
> >
> > On Mon, Jun 24, 2013 at 1:07 PM, Michael Kazekin wrote:
> >
> > > Thank you, Ted!
> > > Any feedback on the usefulness of such functionality? Could it increase
> > > the 'playability' of the recommender?
> >
> > Almost all methods -- even deterministic ones -- will have a "credible
> > interval" of prediction simply because method assumptions do not hold 100%
> > in real life, real data. So what you really want to know in such cases is
> > the credible interval rather than whether method is deterministic or not.
> > Non-deterministic methods might very well be more accurate than
> > deterministic ones in this context, and, therefore, more "useful". Also
> > see: "no free lunch theorem".
> >
> > > > From: ted.dunning@gmail.com
> > > > Date: Mon, 24 Jun 2013 20:46:43 +0100
> > > > Subject: Re: Consistent repeatable results for distributed ALS-WR recommender
> > > > To: user@mahout.apache.org
> > > >
> > > > See org.apache.mahout.common.RandomUtils#useTestSeed
> > > >
> > > > It provides the ability to freeze the initial seed. Normally this is only
> > > > used during testing, but you could use it.
> > > >
> > > > On Mon, Jun 24, 2013 at 8:44 PM, Michael Kazekin <kazmikh@hotmail.com> wrote:
> > > >
> > > > > Thanks a lot!
> > > > > Do you know by any chance what are the underlying reasons for including
> > > > > such mandatory random seed initialization?
> > > > > Do you see any sense in providing another option, such as filling them
> > > > > with zeroes in order to ensure the consistency and repeatability? (for
> > > > > example we might want to track and compare the generated recommendation
> > > > > lists for different parameters, such as the number of features or number of
> > > > > iterations etc.)
> > > > > M.
> > > > >
> > > > > > Date: Mon, 24 Jun 2013 19:51:44 +0200
> > > > > > Subject: Re: Consistent repeatable results for distributed ALS-WR recommender
> > > > > > From: ssc@apache.org
> > > > > > To: user@mahout.apache.org
> > > > > >
> > > > > > The matrices of the factorization are initialized randomly. If you fix the
> > > > > > random seed (would require modification of the code) you should get exactly
> > > > > > the same results.
> > > > > >
> > > > > > On 24.06.2013 at 13:49, "Michael Kazekin" <kazmikh@hotmail.com> wrote:
> > > > > >
> > > > > > > Hi!
> > > > > > > Should I assume that, given the same dataset and the same parameters for
> > > > > > > the factorizer and recommender, I will get the same results for any
> > > > > > > specific user?
> > > > > > > My current understanding is that theoretically the ALS-WR algorithm could
> > > > > > > guarantee this, but I was wondering whether there could be any numerical
> > > > > > > issues and/or implementation-specific concerns.
> > > > > > > Would appreciate any highlight on this issue.
> > > > > > > Mike.
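
To make the point about seeding concrete: the thread says the factor matrices are initialized randomly and that fixing the seed should give exactly the same results. The sketch below shows only that initialization step, using plain java.util.Random. It is not Mahout's code; the class SeededFactorInit and the method randomFactors are hypothetical names chosen for illustration. In Mahout itself, the hook mentioned above is org.apache.mahout.common.RandomUtils#useTestSeed.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch only; hypothetical names, not part of Mahout. */
final class SeededFactorInit {

    /**
     * Builds a rows-by-features matrix whose entries come from a
     * pseudo-random generator created with the given seed.
     */
    static double[][] randomFactors(int rows, int features, long seed) {
        Random random = new Random(seed);
        double[][] factors = new double[rows][features];
        for (double[] row : factors) {
            for (int f = 0; f < features; f++) {
                row[f] = random.nextDouble();
            }
        }
        return factors;
    }

    public static void main(String[] args) {
        double[][] first = randomFactors(4, 3, 42L);
        double[][] second = randomFactors(4, 3, 42L);
        double[][] other = randomFactors(4, 3, 43L);

        // Same seed: the two starting matrices are compared element by element.
        System.out.println("same seed, equal start: " + Arrays.deepEquals(first, second));
        // Different seed: the starting point (and so possibly the final ranking) can change.
        System.out.println("different seed, equal start: " + Arrays.deepEquals(first, other));
    }
}
```

This covers only the starting point of the factorization. Whether a full distributed run then repeats exactly is the separate question raised at the start of the thread about numerical and implementation-specific concerns.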