mahout-user mailing list archives

From Alex Kozlov <>
Subject Re: To all the recommendation people..
Date Mon, 16 May 2011 02:25:16 GMT
On Sat, May 14, 2011 at 9:11 PM, Jake Mannix <> wrote:

> Due to the whole Netflix data lawsuit, the training data is synthetic, which
> puts the contestants at a disadvantage, and another interesting fact: runtime
> performance is at issue: your code will be run *live*, with your model being
> used to produce recommendations with a hard timeout of 50ms - if you
> miss this more than 20% of the time, you fail to progress to the end of
> the semi-final round.

If the dataset is synthetic (and I assume not random), is the goal just to
guess the model that generated the dataset?  Assuming it performs well, how
far is the 'synthetic' model from the actual customer behavior, so that there
are no 'surprises' when it runs 'live'?

Potentially, there are more avenues for a lawsuit than in the Netflix case
since money is involved (just a thought).

Alex K
