mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Taste-GenericItemBasedRecommender
Date Sat, 12 Dec 2009 23:08:20 GMT
On Sat, Dec 12, 2009 at 8:58 AM, Sean Owen <srowen@gmail.com> wrote:

> I've implemented this but it's still quite slow. Computing
> recommendations goes from a couple hundred ms to 10 seconds. Nothing
> wrong with this idea -- it's all the loading vectors and distributed
> stuff that's weighing it down.
>

You're not computing only one recommendation at a time, are you?
I really need to read through the hadoop.item code, but in general, what
is the procedure here?  If you're doing work on HDFS as a M/R job, you're
doing a huge batch, right?  You're saying the aggregate performance is
10 seconds per recommendation across millions of recommendations, or
doing a one-shot task?

I feel like too much of this conversation went by and I missed some
crucial piece describing the task in a big-picture sense (and this
notion is backed up by the fact that we keep talking past each other
when it comes to which parts of this process are online and which are
offline).  Can you give a quick review of which part of this is supposed
to be on Hadoop, which parts are done live, a kind of big picture
description of what's going on?

> I think that's the culprit in fact, having to load all the column
> vectors, since they're not light.
>
> One approach is to make the user vectors more sparse by throwing out
> data, though I don't like it so much.
>
> One question -- in SparseVector, can't we internally remove entries
> when they are set to 0.0? since implicitly missing entries are 0?
>

We should certainly add a "compact" method to both versions of
SparseVector, which could be periodically called to remove any
zeroes and save on subsequent computational costs.
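
To make the idea concrete, here is a rough sketch of what such a method
might look like on a map-backed vector. The class SparseVectorSketch and
all of its method names are hypothetical stand-ins for illustration only;
this is not Mahout's SparseVector API, and the real classes may store
their entries quite differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical map-backed sparse vector; NOT Mahout's actual SparseVector. */
class SparseVectorSketch {
    // Only explicitly set entries are stored; a missing index means 0.0.
    private final Map<Integer, Double> entries = new HashMap<>();

    /** Stores the value as given, including an explicit 0.0. */
    void set(int index, double value) {
        entries.put(index, value);
    }

    /** Returns the stored value, or 0.0 for an index that is not stored. */
    double get(int index) {
        Double value = entries.get(index);
        return value == null ? 0.0 : value;
    }

    /** Number of entries physically held in the map. */
    int storedEntries() {
        return entries.size();
    }

    /** Removes explicitly stored zeros and returns how many were dropped. */
    int compact() {
        int before = entries.size();
        entries.values().removeIf(value -> value == 0.0);
        return before - entries.size();
    }
}
```

Because a missing entry already reads as 0.0, get() returns the same
values before and after compact(); only the number of stored entries
(and so the cost of iterating over them) changes.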

  -jake
