mahout-user mailing list archives

From Sean Owen
Subject Re: question about distributed recommendations
Date Sat, 04 Aug 2012 14:24:32 GMT
Yes, or anywhere else you want to publish static results to, if you don't
want to expose HDFS. HDFS isn't good at small random reads, so it would be
a question of bulk-loading shards of results. The MapReduce workers play no
part in serving: they produce the results offline, ahead of time, and the
results are then published. HDFS replicates blocks across nodes, so the
published results stay available if a node fails.

*Hadoop* is utterly unsuitable for anything real-time. The idea is to
compute results in batch and serve them after it's done. *HDFS* is probably
OK to serve static data if you are loading chunks at a time, on demand. I
am not sure this is a great architecture but it will work OK, certainly to
moderate scale.
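
To make the "load chunks at a time, on demand" idea concrete, here is a
rough sketch, not anything taken from Mahout or Myrrix. It assumes the batch
job writes one tab-separated file per shard and that a user ID maps to a
shard by simple modulo. Local files stand in for HDFS, and every name in it
(publish, ShardedResultStore, the part-NNNNN file names, the shard count) is
made up for illustration.

```python
import os
import tempfile
from collections import OrderedDict

NUM_SHARDS = 4  # hypothetical shard count chosen by the batch job


def shard_of(user_id, num_shards=NUM_SHARDS):
    """Map a user ID to the shard that holds its precomputed results."""
    return user_id % num_shards


def publish(results, out_dir, num_shards=NUM_SHARDS):
    """Offline step: write each user's recommendations into its shard file."""
    rows = {s: [] for s in range(num_shards)}
    for user_id, item_ids in sorted(results.items()):
        line = "%d\t%s" % (user_id, ",".join(str(i) for i in item_ids))
        rows[shard_of(user_id, num_shards)].append(line)
    for shard, lines in rows.items():
        # Every shard file is written, even if it ends up empty.
        with open(os.path.join(out_dir, "part-%05d" % shard), "w") as f:
            f.write("\n".join(lines))


class ShardedResultStore:
    """Serving step: bulk-load whole shards on demand, answer from memory."""

    def __init__(self, shard_dir, num_shards=NUM_SHARDS, max_loaded=2):
        self.shard_dir = shard_dir
        self.num_shards = num_shards
        self.max_loaded = max_loaded
        self.loaded = OrderedDict()  # shard number -> {user_id: [item_id, ...]}
        self.bulk_loads = 0          # how many whole-shard reads have happened

    def _load(self, shard):
        # One sequential read of the whole file, never a per-user seek.
        table = {}
        with open(os.path.join(self.shard_dir, "part-%05d" % shard)) as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue
                user, _, items = line.partition("\t")
                table[int(user)] = [int(i) for i in items.split(",") if i]
        self.bulk_loads += 1
        return table

    def recommend(self, user_id):
        shard = shard_of(user_id, self.num_shards)
        if shard in self.loaded:
            self.loaded.move_to_end(shard)  # mark as most recently used
        else:
            self.loaded[shard] = self._load(shard)
            if len(self.loaded) > self.max_loaded:
                self.loaded.popitem(last=False)  # evict least recently used
        return self.loaded[shard].get(user_id, [])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        publish({1: [10, 11], 5: [12], 2: [13, 14]}, d)
        store = ShardedResultStore(d)
        print(store.recommend(1), store.recommend(5), store.bulk_loads)
```

The first request for a user pays for one sequential read of that user's
whole shard; later requests that hit the same shard are answered from
memory. Against a real cluster, the open() call in _load would be replaced
by whatever HDFS client you use.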

If you're interested in more on what I'm doing, let's follow up off list,
but start here:

On Sat, Aug 4, 2012 at 10:04 AM, Matt Mitchell wrote:

> So, the front-end machine would need access to the HDFS, and then
> query the system in real-time? Each of the map-reduce nodes would need
> to be up and running to produce results right? Also, what happens if
> one of the nodes goes down for some reason?
> I haven't spent a lot of time with Hadoop but I'm curious about the
> performance/latency there vs data being in memory?
> My system is really only a prototype and mainly a way for me to learn,
> but Myrrix still looks interesting! I'd love to look under the hood
> and see what you've got going :)
> Thanks!
> - Matt
