mahout-user mailing list archives

From Trevor Grant <trevor.d.gr...@gmail.com>
Subject Re: Running Mahout on a Spark cluster
Date Fri, 22 Sep 2017 05:09:32 GMT
Hi Hoa,

A few things could be happening here; I haven't run across that specific
error.

1) Spark 2.x and Mahout 0.13.0: Mahout 0.13.0 WILL run on Spark 2.x;
however, you need to build it from source (not use the binaries). You can
do this by downloading the Mahout source or cloning the repo and building
with:
mvn clean install -Pspark-2.1,scala-2.11 -DskipTests

2) Have you set up Spark with Kryo serialization? How you do this depends
on whether you're in the shell/Zeppelin or using spark-submit.
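For the Kryo question, here is a minimal sketch of one way to enable it, assuming a standalone cluster whose settings are read from $SPARK_HOME/conf/spark-defaults.conf. `spark.serializer` and `spark.kryo.registrator` are standard Spark properties, and the registrator class below is the one shipped in Mahout's Spark bindings. Because Kryo registrators are instantiated on the executors, the Mahout jars also need to be on the executors' classpath (for example via spark-submit's `--jars`), not just on the driver.

```properties
# Assumption: standalone cluster reading $SPARK_HOME/conf/spark-defaults.conf
# Use Kryo instead of the default Java serialization
spark.serializer        org.apache.spark.serializer.KryoSerializer
# Register Mahout's classes with Kryo via its Spark-bindings registrator
spark.kryo.registrator  org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
```

With spark-shell or spark-submit, the same pairs can instead be passed on the command line as `--conf key=value`; in Zeppelin they would go into the Spark interpreter's properties.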

However, in both of these cases it shouldn't have run even locally, as far
as I know, so the fact that it did tells me you've probably gotten this far?

Assuming you've done 1 and 2, can you share some code? I'll see if I can
recreate on my end.

Thanks!

tg

On Thu, Sep 21, 2017 at 9:37 PM, Hoa Nguyen <hoa@insightdatascience.com>
wrote:

> I apologize in advance if this is too much of a newbie question, but I'm
> having a hard time running any Mahout example code in a distributed Spark
> cluster. The code runs as advertised when Spark is running locally on one
> machine, but the minute I point Spark to a cluster and master URL, I can't
> get it to work; it fails with the error: "WARN scheduler.TaskSchedulerImpl:
> Initial job has not accepted any resources; check your cluster UI to ensure
> that workers are registered and have sufficient memory"
>
> I know my Spark cluster is configured and working correctly because
> non-Mahout code runs fine on it in distributed mode. What am I doing
> wrong? The only thing I can think of is that my Spark version (2.1.1) is
> too recent for the Mahout version I'm using (0.13.0). Is that it, or am I
> doing something else wrong?
>
> Thanks for any advice,
> Hoa
>
