spark-user mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: KMeans for large training data
Date Sun, 13 Jul 2014 01:11:42 GMT
The "netlib.BLAS: Failed to load implementation" warning only means that
the BLAS implementation in use may be slower than a native one. It only
shows up at the end because the library is used only in the finalization
step of the KMeans algorithm, so your job should have been wrapping up by
that point. I am not familiar with the algorithm beyond that, so I'm not
sure whether, for some reason, we're trying to collect too much data back
to the driver here.
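To illustrate what a driver-side "finalization" step can look like, here is a minimal pure-Python sketch of a small, single-machine weighted k-means (weighted k-means++ seeding followed by weighted Lloyd iterations) over a handful of candidate points. This is an assumption made for illustration, not MLlib's actual code; the function names (sq_dist, local_weighted_kmeans) and parameters are hypothetical. The point is that such a step runs entirely in driver memory, so its cost grows with however many candidates get collected back to the driver.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_weighted_kmeans(points: List[Point], weights: List[float], k: int,
                          iterations: int = 20, seed: int = 42) -> List[Point]:
    """Cluster a small weighted point set in memory on one machine (hypothetical helper)."""
    rng = random.Random(seed)
    # Weighted k-means++ seeding: first center chosen with probability proportional to weight.
    centers: List[Point] = [rng.choices(points, weights=weights, k=1)[0]]
    while len(centers) < k:
        # Cost of each point = weight * squared distance to its nearest existing center.
        costs = [w * min(sq_dist(p, c) for c in centers) for p, w in zip(points, weights)]
        if sum(costs) == 0:
            break  # every remaining point coincides with an already chosen center
        centers.append(rng.choices(points, weights=costs, k=1)[0])

    dim = len(points[0])
    # Weighted Lloyd iterations: assign each point to its nearest center, then recompute means.
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in centers]
        totals = [0.0] * len(centers)
        for p, w in zip(points, weights):
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # A center that received no points keeps its previous position.
        centers = [tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
                   for j in range(len(centers))]
    return centers


# Usage: two obvious clusters; centers are sorted for a stable printout.
print(sorted(local_weighted_kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [1, 1, 1, 1], k=2)))
# → [(0.0, 0.5), (10.0, 10.5)]
```

Everything here is plain Python on one machine, which is exactly why a step like this needs the driver to have enough memory for whatever it receives.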

By the way, you can increase the driver memory by setting
SPARK_DRIVER_MEMORY (or by passing the --driver-memory option to
spark-submit).
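As a small sketch of the two options mentioned above, the Python below builds a spark-submit command line with an explicit --driver-memory and an environment copy with SPARK_DRIVER_MEMORY set. The jar name, main class, and memory values are hypothetical placeholders (12g mirrors the executor setting mentioned in the quoted message); only --driver-memory, --executor-memory, --class, and SPARK_DRIVER_MEMORY come from Spark itself.

```python
import os
import shlex
from typing import Dict, List


def build_submit_command(app_jar: str, main_class: str,
                         driver_memory: str = "4g",
                         executor_memory: str = "12g") -> List[str]:
    """Build a spark-submit argument list that sets driver and executor memory explicitly."""
    return [
        "spark-submit",
        "--class", main_class,
        "--driver-memory", driver_memory,      # memory for the driver process
        "--executor-memory", executor_memory,  # memory per executor
        app_jar,
    ]


def build_submit_env(driver_memory: str = "4g") -> Dict[str, str]:
    """Copy the current environment and set SPARK_DRIVER_MEMORY for a launched process."""
    env = dict(os.environ)
    env["SPARK_DRIVER_MEMORY"] = driver_memory
    return env


# Hypothetical jar and class names, used only for illustration.
cmd = build_submit_command("kmeans-app.jar", "com.example.KMeansApp")
print(shlex.join(cmd))
# → spark-submit --class com.example.KMeansApp --driver-memory 4g --executor-memory 12g kmeans-app.jar
```

To actually launch the job, one could pass both to subprocess.run(cmd, env=build_submit_env()); this is not done here because it needs a Spark installation.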


On Sat, Jul 12, 2014 at 2:38 AM, durin <mail@simon-schaefer.net> wrote:

> Your latest response doesn't show up here yet, I only got the mail. I'll
> still answer here in the hope that it appears later:
>
> Which memory setting do you mean? I can raise spark.executor.memory a
> bit; it's currently set to 12G. But that's already far more than the whole
> SchemaRDD of Vectors that I currently use for training, which shouldn't be
> more than a few hundred MB.
> I suppose you mean something comparable to SHARK_MASTER_MEM in
> Shark.
> I can't find the equivalent for Spark in the documentation, though.
>
> And if it helps, I can summarize the code that I currently use. It's
> nothing fancy at the moment; I'm just trying to classify Strings that
> each contain a few words (each word is handled as an atomic
> item).
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/KMeans-for-large-training-data-tp9407p9509.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
