spark-dev mailing list archives

Subject Re: Problems concerning implementing machine learning algorithm from scratch based on Spark
Date Tue, 30 Dec 2014 10:41:06 GMT
The code you mentioned is the old one. The new code has been added to spark-packages and is available at . Have a look at the new code.
We have used NumPy functions in our code and didn't notice any slowdown because of this.
Thanks & Regards,
Meethu M 
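To illustrate the point about NumPy not causing a slowdown: the usual pattern is to do the vectorized work in NumPy over a whole batch of points at once, so the per-record Python overhead is amortized. Below is a minimal, hypothetical sketch (the function name and the spherical-Gaussian model are illustrative, not taken from the spark-packages code):

```python
import numpy as np

def gaussian_logpdf(points, mu, sigma2):
    # Vectorized log-density of a spherical Gaussian over a batch of points.
    # NumPy evaluates the whole batch in C, which is why wrapping it in a
    # Spark map does not add noticeable per-point overhead.
    d = points.shape[1]
    diff = points - mu
    return -0.5 * (np.sum(diff * diff, axis=1) / sigma2
                   + d * np.log(2.0 * np.pi * sigma2))

# Hypothetical PySpark use: apply the function to whole partitions at once,
# e.g. rdd.mapPartitions(
#          lambda it: iter(gaussian_logpdf(np.array(list(it)), mu, s2)))
```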

     On Tuesday, 30 December 2014 11:50 AM, danqing0703 <> wrote:

 Hi all,

I am trying to use some machine learning algorithms that are not included
in MLlib, such as mixture models and LDA (Latent Dirichlet Allocation), and
I am using PySpark and Spark SQL.

My problem is: I have some scripts that implement these algorithms, but I
am not sure which parts I should change to make them scale to big data.

  - Even very simple calculations may take a long time if the data is too
  big, but constructing an RDD or an SQLContext table also takes a lot of
  time. I am really not sure whether I should use map() and reduce() every
  time I need to make a calculation.
  - Also, there are some matrix/array-level calculations that cannot easily
  be implemented using only map() and reduce(), so functions from the NumPy
  package have to be used. When the data is very big and we simply call the
  NumPy functions, will it take too much time?
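On the first point, one common way to avoid a full map()/reduce() pass per calculation is to compute small per-partition partial results with NumPy and then combine them. A minimal sketch, assuming PySpark (the helper names here are hypothetical; only the combine step runs on the driver):

```python
import numpy as np

def partial_stats(chunk):
    # Per-partition partial result: (column-sum vector, row count).
    # Each partial is tiny, so combining them on the driver is cheap.
    arr = np.asarray(chunk, dtype=float)
    return arr.sum(axis=0), arr.shape[0]

def combine(a, b):
    # Associative merge of two partials; usable as a reduce function.
    return a[0] + b[0], a[1] + b[1]

# Hypothetical PySpark use:
#   total, n = rdd.mapPartitions(lambda it: [partial_stats(list(it))]) \
#                 .reduce(combine)
# Locally, simulating two partitions of 2-D points:
s, n = combine(partial_stats([[1.0, 2.0]]), partial_stats([[3.0, 4.0]]))
mean = s / n
```

The same pattern (map to small partials, reduce with an associative combiner) covers most "simple calculation over big data" cases without repeatedly rebuilding RDDs.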

I have found some scripts that are not from MLlib and were created by other
developers (credits to Meethu Mathew from Flytxt, thanks for giving me the
scripts).

Many thanks, and I look forward to your feedback!

Best, Danqing
