spark-dev mailing list archives

From Ashutosh <>
Subject Re: [MLlib] Contributing Algorithm for Outlier Detection
Date Tue, 18 Nov 2014 10:40:14 GMT
Hi Anant,

I have removed the counter and all possible side effects. Now I think we can go ahead with
the testing. I have created another folder for testing, and I will add you as a collaborator
on GitHub.


From: slcclimber [via Apache Spark Developers List] <>
Sent: Monday, November 17, 2014 10:45 AM
To: Ashutosh Trivedi (MT2013030)
Subject: Re: [MLlib] Contributing Algorithm for Outlier Detection

The counter will certainly be a parallelization issue when multiple nodes are used, especially
over massive datasets.
A better approach would be something along these lines:

    val index = sc.parallelize(Range.Long(0, rdd.count, 1), rdd.partitions.size)
    val rddWithIndex = rdd.zip(index)

which zips the two RDDs in a parallelizable fashion.
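A minimal runnable sketch of this indexing approach (assuming Spark's Scala API on the classpath; the local master setting and the sample data are illustrative only):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IndexExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("index-example").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Illustrative data; in the outlier-detection code this would be the input RDD.
    val rdd = sc.parallelize(Seq("a", "b", "c", "d"), 2)

    // Build an index RDD with the same partition count, then zip it with the data.
    // Note: zip requires both RDDs to have the same number of partitions AND the
    // same number of elements per partition, so the counts must line up.
    val index = sc.parallelize(Range.Long(0, rdd.count, 1), rdd.partitions.size)
    val rddWithIndex = rdd.zip(index)

    rddWithIndex.collect().foreach(println)
    sc.stop()
  }
}
```

Note that Spark also ships a built-in `rdd.zipWithIndex()` that assigns indices without the per-partition element-count restriction of `zip`, which may be the simpler choice here.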
