spark-user mailing list archives

From Aliaksei Litouka <aliaksei.lito...@gmail.com>
Subject Re: An attempt to implement dbscan algorithm on top of Spark
Date Fri, 13 Jun 2014 01:20:45 GMT
Vipul,
Thanks for your feedback. As far as I understand, you mean RDD[(Double,
Double)] (note the parentheses), and each of these Double values is
supposed to contain one coordinate of a point. It limits us to
2-dimensional space, which is not suitable for many tasks. I want the
algorithm to be able to work in multidimensional space. Actually, there is
a class org.alitouka.spark.dbscan.spatial.Point in my code, which
represents a point with an arbitrary number of coordinates.
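
To illustrate the difference, here is a minimal sketch, not the project's actual code. SimplePoint is a hypothetical stand-in for org.alitouka.spark.dbscan.spatial.Point (the real class's constructor and fields may differ), and PairConversion.fromPair is an assumed helper name. It shows that a (Double, Double) pair is just the 2-dimensional special case of a point holding any number of coordinates.

```scala
// Hypothetical stand-in for org.alitouka.spark.dbscan.spatial.Point;
// the real class's constructor and fields may differ.
final case class SimplePoint(coordinates: Vector[Double]) {
  // Number of coordinates, i.e. the dimensionality of the point.
  def dimensions: Int = coordinates.length
}

object PairConversion {
  // A 2-D pair becomes a point with exactly two coordinates.
  def fromPair(pair: (Double, Double)): SimplePoint =
    SimplePoint(Vector(pair._1, pair._2))
}
```

Given an existing RDD[(Double, Double)] named pairs, something like pairs.map(PairConversion.fromPair) would produce an RDD of these stand-in points; turning that into the project's own Point type is left as an assumption here.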

IOHelper.readDataset is just a convenience method which reads a CSV file
and returns an RDD of Points (more precisely, it returns a value of type
RawDataset, which is just an alias for RDD[Point]). If your data is stored
in a format other than CSV, you will have to write your own code to convert
your data to RawDataset.
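
As a rough sketch of what such conversion code could look like, assuming a text format with one point per line and a known separator: ParsedPoint is a hypothetical stand-in for the project's Point class, and LineParser.parseLine is an assumed helper, not part of spark_dbscan.

```scala
// Hypothetical stand-in for the project's Point class; the real one may differ.
final case class ParsedPoint(coordinates: Vector[Double])

object LineParser {
  // Splits one line on the separator (interpreted as a regex by String.split)
  // and parses every field as a Double. Throws NumberFormatException on bad input.
  def parseLine(line: String, separator: String = "\t"): ParsedPoint =
    ParsedPoint(line.split(separator).map(_.trim.toDouble).toVector)
}
```

With a SparkContext sc, sc.textFile(path).map(LineParser.parseLine(_)) would give an RDD of these stand-in points. Building the project's actual RawDataset from it depends on how Point is constructed, which this sketch does not assume.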

I can add support for other data formats in future versions.

As for other distance measures, that is a high-priority item on my list ;)



On Thu, Jun 12, 2014 at 6:02 PM, Vipul Pandey <vipandey@gmail.com> wrote:

> Great! I was going to implement one of my own - but I may not need to do
> that any more :)
> I haven't had a chance to look deep into your code but I would recommend
> accepting an RDD[Double,Double] as well, instead of just a file.
>
> val data = IOHelper.readDataset(sc, "/path/to/my/data.csv")
>
> And other distance measures, of course.
>
> Thanks,
> Vipul
>
>
>
>
> On Jun 12, 2014, at 2:31 PM, Aliaksei Litouka <aliaksei.litouka@gmail.com>
> wrote:
>
> Hi.
> I'm not sure if messages like this are appropriate on this list; I just
> want to share with you an application I am working on. This is my personal
> project which I started to learn more about Spark and Scala, and, if it
> succeeds, to contribute it to the Spark community.
>
> Maybe someone will find it useful. Or maybe someone will want to join
> development.
>
> The application is available at https://github.com/alitouka/spark_dbscan
>
> Any questions, comments, suggestions, as well as criticism are welcome :)
>
> Best regards,
> Aliaksei Litouka
>
>
>
