mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: DBSCAN implementation in Mahout
Date Mon, 01 Dec 2014 20:41:41 GMT
If memory serves, DeLiClu (density-link clustering) is currently the best
density-based option, since it does not require parameter searches.

What parallelization strategy are you proposing?

I know there were a bunch of attempts to parallelize/partition the DBSCAN
problem. One of the more interesting is perhaps Google's MR.SCAN paper, but
even that one is not quite embarrassingly parallel (it requires a partitioning
overlap between subtasks, which is a function of the epsilon neighborhood).
Nevertheless, it seemed to yield quite interesting performance.
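
To make the overlap requirement concrete, here is a minimal Python sketch of
grid partitioning with an epsilon-wide halo. It is an illustration only, not
the scheme from any particular paper: the function name partition_with_halo,
the square grid, and the cell_size parameter are all assumptions made for the
example.

```python
from collections import defaultdict
from math import floor


def partition_with_halo(points, cell_size, eps):
    """Split 2-D points into square grid cells for independent subtasks.

    Each point is a "core" member of the cell that contains it, and is
    additionally copied into the "halo" of every neighbouring cell that
    lies within eps of it. (Hypothetical helper, for illustration only.)
    """
    if eps > cell_size:
        # With eps <= cell_size only the 8 adjacent cells can be within eps.
        raise ValueError("eps must not exceed cell_size in this sketch")

    parts = defaultdict(lambda: {"core": [], "halo": []})
    for x, y in points:
        cx, cy = floor(x / cell_size), floor(y / cell_size)
        parts[(cx, cy)]["core"].append((x, y))

        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx == 0 and dy == 0:
                    continue
                nx, ny = cx + dx, cy + dy
                # Euclidean distance from the point to the neighbour cell's rectangle.
                gap_x = max(nx * cell_size - x, 0, x - (nx + 1) * cell_size)
                gap_y = max(ny * cell_size - y, 0, y - (ny + 1) * cell_size)
                if gap_x * gap_x + gap_y * gap_y <= eps * eps:
                    parts[(nx, ny)]["halo"].append((x, y))
    return dict(parts)


if __name__ == "__main__":
    pts = [(0.5, 0.5), (0.95, 0.5), (1.5, 1.5)]
    for cell, members in sorted(partition_with_halo(pts, 1.0, 0.1).items()):
        print(cell, members)
```

Each subtask would then run an ordinary local DBSCAN over its core plus halo
points, and local clusters that share a replicated point would be merged in a
later step. That merge is the part that keeps the problem from being
embarrassingly parallel, and the amount of replicated data grows with epsilon.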

Also, the MR version of Mahout has (or used to have) mean shift, which is just
as good, if not better, for irregularly shaped density clustering. I am not
sure of its performance, though. Translating these into Spark would perhaps be
interesting enough.
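
For readers unfamiliar with mean shift, the following is a minimal
single-machine Python sketch using a flat kernel. It is not Mahout's
implementation; the function name mean_shift, the bandwidth parameter, and the
rule of merging modes closer than half a bandwidth are assumptions chosen for
the example.

```python
from math import dist  # Python 3.8+


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift: returns (labels, centers).

    Every point is moved repeatedly to the mean of the data points within
    `bandwidth` of its current position until it stops moving; points whose
    final positions (modes) nearly coincide share a cluster label.
    """
    modes = []
    for p in points:
        m = tuple(p)
        for _ in range(max_iter):
            neighbours = [q for q in points if dist(m, q) <= bandwidth]
            new_m = tuple(sum(c) / len(neighbours) for c in zip(*neighbours))
            moved = dist(m, new_m)
            m = new_m
            if moved < tol:
                break
        modes.append(m)

    # Merge modes that converged to (almost) the same location.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
    labels, centers = mean_shift(pts, bandwidth=1.0)
    print(labels, centers)
```

The inner neighbour search is quadratic in the number of points, so this only
shows the idea. A distributed version would need the same kind of spatial
partitioning discussed for DBSCAN.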



On Sat, Nov 29, 2014 at 12:31 PM, 3316 Chirag Nagpal <
chiragnagpal_12102@aitpune.edu.in> wrote:

> Hi Dmitriy,
>
> Thanks for the reply.
>
> Since density-based clustering algorithms are used extensively,
> especially by GIS research groups, it is a bit sad that there isn't a
> MapReduce implementation available.
>
> I think I will propose to write MapReduce code for DBSCAN and OPTICS for
> GSoC '15.
>
> I would like your input on how significant this would be to the
> community in general.
>
> Thanks,
>
> Chirag Nagpal
> University of Pune, India
> www.chiragnagpal.com
> ________________________________________
> From: Dmitriy Lyubimov <dlieu.7@gmail.com>
> Sent: Saturday, November 29, 2014 11:29 PM
> To: user@mahout.apache.org
> Subject: Re: DBSCAN implementation in Mahout
>
> No, there is no DBSCAN, OPTICS, or any other density flavor, afaik.
>
> Sent from my phone.
> On Nov 28, 2014 11:41 AM, "3316 Chirag Nagpal" <
> chiragnagpal_12102@aitpune.edu.in> wrote:
>
> > Hello
> > I am Chirag Nagpal, a third year student of Computer Engineering at the
> > University of Pune, India and currently interning at SERC, Indian
> > Institute of Science, Bangalore
> >
> > My work involves using density based clustering algorithms like DBSCAN on
> > geo-referenced data like Tweets. Typically the dataset consists of
> > millions of points. I would like to know if there is any Map Reduce
> > implementation of DBSCAN available.
> >
> > thank you
> > Chirag
> >
>
