mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dan Brickley <dan...@danbri.org>
Subject Re: Help for graph processing implementation
Date Sun, 23 Oct 2011 10:00:31 GMT
On 23 October 2011 07:16, Bae, Jae Hyeon <metacret@gmail.com> wrote:

> I am implementing graph clustering algorithm based on hadoop and mahout.
> This is my term project of data mining course.

Cool! Fun topic...

> Spectral method of graph clustering needs calculation of eigenvectors,
> which is not practically efficient with the large scale graph. Thus, there
> exists multi-level graph clustering method without eigenvectors. This
> contains graph coarsening, base clustering, refining. Refining stage can be
> done with weighted kernel k-means clustering which is not so difficult to
> be implemented in MapReduce way, but the problem is graph coarsening.
> Pseudocode is on this paper
> http://www.cs.utexas.edu/~ddn/papers/sui10.pdf Like any graph
> processing algorithm, this algorithm does not look easy to
> be intuitively implemented in MapReduce way. So, I need a help from experts
> more proficient at converting single thread graph algorithm to MapReduce
> way. If this work is done smoothly, I will contribute this graph clustering
> algorithm to Mahout if I am allowed to do so.

Do take a look at the Spectral Clustering code (and Eigencuts) already
in Mahout.

orig proposal: https://issues.apache.org/jira/browse/MAHOUT-363
http://code.google.com/p/eigencuts/
wiki: https://cwiki.apache.org/MAHOUT/spectral-clustering.html
blog notes: http://spectrallyclustered.wordpress.com/
bugs: https://issues.apache.org/jira/browse/MAHOUT-524
javadoc: (maybe not freshest)
http://javasourcecode.org/html/open-source/mahout/mahout-0.5/org/apache/mahout/clustering/spectral/common/package-summary.html

The Wiki page in particular might be a good start,
https://cwiki.apache.org/MAHOUT/spectral-clustering.html

Code is in public svn:
http://svn.apache.org/repos/asf/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/

Take a look at the run() method in
http://svn.apache.org/repos/asf/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/kmeans/SpectralKMeansDriver.java

...this takes in a textual representation of an affinity matrix,
constructs the laplacian representation of it and uses the existing
DistributedLanczosSolver/SVD (see
https://cwiki.apache.org/MAHOUT/dimensional-reduction.html ) and
K-Means components from elsewhere in Mahout. The code seems fairly
well commented.

Having said that, this code doesn't seem very happy currently. It
seems per https://issues.apache.org/jira/browse/MAHOUT-524 difficult
to get running in current Mahout trunk. I couldn't get it running yet.

If you can find a way in your term project to build on this work, that
would be fantastic...

Hope this helps,

cheers,

Dan

Mime
View raw message