spark-user mailing list archives

From "naveen.marri" <>
Subject Implementation of random algorithm walk in spark
Date Mon, 29 Feb 2016 13:15:49 GMT

I'm new to Spark, and I'm trying to compute similarity between users/products.
I have a huge table that I can't self-join on the cluster I have.

I'm trying to implement the self join using a random walk methodology, which
will approximately give the results. The table is a bipartite graph with 2

Pick an element (t1) at random from the first column.
Pick a corresponding element (t2) for t1 in the graph.
Look up the possible elements for t2 in the graph and pick one at random, say t3.
Create an edge between t1 and t3.
Repeat this on the order of at least n*n times so that the results are a reasonable approximation.
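
To make the steps above concrete, here is a minimal single-machine sketch in plain Python. It is not Spark code and not a tested solution, only an illustration of the sampling logic. The function name random_walk_edges, the pairs input format, the num_samples and seed parameters, and the choice to skip self-edges (t1 == t3) are all assumptions made for this example.

```python
import random
from collections import defaultdict


def random_walk_edges(pairs, num_samples, seed=None):
    """Sample (t1, t3) edges via two-step random walks on a bipartite table.

    pairs: iterable of (first_column, second_column) rows (hypothetical format).
    Returns a dict mapping (t1, t3) to the number of times that edge was sampled.
    """
    rng = random.Random(seed)

    # Adjacency lists in both directions of the bipartite graph.
    left_to_right = defaultdict(list)
    right_to_left = defaultdict(list)
    for left, right in pairs:
        left_to_right[left].append(right)
        right_to_left[right].append(left)

    left_nodes = list(left_to_right)
    edge_counts = defaultdict(int)

    for _ in range(num_samples):
        t1 = rng.choice(left_nodes)         # random element of the first column
        t2 = rng.choice(left_to_right[t1])  # random neighbour of t1
        t3 = rng.choice(right_to_left[t2])  # random neighbour of t2
        if t1 != t3:                        # assumption: ignore self-edges
            edge_counts[(t1, t3)] += 1

    return dict(edge_counts)


# Tiny made-up example: users in the first column, products in the second.
pairs = [("u1", "p1"), ("u2", "p1"), ("u2", "p2"), ("u3", "p2")]
edges = random_walk_edges(pairs, num_samples=1000, seed=0)
```

In this toy input u1 and u3 share no product, so no edge between them can ever be sampled. The counts per edge could serve as a rough similarity weight, though how to normalise them is left open here.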

Is Spark a suitable environment to do this?
I've coded the logic for picking elements at random, but I'm facing issues when
building the graph.
Should I consider GraphX?
Any help is highly appreciated.

