spark-user mailing list archives

From naveenkumarmarri <naveenkumarmarri6...@gmail.com>
Subject Implementation of random algorithm walk in spark
Date Mon, 29 Feb 2016 12:56:58 GMT
Hi,

I'm new to Spark and am trying to compute similarity between users/products.
I have a huge table that I can't self join with the cluster I have.

I'm trying to implement the self join using a random walk methodology, which
will give approximate results. The table is a bipartite graph with 2
columns.

Idea:

   - pick an element (t1) from the first column at random
   - pick the corresponding element (t2) for t1 in the
   graph
   - look up the possible elements for t2 in the graph and pick one at
   random, say t3
   - create an edge between t1 and t3
   - iterate on the order of at least n*n times so that the results
   approximate the full self join
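
The steps above can be sketched on a single machine as follows. This is
plain Python rather than Spark code, and it is only meant to pin down the
sampling logic. The function names, the toy edge list, and the choice to
pick t1 uniformly among distinct first-column values (rather than among
rows) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Index the two-column table (bipartite edge list) in both directions."""
    left_to_right = defaultdict(list)
    right_to_left = defaultdict(list)
    for left, right in edges:
        left_to_right[left].append(right)
        right_to_left[right].append(left)
    return left_to_right, right_to_left


def sample_two_hop_edges(edges, num_walks, seed=None):
    """Run `num_walks` two-step walks t1 -> t2 -> t3 and count (t1, t3) pairs.

    Assumption: t1 is drawn uniformly from the distinct first-column values.
    """
    rng = random.Random(seed)
    left_to_right, right_to_left = build_adjacency(edges)
    left_nodes = list(left_to_right)
    counts = defaultdict(int)
    for _ in range(num_walks):
        t1 = rng.choice(left_nodes)          # random element of column 1
        t2 = rng.choice(left_to_right[t1])   # a neighbour of t1 in column 2
        t3 = rng.choice(right_to_left[t2])   # a random column-1 neighbour of t2
        if t1 != t3:                         # skip walks that return to t1
            counts[(t1, t3)] += 1
    return dict(counts)


# Hypothetical toy data: (user, product) rows.
edges = [("u1", "p1"), ("u2", "p1"), ("u2", "p2"), ("u3", "p2"), ("u4", "p3")]
pairs = sample_two_hop_edges(edges, num_walks=1000, seed=42)
```

Each key in `pairs` is a sampled edge between two first-column elements
that share a second-column neighbour, and the count is how often the walk
found it, which can serve as a rough similarity weight.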

Questions


   - Is Spark a suitable environment for this?
   - I've coded the logic for picking elements at random, but I'm facing
   issues when building the graph
   - Should I consider GraphX?

Any help is highly appreciated.

Regards,
Naveen
