spark-user mailing list archives

From Robin East <robin.e...@xense.co.uk>
Subject Re: Research ideas using spark
Date Wed, 15 Jul 2015 16:13:16 GMT
Well said, Will. I would add that you might want to investigate GraphChi, which claims to be
able to run a number of large-scale graph processing tasks on a workstation much more quickly
than a very large Hadoop cluster. It would be interesting to know how widely applicable
GraphChi's approach is and what implications it has for parallel/distributed computing approaches.
A rich seam to mine indeed.

Robin
> On 15 Jul 2015, at 14:48, William Temperley <willtemperley@gmail.com> wrote:
> 
> There seems to be a bit of confusion here - the OP (doing the PhD) had the thread hijacked
> by someone with a similar name asking a mundane question.
> 
> It would be a shame to so rudely send away someone who may do valuable work on Spark.
> 
> Shashidhar (not Shahid!), I'm personally interested in running graph algorithms for image
> segmentation using MLlib and Spark. I've got many questions though - like is it even going
> to give me a speed-up? (http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html)
> 
> It's not obvious to me which classes of graph algorithms can be implemented correctly
> and efficiently in a highly parallel manner. There's tons of work to be done here, I'm sure.
> Also, look at parallel geospatial algorithms - there's a lot of work being done on this.
> 
> Best, Will
> 
> 
> 
> On 15 July 2015 at 09:01, Vineel Yalamarthy <vineelyalamarthy@gmail.com> wrote:
> Hi Daniel
> 
> Well said
> 
> Regards 
> Vineel
> 
> 
> On Tue, Jul 14, 2015, 6:11 AM Daniel Darabos <daniel.darabos@lynxanalytics.com> wrote:
> Hi Shahid,
> To be honest I think this question is better suited for Stack Overflow than for a PhD
> thesis.
> 
> On Tue, Jul 14, 2015 at 7:42 AM, shahid ashraf <shahid@trialx.com> wrote:
> hi 
> 
> I have a 10-node cluster. I loaded the data onto HDFS, and the number of partitions I get
> is 9. I am running a Spark application and it gets stuck on one of the tasks. Looking at the
> UI, it seems the application is not using all nodes for the calculations. Attached is a
> screenshot of the tasks; it seems tasks are placed on each node more than once. Looking at
> the tasks, 8 complete in 7-8 minutes and one takes around 30 minutes, which delays the results.

> 
> 
> On Tue, Jul 14, 2015 at 10:48 AM, Shashidhar Rao <raoshashidhar123@gmail.com> wrote:
> Hi,
> 
> I am doing my PhD thesis on large-scale machine learning, e.g. online learning, batch
> and mini-batch learning.
> 
> Could somebody help me with ideas, especially in the context of Spark and the above
> learning methods?
> 
> Some ideas like improvements to existing algorithms, implementing new features, especially
> for the above learning methods, and algorithms that have not been implemented yet.
> 
> If somebody could help me with some ideas it would really accelerate my work.
> 
> Plus a few ideas on research papers regarding Spark or Mahout.
> 
> Thanks in advance.
> 
> Regards 
> 
> 
> 
> -- 
> with Regards
> Shahid Ashraf
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
> 
> 

