spark-user mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Very wierd behavior
Date Tue, 22 Jul 2014 20:23:09 GMT
Is the first() being computed locally on the driver program? Maybe it's too hard to compute
with the memory, etc. available there. Take a look at the driver's log and see whether it
contains the message "Computing the requested partition locally".

Matei
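
One way to do the log check suggested above is a small scan of the driver's log for that
message. This is only a sketch: the message text is taken from the reply, while the function
name find_local_computation and the idea of passing the log in as an iterable of lines are
assumptions made for illustration. Where the driver log lives depends on how the application
was launched, so no path is assumed here.

```python
# Message text quoted from the reply above; everything else is illustrative.
LOCAL_MESSAGE = "Computing the requested partition locally"


def find_local_computation(lines):
    """Return (line_number, line) pairs for log lines containing the message.

    `lines` is any iterable of strings, for example an open driver log file.
    Line numbers start at 1 so they match what a text editor shows.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if LOCAL_MESSAGE in line
    ]


# Example use (the path is hypothetical; substitute your own driver log):
#
#     with open("driver.log", encoding="utf-8", errors="replace") as log:
#         for number, line in find_local_computation(log):
#             print(f"{number}: {line}")
```

If the scan finds a match, the partition for first() was computed on the driver, and the
driver's memory is the first thing to look at. An empty result only says the message is not
in that particular log.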

On Jul 22, 2014, at 12:04 PM, Nathan Kronenfeld <nkronenfeld@oculusinfo.com> wrote:

> I was wondering if anyone could provide an explanation for the behavior I'm seeing.
> 
> I have an RDD, call it foo, not too complex, with a DAG maybe 8 levels deep and 2 shuffles,
> not empty, not even terribly big - small enough that some partitions could be empty.
> 
> When I run foo.first, I get workers disconnecting, and applications die.
> When I run foo.mapPartitions.saveAsHadoopDataset, it works fine.
> 
> Anyone got an explanation for why that might be?
> 
>                     -Thanks, Nathan
> 
> 
> -- 
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone:  +1-416-203-3003 x 238
> Email:  nkronenfeld@oculusinfo.com

