spark-user mailing list archives

From paja <>
Subject Join gets stuck in the last stage
Date Wed, 07 Jan 2015 17:27:54 GMT

  I have a problem joining two tables in Spark. I have tried both Spark SQL and
the API, with no progress so far. I have basically two tables: ACCOUNTS with
16 million records and TRANSACTIONS with 2.5 billion records. When I try to
join the tables (please see the code), the job gets stuck in the last stage for
a very long time (please see the console output). After about 2 hours, for
example, it writes a strange exception to the output, such as:

org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output
location for shuffle 0

I have tried several strategies, including repartitioning the RDDs and
broadcasting the smaller one, but the result is always the same.
Does anybody have an idea what is happening?
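
To make the "broadcast the smaller one" strategy concrete, here is a minimal
plain-Python sketch of the idea behind a broadcast (map-side) join. It is not
the code from this job and does not use Spark. The function name
broadcast_join and the sample rows are hypothetical and only for illustration.
The smaller table is turned into an in-memory lookup, and the larger table is
streamed past it, so the large side never needs to be shuffled by key.

```python
from collections import defaultdict


def broadcast_join(small, large):
    """Inner-join two iterables of (key, value) pairs.

    The small side is loaded into a hash map (the part that would be
    broadcast to every worker); the large side is only iterated once.
    """
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)

    for key, value in large:
        # .get() avoids creating empty entries for keys with no match.
        for small_value in lookup.get(key, ()):
            yield key, (small_value, value)


# Hypothetical sample data standing in for the two tables.
accounts = [(1, "alice"), (2, "bob")]
transactions = [(1, 10.0), (1, 25.5), (3, 7.0), (2, 3.0)]

joined = list(broadcast_join(accounts, transactions))
```

This approach only works when the smaller table, once loaded as a lookup,
fits in the memory available to each worker.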

Source Code.
Console  AccJoin_0.html
