spark-user mailing list archives

From "Mendelson, Assaf" <>
Subject how does create dataframe from scala collection handle executor failure?
Date Tue, 22 Nov 2016 16:34:22 GMT
Let's say I have a loop that reads some data from somewhere, stores it in a collection, and creates
a dataframe from it. Then an executor containing part of one of the dataframes dies. How does Spark
handle it?

For example:
import spark.implicits._        // needed for toDF outside the spark-shell

val dfSeq = for {
  i <- 0 to 1000
  v = 0 to 1000000              // stands in for data read from somewhere
} yield sc.parallelize(v).toDF()

Then I would do something with the dataframes (e.g. union them and do some calculation).
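For instance, I imagine something roughly like this (assuming the single "value" column that toDF() gives an RDD[Int]):

  import org.apache.spark.sql.functions.sum

  val combined = dfSeq.reduce(_ union _)   // union all the dataframes
  combined.agg(sum("value")).show()        // some aggregate over the combined data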

What would happen if an executor holding one of the partitions for one of the dataframes dies?
Does this mean I would lose the data, or would Spark save the original data so it can recreate it?
If it saves the original data, where would it save it? (The whole dataset could be very large,
larger than driver memory.)

If it loses the data, is there a way to give it a function or something to recreate it (e.g.
v is read from somewhere and I can reread it if I just know what to read)?
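
For example, instead of materialising the collection on the driver first, I could imagine building
each dataframe from a read function so the lineage knows how to recompute a lost partition
(readChunk below is just a placeholder for however the data is actually read):

  def readChunk(i: Int): Seq[Int] = ???    // placeholder: re-reads chunk i from the source
  val df = sc.parallelize(0 to 1000, 1000) // one chunk index per partition
    .flatMap(i => readChunk(i))            // the read happens on the executors
    .toDF()

Would something like that let Spark recreate lost partitions by calling the read function again?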

