spark-user mailing list archives

From Michal Klos <michal.klo...@gmail.com>
Subject Re: Define exception handling on lazy elements?
Date Wed, 11 Mar 2015 14:49:17 GMT
Well, I'm thinking that this RDD would fail to build in a specific way,
different from the subsequent code (e.g. S3 access denied, or a timeout
connecting to a database).

So for example, define the RDD failure handling on the RDD, and define the
action failure handling on the action? Does this make sense? Otherwise, on
that first action we have to handle exceptions for all of the lazy elements
that preceded it, and that could be a lot of stuff.

If the RDD failure handling code is defined with the RDD, it just seems
cleaner because it's right next to its element. Not to mention, we believe
it would be easier to import it into multiple Spark jobs without a lot of
copy-pasta.

m
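[Editor's note: the reusable build-plus-handler pairing described above could be sketched roughly as follows. This is a minimal sketch, not code from the thread; the `SafeBuild` name and its `build`/`onFailure` parameters are hypothetical, and it is written without Spark dependencies so the same wrapper could wrap RDD construction or any other setup step.]

```scala
import scala.util.{Try, Success, Failure}

// Hypothetical helper: pair a build step with its failure handler so both
// live together in one place and can be imported into many jobs.
object SafeBuild {
  def apply[T](build: => T)(onFailure: Throwable => T): T =
    Try(build) match {
      case Success(v) => v
      case Failure(e) => onFailure(e)
    }
}
```

In a Spark job the `build` argument would be something like `sc.textFile(path)` and `onFailure` could log and substitute a fallback. Note that this still runs eagerly at the call site; it does not by itself defer the handler to the first action, which is the open question in this thread.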

On Wed, Mar 11, 2015 at 10:45 AM, Sean Owen <sowen@cloudera.com> wrote:

> Hm, but you already only have to define it in one place, rather than
> on each transformation. I thought you wanted exception handling at
> each transformation?
>
> Or do you want it once for all actions? You can enclose all actions in
> a try-catch block, I suppose, to write the exception handling code once.
> You can easily write a Scala construct that takes a function and logs
> exceptions it throws, and the function you pass can invoke an RDD
> action. So you can refactor that way too.
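[Editor's note: one minimal sketch of the construct Sean describes, a wrapper that runs any block, logs an exception if one is thrown, and hands the result back as a `Try`. The `logged` name and `label` parameter are hypothetical, not an established API.]

```scala
import scala.util.{Try, Failure}

// Hypothetical wrapper: run a block (e.g. an RDD action), log any exception
// it throws, and return the outcome as a Try so the caller can still react.
def logged[T](label: String)(body: => T): Try[T] = {
  val result = Try(body)
  result match {
    case Failure(e) => println(s"[$label] failed: ${e.getMessage}")
    case _          => ()
  }
  result
}
```

An action would then be invoked as, say, `logged("count users")(rdd.count())`, keeping the exception-handling code in one place.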
>
> On Wed, Mar 11, 2015 at 2:39 PM, Michal Klos <michal.klos81@gmail.com>
> wrote:
> > Is there a way to have the exception handling go lazily along with the
> > definition?
> >
> > e.g. we define it on the RDD, but then our exception handling code gets
> > triggered on that first action, without us having to define it on the
> > first action? (e.g. that RDD code is boilerplate and we want to just
> > have it in many, many projects)
> >
> > m
> >
> > On Wed, Mar 11, 2015 at 10:08 AM, Sean Owen <sowen@cloudera.com> wrote:
> >>
> >> Handling exceptions this way means handling errors on the driver side,
> >> which may or may not be what you want. You can also write functions
> >> with exception handling inside, which could make more sense in some
> >> cases (like, to ignore bad records or count them or something).
> >>
> >> If you want to handle errors at every step on the driver side, you
> >> have to force RDDs to materialize to see if they "work". You can do
> >> that with .count() or .take(1).length > 0. But to avoid recomputing
> >> the RDD then, it needs to be cached. So there is a big non-trivial
> >> overhead to approaching it this way.
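[Editor's note: the materialize-and-check step described above could be sketched like this. The `materializeOrFail` name is hypothetical; the thunk stands in for something like `() => rdd.cache().count()` in a real job, and is kept abstract here so the sketch has no Spark dependency.]

```scala
import scala.util.{Try, Success, Failure}

// Hypothetical helper: force evaluation (e.g. via rdd.cache().count()) and
// surface any error on the driver side as a Left.
def materializeOrFail[T](materialize: () => T): Either[Throwable, T] =
  Try(materialize()) match {
    case Success(v) => Right(v)
    case Failure(e) => Left(e)
  }
```

Each such check forces a full pass over the data, which is the non-trivial overhead noted above; hence the advice to apply it only at a few key RDDs.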
> >>
> >> If you go this way, consider materializing only a few key RDDs in your
> >> flow, not every one.
> >>
> >> The most natural thing is indeed to handle exceptions where the action
> >> occurs.
> >>
> >>
> >> On Wed, Mar 11, 2015 at 1:51 PM, Michal Klos <michal.klos81@gmail.com>
> >> wrote:
> >> > Hi Spark Community,
> >> >
> >> > We would like to define exception handling behavior on RDD
> >> > instantiation / build. Since the RDD is lazily evaluated, it seems
> >> > like we are forced to put all exception handling in the first action
> >> > call?
> >> >
> >> > This is an example of something that would be nice:
> >> >
> >> > def myRDD = Try {
> >> >   sc.textFile(...)
> >> > } match {
> >> >   case Success(rdd) => rdd
> >> >   case Failure(e)   => // handle ...
> >> > }
> >> >
> >> > myRDD.reduceByKey(...) //don't need to worry about that exception here
> >> >
> >> > The reason being that we want to avoid having to copy-paste
> >> > exception handling boilerplate on every first action. We would love
> >> > to define this once somewhere for the RDD build code and just re-use
> >> > it.
> >> >
> >> > Is there a best practice for this? Are we missing something here?
> >> >
> >> > thanks,
> >> > Michal
> >
> >
>
