spark-user mailing list archives

From rsearle <>
Subject Spark updateStateByKey fails with class leak when using case classes - resend
Date Thu, 07 May 2015 02:48:56 GMT

(Apologies for the repeat; the first message was rejected by the submission system.)

I created a simple Spark Streaming program using updateStateByKey.
The domain is represented by case classes for clarity, type safety, etc.
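The original code lives in the linked project; a minimal sketch of the pattern described, assuming a local socket source and a hypothetical `Counter` case class standing in for the domain model, might look like this:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical case class standing in for the domain model.
case class Counter(total: Long)

object CaseClassState {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("CaseClassState")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/spark-checkpoint") // updateStateByKey requires checkpointing

    val lines = ssc.socketTextStream("localhost", 9999)
    val pairs = lines.flatMap(_.split("\\s+")).map(w => (w, 1L))

    // State carried in a case class -- the pattern the report associates
    // with continuous class loading.
    val counts = pairs.updateStateByKey[Counter] {
      (values: Seq[Long], state: Option[Counter]) =>
        Some(Counter(state.map(_.total).getOrElse(0L) + values.sum))
    }

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```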

The Spark job continuously loads new classes, which GC removes, keeping the
number of active class instances relatively constant. The total memory
footprint nevertheless grows and throughput slows until the job fails. The
failure is generally triggered when processing can no longer keep up with the
rate-limited input.

The offending classes are of the form

This failure does not occur if the job is rewritten to use only standard Scala
types: tuples and primitives.
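For contrast, the state-update logic rewritten over primitives alone can be sketched as a pure function (an assumed shape, since the actual rewrite is only in the linked project); per the report, this variant does not leak:

```scala
// Running total kept as a plain Long rather than a case class; this function
// has the (Seq[V], Option[S]) => Option[S] shape updateStateByKey expects.
val updateFunc: (Seq[Long], Option[Long]) => Option[Long] =
  (values, state) => Some(state.getOrElse(0L) + values.sum)
```

For example, `updateFunc(Seq(1L, 2L), Some(3L))` yields `Some(6L)`.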

The following GitHub project contains the test code and more details:

