spark-dev mailing list archives

From Josh Rosen
Subject scala.Option vs Guava Optional in Spark Java APIs
Date Thu, 08 Aug 2013 19:07:04 GMT
I've noticed that Spark's Java API is inconsistent in how it represents
optional values. Some methods use scala.Option instances, while others use
Guava's Optional:

scala.Option is used by methods like JavaSparkContext.getSparkHome(),
and the *outerJoin methods return a JavaPairRDD[K, (V, Option[W])].

Guava Optional is used by methods like Java*RDD.getCheckpointFile() and in
the function arguments to JavaPairDStream.updateStateByKey().

I'd like to remove this inconsistency and settle on a single class for
representing optional values in the Java API.

Both APIs are similar, but the Guava API seems nicer for Java users.  For
example, scala.Option.getOrElse(default) takes a by-name parameter, which
Java callers see as a scala.Function0 argument.  Building one by hand just
to supply a default value is awkward from Java.
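
To illustrate the ergonomic difference, here is a minimal, self-contained
sketch.  It does not use the real libraries: Function0, ScalaStyleOption, and
GuavaStyleOptional are hypothetical stand-ins that only mimic the shape of
scala.Function0, scala.Option.getOrElse, and Guava's Optional.or, and the
"/opt/spark" default is an invented placeholder.

```java
// Stand-ins only: these tiny types mimic the shape of scala.Function0,
// scala.Option.getOrElse, and Guava's Optional.or. They are NOT the real classes.
public class GetOrElseSketch {

    /** Mimics how a Scala by-name parameter looks to Java: a zero-argument function object. */
    interface Function0<T> {
        T apply();
    }

    /** Hypothetical stand-in for scala.Option as seen from Java. */
    static final class ScalaStyleOption<T> {
        private final T value; // null plays the role of None in this sketch

        ScalaStyleOption(T value) { this.value = value; }

        T getOrElse(Function0<T> defaultValue) {
            return value != null ? value : defaultValue.apply();
        }
    }

    /** Hypothetical stand-in for Guava's Optional. */
    static final class GuavaStyleOptional<T> {
        private final T value; // null plays the role of "absent" in this sketch

        GuavaStyleOptional(T value) { this.value = value; }

        T or(T defaultValue) {
            return value != null ? value : defaultValue;
        }
    }

    /** Option style: the caller must wrap the default in a function object. */
    static String sparkHomeViaOption(ScalaStyleOption<String> home) {
        return home.getOrElse(new Function0<String>() {
            @Override
            public String apply() { return "/opt/spark"; } // placeholder default
        });
    }

    /** Optional style: the caller passes the default value directly. */
    static String sparkHomeViaOptional(GuavaStyleOptional<String> home) {
        return home.or("/opt/spark"); // placeholder default
    }

    public static void main(String[] args) {
        System.out.println(sparkHomeViaOption(new ScalaStyleOption<String>(null)));
        System.out.println(sparkHomeViaOptional(new GuavaStyleOptional<String>(null)));
    }
}
```

Both calls yield the same default; the difference is only in how much
ceremony the Java caller needs to supply it.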

If we switch to exclusively using Guava Optional, we'd have to convert join
results before turning them into JavaRDDs so that we have JavaPairRDD[K,
(V, Optional[W])].  I don't anticipate this being a large performance issue.
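
The per-element conversion could look roughly like the sketch below.  Again
this is self-contained and uses hypothetical stand-in types
(ScalaStyleOption, GuavaStyleOptional) whose method names mirror
scala.Option (isDefined, get) and Guava's Optional (of, absent, isPresent,
get); it is not how Spark actually implements the conversion, and a plain
list stands in for the right-hand side of the join results.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins only: these types mirror method names from scala.Option and
// Guava's Optional but are NOT the real classes.
public class JoinConversionSketch {

    /** Hypothetical stand-in for scala.Option. */
    static final class ScalaStyleOption<T> {
        private final T value; // null plays the role of None

        private ScalaStyleOption(T value) { this.value = value; }

        static <T> ScalaStyleOption<T> some(T value) { return new ScalaStyleOption<T>(value); }
        static <T> ScalaStyleOption<T> none() { return new ScalaStyleOption<T>(null); }

        boolean isDefined() { return value != null; }
        T get() { return value; }
    }

    /** Hypothetical stand-in for Guava's Optional. */
    static final class GuavaStyleOptional<T> {
        private final T value; // null plays the role of "absent"

        private GuavaStyleOptional(T value) { this.value = value; }

        static <T> GuavaStyleOptional<T> of(T value) { return new GuavaStyleOptional<T>(value); }
        static <T> GuavaStyleOptional<T> absent() { return new GuavaStyleOptional<T>(null); }

        boolean isPresent() { return value != null; }
        T get() { return value; }
    }

    /** Converts one Option-style value into an Optional-style value. */
    static <T> GuavaStyleOptional<T> toOptional(ScalaStyleOption<T> option) {
        if (option.isDefined()) {
            return GuavaStyleOptional.of(option.get());
        }
        return GuavaStyleOptional.<T>absent();
    }

    /** Applies the conversion to each right-hand join value, one wrapper allocation per element. */
    static <T> List<GuavaStyleOptional<T>> convertAll(List<ScalaStyleOption<T>> rightSides) {
        List<GuavaStyleOptional<T>> converted = new ArrayList<GuavaStyleOptional<T>>();
        for (ScalaStyleOption<T> option : rightSides) {
            converted.add(toOptional(option));
        }
        return converted;
    }

    public static void main(String[] args) {
        List<ScalaStyleOption<String>> rightSides = new ArrayList<ScalaStyleOption<String>>();
        rightSides.add(ScalaStyleOption.some("w"));
        rightSides.add(ScalaStyleOption.<String>none());
        for (GuavaStyleOptional<String> optional : convertAll(rightSides)) {
            System.out.println(optional.isPresent());
        }
    }
}
```

The cost in this sketch is one small wrapper object per joined record.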

This would be a backwards-incompatible API change and 0.8 seems like the
easiest time to make it.  I'd appreciate any thoughts on whether I should
use Guava Optional everywhere.

