spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From pwendell <...@git.apache.org>
Subject [GitHub] incubator-spark pull request: Adding an option to persist Spark RD...
Date Fri, 14 Feb 2014 05:47:13 GMT
Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/468#issuecomment-35058261
  
    Hey @RongGu I did a pretty thorough review - thanks for adding this!
    
    In the code I put a bunch of lower level comments. Here is some high level feedback:
    
    1. Tests - it would be great if this had more unit tests to verify pieces of functionality.
For instance, creating the temporary directories has a bunch of logic that could be tested.
It also seems like it would be easy to mock the tachyon client and test interactions with
Tachyon. We are trying to move towards better test coverage overall and in this case there
are a lot of easily testable components.
    2. Documentation - there isn't much documentation here. At a minimum the new config options
should be documented (spark.tachyonstore.dir and spark.tachyonmaster.address) in configuration.md.
I would document tachyon as an "experimental" storage level on this page (scala-programming-guide.html).
I'd also add a note on this page that says there is currently experimental support for an
off-process cache (cluster-overview.html). If this is documented then people will use it!
    3. The Tachyon storage level should also be added to Java and Python like the other storage
levels.
    4. Style - I caught a bunch of style errors, but it would be good to fix things like extra
spaces in and comments that don't put a space after the `//`.
    5. The capitalization of tachyon/Tachyon is inconsistent - per @alig let's just keep it
uppercase always: "Tachyon"
    6. Update the maven build to include Tachyon as well.
    7. The amount of recursive dependencies for tachyon is troubling. It would be good if
Tachyon provided a "tachyon-client" artifact which only depended on things needed to talk
to Tachyon. That would probably mean restructuring the build of tachyon. It might also make
sense to mark Tachyon as "provided" in the Spark build and ask people to locally include Tachyon
when running Spark applications. That way the Spark assembly won't include Tachyon so it's
dependencies won't get in the way. I haven't come to a full conclusion on this one yet...


Mime
View raw message