samza-dev mailing list archives

From José Barrueta <j...@stormpath.com>
Subject Samza YarnJobFactory support for https
Date Fri, 22 May 2015 03:03:21 GMT
Hi all,

Once we figured out the problem, we were able to come up with a solution
quite easily.

Basically, we want to be able to set the `yarn.package.path` property to
point to an artifact served over `https`. When we did this, we ran into
this exception:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: https
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2385)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)

First we looked at the actual YARN ResourceManager to make sure it
supported the https file system. After a while we looked at the
YarnJobFactory code and found the current implementation:

class YarnJobFactory extends StreamJobFactory {
  def getJob(config: Config) = {
    // TODO fix this. needed to support http package locations.
    val hConfig = new YarnConfiguration
    hConfig.set("fs.http.impl", classOf[HttpFileSystem].getName)

    new YarnJob(config, hConfig)
  }
}

As I said, once we saw this the issue was easy to fix: we just created
our own YarnJobFactory:

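import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.samza.config.Config;
import org.apache.samza.job.StreamJob;
import org.apache.samza.job.StreamJobFactory;
import org.apache.samza.job.yarn.YarnJob;
import org.apache.samza.util.hadoop.HttpFileSystem;

/**
 * YarnJobFactory is an implementation based on Samza's
 * {@link org.apache.samza.job.yarn.YarnJobFactory} implementation.
 *
 * @since 0.1.0
 */
public class YarnJobFactory implements StreamJobFactory {

    @Override
    public StreamJob getJob(Config config) {

        Configuration yarnConfig = new YarnConfiguration();
        // Register Samza's HttpFileSystem for both the http and https schemes.
        yarnConfig.set("fs.http.impl", HttpFileSystem.class.getName());
        yarnConfig.set("fs.https.impl", HttpFileSystem.class.getName());

        return new YarnJob(config, yarnConfig);
    }
}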

This one supports both the http and https schemes. I noticed the TODO
comment in the current implementation. Is there a way I can contribute to
enhance it? I'm thinking the Samza configuration could specify the scheme
and map it to a FileSystem implementation.
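
To make the idea above a bit more concrete, here is a rough sketch of the
mapping step, not a finished design. The config prefix `yarn.package.fs.`
and the class name FileSystemSchemeMapper are hypothetical; nothing like
them exists in Samza today. The sketch works on a plain Map so it has no
Samza or Hadoop dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Samza. */
final class FileSystemSchemeMapper {

    // Assumed (hypothetical) job config keys of the form
    //   yarn.package.fs.<scheme>.impl=<FileSystem class name>
    static final String PREFIX = "yarn.package.fs.";
    static final String SUFFIX = ".impl";

    /**
     * Translates every yarn.package.fs.<scheme>.impl entry of the job config
     * into the Hadoop property fs.<scheme>.impl with the same value.
     * All other keys are ignored.
     */
    static Map<String, String> toHadoopProperties(Map<String, String> jobConfig) {
        Map<String, String> hadoopProps = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : jobConfig.entrySet()) {
            String key = entry.getKey();
            // The length check rejects keys with an empty scheme.
            if (key.startsWith(PREFIX) && key.endsWith(SUFFIX)
                    && key.length() > PREFIX.length() + SUFFIX.length()) {
                String scheme = key.substring(
                        PREFIX.length(), key.length() - SUFFIX.length());
                hadoopProps.put("fs." + scheme + ".impl", entry.getValue());
            }
        }
        return hadoopProps;
    }
}
```

Inside getJob, each returned entry would then be applied with
yarnConfig.set(name, value) instead of hard-coding the two schemes. As far
as I can tell, Samza's Config is itself a Map<String, String>, so it could
probably be passed in directly.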

Best,

Jose Luis
