crunch-dev mailing list archives

From "Tom De Leu (JIRA)" <>
Subject [jira] [Created] (CRUNCH-622) From.avroFile fails if path not on default filesystem
Date Thu, 15 Sep 2016 20:28:22 GMT
Tom De Leu created CRUNCH-622:

             Summary: From.avroFile fails if path not on default filesystem
                 Key: CRUNCH-622
             Project: Crunch
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.14.0, 0.13.0
            Reporter: Tom De Leu
            Assignee: Josh Wills

    MemPipeline.getInstance().read(From.avroFile(new Path("s3:///something")));

Fails with: 
java.lang.IllegalArgumentException: Wrong FS: s3:/something, expected: file:///

	at org.apache.hadoop.fs.FileSystem.checkPath(
	at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(
	at org.apache.hadoop.fs.FileSystem.isFile(

I noticed this in the From class, method getSchemaFromPath:
      FileSystem fs = FileSystem.get(conf);

Shouldn't that be changed to this?

      FileSystem fs = path.getFileSystem(conf);

We ran into this in a use case where the file was at a valid path on S3 but the Configuration's
default filesystem pointed to HDFS, which I believe should just work.
After some searching, I also found CRUNCH-47, which seems related, but the patch there couldn't
have fixed the From/At/To helpers, as they were introduced later.
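
To illustrate why the proposed change matters, here is a minimal sketch of the two lookup strategies, modelled with java.net.URI only. It is not Hadoop or Crunch code: the class FsResolutionSketch, its method names, and the example URIs (hdfs://namenode:8020, s3://bucket/something) are hypothetical. It assumes only the documented behaviour that FileSystem.get(conf) resolves against the configured default filesystem, while path.getFileSystem(conf) resolves against the path's own URI and falls back to the default when the path has no scheme.

```java
import java.net.URI;

// Simplified model of the two lookups. This is NOT Hadoop code; all names are hypothetical.
class FsResolutionSketch {

    // Models FileSystem.get(conf): the path is never consulted,
    // so the configured default filesystem always wins.
    static String schemeFromDefault(URI defaultFs) {
        return defaultFs.getScheme();
    }

    // Models path.getFileSystem(conf): the path's own scheme wins,
    // and the default is used only when the path has no scheme.
    static String schemeFromPath(URI path, URI defaultFs) {
        String scheme = path.getScheme();
        return scheme != null ? scheme : defaultFs.getScheme();
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020"); // assumed default filesystem
        URI s3Path = URI.create("s3://bucket/something");   // assumed fully qualified path
        URI barePath = URI.create("/data/something");       // path without a scheme

        // Default-based lookup picks the wrong filesystem for the S3 path.
        System.out.println("default lookup:      " + schemeFromDefault(defaultFs));
        // Path-based lookup honours the path's scheme.
        System.out.println("path lookup (s3):    " + schemeFromPath(s3Path, defaultFs));
        // Scheme-less paths still resolve to the default.
        System.out.println("path lookup (bare):  " + schemeFromPath(barePath, defaultFs));
    }
}
```

In this model the default-based lookup hands an S3 path to a filesystem that cannot own it, which corresponds to the "Wrong FS" check failing in the stack trace above, while the path-based lookup keeps working for scheme-less paths.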

This message was sent by Atlassian JIRA
