nutch-dev mailing list archives

From "Sebastian Nagel (JIRA)"
Subject [jira] [Created] (NUTCH-2281) Support non-default FileSystem
Date Fri, 17 Jun 2016 13:32:05 GMT
Sebastian Nagel created NUTCH-2281:

             Summary: Support non-default FileSystem
                 Key: NUTCH-2281
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.12
            Reporter: Sebastian Nagel
             Fix For: 1.13

If a path (input or output) does not belong to the configured default FileSystem, various Nutch
tools may raise an exception like
  Exception in ... java.lang.IllegalArgumentException: Wrong FS: s3a://..., expected: hdfs://...

This is fixed by getting a reference to the FileSystem from the Path object
  FileSystem fs = path.getFileSystem(getConf());
instead of
  FileSystem fs = FileSystem.get(getConf());
A given path (e.g., {{s3a://...}}) may not belong to the default file system ({{hdfs://}}
or {{file://}} in local mode), in which case even simple checks such as {{fs.exists(path)}} will fail.
Cf. {{FileSystem.checkPath(path)}}, and {{FileSystem.get(conf)}}
vs. {{FileSystem.get(URI,conf)}}, which is called by {{Path.getFileSystem(conf)}}.
Note that the FileSystem for input and output may be different, e.g., read from HDFS and write
to S3.
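
The difference between the two lookups can be illustrated with a small, self-contained sketch. It does not use Hadoop: {{WrongFsSketch}}, {{SketchFs}}, {{defaultFs}} and {{fsForPath}} are hypothetical names, the host and bucket names are made up, and the scheme/authority comparison is only a simplified model of the kind of check that produces the "Wrong FS" message. The real {{FileSystem.checkPath(path)}} may differ in detail.

```java
import java.net.URI;
import java.util.Objects;

// Simplified model only: these classes are hypothetical and are NOT Hadoop code.
class WrongFsSketch {

    /** Hypothetical stand-in for a file system bound to one scheme and authority. */
    public static final class SketchFs {
        public final URI base;

        public SketchFs(URI base) {
            this.base = base;
        }

        /** Rejects qualified paths whose scheme or authority differs from this file system's. */
        public void checkPath(URI path) {
            if (path.getScheme() == null) {
                return; // scheme-less paths are resolved against this file system
            }
            boolean sameScheme = path.getScheme().equalsIgnoreCase(base.getScheme());
            boolean sameAuthority = Objects.equals(path.getAuthority(), base.getAuthority());
            if (!sameScheme || !sameAuthority) {
                throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + base);
            }
        }
    }

    /** Analogue of FileSystem.get(conf): always the configured default. */
    public static SketchFs defaultFs(URI defaultUri) {
        return new SketchFs(defaultUri);
    }

    /** Analogue of path.getFileSystem(conf): chosen from the path's own scheme and authority. */
    public static SketchFs fsForPath(URI path, URI defaultUri) {
        if (path.getScheme() == null) {
            return defaultFs(defaultUri);
        }
        String authority = path.getAuthority() == null ? "" : path.getAuthority();
        return new SketchFs(URI.create(path.getScheme() + "://" + authority + "/"));
    }

    public static void main(String[] args) {
        URI defaultUri = URI.create("hdfs://namenode:8020/");   // assumed default file system
        URI output = URI.create("s3a://bucket/crawl/crawldb");  // assumed output path

        // Resolved from the path itself: the check passes.
        fsForPath(output, defaultUri).checkPath(output);

        // Resolved from the default only: the check rejects the S3 path.
        try {
            defaultFs(defaultUri).checkPath(output);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, resolving the file system once per path (input and output separately) is what keeps a job that reads from one file system and writes to another from failing the check.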

