nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2281) Support non-default FileSystem
Date Thu, 06 Apr 2017 10:01:41 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958684#comment-15958684 ]

ASF GitHub Bot commented on NUTCH-2281:
---------------------------------------

sebastian-nagel closed pull request #119: NUTCH-2281 Support non-default FileSystem
URL: https://github.com/apache/nutch/pull/119
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Support non-default FileSystem
> ------------------------------
>
>                 Key: NUTCH-2281
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2281
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.12
>            Reporter: Sebastian Nagel
>             Fix For: 1.14
>
>
> If a path (input or output) does not belong to the configured default FileSystem, various Nutch tools may raise an exception like
> {noformat}
>   Exception in ... java.lang.IllegalArgumentException: Wrong FS: s3a://..., expected: hdfs://...
> {noformat}
> This is fixed by getting a reference to the FileSystem from the Path object
> {noformat}
>   FileSystem fs = path.getFileSystem(getConf());
> {noformat}
> instead of
> {noformat}
>   FileSystem fs = FileSystem.get(getConf());
> {noformat}
> A given path (e.g., {{s3a://...}}) may not belong to the default file system ({{hdfs://}} or {{file://}} in local mode), in which case even simple checks such as {{fs.exists(path)}} will fail. Cf. [FileSystem.checkPath(path)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#checkPath(org.apache.hadoop.fs.Path)] and [FileSystem.get(conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#get(org.apache.hadoop.conf.Configuration)] vs. [FileSystem.get(URI,conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#get(java.net.URI,%20org.apache.hadoop.conf.Configuration)], which is called by [Path.getFileSystem(conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/Path.html#getFileSystem%28org.apache.hadoop.conf.Configuration%29].
 
> Note that the FileSystem for input and output may be different, e.g., read from HDFS and write to S3.
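
The difference between a single default file system and a per-path file system can be illustrated without Hadoop on the classpath. The sketch below is a simplified stand-in, not Hadoop's implementation: the class WrongFsSketch and its methods belongsTo, checkPath and fileSystemFor are hypothetical, as are the host "namenode:8020" and the bucket "my-bucket". It only compares URI scheme and authority, which is assumed here to be the core of what FileSystem.checkPath(path) does; the real check is more involved.

```java
import java.net.URI;

/**
 * Simplified, hypothetical model of the scheme/authority comparison behind
 * the "Wrong FS" error. This is not Hadoop code.
 */
final class WrongFsSketch {

    /** Returns true if the path can be served by the file system at fsUri. */
    static boolean belongsTo(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return true; // scheme-less paths resolve against the given file system
        }
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
            return false;
        }
        String authority = path.getAuthority();
        return authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
    }

    /** Rejects a path that does not belong to the given file system. */
    static void checkPath(URI fsUri, URI path) {
        if (!belongsTo(fsUri, path)) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    /** Derives the file system URI from the path itself, falling back to the default. */
    static URI fileSystemFor(URI defaultFs, URI path) {
        if (path.getScheme() == null) {
            return defaultFs;
        }
        String authority = path.getAuthority();
        // "scheme://" alone is not a valid URI, so use "scheme:///" when there is no authority.
        return URI.create(path.getScheme() + "://" + (authority == null ? "/" : authority));
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020");        // hypothetical default FS
        URI input = URI.create("hdfs://namenode:8020/crawl/crawldb");
        URI output = URI.create("s3a://my-bucket/crawl/segments"); // hypothetical bucket

        // Analogue of FileSystem.get(conf): one default file system for every path.
        try {
            checkPath(defaultFs, output);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // Analogue of path.getFileSystem(conf): each path selects its own file system.
        checkPath(fileSystemFor(defaultFs, input), input);
        checkPath(fileSystemFor(defaultFs, output), output);
        System.out.println("input and output each resolved to their own file system");
    }
}
```

In this model, checking the S3 output path against the default HDFS file system fails, while deriving the file system from each path lets input and output live on different file systems, which is the case the note above describes.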



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
