nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2494) Fetcher: java.lang.IllegalArgumentException: Wrong FS: s3
Date Wed, 17 Jan 2018 10:27:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328597#comment-16328597 ]

ASF GitHub Bot commented on NUTCH-2494:
---------------------------------------

sebastian-nagel commented on issue #274: fix for NUTCH-2494 contributed by ashrafulsust
URL: https://github.com/apache/nutch/pull/274#issuecomment-358261794
 
 
   +1 Good catch. Solution looks good! Follows the current definition of [checkOutputSpecs(...)](http://hadoop.apache.org/docs/r2.7.4/api/org/apache/hadoop/mapred/OutputFormat.html#checkOutputSpecs-org.apache.hadoop.fs.FileSystem-org.apache.hadoop.mapred.JobConf-).
   
   Could you apply the [Nutch Eclipse Code Formatting rules](https://github.com/apache/nutch/blob/master/eclipse-codeformat.xml) and update the PR? If not, let us know. Thanks!
   



> Fetcher: java.lang.IllegalArgumentException: Wrong FS: s3
> ---------------------------------------------------------
>
>                 Key: NUTCH-2494
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2494
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, parser
>    Affects Versions: 1.14
>         Environment: * AWS EMR Cluster
> * AWS S3
> * Hadoop 2.2.7
>            Reporter: Ashraful Islam
>            Priority: Major
>         Attachments: NUTCH-2494.patch
>
>
> We are using Nutch 1.14 on an AWS EMR cluster (Hadoop 2.2.7) and are trying to use S3 as the main storage.
> We are using the command below.
> {code}
> bin/crawl -s s3://nutch-emr-cluster/test/crawl/urls s3://nutch-emr-cluster/test/crawl 1
> {code}
> The Injector and Generator completed without errors and wrote their data to S3, but the Fetcher and Parser steps fail with an IllegalArgumentException.
> Full stack trace:
> {code:java}
> 18/01/11 07:16:52 ERROR fetcher.Fetcher: Fetcher: java.lang.IllegalArgumentException: Wrong FS: s3://nutch-emr-cluster/test/crawl/segments/20180111071602/crawl_fetch, expected: hdfs://ip-172-31-26-180.eu-west-1.compute.internal:8020
> 	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:653)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
> 	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1430)
> 	at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:55)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
> 	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
> 	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> 	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
> 	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
> 	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> 	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
> 	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870)
> 	at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:486)
> 	at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:521)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:495)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
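
The trace above shows `FileSystem.exists` being invoked on the cluster's default `DistributedFileSystem` with an `s3://` path from `FetcherOutputFormat.checkOutputSpecs`, and `FileSystem.checkPath` rejecting it. The following is a minimal, self-contained sketch of that mechanism in plain Java. It is not Hadoop or Nutch code: `SketchFs`, `defaultFs`, `fsFor`, and the host name `namenode` are hypothetical stand-ins, and the check is a simplification of what Hadoop does. It assumes the usual remedy, which is to resolve the filesystem from the output path itself (in Hadoop, `Path.getFileSystem(Configuration)`) instead of using the job's default filesystem; whether the attached patch does exactly this is an assumption.

```java
import java.net.URI;
import java.util.Objects;

public class WrongFsSketch {

    /** Hypothetical stand-in for a filesystem bound to one scheme and authority. */
    public static final class SketchFs {
        final URI root;

        SketchFs(URI root) {
            this.root = root;
        }

        /** Simplified analogue of Hadoop's FileSystem.checkPath: reject paths of another filesystem. */
        void checkPath(URI path) {
            boolean sameScheme = path.getScheme() == null
                    || path.getScheme().equalsIgnoreCase(root.getScheme());
            boolean sameAuthority = path.getAuthority() == null
                    || Objects.equals(path.getAuthority(), root.getAuthority());
            if (!(sameScheme && sameAuthority)) {
                throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + root);
            }
        }

        /** Validates the path first; this sketch stores nothing, so it then reports false. */
        public boolean exists(URI path) {
            checkPath(path);
            return false;
        }
    }

    /** Analogue of asking for the job's default filesystem, regardless of the path. */
    public static SketchFs defaultFs(URI defaultUri) {
        return new SketchFs(defaultUri);
    }

    /** Analogue of resolving the filesystem from the path's own scheme and authority. */
    public static SketchFs fsFor(URI path, URI defaultUri) {
        if (path.getScheme() == null) {
            return new SketchFs(defaultUri); // unqualified paths fall back to the default
        }
        return new SketchFs(path.resolve("/")); // keep scheme and authority, drop the rest
    }

    public static void main(String[] args) {
        URI defaultUri = URI.create("hdfs://namenode:8020"); // hypothetical default FS
        URI out = URI.create("s3://nutch-emr-cluster/test/crawl/segments/20180111071602/crawl_fetch");

        try {
            defaultFs(defaultUri).exists(out); // mirrors the failing call in the trace
        } catch (IllegalArgumentException e) {
            System.out.println("default FS rejected the path: " + e.getMessage());
        }

        // Resolving the filesystem from the path avoids the scheme mismatch.
        System.out.println("path-derived FS check passed, exists=" + fsFor(out, defaultUri).exists(out));
    }
}
```

The design point is that the output path, not the cluster default, decides which filesystem answers the existence check, so an `hdfs://` default and an `s3://` output directory can coexist in one job.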


