nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2494) Fetcher: java.lang.IllegalArgumentException: Wrong FS: s3
Date Thu, 18 Jan 2018 06:53:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330144#comment-16330144 ]

ASF GitHub Bot commented on NUTCH-2494:
---------------------------------------

ashrafulsust commented on issue #274: fix for NUTCH-2494 contributed by ashrafulsust
URL: https://github.com/apache/nutch/pull/274#issuecomment-358554599
 
 
   @sebastian-nagel Thanks. I have applied the Nutch Eclipse code formatting rules and updated the PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Fetcher: java.lang.IllegalArgumentException: Wrong FS: s3
> ---------------------------------------------------------
>
>                 Key: NUTCH-2494
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2494
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, parser
>    Affects Versions: 1.14
>         Environment: * AWS EMR Cluster
> * AWS S3
> * Hadoop 2.2.7
>            Reporter: Ashraful Islam
>            Priority: Major
>         Attachments: NUTCH-2494.patch
>
>
> We are using Nutch 1.14 on an AWS EMR cluster (Hadoop 2.2.7) and are trying to use S3 as the main storage.
> We run the following command:
> {code}
> bin/crawl -s s3://nutch-emr-cluster/test/crawl/urls s3://nutch-emr-cluster/test/crawl 1
> {code}
> The Injector and Generator steps completed without errors and wrote their data to S3. In the Fetcher and Parser steps, however, we get an IllegalArgumentException.
> Full stack trace:
> {code:java}
> 18/01/11 07:16:52 ERROR fetcher.Fetcher: Fetcher: java.lang.IllegalArgumentException: Wrong FS: s3://nutch-emr-cluster/test/crawl/segments/20180111071602/crawl_fetch, expected: hdfs://ip-172-31-26-180.eu-west-1.compute.internal:8020
> 	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:653)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
> 	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1430)
> 	at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:55)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
> 	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
> 	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> 	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
> 	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
> 	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> 	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
> 	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870)
> 	at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:486)
> 	at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:521)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:495)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
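
For readers following the trace: the exception is raised in FileSystem.checkPath, reached from FetcherOutputFormat.checkOutputSpecs through DistributedFileSystem, which means an s3:// path was handed to the HDFS file system object. The sketch below is only a simplified, standard-library stand-in for that kind of scheme/authority comparison. It is not Hadoop's actual implementation, and the class and method names (WrongFsSketch, belongsTo, checkPath) are hypothetical. In Hadoop itself, the usual way to avoid this class of error is to obtain the file system from the path (Path.getFileSystem(Configuration)) rather than from the default configuration (FileSystem.get(Configuration)); whether the attached patch does exactly that is not shown in this message.

```java
import java.net.URI;

public class WrongFsSketch {

    // Hypothetical helper: does `path` belong to the file system rooted at `fsUri`?
    // Simplified stand-in for the comparison behind "Wrong FS"; not Hadoop's code.
    public static boolean belongsTo(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return true; // a scheme-less path is resolved against the file system itself
        }
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
            return false; // e.g. an s3 path checked against an hdfs file system
        }
        String authority = path.getAuthority();
        return authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
    }

    // Hypothetical helper: reject paths that belong to a different file system.
    public static void checkPath(URI fsUri, URI path) {
        if (!belongsTo(fsUri, path)) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        // Values taken from the log line in the report.
        URI defaultFs = URI.create("hdfs://ip-172-31-26-180.eu-west-1.compute.internal:8020");
        URI output = URI.create(
            "s3://nutch-emr-cluster/test/crawl/segments/20180111071602/crawl_fetch");

        try {
            // Mirrors the failing case: the output path is checked against the default FS.
            checkPath(defaultFs, output);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // Checking against the file system that owns the path does not throw.
        checkPath(URI.create("s3://nutch-emr-cluster"), output);
    }
}
```

The point of the sketch is that the check depends on which file system object performs it, not on whether the path itself is valid.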



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
