nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2706) -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob
Date Fri, 03 May 2019 21:32:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832848#comment-16832848 ]

ASF GitHub Bot commented on NUTCH-2706:
---------------------------------------

sebastian-nagel commented on pull request #453: NUTCH-2706 NUTCH-2650 -addBinaryContent -base64 flag can cause "String length must be a multiple of four" error in IndexingJob
URL: https://github.com/apache/nutch/pull/453
   
   - Use a conversion to base64 encoding that works with various versions of the commons-codec library (1.4 and 1.11) and never returns a chunked string.
   
   Successfully tested with the options `-addBinaryContent -base64` in both local and pseudo-distributed mode.
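
To illustrate why a chunked string matters here, the following is a minimal sketch, not the code from the pull request. It assumes the JDK's `java.util.Base64` as a stand-in for commons-codec: the MIME encoder plays the role of a "chunked" encoder (a line break after every 76 characters), and the basic decoder plays the role of a strict consumer such as the one rejecting the field in the reported error. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64ChunkingSketch {

  /** Encodes on a single line; the result length is always a multiple of four. */
  static String encodeUnchunked(byte[] data) {
    return Base64.getEncoder().encodeToString(data);
  }

  /** Encodes MIME-style: a CRLF is inserted after every 76 characters. */
  static String encodeChunked(byte[] data) {
    return Base64.getMimeEncoder().encodeToString(data);
  }

  /** Returns true if a strict decoder (no line breaks tolerated) accepts the input. */
  static boolean strictDecoderAccepts(String encoded) {
    try {
      Base64.getDecoder().decode(encoded);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // 100 bytes encode to more than 76 base64 characters, so chunking kicks in.
    byte[] content = new byte[100];
    Arrays.fill(content, (byte) 'a');

    String unchunked = encodeUnchunked(content);
    String chunked = encodeChunked(content);

    System.out.println("unchunked length % 4 = " + (unchunked.length() % 4));
    System.out.println("chunked length % 4   = " + (chunked.length() % 4));
    System.out.println("strict decoder accepts unchunked: " + strictDecoderAccepts(unchunked));
    System.out.println("strict decoder accepts chunked:   " + strictDecoderAccepts(chunked));
  }
}
```

The inserted line breaks make the chunked string's length no longer a multiple of four, which is the kind of input a strict base64 consumer refuses.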
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob
> ------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2706
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2706
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.15
>         Environment: Solr:7.3.1
> Nutch: 1.15
>            Reporter: Prajeeth Emanuel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.16
>
>
> When running the following crawl command:
> bin/crawl -i -s /user/xxxx/seed /user/xxxx/test-crawl-8 3
> with -addBinaryContent and -base64 added to the index command in the crawl script,
> I get this error:
> 2019-04-04 04:10:43,702 svnNumber= clientHw="" userId="" actionKpi="" [main] WARN org.apache.hadoop.mapred.YarnChild - Exception running child : org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: ERROR: [doc=73ad5e05e49054efa258e7c54ae9b9ee] Error adding field 'binaryContent'='PCFET0NUWVBFIGh0bWw+DQo8aHRtbCBsYW5nPSJlbiI+DQo8aGVhZD4NCgk8bWV0YSBodHRwLWVx...
>  
> ...
>  
> msg=String length must be a multiple of four.
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:559)
>     at org.apache.nutch.indexer.IndexWriters.commit(IndexWriters.java:251)
>     at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:47)
>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:550)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:629)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>  
> I have also seen https://issues.apache.org/jira/browse/NUTCH-2186. I am opening a new ticket, as mentioned in the comments, because I have a different environment.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
