nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (NUTCH-2706) -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob
Date Fri, 24 May 2019 13:38:00 GMT


ASF GitHub Bot commented on NUTCH-2706:

sebastian-nagel commented on pull request #453: NUTCH-2706 NUTCH-2650 -addBinaryContent -base64
flag can cause "Strin…

> -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob
> ------------------------------------------------------------------------------------------------
>                 Key: NUTCH-2706
>                 URL:
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.15
>         Environment: Solr: 7.3.1
>                      Nutch: 1.15
>            Reporter: Prajeeth Emanuel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.16
> When running the following crawl command:
>
>     bin/crawl -i -s /user/xxxx/seed /user/xxxx/test-crawl-8 3
>
> with the index command in the crawl script invoked with -addBinaryContent and -base64,
> I get this error:
> 2019-04-04 04:10:43,702 svnNumber= clientHw="" userId="" actionKpi="" [main] WARN org.apache.hadoop.mapred.YarnChild - Exception running child : org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: ERROR: [doc=73ad5e05e49054efa258e7c54ae9b9ee] Error adding field 'binaryContent'='PCFET0NUWVBFIGh0bWw+DQo8aHRtbCBsYW5nPSJlbiI+DQo8aGVhZD4NCgk8bWV0YSBodHRwLWVx...
> ...
> msg=String length must be a multiple of four.
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(
>     at
>     at org.apache.nutch.indexer.IndexWriters.commit(
>     at org.apache.nutch.indexer.IndexerOutputFormat$1.close(
>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(
>     at
>     at org.apache.hadoop.mapred.YarnChild$
>     at Method)
>     at
>     at
>     at org.apache.hadoop.mapred.YarnChild.main(
> I see this as well. I am opening a new ticket, as mentioned in the comments,
> because I have a different environment.
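
The "String length must be a multiple of four" message matches a general property of padded Base64: every 3 input bytes become 4 output characters, so a padded, single-line Base64 string always has a length divisible by four. The following is a minimal sketch of that property only. It assumes nothing about how Nutch encodes the field or how Solr decodes it; the class name `Base64LengthCheck`, its helper methods, and the 17-byte sample are hypothetical, and it uses the JDK's `java.util.Base64` purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64LengthCheck {

    // Padded Base64 emits 4 characters per 3 input bytes, so a valid
    // single-line padded string has a length divisible by four.
    static boolean isMultipleOfFour(String encoded) {
        return encoded.length() % 4 == 0;
    }

    // Standard encoder: appends '=' padding to complete the last group.
    static String padded(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Same alphabet, but the trailing '=' padding is omitted.
    static String unpadded(byte[] raw) {
        return Base64.getEncoder().withoutPadding().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Hypothetical 17-byte sample, not the content from the report.
        byte[] raw = "<!DOCTYPE html>\r\n".getBytes(StandardCharsets.UTF_8);

        String withPad = padded(raw);
        String noPad = unpadded(raw);

        System.out.println(withPad + " length=" + withPad.length()
                + " multipleOfFour=" + isMultipleOfFour(withPad));
        System.out.println(noPad + " length=" + noPad.length()
                + " multipleOfFour=" + isMultipleOfFour(noPad));
    }
}
```

For this sample the padded form is 24 characters long and the unpadded form is 23, so only the padded one would pass a strict length check. Anything else that changes the string length (dropped padding, inserted line breaks, truncation) could trigger the same kind of rejection; which of these applies to the reported failure is not stated in this message.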
