gora-dev mailing list archives

From Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>
Subject Re: Nutch crawler issue with more depth value
Date Thu, 24 Jan 2019 08:46:26 GMT
Hi there,

Can I ask which backend you are using?
If it is HBase, then you have to update the maximum KeyValue size setting,
hbase.client.keyvalue.maxsize. It lives in the hbase-site.xml file, and its
default value is 10 MB (10485760 bytes):

<property>
<name>hbase.client.keyvalue.maxsize</name>
<value>10485760</value>
</property>
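For example, an hbase-site.xml override that raises the limit might look like
the sketch below. The 50 MB value (52428800 bytes) is only an illustrative
assumption; choose a size larger than the biggest single cell your crawl
writes. According to the HBase default configuration, a value of zero or less
disables the check entirely. Because this check is done by the HBase client
when it validates a Put, the copy of hbase-site.xml on Nutch's classpath
(presumably runtime/local/conf for a local runtime) is likely the one that
needs the change.

```xml
<!-- hbase-site.xml: merge the property into the existing <configuration> element -->
<configuration>
  <property>
    <name>hbase.client.keyvalue.maxsize</name>
    <!-- Example value (an assumption): 50 * 1024 * 1024 = 52428800 bytes.
         A value of 0 or less disables the size check. -->
    <value>52428800</value>
  </property>
</configuration>
```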

I am also copying the Gora mailing list, as they might have alternative
solutions.


Best,

Renato M.

On Wed, 23 Jan 2019 at 19:37, Gomathi Palanisamy (<
gpalanisamy@worldbankgroup.org>) wrote:
>
> Hi,
>
> We are using the Nutch 2.3.1 source distribution (2.3.1-src) and running
> the crawl command with a depth of 200, but after a few iterations, fetching
> fails with the runtime exception below.
>
> java.lang.RuntimeException: java.lang.IllegalArgumentException: KeyValue size too large
> Exception at GoraRecordWriter.class while writing to datastore: KeyValue size too large
>
> Crawl command:
>
> /Data/Apache/apache-nutch-2.3.1/runtime/local/bin/crawl /Data/Apache/apache-nutch-2.3.1/runtime/local/urls crawl-nutch http://localhost:9200/test/ 200
>
> Any suggestions?
>
> Thanks.
