nutch-dev mailing list archives

From "Shailendra Mudgal" <>
Subject OOM error during parsing with nekohtml
Date Mon, 16 Jul 2007 10:04:39 GMT
Hi All,

We are getting an OOM error during the processing of . We have also applied
the Nutch-497 patch to our source code, but the error actually occurs in
the parse method.
Does anybody have any idea what is causing this? Here is the complete stack trace:

java.lang.OutOfMemoryError: Java heap space
	at java.lang.String.toUpperCase(
	at java.lang.String.toUpperCase(
	at org.cyberneko.html.filters.NamespaceBinder.bindNamespaces(
	at org.cyberneko.html.filters.NamespaceBinder.startElement(
	at org.cyberneko.html.HTMLTagBalancer.callStartElement(
	at org.cyberneko.html.HTMLTagBalancer.startElement(
	at org.cyberneko.html.HTMLTagBalancer.startElement(
	at org.cyberneko.html.HTMLScanner$ContentScanner.scanStartElement(
	at org.cyberneko.html.HTMLScanner$ContentScanner.scan(
	at org.cyberneko.html.HTMLScanner.scanDocument(
	at org.cyberneko.html.HTMLConfiguration.parse(
	at org.cyberneko.html.HTMLConfiguration.parse(
	at org.cyberneko.html.parsers.DOMFragmentParser.parse(
	at org.apache.nutch.parse.html.HtmlParser.parseNeko(
	at org.apache.nutch.parse.html.HtmlParser.parse(
	at org.apache.nutch.parse.html.HtmlParser.getParse(
	at org.apache.nutch.parse.ParseUtil.parse(
	at org.apache.hadoop.mapred.TaskTracker$Child.main(

