nutch-dev mailing list archives

From "Shailendra Mudgal" <mudgal.shailen...@gmail.com>
Subject OOM error during parsing with nekohtml
Date Mon, 16 Jul 2007 10:04:39 GMT
Hi All,

We are getting an OutOfMemoryError while processing
http://www.fotofinity.com/cgi-bin/homepages.cgi . We have already applied the
Nutch-497 patch to our source code, but the error actually occurs in
the parse method.
Does anybody have any idea what might be causing this? Here is the complete stack trace:

java.lang.OutOfMemoryError: Java heap space
	at java.lang.String.toUpperCase(String.java:2637)
	at java.lang.String.toUpperCase(String.java:2660)
	at org.cyberneko.html.filters.NamespaceBinder.bindNamespaces(NamespaceBinder.java:443)
	at org.cyberneko.html.filters.NamespaceBinder.startElement(NamespaceBinder.java:252)
	at org.cyberneko.html.HTMLTagBalancer.callStartElement(HTMLTagBalancer.java:1009)
	at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:639)
	at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:646)
	at org.cyberneko.html.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2343)
	at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1820)
	at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
	at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
	at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
	at org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
	at org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:265)
	at org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:229)
	at org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:168)
	at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:84)
	at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:75)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)


Regards,
Shailendra
