Can anyone help me with this problem?
Here is my problem:
I want to get the source code of the
hits I get using the Nutch crawler. I am not sure whether Nutch stores the
content of a web page (i.e., the actual source code of the web page) in the
crawled results. I am afraid that it does not!
If Nutch does store these contents, do you have any idea how I can retrieve
them using the Nutch libraries? I have my eye on these classes:
NutchBean, Hit, HitDetails. Maybe I can find some method of these
classes that gives me the contents of the page. So far I have had no luck with these classes, as no method seems to return the content of the web page.
Any kind of help is appreciated.