nutch-dev mailing list archives

From Taichi Ho <heyuehengtai...@gmail.com>
Subject Integrating Selenium with Nutch
Date Sat, 03 Oct 2015 03:53:41 GMT
Hi, all.

I have been experimenting with Selenium and Nutch following the link:
https://github.com/apache/nutch/tree/trunk/src/plugin/protocol-interactiveselenium

I have been able to post a form using my custom handler, but the URL that
the browser is redirected to after the form is posted doesn't seem to enter
Nutch's crawldb. Is this the expected behavior?

Also, it seems really slow to open and close Firefox for each URL that is
crawled. Is it possible to do this with multiple threads? I googled and
didn't find any promising answers. Are there any workarounds?

Thank you all.
