incubator-droids-dev mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: can droid crawl this site
Date Mon, 28 Dec 2009 16:17:41 GMT
Hi Ray,

I don't see why Nutch would crash on this link. It just does two
redirects and then returns what looks like standard HTML.

Given the above, Droids, Nutch, etc. should all work fine.
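
If all you need is the HTML inside your own Java code, following the
redirects by hand is not much work. Below is a minimal sketch of that idea.
It is not how Droids or Nutch do it internally. The class name
RedirectFetch, its method names, the limit of 5 hops, the 10 second
timeouts, and the UTF-8 decoding are all assumptions made for illustration.
It uses only java.net.HttpURLConnection from the JDK (Java 9 or later for
readAllBytes) and is untested against the URL in question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: fetches a page and follows redirects manually.
class RedirectFetch {

    // Resolve a (possibly relative) Location header against the URL that sent it.
    static String resolveLocation(String currentUrl, String location) {
        return URI.create(currentUrl).resolve(location).toString();
    }

    // True for the HTTP status codes that redirect via a Location header.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Fetch startUrl, following at most maxRedirects hops, and return the body.
    // Assumption: the body is decoded as UTF-8 regardless of the declared charset.
    static String fetchHtml(String startUrl, int maxRedirects) throws IOException {
        String url = startUrl;
        for (int hop = 0; hop <= maxRedirects; hop++) {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // we follow redirects ourselves
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(10_000);
            try {
                int status = conn.getResponseCode();
                if (isRedirect(status)) {
                    String location = conn.getHeaderField("Location");
                    if (location == null) {
                        throw new IOException("Redirect without Location from " + url);
                    }
                    url = resolveLocation(url, location);
                    continue; // next hop
                }
                if (status != HttpURLConnection.HTTP_OK) {
                    throw new IOException("HTTP " + status + " from " + url);
                }
                try (InputStream in = conn.getInputStream()) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            } finally {
                conn.disconnect();
            }
        }
        throw new IOException("Too many redirects starting from " + startUrl);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RedirectFetch <url>");
            return;
        }
        System.out.println(fetchHtml(args[0], 5));
    }
}
```

The reason for the manual loop is that HttpURLConnection's built-in redirect
handling does not follow a redirect that switches between http and https,
and doing it by hand also lets you log each hop. If the site depends on
cookies set during the redirects, this sketch will not be enough.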

-- Ken

On Dec 28, 2009, at 6:11am, ray lukas wrote:

> Well this link crashes Nutch (a redirection problem, I would guess, but
> I have not proved it). I really just need to get my hands on the HTML,
> and I will feed it into my parsing and indexing systems. For this I just
> need a crawling mechanism that will give me the HTML for these types of
> links. Nutch is wonderful, but for this it is overkill and it is unable
> to crawl these links, so I am looking at Droid as a solution.
>
> I am not archiving anything, I am directly using the html in my java
> application. Can Droid crawl this site and return me the correct html?
> Could someone try it for me on their droid installation and let me know?
>
> Thanks guys..
>
> http://electricservices.smrated.com/servlet/splocal?m=verizonem&xmid=5060691&xmcid=-12026&entry_point_id=3079198
>
> On 07/12/2009, at 16:38, Lukas, Ray wrote:
>
>> I was having a problem with Nutch and would like to see if Droids can
>> help. Could someone just try crawling this web page and tell me if this
>> works on Droids. I need to be able to crawl these web pages and can not
>> seem to do so with Nutch. Would you plug this into your installation and
>> see if Droids can successfully crawl this.
>
> What is your problem with Nutch with this site? What are you trying to
> archive?
>
> salu2
>
>> http://electricservices.smrated.com/servlet/splocal?m=verizonem&xmid=5060691&xmcid=-12026&entry_point_id=3079198
>>
>> if so I will start switching over the project to Droid, if not then I
>> have to keep looking for something that will work.. any advice would be
>> really helpful.
>>
>> I don't know if this is the correct list.. sorry
>>
>> thanks so much Ray

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g




