lucene-java-user mailing list archives

From Ramakrishna <ramakrishna...@dioxe.com>
Subject RE: [ANNOUNCE] Web Crawler
Date Tue, 16 Jul 2013 04:03:40 GMT
So there is no way to crawl a site if it blocks crawling? I have one idea.
It may be a little foolish (it may not work, or I may have to modify the
whole architecture), but I'll share it anyway: what if I use an HTML parser
(Jsoup) instead of the fetcher? An HTML parser easily extracts all the
content of a web page. Can I do this? I think I would then have to write all
the remaining parts (segments, updater, indexer, parser) myself, because I
don't think the HTML parser would work with the existing parts if I replaced
the fetcher with it.
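For context on the idea above: Jsoup can both download a page (it issues an ordinary HTTP GET, so it is a fetcher in its own right) and parse it. The following is a minimal sketch, assuming the jsoup library (package `org.jsoup`) is on the classpath. The class name `JsoupPageGrab`, the methods `extract` and `fetch`, and the user-agent string are hypothetical illustrations, not part of Nutch or any existing crawler. It only pulls out the title, visible text and outlinks; it does not write crawl segments, so the other crawl stages would still need their own code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class JsoupPageGrab {

    /** Parses raw HTML and returns [title, visible body text, absolute outlink URLs...]. */
    static List<String> extract(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> out = new ArrayList<>();
        out.add(doc.title());
        out.add(doc.body().text());
        // Resolve every <a href> against baseUri, the way a crawler collects outlinks.
        for (Element a : doc.select("a[href]")) {
            out.add(a.absUrl("href"));
        }
        return out;
    }

    /** Downloads a live page; Jsoup.connect(...).get() performs the HTTP request itself. */
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .userAgent("example-crawler/0.1") // hypothetical agent string
                .timeout(10_000)                  // milliseconds
                .get();
    }

    public static void main(String[] args) {
        // Offline demo on a static string, so no network access is needed.
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p>Hello <b>world</b> <a href=\"/next\">next</a></p></body></html>";
        System.out.println(extract(html, "http://example.com/"));
    }
}
```

A live page would go through `fetch(url)` and then `doc.outerHtml()` or the same `extract` logic; whether the site allows that request is a separate question from how the page is parsed.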



--
View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607833p4078229.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


