tika-dev mailing list archives

From Julien Nioche <lists.digitalpeb...@gmail.com>
Subject [ANNOUNCEMENT] 0.3 release of crawler-commons
Date Fri, 11 Oct 2013 18:20:22 GMT
Hi,

Just to let you know that we have just released version 0.3 of
crawler-commons. Crawler-commons is a set of reusable Java components that
implement functionality common to any web crawler. These components benefit
from collaboration among various existing web crawler projects and reduce
duplication of effort. The main components are parsers for robots.txt and
sitemap files, domain utilities, and fetchers.

Crawler-commons is used in Bixo and Apache Nutch for parsing robots.txt
files.

 *Project* -> https://code.google.com/p/crawler-commons/

 *Release notes* ->
http://crawler-commons.googlecode.com/svn/tags/crawler-commons-0.3/CHANGES.txt

 *Info about artifacts* ->
http://search.maven.org/#artifactdetails|com.google.code.crawler-commons|crawler-commons|0.3|jar

Thanks!

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
