tika-dev mailing list archives

From Otis Gospodnetic <ogjunk-t...@yahoo.com>
Subject Re: Boilerpipe is nice, but what about readability?
Date Sun, 02 Jan 2011 18:55:50 GMT
Somehow this nice offer didn't seem to attract any responses - 
http://search-lucene.com/m/ZTMKyJXNR92

+1 for this patch.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Benson Margulies <bimargulies@gmail.com>
> To: dev@tika.apache.org
> Sent: Thu, November 4, 2010 9:02:10 AM
> Subject: Boilerpipe is nice, but what about readability?
> 
> I just coded a Java port of the Arc90 Labs 'readability' JavaScript code,
> which has a very strong reputation as a device for grabbing the useful
> content from newsy web pages.
> 
> I could contribute it to Tika, if (a) you wanted it, and (b) there was
> some reasonable way to decide or configure which one to use.
> 
