nutch-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: 1.0 Release?
Date Thu, 20 Nov 2008 16:51:55 GMT
Dennis Kubes wrote:
> What does everybody think of trying to do a Nutch 1.0 release in the 
> next couple of weeks?  I have 8 different patches that are ready to be 
> committed including:
> 
> 1) NUTCH-647: Resolve URLs tool
> 2) NUTCH-635: LinkAnalysis Tool for Nutch
> 3) NUTCH-646: New Indexing framework for Nutch
> 4) NUTCH-594: Serve Nutch search results in XML and JSON
> 5) Custom fields on index and plugins
> 6) Upgrade Nutch to the most recent Hadoop version (0.18.2).
> 7) Upgrade Nutch to the most recent Lucene version (2.4).
> 8) Analysis plugins and improvements to analyzer factory for multiple 
> languages per analysis plugin.  Language identifier.
> 
> I am going to try to get those posted in the next couple of days and 
> committed in the next week.  Are there other major improvements we want 
> to put in before trying to do a 1.0 release for Nutch?  Thoughts and 
> suggestions?

A few recently opened ones that should be easy to fix:

NUTCH-661    errors when the uri contains space characters
NUTCH-657    Estonian N-gram profile has wrong name
NUTCH-652    AdaptiveFetchSchedule#setFetchSchedule doesn't calculate fetch interval correctly
NUTCH-644    RTF parser doesn't compile anymore
NUTCH-643    ClassCastException in PdfParser on encrypted PDF with empty password
NUTCH-636    Http client plug-in https doesn't work on IBM JRE
NUTCH-631    MoreIndexingFilter fails with NoSuchElementException
NUTCH-626    fetcher2 breaks out the domain with db.ignore.external.links set at cross domain redirects
NUTCH-566    Sun's URL class has bug in creation of relative query URLs
NUTCH-542    Null Pointer Exception on getSummary when segment no longer exists
NUTCH-531    Pages with no ContentType cause a Null Pointer exception

And of course this one:

NUTCH-442   	 Integrate Solr/Nutch


We should also review all other open issues marked as Blocker / Major, 
especially those with patches, and take some action on each: fix it, 
mark it as Won't Fix, or postpone it to the next release (the single 
Blocker issue should be fixed).


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

