tika-dev mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: Towards 1.0
Date Fri, 20 May 2011 16:49:08 GMT
Hi Jukka,

A 1.0 release sounds like a great idea.

On my list of things I'd like to straighten out by then:

1. There are still a number of HTML parser issues that I'd like to resolve first.

Many of these are assigned to me :) Hoping to have some free time after mid-June.

2. I've got a vague concern about the current state of running Tika with only a subset of the parsers.

This still seems fragile.

3. Language detection is still pretty lame.

Same as with HTML parsing, many of these are assigned to me.

Hoping I've got time to take a run at using LLR to improve accuracy and performance.

-- Ken

On May 20, 2011, at 9:01am, Jukka Zitting wrote:

> Hi,
> It's been a few months since 0.9 and our Tika in Action book is soon ready
> for print, so I think it's a good time to start planning for the 1.0
> release.
> There are a few odds and ends that I'd still like to sort out in the
> trunk, but overall I think we're pretty much ready for the switch
> from 0.x to 1.x.
> One major issue to be decided is whether we want to follow up with the
> earlier intention of dropping deprecated functionality (like the
> three-argument parse() method) before the 1.0 release. I think we
> should do that and also make some other backwards-incompatible
> cleanups while we're at it. That way we'll have less old baggage to
> carry as we evolve through the 1.x release cycle.
> Another thing to think about is whether we want to do a formal Apache
> press release about Tika reaching 1.0 status.
> BR,
> Jukka Zitting

Ken Krugler
+1 530-210-6378
e l a s t i c   w e b   m i n i n g
