tika-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject including refactored docs from govdocs1 in test suite
Date Mon, 30 Mar 2015 13:15:56 GMT

  As part of TIKA-1512, I found that I can delete all of the contents, including the metadata,
except for one hyperlink in two documents from govdocs1 and still get the proper behavior
-- fail before fix, work after fix.

  These documents are in the public domain.

  Is it ok to include these modified documents in our test suite or should I avoid inclusion?

  Happy to avoid inclusion for the sake of a quick release of 1.8, and then we have time to
discuss/determine the way ahead... unless the answer is obvious.



-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org] 
Sent: Monday, March 30, 2015 7:03 AM
To: dev@tika.apache.org
Subject: RE: [DISCUSS] Tika 1.8 or 1.7.1

Unless there are objections, I'd like these to be resolved before 1.8:

TIKA-1584 -- I'll fix
TIKA-1575 -- Resolved by Konstantin Gribov (thank you!)
TIKA-1512 -- I'll put in a temporary fix so that we don't get IOOBEs, but I'll leave this
open and do some more digging to see if we need to open a ticket at the POI level
TIKA-1511 -- I'll remove "provided" for xerial

TIKA-1549 -- We should thank Toke Eskildsen in CHANGES.txt, no?

I'll have these fixes completed by noon EDT.  Should I run against govdocs1 before or after
the RC?

My last build of Tika app (a few days ago) ballooned to ~43MB, and that's before I add ~3MB
for xerial.  Tika server is now ~48MB.  As of my last build, we are still including ~4MB of
pdfs (README.NLDAS1.pdf and README.NLDAS2.pdf) from the GRIB(?) parser in the tika-app and
tika-server jars.



-----Original Message-----
From: Tyler Palsulich [mailto:tpalsulich@gmail.com] 
Sent: Sunday, March 29, 2015 9:13 AM
To: dev@tika.apache.org
Subject: Re: [DISCUSS] Tika 1.8 or 1.7.1

Once TIKA-1584 and TIKA-1575 are resolved, I'll work up an RC (unless
something else pops up).

Thank you everyone.

On Mar 29, 2015 4:43 AM, "Hong-Thai Nguyen" <thaichat04@gmail.com> wrote:

> +1 for 1.8
> Hong-Thai
> > On 28 Mar 2015, at 16:01, Tyler Palsulich <tpalsulich@apache.org> wrote:
> >
> > Hi Folks,
> >
> > Now that TIKA-1581 (JHighlight licensing issues) is resolved, we need to
> > release a new version of Tika. I'll volunteer to be the release manager
> > again.
> >
> > Should we release this as 1.8 or 1.7.1?
> >
> > Does anyone have any last minute issues they'd like to finish and see in
> > Tika 1.X? I'd like to get the example working with CORS (TIKA-1585 and
> > TIKA-1586). Any others?
> >
> > Have a good weekend,
> > Tyler