tika-dev mailing list archives

From Tyler Palsulich <tpalsul...@gmail.com>
Subject Re: [VOTE] Apache Tika 1.6 release candidate #1
Date Sun, 31 Aug 2014 20:11:12 GMT
Can we get TIKA-1404 in 1.6? Simple, but significant, fix.

Tyler
On Aug 31, 2014 3:54 PM, "Mattmann, Chris A (3980)" <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Ugh, sorry. Maven release plugin issues, going to have to clean some
> stuff up here. Don't mind me folks.
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: <Mattmann>, Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>
> Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
> Date: Sunday, August 31, 2014 12:37 PM
> To: "dev@tika.apache.org" <dev@tika.apache.org>
> Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
>
> >OK RC #2 coming up shortly, just brought the branch up to date in
> >r1621623. Also cleaned up JIRA.
> >
> >Here goes..
> >
> >-----Original Message-----
> >From: <Mattmann>, Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>
> >Date: Thursday, July 31, 2014 11:29 AM
> >To: "dev@tika.apache.org" <dev@tika.apache.org>
> >Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
> >
> >>Guys, based on all the comments here, I am going to roll another
> >>RC #2 to address:
> >>
> >>- Tyler's comment about getting the MicrosoftTranslator fix incorporated.
> >>- Dave's Lingo24 API plugin for translate
> >>- Nick's POI updates
> >>
> >>I'll roll another RC #2 probably on Monday.
> >>
> >>Thanks!
> >>
> >>Cheers,
> >>Chris
> >>
> >>P.S. When I do, I'll diff trunk against the branch and then roll any
> >>trunk updates post branch to 1.6 into the new 1.6 RC #2.
> >>
> >>-----Original Message-----
> >>From: <Mattmann>, Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>
> >>Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
> >>Date: Monday, July 28, 2014 11:45 AM
> >>To: "dev@tika.apache.org" <dev@tika.apache.org>
> >>Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
> >>
> >>>Thanks Sergey - I pushed to 1.7 since we have been having a DISCUSS
> >>>thread for a few weeks about getting 1.6 out. Do you have a patch right
> >>>now for TIKA-1367? If so I'm happy to incorporate it and roll an RC #2
> >>>to get it in. If you don't have a patch yet, would you mind terribly if
> >>>we pushed out 1.6, which already today has a ton of great updates, then
> >>>shortly thereafter rolled a 1.7 (or did so when you finished with
> >>>TIKA-1367)?
> >>>
> >>>Cheers,
> >>>Chris
> >>>
> >>>
> >>>-----Original Message-----
> >>>From: Sergey Beryozkin <sberyozkin@gmail.com>
> >>>Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
> >>>Date: Monday, July 28, 2014 11:38 AM
> >>>To: "dev@tika.apache.org" <dev@tika.apache.org>
> >>>Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
> >>>
> >>>>+0 given that it appears that the tika-parsers dependencies
> >>>>documentation issue has been pushed away. I'm getting confused why.
> >>>>
> >>>>Thanks. Sergey
> >>>>
> >>>>[1] https://issues.apache.org/jira/browse/TIKA-1367
> >>>>
> >>>>On 28/07/14 17:16, Tyler Palsulich wrote:
> >>>>> +1
> >>>>>
> >>>>> OSX 10.9.3, Java 1.7
> >>>>>
> >>>>> Tyler
> >>>>>
> >>>>>
> >>>>> On Mon, Jul 28, 2014 at 7:09 AM, Allison, Timothy B. <tallison@mitre.org> wrote:
> >>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> Linux version 2.6.32-431.5.1.el6.x86_64: Java 1.6 and 1.7
> >>>>>> Windows 7, Java 1.7
> >>>>>>
> >>>>>> I also ran Tika 1.5 and 1.6 rc1 against a random selection of 10,000 docs
> >>>>>> (all formats) plus all available msoffice-x files in govdocs1, yielding
> >>>>>> 10,413 docs.  There were several improvements in text extraction for PDFs
> >>>>>> (mostly spacing) and 4 fewer exceptions (2 ppt, 1 doc and 1 pdf).
> >>>>>>
> >>>>>> There was one regression:
> >>>>>> http://digitalcorpora.org/corp/nps/files/govdocs1/268/268620.pptx
> >>>>>>
> >>>>>> Stacktrace:
> >>>>>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -369073454
> >>>>>>          at java.lang.String.checkBounds(String.java:371)
> >>>>>>          at java.lang.String.<init>(String.java:415)
> >>>>>>          at org.apache.poi.util.StringUtil.getFromCompressedUnicode(StringUtil.java:114)
> >>>>>>          at org.apache.poi.poifs.filesystem.Ole10Native.<init>(Ole10Native.java:163)
> >>>>>>          at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:91)
> >>>>>>          at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:63)
> >>>>>>          at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedOLE(AbstractOOXMLExtractor.java:250)
> >>>>>>          at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
> >>>>>>          at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:115)
> >>>>>>          at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
> >>>>>>          at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
> >>>>>>          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:243)
> >>>>>>
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
> >>>>>> Sent: Monday, July 28, 2014 12:22 AM
> >>>>>> To: dev@tika.apache.org
> >>>>>> Cc: user@tika.apache.org
> >>>>>> Subject: [VOTE] Apache Tika 1.6 release candidate #1
> >>>>>>
> >>>>>> Hi Folks,
> >>>>>>
> >>>>>> A candidate for the Tika 1.6 release is available at:
> >>>>>>
> >>>>>> http://people.apache.org/~mattmann/apache-tika-1.6/rc1/
> >>>>>>
> >>>>>>
> >>>>>> The release candidate is a zip archive of the sources in:
> >>>>>>
> >>>>>>      http://svn.apache.org/repos/asf/tika/tags/1.6/
> >>>>>>
> >>>>>> The SHA1 checksum of the archive is
> >>>>>> 076ad343be56a540a4c8e395746fa4fda5b5b6d3.
> >>>>>>
> >>>>>> A Maven staging repository is available at:
> >>>>>>
> >>>>>> https://repository.apache.org/content/repositories/orgapachetika-1003/
> >>>>>>
> >>>>>> Please vote on releasing this package as Apache Tika 1.6.
> >>>>>> The vote is open for the next 72 hours and passes if a majority of at
> >>>>>> least three +1 Tika PMC votes are cast.
> >>>>>>
> >>>>>>      [ ] +1 Release this package as Apache Tika 1.6
> >>>>>>      [ ] -1 Do not release this package because...
> >>>>>>
> >>>>>> Thank you!
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Chris
> >>>>>>
> >>>>>> P.S. Here is my +1!
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> >
>
>
