tika-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: 1.7 release?
Date Mon, 27 Oct 2014 19:30:02 GMT
Sounds good.  As long as the default behavior remains the same, I'm happy.  I'm going to play
with a combination of your patch and Tyler's and see what the ramifications are for embedded
docs.

To confirm, the OCR integration is fantastic.  Thank you and Tyler!


Best,

           Tim

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov] 
Sent: Friday, October 24, 2014 5:36 PM
To: dev@tika.apache.org
Subject: Re: 1.7 release?

Hey Tim,

What do you think about my existing patch for TIKA-1445, i.e., to
just call all the parsers? I thought I was seeing slow behavior
because of that, but it turned out to be Tesseract and my machine
at the time.

I think my patch for TIKA-1445 may be enough, and we should still get the
metadata. Thoughts?

I honestly think we need to deliver Tesseract in 1.7. We're close. I'll
even take it upon myself to try and experiment with the idea of multiple
parsers being called. I think a simple solution to the metadata key
conflict issue is simply to have a policy to add values (by default) and
replace if a property is set in ParseContext. Some simple updates to
CompositeParser would allow this.
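
To make the proposed policy concrete, here is a minimal sketch of "add by
default, replace if asked". It is only an illustration under stated
assumptions: the class MetadataMergeSketch, the ConflictPolicy enum, and the
merge method are hypothetical names, metadata is modeled as a plain
Map<String, List<String>> rather than Tika's Metadata class, and the policy is
passed as an argument where the proposal would read it from ParseContext.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not Tika's actual CompositeParser or Metadata API. */
public class MetadataMergeSketch {

    /** Hypothetical switch; in the proposal it would be looked up in ParseContext. */
    enum ConflictPolicy { ADD, REPLACE }

    /**
     * Merges one parser's metadata into the metadata accumulated so far.
     * ADD appends the new values to any existing key; REPLACE overwrites them.
     */
    static void merge(Map<String, List<String>> target,
                      Map<String, List<String>> fromParser,
                      ConflictPolicy policy) {
        for (Map.Entry<String, List<String>> entry : fromParser.entrySet()) {
            if (policy == ConflictPolicy.REPLACE) {
                // Later parser wins: drop whatever earlier parsers reported.
                target.put(entry.getKey(), new ArrayList<>(entry.getValue()));
            } else {
                // Default: keep earlier values and append the new ones.
                target.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                      .addAll(entry.getValue());
            }
        }
    }
}
```

With ADD, a key reported by both an image parser and an OCR parser keeps both
values; with REPLACE, the value from the parser that ran later wins.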

Thoughts?

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: "Allison, Timothy B." <tallison@mitre.org>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Friday, October 24, 2014 at 2:24 PM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Subject: RE: 1.7 release?

>Sorry for coming late to the game on the implications of TIKA-1445.  I
>don't want to hold up the release of 1.7.
>
>However, would it be possible to return to the legacy default behavior of
>extracting metadata from images?
>
>We can then document on the OCR parser page on the wiki that you need to
>install Tesseract _and_ make a change in the parser/mime config file. If
>you want this new capability, it will take a small bit of work until we
>solve TIKA-1445.
>
>I worry that the current behavior of 1.7 would be surprising to most
>non-dev users (well, even to at least one dev :) ).
>
>Cheers,
>  
>          Tim
>
>________________________________________
>From: Oleg Tikhonov [olegtikhonov@gmail.com]
>Sent: Friday, October 24, 2014 2:24 PM
>To: dev@tika.apache.org
>Subject: Re: 1.7 release?
>
>Hi Tyler,
>don't mention it.
>
>Cheers,
>Oleg
>On Oct 24, 2014 8:02 PM, "Tyler Palsulich" <tpalsulich@gmail.com> wrote:
>
>> Thank you for the help, Oleg! I just resolved TIKA-1422. So, are there any
>> other issues anyone would like to resolve before a new release?
>>
>> Thanks,
>> Tyler
>>
>> On Tue, Oct 21, 2014 at 2:42 AM, Oleg Tikhonov <olegtikhonov@gmail.com>
>> wrote:
>>
>> > Sorry!!!
>> >
>> > On Tue, Oct 21, 2014 at 9:37 AM, Mattmann, Chris A (3980) <
>> > chris.a.mattmann@jpl.nasa.gov> wrote:
>> >
>> > > Thanks Oleg, will try tomorrow, Los Angeles time!
>> > >
>> > > -----Original Message-----
>> > > From: Oleg Tikhonov <oleg@apache.org>
>> > > Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
>> > > Date: Monday, October 20, 2014 at 11:20 PM
>> > > To: "dev@tika.apache.org" <dev@tika.apache.org>
>> > > Subject: Re: 1.7 release?
>> > >
>> > > >Please give the newest patch a try.
>> > > >Cheers,
>> > > >Oleg
>> > > >
>> > > >On Tue, Oct 21, 2014 at 9:08 AM, Oleg Tikhonov <olegtikhonov@gmail.com>
>> > > >wrote:
>> > > >
>> > > >> Taken. Thanks. In progress ...
>> > > >>
>> > > >> On Tue, Oct 21, 2014 at 8:54 AM, Mattmann, Chris A (3980) <
>> > > >> chris.a.mattmann@jpl.nasa.gov> wrote:
>> > > >>
>> > > >>> Trunk is the current checkout/branch:
>> > > >>>
>> > > >>> http://svn.apache.org/repos/asf/tika/trunk
>> > > >>>
>> > > >>> -----Original Message-----
>> > > >>> From: Oleg Tikhonov <olegtikhonov@gmail.com>
>> > > >>> Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
>> > > >>> Date: Monday, October 20, 2014 at 10:16 PM
>> > > >>> To: "dev@tika.apache.org" <dev@tika.apache.org>
>> > > >>> Subject: Re: 1.7 release?
>> > > >>>
>> > > >>> >Hi, I can try this out.
>> > > >>> >What is a trunk?
>> > > >>> >
>> > > >>> >
>> > > >>> >Thanks,
>> > > >>> >Oleg
>> > > >>> >
>> > > >>> >On Tue, Oct 21, 2014 at 6:21 AM, Mattmann, Chris A (3980) <
>> > > >>> >chris.a.mattmann@jpl.nasa.gov> wrote:
>> > > >>> >
>> > > >>> >> Hmm any idea why this is failing on Windows? Tyler P. and
>> > > >>> >> I were talking the other day - maybe we shouldn't run the
>> > > >>> >> tests from TIKA-1422 unless Tesseract is installed? Thoughts?
>> > > >>> >>
>> > > >>> >> -----Original Message-----
>> > > >>> >> From: Hong-Thai Nguyen <thaichat04@gmail.com>
>> > > >>> >> Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
>> > > >>> >> Date: Thursday, October 16, 2014 at 2:03 AM
>> > > >>> >> To: "dev@tika.apache.org" <dev@tika.apache.org>
>> > > >>> >> Subject: Re: 1.7 release?
>> > > >>> >>
>> > > >>> >> >Hi Andrzej,
>> > > >>> >> >
>> > > >>> >> >We are impatient for the 1.7 release too.
>> > > >>> >> >I'm having a problem compiling TIKA-1422 on my machine. If anyone
>> > > >>> >> >can build successfully on Windows, I have no objection to
>> > > >>> >> >releasing 1.7.
>> > > >>> >> >
>> > > >>> >> >Thanks,
>> > > >>> >> >
>> > > >>> >> >On Thu, Oct 16, 2014 at 10:51 AM, Andrzej Białecki <ab@getopt.org>
>> > > >>> >> >wrote:
>> > > >>> >> >
>> > > >>> >> >> Hi,
>> > > >>> >> >>
>> > > >>> >> >> Any news on the 1.7 release? or at least a 1.6.1 release that
>> > > >>> >> >> includes the fix for broken ODF parsing...
>> > > >>> >> >>
>> > > >>> >> >> ---
>> > > >>> >> >> Best regards,
>> > > >>> >> >>
>> > > >>> >> >> Andrzej Bialecki
>> > > >>> >> >>
>> > > >>> >> >>
>> > > >>> >> >
>> > > >>> >> >
>> > > >>> >> >--
>> > > >>> >> >--------------
>> > > >>> >> >Hong-Thai
>> > > >>> >>
>> > > >>> >>
>> > > >>>
>> > > >>>
>> > > >>
>> > >
>> > >
>> >
>>

