tika-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: [DISCUSS] Release Tika 1.11?
Date Wed, 23 Sep 2015 20:19:28 GMT
+1 to branching. Given some surprises we've had, I'd want to have a 1.12+-SNAPSHOT branch easily available, because I suspect that 2.0 is still at least 6 months* off given the current pace of progress and what I've seen on other projects making major release changes. Wish I had more hours in the day...

-----Original Message-----
From: Bob Paulin [mailto:bob@bobpaulin.com] 
Sent: Wednesday, September 23, 2015 1:45 PM
To: dev@tika.apache.org
Subject: Re: [DISCUSS] Release Tika 1.11?

+1 for the branching strategy.

With respect to slicing up the parsers it would be great to have more discussion on how the
parsers should be organized.  I think Tim has a draft out on this mailing list that would
benefit from some additional perspectives.  Really cool to be talking about doing this!

- Bob

On Wed, Sep 23, 2015 at 12:36 PM, Konstantin Gribov <grossws@gmail.com> wrote:

> It seems like a good idea to avoid including commons-io in tika-core until 2.0, if we release 2.0 in several months.
> In that case we will have trunk with ongoing development of 2.0-SNAPSHOT and a branch for 1.11+ bug fixes.
>
> Some Java 7-related changes can be included in 1.11/1.12 without problems.
>
> On Wed, Sep 23, 2015 at 19:33, Mattmann, Chris A (3980) <chris.a.mattmann@jpl.nasa.gov> wrote:
>
> > I’m not so keen on fundamentally changing the organization of Tika 
> > until 2.x. This seems like a major change to me in the way people 
> > expect to consume Tika.
> >
> > Can we:
> >
> > 1. Release a 1.11 that doesn’t include these types of changes
> > 2. After 1.11, change trunk to 2.0-SNAPSHOT and work on those types of issues there?
> >
> > Cheers,
> > Chris
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Chief Architect
> > Instrument Software and Science Data Systems Section (398)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 168-519, Mailstop: 168-527
> > Email: chris.a.mattmann@nasa.gov
> > WWW:  http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: Yaniv Kunda <yaniv.kunda@answers.com>
> > Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
> > Date: Wednesday, September 23, 2015 at 9:30 AM
> > To: "dev@tika.apache.org" <dev@tika.apache.org>
> > Subject: Re: [DISCUSS] Release Tika 1.11?
> >
> > >+1 for the uber jar!
> > >
> > >Regarding jdk7 issues, I have a few more I will create and patch 
> > >later tonight - I'll post a list of issues as well.
> > >On Sep 23, 2015 5:26 PM, "Konstantin Gribov" <grossws@gmail.com> wrote:
> > >
> > >> Tim, was your check of File#getName done manually, or is it covered by tests somehow? If it's in the tests, we can check it on the major platforms (I can test on Linux, Windows XP, and maybe Mac OS X) with different JDKs.
> > >>
> > >> If commons-io doesn't support ':' as a file separator, we can have a simple utility class in Tika or send them a patch for it.
> > >> I think we can rethink Tika packaging in 1.11/1.12 and produce these artifacts:
> > >> - tika-core with a dependency on commons-io (deprecating most of o.a.tika.io and forwarding its calls to the JDK or commons-io),
> > >> - tika-core-uber with shaded commons-io (renamed, with everything not needed by o.a.tika.io dropped),
> > >> - sliced tika-parsers-* as Bob suggested earlier,
> > >> - a tika-parsers jar with all the tika-parsers-* parts (for compatibility),
> > >> - other tika-* artifacts (like tika-server, tika-app, etc.).
> > >>
> > >> Anyone who needs tika-core without dependencies would use tika-core-uber instead; everyone else, who prefers maven/ivy/gradle/sbt/lein, would depend on tika-core.
> > >> And we can drop o.a.tika.io in 2.0.
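The idea of deprecating most of o.a.tika.io while forwarding its calls to the JDK could look roughly like the minimal sketch below, assuming Java 7+ and using only `java.nio.file.Files`. The class `LegacyIOUtils` and its methods are hypothetical stand-ins, not the real o.a.tika.io API; the point is only that each deprecated method becomes a one-line delegation, so the class can be dropped in 2.0 without losing functionality.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical legacy helper kept only for source compatibility in 1.x.
 * Every method is deprecated and simply forwards to the JDK.
 */
final class LegacyIOUtils {
    private LegacyIOUtils() {}

    /** @deprecated use Files.copy(InputStream, Path, CopyOption...) directly. */
    @Deprecated
    static long copy(InputStream in, Path target) throws IOException {
        // Files.copy(InputStream, Path, ...) has been in the JDK since Java 7.
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** @deprecated use Files.copy(Path, OutputStream) directly. */
    @Deprecated
    static long copy(Path source, OutputStream out) throws IOException {
        return Files.copy(source, out);
    }

    /** @deprecated use try-with-resources (Java 7) instead. */
    @Deprecated
    static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // Swallowed on purpose, matching the "quietly" contract.
        }
    }
}
```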
> > >>
> > >> Also, I'll take a look at unresolved jdk7 issues/patches today.
> > >>
> > >> On Tue, Sep 22, 2015 at 15:41, Allison, Timothy B. <tallison@mitre.org> wrote:
> > >>
> > >> > Thank _you_ for all of your work in modernizing us. With your efforts, we'll be able to deprecate TikaInputStream#get(PunchCard pc) soon. :)
> > >> >
> > >> > >>Regarding FilenameUtils.getName() - I believe that its functionality can be replaced by Path.getFileName() - and in a platform-aware manner, as each JVM distribution comes with a specific provider implementation for the OS it's for.
> > >> >
> > >> > I agree that we should use that anytime we're interacting with the file system.
> > >> >
> > >> > However, that's actually the problem for paths that are stored within the document (say, an embedded resource). Let's say a user creates a file on Windows: the file path information for the embedded file (depending on the parser and the file format) may be in Windows-ese, which is a problem if you try to use Path.getFileName() on a Linux machine (I think... I haven't actually tested this). I have actually tested this with the old File#getName(), and IIRC it did not work cross-platform.
> > >> > In short, Tika needs to have the ability to extract the file 
> > >> > name
> > >>from a
> > >> > path that was created on any platform (including old Mac and its ":"
> > >> > separator) while Tika is running on any platform.
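One platform-independent approach is to treat every known separator as a boundary regardless of the host OS. The sketch below is an assumption about how such a helper could look; it is not Tika's FilenameUtils code. It keeps everything after the last '/', '\' or ':'. The trade-off: treating ':' as a separator also strips a Windows drive prefix such as "C:" (harmless here) but would split a Unix file name that legitimately contains a colon.

```java
/** Hypothetical cross-platform helper; not Tika's actual FilenameUtils implementation. */
final class CrossPlatformNames {
    private CrossPlatformNames() {}

    // Separators used by Unix ('/'), Windows ('\') and classic Mac OS (':').
    private static final String SEPARATORS = "/\\:";

    /**
     * Returns the last path component of a path string produced on any platform.
     * Returns "" for null input and for paths that end with a separator.
     */
    static String getName(String path) {
        if (path == null) {
            return "";
        }
        // Scan backwards for the last separator of any platform.
        for (int i = path.length() - 1; i >= 0; i--) {
            if (SEPARATORS.indexOf(path.charAt(i)) >= 0) {
                return path.substring(i + 1);
            }
        }
        return path; // no separator at all: the whole string is the name
    }

    public static void main(String[] args) {
        System.out.println(getName("C:\\Users\\alice\\report.doc"));      // report.doc
        System.out.println(getName("/home/alice/report.doc"));            // report.doc
        System.out.println(getName("Macintosh HD:Documents:report.doc")); // report.doc
    }
}
```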
> > >> >
> > >> > -----Original Message-----
> > >> > From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
> > >> > Sent: Monday, September 21, 2015 11:31 AM
> > >> > To: dev@tika.apache.org
> > >> > Subject: RE: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > Thanks for the positive spirit!
> > >> >
> > >> > Regarding FilenameUtils.getName() - I believe that its functionality can be replaced by Path.getFileName() - and in a platform-aware manner, as each JVM distribution comes with a specific provider implementation for the OS it's for.
> > >> >
> > >> > -----Original Message-----
> > >> > From: Allison, Timothy B. [mailto:tallison@mitre.org]
> > >> > Sent: Monday, September 21, 2015 14:27
> > >> > To: dev@tika.apache.org
> > >> > Subject: RE: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > +1, it would be great to move a bit more into EOL'd Java 7 asap.
> > >> >
> > >> > I'll take TIKA-1734 by tomorrow EDT.
> > >> >
> > >> > As for the other 2, I'm personally ok waiting for 1.12, but I defer to the dev community.
> > >> >
> > >> > Chris, Nick, Ray, Ken, Konstantin, if you have a chance to chime in on TIKA-1726, that might help move things forward.
> > >> >
> > >> > On TIKA-1706, I share Nick's and Jukka's caution, and I also share Yaniv's concerns about code duplication, bloat within Tika, and missing out on updates. Aside from one small bit of code I'd like to keep or perhaps try to move into commons-io (?)[0], I think I'm now +1 to going forward with TIKA-1706 in core...unless there is a -1 from the community.
> > >> >
> > >> > Best,
> > >> >
> > >> >              Tim
> > >> >
> > >> >
> > >> > [0] I added some customizations for old Mac OS behavior (treating ":" as a file separator) in FilenameUtils.getName() that I don't want to lose.
> > >> >
> > >> >
> > >> > -----Original Message-----
> > >> > From: Yaniv Kunda [mailto:yaniv.kunda@answers.com]
> > >> > Sent: Sunday, September 20, 2015 7:15 AM
> > >> > To: dev@tika.apache.org
> > >> > Subject: RE: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > I would really like to push the following:
> > >> >
> > >> > https://issues.apache.org/jira/browse/TIKA-1706 - Bring back commons-io to tika-core
> > >> > This requires a decision to re-include commons-io as a dependency of tika-core. All the pros and cons have already been debated, but no decision has been made.
> > >> >
> > >> > https://issues.apache.org/jira/browse/TIKA-1726 - Augment public methods that use a java.io.File with methods that use a java.nio.file.Path
> > >> > Since this adds new methods to the public API, I asked the group to decide on the new names, but have not received a definite answer.
> > >> > However, I did create a subtask, https://issues.apache.org/jira/browse/TIKA-1734 (Use java.nio.file.Path in TemporaryResources), using [~tallison]'s suggestion; it has not been committed yet.
> > >> >
> > >> > If decisions are made on the above issues, I can quickly create patches for them.
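To make the TIKA-1726 / TIKA-1734 pattern concrete, here is a minimal sketch under stated assumptions: a Path-based method becomes the primary implementation, the existing File-based entry points delegate to it via File#toPath(), and temporary files are created and cleaned up through java.nio.file.Files. The class `TempResourcesSketch` and the methods `createPath`/`createFile` are hypothetical; they are not Tika's TemporaryResources API or the names the community settled on.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch; class and method names are illustrative only. */
final class TempResourcesSketch implements Closeable {
    private static final String PREFIX = "apache-tika-";
    private static final String SUFFIX = ".tmp";

    private final Deque<Path> created = new ArrayDeque<>();
    private final Path tempDir; // null means the JVM's default temp directory

    TempResourcesSketch() {
        this((Path) null);
    }

    /** New Path-based constructor: the primary implementation. */
    TempResourcesSketch(Path tempDir) {
        this.tempDir = tempDir;
    }

    /** Existing File-based constructor, kept and forwarded via File#toPath(). */
    TempResourcesSketch(File tempDir) {
        this(tempDir == null ? null : tempDir.toPath());
    }

    /** New Path-returning method. */
    Path createPath() throws IOException {
        Path p = (tempDir == null)
                ? Files.createTempFile(PREFIX, SUFFIX)
                : Files.createTempFile(tempDir, PREFIX, SUFFIX);
        created.push(p);
        return p;
    }

    /** Existing File-returning method, now a thin wrapper over createPath(). */
    File createFile() throws IOException {
        return createPath().toFile();
    }

    /** Deletes every file created here, newest first, then rethrows the first failure. */
    @Override
    public void close() throws IOException {
        IOException first = null;
        while (!created.isEmpty()) {
            try {
                Files.deleteIfExists(created.pop());
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Making the Path method primary means the File overloads could later be deprecated without any change in behavior.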
> > >> >
> > >> > -----Original Message-----
> > >> > From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
> > >> > Sent: Saturday, September 19, 2015 08:10
> > >> > To: dev@tika.apache.org
> > >> > Subject: [DISCUSS] Release Tika 1.11?
> > >> >
> > >> > Hey Guys and Gals,
> > >> >
> > >> > I’d like to roll a 1.11 release. There is TIKA-1716, which in particular allows some neat functionality in tika-python:
> > >> > https://github.com/chrismattmann/tika-python/pull/67
> > >> >
> > >> >
> > >> > Anything else to try and get into the release?
> > >> >
> > >> > If not, I’ll produce an RC #1 by end of weekend.
> > >> >
> > >> > Cheers,
> > >> > Chris
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> >
> > >> > This email communication (including any attachments) contains information from Answers Corporation or its affiliates that is confidential and may be privileged. The information contained herein is intended only for the use of the addressee(s) named above. If you are not the intended recipient (or the agent responsible to deliver it to the intended recipient), you are hereby notified that any dissemination, distribution, use, or copying of this communication is strictly prohibited. If you have received this email in error, please immediately reply to sender, delete the message and destroy all copies of it. If you have questions, please email legal@answers.com.
> > >> >
> > >> > If you wish to unsubscribe to commercial emails from Answers and its affiliates, please go to the Answers Subscription Center http://campaigns.answers.com/subscriptions to opt out. Thank you.
> > >> >
> > >> --
> > >> Best regards,
> > >> Konstantin Gribov
> > >>
> > >
> >
> > --
> Best regards,
> Konstantin Gribov
>