tika-dev mailing list archives

From Eric Pugh <ep...@opensourceconnections.com>
Subject Re: Questions
Date Tue, 10 Sep 2019 12:56:49 GMT
Hey Keith…

Your question #3 made me curious, as I thought GitHub was a mirror, but https://devclass.com/2019/04/30/apache-heads-to-github/
suggests that GitHub is now the authoritative repo. The contribution page at https://tika.apache.org/contribute.html
says the same thing.

So yes, I think the description does need updating. Apache Spark’s GitHub description is simply
“Apache Spark”, so ours could be “Apache Tika”.

I'm not sure I can answer #1.

As for #2, I typically use the tika-app jar unless I am carefully choosing which
dependencies I want.

> On Sep 9, 2019, at 8:21 AM, Keith Bennett <keithrbennett@gmail.com> wrote:
> 
> Hello, everyone. I am a Tika committer but have not been active for a long time. I've
> been looking over the code and would appreciate it if you could answer some questions:
> 
> 1) There is a Jira issue (at https://issues.apache.org/jira/browse/DRILL-6256?jql=text%20~%20%22readme%20java%207%22)
> regarding the mention of Java 1.7 in the README (https://github.com/apache/tika/blob/master/README.md).
> It was marked as fixed, but I still see Java 7 mentioned. Tika should work with the most recent
> versions of Java, right? Should we not update the README accordingly? I noticed that there
> is a "tika-java7" directory in the project consisting solely of a TikaFileTypeDetector class.
> Can you help me understand what the connection with Java version 7 is? Is it that Tika code
> should not use features that were absent in Java 7 (such as lambdas)?
> 
> 2) I would like to bring "Rika" (https://github.com/ricn/rika), a Ruby wrapper around
> Tika, up to date with respect to the dependency jar files packaged with it. I thought I would
> check out the commit to which the 1.22 tag was attached, do a fresh Maven install, and
> use the files that were installed ("~/.m2/repository/**/*jar"). Then again, Rika unconditionally
> loads all the jar files; would it be faster to just use the jar file of the Tika distribution
> (e.g. tika-app-1.22.jar) so that only one file instead of n files needs to be loaded?
> 
> 3) The description for the GitHub repo at https://github.com/apache/tika says "Tika Mirror".
> Is it really a mirror, or has it become the authoritative source? (Given that I saw mentions
> of pull requests, I suspect the latter.) If the latter, I suggest changing that text to something
> like "Tika Authoritative Repository", as it is currently misleading.
> 
> Thanks,
> Keith
> 
> --	
> Keith R. Bennett
> about.me/keithrbennett
> 

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
My Free/Busy <http://tinyurl.com/eric-cal>
 
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>


