tika-dev mailing list archives

From Eric Pugh <ep...@opensourceconnections.com>
Subject Re: Tika talk next week - help needed!
Date Tue, 16 May 2017 15:21:37 GMT
Nick,

It was great to read through http://events.linuxfoundation.org/sites/events/files/slides/WhatsNewWithApacheTika_1.pdf…
Wow, there is a lot in Tika.

And I think that might be the one challenge with the talk structure: there is so much information.

I think I’d like to see “How is Tika actually architected” to support so many amazing
use cases. If this talk is meant for folks who don’t already know a lot about the project,
then they might get overwhelmed by the long lists, such as all the file types it can handle.
Maybe change some of them to “here is an eye chart of logos, don’t actually read it”
and consolidate some pages.




Eric

> On May 16, 2017, at 10:38 AM, Thamme Gowda <thammegowda@apache.org> wrote:
> 
> Nick,
> Here are some pointers:
> 1. Image recognition using Tensorflow:
> https://wiki.apache.org/tika/TikaAndVision; Link to Paper:
> https://memex.jpl.nasa.gov/MFSEC17.pdf
> 2. Image Recognition using Deeplearning4j -
> https://wiki.apache.org/tika/TikaAndVisionDL4J
> 3. Sentiment Analysis using OpenNLP: https://github.com/apache/tika/pull/169
> 4. Video labeling using tensorflow image rec:
> https://wiki.apache.org/tika/TikaAndVisionVideo
> 5.  Named Entity Extraction using OpenNLP and CoreNLP:
> https://wiki.apache.org/tika/TikaAndNER
> 
> *Coming soon (Work in progress):*
> 6. Image Captioning (Image-to-Text) https://github.com/apache/tika/pull/180
> 
> Cheers,
> -Thamme
> 
> *--*
> *Thamme Gowda*
> TG | @thammegowda <https://twitter.com/thammegowda>
> ~Sent via somebody's Webmail server!
> 
> On Tue, May 16, 2017 at 6:59 AM, Chris Mattmann <mattmann@apache.org> wrote:
> 
>> Yep, literally take a look at the Tika wiki – there are examples aplenty and even
>> screenshots. Further, if you look at the MEMEX site under our new publications
>> section, there are a few examples (like the ICMR paper on forensics) that show it
>> in action.
>> 
>> http://memex.jpl.nasa.gov/#publications
>> 
>> 
>> 
>> On 5/16/17, 6:21 AM, "Konstantin Gribov" <grossws@gmail.com> wrote:
>> 
>>    IIRC, image and video labeling basic support was added (Chris & Thamme
>>    could you elaborate on that, please), TSD (TIKA-2309, time stamped data
>>    envelope format) support, slf4j migration (ongoing on 2.x branch).
>> 
>>    Tue, 16 May 2017 at 16:06, Allison, Timothy B. <tallison@mitre.org>:
>> 
>>> Doh!  Sorry for the delay...might add configuration of EncodingDetectors,
>>> but that's probably too far into the weeds?
>>> 
>>> -----Original Message-----
>>> From: Nick Burch [mailto:nick@apache.org]
>>> Sent: Sunday, May 14, 2017 11:34 AM
>>> To: dev@tika.apache.org
>>> Subject: Tika talk next week - help needed!
>>> 
>>> Hi All
>>> 
>>> Last year in Seville, I gave a talk on Tika entitled "Apache Tika - What’s
>>> new with 2.0?". For ApacheCon Miami next week, I've been roped into giving
>>> an updated version...
>>> 
>>> https://apachecon2017.sched.com/event/9zvD/apache-tika-whats-new-with-20-nick-burch-apache-software-foundation
>>> 
>>> My slides from Seville are available at:
>>> 
>>> http://events.linuxfoundation.org/sites/events/files/slides/WhatsNewWithApacheTika_1.pdf
>>> 
>>> Beyond updating the list of releases and parsers, and the slide
>>> background, what should I change?
>>> 
>>> Maybe some more on Tika eval? More details on some of the NLP / Entity
>>> Recognition / Image Recognition stuff? Some screenshots of that stuff?
>>> More on translation? Something else?
>>> 
>>> Ideas greatly appreciated! Good screenshots even more so :)
>>> 
>>> Cheers
>>> Nick
>>> 
>>    --
>> 
>>    Best regards,
>>    Konstantin Gribov
>> 
>> 
>> 
>> 


_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
<http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>
 
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>

This e-mail and all contents, including attachments, is considered to be Company Confidential
unless explicitly stated otherwise, regardless of whether attachments are marked as such.

