tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (TIKA-321) Optimize type detection speed
Date Sun, 13 Dec 2009 22:17:18 GMT

     [ https://issues.apache.org/jira/browse/TIKA-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-321.

       Resolution: Fixed
    Fix Version/s: 0.6
         Assignee: Jukka Zitting

I've made a number of optimizations to the type detection code, and as a result it's now
over an order of magnitude faster than before. I believe there's *still* another order of magnitude
of improvement available (checking the most common types first, short-circuiting matching to only
the subtypes of already detected types, etc.), but I've already reached the performance goals I had,
so I'll mark this as resolved for Tika 0.6. We can follow up with another issue in case anyone
has stricter performance requirements.
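
For illustration only, the two remaining ideas mentioned above (trying the most common types first, and refining a match by looking only at its subtypes) could be sketched roughly as follows. This is not Tika's actual code: the class PrioritizedDetector, its Rule type, the sample rules, and the weights are all hypothetical, and the SVG check is deliberately simplistic.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of prioritized type detection; not Tika's implementation. */
final class PrioritizedDetector {

    /** A detection rule: type name, assumed relative frequency, and a test on the leading bytes. */
    static final class Rule {
        final String type;
        final int weight;                        // made-up frequency estimate, higher = more common
        final Predicate<String> matches;
        final List<Rule> subtypes = new ArrayList<>();

        Rule(String type, int weight, Predicate<String> matches) {
            this.type = type;
            this.weight = weight;
            this.matches = matches;
        }

        Rule subtype(Rule child) {
            subtypes.add(child);
            return this;
        }
    }

    private final List<Rule> roots = new ArrayList<>();

    void add(Rule rule) {
        roots.add(rule);
        // Keep the most common types first so typical input matches after few comparisons.
        roots.sort(Comparator.comparingInt((Rule r) -> r.weight).reversed());
    }

    String detect(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so binary prefixes survive decoding.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        for (Rule root : roots) {
            if (root.matches.test(text)) {
                return refine(root, text);
            }
        }
        return "application/octet-stream";
    }

    /** After a match, only subtypes of the matched type are tried; unrelated types are skipped. */
    private static String refine(Rule matched, String text) {
        for (Rule child : matched.subtypes) {
            if (child.matches.test(text)) {
                return refine(child, text);
            }
        }
        return matched.type;
    }

    /** Sample rules with invented weights, for demonstration only. */
    static PrioritizedDetector withSampleRules() {
        PrioritizedDetector d = new PrioritizedDetector();
        d.add(new Rule("image/png", 30, s -> s.startsWith("\u0089PNG")));
        d.add(new Rule("application/xml", 20, s -> s.startsWith("<?xml"))
                .subtype(new Rule("image/svg+xml", 0, s -> s.contains("<svg"))));
        d.add(new Rule("application/pdf", 50, s -> s.startsWith("%PDF-")));
        return d;
    }

    public static void main(String[] args) {
        PrioritizedDetector detector = withSampleRules();
        byte[] svg = "<?xml version=\"1.0\"?><svg xmlns=\"http://www.w3.org/2000/svg\"/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detector.detect(svg)); // image/svg+xml
    }
}
```

In this sketch the PDF rule is tested first because it carries the highest assumed weight, and the SVG rule is only ever evaluated once the generic XML rule has matched. Real weights would have to come from measurements of the actual workload.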

> Optimize type detection speed
> -----------------------------
>                 Key: TIKA-321
>                 URL: https://issues.apache.org/jira/browse/TIKA-321
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 0.6
> It would be good to do some simple benchmarks on the type detection code (Tika.detect)
> to see if there are obvious performance optimizations we could make. There are some use cases,
> like attaching file type information to directory listings, where type detection speed is important
> and not necessarily dwarfed by IO waits.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
