james-server-dev mailing list archives

From "Dat Pham (JIRA)" <server-...@james.apache.org>
Subject [jira] [Created] (JAMES-2581) Configurable ContentType blacklist for Tika
Date Thu, 01 Nov 2018 08:43:00 GMT
Dat Pham created JAMES-2581:

             Summary: Configurable ContentType blacklist for Tika
                 Key: JAMES-2581
                 URL: https://issues.apache.org/jira/browse/JAMES-2581
             Project: James Server
          Issue Type: Improvement
            Reporter: Dat Pham

Enhanced production logging of failed Tika calls highlights the fact that our installation
of Tika **cannot** handle some kinds of attachments.

Here is a log example:

org.apache.http.client.HttpResponseException: Unsupported Media Type
 at org.apache.http.impl.client.AbstractResponseHandler.handleResponse(AbstractResponseHandler.java:70)
 at org.apache.http.client.fluent.Response.handleResponse(Response.java:90)
 at org.apache.http.client.fluent.Response.returnContent(Response.java:97)
 at org.apache.james.mailbox.tika.TikaHttpClientImpl.recursiveMetaDataAsJson(TikaHttpClientImpl.java:62)
 at org.apache.james.mailbox.tika.TikaTextExtractor.performContentExtraction(TikaTextExtractor.java:86)
 at org.apache.james.mailbox.tika.TikaTextExtractor.lambda$extractContent$0(TikaTextExtractor.java:81)

(131 matches in the last 2 days)

Here is a list of content types we repeatedly fail on:

 - application/ics
 - application/zip
 - application/pgp-signature
 - image/jpg
 - image/jpeg
 - image/png
 - message/delivery-status

As an admin, I should be able to specify in the `tika.properties` file a comma-separated list of
content types to blacklist.

 - Avoid Tika calls that are known to fail, which reduces log output
 - Avoid transmitting potentially large payloads over the network for nothing, which improves performance
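
To make the request concrete, here is a minimal sketch of how such a blacklist could be read and checked before calling Tika. It is only an illustration: the property key `tika.contentType.blacklist` and the class `ContentTypeBlacklist` are hypothetical names chosen for this example and are not taken from the James code base.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: holds the content types for which Tika should not be called.
final class ContentTypeBlacklist {
    // Assumed property key; this issue does not specify the actual name.
    static final String PROPERTY_KEY = "tika.contentType.blacklist";

    private final Set<String> blacklisted;

    private ContentTypeBlacklist(Set<String> blacklisted) {
        this.blacklisted = blacklisted;
    }

    // Reads a comma-separated list such as "application/ics,image/png".
    // A missing or empty property yields an empty blacklist.
    static ContentTypeBlacklist fromProperties(Properties properties) {
        String raw = properties.getProperty(PROPERTY_KEY, "");
        Set<String> types = Arrays.stream(raw.split(","))
            .map(ContentTypeBlacklist::normalize)
            .filter(type -> !type.isEmpty())
            .collect(Collectors.toSet());
        return new ContentTypeBlacklist(types);
    }

    // Lower-cases the type and drops parameters such as "; charset=utf-8".
    static String normalize(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // True when extraction should be skipped for this content type.
    boolean isBlacklisted(String contentType) {
        return contentType != null && blacklisted.contains(normalize(contentType));
    }
}
```

A text extractor could then check `isBlacklisted(contentType)` first and return an empty result without sending the attachment to Tika at all.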

