james-server-dev mailing list archives

From "Antoine Duprat (JIRA)" <server-...@james.apache.org>
Subject [jira] [Resolved] (JAMES-2581) Configurable ContentType blacklist for Tika
Date Fri, 09 Nov 2018 13:12:00 GMT

     [ https://issues.apache.org/jira/browse/JAMES-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Duprat resolved JAMES-2581.
-----------------------------------
    Resolution: Fixed

merged

> Configurable ContentType blacklist for Tika
> -------------------------------------------
>
>                 Key: JAMES-2581
>                 URL: https://issues.apache.org/jira/browse/JAMES-2581
>             Project: James Server
>          Issue Type: Improvement
>            Reporter: Dat Pham
>            Priority: Major
>
> Enhanced production logging of failed Tika calls highlights the fact that our installation of Tika **cannot** handle some kinds of attachments.
> Here is a log example:
> ```
> org.apache.http.client.HttpResponseException: Unsupported Media Type
>  at org.apache.http.impl.client.AbstractResponseHandler.handleResponse(AbstractResponseHandler.java:70)
>  at org.apache.http.client.fluent.Response.handleResponse(Response.java:90)
>  at org.apache.http.client.fluent.Response.returnContent(Response.java:97)
>  at org.apache.james.mailbox.tika.TikaHttpClientImpl.recursiveMetaDataAsJson(TikaHttpClientImpl.java:62)
>  at org.apache.james.mailbox.tika.TikaTextExtractor.performContentExtraction(TikaTextExtractor.java:86)
>  at org.apache.james.mailbox.tika.TikaTextExtractor.lambda$extractContent$0(TikaTextExtractor.java:81)
> ```
> (131 matches in the last 2 days)
> Here is a list of content types we repeatedly fail on:
>  - application/ics
>  - application/zip
>  - application/pgp-signature
>  - image/jpg
>  - image/jpeg
>  - image/png
>  - message/delivery-status
> As an admin, I should be able to specify in the `tika.properties` file a comma-separated list of content types to blacklist.
> Benefits:
>  - Avoid known-to-be-failing Tika calls - reduce log output
>  - Avoid transmitting potentially big payload over the network for nothing - performance
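
The request above could be sketched roughly as follows. This is only an illustration, not the code that was merged for this issue: the class name `ContentTypeBlacklist`, its methods, and the property key `tika.contentType.blacklist` are assumptions made for the example. It reads a comma-separated list from a `Properties` object and checks a content type against it before any call to Tika would be made.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
final class ContentTypeBlacklist {
    // Assumed property key; check the actual tika.properties documentation.
    public static final String PROPERTY = "tika.contentType.blacklist";

    private final Set<String> blacklisted;

    private ContentTypeBlacklist(Set<String> blacklisted) {
        this.blacklisted = blacklisted;
    }

    // Parses the comma-separated value, ignoring blanks and letter case.
    public static ContentTypeBlacklist fromProperties(Properties properties) {
        String raw = properties.getProperty(PROPERTY, "");
        Set<String> types = Arrays.stream(raw.split(","))
            .map(String::trim)
            .filter(type -> !type.isEmpty())
            .map(type -> type.toLowerCase(Locale.ROOT))
            .collect(Collectors.toSet());
        return new ContentTypeBlacklist(types);
    }

    // Returns true when extraction should be skipped for this content type.
    public boolean isBlacklisted(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=utf-8" before comparing.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return blacklisted.contains(mediaType);
    }

    public static void main(String[] args) throws IOException {
        Properties properties = new Properties();
        properties.load(new StringReader(
            "tika.contentType.blacklist=application/ics, application/zip,image/png\n"));
        ContentTypeBlacklist blacklist = fromProperties(properties);

        System.out.println(blacklist.isBlacklisted("application/zip")); // true
        System.out.println(blacklist.isBlacklisted("text/plain"));      // false
    }
}
```

A caller would check `isBlacklisted` first and return an empty extraction result when it is true, so the attachment payload is never sent over the network and no failure is logged.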



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

