james-server-dev mailing list archives

From "Tellier Benoit (JIRA)" <server-...@james.apache.org>
Subject [jira] [Closed] (JAMES-2018) JMAP html extraction should not use tika but jsoup
Date Thu, 01 Jun 2017 09:10:04 GMT

     [ https://issues.apache.org/jira/browse/JAMES-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tellier Benoit closed JAMES-2018.

> JMAP html extraction should not use tika but jsoup
> --------------------------------------------------
>                 Key: JAMES-2018
>                 URL: https://issues.apache.org/jira/browse/JAMES-2018
>             Project: James Server
>          Issue Type: Bug
>          Components: JMAP
>            Reporter: Matthieu Baechler
>            Assignee: Antoine Duprat
> It looks like we cannot use two TextExtractor implementations at the same time, and thus HTML-to-text
> transformation, which is very common in JMAP code, is done by Tika.
> This is obviously overkill when we have a jsoup-based TextExtractor at hand, which is probably
> much faster because it skips content type detection.
> We should qualify TextExtractor with a @Named annotation to be able to force jsoup usage for HTML extraction.
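The proposed fix relies on qualifier-based dependency injection: two implementations of the same interface can coexist when each binding carries a distinct name, and the code that needs HTML-to-text conversion asks for the jsoup-backed one explicitly. The dependency-free Java sketch below illustrates that mechanism with a tiny hand-rolled injector. `Named`, `TextExtractor`, `DefaultTextExtractor`, `HtmlTextExtractor` and `HtmlToTextConverter` are hypothetical, simplified stand-ins rather than James's actual types, and a naive regex tag strip stands in for jsoup. In a Guice-based wiring the equivalent would presumably be a `bind(TextExtractor.class).annotatedWith(Names.named("jsoup"))` binding plus a `javax.inject.Named` qualifier on the injection point.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Parameter;
import java.util.HashMap;
import java.util.Map;

public class NamedBindingSketch {

    // Hypothetical qualifier playing the role of javax.inject.Named.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface Named {
        String value();
    }

    // Hypothetical, simplified stand-in for TextExtractor (not James's real signature).
    interface TextExtractor {
        String toText(String content, String contentType);
    }

    // Placeholder for the general-purpose (Tika-backed) extractor;
    // real content type detection and parsing are omitted.
    static class DefaultTextExtractor implements TextExtractor {
        public String toText(String content, String contentType) {
            return "[generic] " + content;
        }
    }

    // Placeholder for the jsoup-backed extractor: it assumes HTML input and does no
    // content type detection. A naive regex tag strip stands in for jsoup here.
    static class HtmlTextExtractor implements TextExtractor {
        public String toText(String content, String contentType) {
            return content.replaceAll("<[^>]*>", "").trim();
        }
    }

    // Hypothetical JMAP-side consumer that requests the HTML extractor by name.
    static class HtmlToTextConverter {
        final TextExtractor extractor;

        HtmlToTextConverter(@Named("jsoup") TextExtractor extractor) {
            this.extractor = extractor;
        }

        String convert(String html) {
            return extractor.toText(html, "text/html");
        }
    }

    // Bindings keyed by "type#name"; the empty name is the unqualified default.
    static final Map<String, Object> BINDINGS = new HashMap<>();

    static void bind(Class<?> type, String name, Object instance) {
        BINDINGS.put(type.getName() + "#" + name, instance);
    }

    // Builds an instance by resolving each constructor parameter from its type
    // and optional @Named qualifier; fails fast when no matching binding exists.
    static <T> T create(Class<T> type) throws Exception {
        Constructor<?> ctor = type.getDeclaredConstructors()[0];
        ctor.setAccessible(true);
        Parameter[] params = ctor.getParameters();
        Object[] args = new Object[params.length];
        for (int i = 0; i < params.length; i++) {
            Named named = params[i].getAnnotation(Named.class);
            String key = params[i].getType().getName() + "#" + (named == null ? "" : named.value());
            if (!BINDINGS.containsKey(key)) {
                throw new IllegalStateException("No binding for " + key);
            }
            args[i] = BINDINGS.get(key);
        }
        return type.cast(ctor.newInstance(args));
    }

    public static void main(String[] args) throws Exception {
        // Both extractors are bound at the same time; only the qualifier differs.
        bind(TextExtractor.class, "", new DefaultTextExtractor());
        bind(TextExtractor.class, "jsoup", new HtmlTextExtractor());
        HtmlToTextConverter converter = create(HtmlToTextConverter.class);
        System.out.println(converter.convert("<p>Hello <b>JMAP</b></p>"));
    }
}
```

Running `main` prints `Hello JMAP`: the converter receives the name-qualified extractor even though an unqualified default is bound as well. If the `"jsoup"` binding is removed, `create` fails fast instead of silently falling back to the generic extractor, which keeps the HTML path from quietly reverting to the heavier implementation.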

This message was sent by Atlassian JIRA
