tika-dev mailing list archives

From "Manolo Caracuel (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (TIKA-2542) Support in tika-server for getting plain text and metadata at the same time
Date Sat, 06 Jan 2018 02:05:00 GMT

     [ https://issues.apache.org/jira/browse/TIKA-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manolo Caracuel updated TIKA-2542:
----------------------------------
    Comment: was deleted

(was: Pull request:

https://github.com/apache/tika/pull/216)

> Support in tika-server for getting plain text and metadata at the same time
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-2542
>                 URL: https://issues.apache.org/jira/browse/TIKA-2542
>             Project: Tika
>          Issue Type: Improvement
>          Components: core, server
>    Affects Versions: 1.17
>            Reporter: Manolo Caracuel
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.18
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It would be good to have a way to get a file's extracted plain text together with its detected
> metadata. Currently you can only get the metadata if the request has an Accept header of text/xml
> or text/html, but then the body is not plain text, because it also contains HTML elements.
> I propose that when /tika/plain is requested with an Accept header of text/xml, an XHTML
> document is returned with the metadata in the head's meta elements and the plain text in the body.
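
To make the proposed response shape concrete, here is a minimal Python sketch of how a client might split such an XHTML document into metadata and plain text. It makes no request to tika-server. The sample response, the metadata names in it, and the helper name split_metadata_and_text are assumptions made up for illustration, not output from a real server or part of the pull request.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical response body in the shape the issue proposes:
# metadata in <head><meta .../></head>, plain text in <body>.
# This was written by hand for the example, not captured from tika-server.
SAMPLE_RESPONSE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="Content-Type" content="application/pdf"/>
    <meta name="dc:title" content="Example document"/>
    <title>Example document</title>
  </head>
  <body>First line of plain text.
Second line of plain text.</body>
</html>
"""


def split_metadata_and_text(xhtml):
    """Return (metadata dict, plain text) from an XHTML document string."""
    root = ET.fromstring(xhtml)
    ns = {"x": XHTML_NS}

    # Collect every <meta name="..." content="..."/> found in the head.
    metadata = {}
    for meta in root.findall("x:head/x:meta", ns):
        name = meta.get("name")
        if name is not None:
            metadata[name] = meta.get("content", "")

    # The body is expected to hold only text, but itertext() also copes
    # with stray child elements by concatenating their text content.
    body = root.find("x:body", ns)
    text = "".join(body.itertext()).strip() if body is not None else ""
    return metadata, text


if __name__ == "__main__":
    metadata, text = split_metadata_and_text(SAMPLE_RESPONSE)
    print(metadata)
    print(text)
```

A single XHTML document lets the client get both pieces from one request and one parse, which is the point of the proposal. Note that a dictionary keeps only the last value when a metadata name repeats, so a real client might prefer a list of pairs.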



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
