tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TIKA-1328) Translate Metadata and Content
Date Mon, 07 Jul 2014 01:30:35 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1328:

    Component/s: translation

> Translate Metadata and Content
> ------------------------------
>                 Key: TIKA-1328
>                 URL: https://issues.apache.org/jira/browse/TIKA-1328
>             Project: Tika
>          Issue Type: New Feature
>          Components: translation
>            Reporter: Tyler Palsulich
>             Fix For: 1.7
> Right now, Translation is only done on Strings. Ideally, users would be able to "turn on" translation while parsing. I can think of a couple of options:
> - Make a TranslateAutoDetectParser. Automatically detect the file type, parse it, then translate the content.
> - Make a Context switch. When true, translate the content regardless of the parser used. I'm not sure of the best way to go about this method, but I prefer it over another Parser.
> Regardless, we need a black or white list for translation. I think a black list would be the way to go -- which fields should not be translated (dates, versions, ...)? Any ideas? Also, somewhat unrelated, does anyone know of any other open source translation libraries? If we were really lucky, it wouldn't depend on an online service.
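
To make the black-list idea in the description above concrete, here is a minimal Java sketch. Everything in it is an assumption for illustration: the class `MetadataTranslationSketch`, the method `translateFields`, and the entries in `DO_NOT_TRANSLATE` are hypothetical and not part of Tika. A plain `Map` stands in for Tika's metadata, and a `UnaryOperator<String>` stands in for whatever translator is plugged in.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Tika: translates every metadata value
// except those whose field name appears on a black list.
final class MetadataTranslationSketch {

    // Assumed example entries; the issue only names "dates, versions, ...".
    static final Set<String> DO_NOT_TRANSLATE = Set.of("date", "version");

    // Returns a new map with the same keys in the same order. Values of
    // black-listed fields are copied unchanged; all others are passed
    // through the supplied translator function.
    static Map<String, String> translateFields(Map<String, String> fields,
                                               UnaryOperator<String> translator) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            // Field names are compared case-insensitively.
            String key = e.getKey().toLowerCase(Locale.ROOT);
            boolean skip = DO_NOT_TRANSLATE.contains(key);
            out.put(e.getKey(), skip ? e.getValue() : translator.apply(e.getValue()));
        }
        return out;
    }
}
```

The translator is passed in as a function so the same filtering logic could sit behind either option discussed above (a dedicated parser or a context switch), and so it does not assume an online service.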

This message was sent by Atlassian JIRA
