tika-dev mailing list archives

From Alberto Barranco Ramón <abarra...@autentia.com>
Subject EpubContentParser for xhtml
Date Wed, 11 May 2011 12:37:02 GMT
Hello Community.

I'm not sure whether this is the correct mailing list for this purpose; if it
is not, please tell me.

We are looking for an .epub parser. We considered Tika at first, but our
tests showed that Tika does not currently parse .xhtml files; at the moment
only .html files are parsed. I saw a TODO in the source code of
EpubContentParser.java, which says:

 /**
 * Parser for EPUB OPS <code>*.html</code> files.
 *
 * For the time being, assume XHTML (TODO: DTBook)
 */

So we assume that the community is considering the possibility of parsing
.xhtml. Our question is: do you have any idea how long it will take for this
new feature to be released? Is anybody working on it? We would like to know
whether this feature will be available soon.
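
For anyone who wants to reproduce the observation above, here is a minimal
sketch that lists which content documents inside an .epub use the .html
extension and which use .xhtml. It is not Tika code: the class name
EpubEntryLister and the method entriesWithExtension are hypothetical, and it
relies only on java.util.zip, on the assumption that the .epub file is an
ordinary ZIP container.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Tika: inspects an EPUB as a plain ZIP archive.
public class EpubEntryLister {

    /**
     * Returns the names of all non-directory entries whose name ends with
     * the given extension (compared case-insensitively).
     */
    public static List<String> entriesWithExtension(InputStream epub, String extension)
            throws IOException {
        List<String> matches = new ArrayList<>();
        String suffix = extension.toLowerCase(Locale.ROOT);
        try (ZipInputStream zip = new ZipInputStream(epub)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!entry.isDirectory() && name.toLowerCase(Locale.ROOT).endsWith(suffix)) {
                    matches.add(name);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EpubEntryLister <file.epub>");
            return;
        }
        // The stream is consumed by each scan, so the file is opened once per extension.
        for (String ext : new String[] {".html", ".xhtml"}) {
            try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                System.out.println(ext + " entries: " + entriesWithExtension(in, ext));
            }
        }
    }
}
```

Running it against a sample book shows at a glance whether the content
documents are named .html or .xhtml, which is the distinction our tests
pointed to.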

P.S.: Sorry for my poor English; it is not my mother tongue. Thank you for
any information, and for making this great Apache community possible.

-- 
Alberto Barranco Ramón (@barrancoalberto)
Programmer Analyst
mailto:abarranco@autentia.com
Tel.: 620 55 02 12

Autentia Real Business Solutions S.L.
       "Soporte a Desarrollo"
http://www.autentia.com

This e-mail, and in the case of any file annexed to it, may contain
confidential and/or privileged information, and it is exclusively for the
use of the addressees of the message. If you are not the intended recipient
(or have received this e-mail in error), please notify the sender
immediately and destroy this e-mail. Any unauthorised copying, disclosure or
distribution of the material in this e-mail is strictly forbidden by current
legislation. The points of view expressed in this e-mail are solely those of
the author and may not necessarily be from, or supported by, Autentia Real
Business Solutions S.L.
