tika-dev mailing list archives

From Jay Chuk <jaychuk2...@gmail.com>
Subject Re: [EXTERNAL] Extracting font information from xml
Date Tue, 15 Oct 2019 22:54:11 GMT
Thanks for the quick reply Chris.
Could you please share a Python code snippet for it?

Regards,
Jay

On Tue, Oct 15, 2019 at 6:52 PM Chris Mattmann <mattmann@apache.org> wrote:

> Hi Jay, yes, I believe so. Tika Python is just a thin client to Tika
> Server and it
> provides this functionality. CC’ing dev@tika
>
> *From: *Jay Chuk <jaychuk2017@gmail.com>
> *Date: *Tuesday, October 15, 2019 at 3:47 PM
> *To: *"Mattmann, Chris A (US 1761)" <chris.a.mattmann@jpl.nasa.gov>
> *Subject: *[EXTERNAL] Extracting font information from xml
>
>
>
> Hi Chris,
>
>
>
> Thanks for providing the Python package, Tika, for extracting text
> from PDFs.
>
>
>
> I'd like to confirm whether, when converting a PDF to XML, it is possible
> to get the font style of the text, e.g. the font type and whether the text
> is bold/solid.
>
> I need this information to identify section headers and titles in the
> documents.
>
>
>
> Please let me know if it is possible or if there is another way to go
> about this.
>
>
>
> Thank you
>
> Jay
>
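
A minimal sketch of the kind of snippet requested above, under stated assumptions. It uses tika-python's parser.from_file with xmlContent=True to ask Tika Server for XHTML instead of plain text, then walks the result with the standard library. The helper names (fetch_xhtml, styled_runs, heading_candidates) are made up for illustration. Whether the XHTML for a given PDF actually carries bold tags or font styles is an assumption that has not been verified here; it may depend on the Tika version and the PDF parser configuration, so inspect the raw output first.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def fetch_xhtml(pdf_path):
    """Ask Tika Server (through tika-python) for XHTML rather than plain text."""
    # Third-party dependency: pip install tika (needs Java for the server).
    from tika import parser
    parsed = parser.from_file(pdf_path, xmlContent=True)
    return parsed["content"]


def styled_runs(xhtml):
    """Yield (tag, attributes, text) for every element with text of its own."""
    root = ET.fromstring(xhtml.encode("utf-8"))
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text:
            yield elem.tag.replace(XHTML_NS, ""), dict(elem.attrib), text


def heading_candidates(xhtml):
    """Guess headers: text inside <b>/<strong>, or with 'bold' in a style attribute.

    This heuristic is an assumption, not something Tika guarantees.
    """
    for tag, attrs, text in styled_runs(xhtml):
        if tag in ("b", "strong") or "bold" in attrs.get("style", "").lower():
            yield text
```

Usage would look like list(heading_candidates(fetch_xhtml("report.pdf"))), where report.pdf is a placeholder path. If the XHTML turns out to contain only plain paragraph elements, font details would have to come from another tool.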
