tika-dev mailing list archives

From Chris Mattmann <mattm...@apache.org>
Subject Re: [EXTERNAL] Extracting font information from xml
Date Tue, 15 Oct 2019 22:51:55 GMT
Hi Jay, yes, I believe so. Tika Python is just a thin client to Tika Server and it
provides this functionality. CC’ing dev@tika
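
For anyone following along on the list, here is a minimal sketch of how one might check what markup the XHTML output actually carries. It assumes tika-python is installed and can reach a Tika Server. `fetch_xhtml` and `summarize_markup` are hypothetical helper names, not part of Tika. Whether font details such as bold text appear in the output depends on the parser, so this only lists what is there. It also assumes the output is well-formed XML that the standard library can parse.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def fetch_xhtml(pdf_path):
    """Ask Tika Server (via tika-python) for XHTML output instead of plain text."""
    # Imported lazily so summarize_markup() can be used without tika installed.
    from tika import parser
    parsed = parser.from_file(pdf_path, xmlContent=True)
    return parsed["content"]


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on XHTML tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_markup(xhtml):
    """Count element names and list elements that carry attributes.

    Attributes such as class or style are where font hints would show up,
    if the parser emits any (an assumption to verify on your own files).
    """
    root = ET.fromstring(xhtml)
    tags = Counter()
    attributed = []
    for elem in root.iter():
        name = local_name(elem.tag)
        tags[name] += 1
        if elem.attrib and name != "meta":
            text = "".join(elem.itertext()).strip()
            attributed.append((name, dict(elem.attrib), text[:40]))
    return tags, attributed
```

Calling `summarize_markup(fetch_xhtml("paper.pdf"))` (with your own file in place of the placeholder path) returns the tag counts and the attributed elements, which should show quickly whether elements like `b` or any style attributes are present.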




From: Jay Chuk <jaychuk2017@gmail.com>
Date: Tuesday, October 15, 2019 at 3:47 PM
To: "Mattmann, Chris A (US 1761)" <chris.a.mattmann@jpl.nasa.gov>
Subject: [EXTERNAL] Extracting font information from xml


Hi Chris, 


Thanks for providing the Python package, Tika, for extracting text from PDFs.


I'd like to confirm whether, when converting a PDF to XML, it is possible to get the font style
for the text, e.g. the font type and whether the text is bold/solid.

I need this information to identify section headers and titles in the documents.


Please let me know if it is possible or if there is another way to go about this.


Thank you

