tika-dev mailing list archives

From Filip Bednárik (JIRA) <j...@apache.org>
Subject [jira] [Updated] (TIKA-1315) Basic list support in WordExtractor
Date Fri, 30 May 2014 15:55:02 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filip Bednárik updated TIKA-1315:
---------------------------------

    Attachment: ListUtils.java
                WordExtractor.java
                WordParserTest.java

> Basic list support in WordExtractor
> -----------------------------------
>
>                 Key: TIKA-1315
>                 URL: https://issues.apache.org/jira/browse/TIKA-1315
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Filip Bednárik
>            Priority: Minor
>             Fix For: 1.6
>
>         Attachments: ListUtils.java, WordExtractor.java, WordParserTest.java
>
>
> Hello guys, I am sorry to raise this as an issue, but I have no other way of contacting
> you, and I don't quite understand how you handle forks and pull requests (I don't think
> you use them).
> In my project I needed Tika to parse numbered lists from Word .doc documents, but Tika
> doesn't support that. I looked for a solution and found one here:
> http://developerhints.blog.com/2010/08/28/finding-out-list-numbers-in-word-document-using-poi-hwpf/
> I adapted that solution to Apache Tika with a few fixes and improvements. Feel free to
> use any of it, so it can help people who struggle with lists in Tika as I did.
> The attached files are:
>   WordParserTest.java (updated test)
>   WordExtractor.java (fixed WordExtractor)
>   ListUtils.java (added ListUtils)
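
For readers unfamiliar with the problem, the core of numbering list paragraphs can be sketched roughly as below. This is not the attached ListUtils.java; the class name ListNumberingSketch and its next() method are hypothetical, and the sketch assumes the caller has already read each paragraph's list id and nesting level from the document. It handles plain decimal numbering only and ignores number formats, custom start values and bullets.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not the attached ListUtils): one counter per list and nesting level. */
public class ListNumberingSketch {
    // Assumption: nine nesting levels, the usual limit for Word multilevel lists.
    private static final int MAX_LEVELS = 9;

    // One counter array per list id; the array index is the 0-based nesting level.
    private final Map<Integer, int[]> counters = new HashMap<>();

    /** Returns the label (for example "2.1.") for the next paragraph of the given list and level. */
    public String next(int listId, int level) {
        if (level < 0 || level >= MAX_LEVELS) {
            throw new IllegalArgumentException("level out of range: " + level);
        }
        int[] c = counters.computeIfAbsent(listId, id -> new int[MAX_LEVELS]);
        c[level]++;
        // A new item at this level restarts the numbering of all deeper levels.
        for (int i = level + 1; i < MAX_LEVELS; i++) {
            c[i] = 0;
        }
        StringBuilder label = new StringBuilder();
        for (int i = 0; i <= level; i++) {
            // A skipped parent level is shown as 1 so the label never contains 0.
            label.append(Math.max(c[i], 1)).append('.');
        }
        return label.toString();
    }

    public static void main(String[] args) {
        ListNumberingSketch s = new ListNumberingSketch();
        System.out.println(s.next(1, 0)); // 1.
        System.out.println(s.next(1, 1)); // 1.1.
        System.out.println(s.next(1, 1)); // 1.2.
        System.out.println(s.next(1, 0)); // 2.
        System.out.println(s.next(1, 1)); // 2.1.
        System.out.println(s.next(2, 0)); // 1. (a different list starts over)
    }
}
```

A parser would call next() once for every paragraph that belongs to a list and prepend the returned label to the extracted text.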



--
This message was sent by Atlassian JIRA
(v6.2#6252)
