tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1754) tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X:
Date Wed, 30 Sep 2015 15:43:05 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14937020#comment-14937020 ]

Chris A. Mattmann commented on TIKA-1754:
-----------------------------------------

Would [cas-crawler|http://svn.apache.org/repos/asf/oodt/trunk/crawler/] make sense to use
here? See [DRAT|http://github.com/chrismattmann/drat/] for a real example of its use.

> tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X:
> -------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-1754
>                 URL: https://issues.apache.org/jira/browse/TIKA-1754
>             Project: Tika
>          Issue Type: Bug
>          Components: batch
>    Affects Versions: 1.10
>            Reporter: Tim Allison
>            Priority: Trivial
>              Labels: java7
>
> The FileListCrawler takes a root directory and a list of relative file paths and "crawls" that list as if it were a directory crawler. If the root is specified as, e.g., "X:" on a Windows system, the call to substring on root's absolute path and the subtraction of one character is incorrect.
> With a root of X: and a relative file of "dir1/dir2/file.doc", the output file is: "X:/ir/dir2/file.doc.txt"
> Let's get rid of the substring calculations and move to Java 7! :)
> See TIKA-1747.
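
To make the failure mode in the quoted description concrete, here is a minimal sketch. It is not the actual FileListCrawler code: the class and method names (RelativePathSketch, relativeBySubstring, relativeByNio) are hypothetical, and it assumes the truncation happens because the root's absolute path already ends in a separator (as a drive root such as "X:\" does), so skipping one extra character eats the first letter of the relative path. The second method shows the kind of Java 7 java.nio.file call the description suggests moving to.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class RelativePathSketch {

    // Hypothetical reconstruction of the fragile approach: skip the root's
    // length plus one character for the separator. This is only correct when
    // rootAbs does NOT already end with a separator.
    static String relativeBySubstring(String rootAbs, String fileAbs) {
        return fileAbs.substring(rootAbs.length() + 1);
    }

    // Java 7 alternative: let java.nio.file work out the relative path,
    // with no manual offset arithmetic.
    static Path relativeByNio(Path root, Path file) {
        return root.relativize(file);
    }

    public static void main(String[] args) {
        // Root without a trailing separator: prints dir1\dir2\file.doc
        System.out.println(
            relativeBySubstring("X:\\data", "X:\\data\\dir1\\dir2\\file.doc"));

        // Drive root that already ends in a separator: prints ir1\dir2\file.doc
        System.out.println(
            relativeBySubstring("X:\\", "X:\\dir1\\dir2\\file.doc"));

        // Same situation with the filesystem root of the current platform
        // ("/" on Unix-like systems, a drive root on Windows).
        Path root = Paths.get("").toAbsolutePath().getRoot();
        Path file = root.resolve("dir1").resolve("dir2").resolve("file.doc");
        System.out.println(relativeByNio(root, file));
    }
}
```

The strings in the first two calls are plain Java strings, so the sketch behaves the same on any platform; only the last call depends on the local filesystem root.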



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
