tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-169) Tika Web Service Servlet
Date Mon, 24 Nov 2008 17:27:44 GMT

    [ https://issues.apache.org/jira/browse/TIKA-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650251#action_12650251 ]

Jukka Zitting commented on TIKA-169:

Another alternative for cross-platform use is the CLI feature:

    # Extracting structured text content from a file
    java -jar tika-0.2-standalone.jar --xml /path/to/file

    # Extracting plain text content from a file
    java -jar tika-0.2-standalone.jar --text /path/to/file

    # Extracting metadata from a file
    java -jar tika-0.2-standalone.jar --metadata /path/to/file
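If another program needs to drive the CLI, it can build the same commands as an argument list, which avoids shell quoting problems with spaces or backslashes in paths on any platform. This is a minimal sketch: the jar name and the `--xml`, `--text` and `--metadata` flags come from the commands above, while `tika_command` and `run_tika` are hypothetical helper names, not part of Tika.

```python
import subprocess
from typing import List

# Output modes of the standalone jar, as listed in the commands above.
MODES = {"xml": "--xml", "text": "--text", "metadata": "--metadata"}


def tika_command(mode: str, path: str, jar: str = "tika-0.2-standalone.jar") -> List[str]:
    """Build the argument list for one CLI extraction (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Passing a list (not a shell string) keeps the path as a single argument.
    return ["java", "-jar", jar, MODES[mode], path]


def run_tika(mode: str, path: str, jar: str = "tika-0.2-standalone.jar") -> str:
    """Run the jar and return its stdout; requires Java and the jar to be present."""
    result = subprocess.run(
        tika_command(mode, path, jar),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `run_tika("text", "/path/to/file")` corresponds to the second command above.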

This way you don't need a separate server process, and there is no risk of unauthorized
users gaining access to your files.

I'm a bit concerned about any web service that allows the client to retrieve the contents
of any file on the local file system. Would it make more sense to always require the client
to upload the files they want parsed?
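To make the concern concrete: with a raw path parameter, a request such as `../../etc/passwd` or an absolute path can reach anything the server process can read. If a path parameter were kept at all, the server would at minimum need to confine it to a configured root directory. The sketch below shows one such check; `resolve_within` and the idea of a configured base directory are assumptions for illustration, not features of the attached servlet, and requiring uploads avoids the problem entirely.

```python
import os


def resolve_within(base_dir: str, requested: str) -> str:
    """Resolve a client-supplied path, rejecting anything outside base_dir (hypothetical helper)."""
    base = os.path.realpath(base_dir)
    # join() discards base if `requested` is absolute; realpath() collapses ".." and symlinks.
    candidate = os.path.realpath(os.path.join(base, requested))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes {base}: {requested!r}")
    return candidate
```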

Also, the file system traversal feature seems a bit outside the scope of Tika, though having
something like this in a contrib area might be nice.
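For reference, the traversal part of the servlet (walk a directory, return one page of documents selected by `start` and `rows`) is small enough to live in such a contrib tool. A minimal sketch, assuming `start` is a zero-based offset and `rows` a page size over a sorted file list; the issue does not define these semantics, and `list_files` is a hypothetical name.

```python
import os
from typing import List


def list_files(root: str, start: int = 0, rows: int = 10) -> List[str]:
    """Return one page of file paths under root, relative to root, in sorted order."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    found.sort()  # deterministic order so successive pages don't overlap
    return found[start:start + rows]
```

Each returned path could then be handed to the parser, for example via the CLI commands above.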

> Tika Web Service Servlet
> ------------------------
>                 Key: TIKA-169
>                 URL: https://issues.apache.org/jira/browse/TIKA-169
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>    Affects Versions: 0.2
>            Reporter: Rida Benjelloun
>            Priority: Minor
>         Attachments: tikaServlet.war
> Tika servlet: uses a file or directory path to build a list of XML documents. The next version
> will allow file upload.
> Usage:
> //Extract document content and metadata
> http://localhost:8080/tikaServlet/?filePath=C:\test&start=0&rows=10
> //Extract metadata
> http://localhost:8080/tikaServlet/?filePath=C:\test&start=0&rows=10&extract=metadata
> //Extract document content
> http://localhost:8080/tikaServlet/?filePath=C:\test&start=0&rows=10&extract=content
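The usage lines above show `C:\test` unescaped in the query string; a client would normally percent-encode it. A minimal sketch of building these URLs: the base URL and the `filePath`, `start`, `rows` and `extract` parameters come from the issue, while `servlet_url` is a hypothetical helper and treating `start` as an offset for paging is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/tikaServlet/"


def servlet_url(file_path: str, start: int = 0, rows: int = 10,
                extract: Optional[str] = None) -> str:
    """Build a request URL for the servlet (hypothetical helper)."""
    params = {"filePath": file_path, "start": start, "rows": rows}
    if extract is not None:  # "metadata" or "content"; omit for both
        params["extract"] = extract
    # urlencode percent-encodes ':' and '\' in the Windows path.
    return BASE_URL + "?" + urlencode(params)
```

For example, `servlet_url("C:\\test")` yields `http://localhost:8080/tikaServlet/?filePath=C%3A%5Ctest&start=0&rows=10`, and `servlet_url("C:\\test", start=10)` would request the next page under the offset assumption.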

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
