lucene-dev mailing list archives

From "Liam O'Boyle (JIRA)" <>
Subject [jira] [Commented] (SOLR-2424) extracted text from tika has no spaces
Date Mon, 30 May 2011 01:48:47 GMT


Liam O'Boyle commented on SOLR-2424:

Hi, sorry for the slow response, I don't seem to be receiving notifications of updates.  

You are correct; I used the Tika 0.9 command line tool, which worked correctly.  When I tried
the 0.8 version, the same problem occurred as described in this ticket, so it appears that
the bug is in Tika and that it is already resolved in the 0.9 release.

I'll try to update the version of Tika in use in my installation, although it's something
that has caused more problems than it has solved when I've tried it in the past.

> extracted text from tika has no spaces
> --------------------------------------
>                 Key: SOLR-2424
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 3.1
>            Reporter: Yonik Seeley
>         Attachments: ET2000 Service Manual.pdf
> Try this:
> curl "http://localhost:8983/solr/update/extract?extractOnly=true&wt=json&indent=true" -F "tutorial=@tutorial.pdf"
> And you get text output w/o spaces: "ThisdocumentcoversthebasicsofrunningSolru"...
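To check extracted text for the symptom quoted above without reading it by eye, one option is to measure how much of the output is whitespace. The following is only a minimal sketch: the helper names `whitespace_ratio` and `looks_unspaced` and the 5% threshold are assumptions made for illustration, not part of Solr or Tika.

```python
def whitespace_ratio(text: str) -> float:
    """Return the fraction of characters in text that are whitespace."""
    if not text:
        return 0.0
    return sum(ch.isspace() for ch in text) / len(text)


def looks_unspaced(text: str, threshold: float = 0.05) -> bool:
    """Heuristic: flag non-empty text whose whitespace share is below threshold.

    The threshold is an assumed value; ordinary English prose usually has a
    much higher share of whitespace than 5%.
    """
    return bool(text) and whitespace_ratio(text) < threshold


if __name__ == "__main__":
    # Sample string taken from the ticket description.
    sample = "ThisdocumentcoversthebasicsofrunningSolru"
    print(looks_unspaced(sample))
```

The same check could be applied to the text returned by the extract request or by the Tika command line tool, to compare the output of different Tika versions.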

This message is automatically generated by JIRA.