lucene-solr-user mailing list archives

From Gary Taylor ...@inovem.com>
Subject Re: Extracting contents of zipped files with Tika and Solr 1.4.1
Date Mon, 11 Apr 2011 10:12:30 GMT
Jayendra,

Thanks for the info - been keeping an eye on this list in case this 
topic cropped up again.  It's currently a background task for me, so 
I'll try and take a look at the patches and re-test soon.

Joey - glad you brought this issue up again.  I haven't progressed any 
further with it.  I've not yet moved to Solr 3.1 but it's on my to-do 
list, as is testing out the patches referenced by Jayendra.  I'll post 
my findings on this thread - if you manage to test the patches before 
me, let me know how you get on.

Thanks and kind regards,
Gary.


On 11/04/2011 05:02, Jayendra Patil wrote:
> The migration of Tika to the latest 0.8 version seems to have
> reintroduced the issue.
>
> I was able to get this working again with the following patches. (Solr
> Cell and Data Import handler)
>
> https://issues.apache.org/jira/browse/SOLR-2416
> https://issues.apache.org/jira/browse/SOLR-2332
>
> You can try these.
>
> Regards,
> Jayendra
>
> On Sun, Apr 10, 2011 at 10:35 PM, Joey Hanzel <phanzel@nearinfinity.com> wrote:
>> Hi Gary,
>>
>> I have been experiencing the same problem... Unable to extract content from
>> archive file formats.  I just tried again with a clean install of Solr 3.1.0
>> (using Tika 0.8) and continue to experience the same results.  Did you have
>> any success with this problem with Solr 1.4.1 or 3.1.0?
>>
>> I'm using this curl command to send data to Solr.
>> curl "http://localhost:8080/solr/update/extract?literal.id=doc1&fmap.content=attr_content&commit=true" \
>>   -H "application/octet-stream" -F "myfile=@data.zip"
>>
>> No problem extracting single rich text documents, but archive files only
>> result in the file names within the archive being indexed. Am I missing
>> something else in my configuration? Solr doesn't seem to be unpacking the
>> archive files. Based on the email chain associated with your first message,
>> some people have been able to get this functionality to work as desired.
>>
>
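
For anyone re-testing, the request URL in Joey's curl command can be rebuilt programmatically so the parameters are easy to vary between runs. This is only a rough sketch: the helper name build_extract_url and its defaults are made up for illustration, and the host, port and field names are simply the ones from the quoted command. It builds the URL only and does not send anything to Solr.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, content_field="attr_content", commit=True):
    """Build a /update/extract URL using the parameters from the quoted curl command.

    build_extract_url is a hypothetical helper, not part of Solr or Tika.
    """
    params = {
        "literal.id": doc_id,           # unique id assigned to the uploaded document
        "fmap.content": content_field,  # field that receives the extracted content
        "commit": "true" if commit else "false",
    }
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


# Assumed base URL, taken from the quoted command.
url = build_extract_url("http://localhost:8080/solr", "doc1")
print(url)
```

The printed URL can be pasted into the curl command in place of the hand-written one, keeping the -F "myfile=@data.zip" part unchanged.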


-- 
Gary Taylor
INOVEM

Tel +44 (0)1488 648 480
Fax +44 (0)7092 115 933
gary.taylor@inovem.com
www.inovem.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE

