manifoldcf-user mailing list archives

From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject [TIP] Workaround for Solr bugs when Indexing Solr 1.4.1
Date Wed, 30 Mar 2011 15:39:14 GMT

Solr 1.4.1 has several bugs that make it difficult to deploy MCF on an 
application server such as Resin. I struggled a lot with some of these 
bugs and decided to share my experience in case others run into the 
same problems.

First, I found that I had to upgrade Tika to version 0.8 in order to 
extract the content of MS Office documents etc. Solr 1.4.1 ships with 
Tika 0.4, which will not work for this:
https://issues.apache.org/jira/browse/SOLR-1902

Here you have basically two options:
1. Install the following branch:
http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.4/
2. Install the latest version from trunk (not recommended for production 
use).

Then I found that dates were not parsed correctly. 
ExtractingRequestHandler lets you specify different date formats, as 
in the following example:
<lst name="date.formats">
   <str>yyyy-MM-dd</str>
   <str>dd.MM.yyyy</str>
</lst>

This will cause a lazy loading error due to the following bug:
https://issues.apache.org/jira/browse/SOLR-1756
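
To illustrate what the two patterns in the date.formats example match, 
here is a minimal sketch. It assumes the patterns follow 
java.text.SimpleDateFormat syntax, which is how they appear to be 
written. The class and method names are made up for this example and 
are not part of Solr; this is not how Solr itself is implemented.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Solr: tries each configured pattern in turn.
final class DateFormatSketch {

    // The same two patterns as in the date.formats example.
    static final List<String> PATTERNS = Arrays.asList("yyyy-MM-dd", "dd.MM.yyyy");

    // Returns the parsed date, or null if no pattern matches.
    static Date parseWithPatterns(String value) {
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
            // Assumption for this sketch: interpret dates as UTC and reject
            // out-of-range values such as month 13.
            format.setTimeZone(TimeZone.getTimeZone("UTC"));
            format.setLenient(false);
            try {
                return format.parse(value);
            } catch (ParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parseWithPatterns("2011-03-30"));
        System.out.println(parseWithPatterns("30.03.2011"));
        System.out.println(parseWithPatterns("not a date"));
    }
}
```

Both "2011-03-30" and "30.03.2011" should resolve to the same date 
here, while a value matching neither pattern yields null.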

There are two possible workarounds:
1. Install the branch mentioned above and then install the following patch:
https://issues.apache.org/jira/secure/attachment/12434831/SOLR-1756.patch
2. Install the latest version from trunk.

Remember to rebuild Solr and place the necessary jar files 
(apache-solr-cell*.jar, Tika and its dependencies) in a separate 
folder which your application server has access to.

Erlend

-- 
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
