manifoldcf-user mailing list archives

From Erlend Garåsen
Subject [TIP] Workaround for Solr bugs when Indexing Solr 1.4.1
Date Wed, 30 Mar 2011 15:39:14 GMT

Solr 1.4.1 has several bugs which make it difficult to deploy MCF on an 
application server such as Resin. I have struggled a lot with some of 
these bugs and decided to share my experiences in case others have the 
same problems.

First I figured out that I had to upgrade Tika to version 0.8 in order 
to extract the content of MS Office documents etc. Solr 1.4.1 ships with 
Tika 0.4, which will not work:

Here you have basically two options:
1. Install the following branch:
2. Install the latest version from trunk (not recommended for production use).

Then I figured out that I couldn't parse dates correctly. 
ExtractingRequestHandler lets you specify additional date formats, as in 
the following example:
<lst name="date.formats">
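
For context, a complete date.formats section in solrconfig.xml might look 
like the sketch below. The handler path /update/extract and the two date 
patterns are illustrative assumptions, not values from the original 
message; the patterns follow Java SimpleDateFormat syntax.

```xml
<!-- Sketch only: handler name and patterns are assumptions -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                startup="lazy">
  <!-- Optional list of date formats to try when parsing date fields -->
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
    <str>yyyy-MM-dd'T'HH:mm:ss'Z'</str>
  </lst>
</requestHandler>
```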

This will cause a lazy loading error due to the following bug:

You have the following workarounds:
1. Install the branch mentioned above and then install the following patch:
2. Install the latest version from trunk.

Remember to rebuild Solr and place the necessary jar files in a separate 
folder which your application server has access to 
(apache-solr-cell*.jar, Tika and its dependencies).
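
One possible way to make such a folder visible to Solr is a lib directive 
in solrconfig.xml, sketched below. This assumes your Solr build supports 
lib directives, and the folder /opt/solr/extraction-lib is a hypothetical 
path, not one from the original message. Your application server may need 
its own classpath configuration instead.

```xml
<!-- Sketch only: /opt/solr/extraction-lib is a hypothetical folder that
     holds apache-solr-cell*.jar, Tika and its dependencies -->
<lib dir="/opt/solr/extraction-lib" />
```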


Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
