manifoldcf-user mailing list archives

From Karl Wright <>
Subject Re: Indexing Solr with the web crawler
Date Mon, 24 Jan 2011 13:48:28 GMT
On Mon, Jan 24, 2011 at 8:40 AM, Erlend Garåsen <> wrote:
> On 21.01.11 17.38, Karl Wright wrote:
>> I will not be talking about ManifoldCF at this year's conference, most
>> likely, because the conference conflicts with my daughter's college
>> graduation.  Sorry about that!
> I'm not sure when the conference will be held anyway - I guess the date is
> not officially published yet.

I received the email last week.  The conference is currently set for
May 18 and 19 in San Francisco.

>> I hadn't heard that they removed the extracting update request handler
>> from Solr.  That's unfortunate.  Please let me know how hard you find
>> it to install the jar, and I'll update the instructions accordingly.
> It's finally working, but not perfectly. Here's what I had to do:
> - Run "ant example"
> - Create a <solr.home>/lib directory
> - Place all jars in contrib/extraction/lib/ and contrib/extraction/build/
> into this lib folder.
> I also had to use the schema.xml file from the example. My own schema
> configuration is different, so I guess I need to adapt it later. Content is
> missing, title is not. And maybe I need to create my own request handler in
> order to implement language detection. I will try to dive deeper into all
> the configuration settings.
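
For reference, the jar-copying steps quoted above could be scripted
roughly as follows. This is only a sketch: the function name
install_extraction_jars and both of its arguments are made up for
illustration, and it assumes the jars sit directly inside the two
contrib/extraction directories of the Solr checkout.

```shell
#!/bin/sh
# Sketch only: copy the extraction contrib jars into <solr.home>/lib.
# The function name and both arguments are hypothetical placeholders,
# not names taken from the thread or from Solr itself.

install_extraction_jars() {
    solr_src=$1    # root of the Solr checkout (assumed to contain contrib/extraction)
    solr_home=$2   # your <solr.home> directory

    # Create <solr.home>/lib if it does not exist yet.
    mkdir -p "$solr_home/lib" || return 1

    # Copy every jar found directly in the two contrib/extraction directories.
    for dir in "$solr_src/contrib/extraction/lib" \
               "$solr_src/contrib/extraction/build"; do
        for jar in "$dir"/*.jar; do
            # Skip the unexpanded pattern when a directory holds no jars.
            [ -e "$jar" ] || continue
            cp "$jar" "$solr_home/lib/" || return 1
        done
    done
}

# Usage, after "ant example" has been run in the checkout:
#   install_extraction_jars /path/to/solr /path/to/solr.home
```

Restart Solr afterwards so that the new lib directory is picked up.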

Thanks for the information.
What I'd like to do is wait until your research is done and then post
the rough instructions to for confirmation that
your approach is the preferred one.  I'd also like to know whether you
have any of these problems if you check out the latest Solr release
from the svn tag and just build it.  I've been building Solr/Lucene
trunk rather than using the binary distribution, which may be why I
never noticed that this has gone away in the main distribution.

Thanks again!