manifoldcf-user mailing list archives

From Adam LaPila <Adam.LaP...@lmal.com.au>
Subject RE: Problem connecting to Solr 3.5
Date Tue, 06 Dec 2011 22:55:49 GMT
Hello, I had the same problem a couple of weeks ago when working with Solr 3.4 and MCF 0.3.
The solrconfig.xml is where you need to look.

If there isn't anything showing up in the Solr log, check the ManifoldCF log instead. If you
are running the example, it will be somewhere like C:\mcf\dist\example\logs.
Something should be showing up in there; you will probably see an error along the lines of
"a class cannot be found".
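If the ManifoldCF log is long, a short script can pull out the missing-class errors. This is a minimal sketch, assuming the logs are plain-text files ending in .log and that the error appears as a Java ClassNotFoundException or NoClassDefFoundError; find_class_errors is a hypothetical helper, not part of MCF:

```python
from pathlib import Path

# Assumed markers: the usual Java errors behind "a class cannot be found".
MARKERS = ("ClassNotFoundException", "NoClassDefFoundError")


def find_class_errors(log_dir):
    """Return (file name, line) pairs for log lines that mention a missing class."""
    hits = []
    # Assumption: the log files end in .log; other files are skipped.
    for log_file in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps the scan going if a log contains odd bytes
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if any(marker in line for marker in MARKERS):
                    hits.append((log_file.name, line.rstrip("\n")))
    return hits
```

For example, find_class_errors(r"C:\mcf\dist\example\logs") returns the matching lines so you can see which class Solr or MCF failed to load.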

In your Solr home directory, create a lib folder.

Copy all the .jar files from \extraction\lib into the lib directory in your Solr home.
Then copy all the .jar files from \dist (you should see apache-solr-cell, the Solr core jar and
more in there) into the same lib directory in your Solr home.
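The folder and copy steps above can be scripted. A minimal sketch, assuming you pass in the \extraction\lib and \dist directories of your Solr download and your Solr home directory; copy_jars is a hypothetical helper:

```python
import shutil
from pathlib import Path


def copy_jars(source_dirs, solr_home):
    """Copy every .jar from each source directory into <solr_home>/lib."""
    lib_dir = Path(solr_home) / "lib"
    # Create the lib folder if it is not there yet
    lib_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in source_dirs:
        # Only .jar files are copied; READMEs and licences are left behind
        for jar in sorted(Path(source).glob("*.jar")):
            shutil.copy2(jar, lib_dir / jar.name)
            copied.append(jar.name)
    return copied
```

If two source directories contain a jar with the same name, the later directory's copy wins.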

Now, in your solrconfig.xml, where all your libs are included, keep just
<lib dir="./lib" /> and get rid of the other lib directives.
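If you would rather not hand-edit solrconfig.xml, the same change can be made with Python's standard ElementTree. This is a sketch, assuming the <lib> directives are direct children of the <config> root element; keep_only_local_lib is a hypothetical helper. Comments are kept, but the XML declaration and some whitespace are not, so compare the output with your file before replacing it:

```python
import xml.etree.ElementTree as ET


def keep_only_local_lib(config_xml):
    """Replace every <lib> directive under <config> with a single <lib dir="./lib"/>."""
    # insert_comments=True keeps the comments inside the root element
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(config_xml, parser=parser)
    libs = root.findall("lib")
    # Put the replacement where the first old directive was (or at the top)
    position = list(root).index(libs[0]) if libs else 0
    for lib in libs:
        root.remove(lib)
    root.insert(position, ET.Element("lib", {"dir": "./lib"}))
    return ET.tostring(root, encoding="unicode")
```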

What I found was that when a regex is specified in addition to a directory, only the
files in that directory which completely match the regex are used. So Solr wasn't picking up
the extraction lib or the .jars that were newer than 3.0 (or whatever the previous version of
Solr was).
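The "completely match" rule is easy to misread. The sketch below models it with Python's re.fullmatch, which has the same anchored-at-both-ends meaning; jars_loaded, the file names and the patterns are illustrative, and Solr itself applies Java regexes, not Python ones:

```python
import re


def jars_loaded(filenames, regex=None):
    """Model which files a <lib dir="..." regex="..."/> directive picks up.

    With no regex, every file in the directory is used; with a regex, only
    names that match it completely (anchored at both ends) are used.
    """
    if regex is None:
        return list(filenames)
    pattern = re.compile(regex)
    return [name for name in filenames if pattern.fullmatch(name)]
```

With names like apache-solr-cell-3.5.0.jar and tika-core-0.10.jar, the pattern r"apache-solr-cell-\d.*\.jar" keeps only the cell jar, and a partial pattern such as r"solr-cell" keeps nothing at all, because it never matches a whole file name.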

Stop the Jetty service and restart it.

Go into MCF and start the job again.

Check your Solr logs; you should see something new in there now.

After this I found that all the documents in my file system were indexed into Solr.

Also, I recommend you get your hands on a tool called Luke. It's great for analysing your
Lucene/Solr indexes and making sure everything is in there the way you want it to be.

Hope that helps.

Adam.


-----Original Message-----
From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, 7 December 2011 9:11 AM
To: connectors-user@incubator.apache.org
Subject: Re: Problem connecting to Solr 3.5

Are you sure you set the job up with the Solr output connection as the output?  It sounds
like it might be going to a null output connection or some such.  Alternatively, there are
a number of ways to filter documents out of your crawl, and you may have set one of them incorrectly.
For example, the document inclusion/exclusion tabs in the web job also have indexing inclusion/exclusion
fields; if those regexps are not correct you may wind up discarding everything you crawl.
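To see how one slightly wrong pattern can throw away the whole crawl, here is a simplified model. It assumes a document is kept only if it matches at least one inclusion regex and no exclusion regex, with patterns matched anywhere in the URL; urls_to_index is hypothetical and is not how the web connector is actually implemented:

```python
import re


def urls_to_index(urls, include=(), exclude=()):
    """Hypothetical include/exclude filter: keep a URL only if it matches
    at least one inclusion pattern and no exclusion pattern."""
    kept = []
    for url in urls:
        # re.search: the pattern may match anywhere in the URL (an assumption)
        included = any(re.search(p, url) for p in include)
        excluded = any(re.search(p, url) for p in exclude)
        if included and not excluded:
            kept.append(url)
    return kept
```

With include=[r"^https://example\.com/"] and a site served over plain http, nothing survives, which from the Solr side looks exactly like documents never being submitted.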

There is also a Documents tab in your Solr connection configuration that lets you filter
the documents sent to Solr by MIME type.
Make sure the MIME types you are interested in are in the list, OR make sure the list is empty.
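The Documents tab rule can be read as: an empty list accepts everything, otherwise the document's MIME type has to be in the list. As a hypothetical helper (exact string comparison is an assumption):

```python
def should_send(mime_type, allowed_mime_types):
    """Hypothetical Documents-tab check: an empty list accepts every MIME type;
    otherwise the document's MIME type must appear in the list."""
    if not allowed_mime_types:
        return True
    return mime_type in allowed_mime_types
```

So a list containing only application/pdf silently drops Word and HTML documents, while an empty list lets everything through.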

FWIW, if you mess up configuration with the Solr connector, it is
*not* silent.  You'd see errors all over the place.  So that's probably not the issue.

Karl

On Tue, Dec 6, 2011 at 4:37 PM, Michael Kelleher <mj.kelleher@gmail.com> wrote:
> I have a Solr output connector setup:
>    server settings:
>        http
>        localhost
>        8983
>
>        solr
>        <my core name here>
>        /update/extract
>        /update
>        /admin/ping
>
>        everything else blank
>
>    Schema
>        id
>
>    All other tabs/configurations are empty
>
>    Solr does indeed have /update and /update/extract configured.
>
>
> After MCF visits the documents, they never get submitted to Solr.
> There are no exceptions in the log, and there is no activity on the Solr side either.
>
> I am positive I must have missed/configured something wrong, but not
> sure what.
>
> Is anyone else using MCF 0.3 with Solr 3.5?
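One way to sanity-check settings like the quoted ones is to build the request URLs by hand and open the status URL in a browser. This is a sketch, assuming the connector joins protocol, server, port, web application name and core into a base URL and appends the handler paths, and assuming the three paths above are the update, remove and status handlers in that order; solr_urls is hypothetical:

```python
def solr_urls(protocol, server, port, webapp, core, update, remove, status):
    """Hypothetical: join connector server settings into request URLs."""
    base = f"{protocol}://{server}:{port}/{webapp.strip('/')}"
    # The core name is optional; a blank one leaves the base URL unchanged
    if core:
        base += "/" + core.strip("/")
    return {
        "update": base + update,
        "remove": base + remove,
        "status": base + status,
    }
```

For example, solr_urls("http", "localhost", 8983, "solr", "mycore", "/update/extract", "/update", "/admin/ping")["status"] gives "http://localhost:8983/solr/mycore/admin/ping", where mycore stands in for your core name.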

