manifoldcf-user mailing list archives

From Ronny Heylen <ronnyhey...@gmail.com>
Subject Re: Scheduler not working as we expected
Date Tue, 25 Sep 2018 08:21:37 GMT
Hi,
We have been using SOLR for a few years, and the server has now been
transferred to the VMs in our HQ (and reinstalled).
We are now having the following issue:
Forcing SOLR indexation by curl works, as we can see from:
curl "http://gbsloappwp0083.corp.qbe.com:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@z:\qbere_bru\common\testsolr.txt"
which has successfully indexed testsolr.txt, as can be checked by:
http://gbsloappwp0083.corp.qbe.com:8080/solr/collection1/select?q=ella
giving:
<result name="response" numFound="1" start="0">
Searching for john returns 0 files:
http://gbsloappwp0083.corp.qbe.com:8080/solr/collection1/select?q=john
<result name="response" numFound="0" start="0"/>
and searching with the wildcard * also returns 1 file:
http://gbsloappwp0083.corp.qbe.com:8080/solr/collection1/select?q=*
<result name="response" numFound="1" start="0">
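The numFound checks above can also be done programmatically rather than by eye. The following is a minimal sketch (the helper name and the closed-tag sample fragments are ours, modeled on the responses shown above) of parsing numFound out of a Solr XML select response:

```python
import xml.etree.ElementTree as ET

def num_found(solr_response_xml):
    """Extract numFound from the <result name="response" ...> element
    of a Solr XML select response (or from a bare <result> fragment)."""
    root = ET.fromstring(solr_response_xml)
    if root.tag == "result":
        result = root
    else:
        # Full responses wrap the result element inside <response>.
        result = root.find(".//result[@name='response']")
    return int(result.get("numFound"))

# Fragments matching the responses shown above:
print(num_found('<result name="response" numFound="1" start="0"/>'))  # 1
print(num_found('<result name="response" numFound="0" start="0"/>'))  # 0
```

Fetching the response itself would still go through the same select URLs shown above.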

However, launching a job from ManifoldCF doesn't seem to work.
We see the folder names in the file specification, and the job appears to
index documents, but the SOLR API:
http://gbsloappwp0083.corp.qbe.com:8080/solr/collection1/select?q=*
still returns only 1 file, the one we indexed manually.

If anybody has any suggestion, we would be really grateful.

Ronny.Heylen@qbere.com



On Tue, 31 Jul 2018 at 12:12, Karl Wright <daddywri@gmail.com> wrote:

> Hi Vinay,
>
> Dynamic rescan is meant for web-crawling and revisits already crawled
> documents based on how often they have changed in the past.  It is
> therefore wholly inappropriate for something like a file crawl, since
> directory contents (one of the kinds of documents there are in a file
> crawl) change very infrequently.
>
> Instead, I recommend that you run complete crawls, non-dynamic.  You can
> even run minimal crawls fairly often, which will pick up new and changed
> documents, and run non-minimal crawls on a less frequent schedule to
> capture deletions.
>
> Thanks,
> Karl
>
>
> On Tue, Jul 31, 2018 at 4:05 AM VINAY Bengaluru <vinaybs.20@gmail.com>
> wrote:
>
>> Hi Karl,
>>                We have set up a scheduler for our jobs with input
>> connector as file system and output connector as Solr.
>> We have set up a scheduler as follows :
>> Schedule type: Rescan documents dynamically
>> Recrawl interval: blank
>> Schedule time: appropriate times with job invocation as complete.
>>
>> We see that the job is not picking up documents at the scheduled
>> intervals.
>>
>> Why the job doesn't pickup new docs at the scheduled interval? Anything
>> wrong with our job configuration or our understanding?
>>
>> Thanks and regards,
>> Vinay
>>
>>
