manifoldcf-user mailing list archives

From Beelz Ryuzaki <i93oth...@gmail.com>
Subject Re: Question about ManifoldCF 2.8
Date Wed, 30 Aug 2017 15:58:26 GMT
I'm actually not using ZooKeeper. How is ZooKeeper-based sync different
from file-based sync? I also need guidance on how to manage my PC's
memory. How many GB should I allocate for the start-agent of ManifoldCF? Is
4 GB enough to crawl 35K files?

Othman.

On Wed, 30 Aug 2017 at 16:11, Karl Wright <daddywri@gmail.com> wrote:

> Your disk is not writable for some reason, and that's interfering with
> ManifoldCF 2.8 locking.
>
> I would suggest two things:
>
> (1) Use Zookeeper for sync instead of file-based sync.
> (2) Have a look if you still get failures after that.
>
> Thanks,
> Karl
>
>
> On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki <i93othman@gmail.com>
> wrote:
>
>> Hi Mr Karl,
>>
>> Thank you Mr Karl for your quick response. I have looked into the
>> ManifoldCF log file and extracted the following warnings :
>>
>> - Attempt to set file lock
>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase)
>> Synapses.lock' failed : Access is denied.
>>
>>
>> - Couldn't write to lock file; disk may be full. Shutting down process;
>> locks may be left dangling. You must cleanup before restarting.
>>
>> "ES (lowercase) synapses" is the Elasticsearch output connection.
>> Moreover, the job uses Tika to extract metadata and a file system as a
>> repository connection. During the job, I don't extract the content of the
>> documents. I was wondering whether the issue comes from Elasticsearch?
>>
>> Othman.
>>
>>
>>
>> On Wed, 30 Aug 2017 at 14:08, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Othman,
>>>
>>> ManifoldCF aborts a job if there's an error that looks like it might go
>>> away on retry, but does not.  It can be either on the repository side or on
>>> the output side.  If you look at the Simple History in the UI, or at the
>>> manifoldcf.log file, you should be able to get a better sense of what went
>>> wrong.  Without further information, I can't say any more.
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm Othman Belhaj, a software engineer at Société Générale in France.
>>>> I'm currently using the recent ManifoldCF 2.8 release. I'm working on
>>>> an internal search engine, so I'm using ManifoldCF to index documents
>>>> on Windows shares. I encountered a serious problem while crawling 35K
>>>> documents. Most of the time, when ManifoldCF starts crawling a large
>>>> document (19 MB, for example), it ends the job with the following
>>>> error: repeated service interruptions - failure processing document :
>>>> software caused connection abort: socket write error.
>>>> Can you give me some tips on how to solve this problem, please?
>>>>
>>>> I use PostgreSQL 9.3.x and Elasticsearch 2.1.0.
>>>> I'm looking forward to your response.
>>>>
>>>> Best regards,
>>>>
>>>> Othman BELHAJ
>>>>
>>>
>>>
>

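The "Access is denied" warning quoted above suggests that the synch area may not be writable by the account that runs the agents process. One way to check this independently of ManifoldCF is a small write test, sketched here in Python. The function name `can_write` and the file prefix are illustrative only and are not part of ManifoldCF; the check simply tries to create, write, and delete a temporary file in the given directory.

```python
import tempfile


def can_write(directory):
    """Return True if a temporary file can be created and written in `directory`.

    Hypothetical helper for diagnosing lock-file failures; not a ManifoldCF API.
    """
    try:
        # The file is created inside `directory` and deleted again on close.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="write-test-") as handle:
            handle.write(b"lock test")
            handle.flush()
        return True
    except OSError:
        # Covers a missing directory, denied access, and a full disk.
        return False
```

Call `can_write` with the path of the synch area from the log message, running it under the same account as the agents process. A `False` result would be consistent with the warning, but it does not say whether permissions, an antivirus lock, or a full disk is the cause.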