manifoldcf-user mailing list archives

From Beelz Ryuzaki <i93oth...@gmail.com>
Subject Re: Question about ManifoldCF 2.8
Date Fri, 01 Sep 2017 09:42:09 GMT
Hi Karl,

Thank you very much for your help, I'm going to try out the zookeeper
example. Should I initialize a new database? And how can I run the
zookeeper start-agent ?

Othman.

On Fri, 1 Sep 2017 at 11:37, Karl Wright <daddywri@gmail.com> wrote:

> Hi Othman,
>
> These exceptions are now coming from file locking and are due to
> permissions problems.  I suggest you go to Zookeeper for file locking.
>
> I am building a 2.8.1 release candidate.  When it is available for download,
> I'll send you the URL.
>
> Thanks,
> Karl
>
>
> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>
>> Hi Karl,
>>
>> This morning, I have followed the steps you told me to do and I still got
>> stack traces. I have attached the stack traces as well as the content of my
>> lib directory and options.env.
>> I have installed zookeeper and I'm ready to use the zookeeper example.
>> Could you guide me through it? I don't know whether following the same
>> steps as in the file-based example will get rid of the stack traces.
>>
>> Thanks,
>> Othman
>>
>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Please do the following:
>>>
>>> (0) Shut down all ManifoldCF processes.
>>> (1) Move poi*.jar from connector-common-lib to lib.
>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>> (3) Move commons-collections4*.jar from connector-common-lib to lib.
>>> (4) Move xmlbeans*.jar from connector-common-lib to lib.
>>> (5) Move curvesapi*.jar from connector-common-lib to lib.
>>> (6) Modify your options.env to include all of the jars you moved.
>>> (7) Start up all ManifoldCF processes.
>>> (8) If you still get stack traces, please send them to me.
>>>
>>> Karl
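
Steps (1) through (6) above can be scripted. The following is a minimal Python sketch and not part of ManifoldCF: the function names `move_jars` and `classpath_entries` are hypothetical, and the `../lib/` prefix and `;` separator are assumptions about how a Windows options.env file lists its classpath, so check them against your own file. Run it only while all ManifoldCF processes are shut down.

```python
import shutil
from pathlib import Path

# Glob patterns from steps (1)-(5); xmlbeans is matched as a .jar file.
JAR_PATTERNS = ["poi*.jar", "dom4j*.jar", "commons-collections4*.jar",
                "xmlbeans*.jar", "curvesapi*.jar"]

def move_jars(install_dir, patterns=JAR_PATTERNS):
    """Move matching jars from connector-common-lib to lib; return their names."""
    src = Path(install_dir) / "connector-common-lib"
    dst = Path(install_dir) / "lib"
    moved = []
    for pattern in patterns:
        # sorted() materializes the matches before any file is moved.
        for jar in sorted(src.glob(pattern)):
            shutil.move(str(jar), str(dst / jar.name))
            moved.append(jar.name)
    return moved

def classpath_entries(jar_names, prefix="../lib/", sep=";"):
    """Join jar names into classpath text (prefix and separator are assumptions)."""
    return sep.join(prefix + name for name in jar_names)
```

Calling `classpath_entries(move_jars(install_dir))` gives the text that would still have to be added to options.env by hand, which corresponds to step (6).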
>>>
>>>
>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <i93othman@gmail.com>
>>> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> By 'other place', do you mean the \lib directory? If so, then I
>>>> have already tried it and it didn't work.
>>>>
>>>> Othman.
>>>>
>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>>> Hi Othman,
>>>>>
>>>>> I used the java dependency inspector to see what the issue is and it
>>>>> turns out that poi-ooxml.jar does refer back to poi.jar in the class that
>>>>> is failing.  So you will need to move poi-3.15.jar and
>>>>> commons-collections4-4.1.jar to the other place as well.
>>>>>
>>>>> Let's hope that finally fixes this issue.
>>>>>
>>>>> I'm very unhappy about the quality of the POI project code; it is
>>>>> definitely not using reasonable engineering practices, and I will be
>>>>> opening a ticket with them.
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I'm using the file-based example, and I applied all the changes you told
>>>>>> me to make there. I'll try to install zookeeper and use the zookeeper
>>>>>> example. Will I need to do any configuration in order to run the
>>>>>> zookeeper example?
>>>>>>
>>>>>> Othman.
>>>>>>
>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>
>>>>>>> Are you using the zookeeper example, or the file-based example?
>>>>>>>
>>>>>>> If these jars have all been moved, and the options.env includes
>>>>>>> them, then I have to conclude that Apache POI's pom.xml is incorrect too.
>>>>>>> It will take a while to figure out what's missing that poi-ooxml.jar needs
>>>>>>> that is not listed.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <i93othman@gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> All the dependencies you mentioned have already been added in the
>>>>>>>> options.env.win file in the multiprocess-file-example directory.
>>>>>>>>
>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Yes, I added it in the options.env.win file. Should it be the one
>>>>>>>>> in the multiprocess-zk-example directory or multiprocess-file-example?
>>>>>>>>>
>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <
>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Could it be a problem with the Elasticsearch version? I'm actually
>>>>>>>>>>> using 2.1.0, which is pretty old for this new version of ManifoldCF.
>>>>>>>>>>>
>>>>>>>>>>> Othman.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I moved back both the jars you mentioned and a different error is
>>>>>>>>>>>> showing. You will find the stack trace attached.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Othman
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I've looked at the dependencies; you should not have moved
>>>>>>>>>>>>> poi-3.15.jar.  Please move that back, and commons-collections4-4.1.jar too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <
>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar must
>>>>>>>>>>>>>> also be included.  This would mean that curvesapi-1.04.jar and
>>>>>>>>>>>>>> commons-collections4-4.1.jar should also be included.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I added the two jars that you have mentioned and another one
>>>>>>>>>>>>>>> : poi-3.15.jar . Unfortunately, there is another error showing. This time,
>>>>>>>>>>>>>>> it concerns excel files. You will find attached the stack trace.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <
>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls back into
>>>>>>>>>>>>>>>> another jar, which will also need to be moved.  *That* jar has yet another
>>>>>>>>>>>>>>>> dependency too.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You will find attached the stack trace. My apologies for
>>>>>>>>>>>>>>>>> the bad quality of the image, I'm doing my best to send you the stack trace
>>>>>>>>>>>>>>>>> as I don't have the right to send documents outside the company.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <
>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the
>>>>>>>>>>>>>>>>>> problem is.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked into
>>>>>>>>>>>>>>>>>>> the log file and saw the following error:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Othman.
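
A `NoClassDefFoundError` like the one above names a class that no jar visible to the classloader provides. One way to see which jar in a given directory ships the class is to search the archives directly, since a jar is a zip file. This is a minimal Python sketch; the function name `jars_containing` is hypothetical and the directories to search are whatever you pass in.

```python
import zipfile
from pathlib import Path

def jars_containing(class_name, directories):
    """Return paths of jars in the given directories that contain class_name."""
    # Accept either dotted or slash-separated class names.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for directory in directories:
        for jar in sorted(Path(directory).glob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as archive:
                    if entry in archive.namelist():
                        hits.append(jar)
            except zipfile.BadZipFile:
                continue  # skip files that are not valid archives
    return hits
```

For example, `jars_containing("org/apache/poi/POIXMLTypeLoader", ["lib", "connector-common-lib"])` lists the jars in those two directories that hold the missing class, which shows whether the jar sits in the directory the failing code can see.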
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do and, as you expected,
>>>>>>>>>>>>>>>>>>>> the crawling resumed. How about the regular expressions? How can I write
>>>>>>>>>>>>>>>>>>>> complex regular expressions in the job's Paths tab?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it
>>>>>>>>>>>>>>>>>>>>> works.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <
>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env
>>>>>>>>>>>>>>>>>>>>>> files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround you
>>>>>>>>>>>>>>>>>>>>>>> could try.  Specifically:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from
>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl resumes.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external Tika
>>>>>>>>>>>>>>>>>>>>>>>> server transformer rather than the embedded Tika Extractor.  I'm still
>>>>>>>>>>>>>>>>>>>>>>>> looking into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary version,
>>>>>>>>>>>>>>>>>>>>>>>>> and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see it in
>>>>>>>>>>>>>>>>>>>>>>>>> the attached file. For your information, the job started yesterday.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is
>>>>>>>>>>>>>>>>>>>>>>>>>> missing.
>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this, if
>>>>>>>>>>>>>>>>>>>>>>>>>> you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For
>>>>>>>>>>>>>>>>>>>>>>>>>>> security reasons, I can't send any files from my computer. I have copied
>>>>>>>>>>>>>>>>>>>>>>>>>>> the stack trace and scanned it with my cellphone. I hope it will be
>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful. Meanwhile, I have read the documentation about how to restrict the
>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling and I don't think the '|' works in the specified. For instance, I
>>>>>>>>>>>>>>>>>>>>>>>>>>> would like to restrict the crawling to the documents that contain the
>>>>>>>>>>>>>>>>>>>>>>>>>>> word 'sound'. I proceed as follows: *(SON)*. The document is in capital
>>>>>>>>>>>>>>>>>>>>>>>>>>> letters and I noticed that it wasn't taken into consideration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the windows
>>>>>>>>>>>>>>>>>>>>>>>>>>>> share connector is by specifying information on the "Paths" tab in jobs
>>>>>>>>>>>>>>>>>>>>>>>>>>>> that crawl windows shares.  There is end-user documentation both online and
>>>>>>>>>>>>>>>>>>>>>>>>>>>> distributed with all binary distributions that describes how to do this.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response, I will start
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using zookeeper and I will let you know if it works. I have another
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> question to ask. Actually, I need to make some filters while crawling. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't want to crawl some files and some folders. Could you give me an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example of how to use the regex? Does the regex syntax allow using /i to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ignore case?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
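
On the general regex question: the `/i` suffix belongs to Perl- and JavaScript-style regex literals. Engines such as Java's `java.util.regex` and Python's `re` instead accept the inline flag `(?i)` at the start of the pattern, and `|` separates alternatives. The thread does not confirm what syntax the Paths tab accepts, so the sketch below only illustrates regex behaviour in Python; the pattern and the function name `matches` are made up for the example.

```python
import re

# (?i) makes the whole pattern case-insensitive; '|' separates alternatives.
pattern = re.compile(r"(?i).*(son|sound).*")

def matches(name):
    """Return True if the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None
```

With this pattern, a name in capital letters that contains SON matches just as a lowercase one would.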
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> often have problems with getting file permissions right, and they do not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> understand how to shut processes down cleanly, and zookeeper is resilient
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> against that.  I highly recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into memory so you do not need huge amounts of memory.  The default values
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are more than enough for 35,000 files, which is a pretty small job for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. I want to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know how zookeeper is different from file-based sync. I also need
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> guidance on how to manage my PC's memory. How many GB should I allocate for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the start-agent of ManifoldCF? Is 4 GB enough to crawl 35K files?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get failures
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your quick response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have looked into the ManifoldCF log file and extracted the following
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> warnings :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> full. Shutting down process; locks may be left dangling. You must cleanup
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch output connection. Moreover, the job uses Tika to extract
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata and a file system as a repository connection. During the job, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't extract the content of the documents. I was wondering if the issue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comes from elasticsearch ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
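
The "Access is denied" warning above means the process could not write its lock file. A quick way to test whether a directory is writable is to try creating and deleting a file in it. This is a minimal Python sketch; the function name `is_writable` is hypothetical, and the result only reflects the account that runs the script, so it should be run under the same account as the ManifoldCF processes.

```python
import os
import tempfile

def is_writable(directory):
    """Return True if a file can be created and deleted inside directory."""
    try:
        fd, path = tempfile.mkstemp(dir=directory)
    except OSError:
        # Covers permission errors as well as a missing directory.
        return False
    os.close(fd)
    os.remove(path)
    return True
```

Pointing it at the directory named in the lock-file warning shows whether the permissions problem can be reproduced outside ManifoldCF.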
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error that looks like it might go away on retry, but does not.  It can be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> either on the repository side or on the output side.  If you look at the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Simple History in the UI, or at the manifoldcf.log file, you should be able
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to get a better sense of what went wrong.  Without further information, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
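
The behaviour described above, retrying an error that looks transient and aborting when it persists, can be pictured with a bounded retry loop. This is a conceptual Python sketch only, not ManifoldCF's implementation: the names `JobAborted` and `process_with_retries`, the use of `ConnectionError` as the "transient" error, and the limit of three attempts are all assumptions made for the illustration.

```python
class JobAborted(Exception):
    """Raised when an apparently transient error persists across all retries."""

def process_with_retries(action, max_attempts=3):
    """Call action(); retry on ConnectionError, abort after max_attempts failures."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return action()
        except ConnectionError as error:
            last_error = error  # looks transient, so try again
    raise JobAborted(f"repeated service interruptions: {last_error}") from last_error
```

The point of the sketch is that the final error message reports the retry exhaustion, while the underlying cause is carried along and has to be looked up, which is why the log file is the next place to check.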
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software engineer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from Société Générale in France. I'm currently using your recent version,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF 2.8. I'm working on an internal search engine. For this reason,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm using ManifoldCF to index documents on Windows shares. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> encountered a serious problem while crawling 35K documents. Most of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> time, when ManifoldCF starts crawling a large document (19 MB, for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example), it ends the job with the following error: repeated service
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interruptions - failure processing document : software caused connection
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> solve this problem, please ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and elasticsearch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.1.0 .
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward to your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
