manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Question about ManifoldCF 2.8
Date Fri, 01 Sep 2017 10:20:54 GMT
Hi Othman,

You do not need a new database instance.

You can download MCF 2.8.1 RC0 from here:

https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1

Karl


On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:

> Hi Karl,
>
> Thank you very much for your help. I'm going to try out the zookeeper
> example. Should I initialize a new database? And how can I run the
> zookeeper start-agent?
>
> Othman.
>
> On Fri, 1 Sep 2017 at 11:37, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Othman,
>>
>> These exceptions are now coming from file locking and are due to
>> permissions problems.  I suggest you go to Zookeeper for file locking.
>>
>> I am building a 2.8.1 release candidate.  When it is available for download,
>> I'll send you the URL.
>>
>> Thanks,
>> Karl
>>
>>
>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <i93othman@gmail.com>
>> wrote:
>>
>>> Hi Karl,
>>>
>>> This morning I followed the steps you gave me and I still
>>> got stack traces. I have attached the stack traces as well as the content
>>> of my lib repo and options.env.
>>> I have installed zookeeper and I'm ready to use the zookeeper example.
>>> Could you guide me through it? I don't know whether following the same
>>> steps as in the file-based example will get rid of the stack traces.
>>>
>>> Thanks,
>>> Othman
>>>
>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Please do the following:
>>>>
>>>> (0) Shut down all ManifoldCF processes.
>>>> (1) Move poi*.jar from connector-common-lib to lib.
>>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>>> (3) Move commons-collections4*.jar from connector-common-lib to lib.
>>>> (4) Move xmlbeans*.jar from connector-common-lib to lib.
>>>> (5) Move curvesapi*.jar from connector-common-lib to lib.
>>>> (6) Modify your options.env to include all of the jars you moved.
>>>> (7) Start up all ManifoldCF processes.
>>>> (8) If you still get stack traces, please send them to me.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <i93othman@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> By 'other place', do you mean the \lib repository? If so, I
>>>>> have already tried it and it didn't work.
>>>>>
>>>>> Othman.
>>>>>
>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <daddywri@gmail.com> wrote:
>>>>>
>>>>>> Hi Othman,
>>>>>>
>>>>>> I used the java dependency inspector to see what the issue is and it
>>>>>> turns out that poi-ooxml.jar does refer back to poi.jar in the class that
>>>>>> is failing.  So you will need to move poi-3.15.jar and
>>>>>> commons-collections4-4.1.jar to the other place as well.
>>>>>>
>>>>>> Let's hope that finally fixes this issue.
>>>>>>
>>>>>> I'm very unhappy about the quality of the POI project code; it is
>>>>>> definitely not using reasonable engineering practices, and I will be
>>>>>> opening a ticket with them.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm using the file-based example, and I reproduced all the changes
>>>>>>> you told me to make in that example. I'll try to install
>>>>>>> zookeeper and use the zookeeper example. Will I need to do any
>>>>>>> configuration in order to run the zookeeper example?
>>>>>>>
>>>>>>> Othman.
>>>>>>>
>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <daddywri@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Are you using the zookeeper example, or the file-based example?
>>>>>>>>
>>>>>>>> If these jars have all been moved, and the options.env includes
>>>>>>>> them, then I have to conclude that Apache POI's pom.xml is incorrect too.
>>>>>>>> It will take a while to figure out what's missing that poi-ooxml.jar needs
>>>>>>>> that is not listed.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <
>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> All the dependencies you mentioned have already been added in the
>>>>>>>>> options.env.win file in the multiprocess-file-example repository.
>>>>>>>>>
>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, I added it in the options.env.win file. Should it be the one
>>>>>>>>>> in the multiprocess-zk-example document or multiprocess-file-example ?
>>>>>>>>>>
>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <daddywri@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <
>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Could it be a problem with the elasticsearch version? I'm actually
>>>>>>>>>>>> using 2.1.0, which is pretty old for this new version of ManifoldCF.
>>>>>>>>>>>>
>>>>>>>>>>>> Othman.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <
>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I moved back both the jars you mentioned and a different error is
>>>>>>>>>>>>> showing. You will find the stack trace attached.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've looked at the dependencies; you should not have moved
>>>>>>>>>>>>>> poi-3.15.jar.  Please move that back, and commons-collections4-4.1.jar too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <
>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar
>>>>>>>>>>>>>>> must also be included.  This would mean that curvesapi-1.04.jar and
>>>>>>>>>>>>>>> commons-collections4-4.1.jar should also be included.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I added the two jars that you mentioned and another
>>>>>>>>>>>>>>>> one: poi-3.15.jar. Unfortunately, there is another error showing. This
>>>>>>>>>>>>>>>> time, it concerns Excel files. You will find the stack trace attached.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <
>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls back into
>>>>>>>>>>>>>>>>> another jar, which will also need to be moved.  *That* jar has yet another
>>>>>>>>>>>>>>>>> dependency too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You will find attached the stack trace. My apologies for
>>>>>>>>>>>>>>>>>> the bad quality of the image, I'm doing my best to send you the stack trace
>>>>>>>>>>>>>>>>>> as I don't have the right to send documents outside the company.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <
>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the
>>>>>>>>>>>>>>>>>>> problem is.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked into
>>>>>>>>>>>>>>>>>>>> the log file and saw the following error:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/poi/
>>>>>>>>>>>>>>>>>>>> POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do and, as you expected,
>>>>>>>>>>>>>>>>>>>>> the crawling resumed. How about the regular expressions? How can I write
>>>>>>>>>>>>>>>>>>>>> complex regular expressions in the job's Paths tab?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it
>>>>>>>>>>>>>>>>>>>>>> works.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env
>>>>>>>>>>>>>>>>>>>>>>> files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround
>>>>>>>>>>>>>>>>>>>>>>>> you could try.  Specifically:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from
>>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl
>>>>>>>>>>>>>>>>>>>>>>>> resumes.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external Tika
>>>>>>>>>>>>>>>>>>>>>>>>> server transformer rather than the embedded Tika Extractor.  I'm still
>>>>>>>>>>>>>>>>>>>>>>>>> looking into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary
>>>>>>>>>>>>>>>>>>>>>>>>>> version, and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see it
>>>>>>>>>>>>>>>>>>>>>>>>>> in the attached file. For your information, the job started yesterday.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is
>>>>>>>>>>>>>>>>>>>>>>>>>>> missing.
>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this,
>>>>>>>>>>>>>>>>>>>>>>>>>>> if you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For
>>>>>>>>>>>>>>>>>>>>>>>>>>>> security reasons, I can't send any files from my computer. I have copied
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the stack trace and scanned it with my cellphone. I hope it will be
>>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful. Meanwhile, I have read the documentation about how to restrict the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling and I don't think the '|' works in the specified. For instance, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>> would like to restrict the crawling to the documents that contain the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> word 'sound'. I proceed as follows: *(SON)*. The document is in capital
>>>>>>>>>>>>>>>>>>>>>>>>>>>> letters and I noticed that it wasn't taken into consideration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> windows share connector is by specifying information on the "Paths" tab in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobs that crawl windows shares.  There is end-user documentation both
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> online and distributed with all binary distributions that describe how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do this.  Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response. I will start
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using zookeeper and I will let you know if it works. I have another
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> question to ask. I need to apply some filters while crawling: I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't want to crawl some files and some folders. Could you give me an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example of how to use the regex? Does the regex allow using /i to ignore
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> case?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> often have problems with getting file permissions right, and they do not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> understand how to shut processes down cleanly, and zookeeper is resilient
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> against that.  I highly recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into memory so you do not need huge amounts of memory.  The default values
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are more than enough for 35,000 files, which is a pretty small job for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. I want to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know how zookeeper is different from file-based sync. I also need some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> guidance on how to manage my PC's memory. How many GB should I allocate for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the start-agent of ManifoldCF? Is 4 GB enough to crawl 35K files?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get failures
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your quick
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response. I have looked into the ManifoldCF log file and extracted the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> following warnings :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Lowercase) Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be full. Shutting down process; locks may be left dangling. You must
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cleanup before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch output connection. Moreover, the job uses Tika to extract
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata and a file system as a repository connection. During the job, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't extract the content of the documents. I was wondering if the issue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comes from elasticsearch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error that looks like it might go away on retry, but does not.  It can be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> either on the repository side or on the output side.  If you look at the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Simple History in the UI, or at the manifoldcf.log file, you should be able
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to get a better sense of what went wrong.  Without further information, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software engineer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from Société Générale in France. I'm using your recent version,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF 2.8. I'm working on an internal search engine, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm using ManifoldCF to index documents on Windows shares. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> encountered a serious problem while crawling 35K documents. Most of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> time, when ManifoldCF starts crawling a large document (19 MB, for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example), it ends the job with the following error: repeated service
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interruptions - failure processing document : software caused connection
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> solve this problem, please?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch 2.1.0 .
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward to your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
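A supplementary sketch for the NoClassDefFoundError part of the thread: a jar is an ordinary zip archive, so one way to check which jar in a directory provides a missing class is to scan the archive entries. The Python below is an illustration only and is not part of ManifoldCF. The function name jars_containing is hypothetical, and the directory names lib and connector-common-lib are taken from the thread and assumed to be relative to the install root.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name: str, *dirs: str) -> list[str]:
    """Return paths of jars in the given directories that contain class_name."""
    # A class such as org.apache.poi.POIXMLTypeLoader is stored in a jar
    # as the zip entry org/apache/poi/POIXMLTypeLoader.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for d in dirs:
        for jar in sorted(Path(d).glob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        hits.append(str(jar))
            except zipfile.BadZipFile:
                # Skip files that are not readable zip archives.
                continue
    return hits


if __name__ == "__main__":
    # Assumption: run from the ManifoldCF install root.
    for path in jars_containing("org.apache.poi.POIXMLTypeLoader",
                                "lib", "connector-common-lib"):
        print(path)
```

This only shows where a class file lives. It does not show which classloader ends up loading it, which is the question the jar moves in the thread are trying to settle.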

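The jar-moving steps listed in the thread (poi, dom4j, commons-collections4, xmlbeans, and curvesapi from connector-common-lib to lib) could be scripted roughly as follows. This is a minimal sketch, assuming all ManifoldCF processes are already shut down and that both directories exist. The function name move_jars and the dry_run flag are hypothetical helpers, not ManifoldCF tooling, and the script does not edit options.env.

```python
import shutil
from pathlib import Path

# Glob patterns taken from the steps in the thread.
PATTERNS = ["poi*.jar", "dom4j*.jar", "commons-collections4*.jar",
            "xmlbeans*.jar", "curvesapi*.jar"]


def move_jars(src_dir: str, dst_dir: str, patterns=PATTERNS, dry_run=True) -> list[str]:
    """Move jars matching patterns from src_dir to dst_dir.

    Returns the matched file names. With dry_run=True nothing is moved,
    so the list can be reviewed first.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    moved = []
    for pattern in patterns:
        for jar in sorted(src.glob(pattern)):
            if not dry_run:
                shutil.move(str(jar), str(dst / jar.name))
            moved.append(jar.name)
    return moved


if __name__ == "__main__":
    # Assumption: run from the ManifoldCF install root; prints matches only.
    for name in move_jars("connector-common-lib", "lib"):
        print(name)
```

The returned names are the jars that would then need to be added to the classpath in options.env, as the thread describes.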