manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Question about ManifoldCF 2.8
Date Fri, 01 Sep 2017 14:46:28 GMT
Hi Othman,

I will respin a new 2.8.1 (RC1) to address the zookeeper issue.

The failure you are seeing is "NoSuchMethodError".  Therefore, the class is
being found, but it is the *wrong* class.  When you deployed the new
release, did you deploy it in a new directory, or did you overwrite the
previous deployment?  If you overwrote it, you probably have multiple
versions of the POI jars.

Karl
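
A minimal sketch, not part of the original message, of one way to check a
deployment for the duplicate jars described above. It assumes jar files follow
the name-version.jar pattern seen in this thread (for example
poi-ooxml-3.15.jar). The function name find_duplicate_jars and the directory
passed to it are hypothetical, not part of ManifoldCF.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: jars are named "<artifact>-<version>.jar", e.g. "poi-ooxml-3.15.jar".
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.]*)\.jar$")


def find_duplicate_jars(root):
    """Map each artifact found more than once under root to its copies.

    Each copy is a (version, relative path) pair. An artifact is reported
    when it appears in several versions or in several directories.
    """
    found = defaultdict(set)
    for jar in Path(root).rglob("*.jar"):
        match = JAR_NAME.match(jar.name)
        if match:  # jars that do not fit the naming pattern are skipped
            relative = jar.relative_to(root).as_posix()
            found[match["artifact"]].add((match["version"], relative))
    return {name: sorted(copies) for name, copies in found.items() if len(copies) > 1}
```

Run against the installation directory, an empty result suggests that no jar
is present twice. It does not rule out other classpath problems, such as a jar
missing from options.env.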


On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:

> Hi Karl,
>
> I have just tried the new release of ManifoldCF. The first job ended
> normally, but in the second I got a new stack trace concerning POI.
> Moreover, runzookeeper.bat doesn't run properly; it shows the attached
> stack trace.
>
> Ps:
> The second attached file contains the POI stack trace.
>
> Othman.
>
> On Fri, 1 Sep 2017 at 12:21, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Othman,
>>
>> You do not need a new database instance.
>>
>> You can download MCF 2.8.1 RC0 from here:
>>
>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>
>> Karl
>>
>>
>> On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <i93othman@gmail.com>
>> wrote:
>>
>>> Hi Karl,
>>>
>>> Thank you very much for your help, I'm going to try out the zookeeper
>>> example. Should I initialize a new database? And how can I run the
>>> zookeeper start-agent ?
>>>
>>> Othman.
>>>
>>> On Fri, 1 Sep 2017 at 11:37, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Othman,
>>>>
>>>> These exceptions are now coming from file locking and are due to
>>>> permissions problems.  I suggest you go to Zookeeper for file locking.
>>>>
>>>> I am building a 2.8.1 release candidate.  When it is available for
>>>> download, I'll send you the URL.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>>
>>>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> This morning, I followed the steps you told me to do and I still
>>>>> got stack traces. I have attached the stack traces as well as the contents
>>>>> of my lib directory and options.env.
>>>>> I have installed zookeeper and I'm ready to use the zookeeper example.
>>>>> Could you guide me through it? I don't know whether following the same
>>>>> steps as in the file-based example will avoid the stack traces.
>>>>>
>>>>> Thanks,
>>>>> Othman
>>>>>
>>>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <daddywri@gmail.com> wrote:
>>>>>
>>>>>> Please do the following:
>>>>>>
>>>>>> (0) Shut down all ManifoldCF processes.
>>>>>> (1) Move poi*.jar from connector-common-lib to lib.
>>>>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>>>>> (3) Move commons-collections4*.jar from connector-common-lib to lib.
>>>>>> (4) Move xmlbeans*.jar from connector-common-lib to lib.
>>>>>> (5) Move curvesapi*.jar from connector-common-lib to lib.
>>>>>> (6) Modify your options.env to include all of the jars you moved.
>>>>>> (7) Start up all ManifoldCF processes.
>>>>>> (8) If you still get stack traces, please send them to me.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>>
>>>>>>> By 'other place', do you mean the \lib directory? If so, then
>>>>>>> I have already tried it and it didn't work.
>>>>>>>
>>>>>>> Othman.
>>>>>>>
>>>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <daddywri@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Othman,
>>>>>>>>
>>>>>>>> I used the java dependency inspector to see what the issue is and
>>>>>>>> it turns out that poi-ooxml.jar does refer back to poi.jar in the class
>>>>>>>> that is failing.  So you will need to move poi-3.15.jar and
>>>>>>>> commons-collections4-4.1.jar to the other place as well.
>>>>>>>>
>>>>>>>> Let's hope that finally fixes this issue.
>>>>>>>>
>>>>>>>> I'm very unhappy about the quality of the POI project code; it is
>>>>>>>> definitely not using reasonable engineering practices, and I will be
>>>>>>>> opening a ticket with them.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <
>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I'm using the file-based example, and I made all the changes you
>>>>>>>>> told me to make there. I'll try to install zookeeper and use the
>>>>>>>>> zookeeper example. Will I need to do any configuration in order to
>>>>>>>>> run the zookeeper example?
>>>>>>>>>
>>>>>>>>> Othman.
>>>>>>>>>
>>>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Are you using the zookeeper example, or the file-based example?
>>>>>>>>>>
>>>>>>>>>> If these jars have all been moved, and the options.env includes
>>>>>>>>>> them, then I have to conclude that Apache POI's pom.xml is incorrect too.
>>>>>>>>>> It will take a while to figure out what's missing that poi-ooxml.jar needs
>>>>>>>>>> that is not listed.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <
>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> All the dependencies you mentioned have already been added in
>>>>>>>>>>> the options.env.win file in the multiprocess-file-example repository.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yes, I added it in the options.env.win file. Should it be the
>>>>>>>>>>>> one in the multiprocess-zk-example directory or multiprocess-file-example?
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <
>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could it be a problem with the elasticsearch version? I'm
>>>>>>>>>>>>>> actually using 2.1.0, which is pretty old for this new version of ManifoldCF.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <
>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I moved back both the jars you mentioned and a different error is
>>>>>>>>>>>>>>> showing. You will find the stack trace attached.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <
>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've looked at the dependencies; you should not have moved
>>>>>>>>>>>>>>>> poi-3.15.jar.  Please move that back, and commons-collections4-4.1.jar too.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <
>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar
>>>>>>>>>>>>>>>>> must also be included.  This would mean that curvesapi-1.04.jar and
>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar should also be included.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I added the two jars that you have mentioned and another
>>>>>>>>>>>>>>>>>> one : poi-3.15.jar . Unfortunately, there is another error showing. This
>>>>>>>>>>>>>>>>>> time, it concerns excel files. You will find attached the stack trace.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <
>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls back into
>>>>>>>>>>>>>>>>>>> another jar, which will also need to be moved.  *That* jar has yet another
>>>>>>>>>>>>>>>>>>> dependency too.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> You will find attached the stack trace. My apologies
>>>>>>>>>>>>>>>>>>>> for the bad quality of the image, I'm doing my best to send you the stack
>>>>>>>>>>>>>>>>>>>> trace as I don't have the right to send documents outside the company.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <
>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the
>>>>>>>>>>>>>>>>>>>>> problem is.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked
>>>>>>>>>>>>>>>>>>>>>> into the log file and saw the following error:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/poi/
>>>>>>>>>>>>>>>>>>>>>> POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do and, as you
>>>>>>>>>>>>>>>>>>>>>>> expected, the crawling resumed. How about the regular expressions? How can I
>>>>>>>>>>>>>>>>>>>>>>> make complex regular expressions in the job's paths tab?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it
>>>>>>>>>>>>>>>>>>>>>>>> works.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env
>>>>>>>>>>>>>>>>>>>>>>>>> files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround
>>>>>>>>>>>>>>>>>>>>>>>>>> you could try.  Specifically:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from
>>>>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl
>>>>>>>>>>>>>>>>>>>>>>>>>> resumes.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external
>>>>>>>>>>>>>>>>>>>>>>>>>>> Tika server transformer rather than the embedded Tika Extractor.  I'm still
>>>>>>>>>>>>>>>>>>>>>>>>>>> looking into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> version, and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see it
>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the attached file. For your information, the job started yesterday.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> missing.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> security reasons, I can't send any files from my computer. I have copied
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the stack trace and scanned it with my cellphone. I hope it will be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful. Meanwhile, I have read the documentation about how to restrict the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling and I don't think the '|' works in the specified. For instance, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would like to restrict the crawling to the documents that contain the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> word 'sound'. I proceeded as follows: *(SON)*. The document is in capital
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> letters, and I noticed that it wasn't taken into consideration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> windows share connector is by specifying information on the "Paths" tab in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobs that crawl windows shares.  There is end-user documentation both
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> online and distributed with all binary distributions that describe how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do this.  Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response, I will start
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using zookeeper and I will let you know if it works. I have another
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> question to ask. Actually, I need to make some filters while crawling. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't want to crawl some files and some folders. Could you give me an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example of how to use the regex? Does the regex allow using /i to ignore
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> case?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> people often have problems with getting file permissions right, and they do
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not understand how to shut processes down cleanly, and zookeeper is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> resilient against that.  I highly recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into memory so you do not need huge amounts of memory.  The default values
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are more than enough for 35,000 files, which is a pretty small job for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. I want
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to know how zookeeper differs from file-based sync. I also need
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> guidance on how to manage my PC's memory. How many GB should I allocate for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the start-agent of ManifoldCF? Is 4 GB enough to crawl 35K files?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reason, and that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failures after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your quick
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response. I have looked into the ManifoldCF log file and extracted the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> following warnings :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Lowercase) Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be full. Shutting down process; locks may be left dangling. You must
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cleanup before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch output connection. Moreover, the job uses Tika to extract
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata and a file system as a repository connection. During the job, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't extract the content of the documents. I was wondering if the issue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comes from elasticsearch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error that looks like it might go away on retry, but does not.  It can be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> either on the repository side or on the output side.  If you look at the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Simple History in the UI, or at the manifoldcf.log file, you should be able
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to get a better sense of what went wrong.  Without further information, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engineer at Société Générale in France. I'm currently using your recent
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF 2.8 release. I'm working on an internal search engine, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm using ManifoldCF to index documents on Windows
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shares. I encountered a serious problem while crawling 35K documents. Most
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the time, when ManifoldCF starts crawling a large document (19 MB, for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example), it ends the job with the following error: repeated service
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interruptions - failure processing document : software caused connection
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> solve this problem, please ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch 2.1.0 .
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward to your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
