manifoldcf-user mailing list archives

From Beelz Ryuzaki <i93oth...@gmail.com>
Subject Re: Question about ManifoldCF 2.8
Date Mon, 04 Sep 2017 08:53:04 GMT
Hi Karl,

This morning, I tried the zookeeper-based example and it worked really
well. However, one error is still bugging me: a socket write error. You
will find the Simple History report attached. Surprisingly, there was no
stack trace in the ManifoldCF log file.

Best regards,

Othman.

On Fri, 1 Sep 2017 at 19:39, Karl Wright <daddywri@gmail.com> wrote:

> This is from file locking yet again.
>
> I have uploaded a new RC.  Please download and try out the zookeeper
> locking.
>
> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>
> Karl
>
>
> On Fri, Sep 1, 2017 at 1:11 PM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>
>> There is another issue as well that gives the following stack trace.
>>
>> Othman.
>>
>> On Fri, 1 Sep 2017 at 18:05, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>
>>> Hi Karl,
>>>
>>> I took the binary from ManifoldCF 2.8.1 RC0. It shipped version 3.9 of
>>> POI, and when I changed it to 3.15 it worked fine. I really want to try
>>> zookeeper since, as you told me, it performs better than the file-based
>>> example. For the time being I'm using the file-based example because it is
>>> the only setup that works for me, but I need a stable version for my
>>> production environment. That is one point.
>>> The other point is that the Paths tab is still an issue for me: I exclude
>>> some files and they are still crawled. I want to exclude some specific
>>> file extensions and some specific directories. For instance, I don't want
>>> to index .exe files or anything whose name contains a specific word. I add
>>> a first exclude rule with *.exe and a second one with *word*. Only the
>>> second one doesn't work. How can I solve this issue, please?
>>>
>>> Thank you very much, and have a nice weekend,
>>>
>>> Othman
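The exclusion behaviour described above can be reasoned about with a small model of ordered wildcard rules. The sketch below is hypothetical and is not taken from the ManifoldCF source: it assumes * matches any run of characters and ? exactly one, that rules are checked top to bottom with the first match deciding, that a name matching no rule is skipped, and that matching may or may not be case-sensitive. The Windows share connector's end-user documentation for the Paths tab is the authority on the real semantics; PathRules, Rule and isIncluded are illustrative names.

```java
import java.util.List;
import java.util.regex.Pattern;

class PathRules {
    /** One ordered include/exclude rule (hypothetical model, not ManifoldCF's class). */
    static final class Rule {
        final String glob;
        final boolean include;

        Rule(String glob, boolean include) {
            this.glob = glob;
            this.include = include;
        }
    }

    /** Translates a wildcard pattern into a regex: '*' = any run of characters, '?' = one character. */
    static Pattern toPattern(String glob, boolean ignoreCase) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString())); // keep '.', '(' etc. literal
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE : 0;
        return Pattern.compile(regex.toString(), flags);
    }

    /** First matching rule decides; a name that matches no rule is treated as excluded. */
    static boolean isIncluded(String name, List<Rule> rules, boolean ignoreCase) {
        for (Rule rule : rules) {
            if (toPattern(rule.glob, ignoreCase).matcher(name).matches()) {
                return rule.include;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Exclusions first, then a catch-all include, so the exclusions get a chance to match.
        List<Rule> rules = List.of(
            new Rule("*.exe", false),
            new Rule("*son*", false),
            new Rule("*", true));
        for (String name : List.of("setup.exe", "report.pdf", "SON_2017.docx")) {
            System.out.println(name
                + "  case-sensitive: " + isIncluded(name, rules, false)
                + "  case-insensitive: " + isIncluded(name, rules, true));
        }
    }
}
```

Under these assumptions, SON_2017.docx slips past a *son* exclusion when matching is case-sensitive, which is one possible explanation for a rule that appears to be ignored. Rule order matters as well: a catch-all include placed first would win before any exclusion is checked. In this model the parentheses in a pattern such as *(SON)* are literal characters, so it would only match names that actually contain "(SON)".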
>>> On Fri, 1 Sep 2017 at 16:46, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Othman,
>>>>
>>>> I will respin a new 2.8.1 (RC1) to address the zookeeper issue.
>>>>
>>>> The failure you are seeing is "NoSuchMethodError".  Therefore, the
>>>> class is being found, but it is the *wrong* class.  When you deployed the
>>>> new release, did you deploy it in a new directory, or did you overwrite the
>>>> previous deployment?  If you overwrote it, you probably have multiple
>>>> versions of the POI jars.
>>>>
>>>> Karl
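A NoSuchMethodError usually means that two versions of the same library are visible at once. One quick way to check a deployment is to list the jars in the relevant directories and flag any artifact that appears more than once. The sketch below assumes Maven-style artifactId-version.jar file names; the class name and the default "lib" directory are illustrative.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DuplicateJarCheck {
    // Assumes names such as poi-ooxml-3.15.jar: artifactId, '-', version starting with a digit.
    private static final Pattern JAR_NAME = Pattern.compile("(.+?)-(\\d.*)\\.jar");

    /** Returns artifact -> versions for every artifact that occurs more than once. */
    static Map<String, List<String>> findDuplicates(List<String> jarNames) {
        Map<String, List<String>> versionsByArtifact = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = JAR_NAME.matcher(name);
            if (m.matches()) { // names without a recognizable version are skipped
                versionsByArtifact.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
            }
        }
        // Keep artifacts seen twice or more (different versions, or the same jar in two places).
        versionsByArtifact.values().removeIf(versions -> versions.size() < 2);
        return versionsByArtifact;
    }

    public static void main(String[] args) {
        // Directories to scan, e.g. lib and connector-common-lib; defaults to "lib".
        String[] dirs = args.length > 0 ? args : new String[] {"lib"};
        List<String> jars = new ArrayList<>();
        for (String d : dirs) {
            String[] names = new File(d).list((dir, n) -> n.endsWith(".jar"));
            if (names != null) {
                jars.addAll(Arrays.asList(names));
            }
        }
        Map<String, List<String>> duplicates = findDuplicates(jars);
        if (duplicates.isEmpty()) {
            System.out.println("No artifact appears more than once.");
        }
        duplicates.forEach((artifact, versions) ->
            System.out.println(artifact + " found as versions " + versions));
    }
}
```

Passing both lib and connector-common-lib as arguments would, for example, flag poi if a 3.9 jar and a 3.15 jar were both present.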
>>>>
>>>>
>>>> On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> I have just tried the new release of ManifoldCF. The first job ended
>>>>> normally, but in the second one I got a new stack trace concerning POI.
>>>>> Moreover, runzookeeper.bat doesn't run properly; it shows the attached
>>>>> stack trace.
>>>>>
>>>>> PS: The second attached file contains the POI stack trace.
>>>>>
>>>>> Othman.
>>>>>
>>>>> On Fri, 1 Sep 2017 at 12:21, Karl Wright <daddywri@gmail.com> wrote:
>>>>>
>>>>>> Hi Othman,
>>>>>>
>>>>>> You do not need a new database instance.
>>>>>>
>>>>>> You can download MCF 2.8.1 RC0 from here:
>>>>>>
>>>>>>
>>>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>>
>>>>>>> Thank you very much for your help, I'm going to try out the
>>>>>>> zookeeper example. Should I initialize a new database? And how can I run
>>>>>>> the zookeeper start-agent ?
>>>>>>>
>>>>>>> Othman.
>>>>>>>
>>>>>>> On Fri, 1 Sep 2017 at 11:37, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Othman,
>>>>>>>>
>>>>>>>> These exceptions are now coming from file locking and are due to
>>>>>>>> permissions problems.  I suggest you go to Zookeeper for file locking.
>>>>>>>>
>>>>>>>> I am building a 2.8.1 release candidate.  When it is available for
>>>>>>>> download, I'll send you the URL.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Karl,
>>>>>>>>>
>>>>>>>>> This morning I followed the steps you gave me and I still got
>>>>>>>>> stack traces. I have attached the stack traces as well as the
>>>>>>>>> contents of my lib directory and options.env.
>>>>>>>>> I have installed zookeeper and I'm ready to use the zookeeper
>>>>>>>>> example. Could you guide me through it? I don't know whether following
>>>>>>>>> the same steps as in the file-based example will avoid the stack traces.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Othman
>>>>>>>>>
>>>>>>>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Please do the following:
>>>>>>>>>>
>>>>>>>>>> (0) Shut down all ManifoldCF processes.
>>>>>>>>>> (1) Move poi*.jar from connector-common-lib to lib.
>>>>>>>>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>>>>>>>>> (3) Move commons-collections4*.jar from connector-common-lib to
>>>>>>>>>> lib.
>>>>>>>>>> (4) Move xmlbeans*.jar from connector-common-lib to lib.
>>>>>>>>>> (5) Move curvesapi*.jar from connector-common-lib to lib.
>>>>>>>>>> (6) Modify your options.env to include all of the jars you moved.
>>>>>>>>>> (7) Start up all ManifoldCF processes.
>>>>>>>>>> (8) If you still get stack traces, please send them to me.
>>>>>>>>>>
>>>>>>>>>> Karl
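Step (6) asks for the moved jars to be added to options.env. The sketch below only prints the matching jars from one directory as a single classpath string joined with the platform path separator, so the list can be copied by hand. The prefixes come from steps (1) to (5); how options.env expects classpath entries to be written (relative paths, separators, which line) is not shown in this thread, so treat the output format as an assumption and check the file itself before pasting anything.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClasspathFragment {
    // Jar name prefixes taken from steps (1) to (5).
    static final List<String> PREFIXES =
        List.of("poi", "dom4j", "commons-collections4", "xmlbeans", "curvesapi");

    /** Keeps the .jar names that start with one of the prefixes, sorted for stable output. */
    static List<String> selectJars(List<String> names) {
        List<String> selected = new ArrayList<>();
        for (String n : names) {
            if (n.endsWith(".jar") && PREFIXES.stream().anyMatch(n::startsWith)) {
                selected.add(n);
            }
        }
        selected.sort(null); // natural (alphabetical) order
        return selected;
    }

    /** Prefixes each jar with dirPrefix and joins the entries with pathSeparator. */
    static String join(String dirPrefix, List<String> jars, String pathSeparator) {
        List<String> entries = new ArrayList<>();
        for (String jar : jars) {
            entries.add(dirPrefix + jar);
        }
        return String.join(pathSeparator, entries);
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "lib");
        String[] names = dir.list();
        if (names == null) {
            names = new String[0]; // directory missing or unreadable
        }
        List<String> jars = selectJars(Arrays.asList(names));
        // File.pathSeparator is ";" on Windows and ":" elsewhere.
        System.out.println(join(dir.getPath() + File.separator, jars, File.pathSeparator));
    }
}
```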
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <
>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>
>>>>>>>>>>> By 'other place', do you mean the \lib directory? If so,
>>>>>>>>>>> I have already tried it and it didn't work.
>>>>>>>>>>>
>>>>>>>>>>> Othman.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <daddywri@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>
>>>>>>>>>>>> I used the java dependency inspector to see what the issue is
>>>>>>>>>>>> and it turns out that poi-ooxml.jar does refer back to poi.jar in the class
>>>>>>>>>>>> that is failing.  So you will need to move poi-3.15.jar and
>>>>>>>>>>>> commons-collections4-4.1.jar to the other place as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Let's hope that finally fixes this issue.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm very unhappy about the quality of the POI project code; it
>>>>>>>>>>>> is definitely not using reasonable engineering practices, and I will be
>>>>>>>>>>>> opening a ticket with them.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <
>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I'm using the file-based example, and I made all the changes you told
>>>>>>>>>>>>> me to make in that example. I'll try to install zookeeper and use the
>>>>>>>>>>>>> zookeeper example. Will I need to do any configuration in order to run
>>>>>>>>>>>>> the zookeeper example?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are you using the zookeeper example, or the file-based
>>>>>>>>>>>>>> example?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If these jars have all been moved, and the options.env
>>>>>>>>>>>>>> includes them, then I have to conclude that Apache POI's pom.xml is
>>>>>>>>>>>>>> incorrect too.  It will take a while to figure out what's missing that
>>>>>>>>>>>>>> poi-ooxml.jar needs that is not listed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> All the dependencies you mentioned have already been added
>>>>>>>>>>>>>>> to the options.env.win file in the multiprocess-file-example directory.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <
>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, I added it to the options.env.win file. Should it be
>>>>>>>>>>>>>>>> the one in the multiprocess-zk-example directory or the one in
>>>>>>>>>>>>>>>> multiprocess-file-example?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <
>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Could it be a problem with the elasticsearch version? I'm
>>>>>>>>>>>>>>>>>> using 2.1.0, which is pretty old compared to this new version of ManifoldCF.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I moved both of the jars you mentioned back, and a different error
>>>>>>>>>>>>>>>>>>> is showing. You will find the stack trace attached.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <
>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I've looked at the dependencies; you should not have
>>>>>>>>>>>>>>>>>>>> moved poi-3.15.jar.  Please move that back, and
>>>>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar too.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> If you include poi.jar, then all dependencies of
>>>>>>>>>>>>>>>>>>>>> poi.jar must also be included.  This would mean that curvesapi-1.04.jar and
>>>>>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar should also be included.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Karl
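The rule stated above is a transitive closure: including a jar means including everything it depends on, everything those depend on, and so on. The sketch below computes that closure with a worklist and a visited set, so cycles such as a jar that "calls back" into another are handled. The dependency edges in main are illustrative only, paraphrased from statements made in this thread rather than read from Apache POI's pom files.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class JarClosure {
    /** Returns start plus every jar reachable from it through deps (cycles are handled). */
    static Set<String> closure(String start, Map<String, List<String>> deps) {
        Set<String> seen = new TreeSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.push(start);
        while (!worklist.isEmpty()) {
            String jar = worklist.pop();
            if (seen.add(jar)) { // first visit: queue its direct dependencies
                for (String dep : deps.getOrDefault(jar, List.of())) {
                    worklist.push(dep);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Illustrative edges paraphrased from this thread, NOT read from Apache POI's poms.
        Map<String, List<String>> deps = Map.of(
            "poi-ooxml-schemas-3.15.jar", List.of("xmlbeans-2.6.0.jar", "poi-ooxml-3.15.jar"),
            "poi-ooxml-3.15.jar", List.of("poi-3.15.jar", "dom4j-1.6.1.jar"),
            "poi-3.15.jar", List.of("commons-collections4-4.1.jar", "curvesapi-1.04.jar"));
        closure("poi-ooxml-schemas-3.15.jar", deps).forEach(System.out::println);
    }
}
```

With these edges, starting from poi-ooxml-schemas-3.15.jar reaches seven jars, which correspond to the jar families in the later step-by-step list of files to move into lib.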
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I added the two jars that you mentioned and
>>>>>>>>>>>>>>>>>>>>>> another one: poi-3.15.jar. Unfortunately, another error is showing.
>>>>>>>>>>>>>>>>>>>>>> This time it concerns Excel files. You will find the stack trace attached.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls back
>>>>>>>>>>>>>>>>>>>>>>> into another jar, which will also need to be moved.  *That* jar has yet
>>>>>>>>>>>>>>>>>>>>>>> another dependency too.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> You will find the stack trace attached. My
>>>>>>>>>>>>>>>>>>>>>>>> apologies for the poor quality of the image; I'm doing my best to send you
>>>>>>>>>>>>>>>>>>>>>>>> the stack trace, as I am not allowed to send documents outside the
>>>>>>>>>>>>>>>>>>>>>>>> company.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what
>>>>>>>>>>>>>>>>>>>>>>>>> the problem is.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I
>>>>>>>>>>>>>>>>>>>>>>>>>> looked into the log file and saw the following error:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>>>>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do, and as you
>>>>>>>>>>>>>>>>>>>>>>>>>>> expected, the crawl resumed. What about regular expressions? How can I
>>>>>>>>>>>>>>>>>>>>>>>>>>> write complex regular expressions in the job's Paths tab?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know
>>>>>>>>>>>>>>>>>>>>>>>>>>>> if it works.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> options.env files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> workaround you could try.  Specifically:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> resumes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tika server transformer rather than the embedded Tika Extractor.  I'm still
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> looking into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version, and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> see it in the attached file. For your information, the job started
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> yesterday.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is missing.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this, if you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm using the binary version.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For security reasons, I can't send any files from my computer, so I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> copied the stack trace and scanned it with my cellphone. I hope it will be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful. Meanwhile, I have read the documentation about how to restrict the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawl, and I don't think the '|' works in the specified. For instance, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would like to restrict the crawl for documents whose names contain the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> word 'sound'. I proceed as follows: *(SON)*. The document name is in capital
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> letters, and I noticed that the rule didn't take it into account.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> windows share connector is by specifying information on the "Paths" tab in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobs that crawl windows shares.  There is end-user documentation both
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> online and distributed with all binary distributions that describes how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do this.  Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response. I will
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> start using zookeeper and let you know if it works. I have another
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> question. I need to apply some filters while crawling: there are some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> files and folders I don't want to crawl. Could you give me an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example of how to use the regex? Does the regex support /i to ignore
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> case?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
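On the /i question: Java regular expressions have no trailing /i modifier. The same effect comes from the inline flag (?i) or from Pattern.CASE_INSENSITIVE, and a pattern written as /son/i would simply look for the literal slashes and the letter i. Whether a particular ManifoldCF field is evaluated as a Java regex at all is an assumption here; the Paths tab discussed in this thread may use wildcards instead. The class and method names below are illustrative.

```java
import java.util.regex.Pattern;

class IgnoreCaseDemo {
    /** True if name contains word anywhere, ignoring case. */
    static boolean containsIgnoreCase(String name, String word) {
        // "(?i)" switches on case-insensitive matching for the rest of the pattern;
        // Pattern.quote keeps characters such as '.', '(' or '*' in word literal.
        return Pattern.compile("(?i)" + Pattern.quote(word)).matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("RAPPORT_SON.docx", "son")); // true
        System.out.println(containsIgnoreCase("report.pdf", "son"));       // false
        // The flag constant is equivalent to the inline form:
        System.out.println(Pattern.compile("son", Pattern.CASE_INSENSITIVE)
            .matcher("RAPPORT_SON.docx").find());                           // true
    }
}
```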
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> people often have problems with getting file permissions right, and they do
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not understand how to shut processes down cleanly, and zookeeper is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> resilient against that.  I highly recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> files into memory so you do not need huge amounts of memory.  The default
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> values are more than enough for 35,000 files, which is a pretty small job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not using zookeeper at the moment. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would like to know how zookeeper differs from file-based sync. I also need
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> guidance on managing my PC's memory: how many GB should I allocate to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the ManifoldCF start-agent? Is 4 GB enough to crawl 35K files?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reason, and that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failures after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your quick response, Mr Karl. I have looked into the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF log file and extracted the following warnings:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may be full. Shutting down process; locks may be left dangling. You must
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cleanup before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'ES (Lowercase) Synapses' is the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch output connection. The job also uses Tika to extract
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata, and a file system as the repository connection. During the job, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't extract the content of the documents. I was wondering if the issue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comes from elasticsearch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
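The "Access is denied" warning above suggests that the account running ManifoldCF cannot create or lock files in the synch area. The sketch below is a hypothetical probe, not ManifoldCF's own locking code: it creates a scratch file in a directory, takes an exclusive FileChannel lock, writes one byte, and deletes the file on close, returning an error message if any step fails. Run it as the same user as the ManifoldCF processes and pass the synch area directory; the scratch file name is made up.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class LockProbe {
    /**
     * Creates a scratch file in dir, locks it, writes one byte and deletes it on close.
     * Returns null on success, or a description of the first failure.
     */
    static String probe(Path dir) {
        Path scratch = dir.resolve("mcf-lock-probe.tmp"); // hypothetical file name
        try (FileChannel channel = FileChannel.open(scratch,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.DELETE_ON_CLOSE)) {
            try (FileLock lock = channel.tryLock()) {
                if (lock == null) {
                    return "file is locked by another process";
                }
                channel.write(ByteBuffer.wrap(new byte[] {1}));
            }
            return null;
        } catch (IOException | SecurityException e) {
            return e.toString(); // e.g. java.nio.file.AccessDeniedException: ...
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String error = probe(dir);
        System.out.println(error == null
            ? "OK: can create, lock and write files in " + dir.toAbsolutePath()
            : "FAILED: " + error);
    }
}
```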
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an error that looks like it might go away on retry, but does not.  It can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be either on the repository side or on the output side.  If you look at the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Simple History in the UI, or at the manifoldcf.log file, you should be able
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to get a better sense of what went wrong.  Without further information, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
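The behaviour described above, retrying an error that looks transient and giving up when it keeps recurring, is a common pattern. The sketch below is a generic illustration and not ManifoldCF's implementation: it treats any IOException (such as a socket write error) as retryable, waits with exponential backoff between attempts, and throws once the attempt budget is spent. IoTask, PersistentFailure, the attempt count and the delays are all hypothetical.

```java
import java.io.IOException;

class RetryThenAbort {
    /** A unit of work that may fail with an I/O error. */
    @FunctionalInterface
    interface IoTask<T> {
        T call() throws IOException;
    }

    /** Thrown when a retryable error persisted through every attempt. */
    static final class PersistentFailure extends Exception {
        PersistentFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the task up to maxAttempts times. After failed attempt k it sleeps
     * baseDelayMs * 2^(k-1) milliseconds before trying again.
     */
    static <T> T run(IoTask<T> task, int maxAttempts, long baseDelayMs)
            throws PersistentFailure, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e; // assumed transient, e.g. a socket write error
                if (attempt < maxAttempts) {
                    Thread.sleep(baseDelayMs << (attempt - 1));
                }
            }
        }
        throw new PersistentFailure("failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = run(() -> {
            if (++calls[0] < 3) {
                throw new IOException("socket write error (simulated)");
            }
            return "indexed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In main, the simulated task fails twice and then succeeds on the third attempt; if it kept failing, the caller would receive PersistentFailure, which is the point at which a crawler would abort the job.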
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <i93othman@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engineer at Société Générale in France. I'm currently using your recent
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF 2.8 release. I'm working on an internal search engine, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm using ManifoldCF to index documents on Windows
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shares. I encountered a serious problem while crawling 35K documents. Most
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the time, when ManifoldCF starts crawling large documents (19 MB, for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example), it ends the job with the following error: repeated service
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interruptions - failure processing document : software caused connection
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to solve this problem, please?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch 2.1.0.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward to your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
