manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Beelz Ryuzaki <i93oth...@gmail.com>
Subject Re: Question about ManifoldCF 2.8
Date Fri, 08 Sep 2017 17:27:54 GMT
Hi Karl,

My zookeeper is still pointing to the HSQL database. What should I do in
order to change it so that it points to my PostgreSQL database ?

Best regards,

Othman Belhaj .

On Wed, 6 Sep 2017 at 15:34, Beelz Ryuzaki <i93othman@gmail.com> wrote:

> Thank you, Karl. I will try to combine Postgresql with zookeeper and let
> you know.
>
> Othman.
>
> On Wed, 6 Sep 2017 at 13:18, Karl Wright <daddywri@gmail.com> wrote:
>
>> No, you can use whatever supported database you like.
>>
>> Karl
>>
>>
>> On Wed, Sep 6, 2017 at 6:58 AM, Beelz Ryuzaki <i93othman@gmail.com>
>> wrote:
>>
>>> As far as I know, when you use zookeeper , you obligatory need to use
>>> HSQLDB to go with it, right?
>>>
>>> Thanks,
>>> Othman
>>>
>>> On Wed, 6 Sep 2017 at 12:56, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Othman,
>>>>
>>>> HSQLDB stores all tables in memory so you need to size it accordingly.
>>>> That is one reason we prefer Postgresql for production deployments.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>>
>>>> On Wed, Sep 6, 2017 at 6:21 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> I resolved the elasticsearch problem however the application doesn't
>>>>> seem to work after I have run a job to crawl over 500k documents. I get an
>>>>> GC overhead limit exceeded in the hsql database. How many should I allocate
>>>>> for it?
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Othman
>>>>>
>>>>> On Tue, 5 Sep 2017 at 12:43, Karl Wright <daddywri@gmail.com> wrote:
>>>>>
>>>>>> Hi Othman,
>>>>>>
>>>>>> Thanks for doing the evaluation of the problem.
>>>>>>
>>>>>> Generally, the ManifoldCF project does not have the expertise to
>>>>>> diagnose problems with external systems like Solr or Elasticsearch.  So
>>>>>> going to another newsgroup for those kinds of issues would be a good idea.
>>>>>>
>>>>>> Thanks!
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 5, 2017 at 4:33 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>>
>>>>>>> I have analyzed the error and found out that it was mainly an
>>>>>>> elasticsearch problem. I saw in some forums that one of the adopted
>>>>>>> solution is to modify elasticsearch.yml and set the http.max_content_length
>>>>>>> to a greater value. However, the job got stuck in the last two indexable
>>>>>>> files ( two pptx files with 22Mo and 2Mo respectively). The job eventually
>>>>>>> ended but a stack trace showed that elasticsearch ran out of memory. For
>>>>>>> your information, I have allocated 4Go for elasticsearch execution. Is it
>>>>>>> enough in order to have a good performance. You will find attached the
>>>>>>> stack traces of elasticsearch.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Othman BELHAJ.
>>>>>>>
>>>>>>> On Mon, 4 Sep 2017 at 16:40, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>>
>>>>>>>> I'm sorry to bother on your holiday. I will try to analyze it today
>>>>>>>> and let it you know what I have found. Enjoy your day !
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Othman BELHAJ.
>>>>>>>>
>>>>>>>> On Mon, 4 Sep 2017 at 16:06, Karl Wright <daddywri@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Othman,
>>>>>>>>>
>>>>>>>>> I won't be able to look at this today; it is a holiday here.  But,
>>>>>>>>> the "socket write" error is coming from ElasticSearch.  If ES is configured
>>>>>>>>> to not accept documents greater than a certain size, that might explain
>>>>>>>>> it.  Maybe the ES logs would help?
>>>>>>>>>
>>>>>>>>> I'm afraid you're going to need to do the work to find out what is
>>>>>>>>> going wrong in those cases now.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Sep 4, 2017 at 4:53 AM, Beelz Ryuzaki <i93othman@gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>> Hi Karl,
>>>>>>>>>>
>>>>>>>>>> This morning, I have tried the zookeeper based file and it worked
>>>>>>>>>> really good. However, I still have one error which is bugging me. It is a
>>>>>>>>>> socket write error. You will find attached the simple history report.
>>>>>>>>>> Surprisingly, I didn't have any stack trace in the ManifoldCF log file.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> Othman.
>>>>>>>>>>
>>>>>>>>>> On Fri, 1 Sep 2017 at 19:39, Karl Wright <daddywri@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> This is from file locking yet again.
>>>>>>>>>>>
>>>>>>>>>>> I have uploaded a new RC.  Please download and try out the
>>>>>>>>>>> zookeeper locking.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 1, 2017 at 1:11 PM, Beelz Ryuzaki <
>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> There is another issue as well that gives the following stack
>>>>>>>>>>>> trace.
>>>>>>>>>>>>
>>>>>>>>>>>> Othman.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, 1 Sep 2017 at 18:05, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I took the binary from the ManifoldCF 2.8.1 RC0. It had the
>>>>>>>>>>>>> version 3.9 of POI and when I changed the version to 3.15 it worked fine. I
>>>>>>>>>>>>> really want to try the zookeeper if as you told me its performance is
>>>>>>>>>>>>> better than the file-based example. For the time being, I'm using the
>>>>>>>>>>>>> file-based because it is the only part that works for me but I actually
>>>>>>>>>>>>> need a stable version for my production environment. That is one point.
>>>>>>>>>>>>> Another point is, the path's tab is still an issue for me
>>>>>>>>>>>>> because I exclude some files and it still crawls them. I want to exclude
>>>>>>>>>>>>> some specific extensions of files and some specific directories. For
>>>>>>>>>>>>> instance, i don't want to index .exe files and contains a specific word. I
>>>>>>>>>>>>> do as follows I make the first exclude with *.exe and the second one with
>>>>>>>>>>>>> *word*. Only the second one which doesn't work. How can I solve this issue,
>>>>>>>>>>>>> please?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you very much, have a nice week-end,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Othman
>>>>>>>>>>>>> On Fri, 1 Sep 2017 at 16:46, Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I will respin a new 2.8.1 (RC1) to address the zookeeper
>>>>>>>>>>>>>> issue.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The failure you are seeing is "NoSuchMethodError".
>>>>>>>>>>>>>> Therefore, the class is being found, but it is the *wrong* class.  When you
>>>>>>>>>>>>>> deployed the new release, did you deploy it in a new directory, or did you
>>>>>>>>>>>>>> overwrite the previous deployment?  If you overwrote it, you probably have
>>>>>>>>>>>>>> multiple versions of the POI jars.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have just tried the new release of ManifoldCF. At first,
>>>>>>>>>>>>>>> the first job ended normally, but in the second I got a new stack trace
>>>>>>>>>>>>>>> concerning the POI. Moreover, the runzookeeper.bat doesn't run properly. It
>>>>>>>>>>>>>>> shows me the stack trace attached.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ps:
>>>>>>>>>>>>>>> The second attached file contains the POI stack trace.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, 1 Sep 2017 at 12:21, Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You do not need a new database instance.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You can download MCF 2.8.1 RC0 from here:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you very much for your help, I'm going to try out
>>>>>>>>>>>>>>>>> the zookeeper example. Should I initialize a new database? And how can I
>>>>>>>>>>>>>>>>> run the zookeeper start-agent ?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, 1 Sep 2017 at 11:37, Karl Wright <
>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> These exceptions are now coming from file locking and are
>>>>>>>>>>>>>>>>>> due to permissions problems.  I suggest you go to Zookeeper for file
>>>>>>>>>>>>>>>>>> locking.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am building a 2.8.1 release candidate.  When it
>>>>>>>>>>>>>>>>>> available for download, I'll send you the URL.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This morning, I have followed the steps you told me to
>>>>>>>>>>>>>>>>>>> do and I still got stack traces. I have attached the stack traces as well
>>>>>>>>>>>>>>>>>>> as the content of my lib repo and option.env.
>>>>>>>>>>>>>>>>>>> I have installed zookeeper and I'm ready to use the
>>>>>>>>>>>>>>>>>>> zookeeper example. Could you guide through it? I don't know if I follow the
>>>>>>>>>>>>>>>>>>> same steps in the file based example, I may not get stack traces.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <
>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Please do the following:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (0) Shut down all ManifoldCF processes.
>>>>>>>>>>>>>>>>>>>> (1) Move poi*.jar from connector-common-lib to lib.
>>>>>>>>>>>>>>>>>>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>>>>>>>>>>>>>>>>>>> (3) Move commons-collections4*.jar from
>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib.
>>>>>>>>>>>>>>>>>>>> (4) Move xmlbeans*.java from connector-common-lib to
>>>>>>>>>>>>>>>>>>>> lib.
>>>>>>>>>>>>>>>>>>>> (5) Move curvesapi*.jar from connector-common-lib to
>>>>>>>>>>>>>>>>>>>> lib.
>>>>>>>>>>>>>>>>>>>> (6) Modify your options.env to include all of the jars
>>>>>>>>>>>>>>>>>>>> you moved.
>>>>>>>>>>>>>>>>>>>> (7) Start up all ManifoldCF processes.
>>>>>>>>>>>>>>>>>>>> (8) If you still get stack traces, please send them to
>>>>>>>>>>>>>>>>>>>> me.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> By 'other place', do you mean the \lib repository? If
>>>>>>>>>>>>>>>>>>>>> that so, then I have already tried it and it didn't work.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <
>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I used the java dependency inspector to see what the
>>>>>>>>>>>>>>>>>>>>>> issue is and it turns out that poi-ooxml.jar does refer back to poi.jar in
>>>>>>>>>>>>>>>>>>>>>> the class that is failing.  So you will need to move poi-3.15.jar and
>>>>>>>>>>>>>>>>>>>>>> commons-collections4-1.4.jar to the other place as well.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Let's hope that finally fixes this issue.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I'm very unhappy about the quality of the POI project
>>>>>>>>>>>>>>>>>>>>>> code; it is definitely not using reasonable engineering practices, and I
>>>>>>>>>>>>>>>>>>>>>> will be opening a ticket with them.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'm using the file based example and all the changes
>>>>>>>>>>>>>>>>>>>>>>> you told me to do. I reproduced them in the file based example. I'll try to
>>>>>>>>>>>>>>>>>>>>>>> install zookeeper and use the zookeeper example. Will I need a
>>>>>>>>>>>>>>>>>>>>>>> configuration to do in order to run the zookeeper example ?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Are you using the zookeeper example, or the
>>>>>>>>>>>>>>>>>>>>>>>> file-based example?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> If these jars have all been moved, and the
>>>>>>>>>>>>>>>>>>>>>>>> options.env includes them, then I have to conclude that Apache POI's
>>>>>>>>>>>>>>>>>>>>>>>> pom.xml is incorrect too.  It will take a while to figure out what's
>>>>>>>>>>>>>>>>>>>>>>>> missing that poi-ooxml.jar needs that is not listed.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> All the dependencies you mentioned have already
>>>>>>>>>>>>>>>>>>>>>>>>> been added in the options.env.win file in the multiprocess-file-example
>>>>>>>>>>>>>>>>>>>>>>>>> repository.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I added it in the options.env.win file.
>>>>>>>>>>>>>>>>>>>>>>>>>> Should it be the one in the multiprocess-zk-example document or
>>>>>>>>>>>>>>>>>>>>>>>>>> multiprocess-file-example ?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>> <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Could it be a problem of elasticsearch's
>>>>>>>>>>>>>>>>>>>>>>>>>>>> version ? I'm actually using 2.1.0 which is pretty old for this new version
>>>>>>>>>>>>>>>>>>>>>>>>>>>> of ManifoldCF?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I moved back both the jars you mentioned and a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different is showing. You will find the stack trace attached.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've looked at the dependencies; you should
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not have moved poi-3.15.jar.  Please move that back, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar too.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> though.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you include poi.jar, then all
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dependencies of poi.jar must also be included.  This would mean
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that curvesapi-1.04.jar and commons-collections4-4.1.jar should also be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> included.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I added the two jars that you have
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mentioned and another one : poi-3.15.jar . Unfortunately, there is another
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error showing. This time, it concerns excel files. You will find attached
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the stack trace.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> calls back into another jar, which will also need to be moved.  *That* jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> has yet another dependency too.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The list of jars is thus extended to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You will find attached the stack trace.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> My apologies for the bad quality of the image, I'm doing my best to send
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you the stack trace as I don't have the right to send documents outside the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> company.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> diagnose what the problem is.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> problem. I looked into the log file and saw the following error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Error tossed :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and you expected the crawling resumed. How about the regular expressions?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How can I make complex regular expressions in the job's paths tab ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you know if it works.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> your options.env files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> another workaround you could try.  Specifically:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> your crawl resumes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the external Tika server transformer rather than the embedded Tika
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extractor.  I'm still looking into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <i93othman@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> latest binary version, and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You can see it in the attached file. For your information, the job started
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> yesterday.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apache POI is missing.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> address this, if you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version. For security reasons, I can't send any files from my computer. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have copied the stack trace and scanned it with my cellphone. I hope it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will be helpful. Meanwhile, I have read the documentation about how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> restrict the crawling and I don't think the '|' works in the specified. For
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instance, I would like to restrict the crawling for the documents that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> counts the 'sound' word . I proceed as follows: *(SON)* . the document is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with capital letters and I noticed that it didn't take it into
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consideration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with the windows share connector is by specifying information on the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "Paths" tab in jobs that crawl windows shares.  There is end-user
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documentation both online and distributed with all binary distributions
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that describe how to do this.  Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will start using zookeeper and I will let you know if it works. I have
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> another question to ask. Actually, I need to make some filters while
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling. I don't want to crawl some files and some folders. Could you give
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me an example of how to use the regex. Does the regex allow to use /i to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ignore cases ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deprecated because people often have problems with getting file permissions
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> right, and they do not understand how to shut processes down cleanly, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper is resilient against that.  I highly recommend using zookeeper
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not put files into memory so you do not need huge amounts of memory.  The
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> default values are more than enough for 35,000 files, which is a pretty
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> small job for ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 11:58 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper. i want to know how is zookeeper different from file based sync?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I also need a guidance on how to manage my pc's memory. How many Go should
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I allocate for the start-agent of ManifoldCF? Is 4Go enough in order to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawler 35K files ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 16:11, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for some reason, and that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instead of file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> still get failures after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 9:37 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> your quick response. I have looked into the ManifoldCF log file and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extracted the following warnings :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lock 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file; disk may be full. Shutting down process; locks may be left dangling.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You must cleanup before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> being the elasticsearch output connection. Moreover, the job uses Tika to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extract metadata and a file system as a repository connection. During the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job, I don't extract the content of the documents. I was wandering if the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issue comes from elasticsearch ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 14:08, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if there's an error that looks like it might go away on retry, but does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not.  It can be either on the repository side or on the output side.  If
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you look at the Simple History in the UI, or at the manifoldcf.log file,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you should be able to get a better sense of what went wrong.  Without
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> further information, I can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 5:33 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> software engineer from société générale in France. I'm actually using your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recent version of manifoldCF 2.8 . I'm working on an internal search
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engine. For this reason, I'm using manifoldcf in order to index documents
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on windows shares. I encountered a serious problem while crawling 35K
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents. Most of the time, when manifoldcf start crawling a big sized
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents (19Mo for example), it ends the job with the following error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> repeated service interruptions - failure processing document : software
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caused connection abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tips on how to solve this problem, please ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and elasticsearch 2.1.0 .
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>
>>

Mime
View raw message