manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Beelz Ryuzaki <i93oth...@gmail.com>
Subject Re: Question about ManifoldCF 2.8
Date Fri, 08 Sep 2017 18:07:37 GMT
Sorry to bother you again, but what is the difference between indexable
files and files in the path tab of a job ?

Thanks,

Othman BELHAJ

On Fri, 8 Sep 2017 at 19:27, Beelz Ryuzaki <i93othman@gmail.com> wrote:

> Hi Karl,
>
> My zookeeper is still pointing to the HSQL database. What should I do in
> order to change it so that it points to my PostgreSQL database ?
>
> Best regards,
>
> Othman Belhaj .
>
> On Wed, 6 Sep 2017 at 15:34, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>
>> Thank you, Karl. I will try to combine Postgresql with zookeeper and let
>> you know.
>>
>> Othman.
>>
>> On Wed, 6 Sep 2017 at 13:18, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> No, you can use whatever supported database you like.
>>>
>>> Karl
>>>
>>>
>>> On Wed, Sep 6, 2017 at 6:58 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>> wrote:
>>>
>>>> As far as I know, when you use zookeeper , you obligatory need to use
>>>> HSQLDB to go with it, right?
>>>>
>>>> Thanks,
>>>> Othman
>>>>
>>>> On Wed, 6 Sep 2017 at 12:56, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>>> Hi Othman,
>>>>>
>>>>> HSQLDB stores all tables in memory so you need to size it
>>>>> accordingly.  That is one reason we prefer Postgresql for production
>>>>> deployments.
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>> On Wed, Sep 6, 2017 at 6:21 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>>
>>>>>> I resolved the elasticsearch problem however the application doesn't
>>>>>> seem to work after I have run a job to crawl over 500k documents. I get an
>>>>>> GC overhead limit exceeded in the hsql database. How many should I allocate
>>>>>> for it?
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Othman
>>>>>>
>>>>>> On Tue, 5 Sep 2017 at 12:43, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Othman,
>>>>>>>
>>>>>>> Thanks for doing the evaluation of the problem.
>>>>>>>
>>>>>>> Generally, the ManifoldCF project does not have the expertise to
>>>>>>> diagnose problems with external systems like Solr or Elasticsearch.  So
>>>>>>> going to another newsgroup for those kinds of issues would be a good idea.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 5, 2017 at 4:33 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>>
>>>>>>>> I have analyzed the error and found out that it was mainly an
>>>>>>>> elasticsearch problem. I saw in some forums that one of the adopted
>>>>>>>> solution is to modify elasticsearch.yml and set the http.max_content_length
>>>>>>>> to a greater value. However, the job got stuck in the last two indexable
>>>>>>>> files ( two pptx files with 22Mo and 2Mo respectively). The job eventually
>>>>>>>> ended but a stack trace showed that elasticsearch ran out of memory. For
>>>>>>>> your information, I have allocated 4Go for elasticsearch execution. Is it
>>>>>>>> enough in order to have a good performance. You will find attached the
>>>>>>>> stack traces of elasticsearch.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Othman BELHAJ.
>>>>>>>>
>>>>>>>> On Mon, 4 Sep 2017 at 16:40, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Karl,
>>>>>>>>>
>>>>>>>>> I'm sorry to bother on your holiday. I will try to analyze it
>>>>>>>>> today and let it you know what I have found. Enjoy your day !
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> Othman BELHAJ.
>>>>>>>>>
>>>>>>>>> On Mon, 4 Sep 2017 at 16:06, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Othman,
>>>>>>>>>>
>>>>>>>>>> I won't be able to look at this today; it is a holiday here.
>>>>>>>>>> But, the "socket write" error is coming from ElasticSearch.  If ES is
>>>>>>>>>> configured to not accept documents greater than a certain size, that might
>>>>>>>>>> explain it.  Maybe the ES logs would help?
>>>>>>>>>>
>>>>>>>>>> I'm afraid you're going to need to do the work to find out what
>>>>>>>>>> is going wrong in those cases now.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 4, 2017 at 4:53 AM, Beelz Ryuzaki <
>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>
>>>>>>>>>>> This morning, I have tried the zookeeper based file and it
>>>>>>>>>>> worked really good. However, I still have one error which is bugging me. It
>>>>>>>>>>> is a socket write error. You will find attached the simple history report.
>>>>>>>>>>> Surprisingly, I didn't have any stack trace in the ManifoldCF log file.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>> Othman.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 1 Sep 2017 at 19:39, Karl Wright <daddywri@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> This is from file locking yet again.
>>>>>>>>>>>>
>>>>>>>>>>>> I have uploaded a new RC.  Please download and try out the
>>>>>>>>>>>> zookeeper locking.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 1, 2017 at 1:11 PM, Beelz Ryuzaki <
>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> There is another issue as well that gives the following stack
>>>>>>>>>>>>> trace.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, 1 Sep 2017 at 18:05, Beelz Ryuzaki <
>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I took the binary from the ManifoldCF 2.8.1 RC0. It had the
>>>>>>>>>>>>>> version 3.9 of POI and when I changed the version to 3.15 it worked fine. I
>>>>>>>>>>>>>> really want to try the zookeeper if as you told me its performance is
>>>>>>>>>>>>>> better than the file-based example. For the time being, I'm using the
>>>>>>>>>>>>>> file-based because it is the only part that works for me but I actually
>>>>>>>>>>>>>> need a stable version for my production environment. That is one point.
>>>>>>>>>>>>>> Another point is, the path's tab is still an issue for me
>>>>>>>>>>>>>> because I exclude some files and it still crawls them. I want to exclude
>>>>>>>>>>>>>> some specific extensions of files and some specific directories. For
>>>>>>>>>>>>>> instance, i don't want to index .exe files and contains a specific word. I
>>>>>>>>>>>>>> do as follows I make the first exclude with *.exe and the second one with
>>>>>>>>>>>>>> *word*. Only the second one which doesn't work. How can I solve this issue,
>>>>>>>>>>>>>> please?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you very much, have a nice week-end,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>> On Fri, 1 Sep 2017 at 16:46, Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I will respin a new 2.8.1 (RC1) to address the zookeeper
>>>>>>>>>>>>>>> issue.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The failure you are seeing is "NoSuchMethodError".
>>>>>>>>>>>>>>> Therefore, the class is being found, but it is the *wrong* class.  When you
>>>>>>>>>>>>>>> deployed the new release, did you deploy it in a new directory, or did you
>>>>>>>>>>>>>>> overwrite the previous deployment?  If you overwrote it, you probably have
>>>>>>>>>>>>>>> multiple versions of the POI jars.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have just tried the new release of ManifoldCF. At first,
>>>>>>>>>>>>>>>> the first job ended normally, but in the second I got a new stack trace
>>>>>>>>>>>>>>>> concerning the POI. Moreover, the runzookeeper.bat doesn't run properly. It
>>>>>>>>>>>>>>>> shows me the stack trace attached.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ps:
>>>>>>>>>>>>>>>> The second attached file contains the POI stack trace.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, 1 Sep 2017 at 12:21, Karl Wright <
>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You do not need a new database instance.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You can download MCF 2.8.1 RC0 from here:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank you very much for your help, I'm going to try out
>>>>>>>>>>>>>>>>>> the zookeeper example. Should I initialize a new database? And how can I
>>>>>>>>>>>>>>>>>> run the zookeeper start-agent ?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, 1 Sep 2017 at 11:37, Karl Wright <
>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> These exceptions are now coming from file locking and
>>>>>>>>>>>>>>>>>>> are due to permissions problems.  I suggest you go to Zookeeper for file
>>>>>>>>>>>>>>>>>>> locking.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am building a 2.8.1 release candidate.  When it
>>>>>>>>>>>>>>>>>>> available for download, I'll send you the URL.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This morning, I have followed the steps you told me to
>>>>>>>>>>>>>>>>>>>> do and I still got stack traces. I have attached the stack traces as well
>>>>>>>>>>>>>>>>>>>> as the content of my lib repo and option.env.
>>>>>>>>>>>>>>>>>>>> I have installed zookeeper and I'm ready to use the
>>>>>>>>>>>>>>>>>>>> zookeeper example. Could you guide through it? I don't know if I follow the
>>>>>>>>>>>>>>>>>>>> same steps in the file based example, I may not get stack traces.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <
>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please do the following:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> (0) Shut down all ManifoldCF processes.
>>>>>>>>>>>>>>>>>>>>> (1) Move poi*.jar from connector-common-lib to lib.
>>>>>>>>>>>>>>>>>>>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>>>>>>>>>>>>>>>>>>>> (3) Move commons-collections4*.jar from
>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib.
>>>>>>>>>>>>>>>>>>>>> (4) Move xmlbeans*.java from connector-common-lib to
>>>>>>>>>>>>>>>>>>>>> lib.
>>>>>>>>>>>>>>>>>>>>> (5) Move curvesapi*.jar from connector-common-lib to
>>>>>>>>>>>>>>>>>>>>> lib.
>>>>>>>>>>>>>>>>>>>>> (6) Modify your options.env to include all of the jars
>>>>>>>>>>>>>>>>>>>>> you moved.
>>>>>>>>>>>>>>>>>>>>> (7) Start up all ManifoldCF processes.
>>>>>>>>>>>>>>>>>>>>> (8) If you still get stack traces, please send them to
>>>>>>>>>>>>>>>>>>>>> me.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> By 'other place', do you mean the \lib repository? If
>>>>>>>>>>>>>>>>>>>>>> that so, then I have already tried it and it didn't work.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I used the java dependency inspector to see what the
>>>>>>>>>>>>>>>>>>>>>>> issue is and it turns out that poi-ooxml.jar does refer back to poi.jar in
>>>>>>>>>>>>>>>>>>>>>>> the class that is failing.  So you will need to move poi-3.15.jar and
>>>>>>>>>>>>>>>>>>>>>>> commons-collections4-1.4.jar to the other place as well.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Let's hope that finally fixes this issue.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'm very unhappy about the quality of the POI
>>>>>>>>>>>>>>>>>>>>>>> project code; it is definitely not using reasonable engineering practices,
>>>>>>>>>>>>>>>>>>>>>>> and I will be opening a ticket with them.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I'm using the file based example and all the
>>>>>>>>>>>>>>>>>>>>>>>> changes you told me to do. I reproduced them in the file based example.
>>>>>>>>>>>>>>>>>>>>>>>> I'll try to install zookeeper and use the zookeeper example. Will I need a
>>>>>>>>>>>>>>>>>>>>>>>> configuration to do in order to run the zookeeper example ?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Are you using the zookeeper example, or the
>>>>>>>>>>>>>>>>>>>>>>>>> file-based example?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> If these jars have all been moved, and the
>>>>>>>>>>>>>>>>>>>>>>>>> options.env includes them, then I have to conclude that Apache POI's
>>>>>>>>>>>>>>>>>>>>>>>>> pom.xml is incorrect too.  It will take a while to figure out what's
>>>>>>>>>>>>>>>>>>>>>>>>> missing that poi-ooxml.jar needs that is not listed.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> All the dependencies you mentioned have already
>>>>>>>>>>>>>>>>>>>>>>>>>> been added in the options.env.win file in the multiprocess-file-example
>>>>>>>>>>>>>>>>>>>>>>>>>> repository.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I added it in the options.env.win file.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Should it be the one in the multiprocess-zk-example document or
>>>>>>>>>>>>>>>>>>>>>>>>>>> multiprocess-file-example ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Could it be a problem of elasticsearch's
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version ? I'm actually using 2.1.0 which is pretty old for this new version
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of ManifoldCF?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I moved back both the jars you mentioned and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a different is showing. You will find the stack trace attached.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've looked at the dependencies; you should
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not have moved poi-3.15.jar.  Please move that back, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar too.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> though.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you include poi.jar, then all
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dependencies of poi.jar must also be included.  This would mean
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that curvesapi-1.04.jar and commons-collections4-4.1.jar should also be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> included.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I added the two jars that you have
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mentioned and another one : poi-3.15.jar . Unfortunately, there is another
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error showing. This time, it concerns excel files. You will find attached
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the stack trace.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> calls back into another jar, which will also need to be moved.  *That* jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> has yet another dependency too.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The list of jars is thus extended to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You will find attached the stack trace.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> My apologies for the bad quality of the image, I'm doing my best to send
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you the stack trace as I don't have the right to send documents outside the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> company.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> diagnose what the problem is.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> problem. I looked into the log file and saw the following error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Error tossed :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and you expected the crawling resumed. How about the regular expressions?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How can I make complex regular expressions in the job's paths tab ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you know if it works.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> your options.env files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> another workaround you could try.  Specifically:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> your crawl resumes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the external Tika server transformer rather than the embedded Tika
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extractor.  I'm still looking into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> latest binary version, and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You can see it in the attached file. For your information, the job started
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> yesterday.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apache POI is missing.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to address this, if you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version. For security reasons, I can't send any files from my computer. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have copied the stack trace and scanned it with my cellphone. I hope it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will be helpful. Meanwhile, I have read the documentation about how to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> restrict the crawling and I don't think the '|' works in the specified. For
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instance, I would like to restrict the crawling for the documents that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> counts the 'sound' word . I proceed as follows: *(SON)* . the document is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with capital letters and I noticed that it didn't take it into
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consideration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents with the windows share connector is by specifying information on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the "Paths" tab in jobs that crawl windows shares.  There is end-user
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documentation both online and distributed with all binary distributions
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that describe how to do this.  Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will start using zookeeper and I will let you know if it works. I have
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> another question to ask. Actually, I need to make some filters while
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling. I don't want to crawl some files and some folders. Could you give
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me an example of how to use the regex. Does the regex allow to use /i to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ignore cases ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 19:53, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deprecated because people often have problems with getting file permissions
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> right, and they do not understand how to shut processes down cleanly, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper is resilient against that.  I highly recommend using zookeeper
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not put files into memory so you do not need huge amounts of memory.  The
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> default values are more than enough for 35,000 files, which is a pretty
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> small job for ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 11:58 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper. i want to know how is zookeeper different from file based sync?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I also need a guidance on how to manage my pc's memory. How many Go should
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I allocate for the start-agent of ManifoldCF? Is 4Go enough in order to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawler 35K files ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 16:11, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for some reason, and that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sync instead of file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> still get failures after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 9:37 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> your quick response. I have looked into the ManifoldCF log file and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extracted the following warnings :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lock 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file; disk may be full. Shutting down process; locks may be left dangling.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You must cleanup before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> being the elasticsearch output connection. Moreover, the job uses Tika to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extract metadata and a file system as a repository connection. During the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job, I don't extract the content of the documents. I was wandering if the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issue comes from elasticsearch ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 14:08, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if there's an error that looks like it might go away on retry, but does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not.  It can be either on the repository side or on the output side.  If
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you look at the Simple History in the UI, or at the manifoldcf.log file,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you should be able to get a better sense of what went wrong.  Without
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> further information, I can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 5:33 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <i93othman@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> software engineer from société générale in France. I'm actually using your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recent version of manifoldCF 2.8 . I'm working on an internal search
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engine. For this reason, I'm using manifoldcf in order to index documents
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on windows shares. I encountered a serious problem while crawling 35K
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents. Most of the time, when manifoldcf start crawling a big sized
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents (19Mo for example), it ends the job with the following error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> repeated service interruptions - failure processing document : software
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caused connection abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tips on how to solve this problem, please ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and elasticsearch 2.1.0 .
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>
>>>

Mime
View raw message