manifoldcf-user mailing list archives

From msaunier <msaun...@citya.com>
Subject RE: Modify job to add excludes files and directory
Date Tue, 13 Mar 2018 17:50:26 GMT
Hello Karl,

 

I have created three test cases:

 

1. Create the job manually (1_job_manually.json | 1_job_manually.png)

2. Create the job with the script, then modify the order manually (2_job_mixte.json | 2_job_mixte.png)

3. Create the job with the script only (3_job_script.json | 3_job_script.png)

 

I do not see the difference.

 

So 1 and 2 work well, with the correct order, but in 3 the included files and directories come
first.

 

Thanks,

Maxence

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Monday, March 12, 2018 21:29
To: user@manifoldcf.apache.org
Cc: Fabien Harrang <FHARRANG@citya.com>; REUILLON Dominique <dreuillon@citya.com>
Subject: Re: Modify job to add excludes files and directory

 

Here is an idea: define your job in the UI and use the API to fetch the JSON for it.
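A minimal Python sketch of that suggestion, assuming the API service is reachable at http://localhost:8345/mcf-api-service/json and that a single job is addressed as jobs/<job id> under it (adjust both for your deployment). The helper names job_url, get_job and put_job are hypothetical and are not part of ManifoldCF.

```python
import json
import urllib.request

# Assumption: base URL of the ManifoldCF API service; host, port and
# path depend on how ManifoldCF is deployed.
API_BASE = "http://localhost:8345/mcf-api-service/json"


def job_url(base: str, job_id: str) -> str:
    """Build the URL of one job resource (assumed form: <base>/jobs/<id>)."""
    return f"{base.rstrip('/')}/jobs/{job_id}"


def get_job(base: str, job_id: str) -> dict:
    """Fetch a job (for example one defined in the UI) as parsed JSON."""
    with urllib.request.urlopen(job_url(base, job_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def put_job(base: str, job_id: str, job: dict) -> int:
    """Send a job definition back with HTTP PUT and return the status code."""
    body = json.dumps(job).encode("utf-8")
    req = urllib.request.Request(
        job_url(base, job_id),
        data=body,
        method="PUT",
        # Assumption: the API accepts a JSON request body.
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, printing json.dumps(get_job(API_BASE, job_id), indent=2) for a job whose rules were ordered in the UI shows how that order is serialized. The script can then reuse the same structure as a template before calling put_job.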

 

Karl

 

On Mon, Mar 12, 2018, 12:51 PM Karl Wright <daddywri@gmail.com> wrote:

I will need to look at this later tonight before I can respond in detail.

The document specification part of the API uses EXACTLY the same data as is stored for the
job. The only difference is that the job specification is stored in XML, not JSON. The
converters between the two do preserve ordering, however.

 

Karl

 

 

On Mon, Mar 12, 2018 at 12:38 PM, msaunier <msaunier@citya.com> wrote:

1:

I think I have found a problem in the file system connector part of this page: https://manifoldcf.apache.org/release/release-2.9.1/en_US/programmatic-operation.html

 

The page shows this JSON:

 

{"startpoint":[{"_attribute_path":"c:\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc"\,"_attribute_type":"directory","_attribute_match":"*"],"exclude":["*.mov"]]}

 

I think the JSON syntax is wrong. I think the correct JSON is:

 

{"startpoint":[{"_attribute_path":"c:\\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc","_attribute_type":"directory","_attribute_match":"*"}],"exclude":["*.mov"]}]}

 

Corrections:

- escape the backslash in the path: "c:\\path_to_files" instead of "c:\path_to_files"
- remove the stray backslash after "*.doc": "*.doc", instead of "*.doc"\,
- close the last include object before closing the array: "*"}] instead of "*"]
- close the startpoint object and array at the end: "exclude":["*.mov"]}]} instead of "exclude":["*.mov"]]}
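One way to check JSON like this before sending it is to parse it locally. The sketch below runs Python's standard json module on the two strings quoted above. The documentation version fails to parse, because a single backslash in c:\path_to_files is not a valid JSON escape, while the corrected version parses. The sketch also reports that the second include object still repeats _attribute_type and _attribute_match. Python's parser keeps only the last value for a repeated key, so in Python the "*.doc" entry is silently replaced by the directory entry. Whether ManifoldCF's own parser behaves the same way is an assumption, not something confirmed in this thread.

```python
import json

# JSON as quoted from the documentation page (raw string keeps backslashes as-is)
DOC_JSON = r'{"startpoint":[{"_attribute_path":"c:\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc"\,"_attribute_type":"directory","_attribute_match":"*"],"exclude":["*.mov"]]}'

# The corrected version proposed in the message
FIXED_JSON = r'{"startpoint":[{"_attribute_path":"c:\\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc","_attribute_type":"directory","_attribute_match":"*"}],"exclude":["*.mov"]}]}'


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def duplicate_keys(text: str) -> list:
    """List keys that appear more than once inside the same JSON object."""
    dups = []

    def hook(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen:
                dups.append(key)
            seen.add(key)
        return dict(pairs)  # like json.loads: the last value wins

    json.loads(text, object_pairs_hook=hook)
    return dups


print(is_valid_json(DOC_JSON))     # False
print(is_valid_json(FIXED_JSON))   # True
print(duplicate_keys(FIXED_JSON))  # ['_attribute_type', '_attribute_match']
print(json.loads(FIXED_JSON)["startpoint"][0]["include"][1])
# {'_attribute_type': 'directory', '_attribute_match': '*'}
```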

 

However, this configuration does not work with the Windows Share connector: I get a syntax error
on the exclude.

 

2:

For my actual problem, the JSON format is not the issue: it works. I have attached the JSON,
generated with my Python script from my database (srvics33.json).

 

If I open the UI after PUTting the configuration, the included files come first and the
excluded ones second (image1.png). In my JSON I put the excludes first, but they end up
second.

I am forced to go into the UI and modify the order manually to obtain the right result
(image2.png).

 

Can I specify an order parameter [1-*] to place the excluded files and directories first?

 

Thanks.

 

Maxence

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Monday, March 12, 2018 14:38
To: user@manifoldcf.apache.org
Cc: Fabien Harrang <FHARRANG@citya.com>; REUILLON Dominique <DREUILLON@citya.com>
Subject: Re: Modify job to add excludes files and directory

 

Hi Maxence,

 

You can have as many clauses in your JSON rule list as you like. You do not need to have
both include and exclude rules in each clause, so you can do in the JSON precisely what you
do in the UI.

 

Thanks,

Karl

 

 

On Mon, Mar 12, 2018 at 9:07 AM, msaunier <msaunier@citya.com> wrote:

OK. I have read this in the documentation:

 

 Rules are evaluated from top to bottom, and the first rule that matches the file name is
the one that is chosen. 
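A minimal sketch of first-match evaluation, to show why the position of an exclude rule matters. The (action, type, pattern) tuples and the first_match helper are hypothetical, not ManifoldCF's internal representation, and fnmatch only approximates the connector's wildcard matching.

```python
from fnmatch import fnmatch
from typing import List, Optional, Tuple

# Hypothetical rule shape: (action, type, pattern), where action is
# "include" or "exclude" and type is "file" or "directory".
Rule = Tuple[str, str, str]


def first_match(rules: List[Rule], name: str, kind: str) -> Optional[str]:
    """Scan rules top to bottom; return the action of the first match."""
    for action, rule_type, pattern in rules:
        if rule_type == kind and fnmatch(name, pattern):
            return action
    return None  # no rule matched; how a connector treats this is not covered here


exclude_first = [("exclude", "file", "*.mov"), ("include", "file", "*")]
include_first = [("include", "file", "*"), ("exclude", "file", "*.mov")]

print(first_match(exclude_first, "clip.mov", "file"))    # exclude
print(first_match(include_first, "clip.mov", "file"))    # include
print(first_match(exclude_first, "report.doc", "file"))  # include
```

With "include *" placed first, the exclude rule after it can never be reached, which is why the order in which the rules are stored changes the result.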

 

But with the API, if I PUT a new job definition with the correct order, ManifoldCF always puts
the included documents first. If I need the excludes first, I cannot do it with an API definition.
I have attached the JSON to this email.

 

Does the API have an order parameter for the startpoint and for the included and excluded files/directories?

 

(PS: I prefer to put the excludes first and then include *, to have total control over the
document management system (GED) and keep an eye on its documents.)

(PS2: I generate this JSON and send it with a Python script, and that works well.)

 

Thanks

 

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, March 9, 2018 12:53
To: user@manifoldcf.apache.org
Cc: Fabien Harrang <FHARRANG@citya.com>; REUILLON Dominique <DREUILLON@citya.com>
Subject: Re: Modify job to add excludes files and directory

 

Hi Maxence,

 

In the middle of a job run, if you change the specification of which documents are included and
excluded, the connector's implementation determines how it will behave. There is no
guarantee that documents that are now excluded will be removed, for example if the connector filters
documents only when they are queued. You may need to run the job a second time to be sure
everything is removed.

So the official answer is that "it depends". 

 

Karl

 

 

On Fri, Mar 9, 2018 at 5:38 AM, msaunier <msaunier@citya.com> wrote:

Hello Karl,

 

If I add new files and directories to exclude on a live job, will ManifoldCF delete the previously
indexed files that match these exclusions? Or do I need to reseed all of my documents?

 

Thank you.

 

Maxence SAUNIER