manifoldcf-user mailing list archives

From Paul Farrell <pfarr...@funnelback.com>
Subject Re: Alfresco WebScript Connector - Testing Question
Date Wed, 28 Oct 2015 10:33:01 GMT
Hi all,

Following up on my recent email (below), I thought I would share my findings with the ‘Alfresco
Indexer’ connector (https://github.com/maoo/alfresco-indexer)
in case someone can advise on its usage.

I turned to this connector because neither of the packaged Manifold Alfresco connectors
(AtomPub or WebService) detects changes. I need the crawl to run each night and pick up
all changes made to the documents in the previous 24 hours, which is a common scenario.

Unfortunately, I have yet to achieve this.

Having built and installed both the AMP and JAR files needed for the new connector, changes
are still not coming through. In fact, I have two observations so far:

1. Changes to document content or properties do not cause the same document to be picked
up by the Alfresco connector on the next run
2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up

IN DETAIL
1. Failing to pick up modified content

Looking at the log files (which are set to debug) I can see that, upon the first crawl of
Alfresco, Manifold sends the following requests:

DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1[\r][\n]"

This picks up all of the content, e.g. documents.

Running a second crawl, without any other actions being done, results in the following requests:

DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1[\r][\n]"
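
For readability, the query string in the request logged above can be decoded to see exactly which transaction IDs and filters are being sent. This is only a minimal sketch using the Python standard library; the variable names are my own and nothing here is part of ManifoldCF or the Alfresco Indexer code.

```python
import json
from urllib.parse import urlsplit, parse_qs

# Request target copied from the debug log above.
REQUEST_TARGET = (
    "/alfresco/service/node/changes/workspace/SpacesStore"
    "?lastTxnId=333&lastAclChangesetId=13"
    "&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C"
    "%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C"
    "%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D"
)

# parse_qs percent-decodes each value and returns a list per key.
query = parse_qs(urlsplit(REQUEST_TARGET).query)

last_txn_id = int(query["lastTxnId"][0])
last_acl_changeset_id = int(query["lastAclChangesetId"][0])

# The indexingFilters value is a JSON document once decoded.
filters = json.loads(query["indexingFilters"][0])

print("lastTxnId:", last_txn_id)
print("lastAclChangesetId:", last_acl_changeset_id)
print(json.dumps(filters, indent=2))
```

Decoded this way, the only non-empty filter in the request is siteFilters, containing ‘Finance’.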

So I can see that, in the first instance, we are targeting content directly while, in the
second, we are asking for changes. The problem is that no changes are returned from the second
set of requests. The response from these calls is:

DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "totalNodes" : "0", [\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "elapsedTime" : "8",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "docs" : [[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  ],[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_txn_id" : "352",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_acl_changeset_id" : "13",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_id" : "SpacesStore",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_protocol" : "workspace"[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"

Regardless of what changes I make to a document that I have been using for testing, the document
is not updated. The response from the calls for changes (totalNodes) is always ‘0’.
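
When checking many crawls, it may help to pull the key fields out of the wire log rather than reading them by eye. The following is a rough sketch that assumes each response line has been joined back onto a single log line and that the fields of interest are simple "key" : "value" pairs; the helper name extract_fields is hypothetical.

```python
import re

# Response lines from the debug log, each joined onto a single line.
WIRE_LOG = r'''
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "totalNodes" : "0", [\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "elapsedTime" : "8",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "docs" : [[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  ],[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_txn_id" : "352",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_acl_changeset_id" : "13",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_id" : "SpacesStore",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_protocol" : "workspace"[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
'''

# Matches a quoted key followed by a quoted string value, e.g. "totalNodes" : "0".
PAIR = re.compile(r'"(\w+)"\s*:\s*"([^"]*)"')


def extract_fields(lines):
    """Collect simple string fields from inbound ("<<") wire-log lines."""
    fields = {}
    for line in lines:
        # Keep only the payload that follows the inbound marker.
        _, _, payload = line.partition("<< ")
        match = PAIR.search(payload)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


fields = extract_fields(WIRE_LOG.splitlines())
print("totalNodes:", fields["totalNodes"])
print("last_txn_id:", fields["last_txn_id"])
print("last_acl_changeset_id:", fields["last_acl_changeset_id"])
```

The array-valued "docs" entry is skipped deliberately, since the pattern only matches string values.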


2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up

Within my test Alfresco environment I have one site set up (Finance). Within the Finance doc
library I have three test docs. No other changes have been made to the Alfresco instance.

Running a crawl with no filter configuration set returns 81 items (this is via the URL in
a browser).
If I then set the Site Filter configuration to ‘Finance’ and apply it, I still get 81 items
when I re-run the crawl.
I can see that the term ‘Finance’ is being added to the URL, but this does not seem to
change the behaviour.
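
To compare the filtered and unfiltered cases side by side outside of Manifold, the two request URLs can be generated from the filter settings. This is a sketch under my own assumptions: the helper changes_url is hypothetical, it only builds the path and query string seen in the logs and in my earlier email, and it makes no network calls.

```python
import json
from urllib.parse import urlencode

# Path of the changes WebScript as it appears in the logged requests.
CHANGES_PATH = "/alfresco/service/node/changes/workspace/SpacesStore"


def changes_url(filters, last_txn_id=0, last_acl_changeset_id=0):
    """Build the request target for a given filter dictionary."""
    params = {
        "lastTxnId": last_txn_id,
        "lastAclChangesetId": last_acl_changeset_id,
        # Compact JSON (no spaces), then percent-encoded by urlencode.
        "indexingFilters": json.dumps(filters, separators=(",", ":")),
    }
    return CHANGES_PATH + "?" + urlencode(params)


unfiltered = changes_url({})
site_only = changes_url({"siteFilters": ["Finance"]})

print(unfiltered)
print(site_only)
```

Prefixing each result with the host and port of the Alfresco instance gives URLs that can be pasted into a browser or fetched with any HTTP client to compare the item counts.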


I am happy to spend time diagnosing this if there is anyone available to assist.

Thanks

Paul



> On 27 Oct 2015, at 18:14, pfarrell@funnelback.com wrote:
> 
> Hi all,
> 
> This is a question regarding the relatively new Alfresco Webscript connector. 
> 
> SETUP
> I have a vanilla Alfresco Community 5.0 installation
> One site has been created called 'Finance'
> A handful of documents have been created in 'Finance' Doc Library.
> I have cloned and packaged up the 'alfresco-indexer' (https://github.com/maoo/alfresco-indexer)
> and have applied the AMP and CLIENT packages to their respective environments.
> 
> 
> ISSUE
> The issue is that the default API call used by Manifold is returning nothing. The full
> API call used by Manifold, and based on my config, is:
> 
> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
> 
> 
> TESTS
> I have identified two streamlined URLs. The first one returns the documents that exist
> in the doc library of the 'Finance' site. This URL is:
> 
> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
> 
> The second URL simply adds the site restriction. This URL returns nothing:
> 
> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
> 
> 
> 
> Can anyone explain why the documents do not return when only the containing site is named
> in the API URL?
> 
> Cheers
> 
> Paul
> 
> 

