manifoldcf-user mailing list archives

From Paul Farrell <pfarr...@funnelback.com>
Subject Re: Alfresco WebScript Connector - Testing Question
Date Wed, 28 Oct 2015 11:07:28 GMT
The Alfresco log snippet doesn’t really shed any more light. It simply doesn’t think that
the document content has changed.

09:56:42,059 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getNodesByTransactionId]
On Store workspace://SpacesStore
09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getLastTransactionID]
09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getNodesByAclChangesetId]
On Store workspace://SpacesStore
09:56:42,070 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getLastAclChangeSetID]
09:56:42,070 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript] [http-apr-8080-exec-5]
Attaching 0 nodes to the WebScript template
09:56:42,079 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript] [http-apr-8080-exec-9]
Invoking Changes Webscript, using the following params
lastTxnId: 352
lastAclChangesetId: 13
storeId: SpacesStore
storeProtocol: workspace
indexingFilters: {"aspectFilters":[],"metadataFilters":{},"mimetypeFilters":[],"siteFilters":["Finance"],"typeFilters":[]}

09:56:42,079 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getNodesByTransactionId]
On Store workspace://SpacesStore
09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getLastTransactionID]
09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getNodesByAclChangesetId]
On Store workspace://SpacesStore
09:56:42,087 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getLastAclChangeSetID]
09:56:42,087 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript] [http-apr-8080-exec-9]
Attaching 0 nodes to the WebScript template
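
One way to test the webscript outside of ManifoldCF is to replay the changes call by hand with an older lastTxnId and see whether the edited document shows up. A minimal Python sketch of building that URL follows. The path and parameter names are copied from the requests in the ManifoldCF debug log quoted below; the helper name build_changes_url and the localhost base URL are placeholders of mine, not part of the connector.

```python
import json
from urllib.parse import urlencode


def build_changes_url(base_url, last_txn_id, last_acl_changeset_id, filters,
                      store_protocol="workspace", store_id="SpacesStore"):
    """Build a node/changes URL shaped like the ones in the ManifoldCF log."""
    # Compact separators reproduce the JSON exactly as it appears in the log.
    filters_json = json.dumps(filters, separators=(",", ":"))
    query = urlencode({
        "lastTxnId": last_txn_id,
        "lastAclChangesetId": last_acl_changeset_id,
        "indexingFilters": filters_json,
    })
    return (f"{base_url}/alfresco/service/node/changes/"
            f"{store_protocol}/{store_id}?{query}")


if __name__ == "__main__":
    finance_only = {
        "siteFilters": ["Finance"],
        "typeFilters": [],
        "mimetypeFilters": [],
        "aspectFilters": [],
        "metadataFilters": {},
    }
    # Hypothetical base URL; replace with the real Alfresco host and port.
    print(build_changes_url("http://localhost:8080", 333, 13, finance_only))
```

Opening the printed URL in a browser with a lastTxnId lower than the current one would be one way to check whether the webscript reports the modified node at all.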

Paul Farrell
Senior Search Consultant
 
109-123 Clifton Street, London EC2A 4LD
T +44 (0) 207 183 6865 | funnelback.com <http://www.funnelback.com/>

UNITED KINGDOM | AUSTRALIA | NEW ZEALAND | POLAND | UNITED STATES

Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback> - Twitter <https://twitter.com/funnelback>

Funnelback UK Ltd is a limited liability company registered in England & Wales. Registered
address: Zetland House 109-123, Clifton Street, London. EC2A 4LD. Company registration number:
07004264.

> On 28 Oct 2015, at 10:50, Rafa Haro <rharoapache@gmail.com> wrote:
> 
> You’re welcome, Paul. Just in case, could you check the Alfresco logs to see if there is something informative there?
> 
> Cheers,
> Rafa
> 
> 
> 
> 
> On Wed, Oct 28, 2015 at 11:47 AM, Paul Farrell <pfarrell@funnelback.com> wrote:
> 
> I see. That makes sense. 
> 
> No problem. Thanks for the feedback Rafa. Much appreciated. 
> 
> 
> 
> Paul Farrell
> 
>> On 28 Oct 2015, at 10:45, Rafa Haro <rharoapache@gmail.com> wrote:
>> 
>> Hi Paul, 
>> 
>> Before contributing the Alfresco connector, we performed several tests similar to yours using an Alfresco 4.x version. My initial guess, therefore, is that the Webscript is not behaving correctly for Alfresco 5 instances. I’m including Maurizio Pillitu (the main developer of Alfresco Indexer) in the email thread. He may be able to provide some feedback on this or simply confirm my suspicions.
>> 
>> Cheers,
>> Rafa
>> 
>> 
>> 
>> 
>> On Wed, Oct 28, 2015 at 11:33 AM, Paul Farrell <pfarrell@funnelback.com> wrote:
>> 
>> Hi all,
>> 
>> Following up on my recent email (below), I thought I would share my findings with the ‘Alfresco Indexer’ connector (https://github.com/maoo/alfresco-indexer) in case someone is able to advise on its usage.
>> 
>> The reason I turned to this connector is the lack of change detection in either of the packaged Manifold Alfresco connectors (AtomPub or WebService). I need the crawl to run each night and pick up all changes made to documents in the previous 24 hours, which is a common scenario.
>> 
>> Unfortunately, I have yet to achieve this.
>> 
>> I have built and installed both the AMP and JAR files needed for the new connector, but changes are still not coming through. I have two observations so far:
>> 
>> 1. Changes to document content or properties do not cause the document to be picked up by the Alfresco connector on the next run
>> 2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up
>> 
>> IN DETAIL
>> 1. Failing to pick up modified content
>> 
>> Looking at the log files (which are set to debug) I can see that, upon the first
crawl of Alfresco, Manifold sends the following requests:
>> 
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1[\r][\n]"
>> 
>> This picks up all of the content e.g. documents. 
>> 
>> Running a second crawl, without any other actions being done, results in the following
requests:
>> 
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1[\r][\n]"
>> 
>> So I can see that, in the first instance, we are targeting content directly while,
in the second, we are asking for changes. The problem is that no changes are returned from
the second set of requests. The response from these calls is:
>> 
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "totalNodes" : "0", [\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "elapsedTime" : "8",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "docs" : [[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  ],[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_txn_id" : "352",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_acl_changeset_id" : "13",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_id" : "SpacesStore",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_protocol" : "workspace"[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
>> 
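
For anyone wanting to inspect such a response outside the log, here is a rough Python sketch that reads the counters from it. The body is reassembled by hand from the wire log above; the opening brace is not in the excerpt and is assumed. The function name summarize_changes is hypothetical and not part of ManifoldCF or the Alfresco Indexer.

```python
import json

# Body reassembled by hand from the wire log above. The opening brace is not
# in the excerpt and is assumed here so that the text parses as JSON.
LOGGED_BODY = """{
  "totalNodes" : "0",
  "elapsedTime" : "8",
  "docs" : [
  ],
  "last_txn_id" : "352",
  "last_acl_changeset_id" : "13",
  "store_id" : "SpacesStore",
  "store_protocol" : "workspace"
}"""


def summarize_changes(body):
    """Return the counters from a changes response as integers."""
    data = json.loads(body)
    return {
        # The logged response carries its numbers as JSON strings.
        "total_nodes": int(data["totalNodes"]),
        "docs_returned": len(data["docs"]),
        "last_txn_id": int(data["last_txn_id"]),
        "last_acl_changeset_id": int(data["last_acl_changeset_id"]),
    }


if __name__ == "__main__":
    print(summarize_changes(LOGGED_BODY))
```

Because the values arrive as strings, they need converting before being compared with the lastTxnId that was sent in the request.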
>> Regardless of what changes I make to my test document, the document is not updated. The totalNodes value in the response to the changes call is always ‘0’.
>> 
>> 
>> 2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up
>> 
>> Within my test Alfresco environment I have one site set up (Finance). Within the Finance doc library I have three test docs. No other changes have been made to the Alfresco instance.
>> Running a crawl with no filter configurations set returns 81 items. This is via the URL in a browser.
>> If I then set the Site Filter configuration to ‘Finance’ and apply, I still get 81 items when I re-run the crawl.
>> I can see that the term ‘Finance’ is being added to the URL but this does not seem to change the behaviour.
>> 
>> 
>> I am happy to spend time diagnosing this if there is anyone available to assist.
>> 
>> Thanks
>> 
>> Paul
>> 
>> 
>> 
>>> On 27 Oct 2015, at 18:14, pfarrell@funnelback.com wrote:
>>> 
>>> Hi all,
>>> 
>>> This is a question regarding the relatively new Alfresco Webscript connector.

>>> 
>>> SETUP
>>> I have a vanilla Alfresco Community 5.0 installation
>>> One site has been created called 'Finance'
>>> A handful of documents have been created in 'Finance' Doc Library.
>>> I have cloned and packaged up the 'alfresco-indexer' (https://github.com/maoo/alfresco-indexer) and have applied the AMP and CLIENT packages to their respective environments.
>>> 
>>> 
>>> ISSUE
>>> The issue is that the default API call used by Manifold is returning nothing. The full API call used by Manifold, based on my config, is:
>>> 
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>> 
>>> 
>>> TESTS
>>> I have identified two streamlined URLs. The first one returns the documents that exist in the doc library of the 'Finance' site. This URL is:
>>> 
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>>> 
>>> The second URL simply adds the site restriction. This URL returns nothing:
>>> 
>>> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
>>> 
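
To confirm that the two test URLs differ only in the site restriction, the indexingFilters parameter can be decoded back into JSON. A small Python sketch, with the host left out and with decode_indexing_filters as a made-up helper name:

```python
import json
from urllib.parse import parse_qs, urlsplit

# Paths and query strings copied from the two test URLs above (host omitted).
UNFILTERED = ("/alfresco/service/node/changes/workspace/SpacesStore"
              "?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D")
SITE_ONLY = ("/alfresco/service/node/changes/workspace/SpacesStore"
             "?lastTxnId=0&lastAclChangesetId=0"
             "&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D")


def decode_indexing_filters(url):
    """Percent-decode the indexingFilters parameter and parse it as JSON."""
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["indexingFilters"][0])


if __name__ == "__main__":
    print(decode_indexing_filters(UNFILTERED))
    print(decode_indexing_filters(SITE_ONLY))
```

The first URL decodes to an empty filter object and the second to a single siteFilters entry, so the site filter is the only difference between the two calls.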
>>> 
>>> 
>>> Can anyone explain why the documents are not returned when only the containing site is named in the API URL?
>>> 
>>> Cheers
>>> 
>>> Paul
>>> 
>>> 
>> 
>> 
> 
> 

