manifoldcf-user mailing list archives

From "Delapasse, Deanna" <ddelapa...@oceaneering.com>
Subject Re: Alfresco WebScript Connector - Testing Question
Date Wed, 28 Oct 2015 12:08:24 GMT
Hi Paul,

I haven't read the entire thread, so I apologize if this is way off base...

When I worked with the CMIS connector I had to modify the logic to append
document.getLastModificationDate().getTimeInMillis() to the versionString
for it to pick up changes.  The Alfresco document version won't update when
you modify metadata.  My memory is terrible, but I believe that even
modifying content may not do it unless you have the proper 'versioning'
aspect applied.
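
For anyone wanting to try the same approach, here is a minimal sketch of the
idea in plain Java. It is not the connector's actual code: the class name, the
buildVersionString helper, and the '+' separator are all hypothetical. The only
assumption about CMIS is that the last-modification date is available as a
GregorianCalendar, as in the call mentioned above.

```java
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical illustration only; not taken from the ManifoldCF CMIS connector.
class VersionStringSketch {

    // Combines the repository version label with the last-modification
    // timestamp (milliseconds since the epoch), so the resulting string
    // changes whenever either value changes.
    static String buildVersionString(String versionLabel, GregorianCalendar lastModified) {
        StringBuilder sb = new StringBuilder();
        if (versionLabel != null) {
            sb.append(versionLabel);
        }
        if (lastModified != null) {
            // '+' is an arbitrary separator chosen for this sketch.
            sb.append('+').append(lastModified.getTimeInMillis());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        GregorianCalendar firstCrawl = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        firstCrawl.setTimeInMillis(1000L);

        // Simulate a metadata-only edit: same version label, later timestamp.
        GregorianCalendar afterEdit = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        afterEdit.setTimeInMillis(2000L);

        String before = buildVersionString("1.0", firstCrawl);
        String after = buildVersionString("1.0", afterEdit);

        System.out.println(before);
        System.out.println(after);
        System.out.println("changed: " + !before.equals(after));
    }
}
```

Because the timestamp moves on a metadata-only edit, the combined string
differs between crawls even when the version label itself stays the same.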

Check inside Alfresco and see if your "version" is actually incrementing as
you expect. I was using an older Alfresco version and was not able to run
with the Alfresco connector, but the CMIS connector worked great for us!

Good luck!
Deanna




On Wed, Oct 28, 2015 at 6:07 AM, Paul Farrell <pfarrell@funnelback.com>
wrote:

> The Alfresco log snippet doesn’t really shed any more light. It simply
> doesn’t think that the document content has changed.
>
> 09:56:42,059 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-5] [getNodesByTransactionId] On Store
> workspace://SpacesStore
> 09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-5] [getLastTransactionID]
> 09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-5] [getNodesByAclChangesetId] On Store
> workspace://SpacesStore
> 09:56:42,070 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-5] [getLastAclChangeSetID]
> 09:56:42,070 DEBUG
> [com.github.maoo.indexer.webscripts.NodeChangesWebScript]
> [http-apr-8080-exec-5] Attaching 0 nodes to the WebScript template
> 09:56:42,079 DEBUG
> [com.github.maoo.indexer.webscripts.NodeChangesWebScript]
> [http-apr-8080-exec-9] Invoking Changes Webscript, using the following
> params
> lastTxnId: 352
> lastAclChangesetId: 13
> storeId: SpacesStore
> storeProtocol: workspace
> indexingFilters:
> {"aspectFilters":[],"metadataFilters":{},"mimetypeFilters":[],"siteFilters":["Finance"],"typeFilters":[]}
>
> 09:56:42,079 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-9] [getNodesByTransactionId] On Store
> workspace://SpacesStore
> 09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-9] [getLastTransactionID]
> 09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-9] [getNodesByAclChangesetId] On Store
> workspace://SpacesStore
> 09:56:42,087 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
> [http-apr-8080-exec-9] [getLastAclChangeSetID]
> 09:56:42,087 DEBUG
> [com.github.maoo.indexer.webscripts.NodeChangesWebScript]
> [http-apr-8080-exec-9] Attaching 0 nodes to the WebScript template
>
> *Paul Farrell*
> Senior Search Consultant
>
> 109-123 Clifton Street, London EC2A 4LD
> *T* +44 (0) 207 183 6865 | funnelback.com <http://www.funnelback.com/>
>
> *UNITED KINGDOM* | AUSTRALIA | NEW ZEALAND | POLAND | UNITED STATES
>
> Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback> -
> Twitter <https://twitter.com/funnelback>
>
> Funnelback UK Ltd is a limited liability company registered in England &
> Wales. Registered address: Zetland House 109-123, Clifton Street, London.
> EC2A 4LD. Company registration number: 07004264.
>
> On 28 Oct 2015, at 10:50, Rafa Haro <rharoapache@gmail.com> wrote:
>
> You’re welcome Paul. Just in case, could you check the Alfresco logs to
> see if there is something informative there?
>
> Cheers,
> Rafa
>
>
>
>
> On Wed, Oct 28, 2015 at 11:47 AM, Paul Farrell <pfarrell@funnelback.com>
> wrote:
>
>> I see. That makes sense.
>>
>> No problem. Thanks for the feedback Rafa. Much appreciated.
>>
>>
>>
>> On 28 Oct 2015, at 10:45, Rafa Haro <rharoapache@gmail.com> wrote:
>>
>> Hi Paul,
>>
>> Before contributing the Alfresco connector, we performed several tests
>> similar to yours using an Alfresco 4.x version. Therefore, initially, my
>> guess is the Webscript is not behaving correctly for Alfresco 5 instances.
>> I’m including Maurizio Pillitu (Alfresco Indexer main developer) in the
>> email thread. He might be able to provide some feedback about this or just confirm
>> my suspicions.
>>
>> Cheers,
>> Rafa
>>
>>
>>
>>
>> On Wed, Oct 28, 2015 at 11:33 AM, Paul Farrell <pfarrell@funnelback.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> In follow up to my recent email (below) I thought I would share my
>>> findings with the ‘Alfresco Indexer’ connector (
>>> https://github.com/maoo/alfresco-indexer) in case someone may be able
>>> to advise on its usage.
>>>
>>> The reason I went to this is due to the lack of change control detection
>>> with either of the packaged Manifold Alfresco connectors (AtomPub or
>>> WebService). I needed a method whereby the crawl runs each night and picks
>>> up any and all changes to the documents from the previous 24 hours. A
>>> common scenario.
>>>
>>> Unfortunately, I have yet to achieve this.
>>>
>>> Having built and installed both the AMP and JAR files needed for the new
>>> connector, changes are still not coming through. In fact, I have two
>>> observations so far:
>>>
>>> 1. Changes to document content or properties do not cause the same
>>> document to be picked up by the Alfresco connector on the next run
>>> 2. Adding ‘Filter Configuration’ seems to do very little to change what
>>> is picked up
>>>
>>> *IN DETAIL*
>>> *1. Failing to pick up modified content*
>>>
>>> Looking at the log files (which are set to debug) I can see that, upon
>>> the first crawl of Alfresco, Manifold sends the following requests:
>>>
>>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request
>>> GET
>>> /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >>
>>> GET
>>> /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >>
>>> "GET
>>> /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>> HTTP/1.1[\r][\n]"
>>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request
>>> GET
>>> /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >>
>>> GET
>>> /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >>
>>> "GET
>>> /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>>> HTTP/1.1[\r][\n]"
>>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request
>>> GET
>>> /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >>
>>> GET
>>> /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >>
>>> "GET
>>> /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
>>> HTTP/1.1[\r][\n]"
>>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request
>>> GET
>>> /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >>
>>> GET
>>> /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >>
>>> "GET
>>> /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
>>> HTTP/1.1[\r][\n]"
>>>
>>> This picks up all of the content e.g. documents.
>>>
>>> Running a second crawl, without any other actions being done, results in
>>> the following requests:
>>>
>>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >>
>>> GET
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>> HTTP/1.1
>>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >>
>>> "GET
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>> HTTP/1.1[\r][\n]"
>>>
>>> So I can see that, in the first instance, we are targeting content
>>> directly while, in the second, we are asking for changes. The problem is
>>> that no changes are returned from the second set of requests. The response
>>> from these calls is:
>>>
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>  "totalNodes" : "0", [\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>  "elapsedTime" : "8",[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>  "docs" : [[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>  ],[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>    "last_txn_id" : "352",[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>    "last_acl_changeset_id" : "13",[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>  "store_id" : "SpacesStore",[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>>>  "store_protocol" : "workspace"[\r][\n]"
>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
>>>
>>> Regardless of what changes I make to a document that I have been using
>>> for testing, the document is not updated. The response from the calls for
>>> changes (totalNodes) is always ‘0’.
>>>
>>>
>>> *2. Adding ‘Filter Configuration’ seems to do very little to change what
>>> is picked up*
>>>
>>> Within my test Alfresco environment I have one site set up (Finance).
>>> Within the Finance doc library I have three test docs. No other changes
>>> have been made to the Alfresco instance.
>>> Running a crawl with no filter configurations set returns 81 items. This
>>> is via the URL in a browser.
>>> If I then set the Site Filter configuration to ‘Finance’ and apply, I
>>> still get 81 items when I re-run the crawl.
>>> I can see that the term ‘Finance’ is being added to the URL but this
>>> does not seem to change the behaviour.
>>>
>>>
>>> I am happy to spend time diagnosing this if there is anyone available to
>>> assist.
>>>
>>> Thanks
>>>
>>> Paul
>>>
>>>
>>>
>>> On 27 Oct 2015, at 18:14, pfarrell@funnelback.com wrote:
>>>
>>> Hi all,
>>>
>>> This is a question regarding the relatively new Alfresco Webscript
>>> connector.
>>>
>>> SETUP
>>> I have a vanilla Alfresco Community 5.0 installation
>>> One site has been created called 'Finance'
>>> A handful of documents have been created in 'Finance' Doc Library.
>>> I have cloned and packaged up the 'alfresco-indexer' (
>>> https://github.com/maoo/alfresco-indexer) and have applied the AMP and
>>> CLIENT packages to their respective environments.
>>>
>>>
>>> ISSUE
>>> The issue is that the default API call used by Manifold is returning
>>> nothing. The full API call used by Manifold, and based on my config, is :
>>>
>>>
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>>
>>>
>>> TESTS
>>> I have identified two streamlined URLs. The first one returns the
>>> documents that exist in the doc library of the 'Finance' site. This URL is:
>>>
>>>
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>>>
>>> The second URL simply adds the site restriction. This URL returns
>>> nothing:
>>>
>>>
>>> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
>>>
>>>
>>>
>>> Can anyone explain why the documents do not return when only the
>>> containing site is named in the API URL?
>>>
>>> Cheers
>>>
>>> Paul
>>>
>>>
>>>
>>>
>>
>>
>
>
