jackrabbit-oak-issues mailing list archives

From Tomek Rękawek (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore
Date Fri, 18 Dec 2015 09:40:46 GMT

    [ https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063759#comment-15063759
] 

Tomek Rękawek commented on OAK-3637:
------------------------------------

[~reschke], thanks for the detailed review!

{quote}
RDBDocumentStore#createOrUpdate()
JRE: the for loop is hard to understand (where do the constants and the conditions come from?):
maybe this should be a while loop
{quote}
Refactored to while loop and added comments.

{quote}
RDBDocumentStoreJDBC#MAX_LIST_LENGTH
JRE: that sounds a bit like something that needs to be fixed separately anyway (it would hit
us already if somebody set CHUNKSIZE to a large value, right?); potentially in RDBBlobStore
as well (we should address this as a separate problem to be fixed in 1.2 and 1.0 right away)
{quote}
Created OAK-3807.

{quote}
RDBDocumentStoreJDBC#update
JRE: are we sure that the sequence of statements implemented here will actually do the right
thing when done concurrently with the transaction isolation that we have?
{quote}
There are two stages: update and insert. In the first one we include the modcount as a condition,
so as long as single updates are done atomically, we won't modify anything we don't expect.
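
A minimal sketch of the modcount condition, assuming an in-memory map in place of the database table. This is not the code from the patch: the class {{ModCountUpdateSketch}}, the method {{updateIfUnchanged}} and the SQL in the comment (table {{t}}, its column names) are all made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: an in-memory stand-in for a table with id, modcount and data columns.
final class ModCountUpdateSketch {

    // Immutable row; a changed row is always a new instance.
    static final class Row {
        final long modCount;
        final String data;

        Row(long modCount, String data) {
            this.modCount = modCount;
            this.data = data;
        }
    }

    private final Map<String, Row> table = new ConcurrentHashMap<>();

    void insert(String id, String data) {
        table.put(id, new Row(1, data));
    }

    // Plays the role of a statement like (hypothetical names):
    //   UPDATE t SET data = ?, modcount = modcount + 1 WHERE id = ? AND modcount = ?
    // Returns true only if the row still carried the expected modcount.
    boolean updateIfUnchanged(String id, long expectedModCount, String newData) {
        Row current = table.get(id);
        if (current == null || current.modCount != expectedModCount) {
            return false; // missing row or concurrent modification: touch nothing
        }
        // Atomic compare-and-replace: fails if another writer swapped the row meanwhile.
        return table.replace(id, current, new Row(expectedModCount + 1, newData));
    }

    public static void main(String[] args) {
        ModCountUpdateSketch store = new ModCountUpdateSketch();
        store.insert("doc1", "v1");
        System.out.println(store.updateIfUnchanged("doc1", 1, "v2")); // true
        System.out.println(store.updateIfUnchanged("doc1", 1, "v3")); // false, modcount is now 2
    }
}
```

The second call fails because it still expects modcount 1, which is the situation of a writer working from a stale copy of the document.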

The insert stage is more complex: (1) first we look for documents that have already been created
(so we won't try to insert them) and then (2) we call the proper insert. There's a chance
that a document is inserted between (1) and (2). In this case the operation will throw an
exception and, according to the createOrUpdate javadoc, the caller has to find out which
documents have been updated. It's the same race condition as in {{RDBDocumentStore#internalCreateOrUpdate}}.
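
The window between (1) and (2) can be reproduced deterministically with a small in-memory model. The sketch below is an assumption-laden illustration, not the RDBDocumentStoreJDBC code: {{InsertRaceSketch}}, {{insertMissing}} and the {{betweenSteps}} hook (which stands in for a concurrent writer) are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InsertRaceSketch {

    private final Map<String, String> table = new HashMap<>();

    // Plays the role of an INSERT that fails on a primary key violation.
    void insert(String id, String data) {
        if (table.putIfAbsent(id, data) != null) {
            throw new IllegalStateException("duplicate key: " + id);
        }
    }

    // (1) look up which ids already exist, (2) insert the others.
    // betweenSteps is a hypothetical hook standing in for a concurrent writer.
    List<String> insertMissing(Map<String, String> docs, Runnable betweenSteps) {
        List<String> toInsert = new ArrayList<>();
        for (String id : docs.keySet()) {
            if (!table.containsKey(id)) {
                toInsert.add(id);
            }
        }
        betweenSteps.run();
        for (String id : toInsert) {
            insert(id, docs.get(id));
        }
        return toInsert;
    }

    public static void main(String[] args) {
        InsertRaceSketch store = new InsertRaceSketch();
        Map<String, String> docs = new HashMap<>();
        docs.put("a", "1");
        try {
            // Another writer creates "a" after the existence check.
            store.insertMissing(docs, () -> store.insert("a", "other"));
        } catch (IllegalStateException e) {
            System.out.println("insert failed: " + e.getMessage());
        }
    }
}
```

With a no-op hook the call simply inserts the missing documents; with the hook above, the insert in step (2) fails and the caller is left to work out what was applied.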

{quote}
RDBDocumentStoreJDBC#update
JRE: ok, so we drop the documents which have been modified from the update? why is this ok?
what's the contract here?
{quote}

At this point we have already tried to update the documents. Now it's time for the 2nd stage -
insert. We have a list of remaining documents. These may be documents which haven't been
updated because of conflicts, or completely new documents which should be added.
We're interested only in the latter, which is why we remove documents with existing IDs from
{{remainingDocuments}}. Please note that these documents won't be added to the {{successfulUpdates}}
list, which contains all updates that have been applied.

So, the contract is that we try to update and insert as many documents as we can. The successful
updates are returned in a list and the failed ones are not - it's the caller's responsibility to take
care of the failed ones. I've added a javadoc stating this.
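
To make the contract concrete, here is a rough single-threaded sketch of the two stages, again against an in-memory map. It is an illustration under assumptions rather than the patch itself: {{BulkCreateOrUpdateSketch}}, {{Op}}, {{seed}} and {{dataOf}} are hypothetical, and treating applied inserts as part of the returned list is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BulkCreateOrUpdateSketch {

    // Hypothetical operation: target id, the modcount the caller last saw, and new data.
    static final class Op {
        final String id;
        final long expectedModCount;
        final String data;

        Op(String id, long expectedModCount, String data) {
            this.id = id;
            this.expectedModCount = expectedModCount;
            this.data = data;
        }
    }

    private final Map<String, Long> modCounts = new HashMap<>();
    private final Map<String, String> data = new HashMap<>();

    // Test helper: puts a row into the "table" with a given modcount.
    void seed(String id, long modCount, String value) {
        modCounts.put(id, modCount);
        data.put(id, value);
    }

    String dataOf(String id) {
        return data.get(id);
    }

    // Returns the ids of the operations that were applied; failed ones are simply absent.
    List<String> createOrUpdate(List<Op> ops) {
        List<String> successfulUpdates = new ArrayList<>();
        List<Op> remainingDocuments = new ArrayList<>();

        // Stage 1: update, conditioned on the modcount.
        for (Op op : ops) {
            Long current = modCounts.get(op.id);
            if (current != null && current == op.expectedModCount) {
                modCounts.put(op.id, current + 1);
                data.put(op.id, op.data);
                successfulUpdates.add(op.id);
            } else {
                remainingDocuments.add(op);
            }
        }

        // Stage 2: existing ids are conflicts and are dropped; the rest are new documents.
        remainingDocuments.removeIf(op -> modCounts.containsKey(op.id));
        for (Op op : remainingDocuments) {
            modCounts.put(op.id, 1L);
            data.put(op.id, op.data);
            successfulUpdates.add(op.id);
        }
        return successfulUpdates;
    }
}
```

A caller would compare the returned ids with the ids it submitted and retry the missing ones individually, for example after re-reading the conflicting documents.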

Regarding the other comments - I fixed all of them as suggested.

> Bulk document updates in RDBDocumentStore
> -----------------------------------------
>
>                 Key: OAK-3637
>                 URL: https://issues.apache.org/jira/browse/OAK-3637
>             Project: Jackrabbit Oak
>          Issue Type: Technical task
>          Components: rdbmk
>            Reporter: Tomek Rękawek
>             Fix For: 1.4
>
>         Attachments: OAK-3637.comments.txt, OAK-3637.patch
>
>
> Implement the [batch createOrUpdate|OAK-3662] in the RDBDocumentStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
