jackrabbit-oak-issues mailing list archives

From "Amit Jain (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-6611) [upgrade][oak-blob-cloud] Many S3DataStore errors during migration with oak-upgrade
Date Wed, 06 Sep 2017 06:06:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154852#comment-16154852 ]

Amit Jain commented on OAK-6611:
--------------------------------

BTW the errors do indicate that there are still uploads in progress, so the stats should have
reflected that (there is a lag between the purge, controlled by {{stagingPurgeInterval}}, and
the subsequent update of the stats). The problem is that a meaningful stats provider has to be
initialized to get accurate stats, as proposed below.

{code}
Index: oak-upgrade/src/main/java/org/apache/jackrabbit/oak/upgrade/cli/blob/S3DataStoreFactory.java
===================================================================
--- oak-upgrade/src/main/java/org/apache/jackrabbit/oak/upgrade/cli/blob/S3DataStoreFactory.java
(revision 59dfb84da88157ed59c7c5fb670d9956257b0d18)
+++ oak-upgrade/src/main/java/org/apache/jackrabbit/oak/upgrade/cli/blob/S3DataStoreFactory.java
(revision 93fb425bc7f30f2299b6b24e76c1addd192a79ad)
@@ -22,6 +22,7 @@
 import java.io.IOException;
 import java.util.HashSet;
 import java.util.Properties;
+import java.util.concurrent.Executors;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
@@ -35,6 +36,8 @@
 
 import com.google.common.io.Closer;
 import com.google.common.io.Files;
+import org.apache.jackrabbit.oak.stats.DefaultStatisticsProvider;
+import org.apache.jackrabbit.oak.stats.StatisticsProvider;
 
 public class S3DataStoreFactory implements BlobStoreFactory {
 
@@ -72,6 +75,13 @@
         S3DataStore delegate = new S3DataStore();
         delegate.setProperties(props);
         delegate.setPath(directory);
+
+        // Initialize a default stats provider
+        StatisticsProvider statsProvider = new DefaultStatisticsProvider(Executors.newSingleThreadScheduledExecutor());
+        delegate.setStatisticsProvider(statsProvider);
+        // Reduce staging purge interval to 60 seconds
+        delegate.setStagingPurgeInterval(60);
+
         try {
             delegate.init(tempHomeDir.getPath());
         } catch (RepositoryException e) {

{code}
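
As a side note, and only as a hedged sketch that is not part of the patch above: the single-thread
scheduled executor backing the {{DefaultStatisticsProvider}} should also be shut down when the factory
releases the store, otherwise its thread may linger after the migration. Assuming the factory keeps a
Guava {{Closer}} for cleanup (the file already imports {{com.google.common.io.Closer}} in the context
above) and that {{java.io.Closeable}} and {{java.util.concurrent.ScheduledExecutorService}} imports are
added, the registration could look roughly like this:

{code}
// Hedged sketch only -- the "closer" field and the Closeable registration are
// assumptions for illustration, not taken verbatim from S3DataStoreFactory.
final ScheduledExecutorService statsExecutor = Executors.newSingleThreadScheduledExecutor();
StatisticsProvider statsProvider = new DefaultStatisticsProvider(statsExecutor);
delegate.setStatisticsProvider(statsProvider);
// Reduce the staging purge interval to 60 seconds, as in the patch above
delegate.setStagingPurgeInterval(60);

closer.register(new Closeable() {
    @Override
    public void close() throws IOException {
        // stop the stats reporting thread together with the data store
        statsExecutor.shutdownNow();
    }
});
{code}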

What I found confusing is that the log entries for closing the caches are missing; they should
have appeared around the time the TarMK was closed.

{noformat}
05.09.2017 12:04:24.028 INFO   o.a.j.o.s.f.FileStore: TarMK closed: /data/cq/crx-quickstart/repository-segment-tar-20170905-120210/segmentstore
05.09.2017 12:04:24.827 INFO   o.a.j.o.p.s.f.FileStore: TarMK closed: /data/cq/crx-quickstart/repository/segmentstore
05.09.2017 12:04:25.107 INFO   o.a.j.o.p.b.UploadStagingCache: Successfully added [8632286ebab5d4a9ebda0a2e0abab4c55ca4825c2f0358bb05f5926dda367f99],
[/data/cq/crx-quickstart/repository/repository/datastore/upload/86/32/28/8632286ebab5d4a9ebda0a2e0abab4c55ca4825c2f0358bb05f5926dda367f99]
05.09.2017 12:04:25.107 INFO   o.a.j.o.p.b.UploadStagingCache: Successfully added [c685419afb510a9fa010f50a808a18b72e2bc0701ce522ea783e8a1e2a292e9d],
[/data/cq/crx-quickstart/repository/repository/datastore/upload/c6/85/41/c685419afb510a9fa010f50a808a18b72e2bc0701ce522ea783e8a1e2a292e9d]
05.09.2017 12:04:25.209 INFO   o.a.j.o.p.b.UploadStagingCache: Successfully added [8bfa063c184e18178393452f9e4760a53a482ce3f7501f0cb85fcd47e3cab0c1],
[/data/cq/crx-quickstart/repository/repository/datastore/upload/8b/fa/06/8bfa063c184e18178393452f9e4760a53a482ce3f7501f0cb85fcd47e3cab0c1]
05.09.2017 12:04:25.209 INFO   o.a.j.o.p.b.UploadStagingCache: Successfully added [45f573b27fea6b0b54be49391ea32e62c1ef1d31f66257772a75473d30692e77],
[/data/cq/crx-quickstart/repository/repository/datastore/upload/45/f5/73/45f573b27fea6b0b54be49391ea32e62c1ef1d31f66257772a75473d30692e77]
05.09.2017 12:04:25.213 INFO   o.a.j.o.p.b.UploadStagingCache: Successfully added [e8df0bbc0286e2e4673c5ae964d03ba4fcf5869eec61b1f2b510aa0c83ed0cb0],
[/data/cq/crx-quickstart/repository/repository/datastore/upload/e8/df/0b/e8df0bbc0286e2e4673c5ae964d03ba4fcf5869eec61b1f2b510aa0c83ed0cb0]
05.09.2017 12:04:25.311 INFO   o.a.j.o.p.b.UploadStagingCache: Successfully added [febf1aaa839a16b43d93ba4424dc88c93c1c54755cf1c5b4368207f936d29275],
[/data/cq/crx-quickstart/repository/repository/datastore/upload/fe/bf/1a/febf1aaa839a16b43d93ba4424dc88c93c1c54755cf1c5b4368207f936d29275]
05.09.2017 12:04:25.311 INFO   o.a.j.o.p.b.UploadStagingCache: Successfully added [06498358dde411634fe5ef9f31c66bce03fe6adef3f3aded2513419b2a76a74f],
[/data/cq/crx-quickstart/repository/repository/datastore/upload/06/49/83/06498358dde411634fe5ef9f31c66bce03fe6adef3f3aded2513419b2a76a74f]
05.09.2017 12:04:25.316 INFO   o.a.j.o.p.b.UploadStagingCache: Successfully added [769f241c95dee87eee65ff846c17b14bbadc8febac293a3953fd1141d8af3385],
[/data/cq/crx-quickstart/repository/repository/datastore/upload/76/9f/24/769f241c95dee87eee65ff846c17b14bbadc8febac293a3953fd1141d8af3385]
{noformat}
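
To double-check whether the staged uploads have actually drained before the migration process exits,
one rough way (a hedged, standalone sketch using plain file-system checks, not an Oak API) is to poll
the local staging directory that the log entries above point at:

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public class StagingDrainCheck {

    // Local upload staging cache; path taken from the log entries above,
    // adjust it for the repository being migrated.
    private static final Path UPLOAD_DIR =
            Paths.get("/data/cq/crx-quickstart/repository/repository/datastore/upload");

    public static void main(String[] args) throws Exception {
        long deadline = System.nanoTime() + TimeUnit.MINUTES.toNanos(30);
        while (System.nanoTime() < deadline) {
            long pending = countStagedFiles();
            if (pending == 0) {
                System.out.println("Staging cache is empty, safe to shut down.");
                return;
            }
            System.out.println(pending + " file(s) still staged for upload, waiting...");
            TimeUnit.SECONDS.sleep(30);
        }
        System.out.println("Timed out; staged files are still pending upload.");
    }

    private static long countStagedFiles() throws IOException {
        if (!Files.exists(UPLOAD_DIR)) {
            return 0;
        }
        try (Stream<Path> files = Files.walk(UPLOAD_DIR)) {
            return files.filter(Files::isRegularFile).count();
        }
    }
}
{code}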

> [upgrade][oak-blob-cloud] Many S3DataStore errors during migration with oak-upgrade
> -----------------------------------------------------------------------------------
>
>                 Key: OAK-6611
>                 URL: https://issues.apache.org/jira/browse/OAK-6611
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob-cloud, upgrade
>    Affects Versions: 1.8, 1.7.7
>            Reporter: Arek Kita
>            Assignee: Amit Jain
>            Priority: Critical
>             Fix For: 1.8, 1.7.7
>
>         Attachments: oak-upgrade-oak-blob-cloud-20170905.log.gz, oak-upgrade-with-oak-blob-cloud.fragment.log
>
>
> [~tomek.rekawek], [~amitjain] Due to the async nature of the S3 datastore format/upload process,
the migration finishes much sooner than the S3 datastore upload does. This leads to a huge number
of exceptions, because the *oak-upgrade* migration process is *not synchronised* with the async
S3 datastore background processes.
> I see a few possible solutions for that:
> * disable migration/uploading of the S3 cache for the duration of the migration (a bad idea IMHO)
> * wait for it to finish (this might be desirable, or a bad idea, since in some cases it could take
longer than the migration itself)
> * pause it once the migration has completed cleanly (so some binaries are not uploaded and moved
to the new datastore format) -- not sure whether such a mixed state is acceptable at all
> WDYT? 
> Please also note that this happens only when the {{\-\-src-s3config \-\-src-s3datastore}} options
are specified during migration, which is the case in many scenarios (the same would apply to the
destination DataStore options).
> Referencing a source datastore is needed (even if {{\-\-copy-binaries}} is not included), for
example to copy checkpoints properly.
> An example exception is shown below:
> {code}
> 01.09.2017 11:39:41.088 ERROR  o.a.j.o.p.b.UploadStagingCache: Error adding file to backend
> java.lang.IllegalStateException: Connection pool shut down
> 	at org.apache.http.util.Asserts.check(Asserts.java:34)
> 	at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:184)
> 	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:251)
> 	at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
> 	at com.amazonaws.http.conn.$Proxy3.requestConnection(Unknown Source)
> 	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:175)
> 	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
> 	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> 	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
> 	at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:880)
> 	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:723)
> 	at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:475)
> 	at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:437)
> 	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:386)
> 	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3996)
> 	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1161)
> 	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1136)
> 	at org.apache.jackrabbit.oak.blob.cloud.s3.S3Backend.write(S3Backend.java:201)
> 	at org.apache.jackrabbit.oak.plugins.blob.AbstractSharedCachingDataStore$2.write(AbstractSharedCachingDataStore.java:170)
> 	at org.apache.jackrabbit.oak.plugins.blob.UploadStagingCache$4.call(UploadStagingCache.java:341)
> 	at org.apache.jackrabbit.oak.plugins.blob.UploadStagingCache$4.call(UploadStagingCache.java:336)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> 	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
