beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (BEAM-3820) SolrIO: Allow changing batchSize for writes
Date Mon, 03 Sep 2018 08:32:00 GMT

     [ https://issues.apache.org/jira/browse/BEAM-3820?focusedWorklogId=140493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-140493 ]

ASF GitHub Bot logged work on BEAM-3820:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Sep/18 08:31
            Start Date: 03/Sep/18 08:31
    Worklog Time Spent: 10m 
      Work Description: iemejia closed pull request #6283: [BEAM-3820] Exposing batchSize for SolrIO Writes
URL: https://github.com/apache/beam/pull/6283
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sdks/java/io/solr/src/main/java/org/apache/beam/sdk/io/solr/SolrIO.java b/sdks/java/io/solr/src/main/java/org/apache/beam/sdk/io/solr/SolrIO.java
index 00139df470b..ddc5dc66b89 100644
--- a/sdks/java/io/solr/src/main/java/org/apache/beam/sdk/io/solr/SolrIO.java
+++ b/sdks/java/io/solr/src/main/java/org/apache/beam/sdk/io/solr/SolrIO.java
@@ -535,14 +535,11 @@ public Write to(String collection) {
     /**
      * Provide a maximum size in number of documents for the batch. Depending on the execution
      * engine, size of bundles may vary, this sets the maximum size. Change this if you need to have
-     * smaller batch.
+     * smaller batch. Default max batch size is 1000.
      *
      * @param batchSize maximum batch size in number of documents
      */
-    @VisibleForTesting
-    Write withMaxBatchSize(int batchSize) {
-      // TODO remove this configuration, we can figure out the best number
-      // by tuning batchSize when pipelines run.
+    public Write withMaxBatchSize(int batchSize) {
       checkArgument(batchSize > 0, "batchSize must be larger than 0, but was: %s", batchSize);
       return builder().setMaxBatchSize(batchSize).build();
     }
diff --git a/sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/SolrIOTest.java b/sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/SolrIOTest.java
index 78732edf775..8ff521d596d 100644
--- a/sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/SolrIOTest.java
+++ b/sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/SolrIOTest.java
@@ -71,6 +71,7 @@
   private static final long NUM_DOCS = 400L;
   private static final int NUM_SCIENTISTS = 10;
   private static final int BATCH_SIZE = 200;
+  private static final int DEFAULT_BATCH_SIZE = 1000;
 
   private static AuthorizedSolrClient<CloudSolrClient> solrClient;
   private static SolrIO.ConnectionConfiguration connectionConfiguration;
@@ -317,4 +318,16 @@ public void testDefaultRetryPredicate() {
         DEFAULT_RETRY_PREDICATE.test(
             new SolrException(SolrException.ErrorCode.UNSUPPORTED_MEDIA_TYPE, "test")));
   }
+
+  /** Tests batch size default and changed value. */
+  @Test
+  public void testBatchSize() {
+    SolrIO.Write write1 =
+        SolrIO.write()
+            .withConnectionConfiguration(connectionConfiguration)
+            .withMaxBatchSize(BATCH_SIZE);
+    assertTrue(write1.getMaxBatchSize() == BATCH_SIZE);
+    SolrIO.Write write2 = SolrIO.write().withConnectionConfiguration(connectionConfiguration);
+    assertTrue(write2.getMaxBatchSize() == DEFAULT_BATCH_SIZE);
+  }
 }
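The change makes `withMaxBatchSize` public and documents a default of 1000, so callers can chain it exactly as the new test does: `SolrIO.write().withConnectionConfiguration(connectionConfiguration).withMaxBatchSize(BATCH_SIZE)`. The plain-Java sketch below illustrates the immutable "with" pattern and the positive-size precondition that the diff and `testBatchSize` exercise. It is a hypothetical stand-in, not Beam code: `WriteConfig`, `create()` and `maxBatchSize()` are illustrative names, and the real `SolrIO.Write` is rebuilt through `builder().setMaxBatchSize(batchSize).build()` rather than a constructor.

```java
// Hypothetical stand-in for SolrIO.Write's batch-size setting; not Beam code.
class WriteConfigDemo {
  public static void main(String[] args) {
    WriteConfig defaults = WriteConfig.create();
    WriteConfig tuned = defaults.withMaxBatchSize(200);
    System.out.println(defaults.maxBatchSize()); // 1000
    System.out.println(tuned.maxBatchSize()); // 200
  }
}

final class WriteConfig {
  // Mirrors the default documented in the updated Javadoc.
  static final int DEFAULT_MAX_BATCH_SIZE = 1000;

  private final int maxBatchSize;

  private WriteConfig(int maxBatchSize) {
    this.maxBatchSize = maxBatchSize;
  }

  /** Starts from the default maximum batch size. */
  static WriteConfig create() {
    return new WriteConfig(DEFAULT_MAX_BATCH_SIZE);
  }

  /** Returns a new instance with the given cap; the receiver is left unchanged. */
  WriteConfig withMaxBatchSize(int batchSize) {
    // Same precondition and message as the checkArgument call in the diff.
    if (batchSize <= 0) {
      throw new IllegalArgumentException(
          "batchSize must be larger than 0, but was: " + batchSize);
    }
    return new WriteConfig(batchSize);
  }

  int maxBatchSize() {
    return maxBatchSize;
  }
}
```

Returning a fresh object from each setter keeps a configured transform safe to reuse: tuning one copy cannot silently change another that shares the same base.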


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 140493)
    Time Spent: 40m  (was: 0.5h)

> SolrIO: Allow changing batchSize for writes
> -------------------------------------------
>
>                 Key: BEAM-3820
>                 URL: https://issues.apache.org/jira/browse/BEAM-3820
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-solr
>            Reporter: Tim Robertson
>            Assignee: Ravi Pathak
>            Priority: Trivial
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The SolrIO hard-codes the batchSize for writes at 1000. It would be a good addition
> to allow the user to set the batchSize explicitly (similar to ElasticsearchIO).
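To make the request concrete: a sink with a batch-size cap typically buffers incoming documents, sends a request whenever the buffer reaches the cap, and flushes the remainder when the bundle finishes. The sketch below is a generic, hypothetical illustration of that buffering; `BatchingWriter` and its flush callback are assumptions, not SolrIO or ElasticsearchIO internals. As with the `checkArgument` in the patch, non-positive sizes are rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Demo class comes first so `java BatchingDemo.java` finds main().
class BatchingDemo {
  public static void main(String[] args) {
    List<Integer> batchSizes = new ArrayList<>();
    // Record the size of each batch instead of sending it to a real server.
    BatchingWriter<String> writer =
        new BatchingWriter<>(200, batch -> batchSizes.add(batch.size()));
    for (int i = 0; i < 450; i++) {
      writer.add("doc-" + i);
    }
    writer.flush(); // end of bundle: emit the partial batch
    System.out.println(batchSizes); // [200, 200, 50]
  }
}

/** Hypothetical buffer that hands off documents in batches of at most maxBatchSize. */
final class BatchingWriter<T> {
  private final int maxBatchSize;
  private final Consumer<List<T>> flusher;
  private final List<T> buffer = new ArrayList<>();

  BatchingWriter(int maxBatchSize, Consumer<List<T>> flusher) {
    if (maxBatchSize <= 0) {
      throw new IllegalArgumentException(
          "batchSize must be larger than 0, but was: " + maxBatchSize);
    }
    this.maxBatchSize = maxBatchSize;
    this.flusher = flusher;
  }

  /** Adds one document and flushes as soon as the buffer is full. */
  void add(T doc) {
    buffer.add(doc);
    if (buffer.size() >= maxBatchSize) {
      flush();
    }
  }

  /** Sends any buffered documents as one batch; a no-op when empty. */
  void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    flusher.accept(new ArrayList<>(buffer)); // pass a copy, then reuse the buffer
    buffer.clear();
  }
}
```

With a cap of 200 and 450 documents, the writer emits batches of 200, 200 and 50; raising the cap trades fewer requests for larger ones.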



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
