hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [hadoop] hadoop-yetus commented on a change in pull request #879: HADOOP-15563 Full S3Guard Support for on-demand DDB tables
Date Fri, 31 May 2019 19:46:00 GMT
hadoop-yetus commented on a change in pull request #879: HADOOP-15563 Full S3Guard Support
for on-demand DDB tables
URL: https://github.com/apache/hadoop/pull/879#discussion_r289528151
 
 

 ##########
 File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
 ##########
 @@ -315,63 +322,90 @@ Next, you can choose whether or not the table will be automatically
created
 </property>
 ```
 
-### 7. If creating a table: Set your DynamoDB I/O Capacity
+### 7. If creating a table: Choose your billing mode (and perhaps I/O Capacity)
+
+Next, you need to decide whether to use On-Demand DynamoDB and its
+pay-per-request billing (recommended), or to explicitly request a
+provisioned IO capacity.
 
-Next, you need to set the DynamoDB read and write throughput requirements you
-expect to need for your cluster.  Setting higher values will cost you more
-money.  *Note* that these settings only affect table creation when
+Before AWS offered pay-per-request billing, the sole billing mechanism,
+was "provisioned capacity". This mechanism requires you to choose 
+the DynamoDB read and write throughput requirements you
+expect to need for your expected uses of the S3Guard table.
+Setting higher values cost you more money -*even when the table was idle*
+  *Note* that these settings only affect table creation when
 `fs.s3a.s3guard.ddb.table.create` is enabled.  To change the throughput for
 an existing table, use the AWS console or CLI tool.
 
 For more details on DynamoDB capacity units, see the AWS page on [Capacity
 Unit Calculations](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithTables.html#CapacityUnitCalculations).
 
-The charges are incurred per hour for the life of the table, *even when the
+Provisioned IO capacity is billed per hour for the life of the table, *even when the
 table and the underlying S3 buckets are not being used*.
 
 There are also charges incurred for data storage and for data I/O outside of the
 region of the DynamoDB instance. S3Guard only stores metadata in DynamoDB: path names
 and summary details of objects —the actual data is stored in S3, so billed at S3
 rates.
 
+With provisioned I/O capacity, attempting to perform more I/O than the capacity
+requested throttles the operation and may result in operations failing.
+Larger I/O capacities cost more.
+
+With the introduction of On-Demand DynamoDB, you can now avoid paying for
+provisioned capacity by creating an on-demand table.
+With an on-demand table you are not throttled if your DynamoDB requests exceed
+any pre-provisioned limit, nor do you pay per hour even when a table is idle.
+
+You do, however, pay more per DynamoDB operation.
+Even so, the ability to cope with sudden bursts of read or write requests, combined
+with the elimination of charges for idle tables, suit the use patterns made of 
+S3Guard tables by applications interacting with S3. That is: periods when the table
+is rarely used, with intermittent high-load operations when directory trees
+are scanned (query planning and similar), or updated (rename and delete operations).
+
+
+We recommending using On-Demand DynamoDB for maximum performance in operations
+such as query planning, and lowest cost when S3 buckets are not being accessed.
+
+This is the default, as configured in the default configuration options.
+
 ```xml
 <property>
   <name>fs.s3a.s3guard.ddb.table.capacity.read</name>
-  <value>500</value>
+  <value>0</value>
   <description>
     Provisioned throughput requirements for read operations in terms of capacity
-    units for the DynamoDB table.  This config value will only be used when
-    creating a new DynamoDB table, though later you can manually provision by
-    increasing or decreasing read capacity as needed for existing tables.
-    See DynamoDB documents for more information.
+    units for the DynamoDB table. This config value will only be used when
+    creating a new DynamoDB table.
+    If set to 0 (the default), new tables are created with "per-request" capacity.
+    If a positive integer is provided for this and the write capacity, then
+    a table with "provisioned capacity" will be created.
+    You can change the capacity of an existing provisioned-capacity table
+    through the "s3guard set-capacity" command.
   </description>
 </property>
 
 <property>
   <name>fs.s3a.s3guard.ddb.table.capacity.write</name>
-  <value>100</value>
+  <value>0</value>
   <description>
     Provisioned throughput requirements for write operations in terms of
-    capacity units for the DynamoDB table.  Refer to related config
-    fs.s3a.s3guard.ddb.table.capacity.read before usage.
+    capacity units for the DynamoDB table.
 
 Review comment:
   whitespace:end of line
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message