hadoop-common-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12403) Enable multiple writes in flight for HBase WAL writing backed by WASB
Date Tue, 03 Jan 2017 10:59:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794782#comment-15794782 ]

Hadoop QA commented on HADOOP-12403:

| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} | {color:red} HADOOP-12403 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-12403 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12755309/HADOOP-12403.03.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11342/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |

This message was automatically generated.

> Enable multiple writes in flight for HBase WAL writing backed by WASB
> ---------------------------------------------------------------------
>                 Key: HADOOP-12403
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12403
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>            Reporter: Duo Xu
>            Assignee: Duo Xu
>         Attachments: HADOOP-12403.01.patch, HADOOP-12403.02.patch, HADOOP-12403.03.patch
> Azure HDI HBase clusters use Azure blob storage as their file system. We found that the bottleneck is writing to the write-ahead log (WAL). The latest HBase WAL write model (HBASE-8755) uses multiple AsyncSyncer threads to sync data to HDFS. However, our WASB driver is still based on a single-threaded model, so when the sync threads call into the WASB layer, only one thread at a time is allowed to send data to Azure Storage. This JIRA introduces a new write model in the WASB layer that allows multiple writes in parallel.
> 1. Since we use page blobs for the WAL, this will cause "holes" in the page blob, as every write starts on a new page. We use the first two bytes of every page to record the actual data size of that page.
> 2. When reading the WAL, we need to know its actual size. This should be the sum of the numbers stored in the first two bytes of every page. However, looping over every page to get the size would be very slow, considering that a normal WAL is 128 MB and each page is 512 bytes. So during writing, every time a write succeeds, a blob metadata entry called "total_data_uploaded" is updated.
> 3. Although we allow multiple writes in flight, we need to make sure that the sync threads which call into the WASB layer return in order. Reading the HBase source code (FSHLog.java), we find that every sync request is associated with a transaction id. If the sync succeeds, all transactions prior to this transaction id are assumed to be in Azure Storage. We use a queue to store the sync requests and make sure they return to the HBase layer in order.
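
The three points in the description can be illustrated with a small sketch. This is not the code from the attached patches; it is written in Python for brevity, and the big-endian header encoding as well as the names `encode_write`, `decode_blob`, `scan_size`, and `OrderedSyncs` are assumptions made for illustration only. The page size (512 bytes) and the two-byte length prefix come from the description.

```python
import struct
from collections import deque

PAGE_SIZE = 512                       # page blob page size, per the description
HEADER_SIZE = 2                       # first two bytes hold the payload length
MAX_PAYLOAD = PAGE_SIZE - HEADER_SIZE


def encode_write(data: bytes) -> bytes:
    """Pack one write into whole pages so the next write starts on a new page."""
    pages = []
    for start in range(0, len(data), MAX_PAYLOAD):
        chunk = data[start:start + MAX_PAYLOAD]
        header = struct.pack(">H", len(chunk))   # byte order is an assumption
        # Zero padding after the payload is the "hole" mentioned in point 1.
        pages.append((header + chunk).ljust(PAGE_SIZE, b"\x00"))
    return b"".join(pages)


def decode_blob(blob: bytes) -> bytes:
    """Recover the payload by reading each page's length prefix."""
    out = bytearray()
    for start in range(0, len(blob), PAGE_SIZE):
        page = blob[start:start + PAGE_SIZE]
        (length,) = struct.unpack(">H", page[:HEADER_SIZE])
        out += page[HEADER_SIZE:HEADER_SIZE + length]
    return bytes(out)


def scan_size(blob: bytes) -> int:
    """Slow path from point 2: sum the length prefix of every page."""
    return sum(
        struct.unpack(">H", blob[i:i + HEADER_SIZE])[0]
        for i in range(0, len(blob), PAGE_SIZE)
    )


class OrderedSyncs:
    """Point 3: acknowledge sync requests in submission (transaction id) order."""

    def __init__(self):
        self._pending = deque()   # transaction ids in the order they were queued
        self._done = set()        # ids whose upload finished but is not yet released

    def submit(self, txid: int) -> None:
        self._pending.append(txid)

    def complete(self, txid: int) -> list:
        """Mark txid as uploaded; return the ids that may now return to the caller."""
        self._done.add(txid)
        released = []
        while self._pending and self._pending[0] in self._done:
            head = self._pending.popleft()
            self._done.remove(head)
            released.append(head)
        return released
```

With these definitions, a 600-byte write followed by a 10-byte write occupies three pages, and `scan_size` reports 610 bytes of real data. A 128 MB blob has 128 * 1024 * 1024 / 512 = 262,144 pages, which is why the description keeps a running total in blob metadata instead of scanning. In `OrderedSyncs`, completing transaction 2 before transaction 1 releases nothing until transaction 1 also completes, at which point both are released in order.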

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
