hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10809) hadoop-azure: page blob support
Date Wed, 08 Oct 2014 18:15:35 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163898#comment-14163898 ]

Chris Nauroth commented on HADOOP-10809:
----------------------------------------

Hi, [~ehans].  Nice work!  I have just a few comments remaining.

# findbugs-exclude.xml: There is a minor typo in a comment: "interate" instead of "iterate".
# {{Wasb}}: The new imports in this class appear to be unneeded.
# In the {{FileSystem}} service provider configuration file, I recommend adding this line,
so that we get automatic loading of the classes for the "wasbs" scheme too:
{code}
org.apache.hadoop.fs.azure.NativeAzureFileSystem$Secure
{code}
# Going along with the above, I recommend removing the {{fs.wasb.impl}} and {{fs.wasbs.impl}}
configuration properties from azure-test.xml.  Now that we have the service provider configuration
file, these are no longer necessary for the tests to be able to load the classes, and removing
them gives us a good test that the service provider configuration file is working as expected.
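
As a side note on the entry suggested above: the {{$}} is the JVM binary-name separator for a nested class, which is the form that {{java.util.ServiceLoader}} provider files expect. Here is a minimal sketch of that naming rule using only JDK classes. {{java.util.Map$Entry}} stands in for {{NativeAzureFileSystem$Secure}}, and {{BinaryNameSketch}} and {{isLoadable}} are hypothetical names for illustration only.

```java
import java.util.Map;

class BinaryNameSketch {
    /** Returns true if a class can be loaded by the given name. */
    static boolean isLoadable(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Binary name of a nested class: outer and inner joined by '$'.
        System.out.println(isLoadable("java.util.Map$Entry"));
        // The dotted source-level form is not a loadable class name.
        System.out.println(isLoadable("java.util.Map.Entry"));
        // getName() reports the binary form, getCanonicalName() the dotted form.
        System.out.println(Map.Entry.class.getName());
        System.out.println(Map.Entry.class.getCanonicalName());
    }
}
```

The same rule is why the provider file would list {{NativeAzureFileSystem$Secure}} rather than {{NativeAzureFileSystem.Secure}}.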

Additionally, there are a few items from earlier rounds of feedback that still need to be
addressed:

# {{NativeAzureFileSystem#seek}}: Please wrap the debug log statement in a check for {{if
(LOG.isDebugEnabled())}}.
# {{PageBlobOutputStream}}: In the constructor, let's remove the commented-out assignment
of {{this.ioQueue}}.
# {{PageBlobOutputStream.WriteRequest#runInternal}}: There is just one line left in here that
goes over 80 characters.  Can this be wrapped please?
# {{PageBlobOutputStream#killIoThreads}}: This method has a comment saying that it's only
intended for unit tests. Would you please add the {{VisibleForTesting}} annotation to this
method?
# {{TestNativeAzureFileSystemOperationsMocked}}: There are some TODO comments about things
done "during manual merge". Is it appropriate to remove these comments now?
# Let's update README.txt to mention page blob support and discuss the new configuration property
for setting paths to use page blobs.  We could also mention intended use cases for one vs.
the other (i.e., block blobs are good for scans/MapReduce, and page blobs are good for random
access/HBase).
# There is a potential internationalization issue on this line:
{code}
+    if (asUri.getAuthority() == null 
+        || asUri.getAuthority().equalsIgnoreCase(sessionUri.getAuthority())) {
{code}
This should be fixed to use {{toLowerCase(Locale.US)}}.
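
To illustrate the locale concern in general terms: {{String.toLowerCase()}} without an argument uses the JVM default locale, and under a Turkish locale an uppercase {{I}} lowercases to a dotless {{ı}}. The sketch below assumes the intent is a locale-independent comparison of URI authorities. {{sameAuthority}} is a hypothetical helper, not code from the patch.

```java
import java.util.Locale;

class AuthorityCompareSketch {
    /** Hypothetical helper: case-insensitive comparison with a fixed locale. */
    static boolean sameAuthority(String a, String b) {
        if (a == null || b == null) {
            return a == null && b == null;
        }
        return a.toLowerCase(Locale.US).equals(b.toLowerCase(Locale.US));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String upper = "ACCOUNT.BLOB.CORE.WINDOWS.NET";
        // Locale-sensitive lowercasing maps 'I' to dotless 'ı' (U+0131) in Turkish,
        // so the result no longer equals the plain ASCII lowercase form.
        System.out.println(upper.toLowerCase(turkish).equals("account.blob.core.windows.net"));
        // With an explicit locale the comparison does not depend on JVM defaults.
        System.out.println(sameAuthority(upper, "account.blob.core.windows.net"));
    }
}
```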

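For the {{LOG.isDebugEnabled()}} item above, here is a minimal sketch of the guard pattern. To stay self-contained it uses {{java.util.logging}}, where {{isLoggable(Level.FINE)}} plays the role of {{isDebugEnabled()}}. That substitution is an assumption for illustration, and the class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class GuardedDebugLogSketch {
    private static final Logger LOG =
        Logger.getLogger(GuardedDebugLogSketch.class.getName());

    /** Counts how often the (potentially expensive) message is built. */
    static int describeCalls = 0;

    static void setDebugEnabled(boolean enabled) {
        LOG.setLevel(enabled ? Level.FINE : Level.INFO);
    }

    /** Stands in for an expensive message computation. */
    static String describe(long pos) {
        describeCalls++;
        return "Seek to position " + pos;
    }

    static void seek(long pos) {
        // Guard: the message is only built when debug-level logging is on.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(pos));
        }
    }

    public static void main(String[] args) {
        setDebugEnabled(false);
        seek(42L);
        System.out.println("message builds with debug off: " + describeCalls);
        setDebugEnabled(true);
        seek(42L);
        System.out.println("message builds with debug on: " + describeCalls);
    }
}
```

The benefit is that string concatenation and any method calls inside the log message are skipped entirely when debug logging is disabled.
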
There were a few other broad suggestions that came up earlier in discussion.  Let's defer
these to separate issues at a later time, because there is already a lot happening in this
patch:

# Convert to slf4j logging.
# Use executors instead of directly using thread pools.
# Start using the new file system contract tests.
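
For the executor suggestion, here is a minimal sketch of the general pattern with {{java.util.concurrent.ExecutorService}}. The class, the method, and the task are hypothetical and are not modeled on the actual I/O threads in {{PageBlobOutputStream}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorSketch {
    /** Submits one task per chunk and returns the total number of bytes handled. */
    static int writeAll(List<byte[]> chunks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (byte[] chunk : chunks) {
                // A real task would upload the chunk; this one only reports its size.
                results.add(pool.submit(() -> chunk.length));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // task failures surface as ExecutionException
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writes", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("write task failed", e.getCause());
        } finally {
            // The executor owns thread lifecycle; callers never touch Thread directly.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<byte[]> chunks = new ArrayList<>();
        chunks.add(new byte[512]);
        chunks.add(new byte[1024]);
        System.out.println(writeAll(chunks, 2));
    }
}
```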


> hadoop-azure: page blob support
> -------------------------------
>
>                 Key: HADOOP-10809
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10809
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools
>            Reporter: Mike Liddell
>            Assignee: Eric Hanson
>         Attachments: HADOOP-10809.02.patch, HADOOP-10809.03.patch, HADOOP-10809.04.patch,
HADOOP-10809.05.patch, HADOOP-10809.06.patch, HADOOP-10809.07.patch, HADOOP-10809.08.patch,
HADOOP-10809.09.patch, HADOOP-10809.1.patch, HADOOP-10809.10.patch
>
>
> Azure Blob Storage provides two flavors: block-blobs and page-blobs.  Block-blobs are
the general purpose kind that support convenient APIs and are the basis for the Azure Filesystem
for Hadoop (see HADOOP-9629).
> Page-blobs use the same namespace as block-blobs but provide a different low-level feature
set.  Most importantly, page-blobs can cope with an effectively infinite number of small accesses
whereas block-blobs can only tolerate 50K appends before relatively manual rewriting of the
data is necessary.  A simple analogy is that page-blobs are like a regular disk and the basic
API is like a low-level device driver.
> See http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx for some introductory
material.
> The primary driving scenario for page-blob support is for HBase transaction log files
which require an access pattern of many small writes.  Additional scenarios can also be supported.
> Configuration:
> The Hadoop Filesystem abstraction needs a mechanism so that file-create can determine
whether to create a block- or page-blob.  To permit scenarios where application code doesn't
know about the details of Azure Storage, we would like the configuration to be aspect-style,
i.e., configured by the administrator and transparent to the application. The current solution
is to use Hadoop configuration to declare a list of page-blob folders -- Azure Filesystem
for Hadoop will create files in these folders using the page-blob flavor.  The configuration key
is "fs.azure.page.blob.dir", and its description can be found in AzureNativeFileSystemStore.java.
> Code changes:
> - refactor of basic Azure Filesystem code to use a general BlobWrapper and specialized
BlockBlobWrapper vs PageBlobWrapper
> - introduction of PageBlob support (read, write, etc)
> - miscellaneous changes such as umask handling, implementation of createNonRecursive(),
flush/hflush/hsync.
> - new unit tests.
> Credit for the primary patch: Dexter Bradshaw, Mostafa Elhemali, Eric Hanson, Mike Liddell.
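
The folder-based selection described in the issue text above could be sketched roughly as follows. This is an illustration only: the comma-separated format of the {{fs.azure.page.blob.dir}} value is an assumption, and {{PageBlobDirSketch}}, {{parseDirs}}, and {{isPageBlobPath}} are hypothetical names, not the logic in {{AzureNativeFileSystemStore}}.

```java
import java.util.ArrayList;
import java.util.List;

class PageBlobDirSketch {
    /** Parses a folder list; the comma separator is an assumption. */
    static List<String> parseDirs(String configValue) {
        List<String> dirs = new ArrayList<>();
        for (String raw : configValue.split(",")) {
            String dir = raw.trim();
            // Drop trailing slashes so "/data/" and "/data" are treated alike.
            while (dir.length() > 1 && dir.endsWith("/")) {
                dir = dir.substring(0, dir.length() - 1);
            }
            if (!dir.isEmpty()) {
                dirs.add(dir);
            }
        }
        return dirs;
    }

    /** Returns true if the path is one of the folders or lies beneath one. */
    static boolean isPageBlobPath(String path, List<String> pageBlobDirs) {
        for (String dir : pageBlobDirs) {
            if (dir.equals("/") || path.equals(dir) || path.startsWith(dir + "/")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> dirs = parseDirs("/hbase/WALs, /data/");
        System.out.println(isPageBlobPath("/hbase/WALs/log1", dirs));
        System.out.println(isPageBlobPath("/tmp/part-00000", dirs));
    }
}
```

Matching on {{dir + "/"}} rather than a bare prefix keeps a sibling folder such as {{/hbase/WALsX}} from being treated as a page-blob folder.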



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
