hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
Date Tue, 03 Feb 2015 14:29:37 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303336#comment-14303336 ]

Steve Loughran commented on HADOOP-9565:

h3. Semantics directly off {{FileContext}} and {{FileSystem}}?

# Having a clear separation between object store and FS tells the world that if something
doesn't say {{extends ObjectStore}} then it's an FS with all the normal expectations of consistency,
atomicity, durability, etc. The extra subclass exists to warn that some of those expectations
may not hold.
# Making {{getSemantics()}} abstract forces everyone to look at what their semantics really
are and declare them, rather than take a possibly incorrect default. (If it went directly on
{{FileSystem}}, we couldn't make it abstract and would have to default to POSIX.)

That said:

# those object stores that can replace HDFS are effectively filesystems. The {{ObjectStore}}
extension would then only be needed if/when we added more features (e.g. PUT?)
# having it everywhere makes it easier to chain filesystems together; some wrapper FS client
(like a performance counter) could relay the probe without caring about FS type; callers would
know it is there too.
# we could add something alongside for querying capabilities. Today we have filesystems that don't
support append (checksum FS), seek on streams (FTP), truncate, extended attributes, encryption
flags, etc. There's no cue that they are missing other than exceptions when you try to use them.
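
To make the last two points concrete, here is a minimal sketch of a capability probe that a wrapper relays without knowing what it wraps. Everything in it is an assumption for illustration: the {{Store}} interface, the {{CAP_*}} flags and their bit values, and the {{supports()}} helper are hypothetical names, not anything in {{FileSystem}} today.

```java
import java.io.IOException;

/** Sketch only: every name here is hypothetical, none of it is a Hadoop API. */
final class CapabilityProbeSketch {
  // Hypothetical feature flags, one bit each.
  static final long CAP_APPEND   = 1L << 0;
  static final long CAP_SEEK     = 1L << 1;
  static final long CAP_TRUNCATE = 1L << 2;
  static final long CAP_XATTRS   = 1L << 3;

  /** Stand-in for a filesystem client that declares what it supports. */
  interface Store {
    long getCapabilities();
    void append(String path, byte[] data) throws IOException;
  }

  /** A store without append: callers only find out today when the call fails. */
  static final class NoAppendStore implements Store {
    public long getCapabilities() { return CAP_SEEK | CAP_TRUNCATE; }
    public void append(String path, byte[] data) throws IOException {
      throw new IOException("append not supported");
    }
  }

  /** Wrapper (think: performance counter) that relays the probe unchanged. */
  static final class CountingStore implements Store {
    private final Store inner;
    long appendCalls;
    CountingStore(Store inner) { this.inner = inner; }
    public long getCapabilities() { return inner.getCapabilities(); }
    public void append(String path, byte[] data) throws IOException {
      appendCalls++;
      inner.append(path, data);
    }
  }

  /** True only if every requested capability bit is declared by the store. */
  static boolean supports(Store store, long capability) {
    return (store.getCapabilities() & capability) == capability;
  }

  public static void main(String[] args) {
    Store store = new CountingStore(new NoAppendStore());
    System.out.println("append supported: " + supports(store, CAP_APPEND));
    System.out.println("seek supported: " + supports(store, CAP_SEEK));
  }
}
```

A caller can then check {{supports(store, CAP_APPEND)}} up front instead of discovering the gap from an exception.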

I do fear that trying to add semantics and feature flags to the FS API itself is going to
prove more controversial. We could start with ObjectStore and then decide whether to pull
up at a later date.

h3. enumset vs bitmask?


A bitmask is easier to manipulate during chaining. Something like Netflix S3mper injects consistency
atop S3, so could do

long getSemantics() {
  return inner.getSemantics() | STORE_CONSISTENCY_COMPLETE;
}

or, and this is the hard one with an EnumSet, something that removes a feature:
long getSemantics() {
  // clear the bit with bitwise NOT (~); logical NOT (!) does not compile on a long
  return inner.getSemantics() & ~CONSISTENT_CREATE;
}
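
As a self-contained sketch of both wrappers, with an EnumSet version of the removal alongside for comparison: the constant names come from the snippets above, but the bit values, the {{Store}} interface and the {{Semantic}} enum are assumptions made up for this example.

```java
import java.util.EnumSet;

/** Sketch only: bit values, Store and Semantic are assumptions for illustration. */
final class SemanticsChainingSketch {
  static final long CONSISTENT_CREATE = 0x01L;
  static final long CONSISTENT_DELETE = 0x02L; // hypothetical flag
  static final long CONSISTENT_LIST   = 0x04L; // hypothetical flag
  // Assumed to be the aggregate of all consistency bits.
  static final long STORE_CONSISTENCY_COMPLETE =
      CONSISTENT_CREATE | CONSISTENT_DELETE | CONSISTENT_LIST;

  /** Stand-in for anything that can declare its semantics. */
  interface Store {
    long getSemantics();
  }

  /** S3mper-style wrapper: ORs consistency onto whatever the inner store declares. */
  static Store addConsistency(Store inner) {
    return () -> inner.getSemantics() | STORE_CONSISTENCY_COMPLETE;
  }

  /** Wrapper that withdraws a guarantee: AND with the inverted bit. */
  static Store dropConsistentCreate(Store inner) {
    return () -> inner.getSemantics() & ~CONSISTENT_CREATE;
  }

  enum Semantic { CONSISTENT_CREATE, CONSISTENT_DELETE, CONSISTENT_LIST }

  /** EnumSet form of the removal: copy first so the inner set is not mutated. */
  static EnumSet<Semantic> dropConsistentCreateSet(EnumSet<Semantic> inner) {
    EnumSet<Semantic> copy = EnumSet.copyOf(inner);
    copy.remove(Semantic.CONSISTENT_CREATE);
    return copy;
  }

  public static void main(String[] args) {
    Store raw = () -> CONSISTENT_CREATE;
    System.out.println("0x" + Long.toHexString(addConsistency(raw).getSemantics()));
    System.out.println("0x" + Long.toHexString(dropConsistentCreate(raw).getSemantics()));
    System.out.println(dropConsistentCreateSet(EnumSet.allOf(Semantic.class)));
  }
}
```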

We could also use the same operation when reporting errors, such as highlighting which requirements
weren't met in the exception text:
long s = store.getSemantics();
throw new IOException("Missing semantics: " + (STORE_POSIX_WRITE_SEMANTICS & ~s)
    + " see https://wiki.apache.org/hadoop/ObjectStore");
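
A slightly fuller sketch of that error reporting, which only throws when something is actually missing and names the missing bits. The individual flags, their values and the makeup of {{STORE_POSIX_WRITE_SEMANTICS}} are assumptions for the example; {{missing()}}, {{describe()}} and {{require()}} are hypothetical helpers.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: flag values and the makeup of the POSIX aggregate are assumed. */
final class MissingSemanticsSketch {
  static final long CONSISTENT_CREATE = 0x01L;
  static final long CONSISTENT_DELETE = 0x02L; // hypothetical flag
  static final long ATOMIC_RENAME     = 0x04L; // hypothetical flag
  static final long STORE_POSIX_WRITE_SEMANTICS =
      CONSISTENT_CREATE | CONSISTENT_DELETE | ATOMIC_RENAME;

  /** Bits that are required but not offered. */
  static long missing(long actual, long required) {
    return required & ~actual;
  }

  /** Render set bits as flag names for the exception text. */
  static String describe(long bits) {
    List<String> names = new ArrayList<>();
    if ((bits & CONSISTENT_CREATE) != 0) names.add("CONSISTENT_CREATE");
    if ((bits & CONSISTENT_DELETE) != 0) names.add("CONSISTENT_DELETE");
    if ((bits & ATOMIC_RENAME) != 0) names.add("ATOMIC_RENAME");
    return String.join("|", names);
  }

  /** Throws only when at least one required bit is absent. */
  static void require(long actual, long required) throws IOException {
    long m = missing(actual, required);
    if (m != 0) {
      throw new IOException("Missing semantics: 0x" + Long.toHexString(m)
          + " (" + describe(m) + ") see https://wiki.apache.org/hadoop/ObjectStore");
    }
  }

  public static void main(String[] args) {
    try {
      require(CONSISTENT_CREATE, STORE_POSIX_WRITE_SEMANTICS);
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```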

Where it really excels, though, is that a numeric value can be defined in a Hadoop
configuration XML file, as a hex value.

Thus someone could say

I think we will need precisely that for S3 clients, because some S3-API endpoints (e.g. what
Amplidata are doing) do offer stricter semantics, and even Amazon itself varies between "nothing"
(0x0) on US-East and consistent create (0x01) everywhere else.
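
Reading such a per-endpoint value could look roughly like the sketch below. It uses {{java.util.Properties}} as a stand-in for the Hadoop configuration so that it runs on its own; the key name {{example.store.semantics}} is made up for the example, and defaulting to 0x0 ("nothing") when the key is unset is an assumption too.

```java
import java.util.Properties;

/** Sketch only: Properties stands in for the Hadoop config; the key is hypothetical. */
final class SemanticsConfigSketch {
  static final String SEMANTICS_KEY = "example.store.semantics"; // made-up key
  static final long DEFAULT_SEMANTICS = 0x0L; // assume "nothing" when unset

  /** Accepts decimal ("1") or hex ("0x01") values, as Long.decode does. */
  static long readSemantics(Properties conf) {
    String raw = conf.getProperty(SEMANTICS_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_SEMANTICS;
    }
    // Throws NumberFormatException on malformed input rather than guessing.
    return Long.decode(raw.trim());
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(SEMANTICS_KEY, "0x01");
    System.out.println("0x" + Long.toHexString(readSemantics(conf)));
  }
}
```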

The only way we could let people configure it in the XML file is to use integers, ideally
with the values (including common aggregated values) listed somewhere. The javadocs will do
this: automatically for the decimal values, manually for the hex ones if we add that. (I've
postponed it until the patch is ready and the values are fixed.)

Therefore while I agree with anyone who thinks it is a low-level C/C++ view of the world,
in the hands of the competent, it is more powerful than the Java work that tries to wrap it
all in set theory.

> Add a Blobstore interface to add to blobstore FileSystems
> ---------------------------------------------------------
>                 Key: HADOOP-9565
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9565
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3, fs/swift
>    Affects Versions: 2.6.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, HADOOP-9565-003.patch
> We can make explicit the fact that some {{FileSystem}} implementations are really blobstores,
> with different atomicity and consistency guarantees, by adding a {{Blobstore}} interface to
> them.
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that all blobstores
> implement a server-side copy operation as a substitute for rename.

This message was sent by Atlassian JIRA
