hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9361) Strictly define the expected behavior of filesystem APIs and write tests to verify compliance
Date Sat, 06 Jul 2013 10:51:48 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701311#comment-13701311
] 

Steve Loughran commented on HADOOP-9361:
----------------------------------------

This patch

# specifies the a hadoop filesystem
# begins the notion of FS contract tests, with a Hadoop-compatible filesystem modeled as a
set of paths mapping to data and metadata.
# defines a few initial tests: directory operations, seek operations (lifted from the HADOOP-8545
tests), and a test suite for {{FileSystem.concat()}}


h4. Tries to define the behaviour of the {{FileSystem}} class rigorously
# It does this with a model of a filesystem as a set of paths mapping to metadata and data,
which lets us use set-theory to select parts of the filesystem, and in describing how a modified
filesystem is derived from is predecessor as a result of an operation.
# specifying the preconditions and postconditions of the main operations -based on the behaviour
of HDFS. 
# begins the notion of FS contract tests, with a Hadoop-compatible filesystem modeled as a
set of paths mapping to data and metadata.
# defines a few initial tests: directory operations, seek operations (lifted from the HADOOP-8545
tests), and a test suite for {{FileSystem.concat()}}

Every filesystem  to be tested must  implement a subclass  of {{AbstractFileSystemContract}}
which  contains a factory for  their {{FileSystem}}
instances (and could  easily do the same for {{FileContext}},  as well as a method {{boolean
 supports(feature:string)}} that returns true/false
for features like {{"supports-append"}}.  For the local fs and HDFS, the  definition of supported
features is done from  XML config files, using
keys like {{fs.$filesystem.contract.supports-append}}. This allows the future option for 
these declarations to move into the core- XML files.
The  {{LocalFSContract}} class  updates some  of  its supported  features  ({{is-case-sensitive}},
{{supports-unix-permissions}})  based on  the
features of underlying FS. Every contract  also has a couple of test-only features: {{supports-root-tests}}
(can you  do things like {{rm /}} in
a test run?) and a method to see if the tests are enabled ({{boolean enabled()}}). This lets
contract test suites disable themselves if they can't  run (i.e when the logon details for
a blobstore aren't  in the test configurations).  The enabled() option is checked  in the
test setups; the  other features can be  tested in the
specific tests -and all downgrade to a skipped test if not set. This makes it visible which
tests have not actually run.

h4. Instantiates contract tests for the localFS and HDFS. Note that LocalFS fails the seek
tests, as it downgrades negative seeks to a no-op.

Having written these tests, I can see some limitations of the process.

* the tests have to look for the loosest exception type coming back from a failure -e.g. {{IOException}}
over {{EOFException}}. 
* needs a story for timeouts: we must have these to deal with blobstores &c, and either
need a way to allow FS contracts to define these, let the FS specific tests define them, or
just have some base timeouts large enough to deal with blobstores with operations like {{delete(dir)}}
being {{O(children(dir))}}

h4. Specification Syntax

I'm not convinced the syntax for specifications is great, nor am I sure I'm even using it
consistently. I've just had
to do something minimal that fits into unformatted text in a {{.apt}} file:

{code}
FileSystem.listStatus(Path P, PathFilter Filter) 

A PathFilter) is a predicate function that returns true iff the path P
meets the filter's requirements.

  Preconditions

-----------

  #path must exist
  exists(FS, P) || throw FileNotFoundException
  
-----------

  Postconditions
  
-----------

  isFile(FS, P) && Filter(P) => [FileStatus(FS, P)]
  
  isFile(FS, P) && !Filter(P) => []
  
  isDir(FS, P) => [all C in children(FS, P) where Filter(C) == true] 
  
-----------
{code}

This blurs C/Java symbols with bits of set theory, and trying to define what exceptions to
throw
on failed preconditions is something that really needs improving. 

If we could use LaTeX we could do proper set syntax, though we'd still need a consistent language
for defining filesystem implementation behaviours.

{code} 
isFile(FS, P) \land \neg Filter(P) \Rightarrow \emptyset

isDir(FS,P) \Rightarrow  \left\{ \forall C \in children(FS, P) : Filter(C) \right} 
{code}
This is actually harder to read. 

One other tactic could be to move the specification entirely to javadocs, with something at
the package level for
the core model & notation, then add precond/postcond specifics as preformatted areas in
the docs.

This would keep the spec by the signature, increase likelihood of maintenance, and for those
people who
understood the syntax, be able to understand what the methods do.
                
> Strictly define the expected behavior of filesystem APIs and write tests to verify compliance
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9361
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9361
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, test
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-9361-001.patch, HADOOP-9361-002.patch
>
>
> {{FileSystem}} and {{FileContract}} aren't tested rigorously enough -while HDFS gets
tested downstream, other filesystems, such as blobstore bindings, don't.
> The only tests that are common are those of {{FileSystemContractTestBase}}, which HADOOP-9258
shows is incomplete.
> I propose 
> # writing more tests which clarify expected behavior
> # testing operations in the interface being in their own JUnit4 test classes, instead
of one big test suite. 
> # Having each FS declare via a properties file what behaviors they offer, such as atomic-rename,
atomic-delete, umask, immediate-consistency -test methods can downgrade to skipped test cases
if a feature is missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message