hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15217) org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
Date Mon, 04 Jun 2018 05:12:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499746#comment-16499746

Xiao Chen commented on HADOOP-15217:

Thanks Zsolt for revving and the discussion about testing.

In general I agree test code should be as explicit as possible, to make it easier to debug
when something fails. Looking at the specific code, however
try {
} catch (IOException ex) {
  Assert.fail("Unable to open stream to file: " + ex.getMessage());
IMO this is unnecessary and may make test failures harder to debug. :)
 * This is wrapping a code that's supposed to pass. What's the boundary of wrapping this line
v.s. wrapping some other lines?
 * Wrapping here in the test doesn't give more information. My assumption is a developer seeing
the IOE from {{myUrl2.openStream();}} would know this means that the openStream failed. :)
 * By {{Assert.fail}} with ex.getMessage(), we actually lose some information, which is the
stacktrace of the IOE. Without the assert, the IOE will fail the test case, presenting us
both its message and its stacktrace
 * Let's assume there is an IOE from this in the future. It feels to me it won't be too hard
to figure out what's been tested here (since there are code comments, and one can also git

If this were for a failure code, it would be completely reasonable to do something like:
try {
  Assert.fail("Open stream should fail because xxx");
} catch (IOException expected) {
  // check 'expected' is exactly the test wanted. Maybe by GenericTestUtils.assertExceptionContains
or equivalent

> org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
> ----------------------------------------------------------------------
>                 Key: HADOOP-15217
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15217
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.0.0
>            Reporter: Joseph Fourny
>            Assignee: Zsolt Venczel
>            Priority: Major
>         Attachments: HADOOP-15217.01.patch, HADOOP-15217.02.patch, HADOOP-15217.03.patch,
> When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when Spark is
initialized), it breaks URLs with spaces (even though they are properly URI-encoded). I traced
the problem down to _FSUrlConnection.connect()_ method. It naively gets the path from the
URL, which contains encoded spaces, and pases it to _org.apache.hadoop.fs.Path(String)_ constructor.
This is not correct, because the docs clearly say that the string must NOT be encoded. Doing
so causes double encoding within the Path class (ie: %20 becomes %2520). 
> See attached JUnit test. 
> This test case mimics an issue I ran into when trying to use Commons Configuration 1.9
AFTER initializing Spark. Commons Configuration uses URL class to load configuration files,
but Spark installs _FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using
an AspectJ aspect to "patch" the bytecode at load time to work-around the issue. 
> The real fix is quite simple. All you need to do is replace this line in _org.apache.hadoop.fs.FsUrlConnection.connect()_:
>         is = fs.open(new Path(url.getPath()));
> with this line:
>      is = fs.open(new Path(url.*toUri()*.getPath()));
> URI.getPath() will correctly decode the path, which is what is expected by _org.apache.hadoop.fs.Path(String)_ constructor.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message