hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zsolt Venczel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15217) org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
Date Mon, 04 Jun 2018 09:06:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499923#comment-16499923

Zsolt Venczel commented on HADOOP-15217:

Thanks [~xiaochen] for sharing your thoughts in this matter.

>From a code readability and issue debugging perspective I completely agree with you and
I think the problem is twofold:
1) How readable the code is and how much help we provide for the developer while debugging
the code.
2) How easy it is to triage a unit test failure which is quite important from a code maintainability
perspective on the long run: if I see a unit test producing an error I can think of many things
but if it explicitly fails I can assume more safely that some recent change introduced a regression.

>From the second perspective the response to your points:
??This is wrapping a code that's supposed to pass. What's the boundary of wrapping this line
v.s. wrapping some other lines???
The limit should be as narrow to the tested code as possible. Wrapping more lines would not
be meaningful from the actual tests perspective.
??Wrapping here in the test doesn't give more information. My assumption is a developer seeing
the IOE from myUrl2.openStream(); would know this means that the openStream failed.??
I was trying to defined the test case to fail on openStream only if the fix is missing and
if I managed to set it up correctly it fails only in case of a true regression. Nothing is
perfect though, I agree.
??By Assert.fail with ex.getMessage(), we actually lose some information, which is the stacktrace
of the IOE. Without the assert, the IOE will fail the test case, presenting us both its message
and its stacktrace??
This is a really good point. I updated the patch to include the cause of the exception.
??Let's assume there is an IOE from this in the future. It feels to me it won't be too hard
to figure out what's been tested here (since there are code comments, and one can also git
Currently in the hadoop code base quite a lot of unit tests are failing at each test suite
run. If it's an error one might think it's flaky and treat it as so. If it's a failure that
feels to be more intentional and one might be more careful and look for a regression.

Please let me know if this makes any sense but I can go happily by your suggestions as well.

> org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
> ----------------------------------------------------------------------
>                 Key: HADOOP-15217
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15217
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.0.0
>            Reporter: Joseph Fourny
>            Assignee: Zsolt Venczel
>            Priority: Major
>         Attachments: HADOOP-15217.01.patch, HADOOP-15217.02.patch, HADOOP-15217.03.patch,
HADOOP-15217.04.patch, TestCase.java
> When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when Spark is
initialized), it breaks URLs with spaces (even though they are properly URI-encoded). I traced
the problem down to _FSUrlConnection.connect()_ method. It naively gets the path from the
URL, which contains encoded spaces, and pases it to _org.apache.hadoop.fs.Path(String)_ constructor.
This is not correct, because the docs clearly say that the string must NOT be encoded. Doing
so causes double encoding within the Path class (ie: %20 becomes %2520). 
> See attached JUnit test. 
> This test case mimics an issue I ran into when trying to use Commons Configuration 1.9
AFTER initializing Spark. Commons Configuration uses URL class to load configuration files,
but Spark installs _FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using
an AspectJ aspect to "patch" the bytecode at load time to work-around the issue. 
> The real fix is quite simple. All you need to do is replace this line in _org.apache.hadoop.fs.FsUrlConnection.connect()_:
>         is = fs.open(new Path(url.getPath()));
> with this line:
>      is = fs.open(new Path(url.*toUri()*.getPath()));
> URI.getPath() will correctly decode the path, which is what is expected by _org.apache.hadoop.fs.Path(String)_ constructor.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message