spark-issues mailing list archives

From "Attila Zsolt Piros (Jira)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-32149) Improve file path name normalisation at block resolution within the external shuffle service
Date Wed, 01 Jul 2020 13:32:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-32149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149440#comment-17149440 ]

Attila Zsolt Piros commented on SPARK-32149:
--------------------------------------------

I am working on this.

> Improve file path name normalisation at block resolution within the external shuffle service
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-32149
>                 URL: https://issues.apache.org/jira/browse/SPARK-32149
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 3.0.1
>            Reporter: Attila Zsolt Piros
>            Priority: Major
>
> In the external shuffle service, during block resolution, the file paths (for disk-persisted RDD and for shuffle blocks) are normalized by custom Spark code that uses an OS-dependent regexp. This code duplicates its package-private JDK counterpart.
> As the two implementations are not a perfect match, one of them could even produce a slightly different (but semantically equal) path.
> The reason for this redundant transformation is the interning of the normalized path to save some heap, which is only possible if both methods produce the same string.
> Checking the JDK code, I believe there is a better solution that matches the JDK code exactly, because it uses that package-private method. Moreover, based on some benchmarking, this new method seems to be more performant as well.
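
The idea in the description can be sketched in Java roughly as follows. This is a minimal sketch and not the actual patch: it assumes the package-private JDK normalisation is reached indirectly by constructing a java.io.File, whose getPath() then returns the normalised string, which is interned. The class name ShufflePathSketch and its method names are hypothetical and exist only for illustration.

```java
import java.io.File;

// Hypothetical helper class; the name is illustrative and not taken from Spark.
final class ShufflePathSketch {

  // Lets java.io.File apply the platform's own normalisation (performed in
  // the File constructor), then interns the result so that equal paths share
  // a single String instance on the heap.
  static String normalizeAndIntern(String rawPath) {
    return new File(rawPath).getPath().intern();
  }

  // Builds a File from the already normalised and interned path.
  static File toFile(String rawPath) {
    return new File(normalizeAndIntern(rawPath));
  }

  public static void main(String[] args) {
    String sep = File.separator;
    // Two raw paths that differ only in redundant separators.
    String a = normalizeAndIntern(
        sep + "tmp" + sep + sep + "blockmgr" + sep + "0a" + sep + "shuffle_1_2_0.data");
    String b = normalizeAndIntern(
        sep + "tmp" + sep + "blockmgr" + sep + sep + sep + "0a" + sep + "shuffle_1_2_0.data");
    System.out.println(a);
    // Reference comparison on purpose: both names point at the interned instance.
    System.out.println(a == b);
  }
}
```

Because the normalisation is done by the JDK itself, the interned string is the same one a later File constructor call would produce, so no OS-dependent regexp has to be kept in sync with the JDK.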



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


