spark-issues mailing list archives

From "Alexander (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-22330) Linear containsKey operation for serialized maps.
Date Sun, 22 Oct 2017 21:45:00 GMT
Alexander created SPARK-22330:
---------------------------------

             Summary: Linear containsKey operation for serialized maps.
                 Key: SPARK-22330
                 URL: https://issues.apache.org/jira/browse/SPARK-22330
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.0, 1.2.1
            Reporter: Alexander


One of our production applications, which aggressively uses cached Spark RDDs, degraded as data
volumes grew, although it should not have. A quick profiling session showed that the slowest
part was SerializableMapWrapper#containsKey: the wrapper delegates get and remove to the underlying
implementation, but containsKey is inherited from AbstractMap, whose implementation runs in linear
time by iterating over the whole entry set. The workaround was simple: replacing every containsKey
call with get(key) != null solved the issue.
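The following Java sketch illustrates the behaviour described above. LinearContainsKeyWrapper and ContainsKeyDemo are hypothetical names, not Spark's actual SerializableMapWrapper (which is Scala code in Spark Core). The wrapper only mimics the reported shape: it delegates get and remove but inherits containsKey from AbstractMap, and an instrumented entrySet() counts how many entries the inherited method walks through.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

class ContainsKeyDemo {
    public static void main(String[] args) {
        Map<Integer, String> data = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            data.put(i, "v" + i);
        }
        LinearContainsKeyWrapper<Integer, String> wrapper = new LinearContainsKeyWrapper<>(data);

        // A miss makes the inherited containsKey scan every entry: O(n).
        boolean found = wrapper.containsKey(-1);
        System.out.println(found + " after visiting " + wrapper.entriesVisited + " entries");
        // prints "false after visiting 100000 entries"

        // The reported workaround: get(key) != null goes straight to the HashMap.
        wrapper.entriesVisited = 0;
        boolean viaGet = wrapper.get(42) != null;
        System.out.println(viaGet + " after visiting " + wrapper.entriesVisited + " entries");
        // prints "true after visiting 0 entries"
    }
}

// Hypothetical stand-in for a map wrapper: get and remove are delegated to the
// underlying map, but containsKey is NOT overridden, so AbstractMap's default
// (a linear scan over entrySet()) is used.
class LinearContainsKeyWrapper<K, V> extends AbstractMap<K, V> {
    private final Map<K, V> underlying;
    int entriesVisited = 0; // instrumentation: entries returned by entrySet() iterators

    LinearContainsKeyWrapper(Map<K, V> underlying) {
        this.underlying = underlying;
    }

    @Override
    public V get(Object key) {
        return underlying.get(key);
    }

    @Override
    public V remove(Object key) {
        return underlying.remove(key);
    }

    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        return new AbstractSet<Map.Entry<K, V>>() {
            @Override
            public Iterator<Map.Entry<K, V>> iterator() {
                Iterator<Map.Entry<K, V>> it = underlying.entrySet().iterator();
                return new Iterator<Map.Entry<K, V>>() {
                    @Override
                    public boolean hasNext() {
                        return it.hasNext();
                    }

                    @Override
                    public Map.Entry<K, V> next() {
                        entriesVisited++; // count each entry the caller walks
                        return it.next();
                    }
                };
            }

            @Override
            public int size() {
                return underlying.size();
            }
        };
    }
}
```

Note that get(key) != null is only equivalent to containsKey(key) when the map never stores null values.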

Nevertheless, it would be much simpler for everyone if the issue were fixed once and for
all. The fix is straightforward: delegate containsKey to the underlying implementation.
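A minimal Java sketch of the proposed fix, assuming a wrapper shaped like the one described in the report: containsKey is overridden to delegate to the underlying map instead of inheriting AbstractMap's scan. DelegatingMapWrapper and ContainsKeyFixDemo are hypothetical names; Spark's real wrapper is written in Scala, and any actual patch may differ in detail.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

class ContainsKeyFixDemo {
    public static void main(String[] args) {
        Map<String, Integer> data = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            data.put("k" + i, i);
        }
        DelegatingMapWrapper<String, Integer> wrapper = new DelegatingMapWrapper<>(data);

        System.out.println(wrapper.containsKey("k42"));     // true
        System.out.println(wrapper.containsKey("missing")); // false
        System.out.println("entries scanned: " + wrapper.entriesVisited); // entries scanned: 0
    }
}

// Hypothetical wrapper with the fix applied: get, remove AND containsKey all
// delegate to the underlying map.
class DelegatingMapWrapper<K, V> extends AbstractMap<K, V> {
    private final Map<K, V> underlying;
    int entriesVisited = 0; // instrumentation: entries walked via entrySet()

    DelegatingMapWrapper(Map<K, V> underlying) {
        this.underlying = underlying;
    }

    @Override
    public V get(Object key) {
        return underlying.get(key);
    }

    @Override
    public V remove(Object key) {
        return underlying.remove(key);
    }

    // The fix: expected O(1) for a HashMap, and unlike get(key) != null it
    // still reports keys that are mapped to null.
    @Override
    public boolean containsKey(Object key) {
        return underlying.containsKey(key);
    }

    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        return new AbstractSet<Map.Entry<K, V>>() {
            @Override
            public Iterator<Map.Entry<K, V>> iterator() {
                Iterator<Map.Entry<K, V>> it = underlying.entrySet().iterator();
                return new Iterator<Map.Entry<K, V>>() {
                    @Override
                    public boolean hasNext() {
                        return it.hasNext();
                    }

                    @Override
                    public Map.Entry<K, V> next() {
                        entriesVisited++;
                        return it.next();
                    }
                };
            }

            @Override
            public int size() {
                return underlying.size();
            }
        };
    }
}
```

Delegating keeps the lookup cost of whatever map is wrapped, so the wrapper no longer turns a hash lookup into a full iteration.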



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

