spark-issues mailing list archives

From "Wenchen Fan (Jira)" <>
Subject [jira] [Resolved] (SPARK-30964) Accelerate InMemoryStore with a new index
Date Mon, 02 Mar 2020 10:58:00 GMT


Wenchen Fan resolved SPARK-30964.
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 27716

> Accelerate InMemoryStore with a new index
> -----------------------------------------
>                 Key: SPARK-30964
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Web UI
>    Affects Versions: 3.1.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>             Fix For: 3.0.0
> Spark uses the class `InMemoryStore` as the KV store for the live UI and the history server (by default, if no LevelDB file path is provided).
> In `InMemoryStore`, all the task data of an application is stored in a hash map whose key is the task ID and whose value is the task data. This is fine for getting or deleting an entry by a provided task ID.
> However, the Spark stage UI always shows all the task data of one stage, and the current implementation finds it by scanning all the values in the hash map. The time complexity is O(numOfTasks).
> Also, when there are too many stages (>spark.ui.retainedStages), Spark again scans all the task data linearly to find the tasks of the stages to be deleted.
> This can be very slow for a large application with many stages and tasks. We can improve it by allowing the natural key of an entity to have a real parent index, so that on each lookup with a parent node provided, Spark can first look up all the natural keys (in our case, the task IDs) and then fetch the data for those natural keys from the hash map.
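
The idea in the description can be illustrated with a small sketch. This is not Spark's Java code or the change in the pull request; it is a minimal Python model, assuming a hypothetical `IndexedTaskStore` class in which the stage ID plays the role of the parent and the task ID is the natural key. All class and method names are made up for illustration.

```python
from collections import defaultdict


class IndexedTaskStore:
    """Toy in-memory store: task data keyed by task ID, plus a parent index.

    Hypothetical illustration only; not Spark's InMemoryStore.
    """

    def __init__(self):
        self._tasks = {}                    # task ID -> (stage ID, task data)
        self._by_stage = defaultdict(set)   # stage ID (parent) -> task IDs

    def write(self, task_id, stage_id, data):
        old = self._tasks.get(task_id)
        if old is not None and old[0] != stage_id:
            # The task moved to another parent: drop the stale index entry.
            self._discard_from_index(old[0], task_id)
        self._tasks[task_id] = (stage_id, data)
        self._by_stage[stage_id].add(task_id)

    def read(self, task_id):
        return self._tasks[task_id][1]

    def delete(self, task_id):
        stage_id, _ = self._tasks.pop(task_id)
        self._discard_from_index(stage_id, task_id)

    def tasks_in_stage_scan(self, stage_id):
        # Baseline from the description: visit every task in the application.
        return sorted(t for t, (s, _) in self._tasks.items() if s == stage_id)

    def tasks_in_stage(self, stage_id):
        # Indexed lookup: only the task IDs recorded under this stage are visited.
        return sorted(self._by_stage.get(stage_id, ()))

    def delete_stage(self, stage_id):
        # Remove a whole stage without scanning tasks of other stages.
        for task_id in self._by_stage.pop(stage_id, ()):
            del self._tasks[task_id]

    def _discard_from_index(self, stage_id, task_id):
        ids = self._by_stage.get(stage_id)
        if ids is not None:
            ids.discard(task_id)
            if not ids:
                del self._by_stage[stage_id]
```

In this sketch the extra index costs one set update per write or delete, and in exchange both the per-stage listing and the per-stage cleanup touch only the tasks of that stage rather than every task in the application.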

This message was sent by Atlassian Jira
