spark-issues mailing list archives

From "Matei Zaharia (JIRA)" <>
Subject [jira] [Updated] (SPARK-2532) Fix issues with consolidated shuffle
Date Fri, 01 Aug 2014 20:59:39 GMT


Matei Zaharia updated SPARK-2532:

    Fix Version/s:     (was: 1.1.0)

> Fix issues with consolidated shuffle
> ------------------------------------
>                 Key: SPARK-2532
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>         Environment: All
>            Reporter: Mridul Muralidharan
>            Assignee: Mridul Muralidharan
>            Priority: Critical
> Will file a PR with the changes as soon as the merge is done (an earlier merge
> unfortunately became outdated within 2 weeks :) ).
> Consolidated shuffle is broken in multiple ways in Spark:
> a) Task failure(s) can cause the state to become inconsistent.
> b) Multiple reverts, or a combination of close/revert/close, can cause the state to
> become inconsistent (as part of exception/error handling).
> c) Some of the API in the block writer causes implementation issues - for example, a
> revert is always followed by a close, but the implementation tries to keep them separate,
> creating surface area for errors.
> d) Fetching data from consolidated shuffle files can go badly wrong if the file is being
> actively written to: the fetcher computes a segment's length by subtracting its offset
> from the next offset (or from the file length, if it is the last offset) - the latter
> fails when a fetch happens in parallel with a write.
> Note that this happens even if there are no task failures of any kind!
> This usually results in stream corruption or decompression errors.
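The failure mode in point (d) can be sketched in a few lines of Scala. This is a hypothetical, simplified model (the object and method names are invented for illustration, not Spark's actual code): segment lengths are derived by differencing recorded offsets, and the last segment's length falls out of the current file length, which is unstable while a writer is still appending.

```scala
// Hypothetical sketch of the offset-differencing scheme described in (d).
object ShuffleSegmentSketch {
  // offsets(i) is the byte offset where map output i begins in the
  // consolidated file; fileLength is the file's current length.
  def segmentLength(offsets: Array[Long], fileLength: Long, i: Int): Long = {
    val start = offsets(i)
    // The last segment's length is derived from the file length itself,
    // which can include partially written bytes during a concurrent append.
    val end = if (i == offsets.length - 1) fileLength else offsets(i + 1)
    end - start
  }

  def main(args: Array[String]): Unit = {
    val offsets = Array(0L, 100L, 250L)
    // Writer finished: the last segment's length (400 - 250 = 150) is correct.
    println(segmentLength(offsets, 400L, 2))
    // A fetch racing an in-progress append sees an inflated file length
    // (430), so the computed length (180) reads past the valid data,
    // producing the stream corruption / decompression errors noted above.
    println(segmentLength(offsets, 430L, 2))
  }
}
```

The middle segments are safe because both endpoints are committed offsets; only the last segment depends on the moving file length, which is why the corruption appears only when a fetch overlaps an active write.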

This message was sent by Atlassian JIRA
