spark-issues mailing list archives

From "Xiangrui Meng (JIRA)" <>
Subject [jira] [Resolved] (SPARK-6717) Clear shuffle files after checkpointing in ALS
Date Tue, 03 May 2016 07:19:13 GMT


Xiangrui Meng resolved SPARK-6717.
       Resolution: Fixed
    Fix Version/s: 2.0.0

Issue resolved by pull request 11919

> Clear shuffle files after checkpointing in ALS
> ----------------------------------------------
>                 Key: SPARK-6717
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.4.0
>            Reporter: Xiangrui Meng
>            Assignee: holdenk
>              Labels: als
>             Fix For: 2.0.0
> In ALS iterations, we checkpoint RDDs to cut lineage and to reduce shuffle files. However,
> whether shuffle files are cleaned depends on the system GC, which may not be triggered
> during ALS iterations. So after checkpointing, before we let the RDD object go out of scope,
> we should clean its shuffle dependencies explicitly. This function could either stay inside
> ALS or go to Core.
> Without this feature, we can call System.gc() periodically to clean shuffle files of
> RDDs that went out of scope.
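
The periodic System.gc() workaround described in the issue can be sketched roughly as below. This is a minimal, Spark-free illustration, not the change made in the pull request: the class name PeriodicGc, its interval parameter, and the afterIteration method are hypothetical names chosen for this example. It assumes the iteration loop runs in the driver JVM, and note that System.gc() is only a request that the JVM may ignore.

```java
// Hypothetical helper, not part of Spark. It requests a JVM garbage
// collection every `interval` iterations, so that cleanup tied to GC
// (such as shuffle files of RDDs that went out of scope) gets a chance to run.
public class PeriodicGc {
    private final int interval;

    public PeriodicGc(int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
    }

    /** Returns true if a GC was requested after this (1-based) iteration. */
    public boolean afterIteration(int iteration) {
        if (iteration % interval != 0) {
            return false;
        }
        System.gc(); // a hint only; the JVM is free to ignore it
        return true;
    }

    public static void main(String[] args) {
        PeriodicGc gc = new PeriodicGc(3);
        for (int iter = 1; iter <= 10; iter++) {
            // ... one iteration of the algorithm (and any checkpointing) would run here ...
            if (gc.afterIteration(iter)) {
                System.out.println("requested GC after iteration " + iter);
            }
        }
    }
}
```

A small interval requests collections more often at the cost of extra GC pauses; a large one lets more shuffle files accumulate between requests. The explicit cleanup proposed in the issue avoids this trade-off by not relying on GC timing at all.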
