spark-issues mailing list archives

From "Antony Mayi (JIRA)" <>
Subject [jira] [Commented] (SPARK-6717) Clear shuffle files after checkpointing in ALS
Date Tue, 01 Dec 2015 00:51:10 GMT


Antony Mayi commented on SPARK-6717:

This seems to be an even bigger problem in 1.5, as the workaround from SPARK-6334 no longer
works (forcing a GC doesn't trigger the cleanup).
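
Forcing a GC here means a driver loop along the lines of the minimal Java sketch below. PeriodicGc, runWithPeriodicGc and gcInterval are hypothetical names, not Spark API. The idea is that Spark's ContextCleaner tracks RDDs and shuffles through weak references on the driver, so their files can only be removed after a garbage collection notices that those objects are unreachable. System.gc() is merely a request to the JVM, which fits the report above that it does not reliably trigger the cleanup.

```java
import java.util.function.IntConsumer;

/**
 * Hypothetical helper (not part of Spark): runs an iterative driver loop and
 * requests a JVM garbage collection every gcInterval iterations. The request
 * is only a hint, so this cannot guarantee that shuffle files get removed.
 */
final class PeriodicGc {
    private PeriodicGc() {}

    /**
     * @param iterations number of loop iterations to run
     * @param gcInterval request a GC after every gcInterval-th iteration
     * @param step       work done in each iteration (e.g. one ALS sweep)
     * @param requestGc  action that requests a GC (System::gc in practice)
     * @return how many times a GC was requested
     */
    static int runWithPeriodicGc(int iterations, int gcInterval,
                                 IntConsumer step, Runnable requestGc) {
        if (gcInterval <= 0) {
            throw new IllegalArgumentException("gcInterval must be positive");
        }
        int gcRequests = 0;
        for (int i = 1; i <= iterations; i++) {
            step.accept(i);
            // Once earlier iterations' RDDs are unreachable, a GC lets
            // weak-reference-based cleanup (such as Spark's ContextCleaner)
            // notice them.
            if (i % gcInterval == 0) {
                requestGc.run();
                gcRequests++;
            }
        }
        return gcRequests;
    }
}
```

For example, runWithPeriodicGc(10, 3, i -> { /* one iteration */ }, System::gc) requests a GC after iterations 3, 6 and 9 and returns 3. The GC action is passed in as a Runnable so the loop can be exercised without actually collecting.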

> Clear shuffle files after checkpointing in ALS
> ----------------------------------------------
>                 Key: SPARK-6717
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.4.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>              Labels: als
> In ALS iterations, we checkpoint RDDs to cut lineage and to reduce shuffle files. However,
> whether shuffle files are cleaned depends on the system GC, which may not be triggered in ALS
> iterations. So after checkpointing, before we let the RDD object go out of scope, we should
> clean its shuffle dependencies explicitly. This function could either stay inside ALS or go
> to Core.
> Without this feature, we can call System.gc() periodically to clean shuffle files of
> RDDs that went out of scope.
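
The ordering proposed in the description can be sketched as follows in Java, assuming hypothetical CheckpointableState and ShuffleCleaner interfaces as stand-ins for an RDD and for the cleanup function (which, per the description, could live in ALS or in Core). The shuffle dependencies are captured before checkpointing because a materialized checkpoint replaces the RDD's lineage. The state is then checkpointed and forced to materialize with an action, and only afterwards are the captured shuffles removed. In real Spark code the first steps would correspond roughly to RDD.dependencies, RDD.checkpoint() and an action such as RDD.count(); the cleanup hook is deliberately left abstract.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an RDD; not a Spark interface. */
interface CheckpointableState {
    /** IDs of the shuffle dependencies this state currently depends on. */
    List<Integer> shuffleDependencyIds();
    /** Marks the state for checkpointing (lazy, like RDD.checkpoint()). */
    void checkpoint();
    /** Runs an action so the checkpoint is actually written and lineage is cut. */
    void materialize();
}

/** Hypothetical hook that deletes the files of one shuffle. */
interface ShuffleCleaner {
    void cleanShuffle(int shuffleId);
}

final class CheckpointAndClean {
    private CheckpointAndClean() {}

    /**
     * Checkpoints the state, then explicitly removes the shuffles it used to
     * depend on instead of waiting for GC. Returns the cleaned shuffle IDs.
     */
    static List<Integer> checkpointAndClean(CheckpointableState state, ShuffleCleaner cleaner) {
        // Capture the dependencies first: once the checkpoint is written,
        // the lineage (and with it the parent dependencies) is replaced.
        List<Integer> deps = new ArrayList<>(state.shuffleDependencyIds());
        state.checkpoint();
        state.materialize();
        // The shuffle outputs are no longer needed to recompute the state,
        // so remove them before the object goes out of scope.
        for (int id : deps) {
            cleaner.cleanShuffle(id);
        }
        return deps;
    }
}
```

Because cleanup happens right after the checkpoint is written, it no longer depends on when, or whether, the JVM decides to collect the old RDD object.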

This message was sent by Atlassian JIRA
