spark-issues mailing list archives

From "Apache Spark (JIRA)" <>
Subject [jira] [Commented] (SPARK-1912) Compression memory issue during reduce
Date Thu, 28 Aug 2014 08:08:57 GMT


Apache Spark commented on SPARK-1912:

User 'rxin' has created a pull request for this issue:

> Compression memory issue during reduce
> --------------------------------------
>                 Key: SPARK-1912
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>             Fix For: 0.9.2, 1.0.1, 1.1.0
> When we need to read a compressed block, we first create a compression stream instance (LZF
> or Snappy) and use it to wrap that block.
> Say a reducer task needs to read 1000 local shuffle blocks: it first prepares to read all
> 1000 blocks, which means creating 1000 compression stream instances to wrap them. But
> initializing a compression instance allocates some memory, and having many compression
> instances alive at the same time is a problem.
> In practice the reducer reads the shuffle blocks one by one, so why create all the
> compression instances up front? Can we do it lazily, creating the compression instance
> for a block only when that block is first read?
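The lazy wrapping proposed above can be sketched as follows. This is a hypothetical Java illustration, not Spark's actual shuffle code: `LazyDecompressingStream` and `roundTrip` are made-up names, and `java.util.zip` stands in for LZF/Snappy. The point is that the decompressor (and its buffers) is only allocated when the block is first read, so at most the blocks actually being read hold decompressor memory, rather than all 1000 at once.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Wraps a block so the (memory-allocating) decompression stream is only
// created when the block is actually read, not when it is enqueued.
class LazyDecompressingStream extends InputStream {
    private final Supplier<InputStream> factory;
    private InputStream delegate; // null until first read

    LazyDecompressingStream(Supplier<InputStream> factory) {
        this.factory = factory;
    }

    private InputStream stream() {
        if (delegate == null) {
            delegate = factory.get(); // allocation deferred to this point
        }
        return delegate;
    }

    @Override public int read() throws IOException { return stream().read(); }

    @Override public void close() throws IOException {
        if (delegate != null) delegate.close();
    }
}

public class Demo {
    // Compress a small "block", then read it back through the lazy wrapper.
    static String roundTrip() throws IOException {
        byte[] raw = "shuffle block".getBytes();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buf)) {
            out.write(raw);
        }
        byte[] compressed = buf.toByteArray();

        // No Inflater (and its buffers) exists until read() is first called.
        try (InputStream in = new LazyDecompressingStream(
                () -> new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
            return new String(in.readAllBytes());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip()); // prints "shuffle block"
    }
}
```

With this shape, preparing 1000 blocks only allocates 1000 cheap wrappers; each decompressor is created when its block's turn comes and released when the stream is closed.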

This message was sent by Atlassian JIRA

