spark-user mailing list archives

From Cheng Lian <>
Subject Re: Writing wide parquet file in Spark SQL
Date Sun, 15 Mar 2015 17:10:17 GMT
This article by Ryan Blue should be helpful for understanding the problem

The TL;DR is: you can decrease |parquet.block.size| to reduce memory 
consumption. Even so, 100K columns is a really heavy burden for Parquet, 
but I guess your data is pretty sparse.
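A minimal sketch of that tweak (assuming a Spark 1.2/1.3-era SQLContext setup; the output path and the |schemaRDD| name are taken from the question below, not from a tested job):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("wide-parquet-write"))
val sqlContext = new SQLContext(sc)

// Parquet buffers one row group per open file in memory, with a column
// writer per column. With ~100K columns, the default 128 MB row group
// size can easily blow the heap, so shrink it before writing.
sc.hadoopConfiguration.setInt("parquet.block.size", 16 * 1024 * 1024) // 16 MB

// schemaRDD here stands for the SchemaRDD the original poster is saving:
// schemaRDD.saveAsParquetFile("hdfs:///path/to/output")
```

Smaller row groups trade some scan efficiency for a proportionally smaller write-side buffer, which is usually the right trade for very wide, sparse schemas.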


On 3/11/15 4:13 AM, kpeng1 wrote:

> Hi All,
> I am currently trying to write a very wide file into Parquet using Spark
> SQL.  I have 100K column records that I am trying to write out, but of
> course I am running into space issues (out of memory - heap space).  I was
> wondering if there are any tweaks or workarounds for this.
> I am basically calling saveAsParquetFile on the SchemaRDD.
