spark-user mailing list archives

From Davies Liu <dav...@databricks.com>
Subject Re: Parquet files are only 6-20MB in size?
Date Mon, 03 Nov 2014 18:21:33 GMT
Before calling saveAsParquetFile(), you can call coalesce(N) so that you
end up with N files.
coalesce() keeps the rows in their existing order (repartition() will not).
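
To make the difference concrete, here is a toy model in plain Python. It is not Spark code and not how Spark implements either call. It assumes that coalesce(N) merges adjacent input partitions without a shuffle and that repartition(N) redistributes individual rows across partitions. The helper names coalesce, repartition and flatten are hypothetical stand-ins, and real Spark may also take data locality into account when grouping partitions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def coalesce(partitions: Sequence[Sequence[T]], n: int) -> List[List[T]]:
    """Toy model (assumption): merge contiguous runs of partitions, no shuffle."""
    count = len(partitions)
    n = max(1, min(n, count))  # without a shuffle the partition count cannot grow
    merged: List[List[T]] = []
    for i in range(n):
        start = i * count // n
        end = (i + 1) * count // n
        # Concatenate neighbouring partitions in their original order.
        merged.append([row for part in partitions[start:end] for row in part])
    return merged


def repartition(partitions: Sequence[Sequence[T]], n: int) -> List[List[T]]:
    """Toy model (assumption): a shuffle that deals rows round-robin to n outputs."""
    out: List[List[T]] = [[] for _ in range(n)]
    for p_idx, part in enumerate(partitions):
        for offset, row in enumerate(part):
            out[(p_idx + offset) % n].append(row)
    return out


def flatten(partitions: Sequence[Sequence[T]]) -> List[T]:
    """Read the partitions back one after another, as output files would be."""
    return [row for part in partitions for row in part]


# Twelve rows in chronological order, split into four input partitions.
rows = list(range(12))
parts = [rows[i:i + 3] for i in range(0, len(rows), 3)]

coalesced = coalesce(parts, 2)
shuffled = repartition(parts, 2)

print("coalesce order kept:   ", flatten(coalesced) == rows)
print("repartition order kept:", flatten(shuffled) == rows)
```

In this sketch the coalesced output reads back in the original order, while the shuffled output contains the same rows in a different order.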


On Mon, Nov 3, 2014 at 1:16 AM, ag007 <agrealy@mac.com> wrote:
> Thanks Akhil,
>
> Am I right in saying that the repartition will spread the data randomly so I
> lose chronological order?
>
> I really just want the csv --> parquet format in the same order it came in.
> If I set repartition with 1 will this not be random?
>
> cheers,
> Ag
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Parquet-files-are-only-6-20MB-in-size-tp17935p17941.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>


