spark-user mailing list archives

From Chanh Le <giaosu...@gmail.com>
Subject Re: Best practices for storing data in Parquet files
Date Mon, 29 Aug 2016 02:23:58 GMT
> Does a Parquet file have a size limit (e.g. 1 TB)?
I didn't see any problem, but 1 TB is too big to operate on; you need to divide it into smaller pieces.
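For example, a rough sketch (assuming a DataFrame named df and made-up input/output paths, which are not from your setup; pick a partition count that matches your data volume):

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("parquet-sizing").getOrCreate()
val df = spark.read.json("hdfs:///input/events")   // hypothetical input

// Repartition before writing so each output Parquet file ends up a few hundred MB
// instead of one huge file.
df.repartition(200)
  .write
  .mode(SaveMode.Overwrite)
  .parquet("hdfs:///data/events_parquet")
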
> Should we use SaveMode.Append for a long-running streaming app?
Yes, but you need to partition it by time so it is easy to maintain, for example to update or delete a specific
time range without rebuilding everything.
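A minimal sketch of what I mean (assuming df has an event_time timestamp column; the path and column names are placeholders):

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.{col, to_date}

// Partition by date so each append only adds new date directories,
// and a single day can later be deleted or rewritten without touching the rest.
df.withColumn("event_date", to_date(col("event_time")))
  .write
  .mode(SaveMode.Append)
  .partitionBy("event_date")
  .parquet("hdfs:///data/events_parquet")
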
> How should we store the data in HDFS (directory structure, ...)?
You should partition the data into small pieces, laid out as partition directories (for example by date).
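With partitionBy as above, the layout in HDFS ends up roughly like this (illustrative paths only):

hdfs:///data/events_parquet/
    event_date=2016-08-27/   <- Parquet part files for that day
    event_date=2016-08-28/
    event_date=2016-08-29/

A single day can then be dropped with hdfs dfs -rm -r on its directory, and queries that filter on event_date only read the matching directories.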


> On Aug 28, 2016, at 9:43 PM, Kevin Tran <kevintvh@gmail.com> wrote:
> 
> Hi,
> Does anyone know the best practices for storing data in Parquet files?
> Does a Parquet file have a size limit (e.g. 1 TB)?
> Should we use SaveMode.Append for a long-running streaming app?
> How should we store the data in HDFS (directory structure, ...)?
> 
> Thanks,
> Kevin.


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

