spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Avro or Parquet ?
Date Sun, 07 Jun 2015 13:46:54 GMT
Usually Parquet can be more efficient because of its columnar nature. 
Say your table has 10 columns but your join query only touches 3 of 
them: Parquet reads only those 3 columns from disk, while Avro must 
load all of the data.
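
A minimal sketch of what that looks like with the Spark 1.4 DataFrame 
API (the paths, column names, and the Databricks spark-avro package 
below are placeholders, not from this thread): convert the Avro data to 
Parquet once, then have later joins select only the columns they need 
so the Parquet reader can skip the rest on disk.

    // Sketch only; assumes spark-shell, where `sc` is already defined,
    // and the spark-avro package on the classpath.
    import org.apache.spark.sql.SQLContext
    val sqlContext = new SQLContext(sc)

    // One-time conversion of the existing Avro data to Parquet
    // (hypothetical paths).
    val avroDf = sqlContext.read.format("com.databricks.spark.avro")
      .load("/data/events_avro")
    avroDf.write.parquet("/data/events_parquet")

    // Later joins read back only the columns they touch; with Parquet
    // the remaining columns are never pulled off disk.
    val events = sqlContext.read.parquet("/data/events_parquet")
    val users  = sqlContext.read.parquet("/data/users_parquet")
    val joined = events.select("user_id", "event_type", "ts") // 3 of 10 columns
      .join(users.select("user_id", "country"), "user_id")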

Cheng

On 6/5/15 3:00 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> We currently have data in avro format and we do joins between avro and 
> sequence file data.
> Will storing these datasets in Parquet make joins any faster?
>
> The dataset sizes are between 500 and 1000 GB.
> -- 
> Deepak
>


