spark-user mailing list archives

From Don Drake <dondr...@gmail.com>
Subject Re: AVRO vs Parquet
Date Fri, 04 Mar 2016 04:29:09 GMT
My tests show Parquet has better performance than Avro in just about every
test.  It really shines when you are querying a subset of columns in a wide
table.

-Don
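
To illustrate the point about querying a subset of columns in a wide table, here is a toy sketch in plain Python. It is not Parquet or Avro and it does not use Spark: the table size, the fixed 8-byte cells, and the names `scan_row_layout` and `scan_col_layout` are all assumptions made for illustration. It only compares how many bytes a reader has to touch to fetch 2 of 50 columns when the same data is laid out row by row versus column by column.

```python
import struct

NUM_ROWS = 1000
NUM_COLS = 50        # a "wide" table (assumed size, for illustration only)
WANTED = [3, 17]     # hypothetical query that touches 2 of the 50 columns
CELL = 8             # every cell is stored as an 8-byte integer


def cell(row, col):
    """Encode one cell; the value is unique per (row, col) so results can be checked."""
    return struct.pack("<q", row * NUM_COLS + col)


# Row-oriented layout: all columns of row 0, then all columns of row 1, ...
row_file = b"".join(cell(r, c) for r in range(NUM_ROWS) for c in range(NUM_COLS))

# Column-oriented layout: all rows of column 0, then all rows of column 1, ...
col_file = b"".join(cell(r, c) for c in range(NUM_COLS) for r in range(NUM_ROWS))


def scan_row_layout(data, wanted):
    """Read every full record, then keep only the wanted columns."""
    bytes_read = 0
    out = {c: [] for c in wanted}
    record_size = NUM_COLS * CELL
    for r in range(NUM_ROWS):
        record = data[r * record_size:(r + 1) * record_size]
        bytes_read += len(record)
        for c in wanted:
            out[c].append(struct.unpack_from("<q", record, c * CELL)[0])
    return out, bytes_read


def scan_col_layout(data, wanted):
    """Read only the contiguous chunks that belong to the wanted columns."""
    bytes_read = 0
    out = {}
    chunk_size = NUM_ROWS * CELL
    for c in wanted:
        chunk = data[c * chunk_size:(c + 1) * chunk_size]
        bytes_read += len(chunk)
        out[c] = [v[0] for v in struct.iter_unpack("<q", chunk)]
    return out, bytes_read


row_result, row_bytes = scan_row_layout(row_file, WANTED)
col_result, col_bytes = scan_col_layout(col_file, WANTED)

print("same values returned:", row_result == col_result)
print("bytes read, row layout:   ", row_bytes)
print("bytes read, column layout:", col_bytes)
```

Both scans return the same values, but the column layout reads only the chunks for the requested columns. Real formats add compression, encodings, and metadata, so actual ratios will differ from this toy model.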

On Wed, Mar 2, 2016 at 3:49 PM, Timothy Spann <tim.spann@airisdata.com>
wrote:

> Which format is the best format for SparkSQL adhoc queries and general
> data storage?
>
> There are lots of specialized cases, but generally queries access some but
> not all of the available columns, over a reasonable subset of the data.
>
> I am leaning towards Parquet as it has great support in Spark.
>
> I also have to consider any file on HDFS may be accessed from other tools
> like Hive, Impala, HAWQ.
>
> Suggestions?
> —
> airis.DATA
> Timothy Spann, Senior Solutions Architect
> C: 609-250-5894
> http://airisdata.com/
> http://meetup.com/nj-datascience
>
>
>


-- 
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake
800-733-2143
