spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Loading tables using parquetFile vs. loading tables from Hive metastore with Parquet serde
Date Mon, 16 Feb 2015 08:29:37 GMT
Hi Jianshi,

When accessing a Hive table that uses the Parquet SerDe, Spark SQL tries to
convert it to use Spark SQL's native Parquet support for better performance.
And yes, both predicate push-down and column pruning are applied here. In
1.3.0, we'll also cover the write path, except for writes to partitioned
tables.
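
For illustration, the two access paths being compared might look like the
sketch below. It assumes Spark 1.2-era APIs, and the path, table name and
column names are hypothetical placeholders. As far as I know the conversion is
controlled by spark.sql.hive.convertMetastoreParquet, and Parquet filter
push-down by spark.sql.parquet.filterPushdown; please check the defaults for
your version rather than relying on this.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ParquetAccessPaths {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("parquet-access-paths"))
    // HiveContext extends SQLContext, so it can serve both access paths.
    val hiveContext = new HiveContext(sc)

    // Assumption: these settings govern the conversion to native Parquet
    // support and Parquet filter push-down; defaults vary by Spark version.
    hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")
    hiveContext.setConf("spark.sql.parquet.filterPushdown", "true")

    // Path 1: read the Parquet files directly (hypothetical location).
    val direct = hiveContext.parquetFile("/path/to/warehouse/my_table")
    direct.registerTempTable("my_table_direct")
    hiveContext.sql("SELECT col_a FROM my_table_direct WHERE col_b > 10").collect()

    // Path 2: query the table registered in the Hive metastore
    // (hypothetical table stored as Parquet).
    hiveContext.sql("SELECT col_a FROM my_table WHERE col_b > 10").collect()

    sc.stop()
  }
}
```

With the conversion enabled, both queries should go through the same native
Parquet scan, so only col_a and col_b need to be read from disk.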

Cheng

On Sun Feb 15 2015 at 9:22:15 AM Jianshi Huang <jianshi.huang@gmail.com>
wrote:

> Hi,
>
> If I have a table in the Hive metastore saved as Parquet and I want to use
> it in Spark, it seems Spark will use Hive's Parquet SerDe to load the actual
> data.
>
> So is there any difference here? Will predicate pushdown, column pruning and
> future Parquet optimizations in Spark SQL still apply when the table is
> loaded through the Hive SerDe?
>
>
> Thanks,
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
