spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Hive From Spark: Jdbc VS sparkContext
Date Sun, 15 Oct 2017 13:34:44 GMT
Hi Nicolas,

What if the table has partitions and sub-partitions, and you do not want to
access the entire data set?


Regards,
Gourav
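
For what it is worth, here is a minimal sketch of how partition pruning could still
work with the direct ORC read, assuming the table root uses Hive-style partition
directories (e.g. hdfs://cluster/orc_people/year=2017/month=10/) and a Spark 2.x
spark-shell session where `spark` is predefined; the path and the partition column
names are only illustrative:

// Illustrative sketch only: path and partition column names are assumptions.
// With Hive-style partition directories under the table root, Spark discovers
// "year" and "month" as partition columns when it loads that root path.
val people = spark.read.orc("hdfs://cluster/orc_people")

// A filter on the partition columns prunes whole directories before any ORC
// file is opened, so only the requested partitions/sub-partitions are scanned.
val matching = people.filter("year = 2017 AND month = 10").count()
println(matching)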

On Sun, Oct 15, 2017 at 12:55 PM, Nicolas Paris <niparisco@gmail.com> wrote:

> On 03 Oct 2017 at 20:08, Nicolas Paris wrote:
> > I wonder about the differences between accessing Hive tables in two different ways:
> > - with jdbc access
> > - with sparkContext
>
> Well, there is also a third way to access the Hive data from Spark:
> - with direct file access (here ORC format)
>
>
> For example:
>
> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
> sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
> val people = sqlContext.read.format("orc").load("hdfs://cluster//orc_people")
> people.createOrReplaceTempView("people")
> sqlContext.sql("SELECT count(1) FROM people WHERE ...").show()
>
>
> This method looks much faster than both:
> - with jdbc access
> - with sparkContext
>
> Any experience with that?
>
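
For comparison with the snippet quoted above, here is a minimal sketch of the
"sparkContext" (metastore) route it is being measured against, assuming a Spark 2.x
session built with Hive support and a Hive table named people (table name and
filter are illustrative); with this route the Hive metastore, not Spark's own file
listing, decides which partitions get scanned:

// Sketch under assumptions: the table name and the WHERE clause are illustrative.
// Spark resolves "people" through the Hive metastore, which also handles
// partition pruning for the WHERE clause before any files are read.
val spark = org.apache.spark.sql.SparkSession.builder()
  .appName("hive-metastore-access")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("SELECT count(1) FROM people WHERE year = 2017").show()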
