spark-user mailing list archives

From Nicolas Paris <nipari...@gmail.com>
Subject Re: Hive From Spark: Jdbc VS sparkContext
Date Sun, 15 Oct 2017 11:55:11 GMT
On 03 Oct 2017 at 20:08, Nicolas Paris wrote:
> I wonder what the differences are between accessing Hive tables in two different ways:
> - with jdbc access
> - with sparkContext

Well, there is also a third way to access Hive data from Spark:
- with direct file access (here, ORC format)


For example:

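// HiveContext is deprecated since Spark 2.0 in favour of SparkSession, but it still works here.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
// Let the ORC reader skip data that cannot match the WHERE clause.
sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
// Read the ORC files directly from HDFS rather than through a Hive table.
val people = sqlContext.read.format("orc").load("hdfs://cluster//orc_people")
people.createOrReplaceTempView("people")
sqlContext.sql("SELECT count(1) FROM people WHERE ...").show()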


This method looks much faster than both:
- with jdbc access
- with sparkContext
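
To put rough numbers on that impression, something like the following could be pasted into spark-shell. It is only a sketch: `Timing.timed` is a hypothetical helper (not part of Spark), and the table name `default.people` in the commented usage is an assumption for illustration. Spark evaluates lazily, so the action (`collect()`, `count()`, ...) has to run inside the timed block, and a single run is sensitive to caching and JVM warm-up, so each query is worth repeating a few times.

```scala
// Hypothetical helper, not part of Spark: runs a block once and
// returns its result together with the elapsed wall-clock time in ms.
object Timing {
  def timed[A](block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block // the block (including any Spark action) runs here
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }
}

// Possible usage in spark-shell, assuming the "people" temp view has been
// registered from the ORC files, and assuming the same data is also exposed
// as a Hive table called default.people (hypothetical name):
//
//   val (orcRows, orcMs)   = Timing.timed { sqlContext.sql("SELECT count(1) FROM people").collect() }
//   val (hiveRows, hiveMs) = Timing.timed { sqlContext.sql("SELECT count(1) FROM default.people").collect() }
//   println(s"direct ORC: $orcMs ms, Hive table: $hiveMs ms")
```

The by-name parameter (`block: => A`) is what delays execution until inside the timer; passing an already computed value would measure nothing.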

Does anyone have experience with this?



