spark-user mailing list archives

From patcharee <Patcharee.Thong...@uni.no>
Subject sql query orc slow
Date Thu, 08 Oct 2015 08:43:00 GMT
Hi,

I am using Spark SQL 1.5 to query a Hive table stored as partitioned ORC
files. There are about 6000 files in total, and each file is about
245 MB.

What is the difference between the two query methods below?

1. Querying the Hive table directly

hiveContext.sql("select col1, col2 from table1")

2. Reading the ORC files, registering a temp table, and querying the temp table

val c = hiveContext.read.format("orc").load("/apps/hive/warehouse/table1")
c.registerTempTable("regTable")
hiveContext.sql("select col1, col2 from regTable")

When the number of files is large (querying all 6000 files), the second
method is much slower than the first one. Any ideas why?
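
For anyone who wants to reproduce this, here is a rough sketch of how the two paths could be compared side by side. It assumes the same hiveContext, table name, and warehouse path as in the snippets above. The helpers timed, inspect, and compare are hypothetical names made up for this sketch and are not part of Spark. The sketch only prints the plans and wall-clock times; it does not by itself explain the difference.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.hive.HiveContext

// Hypothetical helper (not part of Spark): run a block and return its
// result together with the elapsed wall-clock time in seconds.
def timed[A](body: => A): (A, Double) = {
  val start = System.nanoTime()
  val result = body
  (result, (System.nanoTime() - start) / 1e9)
}

// Hypothetical helper: print the plans for a query, then time one full pass
// over its rows.
def inspect(label: String, df: DataFrame): Unit = {
  println(s"== $label ==")
  df.explain(true) // prints the logical and physical plans
  // foreach touches every row on the executors without collecting anything
  // to the driver, so the selected columns really have to be read.
  val (_, secs) = timed { df.foreach(_ => ()) }
  println(f"$label: $secs%.1f s")
}

// Assumes hiveContext is an existing HiveContext, as in the snippets above.
def compare(hiveContext: HiveContext): Unit = {
  // 1. Query through the Hive metastore table.
  inspect("hive table", hiveContext.sql("select col1, col2 from table1"))

  // 2. Load the ORC files by path and query the registered temp table.
  val c = hiveContext.read.format("orc").load("/apps/hive/warehouse/table1")
  c.registerTempTable("regTable")
  inspect("orc temp table", hiveContext.sql("select col1, col2 from regTable"))
}
```

Pasting this into spark-shell (with :paste) and calling compare(hiveContext) should print both plans next to their timings, which may show whether the two methods end up with different scan operators or spend different amounts of time before the first task starts.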

BR,



