spark-user mailing list archives

From Nicolas Paris <nipari...@gmail.com>
Subject Re: Hive From Spark: Jdbc VS sparkContext
Date Fri, 13 Oct 2017 07:22:37 GMT
> In case a table has a few
> million records, it all goes through the driver.

This is clear in JDBC mode: the driver gets all the rows and then
spreads the RDD over the executors.
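
The JDBC behaviour described above can be pictured with a toy model in
plain Python. This is not Spark code: the function names are hypothetical.
One process materializes every row, then slices the list into contiguous
chunks, roughly the way SparkContext.parallelize slices a local collection.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def fetch_all_on_driver(rows: Iterable[T]) -> List[T]:
    """Hypothetical stand-in for one JDBC result set read by a single process."""
    # Every row is held in the driver's memory before anything is distributed.
    return list(rows)


def spread_over_executors(rows: List[T], num_partitions: int) -> List[List[T]]:
    """Split the driver-held rows into near-equal contiguous slices."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    n = len(rows)
    # Boundaries i*n//k .. (i+1)*n//k give contiguous chunks whose sizes differ by at most one.
    return [rows[i * n // num_partitions:(i + 1) * n // num_partitions]
            for i in range(num_partitions)]


driver_rows = fetch_all_on_driver(range(10))
partitions = spread_over_executors(driver_rows, 3)
print(partitions)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

The point of the model is the memory profile. With a few million records,
fetch_all_on_driver must hold all of them at once, however many partitions
follow.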

I'd say that most use cases use SQL to aggregate huge datasets and
retrieve a small number of rows, which are then transformed for ML tasks.
In that case, JDBC relies on the robustness of Hive to produce a small
aggregated dataset and hands only that to Spark, whereas Spark SQL uses
RDDs to produce the small dataset from the huge one.
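
The "small dataset from a huge one" step can be sketched as two-phase
(partial, then final) aggregation. This is the general pattern distributed
SQL engines, Spark SQL included, use for GROUP BY. The sketch below is plain
Python, not Spark's implementation. The (key, value) row layout and the
function names are hypothetical. Each partition is reduced on its own, and
only the small per-key results are merged.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # hypothetical row layout: (group key, value)


def partial_sums(partition: Iterable[Row]) -> Dict[str, int]:
    """Phase 1: reduce one partition to a single running sum per key.

    The partition is consumed as an iterator, so memory grows with the
    number of distinct keys, not with the number of rows.
    """
    acc: Dict[str, int] = {}
    for key, value in partition:
        acc[key] = acc.get(key, 0) + value
    return acc


def merge_partials(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Phase 2: combine the small per-partition results into the final answer."""
    total: Dict[str, int] = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = total.get(key, 0) + value
    return total


data: List[List[Row]] = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
    [("a", 6)],
]
partials = [partial_sums(p) for p in data]
result = merge_partials(partials)
print(result)  # {'a': 10, 'b': 6, 'c': 5}
```

Six input rows collapse to three output rows. In the real case, the
difference between the input size and the aggregated result is what
matters for the ML step that follows.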

It is not very clear how Spark SQL deals with a huge Hive table. Does it
load everything into memory and crash, or does this never happen?



