spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Hive From Spark: Jdbc VS sparkContext
Date Sun, 15 Oct 2017 17:24:18 GMT
Hi Nicolas,

Without the Hive Thrift server, if you try to run a SELECT * on a table
which has around 10,000 partitions, Spark will give you some surprises.
Presto works fine in these scenarios, and I am sure the Spark community
will soon learn from Presto's algorithms.


Regards,
Gourav

On Sun, Oct 15, 2017 at 3:43 PM, Nicolas Paris <niparisco@gmail.com> wrote:

> > I do not think that SPARK will automatically determine the partitions.
> > Actually it does not automatically determine the partitions. In case a
> > table has a few million records, it all goes through the driver.
>
> Hi Gourav
>
> Actually, Spark's JDBC data source is able to deal directly with partitions:
> Spark creates a JDBC connection for each partition.
>
> All the details are explained in this post:
> http://www.gatorsmile.io/numpartitionsinjdbc/
>
> There is also an example with the Greenplum database:
> http://engineering.pivotal.io/post/getting-started-with-greenplum-spark/
>
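To illustrate the per-partition behaviour Nicolas describes, here is a minimal Python sketch of how a numeric column range can be split into one WHERE clause per partition, each of which would then be run as its own query over its own JDBC connection. It is a simplified illustration in the spirit of Spark's partitionColumn, lowerBound, upperBound and numPartitions read options, not Spark's actual implementation; the function names and the table name `events` are hypothetical.

```python
def jdbc_partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Split [lower_bound, upper_bound) into one range predicate per partition.

    Simplified sketch: the bounds only decide the stride, they do not filter
    rows. The first predicate also catches NULLs and values below the range,
    the last catches everything at or above its start, so no row is lost.
    """
    if num_partitions < 1 or upper_bound <= lower_bound:
        raise ValueError("need num_partitions >= 1 and upper_bound > lower_bound")
    # Never create more partitions than there are integer values in the range.
    num_partitions = min(num_partitions, upper_bound - lower_bound)
    if num_partitions == 1:
        return ["1=1"]  # a single partition reads the whole table
    stride = (upper_bound - lower_bound) // num_partitions
    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        start = current
        current += stride
        if i == 0:
            predicates.append(f"{column} < {current} OR {column} IS NULL")
        elif i == num_partitions - 1:
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {current}")
    return predicates


def partition_queries(table, column, lower_bound, upper_bound, num_partitions):
    """One SELECT per partition; each would be executed on its own connection."""
    return [
        f"SELECT * FROM {table} WHERE {pred}"
        for pred in jdbc_partition_predicates(column, lower_bound, upper_bound, num_partitions)
    ]


if __name__ == "__main__":
    for query in partition_queries("events", "id", 0, 100, 4):
        print(query)
```

With these inputs the sketch produces four queries covering `id < 25 OR id IS NULL`, `25 <= id < 50`, `50 <= id < 75` and `id >= 75`. In Spark itself the equivalent read is configured through the partitionColumn, lowerBound, upperBound and numPartitions options of the JDBC source, and without them the whole table is read through a single connection.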
