spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Hive From Spark: Jdbc VS sparkContext
Date Tue, 10 Oct 2017 11:29:30 GMT
That is not correct, IMHO. If I am not mistaken, Spark will still load the
data in the executors, running some stats on the data itself to identify the
partitions.
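
As a rough illustration of the executor-side loading: with Spark's built-in
JDBC data source, a parallel read is normally requested through the
partitionColumn, lowerBound, upperBound and numPartitions options, and each
task then runs its own range query. The sketch below is plain Python, not
Spark code. The function name jdbc_partition_predicates is hypothetical and
the splitting logic is a simplified assumption about how such ranges can be
derived, not a copy of Spark's implementation.

```python
def jdbc_partition_predicates(column, lower, upper, num_partitions):
    """Split the range [lower, upper) into per-partition WHERE clauses.

    Simplified, hypothetical sketch: one predicate per partition, so that
    each executor task could fetch only its own slice over JDBC.
    """
    # With a single partition (or an empty range) one query reads everything.
    if num_partitions <= 1 or lower >= upper:
        return ["1=1"]

    # Never create more partitions than there are distinct integer values.
    n = min(num_partitions, upper - lower)
    stride = (upper - lower) // n

    predicates = []
    bound = lower
    for i in range(n):
        lo = None if i == 0 else bound
        bound += stride
        hi = None if i == n - 1 else bound
        if lo is None:
            # First slice is open below and also collects NULL keys.
            predicates.append(f"{column} < {hi} OR {column} IS NULL")
        elif hi is None:
            # Last slice is open above, so values past `upper` are not lost.
            predicates.append(f"{column} >= {lo}")
        else:
            predicates.append(f"{column} >= {lo} AND {column} < {hi}")
    return predicates


if __name__ == "__main__":
    for p in jdbc_partition_predicates("id", 0, 100, 4):
        print(p)
```

The point is only that the bounds decide how rows are spread across tasks;
they do not filter rows, because the first and last slices are open-ended.
Without such options, a JDBC read typically ends up in a single partition.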

On Tue, Oct 10, 2017 at 9:23 PM, 郭鹏飞 <guopengfei1987v@126.com> wrote:

>
> > On Oct 4, 2017, at 2:08 AM, Nicolas Paris <niparisco@gmail.com> wrote:
> >
> > Hi
> >
> > I am wondering about the differences between accessing Hive tables in two
> > different ways:
> > - with JDBC access
> > - with sparkContext
> >
> > I would say that JDBC is better, since it goes through Hive, which is based
> > on MapReduce / Tez and therefore works on disk.
> > Using Spark RDDs can lead to memory errors on very large datasets.
> >
> >
> > Does anybody know, or can anyone point me to relevant documentation?
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>
> JDBC will load the data into the driver node; this may slow things down and
> may cause an OOM.


-- 
Best Regards,
Ayan Guha
