spark-user mailing list archives

From Meetu Maltiar <meetu.malt...@gmail.com>
Subject Re: Handling Hive Table With large number of rows
Date Mon, 08 Feb 2016 06:45:15 GMT
Thanks Jörn,

We have to construct an XML file at an HDFS location from a couple of Hive
tables that join on one key.
The data in both tables is large, so I was wondering what the right approach
would be.
XML creation will also be tricky, as we cannot hold all the objects in memory.
Being stuck on the old Spark 1.2.1 is a bummer; agreed, no one can justify it.
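
To illustrate the "cannot hold objects in memory" constraint, here is a minimal sketch (not from our code base) of streaming XML generation: each row is written out as soon as it is read, so only one row is in memory at a time. The function name write_rows_as_xml, the tag names, and the assumption that rows arrive as dicts are all hypothetical; how this would be wired into a Spark 1.2.1 job and pointed at an HDFS output stream is not shown.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_rows_as_xml(rows, out, root="records", row_tag="record"):
    """Stream an iterable of dict rows to `out` as XML.

    `rows` can be any iterator (hypothetically, the joined rows), and is
    consumed one row at a time, so no full result set is materialized.
    Column names are assumed to be valid XML element names.
    """
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement(root, {})
    count = 0
    for row in rows:
        gen.startElement(row_tag, {})
        for name, value in row.items():
            gen.startElement(name, {})
            # characters() escapes &, < and > in the text content
            gen.characters("" if value is None else str(value))
            gen.endElement(name)
        gen.endElement(row_tag)
        count += 1
    gen.endElement(root)
    gen.endDocument()
    return count


if __name__ == "__main__":
    # Stand-in for the joined Hive rows: a generator, never a full list.
    sample = ({"id": i, "name": "row %d" % i} for i in range(3))
    buf = io.StringIO()
    write_rows_as_xml(sample, buf)
    print(buf.getvalue())
```

The sketch writes to any file-like object, so the in-memory buffer above is only for demonstration; a real job would pass a stream that writes directly to storage.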





On Mon, Feb 8, 2016 at 11:53 AM, Jörn Franke <jornfranke@gmail.com> wrote:

> Can you provide more details? Your use case does not sound like it needs Spark.
> Your version is too old anyway. It does not make sense to develop with
> 1.2.1 now. There is no "project limitation" that can justify this.
>
> > On 08 Feb 2016, at 06:48, Meetu Maltiar <meetu.maltiar@gmail.com> wrote:
> >
> > Hi,
> >
> > I am working on an application that reads a single Hive table, does
> > some manipulations on each row of it, and finally constructs an XML file.
> > The Hive table will be a large data set, with no chance of fitting in
> > memory. I intend to use SparkSQL 1.2.1 (due to project limitations).
> > Any pointers on handling this large data set would be helpful
> > (fetch size…).
> >
> > Thanks in advance.
> >
> > Kind Regards,
> > Meetu Maltiar
>



-- 
Meetu Maltiar,
Mobile-09717005168
