spark-dev mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Best way to Hive to Spark migration
Date Thu, 05 Apr 2018 07:02:11 GMT
And the usual advice when migrating: do not just migrate, but also optimize the ETL process design.
That brings the most benefit.

> On 5. Apr 2018, at 08:18, Jörn Franke <jornfranke@gmail.com> wrote:
> 
> Ok this is not much detail, but you are probably best off if you migrate them to SparkSQL.
> 
> It also depends on the Hive version and the Spark version. If you have a recent Hive version with Tez+LLAP, I would not expect much difference. Spark can even be less performant: Spark SQL only recently got some features, such as a cost-based optimizer.
> 
>> On 5. Apr 2018, at 08:02, Pralabh Kumar <pralabhkumar@gmail.com> wrote:
>> 
>> Hi 
>> 
>> I have a lot of ETL jobs (complex ones). Since they are SLA-critical, I am planning to migrate them to Spark.
>> 
>>> On Thu, Apr 5, 2018 at 10:46 AM, Jörn Franke <jornfranke@gmail.com> wrote:
>>> You need to provide more context on what you currently do in Hive and what you expect from the migration.
>>> 
>>>> On 5. Apr 2018, at 05:43, Pralabh Kumar <pralabhkumar@gmail.com> wrote:
>>>> 
>>>> Hi Spark group
>>>> 
>>>> What's the best way to migrate Hive to Spark?
>>>> 
>>>> 1) Use HiveContext of Spark
>>>> 2) Use Hive on Spark (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started)
>>>> 3) Migrate Hive to Calcite to Spark SQL
>>>> 
>>>> 
>>>> Regards
>>>> 
>> 
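
As a rough illustration of option 1 from the original question combined with the Spark SQL suggestion above: in Spark 2.x, HiveContext is deprecated in favour of a SparkSession built with enableHiveSupport(), so an existing HiveQL statement can often be submitted through spark.sql(). The sketch below assumes PySpark is installed and a Hive metastore is reachable. The helper names, the application name, and the table `sales_raw` are hypothetical and do not come from the thread.

```python
def build_session(app_name="hive-etl-migration"):
    """Create a SparkSession that reads tables from the Hive metastore."""
    # Imported lazily so the helpers can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)      # hypothetical application name
        .enableHiveSupport()    # takes the place of HiveContext from Spark 1.x
        .getOrCreate()
    )


def run_hive_query(spark, hql):
    """Submit an existing HiveQL statement through Spark SQL."""
    return spark.sql(hql)


def main():
    spark = build_session()
    # `sales_raw` is a hypothetical Hive table used only for illustration.
    df = run_hive_query(spark, "SELECT COUNT(*) AS n FROM sales_raw")
    df.show()
    spark.stop()
```

Calling main() on a cluster with Hive configured would run the query. Whether a given job runs unchanged depends on the HiveQL features it uses, so each migrated query should be checked against its Hive output.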
