spark-user mailing list archives

From Pralabh Kumar <pralabhku...@gmail.com>
Subject Re: Hive on Spark vs Spark on Hive(HiveContext)
Date Thu, 01 Jul 2021 10:50:20 GMT
Hi Mich,

Thanks for replying. Your answer really helps. The comparison was done in
2016; I would like to know how the two compare today with Spark 3.0.

Also, what you are suggesting is to migrate the queries to Spark itself
(HiveContext rather than Hive on Spark), which is what Facebook also did.
Is that understanding correct?
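
On the point about testing what can be migrated, a rough sketch of how that check might look is below. This is only an illustration under stated assumptions: `check_queries` and `run_sql` are hypothetical names (not part of Spark or Hive), and the commented wiring assumes PySpark with a Hive metastore configured.

```python
from typing import Callable, Dict, Iterable, List


def check_queries(run_sql: Callable[[str], object],
                  queries: Iterable[str]) -> Dict[str, List]:
    """Run each legacy query through run_sql and sort it into passed/failed.

    run_sql is any callable that raises when a query cannot be handled.
    """
    report: Dict[str, List] = {"passed": [], "failed": []}
    for query in queries:
        try:
            run_sql(query)
            report["passed"].append(query)
        except Exception as exc:  # parse and analysis errors raise different types
            report["failed"].append((query, str(exc)))
    return report


# Assumed wiring against a real cluster (needs pyspark and a Hive metastore):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.enableHiveSupport().getOrCreate()
#   report = check_queries(spark.sql, legacy_queries)
# spark.sql parses and analyzes a statement eagerly. A SELECT is not executed
# until an action is called, but commands such as INSERT run immediately, so
# this is best pointed at a test database.
```

Passing the runner in as a callable keeps the harness independent of the Spark version, so the same list of queries could be checked against more than one setup.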

Regards
Pralabh

On Thu, 1 Jul 2021, 15:44 Mich Talebzadeh, <mich.talebzadeh@gmail.com>
wrote:

> Hi Pralabh,
>
> This question has been asked before :)
>
> A few years ago (late 2016), I gave a presentation for Hortonworks on
> running Hive queries on the Spark execution engine.
>
>
> https://www.slideshare.net/MichTalebzadeh1/query-engines-for-hive-mr-spark-tez-with-llap-considerations
>
> The main issue you will face is version compatibility between Hive and
> Spark.
>
> My suggestion would be to use Spark as the massively parallel processing
> engine and Hive as the storage layer. However, you need to test what can
> and cannot be migrated.
>
> HTH
>
>
> Mich
>
> On Thu, 1 Jul 2021 at 10:52, Pralabh Kumar <pralabhkumar@gmail.com> wrote:
>
>> Hi Dev,
>>
>> I have thousands of legacy Hive queries. As part of a move to Spark, we
>> are planning to migrate these Hive queries to Spark. There are two
>> approaches:
>>
>>
>>    1. One is Hive on Spark, which is similar to changing the execution
>>    engine for Hive queries, as with Tez.
>>    2. The other is migrating the Hive queries to HiveContext/Spark SQL, an
>>    approach used by Facebook and presented at a Spark conference.
>>    https://databricks.com/session/experiences-migrating-hive-workload-to-sparksql#:~:text=Spark%20SQL%20in%20Apache%20Spark,SQL%20with%20minimal%20user%20intervention
>>    .
>>
>>
>> Can you please advise which option to go for? I am personally inclined
>> to go for option 2. It also allows the use of the latest Spark.
>>
>> Please help me with this, as there are not many comparisons available
>> online that take Spark 3.0 into account.
>>
>> Regards
>> Pralabh Kumar
>>
>>
>>
