spark-user mailing list archives

From nancy henry <nancyhenry6...@gmail.com>
Subject Re: spark-sql use case beginner question
Date Thu, 09 Mar 2017 07:28:10 GMT
Okay, what is the difference between setting hive.execution.engine=spark
and
running the script through hivecontext.sql?
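As an illustration of the hc.sql approach discussed in this thread, here is a minimal Scala sketch (a sketch only, not the poster's actual code: it assumes a Spark 1.x HiveContext, the file name is hypothetical, and the statement splitting on ';' is naive and would break on semicolons inside string literals):

```scala
// Hypothetical sketch: run a multi-statement Hive script through HiveContext.
object RunHiveScript {
  // Split a script into individual statements. Naive: splits on every ';',
  // even one inside a quoted string, so it only suits simple scripts.
  def splitStatements(script: String): Seq[String] =
    script.split(";").map(_.trim).filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // The Spark calls are commented out so the sketch runs without a
    // cluster; uncomment them inside a real Spark application.
    // val sc = new org.apache.spark.SparkContext(
    //   new org.apache.spark.SparkConf().setAppName("hive-script"))
    // val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    // val script = scala.io.Source.fromFile(args(0)).mkString
    // splitStatements(script).foreach(stmt => hc.sql(stmt))
    println(splitStatements("use db; select * from t;").mkString(" | "))
  }
}
```

The practical difference: with hc.sql, Spark SQL itself parses and executes each statement (using the Hive metastore and HiveQL compatibility), whereas hive.execution.engine=spark leaves Hive as the driver that compiles the query and merely swaps MR for Spark as the execution backend.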



On Mar 9, 2017 8:52 AM, "ayan guha" <guha.ayan@gmail.com> wrote:

> Hi
>
> Subject to your versions of Hive & Spark, you may want to set
> hive.execution.engine=spark as a beeline command-line parameter, assuming you
> are running your Hive scripts through the beeline command line (which is the
> suggested practice for security purposes).
>
>
>
> On Thu, Mar 9, 2017 at 2:09 PM, nancy henry <nancyhenry6542@gmail.com>
> wrote:
>
>>
>> Hi Team,
>>
>> Basically we have all our data as Hive tables, and until now we have been
>> processing it in Hive on MR. Now that we have HiveContext, which can run
>> Hive queries on Spark, we are making all these complex Hive scripts run
>> using a hivecontext.sql(sc.textFile(hivescript)) kind of approach, i.e.
>> running Hive queries on Spark without writing anything in Scala yet. Even
>> so, just making the Hive queries run on Spark already shows a big
>> difference in time compared to running them on MR.
>>
>> So, since we already have the Hive scripts, should we make those complex
>> scripts run through hc.sql, given that hc.sql is able to do it?
>>
>> Or is that not best practice? Even though Spark can do it, is it still
>> better to load all those individual Hive tables into Spark, make RDDs, and
>> write Scala code to get the same functionality we have in Hive?
>>
>> It is becoming difficult for us to choose whether to leave it to hc.sql to
>> run the complex scripts as well, or to code it in Scala. Will the effort
>> of that manual rewrite be worth it in terms of performance?
>>
>> An example of our sample scripts:
>>
>> use db;
>> create temporary function tempfunction1 as 'com.fgh.jkl.TestFunction';
>>
>> -- create desttable in Hive
>> insert overwrite table desttable
>> select (big complex transformations and usage of hive udf)
>> from table1, table2, table3 join table4 on some complex condition
>> join table7 on another complex condition
>> where complex filtering;
>>
>> So please advise what the best approach would be, and why I should not
>> hand the entire script to HiveContext to build its own RDDs and run on
>> Spark, if it is able to do it.
>>
>> Because all the examples I see online only show hc.sql("select * from
>> table1") and nothing more complex than that.
>>
>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>
