spark-user mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: Spark SQL and number of task
Date Thu, 04 Aug 2016 11:25:49 GMT
Hi,

Run `sqlCtx.sql("select * .... ").explain` to print the query's execution plan.
You can also kill long-running jobs from the Spark web UI.
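For example, a minimal sketch for the spark-shell (assuming the same `sqlCtx` HiveContext and the `v_points` view from the question below; passing `true` to `explain` also prints the logical plans, which helps compare how the `or` and `in (...)` predicates are planned):

```scala
// Sketch, to be run inside spark-shell where sqlCtx already exists.
// v_points is the Cassandra-backed view from the original question.
val df = sqlCtx.sql(
  "select d.id, avg(d.avg) from v_points d where id = 90 or id = 2 group by id")

// explain(true) prints the parsed, analyzed, optimized, and physical plans;
// explain() alone prints only the physical plan.
df.explain(true)
```

Running the same call on the `in (90, 2)` variant lets you diff the two physical plans and see where the extra tasks come from.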

// maropu


On Thu, Aug 4, 2016 at 4:58 PM, Marco Colombo <ing.marco.colombo@gmail.com>
wrote:

> Hi all, I have a question about how Hive + Spark handle data.
>
> I've created a new HiveContext and I'm extracting data from Cassandra.
> I've configured spark.sql.shuffle.partitions=10.
> Now I have the following query:
>
> select d.id, avg(d.avg) from v_points d where id=90 group by id;
>
> I see that 10 tasks are submitted and execution is fast. Every id in that
> table has 2000 samples.
>
> But if I just add one more id, as in:
>
> select d.id, avg(d.avg) from v_points d where id=90 or id=2 group by id;
>
> 663 tasks are submitted and the query does not finish.
>
> But if I write the query with in (), like:
>
> select d.id, avg(d.avg) from v_points d where id in (90,2) group by id;
>
> the query is fast again.
>
> How can I get the 'execution plan' of the query?
>
> And also, how can I kill the long-running submitted tasks?
>
> Thanks all!
>



-- 
---
Takeshi Yamamuro
