spark-dev mailing list archives

From Michael Segel <>
Subject Re: Spark Thrift Server Concurrency
Date Thu, 23 Jun 2016 17:42:25 GMT
There are a lot of moving parts and a lot of unknowns in your description, besides the question of which versions you are running.

How many executors, and how many cores? How much memory?
Are you persisting (memory and disk) or just caching (memory)?

During execution, against the same tables, are you seeing a lot of data shuffling for some
queries and not others?
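To make the executor/core question concrete, here is a toy model in Python. It is a sketch under stated assumptions, not how Spark is implemented: each query is treated as a single job of identical, equally long tasks, and jobs are served in submission order, which is how Spark's default FIFO scheduler prioritizes jobs that compete for the same executors. The function name fifo_completion_times and all of the numbers are hypothetical. When the 20 queries together need more tasks than there are executor cores, later queries wait for earlier ones in waves, which gives staggered finish times.

```python
import heapq


def fifo_completion_times(n_queries, tasks_per_query, executors, cores_per_executor,
                          task_ms=100):
    """Toy FIFO model: completion time (ms) of each of n_queries identical queries.

    Hypothetical assumptions: each query is one job of `tasks_per_query` tasks,
    every task takes `task_ms`, jobs are served strictly in submission order,
    and each executor core runs one task at a time.
    """
    slots = executors * cores_per_executor
    # Min-heap holding the time at which each task slot next becomes free.
    free_at = [0] * slots
    heapq.heapify(free_at)
    completions = []
    for _ in range(n_queries):
        last_end = 0
        # FIFO: every task of this query is placed before any task of the next one.
        for _ in range(tasks_per_query):
            start = heapq.heappop(free_at)  # earliest available slot
            end = start + task_ms
            heapq.heappush(free_at, end)
            last_end = max(last_end, end)
        completions.append(last_end)
    return completions


# Hypothetical cluster: 10 executors x 4 cores, 200 tasks per query.
times = fifo_completion_times(n_queries=20, tasks_per_query=200,
                              executors=10, cores_per_executor=4)
print(times[:3], "...", times[-1])  # prints [500, 1000, 1500] ... 10000
```

With 40 slots and 200 tasks per query (200 is the default of spark.sql.shuffle.partitions, which sets the number of post-shuffle partitions for joins and aggregations such as DISTINCT), the model finishes the first query at 500 ms and the twentieth at 10,000 ms, each 500 ms after the previous one. In this model the levers are more slots, fewer shuffle tasks per query, or spark.scheduler.mode=FAIR, which shares slots across concurrent jobs instead of draining them in submission order (it does not add capacity).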

It sounds like an interesting problem… 

> On Jun 23, 2016, at 5:21 AM, Prabhu Joseph <> wrote:
> Hi All,
>    When submitting the same SQL query 20 times in parallel to the Spark Thrift Server, some
> queries execute in under a second and others take more than 2 seconds. The Spark Thrift
> Server logs show that all 20 queries were submitted at the same time (16/06/23 12:12:01),
> but the result schemas are logged at different times.
> 16/06/23 12:12:01 INFO SparkExecuteStatementOperation: Running query 'select distinct val2 from philips1 where key>=1000 and key<=1500
> 16/06/23 12:12:02 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2110)
> 16/06/23 12:12:03 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2182)
> 16/06/23 12:12:04 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2344)
> 16/06/23 12:12:05 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2362)
> There are sufficient executors running on YARN; the concurrency is limited by the single
> driver. How can the concurrency be improved, and what are the best practices?
> Thanks,
> Prabhu Joseph
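The quoted log lines have one-second resolution and carry no statement id, so the spread can only be measured relative to the first submission. A small Python sketch (the function name schema_delays is hypothetical) reads lines in the format shown above and reports, for each "Result Schema" line, how many seconds after the first "Running query" line it was logged:

```python
from datetime import datetime

# The five log lines quoted above (quote markers removed; query text as shown).
SAMPLE_LOG = """\
16/06/23 12:12:01 INFO SparkExecuteStatementOperation: Running query 'select distinct val2 from philips1 where key>=1000 and key<=1500
16/06/23 12:12:02 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2110)
16/06/23 12:12:03 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2182)
16/06/23 12:12:04 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2344)
16/06/23 12:12:05 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2362)
"""

TIME_FORMAT = "%y/%m/%d %H:%M:%S"  # e.g. "16/06/23 12:12:01"
STAMP_LEN = 17  # number of characters in such a timestamp


def schema_delays(log_text):
    """Seconds from the first 'Running query' line to each later 'Result Schema' line."""
    first_submit = None
    delays = []
    for raw in log_text.splitlines():
        line = raw.lstrip("> ")  # tolerate e-mail quote markers
        try:
            stamp = datetime.strptime(line[:STAMP_LEN], TIME_FORMAT)
        except ValueError:
            continue  # not a timestamped log line
        if "Running query" in line and first_submit is None:
            first_submit = stamp
        elif "Result Schema" in line and first_submit is not None:
            delays.append((stamp - first_submit).total_seconds())
    return delays


print(schema_delays(SAMPLE_LOG))  # prints [1.0, 2.0, 3.0, 4.0]
```

Only four of the twenty "Result Schema" lines are quoted, so running the same function over the full Thrift Server log would show the whole distribution. It measures when each line was logged, not end-to-end query latency.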
