spark-user mailing list archives

From Sourav Mazumder <sourav.mazumde...@gmail.com>
Subject Re: Spark SQL with Thrift Server is very very slow and finally failing
Date Wed, 10 Jun 2015 05:28:12 GMT
From the log file I noticed that the ExecutorLostFailure happens after the
memory used by the executor exceeds the executor memory value. However,
even if I increase the executor memory, the executor still fails; it just
takes longer to do so.

I'm wondering why an executor consumes so much memory when joining 2 Hive
tables, one with 100 MB of data (around 1 M rows) and another with 20 KB of
data (around 100 rows). Even if I increase the memory to 20 GB, the same
failure happens.

Regards,
Sourav
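
One thing that may be worth checking: the query as quoted below lists two
tables separated by a comma and has no join predicate between P and C. If
the real query has the same shape, the result is a cross product, so the
row counts above would give roughly 1 M x 100 = 100 M joined rows. The
sketch below only illustrates that effect on a small scale, using Python's
built-in sqlite3 rather than Spark. The lowercase table names mirror the
thread, but the columns are simplified and the join key project_id is a
made-up column for illustration; the actual schema is not shown in this
thread.

```python
import sqlite3


def join_row_counts(big_rows, small_rows):
    """Return (rows without a join predicate, rows with a join key)."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-ins for the two Hive tables; project_id is hypothetical.
    conn.execute(
        "CREATE TABLE pog_pre_ext (project_id INTEGER, floorplan_nm TEXT)")
    conn.execute(
        "CREATE TABLE project_calendar_ext "
        "(project_id INTEGER, project_type TEXT)")
    conn.executemany(
        "INSERT INTO pog_pre_ext VALUES (?, ?)",
        [(i % small_rows, "fp%d" % i) for i in range(big_rows)])
    conn.executemany(
        "INSERT INTO project_calendar_ext VALUES (?, ?)",
        [(i, "CR") for i in range(small_rows)])

    # Same shape as the quoted query: comma join, filter only, no join key.
    cross = conn.execute(
        "SELECT COUNT(*) FROM pog_pre_ext p, project_calendar_ext c "
        "WHERE c.project_type = 'CR'").fetchone()[0]

    # Same filter, plus an explicit join condition on the assumed key.
    keyed = conn.execute(
        "SELECT COUNT(*) FROM pog_pre_ext p "
        "JOIN project_calendar_ext c ON p.project_id = c.project_id "
        "WHERE c.project_type = 'CR'").fetchone()[0]

    conn.close()
    return cross, keyed


cross, keyed = join_row_counts(1000, 100)
print(cross, keyed)
```

With 1000 and 100 rows, the first count is the full product of the two
table sizes, while the keyed join returns one row per row of the larger
table. Whether this is what happens in the Spark job depends on the actual
query and its physical plan, which are not shown here.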

On Tue, Jun 9, 2015 at 12:58 PM, Sourav Mazumder <
sourav.mazumder00@gmail.com> wrote:

> Hi,
>
> I'm just doing a select statement which is supposed to return 10 MB data
> maximum. The driver memory is 2G and executor memory is 20 G.
>
> The query I'm trying to run is something like
>
> SELECT PROJECT_LIVE_DT, FLOORPLAN_NM, FLOORPLAN_DB_KEY
> FROM POG_PRE_EXT P, PROJECT_CALENDAR_EXT C
> WHERE PROJECT_TYPE = 'CR'
>
> Not sure what exactly you mean by physical plan.
>
> Here is the stack trace from the machine where the Thrift process is
> running.
>
> Regards,
> Sourav
>
> On Mon, Jun 8, 2015 at 11:18 PM, Cheng, Hao <hao.cheng@intel.com> wrote:
>
>> Is a large result set being returned from the Thrift Server? And can you
>> paste the SQL and the physical plan?
>>
>>
>>
>> *From:* Ted Yu [mailto:yuzhihong@gmail.com]
>> *Sent:* Tuesday, June 9, 2015 12:01 PM
>> *To:* Sourav Mazumder
>> *Cc:* user
>> *Subject:* Re: Spark SQL with Thrift Server is very very slow and
>> finally failing
>>
>>
>>
>> Which Spark release are you using ?
>>
>>
>>
>> Can you pastebin the stack trace w.r.t. ExecutorLostFailure ?
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Mon, Jun 8, 2015 at 8:52 PM, Sourav Mazumder <
>> sourav.mazumder00@gmail.com> wrote:
>>
>>      Hi,
>>
>> I am trying to run a SQL query from a JDBC driver using Spark's Thrift Server.
>>
>> I'm doing a join between a Hive table of around 100 GB and another
>> Hive table of around 10 KB, with a filter on a particular column.
>>
>> The query takes more than 45 minutes and then I get ExecutorLostFailure.
>> This appears to be memory related: when I increase the memory, the failure
>> still happens, only after a longer time.
>>
>> I have executor memory set to 20 GB, Spark driver memory to 2 GB, executor
>> instances to 2 and executor cores to 2.
>>
>> Running the job using Yarn with master as 'yarn-client'.
>>
>> Any idea if I'm missing any other configuration ?
>>
>> Regards,
>>
>> Sourav
>>
>>
>>
>
>
