spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mich Talebzadeh <>
Subject Re: spark sql StackOverflowError
Date Sat, 29 May 2021 19:34:08 GMT
On top of my head have you tried windowing function, something like below

sqltext = """
SELECT distinct *
      SELECT security
      , MIN(price) OVER (PARTITION BY security) AS MINPrice
      , AVG(price) OVER (PARTITION BY security) AS AVGPric
      , MAX(price) OVER (PARTITION BY security) AS MAXPrice
      FROM test.prices WHERE security > " "
     ) tmp
ORDER by security

Replace price with date and security with tag_id


   view my Linkedin profile

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On Sat, 29 May 2021 at 14:53, Deemo <> wrote:

> When I use spark SQL in hive, I got en error like that
> The SQL query what is executed is here.The table is partitioned by tag_id
> and day, I want to get the last day in each tag_ids.
> select tag_id,max(day) as last
> from dim.dim_t_group_result
> group by tag_id
> In dev environment I resolve it by set the following property.But it
> doesn't work in real-environment. The dev only has 1000 partitions but the
> real has 80000+
> spark.executor.extraJavaOptions="-Xss60M"
> The memory of executor is 6G and 4 v-core.
> Current I use an other way to get what I wanted. But it's to slow.
> select /*+ BROADCAST(b) */ cast(a.tag_id as string),max( as last
> from dim.dim_t_group_tag_task_result a
> join tag_id_list b on a.tag_id = b.tag_id
> group by a.tag_id
> How could I do something with it? Thanks guys

View raw message