spark-user mailing list archives

From Mohamed Nadjib MAMI <m...@iai.uni-bonn.de>
Subject Re: Spark SQL - java.lang.StackOverflowError after caching table
Date Thu, 24 Mar 2016 22:33:07 GMT
I ran:

sqlContext.cacheTable("product")
val df = sqlContext.sql("...complex query...")
df.explain(true)

...and got: http://pastebin.com/k9skERsr

...where each "[...]" stands for a huge list of records from the queried
table (product).

The query is of the following form:
    SELECT distinct p.id, p.`aaa`, p.`bbb`
    FROM product p,
         (SELECT distinct p1.id FROM product p1 WHERE p1.`ccc`='fff') p2,
         (SELECT distinct p3.id FROM product p3 WHERE p3.`ccc`='ddd') p4
    WHERE p.`eee` = '1' AND p.id=p2.id AND p.`eee` > 137 AND p4.id=p.id
    UNION
    SELECT distinct p.id, p.`bbb`, p.`bbb`
    FROM product p,
         (SELECT distinct p1.id FROM product p1 WHERE p1.`ccc`='fff') p2,
         (SELECT distinct p5.id FROM product p5 WHERE p5.`ccc`='ggg') p6
    WHERE p.`eee` = '1' AND p.id=p2.id AND p.`hhh` > 93 AND p6.id=p.id
    ORDER BY p.`bbb` LIMIT 10
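Ted Yu's reply points to SPARK-13657 (parsing very long AND/OR expressions). The thread does not confirm that this is the cause here, but the general failure mode is easy to show: code that walks a deeply nested, left-leaning AND tree recursively uses one stack frame per level, so a long enough chain throws java.lang.StackOverflowError. The sketch below is a minimal illustration under that assumption; the three classes `Expr`, `Pred` and `And` are hypothetical stand-ins, not Spark's Catalyst classes. It contrasts a recursive walk with an iterative one that keeps its work list on the heap.

```scala
// Hypothetical, simplified expression tree for illustration only;
// these are NOT Spark's Catalyst classes.
sealed trait Expr
final case class Pred(name: String) extends Expr
final case class And(left: Expr, right: Expr) extends Expr

object DeepAndChain {
  // Builds a left-deep chain And(And(And(c0, c1), c2), ...), the shape a
  // left-to-right fold over "c0 AND c1 AND c2 ..." produces.
  def leftDeepAnd(n: Int): Expr =
    (1 until n).foldLeft[Expr](Pred("c0"))((acc, i) => And(acc, Pred(s"c$i")))

  // Naive recursive walk: uses one JVM stack frame per nesting level.
  def countPredsRecursive(e: Expr): Int = e match {
    case Pred(_)   => 1
    case And(l, r) => countPredsRecursive(l) + countPredsRecursive(r)
  }

  // Iterative walk with an explicit, heap-allocated work list:
  // thread-stack usage no longer grows with the nesting depth.
  def countPredsIterative(root: Expr): Int = {
    var count = 0
    var pending: List[Expr] = List(root)
    while (pending.nonEmpty) {
      pending.head match {
        case Pred(_) =>
          count += 1
          pending = pending.tail
        case And(l, r) =>
          pending = l :: r :: pending.tail
      }
    }
    count
  }

  // True if the recursive walk ran out of thread stack.
  def overflows(e: Expr): Boolean =
    try { countPredsRecursive(e); false }
    catch { case _: StackOverflowError => true }

  def main(args: Array[String]): Unit = {
    println(countPredsRecursive(leftDeepAnd(5)))  // prints 5
    val deep = leftDeepAnd(1000000)
    // Whether this overflows depends on the thread stack size (-Xss).
    println(overflows(deep))
    println(countPredsIterative(deep))            // prints 1000000
  }
}
```

The iterative version trades JVM stack frames for list cells on the heap, which is why it copes with a chain depth that can exhaust the stack of the recursive version.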


On 24.03.2016 22:16, Ted Yu wrote:
> Can you get the output of explain(true) on the query after the
> cacheTable() call?
>
> Potentially related JIRA:
>
> [SPARK-13657] [SQL] Support parsing very long AND/OR expressions
>
>
> On Thu, Mar 24, 2016 at 12:55 PM, Mohamed Nadjib MAMI
> <mami@iai.uni-bonn.de> wrote:
>
>     Here is the stack trace: http://pastebin.com/ueHqiznH
>
>     Here's the code:
>
>         val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
>         val table = sqlContext.read.parquet("hdfs://...parquet_table")
>         table.registerTempTable("table")
>
>         sqlContext.sql("...complex query...").show() // works
>
>         sqlContext.cacheTable("table")
>
>         sqlContext.sql("...complex query...").show() // works (slower: first run after caching)
>
>         sqlContext.sql("...complex query...").show() // fails with java.lang.StackOverflowError
>
>
>
>     On 24.03.2016 13:40, Ted Yu wrote:
>>     Can you pastebin the stack trace?
>>
>>     If you can share a snippet of your code, that would give us more clues.
>>
>>     Thanks
>>
>>>     On Mar 24, 2016, at 2:43 AM, Mohamed Nadjib MAMI <mami@iai.uni-bonn.de> wrote:
>>>
>>>     Hi all,
>>>     I'm running SQL queries (sqlContext.sql()) on Parquet tables in the
>>>     spark-shell of Spark 1.5.1 and am facing a problem with table caching
>>>     (sqlContext.cacheTable()).
>>>
>>>     After I run sqlContext.cacheTable(table), sqlContext.sql(query) takes
>>>     longer the first time (understandably, since caching is lazy and the
>>>     first run materializes the cache), but it finishes and returns results.
>>>     The weird thing, however, is that when I run the same query again, I get
>>>     the error "java.lang.StackOverflowError".
>>>
>>>     I Googled it but found no reports of this error occurring with table
>>>     caching and querying.
>>>     Any hint is appreciated.
>>>
>>>
>>>     -- 
>>>     Regards, Grüße, Cordialement, Recuerdos, Saluti, προσρήσεις, 问候,
>>>     تحياتي. Mohamed Nadjib Mami
>>>     PhD Student - EIS Department - Bonn University, Germany.
>>>     Website <http://www.mohamednadjibmami.com>.
>>>     LinkedIn <http://fr.linkedin.com/in/mohamednadjibmami>.
>
