spark-issues mailing list archives

From "Marco Gaido (JIRA)" <>
Subject [jira] [Commented] (SPARK-22226) Code generation fails for dataframes with 10000 columns
Date Mon, 09 Oct 2017 16:36:00 GMT


Marco Gaido commented on SPARK-22226:

[~kiszk] I am not sure that the PR you mentioned solves the same issue. I tried it and, currently,
it does not.
As you can see in [the branch I prepared|],
what I am changing is different from what is done in that PR. That PR may still end up
including a solution to this; I don't know yet what it is going to look like.
As [~srowen] pointed out, I chose a bad title for the JIRA. I am updating it with a better
one.

> Code generation fails for dataframes with 10000 columns
> -------------------------------------------------------
>                 Key: SPARK-22226
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Marco Gaido
> Code generation for very wide datasets can fail because the JVM constant pool limit is reached.
> This can have many causes. One of them is that we currently split the definitions of the
> generated methods among several {{NestedClass}} classes, but all of these methods are still
> called from the main class. Since entries are added to the constant pool for each method
> invocation, this limits the number of columns and, for very wide datasets, leads to:
> {noformat}
> org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection has grown past JVM limit of 0xFFFF
> {noformat}
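
To make the arithmetic behind the description concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under stated assumptions, not Spark's or Janino's actual accounting. The per-call and per-class entry counts are approximations of how a class file references a method. The function names and the "one dispatcher method per nested class" variant are hypothetical illustrations, not a description of the branch mentioned above.

```python
# Rough model of constant pool growth in the outer generated class.
# All names below are hypothetical; this is not Spark's or Janino's real accounting.

# constant_pool_count is a 16-bit field in the class file, hence the 0xFFFF limit.
JVM_CONSTANT_POOL_LIMIT = 0xFFFF

# Assumption: calling a distinct method costs about three entries in the caller's
# pool: a Methodref, a NameAndType, and a Utf8 for the method name.
ENTRIES_PER_DISTINCT_CALL = 3
# Assumption: each distinct target class costs a Class entry plus a Utf8 name.
ENTRIES_PER_TARGET_CLASS = 2


def outer_pool_entries_direct(num_methods, methods_per_nested_class):
    """Outer class invokes every generated method itself."""
    num_classes = -(-num_methods // methods_per_nested_class)  # ceiling division
    return (num_methods * ENTRIES_PER_DISTINCT_CALL
            + num_classes * ENTRIES_PER_TARGET_CLASS)


def outer_pool_entries_grouped(num_methods, methods_per_nested_class):
    """Outer class invokes one hypothetical dispatcher method per nested class."""
    num_classes = -(-num_methods // methods_per_nested_class)
    return num_classes * (ENTRIES_PER_DISTINCT_CALL + ENTRIES_PER_TARGET_CLASS)


if __name__ == "__main__":
    for n in (10_000, 30_000):
        direct = outer_pool_entries_direct(n, 1_000)
        grouped = outer_pool_entries_grouped(n, 1_000)
        print(n, direct, direct > JVM_CONSTANT_POOL_LIMIT, grouped)
```

Under these assumptions, the entries contributed by calls grow linearly with the number of generated methods when the main class invokes all of them, and only with the number of nested classes when the calls are grouped. The real pool also holds field references, string constants, and other entries that this model ignores.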
