spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jungtaek Lim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26777) SQL worked in 2.3.2 and fails in 2.4.0
Date Wed, 13 Feb 2019 10:03:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767008#comment-16767008
] 

Jungtaek Lim commented on SPARK-26777:
--------------------------------------

I actually missed reporter's last comment. Looks like it is the case of former: so I'd rather
just say Apache community is not responsible to resolve any issue on versions which are specific
to packaging vendors and cloud vendors. So if you mind reproducing it in Apache version, you
may need to just ask to AWS technical support rather than complaining to Apache community.
What you're paying to AWS includes the right to ask to AWS for technical issue.

> SQL worked in 2.3.2 and fails in 2.4.0
> --------------------------------------
>
>                 Key: SPARK-26777
>                 URL: https://issues.apache.org/jira/browse/SPARK-26777
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Yuri Budilov
>            Priority: Major
>
> Following SQL worked in Spark 2.3.2 and now fails on 2.4.0 (AWS EMR Spark)
> ------------ PySpark call below:
> spark.sql("select partition_year_utc,partition_month_utc,partition_day_utc \
> from datalake_reporting.copy_of_leads_notification \
> where partition_year_utc = (select max(partition_year_utc) from datalake_reporting.copy_of_leads_notification)
\
> and partition_month_utc = \
>  (select max(partition_month_utc) from datalake_reporting.copy_of_leads_notification
as m \
>  where \
>  m.partition_year_utc = (select max(partition_year_utc) from datalake_reporting.copy_of_leads_notification))
\
>  and partition_day_utc = (select max(d.partition_day_utc) from datalake_reporting.copy_of_leads_notification
as d \
>  where d.partition_month_utc = \
>  (select max(m1.partition_month_utc) from datalake_reporting.copy_of_leads_notification
as m1 \
>  where m1.partition_year_utc = \
>  (select max(y.partition_year_utc) from datalake_reporting.copy_of_leads_notification
as y) \
>  ) \
>  ) \
>  order by 1 desc, 2 desc, 3 desc limit 1 ").show(1,False)
> Error: (no need for data, this is syntax).
> py4j.protocol.Py4JJavaError: An error occurred while calling o1326.showString.
> : java.lang.UnsupportedOperationException: Cannot evaluate expression: scalar-subquery#4495
[]
>  
> Note: all 3 columns in query are Partitioned columns - see bottom of the schema)
>  
> Hive EMR AWS Schema is:
>  
> CREATE EXTERNAL TABLE `copy_of_leads_notification`(
> `message.environment.siteorigin` string, `dcpheader.dcploaddateutc` string, `message.id`
int, `source.properties._country` string, `message.created` string, `dcpheader.generatedmessageid`
string, `message.tags` bigint, `source.properties._enqueuedtimeutc` string, `source.properties._leadtype`
string, `message.itemid` string, `message.prospect.postcode` string, `message.prospect.email`
string, `message.referenceid` string, `message.item.year` string, `message.identifier` string,
`dcpheader.dcploadmonthutc` string, `message.processed` string, `source.properties._tenant`
string, `message.item.price` string, `message.subscription.confirmresponse` boolean, `message.itemtype`
string, `message.prospect.lastname` string, `message.subscription.insurancequote` boolean,
`source.exchangename` string, `message.prospect.identificationnumbers` bigint, `message.environment.ipaddress`
string, `dcpheader.dcploaddayutc` string, `source.properties._itemtype` string, `source.properties._requesttype`
string, `message.item.make` string, `message.prospect.firstname` string, `message.subscription.survey`
boolean, `message.prospect.homephone` string, `message.extendedproperties` bigint, `message.subscription.financequote`
boolean, `message.uniqueidentifier` string, `source.properties._id` string, `dcpheader.sourcemessageguid`
string, `message.requesttype` string, `source.routingkey` string, `message.service` string,
`message.item.model` string, `message.environment.pagesource` string, `source.source` string,
`message.sellerid` string, `partition_date_utc` string, `message.selleridentifier` string,
`message.subscription.newsletter` boolean, `dcpheader.dcploadyearutc` string, `message.leadtype`
string, `message.history` bigint, `message.callconnect.calloutcome` string, `message.callconnect.datecreatedutc`
string, `message.callconnect.callrecordingurl` string, `message.callconnect.transferoutcome`
string, `message.callconnect.hiderecording` boolean, `message.callconnect.callstartutc` string,
`message.callconnect.code` string, `message.callconnect.callduration` string, `message.fraudnetinfo`
string, `message.callconnect.answernumber` string, `message.environment.sourcedevice` string,
`message.comments` string, `message.fraudinfo.servervariables` bigint, `message.callconnect.servicenumber`
string, `message.callconnect.callid` string, `message.callconnect.voicemailurl` string, `message.item.stocknumber`
string, `message.callconnect.answerduration` string, `message.callconnect.callendutc` string,
`message.item.series` string, `message.item.detailsurl` string, `message.item.pricetype` string,
`message.item.description` string, `message.item.colour` string, `message.item.badge` string,
`message.item.odometer` string, `message.environment.requestheader` string, `message.item.registrationnumber`
string, `message.item.bodytype` string, `message.item.fueltype` string, `message.item.redbookcode`
string, `message.item.spotid` string, `message.item.id` string, `message.item.transmission`
string, `message.item.vin` string, `message.item.enginedescription` string, `message.prospect.mobilephone`
string, `message.prospect.membertrackingid` string, `message.environment.username` string,
`message.prospect.workphone` string, `message.environment.servername` string, `message.environment.sessionid`
string, `message.tradein.type` string, `message.tradein.model` string, `message.tradein.year`
string, `message.tradein.make` string, `message.tradein.kms` string, `message.fraudinfo.servertimestamp`
string, `message.prospect.suburb` string, `message.callconnect.username` string, `message.callconnect.password`
string, `message.status` string, `message.tradein.colour` string, `message.prospect.address`
string, `message.prospect.state` string, `message.tradein.detailsurl` string, `message.prospect.faxnumber`
string, `message.prospect.companyname` string, `message.prospect.title` string, `message.callconnect.calloutcometext`
string, `message.prospect.preferredcontacttime` string, `message.fraudinfo.devicedata` string,
`message.prospect.preferredcontactmethod` string, `message.assignment.assigned` string, `message.assignment.email`
string, `message.assignment.name` string, `message.country` string, `message.financepackage.id`
string, `message.financepackage.version` string)
> PARTITIONED BY ( `partition_year_utc` string, `partition_month_utc` string, `partition_day_utc`
string, `job_run_guid` string)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED
AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION 's3://datalake/yurib_test/leads_notification'
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message