spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "t oo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26777) SQL worked in 2.3.2 and fails in 2.4.0
Date Tue, 12 Feb 2019 22:10:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766526#comment-16766526
] 

t oo commented on SPARK-26777:
------------------------------

[~yuri.budilov] - can you try in spark-shell instead of pyspark? Also with plain spark instead
of EMR spark?

> SQL worked in 2.3.2 and fails in 2.4.0
> --------------------------------------
>
>                 Key: SPARK-26777
>                 URL: https://issues.apache.org/jira/browse/SPARK-26777
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Yuri Budilov
>            Priority: Major
>
> Following SQL worked in Spark 2.3.2 and now fails on 2.4.0 (AWS EMR Spark)
> ------------ PySpark call below:
> spark.sql("select partition_year_utc,partition_month_utc,partition_day_utc \
> from datalake_reporting.copy_of_leads_notification \
> where partition_year_utc = (select max(partition_year_utc) from datalake_reporting.copy_of_leads_notification)
\
> and partition_month_utc = \
>  (select max(partition_month_utc) from datalake_reporting.copy_of_leads_notification
as m \
>  where \
>  m.partition_year_utc = (select max(partition_year_utc) from datalake_reporting.copy_of_leads_notification))
\
>  and partition_day_utc = (select max(d.partition_day_utc) from datalake_reporting.copy_of_leads_notification
as d \
>  where d.partition_month_utc = \
>  (select max(m1.partition_month_utc) from datalake_reporting.copy_of_leads_notification
as m1 \
>  where m1.partition_year_utc = \
>  (select max(y.partition_year_utc) from datalake_reporting.copy_of_leads_notification
as y) \
>  ) \
>  ) \
>  order by 1 desc, 2 desc, 3 desc limit 1 ").show(1,False)
> Error: (no need for data, this is syntax).
> py4j.protocol.Py4JJavaError: An error occurred while calling o1326.showString.
> : java.lang.UnsupportedOperationException: Cannot evaluate expression: scalar-subquery#4495
[]
>  
> Note: all 3 columns in query are Partitioned columns - see bottom of the schema)
>  
> Hive EMR AWS Schema is:
>  
> CREATE EXTERNAL TABLE `copy_of_leads_notification`(
> `message.environment.siteorigin` string, `dcpheader.dcploaddateutc` string, `message.id`
int, `source.properties._country` string, `message.created` string, `dcpheader.generatedmessageid`
string, `message.tags` bigint, `source.properties._enqueuedtimeutc` string, `source.properties._leadtype`
string, `message.itemid` string, `message.prospect.postcode` string, `message.prospect.email`
string, `message.referenceid` string, `message.item.year` string, `message.identifier` string,
`dcpheader.dcploadmonthutc` string, `message.processed` string, `source.properties._tenant`
string, `message.item.price` string, `message.subscription.confirmresponse` boolean, `message.itemtype`
string, `message.prospect.lastname` string, `message.subscription.insurancequote` boolean,
`source.exchangename` string, `message.prospect.identificationnumbers` bigint, `message.environment.ipaddress`
string, `dcpheader.dcploaddayutc` string, `source.properties._itemtype` string, `source.properties._requesttype`
string, `message.item.make` string, `message.prospect.firstname` string, `message.subscription.survey`
boolean, `message.prospect.homephone` string, `message.extendedproperties` bigint, `message.subscription.financequote`
boolean, `message.uniqueidentifier` string, `source.properties._id` string, `dcpheader.sourcemessageguid`
string, `message.requesttype` string, `source.routingkey` string, `message.service` string,
`message.item.model` string, `message.environment.pagesource` string, `source.source` string,
`message.sellerid` string, `partition_date_utc` string, `message.selleridentifier` string,
`message.subscription.newsletter` boolean, `dcpheader.dcploadyearutc` string, `message.leadtype`
string, `message.history` bigint, `message.callconnect.calloutcome` string, `message.callconnect.datecreatedutc`
string, `message.callconnect.callrecordingurl` string, `message.callconnect.transferoutcome`
string, `message.callconnect.hiderecording` boolean, `message.callconnect.callstartutc` string,
`message.callconnect.code` string, `message.callconnect.callduration` string, `message.fraudnetinfo`
string, `message.callconnect.answernumber` string, `message.environment.sourcedevice` string,
`message.comments` string, `message.fraudinfo.servervariables` bigint, `message.callconnect.servicenumber`
string, `message.callconnect.callid` string, `message.callconnect.voicemailurl` string, `message.item.stocknumber`
string, `message.callconnect.answerduration` string, `message.callconnect.callendutc` string,
`message.item.series` string, `message.item.detailsurl` string, `message.item.pricetype` string,
`message.item.description` string, `message.item.colour` string, `message.item.badge` string,
`message.item.odometer` string, `message.environment.requestheader` string, `message.item.registrationnumber`
string, `message.item.bodytype` string, `message.item.fueltype` string, `message.item.redbookcode`
string, `message.item.spotid` string, `message.item.id` string, `message.item.transmission`
string, `message.item.vin` string, `message.item.enginedescription` string, `message.prospect.mobilephone`
string, `message.prospect.membertrackingid` string, `message.environment.username` string,
`message.prospect.workphone` string, `message.environment.servername` string, `message.environment.sessionid`
string, `message.tradein.type` string, `message.tradein.model` string, `message.tradein.year`
string, `message.tradein.make` string, `message.tradein.kms` string, `message.fraudinfo.servertimestamp`
string, `message.prospect.suburb` string, `message.callconnect.username` string, `message.callconnect.password`
string, `message.status` string, `message.tradein.colour` string, `message.prospect.address`
string, `message.prospect.state` string, `message.tradein.detailsurl` string, `message.prospect.faxnumber`
string, `message.prospect.companyname` string, `message.prospect.title` string, `message.callconnect.calloutcometext`
string, `message.prospect.preferredcontacttime` string, `message.fraudinfo.devicedata` string,
`message.prospect.preferredcontactmethod` string, `message.assignment.assigned` string, `message.assignment.email`
string, `message.assignment.name` string, `message.country` string, `message.financepackage.id`
string, `message.financepackage.version` string)
> PARTITIONED BY ( `partition_year_utc` string, `partition_month_utc` string, `partition_day_utc`
string, `job_run_guid` string)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED
AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION 's3://datalake/yurib_test/leads_notification'
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message