spark-issues mailing list archives

From "Faisal (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-19519) Groupby for multiple columns not working
Date Wed, 08 Feb 2017 21:36:41 GMT

     [ https://issues.apache.org/jira/browse/SPARK-19519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Faisal updated SPARK-19519:
---------------------------
    Description: 
        DataFrame joinModCtypeAsgns = modCtypeAsgnsDf.as("mod")
                .join(moduleCodeDf.as("mc"), moduleCodeDf.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charValCode")))
                .join(dictDfCharCode.as("dc"), dictDfCharCode.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")))
                .join(dictDfIsAChar, dictDfIsAChar.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")));

        joinModCtypeAsgns.select(col("mc.propVal").as("mcaModCode"),
                col("dc.propVal").as("mcaCtypeCode"),
                max(col("mod.updatedDate")).as("mcaLastChangedDate"),
                coalesce(max(when(col("mndtryInd").equalTo("Y"), "Y")),
                         max(when(col("mndtryInd").equalTo("N"), "N")),
                         max(col("mndtryInd"))).as("mcaMandatoryFlg"),
                lit("N").as("mcaLockedFlg"),
                coalesce(max(when(col("fldColInd").equalTo("Y"), "F")),
                         max(when(col("fldColInd").equalTo("N"), "I")),
                         max(col("fldColInd"))).as("mcaFieldCollectionFlg"))
                .groupBy(col("mc.propVal"), col("dc.propVal"))
                .agg(col("mc.propVal"), col("dc.propVal"), max(col("mod.updatedDate")));


Throws the following exception:

User class threw exception: org.apache.spark.sql.AnalysisException: expression 'propVal' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.
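One reading of this error, offered as a hedged sketch rather than as the resolution recorded for SPARK-19519: because the select(...) call already contains aggregate functions such as max(...), Spark appears to analyze that select on its own as a global aggregation with no grouping keys, so the non-aggregated mc.propVal column is rejected before the later groupBy(...) is reached. The usual pattern is to call groupBy(col("mc.propVal"), col("dc.propVal")) directly on joinModCtypeAsgns and pass every aggregate expression, plus the lit("N") constant, to agg(...). The Java below does not use Spark at all. It is a plain-Java model (Java 16+ for records) of the per-group result such a grouped query is meant to return, including the coalesce(max(when(...))) flag precedence. GroupedFlagsModel, Row, Result, flag and aggregate are hypothetical names, and grouping keys are assumed non-null.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

/**
 * Hypothetical plain-Java model (no Spark involved) of the per-group result that a
 * groupBy(mc.propVal, dc.propVal).agg(...) query over the joined rows would produce.
 * All class, record and field names here are illustrative stand-ins.
 */
public class GroupedFlagsModel {

    // One joined input row: stand-ins for mc.propVal, dc.propVal, mod.updatedDate,
    // mndtryInd and fldColInd. Grouping keys are assumed non-null in this model.
    public record Row(String modCode, String ctypeCode, LocalDate updatedDate,
                      String mndtryInd, String fldColInd) {}

    // One output row per (modCode, ctypeCode) group, named after the report's aliases.
    public record Result(String mcaModCode, String mcaCtypeCode, LocalDate mcaLastChangedDate,
                         String mcaMandatoryFlg, String mcaLockedFlg, String mcaFieldCollectionFlg) {}

    // Models coalesce(max(when(ind = yes, yesOut)), max(when(ind = no, noOut)), max(ind)):
    // when() without otherwise() yields null for non-matching rows and max() skips nulls,
    // so the result is yesOut if any row equals yes, else noOut if any row equals no,
    // else the largest non-null raw value (natural String order, which should match
    // Spark's byte-wise ordering for ASCII values), or null if every value is null.
    static String flag(List<String> values, String yes, String yesOut, String no, String noOut) {
        if (values.contains(yes)) {
            return yesOut;
        }
        if (values.contains(no)) {
            return noOut;
        }
        return values.stream()
                .filter(Objects::nonNull)
                .max(Comparator.naturalOrder())
                .orElse(null);
    }

    // Groups first, then aggregates within each group: the same order the DataFrame
    // query needs, groupBy(...) before agg(...).
    public static List<Result> aggregate(List<Row> rows) {
        // LinkedHashMap keeps groups in first-seen order so this model's output is deterministic.
        Map<List<String>, List<Row>> groups = rows.stream().collect(Collectors.groupingBy(
                r -> List.of(r.modCode(), r.ctypeCode()), LinkedHashMap::new, Collectors.toList()));

        List<Result> out = new ArrayList<>();
        for (Map.Entry<List<String>, List<Row>> e : groups.entrySet()) {
            List<Row> group = e.getValue();
            // max(col("mod.updatedDate")): latest non-null date in the group.
            LocalDate lastChanged = group.stream()
                    .map(Row::updatedDate)
                    .filter(Objects::nonNull)
                    .max(Comparator.naturalOrder())
                    .orElse(null);
            List<String> mndtry = group.stream().map(Row::mndtryInd).collect(Collectors.toList());
            List<String> fldCol = group.stream().map(Row::fldColInd).collect(Collectors.toList());
            out.add(new Result(
                    e.getKey().get(0),
                    e.getKey().get(1),
                    lastChanged,
                    flag(mndtry, "Y", "Y", "N", "N"),
                    "N", // lit("N").as("mcaLockedFlg") is the same constant in every group
                    flag(fldCol, "Y", "F", "N", "I")));
        }
        return out;
    }
}
```

Grouping before aggregating is what this model shares with the DataFrame fix. The first-seen output order is a choice of this sketch only, since Spark does not guarantee any row order for groupBy output.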

  was:
        DataFrame joinModCtypeAsgns = modCtypeAsgnsDf.as("mod")
                .join(moduleCodeDf.as("mc"), moduleCodeDf.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charValCode")))
                .join(dictDfCharCode.as("dc"), dictDfCharCode.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")))
                .join(dictDfIsAChar, dictDfIsAChar.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")))
                ;

        joinModCtypeAsgns.select(col("mc.propVal").as("mcaModCode"),
                col("dc.propVal").as("mcaCtypeCode"),
                max(col("mod.updatedDate")).as("mcaLastChangedDate"),
                coalesce(max(when(col("mndtryInd").equalTo("Y"), "Y")),
                         max(when(col("mndtryInd").equalTo("N"), "N")),
                         max(col("mndtryInd"))).as("mcaMandatoryFlg"),
                lit("N").as("mcaLockedFlg"),
                coalesce(max(when(col("fldColInd").equalTo("Y"), "F")),
                         max(when(col("fldColInd").equalTo("N"), "I")),
                         max(col("fldColInd"))).as("mcaFieldCollectionFlg")
                ).groupBy(col("mc.propVal"),col("dc.propVal")).agg(col("mc.propVal"),col("dc.propVal"),max(col("mod.updatedDate")));


Throws the following exception:

User class threw exception: org.apache.spark.sql.AnalysisException: expression 'propVal' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.


> Groupby for multiple columns not working
> ----------------------------------------
>
>                 Key: SPARK-19519
>                 URL: https://issues.apache.org/jira/browse/SPARK-19519
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.5.0
>            Reporter: Faisal
>            Priority: Blocker
>
>         DataFrame joinModCtypeAsgns = modCtypeAsgnsDf.as("mod")
>                 .join(moduleCodeDf.as("mc"), moduleCodeDf.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charValCode")))
>                 .join(dictDfCharCode.as("dc"), dictDfCharCode.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")))
>                 .join(dictDfIsAChar, dictDfIsAChar.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")));
>
>         joinModCtypeAsgns.select(col("mc.propVal").as("mcaModCode"),
>                 col("dc.propVal").as("mcaCtypeCode"),
>                 max(col("mod.updatedDate")).as("mcaLastChangedDate"),
>                 coalesce(max(when(col("mndtryInd").equalTo("Y"), "Y")),
>                          max(when(col("mndtryInd").equalTo("N"), "N")),
>                          max(col("mndtryInd"))).as("mcaMandatoryFlg"),
>                 lit("N").as("mcaLockedFlg"),
>                 coalesce(max(when(col("fldColInd").equalTo("Y"), "F")),
>                          max(when(col("fldColInd").equalTo("N"), "I")),
>                          max(col("fldColInd"))).as("mcaFieldCollectionFlg"))
>                 .groupBy(col("mc.propVal"), col("dc.propVal"))
>                 .agg(col("mc.propVal"), col("dc.propVal"), max(col("mod.updatedDate")));
> Throws the following exception:
> User class threw exception: org.apache.spark.sql.AnalysisException: expression 'propVal' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

