spark-issues mailing list archives

From "Felix Cheung (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-23291) SparkR : substr : In SparkR dataframe , starting and ending position arguments in "substr" is giving wrong result when the position is greater than 1
Date Sun, 06 May 2018 22:39:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16465307#comment-16465307 ]

Felix Cheung edited comment on SPARK-23291 at 5/6/18 10:38 PM:
---------------------------------------------------------------

Actually, I'm not sure we should backport this to an x.x.1 release.

Yes, the behavior "was unexpected", but it has been around for the last 3 years, if I recall,
since the very beginning, and it is not a regression per se.

Either users don't care, since it was never reported until now, or (most likely) users have adapted
to the behavior, in which case a backport would break existing jobs in a patch release.

Anyway, it's just my 2c.


was (Author: felixcheung):
Actually, I'm not sure we should backport this to an x.x.1 release.

Yes, the behavior "was unexpected", but it has been around for the last 3 years, if I recall,
since the very beginning.

Either users don't care, since it was never reported until now, or (most likely) users have adapted
to the behavior, in which case a backport would break existing jobs in a patch release.

Anyway, it's just my 2c.

> SparkR : substr : In SparkR dataframe , starting and ending position arguments in "substr" is giving wrong result when the position is greater than 1
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23291
>                 URL: https://issues.apache.org/jira/browse/SPARK-23291
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.1.2, 2.2.0, 2.2.1, 2.3.0
>            Reporter: Narendra
>            Assignee: Liang-Chi Hsieh
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Defect Description :
> -----------------------------
> For example, an input string "2017-12-01" is read into a SparkR DataFrame "df" with a column named "col1".
>  The target is to create a new column named "col2" with the value "12", which is inside the string. "12" can be extracted with a starting position of 6 and an ending position of 7
>  (the position of the first character is taken to be 1).
> But the code that currently has to be written is:
>  
>  df <- withColumn(df, "col2", substr(df$col1, 7, 8))
> Observe that the start argument of "substr" (the second argument, after the column), which indicates the starting position, is given as 7.
>  Also, observe that the stop argument (the third argument), which indicates the ending position, is given as 8.
> That is, each position has to be passed as the actual position + 1.
> Expected behavior :
> ----------------------------
> The code that should be written is:
>  
>  df <- withColumn(df, "col2", substr(df$col1, 6, 7))
> Note :
> -----------
>  This defect is observed only when the starting position is greater than 1.
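The symptoms in the report, including the note that a starting position of 1 is unaffected, can be reproduced with the minimal pure-R model below, which needs no Spark session. It is a sketch under stated assumptions, not SparkR's source: Spark SQL's substring(str, pos, len) is taken to use 1-based positions and to treat position 0 as the first character, and the pre-2.4 SparkR substr(x, start, stop) is assumed to have passed start - 1 and a length of stop - start + 1 to the JVM. The function names spark_substring, sparkr_substr_old and sparkr_substr_new are hypothetical.

```r
# Hypothetical pure-R model of the reported behavior; no Spark session needed.
# Assumption: Spark SQL substring(str, pos, len) is 1-based and treats pos = 0
# as the first character (negative positions are not modeled here).
spark_substring <- function(s, pos, len) {
  start0 <- if (pos > 0) pos - 1 else 0  # 0-based offset of the first character
  substr(s, start0 + 1, start0 + len)    # base::substr is 1-based and inclusive
}

# Assumed pre-2.4 SparkR translation of substr(x, start, stop)
sparkr_substr_old <- function(s, start, stop) {
  spark_substring(s, start - 1, stop - start + 1)
}

# Assumed 2.4.0 translation: start is passed through unchanged
sparkr_substr_new <- function(s, start, stop) {
  spark_substring(s, start, stop - start + 1)
}

x <- "2017-12-01"
sparkr_substr_old(x, 6, 7)  # "-1": shifted one character to the left
sparkr_substr_old(x, 7, 8)  # "12": the workaround from the report
sparkr_substr_new(x, 6, 7)  # "12": the expected behavior
sparkr_substr_old(x, 1, 4)  # "2017": start - 1 = 0 is treated as 1, so start = 1 looks correct
sparkr_substr_new(x, 7, 8)  # "2-": the old workaround gives a different value after the fix
```

Since the length stop - start + 1 is the same in both models, only the starting offset moves. That is why the workaround in the report shifts both arguments by one, and also why jobs built on that workaround would return different values once the fix is applied, which is the compatibility concern raised in the comment above.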



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


