spark-issues mailing list archives

From "Varsha Chandrashekar (JIRA)" <>
Subject [jira] [Created] (SPARK-24065) Issue with the property IgnoreLeadingWhiteSpace
Date Tue, 24 Apr 2018 09:01:00 GMT
Varsha Chandrashekar created SPARK-24065:

             Summary: Issue with the property IgnoreLeadingWhiteSpace
                 Key: SPARK-24065
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.0
            Reporter: Varsha Chandrashekar

The "ignoreLeadingWhiteSpace" property does not work correctly in a corner case. Consider the
data below:
|   "A"  |   "Mark"   |   "US"   |
|   "B"   |   "Luke"   |   "UK"   |

Each cell contains leading and trailing white space. When I load the dataset with
"ignoreTrailingWhiteSpace" set to true, the trailing spaces are trimmed, which is
correct. But when I set "ignoreLeadingWhiteSpace" to true, the leading spaces are not trimmed.

The scenario was tested in spark-shell. The results are below.

case 1: scala> var df = spark.read.format("com.databricks.spark.csv").option("delimiter",",").option("qualifier","\"").option("escape","\\").option("header","true").option("inferSchema","true").option("ignoreLeadingWhiteSpace",false).option("ignoreTrailingWhiteSpace",false).load("C:\\Users\\vachandrashekar\\Desktop\\lds1.txt")
df: org.apache.spark.sql.DataFrame = [col1: string, Col2: string ... 1 more field]

| Col1| Col2| Col3|
| "A" | "Mark" | "US" |
| "B" | "Luke" | "UK" |

case 2: scala> var df = spark.read.format("com.databricks.spark.csv").option("delimiter",",").option("qualifier","\"").option("escape","\\").option("header","true").option("inferSchema","true").option("ignoreLeadingWhiteSpace",true).option("ignoreTrailingWhiteSpace",false).load("C:\\Users\\vachandrashekar\\Desktop\\lds1.txt")
df: org.apache.spark.sql.DataFrame = [col1: string, Col2: string ... 1 more field]

|Col1| Col2| Col3|
| A|Mark|US|
| B| Luke| UK|

case 3: scala> var df = spark.read.format("com.databricks.spark.csv").option("delimiter",",").option("qualifier","\"").option("escape","\\").option("header","true").option("inferSchema","true").option("ignoreLeadingWhiteSpace",false).option("ignoreTrailingWhiteSpace",true).load("C:\\Users\\vachandrashekar\\Desktop\\lds1.txt")
df: org.apache.spark.sql.DataFrame = [col1: string, Col2: string ... 1 more field]

| col1| Col2| Col3|
| "A"| "Mark"| "US"|
| "B"| "Luke"| "UK"|



Case 1: Works as expected. With "ignoreLeadingWhiteSpace" and "ignoreTrailingWhiteSpace" both
set to false, the data is shown exactly as it appears in the file.


Case 2: Not working. With "ignoreLeadingWhiteSpace" set to true and "ignoreTrailingWhiteSpace"
set to false, the trailing white space is trimmed and the leading white space is retained.

Leading white space is trimmed only in the second and third columns of the first row. The first
column of that row, and every column of the second row, keep their leading white space.


Case 3: Works as expected. With "ignoreLeadingWhiteSpace" set to false and
"ignoreTrailingWhiteSpace" set to true, only trailing white space is trimmed and leading white
space is retained.
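
For reference, the behavior expected in the three cases can be modeled without Spark. The
sketch below is only an illustration of the intended semantics, not Spark's implementation:
the object CellTrim, the function trimCell, and the sample cell values (including the number
of spaces around each value) are assumptions made up for this example.

```scala
// Hypothetical model of the expected per-cell behavior of the two options.
// This is NOT Spark's code; names and sample values are assumptions.
object CellTrim {

  // Applies the two flags independently to a single raw cell.
  def trimCell(raw: String, ignoreLeading: Boolean, ignoreTrailing: Boolean): String = {
    // Drop leading white space only when ignoreLeading is set.
    val afterLeading = if (ignoreLeading) raw.dropWhile(_.isWhitespace) else raw
    // Drop trailing white space only when ignoreTrailing is set.
    if (ignoreTrailing) afterLeading.reverse.dropWhile(_.isWhitespace).reverse
    else afterLeading
  }

  def main(args: Array[String]): Unit = {
    // Assumed sample rows: every cell has leading and trailing spaces.
    val rows = Seq(
      Seq("   A  ", "   Mark   ", "   US   "),
      Seq("   B   ", "   Luke   ", "   UK   ")
    )

    // (ignoreLeadingWhiteSpace, ignoreTrailingWhiteSpace) for cases 1 to 3.
    val cases = Seq((false, false), (true, false), (false, true))

    for (((leading, trailing), i) <- cases.zipWithIndex) {
      println(s"case ${i + 1}: ignoreLeading=$leading ignoreTrailing=$trailing")
      for (row <- rows) {
        // Brackets make the remaining white space visible.
        println(row.map(c => "[" + trimCell(c, leading, trailing) + "]").mkString(" "))
      }
    }
  }
}
```

Under this model, case 2 should strip the leading spaces from every cell in every row and leave
the trailing spaces alone, which is the opposite of what the case 2 output above shows.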

This message was sent by Atlassian JIRA
