spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Databricks fails to read the csv file with blank line at the file header
Date Sun, 27 Mar 2016 01:25:06 GMT
Hi,

I have a standard CSV file (stored in HDFS) whose first line, above the header
row, is blank, as follows:

[blank line]
Date, Type, Description, Value, Balance, Account Name, Account Number
[blank line]
22/03/2011,SBT,"'FUNDS TRANSFER , FROM A/C 1790999",200.00,200.00,"'BROWN AE","'638585-60125663",

When I read this file using the following standard call

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .load("hdfs://rhes564:9000/data/stg/accounts/ac/")

it crashes with the following exception:

java.util.NoSuchElementException
        at java.util.ArrayList$Itr.next(ArrayList.java:794)

If I manually delete the first blank line, the same call works:

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .load("hdfs://rhes564:9000/data/stg/accounts/ac/")

df: org.apache.spark.sql.DataFrame = [Date: string,  Type: string,
Description: string,  Value: double,  Balance: double,  Account Name:
string,  Account Number: string]

I can easily write a shell script to strip the blank line, but I was wondering
whether the Databricks CSV package has an option to skip a leading blank line
in a CSV file?
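
A minimal sketch of the shell workaround mentioned above, assuming the file can be streamed through a filter before it is loaded. The function name strip_leading_blank_lines and the inline sample data are hypothetical, chosen for illustration only. It drops only the blank lines that come before the first non-blank line, so a blank line after the header is left alone.

```shell
#!/bin/sh
# Hypothetical helper: remove every blank (empty or whitespace-only) line
# that precedes the first non-blank line of the input. Everything from the
# first non-blank line onward is passed through unchanged.
strip_leading_blank_lines() {
    # NF is 0 on blank lines; once a non-blank line is seen, "found" stays
    # set and every later line (blank or not) is printed.
    awk 'found || NF { found = 1; print }'
}

# Illustration on made-up inline data: one leading blank line, a header,
# a blank line after the header, and one data row.
printf '\nDate,Type,Value\n\n22/03/2011,SBT,200.00\n' | strip_leading_blank_lines
```

For a file already in HDFS, the same filter could presumably sit between a read and a write of the file (for example, piping the output of hdfs dfs -cat into it), but that step is not shown here and would need checking against the actual file names in the directory.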

P.S. If the file is stored as a DOS text file, this problem goes away.

Thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
