spark-user mailing list archives

From Igor Berman <igor.ber...@gmail.com>
Subject Re: Imported CSV file content isn't identical to the original file
Date Sun, 07 Feb 2016 09:42:29 GMT
show takes a truncate argument;
pass false so it won't truncate your results.
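
A minimal sketch based on the snippet quoted below, assuming a spark-shell session where sqlContext is in scope and spark-csv is on the classpath (the name df is only for illustration):

```scala
// Same read as in the original post; only the final call changes.
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "false")      // the file has no header row
  .option("inferSchema", "false") // keep every column as a string
  .option("delimiter", " ")
  .load("hdfs:///tmp/1.csv")

// show() shortens long cell values by default; passing false prints them in full.
df.show(false)

// The two-argument overload also sets how many rows to print.
df.show(10, false)
```

Truncation only affects what show prints; the data held in the DataFrame is unchanged.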

On 7 February 2016 at 11:01, SLiZn Liu <sliznmailbox@gmail.com> wrote:

> Plus, I’m using *Spark 1.5.2*, with *spark-csv 1.3.0*. Also tried
> HiveContext, but the result is exactly the same.
>
> On Sun, Feb 7, 2016 at 3:44 PM SLiZn Liu <sliznmailbox@gmail.com> wrote:
>
>> Hi Spark Users Group,
>>
>> I have a csv file to analyze with Spark, but I’m having trouble
>> importing it as a DataFrame.
>>
>> Here’s a minimal reproducible example. Suppose I have a
>> *10(rows)x2(cols)* *space-delimited csv* file, shown below:
>>
>> 1446566430 2015-11-04<SP>00:00:30
>> 1446566430 2015-11-04<SP>00:00:30
>> 1446566430 2015-11-04<SP>00:00:30
>> 1446566430 2015-11-04<SP>00:00:30
>> 1446566430 2015-11-04<SP>00:00:30
>> 1446566431 2015-11-04<SP>00:00:31
>> 1446566431 2015-11-04<SP>00:00:31
>> 1446566431 2015-11-04<SP>00:00:31
>> 1446566431 2015-11-04<SP>00:00:31
>> 1446566431 2015-11-04<SP>00:00:31
>>
>> The <SP> in column 2 represents a sub-delimiter within that column. The
>> file is stored on HDFS; let’s say the path is hdfs:///tmp/1.csv
>>
>> I’m using *spark-csv* to import this file as Spark *DataFrame*:
>>
>> sqlContext.read.format("com.databricks.spark.csv")
>>         .option("header", "false") // File has no header row; do not use the first line as column names
>>         .option("inferSchema", "false") // Do not infer data types; columns are read as strings
>>         .option("delimiter", " ")
>>         .load("hdfs:///tmp/1.csv")
>>         .show
>>
>> Oddly, the output shows only a part of each column:
>>
>> [image: Screenshot from 2016-02-07 15-27-51.png]
>>
>> and even the boundary of the table wasn’t shown correctly. I also tried
>> another way to read the csv file, using sc.textFile(...).map(_.split(" "))
>> and sqlContext.createDataFrame, and the result is the same. Can someone
>> point out what I did wrong?
>>
>> —
>> BR,
>> Todd Leo
>>
>
