spark-user mailing list archives

From Jiří Syrový <...@hkfree.org>
Subject Re: java.lang.UnsupportedOperationException: CSV data source does not support struct/ERROR RetryingBlockFetcher
Date Wed, 28 Mar 2018 12:20:53 GMT
Quick comment:

Excel CSV (a very special case, though) supports arrays in CSV by putting "\n"
inside a quoted field, but then you have to use "\r\n" (Windows EOL) as the
EOL for each row.
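
A minimal sketch of that convention using Python's standard csv module, in case it helps. The function names write_rows and read_rows and the sample data are made up for illustration, and I have not checked how every Excel version displays the result:

```python
import csv
import io


def write_rows(rows):
    """Write (label, list) pairs; list items are joined with "\\n" in one quoted cell."""
    buf = io.StringIO(newline="")  # newline="" keeps "\r\n" and "\n" untranslated
    writer = csv.writer(buf, lineterminator="\r\n", quoting=csv.QUOTE_MINIMAL)
    for label, values in rows:
        # The embedded "\n" makes the writer wrap the cell in double quotes.
        writer.writerow([label, "\n".join(str(v) for v in values)])
    return buf.getvalue()


def read_rows(text):
    """Parse the text back and split each multi-line cell into a list of strings."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [(label, cell.split("\n")) for label, cell in reader]


text = write_rows([("a", [1, 2, 3]), ("b", [4.5])])
rows = read_rows(text)
```

Note that the values come back as strings, so the reader has to convert them to numbers again.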

Cheers,
Jiri

2018-03-28 14:14 GMT+02:00 Yong Zhang <java8964@hotmail.com>:

> Your dataframe has a column of array data type, which is NOT supported by CSV.
> How could a csv file include an array or another nested structure?
>
>
> If you want your data as human-readable text, write it out as json in your
> case instead.
>
>
> Yong
>
>
> ------------------------------
> *From:* Mina Aslani <aslanimina@gmail.com>
> *Sent:* Wednesday, March 28, 2018 12:22 AM
> *To:* naresh Goud
> *Cc:* user @spark
> *Subject:* Re: java.lang.UnsupportedOperationException: CSV data source
> does not support struct/ERROR RetryingBlockFetcher
>
> Hi Naresh,
>
> Thank you for the quick response, appreciate it.
> Removing the option("header","true") and trying
>
> df = spark.read.parquet("test.parquet"), I can now read the parquet file.
> However, I would like to find a way to have the data in csv/readable form.
> I still cannot save df as csv, as it throws:
> java.lang.UnsupportedOperationException: CSV data source does not support
> struct<type:tinyint,size:int,indices:array<int>,values:array<double>>
> data type.
>
> Any idea?
>
>
> Best regards,
>
> Mina
>
>
> On Tue, Mar 27, 2018 at 10:51 PM, naresh Goud <nareshgoud.dulam@gmail.com>
> wrote:
>
> When storing as a parquet file, I don’t think the header option is required:
> option("header","true")
>
> Try removing the header option and then reading the file. I haven’t
> tried it myself. Just a thought.
>
> Thank you,
> Naresh
>
>
> On Tue, Mar 27, 2018 at 9:47 PM Mina Aslani <aslanimina@gmail.com> wrote:
>
> Hi,
>
>
> I am using pyspark. To transform my sample data and create a model, I use
> StringIndexer and OneHotEncoder.
>
>
> However, when I try to write the data as csv using the command below
>
> df.coalesce(1).write.option("header","true").mode("overwrite").csv("output.csv")
>
>
> I get UnsupportedOperationException
>
> java.lang.UnsupportedOperationException: CSV data source does not support
> struct<type:tinyint,size:int,indices:array<int>,values:array<double>>
> data type.
>
> Therefore, to save data and avoid getting the error I use
>
>
> df.coalesce(1).write.option("header","true").mode("overwrite").save("output")
>
>
> The above command saves the data, but in parquet format.
> How can I read the parquet file and convert it to csv to inspect the data?
>
> When I use
>
> df = spark.read.parquet("1.parquet"), it throws:
>
> ERROR RetryingBlockFetcher: Exception while beginning fetch of 1
> outstanding blocks
>
> Your input is appreciated.
>
>
> Best regards,
>
> Mina
>
>
>
> --
> Thanks,
> Naresh
> www.linkedin.com/in/naresh-dulam
> http://hadoopandspark.blogspot.com/
>
>
>
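
On the UnsupportedOperationException quoted above: the struct<type,size,indices,values> in the message is how Spark describes an ML vector column (here, the OneHotEncoder output), so one way to get a CSV is to turn each vector column into a plain string first. A rough sketch, not tested against the original data. The names vector_to_text and write_vectors_as_csv, the ";" separator, and the column names passed in are assumptions for illustration:

```python
def vector_to_text(values, sep=";"):
    """Join the elements of a dense vector into a single string cell."""
    return sep.join(repr(float(x)) for x in values)


def write_vectors_as_csv(df, vector_cols, path):
    """Replace each ML vector column with a string column, then write CSV.

    pyspark is imported here so vector_to_text can be used without Spark.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Vector.toArray() gives the dense values; None is passed through as null.
    to_text = udf(
        lambda v: None if v is None else vector_to_text(v.toArray()),
        StringType(),
    )
    for name in vector_cols:
        df = df.withColumn(name, to_text(df[name]))
    df.coalesce(1).write.option("header", "true").mode("overwrite").csv(path)


# Plain-Python check of the formatting helper.
cell = vector_to_text([0, 1, 0.5])
```

Usage would look like write_vectors_as_csv(df, ["encoded"], "output.csv"), where "encoded" stands for whatever the OneHotEncoder output column is called. Densifying a wide one-hot vector can make the cells long, so this is mostly useful for inspecting small samples.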
