spark-user mailing list archives

From Amrit Jangid <amrit.jan...@goibibo.com>
Subject Re: Data frame writing
Date Fri, 13 Jan 2017 06:12:36 GMT
Hi Rajendra,

The error says that your output directory, s3n://buccketName/cip/daily_date, is not empty.

Try setting a save mode, e.g.:

            import org.apache.spark.sql.SaveMode
            // Overwrite replaces any existing output under the target path
            df.write.mode(SaveMode.Overwrite)
              .partitionBy("date")
              .format("com.databricks.spark.csv")
              .option("delimiter", "#")
              .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
              .save("s3n://buccketName/cip/daily_date")
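
For reference, here is a minimal sketch of how the save modes behave. It rests on several assumptions: it writes to a local temporary directory instead of your S3 location, it uses the built-in "csv" format name and the "compression" option from Spark 2.x, and the sample rows and the object name SaveModeSketch are made up for illustration.

```scala
import java.nio.file.Files
import scala.util.Try
import org.apache.spark.sql.{SaveMode, SparkSession}

object SaveModeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("save-mode-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; "date" mirrors the integer partition column.
    val df = Seq((20170101, "a"), (20170102, "b")).toDF("date", "value")

    // Hypothetical local output path standing in for the S3 location.
    val out = Files.createTempDirectory("daily_date").toString

    def write(mode: SaveMode): Unit =
      df.write
        .mode(mode)
        .partitionBy("date")
        .format("csv")
        .option("delimiter", "#")
        .option("compression", "gzip")
        .save(out)

    write(SaveMode.Overwrite) // replaces whatever is already under `out`
    write(SaveMode.Append)    // adds new part files next to the existing ones

    // ErrorIfExists is the default mode when .mode(...) is not called.
    val failed = Try(write(SaveMode.ErrorIfExists)).isFailure
    println(s"ErrorIfExists failed on existing path: $failed")

    spark.stop()
  }
}
```

Note that Overwrite removes the data already stored under the target path, so Append may be the better choice if earlier partitions have to be kept.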

 Hope it helps.

Regards
Amrit



On Fri, Jan 13, 2017 at 11:32 AM, Rajendra Bhat <rajhalkere@gmail.com>
wrote:

> Hi team,
>
> I am reading N CSV files and writing them out partitioned by date. date
> is one of the columns; it holds an integer value (e.g. 20170101).
>
>
>     val df = spark.read
>       .format("com.databricks.spark.csv")
>       .schema(schema)
>       .option("delimiter", "#")
>       .option("nullValue", "")
>       .option("treatEmptyValuesAsNulls", "true")
>       .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
>       .load(filename)
>
>     df.write
>       .partitionBy("date")
>       .format("com.databricks.spark.csv")
>       .option("delimiter", "#")
>       .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
>       .save("s3n://buccketName/cip/daily_date")
>
> The above code throws the error below in the middle of execution.
> The location s3n://buccketName/cip/daily_date was empty when the job was initialized.
>
> Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: s3n://<bucketname>/cip/daily_date/date=20110418/part-r-00082-912033b1-a278-46a8-bf8d-0f97f493e3d8.csv.gz
> 	at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:405)
> 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:913)
> 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:894)
> 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:791)
> 	at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135)
> 	at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CSVRelation.scala:191)
> 	at org.apache.spark.sql.execution.datasources.csv.CSVOutputWriterFactory.newInstance(CSVRelation.scala:169)
> 	at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
>
>  ... 14 more
>
> Please suggest why this error occurs and how to fix it.
>
> --
> Thanks and
> Regards
>
> Rajendra Bhat
>



-- 

Regards,
Amrit
Data Team
