spark-user mailing list archives

From Denis Bolshakov <bolshakov.de...@gmail.com>
Subject Re: Append more files to existing partitioned data
Date Sun, 18 Mar 2018 08:30:20 GMT
Please check out:

org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand


and

org.apache.spark.sql.execution.datasources.WriteRelation


I guess the UUID in the file name is managed by

job.getConfiguration.set(DATASOURCE_WRITEJOBUUID, uniqueWriteJobId.toString)
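
To illustrate the naming pattern asked about below, here is a rough sketch in plain Scala that rebuilds the quoted file name from its visible parts: a zero-padded task index, the write job UUID, a per-task file counter and the codec/format extension. PartFileName and build are hypothetical names made up for this illustration, not Spark's API, and the layout is inferred only from the example name in this thread, so it may differ between Spark versions.

```scala
import java.util.UUID

// Hypothetical helper for illustration only; this is not Spark's API.
// It mirrors the layout of the file name quoted in this thread:
//   part-<5-digit task index>-<write job UUID>.c<3-digit file counter><extension>
object PartFileName {
  def build(taskIndex: Int, jobUuid: UUID, fileCounter: Int, extension: String): String =
    f"part-$taskIndex%05d-${jobUuid.toString}.c$fileCounter%03d$extension"

  def main(args: Array[String]): Unit = {
    // UUID taken from the example file name in the quoted message
    val uuid = UUID.fromString("124a8c43-83b9-44e1-a9c4-dcc8676cdb99")
    println(build(0, uuid, 0, ".snappy.parquet"))
  }
}
```

If the guess above is right, every write job gets its own UUID, so files appended later should not collide with names already present in a partition directory.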


On 17 March 2018 at 20:46, Serega Sheypak <serega.sheypak@gmail.com> wrote:

> Hi Denis, great to see you here :)
> It works, thanks!
>
> Do you know how Spark generates data file names? The names look like
> part-00000 with a UUID appended after it:
>
> part-00000-124a8c43-83b9-44e1-a9c4-dcc8676cdb99.c000.snappy.parquet
>
>
>
>
> 2018-03-17 14:15 GMT+01:00 Denis Bolshakov <bolshakov.denis@gmail.com>:
>
>> Hello Serega,
>>
>> https://spark.apache.org/docs/latest/sql-programming-guide.html
>>
>> Please try the SaveMode.Append option. Does it work for you?
>>
>>
>> Sat, 17 Mar 2018, 15:19 Serega Sheypak <serega.sheypak@gmail.com>:
>>
>>> Hi, I'm using spark-sql to process my data and store the result as parquet
>>> partitioned by several columns
>>>
>>> ds.write
>>>   .partitionBy("year", "month", "day", "hour", "workflowId")
>>>   .parquet("/here/is/my/dir")
>>>
>>>
>>> I want to run more jobs that will produce new partitions or add more
>>> files to existing partitions.
>>> What is the right way to do it?
>>>
>>
>
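
For reference, a minimal sketch of the append write discussed in the quoted thread, assuming Spark SQL 2.x on the classpath and a local master. The object name, the sample row and the "value" column are made up for illustration; the partition columns and output path are the ones from the original question.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object AppendPartitionsExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; a real job would get its master from spark-submit
    val spark = SparkSession.builder()
      .appName("append-partitions-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row; the "value" column is an assumption, not from the thread
    val ds = Seq((2018, 3, 17, 14, "wf-1", "payload"))
      .toDF("year", "month", "day", "hour", "workflowId", "value")

    // SaveMode.Append adds new files (and new partition directories where needed)
    // under the existing path instead of failing because the path already exists
    ds.write
      .mode(SaveMode.Append)
      .partitionBy("year", "month", "day", "hour", "workflowId")
      .parquet("/here/is/my/dir")

    spark.stop()
  }
}
```

Running the job again with different data should leave the existing files in place and add new ones next to them.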


-- 
//with Best Regards
--Denis Bolshakov
e-mail: bolshakov.denis@gmail.com
