spark-user mailing list archives

From Serega Sheypak <serega.shey...@gmail.com>
Subject Re: Append more files to existing partitioned data
Date Sun, 18 Mar 2018 15:42:41 GMT
Thanks a lot!

2018-03-18 9:30 GMT+01:00 Denis Bolshakov <bolshakov.denis@gmail.com>:

> Please check out
>
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand
>
>
> and
>
> org.apache.spark.sql.execution.datasources.WriteRelation
>
>
> I guess it's managed by
>
> job.getConfiguration.set(DATASOURCE_WRITEJOBUUID, uniqueWriteJobId.toString)
>
>
> On 17 March 2018 at 20:46, Serega Sheypak <serega.sheypak@gmail.com>
> wrote:
>
>> Hi Denis, great to see you here :)
>> It works, thanks!
>>
>> Do you know how Spark generates data file names? The names look like
>> part-00000 with a UUID appended:
>>
>> part-00000-124a8c43-83b9-44e1-a9c4-dcc8676cdb99.c000.snappy.parquet
>>
>>
>>
>>
>> 2018-03-17 14:15 GMT+01:00 Denis Bolshakov <bolshakov.denis@gmail.com>:
>>
>>> Hello Serega,
>>>
>>> https://spark.apache.org/docs/latest/sql-programming-guide.html
>>>
>>> Please try the SaveMode.Append option. Does it work for you?
>>>
>>>
>>> Sat, 17 Mar 2018, 15:19 Serega Sheypak <serega.sheypak@gmail.com>:
>>>
>>>> Hi, I'm using spark-sql to process my data and store the result as
>>>> parquet partitioned by several columns
>>>>
>>>> ds.write
>>>>   .partitionBy("year", "month", "day", "hour", "workflowId")
>>>>   .parquet("/here/is/my/dir")
>>>>
>>>>
>>>> I want to run more jobs that will produce new partitions or add more
>>>> files to existing partitions.
>>>> What is the right way to do it?
>>>>
>>>
>>
>
>
> --
> //with Best Regards
> --Denis Bolshakov
> e-mail: bolshakov.denis@gmail.com
>
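
For reference, the SaveMode.Append suggestion quoted above could look roughly like the sketch below when applied to the original write. It is only an illustration: the object name, the local master, the sample row, and the "value" column are hypothetical and not from the thread. It needs Spark SQL on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object AppendPartitioned {
  // Writes `df` under `path` in append mode: partition directories that do
  // not exist yet are created, and existing ones receive additional files
  // instead of being overwritten.
  def append(df: DataFrame, path: String): Unit =
    df.write
      .mode(SaveMode.Append)
      .partitionBy("year", "month", "day", "hour", "workflowId")
      .parquet(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("append-partitioned") // hypothetical application name
      .master("local[*]")            // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample row; "value" is an illustrative payload column.
    val df = Seq((2018, 3, 17, 14, "wf-1", 42L))
      .toDF("year", "month", "day", "hour", "workflowId", "value")

    // Output directory from the first argument, else the path from the thread.
    val path = args.headOption.getOrElse("/here/is/my/dir")

    append(df, path) // first job
    append(df, path) // a later job adds files to the same partition
    spark.stop()
  }
}
```

Without an explicit mode, the DataFrameWriter defaults to SaveMode.ErrorIfExists, so a second write to the same directory would fail rather than append.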

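On the file name question quoted above, a small sketch that splits a name of the shape shown in the thread into its visible parts. The layout part-<5 digits>-<uuid>.c<3 digits>.<codec>.<format> is inferred from that single example. Reading the first number as the index of the writing task and c000 as a per-task file counter is an assumption, not something the thread confirms. The object and field names are hypothetical.

```scala
object PartFileName {
  // Assumed layout, inferred from the one example in the thread:
  //   part-<index>-<uuid>.c<counter>.<codec>.<format>
  private val Pattern =
    """part-(\d{5})-([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\.c(\d{3})\.(\w+)\.(\w+)""".r

  // index:   assumed to identify the task that wrote the file
  // uuid:    the UUID embedded in the name
  // counter: assumed to count files written by the same task
  final case class Parts(index: Int, uuid: String, counter: Int, codec: String, format: String)

  // Returns None when the whole name does not match the assumed layout.
  def parse(name: String): Option[Parts] = name match {
    case Pattern(index, uuid, counter, codec, format) =>
      Some(Parts(index.toInt, uuid, counter.toInt, codec, format))
    case _ => None
  }
}
```

If the UUID is indeed set once per write job, as Denis guesses, files produced by the same job would share the uuid field, which is what keeps an appending job from colliding with files already in a partition directory.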