spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Static partitioning in partitionBy()
Date Thu, 09 May 2019 03:23:54 GMT
Hi Burak,
Hurray!!!! So you finally made Delta open source :)
I have always wanted to ask TD: is there any chance we could get the
streaming graphs back in the Spark UI? It would just be wonderful.

Hi Shubham,
there is always an easier way and a super fancy way to solve a problem;
filtering the data before persisting is the simple way here. Similarly, a
simple way of handling data skew is to use the monotonically_increasing_id
function in Spark with the modulus operator (sketched below). As for the
fancy way, I am sure someone in the world is working on it for mere mortals
like me :)
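
A minimal sketch of the salting idea, for what it's worth (the example data,
salt count, and output path are just placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, monotonically_increasing_id}

val spark = SparkSession.builder().appName("skew-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data matching the schema from the question:
// struct<a:string,b:string,c:string>.
val df = Seq(("a1", "b1", "c1"), ("a2", "b2", "c1"), ("a3", "b3", "c2"))
  .toDF("a", "b", "c")

// Derive a salt from monotonically_increasing_id and a modulus, then
// repartition on (c, salt) so one hot value of c is spread across several
// tasks instead of landing in a single one.
val numSalts = 10                                  // assumed bucket count
df.withColumn("salt", monotonically_increasing_id() % numSalts)
  .repartition(col("c"), col("salt"))
  .write
  .mode("overwrite")
  .partitionBy("c")
  .parquet("/tmp/skew_sketch_out")                 // placeholder path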


Regards,
Gourav Sengupta





On Wed, May 8, 2019 at 1:41 PM Shubham Chaurasia <shubh.chaurasia@gmail.com>
wrote:

> Thanks
>
> On Wed, May 8, 2019 at 10:36 AM Felix Cheung <felixcheung_m@hotmail.com>
> wrote:
>
>> You could
>>
>> df.filter(col("c") === "c1").write.partitionBy("c").save
>>
>> It could run into some data skew problems, but it might work for you
>>
>>
>>
>> ------------------------------
>> *From:* Burak Yavuz <brkyvz@gmail.com>
>> *Sent:* Tuesday, May 7, 2019 9:35:10 AM
>> *To:* Shubham Chaurasia
>> *Cc:* dev; user@spark.apache.org
>> *Subject:* Re: Static partitioning in partitionBy()
>>
>> It depends on the data source. Delta Lake (https://delta.io) allows you
>> to do it with the .option("replaceWhere", "c = c1"). With other file
>> formats, you can write directly into the partition directory
>> (tablePath/c=c1), but you lose atomicity.
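>>
>> A rough sketch of both options, assuming a DataFrame df with a string
>> column c (the paths and the literal 'c1' are placeholders):
>>
>> import org.apache.spark.sql.functions.col
>>
>> // Delta Lake: atomically overwrite only the rows matching the predicate.
>> df.filter(col("c") === "c1")
>>   .write
>>   .format("delta")
>>   .mode("overwrite")
>>   .option("replaceWhere", "c = 'c1'")
>>   .save("/tmp/delta/table")                 // placeholder table path
>>
>> // Other formats: write straight into the partition directory. The
>> // partition value is carried by the directory name, so drop the column
>> // first; this overwrite is not atomic.
>> df.filter(col("c") === "c1")
>>   .drop("c")
>>   .write
>>   .mode("overwrite")
>>   .parquet("/tmp/table/c=c1")               // placeholder partition path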
>>
>> On Tue, May 7, 2019, 6:36 AM Shubham Chaurasia <shubh.chaurasia@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> Is there a way I can provide static partitions in partitionBy()?
>>>
>>> Like:
>>>
>>> df.write.mode("overwrite").format("MyDataSource").partitionBy("c=c1").save
>>>
>>> The above code gives the following error, as it tries to find the column `c=c1` in df.
>>>
>>> org.apache.spark.sql.AnalysisException: Partition column `c=c1` not
>>> found in schema struct<a:string,b:string,c:string>;
>>>
>>> Thanks,
>>> Shubham
>>>
>>
