spark-user mailing list archives

From Junfeng Chen <darou...@gmail.com>
Subject Re: How to delete empty columns in df when writing to parquet?
Date Sun, 08 Apr 2018 02:28:57 GMT
Hi,
Thanks for explaining!


Regards,
Junfeng Chen

On Wed, Apr 4, 2018 at 7:43 PM, Gourav Sengupta <gourav.sengupta@gmail.com>
wrote:

> Hi,
>
> I do not think it makes much of a difference in a columnar format like
> Parquet. The amount of data that you will be parsing will not be much anyway.
>
> Regards,
> Gourav Sengupta
>
> On Wed, Apr 4, 2018 at 11:02 AM, Junfeng Chen <darouwan@gmail.com> wrote:
>
>> Our users ask for it....
>>
>>
>> Regards,
>> Junfeng Chen
>>
>> On Wed, Apr 4, 2018 at 5:45 PM, Gourav Sengupta <
>> gourav.sengupta@gmail.com> wrote:
>>
>>> Hi Junfeng,
>>>
>>> Can I ask why it is important to remove the empty column?
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>> On Tue, Apr 3, 2018 at 4:28 AM, Junfeng Chen <darouwan@gmail.com> wrote:
>>>
>>>> I am trying to read data from Kafka and write it in Parquet format
>>>> via Spark Streaming.
>>>> The problem is that the data from Kafka have a variable structure. For
>>>> example, app one has columns A, B, C, while app two has columns B, C, D. So
>>>> the dataframe I read from Kafka has all columns A, B, C, D. When I write the
>>>> dataframe to Parquet files partitioned by app name,
>>>> the Parquet files for app one also contain column D, even though column
>>>> D is empty and actually contains no data. So how can I filter out the empty
>>>> columns when writing the dataframe to Parquet?
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> Regards,
>>>> Junfeng Chen
>>>>
>>>
>>>
>>
>
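
For reference, a minimal sketch of one way to handle this: drop the all-null columns
per app before writing, and write each app to its own directory so the schemas can
differ. The "app" column name, the output path layout, and the helper itself are
illustrative assumptions, not code from this thread; with Structured Streaming the
same logic could run inside a foreachBatch sink.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count}

// Sketch: for each app, drop columns that contain no data, then write
// that app's rows to its own parquet directory. Column and path names
// are assumptions for illustration.
def writePerApp(df: DataFrame, outputPath: String): Unit = {
  // Collect the distinct app names; assumes the set of apps is small.
  val apps = df.select("app").distinct().collect().map(_.getString(0))

  apps.foreach { app =>
    val appDf = df.filter(col("app") === app)

    // count(col) counts only non-null values, so this finds the empty
    // columns in a single pass over this app's rows.
    val nonNullCounts = appDf.select(
      appDf.columns.map(c => count(col(c)).alias(c)): _*
    ).first()
    val emptyCols = appDf.columns.filter(c => nonNullCounts.getAs[Long](c) == 0L)

    // Writing each app separately lets each directory keep its own schema,
    // which partitionBy on the combined DataFrame would not allow.
    appDf.drop(emptyCols: _*)
      .write
      .mode("append")
      .parquet(s"$outputPath/appname=$app")
  }
}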
