spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: spark-avro aliases incompatible
Date Mon, 06 Nov 2017 21:29:23 GMT
Hi,

I may be wrong about this, but when you use format("....") you are
basically using old Spark classes, which still exist for backward
compatibility.

Please refer to the following documentation to take advantage of the recent
changes in Spark:
https://docs.databricks.com/spark/latest/data-sources/read-avro.html

Kindly let us know how things go.

Regards,
Gourav Sengupta

On Mon, Nov 6, 2017 at 8:04 PM, Gaspar Muñoz <gmunoz@datiobd.com> wrote:

> Of course,
>
> right now I'm testing locally with Spark 2.2.0 and spark-avro 4.0.0. I've
> just uploaded a snippet:
> https://gist.github.com/gasparms/5d0740bd61a500357e0230756be963e1
>
> Basically, my Avro schema has a field with an alias, and in the last part
> of the code spark-avro is not able to read old data written under the old
> name through the alias.
>
> The spark-avro README says this is not supported, so I am asking whether
> any of you has a workaround, or how you manage schema evolution.
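A minimal sketch of one possible workaround, assuming the reader schema is available as JSON and only top-level fields were renamed. Because spark-avro ignores aliases, the alias-to-name mapping can be resolved by hand from the schema (following the Avro rule that a reader field matches a writer field by name or by one of its aliases) and then applied to the old data as column renames. The schema contents and the helper name `alias_renames` are hypothetical, for illustration only.

```python
import json

# Hypothetical reader schema: "customer_id" was renamed from "client_id",
# and the old name is kept as an alias.
READER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "customer_id", "type": "string", "aliases": ["client_id"]},
    {"name": "amount", "type": "double"}
  ]
}
""")


def alias_renames(reader_schema, data_columns):
    """Hypothetical helper: map old column names to current field names.

    A reader field matches a column with the same name; otherwise it matches
    a column whose name appears in the field's "aliases" list.
    Only top-level fields are handled; nested records are ignored.
    """
    present = set(data_columns)
    renames = {}
    for field in reader_schema.get("fields", []):
        if field["name"] in present:
            continue  # column already uses the current name
        for alias in field.get("aliases", []):
            if alias in present:
                renames[alias] = field["name"]
                break  # first matching alias wins
    return renames


# Columns found in a file written before the rename.
print(alias_renames(READER_SCHEMA, ["client_id", "amount"]))  # {'client_id': 'customer_id'}
```

In PySpark, one could then read the old files without forcing the new avroSchema (assuming they still load that way) and apply each pair with `df = df.withColumnRenamed(old, new)` before combining them with newer data. This is a sketch under those assumptions, not a feature of spark-avro itself.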
>
> Regards.
>
> 2017-11-05 20:13 GMT+01:00 Gourav Sengupta <gourav.sengupta@gmail.com>:
>
>> Hi Gaspar,
>>
>> can you please provide the details regarding the environment, versions,
>> libraries and code snippets please?
>>
>> For example: SPARK version, OS, distribution, running on YARN, etc and
>> all other details.
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Sun, Nov 5, 2017 at 9:03 AM, Gaspar Muñoz <gmunoz@datiobd.com> wrote:
>>
>>> Hi there,
>>>
>>> I use the Avro format to store historical data because of Avro's schema
>>> evolution. I manage external schemas and read them using the avroSchema
>>> option, so we have been able to add and delete columns.
>>>
>>> The problem came when I introduced aliases: the Spark process didn't work
>>> as expected, and then I read in the spark-avro library documentation: "At
>>> the moment, it ignores docs, aliases and other properties present in the
>>> Avro file".
>>>
>>> How do you manage aliases and column renaming? Is there any workaround?
>>>
>>> Thanks in advance.
>>>
>>> Regards
>>>
>>> --
>>> Gaspar Muñoz Soria
>>>
>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>> 28224 Pozuelo de Alarcón, Madrid
>>> Tel: +34 91 828 6473
>>>
>>
>>
>
>
