spark-user mailing list archives

From Gaspar Muñoz <gmu...@datiobd.com>
Subject Re: spark-avro aliases incompatible
Date Tue, 07 Nov 2017 19:23:20 GMT
In the doc you referred to:

import com.databricks.spark.avro._

// The Avro records get converted to Spark types, filtered, and
// then written back out as Avro records
val df = spark.read.avro("/tmp/episodes.avro")
df.filter("doctor > 5").write.avro("/tmp/output")

Alternatively you can specify the format to use instead:

val df = spark.read
    .format("com.databricks.spark.avro")
    .load("/tmp/episodes.avro")

As far as I know, spark-avro is not built into Spark 2.x. Anyway, that is not
the problem, because the same Databricks doc also says: *"At the moment, it
ignores docs, aliases and other properties present in the Avro file."*
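Since the library ignores aliases, one possible workaround is to apply them by hand after reading. Below is a minimal sketch under stated assumptions: aliasRenames is a hypothetical helper (not part of spark-avro or Spark), the alias map is something you maintain yourself from the field aliases in your external schema, and older files are assumed to be read with their own schema so the old column names survive. The helper only works out which columns need renaming; it does not touch a DataFrame.

```scala
// Hypothetical helper, not part of spark-avro: since spark-avro ignores Avro
// aliases, work out by hand which columns must be renamed to the current names.
// `aliases` maps each current field name to its older names (its Avro aliases).
def aliasRenames(columns: Seq[String],
                 aliases: Map[String, Seq[String]]): Seq[(String, String)] =
  aliases.toSeq.flatMap { case (current, oldNames) =>
    if (columns.contains(current)) None // data already uses the current name
    // otherwise pick the first old name present and emit an (old -> current) pair
    else oldNames.find(name => columns.contains(name)).map(old => old -> current)
  }
```

Each returned (old, current) pair could then be applied with DataFrame.withColumnRenamed, for example by folding over the pairs, before unioning old and new data. Note that the name check above is case-sensitive, whereas Spark resolves column names case-insensitively by default.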

Regards.


2017-11-06 22:29 GMT+01:00 Gourav Sengupta <gourav.sengupta@gmail.com>:

> Hi,
>
> I may be wrong about this, but when you are using format("....") you are
> basically using old SPARK classes, which still exist because of backward
> compatibility.
>
> Please refer to the following documentation to take advantage of the
> recent changes in SPARK:
> https://docs.databricks.com/spark/latest/data-sources/read-avro.html
>
> Kindly let us know how things go.
>
> Regards,
> Gourav Sengupta
>
> On Mon, Nov 6, 2017 at 8:04 PM, Gaspar Muñoz <gmunoz@datiobd.com> wrote:
>
>> Of course,
>>
>> right now I'm testing locally with Spark 2.2.0 and spark-avro 4.0.0.
>> I've just uploaded a snippet:
>> https://gist.github.com/gasparms/5d0740bd61a500357e0230756be963e1
>>
>> Basically, my Avro schema has a field with an alias, and in the last part
>> of the code spark-avro is not able to read the old data (written under the
>> old field name) through the alias.
>>
>> The spark-avro README says that this is not supported, so I am asking
>> whether any of you has a workaround, or how you manage schema evolution.
>>
>> Regards.
>>
>> 2017-11-05 20:13 GMT+01:00 Gourav Sengupta <gourav.sengupta@gmail.com>:
>>
>>> Hi Gaspar,
>>>
>>> can you please provide the details regarding the environment, versions,
>>> libraries and code snippets?
>>>
>>> For example: Spark version, OS, distribution, whether you are running on
>>> YARN, etc., and any other relevant details.
>>>
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>> On Sun, Nov 5, 2017 at 9:03 AM, Gaspar Muñoz <gmunoz@datiobd.com> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I use the Avro format to store historical data because of Avro schema
>>>> evolution. I manage external schemas and read them using the avroSchema
>>>> option, so we have been able to add and delete columns.
>>>>
>>>> The problem appeared when I introduced aliases: the Spark process didn't
>>>> work as expected, and then I read in the spark-avro README: "At the
>>>> moment, it ignores docs, aliases and other properties present in the Avro file".
>>>>
>>>> How do you manage aliases and column renaming? Is there any workaround?
>>>>
>>>> Thanks in advance.
>>>>
>>>> Regards
>>>>
>>>> --
>>>> Gaspar Muñoz Soria
>>>>
>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>> 28224 Pozuelo de Alarcón, Madrid
>>>> Tel: +34 91 828 6473
>>>>
>>>
>>>
>>
>>
>>
>
>


