spark-dev mailing list archives

Subject RE: Partitions at DataSource API V2
Date Wed, 13 Mar 2019 08:32:57 GMT
I'd like to bump this. I agree with Carlos that very little information is available at the
DataSourceWriter/DataSourceReader level. To me, the DataSourceWriter/Reader should ideally have as
much information as possible: not only the number of partitions, but ideally the whole execution plan.

This would not only enable things like automatic creation of kafka topics with the correct
number of partitions (like Carlos mentioned), but it would also allow advanced DataSources
that, for example, analyze the execution plan to choose the correct parameters to implement
differential privacy.

CC'ing Ryan, since he is leading the DataSourceV2 workgroup (sorry I can't join the sync
meetings, but I'm in CET and the timing of that meeting doesn't work for Europe).

Ryan, do you think it would be a good idea to provide extra information at the DataSourceWriter/Reader
level to enable more advanced data sources? Would a PR contribution with these changes be a
welcome addition?


-----Original Message-----
Sent: Thursday, March 7, 2019 10:19
Subject: Partitions at DataSource API V2

Hello, I’m Carlos del Prado, developer at Telefonica.

We are working with Spark's DataSource API V2 building a custom Kafka connector that creates
the topic upon write. In order to do that, we need to know the number of partitions before
writing data in each partition, at the DataSourceWriter level.
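A minimal sketch of the contract being asked for, written in Python purely as a model of the control flow: the driver-side writer is handed the partition count before any task-side writer is created, so it can create the topic with one Kafka partition per Spark partition. All names here (`WriteInfo`, `FakeTopicAdmin`, `TopicCreatingWriter`, `PartitionWriterFactory`, `PartitionWriter`, `run_write`) are hypothetical; they are not part of Spark's DataSource V2 API or of any Kafka client, and the admin is an in-memory stand-in.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = str
CommitMessage = Tuple[int, List[Record]]


@dataclass(frozen=True)
class WriteInfo:
    """Hypothetical driver-side context: what the job knows before any task runs."""
    num_partitions: int


@dataclass
class FakeTopicAdmin:
    """In-memory stand-in for a Kafka admin client (no real Kafka calls)."""
    topics: Dict[str, int] = field(default_factory=dict)

    def create_topic(self, name: str, partitions: int) -> None:
        if name in self.topics:
            raise ValueError(f"topic {name!r} already exists")
        self.topics[name] = partitions


class PartitionWriter:
    """Task-side writer: buffers the records of one Spark partition."""

    def __init__(self, topic: str, partition_id: int) -> None:
        self.topic = topic
        self.partition_id = partition_id  # also used as the Kafka partition number
        self.records: List[Record] = []

    def write(self, record: Record) -> None:
        self.records.append(record)

    def commit(self) -> CommitMessage:
        return (self.partition_id, list(self.records))


class PartitionWriterFactory:
    def __init__(self, topic: str, num_partitions: int) -> None:
        self.topic = topic
        self.num_partitions = num_partitions

    def create_data_writer(self, partition_id: int) -> PartitionWriter:
        # One Kafka partition per Spark partition, so the id must be in range.
        if not 0 <= partition_id < self.num_partitions:
            raise IndexError(f"partition {partition_id} out of range")
        return PartitionWriter(self.topic, partition_id)


class TopicCreatingWriter:
    """Driver-side writer that creates the topic before any data is written."""

    def __init__(self, admin: FakeTopicAdmin, topic: str) -> None:
        self.admin = admin
        self.topic = topic

    def create_writer_factory(self, info: WriteInfo) -> PartitionWriterFactory:
        # This is the step that needs the partition count up front.
        self.admin.create_topic(self.topic, info.num_partitions)
        return PartitionWriterFactory(self.topic, info.num_partitions)


def run_write(writer: TopicCreatingWriter,
              partitions: List[List[Record]]) -> List[CommitMessage]:
    """Simulates the engine: give the driver the partition count, then run each task."""
    factory = writer.create_writer_factory(WriteInfo(num_partitions=len(partitions)))
    messages = []
    for partition_id, rows in enumerate(partitions):
        data_writer = factory.create_data_writer(partition_id)
        for row in rows:
            data_writer.write(row)
        messages.append(data_writer.commit())
    return messages


admin = FakeTopicAdmin()
messages = run_write(TopicCreatingWriter(admin, "events"), [["a", "b"], ["c"], []])
print(admin.topics)  # {'events': 3}
print(messages)      # [(0, ['a', 'b']), (1, ['c']), (2, [])]
```

The key design point is ordering: topic creation happens once, on the driver, before any partition writer exists, which is only possible if the engine passes the partition count into the driver-side call. For comparison, later Spark releases (3.0 and up) pass a `PhysicalWriteInfo` to `BatchWrite.createBatchWriterFactory`, and its `numPartitions()` method reports this count.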

Is there any way for us to do that?

Kind regards,


The information contained in this transmission is privileged and confidential information
intended only for the use of the individual or entity named above. If the reader of this message
is not the intended recipient, you are hereby notified that any dissemination, distribution
or copying of this communication is strictly prohibited. If you have received this transmission
in error, do not read it. Please immediately reply to the sender that you have received this
communication in error and then delete it.