spark-dev mailing list archives

From Joseph Torres <joseph.tor...@databricks.com>
Subject Re: Partitions at DataSource API V2
Date Wed, 13 Mar 2019 16:20:15 GMT
The reader necessarily knows the number of partitions, since it's
responsible for generating its output partitions in the first place. I
won't speak for everyone, but it would make sense to me to pass in a
Partitioning instance to the writer, since it's already part of the v2
interface through the reader's SupportsReportPartitioning.
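To make the idea concrete, here is a minimal sketch of handing the reader's reported partitioning to the writer before any data is written. None of these types are the actual Spark interfaces: SketchPartitioning, SketchReader, SketchTopicWriter and the prepare() hook are hypothetical names, and only the partition count is modeled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the v2 Partitioning interface; only the
// partition count is modeled here.
interface SketchPartitioning {
    int numPartitions();
}

// Hypothetical reader: it builds the partitions, so it can report them.
final class SketchReader {
    private final List<List<String>> partitions;

    SketchReader(List<List<String>> partitions) {
        this.partitions = partitions;
    }

    SketchPartitioning outputPartitioning() {
        // One abstract method, so a lambda is enough for the sketch.
        return () -> partitions.size();
    }
}

// Hypothetical writer: receives the input partitioning before any data is
// written, which is the hook a topic-creating connector would need.
final class SketchTopicWriter {
    private final String topic;
    private int topicPartitions = -1; // -1 means "topic not created yet"

    SketchTopicWriter(String topic) {
        this.topic = topic;
    }

    void prepare(SketchPartitioning inputPartitioning) {
        // A real connector would call the Kafka admin API here (not shown).
        topicPartitions = inputPartitioning.numPartitions();
    }

    int topicPartitions() {
        return topicPartitions;
    }

    String describe() {
        return "topic '" + topic + "' with " + topicPartitions + " partitions";
    }
}

final class PartitioningSketch {
    public static void main(String[] args) {
        SketchReader reader = new SketchReader(Arrays.asList(
            Arrays.asList("a", "b"), Arrays.asList("c"), Arrays.asList("d")));
        SketchTopicWriter writer = new SketchTopicWriter("events");
        writer.prepare(reader.outputPartitioning());
        System.out.println(writer.describe());
    }
}
```

The point of the sketch is only the ordering: the writer sees the partitioning once, up front, rather than discovering it partition by partition.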

I don't think we can expose execution plans to the data source v2
interface; the exact Java structure of execution plans isn't stable across
even maintenance releases. Even if we could, I don't really see what the
use case would be - what information does the writer need that can't be
made available through either the input data or the input partitioning?
(The built-in Kafka sink, for example, handles metadata such as topic
switching by just accepting topic name as a column along with the data.)
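As a rough illustration of the "metadata as a column" approach, the sketch below routes each row by a topic field carried alongside the value. It is not the Kafka sink's code: TopicRow and TopicColumnSink are hypothetical names, and an in-memory map stands in for the producer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row: the destination topic travels with the data as a column.
final class TopicRow {
    final String topic;
    final String value;

    TopicRow(String topic, String value) {
        this.topic = topic;
        this.value = value;
    }
}

// Hypothetical sink: no topic is configured on the sink itself; each row
// says where it goes, so switching topics needs no extra writer metadata.
final class TopicColumnSink {
    // An in-memory map stands in for sending records to Kafka.
    private final Map<String, List<String>> sent = new LinkedHashMap<>();

    void write(TopicRow row) {
        sent.computeIfAbsent(row.topic, t -> new ArrayList<>()).add(row.value);
    }

    Map<String, List<String>> sent() {
        return sent;
    }
}

final class TopicColumnSketch {
    public static void main(String[] args) {
        TopicColumnSink sink = new TopicColumnSink();
        sink.write(new TopicRow("orders", "o-1"));
        sink.write(new TopicRow("clicks", "c-1"));
        sink.write(new TopicRow("orders", "o-2"));
        System.out.println(sink.sent());
    }
}
```

The same pattern could carry other per-row metadata, as long as it can be computed from the input data before the write.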

On Wed, Mar 13, 2019 at 1:39 AM JOAQUIN GUANTER GONZALBEZ <
joaquin.guantergonzalbez@telefonica.com> wrote:

> I'd like to bump this. I agree with Carlos that there is very little
> information at the DataSourceWriter/DataSourceReader level. To me, ideally,
> the DataSourceWriter/Reader should have as much information as possible.
> Not only the number of partitions, but also ideally the whole execution
> plan.
>
> This would not only enable things like automatic creation of Kafka topics
> with the correct number of partitions (like Carlos mentioned), but it would
> also allow advanced DataSources that, for example, analyze the execution
> plan to choose the correct parameters to implement differential privacy.
>
> CC'ing Ryan, since he is leading the DataSourceV2 workgroup (sorry I
> can't join the sync meetings, but I'm in CET and the timing logistics
> of that meeting don't work for Europe).
>
> Ryan, do you think it would be a good idea to provide extra information at
> the DataSourceWriter/Reader level to enable more advanced datasources?
> Would a PR contribution with these changes be a welcome addition?
>
> Thanks,
> Ximo
>
> -----Original Message-----
> From: CARLOS DEL PRADO MOTA <carlos.delpradomota@telefonica.com>
> Sent: Thursday, March 7, 2019 10:19
> To: dev@spark.apache.org
> Subject: Partitions at DataSource API V2
>
> Hello, I’m Carlos del Prado, a developer at Telefonica.
>
> We are working with Spark's DataSource API V2 building a custom Kafka
> connector that creates the topic upon write. In order to do that, we need
> to know the number of partitions before writing data in each partition, at
> the DataSourceWriter level.
>
> Is there any way for us to do that?
>
> Kind regards,
> Carlos.
