spark-dev mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?
Date Mon, 01 May 2017 17:40:40 GMT
Hi,

Thanks Cody and Michael! I didn't expect to get two answers so quickly and
from THE brains behind the Spark-Kafka integration. #impressed

Yes, Michael has nailed it. Using save's path felt so natural to me after
months with Spark that I was surprised not to find it supported instead of
the custom and surely not very obvious topic option.

Imagine my day today when I discovered that I could use KafkaSource in
batch queries and then suddenly found out that save does not support path.
I'm not faint-hearted so I survived :-)

I think that change would make KafkaSource even cooler. Please add support
if possible (and make it part of the upcoming 2.2.0, too!)
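A minimal sketch of the topic lookup order proposed in this thread, in plain Scala with no Spark dependency: an explicit topic option first, then the per-row topic column, and only then the "path" value that save(path) or start(path) fills in. The names TopicResolution and resolveTopic are hypothetical and are not part of KafkaSourceProvider; putting the option ahead of the column is an assumption, as the thread only fixes path as the lowest priority.

```scala
// Hypothetical sketch of the proposed topic precedence; NOT Spark's actual code.
object TopicResolution {

  /** Picks the Kafka topic for a single row.
    *
    * @param options  writer options; assumes keys are already lower-case
    *                 (save(path) / start(path) store the path under "path")
    * @param rowTopic value of the row's "topic" column, if present and non-null
    * @return the topic to write to, or None when nothing names one
    */
  def resolveTopic(options: Map[String, String], rowTopic: Option[String]): Option[String] =
    options.get("topic")            // 1. explicit "topic" option (assumed highest)
      .orElse(rowTopic)             // 2. per-row "topic" column
      .orElse(options.get("path"))  // 3. proposed fallback: the save/start path
}
```

Under this order, Michael's df.writeStream.format("kafka").start("topic") would write to "topic" unless a topic option or a topic column says otherwise; a None result is where the sink would still have to report a missing topic.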

Thanks.

Jacek

On 1 May 2017 7:26 p.m., "Michael Armbrust" <michael@databricks.com> wrote:

> He's just suggesting that since the DataStreamWriter start() method can
> fill in an option named "path", we should make that a synonym for "topic".
> Then you could do something like:
>
> df.writeStream.format("kafka").start("topic")
>
> Seems reasonable if people don't think that is confusing.
>
> On Mon, May 1, 2017 at 8:43 AM, Cody Koeninger <cody@koeninger.org> wrote:
>
>> I'm confused about what you're suggesting.  Are you saying that a
>> Kafka sink should take a filesystem path as an option?
>>
>> On Mon, May 1, 2017 at 8:52 AM, Jacek Laskowski <jacek@japila.pl> wrote:
>> > Hi,
>> >
>> > I've just found out that KafkaSourceProvider supports a topic option
>> > that sets the Kafka topic to save a DataFrame to.
>> >
>> > You can also use a topic column to assign rows to topics.
>> >
>> > Given these features, I've been wondering why the "path" option is not
>> > supported (even with the lowest precedence), so that when neither a topic
>> > column nor a topic option is defined, save(path: String) would be used
>> > as the fallback.
>> >
>> > WDYT?
>> >
>> > It looks pretty trivial to support (see KafkaSourceProvider at
>> > lines [1] and [2], if I'm not mistaken).
>> >
>> > [1] https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L145
>> > [2] https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L163
>> >
>> > Regards,
>> > Jacek Laskowski
>> > ----
>> > https://medium.com/@jaceklaskowski/
>> > Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
>> > Follow me at https://twitter.com/jaceklaskowski
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>> >
>>
>
