spark-dev mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?
Date Mon, 01 May 2017 18:26:20 GMT
Yeah, seems reasonable.

On Mon, May 1, 2017 at 12:40 PM, Jacek Laskowski <jacek@japila.pl> wrote:
> Hi,
>
> Thanks Cody and Michael! I didn't expect to get two answers so quickly, and
> from THE brains behind the Spark-Kafka integration. #impressed
>
> Yes, Michael has nailed it. After months with Spark, using save's path felt
> so natural to me that I was surprised to find it unsupported in favor of the
> custom and surely not very obvious topic option.
>
> Imagine my day today when I discovered that I could use KafkaSource in
> batch queries and then suddenly found out that save does not support path.
> I'm not faint-hearted so I survived :-)
>
> I think that change would make KafkaSource even cooler. Please add support
> if possible (and make it part of the upcoming 2.2.0, too!)
>
> Thanks.
>
> Jacek
>
> On 1 May 2017 7:26 p.m., "Michael Armbrust" <michael@databricks.com> wrote:
>>
>> He's just suggesting that since the DataStreamWriter start() method can
>> fill in an option named "path", we should make that a synonym for "topic".
>> Then you could do something like:
>>
>> df.writeStream.format("kafka").start("topic")
>>
>> Seems reasonable if people don't think that is confusing.
>>
>> On Mon, May 1, 2017 at 8:43 AM, Cody Koeninger <cody@koeninger.org> wrote:
>>>
>>> I'm confused about what you're suggesting.  Are you saying that a
>>> Kafka sink should take a filesystem path as an option?
>>>
>>> On Mon, May 1, 2017 at 8:52 AM, Jacek Laskowski <jacek@japila.pl> wrote:
>>> > Hi,
>>> >
>>> > I've just found out that KafkaSourceProvider supports a topic option
>>> > that sets the Kafka topic to save a DataFrame to.
>>> >
>>> > You can also use a topic column to assign rows to topics.
>>> >
>>> > Given these features, I've been wondering why the "path" option is not
>>> > supported, even with the lowest precedence, so that when neither a topic
>>> > column nor a topic option is defined, save(path: String) would supply the
>>> > topic.
>>> >
>>> > WDYT?
>>> >
>>> > It looks pretty trivial to support --> see KafkaSourceProvider at
>>> > lines [1] and [2] if I'm not mistaken.
>>> >
>>> > [1]
>>> > https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L145
>>> > [2]
>>> > https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L163
>>> >
>>> > Regards,
>>> > Jacek Laskowski
>>> > ----
>>> > https://medium.com/@jaceklaskowski/
>>> > Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
>>> > Follow me at https://twitter.com/jaceklaskowski
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>> >
>>>
>>
>
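
The fallback discussed in this thread can be sketched in plain Scala, without any Spark dependency. This is only an illustration under stated assumptions, not the KafkaSourceProvider code: `TopicResolution` and `resolveTopic` are hypothetical names, and the ordering between the topic option and the topic column follows my understanding of the Kafka integration guide (the option overrides the column). The thread itself only fixes that "path" would come last.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object TopicResolution {

  /** Picks the Kafka topic for one row.
    *
    * Assumed precedence:
    *   1. the "topic" writer option (assumed to override the column),
    *   2. the row's own topic column value,
    *   3. the "path" option, as filled in by start(path) or save(path).
    */
  def resolveTopic(
      rowTopic: Option[String],     // value of the topic column for this row, if any
      options: Map[String, String]  // writer options
  ): Either[String, String] =
    options.get("topic")
      .orElse(rowTopic)
      .orElse(options.get("path"))
      .toRight("no topic: set the 'topic' option, a topic column, or a path")
}
```

With this ordering, a query that only calls start("events") or save("events") resolves to the topic "events", while an explicit topic option or column keeps behaving as before.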