spark-user mailing list archives

From Aakash Basu <aakash.spark....@gmail.com>
Subject Re: Multiple Kafka Spark Streaming Dataframe Join query
Date Wed, 14 Mar 2018 18:49:02 GMT
Hi,

I have not tried it yet.

I just want to know: when will Spark 2.3 with the Kafka 0.10 Spark package
support Python? I read somewhere that, as of now, only Scala and Java can be
used.

Please correct me if I am wrong.

Thanks,
Aakash.

On 14-Mar-2018 8:24 PM, "Georg Heiler" <georg.kf.heiler@gmail.com> wrote:

> Did you try Spark 2.3 with Structured Streaming? Its watermarking and
> plain SQL support might be really interesting for you.
> Aakash Basu <aakash.spark.raj@gmail.com> wrote on Wed, 14 Mar 2018 at
> 14:57:
>
>> Hi,
>>
>>
>>
>> *Info (using): Spark Streaming Kafka 0.8 package*
>>
>> *Spark 2.2.1*
>> *Kafka 1.0.1*
>>
>> As of now, I am feeding paragraphs into the Kafka console producer, and my
>> Spark job, which acts as the receiver, prints the flattened words; this
>> is purely an RDD operation.
>>
>> *My goal is to continuously read two tables (which are being updated) from
>> two distinct Kafka topics into two Spark DataFrames, join them on a key,
>> and produce the output.* (I come from a Spark SQL background, so
>> pardon my Spark-SQL-ish writing.)
>>
>> *The first topic may receive new data 15 minutes before the second topic
>> does. How should I proceed in that scenario? I must not lose any
>> data.*
>>
>> For now, I simply want to pass in paragraphs, read them as RDDs, convert
>> them to DataFrames, and then join them to get the common keys as the
>> output (just for R&D).
>>
>> I started using Spark Streaming and Kafka only today.
>>
>> Please help!
>>
>> Thanks,
>> Aakash.
>>
>
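
For reference, the join asked about in the quoted message could be sketched
roughly as follows with Structured Streaming, whose built-in Kafka source
(format "kafka") can be used from Python. This is only a sketch under stated
assumptions, not a tested pipeline: the helper names (kafka_source_options,
join_condition, read_topic, build_join), the column prefixes l_/r_, and the
20-minute lag are all made up for illustration, and the join key is assumed to
be the Kafka message key (if it lives inside the value, it would have to be
parsed out first).

```python
def kafka_source_options(bootstrap_servers, topic):
    """Options for Spark's built-in Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }


def join_condition(max_lag):
    """Equi-join on the key plus a time bound, so Spark can drop old state.

    max_lag is an interval string such as "20 minutes"; it should be larger
    than the expected gap between the two topics (15 minutes in the question).
    """
    return (
        "l_key = r_key AND "
        "r_ts >= l_ts - interval {lag} AND "
        "r_ts <= l_ts + interval {lag}".format(lag=max_lag)
    )


def read_topic(spark, bootstrap_servers, topic, prefix, watermark):
    """Read one topic as a streaming DataFrame with prefixed column names."""
    reader = spark.readStream.format("kafka")
    for name, value in kafka_source_options(bootstrap_servers, topic).items():
        reader = reader.option(name, value)
    return (
        reader.load()
        # Kafka delivers key and value as binary; cast them to strings.
        .selectExpr(
            "CAST(key AS STRING) AS {p}_key".format(p=prefix),
            "CAST(value AS STRING) AS {p}_value".format(p=prefix),
            "timestamp AS {p}_ts".format(p=prefix),
        )
        # Rows arriving later than the watermark allows may be dropped.
        .withWatermark("{p}_ts".format(p=prefix), watermark)
    )


def build_join(spark, bootstrap_servers, left_topic, right_topic,
               max_lag="20 minutes"):
    """Inner stream-stream join of two Kafka topics on the message key."""
    # Imported here so the helpers above can be used without PySpark installed.
    from pyspark.sql.functions import expr

    left = read_topic(spark, bootstrap_servers, left_topic, "l", max_lag)
    right = read_topic(spark, bootstrap_servers, right_topic, "r", max_lag)
    return left.join(right, expr(join_condition(max_lag)), "inner")
```

With an existing SparkSession named spark, the result could be inspected with
something like
build_join(spark, "localhost:9092", "topic_a", "topic_b").writeStream.format("console").outputMode("append").start(),
where the broker address and topic names are placeholders. The job would
presumably need the spark-sql-kafka-0-10 package on its classpath. Rows from
the earlier topic are buffered as state until the watermark passes, so the lag
should be chosen generously if no data may be lost.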
