spark-user mailing list archives

From "Yuri Oleynikov (‫יורי אולייניקוב‬‎)" <yur...@gmail.com>
Subject Re: Spark Structured streaming - Kafka - slowness with query 0
Date Wed, 21 Oct 2020 16:38:49 GMT
I think maxOffsetsPerTrigger in the Spark + Kafka integration docs would meet your requirement.
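A minimal sketch of how that option is set, assuming a hypothetical topic "events", broker "broker1:9092", and checkpoint path (the cap applies per trigger, spread across the topic's partitions):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("kafka-rate-limited-read")
      .getOrCreate()

    // Cap each micro-batch at roughly 10000 offsets so a backlog
    // cannot turn the first batches into one huge catch-up job.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")   // placeholder broker
      .option("subscribe", "events")                       // placeholder topic
      .option("maxOffsetsPerTrigger", 10000L)
      .load()

    val query = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/events")  // placeholder path
      .start()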

Sent from my iPhone

> On Oct 21, 2020, at 12:36, KhajaAsmath Mohammed <mdkhajaasmath@gmail.com> wrote:
> 
> Thanks. Do we have an option to limit the number of records, e.g. process only 10000, or
> a property we can pass? That way we can control the amount of data in each batch as
> we need.
> 
> Sent from my iPhone
> 
>>> On Oct 21, 2020, at 12:11 AM, lec ssmi <shicheng31604@gmail.com> wrote:
>>> 
>> 
>>     Structured Streaming's bottom layer also uses a micro-batch mechanism. It seems
>> that the first batch is slower than the later ones; I also often encounter this problem. It feels
>> related to how the batches are divided.
>>    On the other hand, Spark's batch size is usually bigger than a Flume transaction
>> batch size.
>> 
>> 
>> KhajaAsmath Mohammed <mdkhajaasmath@gmail.com> wrote on Wed, Oct 21, 2020 at 12:19 PM:
>>> Yes. Changing back to latest worked, but I still see slowness compared to
>>> Flume.
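"Changing back to latest" presumably refers to the Kafka source's startingOffsets option; a minimal sketch of the two settings being compared (broker and topic names are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("starting-offsets-sketch").getOrCreate()

    // "latest" (the default for streaming queries): only records arriving
    // after the query starts are processed, so the first batch stays small.
    val fromLatest = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .option("startingOffsets", "latest")
      .load()

    // "earliest": the query replays all retained history first, which is
    // what can make the initial batches look slow.
    val fromEarliest = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .option("startingOffsets", "earliest")
      .load()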
>>> 
>>> Sent from my iPhone
>>> 
>>>>> On Oct 20, 2020, at 10:21 PM, lec ssmi <shicheng31604@gmail.com> wrote:
>>>>> 
>>>> 
>>>> Are you starting your application from the earliest Kafka offsets, so that it has to catch up on old data?
>>>> 
>>>> Lalwani, Jayesh <jlalwani@amazon.com.invalid> wrote on Wed, Oct 21, 2020 at 2:19 AM:
>>>>> Are you getting any output? Streaming jobs typically run forever, and
>>>>> keep processing data as it arrives. If a streaming job is working well, it will
>>>>> typically generate output at a certain cadence.
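A small sketch of one way to check that cadence, by polling the running query's progress metrics (assuming `query` is the handle returned by writeStream.start()):

    import org.apache.spark.sql.streaming.StreamingQuery

    // Print the most recent micro-batch's throughput; lastProgress is null
    // until the first batch has completed.
    def logProgress(query: StreamingQuery): Unit = {
      val p = query.lastProgress
      if (p != null) {
        println(s"batch=${p.batchId} inputRows=${p.numInputRows} " +
          s"rowsPerSec=${p.processedRowsPerSecond}")
      }
    }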
>>>>> 
>>>>>  
>>>>> 
>>>>> From: KhajaAsmath Mohammed <mdkhajaasmath@gmail.com>
>>>>> Date: Tuesday, October 20, 2020 at 1:23 PM
>>>>> To: "user @spark" <user@spark.apache.org>
>>>>> Subject: [EXTERNAL] Spark Structured streaming - Kafka - slowness with query 0
>>>>> 
>>>>>  
>>>>> 
>>>>> 
>>>>>  
>>>>> 
>>>>> Hi,
>>>>> 
>>>>>  
>>>>> 
>>>>> I have started using Spark Structured Streaming for reading data from
>>>>> Kafka and the job is very slow. The number of output rows keeps increasing in query 0, and the job
>>>>> runs forever. Any suggestions for this, please?
>>>>> 
>>>>>  
>>>>> 
>>>>> <image001.png>
>>>>>  
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Asmath
