spark-user mailing list archives

From KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
Subject Re: Spark Structured streaming - Kafka - slowness with query 0
Date Wed, 21 Oct 2020 09:35:42 GMT
Thanks. Do we have an option to limit the number of records per batch, e.g. process only 10000, or a property we can pass? That way we can control the amount of data in each batch.
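For reference, a hedged sketch of what such a cap could look like with the Kafka source's maxOffsetsPerTrigger option (the broker address and topic name below are placeholders, not from this thread):

```python
# Sketch: rate-limiting a Structured Streaming Kafka source.
# "maxOffsetsPerTrigger" caps how many offsets each micro-batch consumes,
# so a large backlog is processed in bounded chunks instead of one huge batch.
kafka_options = {
    "kafka.bootstrap.servers": "broker:9092",  # placeholder address
    "subscribe": "events",                     # placeholder topic
    "maxOffsetsPerTrigger": "10000",           # at most ~10000 records per micro-batch
}

# With a live SparkSession and broker, the reader would be built roughly as:
# df = spark.readStream.format("kafka").options(**kafka_options).load()
print(kafka_options["maxOffsetsPerTrigger"])
```

Note this bounds batch size, not total throughput: the backlog still gets processed, just spread across more micro-batches.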

Sent from my iPhone

> On Oct 21, 2020, at 12:11 AM, lec ssmi <shicheng31604@gmail.com> wrote:
> 
> 
>     Structured Streaming's bottom layer also uses a micro-batch mechanism. It seems that the first batch is slower than the later ones; I often run into this problem too. It seems related to how the batches are divided.
>     On the other hand, Spark's batch size is usually bigger than Flume's transaction batch size.
> 
> 
> KhajaAsmath Mohammed <mdkhajaasmath@gmail.com> wrote on Wed, Oct 21, 2020 at 12:19 PM:
>> Yes. Changing back to "latest" worked, but I still see slowness compared to Flume.

>> 
>> Sent from my iPhone
>> 
>>> On Oct 20, 2020, at 10:21 PM, lec ssmi <shicheng31604@gmail.com> wrote:
>>> 
>>> Did you start your application by reading the earliest Kafka data, i.e. catching up on the backlog?
>>> 
>>> Lalwani, Jayesh <jlalwani@amazon.com.invalid> wrote on Wed, Oct 21, 2020 at 2:19 AM:
>>>> Are you getting any output? Streaming jobs typically run forever and keep processing data as it arrives. If a streaming job is working well, it will generate output at a regular cadence.
>>>> 
>>>> From: KhajaAsmath Mohammed <mdkhajaasmath@gmail.com>
>>>> Date: Tuesday, October 20, 2020 at 1:23 PM
>>>> To: "user @spark" <user@spark.apache.org>
>>>> Subject: [EXTERNAL] Spark Structured streaming - Kafka - slowness with query 0
>>>> 
>>>> Hi,
>>>> 
>>>> I have started using Spark Structured Streaming to read data from Kafka, and the job is very slow. The number of output rows keeps increasing in query 0 and the job runs forever. Any suggestions, please?
>>>> 
>>>> <image001.png>
>>>> 
>>>> Thanks,
>>>> 
>>>> Asmath
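The slow "query 0" discussed upthread is consistent with the source replaying the whole topic backlog. A minimal sketch of the startingOffsets setting the thread refers to (broker address and topic name are placeholders):

```python
# Sketch: choosing where a Structured Streaming Kafka source begins reading.
# "earliest" replays the full topic backlog (slow first batches, as seen in
# this thread); "latest" starts from new data only.
reader_options = {
    "kafka.bootstrap.servers": "broker:9092",  # placeholder address
    "subscribe": "events",                     # placeholder topic
    "startingOffsets": "latest",               # the fix mentioned upthread
}

# With a live SparkSession and broker, the reader would be built roughly as:
# df = spark.readStream.format("kafka").options(**reader_options).load()
print(reader_options["startingOffsets"])
```

Note that startingOffsets only applies on the first run of a query; after that, the checkpointed offsets take precedence.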
