spark-user mailing list archives

From Jungtaek Lim <kabhwan.opensou...@gmail.com>
Subject Re: SparkStreaming logical plan leaf nodes are not equal to physical plan leaf nodes, so streaming metrics cannot be reported.
Date Wed, 23 Oct 2019 22:10:56 GMT
Sorry, I hadn't checked the details of SPARK-24050. It looks like it was only
resolved for DSv2 sources, and some streaming sources still use DSv1.
The file stream source is one of those cases, so SPARK-24050 may not help here.
I guess there was a technical reason to deal only with DSv2, so I'm not sure
there's a good way to handle this.

The file stream source seems to be getting migrated to DSv2 in Spark 3.0, so
hopefully Spark 3.0 will help solve the problem.

On Wed, Oct 23, 2019 at 11:21 PM Reminia Scarlet <reminia.scarlet@gmail.com>
wrote:

> @Jungtaek
> I'm using Spark 2.4 (HDI 4.0) in Azure.
> Maybe there are other corner cases that weren't taken into consideration.
> I will also decompile the Spark jar from Azure to check the source code.
>
> On Wed, Oct 23, 2019 at 9:39 PM Jungtaek Lim <kabhwan.opensource@gmail.com>
> wrote:
>
>> Which version of Spark are you using?
>> I think there was a relevant issue, SPARK-24050 [1], which was fixed in Spark
>> 2.4.0, so if you are on a lower version you may want to try the latest
>> version.
>>
>> - Jungtaek Lim (HeartSaVioR)
>>
>> 1. https://issues.apache.org/jira/browse/SPARK-24050
>>
>> On Wed, Oct 23, 2019 at 9:57 PM Reminia Scarlet <
>> reminia.scarlet@gmail.com> wrote:
>>
>>> Hi all:
>>> I use a StreamingQueryListener to report each batch's input record count as a metric,
>>> but numInputRows is always 0. The log from
>>> MicroBatchExecution.scala says:
>>>
>>>  2019-10-23 06:56:05 WARN  MicroBatchExecution:66 - Could not report metrics
>>> as number leaves in trigger logical plan did not match that of the execution plan:
>>>
>>> Because of this, the code below in ProgressReporter.scala always reports 0 input
>>> rows per source whenever the number of leaves in the logical plan and the execution
>>> plan don't match.
>>>
>>> [image: ProgressReporter.scala code]
>>> I've attached the logical plan and physical plan leaves. I think there
>>> might be a bug: it seems LogicalRDD duplicates the Relation in the logical plan
>>> and is counted as a second leaf. If we remove the LogicalRDD, the leaf counts
>>> should be the same.
>>>
>>> [images: logical plan leaves and physical plan leaves]
>>>
>>> Can anyone help? Thx very much.
>>>
>>>
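A simplified Python model of the fallback discussed in this thread may help. It assumes, as the warning message implies, that for DSv1 sources such as the file stream source, input-row metrics are attributed by pairing logical-plan leaves with execution-plan leaves by position, and that nothing is reported when the two leaf counts differ. This is not Spark's code. `rows_per_source`, the leaf names, and the source name are hypothetical. It shows why one extra logical leaf, like the duplicated LogicalRDD described above, makes the listener see 0 input rows.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical execution-plan leaf: (node name, rows it read).
# The row count stands in for a per-leaf output-row metric.
ExecLeaf = Tuple[str, int]


def rows_per_source(
    logical_leaves: List[str],
    exec_leaves: List[ExecLeaf],
    leaf_to_source: Dict[str, str],
) -> Dict[str, int]:
    """Toy model (assumption, not Spark's implementation) of best-effort
    attribution of input rows to streaming sources.

    Logical and execution leaves are assumed to appear in the same order,
    so they are paired positionally. If the counts differ, no pairing is
    trusted and an empty map is returned, which sums to 0 input rows.
    """
    if len(logical_leaves) != len(exec_leaves):
        # Corresponds to the "Could not report metrics ..." warning case.
        return {}
    totals: Dict[str, int] = defaultdict(int)
    for logical, (_exec_name, num_rows) in zip(logical_leaves, exec_leaves):
        source = leaf_to_source.get(logical)
        if source is not None:  # leaves from batch data map to no source
            totals[source] += num_rows
    return dict(totals)


source_of = {"Relation": "FileStreamSource"}
# Matching leaf counts: rows are attributed to the source.
print(rows_per_source(["Relation"], [("FileScan", 100)], source_of))
# One extra logical leaf: nothing is attributed, so the total is 0.
print(sum(rows_per_source(["Relation", "LogicalRDD"], [("FileScan", 100)], source_of).values()))
```

The positional pairing is what makes this fragile: any rewrite that adds or removes a leaf on only one side of the plan silently disables the metric instead of producing a wrong number.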
