spark-user mailing list archives

From Aniket R More <Aniket.M...@bitwiseglobal.com>
Subject Records processed metric for intermediate datasets
Date Thu, 08 Dec 2016 13:18:56 GMT
Hi,


I have created a Spark job using the Dataset API. A chain of operations is performed until
the final result, which is written to HDFS.

But I also need to know how many records were read for each intermediate dataset. Let's say
I apply 5 operations to a dataset (map, groupBy, etc.); I need to know how many records
there were in each of the 5 intermediate datasets. Can anybody suggest how this can be
obtained at the dataset level? I guess I can find this out at the task level (using
listeners), but I am not sure how to get it at the dataset level.
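For what it's worth, one common approach to per-stage record counts is Spark's built-in accumulators, incremented inside each transformation. This is only a minimal sketch, not the listener-based approach mentioned above; the dataset names and the toy transformations are made up for illustration. Note the usual caveats: accumulator values are only populated after an action runs, and updates made inside transformations can be applied more than once if a task is retried, so the counts are best treated as approximate metrics.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: count records flowing through each intermediate Dataset
// using one LongAccumulator per stage. All names here are hypothetical.
object StageRecordCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("stage-record-counts")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // One named accumulator per intermediate dataset; named accumulators
    // also show up in the Spark UI.
    val afterMap    = spark.sparkContext.longAccumulator("afterMap")
    val afterFilter = spark.sparkContext.longAccumulator("afterFilter")

    val input = (1 to 100).toDS()

    // Each stage bumps its counter as records pass through.
    val mapped = input.map { x =>
      afterMap.add(1)
      x * 2
    }
    val filtered = mapped.filter { x =>
      val keep = x % 3 == 0
      if (keep) afterFilter.add(1)
      keep
    }

    // Accumulators are only filled in once an action forces evaluation.
    filtered.write.mode("overwrite").text("/tmp/stage-counts-output")

    println(s"records after map:    ${afterMap.value}")
    println(s"records after filter: ${afterFilter.value}")

    spark.stop()
  }
}
```

If exact counts are required, an alternative is caching each intermediate Dataset and calling `count()` on it, at the cost of extra actions and memory.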

Thanks

