spark-user mailing list archives

From <AShaita...@nz.imshealth.com>
Subject Check if dataframe is empty
Date Tue, 07 Mar 2017 03:52:06 GMT
Hello!

I am fairly sure this has been asked many times already, but I cannot find the question
in the mailing list archive.

My question: I need to check whether a DataFrame is empty. I receive the DataFrame
from a third-party library; it may be empty, but it may also be huge, with millions of
rows. I want to skip some logic when the DataFrame is empty. How can I check this
efficiently?

Right now I am doing it in the following way:

private def isEmpty(df: Option[DataFrame]): Boolean = {
  df.isEmpty || (df.isDefined && df.get.limit(1).rdd.isEmpty())
}

However, this is really slow for large DataFrames. I would be grateful for any suggestions.
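
For comparison, here is a minimal sketch of a variant that skips the `.rdd` conversion and asks for at most one row with `head(1)`. The object name `EmptyCheck` is hypothetical and only there to make the snippet self-contained. Whether this is actually faster is an assumption: it depends on the plan behind the DataFrame, and if the third-party library has to do expensive work (a shuffle or an aggregation, say) before the first row can be produced, no emptiness check will be cheap.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical wrapper object, not part of Spark or of the original code.
object EmptyCheck {
  // Option.forall returns true for None, so a missing DataFrame counts as empty.
  // head(1) collects at most one row to the driver as an Array[Row],
  // without first converting the DataFrame to an RDD.
  def isEmpty(df: Option[DataFrame]): Boolean =
    df.forall(_.head(1).isEmpty)
}
```

As far as I know, Spark releases from 2.4 onward also provide `Dataset.isEmpty`, which would make the helper unnecessary on those versions.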

Thank you in advance.


Best regards,

Artem
