spark-user mailing list archives

From Yeikel <>
Subject Re: What is the best way to take the top N entries from a hive table/data source?
Date Wed, 22 Apr 2020 04:17:08 GMT
Hi Zhang, thank you for your response.

While your answer clarifies my confusion about `CollectLimit`, it still does
not clarify the recommended way to extract a large number of records
(but not all of them) from a source while maintaining a high level of
parallelism.

For example, in some cases when I try to extract 1 million records from a
table with over 100M records, I see my cluster using only 1-2 cores out of the
hundreds that I have available.
