spark-user mailing list archives

From immerrr again <>
Subject pyspark: dataframe.take is slow
Date Tue, 05 Jul 2016 09:27:58 GMT
Hi all!

I'm having a strange issue with pyspark 1.6.1. I have a dataframe,

    df ='/path/to/data')

whose "df.take(10)" is really slow, apparently scanning the whole
dataset to take the first ten rows. "df.first()" works fast, as does

I have found that
should have fixed it in 1.6.0, but it has not. What am I doing wrong
here, and how can I fix this?
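To put numbers on "really slow", one option is to time the two calls side by side on the same DataFrame. The sketch below is a hypothetical helper (`timed` and `compare_fetches` are illustrative names, not part of PySpark). It only calls `DataFrame.first()` and `DataFrame.take(n)`, both of which exist on `pyspark.sql.DataFrame` in 1.6, and it assumes you already have a `df` built the way described above.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Call fn() once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare_fetches(df: Any, n: int = 10) -> Dict[str, float]:
    """Hypothetical helper: time df.first() against df.take(n).

    Works with any object exposing first() and take(n), such as a
    pyspark.sql.DataFrame. Returns elapsed seconds for each call.
    """
    _, first_s = timed(df.first)
    _, take_s = timed(lambda: df.take(n))
    return {"first": first_s, "take": take_s}
```

With a real DataFrame you would call `compare_fetches(df)` and inspect the two timings. The first Spark action in a session can also pay job-startup cost, so running each call once beforehand (or swapping the order) helps keep the comparison fair.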

