spark-user mailing list archives

From yeikel valdes <em...@yeikel.com>
Subject What is the best way to take the top N entries from a hive table/data source?
Date Tue, 14 Apr 2020 06:35:30 GMT
When I use .limit(), the resulting DataFrame ends up with a single partition, which normally causes most of my jobs to fail.


val df = spark.sql("select * from table limit n")
df.write.parquet(....)
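
For reference, a minimal sketch of one workaround I can think of (not necessarily the best way, hence the question): repartition the limited DataFrame before writing so the write is not funneled through a single task. The table name, row count, partition count, and output path below are just placeholders.

import org.apache.spark.sql.SparkSession

// Placeholder session setup; Hive support enabled since the source is a Hive table.
val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Take the top N rows (1000 here as a placeholder).
val limited = spark.sql("select * from my_table limit 1000")

// After a LIMIT the DataFrame typically ends up with a single partition,
// which is the behavior described above.
println(limited.rdd.getNumPartitions)

// Spread the rows back across partitions before writing (8 is arbitrary).
// This adds a shuffle, but the write itself runs in parallel.
limited.repartition(8).write.parquet("/tmp/top_n_output")

This still pulls everything through the limit first, so I am not sure it is the right approach, which is why I am asking what others do.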




Thanks!





