spark-user mailing list archives

From 王晓龙/01111515 <>
Subject Is there a way to merge parquet small files?
Date Fri, 20 May 2016 03:50:34 GMT
I'm using a Spark Streaming program to write log messages to Parquet files every 10 minutes.
Now, when I query the Parquet data, it usually takes hundreds of thousands of stages to compute
a single count.
I looked at the Parquet output path and found a huge number of small files.

Do the small files cause the problem? Can I merge them, or is there a better way to solve it?
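
To make the question concrete, the kind of merge I have in mind is a periodic compaction job roughly like the sketch below. It is only an illustration under assumptions: the function names, the 128 MB target size, and the separate output directory are all made up for this example, and sql_ctx stands for an already created SQLContext (or SparkSession). It only uses read.parquet, coalesce, and write.mode(...).parquet from the DataFrame API.

```python
import math

# Assumed target size per output file. 128 MB is a guess that matches a
# common HDFS block size; it is not a value taken from my cluster.
TARGET_FILE_BYTES = 128 * 1024 * 1024


def target_file_count(total_bytes, target_bytes=TARGET_FILE_BYTES):
    """Number of output files so that each is roughly target_bytes (Python 3)."""
    return max(1, math.ceil(total_bytes / target_bytes))


def compact_parquet(sql_ctx, src_path, dst_path, num_files):
    """Rewrite the Parquet data under src_path into about num_files files.

    sql_ctx is an existing SQLContext or SparkSession (hypothetical name).
    The result goes to a separate directory so the original small files
    stay untouched until the compacted copy has been checked.
    """
    df = sql_ctx.read.parquet(src_path)
    # coalesce() lowers the partition count without a full shuffle;
    # normally each remaining partition is written as one Parquet file.
    df.coalesce(num_files).write.mode("overwrite").parquet(dst_path)
```

Usage would be something like compact_parquet(sqlContext, "/logs/parquet", "/logs/parquet_compacted", target_file_count(total_bytes)), where total_bytes is the summed size of the small files and both paths are placeholders. If the data ends up unevenly distributed, repartition(num_files) might be a better fit than coalesce, at the cost of a shuffle.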

Many thanks.

