spark-user mailing list archives

From 王晓龙/01111515 <roland8...@cmbchina.com>
Subject Is there a way to merge small Parquet files?
Date Fri, 20 May 2016 03:50:34 GMT
I’m using a Spark Streaming program to store log messages in Parquet files every 10 minutes.
Now, when I query the Parquet data, it usually takes hundreds of thousands of stages to compute
a single count.
I looked in the Parquet files’ path and found a large number of small files.

Are the small files causing the problem? Can I merge them, or is there a better way to solve
it?
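One common way to merge small files is to read the whole Parquet directory, reduce the number of partitions with `DataFrame.coalesce(n)`, and write the result to a new directory. The sketch below only covers the planning step: it assumes the data sits on a locally readable path and picks `n` from the total on-disk size and a target file size. The 128 MiB target and the helper names `parquet_file_sizes` and `compaction_partitions` are hypothetical choices for illustration, not part of Spark's API.

```python
import math
import os

# Hypothetical target size per compacted output file. 128 MiB matches the
# default HDFS block size (dfs.blocksize) in Hadoop 2.x.
DEFAULT_TARGET_BYTES = 128 * 1024 * 1024


def parquet_file_sizes(directory):
    """Return the size in bytes of every *.parquet file under `directory`.

    Marker files such as _SUCCESS or _metadata are skipped because their
    names do not end in ".parquet".
    """
    sizes = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".parquet"):
                sizes.append(os.path.getsize(os.path.join(root, name)))
    return sizes


def compaction_partitions(sizes, target_bytes=DEFAULT_TARGET_BYTES):
    """Number of output files so the average file is at most target_bytes.

    Always returns at least 1 so coalesce() gets a valid argument.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    total = sum(sizes)
    return max(1, math.ceil(total / target_bytes))
```

In Spark, the resulting `n` would be used roughly as `spark.read.parquet(src).coalesce(n).write.parquet(dst)`, writing to a separate directory so the streaming job is not appending to the files being rewritten. `coalesce` avoids a full shuffle but can leave output files unevenly sized; `repartition(n)` shuffles the data to even them out at a higher cost.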

Many thanks.

________________________________
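The content of this email represents only the sender's personal views and opinions and is unrelated to the views and opinions of China Merchants Bank Co., Ltd. and its branches and subsidiaries, which accept no responsibility for the content of this email. This email is intended only for the recipient; if you have received it in error, please delete it immediately.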

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
