spark-dev mailing list archives

From zhangliyun <kelly...@126.com>
Subject Please help view the problem of spark dynamic partition
Date Fri, 23 Aug 2019 07:43:19 GMT
Hi all,
  When I use Spark's dynamic partition feature, I run into an HDFS quota problem: it is
very easy to exceed the maximum quota of the directory.


I have generated an unpartitioned table, 'bsl12.email_edge_lyh_mth1', which contains 584M records,
and I want to insert it into a partitioned table, 'bsl12.email_edge_lyh_partitioned2':
--select count(*) from bsl12.email_edge_lyh_mth1; --584652128
--INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2 PARTITION (link_crtd_date) SELECT * FROM bsl12.email_edge_lyh_mth1;



When I looked at the temporary directory while the SQL was running, I saw multiple files named link_crtd_date=2018-01-01***.
My guess is that one temporary file is created per record. Since there are 584M records in the unpartitioned table,
is there any parameter to control the number of temporary files and avoid the quota problem?

```
133    hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A35%3A29
137    hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A35%3A47
136    hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A38%3A23
132    hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A38%3A54
536    hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A40%3A01
```
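
One possible mitigation, given only as an untested sketch: in the listing above, a single task attempt (m_001404) writes a separate directory for each distinct link_crtd_date value it encounters, so the number of temporary files grows roughly with (number of tasks) x (distinct partition values per task). Adding DISTRIBUTE BY on the partition column shuffles the rows so that all rows with the same link_crtd_date are handled by one task, which usually lowers the file count. The SET line is an assumption about the session configuration, and whether this is enough to stay under the quota here has not been verified.

```sql
-- Assumption: dynamic partitioning is allowed without a static partition column.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Same statement as in the message, plus DISTRIBUTE BY so that each
-- link_crtd_date value is written by a single task.
INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2 PARTITION (link_crtd_date)
SELECT *
FROM bsl12.email_edge_lyh_mth1
DISTRIBUTE BY link_crtd_date;
```

Note that the partition values in the listing are per-second timestamps (for example 2018-01-01 12:35:29), so the number of partition directories stays high even with DISTRIBUTE BY; partitioning by a coarser value such as the date alone would be a separate schema decision.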


Best Regards


Kelly Zhang