spark-user mailing list archives

From "Triones,Deng(vip.com)" <triones.d...@vipshop.com>
Subject Re: Re: spark append files to the same hdfs dir issue for LeaseExpiredException
Date Wed, 01 Mar 2017 09:25:46 GMT
Thanks for your email.

My situation is this: there is a Hive table partitioned by five minutes, and I want to write
data every 30 s into the HDFS location where the table is stored. So when the first batch
is delayed, the next batch may touch the _SUCCESS file at the same time, and Spark may then
crash with the exception.
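To make the timing concrete, here is a small Python sketch that checks which 30-second batches would write into the same five-minute partition while both are still running. It assumes that batch i starts at `start + i × 30 s`, runs for a given number of seconds, and writes to the partition of its start time, and that enough concurrent jobs are allowed (as with `spark.streaming.concurrentJobs` greater than one) that a slow batch does not hold back the next one. The names `partition_key` and `conflicting_batches` are hypothetical, not part of any Spark API.

```python
from datetime import datetime, timedelta
from itertools import combinations
from typing import List, Sequence, Tuple

PARTITION_MINUTES = 5                   # five-minute Hive partitions, as described above
BATCH_INTERVAL = timedelta(seconds=30)  # one streaming batch every 30 s


def partition_key(t: datetime) -> str:
    """Map a batch time to a dt=YYYYMMDD/hm=HHMM partition, flooring minutes to 5."""
    floored = t.replace(minute=t.minute - t.minute % PARTITION_MINUTES,
                        second=0, microsecond=0)
    return f"dt={floored:%Y%m%d}/hm={floored:%H%M}"


def conflicting_batches(start: datetime,
                        durations_s: Sequence[float]) -> List[Tuple[int, int]]:
    """Return index pairs of batches that target the same partition and overlap in time.

    Assumption: batch i starts at start + i * BATCH_INTERVAL, runs for durations_s[i]
    seconds, and writes to the partition of its start time.
    """
    runs = []
    for i, seconds in enumerate(durations_s):
        begin = start + i * BATCH_INTERVAL
        end = begin + timedelta(seconds=seconds)
        runs.append((i, begin, end, partition_key(begin)))
    # Half-open intervals [b1, e1) and [b2, e2) overlap when b1 < e2 and b2 < e1.
    return [(a[0], b[0]) for a, b in combinations(runs, 2)
            if a[3] == b[3] and a[1] < b[2] and b[1] < a[2]]
```

For example, with batches starting at 14:00:00 that take 20 s, 45 s and 20 s, `conflicting_batches(datetime(2017, 2, 24, 14, 0), [20, 45, 20])` returns `[(1, 2)]`: the delayed second batch is still running when the third one starts, and both write to `dt=20170224/hm=1400`.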




From: Charles O. Bajomo [mailto:charles.bajomo@pretechconsulting.co.uk]
Sent: 28 February 2017 20:10
To: 邓刚 [Product Technology Center]
Cc: user; dev@spark.apache.org
Subject: Re: Re: spark append files to the same hdfs dir issue for LeaseExpiredException

Unless this is a managed Hive table, I would expect you can just run MSCK REPAIR on the table to pick up
the new partition. Of course, you will need to change the schema to reflect the new partition.
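As an alternative to rescanning the whole table with MSCK REPAIR, a single new partition can be registered explicitly with HiveQL's `ALTER TABLE ... ADD IF NOT EXISTS PARTITION`. The Python sketch below only builds the statement strings. The table name `events` and both helper names are hypothetical, and it assumes the `dt`/`hm` partition columns from this thread. The resulting string could then be run from the Hive CLI or, on a Hive-enabled `SparkSession`, passed to `spark.sql(...)`.

```python
from typing import Optional


def add_partition_sql(table: str, dt: str, hm: str,
                      location: Optional[str] = None) -> str:
    """Build a HiveQL statement that registers one dt/hm partition (hypothetical helper)."""
    stmt = f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (dt='{dt}', hm='{hm}')"
    if location is not None:
        # Only needed when the data does not sit under the table's default location.
        stmt += f" LOCATION '{location}'"
    return stmt


def repair_sql(table: str) -> str:
    """Build the MSCK REPAIR TABLE statement, which rescans the table location."""
    return f"MSCK REPAIR TABLE {table}"
```

Registering one partition per new directory only touches the metastore, whereas MSCK REPAIR lists the partition directories under the table location, which may get slower as five-minute partitions accumulate.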

Kind Regards

________________________________
From: "Triones,Deng(vip.com)" <triones.deng@vipshop.com>
To: "Charles O. Bajomo" <charles.bajomo@pretechconsulting.co.uk>
Cc: "user" <user@spark.apache.org>, dev@spark.apache.org
Sent: Tuesday, 28 February, 2017 10:47:47
Subject: Re: spark append files to the same hdfs dir issue for LeaseExpiredException

I am writing data to HDFS files, and the HDFS directory is also a Hive partition directory. Hive does
not support subdirectories. For example, my partition folder is ***/dt=20170224/hm=1400, which means
I need to write all the data between 1400 and 1500 to the same folder.

From: Charles O. Bajomo [mailto:charles.bajomo@pretechconsulting.co.uk]
Sent: 28 February 2017 18:04
To: 邓刚 [Product Technology Center]
Cc: user; dev@spark.apache.org
Subject: Re: spark append files to the same hdfs dir issue for LeaseExpiredException


I see this problem as well with the _temporary directory, but from what I have been able to
gather, there is no way around it in that situation apart from making sure all reducers write
to different folders. In the past I partitioned by executor ID. I don't know if this is the
best way, though.
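A variation on this idea is to give every batch, rather than every executor, its own output directory and only afterwards move the finished files into the shared Hive partition directory, so that no two jobs ever commit into the same directory. The Python sketch below only plans those moves; the staging root, the `batch-<time>` naming scheme and both helper names are hypothetical. It leaves behind names starting with `_` or `.` (such as `_SUCCESS`), which Hadoop's `FileInputFormat` treats as hidden, and prefixes each moved file with the batch time so that two batches' `part-00000` files cannot collide.

```python
import posixpath
from typing import Iterable, List, Tuple


def staging_dir(staging_root: str, batch_time_ms: int) -> str:
    """Hypothetical per-batch output directory; no two batches share it."""
    return posixpath.join(staging_root, f"batch-{batch_time_ms}")


def plan_moves(staging: str, partition_dir: str, files: Iterable[str],
               batch_time_ms: int) -> List[Tuple[str, str]]:
    """Return (src, dst) pairs that move one batch's data files into the partition dir.

    Committer artifacts and hidden files (names starting with '_' or '.') stay behind,
    and each moved file gets a batch-specific prefix to avoid name clashes.
    """
    return [(posixpath.join(staging, name),
             posixpath.join(partition_dir, f"batch-{batch_time_ms}-{name}"))
            for name in sorted(files)
            if not name.startswith(("_", "."))]
```

Each pair could then be applied with Hadoop's `FileSystem.rename(src, dst)`, which on HDFS is a NameNode metadata operation rather than a data copy, and the staging directory deleted afterwards.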

Kind Regards

________________________________
From: "Triones,Deng(vip.com)" <triones.deng@vipshop.com>
To: "user" <user@spark.apache.org>, dev@spark.apache.org
Sent: Tuesday, 28 February, 2017 09:35:00
Subject: spark append files to the same hdfs dir issue for LeaseExpiredException

Hi devs and users,

I am running Spark Streaming (Spark version 2.0.2) to write files to HDFS.
When spark.streaming.concurrentJobs is more than one, e.g. 20,
I get the exception below.

We know that when a batch finishes, a _SUCCESS file is written.
My guess is that if one batch in my Spark application is slow and another one runs at the same
time, the two Spark Streaming batches may try to use the _SUCCESS file at the same time,
which causes the error below.

Does anyone know whether I am right, or have any suggestions to avoid this problem?



Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on ***********/dt=20170224/hm=1400/_SUCCESS (inode 17483293037): File does not exist. [Lease.  Holder: DFSClient_NONMAPREDUCE_******_*****, pendingcreates: 1]
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3362)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3450)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3420)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:691)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.complete(AuthorizationProviderProxyClientProtocol.java:219)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:520)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
        at org.apache.hadoop.ipc.Client.call(Client.java:1410)
        at org.apache.hadoop.ipc.Client.call(Client.java:1363)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy28.complete(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:404)
        at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
        at com.sun.proxy.$Proxy29.complete(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2116)
        at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:319)
        at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
        at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
        ... 40 more
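The trace above appears consistent with that guess. In Hadoop's `FileOutputCommitter.commitJob`, the `_SUCCESS` marker is written with `fs.create(markerPath).close()`, and `create` overwrites by default, so if a second job creates the same marker before the first has closed it, the first job's file is replaced and its `close()` then fails with "No lease on ... File does not exist". The Python sketch below is a deliberately simplified toy model of that sequence, not HDFS code; `ToyNameNode`, `LeaseExpired` and `replay_overlapping_commits` are hypothetical names.

```python
import itertools
from typing import Dict, List, Tuple


class LeaseExpired(Exception):
    """Stand-in for HDFS's LeaseExpiredException in this toy model."""


class ToyNameNode:
    """Toy model: each path maps to the (inode, holder) currently stored there.

    Simplifying assumptions: create() with overwrite=True replaces whatever is at the
    path, even a file another client is still writing, and complete() fails if the
    caller's inode is no longer the one at that path.
    """

    def __init__(self) -> None:
        self._next_inode = itertools.count(1)
        self.files: Dict[str, Tuple[int, str]] = {}

    def create(self, path: str, holder: str, overwrite: bool = True) -> int:
        if path in self.files and not overwrite:
            raise FileExistsError(path)
        inode = next(self._next_inode)
        self.files[path] = (inode, holder)  # the previous inode at this path is gone
        return inode

    def complete(self, path: str, inode: int, holder: str) -> bool:
        if self.files.get(path) != (inode, holder):
            raise LeaseExpired(f"No lease on {path} (inode {inode}): File does not exist.")
        return True


def replay_overlapping_commits(marker: str) -> List[str]:
    """Two jobs write the same _SUCCESS marker; return the holders whose close failed."""
    nn = ToyNameNode()
    slow = nn.create(marker, "job-A")  # slow batch A starts writing the marker
    fast = nn.create(marker, "job-B")  # batch B overwrites it before A has closed it
    failed = []
    for inode, holder in ((fast, "job-B"), (slow, "job-A")):
        try:
            nn.complete(marker, inode, holder)
        except LeaseExpired:
            failed.append(holder)
    return failed
```

If nothing downstream relies on the marker, setting the Hadoop property `mapreduce.fileoutputcommitter.marksuccessfuljobs` to `false` (from Spark, `spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs=false`) stops it from being written. That removes only the `_SUCCESS` collision; concurrent jobs still share the `_temporary` directory under the same output path.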


Thanks

Triones


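This e-mail may be confidential. If you are not the recipient designated in this e-mail, please notify me immediately. Please do not use, retain, copy, print or distribute this e-mail or its contents, use it for any other purpose, or disclose it to anyone. Thank you for your cooperation!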
This communication is intended only for the addressee(s) and may contain information that
is privileged and confidential. You are hereby notified that, if you are not an intended recipient
listed above, or an authorized employee or agent of an addressee of this communication responsible
for delivering e-mail messages to an intended recipient, any dissemination, distribution or
reproduction of this communication (including any attachments hereto) is strictly prohibited.
If you have received this communication in error, please notify us immediately by a reply
e-mail addressed to the sender and permanently delete the original e-mail communication and
any attachments from all storage devices without making or otherwise retaining a copy.