spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: origin of error
Date Sun, 15 May 2016 16:42:26 GMT
Adding back user@spark

From the namenode audit log, you should be able to find out who deleted
part-r-00163-e94fa2c5-aa0d-4a08-b4c3-9fe7087ca493.gz.parquet and when.

There might be other errors in the executor log that would give you more
clues.
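Each HDFS namespace operation is recorded as one line in the NameNode audit log, so a delete of the _temporary attempt file shows the user, time, and client IP. A minimal sketch of how to search it, assuming a CDH-style audit log location; the sample log line below is fabricated for illustration, not taken from this cluster:

```shell
# Fabricated sample of an hdfs-audit.log delete entry (field layout varies
# slightly by Hadoop version):
echo 'allowed=true ugi=data2015 ip=/10.0.0.5 cmd=delete src=/user/data2015/df_join2015/_temporary/0 dst=null perm=null' > /tmp/hdfs-audit-sample.log

# On CDH the audit log typically lives at /var/log/hadoop-hdfs/hdfs-audit.log
# (path is an assumption; check your distribution). Filter delete events on
# the job's output directory to see who removed the attempt files and when:
grep 'cmd=delete' /tmp/hdfs-audit-sample.log | grep 'df_join2015'
```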

On Sun, May 15, 2016 at 9:08 AM, pseudo oduesp <pseudo20140@gmail.com>
wrote:

> ERROR hdfs.DFSClient: Failed to close inode 49551738
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> No lease on
> /user/data2015/df_join2015/_temporary/0/_temporary/attempt_201605151649_0019_m_000163_0/part-r-00163-e94fa2c5-aa0d-4a08-b4c3-9fe7087ca493.gz.parquet
> (inode 49551738): File does not exist. Holder
> DFSClient_NONMAPREDUCE_-24268488_69 does not have any open files.
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3602)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3690)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3660)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:738)
>         at
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.complete(AuthorizationProviderProxyClientProtocol.java:243)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:527)
>         at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>         at java.security.AccessController.doPrivileged(Native Method)
>        at
> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:985)
>         at
> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2740)
>         at
> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2757)
>         at
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> 16/05/15 16:51:39 ERROR hdfs.DFSClient: Failed to close inode 49551465
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> No lease on
> /user/data2015/df_join2015/_temporary/0/_temporary/attempt_201605151647_0019_m_000030_0/part-r-00030-e94fa2c5-aa0d-4a08-b4c3-9fe7087ca493.gz.parquet
> (inode 49551465): File does not exist. Holder
> DFSClient_NONMAPREDUCE_-24268488_69 does not have any open files.
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3602)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3690)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3660)
>
>
>
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3602)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3690)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3660)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:738)
>         at
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.complete(AuthorizationProviderProxyClientProtocol.java:243)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:527)
>         at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>         at com.sun.proxy.$Proxy15.complete(Unknown Source)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:443)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
>
>
>
>
> 2016-05-15 17:58 GMT+02:00 pseudo oduesp <pseudo20140@gmail.com>:
>
>> I mean 1.5.0.
>>
>> 2016-05-15 17:57 GMT+02:00 pseudo oduesp <pseudo20140@gmail.com>:
>>
>>> hi,
>>> my version is 15.0 on the Cloudera distribution with YARN.
>>>
>>>
>>> 2016-05-15 17:50 GMT+02:00 Ted Yu <yuzhihong@gmail.com>:
>>>
>>>> bq. ExecutorLostFailure (executor 4 lost)
>>>>
>>>> Can you check the executor log for more clues?
>>>>
>>>> Which Spark release are you using ?
>>>>
>>>> Cheers
>>>>
>>>> On Sun, May 15, 2016 at 8:47 AM, pseudo oduesp <pseudo20140@gmail.com>
>>>> wrote:
>>>>
>>>>> Can someone help me with this issue?
>>>>>
>>>>>
>>>>>
>>>>> py4j.protocol.Py4JJavaError: An error occurred while calling
>>>>> o126.parquet.
>>>>> : org.apache.spark.SparkException: Job aborted.
>>>>>         at
>>>>> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:156)
>>>>>         at
>>>>> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
>>>>>         at
>>>>> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
>>>>>         at
>>>>> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
>>>>>         at
>>>>> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
>>>>>         at
>>>>> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
>>>>>         at
>>>>> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
>>>>>         at
>>>>> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
>>>>>         at
>>>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
>>>>>         at
>>>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
>>>>>         at
>>>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>>>>>         at
>>>>> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
>>>>>         at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
>>>>>         at
>>>>> org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
>>>>>         at
>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:197)
>>>>>         at
>>>>> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:146)
>>>>>         at
>>>>> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
>>>>>         at
>>>>> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:304)
>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>         at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>         at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>>>>         at
>>>>> py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>>>>         at py4j.Gateway.invoke(Gateway.java:259)
>>>>>         at
>>>>> py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>>>>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>>>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>> Caused by: org.apache.spark.SparkException: Job aborted due to stage
>>>>> failure: Task 69 in stage 19.0 failed 4 times, most recent failure: Lost
>>>>> task 69.3 in stage 19.0 (TID 3788, prssnbd1s003.bigplay.bigdata.intraxa):
>>>>> ExecutorLostFailure (executor 4 lost)
>>>>> Driver stacktrace:
>>>>>         at org.apache.spark.scheduler.DAGScheduler.org
>>>>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1294)
>>>>>         at
>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1282)
>>>>>         at
>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1281)
>>>>>         at
>>>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>         at
>>>>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>>         at
>>>>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1281)
>>>>>         at
>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>>>>>         at
>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>>>>>         at scala.Option.foreach(Option.scala:236)
>>>>>         at
>>>>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
>>>>>         at
>>>>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1507)
>>>>>         at
>>>>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1469)
>>>>>         at
>>>>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
>>>>>         at
>>>>> org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>>>>         at
>>>>> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
>>>>>         at
>>>>> org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
>>>>>         at
>>>>> org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
>>>>>         at
>>>>> org.apache.spark.SparkContext.runJob(SparkContext.scala:1914)
>>>>>         at
>>>>> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:150)
>>>>>         ... 28 more
>>>>>
>>>>>
>>>>
>>>
>>
>
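Following up on the suggestion to check the executor log: since this cluster runs on YARN, `yarn logs -applicationId <appId>` fetches all container logs once the application finishes (with log aggregation enabled). A minimal sketch; the application id is a placeholder and the executor log lines below are simulated, not from this job:

```shell
# With YARN log aggregation enabled, executor logs are fetched with:
#   yarn logs -applicationId application_1463327000000_0019
# (the id above is illustrative; take the real one from the RM UI or
# spark-submit output). Simulate a fetched executor log here and filter
# for messages that typically precede an ExecutorLostFailure, such as a
# SIGTERM from the NodeManager after a memory overrun:
cat > /tmp/executor-sample.log <<'EOF'
16/05/15 16:51:02 WARN executor.Executor: Issue communicating with driver in heartbeater
16/05/15 16:51:39 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
EOF
grep -E 'ERROR|WARN' /tmp/executor-sample.log
```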
