spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: output folder structure not getting commited and remains as _temporary
Date Wed, 01 Jul 2015 07:07:31 GMT
Looks like a jar conflict to me.

java.lang.NoSuchMethodException:
org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData.getBytesWritten()
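
One way to confirm which Hadoop API the JVM actually sees is to probe for the two methods named in the log, using the same kind of reflective lookup that fails there. This is only a minimal sketch: the object name `MethodProbe` and the helper `hasNoArgMethod` are made up for illustration and are not part of Spark or Hadoop. It has to run with the same classpath as the driver or executor to be meaningful.

```scala
import scala.util.Try

// Hypothetical helper (not a Spark or Hadoop API): reports whether a class on
// the current classpath declares a given no-argument method.
object MethodProbe {

  /** True if `className` can be loaded and declares a no-arg method `methodName`. */
  def hasNoArgMethod(className: String, methodName: String): Boolean =
    Try {
      // initialize = false: only load the class, do not run static initializers
      val cls = Class.forName(className, false, getClass.getClassLoader)
      cls.getDeclaredMethod(methodName)
    }.isSuccess

  def main(args: Array[String]): Unit = {
    // The two lookups that fail in the log extracts
    val probes = Seq(
      "org.apache.hadoop.fs.FileSystem$Statistics" -> "getThreadStatistics",
      "org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData" -> "getBytesWritten"
    )
    for ((cls, method) <- probes) {
      val status = if (hasNoArgMethod(cls, method)) "present" else "missing"
      println(s"$cls.$method(): $status")
    }
  }
}
```

If both come back "missing" even though the 2.6.0 jars were passed with --jars, an older copy of the Hadoop classes is probably being picked up first.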


You have multiple versions of the same jars on the classpath.
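
To check this, you could print where the Hadoop classes are loaded from and which classpath entries look like Hadoop jars. The sketch below is an illustration only: `ClasspathCheck`, `locationOf` and `entriesMatching` are hypothetical names, and the jar-name fragments searched for (including "spark-assembly", on the assumption that the Spark assembly jar may bundle its own Hadoop classes) should be adjusted to your installation.

```scala
import java.io.File
import scala.util.Try

// Hypothetical helper (not a Spark or Hadoop API) for spotting duplicate jars.
object ClasspathCheck {

  /** Jar or directory a class was loaded from, if the JVM reports one. */
  def locationOf(cls: Class[_]): Option[String] =
    for {
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource)
      url    <- Option(source.getLocation)
    } yield url.toString

  /** Classpath entries whose file name contains `fragment`, e.g. "hadoop-common". */
  def entriesMatching(classpath: String, fragment: String): Seq[String] =
    classpath.split(File.pathSeparator).toSeq
      .filter(entry => new File(entry).getName.contains(fragment))

  def main(args: Array[String]): Unit = {
    val cp = System.getProperty("java.class.path", "")
    // Assumed name fragments; more than one hit per fragment suggests a conflict
    Seq("hadoop-common", "hadoop-hdfs", "spark-assembly").foreach { name =>
      val hits = entriesMatching(cp, name)
      println(s"$name: ${hits.size} entries")
      hits.foreach(h => println(s"  $h"))
    }
    // Loaded by name so this file compiles without Hadoop on the classpath
    val fsClass = Try(
      Class.forName("org.apache.hadoop.fs.FileSystem", false, getClass.getClassLoader)
    ).toOption
    val origin = fsClass.flatMap(locationOf).getOrElse("not found, or no code source reported")
    println(s"org.apache.hadoop.fs.FileSystem loaded from: $origin")
  }
}
```

Run it on both worker machines with the executor classpath. If the reported location of FileSystem is not the Hadoop 2.6.0 jar you expect, that entry is the one to remove or reorder.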

Thanks
Best Regards

On Wed, Jul 1, 2015 at 6:58 AM, nkd <kalidas.nimmagadda@gmail.com> wrote:

> I am running a Spark application on a standalone cluster in a Windows 7
> environment.
> The details are as follows.
>
> spark version = 1.4.0
> Windows/Standalone mode
>
> built the Hadoop 2.6.0 on windows and set the env params like so
> HADOOP_HOME = E:\hadooptar260\hadoop-2.6.0
> HADOOP_CONF_DIR =E:\hadooptar260\hadoop-2.6.0\etc\hadoop  // where the
> core-site.xml resides
> added this to the path E:\hadooptar260\hadoop-2.6.0\bin
>
> Note: I am not starting Hadoop. I only wanted to ensure that the Hadoop
> libraries are available to Spark,
> in particular that hdfs.jar and hadoop-common.jar are on the classpath and
> winutils is on the system path.
>
>
> @rem startMaster
> spark-class2.cmd org.apache.spark.deploy.master.Master --host
> machine1.QQQ.HYD  --port 7077
>
> @rem startWorker.This worker runs on the same machine as the master
> spark-class2.cmd org.apache.spark.deploy.worker.Worker
> spark://machine1.QQQ.HYD:7077
>
> @rem startWorker.This worker runs on a second machine
> spark-class2.cmd org.apache.spark.deploy.worker.Worker
> spark://machine1.QQQ.HYD:7077
>
> @rem startApp.This command is run from the machine where master and first
> worker are running
> spark-submit2 --verbose --jars /app/lib/ojdbc7.jar --driver-class-path
> /app/lib/ojdbc7.jar  --driver-library-path
> /programfiles/Hadoop/hadooptar260/hadoop-2.6.0/bin --class "org.ETLProcess"
> --name MyETL  --master spark://machine1.QQQ.HYD:7077 --deploy-mode client
> /app/appjar/myapp-0.1.0.jar ETLProcess 1 51
>
> @rem to avoid the NoSuchMethodException, tried the following
> spark-submit2 --verbose --jars
>
> /app/lib/ojdbc7.jar,/app/lib/hadoop-common-2.6.0.jar,/app/lib/hadoop-hdfs-2.6.0.jar
> --driver-class-path /app/lib/ojdbc7.jar  --driver-library-path
> /programfiles/Hadoop/hadooptar260/hadoop-2.6.0/bin --class
> "org.dwh.oem.transform.ETLProcess" --name SureETL  --master
> spark://machine1.QQQ.HYD:7077 --deploy-mode client
> /app/appjar/myapp-0.1.0.jar ETLProcess 1 51
>
> The above ETL job completes successfully, fetching the data from the
> db and storing it as JSON files on each of the worker nodes.
>
> *On the first node the files are properly committed: I can see the
> _temporary folder being removed and the output marked as _SUCCESS.*
>
> *The issue is that the files on the second node remain in the _temporary
> folder, which makes them unusable for further jobs. I need help to overcome
> this issue.*
>
> *This is line 176 of SparkHadoopUtil.scala, where the exception below
> occurs:*
>
> private def getFileSystemThreadStatistics(): Seq[AnyRef] = {
>     val stats = FileSystem.getAllStatistics()
>     stats.map(Utils.invoke(classOf[Statistics], _, "getThreadStatistics"))  // <== line 176
> }
>
> The following are extracts from the log, which contains these
> exceptions:
>
> java.lang.NoSuchMethodException:
> org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData.getBytesWritten()
>
> java.lang.ClassNotFoundException:
> org.apache.hadoop.mapred.InputSplitWithLocationInfo
>
> java.lang.NoSuchMethodException:
> org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()
>
> -----------------------------------------------
>
> 2015-06-30 15:55:48 DEBUG NativeCodeLoader:46 - Trying to load the
> custom-built native-hadoop library...
> 2015-06-30 15:55:48 DEBUG NativeCodeLoader:50 - Loaded the native-hadoop
> library
> 2015-06-30 15:55:48 DEBUG JniBasedUnixGroupsMapping:50 - Using
> JniBasedUnixGroupsMapping for Group resolution
> 2015-06-30 15:55:48 DEBUG JniBasedUnixGroupsMappingWithFallback:44 - Group
> mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
> 2015-06-30 15:55:48 DEBUG Groups:80 - Group mapping
> impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback;
> cacheTimeout=300000; warningDeltaMs=5000
> 2015-06-30 15:55:48 DEBUG UserGroupInformation:193 - hadoop login
> 2015-06-30 15:55:48 DEBUG UserGroupInformation:142 - hadoop login commit
> -----------------------------------------------
> 2015-06-30 15:55:50 DEBUG Master:56 - [actor] received message
> RegisterApplication(ApplicationDescription(SureETL)) from
> Actor[akka.tcp://sparkDriver@172.16.11.212:59974/user/$a#-1360185865]
> 2015-06-30 15:55:50 INFO  Master:59 - Registering app SureETL
> 2015-06-30 15:55:50 INFO  Master:59 - Registered app SureETL with ID
> app-20150630155550-0001
> 2015-06-30 15:55:50 INFO  Master:59 - Launching executor
> app-20150630155550-0001/0 on worker
> worker-20150630154548-172.16.11.212-59791
> 2015-06-30 15:55:50 INFO  Master:59 - Launching executor
> app-20150630155550-0001/1 on worker
> worker-20150630155002-172.16.11.133-61908
> 2015-06-30 15:55:50 DEBUG Master:62 - [actor] handled message (8.672752 ms)
> RegisterApplication(ApplicationDescription(SureETL)) from
> Actor[akka.tcp://sparkDriver@172.16.11.212:59974/user/$a#-1360185865]
> -----------------------------------------------
> 2015-06-30 15:56:02 DEBUG Server:228 - rpcKind=RPC_PROTOCOL_BUFFER,
> rpcRequestWrapperClass=class
> org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>
> rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@604d28c6
> 2015-06-30 15:56:02 DEBUG Client:63 - getting client out of cache:
> org.apache.hadoop.ipc.Client@1511d157
> 2015-06-30 15:56:03 DEBUG
> AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1:56 - [actor] received
> message AkkaMessage(ReviveOffers,false) from
> Actor[akka://sparkDriver/deadLetters]
> 2015-06-30 15:56:03 DEBUG
> AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1:63 - Received RPC
> message: AkkaMessage(ReviveOffers,false)
> 2015-06-30 15:56:03 DEBUG
> AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1:62 - [actor] handled
> message (1.73455 ms) AkkaMessage(ReviveOffers,false) from
> Actor[akka://sparkDriver/deadLetters]
> 2015-06-30 15:56:03 DEBUG BlockReaderLocal:105 - Both short-circuit local
> reads and UNIX domain socket are disabled.
> 2015-06-30 15:56:03 DEBUG PairRDDFunctions:63 - Saving as hadoop file of
> type (NullWritable, Text)
> 2015-06-30 15:56:03 DEBUG HadoopRDD:84 - SplitLocationInfo and other new
> Hadoop classes are unavailable. Using the older Hadoop location info code.
> java.lang.ClassNotFoundException:
> org.apache.hadoop.mapred.InputSplitWithLocationInfo
>         at java.net.URLClassLoader.findClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Unknown Source)
>         at
>
> org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.<init>(HadoopRDD.scala:386)
>         at
> org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:396)
>         at org.apache.spark.rdd.HadoopRDD$.<init>(HadoopRDD.scala:395)
>         at org.apache.spark.rdd.HadoopRDD$.<clinit>(HadoopRDD.scala)
>         at
> org.apache.spark.SparkHadoopWriter.preSetup(SparkHadoopWriter.scala:61)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1093)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
>         at
>
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>         at
>
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1065)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:989)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
>         at
>
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>         at
>
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:965)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:897)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
>         at
>
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>         at
>
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:896)
>         at
>
> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1400)
>         at
> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1379)
>         at
> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1379)
>         at
>
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>         at
>
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>         at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1379)
>         at
>
> org.apache.spark.sql.json.DefaultSource.createRelation(JSONRelation.scala:99)
>         at
> org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:305)
>         at
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
>         at
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
>         at
>
> org.dwh.oem.extract.OrderLookupExtractor$.orderLookupExtractionProcss(OrderingLookupExtractor.scala:61)
>         at org.dwh.oem.transform.ETLProcess$.main(ETLProcess.scala:33)
>         at org.dwh.oem.transform.ETLProcess.main(ETLProcess.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>         at java.lang.reflect.Method.invoke(Unknown Source)
>         at
>
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>         at
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>         at
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 2015-06-30 15:56:03 INFO  deprecation:1009 - mapred.tip.id is deprecated.
> Instead, use mapreduce.task.id
> 2015-06-30 15:56:03 INFO  deprecation:1009 - mapred.task.id is deprecated.
> Instead, use mapreduce.task.attempt.id
> 2015-06-30 15:56:03 INFO  deprecation:1009 - mapred.task.is.map is
> deprecated. Instead, use mapreduce.task.ismap
> 2015-06-30 15:56:03 INFO  deprecation:1009 - mapred.task.partition is
> deprecated. Instead, use mapreduce.task.partition
> 2015-06-30 15:56:03 INFO  deprecation:1009 - mapred.job.id is deprecated.
> Instead, use mapreduce.job.id
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - +++ Cleaning closure
> <function2>
>
> (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13})
> +++
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + declared fields: 4
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public static final long
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.serialVersionUID
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      private final
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.$outer
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      private final
> org.apache.spark.SerializableWritable
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.wrappedConf$2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final
> org.apache.spark.SparkHadoopWriter
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.writer$2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + declared methods: 3
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final void
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(org.apache.spark.TaskContext,scala.collection.Iterator)
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final
> java.lang.Object
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(java.lang.Object,java.lang.Object)
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$
> 13.org$apache$spark$rdd$PairRDDFunctions$$anonfun$$anonfun$$$outer()
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + inner classes: 3
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$56
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outer classes: 2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -
> org.apache.spark.rdd.PairRDDFunctions
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outer objects: 2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      <function0>
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -
> org.apache.spark.rdd.PairRDDFunctions@5d14e99e
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + populating accessed fields
> because this is the starting closure
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + fields accessed by
> starting
> closure: 2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      (class
> org.apache.spark.rdd.PairRDDFunctions,Set())
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      (class
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1,Set($outer))
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outermost object is not a
> closure, so do not clone it: (class
>
> org.apache.spark.rdd.PairRDDFunctions,org.apache.spark.rdd.PairRDDFunctions@5d14e99e
> )
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + cloning the object
> <function0> of class
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + cleaning cloned closure
> <function0> recursively
> (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1)
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - +++ Cleaning closure
> <function0>
> (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1}) +++
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + declared fields: 3
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public static final long
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.serialVersionUID
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      private final
> org.apache.spark.rdd.PairRDDFunctions
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.$outer
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      private final
> org.apache.hadoop.mapred.JobConf
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.conf$4
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + declared methods: 4
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final
> java.lang.Object
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply()
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final void
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply()
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public void
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp()
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public
> org.apache.spark.rdd.PairRDDFunctions
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.org
> $apache$spark$rdd$PairRDDFunctions$$anonfun$$$outer()
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + inner classes: 5
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$apply$mcV$sp$2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$56
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outer classes: 1
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -
> org.apache.spark.rdd.PairRDDFunctions
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outer objects: 1
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -
> org.apache.spark.rdd.PairRDDFunctions@5d14e99e
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + fields accessed by
> starting
> closure: 2
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      (class
> org.apache.spark.rdd.PairRDDFunctions,Set())
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      (class
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1,Set($outer))
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outermost object is not a
> closure, so do not clone it: (class
>
> org.apache.spark.rdd.PairRDDFunctions,org.apache.spark.rdd.PairRDDFunctions@5d14e99e
> )
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  +++ closure <function0>
> (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1) is
> now cleaned +++
> 2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  +++ closure <function2>
>
> (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13)
> is now cleaned +++
> 2015-06-30 15:56:03 INFO  SparkContext:59 - Starting job: save at
> OrderingLookupExtractor.scala:61
>
>
> -----------------------------------------------------------------------------------------
> 2015-06-30 15:56:11 DEBUG SparkHadoopUtil:84 - Couldn't find method for
> retrieving thread-level FileSystem output data
> java.lang.NoSuchMethodException:
> org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData.getBytesWritten()
>         at java.lang.Class.getDeclaredMethod(Unknown Source)
>         at
>
> org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:182)
>         at
>
> org.apache.spark.deploy.SparkHadoopUtil.getFSBytesWrittenOnThreadCallback(SparkHadoopUtil.scala:162)
>         at
> org.apache.spark.rdd.PairRDDFunctions.org
> $apache$spark$rdd$PairRDDFunctions$$initHadoopOutputMetrics(PairRDDFunctions.scala:1129)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1101)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
>         at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>         at org.apache.spark.scheduler.Task.run(Task.scala:70)
>         at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source)
>         at java.lang.Thread.run(Unknown Source)
> 2015-06-30 15:56:11 DEBUG HadoopRDD:84 - SplitLocationInfo and other new
> Hadoop classes are unavailable. Using the older Hadoop location info code.
> java.lang.ClassNotFoundException:
> org.apache.hadoop.mapred.InputSplitWithLocationInfo
>         at java.net.URLClassLoader.findClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Unknown Source)
>         at
>
> org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.<init>(HadoopRDD.scala:386)
>         at
> org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:396)
>         at org.apache.spark.rdd.HadoopRDD$.<init>(HadoopRDD.scala:395)
>         at org.apache.spark.rdd.HadoopRDD$.<clinit>(HadoopRDD.scala)
>         at
> org.apache.spark.SparkHadoopWriter.setup(SparkHadoopWriter.scala:70)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1103)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
>         at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>         at org.apache.spark.scheduler.Task.run(Task.scala:70)
>         at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source)
>         at java.lang.Thread.run(Unknown Source)
> 2015-06-30 15:56:11 DEBUG NativeIO:191 - Initialized cache for IDs to
> User/Group mapping with a  cache timeout of 14400 seconds.
> 2015-06-30 15:56:11 INFO  JDBCRDD:59 - closed connection
> 2015-06-30 15:56:11 INFO  FileOutputCommitter:439 - Saved output of task
> 'attempt_201506301556_0000_m_000000_0' to
>
> file:/sparketl/extract/icasdb_cl/oem/lookup51/dw_app_value_list/_temporary/0/task_201506301556_0000_m_000000
> 2015-06-30 15:56:11 INFO  SparkHadoopMapRedUtil:59 -
> attempt_201506301556_0000_m_000000_0: Committed
> 2015-06-30 15:56:11 INFO  JDBCRDD:59 - closed connection
> 2015-06-30 15:56:11 INFO  Executor:59 - Finished task 0.0 in stage 0.0 (TID
> 0). 624 bytes result sent to driver
>
> --------------------------------------------------------------------------------------
> 2015-06-30 15:57:03 DEBUG SparkHadoopUtil:84 - Couldn't find method for
> retrieving thread-level FileSystem output data
> java.lang.NoSuchMethodException:
> org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()
>         at java.lang.Class.getDeclaredMethod(Unknown Source)
>         at org.apache.spark.util.Utils$.invoke(Utils.scala:2069)
>         at
>
> org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:176)
>         at
>
> org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:176)
>         at
>
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at
>
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:750)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1202)
>         at
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>         at
> scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at
>
> org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatistics(SparkHadoopUtil.scala:176)
>         at
>
> org.apache.spark.deploy.SparkHadoopUtil.getFSBytesWrittenOnThreadCallback(SparkHadoopUtil.scala:161)
>         at
> org.apache.spark.rdd.PairRDDFunctions.org
> $apache$spark$rdd$PairRDDFunctions$$initHadoopOutputMetrics(PairRDDFunctions.scala:1129)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1101)
>         at
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
>         at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>         at org.apache.spark.scheduler.Task.run(Task.scala:70)
>         at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source)
>         at java.lang.Thread.run(Unknown Source)
> -----------------------------------------------
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/output-folder-structure-not-getting-commited-and-remains-as-temporary-tp23557.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
