spark-user mailing list archives

From nkd <kalidas.nimmaga...@gmail.com>
Subject output folder structure not getting committed and remains as _temporary
Date Wed, 01 Jul 2015 01:28:00 GMT
I am running a Spark application on a standalone cluster in a Windows 7
environment.
Following are the details.

spark version = 1.4.0
Windows/Standalone mode 

I built Hadoop 2.6.0 on Windows and set the environment parameters like so:
HADOOP_HOME = E:\hadooptar260\hadoop-2.6.0
HADOOP_CONF_DIR = E:\hadooptar260\hadoop-2.6.0\etc\hadoop   // where core-site.xml resides
and added E:\hadooptar260\hadoop-2.6.0\bin to the PATH.

Note: I am not starting Hadoop. I only wanted to ensure that the Hadoop libraries
are made available to Spark, especially that hadoop-hdfs.jar and hadoop-common.jar
are on the classpath and winutils is on the system path.
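
For reference, a minimal sanity-check sketch of that layout (HadoopEnvCheck is a
hypothetical helper object added here for illustration, not part of the actual job):

// Hedged sanity-check sketch: verifies the Windows Hadoop layout described above.
import java.io.File

object HadoopEnvCheck {
  def main(args: Array[String]): Unit = {
    val home = sys.env.getOrElse("HADOOP_HOME", "")
    val conf = sys.env.getOrElse("HADOOP_CONF_DIR", "")
    println(s"HADOOP_HOME = $home")
    println(s"winutils.exe present: ${new File(home, "bin\\winutils.exe").exists}")
    println(s"core-site.xml present: ${new File(conf, "core-site.xml").exists}")
  }
}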


@rem startMaster 
spark-class2.cmd org.apache.spark.deploy.master.Master --host
machine1.QQQ.HYD  --port 7077

@rem startWorker. This worker runs on the same machine as the master
spark-class2.cmd org.apache.spark.deploy.worker.Worker
spark://machine1.QQQ.HYD:7077 

@rem startWorker. This worker runs on a second machine
spark-class2.cmd org.apache.spark.deploy.worker.Worker
spark://machine1.QQQ.HYD:7077 

@rem startApp. This command is run from the machine where the master and first
worker are running
spark-submit2 --verbose --jars /app/lib/ojdbc7.jar --driver-class-path
/app/lib/ojdbc7.jar  --driver-library-path
/programfiles/Hadoop/hadooptar260/hadoop-2.6.0/bin --class "org.ETLProcess"
--name MyETL  --master spark://machine1.QQQ.HYD:7077 --deploy-mode client
/app/appjar/myapp-0.1.0.jar ETLProcess 1 51

@rem to avoid the NoSuchMethodException, tried the following
spark-submit2 --verbose --jars
/app/lib/ojdbc7.jar,/app/lib/hadoop-common-2.6.0.jar,/app/lib/hadoop-hdfs-2.6.0.jar
--driver-class-path /app/lib/ojdbc7.jar  --driver-library-path
/programfiles/Hadoop/hadooptar260/hadoop-2.6.0/bin --class
"org.dwh.oem.transform.ETLProcess" --name SureETL  --master
spark://machine1.QQQ.HYD:7077 --deploy-mode client
/app/appjar/myapp-0.1.0.jar ETLProcess 1 51

The ETL job above completes successfully, fetching the data from the database and
storing it as JSON files on each of the worker nodes.
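
For context, the save that shows up in the stack traces below (DataFrameWriter.save
called from OrderingLookupExtractor.scala) does roughly the following. This is only
a minimal sketch with placeholder JDBC settings, not the actual extractor code; the
output path is the one that appears in the log further down:

// Minimal sketch of the extract-and-save step (Spark 1.4 DataFrame API).
// The JDBC URL, table and driver below are placeholders, not the real values.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ExtractSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MyETL"))
    val sqlContext = new SQLContext(sc)

    // Read a lookup table over JDBC (ojdbc7.jar is supplied via --jars / --driver-class-path)
    val df = sqlContext.read.format("jdbc").options(Map(
      "url"     -> "jdbc:oracle:thin:@dbhost:1521:ORCL",   // placeholder
      "dbtable" -> "DW_APP_VALUE_LIST",                     // placeholder
      "driver"  -> "oracle.jdbc.OracleDriver")).load()

    // This write goes through saveAsTextFile -> SparkHadoopWriter, as in the traces below
    df.write.format("json")
      .save("file:/sparketl/extract/icasdb_cl/oem/lookup51/dw_app_value_list")
  }
}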

*On the first node the files are properly getting committed: I can see the
_temporary folder being removed and the output marked with _SUCCESS.*

*The issue is that the files on the second node remain in the _temporary folder,
making them unusable for further jobs. Help is required to overcome this issue.*
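
For downstream jobs the expectation is the usual committed layout: part files plus a
_SUCCESS marker at the top of the output directory and no leftover _temporary
subfolder. A minimal sketch of the check a follow-up job can run (the object name is
a placeholder; the path is the one from the log below):

// Minimal sketch: verify that the output directory was actually committed.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object OutputCommittedCheck {
  def main(args: Array[String]): Unit = {
    val out = new Path("file:/sparketl/extract/icasdb_cl/oem/lookup51/dw_app_value_list")
    val fs = out.getFileSystem(new Configuration())
    val committed = fs.exists(new Path(out, "_SUCCESS")) &&
                    !fs.exists(new Path(out, "_temporary"))
    println(s"Output committed: $committed")  // currently false on the second node
  }
}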

*This is line 176 of SparkHadoopUtil.scala, where the exception below is occurring:*

private def getFileSystemThreadStatistics(): Seq[AnyRef] = {
  val stats = FileSystem.getAllStatistics()
  stats.map(Utils.invoke(classOf[Statistics], _, "getThreadStatistics"))  // <== line 176
}
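
Line 176 invokes getThreadStatistics reflectively via Utils.invoke, which is why a
missing method on whatever hadoop-common the JVM actually loaded shows up at runtime
as a NoSuchMethodException rather than a compile error. A minimal sketch of the same
reflective lookup, useful for checking which hadoop-common a given JVM picked up
(ThreadStatisticsProbe is just a placeholder name):

// Minimal sketch mirroring the reflective lookup from SparkHadoopUtil line 176.
import org.apache.hadoop.fs.FileSystem

object ThreadStatisticsProbe {
  def main(args: Array[String]): Unit = {
    // Shows which jar FileSystem was loaded from (helps spot an older hadoop-common)
    println(classOf[FileSystem].getProtectionDomain.getCodeSource)

    // The same lookup Spark performs; if the loaded hadoop-common lacks the method,
    // this throws NoSuchMethodException, matching the exception in the log below.
    val probe = scala.util.Try(
      Class.forName("org.apache.hadoop.fs.FileSystem$Statistics")
        .getDeclaredMethod("getThreadStatistics"))
    println(s"Statistics.getThreadStatistics available: ${probe.isSuccess}")
  }
}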

Following are extracts from the log, which also contain the exceptions
below:

java.lang.NoSuchMethodException:
org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData.getBytesWritten()

java.lang.ClassNotFoundException:
org.apache.hadoop.mapred.InputSplitWithLocationInfo

java.lang.NoSuchMethodException:
org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()

-----------------------------------------------

2015-06-30 15:55:48 DEBUG NativeCodeLoader:46 - Trying to load the
custom-built native-hadoop library...
2015-06-30 15:55:48 DEBUG NativeCodeLoader:50 - Loaded the native-hadoop
library
2015-06-30 15:55:48 DEBUG JniBasedUnixGroupsMapping:50 - Using
JniBasedUnixGroupsMapping for Group resolution
2015-06-30 15:55:48 DEBUG JniBasedUnixGroupsMappingWithFallback:44 - Group
mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
2015-06-30 15:55:48 DEBUG Groups:80 - Group mapping
impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback;
cacheTimeout=300000; warningDeltaMs=5000
2015-06-30 15:55:48 DEBUG UserGroupInformation:193 - hadoop login
2015-06-30 15:55:48 DEBUG UserGroupInformation:142 - hadoop login commit
-----------------------------------------------
2015-06-30 15:55:50 DEBUG Master:56 - [actor] received message
RegisterApplication(ApplicationDescription(SureETL)) from
Actor[akka.tcp://sparkDriver@172.16.11.212:59974/user/$a#-1360185865]
2015-06-30 15:55:50 INFO  Master:59 - Registering app SureETL
2015-06-30 15:55:50 INFO  Master:59 - Registered app SureETL with ID
app-20150630155550-0001
2015-06-30 15:55:50 INFO  Master:59 - Launching executor
app-20150630155550-0001/0 on worker
worker-20150630154548-172.16.11.212-59791
2015-06-30 15:55:50 INFO  Master:59 - Launching executor
app-20150630155550-0001/1 on worker
worker-20150630155002-172.16.11.133-61908
2015-06-30 15:55:50 DEBUG Master:62 - [actor] handled message (8.672752 ms)
RegisterApplication(ApplicationDescription(SureETL)) from
Actor[akka.tcp://sparkDriver@172.16.11.212:59974/user/$a#-1360185865]
-----------------------------------------------
2015-06-30 15:56:02 DEBUG Server:228 - rpcKind=RPC_PROTOCOL_BUFFER,
rpcRequestWrapperClass=class
org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@604d28c6
2015-06-30 15:56:02 DEBUG Client:63 - getting client out of cache:
org.apache.hadoop.ipc.Client@1511d157
2015-06-30 15:56:03 DEBUG
AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1:56 - [actor] received
message AkkaMessage(ReviveOffers,false) from
Actor[akka://sparkDriver/deadLetters]
2015-06-30 15:56:03 DEBUG
AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1:63 - Received RPC
message: AkkaMessage(ReviveOffers,false)
2015-06-30 15:56:03 DEBUG
AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1:62 - [actor] handled
message (1.73455 ms) AkkaMessage(ReviveOffers,false) from
Actor[akka://sparkDriver/deadLetters]
2015-06-30 15:56:03 DEBUG BlockReaderLocal:105 - Both short-circuit local
reads and UNIX domain socket are disabled.
2015-06-30 15:56:03 DEBUG PairRDDFunctions:63 - Saving as hadoop file of
type (NullWritable, Text)
2015-06-30 15:56:03 DEBUG HadoopRDD:84 - SplitLocationInfo and other new
Hadoop classes are unavailable. Using the older Hadoop location info code.
java.lang.ClassNotFoundException:
org.apache.hadoop.mapred.InputSplitWithLocationInfo
	at java.net.URLClassLoader.findClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Unknown Source)
	at
org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.<init>(HadoopRDD.scala:386)
	at org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:396)
	at org.apache.spark.rdd.HadoopRDD$.<init>(HadoopRDD.scala:395)
	at org.apache.spark.rdd.HadoopRDD$.<clinit>(HadoopRDD.scala)
	at org.apache.spark.SparkHadoopWriter.preSetup(SparkHadoopWriter.scala:61)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1093)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
	at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
	at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
	at
org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1065)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:989)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
	at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
	at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
	at
org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:965)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:897)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
	at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
	at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
	at
org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:896)
	at
org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1400)
	at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1379)
	at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1379)
	at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
	at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
	at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1379)
	at
org.apache.spark.sql.json.DefaultSource.createRelation(JSONRelation.scala:99)
	at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:305)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
	at
org.dwh.oem.extract.OrderLookupExtractor$.orderLookupExtractionProcss(OrderingLookupExtractor.scala:61)
	at org.dwh.oem.transform.ETLProcess$.main(ETLProcess.scala:33)
	at org.dwh.oem.transform.ETLProcess.main(ETLProcess.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2015-06-30 15:56:03 INFO  deprecation:1009 - mapred.tip.id is deprecated.
Instead, use mapreduce.task.id
2015-06-30 15:56:03 INFO  deprecation:1009 - mapred.task.id is deprecated.
Instead, use mapreduce.task.attempt.id
2015-06-30 15:56:03 INFO  deprecation:1009 - mapred.task.is.map is
deprecated. Instead, use mapreduce.task.ismap
2015-06-30 15:56:03 INFO  deprecation:1009 - mapred.task.partition is
deprecated. Instead, use mapreduce.task.partition
2015-06-30 15:56:03 INFO  deprecation:1009 - mapred.job.id is deprecated.
Instead, use mapreduce.job.id
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - +++ Cleaning closure
<function2>
(org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13})
+++
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + declared fields: 4
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public static final long
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.serialVersionUID
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      private final
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.$outer
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      private final
org.apache.spark.SerializableWritable
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.wrappedConf$2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final
org.apache.spark.SparkHadoopWriter
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.writer$2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + declared methods: 3
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final void
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(org.apache.spark.TaskContext,scala.collection.Iterator)
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final
java.lang.Object
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(java.lang.Object,java.lang.Object)
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.org$apache$spark$rdd$PairRDDFunctions$$anonfun$$anonfun$$$outer()
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + inner classes: 3
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -     
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -     
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -     
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$56
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outer classes: 2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -     
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -     
org.apache.spark.rdd.PairRDDFunctions
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outer objects: 2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      <function0>
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -     
org.apache.spark.rdd.PairRDDFunctions@5d14e99e
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + populating accessed fields
because this is the starting closure
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + fields accessed by starting
closure: 2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      (class
org.apache.spark.rdd.PairRDDFunctions,Set())
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      (class
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1,Set($outer))
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outermost object is not a
closure, so do not clone it: (class
org.apache.spark.rdd.PairRDDFunctions,org.apache.spark.rdd.PairRDDFunctions@5d14e99e)
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + cloning the object
<function0> of class
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + cleaning cloned closure
<function0> recursively
(org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1)
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - +++ Cleaning closure
<function0>
(org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1}) +++
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + declared fields: 3
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public static final long
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.serialVersionUID
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      private final
org.apache.spark.rdd.PairRDDFunctions
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.$outer
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      private final
org.apache.hadoop.mapred.JobConf
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.conf$4
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + declared methods: 4
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final
java.lang.Object
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply()
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public final void
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply()
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public void
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp()
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      public
org.apache.spark.rdd.PairRDDFunctions
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.org$apache$spark$rdd$PairRDDFunctions$$anonfun$$$outer()
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + inner classes: 5
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -     
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$apply$mcV$sp$2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -     
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -     
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -     
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -     
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$56
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outer classes: 1
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -     
org.apache.spark.rdd.PairRDDFunctions
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outer objects: 1
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -     
org.apache.spark.rdd.PairRDDFunctions@5d14e99e
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + fields accessed by starting
closure: 2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      (class
org.apache.spark.rdd.PairRDDFunctions,Set())
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -      (class
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1,Set($outer))
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  + outermost object is not a
closure, so do not clone it: (class
org.apache.spark.rdd.PairRDDFunctions,org.apache.spark.rdd.PairRDDFunctions@5d14e99e)
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  +++ closure <function0>
(org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1) is
now cleaned +++
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 -  +++ closure <function2>
(org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13)
is now cleaned +++
2015-06-30 15:56:03 INFO  SparkContext:59 - Starting job: save at
OrderingLookupExtractor.scala:61

-----------------------------------------------------------------------------------------
2015-06-30 15:56:11 DEBUG SparkHadoopUtil:84 - Couldn't find method for
retrieving thread-level FileSystem output data
java.lang.NoSuchMethodException:
org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData.getBytesWritten()
	at java.lang.Class.getDeclaredMethod(Unknown Source)
	at
org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:182)
	at
org.apache.spark.deploy.SparkHadoopUtil.getFSBytesWrittenOnThreadCallback(SparkHadoopUtil.scala:162)
	at
org.apache.spark.rdd.PairRDDFunctions.org$apache$spark$rdd$PairRDDFunctions$$initHadoopOutputMetrics(PairRDDFunctions.scala:1129)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1101)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
2015-06-30 15:56:11 DEBUG HadoopRDD:84 - SplitLocationInfo and other new
Hadoop classes are unavailable. Using the older Hadoop location info code.
java.lang.ClassNotFoundException:
org.apache.hadoop.mapred.InputSplitWithLocationInfo
	at java.net.URLClassLoader.findClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Unknown Source)
	at
org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.<init>(HadoopRDD.scala:386)
	at org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:396)
	at org.apache.spark.rdd.HadoopRDD$.<init>(HadoopRDD.scala:395)
	at org.apache.spark.rdd.HadoopRDD$.<clinit>(HadoopRDD.scala)
	at org.apache.spark.SparkHadoopWriter.setup(SparkHadoopWriter.scala:70)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1103)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
2015-06-30 15:56:11 DEBUG NativeIO:191 - Initialized cache for IDs to
User/Group mapping with a  cache timeout of 14400 seconds.
2015-06-30 15:56:11 INFO  JDBCRDD:59 - closed connection
2015-06-30 15:56:11 INFO  FileOutputCommitter:439 - Saved output of task
'attempt_201506301556_0000_m_000000_0' to
file:/sparketl/extract/icasdb_cl/oem/lookup51/dw_app_value_list/_temporary/0/task_201506301556_0000_m_000000
2015-06-30 15:56:11 INFO  SparkHadoopMapRedUtil:59 -
attempt_201506301556_0000_m_000000_0: Committed
2015-06-30 15:56:11 INFO  JDBCRDD:59 - closed connection
2015-06-30 15:56:11 INFO  Executor:59 - Finished task 0.0 in stage 0.0 (TID
0). 624 bytes result sent to driver
--------------------------------------------------------------------------------------
2015-06-30 15:57:03 DEBUG SparkHadoopUtil:84 - Couldn't find method for
retrieving thread-level FileSystem output data
java.lang.NoSuchMethodException:
org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()
	at java.lang.Class.getDeclaredMethod(Unknown Source)
	at org.apache.spark.util.Utils$.invoke(Utils.scala:2069)
	at
org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:176)
	at
org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:176)
	at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
	at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
	at scala.collection.Iterator$class.foreach(Iterator.scala:750)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1202)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
	at
org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatistics(SparkHadoopUtil.scala:176)
	at
org.apache.spark.deploy.SparkHadoopUtil.getFSBytesWrittenOnThreadCallback(SparkHadoopUtil.scala:161)
	at
org.apache.spark.rdd.PairRDDFunctions.org$apache$spark$rdd$PairRDDFunctions$$initHadoopOutputMetrics(PairRDDFunctions.scala:1129)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1101)
	at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
-----------------------------------------------


