spark-user mailing list archives

From "Karavany, Ido" <ido.karav...@intel.com>
Subject RE: Spark is unable to read from HDFS
Date Mon, 30 Sep 2013 16:12:49 GMT
Hi,



Thanks for the reply.

I've tried the following.



What was done:



- Deployed hadoop-core-1.0.3-Intel.jar into a local ivy repository (ivy XML files are attached)

- Changed HADOOP_VERSION: val HADOOP_VERSION = "1.0.3-Intel" (SparkBuild.scala is attached)

- Executed sbt/sbt assembly successfully

- Checked and found a file named hadoop-core-1.0.3-Intel-1.0.3-Intel.jar in /lib_managed/jars

- Tried to execute the Scala commands again:

                - First, val myf = sc.textFile("hdfs://ip-172-31-34-49:8020/iot/test.txt") returned an exception

                - A second attempt of the same command succeeded

                - myf.filter(line => line.contains("aa")).count() failed with the same WARN and error as before



I have probably missed some configuration or misconfigured one of the above...
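
For reference, below is a rough sketch of the kind of change made in SparkBuild.scala (the attached file is the authoritative version; the local Ivy path and project layout here are only illustrative):

import sbt._
import Keys._

object SparkBuildSketch extends Build {
  // Build against the Intel Hadoop core instead of the stock Apache artifact.
  val HADOOP_VERSION = "1.0.3-Intel"

  lazy val core = Project("core", file("core"), settings = Defaults.defaultSettings ++ Seq(
    // Resolve hadoop-core-1.0.3-Intel.jar from the local Ivy repository it was
    // published to (the path below is an assumption; adjust it to wherever the
    // attached ivy XML files place the artifact).
    resolvers += Resolver.file("local-ivy",
      new java.io.File(sys.props("user.home"), ".ivy2/local"))(Resolver.ivyStylePatterns),
    libraryDependencies += "org.apache.hadoop" % "hadoop-core" % HADOOP_VERSION
  ))
}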



Can you please advise?



Thanks,

Ido





scala> val myf = sc.textFile("hdfs://ip-172-31-34-49:8020/iot/test.txt")

java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration

        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)

        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:34)

        at org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)

        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:209)

        at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:177)

        at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:229)

        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:465)

        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:451)

        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1494)

        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1395)

        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)

        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)

        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:559)

        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:321)

        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:294)

        at spark.SparkContext.hadoopFile(SparkContext.scala:262)

        at spark.SparkContext.textFile(SparkContext.scala:234)

        at <init>(<console>:12)

        at <init>(<console>:17)

        at <init>(<console>:19)

        at <init>(<console>:21)

        at <init>(<console>:23)

        at .<init>(<console>:27)

        at .<clinit>(<console>)

        at .<init>(<console>:11)

        at .<clinit>(<console>)

        at $export(<console>)

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

        at java.lang.reflect.Method.invoke(Method.java:597)

        at spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:629)

        at spark.repl.SparkIMain$Request$$anonfun$10.apply(SparkIMain.scala:890)

        at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)

        at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)

        at java.lang.Thread.run(Thread.java:662)

Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration

        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)

        at java.security.AccessController.doPrivileged(Native Method)

        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)

        at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.scala$tools$nsc$util$ScalaClassLoader$$super$findClass(ScalaClassLoader.scala:88)

        at scala.tools.nsc.util.ScalaClassLoader$class.findClass(ScalaClassLoader.scala:44)

        at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.findClass(ScalaClassLoader.scala:88)

        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)

        at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.scala$tools$nsc$util$ScalaClassLoader$$super$loadClass(ScalaClassLoader.scala:88)

        at scala.tools.nsc.util.ScalaClassLoader$class.loadClass(ScalaClassLoader.scala:50)

        at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.loadClass(ScalaClassLoader.scala:88)

        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

        ... 36 more



scala> val myf = sc.textFile("hdfs://ip-172-31-34-49:8020/iot/test.txt")

13/09/30 10:53:17 INFO storage.MemoryStore: ensureFreeSpace(40319) called with curMem=0, maxMem=339585269

13/09/30 10:53:17 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated
size 39.4 KB, free 323.8 MB)

myf: spark.RDD[String] = MappedRDD[1] at textFile at <console>:12



scala> myf.filter(line => line.contains("aa")).count()

13/09/30 10:53:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable

13/09/30 10:53:22 WARN snappy.LoadSnappy: Snappy native library not loaded

13/09/30 10:53:22 INFO mapred.FileInputFormat: Total input paths to process : 1

13/09/30 10:53:23 INFO spark.SparkContext: Starting job: count at <console>:15

13/09/30 10:53:23 INFO scheduler.DAGScheduler: Got job 0 (count at <console>:15) with
1 output partitions (allowLocal=false)

13/09/30 10:53:23 INFO scheduler.DAGScheduler: Final stage: Stage 0 (filter at <console>:15)

13/09/30 10:53:23 INFO scheduler.DAGScheduler: Parents of final stage: List()

13/09/30 10:53:23 INFO scheduler.DAGScheduler: Missing parents: List()

13/09/30 10:53:23 INFO scheduler.DAGScheduler: Submitting Stage 0 (FilteredRDD[2] at filter
at <console>:15), which has no missing parents

13/09/30 10:53:23 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0 (FilteredRDD[2]
at filter at <console>:15)

13/09/30 10:53:23 INFO local.LocalScheduler: Running ResultTask(0, 0)

13/09/30 10:53:23 INFO local.LocalScheduler: Size of task 0 is 1543 bytes

13/09/30 10:54:23 WARN hdfs.DFSClient: Failed to connect to /172.31.34.49:50010, add to deadNodes
and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:40997
remote=/172.31.34.49:50010]

13/09/30 10:55:23 WARN hdfs.DFSClient: Failed to connect to /172.31.34.50:50010, add to deadNodes
and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:56610
remote=/172.31.34.50:50010]

13/09/30 10:55:23 INFO hdfs.DFSClient: Could not obtain block blk_-1057940606378039494_1013
from any node: java.io.IOException: No live nodes contain current block.. Will get new block
locations from namenode and retry...

13/09/30 10:56:26 WARN hdfs.DFSClient: Failed to connect to /172.31.34.49:50010, add to deadNodes
and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:41671
remote=/172.31.34.49:50010]

13/09/30 10:57:26 WARN hdfs.DFSClient: Failed to connect to /172.31.34.50:50010, add to deadNodes
and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:57284
remote=/172.31.34.50:50010]

13/09/30 10:57:26 INFO hdfs.DFSClient: Could not obtain block blk_-1057940606378039494_1013
from any node: java.io.IOException: No live nodes contain current block.. Will get new block
locations from namenode and retry...

13/09/30 10:58:29 WARN hdfs.DFSClient: Failed to connect to /172.31.34.49:50010, add to deadNodes
and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:42338
remote=/172.31.34.49:50010]

13/09/30 10:59:29 WARN hdfs.DFSClient: Failed to connect to /172.31.34.50:50010, add to deadNodes
and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:57945
remote=/172.31.34.50:50010]

13/09/30 10:59:29 INFO hdfs.DFSClient: Could not obtain block blk_-1057940606378039494_1013
from any node: java.io.IOException: No live nodes contain current block.. Will get new block
locations from namenode and retry...

13/09/30 11:00:32 WARN hdfs.DFSClient: Failed to connect to /172.31.34.49:50010, add to deadNodes
and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:43000
remote=/172.31.34.49:50010]

13/09/30 11:01:32 WARN hdfs.DFSClient: Failed to connect to /172.31.34.50:50010, add to deadNodes
and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:58609
remote=/172.31.34.50:50010]

13/09/30 11:01:32 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block:
blk_-1057940606378039494_1013 file=/iot/test.txt

        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2511)

        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2285)

        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2444)

        at java.io.DataInputStream.read(DataInputStream.java:83)

        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)

        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)

        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:208)

        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)

        at spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:89)

        at spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:70)

        at spark.util.NextIterator.hasNext(NextIterator.scala:54)

        at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:400)

        at scala.collection.Iterator$$anon$22.hasNext(Iterator.scala:457)

        at spark.RDD$$anonfun$count$1.apply(RDD.scala:580)

        at spark.RDD$$anonfun$count$1.apply(RDD.scala:578)

        at spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:617)

        at spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:617)

        at spark.scheduler.ResultTask.run(ResultTask.scala:77)

        at spark.scheduler.local.LocalScheduler.runTask$1(LocalScheduler.scala:76)

        at spark.scheduler.local.LocalScheduler$$anon$1.run(LocalScheduler.scala:49)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)

        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

        at java.util.concurrent.FutureTask.run(FutureTask.java:138)

        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

        at java.lang.Thread.run(Thread.java:662)



13/09/30 11:01:32 ERROR local.LocalScheduler: Exception in task 0

java.io.IOException: Could not obtain block: blk_-1057940606378039494_1013 file=/iot/test.txt

        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2511)

        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2285)

        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2444)

        at java.io.DataInputStream.read(DataInputStream.java:83)

        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)

        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)

        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:208)

        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)

        at spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:89)

        at spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:70)

        at spark.util.NextIterator.hasNext(NextIterator.scala:54)

        at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:400)

        at scala.collection.Iterator$$anon$22.hasNext(Iterator.scala:457)

        at spark.RDD$$anonfun$count$1.apply(RDD.scala:580)

        at spark.RDD$$anonfun$count$1.apply(RDD.scala:578)

        at spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:617)

        at spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:617)

        at spark.scheduler.ResultTask.run(ResultTask.scala:77)

        at spark.scheduler.local.LocalScheduler.runTask$1(LocalScheduler.scala:76)

        at spark.scheduler.local.LocalScheduler$$anon$1.run(LocalScheduler.scala:49)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)

        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

        at java.util.concurrent.FutureTask.run(FutureTask.java:138)

        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

        at java.lang.Thread.run(Thread.java:662)

13/09/30 11:01:32 INFO scheduler.DAGScheduler: Failed to run count at <console>:15

spark.SparkException: Job failed: ResultTask(0, 0) failed: ExceptionFailure(java.io.IOException,java.io.IOException:
Could not obtain block: blk_-1057940606378039494_1013 file=/iot/test.txt,[Ljava.lang.StackTraceElement;@191be075)

        at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:642)

        at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:640)

        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)

        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

        at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:640)

        at spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:601)

        at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:300)

        at spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:364)

        at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:107)
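
As a side check, something along the following lines (an untested sketch that uses the plain Hadoop client Spark now links against) should show whether the DataNode timeouts on port 50010 above also occur outside Spark:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

object HdfsReadCheck {
  def main(args: Array[String]) {
    // Point the client at the same namenode the Spark shell uses.
    val conf = new Configuration()
    conf.set("fs.default.name", "hdfs://ip-172-31-34-49:8020")
    val fs = FileSystem.get(conf)
    // Open the same file and print the first few lines. A hang or a
    // SocketTimeoutException here would point at DataNode connectivity
    // rather than at the Spark build.
    val in = fs.open(new Path("/iot/test.txt"))
    try {
      Source.fromInputStream(in).getLines().take(5).foreach(println)
    } finally {
      in.close()
    }
  }
}

If this fails the same way, the problem is probably DataNode connectivity from this host rather than the Spark build itself.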







-----Original Message-----
From: Stoney Vintson [mailto:stoneyv@gmail.com]
Sent: Sunday, September 29, 2013 04:15
To: user@spark.incubator.apache.org
Subject: Re: Spark is unable to read from HDFS



Karavany, Ido

Thank you for specifying details about your build configuration and including excerpts from your log file.
In addition to specifying HADOOP_VERSION=1.0.3 in the ./project/SparkBuild.scala file, you will need to
specify the libraryDependencies and resolvers for the "spark-core" project. Otherwise, sbt will fetch
version 1.0.3 of hadoop-core from Apache instead of Intel. You can set up your own local or remote
repository and point the resolvers at it.



http://www.scala-sbt.org/0.12.3/docs/Detailed-Topics/Publishing.html



(Note: this particular Apache Spark document is for the latest release, 0.8.0, not 0.7.3.) http://spark.incubator.apache.org/docs/latest/hadoop-third-party-distributions.html
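
For example, one possible (untested) way to get the Intel jar into your local Ivy repository under the coordinates that SparkBuild expects is a small throwaway sbt project that wraps the prebuilt jar; the project name and jar location below are only illustrative:

import sbt._
import Keys._

object PublishIntelHadoop extends Build {
  // Throwaway wrapper project: publishes the prebuilt Intel jar (placed in ./lib)
  // to the local Ivy repository as org.apache.hadoop#hadoop-core;1.0.3-Intel.
  lazy val root = Project("hadoop-core-intel", file("."), settings = Defaults.defaultSettings ++ Seq(
    organization := "org.apache.hadoop",
    name := "hadoop-core",
    version := "1.0.3-Intel",
    crossPaths := false,        // no Scala version suffix on the artifact name
    autoScalaLibrary := false,  // do not add a scala-library dependency
    // Publish the existing jar instead of anything compiled by this project.
    packageBin in Compile := file("lib/hadoop-core-1.0.3-Intel.jar")
  ))
}

Running sbt publish-local in that wrapper project should place the jar and an ivy.xml where the spark-core resolvers can pick them up.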



> 13/09/28 13:14:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable



After running sbt/sbt assembly, you should find the Hadoop core jar file at spark-0.7.3/lib_managed/jars/hadoop-core-1.0.3.jar



On Sat, Sep 28, 2013 at 6:42 AM, Karavany, Ido <ido.karavany@intel.com> wrote:

> Hi All,
>
> We're new spark users - trying to install it over Intel Distribution for Hadoop.
> IDH (Intel Distribution for Hadoop) has customized Hadoop and has its core jar (Hadoop-1.0.3-Intel.jar).
>
> What was done?
>
> - Download Scala 2.9.3
> - Download Spark 0.7.3
> - Change ./project/SparkBuild.scala and set HADOOP_VERSION=1.0.3
> - Compile by using sbt/sbt package
> - Create ./conf/spark-env.sh and set SCALA_HOME in it
> - Update slaves file
> - Started a standalone cluster
> - Successfully tested spark with: ./run spark.examples.SparkPi spark://ip-172-31-34-49:7077
>
> Started spark-shell.
> Defined a text file and executed the filter with count():
>
> val myf = sc.textFile("hdfs://ip-172-31-34-49:8020/iot/test.txt")
> myf.filter(line => line.contains("aa")).count()
>
> The file and HDFS are accessible (hdfs fs cat or creating an external hive table).
> The above command fails with the below result. One option that I can think of is that Spark should be compiled against the Intel Hadoop jar - but I don't know how it can be done...
>
> Any help would be great as we have been stuck with this issue for ~1 month now...
>
> Thanks,
> Ido
>
> below is the output log:
>

> scala> myf.filter(line => line.contains("aa")).count()
> 13/09/28 13:14:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 13/09/28 13:14:45 WARN snappy.LoadSnappy: Snappy native library not loaded
> 13/09/28 13:14:45 INFO mapred.FileInputFormat: Total input paths to process : 1
> 13/09/28 13:14:45 INFO spark.SparkContext: Starting job: count at <console>:15
> 13/09/28 13:14:45 INFO scheduler.DAGScheduler: Got job 0 (count at <console>:15) with 1 output partitions (allowLocal=false)
> 13/09/28 13:14:45 INFO scheduler.DAGScheduler: Final stage: Stage 0 (filter at <console>:15)
> 13/09/28 13:14:45 INFO scheduler.DAGScheduler: Parents of final stage: List()
> 13/09/28 13:14:45 INFO scheduler.DAGScheduler: Missing parents: List()
> 13/09/28 13:14:45 INFO scheduler.DAGScheduler: Submitting Stage 0 (FilteredRDD[3] at filter at <console>:15), which has no missing parents
> 13/09/28 13:14:45 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0 (FilteredRDD[3] at filter at <console>:15)
> 13/09/28 13:14:45 INFO local.LocalScheduler: Running ResultTask(0, 0)
> 13/09/28 13:14:45 INFO local.LocalScheduler: Size of task 0 is 1543 bytes
> 13/09/28 13:15:45 WARN hdfs.DFSClient: Failed to connect to /172.31.34.49:50010, add to deadNodes and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:44040 remote=/172.31.34.49:50010]
> 13/09/28 13:16:46 WARN hdfs.DFSClient: Failed to connect to /172.31.34.50:50010, add to deadNodes and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:59724 remote=/172.31.34.50:50010]
> 13/09/28 13:16:46 INFO hdfs.DFSClient: Could not obtain block blk_-1057940606378039494_1013 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
> 13/09/28 13:17:49 WARN hdfs.DFSClient: Failed to connect to /172.31.34.49:50010, add to deadNodes and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:44826 remote=/172.31.34.49:50010]
> 13/09/28 13:18:49 WARN hdfs.DFSClient: Failed to connect to /172.31.34.50:50010, add to deadNodes and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:60514 remote=/172.31.34.50:50010]
> 13/09/28 13:18:49 INFO hdfs.DFSClient: Could not obtain block blk_-1057940606378039494_1013 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
> 13/09/28 13:19:52 WARN hdfs.DFSClient: Failed to connect to /172.31.34.49:50010, add to deadNodes and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:45621 remote=/172.31.34.49:50010]
> 13/09/28 13:20:52 WARN hdfs.DFSClient: Failed to connect to /172.31.34.50:50010, add to deadNodes and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:33081 remote=/172.31.34.50:50010]
> 13/09/28 13:20:52 INFO hdfs.DFSClient: Could not obtain block blk_-1057940606378039494_1013 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
> 13/09/28 13:21:55 WARN hdfs.DFSClient: Failed to connect to /172.31.34.49:50010, add to deadNodes and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:46423 remote=/172.31.34.49:50010]
> 13/09/28 13:22:55 WARN hdfs.DFSClient: Failed to connect to /172.31.34.50:50010, add to deadNodes and continuejava.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.31.34.49:33885 remote=/172.31.34.50:50010]
> 13/09/28 13:22:55 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_-1057940606378039494_1013 file=/iot/test.txt

>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2269)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2063)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2224)
>         at java.io.DataInputStream.read(DataInputStream.java:100)
>         at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
>         at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:133)
>         at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:38)
>         at spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:89)
>         at spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:70)
>         at spark.util.NextIterator.hasNext(NextIterator.scala:54)
>         at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:400)
>         at scala.collection.Iterator$$anon$22.hasNext(Iterator.scala:457)
>         at spark.RDD$$anonfun$count$1.apply(RDD.scala:580)
>         at spark.RDD$$anonfun$count$1.apply(RDD.scala:578)
>         at spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:617)
>         at spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:617)
>         at spark.scheduler.ResultTask.run(ResultTask.scala:77)
>         at spark.scheduler.local.LocalScheduler.runTask$1(LocalScheduler.scala:76)
>         at spark.scheduler.local.LocalScheduler$$anon$1.run(LocalScheduler.scala:49)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:679)
>
> 13/09/28 13:22:55 ERROR local.LocalScheduler: Exception in task 0
> java.io.IOException: Could not obtain block: blk_-1057940606378039494_1013 file=/iot/test.txt
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2269)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2063)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2224)
>         at java.io.DataInputStream.read(DataInputStream.java:100)
>         at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
>         at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:133)
>         at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:38)
>         at spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:89)
>         at spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:70)
>         at spark.util.NextIterator.hasNext(NextIterator.scala:54)
>         at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:400)
>         at scala.collection.Iterator$$anon$22.hasNext(Iterator.scala:457)
>         at spark.RDD$$anonfun$count$1.apply(RDD.scala:580)
>         at spark.RDD$$anonfun$count$1.apply(RDD.scala:578)
>         at spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:617)
>         at spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:617)
>         at spark.scheduler.ResultTask.run(ResultTask.scala:77)
>         at spark.scheduler.local.LocalScheduler.runTask$1(LocalScheduler.scala:76)
>         at spark.scheduler.local.LocalScheduler$$anon$1.run(LocalScheduler.scala:49)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:679)
> 13/09/28 13:22:55 INFO scheduler.DAGScheduler: Failed to run count at <console>:15
> spark.SparkException: Job failed: ResultTask(0, 0) failed: ExceptionFailure(java.io.IOException,java.io.IOException: Could not obtain block: blk_-1057940606378039494_1013 file=/iot/test.txt,[Ljava.lang.StackTraceElement;@2e9267fe)
>         at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:642)
>         at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:640)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:640)
>         at spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:601)
>         at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:300)
>         at spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:364)
>         at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:107)
>
---------------------------------------------------------------------
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
