Hi,


I am running Spark Streaming in standalone mode against a Twitter source on a single machine (an HDP virtual box). I receive statuses from the streaming context and can print them, but when I try to save those statuses into Hadoop with rdd.saveAsTextFiles or saveAsHadoopFiles("hdfs://10.20.32.204:50070/user/hue/test", "txt"), I get the connection error below.
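(Aside, in case it helps anyone hitting the same trace: in Hadoop 2.x, port 50070 is the NameNode's HTTP web UI, while hdfs:// URIs must target the NameNode RPC port, commonly 8020, so an EOFException on connect is consistent with speaking the IPC protocol to the HTTP port. Below is a small, hypothetical sanity check in plain Scala; the `HdfsUriCheck` object and its names are mine, not part of any Spark or Hadoop API.)

```scala
import java.net.URI

// Hypothetical pre-flight check (my own helper, not a Spark/Hadoop API):
// in Hadoop 2.x, ports 50070/50470 serve the NameNode's HTTP web UI,
// whereas hdfs:// URIs must point at the NameNode RPC port (commonly
// 8020 or 9000). Talking IPC to the HTTP port fails with an EOFException.
object HdfsUriCheck {
  val WebUiPorts = Set(50070, 50470)

  def check(uri: String): String = {
    val u = new URI(uri)
    if (u.getScheme != "hdfs") "not an hdfs:// URI"
    else if (WebUiPorts.contains(u.getPort))
      s"port ${u.getPort} is the NameNode web UI; use the RPC port (commonly 8020)"
    else "ok"
  }

  def main(args: Array[String]): Unit = {
    println(check("hdfs://10.20.32.204:50070/user/hue/test"))
    println(check("hdfs://10.20.32.204:8020/user/hue/test"))
  }
}
```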

Hadoop version: 2.4.0.2.1.1.0-385

Spark version: 1.1.0

 

ERROR-----------------------------------------------------

 

14/12/09 04:45:12 ERROR scheduler.JobScheduler: Error running job streaming job 1418129110000 ms.1

java.io.IOException: Call to /10.20.32.204:50070 failed on local exception: java.io.EOFException

        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)

        at org.apache.hadoop.ipc.Client.call(Client.java:1075)

        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)

        at com.sun.proxy.$Proxy9.getProtocolVersion(Unknown Source)

        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)

        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)

        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)

        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)

        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)

        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)

        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)

        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)

        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)

        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)

        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)

        at org.apache.hadoop.mapred.SparkHadoopWriter$.createPathFromString(SparkHadoopWriter.scala:193)

        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:685)

        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:572)

        at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:894)

        at org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:762)

        at org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:760)

        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)

        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)

        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)

        at scala.util.Try$.apply(Try.scala:161)

        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)

        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:155)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

        at java.lang.Thread.run(Thread.java:744)

Caused by: java.io.EOFException

        at java.io.DataInputStream.readInt(DataInputStream.java:392)

        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:811)

        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:749)

[error] (run-main-0) java.io.IOException: Call to /10.20.32.204:50070 failed on local exception: java.io.EOFException


[trace] Stack trace suppressed: run last compile:run for the full output.


Thanks,

Saurabh

