spark-user mailing list archives

From Benjamin Cuthbert <cuthbert....@gmail.com>
Subject Re: hdfs streaming context
Date Mon, 01 Dec 2014 22:55:33 GMT
Thanks Sean,

That worked. Just removing the /* and leaving it as /user/data did the trick.

Seems to be streaming in.
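For reference, a minimal sketch of the fix described above: textFileStream is pointed at the directory itself rather than a glob, and the filter result is assigned to a new DStream (the snippet in the original question discarded it, since DStream transformations return new streams rather than mutating in place). Host, port, and path are the ones from this thread; this is a sketch, not a tested program.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HdfsWordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // No trailing /* -- Spark Streaming monitors the directory and
    // picks up files newly created (or atomically moved) into it.
    val lines = ssc.textFileStream("hdfs://localhost:8020/user/data")

    // filter() returns a new DStream; assign it, otherwise the
    // unfiltered stream is what gets printed.
    val geLines = lines.filter(line => line.contains("GE"))
    geLines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```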


> On 1 Dec 2014, at 22:50, Sean Owen <sowen@cloudera.com> wrote:
> 
> Yes, in fact, that's the only way it works. You need
> "hdfs://localhost:8020/user/data", I believe.
> 
> (No, it's not correct to write "hdfs:///...")
> 
> On Mon, Dec 1, 2014 at 10:41 PM, Benjamin Cuthbert
> <cuthbert.ben@gmail.com> wrote:
>> All,
>> 
>> Is it possible to stream on HDFS directory and listen for multiple files?
>> 
>> I have tried the following
>> 
>> val sparkConf = new SparkConf().setAppName("HdfsWordCount")
>> val ssc = new StreamingContext(sparkConf, Seconds(2))
>> val lines = ssc.textFileStream("hdfs://localhost:8020/user/data/*")
>> lines.filter(line => line.contains("GE"))
>> lines.print()
>> ssc.start()
>> 
>> But I get
>> 
>> 14/12/01 21:35:42 ERROR JobScheduler: Error generating jobs for time 1417469742000 ms
>> java.io.FileNotFoundException: File hdfs://localhost:8020/user/data/* does not exist.
>>        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:408)
>>        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1416)
>>        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1456)
>>        at org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:107)
>>        at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:75)
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>> 


