spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Spark RuntimeException hadoop output format
Date Sat, 15 Aug 2015 00:29:20 GMT
First you create the file:

    final File outputFile = new File(outputPath);

Then you write to it:

    Files.append(counts + "\n", outputFile, Charset.defaultCharset());
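Ted's two lines use Guava's Files.append, which creates the file if it is missing and appends on every later call. A minimal standard-library sketch of the same behavior, with an illustrative temp file, placeholder counts, and a helper name (appendLine) that is mine, not from the original code:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendExample {
    // Create-if-missing, then append: the behavior of Guava's
    // Files.append(CharSequence, File, Charset), via java.nio.
    static void appendLine(Path file, String line) throws IOException {
        Files.write(file, (line + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path outputFile = Files.createTempFile("counts", ".txt");
        appendLine(outputFile, "Counts at time 1439495124000 [...]");
        appendLine(outputFile, "Counts at time 1439495125000 [...]");
        System.out.println(Files.readAllLines(outputFile).size()); // prints 2
    }
}
```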

Cheers

On Fri, Aug 14, 2015 at 4:38 PM, Mohit Anchlia <mohitanchlia@gmail.com>
wrote:

> I thought prefix meant the output path? What's the purpose of prefix and
> where do I specify the path if not in prefix?
>
> On Fri, Aug 14, 2015 at 4:36 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Please take a look at JavaPairDStream.scala:
>>  def saveAsHadoopFiles[F <: OutputFormat[_, _]](
>>       prefix: String,
>>       suffix: String,
>>       keyClass: Class[_],
>>       valueClass: Class[_],
>>       outputFormatClass: Class[F]) {
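For context, the prefix and suffix feed into a per-batch file name of the form "<prefix>-<batchTimeMs>.<suffix>", which is visible in the /tmp/out-1439495124000.txt paths elsewhere in this thread. A small sketch of that scheme (the fileName helper is mine for illustration, not Spark's API):

```java
public class HadoopFileName {
    // Mirrors the naming visible in the HDFS listing in this thread:
    // "<prefix>-<batchTimeMs>.<suffix>", with the dot dropped when the
    // suffix is empty. Illustrative helper, not Spark's own method.
    static String fileName(String prefix, String suffix, long batchTimeMs) {
        return (suffix == null || suffix.isEmpty())
                ? prefix + "-" + batchTimeMs
                : prefix + "-" + batchTimeMs + "." + suffix;
    }

    public static void main(String[] args) {
        System.out.println(fileName("/tmp/out", "txt", 1439495124000L));
        // prints /tmp/out-1439495124000.txt
    }
}
```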
>>
>> Did you intend to use outputPath as the prefix?
>>
>> Cheers
>>
>>
>> On Fri, Aug 14, 2015 at 1:36 PM, Mohit Anchlia <mohitanchlia@gmail.com>
>> wrote:
>>
>>> Spark 1.3
>>>
>>> Code:
>>>
>>> wordCounts.foreachRDD(new Function2<JavaPairRDD<String, Integer>, Time, Void>() {
>>>
>>>     @Override
>>>     public Void call(JavaPairRDD<String, Integer> rdd, Time time)
>>>             throws IOException {
>>>         String counts = "Counts at time " + time + " " + rdd.collect();
>>>         System.out.println(counts);
>>>         System.out.println("Appending to " + outputFile.getAbsolutePath());
>>>         Files.append(counts + "\n", outputFile, Charset.defaultCharset());
>>>         return null;
>>>     }
>>> });
>>>
>>> wordCounts.saveAsHadoopFiles(outputPath, "txt", Text.class, Text.class,
>>>         TextOutputFormat.class);
>>>
>>>
>>> What do I need to check in the namenode? I see 0-byte files like this:
>>>
>>>
>>> drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45
>>> /tmp/out-1439495124000.txt
>>> drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45
>>> /tmp/out-1439495125000.txt
>>> drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45
>>> /tmp/out-1439495126000.txt
>>> drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45
>>> /tmp/out-1439495127000.txt
>>> drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45
>>> /tmp/out-1439495128000.txt
>>>
>>>
>>>
>>> However, I also wrote the data to a file on the local file system for
>>> verification, and there I do see the data:
>>>
>>>
>>> $ ls -ltr !$
>>> ls -ltr /tmp/out
>>> -rw-r--r-- 1 yarn yarn 5230 Aug 13 15:45 /tmp/out
>>>
>>>
>>> On Fri, Aug 14, 2015 at 6:15 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>
>>>> Which Spark release are you using?
>>>>
>>>> Can you show us a snippet of your code?
>>>>
>>>> Have you checked the namenode log?
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>> On Aug 13, 2015, at 10:21 PM, Mohit Anchlia <mohitanchlia@gmail.com>
>>>> wrote:
>>>>
>>>> I was able to get this working by using an alternative method; however, I
>>>> only see 0-byte files in Hadoop. I've verified that the output does exist
>>>> in the logs, but it's missing from HDFS.
>>>>
>>>> On Thu, Aug 13, 2015 at 10:49 AM, Mohit Anchlia <mohitanchlia@gmail.com
>>>> > wrote:
>>>>
>>>>> I have this call, trying to save to HDFS 2.6:
>>>>>
>>>>> wordCounts.saveAsNewAPIHadoopFiles("prefix", "txt");
>>>>>
>>>>> but I am getting the following:
>>>>> java.lang.RuntimeException: class scala.runtime.Nothing$ not
>>>>> org.apache.hadoop.mapreduce.OutputFormat
>>>>>
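The scala.runtime.Nothing$ in that error typically means the call's type parameters were never supplied: the two-argument overload leaves the key, value, and output-format types for Scala to infer, and it infers Nothing, which is not an OutputFormat. Passing the classes explicitly, as Ted's signature above shows, avoids that. A sketch, assuming a JavaPairDStream<Text, Text> named wordCounts and an illustrative hdfs:///tmp/out prefix (both are my assumptions, not from the original code), which requires a running Spark Streaming job:

```java
// wordCounts is assumed to be a JavaPairDStream<Text, Text>. The explicit
// class arguments pin down the types the two-argument overload leaves open.
wordCounts.saveAsNewAPIHadoopFiles(
        "hdfs:///tmp/out",       // prefix (illustrative path)
        "txt",                   // suffix
        Text.class,              // key class
        Text.class,              // value class
        TextOutputFormat.class); // org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
```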
>>>>
>>>>
>>>
>>
>
