flink-issues mailing list archives

From "Andrew Palumbo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3605) DataSet.output(HadoopOutput) always expects a LongWritable as a key and throws an exception when supplied with other Writable Key Types
Date Mon, 14 Mar 2016 01:43:33 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192602#comment-15192602 ]

Andrew Palumbo commented on FLINK-3605:
---------------------------------------

These tests will pass if the output key and value types are explicitly set on the {{Job}}:
{code}
 job.setOutputKeyClass(classOf[IntWritable])
 job.setOutputValueClass(classOf[IntWritable])
{code}

I'm not sure that this is how the API is meant to work, so I'll leave this issue open for you to decide.
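
For reference, here is a minimal sketch of the sink setup from the test below with that workaround applied ({{path}} and {{writableDataset}} are the values defined in the quoted test; the imports are added here for completeness):

{code}
import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, SequenceFileOutputFormat}
import org.apache.flink.api.scala.hadoop.mapreduce.HadoopOutputFormat

val job: Job = new Job()

// Workaround: declare the Writable types explicitly so Hadoop does not
// fall back to its defaults (LongWritable key, Text value).
job.setOutputKeyClass(classOf[IntWritable])
job.setOutputValueClass(classOf[IntWritable])

val sequenceFormat = new SequenceFileOutputFormat[IntWritable, IntWritable]
val hadoopOutput = new HadoopOutputFormat[IntWritable, IntWritable](sequenceFormat, job)
FileOutputFormat.setOutputPath(job, new org.apache.hadoop.fs.Path(path))
writableDataset.output(hadoopOutput)
// then env.execute(...) as in the original test
{code}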


> DataSet.output(HadoopOutput) always expects a LongWritable as a key and throws an exception when supplied with other Writable Key Types
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-3605
>                 URL: https://issues.apache.org/jira/browse/FLINK-3605
>             Project: Flink
>          Issue Type: Bug
>          Components: DataSet API, Scala API
>    Affects Versions: 1.0.0
>            Reporter: Andrew Palumbo
>              Labels: mahout
>
> When trying to output a {{DataSet}} as a Sequence file using the Scala API, an exception is thrown when anything other than a {{LongWritable}} is supplied as a key. The following simple unit test demonstrates this:
> {code}
>   test("Simple DataSet with IntWritable Key and Value") {
>     val path = TmpDir + "flinkOutput" 
>     implicit val typeInfo = createTypeInformation[(Int,Int)]
>     val ds = env.fromElements[(Int, Int)]((1,2), (3,4), (5,6), (7,8))
>     val writableDataset : DataSet[(IntWritable, IntWritable)] =
>       ds.map( tuple =>
>         (new IntWritable(tuple._1.asInstanceOf[Int]), 
>          new IntWritable(tuple._2.asInstanceOf[Int])) )
>     val job: Job = new Job()
>     // setup sink for IntWritable
>     val sequenceFormat = 
>        new SequenceFileOutputFormat[IntWritable,  IntWritable]
>     val hadoopOutput  = 
>        new HadoopOutputFormat[IntWritable, IntWritable](sequenceFormat, job)
>     FileOutputFormat.setOutputPath(job, new org.apache.hadoop.fs.Path(path))
>     writableDataset.output(hadoopOutput)
>     env.execute(s"dfsWrite($path)")
>   }
> {code}
> The above code will throw the following exception:
> {code}
> {...}
> 1 [DataSink (org.apache.flink.api.scala.hadoop.mapreduce.HadoopOutputFormat@4d518b32) (1/1)] ERROR org.apache.flink.runtime.operators.DataSinkTask  - Error in user code: wrong key class: org.apache.hadoop.io.IntWritable is not class org.apache.hadoop.io.LongWritable: DataSink (org.apache.flink.api.scala.hadoop.mapreduce.HadoopOutputFormat@4d518b32) (1/1)
> java.io.IOException: wrong key class: org.apache.hadoop.io.IntWritable is not class org.apache.hadoop.io.LongWritable
> 	at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1305)
> 	at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:83)
> 	at org.apache.flink.api.scala.hadoop.mapreduce.HadoopOutputFormat.writeRecord(HadoopOutputFormat.scala:30)
> 	at org.apache.flink.api.scala.hadoop.mapreduce.HadoopOutputFormat.writeRecord(HadoopOutputFormat.scala:26)
> 	at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:200)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> When trying to write a {{DataSet}} to a Hadoop Sequence File, the API seems to always expect a key of type {{LongWritable}} and a value of type {{Text}}. Similar exceptions are thrown when other {{Writables}} are used as keys.
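> That pairing matches Hadoop's own job-level defaults: when no output key/value class has been set, the job falls back to {{LongWritable}} and {{Text}}. A quick sketch to confirm (based on memory of Hadoop's defaults; worth verifying against the Hadoop version in use):
> {code}
> import org.apache.hadoop.mapreduce.Job
>
> // With no setOutputKeyClass/setOutputValueClass calls, a fresh job
> // reports Hadoop's built-in defaults -- exactly the pairing the
> // exception above complains about.
> val job = new Job()
> println(job.getOutputKeyClass)   // class org.apache.hadoop.io.LongWritable
> println(job.getOutputValueClass) // class org.apache.hadoop.io.Text
> {code}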



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
