spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jerry Lam <chiling...@gmail.com>
Subject using withWatermark on Dataset
Date Wed, 01 Feb 2017 18:38:49 GMT
Hi everyone,

Anyone knows how to use withWatermark  on Dataset?

I have tried the following but hit this exception:

dataset org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
cannot be cast to "MyType"

The code looks like the following:

dataset
.withWatermark("timestamp", "5 seconds")
.groupBy("timestamp", "customer_id")
.agg(MyAggregator)
.writeStream....

Where dataset has MyType for each row.
Where MyType is:
case class MyTpe(customer_id: Long, timestamp: Timestamp, product_id: Long)

MyAggregator which takes MyType as the input type did some maths on the
product_id and outputs a set of product_ids.

Best Regards,

Jerry

Mime
View raw message