spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gerard Maas <gerard.m...@gmail.com>
Subject Re: Get the value of DStream[(String, Iterable[String])]
Date Wed, 17 Dec 2014 16:28:26 GMT
You can create a DStream that contains the count, transforming the grouped
windowed RDD, like this:
val errorCount = grouping.map{case (k,v) => v.size }

If you need to preserve the key:
val errorCount = grouping.map{case (k,v) => (k,v.size) }

or you if you don't care about the content of the values, you could count
directly, instead of grouping first:

val errorCount = mapErrorLines.countByWindow(Seconds(8), Seconds(4))

Not sure why you're using map(line => ("key", line)) as there only seem to
be one key. If that's not required, we can simplify one more step:

val errorCount = errorLines.countByWindow(Seconds(8), Seconds(4))


The question is: what do you want to do with that count afterwards?

-kr, Gerard.


On Wed, Dec 17, 2014 at 5:11 PM, Guillermo Ortiz <konstt2000@gmail.com>
wrote:
>
> I'm a newbie with Spark,,, a simple question
>
> val errorLines = lines.filter(_.contains("h"))
> val mapErrorLines = errorLines.map(line => ("key", line))
> val grouping = errorLinesValue.groupByKeyAndWindow(Seconds(8), Seconds(4))
>
> I get something like:
>
> 604: -------------------------------------------
> 605: Time: 1418832180000 ms
> 606: -------------------------------------------
> 607: (key,ArrayBuffer(h2, h3, h4))
>
> Now, I would like to get that ArrayBuffer and count the number of
> elements,,
> How could I get that arrayBuffer??? something like:
> val values = grouping.getValue()... How could I do this in Spark with
> Scala?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>

Mime
View raw message