spark-user mailing list archives

From Georg Heiler <georg.kf.hei...@gmail.com>
Subject Re: Collecting Multiple Aggregation query result on one Column as collectAsMap
Date Mon, 28 Aug 2017 17:10:41 GMT
What about the RDD StatCounter?
https://spark.apache.org/docs/0.6.2/api/core/spark/util/StatCounter.html
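
A minimal sketch of the single-pass idea, in plain Java rather than Spark: it uses the JDK's java.util.DoubleSummaryStatistics as a stand-in for StatCounter, and the class name OnePassStats, the method summarize and the sample data are all hypothetical. It collects several aggregates of one column in one pass and keys the results by operation name, which is the shape of map asked for below.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OnePassStats {
    // Compute min, max and mean of one column in a single pass and
    // return them keyed by the name of the aggregate operation.
    static Map<String, Double> summarize(List<Double> column) {
        DoubleSummaryStatistics stats = column.stream()
                .mapToDouble(Double::doubleValue)
                .summaryStatistics();
        // LinkedHashMap keeps the insertion order of the operations.
        Map<String, Double> result = new LinkedHashMap<>();
        result.put("min", stats.getMin());
        result.put("max", stats.getMax());
        result.put("mean", stats.getAverage());
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for one numeric column.
        List<Double> column = Arrays.asList(3.0, 1.0, 4.0, 2.0);
        System.out.println(summarize(column)); // prints {min=1.0, max=4.0, mean=2.5}
    }
}
```

Because every result is stored under an explicit name, there is no need to know the order in which the aggregates were evaluated.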
Patrick <titlibatali@gmail.com> wrote on Mon, 28 Aug 2017 at 16:47:

> Hi
>
> I have two lists:
>
>
>    - List one: contains the names of the columns on which I want to perform
>    aggregate operations.
>    - List two: contains the aggregate operations that I want to perform on
>    each column, e.g. (min, max, mean).
>
> I am trying to use the Spark 2.0 Dataset API to achieve this. Spark provides
> an agg() method to which you can pass a Map<String,String> (of column name
> to aggregate operation) as input. However, I want to perform different
> aggregate operations on the same column of the data and collect the result
> in a Map<String,String>, where the key is the aggregate operation and the
> value is the result for that column. If I add different aggregate operations
> for the same column, the key gets overwritten with the latest value.
>
> Also, I don't find any collectAsMap() operation that returns a map with the
> aggregated column name as key and the result as value. I get collectAsList(),
> but I don't know the order in which those agg() operations are run, so how
> do I match which list value corresponds to which agg operation? I am able to
> see the result using .show(), but how can I collect the result in this case?
>
> Is it possible to do different aggregations on the same column in one
> job (i.e. only one collect operation) using the agg() operation?
>
>
> Thanks in advance.
>
>
