spark-user mailing list archives

From kundan kumar <iitr.kun...@gmail.com>
Subject summary for all columns (numeric, strings) in a dataset
Date Sat, 24 Jan 2015 10:32:32 GMT
Hi,

Is there something in Spark like the summary function in R?

The summary calculation that ships with Spark
(MultivariateStatisticalSummary) operates only on numeric types.

I am also interested in getting results for string types, such as the
four most frequently occurring strings (a group-by kind of operation), the
number of unique values, etc.
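
To make the request concrete, here is a minimal sketch in plain Python of the per-column result I have in mind. It runs on a local list rather than on Spark; the function name summarize_strings, the top_n parameter, and the sample data are made up for illustration. On an RDD I would expect the counting step to be distributed instead, for example with a map to (value, 1) pairs followed by reduceByKey.

```python
from collections import Counter


def summarize_strings(values, top_n=4):
    """Summarize one string column: most frequent values and distinct count.

    Hypothetical helper for illustration only; not part of Spark.
    """
    counts = Counter(values)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return {"top": top, "num_unique": len(counts)}


# Made-up sample column.
column = ["a", "b", "a", "c", "b", "a", "d", "e"]
summary = summarize_strings(column)
print(summary)
```

The open question is how to get the same output efficiently for every string column of a large dataset.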

Is there any preexisting code for this?

If not, please suggest the best way to deal with string types.

Thanks,
Kundan
