flink-issues mailing list archives

From "Xu Yang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-12671) Summarizer: summary statistics for Table
Date Wed, 29 May 2019 09:32:00 GMT
Xu Yang created FLINK-12671:
-------------------------------

             Summary: Summarizer: summary statistics for Table
                 Key: FLINK-12671
                 URL: https://issues.apache.org/jira/browse/FLINK-12671
             Project: Flink
          Issue Type: Sub-task
            Reporter: Xu Yang
            Assignee: Xu Yang


Summarizer provides summary statistics for a Table. Users can easily get the total
count and the basic column-wise metrics: max, min, mean, variance, standardDeviation, normL1,
normL2, the number of missing values, and the number of valid values.

SparkML provides the same functionality: [http://spark.apache.org/docs/latest/ml-statistics.html#summarizer]

Example:

Table input = …
TableSummary summary = new Summarizer(input).collectResult();
System.out.println(summary.mean("age"));  // print the mean of the column named "age"
System.out.println(summary);
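For readers unfamiliar with the metrics listed in the description, the following is a minimal sketch of how they could be computed for a single numeric column in one pass. It does not use Flink and is not the proposed Summarizer or TableSummary implementation. The class name ColumnSummarySketch and its fields are hypothetical, null stands in for a missing value, and the variance is assumed to be the unbiased sample variance (dividing by n - 1), which the issue does not specify.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical per-column summary; not the Summarizer/TableSummary API from the issue. */
final class ColumnSummarySketch {
    final long count;       // total number of rows, missing values included
    final long numMissing;  // rows whose value is null
    final long numValid;    // rows whose value is non-null
    final double max, min, mean, variance, standardDeviation, normL1, normL2;

    private ColumnSummarySketch(long count, long numValid, double sum, double sumSq,
                                double absSum, double max, double min) {
        this.count = count;
        this.numValid = numValid;
        this.numMissing = count - numValid;
        boolean empty = numValid == 0;
        this.max = empty ? Double.NaN : max;
        this.min = empty ? Double.NaN : min;
        this.mean = empty ? Double.NaN : sum / numValid;
        // Assumption: unbiased sample variance (n - 1); NaN for fewer than two valid values.
        this.variance = numValid < 2
                ? Double.NaN
                : Math.max(0.0, (sumSq - sum * sum / numValid) / (numValid - 1));
        this.standardDeviation = Math.sqrt(this.variance);
        this.normL1 = absSum;            // sum of absolute values
        this.normL2 = Math.sqrt(sumSq);  // square root of the sum of squares
    }

    /** Accumulates running sums over the non-null values of one column. */
    static ColumnSummarySketch of(List<Double> column) {
        long valid = 0;
        double sum = 0.0, sumSq = 0.0, absSum = 0.0;
        double max = Double.NEGATIVE_INFINITY, min = Double.POSITIVE_INFINITY;
        for (Double v : column) {
            if (v == null) {
                continue; // counted as missing via count - numValid
            }
            valid++;
            sum += v;
            sumSq += v * v;
            absSum += Math.abs(v);
            max = Math.max(max, v);
            min = Math.min(min, v);
        }
        return new ColumnSummarySketch(column.size(), valid, sum, sumSq, absSum, max, min);
    }

    public static void main(String[] args) {
        // Illustrative data only: an "age" column with one missing entry.
        ColumnSummarySketch age = ColumnSummarySketch.of(Arrays.asList(1.0, 2.0, null, 3.0, 4.0));
        System.out.println("mean=" + age.mean + " valid=" + age.numValid
                + " missing=" + age.numMissing);
    }
}
```

Keeping only running sums means partial results from parallel partitions could be merged by adding the sums and counts and combining max and min, which is one reason this shape suits a distributed setting. The sum-of-squares formula can lose precision on large, tightly clustered values, so a production version would likely prefer a more numerically stable update.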

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
