 Diff: docs/libs/ml/optimization.md 
@@ -0,0 +1,218 @@
+---
+mathjax: include
+title: "ML - Optimization"
+displayTitle: <a href="index.md">ML</a> - Optimization
+---
+
+* Table of contents
+{:toc}
+
+$$
+\newcommand{\R}{\mathbb{R}}
+\newcommand{\E}{\mathbb{E}}
+\newcommand{\x}{\mathbf{x}}
+\newcommand{\y}{\mathbf{y}}
+\newcommand{\wv}{\mathbf{w}}
+\newcommand{\av}{\mathbf{\alpha}}
+\newcommand{\bv}{\mathbf{b}}
+\newcommand{\N}{\mathbb{N}}
+\newcommand{\id}{\mathbf{I}}
+\newcommand{\ind}{\mathbf{1}}
+\newcommand{\0}{\mathbf{0}}
+\newcommand{\unit}{\mathbf{e}}
+\newcommand{\one}{\mathbf{1}}
+\newcommand{\zero}{\mathbf{0}}
+$$
+
+## Mathematical Formulation
+
+The optimization framework in Flink is a developer-oriented package that can be used to solve
+[optimization](https://en.wikipedia.org/wiki/Mathematical_optimization)
+problems common in Machine Learning (ML) tasks. In the supervised learning context, this usually
+involves finding a model, as defined by a set of parameters $\wv$, that minimizes a function $f(\wv)$
+given a set of $(\x, y)$ examples,
+where $\x$ is a feature vector and $y$ is a real number, which can represent either a real value in
+the regression case, or a class label in the classification case. In supervised learning, the
+function to be minimized is usually of the form:
+
+$$
+\begin{equation}
+ f(\wv) :=
+ \frac1n \sum_{i=1}^n L(\wv;\x_i,y_i) +
+ \lambda\, R(\wv)
+ \label{eq:objectiveFunc}
+ \ .
+\end{equation}
+$$
+
+where $L$ is the loss function and $R(\wv)$ the regularization penalty. We use $L$ to measure how
+well the model fits the observed data, and we use $R$ in order to impose a complexity cost on the
+model, with $\lambda > 0$ being the regularization parameter.
+
+### Loss Functions
+
+In supervised learning, we use loss functions in order to measure the model fit, by
+penalizing errors in the predictions $p$ made by the model compared to the true $y$ for each
+example. Different loss functions can be used for regression (e.g. Squared Loss) and classification
+(e.g. Hinge Loss).
+
+Some common loss functions are:
+
+* Squared Loss: $ \frac{1}{2} (\wv^T \x - y)^2, \quad y \in \R $
+* Hinge Loss: $ \max (0, 1 - y \wv^T \x), \quad y \in \{-1, +1\} $
 End diff 
maybe we can add a small spacing between `y` and `\wv^T\x`
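
For anyone checking the formulas in this hunk, here is a minimal Python sketch of the two listed losses and of the regularized objective `f(w) = (1/n) * sum_i L(w; x_i, y_i) + lambda * R(w)`. It is not Flink code. All function names are hypothetical, and the choice `R(w) = 0.5 * ||w||^2` is an assumption made only for illustration, since the text in the diff does not fix a particular penalty.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, float]  # (feature vector x, target or label y)


def dot(w: Vector, x: Vector) -> float:
    """Inner product w^T x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def squared_loss(w: Vector, x: Vector, y: float) -> float:
    """Squared loss 0.5 * (w^T x - y)^2 for a real-valued target y."""
    residual = dot(w, x) - y
    return 0.5 * residual * residual


def hinge_loss(w: Vector, x: Vector, y: float) -> float:
    """Hinge loss max(0, 1 - y * w^T x) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * dot(w, x))


def l2_penalty(w: Vector) -> float:
    # Assumption: R(w) = 0.5 * ||w||_2^2. The documented text leaves R generic.
    return 0.5 * dot(w, w)


def objective(
    w: Vector,
    examples: List[Example],
    loss: Callable[[Vector, Vector, float], float],
    penalty: Callable[[Vector], float],
    lam: float,
) -> float:
    """Average loss over the examples plus lam times the penalty."""
    n = len(examples)
    data_term = sum(loss(w, x, y) for x, y in examples) / n
    return data_term + lam * penalty(w)


if __name__ == "__main__":
    w = [1.0, 2.0]
    regression_data = [([3.0, 1.0], 4.0), ([1.0, 1.0], 3.0)]
    print(objective(w, regression_data, squared_loss, l2_penalty, lam=0.5))
```

Passing the loss and penalty as plain functions mirrors how the objective in the diff treats `L` and `R` as interchangeable parts; swapping `squared_loss` for `hinge_loss` (with labels in {-1, +1}) needs no other change.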
