Convex Functions on \({\mathbb R}^d\)

The separating hyperplane theorem has some important implications for the structure theory of convex functions. Suppose that \(E \subset {\mathbb R}^d\) is convex. Recall that a function \(\varphi \ : \ E \rightarrow {\mathbb R}\) is called convex when it satisfies the inequality
\[ \varphi(\theta x + (1-\theta) x') \leq \theta \varphi(x) + (1-\theta) \varphi(x') \]
for all \(x,x' \in E\) and all \(\theta \in [0,1]\). A simple induction argument establishes that it must also satisfy
\[ \varphi \left( \sum_{i=1}^m \theta_i x_i \right) \leq \sum_{i=1}^m \theta_i \varphi(x_i) \]
for all \(x_1,\ldots,x_m \in E\) and all \(\theta_i \in [0,1]\) satisfying \(\sum_{i=1}^m = 1\).
Proposition
Suppose \(y,v_1,\ldots,v_m\) are vectors in \({\mathbb R}^d\) and that \(\varphi\) is a convex function on a set \(E \subset {\mathbb R}^d\) which contains every vector of the form \(y \pm v_i\). If \(y' = y + \sum_{i=1}^m \theta_i v_i\) for constants \(\theta_i\) such that \(\sum_{i=1}^m |\theta_i| \leq 1\), and \(K_y\) is the maximum of \(|\varphi(y\pm v_i) - \varphi(y)|\) over all \(i=1,\ldots,m\) and all choices of \(\pm\) sign, then
\[{}\left| \varphi(y') - \varphi(y) \right| \leq K_y \sum_{i=1}^m |\theta_i|.{}\]
Proof
By symmetry in signs, it may be assumed that the \(v_i\) are defined in such a way that \(\theta_i \geq 0\) for each \(i\). Using the identity
\[{}y + \sum_{i=1}^m \theta_i v_i {}\]
\[{}= \left(1-\sum_{i=1}^m \theta_i \right) y{}\]
\[{}+ \sum_{i=1}^m \theta_i (y + v_i){}\]
and convexity of \(\varphi\) gives that
\[{}\varphi \left(y + \sum_{i=1}^m \theta_i v_i \right){}\]
\[{}\leq \left( 1 - \sum_{i=1}^m \theta_i \right) \varphi(y){}\]
\[{}+ \sum_{i=1}^m \theta_i \varphi(y+v_i){}\]
which implies that
\[{}\varphi\left(y' \right) - \varphi(y) {}\]
\[{}\leq \sum_{i=1}^m \theta_i (\varphi(y + v_i) - \varphi(y)){}\]
\[{}\leq \sum_{i=1}^m |\theta_i| |\varphi(y+v_i) - \varphi(y)|{}\]
\[{}\leq K_y \sum_{i=1}^m |\theta_i|.{}\]
Next, the identity
\[{}y = \frac{1}{1 + \sum_{i=1}^m \theta_i} y' {}\]
\[{}+ \sum_{i=1}^m \frac{\theta_i}{1+\sum_{i=1}^m \theta_i} (y - v_i){}\]
implies by convexity that
\[{}\varphi(y){}\]
\[{}\leq \frac{1}{1 + \sum_{i=1}^m \theta_i} \varphi(y'){}\]
\[{}+ \sum_{i=1}^m \frac{\theta_i}{1+\sum_{i=1}^m \theta_i} \varphi(y - v_i),{}\]
which means
\[{}- \sum_{i=1}^m \theta_i (\varphi(y-v_i) - \varphi(y)) {}\]
\[{}\leq \varphi(y') - \varphi(y).{}\]
In particular, \(\varphi(y') - \varphi(y) \geq - K_y \sum_{i=1}^m |\theta_i|\), which completes the proposition when combined with the upper bound for \(\varphi(y') - \varphi(y)\) which has already been proved.

1. Regularity Properties

Some important regularity properties of convex functions hold on \({\mathbb R}^d\) exactly as was the case in one dimension.
Corollary (Continuity on Open Convex Domains)
If \(\varphi\) is a convex function on an open convex domain \(U\), then \(\varphi\) is continuous on \(U\).
Proof
Fix \(y \in U\) and let \(r\) be a positive constant such that the \(\ell^1\) ball of radius \(r\) centered at \(y\) is contained in \(U\). In the proposition above, let \(v_1,\ldots,v_m\) equal \(r e_1/2,\ldots,r e_d/2\), where \(e_1,\ldots,e_d\) are the standard basis vectors. Then any \(y'\) which has \(||y'-y||_{\ell^1} \leq r/2\) will have the property that \(y' = y + \sum_{i=1}^d \theta_i (r e_i/2)\) for coefficients \(\theta_i\) satisfying \(\sum_{i=1}^d |\theta_i| = 2 r^{-1} ||y'-y||_{\ell^1} \leq 1\). Thus
\[ |\varphi(y') - \varphi(y)| \leq 2 K_y r^{-1} ||y'-y||_{\ell^1}\]
for all \(y'\) in the closure of the \(\ell^1\) ball of radius \(r/2\) centered at \(y\). This inequality implies that \(\varphi\) must be continuous at \(y\).
Corollary (Lipschitz on Compact Subsets of Convex Open Domains)
If \(\varphi\) is a convex function on an open convex domain \(U\), then \(\varphi\) is Lipschitz on every compact set \(E \subset U\). More specifically, there is a finite constant \(C\) such that
\[ |\varphi(y') - \varphi(y')| \leq C ||y'-y||_{\ell^1}\]
for all \(y',y \in E\).
Proof
Because \(E\) is compact, there is a positive \(r\) such that \(||y' - y||_{\ell^1} \geq r\) for all \(y' \in U^c\) and \(y \in E\). The constant \(K_y\) is clearly a continuous function of \(y\) on an open \(r/2\)-neighborhood of \(E\) because \(\varphi\) is continuous, and so by the Extreme Value Theorem, it attains a maximum \(K\). In particular, just as in the previous corollary, \(|\varphi(y') - \varphi(y)| \leq 2 r^{-1} K ||y' - y||_{\ell^1}\) for all \(y',y \in E\) at distance at most \(r/2\) from each other. If \(y',y \in E\) are any two points at distance more than \(r/2\), then
\[{}|\varphi(y') - \varphi(y)|{}\]
\[{}\leq 2 \max_{x \in E} |\varphi(x)|{}\]
\[{}\cdot 2 r^{-1} ||y' - y||_{\ell^1}{}\]
simply because \(|\varphi(y') - \varphi(y)| \leq 2 \max_{x \in E} |\varphi(x)|\) and \(1 \leq 2 r^{-1} ||y' - y||_{\ell^1}\). Thus \(\varphi\) must be Lipschitz on \(E\) with a constant equal to the maximum of \(2 r^{-1} K\) and \(4 r^{-1} \max_{x \in E} |\varphi(x)|\).

2. Relationship to Linear Functions

A fundamentally-important feature of convex functions is that on any open domain, one can always bound the convex function below by an (affine) linear function which agrees with it at any preselected point:
Theorem (Convex Functions As Sup of Linear Functions)
Suppose that \(U \subset {\mathbb R}^d\) is an open convex set and \(\varphi\) is a convex function on \(U\). Then for every \(x_0 \in U\), there is a constant \(c_{x_0}\) and a vector \(v_{x_0}\) such that
\[ \varphi(x) \geq c_{x_0} + v_{x_0} \cdot x \]
for all \(x \in U\), with equality when \(x = x_0\). Here \(\cdot\) is the usual dot product on \({\mathbb R}^d\). In other words, there exists a set \(L\) of affine linear functions \(\ell\) such that \(\varphi(x) = \sup_{\ell \in L} \ell(x)\) for all \(x \in U\) and the supremum is attained at every \(x \in U\).
Proof
This is an important application of the Separating Hyperplane Theorem. Consider the set \(G_+ \subset {\mathbb R}^{d+1}\) given by
\[ G_{+} := \left\{(x,y) \in U \times {\mathbb R} \ : \ y \geq \varphi(x) \right\}. \]
This set is convex because any \(\theta \in [0,1]\) and any points \((x_1,y_1)\), \((x_2,y_2)\) in this region, \(\theta (x_1,y_1) + (1-\theta) (x_2,y_2)\) is the point \((\theta x_1 + (1-\theta)x_2,\theta y_1 + (1-\theta) y_2)\) and so
\[{}\varphi(\theta x_1 + (1-\theta) x_2) {}\]
\[{}\leq \theta \varphi(x_1) + (1-\theta) \varphi(x_2){}\]
\[{}\leq \theta y_1 + (1-\theta)y_2.{}\]
(Convexity of the domain \(U\) ensures that \(\theta x_1 + (1-\theta)x_2\) is itself a point in \(E\).)

Now consider the closure \(\overline{G_+}\) of \(G_+\). The closure must also be convex, because any two points \((x_0,y_0)\) and \((x_1,y_1)\) must be limits of convergent sequences in \(G_+\), and convexity of \(G_+\) establishes that \(\theta (x_0,y_0) + (1-\theta) (x_1,y_1)\) can (in the natural way) also be written as a limit of a convergent sequence in \(G_+\).

Now for any point \(x_0 \in U\), the point \((x_0,\varphi(x_0))\) belongs to the boundary of the convex set \(\overline{G_+}\) because \((x_0,\varphi(x_0)+\delta) \in G_+\) when \(\delta > 0\) and belongs to \(G_+^c\) when \(\delta < 0\). Therefore, there must be some vector nonzero \(\ell_+ := (\ell_1,\ldots,\ell_{d+1})\) and some constant \(c\) such that \(\ell_+ \cdot (x,y) \geq c\) for all \((x,y) \in \overline{G_+}\) with equality at \((x_0, \varphi(x_0))\). Defining \(\ell \in {\mathbb R}^d\) to equal \((\ell_1,\ldots,\ell_d)\) (i.e., the first \(d\) coordinates of \(\ell_+\)), it follows that \(\ell \cdot x + \ell_{d+1} (\delta + \varphi(x)) \geq c\) for all \(x \in U\) and all \(\delta > 0\), with equality when \(\delta = 0\) and \(x = x_0\). The constant \(\ell_{d+1}\) must be nonnegative (since sending \(\delta\) to infinity would otherwise cause a contradiction); if it happens to be strictly positive, then
\[ \varphi(x) \geq \ell_{d+1}^{-1} c - \ell_{d+1}^{-1} \ell \cdot x \]
for all \(x \in U\) with equality when \(x = x_0\). Fixing \(c_{x_0} := \ell_{d+1}^{-1} c\) and \(v_{x_0} := - \ell_{d+1}^{-1} \ell\) establishes the theorem.

If, in the above calculation, \(\ell_{d+1} = 0\), then it must follow that \(\ell \cdot x \geq c\) for all \(x \in U\) with equality when \(x = x_0\). Since \(\ell_+\) is nonzero, \(\ell\) in this case must also be nonzero. But then \(\ell \cdot x\) cannot possibly be greater than \(c\) on a neighborhood of \(x_0\) because it's equal to \(c\) at \(x=x_0\) and there must be some small vector \(v\) such that \(x + v \in U\) and \(\ell \cdot v < 0\). Thus the theorem is proved.