The Inverse Function Theorem

1. Statement and Proof Strategy

Theorem (Inverse Function Theorem)
Suppose that \(U\) and \(V\) are open sets in \({\mathbb R}^{n}\) and that \(f : U \rightarrow V\) is \(C^1\) (i.e., has continuous first derivatives). If \(Df\) is invertible (as an \(n \times n\) matrix) at a point \(p\), then there exist open sets \(p \in U' \subset U\) and \(f(p) \in V' \subset V\) and a \(C^1\) function \(f^{-1} : V' \rightarrow U'\) such that \(f \circ f^{-1}\) is the identity map on \(V'\). Moreover, \(Df\) is invertible at all points of \(U'\) and for any \(y \in V'\), \(D f^{-1} |_y = (D f|_{f^{-1}(y)})^{-1}\).
Meta (Strategy)
  1. First we show that \(f\) can be inverted in the set theoretic sense in some neighborhood of the point \(p\).
  2. Then we show that the inverse map \(f^{-1}\) must be continuous.
  3. Then we show that the inverse map \(f^{-1}\) must be differentiable, with derivative depending continuously on the base point (i.e., \(f^{-1}\) is \(C^1\)).

1.1. Step 1: Contraction Mapping Principle

(Throughout the arguments below, we will take \(||\cdot||\) to be the \(\ell^2\) norm.)

To show that the equation \(f(x) = a\) has a solution \(x\) for a given right-hand side \(a\), we will reformulate the equation as a fixed point equation and use the Contraction Mapping Principle.
  • By differentiability of \(f\) at \(p\), we have
    \[ f(x) = f(p) + Df|_p (x-p) + R_p(x), \tag{1} \]
    where we know that \(R_p(x) = o(||x-p||)\) as \(x \rightarrow p\).
  • Apply the matrix \((Df|_p)^{-1}\) to both sides and rearrange:
    \[ x = p + (D f|_p)^{-1} (f(x) - f(p) - R_p(x)). \]
  • If \(f(x) = a\), then \(x\) must be a fixed point of the form
    \[ x = p + (Df|_p)^{-1} (a - f(p) - R_p(x)). \]
  • Fix an \(a\) (presumably near \(f(p)\) in a manner to be determined), and let \(F_a : U \rightarrow {\mathbb R}^n\) be given by
    \[ F_a(x) = p + (Df|_p)^{-1} (a - f(p) - R_p(x)). \]
    We hope to find a fixed point \(x_* = F_a(x_*)\).
  • Use (1) to write \(R_p(x)\) as \(f(x) - f(p) - Df|_p(x-p)\). We have
    \[ F_a(x) = p + (Df|_p)^{-1} a + (Df|_p)^{-1} \left( Df|_p(x-p) - f(x) \right). \]
  • Let \(C \subset U\) be an open convex set containing \(p\). For any \(x,y \in C\), let \(v\) be a unit vector in the direction of \(F_a(x) - F_a(y)\); applying the one-variable Mean Value Theorem to \(t \mapsto v \cdot F_a(y + t(x-y))\) dictates that
    \[ ||F_a(x) - F_a(y)|| \leq || DF_a|_{\xi} (x-y)|| \tag{2} \]
    at some point \(\xi\) on the line segment connecting \(x\) and \(y\).
  • We can express \(DF_a\) in terms of \(Df\) using the Chain Rule plus the fact that the total derivative of a linear transformation \(T\) is just \(T\) itself:
    \[ DF_a |_{\xi} = (Df|_p)^{-1} ( Df|_p - Df|_\xi). \]
  • Go back to (2) using the computation of \(D F_a\):
    \[ || F_a(x) - F_a(y)|| \leq \left| \left| (Df|_p)^{-1} ( Df|_p - Df|_\xi) (x-y) \right| \right|. \]
  • Matrix-vector multiplication is a bounded operation; the quantitative version of that fact gives
    \[ || F_a(x) - F_a(y) || \leq \left| \left| \left| (Df|_p)^{-1} ( Df|_p - Df|_\xi) \right| \right| \right| \, ||x-y||, \]
    where \(||| \cdot |||\) is the Hilbert-Schmidt norm (the square root of the sum of squares of the entries of the matrix).
  • Continuity of first partials of \(f\) guarantees that \(||| (Df|_p)^{-1} (Df|_p - Df|_z)||| \rightarrow 0\) as \(z \rightarrow p\). Thus, there is a Euclidean ball \(B_\delta(p)\) such that this norm is less than or equal to \(1/2\) for all \(z\) in \(\overline{B_\delta(p)} \subset U\).
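As a numerical sanity check (not part of the proof), the sketch below computes \(|||(Df|_p)^{-1}(Df|_p - Df|_z)|||\) for a made-up \(C^1\) map and watches it shrink as \(z \rightarrow p\); the particular map and the sample radii are illustrative assumptions only.

```python
import numpy as np

# Sanity check (not part of the proof): for a made-up C^1 map f : R^2 -> R^2,
# watch ||| (Df|_p)^{-1} (Df|_p - Df|_z) ||| shrink as z -> p.
def Df(x):
    # Exact Jacobian of the illustrative map
    # f(x) = (x_0 + 0.25 sin x_1, x_1 + 0.25 x_0^2).
    return np.array([[1.0, 0.25 * np.cos(x[1])],
                     [0.5 * x[0], 1.0]])

p = np.zeros(2)
A_inv = np.linalg.inv(Df(p))               # (Df|_p)^{-1}

def hs_norm_at(z):
    # Hilbert-Schmidt (Frobenius) norm of (Df|_p)^{-1} (Df|_p - Df|_z).
    return np.linalg.norm(A_inv @ (Df(p) - Df(z)), 'fro')

# The norm decreases toward 0 as z -> p, so some closed ball around p
# keeps it at or below 1/2.
norms = [hs_norm_at(np.array([r, r])) for r in (1.0, 0.1, 0.01)]
print(norms)
```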
Meta (Summary of What We Have Proved So Far)
Given \(p\), there is a \(\delta > 0\) such that for each \(x, y \in \overline{B_\delta(p)}\),
\[ ||F_a(x) - F_a(y)|| \leq \frac{1}{2} ||x-y||. \]
This would imply \(F_a\) is a contraction on the complete metric space \(\overline{B_\delta(p)}\) if we knew that \(F_a\) maps this closed ball into itself.
  • Note that above, \(\delta\) does not depend on \(a\).
  • Also note that invertibility of \(Df|_x\) is equivalent to the condition \(\det Df|_x \neq 0\). Thus by continuity of the first partials of \(f\) and continuity of the determinant (it's a polynomial in entries of a matrix), we may also choose \(\delta\) sufficiently small that \(D f|_x\) is invertible for all \(x \in \overline{B_\delta(p)}\).
  • The only time \(a\) matters is when asking whether \(F_a\) maps \(\overline{B_\delta(p)}\) to itself; it's helpful to insist on slightly more: that \(F_a\) map \(\overline{B_\delta(p)}\) into the smaller set \(B_\delta(p)\).
  • We need \(||F_a(x) - p|| < \delta\) for all \(x \in \overline{B_\delta(p)}\). Since
    \[ ||F_a(x) - p|| \leq ||F_a(p) - p|| + ||F_a(x) - F_a(p)|| \]
    by the triangle inequality, and since \(||F_a(x) - F_a(p)|| \leq ||x - p||/2 \leq \delta /2\) by the contraction estimate above, it suffices for us to find an open set of \(a\)'s containing \(f(p)\) such that \(||F_a(p) - p|| < \delta/2\).
  • By definition of \(F_a\),
    \[ F_a(p) - p = (Df|_p)^{-1} ( a - f(p)). \]
    Fixing an \(\eta > 0\) small enough that \(||| (Df|_p)^{-1}||| \, \eta < \delta / 2\) gives that for all \(a \in B_\eta(f(p))\),
    \[ || F_a(p) - p || \leq ||| (Df|_p)^{-1} ||| \, ||a - f(p)|| < ||| (Df|_p)^{-1} ||| \, \eta < \delta /2. \]
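The fixed-point scheme above is effectively an algorithm, and it can be run. The following sketch iterates \(F_a\) for a made-up \(C^1\) map (the map, the base point, and the target \(a\) are all illustrative assumptions) and checks that the limit solves \(f(x) = a\):

```python
import numpy as np

# Made-up C^1 map with Df|_0 invertible (illustration only).
def f(x):
    return np.array([x[0] + 0.25 * np.sin(x[1]),
                     x[1] + 0.25 * x[0] ** 2])

p = np.zeros(2)
Dfp = np.array([[1.0, 0.25],
                [0.0, 1.0]])          # Jacobian of f at p
Dfp_inv = np.linalg.inv(Dfp)

def F(a, x):
    # F_a(x) = p + (Df|_p)^{-1} (a + Df|_p (x - p) - f(x))
    return p + Dfp_inv @ (a + Dfp @ (x - p) - f(x))

a = np.array([0.05, -0.02])           # right-hand side near f(p) = 0
x = p.copy()
for _ in range(50):                   # contraction iteration
    x = F(a, x)

residual = np.linalg.norm(f(x) - a)
print(x, residual)
```

Each pass of the loop applies the contraction once, so the error shrinks at least geometrically and the residual quickly reaches machine precision.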

1.2. Step 2: Lipschitz Continuity

By the Contraction Mapping Principle, \(F_a\) has a unique fixed point in \(\overline{B_\delta(p)}\) for any \(a \in B_\eta(f(p))\), i.e., \(\exists! \ x_*\) such that
\[ x_* = p + (Df|_p)^{-1} (a + Df|_p(x_* - p) - f(x_*)). \tag{3} \]
By simplifying, (3) holds for \(x_*\) if and only if \(f(x_*) = a\). Because \(F_a(\overline{B_\delta(p)}) \subset B_\delta(p)\), we know that \(x_* \in B_\delta(p)\). Compare nearby values \(a,a' \in B_\eta(f(p))\):
\[ F_{a}(x) - F_{a'}(x) = (Df|_p)^{-1} (a - a'). \]
By continuity of linear transformations, we have that \(||F_{a}(x) - F_{a'}(x)|| < \epsilon\) provided that \(||a-a'||\) is sufficiently small. Thus by the stability corollary of the Contraction Mapping Principle, it must be the case that for \(a,a' \in B_\eta(f(p))\), the unique solutions \(x_*,x_*' \in B_\delta(p)\) to the equations \(f(x_*) = a\) and \(f(x_*') = a'\) must satisfy \(x_*' \rightarrow x_*\) as \(a' \rightarrow a\). In fact, our stability corollary of the Contraction Mapping Principle says that
\[ ||x_*' - x_*|| \leq C \sup_{x} || F_a(x) - F_{a'}(x)|| \leq C' ||a-a'|| \]
for some constants \(C\) and \(C'\) not depending on \(a,a' \in B_\eta(f(p))\).
Meta (Summary of What We Have Proved So Far)
Given a \(C^1\) map \(f : U \rightarrow V\) and a point \(p \in U\) at which \(Df\) is invertible, there exist \(\delta,\eta > 0\) and open sets \(V' := B_\eta(f(p)) \subset V\) and \(U' := B_\delta(p) \subset U\) such that for all \(a \in V'\), the equation \(f(x) = a\) has a unique solution \(x \in U'\), and the corresponding map \(f^{-1} : V' \rightarrow U'\) must be Lipschitz continuous and \(Df|_x\) must be invertible for all \(x \in U'\).
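The Lipschitz estimate can also be observed numerically. The sketch below (using a made-up \(C^1\) map and illustrative choices of \(a, a'\), not anything from the proof) solves \(f(x_*) = a\) and \(f(x_*') = a'\) by contraction iteration and reports the ratio \(||x_*' - x_*|| / ||a' - a||\):

```python
import numpy as np

# Numerical check of the Lipschitz estimate ||x_*' - x_*|| <= C' ||a' - a||
# for a made-up C^1 map (illustration only).
def f(x):
    return np.array([x[0] + 0.25 * np.sin(x[1]),
                     x[1] + 0.25 * x[0] ** 2])

p = np.zeros(2)
Dfp = np.array([[1.0, 0.25], [0.0, 1.0]])   # Df|_p
Dfp_inv = np.linalg.inv(Dfp)

def solve(a, iters=60):
    # Contraction iteration x <- F_a(x) from Step 1.
    x = p.copy()
    for _ in range(iters):
        x = p + Dfp_inv @ (a + Dfp @ (x - p) - f(x))
    return x

a, a2 = np.array([0.05, -0.02]), np.array([0.06, -0.01])
x_star, x_star2 = solve(a), solve(a2)
ratio = np.linalg.norm(x_star2 - x_star) / np.linalg.norm(a2 - a)
print(ratio)   # a concrete Lipschitz ratio for f^{-1}
```

For this near-identity map the ratio is close to 1, consistent with a modest constant \(C'\).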

1.3. Step 3: Regularity of the Inverse

To make life easier, let's replace \(U'\) by the set \(U' \cap f^{-1} (V')\), where by \(f^{-1}\) in this instance we mean inverse image. By continuity of \(f\), we know that \(U'\) is still open, and by what we've proved so far, we see that \(f\) must actually be a bijection from \(U'\) to \(V'\). Having a literal bijection will make technical issues a little easier.
  • Use what we know: for all \(x,y \in V'\),
    \[ x = f( f^{-1}(x)) \text{ and } y = f(f^{-1}(y)). \]
  • Thus
    \[ x - y = f(f^{-1}(x)) - f(f^{-1}(y)). \]
  • Moreover, differentiability of \(f\) at \(f^{-1}(y)\) gives that
    \[ f(f^{-1}(x)) = f(f^{-1}(y)) + (Df|_{f^{-1}(y)})( f^{-1}(x) - f^{-1}(y)) + o (|| f^{-1}(x) - f^{-1}(y)||) \]
    as \(x \rightarrow y\) (because we know by continuity of \(f^{-1}\) that \(f^{-1}(x) \rightarrow f^{-1}(y)\) as well). Simplifying a little gives
    \[ x = y + (Df|_{f^{-1}(y)})( f^{-1}(x) - f^{-1}(y)) + o (|| f^{-1}(x) - f^{-1}(y)||). \]
  • Move \(y\) to the left-hand side. We get
    \[ (Df|_{f^{-1}(y)}) ( f^{-1}(x) - f^{-1}(y)) = x-y + o (||f^{-1}(x) - f^{-1}(y)||) \]
    as \(x \rightarrow y\).
  • We have arranged it so that \(Df\) is invertible at every point of \(U'\); in particular, \(Df|_{f^{-1}(y)}\) is invertible for any \(y \in V'\), so apply the inverse:
    \[ f^{-1}(x) - f^{-1}(y) = (Df|_{f^{-1}(y)})^{-1} (x-y) + o (||f^{-1}(x) - f^{-1}(y)||) \]
    as \(x \rightarrow y.\) The little-\(o\) term is unchanged because it remains little-\(o\) of \(||f^{-1}(x) - f^{-1}(y)||\) after any fixed linear transformation is applied to it. From here, to show that \(f^{-1}\) is differentiable at \(y \in V'\) is just to study this formula.
  • We know that \(f^{-1}\) is Lipschitz continuous, i.e., that there exists \(C\) such that \(||f^{-1}(x) - f^{-1}(y)|| \leq C ||x-y||\) for any \(x,y \in V'\). Thus any function which is \(o(||f^{-1}(x) - f^{-1}(y)||)\) is also \(o(||x-y||)\).
  • Therefore
    \[ f^{-1}(x) - f^{-1}(y) = (Df|_{f^{-1}(y)})^{-1} (x-y) + o (||x - y||) \]
    as \(x \rightarrow y\). This implies differentiability, and by uniqueness of total derivatives, we also have \(Df^{-1}|_y = (Df|_{f^{-1}(y)})^{-1}\).
  • The entries of \((D f)^{-1}\) depend continuously on the entries of \(Df\) by virtue of Cramer's Rule for computing the inverse of a matrix.
  • \(f^{-1}\) is already established to be continuous.
  • \((Df|_{f^{-1}(y)})^{-1}\) is therefore a composition of continuous functions and therefore continuous.
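The conclusion \(Df^{-1}|_y = (Df|_{f^{-1}(y)})^{-1}\) can be tested numerically. The sketch below uses a made-up \(C^1\) map (the map, the base point \(y\), and the step size are illustrative assumptions) and compares a finite-difference Jacobian of \(f^{-1}\) against the theorem's formula:

```python
import numpy as np

# Check D(f^{-1})|_y = (Df|_{f^{-1}(y)})^{-1} numerically (illustration only).
def f(x):
    return np.array([x[0] + 0.25 * np.sin(x[1]),
                     x[1] + 0.25 * x[0] ** 2])

def Df(x):
    # Exact Jacobian of f at x.
    return np.array([[1.0, 0.25 * np.cos(x[1])],
                     [0.5 * x[0], 1.0]])

p = np.zeros(2)
Dfp = Df(p)
Dfp_inv = np.linalg.inv(Dfp)

def f_inv(y, iters=80):
    # Contraction iteration from Step 1 solving f(x) = y.
    x = p.copy()
    for _ in range(iters):
        x = p + Dfp_inv @ (y + Dfp @ (x - p) - f(x))
    return x

y = np.array([0.05, -0.02])
h = 1e-6
# Finite-difference Jacobian of f^{-1} at y, column by column.
J_num = np.column_stack([(f_inv(y + h * e) - f_inv(y - h * e)) / (2 * h)
                         for e in np.eye(2)])
J_thm = np.linalg.inv(Df(f_inv(y)))   # the theorem's formula
err = np.abs(J_num - J_thm).max()
print(err)
```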

2. Comments on the Theorem

  • Notice that, unlike the one-dimensional case (where a continuous, nonvanishing derivative on an interval forces injectivity), it is very much possible for \(Df\) to be everywhere invertible without \(f\) being injective (see the figure below). This is why the statement of the Inverse Function Theorem is more subtle in higher dimensions, relying on the existence of some smaller domain \(U'\) on which \(f\) is in fact injective. The sets \(U'\) and \(V'\) are far from unique; for example, any sufficiently small open ball around \(p\) can serve as \(U'\).
    Figure. Illustration of a non-injective function with everywhere invertible Jacobian
  • If the function \(f\) is known to be \(C^k\) for some \(k > 1\), then the inverse map \(f^{-1}\) will also be \(C^k\). This follows by induction on \(k\) and the Cramer's Rule argument indicated above. The base case \(k=1\) follows directly from the Inverse Function Theorem itself. For higher \(k\), we write the entries of \(D f^{-1}\) as compositions of the entries of \((D f)^{-1}\) with \(f^{-1}\). The entries of \((D f)^{-1}\) are rational functions of the entries of \(D f\), which are by assumption \(C^{k-1}\) functions. By induction, we also know that \(f^{-1}\) is \(C^{k-1}\), so the entries of \(D f^{-1}\) are compositions of \(C^{k-1}\) functions, making them \(C^{k-1}\). (The proof that compositions of \(C^{k-1}\) functions are \(C^{k-1}\) functions is its own induction argument.) Since the entries of \(D f^{-1}\) are first partials of \(f^{-1}\), \(f^{-1}\) must be \(C^k\).
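A concrete instance of the non-injectivity phenomenon in the first comment is the complex exponential viewed as a map of \({\mathbb R}^2\): \(f(x,y) = (e^x \cos y, e^x \sin y)\) has Jacobian determinant \(e^{2x} > 0\) everywhere, yet it is \(2\pi\)-periodic in \(y\). A short numerical sketch:

```python
import numpy as np

# Classic example: f(x, y) = (e^x cos y, e^x sin y) (the complex exponential).
# Its Jacobian determinant e^{2x} never vanishes, yet f is not injective.
def f(v):
    x, y = v
    return np.array([np.exp(x) * np.cos(y), np.exp(x) * np.sin(y)])

def det_Df(v):
    x, y = v
    # Df = [[e^x cos y, -e^x sin y], [e^x sin y, e^x cos y]], det = e^{2x}.
    return np.exp(2 * x)

p = np.array([0.3, 1.0])
q = p + np.array([0.0, 2 * np.pi])    # a distinct point with the same image
gap = np.linalg.norm(f(p) - f(q))
print(det_Df(p), gap)  # determinant positive, images (numerically) identical
```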