The Inverse Function Theorem

This is a revised and (hopefully) better set of video and notes on the Inverse Function Theorem. You can see the original materials here.

Video. Inverse Functions

1. Inverse Functions and Continuity

An important basic result is that for an injective real-valued function defined on an interval, continuity of the function is sufficient to imply continuity of the inverse function:

Theorem

Suppose that \(I \subset {\mathbb R}\) is an interval and \(f : I \rightarrow {\mathbb R}\) is injective and continuous. Let \(U := f(I)\). Then the map \(f^{-1} : U \rightarrow I\) must also be continuous.

Proof

Suppose the contrary. Then there would exist some \(y_* \in U\), some \(\epsilon > 0\), and some sequence \(\{y_n\}_{n=1}^\infty\) in \(U\) such that \(|y_n - y_*| < \frac{1}{n}\) for each \(n\) but \(|f^{-1}(y_n) - f^{-1}(y_*)| > \epsilon\) for each \(n\). Passing to a subsequence \(\{y_{n_j}\}_{j=1}^\infty\), it may be assumed that either \(f^{-1}(y_{n_j}) > f^{-1}(y_*) + \epsilon\) for each \(j\) or \(f^{-1}(y_{n_j}) < f^{-1}(y_*) - \epsilon\) for each \(j\). Setting \(x_{j} := f^{-1}(y_{n_j})\) for each \(j\) gives that \(f(x_{j}) \rightarrow y_*\) as \(j \rightarrow \infty\) and that \(x_j > f^{-1}(y_*) + \epsilon\) for all \(j\) (which will be called Case I) or that \(x_j < f^{-1}(y_*) - \epsilon\) for all \(j\) (Case II).

Now for each \(j\), when \(f\) is restricted to the closed interval joining \(x_* := f^{-1}(y_*)\) and \(x_j\), the values of \(f\) must lie between \(y_*\) and \(f(x_j) = y_{n_j}\). This is because if there is a point \(x\) strictly between \(x_*\) and \(x_j\) such that \(f(x) > \max\{y_*,y_{n_j}\}\), then the Intermediate Value Theorem applied to the interval joining \(x_*\) and \(x\) as well as the interval joining \(x\) and \(x_j\) would imply that \(f\) attains all values in \((\max\{y_*,y_{n_j}\},f(x))\) on both intervals and consequently isn't 1-1. Similarly if \(f(x) < \min \{y_*,y_{n_j}\}\) for some \(x\) between \(x_j\) and \(x_*\), then there would again be two distinct intervals on which \(f\) attains all values in \((f(x),\min\{y_*,y_{n_j}\})\).

Now in Case I, \(x_* + \epsilon\) always lies between \(x_*\) and \(x_j\). Thus \(f(x_* + \epsilon)\) lies between \(y_*\) and \(y_{n_j}\) for each \(j\). Since \(y_{n_j} \rightarrow y_*\) as \(j \rightarrow \infty\), this forces \(f(x_* + \epsilon) = y_* = f(x_*)\), which contradicts the assumption that \(f\) is 1-1.

In Case II, the equality \(f(x_*-\epsilon) = f(x_*)\) must hold because \(x_* - \epsilon\) always lies between \(x_*\) and \(x_j\) and so \(f(x_*-\epsilon)\) lies between \(f(x_*)\) and \(y_{n_j}\), which tends to \(f(x_*)\) as \(j \rightarrow \infty\).

In both cases, there is a contradiction. Thus the assumption of discontinuity cannot be true.

By itself, differentiability of \(f\) is not sufficient to imply differentiability of \(f^{-1}\): the classic example \(f(x) = x^3\) certainly maps the real line bijectively to itself and is differentiable, but its inverse, \(f^{-1}(y) = y^{1/3}\), is not differentiable at \(y = 0\).
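The failure of differentiability at \(y = 0\) can be seen numerically. The following is a minimal sketch, not part of the original notes: the helper names `cbrt` and `inverse_quotient_at_zero` and the particular step sizes are illustrative assumptions. It evaluates the difference quotient of \(f^{-1}(y) = y^{1/3}\) at \(y = 0\) for shrinking steps \(h\).

```python
import math


def cbrt(y: float) -> float:
    """Real cube root: the inverse of f(x) = x**3 on the whole real line."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)


def inverse_quotient_at_zero(h: float) -> float:
    """Difference quotient (f^{-1}(h) - f^{-1}(0)) / h of the inverse at y = 0."""
    return (cbrt(h) - cbrt(0.0)) / h


# Step sizes shrinking toward 0 (an arbitrary illustrative choice).
steps = [10.0 ** (-k) for k in (3, 6, 9, 12)]
quotients = [inverse_quotient_at_zero(h) for h in steps]

for h, q in zip(steps, quotients):
    print(f"h = {h:.0e}   quotient = {q:.3e}")
```

The quotient equals \(h^{-2/3}\), which grows without bound as \(h \rightarrow 0^+\), so no finite limit exists.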

2. Differentiability: Preliminary Observations

Proposition 1 (Nonvanishing Derivative Implies Increasing or Decreasing)

Suppose that \(f\) is a real-valued function on some open interval \(I\) which is differentiable at every point \(x_0 \in I\) and has \(f'(x_0) \neq 0\) for all \(x_0 \in I\). Then \(f\) must be either strictly increasing on \(I\) or strictly decreasing on \(I\). In either case, if \(U := f(I)\), then the inverse map \(f^{-1} : U \rightarrow I\) is well-defined.

Proof

This is a consequence of the Intermediate Value Property of Derivatives. Since \(f'(x_0)\) never vanishes, it must be either always positive or always negative. By the Mean Value Theorem, this implies that \(f\) is strictly increasing or strictly decreasing on \(I\), respectively. In either case, \(f\) is 1-1 and consequently \(f\) is a bijection between \(I\) and \(f(I)\).

Proposition 2 (Images of Neighborhoods of Differentiable Points Contain Suitable Intervals)

Suppose that \(f\) is a continuous, real-valued injective function on some open interval \(I\) containing a point \(x_0\) at which \(f\) is differentiable and \(f'(x_0) \neq 0\). Then for any \(e \in (0,|f'(x_0)|)\), there is some \(\eta > 0\) such that the interval \(N_\eta(x_0) := (x_0-\eta,x_0+\eta)\) is entirely contained in \(I\) and its image via \(f\) contains the interval \(N_{(|f'(x_0)| - e) \eta}(f(x_0))\), i.e., \(N_{(|f'(x_0)| - e) \eta}(f(x_0)) \subset f(N_\eta(x_0))\).

Proof

Let \(e \in (0,|f'(x_0)|)\) be fixed and choose \(\eta\) to be any sufficiently small positive number so that \(N_{2 \eta}(x_0) \subset I\) and that

\[ \left| \frac{f(x) - f(x_0)}{x-x_0} - f'(x_0) \right| < e \]

whenever \(x\) satisfies \(0 < |x - x_0| < 2 \eta\). Multiplying the inequality by \(|x-x_0|\) gives that

\[ |f(x) - f(x_0) - f'(x_0) (x - x_0)| \leq e |x-x_0| \]

whenever \(|x - x_0| < 2 \eta\), with equality holding only when \(x = x_0\). Consider what this inequality implies about the function \(f\) evaluated at the points \(x_+ := x_0 + \eta\) and \(x_- := x_0 - \eta\).

If \(f'(x_0) > 0\), we have that

\[ \begin{aligned} f(x_+) & \geq f(x_0) + f'(x_0) (x_+-x_0) - e |x_+-x_0| = f(x_0) + (|f'(x_0)| - e)\eta, \\ f(x_-) & \leq f(x_0) + f'(x_0) (x_{-} - x_0) + e |x_--x_0| = f(x_0) - (|f'(x_0)| - e)\eta. \end{aligned} \]

If \(f'(x_0) < 0\), we have that

\[ \begin{aligned} f(x_-) & \geq f(x_0) + f'(x_0) (x_--x_0) - e |x_--x_0| = f(x_0) + (|f'(x_0)| - e)\eta, \\ f(x_+) & \leq f(x_0) + f'(x_0) (x_{+} - x_0) + e |x_+-x_0| = f(x_0) - (|f'(x_0)| - e)\eta. \end{aligned} \]

In either case, one of \(f(x_+)\) and \(f(x_-)\) is greater than \(f(x_0) + (|f'(x_0)| - e) \eta\) and the other is less than \(f(x_0) - (|f'(x_0)|-e) \eta\). Since \(f\) is continuous on \([x_{-},x_+]\), the Intermediate Value Theorem guarantees that all intermediate values are attained. Specifically, this means that the image of \((x_-,x_+)\) has to contain \(N_{(|f'(x_0)| - e) \eta}(f(x_0))\).

Below is an image to illustrate the idea of the proof. The red curve is the graph of \(f(x)\). The point \(x_0\) is taken to be \(1\) in the picture, and just for convenience, \(f'(x_0) = 1\). The blue lines are the graph of

\[ f'(x_0) (x-x_0) + e |x-x_0| \]

and the green curve is the graph of

\[ f'(x_0) (x-x_0) - e |x-x_0| \]

for \(e = \frac{1}{2}\). The number \(\eta\) is the size of the interval on which the red curve is sandwiched between blue and green; in the picture, any positive \(\eta\) which is not quite as big as \(1\) could work. To the right of \(x_0\), the green curve has slope \(\frac{1}{2}\). Since this line has a “rise” of \(\frac{1}{2} \eta\) on the right, the graph of \(f\) must increase by at least \(\frac{1}{2} \eta\) on the right, because it lies above the green line and coincides with it at \(x_0\). Similarly the blue curve has slope \(\frac{1}{2}\) to the left of \(x_0\) and drops by exactly \(\frac{1}{2}\eta\) when moving distance \(\eta\) to the left of \(x_0\). Because the function \(f(x)\) lies below, it has to drop at least this same amount. But then the Intermediate Value Theorem says that any value between \(f(x_0) \pm \frac{1}{2} \eta\) must be attained by \(f\) somewhere on the interval \((x_0 - \eta,x_0 + \eta)\). Note that as \(e\) is chosen to be smaller and smaller, the aperture of the “wedge” formed by green and blue graphs becomes smaller.
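The conclusion of Proposition 2 can also be checked numerically on a concrete function. The sketch below is an illustration under stated assumptions, not the function from the picture: it takes the hypothetical example \(f(x) = x + x^2\) on \(I = (-\frac{1}{2}, \frac{1}{2})\) with \(x_0 = 0\), \(f'(x_0) = 1\), \(e = \frac{1}{2}\) and \(\eta = \frac{1}{4}\). It tests the wedge inequality on a grid, checks the endpoint bounds, and then uses bisection (a stand-in for the Intermediate Value Theorem) to locate preimages of sample values in \(N_{(|f'(x_0)| - e)\eta}(f(x_0))\).

```python
def f(x: float) -> float:
    # Hypothetical example function, not taken from the notes.
    return x + x * x


# Assumed data: x0 = 0, f'(x0) = 1, e = 1/2, eta = 1/4, so N_{2 eta}(x0) = (-1/2, 1/2).
x0, fprime, e, eta = 0.0, 1.0, 0.5, 0.25
radius = (abs(fprime) - e) * eta  # radius of the interval promised around f(x0)

# Wedge inequality |f(x) - f(x0) - f'(x0)(x - x0)| < e|x - x0| on grid points
# with 0 < |x - x0| < 2 * eta.
n = 1000
grid = [x0 + 2 * eta * (k / n) for k in range(-n + 1, n) if k != 0]
wedge_ok = all(
    abs(f(x) - f(x0) - fprime * (x - x0)) < e * abs(x - x0) for x in grid
)


def solve(y: float, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Bisection for f(x) = y, assuming f is increasing on [lo, hi]
    and f(lo) <= y <= f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Sample target values inside N_radius(f(x0)) and their preimages in N_eta(x0).
targets = [f(x0) + radius * t for t in (-0.99, -0.5, 0.0, 0.5, 0.99)]
preimages = [solve(y, x0 - eta, x0 + eta) for y in targets]

print("wedge inequality holds on grid:", wedge_ok)
print("f(x0 + eta) exceeds f(x0) + radius:", f(x0 + eta) > f(x0) + radius)
print("f(x0 - eta) is below f(x0) - radius:", f(x0 - eta) < f(x0) - radius)
```

A grid check is of course not a proof. For this particular \(f\), the wedge inequality reduces to \(|x| < \frac{1}{2}\), which is why \(2\eta = \frac{1}{2}\) works.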

Corollary

For \(f\) as above, let \(y_0 := f(x_0)\). For any \(e \in (0,|f'(x_0)|)\), there is some \(\delta > 0\) such that \(N_\delta(y_0) \subset f(I)\) and such that every \(y \in N_\delta(y_0) \setminus \{y_0\}\) satisfies

\[ f'(x_0) - e < \frac{y - y_0}{f^{-1}(y) - f^{-1}(y_0)} < f'(x_0) + e. \]

Proof

Let \(\eta\) be as in Proposition 2 and let \(\delta := (|f'(x_0)| - e) \eta\). The proposition guarantees that \(N_\delta(y_0) \subset f(N_{\eta}(x_0))\). In other words, every \(y \in N_\delta(y_0)\) has \(f^{-1}(y) \in N_\eta(x_0)\). By reducing the size of \(\eta\) if necessary (although the proof of Proposition 2 shows that no reduction is actually needed), it may be assumed that

\[ \left| \frac{f(x) - f(x_0)}{x-x_0} - f'(x_0) \right| < e \]

for all \(x \in N_\eta(x_0) \setminus \{x_0\}\). Taking \(x := f^{-1}(y)\), which differs from \(x_0\) whenever \(y \neq y_0\), this means that

\[ f'(x_0) - e < \frac{y - f(x_0)}{f^{-1}(y) - x_0} < f'(x_0) + e \]

whenever \(y \in N_\delta(y_0) \setminus \{y_0\}\). Observing that \(y_0 = f(x_0)\) and \(x_0 = f^{-1}(y_0)\) completes the proof of the corollary.

3. The Inverse Function Theorem in One Dimension

Theorem

Suppose that \(f\) is a differentiable function on some open interval \(I\) such that \(f'\) is never zero. Then \(f\) is a bijection from \(I\) to its image \(U := f(I)\). The set \(U\) is an open interval, \(f^{-1}\) is differentiable at every point \(y \in U\), and

\[ \frac{d}{dy} f^{-1}(y) = \frac{1}{f'(f^{-1}(y))}. \]

Proof

By Proposition 1, since \(f'\) is never zero, \(f\) is either strictly increasing or strictly decreasing and in particular \(f\) is a bijection between \(I\) and \(U\).

By Proposition 2, differentiability at every point \(x_0 \in I\) combined with \(f'(x_0) \neq 0\) implies (by choosing \(e := |f'(x_0)|/2\), for example), that there is some open interval centered at \(y_0 := f(x_0)\) which is contained in \(U\). Every point \(y \in U\) has the form \(f(x)\) for some \(x \in I\), and thus every point of \(U\) has a neighborhood which is also contained in \(U\). This means that \(U\) is open. The set \(U\) must be an interval by virtue of the Intermediate Value Theorem because \(f\) is differentiable and therefore continuous.

Finally, take \(y_0\) to be any point in \(U\) and let \(x_0 := f^{-1}(y_0)\). Now fix any \(\epsilon > 0\). Since \(f'(x_0) \neq 0\), there is always some \(e \in (0,|f'(x_0)|)\) such that both

\[ \frac{1}{f'(x_0)} - \epsilon < \frac{1}{f'(x_0) + e}\]

and

\[ \frac{1}{f'(x_0) - e} < \frac{1}{f'(x_0)} + \epsilon\]

hold. The reason is simply that

\[ \lim_{e \rightarrow 0^+} \frac{1}{f'(x_0) + e} = \lim_{e \rightarrow 0^+} \frac{1}{f'(x_0)-e} = \frac{1}{f'(x_0)} \]

and so all sufficiently small values of \(e\) must force \((f'(x_0) \pm e)^{-1}\) to be within \(\epsilon\) of \((f'(x_0))^{-1}\). For this particular value of \(e\), there is some \(\delta\) from the Corollary such that \(y \in N_\delta(y_0) \setminus \{y_0\}\) guarantees

\[ \frac{1}{f'(x_0)} - \epsilon < \frac{1}{f'(x_0) + e} < \frac{f^{-1}(y) - f^{-1}(y_0)}{y - y_0} < \frac{1}{f'(x_0) - e} < \frac{1}{f'(x_0)} + \epsilon, \]

and consequently

\[ \left| \frac{f^{-1}(y) - f^{-1}(y_0)}{y - y_0} - \frac{1}{f'(x_0)} \right| < \epsilon. \]

Therefore \(f^{-1}\) must be differentiable at \(y_0\) and have derivative equal to \(1/f'(x_0) = 1 / f'(f^{-1}(y_0))\).
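As a sanity check on the formula, the difference quotients of \(f^{-1}\) can be compared against \(1/f'(f^{-1}(y_0))\) for a concrete function. The following sketch assumes the hypothetical example \(f(x) = x^3 + x\), whose derivative \(3x^2 + 1\) never vanishes, and computes \(f^{-1}\) by bisection. The function names, the point \(y_0 = 2\), and the step sizes are illustrative choices rather than anything prescribed by the theorem.

```python
def f(x: float) -> float:
    # Hypothetical example: strictly increasing on the whole real line.
    return x ** 3 + x


def f_prime(x: float) -> float:
    return 3.0 * x * x + 1.0


def f_inverse(y: float) -> float:
    """Approximate f^{-1}(y) by bisection, using that f is increasing."""
    # f(-|y| - 1) < y < f(|y| + 1), so the solution lies in this bracket.
    lo, hi = -abs(y) - 1.0, abs(y) + 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


y0 = 2.0
x0 = f_inverse(y0)             # f(1) = 2, so x0 should be close to 1
predicted = 1.0 / f_prime(x0)  # the value 1 / f'(f^{-1}(y0)) from the theorem

# One-sided difference quotients of the inverse at y0 for shrinking steps h.
steps = [10.0 ** (-k) for k in (2, 4, 6)]
quotients = [(f_inverse(y0 + h) - x0) / h for h in steps]
errors = [abs(q - predicted) for q in quotients]

for h, q, err in zip(steps, quotients, errors):
    print(f"h = {h:.0e}   quotient = {q:.8f}   |quotient - predicted| = {err:.2e}")
```

Here \(f'(1) = 4\), so the theorem predicts a derivative of \(\frac{1}{4}\) for \(f^{-1}\) at \(y_0 = 2\), and the printed errors should shrink as \(h\) does.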

Exercise

Adapt the argument used to prove continuity of \(f^{-1}\) to show that when \(I\) is an open interval and \(f : I \rightarrow U \subset {\mathbb R}\) is a continuous bijection, \(f\) must be strictly monotone.

Hint

First observe that when \(p < q\) are two points in \(I\) and \(t \in (p,q)\), \(f(t)\) must lie strictly between \(f(p)\) and \(f(q)\). Then apply this result to show that for any points \(x_1 < \cdots < x_k\) in \(I\), \(f(x_i)\) must be strictly monotone as a function of the index \(i\). Lastly, show that if there are points \(x < y\) and \(x' < y'\) (all in \(I\)) such that \(f(x) < f(y)\) and \(f(x') > f(y')\), then putting \(x,x',y,y'\) in numerical order forces \(f(x), f(x'), f(y), f(y')\) not to be monotone in the way just established.