Change of Variables Formula

1. The Setup and Statement of the Theorem

Throughout these notes, \(U\) and \(V\) will be open sets in \({\mathbb R}^n\) and \(\Phi : U \rightarrow V\) will be a \(C^1\) bijection from \(U\) to \(V\). Furthermore, \(D_x \Phi\) will be assumed to be invertible at every point of \(U\). We use the notation \(\Delta_\Phi(x)\) to denote the determinant of \(D_x \Phi\) at \(x \in U\).

We say that a Jordan region \(E \subset U\) is a safe Jordan region in \(U\) when its closure is contained in \(U\) as well. From the notes on Jordan regions, for any safe Jordan region \(E \subset U\), its image via \(\Phi\), which we denote by \(\Phi(E)\), will be a Jordan region in \(V\). Because \(\overline{E}\) is itself a safe Jordan region in \(U\), \(\Phi(\overline{E})\) will be a Jordan region in \(V\) as well. Since \(\overline{E}\) is compact (Jordan regions must be bounded), \(\Phi(\overline{E})\) is compact, as it is a continuous image of a compact set. Thus the closure of \(\Phi(E)\) must be contained in \(\Phi(\overline{E})\), which is contained in \(V\). Therefore \(\Phi(E)\) will also be a safe Jordan region in \(V\). By the Inverse Function Theorem, \(\Phi^{-1}\) will be a \(C^1\) bijection from \(V\) to \(U\) with invertible total derivative at every point, and so it is also true that if \(F \subset V\) is a safe Jordan region, then \(\Phi^{-1}(F)\) is a safe Jordan region in \(U\).

The goal is the following:
Theorem (Change of Variables Formula)
Suppose \(U\) and \(V\) are open subsets of \({\mathbb R}^n\) and that \(\Phi : U \rightarrow V\) is a \(C^1\) bijection with the property that \(\Delta_\Phi(x) \neq 0\) for all \(x \in U\). If \(E\) is a safe Jordan region in \(V\) and if \(f\) is Riemann integrable on \(E\), then \(f \circ \Phi\) is Riemann integrable on \(\Phi^{-1}(E)\) and
\[ \int_{E} f(y) dy = \int_{\Phi^{-1}(E)} f (\Phi(x)) |\Delta_\Phi(x)| dx.\]
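As a quick illustration (a standard example assumed here, not taken from these notes), consider polar coordinates: let \(U = (0,\infty) \times (0,2\pi)\), let \(V\) be \({\mathbb R}^2\) with the nonnegative horizontal axis removed, and let \(\Phi(r,\theta) = (r \cos\theta, r \sin\theta)\), so that \(\Delta_\Phi(r,\theta) = r \neq 0\) on \(U\). For the annular sector \(E = \Phi([1,2] \times [\pi/4, 3\pi/4])\), which is a safe Jordan region in \(V\), and \(f \equiv 1\), the formula (combined with Fubini's Theorem) reads
\[ \operatorname{Vol}\left(E\right) = \int_{\pi/4}^{3\pi/4} \int_1^2 r \, dr \, d\theta = \frac{\pi}{2} \cdot \frac{3}{2} = \frac{3\pi}{4}, \]
which agrees with the elementary area \(\frac{1}{2} \cdot \frac{\pi}{2} \cdot (2^2 - 1^2)\) of the sector.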

2. Initial Consideration

The following result is used many times during the proof:
Proposition
Suppose that \(E_1,\ldots,E_N\) are Jordan regions in \({\mathbb R}^n\) such that \(E_i \cap E_j\) has Jordan content zero when \(i \neq j\). Then
\[ \int_{\bigcup_{j=1}^N E_j} f dx = \sum_{j=1}^N \int_{E_j} f dx \]
for any \(f\) which is Riemann-integrable on \(\bigcup_{j=1}^N E_j\). In particular, if \(E \subset \bigcup_{j=1}^N E_j\) is a Jordan region, then
\[ \operatorname{Vol}\left(E\right) = \sum_{j=1}^N \operatorname{Vol}\left(E \cap E_j\right). \]

3. The Plan of Approach

The proof is long and technical. It is easiest to break it into a series of smaller steps, each of which strengthens the result of the one before.
Point (Why the proof is organized this way)
The proof below is not the shortest or most efficient proof known to exist. In particular, there exist relatively short proofs that cut straight to Claim 5 by an argument involving the Implicit Function Theorem and induction on dimension. The present proof has been chosen because it highlights more hands-on, robust techniques that we've already used many times now. Proving theorems as sophisticated as this one can be accomplished systematically with the right perspective and doesn't require special degrees of cleverness or insight.
Claim (1. How Linear Transformations Affect Volume)
For any convex Jordan region \(E \subset {\mathbb R}^n\), if we regard its points as vectors and apply an invertible matrix \(M \in {\mathbb R}^{n \times n}\), the set \(M E\) is a convex Jordan region and
\[ \operatorname{Vol}\left(ME\right) = |\det M| \operatorname{Vol}\left(E\right).\]
In other words, applying a linear transformation \(M\) to a convex Jordan region yields another region of the same sort whose volume is simply the volume of the initial region times the absolute value of the determinant of \(M\).
Proof
Meta (Steps)
  1. Establish that \(ME\) is a convex Jordan region when \(E\) is and \(M\) is invertible.
  2. Establish the volume identity when \(M\) has key elementary forms:
    • When \(M\) is diagonal (so \(M\) stretches/shrinks/reflects along coordinate directions); this is easy because \(M\) sends boxes to boxes
    • When \(M\) swaps coordinate directions (again, boxes to boxes is helpful)
    • When \(M\) is a shear transformation; here use Fubini's Theorem
  3. Use the fact that, via row reduction, every invertible matrix can be decomposed into a product of elementary matrices, and then apply the volume identity once for each factor.
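The steps above can be sanity-checked numerically in the plane. The following is a minimal sketch, assuming \(n = 2\) and taking \(E\) to be the unit square; the helper names (`apply_matrix`, `shoelace_area`, and so on) and the particular matrices are illustrative choices, not part of the notes. It computes the area of \(ME\) with the shoelace formula and compares it with \(|\det M|\) for a diagonal matrix, a coordinate swap, a shear, and their product.

```python
# Illustrative check of Vol(ME) = |det M| Vol(E) for 2x2 matrices (assumed setup).

def apply_matrix(m, p):
    """Apply the 2x2 matrix m (a tuple of rows) to the point p."""
    (a, b), (c, d) = m
    x, y = p
    return (a * x + b * y, c * x + d * y)

def det2(m):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c

def matmul2(p, q):
    """Product p*q of two 2x2 matrices."""
    (a, b), (c, d) = p
    (e, f), (g, h) = q
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def shoelace_area(points):
    """Area of a simple polygon given by its vertices in order."""
    total = 0.0
    for i in range(len(points)):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % len(points)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def image_area(m, polygon):
    # A linear map sends a convex polygon to the convex polygon on the mapped vertices.
    return shoelace_area([apply_matrix(m, p) for p in polygon])

unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # Vol(E) = 1
diagonal = ((2.0, 0.0), (0.0, -3.0))   # stretch and reflect along the axes
swap = ((0.0, 1.0), (1.0, 0.0))        # swap the coordinate directions
shear = ((1.0, 1.5), (0.0, 1.0))       # shear transformation
product = matmul2(diagonal, matmul2(swap, shear))

for name, m in [("diagonal", diagonal), ("swap", swap), ("shear", shear), ("product", product)]:
    print(name, image_area(m, unit_square), abs(det2(m)))
```

Each printed pair should agree, and the product case mirrors step 3: its determinant is the product of the three elementary determinants.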
Claim (2. How Approximately Linear Maps Affect Volumes of Small Cubes)
For each \(x \in {\mathbb R}^n\) and each \(r > 0\), let \(x + Q_r\) be the cube centered at \(x\) with side length \(r\). For any \(\epsilon > 0\) and any compact set \(K \subset U\), there is an \(r_0 > 0\) such that for all \(x \in K\) and all \(r \in (0,r_0)\),
\[ (1-\epsilon)^n |\Delta_\Phi(x)| \operatorname{Vol}\left(x + Q_r\right) \leq \operatorname{Vol}\left(\Phi(x + Q_r)\right) \leq (1 + \epsilon)^n |\Delta_\Phi(x)| \operatorname{Vol}\left(x + Q_r\right). \]
Proof
Meta (How to understand the claim)
The claim is that the volume of the images of small cubes centered on points \(x \in K\) is “almost” equal to the volume of the cube times the absolute value of the determinant of \(D_x \Phi\). Here “almost” means the ratio of the two quantities is as close to \(1\) as desired by appropriate choice of \(\epsilon\).
  1. It suffices to show that under the hypotheses of the claim, \(\Phi(x+ Q_r)\) contains a subset of volume at least \((1-\epsilon)^n |\Delta_\Phi(x)| \operatorname{Vol}\left(x + Q_r\right)\) and is contained in a set of volume at most \((1 + \epsilon)^n |\Delta_\Phi(x) | \operatorname{Vol}\left(x + Q_r\right)\). We will show
    \[ \Phi(x) + D_x \Phi(Q_{(1-\epsilon)r}) \subset \Phi(x+Q_r) \subset \Phi(x) + D_x \Phi(Q_{(1+\epsilon)r}) \]
    for all \(x \in K\) and all \(r < r_0\).
    Figure. \(C^1\) images can be bounded by linear map images
  2. Let \(||\cdot||\) denote the \(\ell^\infty\) norm on \({\mathbb R}^n\). Let \(r_1\) be half the distance from \(K\) to \(\partial U\). The value of \(r_0\) will never exceed \(r_1\). Let \(K_1\) be the compact set consisting of all points at most distance \(r_1\) to the set \(K\).
  3. If \(x \in K\), \(r < r_1\), and \(y \in x + Q_r\), the line segment joining \(x\) and \(y\) is contained in \(x + Q_r\) and in \(K_1 \subset U\). By the Mean Value Theorem, any convex function \(\varphi\) admits some \(\xi \in K_1\) such that
    \[ \varphi( \Phi(x) - \Phi(y)) \leq \varphi(D_\xi \Phi(x-y)). \]
    We can specifically choose \(\varphi(z) := || -x + y + (D_x \Phi)^{-1} z||_{\infty}\). It follows that
    \[ || - x + y + (D_x \Phi)^{-1}(\Phi(x) - \Phi(y))|| \leq || (-I + (D_x \Phi)^{-1} D_\xi \Phi)(x-y)||. \]
    Since \(-I + (D_x \Phi)^{-1} D_\xi \Phi\) has continuous entries in \(x\) and \(\xi\) which vanish when \(\xi = x\), uniform continuity on compact sets implies the existence of \(r_2 < r_1\) such that \(||| -I + (D_x \Phi)^{-1} D_\xi \Phi ||| < \epsilon\) whenever \(x \in K_1\) and \(||x - \xi|| < r_2/2\); this covers every \(\xi\) on the segment from \(x\) to \(y\) when \(||x - y|| < r_2/2\). By the triangle inequality, this implies
    \[ ||(D_x \Phi)^{-1}(\Phi(x) - \Phi(y))|| \leq (1 + \epsilon) ||x-y||, \]
    meaning that \(\Phi(x + Q_r) \subset \Phi(x) + D_x \Phi(Q_{(1+\epsilon)r})\) for all \(x \in K\) and all \(r < r_2\).
  4. As \(\Phi(K_1)\) is compact and contained in \(V\) and because \(D_x \Phi\) is continuous as a function of \(x\), there exists some \(r_3 > 0\) such that \(\Phi(x) + D_x \Phi(Q_{r}) \subset V\) when \(x \in K_1\) and \(r < r_3\). Let \(z = \Phi(x) + D_x \Phi(u)\) for some \(u \in Q_{(1-\epsilon)r}\); for any convex \(\varphi\),
    \[ \varphi( \Phi^{-1}(z) - \Phi^{-1}(\Phi(x))) \leq \varphi(D_\xi (\Phi^{-1})(z - \Phi(x))) \]
    where \(\xi\) is some point in \(\Phi(x) + D_x \Phi(Q_{(1-\epsilon)r})\). This time let \(\varphi(w) := || w - u||_\infty\). This implies
    \[ || (\Phi^{-1}(z) - x) - u|| \leq || ((D_{\Phi^{-1}(\xi)} \Phi)^{-1} D_x \Phi - I) u||. \]
    Once again, uniform continuity combined with the fact that the matrix vanishes when \(\xi = \Phi(x)\) implies that, after shrinking \(r_3\) if necessary, \(|||((D_{\Phi^{-1}(\xi)} \Phi)^{-1} D_x \Phi - I)||| \leq \epsilon\) when \(x \in K_1\) and \(\xi \in \Phi(x) + D_x \Phi(Q_{(1-\epsilon)r})\) for any \(r < r_3\). The triangle inequality implies
    \[ || \Phi^{-1}(z) - x|| \leq (1 + \epsilon) ||u|| \]
    when \(r < \min \{r_1,r_2,r_3\} =: r_0\) and \(u \in Q_{(1-\epsilon)r}\). This means that \(\Phi^{-1}\) maps \(\Phi(x) + D_x \Phi(Q_{(1-\epsilon)r})\) into \(x + Q_{(1+\epsilon)(1-\epsilon)r}\), and since \((1+\epsilon)(1-\epsilon) = 1-\epsilon^2 < 1\), it follows that \(\Phi(x + Q_r)\) must contain \(\Phi(x) + D_x \Phi(Q_{(1-\epsilon)r})\).
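The two-sided volume estimate of this claim can be observed numerically. The sketch below is a hedged illustration, not part of the notes: it assumes \(n = 2\) and the hypothetical example map \(\Phi(a,b) = (e^a \cos b, e^a \sin b)\) on \(U = {\mathbb R} \times (0,2\pi)\), for which \(\Delta_\Phi(a,b) = e^{2a}\). It approximates the area of \(\Phi(x + Q_r)\) by applying the shoelace formula to a densely sampled image of the boundary of the cube, and then forms the ratio to \(|\Delta_\Phi(x)| \operatorname{Vol}(x + Q_r)\).

```python
import math

def phi(a, b):
    # Assumed example map: Phi(a, b) = (e^a cos b, e^a sin b).
    return (math.exp(a) * math.cos(b), math.exp(a) * math.sin(b))

def jacobian_det(a, b):
    # Determinant of D Phi for the assumed map; it does not depend on b.
    return math.exp(2.0 * a)

def shoelace_area(points):
    """Area of a simple polygon given by its vertices in order."""
    total = 0.0
    for i in range(len(points)):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % len(points)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def image_area_of_square(cx, cy, r, samples_per_side=2000):
    """Approximate Vol(Phi(x + Q_r)) for the square of side r centered at (cx, cy)."""
    h = r / 2.0
    corners = [(cx - h, cy - h), (cx + h, cy - h), (cx + h, cy + h), (cx - h, cy + h)]
    boundary = []
    for k in range(4):
        (ax, ay), (bx, by) = corners[k], corners[(k + 1) % 4]
        for j in range(samples_per_side):
            t = j / samples_per_side
            boundary.append(phi(ax + t * (bx - ax), ay + t * (by - ay)))
    return shoelace_area(boundary)

def volume_ratio(cx, cy, r):
    """Vol(Phi(x + Q_r)) divided by |Delta_Phi(x)| Vol(x + Q_r)."""
    return image_area_of_square(cx, cy, r) / (abs(jacobian_det(cx, cy)) * r * r)

for r in (0.5, 0.1, 0.02):
    print(r, volume_ratio(0.3, 1.0, r))
```

For this particular map the image is an annular sector, so the ratio equals \(\sinh(r)/r\) exactly and tends to \(1\) as \(r \rightarrow 0\), consistent with the bounds \((1 \pm \epsilon)^n\).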
Claim (3. Increasing Precision of the Previous Claim)
For any compact set \(K \subset U\), there is an \(r_0 > 0\) such that for any \(x \in K\) and all \(r \in (0,r_0)\), there is a point \(x' \in x + Q_r\) such that
\[{}\operatorname{Vol}\left(\Phi(x + Q_r)\right) = |\Delta_\Phi({x'})| \operatorname{Vol}\left(x + Q_r\right).{}\]
Proof
Meta (Key Idea)
Take \(r_0\) to be less than half the distance from \(K\) to the boundary of \(U\), and let \(K_2\) be the set of all points at distance at most \(r_0\) from \(K\); it is a compact subset of \(U\).

Now choose \(x \in K\) and \(r < r_0\), and pick any \(\epsilon > 0\). By the previous claim applied to the compact set \(K_2\), there is a threshold size below which the estimate of that claim holds for every cube centered at a point of \(K_2\).

Subdivide \(x + Q_r\) via a partition \({\mathcal P}\) into cubes \(Q\) so small that the previous claim applies to each of them, and let \(z_Q\) denote the center of \(Q\). Then
\[ \begin{aligned}
(1-\epsilon)^n \sum_{Q \in {\mathcal P}} |\Delta_\Phi({z_Q})| \operatorname{Vol}\left(Q\right) & \leq \operatorname{Vol}\left(\Phi(x + Q_r)\right) \\
& \leq (1+\epsilon)^n \sum_{Q \in {\mathcal P}} |\Delta_\Phi({z_Q})| \operatorname{Vol}\left(Q\right).
\end{aligned} \]
The pointwise values of \(|\Delta_\Phi|\) are bounded below and above by the infimum and supremum of \(|\Delta_\Phi|\) on \(x + Q_r\), respectively, and the sum of \(\operatorname{Vol}\left(Q\right)\) is just the volume of the original cube. Now let \(\epsilon \rightarrow 0\): the ratio \(\operatorname{Vol}\left(\Phi(x + Q_r)\right) / \operatorname{Vol}\left(x + Q_r\right)\) lies between that infimum and supremum, so because \(\Delta_\Phi\) is continuous and \(x + Q_r\) is compact and connected, the Intermediate Value Theorem gives a point \(x' \in x + Q_r\) at which \(|\Delta_\Phi(x')|\) equals this ratio.
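To see the claim in a concrete case (an illustrative example assumed here, not part of the notes), take \(n = 2\), \(U = {\mathbb R} \times (0,2\pi)\), and \(\Phi(a,b) = (e^a \cos b, e^a \sin b)\), for which \(\Delta_\Phi(a,b) = e^{2a}\). If \(x = (a,b)\) and \(x + Q_r \subset U\), the image of the cube is an annular sector with radii \(e^{a \pm r/2}\) and opening angle \(r\), so
\[ \operatorname{Vol}\left(\Phi(x + Q_r)\right) = \frac{r}{2} \left( e^{2a + r} - e^{2a - r} \right) = e^{2a} \, r \sinh r. \]
Setting this equal to \(e^{2a'} r^2\) gives \(a' = a + \frac{1}{2} \ln \frac{\sinh r}{r}\), and since \(1 \leq \frac{\sinh r}{r} \leq e^{r}\), the point \(x' = (a',b)\) satisfies \(a \leq a' \leq a + r/2\) and therefore lies in \(x + Q_r\).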
Claim (4. Increasing Precision of Previous Claim and Removing Smallness Restriction)
For any (closed) cube \(Q \subset U\),
\[ \int_Q |\Delta_\Phi(x)| dx = \operatorname{Vol}\left(\Phi(Q)\right). \]
Proof
Meta (Key Idea)
Let \(K\) in the previous claim be the cube \(Q\) itself. Take a partition \(\mathcal P\) of \(Q\) into subcubes so small that the previous claim applies; then the previous claim implies that \(\operatorname{Vol}\left(\Phi(Q)\right)\) must be sandwiched between the lower and upper sums of the integral \(\int_Q |\Delta_\Phi(x)| dx\) for this partition. Letting the diameter of cubes in the partition tend to zero finishes the proof.
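The identity for cubes can also be checked numerically. The following is a minimal sketch under stated assumptions, not the notes' own computation: it again uses the hypothetical map \(\Phi(a,b) = (e^a \cos b, e^a \sin b)\) with \(\Delta_\Phi(a,b) = e^{2a}\), for which \(\Phi(Q)\) is an annular sector whose area is known in closed form, and compares that area with a midpoint Riemann sum of \(|\Delta_\Phi|\) over \(Q\).

```python
import math

def jacobian_det(a, b):
    # Determinant of D Phi for the assumed map Phi(a, b) = (e^a cos b, e^a sin b).
    return math.exp(2.0 * a)

def midpoint_integral(a, b, side, n=400):
    """Midpoint Riemann sum of |Delta_Phi| over the closed cube of the given side centered at (a, b)."""
    h = side / n
    total = 0.0
    for i in range(n):
        u = a - side / 2.0 + (i + 0.5) * h
        for j in range(n):
            v = b - side / 2.0 + (j + 0.5) * h
            total += abs(jacobian_det(u, v)) * h * h
    return total

def image_area(a, b, side):
    """Exact area of Phi(Q): an annular sector with radii e^(a -/+ side/2) and opening angle `side`."""
    outer = math.exp(a + side / 2.0)
    inner = math.exp(a - side / 2.0)
    return 0.5 * side * (outer ** 2 - inner ** 2)

# The cube [-0.5, 0.5] x [1.5, 2.5] lies inside U = R x (0, 2*pi).
integral = midpoint_integral(0.0, 2.0, 1.0)
area = image_area(0.0, 2.0, 1.0)
print(integral, area)
```

The two printed values should agree up to the discretization error of the midpoint rule, which shrinks as `n` grows.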
Claim (5. Moving from Cubes to Boxes)
For any (closed) box \(R \subset U\),
\[ \int_R |\Delta_\Phi(x)| dx = \operatorname{Vol}\left(\Phi(R)\right). \]
Proof
Meta (Key Idea)
Any box which can be written as a nonoverlapping union of cubes has this property immediately, as both sides of the equality can be expanded into sums over the individual cubes in the union. Boxes with rational side lengths fall in this category, for example. But if \(R_1 \subset R \subset R_2\) and both \(R_1\) and \(R_2\) have only rational side lengths, then
\[ \int_{R_1} |\Delta_\Phi(x)|dx \leq \operatorname{Vol}\left(\Phi(R)\right) \leq \int_{R_2} |\Delta_\Phi(x)| dx. \]
If \(|\Delta_\Phi(x)| \leq C\) on \(R_2\), then
\[ \left| \int_{R} |\Delta_\Phi(x)|dx - \operatorname{Vol}\left(\Phi(R)\right) \right| \leq C \max \{\operatorname{Vol}\left(R \setminus R_1\right), \operatorname{Vol}\left(R_2 \setminus R\right)\}. \]
Because the right-hand side can be made as small as desired by appropriate choice of \(R_1\) and \(R_2\), the claim must hold.
Claim (6. General Volume Statement for Jordan Regions and Nonlinear Maps)
For any safe Jordan region \(E \subset U\),
\[ \operatorname{Vol}\left(\Phi(E)\right) = \int_E |\Delta_\Phi(x)| dx. \]
Proof
Meta (Key Idea)
We know that \(\chi_E |\Delta_\Phi|\) is Riemann integrable. We also know that for any partition \(\mathcal P\) into approximate cubes of a box containing \(E\) (so fine that boxes which intersect the closure of \(E\) are bounded away from \(\partial U\)),
\[{}\operatorname{Vol}\left(\Phi(E)\right) = \sum_{Q \in {\mathcal P}} \operatorname{Vol}\left(\Phi(E \cap Q)\right){}\]
because the sets \(E \cap Q\) for different \(Q \in {\mathcal P}\) have intersections with Jordan content zero.
Figure. Approximating a set E (purple) by a union of approximate cubes \(Q\) not intersecting its boundary (shown in orange). The cubes \(Q\) are taken from a fine partition of some box \(R\) containing \(E\).

If \(Q \cap \partial E\) is empty, then \(Q \cap E\) is either all of \(Q\) or is also empty, so when it's nonempty, it's simply an approximate cube to which the previous claim applies:
\[ \begin{aligned}
\operatorname{Vol}\left(\Phi(E)\right) & = \sum_{Q \in {\mathcal P}} \operatorname{Vol}\left(\Phi(E \cap Q)\right) \\
& = \sum_{\substack{Q \in {\mathcal P}\\Q \cap \partial E = \emptyset}} \int_{Q \cap E} |\Delta_\Phi(x)| dx + \sum_{\substack{Q \in {\mathcal P}\\Q \cap \partial E \neq \emptyset}} \operatorname{Vol}\left(\Phi(E \cap Q)\right) \\
& = \sum_{Q \in {\mathcal P}} \int_{Q \cap E} |\Delta_\Phi(x)| dx + \sum_{\substack{Q \in {\mathcal P}\\Q \cap \partial E \neq \emptyset}} \Bigg( \operatorname{Vol}\left(\Phi(E \cap Q)\right) - \int_{Q \cap E} |\Delta_\Phi(x)| dx \Bigg) \\
& = \int_E |\Delta_\Phi(x)| dx + \sum_{\substack{Q \in {\mathcal P}\\Q \cap \partial E \neq \emptyset}} \Bigg( \operatorname{Vol}\left(\Phi(E \cap Q)\right) - \int_{Q \cap E} |\Delta_\Phi(x)| dx \Bigg).
\end{aligned} \]
One need only show that the sum on the last line can be made as small as desired for appropriate choice of \(\mathcal P\) by comparing each term to \(\operatorname{Vol}\left(Q\right)\). Because \(Q\) is an approximate cube, we can show that \(\operatorname{Vol}\left(\Phi(E \cap Q)\right) \leq C_1 \operatorname{Vol}\left(Q\right)\) and likewise \(\int_{Q \cap E} |\Delta_\Phi(x)| dx \leq C_2 \operatorname{Vol}\left(Q\right)\). We also know that the sum of the volumes of these approximate cubes can be made as small as desired because \(\partial E\) has Jordan content zero.

4. Later Stages of Proof

Claim (7. From Jordan Regions to Integration)
Let \(R\) be a box contained entirely in \(V\) and suppose that \(f\) is a Riemann integrable function on \(R\). Then \(f (\Phi(x)) |\Delta_\Phi(x)|\) is integrable on \(\Phi^{-1}(R)\) and
\[ \int_R f(y) dy = \int_{\Phi^{-1}(R)} f(\Phi(x)) |\Delta_\Phi(x)| dx. \]
Proof
Let \({\mathcal P}\) be a partition of \(R\). Applying Claim 6 to the safe Jordan region \(\Phi^{-1}(R') \subset U\) gives
\[ |R'| = \int_{\Phi^{-1}(R')} |\Delta_\Phi(x)| dx \]
for every \(R' \in \mathcal P\). Substituting this identity into the formula for the upper sum of \(f\) on \({\mathcal P}\) gives
\[ \begin{aligned}
\mathcal U(f,{\mathcal P}) & = \sum_{R' \in {\mathcal P}} |R'| \sup_{y \in R'} f(y) \\
& = \sum_{R' \in {\mathcal P}} \int_{\Phi^{-1}(R')} |\Delta_\Phi(x)| \left( \sup_{y \in R'} f(y) \right) dx \\
& = \int_{\Phi^{-1}(R)} |\Delta_\Phi(x)| \cdot \left[ \sum_{R' \in {\mathcal P}} \chi_{\Phi^{-1}(R')}(x) \sup_{y \in R'} f(y) \right] dx \\
& \geq (U) \int |\Delta_\Phi(x)| \chi_{\Phi^{-1}(R)} (x) f (\Phi(x)) dx.
\end{aligned} \]
Similarly, applying the previous claim to the image of the interior of \(R'\) via \(\Phi^{-1}\) gives that
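\[ \begin{aligned}
\mathcal L(f,{\mathcal P}) & = \sum_{R' \in {\mathcal P}} |R'| \inf_{y \in R'} f(y) \\
& = \sum_{R' \in {\mathcal P}} \int_{\Phi^{-1}({R'}^\circ)} |\Delta_\Phi(x)| \left( \inf_{y \in R'} f(y) \right) dx \\
& = \int_{\Phi^{-1}(R)} |\Delta_\Phi(x)| \cdot \left[ \sum_{R' \in {\mathcal P}} \chi_{\Phi^{-1}({R'}^\circ)}(x) \inf_{y \in R'} f(y) \right] dx \\
& \leq (L) \int |\Delta_\Phi(x)| \chi_{\Phi^{-1}(R)} (x) f (\Phi(x)) dx.
\end{aligned} \]
Because \(f\) is Riemann integrable on \(R\), the upper and lower sums can both be made arbitrarily close to \(\int_R f(y) dy\) by refining \({\mathcal P}\), so the upper and lower integrals of \(|\Delta_\Phi| \chi_{\Phi^{-1}(R)} (f \circ \Phi)\) coincide and both equal \(\int_R f(y) dy\).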
Proof of the Theorem
Let \(\mathcal P\) be a collection of nonoverlapping boxes covering \(E\) and entirely contained in \(V\); since \(E\) is a safe Jordan region, its distance to the boundary of \(V\) is strictly positive, and consequently we can guarantee that every box in the covering of \(E\) is entirely contained in \(V\) if we simply subdivide the boxes so that the diameter is sufficiently small and then throw out any boxes not intersecting \(E\). Then by Claim 7 applied to \(f \chi_E\), for every \(R' \in {\mathcal P}\), we have that \(f (\Phi(x)) \chi_E(\Phi(x)) |\Delta_\Phi(x)|\) is Riemann integrable on \(\Phi^{-1}(R')\) and
\[ \begin{aligned}
\int f(y) \chi_{E \cap R'}(y) dy & = \int_{R'} f(y) \chi_{E}(y) dy \\
& = \int_{\Phi^{-1}(R')} f(\Phi(x)) \chi_E(\Phi(x)) |\Delta_\Phi(x)| dx \\
& = \int f(\Phi(x)) \chi_{\Phi^{-1}(E \cap R')}(x) |\Delta_\Phi (x)| dx.
\end{aligned} \]
Summing over \(R' \in {\mathcal P}\) gives that
\[ \begin{aligned}
\int f(y) \chi_{E}(y) dy & = \sum_{R' \in {\mathcal P}} \int f(y) \chi_{E \cap R'}(y) dy \\
& = \sum_{R' \in {\mathcal P}} \int f(\Phi(x)) \chi_{\Phi^{-1}(E \cap R')}(x) |\Delta_\Phi(x)| dx \\
& = \int f(\Phi(x)) |\Delta_\Phi(x)| \cdot \sum_{R' \in {\mathcal P}} \chi_{\Phi^{-1}(E \cap R')}(x) dx \\
& = \int_{\Phi^{-1}(E)} f(\Phi(x))|\Delta_\Phi(x)| dx
\end{aligned} \]
by virtue of the fact that all the boxes \(R'\) are nonoverlapping (and \(\Phi^{-1}\) preserves nonoverlapping-ness because it sends sets of content zero to sets of content zero).
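As a closing sanity check, both sides of the formula can be approximated numerically. The sketch below is an illustration under assumptions that are not part of the notes: \(\Phi\) is the polar coordinate map \(\Phi(r,\theta) = (r \cos\theta, r \sin\theta)\) with \(|\Delta_\Phi(r,\theta)| = r\), \(E\) is the annular sector \(\Phi([1,2] \times [\pi/4, 3\pi/4])\), and \(f(y) = y_1^2 + y_2^2\). The function names `lhs` and `rhs` are hypothetical. The left side is a midpoint sum of \(f \chi_E\) over a box containing \(E\); the right side is a midpoint sum of \(f(\Phi(x)) |\Delta_\Phi(x)|\) over \(\Phi^{-1}(E)\).

```python
import math

R_MIN, R_MAX = 1.0, 2.0
T_MIN, T_MAX = math.pi / 4.0, 3.0 * math.pi / 4.0

def f(y1, y2):
    # Assumed integrand: squared distance to the origin.
    return y1 * y1 + y2 * y2

def in_E(y1, y2):
    """Indicator of the annular sector E, tested in the y variables."""
    rho = math.hypot(y1, y2)
    theta = math.atan2(y2, y1)
    return R_MIN <= rho <= R_MAX and T_MIN <= theta <= T_MAX

def lhs(cells_per_unit=200):
    """Midpoint sum of f * chi_E over the box [-2, 2] x [0, 2], which contains E."""
    h = 1.0 / cells_per_unit
    total = 0.0
    for i in range(4 * cells_per_unit):
        y1 = -2.0 + (i + 0.5) * h
        for j in range(2 * cells_per_unit):
            y2 = (j + 0.5) * h
            if in_E(y1, y2):
                total += f(y1, y2) * h * h
    return total

def rhs(n=200):
    """Midpoint sum of f(Phi(x)) |Delta_Phi(x)| over Phi^{-1}(E) = [1, 2] x [pi/4, 3pi/4]."""
    hr = (R_MAX - R_MIN) / n
    ht = (T_MAX - T_MIN) / n
    total = 0.0
    for i in range(n):
        r = R_MIN + (i + 0.5) * hr
        for j in range(n):
            t = T_MIN + (j + 0.5) * ht
            y1, y2 = r * math.cos(t), r * math.sin(t)
            total += f(y1, y2) * abs(r) * hr * ht  # |Delta_Phi(r, t)| = r
    return total

print(lhs(), rhs())
```

The right side converges much faster because \(\Phi^{-1}(E)\) is a box, whereas the left side carries an error from grid cells that straddle \(\partial E\); this is one practical reason to change variables in the first place.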