1. Fundamental Theorem of Calculus

With all of the machinery built up to this point, the “more famous” part of the Fundamental Theorem of Calculus is not terribly difficult to prove. The key message is that integration and differentiation are, in a very literal sense, inverse operations.
Theorem (Fundamental Theorem of Calculus, Part 1)
Suppose \(f : [a,b] \rightarrow {\mathbb R}\) is continuous on \([a,b]\), differentiable on \((a,b)\), and \(f'\) extends to a Riemann integrable function on \([a,b]\). Then
\[ \int_a^b f'(t) dt = f(b) - f(a). \]
Proof
The trick is to use the Mean Value Theorem to show that a telescoping sum of values of \(f\) is sandwiched between the lower and upper sums of any given partition \(P\) of \([a,b]\).

Let \(P\) be a partition of \([a,b]\); for convenience, write it as
\[ \{[a,x_1],\ldots,[x_{N-1},b] \}\]
for \(a < x_1 < \cdots < x_{N-1} < b\). For even more convenience, let \(x_0 := a\) and \(x_N := b\). Now for any \(i \in \{1,\ldots,N\}\), the Mean Value Theorem implies that there is some \(\xi_i \in (x_{i-1},x_i)\) such that
\[ f(x_i) - f(x_{i-1}) = f'(\xi_i) (x_i - x_{i-1}). \]
Now trivially
\[ \inf_{x \in [x_{i-1},x_i]} f'(x) \leq f'(\xi_i) \leq \sup_{x \in [x_{i-1},x_i]} f'(x), \]
so multiplying by the length of the interval \([x_{i-1},x_i]\) and summing over \(i = 1,\ldots,N\) gives that
\[ L(f',P) \leq \sum_{i=1}^N (f(x_i) - f(x_{i-1})) \leq U(f',P). \]
The sum in the middle telescopes and equals \(f(x_N) - f(x_0) = f(b) - f(a)\). Therefore every lower sum is bounded above by \(f(b) - f(a)\) and every upper sum is bounded below by \(f(b) - f(a)\). This implies
\[ (L) \int_a^b f'(t) dt \leq f(b) - f(a) \]
and
\[ (U) \int_a^b f'(t) dt \geq f(b) - f(a). \]
Since we have assumed that \(f'\) is Riemann integrable, the upper and lower integrals are equal to each other, and therefore both must equal \(f(b) - f(a)\).
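Example
As a quick sanity check of the sandwich argument (an illustration added here, with the function and partition chosen only for concreteness), take \(f(t) = t^2\) on \([0,1]\) and the uniform partition with \(x_i = i/N\). Here \(f'(t) = 2t\) is increasing, so the infimum and supremum of \(f'\) on \([x_{i-1},x_i]\) are attained at the left and right endpoints, and the point supplied by the Mean Value Theorem works out to be the midpoint \(\xi_i = (2i-1)/(2N)\):
\[ f(x_i) - f(x_{i-1}) = \frac{2i-1}{N^2} = f'(\xi_i) (x_i - x_{i-1}). \]
The lower and upper sums are
\[ L(f',P) = \sum_{i=1}^N \frac{2(i-1)}{N} \cdot \frac{1}{N} = 1 - \frac{1}{N}, \qquad U(f',P) = \sum_{i=1}^N \frac{2i}{N} \cdot \frac{1}{N} = 1 + \frac{1}{N}, \]
which do trap \(f(1) - f(0) = 1\) between them and squeeze down to it as \(N \rightarrow \infty\).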

The second half of the Fundamental Theorem of Calculus reinforces the assertion that differentiation and integration are inverse operations by switching the order of the two operations relative to Part 1. Although perhaps less frequently used, this second part is also very important.
Theorem (FTC, Part 2)
Suppose that \(f : [a,b] \rightarrow {\mathbb R}\) is continuous. Then for \(x \in (a,b)\),
\[ \frac{d}{dx} \int^x_a f(t) dt = f(x). \]
Proof
We already know that continuous functions are integrable, so the function \(g(x)\) given by
\[ g(x) := \int_a^x f(t) dt \]
is well-defined for all \(x \in (a,b)\). Furthermore, we know that when \(y < y'\) are points in \((a,b)\),
\[ \int_a^{y'} f(t) dt = \int_a^{y} f(t) dt + \int_y^{y'} f(t) dt. \]
Fixing \(y' := x+h\) and \(y := x\) for \(h > 0\) and alternately fixing \(y' := x\) and \(y := x+h\) when \(h < 0\) gives
\[ g(x+h) - g(x) = \int_x^{x+h} f(t) dt \]
when \(h > 0\) and
\[ g(x+h) - g(x) = - \int_{x+h}^x f(t) dt \]
when \(h < 0\).

(Note that we have not technically defined the Riemann integral in the case when the bounds appear in the wrong order, which is why we don't write \(\int_{x}^{x+h}\) in the case \(h < 0\).) For any \(\epsilon > 0\), continuity of \(f\) at \(x\) implies the existence of some \(\delta > 0\) such that \(|f(t) - f(x) | < \epsilon\) for all \(t \in (a,b)\) with \(|t-x| < \delta\). If we suppose that \(|h| < \delta\), it follows that
\[ f(x) - \epsilon < f(t) < f(x) + \epsilon\]
for all \(t\) between \(x\) and \(x+h\), from which it follows that
\[ h (f(x) - \epsilon) \leq g(x+h) - g(x) \leq h (f(x) + \epsilon) \]
when \(h > 0\) and
\[ |h| (-f(x) - \epsilon) \leq g(x+h) - g(x) \leq |h| (-f(x) + \epsilon) \]
when \(h < 0\). Combining these observations gives
\[ f(x) - \epsilon \leq \frac{g(x+h) - g(x)}{h} \leq f(x) + \epsilon\]
when \(0 < |h| < \delta\); therefore
\[ \left| \frac{g(x+h) - g(x)}{h} - f(x) \right| \leq \epsilon. \]
Modulo the non-strict inequality (which is easily fixed by simply replacing \(\epsilon\) by \(\epsilon/2\), for example), this exactly implies that \(g\) is differentiable at \(x\) and has derivative \(f(x)\) there.
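Example
To see the theorem at work on a function that is continuous but not differentiable everywhere (an illustrative choice made here, not taken from the discussion above), let \(f(t) = |t|\) on \([-1,1]\) and \(g(x) := \int_{-1}^x |t| dt\). Splitting the integral at \(0\) when necessary gives
\[ g(x) = \frac{1 - x^2}{2} \text{ for } -1 < x \leq 0 \qquad \text{and} \qquad g(x) = \frac{1 + x^2}{2} \text{ for } 0 \leq x < 1. \]
Differentiating each piece gives \(g'(x) = -x\) when \(x < 0\) and \(g'(x) = x\) when \(x > 0\). At \(x = 0\) the difference quotient \((g(h) - g(0))/h\) equals \(h/2\) for \(h > 0\) and \(-h/2\) for \(h < 0\), so it tends to \(0 = f(0)\). Thus \(g'(x) = |x| = f(x)\) on all of \((-1,1)\), even though \(f\) itself has a corner at \(0\).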
Exercise
Use the Fundamental Theorem of Calculus to prove that when \(f\) and \(g\) are continuous on \([a,b]\), differentiable on \((a,b)\), and their derivatives extend to continuous functions on \([a,b]\), one has the integration-by-parts formula
\[ \int_a^b f'(t) g(t) dt = f(t) g(t) \Big|_{t=a}^b - \int_a^b f(t) g'(t) dt. \]
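Example
As a check on the formula itself (a sample computation with polynomials picked here for convenience; it is no substitute for the proof requested above), take \(f(t) = t^2\) and \(g(t) = t^3\) on \([0,1]\). Then
\[ \int_0^1 f'(t) g(t) dt = \int_0^1 2 t^4 dt = \frac{2}{5} \]
and
\[ f(t) g(t) \Big|_{t=0}^1 - \int_0^1 f(t) g'(t) dt = 1 - \int_0^1 3 t^4 dt = 1 - \frac{3}{5} = \frac{2}{5}. \]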

2. Substitution

Definition
Suppose \(\varphi : [a,b] \rightarrow {\mathbb R}\) is continuous on \([a,b]\), differentiable on \((a,b)\), and \(\varphi'\) extends to be Riemann integrable on \([a,b]\). If \(\varphi'\) is nonvanishing on \((a,b)\), we say that \(\varphi\) is a valid substitution.

In the following, we let \(\chi_E\) denote the indicator function of a set \(E\) (i.e., \(\chi_E(x) = 1\) at all points \(x \in E\) and \(\chi_{E}(x) = 0\) when \(x \not \in E\)).
Corollary
Suppose \(\varphi\) is a valid substitution. Then for any interval \([u_1,u_2] \subset \varphi([a,b])\),
\[ \int_a^b \chi_{[u_1,u_2]}(\varphi(t)) |\varphi'(t)| dt = \int \chi_{[u_1,u_2]}(u) du. \]
Proof
Suppose \(\varphi' > 0\) on \((a,b)\). Then \(\chi_{[u_1,u_2]}(\varphi(t)) = 1\) exactly when \(\varphi(t) \in [u_1,u_2]\). Because \(\varphi\) is strictly increasing, we know that \(\varphi(t) \in [u_1,u_2]\) exactly when \(t \in [\varphi^{-1}(u_1),\varphi^{-1}(u_2)]\). In this case, the left-hand side of our desired identity is simply
\[ \int_{\varphi^{-1}(u_1)}^{\varphi^{-1}(u_2)} \varphi'(t) dt = \varphi(\varphi^{-1}(u_2)) - \varphi(\varphi^{-1}(u_1)) = u_2 - u_1 \]
by the Fundamental Theorem of Calculus and the right-hand side is
\[ \int_{u_1}^{u_2} du = u_2 - u_1. \]
Thus both sides are clearly equal.

If instead \(\varphi' < 0\) everywhere, then \(\chi_{[u_1,u_2]}(\varphi(t)) = \chi_{[\varphi^{-1}(u_2),\varphi^{-1}(u_1)]}(t)\) and \(|\varphi'(t)| = -\varphi'(t)\). An argument analogous to the one just given establishes the desired conclusion.
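Example
For a concrete instance (the particular \(\varphi\) and interval are assumptions made here for illustration), take \(\varphi(t) = t^2\) on \([1,2]\), so that \(\varphi'(t) = 2t > 0\) and \(\varphi([1,2]) = [1,4]\), and let \([u_1,u_2] = [2,3]\). Then \(\chi_{[2,3]}(t^2) = 1\) exactly when \(t \in [\sqrt{2},\sqrt{3}]\), so
\[ \int_1^2 \chi_{[2,3]}(t^2) \, 2t \, dt = \int_{\sqrt{2}}^{\sqrt{3}} 2t \, dt = 3 - 2 = 1 = \int \chi_{[2,3]}(u) du. \]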

As a corollary to the corollary, if \(f\) is a finite linear combination of indicator functions of intervals in \(\varphi([a,b]) =: [c,d]\), then when \(\varphi\) is a valid substitution, \(f(\varphi(t))\) will also be a finite linear combination of indicator functions of intervals and
\[ \int_a^b f(\varphi(t)) |\varphi'(t)| dt = \int_{c}^d f(u) du. \]
To make life a bit easier, if one defines
\[ \int_{c}^d f(u) du = - \int_d^c f(u) du \]
when \(c > d\), we can say that
\[ \int_a^b f(\varphi(t)) \varphi'(t) dt = \int_{\varphi(a)}^{\varphi(b)} f(u) du. \]
Note also that changing the value of \(f\) at finitely many points preserves Riemann integrability of both sides and also preserves the equality (since any Riemann integrable function can be redefined at finitely many points without changing the integral).
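Example
A small worked case (with \(\varphi\) and the step function chosen here purely for illustration): let \(\varphi(t) = t^2\) on \([1,2]\), so that \([c,d] = [1,4]\), and let \(f = \chi_{[1,2]} + 3 \chi_{[2,4]}\). Then
\[ \int_1^4 f(u) du = 1 \cdot (2 - 1) + 3 \cdot (4 - 2) = 7 \]
and
\[ \int_1^2 f(t^2) \, 2t \, dt = \int_1^{\sqrt{2}} 2t \, dt + 3 \int_{\sqrt{2}}^2 2t \, dt = (2 - 1) + 3 (4 - 2) = 7. \]
The two indicator functions overlap at the single point \(u = 2\), where \(f(2) = 4\); as just noted, the value of \(f\) at one point affects neither integral.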
Theorem (Substitution Formula)
Suppose \(\varphi : [a,b] \rightarrow {\mathbb R}\) is a valid substitution. Then for any Riemann integrable function \(f\) on \(\varphi([a,b])\), \(f(\varphi(t)) \varphi'(t)\) is Riemann integrable on \([a,b]\) and
\[ \int_a^b f(\varphi(t)) \varphi'(t) dt = \int_{\varphi(a)}^{\varphi(b)} f(u) du. \]
Proof
The proof is by an approximation argument. Let \(\varphi\) and \(f\) be as described. Suppose also for the moment that \(\varphi' > 0\). Given a partition \(P\) of \(\varphi([a,b])\), let \(f^+\) be a function which equals \(\sup_{y \in I} f(y)\) at each \(x \in I\) for each \(I \in P\) and let \(f^{-}\) be a function which equals \(\inf_{y \in I} f(y)\) at each \(x \in I\) for each \(I \in P\). There is technically some ambiguity in how \(f^+\) and \(f^-\) are defined at the endpoints of intervals (since such points belong to more than one \(I\)). For simplicity, define \(f^+(x)\) to be the maximum of \(\sup_{y \in I} f(y)\) over the intervals \(I \in P\) containing \(x\), and likewise take \(f^{-}(x)\) to be the minimum of \(\inf_{y \in I} f(y)\) over the intervals \(I \in P\) containing \(x\). Then \(f^-(x) \leq f(x) \leq f^+(x)\) for all \(x \in \varphi([a,b])\) and, aside from modification at finitely many points, \(f^{-}\) and \(f^+\) are linear combinations of characteristic functions of intervals. Thus
\[ \int_a^b f^-(\varphi(t)) \varphi'(t) dt = \int_{\varphi(a)}^{\varphi(b)} f^-(u) du \]
and
\[ \int_a^b f^+(\varphi(t)) \varphi'(t) dt = \int_{\varphi(a)}^{\varphi(b)} f^+(u) du. \]
By linearity of the integral, it must be the case that
\[ \int_{\varphi(a)}^{\varphi(b)} f^-(u) du = \sum_{I \in P} |I| \inf_{x \in I} f(x) = L(f,P) \]
and likewise
\[ \int_{\varphi(a)}^{\varphi(b)} f^+(u) du = U(f,P). \]
Now because upper and lower integrals are monotone, we have
\[ (U) \int_a^b f(\varphi(t)) |\varphi'(t)| dt \leq (U) \int_a^b f^+(\varphi(t)) \varphi'(t) dt = U(f,P) \]
and
\[ (L) \int_a^b f(\varphi(t)) |\varphi'(t)| dt \geq (L) \int_a^b f^-(\varphi(t)) \varphi'(t) dt = L(f,P). \]
Because \(f\) is Riemann integrable, for any \(\epsilon > 0\) there is some partition \(P\) making \(U(f,P) - L(f,P) < \epsilon\), so the upper and lower integrals of \(f(\varphi(t))|\varphi'(t)|\) on \([a,b]\) must in fact be equal. Moreover, since both lie between \(L(f,P)\) and \(U(f,P)\) for every partition \(P\), they must equal the integral of \(f\) on \(\varphi([a,b])\). If \(\varphi' > 0\), we are done. If instead \(\varphi' < 0\), the same argument applies with \(|\varphi'(t)| = - \varphi'(t)\) in place of \(\varphi'(t)\), and we use the definition of integration with reversed endpoints to finish the proof.
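Example
To illustrate the reversed-endpoint convention (a sample computation; the functions are chosen here only for concreteness), take the decreasing substitution \(\varphi(t) = 1 - t^2\) on \([0,1]\), for which \(\varphi'(t) = -2t < 0\) on \((0,1)\), \(\varphi(0) = 1\), and \(\varphi(1) = 0\), together with \(f(u) = u\). Then
\[ \int_0^1 f(\varphi(t)) \varphi'(t) dt = \int_0^1 (1 - t^2)(-2t) dt = \int_0^1 (2t^3 - 2t) dt = \frac{1}{2} - 1 = -\frac{1}{2}, \]
while
\[ \int_{\varphi(0)}^{\varphi(1)} f(u) du = \int_1^0 u \, du = - \int_0^1 u \, du = -\frac{1}{2}. \]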
Exercise
Prove the key claim from the proof of substitution: If \(f\) is any Riemann integrable function on \([a,b]\) and \(g : [a,b] \rightarrow {\mathbb R}\) has the property that \(f(t) = g(t)\) for all but finitely many \(t \in [a,b]\), then \(g\) is Riemann integrable and has the same integral on \([a,b]\) as \(f\) itself.