ROBERT GHRIST

**CALCULUS BLUE GUIDE**

1^st^ edition, corrected 2025

Copyright © 2023-2025 Robert Ghrist. All rights reserved worldwide.

Published by Agenbyte Press, Jenkintown PA, USA

ISBN 978-1-944655-07-5

*Robert Ghrist is the Andrea Mitchell University PIK Professor of Mathematics and Electrical & Systems Engineering at the University of Pennsylvania*

# CONTENTS

- PREFACE
- VOLUME I : VECTORS & MATRICES
  - Week 1 : Points & Vectors
  - Week 2 : Vector Calculus
  - Week 3 : Matrix Algebra
  - Week 4 : Linear Transformations
- VOLUME II : DERIVATIVES
  - Week 5 : The Derivative
  - Week 6 : Differentiation
  - Week 7 : Approximation
  - Week 8 : Optimization
- VOLUME III : INTEGRALS
  - Week 9 : Integrals & Averages
  - Week 10 : Mass & Probability
  - Week 11 : Changing Coordinates
- VOLUME IV : FIELDS
  - Week 12 : Path Integrals
  - Week 13 : Differential Forms
  - Week 14 : The Fundamental Theorem
- EPILOGUE : BEYOND CALCULUS

# PREFACE

THIS TEXT is intended to be a guide for teaching from the Calculus BLUE Project materials. It includes brief sketches of the lecture contents, as well as notes for classroom discussions, sample assessment problems, and answers, with hints at solutions. This resource is for teaching a contemporary course in multivariable calculus with applications over a 14-week semester. The prerequisite is a solid single-variable calculus course (to be realized as Calculus GREEN in the fullness of time). It is assumed that students use a combination of video resources, in-class discussions, and practice problems to learn and debug the material.

I was reluctant to produce any printed materials for the Calculus BLUE Project: a fully video- and electronic-text resource package was the goal *ab initio*. Feedback from faculty convinced me that a printed guide was requisite. *Adsum.* This guide, like the videotext, is quartered.

- VOLUME I : Vectors & Matrices
- VOLUME II : Derivatives
- VOLUME III : Integrals
- VOLUME IV : Fields

Each quarter is split into three or four weeks, and each week contains:

- Materials, referencing chapters of Calculus BLUE
- Outline of topics
- Learning objectives
- Primer of material summarizing the videotext
- Discussion questions for use in the classroom
- Sample assessment problems
- Answers and hints at solutions

*Nota bene*, though answers may be given, it is the process that matters.

All content including artwork is by the author. I am grateful to my students, both at Penn and elsewhere. This book was written in difficult times. It is dedicated to E/L.

Robert Ghrist / prof-g

# VOLUME I : VECTORS & MATRICES

## Week 1 : Points & Vectors

### OUTLINE

MATERIALS: Calculus BLUE : Vol 1 : Chapters 1-4

TOPICS:

- Lines and planes in 2-D and 3-D
- Curves and surfaces in 2-D and 3-D
- Implicit vs.
parametrized representations
- Euclidean $n$-dimensional space; coordinates
- Lines, planes, and hyperplanes in $\mathbb{R}^{n}$
- Vectors; their notation, algebra, geometry, applications

LEARNING OBJECTIVES:

- Write implicit and parametric formulae for lines and planes in 3-D
- Interpret parametric formulae for curves and surfaces in $\mathbb{R}^{n}$
- Use coordinates in $\mathbb{R}^{n}$ to compute distances between points
- Express vectors in $\mathbb{R}^{n}$ using coordinates or standard basis vectors
- Perform vector addition and scalar multiplication
- Compute and compare lengths of vectors

### PRIMER

This first (short) week of Multivariable Calculus has no calculus whatsoever: that comes later. We will spend the first *month* in preparation for the calculus to come, with only a few small sidequests using derivatives and integrals. Our story begins with simple elements -- lines, planes, curves, and surfaces -- in 2-D and 3-D, with more dimensions to follow. Certain formulae for lines (in 2-D) and planes (in 3-D) should be known / remembered from earlier courses:

$$lines\ :\ \ ax + by = c\ \ \ \ or\ \ \ \frac{x}{a} + \frac{y}{b} = 1\ ,$$

$$planes\ :\ \ ax + by + cz = d\ \ \ \ or\ \ \ \frac{x}{a} + \frac{y}{b} + \frac{z}{c} = 1\ .$$

The constants $(a,b,c,d)$ are related to how the line/plane is inclined -- *cf.* slope. Such formulae are called *implicit* representations. Implicit functions can also be used to define curves in 2-D and surfaces in 3-D. The formula for a circle or a sphere of radius $R$ at a point $(x_{0},\ y_{0})$ in 2-D or $(x_{0},y_{0},z_{0})$ in 3-D should be familiar examples:

$$\left( x - x_{0} \right)^{2} + \left( y - y_{0} \right)^{2} = R^{2}\ \ \ \ :\ \ \ \left( x - x_{0} \right)^{2} + \left( y - y_{0} \right)^{2} + \left( z - z_{0} \right)^{2} = R^{2}\ .$$

There are numerous related formulae for quadratic surfaces in 3-D: *ellipsoids, paraboloids, hyperboloids, cones*, and more are standard examples. The formulae for such can be ignored for the time being. Eventually, you will want familiarity with spheres, ellipsoids, and cones. Many of the more unusual quadratic surfaces will not appear in our story.

Implicit representations of curves and surfaces have a mirror image in the form of *parametric* representations, where one or two parameters are used to trace out the object like so: a 2-D curve $f(t) = \left( x(t),\ y(t) \right)$ or a 3-D surface $g(s,t) = \left( x(s,t),\ y(s,t),\ z(s,t) \right)$. These parametrized curves and surfaces are perhaps not as familiar; they will be very useful to us, as they give explicit instructions for how to draw the object. Implicit representations are indeed implicit.

COORDINATES. We will work in the *Euclidean space* $\mathbb{R}^{n}$ consisting of ordered $n$-tuples of real numbers: $\mathbb{R}^{n} = \left\{ \left( x_{1},x_{2},\ldots,x_{n} \right)\ :\ x_{i} \in \mathbb{R} \right\}$. The numbers $x_{i}$ used to describe a point in $\mathbb{R}^{n}$ are called the *coordinates* of the point. In 2-D or 3-D, we often use the more familiar $(x,y)$ and $(x,y,z)$ coordinates. The distance between two points in $\mathbb{R}^{n}$ is given by the usual Pythagorean formula:

$$d(p,q) = \sqrt{\left( p_{1} - q_{1} \right)^{2} + \left( p_{2} - q_{2} \right)^{2} + \cdots + \left( p_{n} - q_{n} \right)^{2}}\ .$$

Distances can be deceiving when working in higher dimensions. Any two points in a unit-radius $n$-dimensional ball are within a distance of two from each other. However, for a unit cube in $\mathbb{R}^{n}$, the antipodal corners are separated by a distance of $\sqrt{n}$ : a unit cube does not fit inside a ball of diameter two (or two hundred) for sufficiently high dimensions.

Lines, planes, curves, and surfaces have higher-dimensional generalizations. Curves and surfaces are easiest to represent parametrically. Higher-dimensional objects are more easily defined implicitly: consider a single implicit equation of the form

$$\sum_{i = 1}^{n}{c_{i}x_{i}} = 1\ ,$$

where the $c_{1},\ \ldots,\ c_{n}$ are constants. This is an *affine* equation (linear plus a constant) and the geometric object it encodes is called a *hyperplane*. A hyperplane is flat like a plane in 3-D and similarly divides $\mathbb{R}^{n}$ into two *sides*. These (and their nonlinear generalizations) are important objects in machine learning, determining *classifiers* which separate two regions of data in $\mathbb{R}^{n}$.

VECTORS. Vectors can be defined, for the purposes of this course, as differences between two points in $\mathbb{R}^{n}$. As these differences give a sequence of $n$ real numbers, there is a temptation to conflate vectors and points: this is unavoidable. We will therefore write out the components of vectors vertically, like so:

$$\mathbf{v} = \begin{pmatrix} v_{1} \\ v_{2} \\ \vdots \\ v_{n} \end{pmatrix}\ .$$

There are many possible notations for vectors: $\mathbf{v},\ \overrightarrow{v},\ \overline{v},\ $and $\underline{v}$ are common. We will use bold letters to denote vectors (though you should be prepared to see others, especially on a blackboard). Other approaches to vectors may have been shown to you in other classes: they are often defined as *arrows* or *quantities with magnitude and direction*. When you take Linear Algebra, you will learn just how precise and general the true definition of a vector is. In this course, we work with the more limited *Euclidean* (and finite dimensional!) version in which vectors are finite ordered $n$-tuples of *scalars* (real numbers).

Vectors, like scalars, have their own arithmetic and geometry. Vectors can be *rescaled* (to double a vector, double each component). Negative rescalings *reverse the direction*, as it were. Vectors can be *added*, arithmetically by adding components, or geometrically by manipulating arrows tip-to-tail. The *zero vector* has all entries zero and is an additive identity: adding it does nothing. The ability to add and rescale vectors leads to the notion of a *linear combination* of vectors -- sums of rescaled vectors. Any vector can be decomposed as a linear combination of certain fundamental *basis vectors*. In $\mathbb{R}^{3}$, the standard basis vectors are unit axis-aligned vectors denoted $\widehat{i},\ \widehat{j},\ $and $\widehat{k}$ respectively, for the $x$-, $y$-, and $z$-axes. In $\mathbb{R}^{n}$, one usually denotes the standard basis vectors as ${\widehat{e}}_{i}$ for $i = 1\ldots n$. These vectors have all components $0$ except for a $1$ in the $i$^th^ component.

Why do we choose to do multivariable calculus in terms of vectors? They provide a very convenient data structure for geometric information. For example, in the implicit formula for a plane in $\mathbb{R}^{3}$, we can encode the inclination as a vector:

$$ax + by + cz = d\ \ \ \ \Rightarrow \ \ \ \ \mathbf{v} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}\ .$$

This vector $\mathbf{v}$ is *orthogonal* to the plane -- it meets the plane at a right angle. That is convenient, but only the beginning of the utility of vectors.
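For instructors who like to demo such claims live, here is a minimal numeric sketch (in Python with numpy -- an assumption, as the videotext prescribes no software) checking the distance formula and the cube-diagonal claim above.

```python
# A quick numeric check of the Pythagorean distance formula and the
# claim that antipodal corners of the unit cube are sqrt(n) apart.
import numpy as np

def dist(p, q):
    """Euclidean distance between points p and q in R^n."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(np.sum((p - q)**2))

n = 1000
corner0 = np.zeros(n)   # one corner of the unit cube in R^n
corner1 = np.ones(n)    # the antipodal corner
print(dist(corner0, corner1))                          # ~ 31.6227...
print(np.isclose(dist(corner0, corner1), np.sqrt(n)))  # True
```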
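The closing claim -- that the coefficient vector of a plane is orthogonal to that plane -- can also be sanity-checked numerically. A sketch, with an arbitrarily chosen plane; note that the check quietly uses the dot product, formally introduced next week.

```python
# Check: the coefficient vector v = (a, b, c) of the plane
# ax + by + cz = d meets the plane at a right angle.
import numpy as np

v = np.array([2.0, 3.0, -1.0])   # coefficients of the plane 2x + 3y - z = 6
p = np.array([3.0, 0.0, 0.0])    # a point on the plane (2*3 = 6)
q = np.array([0.0, 2.0, 0.0])    # another point on the plane (3*2 = 6)
w = q - p                        # a vector lying in the plane
print(np.dot(v, w))              # 0.0 : v is orthogonal to the plane
```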
### DISCUSSION

[QUESTION 1.]{.underline} What happens if you take the equation of a line, such as $2x - 3y = 7$, and interpret it in 3-D? Why is that not a line? Why is there such a difference between implicit representations of lines in 2-D versus 3-D?

Students should understand that the equation is true for all values of $z$ and thus gives a plane in 3-D. Subsequent question: does that single 2-variable equation determine "things" in all higher dimensions? Hyperplanes are defined by one affine equation in $\mathbb{R}^{n}$.

[QUESTION 2]{.underline}. If two affine equations in 3-D give an implicit representation of a line, how can a parametric representation be generated? Specifically, given

$$3x + y - z = 4\ \ \ \ \ :\ \ \ \ x - 2y + z = 1\ ,$$

how does one parametrize this line?

If students are stuck, suggest choosing some variable to act as parameter, say $y = t$. The idea of combining equations will follow. E.g., adding the equations yields $4x - y = 5$, which allows for solving for $x(t) = (t + 5)/4$, etc. This is a foreshadowing of row operations for solving linear systems of equations.

[QUESTION 3.]{.underline} Where does the line $x(t) = 2t - 1;\ y(t) = 3t + 2;\ z(t) = 4t$ intersect the plane given by $4x + 3y - z = 3$? What happens if it is not a plane but a more general surface? How hard can this get?

Students should quickly see that substituting and solving works. Subsequent questions help with understanding the tradeoffs between implicit and parametric representations.

[QUESTION 4.]{.underline} Given two lines in $\mathbb{R}^{2}$, what is their intersection? What *could* it be? What *typically* happens? What about two lines in $\mathbb{R}^{3}$? A line and a plane? Two planes? What about two planes in $\mathbb{R}^{4}$? Can you see any patterns?

This is a challenge for many students; reassure them that it's difficult but worthwhile to reason in dimension higher than three.

[QUESTION 5.]{.underline} In $\mathbb{R}^{4}$, coordinatized by $(x_{1},x_{2},x_{3},x_{4})$, what is the intersection between the $(x_{1},x_{2})$ plane and the $\left( x_{3},x_{4} \right)$ plane?

Students can feel both confident and totally lost here. This is a good chance to go over coordinates and their use. Try pulling back to $\mathbb{R}^{3}$ and asking for a definition of the $(x,y)$ plane in terms of a set of points satisfying conditions: $\{(x,y,z)\ :z = 0\}$. Then, ask students what it means to intersect two sets defined by conditions. This is a foreshadowing of intersection as logical AND, to be seen again in multivariate probability.

[QUESTION 6.]{.underline} Compute a parametrization of the line in $\mathbb{R}^{4}$ which passes through the points $(1,3, - 2,0)$ and $(4,2,1, - 5)$.

This is a good way to pivot to vectors. After doing this, ask "How difficult was this? Was it much harder to work in 4-D than in 3-D?"

[QUESTION 7.]{.underline} Here is a parametrization of a sphere of radius $R$ at the origin:

$$G(s,t) = \begin{pmatrix} R\cos s\sin t \\ R\sin s\sin t \\ R\cos t \end{pmatrix}$$

How can you verify that this is indeed a sphere?

Ask students to recall what the implicit equation is, and the rest should follow. This opens the question -- how was this parametrization produced? This is a foreshadowing of spherical coordinates from Week 11.

[QUESTION 8.]{.underline} Can you think of an example where you would want to work with points in $\mathbb{R}^{1000}$?
If students get stuck, encourage them to think in terms of digital audio, images, genetic data, neuroscience data, stock portfolios, prices for inflation estimates, climate data, etc. This is a good time to ask students about what they are interested in studying for a major and how the mathematics of this course may impact that field.

[QUESTION 9.]{.underline} For a pair of points in $\mathbb{R}^{1000}$, is the Euclidean distance the best way to describe how far apart the points are? What else could you do?

Get students to think in terms of when images are close, or when genetic sequences are close. One could use this as an opportunity to mention the Hamming distance or the Manhattan (taxicab) metric, though these are not going to be used in this course.

[QUESTION 10.]{.underline} We will not be dealing with infinite-dimensional vectors in this class, but consider for a moment a vector $\mathbf{v}$ with components $v_{n}$ for $n = 1,\ 2,\ 3,\ \ldots$. If $v_{n} = 1/n$, then what is the length $|\mathbf{v}|$ equal to?

*Under what conditions on the asymptotics of* $v_{n}$ *would finite length be guaranteed?*

[QUESTION 11.]{.underline} What does a hyperplane in $\mathbb{R}^{n}$ look like? Does it matter?

*This is of course ill-defined, but it's worth listening to what students come up with. This is a good opportunity to talk about classifiers in Machine Learning or linear constraints in Economics, the latter coming from an equation* $\sum_{i}^{}{C_{i}x_{i} = K}$*, where* $C_{i} > 0$ *is resource cost,* $x_{i} \geq 0$ *is quantity, and* $K > 0$ *is total budget.*

[QUESTION 12.]{.underline} What is the volume of a unit cube in $\mathbb{R}^{n}$? What is the volume of a ball of radius $1$? What about radius $10$? What happens as $n \rightarrow \infty$?

*Depending on the background of the student, they may be surprised to hear that the ball of radius* $10$ *has volume tending (swiftly!) to zero as dimension increases. For the skeptical, ask if any portion of the cube "pokes out" from the ball as dimension increases... How many corners does the cube have? Is that where all the volume "really lives"? This is a deep set of questions that foreshadows bonus material in Week 11.*

### ASSESSMENT PROBLEMS

[PROBLEM 1.]{.underline} Consider the following planes in $\mathbb{R}^{3}$, where $C$ is a constant:

$$2Cx - 3y + (C + 4)z = 5\ \ \ \ \ :\ \ \ \ (C + 1)x + Cy - z = 1$$

A\) Assuming $C = 1$ and $z = 0$, find the point of intersection of these two planes.

B\) Assuming that $C = 0$, find a vector that points along the line of intersection between these two planes.

[PROBLEM 2.]{.underline} Consider the vectors

$$\mathbf{a} = \begin{pmatrix} 5 \\ 3 \\ - 1 \end{pmatrix}\ \ \ \ \ :\ \ \ \mathbf{b} = \begin{pmatrix} - 2 \\ 1 \\ 0 \end{pmatrix}$$

A\) Write down an implicit equation for a plane passing through the point $(1,2,3)$ and orthogonal to $\mathbf{a}$.

B\) Write down a parametrized equation of a line passing through the point $(1,2,3)$ and tangent to the vector $\mathbf{b}$.

[PROBLEM 3.]{.underline} Consider the vectors

$$\mathbf{u} = \begin{pmatrix} 2 \\ - 1 \\ 3 \end{pmatrix}\ \ \ \ \ :\ \ \ \mathbf{v} = \begin{pmatrix} - 1 \\ 4 \\ 2 \end{pmatrix}$$

A\) Write down an implicit equation for a plane passing through the point $(0, - 1,2)$ and orthogonal to $\mathbf{u}$.

B\) Write down a parametrized equation of a line passing through the point $(0, - 1,2)$ and tangent to the vector $\mathbf{v}$.
[PROBLEM 4.]{.underline} Consider the following parametrized plane given by

$$F\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 2u - v \\ u + v - 3 \\ 3u + v + 1 \end{pmatrix}$$

A\) Give two nonzero vectors tangent to the plane.

B\) Find the point on this plane which intersects the $z$-axis.

[PROBLEM 5.]{.underline} Consider the following vector in $\mathbb{R}^{6}$: $\mathbf{v} = 2{\widehat{e}}_{2} + {\widehat{e}}_{3} - {\widehat{e}}_{4} + {\widehat{e}}_{5} - 3{\widehat{e}}_{6}$.

A\) Compute the length of $\mathbf{v}$.

B\) Give an example of a vector with the same length as $\mathbf{v}$ but which is not parallel to $\mathbf{v}$.

C\) Is there a vector with all entries positive that is parallel to $\mathbf{v}$? If so, give an example; if not, explain why not.

[PROBLEM 6.]{.underline} Which of the following five vectors in $\mathbb{R}^{5}$ is longest/shortest?

$$\mathbf{u} = \begin{pmatrix} 1 \\ 1 \\ - 5 \\ 0 \\ 3 \end{pmatrix}\ \ \ \ \ \mathbf{v} = \begin{pmatrix} - 5 \\ 2 \\ 2 \\ 2 \\ 0 \end{pmatrix}\ \ \ \ \ \mathbf{w} = \begin{pmatrix} 0 \\ 3 \\ 5 \\ 0 \\ - 2 \end{pmatrix}\ \ \ \ \ \mathbf{x} = \begin{pmatrix} 3 \\ 0 \\ - 2 \\ 0 \\ - 5 \end{pmatrix}\ \ \ \ \ \mathbf{y} = \begin{pmatrix} - 5 \\ 0 \\ 3 \\ 2 \\ - 1 \end{pmatrix}\ $$

[PROBLEM 7.]{.underline} A) Write down an implicit equation of a plane which intersects the $x$-axis at $3$; the $y$-axis at $- 2$; and the $z$-axis at $- 5$.

B\) Give an example of a vector that is orthogonal to this plane.

[PROBLEM 8.]{.underline} A) Write down a parametrization of a line that passes through the points $(1,3, - 5)$ and $(2,4,0)$ using a parameter $s$.

B\) Write down a parametrization of a line that passes through the points $(2,4,0)$, and $(3,0, - 2)$ using a parameter $t$.

C\) Write down a parametrization of a plane that passes through the points $(1,3, - 5)$, $(2,4,0)$, and $(3,0, - 2)$ using parameters $s$ and $t$.

[PROBLEM 9.]{.underline} Consider the parametrized arc given by

$$\gamma(t) = \left( \begin{array}{r} \cos{3t} \\ 1 - \sin{2t} \\ 3 + \sin{3t} \\ 2 + \cos{2t} \end{array} \right)\ \ \ \ :\ \ \ 0 \leq t \leq \frac{\pi}{2}$$

A\) Write down a vector that points from one endpoint of this arc to the other. How many such vectors are there?

B\) What is the distance between the endpoints of this arc?

[PROBLEM 10.]{.underline} For what values of constant $C$ are the planes given by

$$3Cx + 16y + Cz = 5\ \ \ \ \ \&\ \ \ \ 12x + Cy + 4z = 17$$

parallel to each other?

[PROBLEM 11.]{.underline} Consider the points $P = (2, - 3,5)$ and $Q = (4,1,7)$.

A\) What is the distance between $P$ and $Q$?

B\) Give a parametrization of a line passing through both $P$ and $Q$.

[PROBLEM 12.]{.underline} Consider the line parametrized via

$$\gamma(s) = \left( \begin{array}{r} 2s - 1 \\ 3s + 2 \\ 4s \end{array} \right)\ $$

A\) Find a point where this line intersects the plane given implicitly by

$$2x - 3y + z = 10$$

B\) Does this line intersect this plane orthogonally? Explain.

[PROBLEM 13.]{.underline} Consider the plane given by $4x + 12y - 5z = 6$.

A\) Give an example of a vector tangent to the plane; and a vector that is orthogonal to the plane, noting which is which.

B\) Parametrize a line that is orthogonal to this plane at the point $(1,\ 1,\ 2)$.

[PROBLEM 14.]{.underline} Write down parametrizations of a line between the following points (using a parameter $t$ from $0$ to $1$):

A\) From $(0,1)$ to $(3,0)$ in the plane;

B\) From $(0,1,7,2, - 3)$ to $(3,0,5, - 4,8)$ in $\mathbb{R}^{5}$.
[PROBLEM 15.]{.underline} Consider the following parametrized lines:

$$\gamma_{1}\left( t_{1} \right) = \left( \begin{array}{r} 3 + 2t_{1} \\ 5 + 3t_{1} \\ 7 + 4t_{1} \end{array} \right)\ \ \ \ \ :\ \ \ \ \gamma_{2}\left( t_{2} \right) = \left( \begin{array}{r} - 1 + t_{2} \\ - 2 + 2t_{2} \\ 11 - 4t_{2} \end{array} \right)$$

A\) Find the point at which these two lines intersect.

B\) Write down a parametrized plane $S\left( t_{1},t_{2} \right)$ which contains both lines.

### ANSWERS & HINTS

[PROBLEM 1.]{.underline} A) $(1, - 1,0)$ ; B) substitute to get $- 3y + 4z = 5$ and $x - z = 1$. Parametrize to get, e.g., $z = t$, $y = - \frac{5}{3} + \frac{4}{3}t$, and $x = 1 + t$. The tangent vector is $3\widehat{i} + 4\widehat{j} + 3\widehat{k}$.

[PROBLEM 2.]{.underline} A) $5x + 3y - z = 8$ ; B) $\gamma(t) = \left( \begin{array}{r} 1 - 2t \\ 2 + t \\ 3 \end{array} \right)$

[PROBLEM 3.]{.underline} A) $2x - y + 3z = 7$ ; B) $\gamma(t) = \left( \begin{array}{r} - t \\ 4t - 1 \\ 2t + 2 \end{array} \right)$

[PROBLEM 4.]{.underline} A) $\left( \begin{array}{r} 2 \\ 1 \\ 3 \end{array} \right)$ and $\left( \begin{array}{r} - 1 \\ 1 \\ 1 \end{array} \right)$ ; B) set $2u - v = 0 = u + v - 3$; solve for $u = 1,\ v = 2$, whence the point $(0,0,6)$.

[PROBLEM 5.]{.underline} A) $4$ ; B) anything with all zeros and one $4$ will do ; C) lol, no.

[PROBLEM 6.]{.underline} lengths are $\sqrt{36},\ \sqrt{37},\ \sqrt{38},\ \sqrt{38},\ \sqrt{39}$, so $\mathbf{u}$ shortest, $\mathbf{y}$ longest.

[PROBLEM 7.]{.underline} A) $\frac{x}{3} - \frac{y}{2} - \frac{z}{5} = 1$ ; B) take coefficients and multiply by $30$ to get $\left( \begin{array}{r} 10 \\ - 15 \\ - 6 \end{array} \right)$.

[PROBLEM 8.]{.underline} A) $\gamma_{1}(s) = \left( \begin{array}{r} 2 + s \\ 4 + s \\ 5s \end{array} \right)$ ; B) $\gamma_{2}(t) = \left( \begin{array}{r} 2 + t \\ 4 - 4t \\ - 2t \end{array} \right)$ ; C) $\ G(s,t) = \left( \begin{array}{r} 2 + s + t \\ 4 + s - 4t \\ 5s - 2t \end{array} \right)$

[PROBLEM 9.]{.underline} A) the choices are $\left( \begin{array}{r} 0 \\ 1 \\ 2 \\ 1 \end{array} \right) - \left( \begin{array}{r} 1 \\ 1 \\ 3 \\ 3 \end{array} \right) = \left( \begin{array}{r} - 1 \\ 0 \\ - 1 \\ - 2 \end{array} \right)$ and its opposite ; B) $\sqrt{6}$.

[PROBLEM 10.]{.underline} solve $\frac{3C}{12} = \frac{16}{C} = \frac{C}{4}$ to get $C = \pm 8$: the first and last fractions always agree, and the middle one forces $C^{2} = 64$.

[PROBLEM 11.]{.underline} A) $\sqrt{4 + 16 + 4} = 2\sqrt{6}$ ; B) one choice is $\gamma(t) = \left( \begin{array}{r} 2 + 2t \\ - 3 + 4t \\ 5 + 2t \end{array} \right)$.

[PROBLEM 12.]{.underline} A) substitute and solve to get $s = - 18$ and thus $( - 37,\ - 52,\ - 72)$ ; B) no: the direction vector $(2,3,4)$ is not parallel to the plane's normal $(2, - 3,1)$.

[PROBLEM 13.]{.underline} A) $\left( \begin{array}{r} 4 \\ 12 \\ - 5 \end{array} \right)$ is orthogonal; there are many choices for a tangent, e.g., $\left( \begin{array}{r} - 3 \\ 1 \\ 0 \end{array} \right)$, obtained by parametrizing, say, $y = t$; and $z = 0$ ; B) let $\gamma(s) = \left( \begin{array}{r} 1 + 4s \\ 1 + 12s \\ 2 - 5s \end{array} \right)$.

[PROBLEM 14.]{.underline} A) $x = 3t,\ y = 1 - t$ ; B) $z = 7 - 2t$, and keep going... :-/
[PROBLEM 15.]{.underline} A) solve to get $t_{1} = - 1$ and $t_{2} = 2$, whence the intersection point $(1,2,3)$ ; B) let $S\left( t_{1},t_{2} \right) = \left( \begin{array}{r} 1 + 2t_{1} + t_{2} \\ 2 + 3t_{1} + 2t_{2} \\ 3 + 4t_{1} - 4t_{2} \end{array} \right)$

## Week 2 : Vector Calculus

### OUTLINE

MATERIALS: Calculus BLUE : Vol 1 : Chapters 5-8; Interlude

TOPICS:

- Dot product; cross product; scalar triple product
- The geometry of vectors in $\mathbb{R}^{3}$ and beyond
- Lengths of curves in $\mathbb{R}^{n}$
- Velocity and acceleration of curves
- Unit tangent and unit normal vectors to curves
- Curvature and geometry of curves
- Applications to physics of 3-D motion
- Why vector calculus of curves is not enough

LEARNING OBJECTIVES:

- Compute dot products and interpret them geometrically
- Determine angles between vectors in $\mathbb{R}^{n}$
- Use equations for hyperplanes in terms of dot products
- Compute cross products and interpret them geometrically
- Compute scalar triple products and interpret them geometrically
- Determine velocity and acceleration vectors of parametrized curves
- Compute unit tangent and unit normal vectors of parametrized curves
- Set up and compute arclength integrals

### PRIMER

Vectors are the first step in building a language for multivariable calculus: we will use them extensively. The algebra of vectors that we saw last week extends and lifts to a version of the familiar single-variable calculus for vector-valued functions. This will not be sufficient for doing full multi-variable calculus, but it provides a good and intuitive starting point.

DOT PRODUCT. The most important operation on vectors is the *dot product*, which takes two vectors in $\mathbb{R}^{n}$ and returns a scalar by summing up the pairwise products of components:

$$\mathbf{u} \cdot \mathbf{v}\ = \ \sum_{i = 1}^{n}{u_{i}v_{i}}\ = \ \mathbf{v} \cdot \mathbf{u}\ .$$

This has a number of nice properties, including linearity in each term. The dot product gives substantial geometric information. One begins with the observation that the dot product of a vector with itself is the square of its length: $\mathbf{v} \cdot \mathbf{v} = \left| \mathbf{v} \right|^{2}$. Any two nonzero vectors in $\mathbb{R}^{n}$ have a well-defined angle between them (within a plane containing both vectors). That angle, $\theta$, is related to the dot product via:

$$\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\left| \mathbf{u} \right|\left| \mathbf{v} \right|}\ .$$

This yields a deeper interpretation, one very useful as *cosine similarity* in machine learning. For each unit vector $\mathbf{u}$, the dot product $\mathbf{u \cdot v}$ is the *oriented projected length* of the vector $\mathbf{v}$ along the $\mathbf{u}$-axis. Vectors are orthogonal if and only if their dot product vanishes. Positive dot products arise from vectors that point in similar directions (an acute angle). Negative dot product connotes a more antipodal direction (an obtuse angle). We will return to this notion of an oriented projected length in Week 12.
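A numeric companion to the formulas above, with two arbitrarily chosen vectors: the dot product, the cosine-of-angle formula, and the oriented projected length. A sketch in numpy; not part of the videotext.

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 0.0, 4.0])

dot = np.dot(u, v)                                          # 11.0 ; equals v . u
cos_theta = dot / (np.linalg.norm(u) * np.linalg.norm(v))   # 11/15
theta = np.arccos(cos_theta)                                # the angle, in radians
# oriented projected length of u along the v-axis: dot with a unit vector
proj = np.dot(u, v / np.linalg.norm(v))                     # 11/5
print(dot, cos_theta, theta, proj)
```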
CROSS PRODUCT. There is a fundamentally different binary operation on vectors that returns a vector -- not a scalar -- as the product. Unfortunately, this product is defined only in $\mathbb{R}^{3}$. The *cross product* of two vectors is given by the formula

$$\mathbf{u} \times \mathbf{v}\ = \ \begin{pmatrix} u_{1} \\ u_{2} \\ u_{3} \end{pmatrix} \times \begin{pmatrix} v_{1} \\ v_{2} \\ v_{3} \end{pmatrix}\ = \ \begin{pmatrix} u_{2}v_{3} - u_{3}v_{2} \\ u_{3}v_{1} - u_{1}v_{3} \\ u_{1}v_{2} - u_{2}v_{1} \end{pmatrix}\ = \ - \left( \mathbf{v} \times \mathbf{u} \right)\ .$$

This yields a vector which is orthogonal to both $\mathbf{u}$ and $\mathbf{v}$ and whose length is equal to the area of the parallelogram spanned by the two vectors, an observation that will become more and more significant over time. The geometry of the cross product is captured best by the formula for the angle between the two vectors:

$$\sin\theta = \frac{\left| \mathbf{u} \times \mathbf{v} \right|}{\left| \mathbf{u} \right||\mathbf{v}|}\ .$$

It is important to remember that the cross product is defined only for vectors in $\mathbb{R}^{3}$ (or for vectors in a plane, regarded as included in 3-D). Higher-dimensional analogues are hidden.

SCALAR TRIPLE PRODUCT. In 3-D, the dot and cross product combine into a novel operation on a triple of vectors that returns a scalar: the *scalar triple product*,

$$\mathbf{u \cdot}\left( \mathbf{v} \times \mathbf{w} \right) = u_{1}v_{2}w_{3} + u_{2}v_{3}w_{1} + u_{3}v_{1}w_{2} - u_{1}v_{3}w_{2} - u_{2}v_{1}w_{3} - u_{3}v_{2}w_{1}\ .$$

This has some intriguing symmetries, inherited from dot and cross products:

$$\mathbf{u \cdot}\left( \mathbf{v} \times \mathbf{w} \right) = \mathbf{v \cdot}\left( \mathbf{w} \times \mathbf{u} \right) = \mathbf{w \cdot}\left( \mathbf{u} \times \mathbf{v} \right) = - \ \mathbf{u \cdot}\left( \mathbf{w} \times \mathbf{v} \right)\ \ etc.$$

The scalar triple product measures *oriented* *volume*: the absolute value equals the volume of the *parallelopiped* spanned by the three vectors. The sign is a type of *orientation*, to be explored in Weeks 4 and 13. For now, a nod to the right-hand rule from elementary Physics class is the best we can do to explain the +/- signs.
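Likewise, the cross and scalar triple products are easy to check numerically. The sketch below verifies the orthogonality claim and one of the symmetries; the vectors are arbitrary. (It also peeks ahead: the triple product equals a 3-by-3 determinant, a Week 4 topic.)

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0])
v = np.array([0.0, 1.0, 3.0])
w = np.array([2.0, 0.0, 1.0])

uxv = np.cross(u, v)                       # (6, -3, 1)
print(np.dot(uxv, u), np.dot(uxv, v))      # 0.0 0.0 : orthogonal to both

triple = np.dot(u, np.cross(v, w))         # scalar triple product: 13.0
print(triple, np.dot(w, np.cross(u, v)))   # same value, by the symmetries
print(np.linalg.det(np.array([u, v, w])))  # also 13: a determinant in disguise
```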
VECTOR CALCULUS. We have enough tools at hand to attempt building a rudimentary vector calculus. Consider a parametrized curve $\gamma:\lbrack a,b\rbrack \rightarrow \mathbb{R}^{n}$ as a vector-valued function. At a given $t \in \lbrack a,b\rbrack$, the value $\gamma(t)$ is a *position vector* from the origin to the point on the curve. Each coordinate $x_{i}$ of $\gamma$ is a single-input single-output function. As such, classical single-variable calculus tells us how to define a derivative: $\gamma' = d\gamma/dt$ is a vector-valued function whose $i$^th^ term equals $dx_{i}/dt$, the rate of change of the $x_{i}$ coordinate with respect to $t$. Thinking of the parameter as time, the derivative is interpretable as a velocity vector, $\mathbf{v}(t)$. This velocity vector is tangent to the curve, as follows from the definition:

$$\frac{d\gamma}{dt} = \lim_{\epsilon \rightarrow 0}\frac{\gamma(t + \epsilon) - \gamma(t)}{\epsilon}$$

Note how this uses vector addition and scalar multiplication of vectors. One continues: the acceleration vector $\mathbf{a}(t)$ is the derivative of the velocity vector. In Physics, one tends to analyze the motion of objects along time-parametrized paths. This leads to discussions of the *unit tangent* $\widehat{T}$ and *unit normal* $\widehat{N}$ vectors, which are orthogonal and span the local plane of motion:

$$\widehat{T} = \frac{\mathbf{v}}{\left| \mathbf{v} \right|}\ \ \ \ \ \ \ \ :\ \ \ \ \ \ \ \widehat{N} = \frac{d\widehat{T}/dt}{|d\widehat{T}/dt|}\ .$$

These in turn can be used to define related quantities in Geometry, such as *curvature* and *torsion*, alluded to in the videos but not on our main storyline. Apart from Physics, vector-valued functions are of broad interest in Mathematics, Data Science, Machine Learning, Economics, and more. One wonders: how much of multivariable calculus can be built up from properties of vectors? For example, there are some intriguing differentiation rules that mimic the product rule, but with dot products and cross products:

$$\left( \mathbf{u} \cdot \mathbf{v} \right)' = \mathbf{u}' \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{v}'\ \ \ \ \ \ \ \ \ :\ \ \ \ \ \ \ \ \ \left( \mathbf{u} \times \mathbf{v} \right)' = \mathbf{u}' \times \mathbf{v} + \mathbf{u} \times \mathbf{v}'\ .$$

What can be done with integration? In the same way that we differentiate a vector-valued function term-by-term, one can integrate. This is of limited use. A more helpful way to use integrals and vectors is the following computation of arclength of a parametrized curve. By the Pythagorean Theorem applied to local changes in the position, one obtains an arclength element:

$$d\mathcal{l} = \left| \frac{d\gamma}{dt} \right|dt\ \ \ \ \ \ \ \ \ \ :\ \ \ \ \ \ \ \ \ \mathcal{l} = \int_{\gamma}^{}{d\mathcal{l}} = \int_{t = a}^{b}{\left| \frac{d\gamma}{dt} \right|dt}\ .$$

This simplifies the classical formula for arclength of a graph from single-variable calculus, while lifting it to arbitrary dimensions.

INTERLUDE. This arclength result -- in which adding a little vector algebra to classical single-variable calculus yields a formula that works in all dimensions -- is a short-lived victory. Tweaking single-variable calculus with vectors suffices for 19^th^ and 20^th^ century problems of projectiles and profits, satellites and sailboats. Consider the following contemporary challenges:

- Given an industrial plant manufacturing hundreds of products from thousands of components, how will output product prices change as a function of changes in input component prices?
- Coordinating a swarm of drones requires tracking positions, bearings, and velocities for every drone. If they communicate locally with neighbors, how long will it take for the swarm to align towards a goal?
- Electrical activity from hundreds of neurons is read as time series over dozens of experimental runs; how does one compute average activities and establish correlations between different neurons' activities?
- Viral infections spread based on social contact; viral propaganda spreads based on virtual contact over social networks. In what ways are epidemics and preference cascades comparable? How are their mathematical models similar or different?
- Large Language Models \[LLMs\] and other Neural Networks \[NNs\] are tuned by solving optimization problems over the relevant parameter spaces. Current LLMs have hundreds of billions of parameters.

To do contemporary multivariate calculus in full, we need more tools. This will be our focus for the next two weeks.
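The arclength integral is easy to approximate numerically: sample the curve finely and sum the lengths of inscribed segments. Here is a sketch for the helix that appears in Question 8 below, with radius and height chosen arbitrarily.

```python
# Approximate arclength of the helix (R cos 2πt, R sin 2πt, Ct), 0 <= t <= 1,
# by an inscribed polyline; compare against the exact sqrt((2πR)^2 + C^2).
import numpy as np

R, C = 2.0, 3.0
t = np.linspace(0.0, 1.0, 100001)
gamma = np.stack([R*np.cos(2*np.pi*t),
                  R*np.sin(2*np.pi*t),
                  C*t], axis=1)
segments = np.diff(gamma, axis=0)              # little displacement vectors
length = np.sum(np.sqrt(np.sum(segments**2, axis=1)))
print(length, np.sqrt((2*np.pi*R)**2 + C**2))  # both ~ 12.9195...
```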
### DISCUSSION

[QUESTION 1.]{.underline} For what value of constant $c$ are the planes $2cx - y + c^{2}z = 15$ and $x + 5cy - 3z = 4$ orthogonal? Is there more than one answer? Is there any answer?

This is a good time to remind students that when they see the word "orthogonal" they should think "dot product zero!" Does the idea of two planes being orthogonal make sense?

[QUESTION 2]{.underline}. What is the angle between the planes $x - 2y + 3z = 6$ and $2x + 3y - z = 11$?

This again gets at the question of what the angle between planes means. In this case, it is more ambiguous: is it the "smaller" angle or the "larger" one? If confused, students should bump the problem down to two lines in a plane. Ask leading questions about the formula for computing the angle to bring students back to the meaning (and domain and codomain) of the inverse cosine function.

[QUESTION 3.]{.underline} What is the angle between the grand diagonal of a cube in $\mathbb{R}^{n}$ and an incident edge?

This is a great problem to work through. Start by asking whether the orientation or size of the cube matters. Does the incident edge chosen matter? How many such edges are there? What is the angle in 2-D? in 3-D? What do you guess will happen as $n \rightarrow \infty$? What does in fact happen? If students are stuck, begin by suggesting that they assign one corner of the cube to the origin: that suggests using coordinates; the next suggestion is to recall the angle formula between two vectors.

[QUESTION 4.]{.underline} What is the area of the triangle in the plane with vertices at $(1,3)$, $( - 2,0)$, and $(5,2)$?

If stuck, help students by drawing the triangle in the plane. Is there anything that vectors can help with? It might take some imagination to get to a parallelogram cut in half.

[QUESTION 5.]{.underline} What is the volume of the parallelopiped spanned by the vectors $\widehat{i}$, $\widehat{j}$, and $\mathbf{v}$, where $\mathbf{v}$ has components $v_{x},\ v_{y},\ $and $v_{z}$?

There are several ways to do this so as to make the computations easy. After getting the (simple) final answer, think about why this makes sense geometrically. Why does the object have this volume? This is a foreshadowing of shearing that will be important in Week 4.

[QUESTION 6.]{.underline} If you move along a curve in 3-D at a constant speed, what can you say about the acceleration? Is it zero?

One can of course reverse the question later and ask if the acceleration is zero, what kind of path are you travelling on?

[QUESTION 7.]{.underline} Compute the velocity, acceleration, and arclength element of the curve with components ($\cosh t$, $\sinh t$, $\tanh t$).

This should begin with a long digression on the hyperbolic trig functions: their definition, their properties in terms of derivatives, and perhaps their Taylor expansions. If the arclength is set up, this provides an object lesson in how very few arclength integrals can be computed in full.

[QUESTION 8.]{.underline} Compute the length of a general helix in 3-D with radius $R$ and height $C$. What are the asymptotics for small $R$, $C$? Why does this make sense?

Begin with a parametrization, making sure to place bounds on the parameter. This is a good time to recall the standard parametrization of the circle: $x(t) = R\cos{2\pi t}\ ;\ y(t) = R\sin{2\pi t}\ ;\ 0 \leq t \leq 1$, as it will be so frequently used. What should the last coordinate be to have $0 \leq z \leq C$? When the answer is computed, try taking limits as different constants go to zero: as $R \rightarrow 0$, the helix becomes a straight line, but as $C \rightarrow 0$, the helix becomes a circle.
Can you infer a relationship to the classic Pythagoras Theorem?

[QUESTION 9.]{.underline} Compute the arclength of the parametrized curve in 4-D:

$$\gamma(t) = \begin{pmatrix} A\cos t \\ A\sin t \\ B\cos t \\ B\sin t \end{pmatrix}\ \ :\ \ 0 \leq t \leq 2\pi$$

This discussion should begin with trying to visualize what the curve looks like. Is it a circle? What happens when you plot this in the $(x_{1},x_{2})$ plane? in the $(x_{2},x_{4})$ plane? Thankfully, the arclength element is easily integrated. What are the asymptotics of the solution as $A \rightarrow 0$? This is a good problem for getting comfortable with parametrized curves in higher dimensions.

[QUESTION 10.]{.underline} What is the projected length of the vector $\mathbf{w}$ onto the "$\mathbf{v}$-axis":

$$\mathbf{w} = \begin{pmatrix} 5 \\ - 6 \\ 2 \\ - 7 \end{pmatrix}\ \ \ \ \ \ :\ \ \ \ \ \ \mathbf{v} = \begin{pmatrix} 0 \\ 3 \\ 4 \\ 0 \end{pmatrix}$$

One uses the dot product, normalizing $\mathbf{v}$ by length. However, what does (oriented) projected length mean? Consider the dot product with $\widehat{i}$ (or ${\widehat{e}}_{1}$ in higher dimensions). We know this means "record the $i$^th^ component." In fact, this very operation -- take a dot product with $\widehat{i}$, or project to the $x$-axis -- will in Week 12 be given a new name: $dx$.

[QUESTION 11.]{.underline} It was stated that the unit tangent $\widehat{T}$ and the unit normal $\widehat{N}$ vectors to a curve are always orthogonal, but this was never proved. Is it true? Why?

This is a good introduction to how to argue. Try to get students to start with a criterion for being orthogonal (dot product zero). How can this be shown? Let students struggle a bit with the definition of the unit normal vector as a derivative. Hint to the students that $\widehat{T}$ has unit length -- what does that mean in terms of the dot product? Just a little nudge to recall the product formula for derivatives and the dot product suffices to complete a proof. Close by asking the students why the dot-product rule for derivatives holds: that will come in Week 6.

[QUESTION 12.]{.underline} What is the average length (or length-squared) of a random binary vector in $n$ dimensions?

This is difficult/impossible to answer with the tools at hand and is meant to get students thinking and/or arguing. How can one interpret this geometrically? Is it any easier to compute the average dot product of two random binary vectors in $n$-D? How might these answers/guesses trend as $n \rightarrow \infty$?

[QUESTION 13.]{.underline} Why is it that two randomly-chosen vectors in a sufficiently high-dimensional $\mathbb{R}^{n}$ are nearly orthogonal with high probability?

This is hard to intuit and answer with precision. Perhaps the best appeal to intuition is to discuss cosine similarity and text embeddings. For a sufficiently well-embedded corpus of text, cosine similarity matches with semantic similarity. For such a subset of vectors, the problem becomes: what are the odds that two words chosen at random are semantically related? Some students will find it clear that the odds are near-zero. Thus, for this limited subset of high-dimensional vectors, the result is not at all paradoxical and in fact is intuitive. For students who wish to "do the math," try choosing a pair of vectors, each of whose components are drawn from the set $\{ - 1,\ 0,\ 1\}$ at random. Compute their dot products and lengths.
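For the "do the math" suggestion in Question 13, a Monte Carlo sketch: draw vectors with entries from $\{-1, 0, 1\}$ uniformly at random and watch the typical cosine similarity shrink as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_cosine(n, trials=2000):
    """Average |cos(angle)| between pairs of random {-1,0,1}-vectors in R^n."""
    u = rng.integers(-1, 2, size=(trials, n)).astype(float)
    v = rng.integers(-1, 2, size=(trials, n)).astype(float)
    dots = np.sum(u * v, axis=1)
    norms = np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
    ok = norms > 0                      # discard the (rare) zero vectors
    return np.mean(np.abs(dots[ok] / norms[ok]))

for n in (3, 30, 300, 3000):
    print(n, mean_abs_cosine(n))        # tends to 0: nearly orthogonal
```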
### ASSESSMENT PROBLEMS

[PROBLEM 1.]{.underline} Consider the following planes in $\mathbb{R}^{3}$, where $C$ is constant:

$$2Cx - 3y + (C + 4)z = 5$$

$$(C + 1)x + Cy - z = 1$$

A\) Find the value(s) of $C$ that makes these planes orthogonal.

B\) Explain why if the planes are orthogonal at one intersection point they are orthogonal at all intersection points.

[PROBLEM 2.]{.underline} Consider the following vectors in $\mathbb{R}^{3}$:

$$\mathbf{u} = \begin{pmatrix} 1 \\ 0 \\ - 1 \end{pmatrix}\ \ \ \ \ :\ \ \ \mathbf{v} = \begin{pmatrix} 2 \\ - 1 \\ 3 \end{pmatrix}\ \ \ \ :\ \ \ \mathbf{w} = \begin{pmatrix} - 2 \\ 0 \\ 2 \end{pmatrix}$$

A\) Compute the cross product $\mathbf{u} \times \mathbf{v}$.

B\) Use a scalar-triple-product to compute the volume of the "parallelopiped" generated by these three vectors $\mathbf{u,\ v,\ w}$.

[PROBLEM 3.]{.underline} Consider the following hyperplanes in $\mathbb{R}^{4}$:

$$x_{1} - 2x_{2} + 5x_{3} + x_{4} = 8$$

$$3x_{1} + 6x_{2} + x_{3} + 4x_{4} = 17$$

Identify vectors orthogonal to each hyperplane and use these to show that the hyperplanes are orthogonal.

[PROBLEM 4.]{.underline} Consider the following planes in $\mathbb{R}^{3}$:

$$2x - 3y + z = 5$$

$$3x + y - 2z = - 1$$

Identify vectors orthogonal to each plane and use these to compute a vector that is tangent to both planes.

[PROBLEM 5.]{.underline} Consider the following vector in $\mathbb{R}^{5}$:

$$\mathbf{v} = \begin{pmatrix} 2 \\ - 1 \\ 3 \\ - 1 \\ 1 \end{pmatrix}\ $$

A\) Give an example of a nonzero vector that is orthogonal to $\mathbf{v}$.

B\) Compute the angle between the vector $\mathbf{v}$ and the basis vector ${\widehat{e}}_{1}$.

[PROBLEM 6.]{.underline} Consider the following vectors in $\mathbb{R}^{4}$:

$$\mathbf{u} = \begin{pmatrix} 2 \\ 1 \\ - 2 \\ 0 \end{pmatrix}\ \ \ :\ \mathbf{v} = \begin{pmatrix} 2 \\ 1 \\ 2 \\ - 4 \end{pmatrix}\ $$

A\) Compute the angle between these two vectors.

B\) Find a vector that is orthogonal to both $\mathbf{u}$ and $\mathbf{v}$.

[PROBLEM 7.]{.underline} Consider the following vectors in $\mathbb{R}^{4}$:

$$\mathbf{u} = \begin{pmatrix} 4 \\ - 5 \\ 2 \\ - 2 \end{pmatrix}\ \ \ :\ \mathbf{v} = \begin{pmatrix} 3 \\ 0 \\ 4 \\ 0 \end{pmatrix}\ $$

A\) Compute the angle between these two vectors.

B\) Compute the projected length of $\mathbf{u}$ onto the "$\mathbf{v}$-axis".

[PROBLEM 8.]{.underline} Consider the following vectors in $\mathbb{R}^{3}$:

$$\mathbf{u} = \begin{pmatrix} 3 \\ - 1 \\ 2 \end{pmatrix}\ \ \ \ :\ \ \ \ \mathbf{v} = \begin{pmatrix} - 1 \\ 1 \\ 4 \end{pmatrix}\ \ \ \ :\ \ \ \ \mathbf{w} = \begin{pmatrix} 2 \\ 0 \\ - 3 \end{pmatrix}$$

A\) Without computing the angles between them, figure out which pair of vectors, above, has the largest and smallest angle between them.

B\) Compute the volume of the parallelopiped spanned by $\mathbf{u}$ and $\mathbf{v}$ and $\mathbf{w}$. *Show work.*

[PROBLEM 9.]{.underline} Consider the following vectors in $\mathbb{R}^{4}$

$$\mathbf{a} = \begin{pmatrix} 0 \\ - 1 \\ 0 \\ 2 \end{pmatrix}\ \ \ :\ \ \mathbf{b} = \begin{pmatrix} 0 \\ 3 \\ - 2 \\ - 1 \end{pmatrix}\ \ \ :\ \ \mathbf{c} = \begin{pmatrix} 1 \\ - 3 \\ 2 \\ 0 \end{pmatrix}\ \ \ :\ \mathbf{d} = \ \begin{pmatrix} - 1 \\ 1 \\ 0 \\ 2 \end{pmatrix}\ \ \ :\ \ \mathbf{e} = \begin{pmatrix} 1 \\ 2 \\ 0 \\ 4 \end{pmatrix}\ \ \ $$

A\) Which of the above vectors have a positive dot product with $\mathbf{c}$?

B\) Is there a pair of vectors from $\mathbf{a},\ \mathbf{b},\ \mathbf{c},\ \mathbf{d},\ \mathbf{e}$ that is orthogonal?
C\) Give an example of a nonzero vector that is orthogonal to both $\mathbf{a}$ and $\mathbf{b}$.

[PROBLEM 10.]{.underline} Consider the following four vectors in $\mathbb{R}^{3}$:

$$\mathbf{a} = \begin{pmatrix} 1 \\ 0 \\ - 3 \end{pmatrix}\ \ :\ \ \mathbf{b} = \begin{pmatrix} 0 \\ 2 \\ 5 \end{pmatrix}\ \ :\ \ \mathbf{c} = \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix}\ \ :\ \ \mathbf{d} = \begin{pmatrix} 4 \\ 1 \\ 0 \end{pmatrix}\ $$

A\) Is there a pair of orthogonal vectors among the above? *Explain.*

B\) Which three of the vectors above span a parallelopiped with the largest volume?

[PROBLEM 11.]{.underline} Consider the following two curves in $\mathbb{R}^{3}$:

$$\gamma_{1}(s) = \begin{pmatrix} s^{2} - 3s \\ e^{s} - 1 \\ 1 - \cos{2s} \end{pmatrix}\ \ \ \ :\ \ \ \ \gamma_{2}(t) = \begin{pmatrix} \sin(t - 1) \\ t^{2} + t - 2 \\ 1 - \sqrt{t} \end{pmatrix}$$

A\) Verify that these curves intersect at the origin for some values of $s$ and $t$.

B\) At what angle do these curves intersect at the origin?

C\) Find a vector that is orthogonal to both curves at the origin.

[PROBLEM 12.]{.underline} Consider the parametrized curve in 3-D given by

$$\gamma(t) = \begin{pmatrix} t^{2} - t + 4 \\ t^{3} - 3t^{2} + 2t - 1 \\ 2t \end{pmatrix}\begin{matrix} \leftarrow x \\ \leftarrow y \\ \leftarrow z \end{matrix}$$

A\) Compute the velocity vector of this curve.

B\) Write down the equation of a plane orthogonal to this curve at $\gamma(0)$.

C\) At what angle does this curve cross the $(x,y)$ plane $z = 0$ ? Explain your reasoning and give your answer as best you can without a calculator...

[PROBLEM 13.]{.underline} Consider the parametrized surface in 3-D given by

$$S\begin{pmatrix} u \\ v \end{pmatrix} = \left( \begin{array}{r} 4 \\ 0 \\ 3 \end{array} \right) + \left( \begin{array}{r} u^{2} \\ - 3u \\ 2u \end{array} \right) + \left( \begin{array}{r} 0 \\ - v^{2} \\ 5v \end{array} \right)$$

Note that the point $P = (5, - 4,10)$ is on this surface (at $u = v = 1$).

A\) Find two nonparallel vectors tangent to the surface at this point $P$.

B\) Find a nonzero vector perpendicular (i.e., orthogonal) to the surface at this point $P$.

[PROBLEM 14.]{.underline} Consider the parametrized curve in $\mathbb{R}^{4}$ given by

$$\gamma(s) = \begin{pmatrix} 3 - s \\ s^{2} + 2s \\ (s + 1)^{- 1} \\ - s \end{pmatrix}$$

A\) Compute the velocity vector to this curve at the point where $s = - 2$.

B\) Compute the acceleration vector to this curve at the point where $s = - 2$.

C\) Are the velocity and acceleration vectors at this point (where $s = - 2$) orthogonal? Why or why not?

[PROBLEM 15.]{.underline} Consider the following curve in $\mathbb{R}^{3}$:

$$\gamma(t) = \begin{pmatrix} (2t - 1)^{2} \\ t^{3} - 3t + 4 \\ 3t - 2 \end{pmatrix}\ \ \ \ \ :\ \ \ \ 0 \leq t \leq 2$$

A\) Compute the velocity vector of this curve.

B\) Compute the unit tangent vector at the point where $t = 1$.

C\) Set up but do not solve an integral to compute the arclength of this curve.

[PROBLEM 16.]{.underline} Consider the following curve in $\mathbb{R}^{6}$:

$$\gamma(t) = \begin{pmatrix} 5\cos t \\ 4t \\ 2t - 1 \\ 5 - 2t \\ 3\sin t \\ - 4\sin t \end{pmatrix}\ \ \ \ \ :\ \ \ \ 0 \leq t \leq \pi$$

A\) Compute the velocity vector of this curve.

B\) The length of the velocity vector is constant: what is it?

C\) Compute the arclength of this curve in $\mathbb{R}^{6}$.
[PROBLEM 17.]{.underline} Consider the following curve in $\mathbb{R}^{4}$, where $C$ is constant:

$$\gamma(t) = \begin{pmatrix} 5\cos t \\ C\sin t \\ 4\cos t \\ - 6\sin t \end{pmatrix}\ \ \ \ \ :\ \ \ \ 0 \leq t \leq \pi$$

A\) Compute the velocity and acceleration vectors of this curve.

B\) Compute and simplify (if you can) the arclength element $d\mathcal{l}$ for this curve.

C\) The arclength element is constant for some value(s) of $C$: for what value(s)?

[PROBLEM 18.]{.underline} Consider the parametrized curve in 3-D given by

$$\gamma(t) = \begin{pmatrix} 3 - t \\ t^{3} + t - 2 \\ t^{2} - 3t + 4 \end{pmatrix}\begin{matrix} \leftarrow x \\ \leftarrow y \\ \leftarrow z \end{matrix}$$

A\) Compute the velocity vector of this curve and evaluate at $t = 1$.

B\) Compute the acceleration vector of this curve and evaluate at $t = 1$.

C\) At the point $\gamma(1) = (2,0,2)$, the velocity and acceleration vectors lie in a plane. Compute a nonzero vector that is orthogonal to this plane.

[PROBLEM 19.]{.underline} Consider the curve given by

$$\gamma(t) = \begin{pmatrix} t^{2} - 5t + 4 \\ t(t - 1)(t - 2) \\ e^{t - 1} - 1 \end{pmatrix}$$

A\) Compute the velocity and acceleration vectors to this curve and evaluate these at the point where $\gamma$ intersects the origin.

B\) Compute an implicit equation for a plane in $\mathbb{R}^{3}$ orthogonal to the curve $\gamma$ at the origin in the form \_\_\_\_\_\_\_ $x\ +$ \_\_\_\_\_\_\_ $y\ +$ \_\_\_\_\_\_\_ $z\ =$ \_\_\_\_\_\_\_.

[PROBLEM 20.]{.underline} Consider the parametrized path

$$\gamma(t) = \begin{pmatrix} \ln{(1 + t)} \\ e^{3t} \\ \sin{(2t)} \end{pmatrix}$$

A\) Compute the velocity ($\mathbf{v}$) and acceleration ($\mathbf{a}$) vectors to this curve.

B\) Compute the cross product $\mathbf{v}(0) \times \mathbf{a}(0)$.

### ANSWERS & HINTS

[PROBLEM 1.]{.underline} A) set the dot product of coefficient vectors to zero to get $C^{2} - C - 2 = 0$ hence $C = 2$ or $C = - 1$ ; B) The vectors orthogonal to the planes are constant, and the dot product of those two vectors determines the angle of intersection: it is constant.

[PROBLEM 2.]{.underline} A) $\mathbf{u \times v} = \left( \begin{array}{r} - 1 \\ - 5 \\ - 1 \end{array} \right)$ ; B) $\mathbf{u} \cdot \left( \mathbf{v \times w} \right) = \mathbf{w} \cdot \left( \mathbf{u \times v} \right) = \left( \begin{array}{r} - 2 \\ 0 \\ 2 \end{array} \right) \cdot \left( \begin{array}{r} - 1 \\ - 5 \\ - 1 \end{array} \right) = 0$

[PROBLEM 3.]{.underline} The coefficient vectors satisfy $\left( \begin{array}{r} 1 \\ - 2 \\ 5 \\ 1 \end{array} \right) \cdot \left( \begin{array}{r} 3 \\ 6 \\ 1 \\ 4 \end{array} \right) = 0$

[PROBLEM 4.]{.underline} take the cross product of the orthogonal vectors to the planes to obtain $\left( \begin{array}{r} 2 \\ - 3 \\ 1 \end{array} \right) \times \left( \begin{array}{r} 3 \\ 1 \\ - 2 \end{array} \right) = \left( \begin{array}{r} 5 \\ 7 \\ 11 \end{array} \right)$

[PROBLEM 5.]{.underline} A) choose any vector with dot product zero ; B) $\arccos\left( \frac{1}{2} \right) = \frac{\pi}{3}$

[PROBLEM 6.]{.underline} A) $\arccos(1/15)$ ; B) do not try to take a cross product, as it is undefined! Instead, set both dot products to zero and solve: e.g., $(1,0,1,1)$ works.
[PROBLEM 7.]{.underline} A) $\arccos(4/7)$; B) $\mathbf{u \cdot v/}\left| \mathbf{v} \right| = 4$

[PROBLEM 8.]{.underline} A) note that $\mathbf{u \cdot v} > 0$, $\mathbf{v \cdot w} < 0$, and $\mathbf{u \cdot w} = 0$, so that $\mathbf{u,v}$ are acute (the smallest angle) and $\mathbf{v,\ w}$ are obtuse (the largest angle) ; B) $\mathbf{u \cdot}\left( \mathbf{v \times w} \right)\mathbf{=} - 18$, so the volume is $18$

[PROBLEM 9.]{.underline} A) $\mathbf{a}$, and do not forget $\mathbf{c}$*!* ; B) no; all dot products are nonzero ; C) make up a vector with dot product zero, or take the cross product of the latter three terms in $\mathbf{a,\ b}$ (since these lie in a 3-dimensional subspace)

[PROBLEM 10.]{.underline} A) no ; B) scalar triple products are $\pm 9,\ \pm 19,\ \pm 33,\ \pm 55$, $(\mathbf{b,c,d})$ is largest

[PROBLEM 11.]{.underline} A) let $s = 0,\ t = 1$; B) $\gamma_{1}'(0) = \left( \begin{array}{r} - 3 \\ 1 \\ 0 \end{array} \right)$ and $\gamma_{2}'(1) = \left( \begin{array}{r} 1 \\ 3 \\ - \frac{1}{2} \end{array} \right)$ have dot product zero and thus are orthogonal ; C) via cross product or observation, $C\left( \begin{array}{r} 1 \\ 3 \\ 20 \end{array} \right)$

[PROBLEM 12.]{.underline} A) $\gamma'(t) = \left( \begin{array}{r} 2t - 1 \\ 3t^{2} - 6t + 2 \\ 2 \end{array} \right)$; B) $2y - x + 2z = - 6$ ; C) take the dot product of $\gamma'(0)$ with $\widehat{k}$; the formula yields an angle of $\arccos\left( \frac{2}{3} \right)$ with the $z$-axis or, better still, an angle of $\frac{\pi}{2} - \arccos\left( \frac{2}{3} \right)$ with the $(x,y)$ plane

[PROBLEM 13.]{.underline} A) compute the velocity vectors with respect to $u$ and $v$ to obtain tangent vectors $\left( \begin{array}{r} 2u \\ - 3 \\ 2 \end{array} \right)$ and $\left( \begin{array}{r} 0 \\ - 2v \\ 5 \end{array} \right)$, with evaluation at $u = v = 1$ yielding $\left( \begin{array}{r} 4 \\ - 3 \\ 2 \end{array} \right)$ and $\left( \begin{array}{r} 0 \\ - 4 \\ 5 \end{array} \right)$ ; B) the cross product of these two vectors yields $\left( \begin{array}{r} - 7 \\ - 20 \\ - 16 \end{array} \right)$

[PROBLEM 14.]{.underline} A) $\gamma'( - 2) = \left( \begin{array}{r} - 1 \\ - 2 \\ - 1 \\ - 1 \end{array} \right)$; B) $\gamma^{''}( - 2) = \left( \begin{array}{r} 0 \\ 2 \\ - 2 \\ 0 \end{array} \right)$; C) $\gamma'( - 2) \cdot \gamma^{''}( - 2) = - 2 \neq 0$, so the vectors are not orthogonal here

[PROBLEM 15.]{.underline} A) $\gamma'(t) = \left( \begin{array}{r} 4(2t - 1) \\ 3(t^{2} - 1) \\ 3 \end{array} \right)$ ; B) $\widehat{T}(1) = \frac{1}{5}\left( \begin{array}{r} 4 \\ 0 \\ 3 \end{array} \right)$ ; C) the arclength is

$$\mathcal{l =}\int_{t = 0}^{2}{\sqrt{16(2t - 1)^{2} + 9\left( t^{2} - 1 \right)^{2} + 9}\ dt}$$

[PROBLEM 16.]{.underline} A) $\gamma'(t) = \begin{pmatrix} - 5\sin t \\ 4 \\ 2 \\ - 2 \\ 3\cos t \\ - 4\cos t \end{pmatrix}\ $ ; B) $\left| \gamma' \right| = \sqrt{49} = 7$ ; C) $\mathcal{l = \int}d\mathcal{l =}7\pi$

[PROBLEM 17.]{.underline} A) $\gamma'(t) = \begin{pmatrix} - 5\sin t \\ C\cos t \\ - 4\sin t \\ - 6\cos t \end{pmatrix}$ and $\gamma^{''}(t) = - \gamma(t)$; B) $d\mathcal{l =}\sqrt{41\sin^{2}t + \left( 36 + C^{2} \right)\cos^{2}t}\ dt$ ; C) choose $C = \pm \sqrt{5}\ $ to get $d\mathcal{l =}\sqrt{41}\ dt$

[PROBLEM 18.]{.underline} A) $\gamma' = \left( \begin{array}{r} - 1 \\ 3t^{2} + 1 \\ 2t - 3 \end{array} \right)$ and $\gamma'(1) = \left( \begin{array}{r} - 1 \\ 4 \\ - 1 \end{array} \right)$; B) $\gamma^{''} = \left( \begin{array}{r} 0 \\ 6t \\ 2 \end{array} \right)$ so $\gamma^{''}(1) = \left( \begin{array}{r} 0 \\ 6 \\ 2 \end{array} \right)$ ; C) it suffices to take the cross product or by observation choose $C\left( \begin{array}{r} 7 \\ 1 \\ - 3 \end{array} \right)$
[PROBLEM 19.]{.underline} A) $\gamma' = \left( \begin{array}{r} 2t - 5 \\ 3t^{2} - 6t + 2 \\ e^{t - 1} \end{array} \right)$ and $\gamma^{''} = \left( \begin{array}{r} 2 \\ 6t - 6 \\ e^{t - 1} \end{array} \right)$ at origin ($t = 1$) these evaluate to $\left( \begin{array}{r} - 3 \\ - 1 \\ 1 \end{array} \right)$ and $\left( \begin{array}{r} 2 \\ 0 \\ 1 \end{array} \right)$; B) use $\gamma'(1)$ to obtain $- 3x - y + z = 0$

[PROBLEM 20.]{.underline} A) $\gamma' = \left( \begin{array}{r} (1 + t)^{- 1} \\ 3e^{3t} \\ 2\cos{2t} \end{array} \right)$ and $\gamma^{''} = \left( \begin{array}{r} - (1 + t)^{- 2} \\ 9e^{3t} \\ - 4\sin{2t} \end{array} \right)$; B) since $\sin 0 = 0$, $\mathbf{v}(0) \times \mathbf{a}(0) = \left( \begin{array}{r} 1 \\ 3 \\ 2 \end{array} \right) \times \left( \begin{array}{r} - 1 \\ 9 \\ 0 \end{array} \right) = \left( \begin{array}{r} - 18 \\ - 2 \\ 12 \end{array} \right)$

## Week 3 : Matrix Algebra

### OUTLINE

MATERIALS: Calculus BLUE : Vol 1 : Chapters 9-13

TOPICS:

- Matrices; sizes; specials (identity, zero, diagonal, triangular)
- The transpose operation on matrices
- The use of matrices as data structures
- Matrix multiplication, including matrix-vector multiplication
- Square matrices and powers
- Block matrices and their products
- Linear systems of equations: $A\mathbf{x} = \mathbf{b}$
- Row operations and row reduction
- Row reduction and back-substitution of augmented matrices
- Inverse matrices: definition, computation, and use

LEARNING OBJECTIVES:

- Identify sizes, rows, and columns of matrices
- Compute matrix-vector and matrix-matrix products
- Recognize that matrix multiplication is associative but not commutative
- Compute and work with the transpose of a matrix
- Convert linear systems of equations into the form $A\mathbf{x} = \mathbf{b}$
- Solve $A\mathbf{x} = \mathbf{b}$ via row reduction and back-substitution
- Use the formula for the inverse of a 2-by-2 matrix
- Compute the inverse of a square matrix via row reduction
- Solve $A\mathbf{x} = \mathbf{b}$ for $\mathbf{x}$ given the inverse $A^{- 1}$
- Compute products, powers, and inverses of simple block matrices

### PRIMER

MATRICES. An $m$-by-$n$ matrix $A = \left( A_{ij} \right)$ is, initially, a finite 2-dimensional array indexed by $m$ rows and $n$ columns. These are ubiquitous in applications. Of the many types of matrices relevant to calculus, the most important include the *square* matrices (where $m = n$), the *diagonal* matrices (which satisfy $A_{ij} = 0$ for all $i \neq j$), and the *triangular* matrices (where, for an *upper* or *lower* triangular matrix, $A_{ij} = 0$ for all $i > j$ and $i < j$ respectively). The *zero* matrix $Z$ consists of all zeros; the *identity* matrix $I$ is a square diagonal matrix whose diagonal entries are all one. These are both examples of *binary* matrices, whose entries are either zero or one.

Remembering the index ordering is necessary. The entry $A_{ij}$ lies in the $i$^th^ row and the $j$^th^ column. An $n$-by-$1$ matrix can be thought of as a (column) vector, and we will often conflate the two. A $1$-by-$n$ matrix can be rightly called a *row vector*. The *transpose* of a matrix $A = (A_{ij})$ is the matrix $A^{T}$ whose rows and columns are exchanged: thus, $\left( A^{T} \right)_{ij} = A_{ji}$. One imagines flipping all entries across the matrix diagonal. The transpose of a vector, $\mathbf{v}^{T}$, can be thought of as a *row vector*.
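A minimal sketch of this vocabulary in numpy (an assumption; any matrix-capable environment works): sizes, the transpose, and the special matrices named above.

```python
import numpy as np

A = np.array([[5, 1],
              [1, 3],
              [-2, 0]])     # a 3-by-2 matrix: A[i, j] is row i, column j
print(A.shape)              # (3, 2)
print(A.T)                  # the transpose: a 2-by-3 matrix
print(np.eye(3))            # the 3-by-3 identity matrix I
print(np.diag([1, 2, 3]))   # a diagonal matrix
print(np.zeros((2, 3)))     # a zero matrix Z
```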
The simplest algebraic operations on matrices mimic those of vectors -- addition and scalar multiplication -- acting termwise. The truly useful and interesting operation is that of *matrix multiplication*, which generalizes the dot product of vectors. For matrices $A$ and $B$, the product $AB$ is defined if the number of columns of $A$ agrees with the number of rows of $B$. The formula is: $$(AB)_{ij} = \sum_{k}{A_{ik}B_{kj}}\ .$$ This specializes to the case of matrix-vector multiplication $A\mathbf{v}$, where $\mathbf{v}$ is a vector whose size matches the number of columns of $A$. This multiplication is *associative*, meaning that $(AB)C = A(BC)$; it is not necessarily *commutative*, since $AB \neq BA$ in general (even if they are of the same size). The zero matrix $Z$ acts like a "$0$" with respect to multiplication, and the identity matrix $I$ acts like a "$1$" under multiplication: $AI = IA = A$ for all square $A$.

Understanding well how matrix multiplication operates is critical to our larger story of working with multivariate functions. From the definition, one can think of the entry $(AB)_{ij}$ of a product matrix $AB$ as being the dot product of the $i$^th^ row of $A$ with the $j$^th^ column of $B$. This is best illustrated as putting the product *in the corner*. In the case of matrix-vector multiplication, there is a very useful alternate way of thinking about the product. Given a vector $\mathbf{v} = (v_{i})$, the product $A\mathbf{v}$ can be written as a linear combination of columns of $A$, weighted by the terms in $\mathbf{v}$. Internalizing this perspective now will be very useful.

LINEAR SYSTEMS. Systems of linear (or *affine*) equations can be put into matrix-vector form as a single equation $A\mathbf{x} = \mathbf{b}$, where $\mathbf{x}$ is the vector of unknowns to be solved for. This type of equation is of supreme importance in applications, and knowing how to solve such a system reliably pays off well beyond this course. The high-school algebra approach of combination and substitution generalizes to a formal algorithm called *row-reduction*. There are three *row operations* which can be applied to an *augmented matrix* of the form $\lbrack\ A\ |\ \mathbf{b}\ \rbrack$ and which preserve solutions. These row operations are as follows:

- R1 : switch two rows : $R_{i} \leftrightarrow R_{j}$
- R2 : multiply a row by a nonzero scalar : $R_{i} \mapsto cR_{i}$
- R3 : combine a multiple of one row with another : $R_{i} \mapsto R_{i} + cR_{j}$

These operations can be performed sequentially to reduce an augmented matrix to *row-echelon form*, making the matrix as triangular as possible, with extra zeros perhaps. At any point during row reduction, one may write out the resulting system of modified equations to solve for the variables $\mathbf{x}$. Doing so post-reduction to row-echelon form is called *back-substitution*. There are some subtleties to performing row-reduction. One very much wants the upper-left corner entry (the *pivot*) to be nonzero; a value of $1$ here is ideal. Note that the third row operation R3 does not rescale $R_{i}$: this will be relevant in Week 4 when computing determinants.

INVERSES. One cannot divide by a matrix to solve $A\mathbf{x} = \mathbf{b}$ as $\mathbf{x} = \mathbf{b}/A$. However, for certain square matrices $A$, it is possible to define something in the spirit of a reciprocal. Such an *inverse matrix* is defined to be a matrix $A^{- 1}$ such that $A^{- 1}A = I = AA^{- 1}$.
Such can only exist in the case of a square matrix, and, even then, existence is not assured. For a 2-by-2 matrix, there is a convenient general formula: $$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\ \ \ :\ \ \ A^{- 1} = \frac{1}{ad - bc}\begin{bmatrix} d & - b \\ - c & a \end{bmatrix}\ .$$ This quantity in the denominator seems to determine whether the inverse exists. When the inverse does exist, it makes short work of solving $A\mathbf{x} = \mathbf{b}$ via $$A\mathbf{x} = \mathbf{b}\ \ \ \Rightarrow \ \ \ A^{- 1}\left( A\mathbf{x} \right) = A^{- 1}\mathbf{b}\ \ \ \Rightarrow \ \ \ \mathbf{x} = A^{- 1}\mathbf{b}\ .$$ To find the inverse of a square matrix $A$ one can apply row operations to an identity-augmented matrix $\lbrack\ A\ |\ I\ \rbrack$ *ad nauseam* until reducing the left side to the identity, so that the right side reveals the inverse: $\lbrack\ A\ |\ I\ \rbrack \rightarrow \lbrack\ I\ |\ A^{- 1}\ \rbrack$. This is computationally tedious, but it does work, if the inverse exists. Inverses of products (assuming existence) follow a pattern reminiscent of transposes: $(AB)^{- 1} = B^{- 1}A^{- 1}$. The few simple cases where inverse matrices can be computed without too much trouble include *triangular* matrices (where, in addition, all the diagonal terms are nonzero) and *block-diagonal* matrices with small invertible blocks along the generalized diagonal.

### DISCUSSION

[QUESTION 1.]{.underline} What are the sizes of the following matrices? Which pairwise products are well-defined? $$A = \begin{bmatrix} 5 & 1 \\ 1 & 3 \\ - 2 & 0 \end{bmatrix}\ \ \ :\ B = \begin{bmatrix} - 3 & 0 & - 5 \\ - 1 & 7 & 1 \\ 2 & 1 & 4 \end{bmatrix}\ \ :\ C = \begin{bmatrix} 0 & 4 & 7 \\ 1 & - 5 & 3 \end{bmatrix}$$ This question can of course be done with random matrices made up on-the-spot. Follow up with: what are the transposes of these matrices?

[QUESTION 2.]{.underline} Can you do the following matrix-vector product in your head? $$\begin{bmatrix} 5 & - 7 & 1 & - 9 \\ 1 & 3 & 0 & 11 \\ 2 & - 2 & - 3 & 7 \\ 0 & 5 & 1 & 2 \end{bmatrix}\begin{pmatrix} 1 \\ 0 \\ - 2 \\ 0 \end{pmatrix}\ \ \ \ \ \ \ \ \ \ :\ \ \ \ \ \ \ solution = \ \begin{pmatrix} 5 \\ 1 \\ 2 \\ 0 \end{pmatrix} - 2\begin{pmatrix} 1 \\ 0 \\ - 3 \\ 1 \end{pmatrix}$$ Emphasize that this way of thinking in terms of linear combinations of columns will be very important next week and in the future.

[QUESTION 3.]{.underline} Consider a pair of vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^{n}$. Can you interpret the product $\mathbf{u}^{T}\mathbf{v}$? How does it compare to $\mathbf{v}^{T}\mathbf{u}$? What about $\mathbf{uv}^{T}$? This helps solidify the dot product from last week, while giving a useful new notation to be seen later.

[QUESTION 4.]{.underline} Solve the following linear system: $$\begin{aligned} x - y + 2z &= 0 \\ 2x + y - 3z &= 1 \\ - 3x + 2y + z &= 2 \end{aligned}$$ This is surely just as easy to solve via high-school algebra, but the point is not to get the solution quickly; it is to learn an algorithmic approach that will work even when the system is huge.

[QUESTION 5.]{.underline} Let $c$ be a constant. For what values of $c$ does this system have a solution? $$\begin{aligned} x + y + cz &= 1 \\ x + cy + z &= 1 \\ cx + y + z &= c \end{aligned}$$ This is a little tedious, but it communicates that solutions do not necessarily exist, and how to discern this from a row reduction.
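For instructors who want to show the algorithm running mechanically, the following is a minimal sketch of row reduction with back-substitution in plain Python (an editorial illustration only -- the function names `row_reduce` and `back_substitute` are ours, not part of the videotext):

```python
def row_reduce(M):
    """Reduce an augmented matrix (list of rows) to row-echelon form
    using the row operations R1 (swap) and R3 (combine)."""
    M = [row[:] for row in M]              # work on a copy
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols - 1):            # last column is the right-hand side b
        # R1: find a row with a nonzero entry in this column and swap it up
        pivot = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        # R3: clear the entries below the pivot
        for r in range(pivot_row + 1, rows):
            c = M[r][col] / M[pivot_row][col]
            M[r] = [a - c * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M

def back_substitute(M):
    """Solve a row-reduced square augmented system from the bottom up."""
    n = len(M)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][-1] - s) / M[i][i]
    return x

# Question 4 above: x - y + 2z = 0, 2x + y - 3z = 1, -3x + 2y + z = 2
print(back_substitute(row_reduce([[1, -1, 2, 0], [2, 1, -3, 1], [-3, 2, 1, 2]])))
```

Running this prints $\lbrack 0.5,\ 1.5,\ 0.5\rbrack$, which students can check against a hand computation of Question 4.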
[QUESTION 6.]{.underline} Recall that in Week 1, Question 2, we tried to parametrize the intersection of the planes $$3x + y - z = 4\ \ \ \ \ :\ \ \ \ x - 2y + z = 1\ .$$ Set this up as a row-reduction problem and obtain an answer. This is a good pattern to get into -- reflecting on previous problems using new technology.

[QUESTION 7.]{.underline} Compute the inverse of the following 3-x-3 matrix the long way, row-reducing an augmented matrix, to show $$\begin{bmatrix} 1 & 0 & 2 \\ 3 & 1 & 0 \\ 0 & - 1 & - 2 \end{bmatrix}^{- 1} = \ \frac{1}{8}\begin{bmatrix} 2 & 2 & 2 \\ - 6 & 2 & - 6 \\ 3 & - 1 & - 1 \end{bmatrix}$$ Do you think there is a general formula for computing 3-x-3 inverses? Yes. Yes, there is. Will we learn it? No, we will not.

[QUESTION 8.]{.underline} How would you invert the following 4-by-4 matrix? $$A = \begin{bmatrix} 2 & 5 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 5 & 3 \\ 0 & 0 & 3 & 2 \end{bmatrix} = \begin{bmatrix} B & 0 \\ 0 & C \end{bmatrix}$$ This raises the important topic of block- and block-diagonal matrices. Encourage students to figure out the properties of this structure and how it behaves under multiplication. Students may guess at the solution: how would you confirm this? Students may opt to begin with row-reduction -- don't discourage that, but rather encourage them to notice what is happening structurally. Ex post, one can mention how important block-diagonal matrices are in decoupling large systems.

[QUESTION 9.]{.underline} As it is difficult to compute an inverse of a large matrix, is there ever any justification for doing so? Is it not always better simply to row-reduce $A\mathbf{x} = \mathbf{b}$? Try to get students to think of a situation where the $\mathbf{b}$ vector is changing or otherwise full of parameters that can vary. It is perhaps worth noting that explicit inverses of large matrices are almost never computed in practice: there are better algorithmic approaches.

[QUESTION 10.]{.underline} Consider the matrix equation $AB = 0$. If these were scalars, then you know that either $A$ or $B$ would have to be zero. Can you find an example where this is not the case for matrices? This will probably be too challenging. Give a hint that it can be done with a single 2-x-2 matrix whose square vanishes. A very important example of such a matrix is $$N = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\ .$$ This is a nilpotent matrix -- its powers vanish. This matrix and matrices like it have a prominent role to play in Linear Algebra.

[QUESTION 11.]{.underline} You can compute a square or a cube of a (square) matrix, but can you compute an arbitrary power? Try it with a diagonal matrix... Now try: $$A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}\ .$$ Students can perhaps guess at a pattern after a few powers. For a more principled solution (that will assist in Linear Algebra courses) remind them of the Binomial Theorem and compute $A^{n}$ as $$\begin{aligned} A^{n} = (2I + N)^{n} &= (2I)^{n} + n(2I)^{n - 1}N + \begin{pmatrix} n \\ 2 \end{pmatrix}(2I)^{n - 2}N^{2} + \cdots \\ &= 2^{n}I + n2^{n - 1}N = \begin{bmatrix} 2^{n} & n2^{n - 1} \\ 0 & 2^{n} \end{bmatrix} \end{aligned}$$ This is a rather advanced problem: not for the beginner. Be sure to have done Question 10.

[QUESTION 12.]{.underline} Why is it that the inverse of the product of two invertible matrices is the product of the inverses, but in reverse order? That is, $$(AB)^{- 1} = B^{- 1}A^{- 1}$$ Students should use the definition of the inverse and the associativity of multiplication.
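If the discussion stalls, the entire verification fits in one line of board work, using only associativity and the definition of the inverse: $$(AB)\left( B^{- 1}A^{- 1} \right) = A\left( BB^{- 1} \right)A^{- 1} = AA^{- 1} = I\ ,$$ with the mirror-image computation $\left( B^{- 1}A^{- 1} \right)(AB) = I$ confirming that $B^{- 1}A^{- 1}$ satisfies the definition of $(AB)^{- 1}$.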
[QUESTION 13.]{.underline} Is it the case that the inverse of a power of an invertible matrix is the power of the inverse? That is, is it true that: $$\left( A^{n} \right)^{- 1} = \left( A^{- 1} \right)^{n} =: A^{- n}$$ Students should again use the definition of the inverse. This is a good opportunity to discuss the role of notation in Mathematics: we set up the notation to help us think clearly. As a follow-up question, ask what -- if the notation is consistent -- the definition of $A^{0}$ should be for a square matrix.

[QUESTION 14.]{.underline} Why is it that the transpose of the product of two matrices is the product of the transposes, but in reverse order? $$(AB)^{T} = B^{T}A^{T}$$ The argument used with inverses no longer works... What to do? Students will likely struggle with this one, probably resorting eventually to verifying it on an example. Point out that although this does not suffice, it does send one in the right direction. This is a good time to remind students about the power and convenience of multiplying matrices by putting the product "in the corner." Upon so doing, flipping the entire diagram along the diagonal of the product matrix reveals truth.

### ASSESSMENT PROBLEMS

[PROBLEM 1.]{.underline} Consider the following matrices / vectors: $$A = \begin{bmatrix} 4 & - 3 \\ 2 & 1 \end{bmatrix}\ \ \ :\ \ B = \begin{bmatrix} 1 & 2 & 0 \\ - 1 & 4 & 3 \end{bmatrix}\ \ \ :\ \ \mathbf{u} = \begin{pmatrix} 3 \\ - 2 \end{pmatrix}\ \ \ :\ \ \mathbf{v} = \begin{pmatrix} 4 \\ 1 \\ - 1 \end{pmatrix}$$ Compute the following products, if possible: if not, explain why not.

A\) $AB$ B) $B\mathbf{v}$ C) $\mathbf{u}^{T}A$ D) $A\mathbf{u}$ E) $A^{2}$

[PROBLEM 2.]{.underline} Consider the following matrices / vectors: $$A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \\ - 1 & 4 \end{bmatrix}\ \ \ :\ \ B = \begin{bmatrix} - 1 & 1 & 0 \\ 3 & - 6 & 4 \\ 1 & 0 & 2 \end{bmatrix}\ \ \ :\ \ \mathbf{x} = \begin{pmatrix} 1 \\ - 1 \end{pmatrix}\ \ \ :\ \ \mathbf{y} = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}$$ Compute the following products, if possible: if not, explain why not.

A\) $BA$ B) $B\mathbf{y}$ C) $\mathbf{x}^{T}A$ D) $A\mathbf{x}$ E) $A^{2}$

[PROBLEM 3.]{.underline} Consider the following matrices / vectors: $$A = \begin{bmatrix} 1 & - 1 \\ 0 & 2 \\ - 1 & 0 \end{bmatrix}\ :B = \begin{bmatrix} 3 & 1 & 2 \\ 5 & - 1 & 2 \end{bmatrix}\ :\ \mathbf{x} = \begin{pmatrix} 2 \\ - 1 \end{pmatrix}\ :\mathbf{y} = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$ Compute the following products, if possible: if not, explain why it's not possible.

A\) $(BA)^{2}$ B) $\mathbf{y}^{T}A^{T}$ C) $\left( A\mathbf{x} \right)^{T}$ D) $(AB)\mathbf{x}$

[PROBLEM 4.]{.underline} Consider the following matrix and vectors: $$A = \begin{bmatrix} 1 & - 1 & 2 & 2 \\ - 1 & 3 & 0 & 1 \\ 2 & 1 & - 3 & 1 \\ 4 & 0 & 1 & 0 \end{bmatrix}\ \ \ :\ \ \mathbf{u} = \begin{pmatrix} 1 \\ 0 \\ - 2 \\ 1 \end{pmatrix}\ \ \ :\ \ \mathbf{v} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 2 \end{pmatrix}\ \ \ :\ \ \mathbf{w} = \begin{pmatrix} - 2 \\ - 1 \\ 1 \\ - 3 \end{pmatrix}$$

A\) Evaluate the dot product $\mathbf{v} \cdot A\mathbf{u}$ if possible. If not, explain why not.

B\) Compute $A\mathbf{u} + A\mathbf{v} + A\mathbf{w}$ .

C\) Compute the quantity $\left( \mathbf{v}\ \mathbf{w}^{T} \right)\mathbf{u}$ if it exists; if not, explain why not.
[PROBLEM 5.]{.underline} Consider the following matrices / vectors: $$A = \begin{bmatrix} 3 & 6 \\ - 2 & 5 \\ 7 & - 1 \end{bmatrix}\ :B = \begin{bmatrix} 4 & 1 & 3 \\ 2 & - 6 & 0 \end{bmatrix}\ :\ \mathbf{x} = \begin{pmatrix} 2 \\ - 1 \end{pmatrix}\ :\mathbf{y} = \begin{pmatrix} 3 \\ 0 \\ - 2 \end{pmatrix}$$ Compute the following products, if possible: if not, explain why not.

A\) $AB$ B) $B\mathbf{y}$ C) $\mathbf{x}B$ D) $\mathbf{x}^{T}B\mathbf{y}$ E) $A^{T}$

[PROBLEM 6.]{.underline} Consider the following matrix product. $$AB = \begin{bmatrix} 1 & - 4 & 0 & 7 \\ 7 & 3 & 2 & - 3 \\ - 4 & 1 & 4 & 8 \\ 0 & - 3 & 11 & - 3 \\ 3 & 2 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & - 5 & - 4 & 1 & - 2 \\ 2 & 5 & 4 & 3 & - 3 \\ 0 & 1 & 8 & 9 & 3 \\ 3 & 2 & 2 & - 1 & 8 \end{bmatrix}$$

A\) What are the sizes of the two matrices $A$ and $B$ (write as \_\_\_-by-\_\_\_) and what is the size of the product matrix $AB$?

B\) Compute the $(2,\ 3)$ entry of the product matrix: $(AB)_{2,3}$.

C\) Write out the transpose $B^{T}$.

[PROBLEM 7.]{.underline} Consider the following matrix and vectors: $$A = \begin{bmatrix} 1 & - 1 & 2 & 2 \\ - 1 & 3 & 0 & 1 \\ 2 & 1 & - 3 & 1 \\ 4 & 0 & 1 & 0 \end{bmatrix}\ \ \ :\ \ \mathbf{u} = \begin{pmatrix} 2 \\ 0 \\ 2 \\ 0 \end{pmatrix}\ \ \ :\ \ \mathbf{v} = \begin{pmatrix} 0 \\ 1 \\ 1 \\ - 1 \end{pmatrix}\ \ \ :\ \ \mathbf{w} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$$

A\) Evaluate the dot product $\mathbf{v} \cdot A\mathbf{u}$

B\) Which of the vectors, $A\mathbf{u},\ A\mathbf{v},$ or $A\mathbf{w}$ is longest?

C\) Compute the quantity $\left( \mathbf{v}\ \mathbf{w}^{T} \right)\mathbf{u}$ (if it exists ; if not, explain why not).

[PROBLEM 8.]{.underline} Consider the following pair of linear equations: $$\begin{aligned} 3x - 5y &= - 6 \\ 5x - 8y &= 2 \end{aligned}$$

A\) Rewrite this as a linear system of the form $A\mathbf{x} = \mathbf{b}$, specifying $A,\ \mathbf{x}$, and $\mathbf{b}$ carefully.

B\) What's the inverse of the matrix $A$ you found above?

C\) Of course, you could solve this system using basic algebra or a matrix inverse. Please do not do that. Do please solve this very carefully using row reduction and substitution.

[PROBLEM 9.]{.underline} Consider the following row reduction of an augmented matrix: $$\begin{bmatrix} 0 & 0 & 1 & a & 1 \\ 0 & 2 & 1 & - 1 & 3 \\ 1 & - 2 & 1 & 5 & c \\ 3 & b & 5 & 22 & 8 \end{bmatrix}\sim\begin{bmatrix} 1 & - 2 & 1 & 5 & c \\ 0 & 2 & 1 & - 1 & 3 \\ 0 & 0 & 1 & a & 1 \\ 3 & - 6 & 5 & 22 & 8 \end{bmatrix}\sim\begin{bmatrix} 1 & - 2 & 1 & 5 & c \\ 0 & 2 & 1 & - 1 & 3 \\ 0 & 0 & 1 & a & 1 \\ 0 & 0 & 2 & 7 & 8 \end{bmatrix}\sim\begin{bmatrix} 1 & - 2 & 1 & 5 & c \\ 0 & 2 & 1 & - 1 & 3 \\ 0 & 0 & 1 & a & 1 \\ 0 & 0 & 0 & 3 & 6 \end{bmatrix}\ $$ *Here,* $a,\ b,\ c$ *are some constants, to be determined.*

A\) Using variable names $x_{1},\ x_{2},x_{3},x_{4}$, write out the original system of equations as represented by the augmented matrix on the left, above. *(With* $a,\ b,\ c$ *constants too...)*

B\) Determine the values of the constants $a,\ b,\ c$ with brief explanations please.

C\) Using the rightmost augmented matrix above (with $a,\ b,\ c$ values filled in), solve the system of equations.
[PROBLEM 10.]{.underline} Consider the following row-reduction with five steps: $$\begin{bmatrix} 3 & 13 & - 5 \\ 1 & 3 & - 1 \\ 2 & 4 & 3 \end{bmatrix}\overset{1}{\rightarrow}\begin{bmatrix} 1 & 3 & - 1 \\ 3 & 13 & - 5 \\ 2 & 4 & 3 \end{bmatrix}\overset{2}{\rightarrow}\begin{bmatrix} 1 & 3 & - 1 \\ 0 & 4 & - 2 \\ 2 & 4 & 3 \end{bmatrix}\overset{3}{\rightarrow}\begin{bmatrix} 1 & 3 & - 1 \\ 0 & 4 & - 2 \\ 0 & - 2 & 5 \end{bmatrix}\overset{4}{\rightarrow}\begin{bmatrix} 1 & 3 & - 1 \\ 0 & 2 & - 1 \\ 0 & - 2 & 5 \end{bmatrix}\overset{5}{\rightarrow}\begin{bmatrix} 1 & 3 & - 1 \\ 0 & 2 & - 1 \\ 0 & 0 & 4 \end{bmatrix}$$

A\) Write out descriptions of what happened at each step. Be precise, citing row numbers and what operations took place.

B\) If an augmented matrix row-reduces as above to give: $$\ \begin{bmatrix} 1 & 3 & - 1 \\ 0 & 2 & - 1 \\ 0 & 0 & 4 \end{bmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 13 \\ - 8 \\ 24 \end{pmatrix}$$ Solve for $x,\ y,\ $and $z$.

[PROBLEM 11.]{.underline} The following is a sequence of elementary row operations that row-reduces the beginning matrix, $A$, to a triangular form. $$\begin{bmatrix} 2 & - 2 & 0 & 2 & \ \\ 1 & - 1 & 0 & - 2 & 11 \\ 0 & 0 & 2 & 5 & 1 \\ 0 & - 2 & 1 & 3 & - 2 \\ - 3 & 3 & 4 & \ & - 15 \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & - 1 & 0 & 1 & 5 \\ 1 & - 1 & 0 & - 2 & 11 \\ 0 & 0 & 2 & 5 & 1 \\ 0 & - 2 & 1 & 3 & - 2 \\ - 3 & 3 & 4 & \ & - 15 \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & - 1 & 0 & 1 & 5 \\ 0 & 0 & 0 & - 3 & 6 \\ 0 & 0 & 2 & 5 & 1 \\ 0 & - 2 & 1 & 3 & - 2 \\ - 3 & 3 & 4 & \ & - 15 \end{bmatrix}$$ $$\Rightarrow \begin{bmatrix} 1 & - 1 & 0 & 1 & 5 \\ 0 & 0 & 0 & - 3 & 6 \\ 0 & 0 & 2 & 5 & 1 \\ 0 & - 2 & 1 & 3 & - 2 \\ 0 & 0 & 4 & \ & 0 \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & - 1 & 0 & 1 & 5 \\ 0 & - 2 & 1 & 3 & - 2 \\ 0 & 0 & 2 & 5 & 1 \\ 0 & 0 & 0 & - 3 & 6 \\ 0 & 0 & 4 & \ & 0 \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & - 1 & 0 & 1 & 5 \\ 0 & - 2 & 1 & 3 & - 2 \\ 0 & 0 & 2 & 5 & 1 \\ 0 & 0 & 0 & - 3 & 6 \\ 0 & 0 & 0 & 0 & \ \end{bmatrix}$$ Fill in the missing entries in the matrices above.

[PROBLEM 12.]{.underline} Consider the following system of linear equations $$\begin{aligned} 2x + 3y + 4z &= 14 \\ x + 2y + 3z &= 8 \\ 4y + 7z &= 9 \end{aligned}$$

A\) Rewrite this as a linear system of the form $A\mathbf{x} = \mathbf{b}$, specifying $A,\ \mathbf{x}$, and $\mathbf{b}$ carefully.

B\) Row-reduce the augmented matrix of this system to upper-triangular form.

C\) Solve for the variables $x,\ y,\ z$ using your answer to part (B).

[PROBLEM 13.]{.underline} Consider the following row-reduction: $$A = \begin{bmatrix} 1 & - 2 & 1 & 0 \\ 3 & - 4 & 8 & 3 \\ 1 & 0 & 7 & 4 \\ 0 & 2 & 5 & 6 \end{bmatrix}\ \ \Rightarrow \ \ \begin{bmatrix} 1 & - 2 & 1 & 0 \\ 0 & 2 & 5 & 3 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 3 \end{bmatrix}\ = B\ $$

A\) Write out the steps of the row-reduction, identifying each row operation.

B\) Solve the system of equations given by $$B\mathbf{x} = \ \begin{bmatrix} 1 & - 2 & 1 & 0 \\ 0 & 2 & 5 & 3 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 3 \end{bmatrix}\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} - 2 \\ 1 \\ - 1 \\ 6 \end{pmatrix}$$

[PROBLEM 14.]{.underline} Consider the following system of linear equations $$\begin{aligned} x - 2y + 3u - v &= - 10 \\ 2x - 7y - 4u + v &= \ 16 \\ 3u - 2v &= - 13 \\ 6u + 3v &= - 12 \end{aligned}$$

A\) Rewrite this as a linear system of the form $A\mathbf{x} = \mathbf{b}$, specifying $A,\ \mathbf{x}$, and $\mathbf{b}$ carefully.

B\) Row-reduce the augmented matrix of this system to upper-triangular form.
C\) Solve the original equations for the unknowns using your answer to part (B) and back-substitution.

[PROBLEM 15.]{.underline} Consider the following row-reduction of an augmented matrix: $$\begin{bmatrix} 4 & 1 & - 3 & 7 \\ 1 & 0 & - 2 & 1 \\ 0 & 1 & 3 & 5 \end{bmatrix}\ \ \Rightarrow \lbrack work\rbrack \Rightarrow \ \ \begin{bmatrix} 1 & 0 & - 2 & 1 \\ 0 & 1 & 5 & 3 \\ 0 & 0 & - 2 & 2 \end{bmatrix}\ \ $$

A\) If this comes from the augmented matrix for a system of equations $A\mathbf{x} = \mathbf{b}$, then specify the matrix $A$ and the vector $\mathbf{b}$.

B\) Write out the steps of the row-reduction above, identifying carefully what happens at each step.

C\) Solve the original system of equations for $\mathbf{x} = (x\ \ \ y\ \ \ z)^{T}$.

[PROBLEM 16.]{.underline} Consider the following system of linear equations $$\begin{aligned} x_{1} + 3x_{2} - x_{4} &= - 1 \\ 3x_{1} + 7x_{2} - 2x_{3} &= - 5 \\ 2x_{2} + 5x_{3} - 3x_{4} &= 11 \\ 3x_{3} + 5x_{4} &= 19 \end{aligned}$$

A\) Rewrite this as a linear system of the form $A\mathbf{x} = \mathbf{b}$, specifying $A,\ \mathbf{x}$, and $\mathbf{b}$ carefully.

B\) Row-reduce the augmented matrix of this system to upper-triangular form.

C\) Solve the original equations for the unknowns using your answer to part (B).

[PROBLEM 17.]{.underline} The matrices $A$ and $B$ below are invertible, with inverses given: $$A = \begin{bmatrix} - 5 & 2 & - 3 \\ 4 & 2 & - 3 \\ 2 & 1 & 3 \end{bmatrix}\ \ \ :\ \ A^{- 1} = \frac{1}{9}\begin{bmatrix} - 1 & 1 & 0 \\ 2 & 1 & 3 \\ 0 & - 1 & 2 \end{bmatrix}$$ $$B = \begin{bmatrix} 2 & 3 & 4 \\ 0 & 3 & 2 \\ - 2 & 5 & 1 \end{bmatrix}\ \ \ \ \ :\ \ B^{- 1} = \frac{1}{2}\begin{bmatrix} 7 & - 17 & 6 \\ 4 & - 10 & 4 \\ - 6 & 16 & - 6 \end{bmatrix}$$ Solve the equation $(AB)\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$ for $x,\ y,\ $and $z$.

[PROBLEM 18.]{.underline} Consider the following system of linear equations $$\begin{aligned} x_{2} + 3x_{3} - x_{4} &= - 2 \\ x_{1} + 2x_{2} - x_{4} &= \ 0 \\ 3x_{3} + 5x_{4} &= 3 \\ x_{1} + 2x_{2} &= 3 \end{aligned}$$

A\) Rewrite this as a linear system of the form $A\mathbf{x} = \mathbf{b}$, specifying $A,\ \mathbf{x}$, and $\mathbf{b}$.

B\) The inverse of the matrix $A$ you should have found above is given below. Use it to solve for the variables in the equations above. $$A^{- 1} = \begin{bmatrix} - 2 & 12 & 2 & - 11 \\ 1 & - 6 & - 1 & 6 \\ 0 & \frac{5}{3} & \frac{1}{3} & - \frac{5}{3} \\ 0 & - 1 & 0 & 1 \end{bmatrix}\ \ \ \ \ $$

[PROBLEM 19.]{.underline} Consider the following matrices: $$A = \begin{bmatrix} 3 & - 4 & 0 & 0 \\ 4 & - 5 & 0 & 0 \\ 0 & 0 & 2 & - 7 \\ 0 & 0 & - 1 & 3 \end{bmatrix}\ \ \ \ \ \ \ :\ \ \ \ B = \begin{bmatrix} 0 & 0 & - 2 & 4 \\ 0 & 0 & 5 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & - 1 & 0 & 0 \end{bmatrix}$$

A\) Compute the matrix $B^{2}$

B\) Compute the inverse $A^{- 1}$

[PROBLEM 20.]{.underline} (22c:3.2.2) In the following, please explain/follow instructions.

A\) ***Explain***: if $A$ is a diagonal square matrix, then is it invertible? *If true, give reasons why; if false, give an example which demonstrates your answer.*

B\) ***Explain:*** in the case where $A$ and $B$ are 3-by-3 matrices and $\mathbf{v}$ is a vector in $\mathbb{R}^{3}$, then is $A^{T}B\mathbf{v}$ also a vector in $\mathbb{R}^{3}$? *If true, give reasons why; if false, give an example which demonstrates your answer.*

C\) ***Explain***: Is it ever possible for a matrix to be its own inverse? $A^{- 1} = A$ for some $A$?
*If possible, give an example; else, give a reason why it is impossible.* D\) **Explain:** If, for a matrix $B$ its square $B^{2}$ exists, then must $B$ be a square matrix? *If true, give reasons why; if false, give an example which demonstrates your answer.* [PROBLEM 21.]{.underline} (22a:2.2.2) Consider the following square matrix $$A = \begin{bmatrix} 1 & - 4 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & - 3 \\ 0 & 0 & 4 & 0 & 0 \\ 0 & 0 & 0 & 1/3 & 0 \\ 3 & 2 & 0 & 0 & 0 \end{bmatrix}$$ Compute the inverse $A^{- 1}$ (if it exists), showing all steps. [PROBLEM 22.]{.underline} (22c:2.4.2) Consider the following matrices: $$A = \begin{bmatrix} - 2 & 1 \\ - 1 & 0 \\ 2 & - 1 \end{bmatrix}\ \ :\ B = \ \begin{bmatrix} 0 & 4 \\ 2 & 3 \end{bmatrix}\ \ :\ C = \begin{bmatrix} 1 & - 3 \\ 0 & 2 \end{bmatrix}\ \ :\ D = \begin{bmatrix} 0 & 1 & 3 \\ 2 & - 1 & 0 \end{bmatrix}\ $$ Compute/simplify the following inverses, if possible: if not, explain why not. A\) $B^{- 1}$ B) $(DA)^{- 1}$ C) $\left( B^{T} \right)^{- 1}$ D) $(CD)^{- 1}$ ### ANSWERS & HINTS [PROBLEM 1.]{.underline} A) $\begin{bmatrix} 7 & - 4 & - 9 \\ 1 & 8 & 3 \end{bmatrix}$ ; B) $\left( \begin{array}{r} 6 \\ - 3 \end{array} \right)$ ; C) $\begin{bmatrix} 8 & - 11 \end{bmatrix}$ ; D) $\left( \begin{array}{r} 18 \\ 4 \end{array} \right)$ ; E) $\begin{bmatrix} 10 & - 15 \\ 10 & - 5 \end{bmatrix}$ [PROBLEM 2.]{.underline} A) $\begin{bmatrix} 1 & - 1 \\ - 13 & 13 \\ - 1 & 11 \end{bmatrix}$ ; B) $\left( \begin{array}{r} - 1 \\ 0 \\ 2 \end{array} \right)$ ; C) Nope ; D) $\left( \begin{array}{r} - 2 \\ 0 \\ - 5 \end{array} \right)$ ; E) Nope [PROBLEM 3.]{.underline} A) $\begin{bmatrix} - 2 & 6 \\ - 18 & 46 \end{bmatrix}$ ; B) Nope ; C) $\begin{bmatrix} 3 & - 2 & - 2 \end{bmatrix}$ ; D) Nope [PROBLEM 4.]{.underline} A) $12$ ; B) $0$ ; C) $\left( \begin{array}{r} - 7 \\ - 7 \\ - 7 \\ - 14 \end{array} \right)$ [PROBLEM 5.]{.underline} A) $\begin{bmatrix} 24 & - 33 & 9 \\ 2 & - 32 & - 6 \\ 26 & 13 & 21 \end{bmatrix}$ ; B) $\left( \begin{array}{r} 6 \\ 6 \end{array} \right)$ ; C) Nope ; D) $6$ ; E) $\begin{bmatrix} 3 & - 2 & 7 \\ 6 & 5 & - 1 \end{bmatrix}$ [PROBLEM 6.]{.underline} A) 5-by-4 times 4-by-5 yields 5-by-5 ; B) $- 6$ ; C) $\begin{bmatrix} 1 & 2 & 0 & 3 \\ - 5 & 5 & 1 & 2 \\ - 4 & 4 & 8 & 2 \\ 1 & 3 & 9 & - 1 \\ - 2 & - 3 & 3 & 8 \end{bmatrix}$ [PROBLEM 7.]{.underline} A) $- 14$ ; B) $A\mathbf{u}$ ; C) $\left( \begin{array}{r} 0 \\ 4 \\ 4 \\ - 4 \end{array} \right)$ [PROBLEM 8.]{.underline} A) $\begin{bmatrix} 3 & - 5 \\ 5 & - 8 \end{bmatrix}\left( \begin{array}{r} x \\ y \end{array} \right) = \left( \begin{array}{r} - 6 \\ 2 \end{array} \right)\ $; B) $\begin{bmatrix} 3 & - 5 \\ 5 & - 8 \end{bmatrix}^{- 1} = \begin{bmatrix} - 8 & 5 \\ - 5 & 3 \end{bmatrix}$ [PROBLEM 9.]{.underline} B) $a = 2,\ \ b = - 6,\ \ c = 0$ ; C) $x_{1} = 1,\ \ x_{2} = 4,\ \ x_{3} = - 3,\ \ x_{4} = 2$ [PROBLEM 10.]{.underline} A) steps are 1: $R_{1} \Leftrightarrow R_{2}$, 2: $R_{2} \Rightarrow R_{2} - 3R_{1}$, 3: $R_{3} \Rightarrow R_{3} - 2R_{1}$, 4: $R_{2} \Rightarrow \frac{1}{2}R_{2}$, 5: $R_{3} \Rightarrow R_{3} + R_{2}$ ; B) $z = 6,\ \ y = - 1,\ \ x = 22$ [PROBLEM 11.]{.underline} in order, $(10,\ 7),\ (7),\ (7),\ (10),\ (10),\ ( - 2)$ [PROBLEM 12.]{.underline} A) $\begin{bmatrix} 2 & 3 & 4 \\ 1 & 2 & 3 \\ 0 & 4 & 7 \end{bmatrix}\left( \begin{array}{r} x \\ y \\ z \end{array} \right) = \left( \begin{array}{r} 14 \\ 8 \\ 9 \end{array} \right)$ ; B) $x = 3,\ y = 4,\ z = - 1$ [PROBLEM 13.]{.underline} A) $R_{2} \Rightarrow R_{2} - 3R_{1}$, then $R_{3} \Rightarrow R_{3} - 
R_{1}$, $R_{3} \Rightarrow R_{3} - R_{2}$, and $R_{4} \Rightarrow R_{4} - R_{2}$ ; B) $x = 11,\ y = 5,\ z = - 3,\ w = 2$ [PROBLEM 14.]{.underline} A) $\begin{bmatrix} 1 & - 2 & 3 & - 1 \\ 2 & - 7 & - 4 & 1 \\ 0 & 0 & 3 & - 2 \\ 0 & 0 & 6 & 3 \end{bmatrix}\left( \begin{array}{r} x \\ y \\ u \\ v \end{array} \right) = \left( \begin{array}{r} - 10 \\ 16 \\ - 13 \\ - 12 \end{array} \right)\ $; B) $\left( \begin{array}{r} x \\ y \\ u \\ v \end{array} \right) = \left( \begin{array}{r} 1 \\ 0 \\ - 3 \\ 2 \end{array} \right)$ [PROBLEM 15.]{.underline} A) $\begin{bmatrix} 4 & 1 & - 3 \\ 1 & 0 & - 2 \\ 0 & 1 & 3 \end{bmatrix}\left( \begin{array}{r} x \\ y \\ z \end{array} \right) = \left( \begin{array}{r} 7 \\ 1 \\ 5 \end{array} \right)$ ; C) $z = - 1,\ y = 8,\ x = - 1$ [PROBLEM 16.]{.underline} A) $\begin{bmatrix} 1 & 3 & 0 & - 1 \\ 3 & 7 & - 2 & 0 \\ 0 & 2 & 5 & - 3 \\ 0 & 0 & 3 & 5 \end{bmatrix}\left( \begin{array}{r} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \end{array} \right) = \left( \begin{array}{r} - 1 \\ - 5 \\ 11 \\ 19 \end{array} \right)\ $; B) $\left( \begin{array}{r} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \end{array} \right) = \left( \begin{array}{r} - 2 \\ 1 \\ 3 \\ 2 \end{array} \right)$ [PROBLEM 17.]{.underline} compute $B^{- 1}A^{- 1}\left( \begin{array}{r} 0 \\ 1 \\ 0 \end{array} \right) = B^{- 1}\frac{1}{9}\left( \begin{array}{r} 1 \\ 1 \\ - 1 \end{array} \right) = \frac{1}{9}\left( \begin{array}{r} - 8 \\ - 5 \\ 8 \end{array} \right)$ [PROBLEM 18.]{.underline} A) $\begin{bmatrix} 0 & 1 & 3 & - 1 \\ 1 & 2 & 0 & - 1 \\ 0 & 0 & 3 & 5 \\ 1 & 2 & 0 & 0 \end{bmatrix}\left( \begin{array}{r} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \end{array} \right) = \left( \begin{array}{r} - 2 \\ 0 \\ 3 \\ 3 \end{array} \right)$ ; B) $\left( \begin{array}{r} 4 \\ - 2 \\ 0 \\ 0 \end{array} \right) + 3\left( \begin{array}{r} - 9 \\ 5 \\ - \frac{4}{3} \\ 1 \end{array} \right) = \left( \begin{array}{r} - 23 \\ 13 \\ - 4 \\ 3 \end{array} \right)$ [PROBLEM 19.]{.underline} A) $B^{2} = \begin{bmatrix} 8 & - 6 & 0 & 0 \\ 10 & 5 & 0 & 0 \\ 0 & 0 & 1 & 8 \\ 0 & 0 & - 11 & 12 \end{bmatrix}$ ; B) $A^{- 1} = \begin{bmatrix} - 5 & 4 & 0 & 0 \\ - 4 & 3 & 0 & 0 \\ 0 & 0 & - 3 & - 7 \\ 0 & 0 & - 1 & - 2 \end{bmatrix}$ [PROBLEM 20.]{.underline} A) No ; B) Yes ; C) Yes ; D) Yes [PROBLEM 21.]{.underline} $A^{- 1} = \begin{bmatrix} \frac{1}{7} & 0 & 0 & 0 & \frac{2}{7} \\ - \frac{3}{14} & 0 & 0 & 0 & \frac{1}{14} \\ 0 & 0 & \frac{1}{4} & 0 & 0 \\ 0 & 0 & 0 & 3 & 0 \\ 0 & - \frac{1}{3} & 0 & 0 & 0 \end{bmatrix}$ [PROBLEM 22.]{.underline} A) $\frac{1}{8}\begin{bmatrix} - 3 & 4 \\ 2 & 0 \end{bmatrix}$ ; B) $\begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix}$ ; C) $\frac{1}{8}\begin{bmatrix} - 3 & 2 \\ 4 & 0 \end{bmatrix}$ ; D) Nope

## Week 4 : Linear Transformations

### OUTLINE

MATERIALS: Calculus BLUE : Vol 1 : Chapters 14-18

TOPICS:

- Bases, including orthogonal and orthonormal bases
- Coordinates of a Euclidean vector in a given basis
- Change of coordinates from the standard basis to another
- Linear transformations; algebraic and geometric interpretations
- Rotation, rescaling, and shear matrices in 2-D
- Order of operations in linear transformations
- Determinants: computation via minor expansion
- Geometric interpretation of determinants as oriented volumes
- Computation of determinants via row reduction operations
- Determinants under products and transposes

LEARNING OBJECTIVES:

- Distinguish between general, orthogonal, and orthonormal bases
- Express a vector in coordinates of a new basis via linear system
- Recognize rotations, shears, and
rescalings in terms of matrices
- Discern a linear transformation based on how it acts on basis vectors
- Compose linear transformations via matrix multiplication
- Compute determinants via minor expansion and/or blocks
- Compute volumes via determinants
- Compute determinants via row operations and reduction
- Compute determinants via matrix multiplication/factoring

### PRIMER

This week marks a change from algebra to geometry; from matrices as passive data structures to matrices as active functions, transforming vectors.

BASES. We have worked with vectors in $\mathbb{R}^{n}$ as $n$-tuples of coordinates. In reality, vectors are a bit more complicated than this. Implicit in our conventions is the so-called *standard basis*: either $\widehat{i},\ \widehat{j},\ \widehat{k}$ in 3-D or $\{{\widehat{e}}_{i}\}$ in $\mathbb{R}^{n}$. What if you had a vector of $x$, $y$, and $z$ coordinates which you needed to send to someone whose basis was not the same as yours? Perhaps their convention was to switch the $y$ and $z$ axes: how would you communicate and agree upon what your vectors represent? In the general case, consider a *basis* on $\mathbb{R}^{n}$ as a collection of $n$ vectors $\mathbf{u}_{1},\ldots,\ \mathbf{u}_{n}$ such that any vector $\mathbf{v}$ can be written *uniquely* as a linear combination of basis vectors: $$\mathbf{v}\ = \ \ c_{1}\mathbf{u}_{1} + c_{2}\mathbf{u}_{2} + \cdots + c_{n}\mathbf{u}_{n}\ \ = \ \sum_{i = 1}^{n}{c_{i}\mathbf{u}_{i}}\ ,$$ where the constants $c_{i}$ are the *coefficients* of $\mathbf{v}$ in the new basis. There are subtleties here -- how do we know this decomposition *exists* and is *unique*? This will be answered when you take linear algebra; for now, assume that you are given a basis that satisfies these conditions. Given the vector $\mathbf{v}$ in standard coordinates and the list of new vectors forming a basis, the coefficients $c_{i}$ can be found by solving the linear system $A\mathbf{c} = \mathbf{v}$ for $\mathbf{c} = (c_{i})$, where $A$ is the matrix whose columns are the new basis vectors $\mathbf{u}_{i}$. Certain types of bases are more or less special/convenient. A basis is said to be *orthogonal* if every pair of basis vectors is orthogonal (dot product zero). If, in addition, the basis vectors all have unit length, then the basis is called *orthonormal*. Changing from the standard basis to a new basis is but one example of a more general phenomenon -- that of transforming vectors. In the calculus to come, we will work with very general changes in coordinates: for the time being, we will focus on transformations of vectors that are *linear*.

LINEAR TRANSFORMATIONS. An $m$-by-$n$ matrix $A$ can be interpreted as a function that takes vectors as inputs and returns vectors as outputs. Consider the function $f:\mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ defined by $f\left( \mathbf{x} \right) = A\mathbf{x}$. Any such function is a *linear transformation*, meaning that it takes the sum of two inputs to the sum of the two outputs and a rescaled input to a rescaled output: $$f\left( \mathbf{x} + \mathbf{y} \right) = f\left( \mathbf{x} \right) + f\left( \mathbf{y} \right)\ \ \ \ \ :\ \ \ \ f\left( c\mathbf{x} \right) = cf(\mathbf{x})$$ Every linear transformation from $\mathbb{R}^{n}$ to $\mathbb{R}^{m}$ can be so represented by an $m$-by-$n$ matrix: matrices are verbs as well as nouns. It is this role of a matrix as a linear function that foreshadows the primacy of matrices in multivariable calculus.
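For classes comfortable with a bit of computation, the two linearity axioms can be checked numerically -- a minimal sketch, assuming Python with NumPy (an editorial aside, not part of the videotext):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))      # a 3-by-2 matrix: f maps R^2 to R^3
f = lambda x: A @ x                  # the linear transformation f(x) = Ax

x = rng.standard_normal(2)
y = rng.standard_normal(2)
c = 2.7
print(np.allclose(f(x + y), f(x) + f(y)))   # True: sums map to sums
print(np.allclose(f(c * x), c * f(x)))      # True: rescalings pass through
```

Randomized checks like this do not prove linearity, of course, but they make the axioms concrete before the formal reasoning arrives.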
Before dealing with nonlinear multivariate functions, one should understand the linear counterparts. It is simplest to work at first with linear transformations having two inputs and two outputs: such are represented as 2-by-2 matrices. The following three examples of matrices give three fundamental types of linear transformations of the plane: $$A = \begin{bmatrix} \lambda_{1} & 0 \\ 0 & \lambda_{2} \end{bmatrix}\ \ \ \ \ \ :\ \ \ \ \ B = \begin{bmatrix} \cos\theta & - \sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\ \ \ \ \ \ :\ \ \ \ \ C = \begin{bmatrix} 1 & c \\ 0 & 1 \end{bmatrix}$$ The first, $A$, is a *rescaling* matrix which rescales the $x$-axis by a factor $\lambda_{1}$ and the $y$-axis by a factor $\lambda_{2}$. For $|\lambda_{i}|\ > 1$, the axis is stretched; when $\left| \lambda_{i} \right| < 1$, the axis is squeezed. A flip happens when a coefficient $\lambda_{i}$ is negative. The second example, $B$, is called a *rotation* matrix: it rotates the plane about the origin by an angle $\theta$ (in the counterclockwise direction) measured from the $x$-axis. The last example, $C$, is least familiar: it is a *shear* matrix which, in this case, preserves the $x$-axis and shears along the horizontal direction: the positive $y$-axis is sheared to the right; the negative to the left.

To understand what shears, rotations, or any other matrices do, the following is a simple but effective method. Feeding the $i$^th^ basis vector ${\widehat{e}}_{i}$ into $A$ returns the $i$^th^ column of $A$, clearly. This means that the unit cube in the domain is sent to the parallelepiped in the image spanned by the columns of $A$. Linearity means that the grid of cubes in the domain is sent to its image as a grid of parallelepipeds spanned by columns. More interesting linear transformations can be obtained by composing elementary pieces. One must be careful: the linear transformation with matrix $AB$ does not correspond to "*do* $A$ *then do* $B$" but rather the reverse, as can be seen by associativity: $(AB)\mathbf{v} = A(B\mathbf{v})$.

DETERMINANTS. To each square matrix is associated a particular scalar that determines whether it is invertible. We encountered this *determinant* last week: $$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc\ .$$ The matrix is invertible if and only if the determinant is nonzero. For a 1-by-1 matrix, the determinant is even simpler: it is the single entry. For a 3-by-3 matrix, the determinant is complicated but strangely familiar: it equals the scalar triple product of the three column vectors in order. The determinant of an $n$-by-$n$ matrix generalizes the scalar triple product, both algebraically and geometrically. We noted that a linear transformation maps the standard basis vectors to the columns of the matrix. For a square matrix, the associated linear transformation maps the unit $n$-dimensional cube spanned by the basis vectors to a parallelepiped spanned by the column vectors. The determinant of the matrix is, precisely, the $n$-dimensional volume of this parallelepiped, with a plus-or-minus sign depending on an orientation (*cf.* the antisymmetry of the cross product in Week 2). This makes sense with respect to the obstruction to invertibility -- smashing the unit cube to a lower-dimensional object is a violence that cannot be undone. The geometric approach to determinants reveals a deep connection with matrices as linear transformations.
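The sign of the determinant can likewise be made concrete numerically -- a minimal sketch, again assuming NumPy (an editorial aside): swapping the two columns preserves the parallelogram spanned, but reverses its orientation and so flips the sign.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])      # columns span the image of the unit square
print(np.linalg.det(A))         # 6.0: the signed area of that parallelogram

swapped = A[:, ::-1]            # same two columns, opposite order
print(np.linalg.det(swapped))   # -6.0: same area, opposite orientation
```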
Since the matrix $AB$ represents doing $B$ first and then $A$, one can, for square matrices, interpret the actions on volumes to obtain the following fundamental result: $$\det(AB) = \left( \det A \right)\left( \det B \right)\ .$$ This will be important to us much later, in Week 11 and beyond.

COMPUTATION. Computing the determinant of a general $n$-by-$n$ matrix is nontrivial. There are a few ways to proceed algorithmically. One popular approach is a reductive method called *minor expansion*. Fix a matrix $A$; the $(i,j)$ *minor* $M_{ij}$ is the matrix obtained from $A$ by deleting the $i$^th^ row and the $j$^th^ column. Minors of a square matrix are themselves square and have determinants of their own. The determinant of $A$ by expansion about the $i$^th^ row or the $j$^th^ column (where the choice of row or column is arbitrary) is computed as an alternating sum: $$\det A\ = \ \sum_{k = 1}^{n}{( - 1)^{i + k}A_{ik}\det M_{ik}}\ = \ \sum_{k = 1}^{n}{( - 1)^{j + k}A_{kj}\det M_{kj}}\ .$$ Minor expansion is not helpful for large matrices, since the number of operations can be factorial in $n$; however, for not-too-large matrices, or for matrices where one row or column has many zeros, this is a decent approach, as long as one is very careful with the alternating signs. From this, one sees that certain matrices are trivial to work with. For a triangular matrix (upper or lower), iterated minor expansion reveals that the determinant is the product of the entries along the diagonal -- a critical result. This, in combination with the multiplicative property of the determinant and our understanding of linear transformations, leads to the capstone result of this Volume. As seen in Week 3, we can row-reduce a square matrix to a triangular matrix via the three row reduction operations. By writing out the row operations as linear transformations (!) and computing the determinants of these three simple types of matrices, we arrive at the following conclusions:

- R1 : exchanging rows multiplies the determinant by a factor of $- 1$;
- R2 : rescaling a row by $c$ changes the determinant by a factor of $c$;
- R3 : adding one rescaled row to another leaves the determinant the same.

This is an excellent approach to determinant computation: the repeated use of the third row operation is particularly nice, as one does not have to keep track of factors. This approach can be interpreted as factoring a matrix into a simple (triangular) term times a sequence of simple row-operation matrices. This perspective is a deep idea in applied linear algebra -- matrix factorization is an extensive subject. In the context of our story, the immediate payoff is a clear proof of an otherwise difficult result to grasp: $\det A^{T} = \det A\ .$

### DISCUSSION

[QUESTION 1.]{.underline} Given the vector $3\widehat{i} - 4\widehat{j}$, what are its coordinates in a new basis given by $\mathbf{u} = 2\widehat{i} + 5\widehat{j}$ and $\mathbf{v} = 3\widehat{i} + 7\widehat{j}$? Emphasize the fact that the coordinates of these vectors as given are coordinates in the standard basis. Encourage students to rewrite the problem in the form of a linear system & use what they know about the inverse of a 2-by-2 matrix. As a follow-up, what is the general formula for the coordinates of a vector with standard coordinates $c_{x},\ c_{y}$?

[QUESTION 2.]{.underline} Consider the rotation matrix $R_{\theta}$ that rotates the plane CCW by an angle $\theta$. How do you remember the signs on the off-diagonal terms? How do you rotate it CW instead?
What are the determinant and inverse of this, and why does this make sense? What happens if you compose or take powers of rotation matrices? There are so many good, simple questions here with rotation matrices that intersect with this week's material. This is where thinking of the columns as the image of the standard basis vectors really pays off. Be sure to spend a lot of time thinking through these with students.

[QUESTION 3.]{.underline} What is the difference between a shear and a rotation? How many different types of shears are there? Why is it that a shear matrix does not change area? Do shears commute with rotations? This is a good time to talk about the physical interpretation of a transpose, which flips vertical and horizontal shears. Lots of good questions that can be asked here.

[QUESTION 4.]{.underline} What matrix corresponds to the linear transformation given by "do $A$ then do $B$ then do $C$"? This is a good time to review associativity and commutativity. Why is it that composition of linear transformations seems to be "in the wrong order" when multiplying matrices? Always keep in mind that matrices act on vectors from the left, so think: $A\mathbf{x}$ then $B\left( A\mathbf{x} \right) = (BA)\mathbf{x}$ etc.

[QUESTION 5.]{.underline} Draw two parallelograms in the plane, each with one corner at the origin and the remaining corners at points with integer coordinates. What is the linear transformation that takes the first parallelogram to the second one? This is challenging as stated, since it is a general 2-D change-of-basis problem in disguise. After letting students wrestle with this a bit -- maybe setting it up as a linear system -- try suggesting the simpler problem of which linear transformations $A$ and $B$ take the unit square to each of these two shapes. After doing so, try to get students to hit on the idea of $BA^{- 1}$ as the transformation that undoes $A$ then does $B$. This shows the power of composition.

[QUESTION 6.]{.underline} Let $\{\mathbf{v}_{1},\ldots,\ \mathbf{v}_{n}\}$ denote an orthonormal basis for $\mathbb{R}^{n}$. If you use these vectors as columns of a square matrix $Q$, you have (by definition) an *orthogonal matrix* (with apologies for the confusing terminology). What is its determinant? How would you invert an orthogonal matrix $Q$? Is the product of two orthogonal matrices still orthogonal? Lots of good questions to explore. The inverse problem, of course, seems too general to be done explicitly: but that is a clue. Ask students what it means to have an orthonormal basis. Recall from Week 3 Question 3 the relationship between the dot product and multiplication-by-transpose. This is enough to get students to see that $Q^{T}Q = I$. But what about $QQ^{T}$?

[QUESTION 7.]{.underline} Assume that a 2-by-3 matrix $A$ can be written as $A = Q_{1}\Sigma Q_{2}$, where $\Sigma$ is diagonal and $Q_{1},Q_{2}$ are orthogonal matrices. What can you say about the sizes of these matrices? What can you say about the determinant of $A$ -- on what does it depend? If $Q_{1}$ and $Q_{2}$ are rotations, what does this mean about $A$ and its geometry? This is not an essential question for this course, but it is foundational to understanding in later linear algebra classes. It is of itself a useful exercise in keeping track of dimensions and in building some intuition for the geometry of a linear transformation.
The curious student might ask how one would know if such a factorization exists: the answer, of course, is that such always exists for any matrix, and this is the foundation for the Singular Value Decomposition.

[QUESTION 8.]{.underline} A *permutation matrix* is a square binary matrix such that each row and each column has exactly one "$1$" in it -- all the rest are zeros. What can you say about the geometry of such linear transformations? What can you say about their determinants (and thus invertibility)? How would you compute an inverse? Are permutations closed under composition? For determinants, this permits several approaches. Try to get students to think about different ways of computing these. Are all such matrices rotations of one form or another? Are they orthogonal matrices? This might be a good time to foreshadow notions of orientability. As a follow-up question, try to get students to argue whether permutations are closed under composition.

[QUESTION 9.]{.underline} Compute the determinant of the following matrix: $$A\ = \ \begin{bmatrix} 0 & 0 & 0 & 1 & - 2 \\ 0 & 1 & 5 & 2 & 3 \\ 0 & 0 & 7 & 0 & - 6 \\ 0 & 0 & 0 & - 3 & 5 \\ 2 & 4 & - 7 & 1 & 3 \end{bmatrix}$$ Which way is easier? Minor expansion? Row reduction? There's room for debate here; it might be a good idea to have the class split into teams trying it different ways and then doing a compare-and-contrast. How is it that signs/orientations manifest in each approach?

[QUESTION 10.]{.underline} For which value(s) of $C$ are these vectors coplanar? $$\mathbf{u} = \begin{pmatrix} 1 \\ 0 \\ 3C \end{pmatrix}\ \ \ \ :\ \ \ \mathbf{v} = \begin{pmatrix} C \\ 2 \\ - 3 \end{pmatrix}\ \ \ \ :\ \ \ \mathbf{w} = \begin{pmatrix} 0 \\ 1 \\ C \end{pmatrix}\ \ $$ This can be viewed in terms of the scalar triple product or, perhaps better, the determinant. Why is it that having a vanishing determinant means that all three vectors lie in some plane? This can lead to good discussions.

[QUESTION 11.]{.underline} Consider a simple block-diagonal matrix of the form $$D = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$$ where $A$ and $B$ are square matrices, and the off-diagonal blocks are all zero. Argue that $\det D = (\det A)(\det B)$. This is good practice for reasoning about determinants. If students get stuck, try suggesting thinking in terms of row reduction to triangular forms. If they solve the problem by row reduction, suggest a redo using composition & see if they can factor $D$ as a product of two matrices where the $A$ and $B$ blocks are swapped out for identities. Does the order of multiplication matter?

[QUESTION 12.]{.underline} As a follow-up to the previous Question, try to guess at the determinant of $$C = \begin{bmatrix} 0 & B \\ A & 0 \end{bmatrix}$$ Students may be tempted to say it is $- (\det A)(\det B)$, without paying attention to the delicacies of signs. This is a good opportunity to teach care and precision in reasoning. A few examples may suffice to lead to the correct pattern.

[QUESTION 13.]{.underline} Consider the *Vandermonde* matrix $V$ on $n$ variables $x_{1},\ldots,\ x_{n}$ whose entries are given by $V_{ij} = x_{i}^{j - 1}$. Write out an example in the case $n = 3$ using $x,\ y,\ z$ variables. These matrices are useful in digital signal processing, interpolation, number theory, and much more. Here is a fact about their determinants: $$\det V = \ \prod_{1 \leq i < j \leq n}{(x_{j} - x_{i})}$$ (Recall that the $\Pi$ means *product*...) Can you derive this result in the $n = 3$ case?
Which method(s) of computation will work best? If students try minor expansion, go with it and see how the factorization problem presents itself. Row reduction is also not without difficulties. This is a good example of multiple pathways to compute a difficult determinant. For a more advanced student, it offers a springboard to an induction proof.

### ASSESSMENT PROBLEMS

[PROBLEM 1.]{.underline} Consider the following pair of vectors in $\mathbb{R}^{2}$ $$\mathbf{a} = 4\widehat{i} + 6\widehat{j}\ \ \ \ :\ \ \ \ \mathbf{b} = 3\widehat{i} - 2\widehat{j}$$

A\) These form a basis for $\mathbb{R}^{2}$. Is this basis orthogonal? orthonormal? neither?

B\) Compute the coefficients of the vector $\mathbf{v} = 5\widehat{i} + 7\widehat{j}$ in this $\left( \mathbf{a},\mathbf{b} \right)$ basis using matrices and vectors. Identify your final answer as $a$- and $b$-coordinates.

C\) What are the coefficients of the vector $\mathbf{a} + \mathbf{b}$ in this $\left( \mathbf{a},\mathbf{b} \right)$ basis?

[PROBLEM 2.]{.underline} Consider the following three linear transformations of the plane, represented as matrices: $$A = \begin{bmatrix} 1/2 & 0 \\ 0 & - 1 \end{bmatrix}\ \ :\ \ B = \begin{bmatrix} 1 & 0 \\ - 1 & 1 \end{bmatrix}\ \ :\ \ C = \begin{bmatrix} 0 & - 2 \\ 2 & 0 \end{bmatrix}\ $$

A\) Describe in words what the linear transformation $B$ does to the plane.

B\) What is the matrix that represents "first do $A$ then do $B$ then do $C$". Please express your answer as a single matrix.

C\) Which vector is sent by $B$ to the vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$?

[PROBLEM 3.]{.underline} Consider the linear transformation $f:\mathbb{R}^{3} \rightarrow \mathbb{R}^{3}$ given by $f\left( \mathbf{x} \right) = A\mathbf{x}$, where $$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}\ \ \ \ :\ \ \ \ A = \begin{bmatrix} \frac{1}{2} & - \frac{\sqrt{3}}{2} & 0 \\ \frac{\sqrt{3}}{2} & \frac{1}{2} & 0 \\ 0 & 0 & - 2 \end{bmatrix}$$

A\) Where does this linear transformation send the point $x = 4,\ y = 2,\ z = 3$?

B\) *Describe in words:* what does this linear transformation do to the $z$-axis?

C\) *Describe in words:* what does this linear transformation do to the $(x,y)$-plane?

[PROBLEM 4.]{.underline} Let $A$ and $B$ denote linear transformations from the plane to itself. That is, $A:\mathbb{R}^{2} \rightarrow \mathbb{R}^{2}$ and $B:\mathbb{R}^{2} \rightarrow \mathbb{R}^{2}$. Assume that:

1. The matrix representing $A$ is $A = \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix}$.

2. The transformation $AB$ takes the $\widehat{i}$ vector to $\begin{pmatrix} 4 \\ 10 \end{pmatrix}$ and the $\widehat{j}$ vector to $\begin{pmatrix} - 2 \\ 1 \end{pmatrix}$.

A\) What is the matrix that represents the linear transformation $AB$?

B\) What is the matrix that represents the linear transformation $B$?

[PROBLEM 5.]{.underline} Consider the linear transformation $f:\mathbb{R}^{2} \rightarrow \mathbb{R}^{2}$ given by $f\left( \mathbf{x} \right) = A\mathbf{x}$, where $A$ is the product of three matrices: $$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}\ \ \ \ :\ \ \ A = \begin{bmatrix} 0 & - 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & - 1/2 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ - 1 & 0 \end{bmatrix}$$

A\) Describe *in words*, factor by factor, what this linear transformation does: *first* $f$ *does this..., then this..., and finally this...*

B\) Write out $A$ as a single 2-by-2 matrix:

C\) Draw a picture of what the linear transformation given by $A$ does to the image on the left...
[PROBLEM 6.]{.underline} Consider the linear transformation from $\mathbb{R}^{4}$ to $\mathbb{R}^{4}$: $$f\begin{pmatrix} x \\ y \\ u \\ v \end{pmatrix} = \begin{pmatrix} - 2y \\ 2x \\ - 6u + 4v \\ 3u - 2v \end{pmatrix}$$ A\) What matrix $A$ represents this linear transformation? That is, for what matrix $A$ is $f\left( \mathbf{x} \right) = A\mathbf{x}$, where $\mathbf{x} = (x\ \ \ y\ \ \ u\ \ \ v)^{T}$ ? B\) Do the columns of $A$ form an orthogonal basis? C\) Describe in words what the linear transformation above does to the $(x,y)$ plane. [PROBLEM 7.]{.underline} Consider the following pair of vectors in $\mathbb{R}^{2}$ $$\mathbf{u} = 5\widehat{i} - 7\widehat{j}\ \ \ \ :\ \ \ \ \mathbf{v} = 4\widehat{i} - 6\widehat{j}$$ A\) These form a basis for $\mathbb{R}^{2}$. Is this basis orthogonal? orthonormal? neither? B\) Compute the coefficients of the vector $\mathbf{x} = 9\widehat{i} - 5\widehat{j}$ in this $\left( \mathbf{u},\mathbf{v} \right)$ basis using matrices and vectors. Identify your final answer as $\mathbf{u}$- and $\mathbf{v}$-coordinates. C\) What are the coefficients of the vector $\mathbf{u} - \mathbf{v}$ in this $\left( \mathbf{u},\mathbf{v} \right)$ basis? [PROBLEM 8.]{.underline} Consider the linear transformation from $\mathbb{R}^{4}$ to $\mathbb{R}^{4}$ given by: $$f\begin{pmatrix} x \\ y \\ u \\ v \end{pmatrix} = \begin{pmatrix} Cx + 2Cy - 2u \\ 3y + 4u - 3Cv \\ - u + Cv \\ 2u - 6v \end{pmatrix}$$ where $C$ is an unknown constant. A\) What matrix $A$ represents this linear transformation? That is, for what matrix $A$ is $f\left( \mathbf{x} \right) = A\mathbf{x}$, where $\mathbf{x} = (x\ \ \ y\ \ \ u\ \ \ v)^{T}$ ? B\) For which value(s) of $C$ is $A$ invertible? Use the determinant to answer. [PROBLEM 9.]{.underline} Compute the determinants of the following matrices, using any method you wish. A\) $\begin{bmatrix} 0 & - 9 & 5 \\ - 1 & 2 & 17 \\ 0 & 3 & 0 \end{bmatrix}$ B) $\begin{bmatrix} 1 & 3 & 0 & 0 \\ - 4 & - 8 & 0 & 0 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & 6 & 5 \end{bmatrix}$ C) $\begin{bmatrix} 2 & - 4 & - 9 & 15 \\ 0 & 1 & 2 & - 18 \\ 4 & 0 & 4 & 7 \\ 1 & 0 & 1 & 3 \end{bmatrix}$ [PROBLEM 10.]{.underline} Consider the following matrices: $$A = \begin{bmatrix} - 4 & 5 & 0 & 0 & 0 \\ 1 & - 2 & 0 & 0 & 0 \\ 0 & 0 & 1 & 4 & 0 \\ 0 & 0 & 2 & 3 & 0 \\ 0 & 0 & 0 & 0 & - 1 \end{bmatrix}\ \ \ \ \ \ \ :\ \ \ \ B = \begin{bmatrix} 7 & 6 & 0 & 0 & 0 \\ 6 & 5 & 0 & 0 & 0 \\ 0 & 0 & 2 & 8 & 0 \\ 0 & 0 & 1 & 5 & 0 \\ 0 & 0 & 0 & 0 & 3 \end{bmatrix}$$ Compute: A) $det(AB)$ ; and B) $\det\left( B^{- 1} \right)$. [PROBLEM 11.]{.underline} Consider the following matrices: $$A = \begin{bmatrix} 0 & 1 \\ 2 & 0 \\ 3 & - 4 \end{bmatrix}\ \ \ :\ \ B = \begin{bmatrix} 3 & 2 & - 9 \\ 0 & 0 & 4 \\ 6 & 3 & 17 \end{bmatrix}\ \ \ :\ \ C = \begin{bmatrix} - 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}\ \ \ :\ \ D = \ \begin{bmatrix} 3 & 0 & 5 \\ 2 & - 1 & 0 \end{bmatrix}$$ Compute/simplify the following determinants, if possible: if not, explain why it's not possible. A\) $\det{(B)}$ ; B) $\det{(C^{3})}$ ; C) $\det{(CA)}$ ; D) $\det{(DA)}$ [PROBLEM 12.]{.underline} Consider the following matrices: $$A = \begin{bmatrix} - 1 & - 1 \\ 2 & 0 \\ 3 & 1 \end{bmatrix}\ :B = \begin{bmatrix} - 1 & 0 & 0 \\ 0 & 1 & - 6 \\ 0 & - 2 & 9 \end{bmatrix}\ :C = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 5 \\ 0 & 4 & 8 \end{bmatrix}\ :D = \ \begin{bmatrix} 1 & 2 & - 3 \\ 0 & 3 & 1 \end{bmatrix}$$ Compute/simplify the determinants of the following inverses, if possible: if not, explain why it's not possible. 
A\) $det(C^{- 1})$ ; B) $det(BC)$ ; C) $det(DA)$ ; D) $det(BA)$ [PROBLEM 13.]{.underline} Compute the determinants of the following matrices, using any method you wish. A\) $\begin{bmatrix} 1 & - 5 & 3 \\ - 2 & 19 & 7 \\ 0 & 2 & 0 \end{bmatrix}$ B) $\begin{bmatrix} 2 & 4 & 0 & 0 \\ 5 & 8 & 0 & 0 \\ 0 & 0 & - 4 & 5 \\ 0 & 0 & 6 & - 9 \end{bmatrix}$ C) $\begin{bmatrix} 2 & 4 & 7 & 19 & - 9 \\ 0 & 5 & 5 & 12 & 8 \\ 0 & 0 & - 1 & 0 & 7 \\ 0 & 0 & 0 & - 5 & 7 \\ 0 & 0 & 0 & 1 & 3 \end{bmatrix}$ [PROBLEM 14.]{.underline} Consider the following matrices $A$ and $B$: $$A\ = \ \begin{bmatrix} 1 & - 2 & 4 & 3 \\ 2 & - 1 & 0 & 5 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & - 2 & 1 \end{bmatrix}\ \ \ \ \ \ B\ = \ \begin{bmatrix} 1 & 2 & 0 & 0 \\ - 2 & - 1 & 0 & 0 \\ 4 & 0 & 1 & - 2 \\ 3 & 5 & 3 & 1 \end{bmatrix}\ \ \ \ \ \ $$ Compute the determinant of the matrix $AB$. [PROBLEM 15.]{.underline} Consider the matrix $A$ expressed as a product of two other matrices: $$A = \begin{bmatrix} - 1 & - 4 & 2 & 3 \\ 0 & 3 & 9 & - 8 \\ 0 & 0 & 2 & 7 \\ 0 & 0 & 0 & 5 \end{bmatrix}\begin{bmatrix} 2 & 0 & 0 & 0 \\ 13 & 1 & 0 & 0 \\ 6 & 3 & - 3 & 0 \\ - 17 & - 7 & 4 & 1 \end{bmatrix}$$ A\) Explain: why is $A$ an invertible matrix? B\) What is the determinant of $A^{- 1}$? [PROBLEM 16.]{.underline} Use row reduction to compute the determinant of this matrix: $$\begin{bmatrix} 1 & - 2 & 8 & 19 & - 5 & 11 \\ 3 & - 1 & 3 & 5 & 10 & - 9 \\ 0 & 0 & 0 & 0 & - 3 & 2 \\ 0 & 0 & 0 & 3 & - 5 & 17 \\ 0 & 0 & 1 & - 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 3 \end{bmatrix}$$ [PROBLEM 17.]{.underline} Given the following facts about 3-by-3 matrices $A$ and $B$: 1) $AB = BA$ 2) $\det A^{2} = 4$ 3) $\det B^{2} = 9$ 4) $\det(A + B) = 5$ 5) $\det(A - B) = - 7$ Compute the following: A) $\det A$ ; B) $\det B\ $ ; C) $\det{(A^{2} - B^{2})}$ [PROBLEM 18.]{.underline} Assume the following matrix has determinant equal to 7. 
$$\begin{bmatrix} a_{1} & a_{2} & a_{3} \\ b_{1} & b_{2} & b_{3} \\ c_{1} & c_{2} & c_{3} \end{bmatrix}$$

Compute the determinants of the following matrices:

$$A = \begin{bmatrix} c_{1} & c_{2} & c_{3} \\ 2c_{1} + b_{1} & 2c_{2} + b_{2} & 2c_{3} + b_{3} \\ - 3a_{1} & - 3a_{2} & - 3a_{3} \end{bmatrix}\ \ :\ \ B = \begin{bmatrix} - a_{1} & b_{1} + c_{1} & b_{1} \\ - a_{2} & b_{2} + c_{2} & b_{2} \\ - a_{3} & b_{3} + c_{3} & b_{3} \end{bmatrix}$$

$$C = \begin{bmatrix} a_{1} & a_{2} & 0 & a_{3} & 0 \\ 0 & 0 & - 4 & 0 & 0 \\ b_{1} & b_{2} & 0 & b_{3} & 0 \\ c_{1} & c_{2} & 0 & c_{3} & 0 \\ 0 & 0 & 0 & 0 & 3 \end{bmatrix}$$

### ANSWERS & HINTS

[PROBLEM 1.]{.underline} A) orthogonal ; B) $\mathbf{v} = \frac{31}{26}\mathbf{a} + \frac{1}{13}\mathbf{b}$ ; C) $(1,1)$

[PROBLEM 2.]{.underline} B) $CBA = \begin{bmatrix} 1 & 2 \\ 1 & 0 \end{bmatrix}$ ; C) $\left( \begin{array}{r} 1 \\ 2 \end{array} \right)$

[PROBLEM 3.]{.underline} A) ; B)

[PROBLEM 4.]{.underline} A) $AB = \begin{bmatrix} 4 & - 2 \\ 10 & 1 \end{bmatrix}$ ; B) $B = \begin{bmatrix} 1 & - 7/2 \\ 2 & 5 \end{bmatrix}$

[PROBLEM 5.]{.underline} B) $\begin{bmatrix} - 1/2 & 0 \\ 0 & 2 \end{bmatrix}$

[PROBLEM 6.]{.underline} A) $A = \ \begin{bmatrix} 0 & - 2 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 0 & 0 & - 6 & 4 \\ 0 & 0 & 3 & - 2 \end{bmatrix}$ ; B) nope ; C) counterclockwise rotation by $\pi/2$ and rescale by $2$

[PROBLEM 7.]{.underline} A) neither ; B) $\mathbf{x} = 17\mathbf{u} - 19\mathbf{v}$ ; C) $(1, - 1)$

[PROBLEM 8.]{.underline} A) $A = \ \begin{bmatrix} C & 2C & - 2 & 0 \\ 0 & 3 & 4 & - 3C \\ 0 & 0 & - 1 & C \\ 0 & 0 & 2 & - 6 \end{bmatrix}$ ; B) $C \neq 0,\ 3$

[PROBLEM 9.]{.underline} A) $- 15$ ; B) $- 12$ ; C) $15$

[PROBLEM 10.]{.underline} A) $- 90$ ; B) $- 1/6$

[PROBLEM 11.]{.underline} A) $12$ ; B) $- 8$ ; C) nope ; D) $- 4$

[PROBLEM 12.]{.underline} A) $- 1/12$ ; B) $- 36$ ; C) $30$ ; D) nope

[PROBLEM 13.]{.underline} A) $- 26$ ; B) $- 24$ ; C) $220$

[PROBLEM 14.]{.underline} $\det(AB) = \left( \det A \right)\left( \det B \right) = (21)(21) = 441$

[PROBLEM 15.]{.underline} A) $\det A = ( - 30)( - 6) = 180 \neq 0$ ; B) $\det\left( A^{- 1} \right) = \left( \det A \right)^{- 1} = 1/180$

[PROBLEM 16.]{.underline} $\det = 135$

[PROBLEM 17.]{.underline} A) $\pm 2$ ; B) $\pm 3$ ; C) $- 35$

[PROBLEM 18.]{.underline} A) $\det A = 21$ ; B) $\det B = 7$ ; C) $\det C = 84$

# VOLUME II : DERIVATIVES

## Week 5 : The Derivative

### OUTLINE

MATERIALS: Calculus BLUE : Vol 2 : Chapters 1-4

TOPICS:

- Multivariate functions and their applications
- Partial derivatives: computation and interpretation
- The derivative as a matrix of partials
- The derivative as a linear transformation on vectors of rates of change
- Sensitivity of input-output pairs based on partial derivatives
- Definition of the derivative
- The derivative of the polar coordinate transformation
- Continuous but non-differentiable functions
- The derivative as a 1^st^ order term in a Taylor expansion

LEARNING OBJECTIVES:

- Manipulate functions having multiple inputs and outputs
- Compute the partial derivatives of a function
- Compute the derivative as a matrix
- Evaluate the derivative at different inputs
- Discern number of inputs and outputs of a function based on derivative
- Use the derivative to transform vectors of rates of change of inputs
- Use the derivative to discern sensitivities of inputs/outputs

### PRIMER

This is the first week in which we can be said to be doing calculus with truly multivariate functions with multiple inputs and multiple outputs.
Our work with matrices as linear transformations has prepared us for this moment, yielding a plentiful source of functions $A:\mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ of a simple (linear) nature.

PARTIAL DERIVATIVES. For a function $f:\mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ of $n$ variables $\mathbf{x} \in \mathbb{R}^{n}$, we can consider what happens to the $m$ outputs $f_{1},\ f_{2},\ \ldots,\ f_{m}$ if only one of the inputs, $x_{j}$, is varied. If we restrict attention to the $i$^th^ output $f_{i}$, keeping every input variable but $x_{j}$ fixed, then $f$ is in the familiar form from single-variable calculus. The derivative of the $i$^th^ output $f_{i}$ with respect to the $j$^th^ input $x_{j}$ is the *partial derivative* $\partial f_{i}/\partial x_{j}$. The differentiation operator with respect to $x_{j}$ holding all other inputs constant is denoted $\partial/\partial x_{j}$.

The collection of all partial derivatives of a function $f$ can be unwieldy, depending on the number of inputs and outputs. Our strategy for managing all this data is, at first, to use a matrix as a convenient data structure. This matrix, often called the *Jacobian*, will henceforth be called simply *the derivative*. It is denoted with square brackets to reinforce its matrix-like nature:

$$\lbrack Df\rbrack = \left\lbrack \frac{\partial f_{i}}{\partial x_{j}} \right\rbrack_{ij}$$

It is important to remember that the columns of the derivative correspond to the inputs of $f$ and the rows correspond to the outputs of $f$: a little practice at this will pay off in this and future weeks. As in single variable calculus, there is an important difference between the derivative evaluated at a particular point and the un-evaluated derivative, whose entries are functions of the input variables. The derivative evaluated at a point is a matrix whose entries are numerical. These numbers -- the partial derivatives -- can be interpreted as *sensitivities* of input-output pairs: the sensitivity of the $i$^th^ output with respect to the $j$^th^ input. A partial derivative is not itself a rate of change or an actual change in output. Like a slope, the partial derivative is a ratio of output-to-input rates of change, and, like slope, the sign matters greatly, indicating positive or negative correlation between input and output. As always, changing the evaluation point of the derivative can change the entries dramatically (but hopefully continuously).

THE DERIVATIVE. Why is the derivative best presented as a matrix? Is this simply a convenient data structure or is there something deeper at work? The derivative is more than a matrix: *it is a linear transformation*. When evaluated at a fixed input point $\mathbf{a}$, the derivative is a linear transformation $\lbrack Df\rbrack_{\mathbf{a}}$ which takes a vector of rates of change of inputs to $f$ at $\mathbf{a}$ to a vector of rates of change of the outputs of $f$ at the point $f(\mathbf{a})$. This is the first deep idea in multivariable calculus, and it takes some effort to grasp. Thinking in terms of a parametrized curve or surface can be helpful (*cf.* Week 2 and the discussion problems below), as is working with a simple example, such as polar coordinates.

The true power of the derivative as a linear transformation lies in its ability to handle very large, complex sets of inputs and outputs. Given a derivative at a point, call this matrix $A = \lbrack Df\rbrack_{\mathbf{a}}$, and let $\mathbf{h}$ be a vector of rates of change of inputs.
The vector of rates of change of outputs is easily computed as the product $\mathbf{b} = A\mathbf{h}$. If, on the other hand, a vector $\mathbf{b}$ of rates of change of outputs is desired, with the goal being to find the precise rates of change of inputs that effect this, then we are again faced with a linear system: solve $A\mathbf{h} = \mathbf{b}$ for $\mathbf{h}$. All the possible complexities -- does a solution exist and is it unique -- are a solved problem, thanks to the work done in Weeks 3-4. One may wonder what other results from matrix algebra will be useful to us in light of derivatives as linear transformations. What does matrix multiplication or the determinant mean? We will return to this question next week.

THE DEFINITION. The derivative $\lbrack Df\rbrack$ has thus far been defined solely in terms of (computable and interpretable) partial derivatives. A more formal definition is necessary and proper. The derivative of $f$ at a point $\mathbf{a}$ is the linear transformation $\lbrack Df\rbrack_{\mathbf{a}}$ which acts on vectors $\mathbf{h}$ such that

$$\lim_{\left| \mathbf{h} \right| \rightarrow 0^{+}}\frac{f\left( \mathbf{a} + \mathbf{h} \right) - f\left( \mathbf{a} \right) - \lbrack Df\rbrack_{\mathbf{a}}\mathbf{h}}{|\mathbf{h}|}\ \ = \ \mathbf{0}\ ,$$

where the right hand side is the zero vector. This is a limit not as the vector $\mathbf{h}$ goes to zero, but rather as the length of $\mathbf{h}$ goes to zero. We cannot use the "usual" definition of the derivative since we cannot divide by a vector. Details of the definition are not essential to our story, though proper definitions are the glory of Mathematics. If it seems confusing now, it may be a good idea to revisit this after learning about Taylor expansion in Week 7.

One way in which this more formal definition is helpful appears when working with functions whose partial derivatives are simply too numerous to handle. For example, the function $S$ which takes an $n$-by-$n$ matrix $A$ to its square $S(A) = A^{2}$ can be thought of as a function $S:\mathbb{R}^{n^{2}} \rightarrow \mathbb{R}^{n^{2}}$ whose partial derivatives are not readily computed except in the case of small values of $n$. Nevertheless, by stacking the (rates of) changes to inputs as a matrix $H$, one has via a Taylor-expansion-type argument that

$$S(A + H) = (A + H)^{2} = S(A) + AH + HA + H^{2}$$

Thus, by the formal definition, $\lbrack DS\rbrack_{A}H = AH + HA$: *cf.* the single-variable calculus result that $\left( x^{2} \right)'\left. \ \right|_{a} = 2a$.

### DISCUSSION

[QUESTION 1.]{.underline} Compute the following partial derivatives.

$$F = \frac{x^{2}\sqrt{y^{3}}}{z^{4}}\ \ \ \ \ \ \ \Rightarrow \ \ \ \ \ \ \ \frac{\partial F}{\partial x} = \cdots\ \ \ \ \ \ :\ \ \ \ \ \ \frac{\partial F}{\partial y} = \cdots\ \ \ \ \ \ :\ \ \ \ \ \ \frac{\partial F}{\partial z} = \cdots\ $$

Start off by computing enough examples that students feel comfortable with partial derivatives. This will not take long. It's good to say out loud the first few times "Ok, if everything but $x$ is a constant..."

[QUESTION 2.]{.underline} Recall from Week 1 working with quadratic surfaces. Compute the derivative -- as a matrix -- of the following parametrization of a paraboloid:

$$z = x^{2} + y^{2}\ \ \ \ \ \ \ \Rightarrow \ \ \ \ \ \ \ \ G\begin{pmatrix} s \\ t \end{pmatrix} = \begin{pmatrix} s \\ t \\ s^{2} + t^{2} \end{pmatrix}$$

This is simple -- for now -- but will be important in understanding why $\lbrack DG\rbrack$ is more than a matrix.
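If a laptop is at hand, a one-line symbolic check can settle convention debates at the board. The sketch below is ours, not part of the Calculus BLUE materials; it assumes Python with the sympy library, and the names `G` and `DG` are ours. Note that the code forces a choice: the input list `[s, t]` fixes which column is which.

```python
# A minimal sympy sketch (ours, not from the videotext): compute [DG]
# for the paraboloid parametrization of Question 2 and inspect its shape.
import sympy as sp

s, t = sp.symbols('s t')
G = sp.Matrix([s, t, s**2 + t**2])   # outputs (x, y, z) as a column
DG = G.jacobian([s, t])              # rows = outputs, columns = inputs (s, t)
print(DG)                            # Matrix([[1, 0], [0, 1], [2*s, 2*t]])
```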
In this case (where the inputs are not $x$ and $y$), does it matter what order we use for the inputs? What determines which input is first? This is a good time to emphasize writing the inputs and outputs as ordered lists of variables. If you want to permute, you can, but consistency is vital.

[QUESTION 3.]{.underline} Thinking in terms of our preparatory work in Week 2 may be helpful. Explain the velocity vector of a parametrized curve $\gamma(t)$ in terms of the definition of a derivative that we have learned this week.

Emphasize the relationship between the velocity vector $\gamma'$ and the derivative $\lbrack D\gamma\rbrack$, both of which are column vectors. Students with a physics background may find it helpful to think of the velocity vector as a linear transformation -- give me a $\Delta t$ and get a $\Delta\mathbf{x}$. Be sure to remind students that the derivative can be very different (or even zero) depending on where it is evaluated, just as in single variable calculus.

[QUESTION 4.]{.underline} Why is it that a derivative is a linear transformation? Consider a parametrized surface in 3-D, of the form $f:\mathbb{R}^{2} \rightarrow \mathbb{R}^{3}$. (Question 2 may still be on the board...) Assume input parameters $s$ and $t$ with outputs $x,\ y,\ z$ each depending on the two parameters. What happens if, at a particular input $(s_{0},\ t_{0})$, we fix one of the parameters and let the other change? What does the output look like?

This is a good excuse to draw a picture of a surface in 3-D on the board. Using two colors for the image of the $s$ axis and the $t$ axis is helpful. Try to get students to think in terms of velocity vectors. Once the axes -- the partial derivatives -- are grasped, then ask what happens if both inputs are increased? What if one is increased and the other decreased? Be sure to recall the notion of a basis from Week 4. We will revisit this idea of a tangent plane in Week 7.

[QUESTION 5.]{.underline} Consider the linear function

$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} x + 2y \\ 3x + 5y \end{pmatrix} = \begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$$

Compute all the partials $\partial v/\partial x$, etc. Then, invert the transformation and compute the partials of the inverses. Is $\partial x/\partial v$ equal to the reciprocal of $\partial v/\partial x$? Compare with the single-variable case of $v(x) = 3x$. What is different in this case?

This foreshadows the Chain Rule and Inverse Function Theorem of Week 6, but, more importantly, this stresses the need to understand the entire derivative, and not simply keep a list of partials.

[QUESTION 6.]{.underline} Think back to single-variable calculus... What is the derivative of the affine (linear + constant) function $y = ax + b$? Of course, the derivative is a constant. Now, what is the derivative of the vector-valued function $\mathbf{y} = A\mathbf{x} + \mathbf{b}$?

Clever students will guess at the correct answer. It is worth doing an explicit example with, say, a randomly-generated matrix $A$ of size 2-by-3, just to emphasize the conventions of rows/columns being what they are. This problem also foreshadows the Taylor perspective of Week 7.

[QUESTION 7.]{.underline} The following *Cobb-Douglas* model is a classic in basic Economics. One models production, $P$, as a function of materials, $M$, and labor, $L$, via

$$P = \kappa M^{\alpha}L^{\beta},$$

where $\kappa > 0$ and $0 < \alpha,\ \beta < 1$ are constants and $\alpha + \beta = 1$.
If the investment in labor is increased and the investment in materials is decreased at an equal rate, what is the impact on production?

One may wish to begin with a discussion of the model: why these fractional powers? The crucial idea is that if one doubles $M$ and $L$, one should double $P$ (think: clone the factory). This is not important to this week's material, but it is a good review of dimensional-analytic thinking. In this -- and perhaps other problems -- students who have seen some multivariable calculus before will want to think of everything as a function of time, $t$, and then use Chain Rule arguments to get a rate of change. This is a good exercise: do it both ways and see how the matrix version compares to the old-fashioned version. The matrix approach seems less intuitive and more cumbersome, to be fair. However -- and this is crucial -- ask students to consider what happens in the case of a model that is not a textbook cartoon. What happens when there are a hundred products being built by a dozen overlapping teams of workers managing a supply chain of a thousand different components? How should investments be allocated among the different labor teams and in which input material streams? How does one measure rates of change of productivity in the case of a hundred outputs? Not everything can be reduced to a function of a single (time) parameter.

[QUESTION 8.]{.underline} The following function gives the midpoint deflection $u$ of a beam of length $L$ supported at the endpoints, with a cross section of width $w$ and height $h$, bearing a load of weight $F$:

$$u = \frac{FL^{3}}{4Ewh^{3}}$$

Here $E$ is a constant that depends on the material from which the beam is made. Compute the derivative $\lbrack Du\rbrack$ and evaluate at $L = 4,\ F = 100,\ E = 1,\ w = 3,\ h = 2$. At these values, what happens to the deflection if each input is increased at a unit rate? Does the beam deflect more or less?

This problem is perhaps too much in-the-weeds for non-engineering students, but this function has more than two inputs and one cannot automatically guess the correct answer.

[QUESTION 9.]{.underline} Consider a function $f$ such that, at a particular point,

$$\lbrack Df\rbrack\begin{pmatrix} 1 \\ - 1 \end{pmatrix} = \begin{pmatrix} 3 \\ - 2 \end{pmatrix}$$

Start off by asking how many inputs and outputs $f$ has.

What happens if inputs change at rates $\mathbf{h} = ( - 2,2)^{T}$? *This is doable.*\
What if $\mathbf{h} = (3,3)^{T}$? *This is not doable, but why not?*\
What if you also know that $\lbrack Df\rbrack\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ - 4 \end{pmatrix}$? Can you answer the previous?

[QUESTION 10.]{.underline} The derivative of a function $f$ evaluated at an input $\mathbf{a}$ equals

$$\lbrack Df\rbrack_{\mathbf{a}}\ = \ \ \begin{bmatrix} - 1 & 5 & - 3 \\ 0 & 1 & 4 \\ 1 & - 1 & 7 \\ 2 & 0 & - 6 \end{bmatrix}$$

How many inputs and outputs does $f$ have? Which input-output pair has the greatest sensitivity at this point? If all the inputs are decreasing at the same rate, which outputs are increasing? If the first and second inputs are increasing at the same rate, with the third input unchanged, which output is most sensitive to the change?

There are many similar questions one can ask with a randomly generated matrix.

[QUESTION 11.]{.underline} The example of a function acting on square matrices from the videotext was... intimidating. But let's keep going!
What is the derivative of the functions on square matrices given by $f(A) = A^{3}$ or even $f(A) = A^{- 1}$ (assuming invertibility)? Recall, you do not want to compute partials here!

This is intimidating, but it emphasizes the definition of the derivative, and it gets at the Taylor expansion approach. The cube function seems easy -- this is just the Binomial Theorem, right?

$$(A + H)^{3} = A^{3} + 3A^{2}H + 3AH^{2} + H^{3}$$

Oops... What have we forgotten here? AH-HA! Commutativity should not be assumed... The inverse is harder: use the geometric series to get the 1^st^-order term as $- A^{- 1}HA^{- 1}$, since

$$(A + H)^{- 1} = \left( A\left( I - \left( - A^{- 1}H \right) \right) \right)^{- 1} = \left( I - \left( - A^{- 1}H \right) \right)^{- 1}A^{- 1} = \left( I - A^{- 1}H + O\left( H^{2} \right) \right)A^{- 1} = A^{- 1} - A^{- 1}HA^{- 1} + O\left( H^{2} \right)$$

What elementary single-variable calculus result does this remind you of?

### ASSESSMENT PROBLEMS

[PROBLEM 1.]{.underline} Consider the following function:

$$f\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} u^{2}v^{- 3}w \\ 2u - 5w \\ uv - vw \end{pmatrix}$$

A\) Compute the derivative $\lbrack Df\rbrack$, showing work. B\) Evaluate this derivative at the point where $u = 1,\ v = - 1,\ w = 2$. C\) If, at this point (using your result from part (B)), all the inputs are *decreasing* at the same rate, which output is increasing the most?

[PROBLEM 2.]{.underline} At a particular point, a function $f$ has derivative

$$\lbrack Df\rbrack_{\mathbf{a}} = \begin{bmatrix} 1 & 0 & - 1 \\ 4 & - 8 & 2 \\ 2 & 13 & - 1 \\ - 1 & 7 & 0 \\ 3 & 0 & - 3 \end{bmatrix}\ $$

A\) How many inputs and outputs does $f$ have? B\) Assume, at this point, the first input is increasing at a unit rate; the last input is decreasing at twice this unit rate; and all other inputs are not changing. Then, at what rates are the outputs changing? C\) If, at this point, the inputs are changing at a rate such that the last output is not changing (*i.e.*, the rate of change of the last output is zero), then what can you say about the rates of change of the inputs?

[PROBLEM 3.]{.underline} Consider the following function:

$$f\begin{pmatrix} x \\ y \\ s \\ t \end{pmatrix} = \begin{pmatrix} xy^{2} + 5t \\ xs - yt \\ 2x - 3y + s^{2}t \end{pmatrix}$$

A\) Compute the derivative $\lbrack Df\rbrack$. B\) Evaluate this derivative at the point where $x = 0,\ y = - 1,\ s = 1,\ t = 2$. C\) If, at this point (using your result from part (B)), the $s$ and $t$ inputs are increasing at a unit rate and the $x$ and $y$ inputs are decreasing at the same unit rate, which output is changing the least?

[PROBLEM 4.]{.underline} Consider the following function:

$$f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} (1 + 2x + 3y)^{- 1} \\ e^{2y - 5z} \\ (x + 2)(z - 4) \end{pmatrix}$$

A\) Compute the derivative $\lbrack Df\rbrack$, showing work. B\) Evaluate this derivative at the origin. C\) At the origin, which output is most sensitive with respect to which input? (That is, which input-output pair experiences the largest changes, when all other inputs are held constant.)

[PROBLEM 5.]{.underline} Consider the following square matrix, depending on variables $x$ and $y$:

$$A = \begin{bmatrix} x & 1 & 7x \\ 0 & 2 & y \\ 0 & x & 3y \end{bmatrix}$$

A\) Compute and simplify the determinant $\det{(A)}$, showing work. B\) Define "$sum(A)$" to be the sum of all (nine) entries of the matrix $A$. Compute and simplify this $sum(A)$, showing work.
C\) Write out explicitly the function $f:\mathbb{R}^{2} \rightarrow \mathbb{R}^{2}$ defined by $$f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \det(A) \\ sum(A) \end{pmatrix}\ ,$$ using your results from above. Compute the derivative $\lbrack Df\rbrack$, showing work. [PROBLEM 6.]{.underline} Consider the three functions\ $$f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} (x - y)^{3} \\ (z - 1)(x + 5) \end{pmatrix}\ \ \ \ \ :\ \ \ \ \ g\begin{pmatrix} s \\ t \end{pmatrix} = \begin{pmatrix} 5s - t \\ \sin{(2t - 3s)} \\ e^{- s} - e^{- 2t} \end{pmatrix}\ \ \ \ \ \ :\ \ \ \ \ h = f \circ g$$ *The third,* $h = f \circ g$*, is the composition of* $f$ *with* $g$*.* A\) Compute the derivatives of $f$ and $g$, evaluated at the origin. B\) Which of the following compositions are legal? $$f \circ h\ \ \ \ \ \ :\ \ \ \ \ g \circ h\ \ \ \ \ \ :\ \ \ \ \ h \circ f\ \ \ \ \ :\ \ \ \ \ h \circ g$$ C\) Are there any inputs at which the derivative of $g$ vanishes? [PROBLEM 7.]{.underline} Consider the following function: $$f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2z - \sqrt{x^{2} + y^{2}} \\ y - \ln{(z^{2} + x^{2})} \end{pmatrix}$$ A\) Compute the derivative $\lbrack Df\rbrack$, showing work. B\) Evaluate this derivative at the point where $x = 2,\ y = - 3,\ z = 0$. C\) If, at this point (using your result from part (B)), all the inputs are *decreasing* at the same rate, which output is increasing the most? [PROBLEM 8.]{.underline} Consider the following function: $$\begin{pmatrix} y_{1} \\ y_{2} \\ y_{3} \\ y_{4} \end{pmatrix} = f\left( \begin{array}{r} \begin{matrix} x_{1} \\ x_{2} \\ x_{3} \end{matrix} \\ x_{4} \end{array} \right) = \begin{pmatrix} x_{1}^{2} - x_{4}^{3} \\ x_{1}x_{2}x_{3} \\ x_{1} + 2x_{2} + 4x_{4} \\ x_{1} - x_{4} - x_{2}x_{3} \end{pmatrix}\ $$ A\) Compute the derivative $\lbrack Df\rbrack$. B\) Which partial derivatives of $f\ $vanish (are equal to zero)? Please list. C\) Evaluate $\lbrack Df\rbrack$ at the origin: how many entries of this matrix are nonzero? [PROBLEM 9.]{.underline} Consider the following function: $$f\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} e^{w} - 2u \\ u + v + w \\ u^{2}w \\ uv^{2} - 3w \end{pmatrix}$$ A\) Compute the derivative $\lbrack Df\rbrack$, showing work. B\) Evaluate this derivative at the point where $u = 1,\ v = 2,\ w = 0$. C\) If, at this point (using your result from part (B)), all the inputs are *increasing* at a unit rate, which output is increasing the most? [PROBLEM 10.]{.underline} Assume that a function $f$ has derivative $$\lbrack Df\rbrack = \begin{bmatrix} 2y & 2x \\ - 3 & 5y \end{bmatrix}\ $$ A\) How many inputs and outputs does $f$ have? B\) Evaluate this derivative at the point where $x = 2$ and $y = - 1$. C\) Assume, at some point $(x,y)$, the inputs are changing at rates $+ 2$ and $- 1$ and the **outputs** are changing at rates $- 2$ and $+ 4$. Write out an equation that expresses this, using vectors of rates of change and the derivative $\lbrack Df\rbrack$ as a linear transformation. D\) Using the equations from (C) above, solve for which point this is happening at. [PROBLEM 11.]{.underline} At a particular point, a function $f$ has derivative $$\lbrack Df\rbrack_{\mathbf{a}} = \begin{bmatrix} 1 & - 2 & 3 \\ 3 & 7 & 5 \\ - 2 & - 6 & - 1 \\ 4 & - 5 & 3 \\ 7 & 0 & - 6 \end{bmatrix}\ $$ A\) How many inputs and outputs does $f$ have? B\) Assume that at this point you have to change each input independently, either increasing or decreasing it at a fixed rate. 
If your goal is to increase all the outputs, how would you toggle the changes to the inputs?

[PROBLEM 12.]{.underline} Assume that at a particular input $\mathbf{a}$, a function $f$ has derivative

$$\lbrack Df\rbrack_{\mathbf{a}} = \begin{bmatrix} 1 & 2 & - 1 & - 5 & - 3 \\ 2 & - 7 & - 2 & 1 & 0 \\ - 3 & 4 & 2 & 3 & 8 \\ 0 & 2 & 1 & - 2 & 0 \\ 3 & 0 & 6 & 7 & 2 \\ 0 & - 7 & 0 & 0 & - 1 \end{bmatrix}\ $$

A\) How many inputs and outputs does $f$ have? B\) If the first three inputs are increasing at a unit rate and the remaining inputs are decreasing at twice this rate, then at what rate is the fourth output changing? Be sure to explain and show your work. C\) If all the inputs are decreasing at a unit rate, which of the outputs is increasing the most?

[PROBLEM 13.]{.underline} Assume that a function $f$ has derivative at a point $\mathbf{a}$ given by

$$\lbrack Df\rbrack_{\mathbf{a}} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 6 & - 3 \\ 0 & 3 & - 1 \end{bmatrix}\ $$

A\) If, at this point, the three inputs are changing at rates $+ 4,\ - 3,$ and $+ 2$ respectively; then, at what rates are the three outputs changing? B\) If, at this point, the three inputs are changing at unknown rates $c_{1},\ c_{2},$ and $c_{3}$ and the three outputs are changing at rates $- 6,\ - 3$, and $+ 1$ respectively; then, at what rates are the inputs changing?

[PROBLEM 14.]{.underline} Consider the parametrized surface in 3-D given by

$$F\begin{pmatrix} s \\ t \end{pmatrix} = \begin{pmatrix} s^{3} - 2t^{2} + 7 \\ (s - 1)(t - 2) \\ s^{2} - t^{2} \end{pmatrix}\begin{matrix} \leftarrow x \\ \leftarrow y \\ \leftarrow z \end{matrix}$$

A\) Compute the derivative of $F.$ B\) Evaluate the derivative of $F$ at $s = 1$ and $t = 2$. C\) Assume that you start at the point on the surface in which $s = 1$ and $t = 2$. Is it possible to increase both the $s$ and $t$ parameters at nonzero rates so that both the $y$- and $z$-components on the surface change at rate zero? If so, at what rates should these inputs be increased?

[PROBLEM 15.]{.underline} Assume that at a point $\mathbf{a}$, the derivative of a function $f$ equals

$$\lbrack Df\rbrack_{\mathbf{a}} = \begin{bmatrix} 2 & - 1 & 0 & 0 \\ - 3 & 1 & 0 & 0 \\ 0 & 0 & 3 & - 2 \\ 0 & 0 & - 5 & 4 \end{bmatrix}\ \ $$

A\) Which input-output pair has the highest sensitivity here? That is, which output varies the most with respect to which input? B\) If the 1^st^ input is changing at rate $+ 1$; the 2^nd^ at rate $+ 2$; the 3^rd^ at $+ 3$; and the 4^th^ at rate $+ 4$, then at what rates are the outputs changing? C\) If the outputs are all changing at a rate of $+ 1$, at what rates are the inputs changing?

[PROBLEM 16.]{.underline} Consider linear transformations $f$ and $g$ whose matrices are given by:

$$f\ :\begin{bmatrix} 1 & 3 & 0 & - 2 \\ - 1 & 0 & 1 & 0 \\ 7 & 0 & 1 & 5 \end{bmatrix}\ \ \ \ \ \ \ \ \ \ \ g\ :\begin{bmatrix} 2 & 0 & 1 \\ 0 & 2 & - 5 \\ 4 & - 1 & 0 \\ - 1 & 0 & 2 \end{bmatrix}$$

A\) How many inputs and outputs does $f$ have? B\) Which input-output pair of $g$ is most sensitive to change? C\) Compute the derivative of the composition $f \circ g$ at the origin.
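Answers to problems of this rate-of-change flavor can be spot-checked numerically before class. Below is a minimal Python/numpy sketch (ours, not part of the Calculus BLUE materials): it approximates a derivative matrix by central differences and then pushes a vector of input rates through it, using the function of Problem 1 as the test case. The rounding in the printout hides the small finite-difference error; the goal is only to confirm signs and magnitudes, not to replace the by-hand work the problems request.

```python
# A minimal numpy sketch (ours, not from the videotext): approximate [Df]
# at a point by central differences, then apply it to a vector of input rates.
import numpy as np

def f(x):
    u, v, w = x
    return np.array([u**2 * v**-3 * w, 2*u - 5*w, u*v - v*w])

def jacobian(f, a, eps=1e-6):
    a = np.asarray(a, dtype=float)
    cols = []
    for j in range(len(a)):
        e = np.zeros_like(a)
        e[j] = eps
        cols.append((f(a + e) - f(a - e)) / (2 * eps))  # vary only input j
    return np.column_stack(cols)  # rows = outputs, columns = inputs

Df = jacobian(f, [1.0, -1.0, 2.0])        # the point of Problem 1(B)
print(np.round(Df, 6))                    # compare against the hand computation
print(Df @ np.array([-1.0, -1.0, -1.0]))  # output rates: all inputs decreasing at a unit rate
```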

### ANSWERS & HINTS

[PROBLEM 1.]{.underline} A/B) $\begin{bmatrix} 2uv^{- 3}w & - 3u^{2}v^{- 4}w & u^{2}v^{- 3} \\ 2 & 0 & - 5 \\ v & u - w & - v \end{bmatrix} \Rightarrow \begin{bmatrix} - 4 & - 6 & - 1 \\ 2 & 0 & - 5 \\ - 1 & - 1 & 1 \end{bmatrix}$ ; C) 1^st^ output

[PROBLEM 2.]{.underline} A) 3 inputs, 5 outputs ; B) $\lbrack Df\rbrack_{a}\left( \begin{array}{r} 1 \\ 0 \\ - 2 \end{array} \right) = \left( \begin{array}{r} 3 \\ 0 \\ 4 \\ - 1 \\ 9 \end{array} \right)$ ; C) the rates of change of the 1^st^ and 3^rd^ inputs are equal

[PROBLEM 3.]{.underline} A/B) $\begin{bmatrix} y^{2} & 2xy & 0 & 5 \\ s & - t & x & - y \\ 2 & - 3 & 2st & s^{2} \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & 0 & 0 & 5 \\ 1 & - 2 & 0 & 1 \\ 2 & - 3 & 4 & 1 \end{bmatrix}$ ; C) 2^nd^ output

[PROBLEM 4.]{.underline} A/B) $\begin{bmatrix} - 2(1 + 2x + 3y)^{- 2} & - 3(1 + 2x + 3y)^{- 2} & 0 \\ 0 & 2e^{2y - 5z} & - 5e^{2y - 5z} \\ z - 4 & 0 & x + 2 \end{bmatrix} \Rightarrow \begin{bmatrix} - 2 & - 3 & 0 \\ 0 & 2 & - 5 \\ - 4 & 0 & 2 \end{bmatrix}$ ; C) the 2^nd^ output and 3^rd^ input

[PROBLEM 5.]{.underline} $f\left( \begin{array}{r} x \\ y \end{array} \right) = \left( \begin{array}{r} 6xy - x^{2}y \\ 9x + 4y + 3 \end{array} \right);\ \ \lbrack Df\rbrack = \begin{bmatrix} 6y - 2xy & 6x - x^{2} \\ 9 & 4 \end{bmatrix}$

[PROBLEM 6.]{.underline} A) $\lbrack Df\rbrack = \begin{bmatrix} 0 & 0 & 0 \\ - 1 & 0 & 5 \end{bmatrix},\ \lbrack Dg\rbrack = \begin{bmatrix} 5 & - 1 \\ - 3 & 2 \\ - 1 & 2 \end{bmatrix}$ ; B) $g \circ h$ and $h \circ f$ ; C) no

[PROBLEM 7.]{.underline} A) $\lbrack Df\rbrack = \begin{bmatrix} - x\left( x^{2} + y^{2} \right)^{- \frac{1}{2}} & - y\left( x^{2} + y^{2} \right)^{- \frac{1}{2}} & 2 \\ - 2x\left( x^{2} + z^{2} \right)^{- 1} & 1 & - 2z\left( x^{2} + z^{2} \right)^{- 1} \end{bmatrix}$ ; B) $\begin{bmatrix} - 2/\sqrt{13} & 3/\sqrt{13} & 2 \\ - 1 & 1 & 0 \end{bmatrix}$ ; C) $\left( \begin{array}{r} - 2 - 1/\sqrt{13} \\ 0 \end{array} \right)$

[PROBLEM 8.]{.underline} A) $\lbrack Df\rbrack = \begin{bmatrix} 2x_{1} & 0 & 0 & - 3x_{4}^{2} \\ x_{2}x_{3} & x_{1}x_{3} & x_{1}x_{2} & 0 \\ 1 & 2 & 0 & 4 \\ 1 & - x_{3} & - x_{2} & - 1 \end{bmatrix}$ ; C) $\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 2 & 0 & 4 \\ 1 & 0 & 0 & - 1 \end{bmatrix}$

[PROBLEM 9.]{.underline} A) $\lbrack Df\rbrack = \begin{bmatrix} - 2 & 0 & e^{w} \\ 1 & 1 & 1 \\ 2uw & 0 & u^{2} \\ v^{2} & 2uv & - 3 \end{bmatrix}$ ; B) $\begin{bmatrix} - 2 & 0 & 1 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \\ 4 & 4 & - 3 \end{bmatrix}$ ; C) 4^th^ output

[PROBLEM 10.]{.underline} A) 2 inputs, 2 outputs ; B) $\begin{bmatrix} - 2 & 4 \\ - 3 & - 5 \end{bmatrix}$ ; C) $\begin{bmatrix} 2y & 2x \\ - 3 & 5y \end{bmatrix}\left( \begin{array}{r} 2 \\ - 1 \end{array} \right) = \left( \begin{array}{r} - 2 \\ 4 \end{array} \right)$ ; D) $\left( \begin{array}{r} x \\ y \end{array} \right) = \left( \begin{array}{r} - 3 \\ - 2 \end{array} \right)$

[PROBLEM 11.]{.underline} A) 3 inputs, 5 outputs ; B) increase; decrease; increase

[PROBLEM 12.]{.underline} A) 5 inputs, 6 outputs ; B) $7$ ; C) 6^th^ output

[PROBLEM 13.]{.underline} A) $\left( \begin{array}{r} 8 \\ - 24 \\ - 11 \end{array} \right)$ ; B) $\left( \begin{array}{r} c_{1} \\ c_{2} \\ c_{3} \end{array} \right) = \begin{bmatrix} 1/2 & 0 & 0 \\ 0 & - 1/3 & 1 \\ 0 & - 1 & 2 \end{bmatrix}\left( \begin{array}{r} - 6 \\ - 3 \\ 1 \end{array} \right) = \left( \begin{array}{r} - 3 \\ 2 \\ 5 \end{array} \right)$

[PROBLEM 14.]{.underline} A/B) $\lbrack DF\rbrack = \begin{bmatrix} 3s^{2} & - 4t \\ t - 2 & s - 1 \\ 2s & - 2t \end{bmatrix} \Rightarrow \begin{bmatrix} 3 & - 8 \\ 0 & 0 \\ 2 & - 4 \end{bmatrix}$ ; C) $\left( \begin{array}{r} h_{s} \\ h_{t} \end{array} \right) = C\left( \begin{array}{r} 2 \\ 1 \end{array} \right)$ for any $C > 0$

[PROBLEM 15.]{.underline} A) 3^rd^ input, 4^th^ output ; B) $\lbrack Df\rbrack_{a}\left( \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \right) = \left( \begin{array}{r} 0 \\ - 1 \\ 1 \\ 1 \end{array} \right)$ ; C) $\left( \begin{array}{r} - 2 \\ - 5 \\ 3 \\ 4 \end{array} \right)$

[PROBLEM 16.]{.underline} A) 4 inputs, 3 outputs ; B) 2^nd^ output, 3^rd^ input ; C) as these are linear transformations, composition is matrix multiplication and

$$\left\lbrack D(f \circ g) \right\rbrack = \begin{bmatrix} 1 & 3 & 0 & - 2 \\ - 1 & 0 & 1 & 0 \\ 7 & 0 & 1 & 5 \end{bmatrix}\begin{bmatrix} 2 & 0 & 1 \\ 0 & 2 & - 5 \\ 4 & - 1 & 0 \\ - 1 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 6 & - 18 \\ 2 & - 1 & - 1 \\ 13 & - 1 & 17 \end{bmatrix}$$

## Week 6 : Differentiation

### OUTLINE

MATERIALS: Calculus BLUE : Vol 2 : Chapters 5-8

TOPICS:

- Differentiation as a linear operator
- The Chain Rule and composition
- Applications of the Chain Rule
- BONUS : the material derivative
- Inverse functions in multivariate setting
- The Inverse Rule for derivatives
- The Inverse Function Theorem and its uses
- BONUS : The Implicit Function Theorem and its uses

LEARNING OBJECTIVES:

- Use linearity to compute derivatives of linear combinations of functions
- Infer when functions can and cannot be composed
- Use the Chain Rule to compute derivatives of compositions
- Explain the idea of an inverse of a multivariate function
- Explain the difference between local and global invertibility
- Use the Inverse Function Theorem to determine local invertibility
- Use the Inverse Rule to determine the derivative of an inverse

### PRIMER

Recall from single-variable calculus the array of derivative rules which make computing derivatives of very complicated functions relatively procedural. This week, we recapitulate those rules and push out to new and deeper results.

THE RULES. The primal differentiation rule is that of linearity: differentiation is a *linear operator*. This is perhaps remembered from single variable calculus as the *Addition Rule*, though linearity entails a scalar-multiplication rule as well. For any scalar $c$ and compatibly-sized functions $f$ and $g$, $\left\lbrack D(f + cg) \right\rbrack = \lbrack Df\rbrack + c\lbrack Dg\rbrack$.

The most useful differentiation rule is the *Chain Rule* governing the composition of functions. Recall from single variable calculus that, given functions $g$ and $f$, the derivative of the composition $f \circ g$ is the product of the derivatives, evaluated at the correct points: $(f \circ g)_{a}' = f_{g(a)}'g_{a}'$. (If not immediately recalled, take the derivative of $\sin x^{2}$ to see what the hand remembers.) The Chain Rule for multivariate functions is the same. For compatibly sized functions $g:\mathbb{R}^{n} \rightarrow \mathbb{R}^{p}$ and $f:\mathbb{R}^{p} \rightarrow \mathbb{R}^{m}$, the composition $(f \circ g):\mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ has derivative

$$\left\lbrack D(f \circ g) \right\rbrack_{\mathbf{a}} = \lbrack Df\rbrack_{g\left( \mathbf{a} \right)}\lbrack Dg\rbrack_{\mathbf{a}}\ .$$

Matrix multiplication is what converts the derivatives of $f$ and $g$ into that of their composition. This is not a surprising result -- we saw in Week 4 that composition of linear transformations corresponds to multiplication of the matrices together (in the correct order!).
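For the skeptical, the claim is easy to test numerically. The sketch below is ours, not from the videotext; it reuses the central-difference Jacobian idea from Week 5 and picks an arbitrary smooth pair $g:\mathbb{R}^{2} \rightarrow \mathbb{R}^{3}$, $f:\mathbb{R}^{3} \rightarrow \mathbb{R}^{2}$ purely for illustration. Note that the order of the matrix product mirrors the order of composition: the matrix for $f$ multiplies on the left.

```python
# A minimal numpy sketch (ours): check [D(f∘g)]_a = [Df]_{g(a)} [Dg]_a
# at a point, with all Jacobians approximated by central differences.
import numpy as np

def g(x):                      # g : R^2 -> R^3 (an arbitrary smooth example)
    s, t = x
    return np.array([s * t, s - t**2, np.sin(s)])

def f(y):                      # f : R^3 -> R^2 (another arbitrary example)
    y1, y2, y3 = y
    return np.array([y1**2 + y3, y2 * y3])

def jacobian(F, a, eps=1e-6):
    a = np.asarray(a, dtype=float)
    cols = [(F(a + eps * e) - F(a - eps * e)) / (2 * eps) for e in np.eye(len(a))]
    return np.column_stack(cols)   # rows = outputs, columns = inputs

a = np.array([0.3, -0.7])
lhs = jacobian(lambda x: f(g(x)), a)      # derivative of the composition
rhs = jacobian(f, g(a)) @ jacobian(g, a)  # product of the two derivatives
print(np.max(np.abs(lhs - rhs)))          # tiny: the two matrices agree
```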
What is surprising is that 20^th^ century calculus texts did not use matrices to explain the Chain Rule, relying instead on memorization of many different formulae depending on the number of inputs and outputs.

The Chain Rule can be used to derive most other interesting differentiation rules, such as the product rules for dot products and cross products from Week 2. In fact, the Chain Rule can use simple matrix multiplication to re-derive the classical product rule for single variable functions, should one want to do such a thing.

More advanced applications (such as the *material derivative* from Mechanics) are possible. Though not essential to our story, this is a quick sidequest. Consider an elastic or fluid substance of particles (a continuum of atoms, if one prefers) at positions $\mathbf{x}(t)$ (usually in $\mathbb{R}^{2}$ or $\mathbb{R}^{3}$) where $t$ is time -- the substance is bending or flowing. A time-dependent function $h(\mathbf{x},t)$, perhaps representing pressure or temperature of the substance, is in fact a composition, since $\mathbf{x} = \mathbf{x}(t)$. The *material derivative* of $h$ measures how $h$ changes in time, from the perspective of a moving particle. In Physics, this has a special notation:

$$\frac{D}{Dt}h = \frac{\partial h}{\partial t} + \frac{\partial h}{\partial\mathbf{x}}\frac{\partial\mathbf{x}}{\partial t}\ .$$

This is, of course, just the Chain Rule in action for $h$ as a function of $t$. (See the Epilogue for the use of this notation in fluid dynamics.)

THE INVERSE FUNCTION THEOREM. Recall from pre-calculus the notion of the inverse of a function of one variable. One says $f:\mathbb{R} \rightarrow \mathbb{R}$ is invertible if there exists an inverse, denoted $f^{- 1}$, such that $f^{- 1}\left( f(x) \right) = x$ and $f\left( f^{- 1}(y) \right) = y$ for all $x$ and $y$. The same definition holds in the multivariate case where $f$ and $f^{- 1}$ have domain and range in $\mathbb{R}^{n}$ (the number of variables *must* be equal). The computation of an inverse (much less its existence!) is difficult in all but the simplest settings. For a linear function, $f\left( \mathbf{x} \right) = A\mathbf{x}$, where $A$ is an $n$-by-$n$ matrix, we know that the inverse function $f^{- 1}\left( \mathbf{y} \right) = A^{- 1}\mathbf{y}$ exists only when $\det A \neq 0$ (thus guaranteeing the inverse matrix $A^{- 1}$ exists).

This points the way to a deep theorem about the existence of an inverse function. The *Inverse Function Theorem* states that a continuously differentiable function $f:\mathbb{R}^{n} \rightarrow \mathbb{R}^{n}$ is guaranteed to be *locally invertible* at input $\mathbf{a}$ and output $f\left( \mathbf{a} \right)$ if the derivative of $f$ at $\mathbf{a}$ is invertible; that is, if $\det\lbrack Df\rbrack_{\mathbf{a}} \neq 0$. Furthermore, if the inverse exists, its derivative is $\left\lbrack Df^{- 1} \right\rbrack_{f(\mathbf{a})} = \lbrack Df\rbrack_{\mathbf{a}}^{- 1}$. This follows from the Chain Rule applied to the definition of an inverse:

$$f \circ f^{- 1} = id = f^{- 1} \circ f\ \ \ \ \ \Rightarrow \ \ \ \ \ \ \ \lbrack Df\rbrack\left\lbrack Df^{- 1} \right\rbrack = I = \left\lbrack Df^{- 1} \right\rbrack\lbrack Df\rbrack\ .$$

There are a few subtleties here. The first is the sufficiency of the criterion. If the derivative is invertible, then the function is \[locally\] invertible as well; if the derivative is not invertible, then the criterion fails, and more work is needed to determine invertibility. It is not an if-and-only-if condition.
This can be seen from the simple single-variable example of $f(x) = x^{3}$, whose inverse $f^{- 1}(y) = \sqrt[3]{y}$ exists, even though $f'(0) = 0$. The second and more difficult subtlety is the local nature of the theorem. The function $f(x) = x^{2}$ is locally invertible about any point that is nonzero, though the size of the domain on which invertibility holds shrinks as one approaches a zero derivative. It is fantastic that the Inverse Function Theorem can guarantee invertibility of the fully nonlinear function based only on linear data: the price of that power is a loss of certainty about the region on which invertibility holds.

THE IMPLICIT FUNCTION THEOREM. The Inverse Function Theorem is the shadow of a much deeper and more fundamental result. Consider an implicit equation (or, really, $m$ equations) of the form

$$F\left( \mathbf{x,y} \right) = \mathbf{0}\ ,$$

where $F:\mathbb{R}^{n + m} \rightarrow \mathbb{R}^{m}$ with input variables split into $\mathbf{x} \in \mathbb{R}^{n}$ and $\mathbf{y} \in \mathbb{R}^{m}$. We say that one can solve implicitly for the $\mathbf{y}$ variables in terms of the $\mathbf{x}$ variables if there is a function $\mathbf{y} = \mathbf{y}(\mathbf{x})$ which satisfies the equation $F\left( \mathbf{x},\ \mathbf{y}\left( \mathbf{x} \right) \right) = \mathbf{0}$. When is this possible? Not always, as one recalls from such simple equations as $x^{2} + y^{2} - 1 = 0$. The Implicit Function Theorem says that one can solve for $\mathbf{y} = \mathbf{y}(\mathbf{x})$ locally, about some input $\mathbf{a}$, if a certain square submatrix of the derivative $\lbrack DF\rbrack$ is invertible, *i.e.*:

$$\det\left\lbrack \frac{\partial F}{\partial\mathbf{y}} \right\rbrack \neq 0\ .$$

In this case, one is guaranteed to have a *local* solution $\mathbf{y} = \mathbf{y}(\mathbf{x})$ whose derivative is:

$$\left\lbrack \frac{\partial\mathbf{y}}{\partial\mathbf{x}} \right\rbrack\ = \ - \left\lbrack \frac{\partial F}{\partial\mathbf{y}} \right\rbrack^{- 1}\left\lbrack \frac{\partial F}{\partial\mathbf{x}} \right\rbrack\ .$$

This is extremely useful, as it allows one to linearly approximate a solution to a set of nonlinear equations with simple partial derivative data and matrix algebra. Applications of this result range from Economics to GPS and much more: we will use the IFT in two weeks when doing constrained optimization.

### DISCUSSION

[QUESTION 1.]{.underline} Consider the following three derivatives of functions, each of which takes the origin to the origin:

$$\lbrack Df\rbrack_{\mathbf{0}} = \begin{bmatrix} 1 & - 3 & 0 \\ 2 & 7 & 1 \\ 3 & 1 & 0 \\ - 2 & 1 & - 1 \end{bmatrix}\ \ \ \ :\ \ \ \lbrack Dg\rbrack_{\mathbf{0}} = \begin{bmatrix} 4 & 1 \\ - 2 & 0 \\ - 1 & 1 \end{bmatrix}\ \ \ \ :\ \ \ \lbrack Dh\rbrack_{\mathbf{0}} = \begin{bmatrix} 3 & - 1 & 2 & 0 \\ - 2 & - 1 & 1 & 4 \\ 5 & 0 & 4 & 2 \end{bmatrix}\ $$

Which derivatives (at the origin) can you compute among:

$$f \circ g\ \ \ \ :\ \ \ \ g \circ f\ \ \ \ :\ \ \ \ g \circ h\ \ \ \ :\ \ \ \ h \circ g\ \ \ \ :\ \ \ \ f \circ h\ \ \ \ :\ \ \ h \circ f$$

This is a good time to emphasize that you can tell a lot about a function by knowing its derivative at a single point. How many inputs and how many outputs do these functions have? Why did we have to specify that these functions all take the origin to the origin? Wait, is this the same origin?

[QUESTION 2.]{.underline} Do you remember your differentiation rules? Let's recall...
\[*have students recall the product and quotient rules...*\] Are there any other rules you remember? \[*students will likely recall the inverse rule, so go ahead and see how that follows from the Chain Rule...*\] What else follows from the Chain Rule? What about:

$$\left( {u(x)}^{v(x)} \right)' = u'vu^{v - 1} + v'u^{v}\ln u$$

Use a simple matrix product to derive this result, using $f(u,v) = u^{v}$ and $g(x) = \left( u(x),\ v(x) \right)^{T}$. Having just reviewed some partial derivatives, the rest is straightforward. It is worth reminding students that memorizing this formula is not productive -- once you internalize the Chain Rule.

[QUESTION 3.]{.underline} What is the derivative of the square $f\left( \mathbf{x} \right) = Q^{2}$ of the quadratic form $Q\left( \mathbf{x} \right) = \mathbf{x} \cdot A\mathbf{x}$? The derivative of $Q$ was computed in the videolectures to be $\lbrack DQ\rbrack = \mathbf{x}^{T}\left( A + A^{T} \right)$.

This problem makes students very uncomfortable -- the notation is unfamiliar. If this happens, encourage them to work out a simple example and verify what the notation means.

[QUESTION 4.]{.underline} You might remember from single-variable calculus using the Chain Rule together with the Fundamental Theorem of Integral Calculus to compute derivatives of integrals with respect to a variable appearing in the limits of integration. Use the multivariate Chain Rule to redo this, computing $\lbrack DF\rbrack$ where:

$$F(x) = \int_{g(x)}^{h(x)}{f(t)\ dt}$$

Is this a multivariable problem at all? Does matrix multiplication help here? This is a good problem for generating discussion, as well as recalling the importance of integrals and the FTIC.

[QUESTION 5.]{.underline} Consider the function $F:\mathbb{R}^{3} \rightarrow \mathbb{R}^{3}$ given by

$$F\left( \begin{array}{r} x \\ y \\ z \end{array} \right) = \left( \begin{array}{r} u \\ v \\ w \end{array} \right) = \left( \begin{array}{r} \arctan{(x + y)} \\ 3x - 2z \\ 1 + e^{y + z} \end{array} \right)\ .$$

Is this function locally invertible near $x = y = z = 0$? Is it everywhere locally invertible? Is it in fact invertible?

Students often struggle with the logical details of a sufficient-but-not-necessary criterion. Thinking in terms of domain and range (or codomain) is helpful in this particular case. For a twist, replace the arctangent function with a cube root.

[QUESTION 6.]{.underline} The following are three implicitly defined surfaces:

$$x^{2} + y^{2} + z^{2} = C_{1}\ \ \ \ \ \ :\ \ \ \ \ z - \cosh x - \cosh y = C_{2}\ \ \ \ \ \ :\ \ \ \ \ x - 2yz + y^{3} = C_{3}$$

These surfaces all intersect at the point $(1,1,2)$ when $C_{1} = 6\ ,\ C_{2} = 2 - e - 1/e$, and $C_{3} = - 2$. What does this point of mutual intersection do when you wiggle the three constants $\{ C_{i}\}$? Does it fill in a neighborhood of $(1,1,2)$? Or does it trace out some lower-dimensional set?

This certainly seems like a strange question that has nothing to do with this week's material. Try to get students to think of the $C_{i}$ as variables $\mathbf{C}$ and the point of common intersection $\mathbf{x} = (x,y,z)^{T}$ in terms of a function $\mathbf{C} = F\left( \mathbf{x} \right).$ What does the Inverse Function Theorem mean in this setting? What does local invertibility imply about the intersection point? Where should the derivative $\lbrack DF\rbrack$ be evaluated?

[QUESTION 7.]{.underline} Let's practice the Implicit Function Theorem, starting with the 1-D case. Where can we solve for $y = y(x)$ given that $xe^{y} - ye^{x} = 1$?
*For the Implicit Function Theorem, students will be intimidated: review the statement that for* $F\left( \mathbf{x,y} \right) = \mathbf{0}$*, we have* $\left\lbrack \frac{\partial y}{\partial x} \right\rbrack = - \left\lbrack \frac{\partial F}{\partial y} \right\rbrack^{- 1}\left\lbrack \frac{\partial F}{\partial x} \right\rbrack$ if the inverse exists, beginning with the 1-D version.

[QUESTION 8.]{.underline} A fully nonlinear problem: can you solve for $c,\ d$ as functions of $a,\ b$, given that $ab - bc + cd = 8$ and $a + 2b - 2c + 4d = 15$, assuming that you are near $a = 1,\ b = 2,\ c = 3,\ d = 4$. What does this problem mean geometrically? What does the solution to these equations look like in 4-D?

Try to get students to think in terms of dimension and degrees of freedom (if only we had a name for this quantity... that's for next semester!)

[QUESTION 9.]{.underline} Now, repeat the last problem in the linear case. Assuming that you are near $a = 0,\ b = 1,\ c = - 2,\ d = 2$ solve for $c,\ d$ if

$$a + 2b - 3c + 4d = 16\ \ \ \ \ \ :\ \ \ \ \ \ - 2a + b + c - 3d = - 7$$

Do this via the IFT, then do it explicitly using row reduction. Ahha! Row reduction and back-substitution is really the linear version of what the IFT is doing...

[QUESTION 10.]{.underline} Recall the infinite power tower, defined implicitly via $y = x^{y}$. In single-variable calculus, one uses logarithmic differentiation to show that the derivative (where it exists) equals

$$\frac{dy}{dx} = \frac{y^{2}}{x\left( 1 - y\ln x \right)}$$

The derivative does not exist everywhere, because we cannot solve for $y = y(x)$ everywhere given the implicit equation $y = x^{y}$. What does the Implicit Function Theorem say about this situation?

This should cause some perhaps frustrated or confused discussion. What is the function $F(x,y) = 0$ to be used? Try to get students to remember that the IFT is a local result and requires evaluation at a particular point. In this case, constraining $\partial F/\partial y$ is doable: $1 - x^{y}\ln x \neq 0$. But what does this mean in terms of bounds on $x$? Clearly $x = y = 1$ is a solution. Methods outside the bounds of this course can give a precise interval of convergence: $e^{- e} \leq x \leq e^{1/e}$.

[QUESTION 11.]{.underline} Consider the following multivariate infinite power tower defined by $z(x,y) = x^{y^{z}}$. This is like the original infinite power tower, but with two inputs ($x$ and $y$) with alternating powers. Are there any values of $x$ and $y$ at which this function can be said to exist, and can you compute its derivative?

This is an unreasonably intimidating and difficult problem: not recommended unless students want a real challenge. Still, it is doable with enough labor.

[QUESTION 12.]{.underline} Can you show that the Inverse Function Theorem is a special case of the Implicit Function Theorem?

Try starting with a recollection of inverse and implicit functions. Because of the confusion possible between variable names, it might be best to start with $\mathbf{u} = f(\mathbf{v})$ and look for a putative inverse of the form $\mathbf{v} = g(\mathbf{u})$. Beginning with the equations $F\left( \mathbf{u,v} \right) = \mathbf{u} - f\left( \mathbf{v} \right) = \mathbf{0}$, then, what would it mean to have $\mathbf{v} = g(\mathbf{u})$ satisfying $\mathbf{u} - f\left( g\left( \mathbf{u} \right) \right) = \mathbf{0}$? In this case, what derivatives do you have to check?
Remember, an inverse must also satisfy $g\left( f\left( \mathbf{v} \right) \right) = \mathbf{v}$: does this cause any trouble?

[QUESTION 13.]{.underline} When training a simple single-layer neural network, one is interested in a loss function of the form $\mathcal{L} = \sigma(W\mathbf{x} + \mathbf{b})$, where $\mathbf{x}$ is a vector of input \[training\] data, $W$ is a matrix of *weights*, $\mathbf{b}$ is a *bias vector*, and $\sigma:\mathbb{R} \rightarrow \mathbb{R}$ is an activation function acting componentwise (sigmoids and hyperbolic tangents being common choices). To train the network, one needs to compute the derivative of $\mathcal{L}$ with respect to $W$ and $\mathbf{b}$ (not with respect to $\mathbf{x}$). In this simple single-layer case, what is the derivative?

*This is not exactly difficult, but it feels unnatural to regard* $\mathbf{x}$ *as the constants. The only reason to do this problem is to see whether students will ask questions such as "Why do we need to compute this derivative?" or "What happens with a multi-layer network?" or "who cares about this at all?" -- all good questions, providing an excellent opportunity to hint at the method of backpropagation as the Chain Rule in action.*

### ASSESSMENT PROBLEMS

[PROBLEM 1.]{.underline} Consider the functions

$$f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x^{2} + yz^{2} \\ 2x + y^{3} - z \end{pmatrix}\ \ \ \ \ :\ \ \ \ \ g\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} u^{2} - v^{2} \\ uv \\ 3u - 2v \end{pmatrix}$$

A\) Compute the derivatives of $f$ and $g$. B\) Using the Chain Rule, compute the derivative of the composition $f \circ g$ at the point where all its inputs equal $+ 1$. C\) If all inputs of $f \circ g$ equal $+ 1$ and are decreasing at a unit rate, at what rate is the first output of $f \circ g$ changing?

[PROBLEM 2.]{.underline} Consider the following functions:

$$f\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} \ln{(ab)} \\ abc \\ b^{2} + 5c \end{pmatrix}\ \ \ \ :\ \ \ \ g\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x^{2} + y^{2}z \\ 2x - y + 3z \\ 3x - 4y \end{pmatrix}$$

A\) Compute the derivative $\lbrack Dg\rbrack$ and evaluate at $x = 1,\ y = 0,\ z = 0$. B\) Compute the derivative $\lbrack Df\rbrack$. C\) Use the Chain Rule to compute $\lbrack D(f \circ g)\rbrack$ evaluated at $x = 1,\ y = 0,\ z = 0$.

[PROBLEM 3.]{.underline} Consider the following functions:

$$f\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 2u - 3v \\ (u - v)^{- 1} \\ uv \end{pmatrix}\ \ \ \ :\ \ \ \ g\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + 2y \\ (x - 3y)^{2} \end{pmatrix}$$

A\) Compute the derivative $\lbrack Df\rbrack$. B\) Compute the derivative $\lbrack Dg\rbrack$. C\) Use the Chain Rule to compute the derivative of $f \circ g$ evaluated at $(1,1)$.

[PROBLEM 4.]{.underline} Consider the following functions:

$$h\begin{pmatrix} s \\ t \end{pmatrix} = \begin{pmatrix} t^{2} - 4s \\ \ln{(1 - s)} \\ e^{s - t^{2}} \end{pmatrix}\ \ \ \ :\ \ \ \ g\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3x + 2y \\ 1 - x + y^{2} \end{pmatrix}$$

A\) Compute the derivative $\lbrack Dh\rbrack$. B\) Compute the derivative $\lbrack Dg\rbrack$. C\) Use the Chain Rule to compute the derivative of $h \circ g$ at the origin.

[PROBLEM 5.]{.underline} There are three functions, $f,\ g,\ \&\ h$, each of which sends the origin to the origin.
At the origin, these functions have derivatives equal to: $$\lbrack Df\rbrack_{\mathbf{0}} = \begin{bmatrix} 1 & 2 & 0 & 4 \\ 2 & 0 & - 1 & 0 \\ 0 & 3 & - 3 & - 2 \end{bmatrix}\ \ \ :\ \ \ \lbrack Dg\rbrack_{\mathbf{0}} = \begin{bmatrix} - 1 & 0 & 2 \\ 0 & 2 & 0 \\ - 3 & 5 & 1 \\ 2 & 0 & 1 \end{bmatrix}\ \ \ :\ \ \ \lbrack Dh\rbrack_{\mathbf{0}} = \begin{bmatrix} 1 & 0 \\ 3 & 2 \\ - 2 & - 1 \end{bmatrix}\ \ \ $$ A\) How many inputs and outputs does the function $f \circ g \circ h = f(g(h))$ have? B\) Compute the derivative at the origin, $\left\lbrack D(f \circ g \circ h) \right\rbrack_{\mathbf{0}}$. C\) If, at the origin, the inputs of $f \circ g \circ h$ are all changing at the same rate, which output is most sensitive to the change? [PROBLEM 6.]{.underline} There are three differentiable functions, $f,\ g,\ \&\ h$, each of which sends the origin to the origin. At the origin, these functions have derivatives equal to: $$\lbrack Df\rbrack_{\mathbf{0}} = \begin{bmatrix} 2 & 0 & 2 \\ 0 & 1 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 4 \end{bmatrix}\ \ \ \ :\ \ \ \lbrack Dg\rbrack_{\mathbf{0}} = \begin{bmatrix} - 1 & 0 & 3 \\ 0 & 1 & - 6 \\ 7 & 5 & - 2 \\ 2 & 0 & 1 \end{bmatrix}\ \ \ \ :\ \ \ \lbrack Dh\rbrack_{\mathbf{0}} = \begin{bmatrix} 1 & 2 & 0 & 1 \\ 2 & 0 & - 1 & 0 \\ 0 & 3 & 0 & 2 \end{bmatrix}\ \ \ \ \ \ $$ A) Which of the following compositions are not legal? $$(f \circ g)\ \ \ \ :\ \ \ (g \circ g)\ \ \ \ :\ \ \ (g \circ h)\ \ \ \ :\ \ (h \circ g)\ \ \ :\ \ (f \circ h)\ \ \ :\ \ (h \circ h)$$ B\) Compute the derivative at the origin, $\left\lbrack D(g - 3f) \right\rbrack_{\mathbf{0}}$. C\) Compute the derivative of $h \circ f\ $at the origin. [PROBLEM 7.]{.underline} There are three differentiable functions, $f,\ g,\ \&\ h$, each of which sends the origin to the origin. At the origin, these functions have derivatives equal to: $$\lbrack Df\rbrack_{\mathbf{0}} = \begin{bmatrix} 1 & 2 & 0 & 4 \\ 2 & 0 & - 1 & 0 \\ 0 & 3 & - 3 & - 2 \end{bmatrix}\ \ \ :\ \ \ \lbrack Dg\rbrack_{\mathbf{0}} = \begin{bmatrix} - 1 & 0 & 2 \\ 0 & 2 & 0 \\ - 3 & 5 & 1 \\ 2 & 0 & 1 \end{bmatrix}\ \ \ :\ \ \ \lbrack Dh\rbrack_{\mathbf{0}} = \begin{bmatrix} 7 & - 9 \\ 5 & 3 \\ 2 & - 4 \end{bmatrix}\ \ \ $$ A) Which compositions are not legal? $$(f \circ g)\ \ \ \ :\ \ \ (g \circ f)\ \ \ \ :\ \ \ (g \circ h)\ \ \ \ :\ \ \ (h \circ g)\ \ \ :\ \ (f \circ h)\ \ \ :\ \ (h \circ f)$$ B\) Compute the derivative at the origin, $\left\lbrack D(g \circ f) \right\rbrack_{\mathbf{0}}$. C\) If, at the origin, the first input of $g \circ f$ is increasing and the remaining inputs are decreasing at the same rate, which output is most sensitive to the change? [PROBLEM 8.]{.underline} Consider the following functions: $$f\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} u^{2} - 3vw \\ uv^{2} - uvw \end{pmatrix} = \begin{pmatrix} s \\ t \end{pmatrix}\ \ \ :\ \ g\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x - y \\ x^{2} - y^{2} \\ 2x - 3y \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}$$ A\) Compute the derivatives $\lbrack Df\rbrack$ and $\lbrack Dg\rbrack$. B\) What are the input and output variables of the composition $f \circ g$? C\) Compute the derivative of the composition $D\lbrack f \circ g\rbrack$ at the point where all inputs are equal to $2$. D\) Is the composition $f \circ g$ invertible locally where all the inputs equal $2$? [PROBLEM 9.]{.underline} There are three functions, $f,\ g,\ \&\ h$, each of which sends the origin to the origin. 
At the origin, these functions have derivatives equal to:

$$\lbrack Df\rbrack_{\mathbf{0}} = \begin{bmatrix} 3 & 0 & 1 & 0 \\ 0 & - 1 & 2 & - 1 \end{bmatrix}\ \ \ :\ \ \ \lbrack Dg\rbrack_{\mathbf{0}} = \begin{bmatrix} 0 & 2 \\ 4 & 3 \\ 1 & - 1 \\ 0 & - 3 \end{bmatrix}\ \ \ :\ \ \ \lbrack Dh\rbrack_{\mathbf{0}} = \begin{bmatrix} 1 & 3 \\ - 2 & 0 \end{bmatrix}\ \ \ $$

A\) How many inputs and outputs does the function $f \circ g \circ h = f(g(h))$ have? B\) Compute the derivative at the origin of the composition $f \circ g \circ h$. C\) Compute the derivative at the origin of $h^{- 1}$, the inverse of $h$.

[PROBLEM 10.]{.underline} Consider the functions

$$f\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} u^{2} - 3v \\ uv - w^{2} \end{pmatrix}\ \ \ \ \ :\ \ \ \ \ g\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} xy^{2} \\ x^{2} - y \\ 3y \end{pmatrix}$$

A\) Compute the derivatives of $f$ and $g$. B\) Compute the derivative of the composition $f \circ g$ at the point where all its inputs equal $+ 1$. C\) If all inputs of $f \circ g$ equal $+ 1$ and are decreasing at a unit rate, at what rate is the last output changing?

[PROBLEM 11.]{.underline} Consider the following functions:

$$f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x^{2}y - 2z \\ x - y^{3} + z^{2} \\ x - y + z \end{pmatrix}\ \ \ \ :\ \ g\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} 2u \\ v + w \\ 2v + 3w \end{pmatrix}\ \ \ \ \ $$

A\) Compute the derivatives $\lbrack Df\rbrack$ and $\lbrack Dg\rbrack$ evaluated at the origin. B\) Compute the derivative of $f$ composed with $g$ at the origin, $\left\lbrack D(f \circ g) \right\rbrack_{\mathbf{0}}$. C\) Explain: how would you compute the derivative of $f^{- 1}$, the inverse of $f$, at the origin?

[PROBLEM 12.]{.underline} Is the function

$$G\begin{pmatrix} x \\ y \\ u \\ v \end{pmatrix} = \begin{pmatrix} 2u + 3v \\ v + \left( 1 - u^{2} \right)^{- 1} \\ 2y - \sin x \\ e^{x} - \cos{3x} + \ln(1 - 2y) \end{pmatrix}$$

locally invertible near the origin? Explain.

[PROBLEM 13.]{.underline} Is the function

$$F\begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} = \begin{pmatrix} 2t - \sin z \\ z - e^{t} - 1 \\ e^{3x} - 2y - 1 \\ 2x + \ln(1 + y) \end{pmatrix}$$

locally invertible near the origin?

[PROBLEM 14.]{.underline} Consider the functions

$$f\left( \begin{array}{r} x \\ y \\ z \end{array} \right) = \left( \begin{array}{r} u \\ v \end{array} \right) = \left( \begin{array}{r} xy + yz \\ xz + yz \end{array} \right)\ \ \ \ \ :\ \ \ \ g\left( \begin{array}{r} s \\ t \end{array} \right) = \left( \begin{array}{r} x \\ y \\ z \end{array} \right) = \left( \begin{array}{r} s^{2}\cos t \\ s^{2}\sin t \\ s \end{array} \right)$$

A\) Compute the derivatives of $f$ and $g$. B\) Argue carefully that one can solve for $(s,t)$ as functions of $(u,v)$ when $(s,t)$ is close to the point $\left( 2,\frac{\pi}{2} \right)$ and $(u,v)$ is close to $(8,8)$.

[PROBLEM 15.]{.underline} Consider the function

$$f\left( \begin{array}{r} x \\ y \\ z \end{array} \right) = \left( \begin{array}{r} u \\ v \\ w \end{array} \right) = \left( \begin{array}{r} e^{x + 2y} - 1 \\ 1 + 3x - \cos{2y} \\ \sin{(3x + y^{3} - 2z)} \end{array} \right)$$

A\) Compute the derivative $\lbrack Df\rbrack$ evaluated at the origin. B\) Explain: what does the Inverse Function Theorem say about the invertibility of this $f$ near the origin?
[PROBLEM 16.]{.underline} Consider the equations $${2uv = \sin(x - 2y) }{4x + y^{2} = 3u + e^{v}}$$ A\) Write these equations in the form $F = 0$ for $F$ some function of 4 inputs and 2 outputs. What is $F$? B\) Use the Implicit Function Theorem to show that near the origin, one can solve for $x = x(u,v)$ and $y = y(u,v)$ while satisfying these equations. ### ANSWERS & HINTS [PROBLEM 1.]{.underline} A) $\lbrack Df\rbrack = \begin{bmatrix} 2x & z^{2} & 2yz \\ 2 & 3y^{2} & - 1 \end{bmatrix},\ \lbrack Dg\rbrack = \begin{bmatrix} 2u & - 2v \\ v & u \\ 3 & - 2 \end{bmatrix}$ ; B) $g\left( \begin{array}{r} 1 \\ 1 \end{array} \right) = \left( \begin{array}{r} 0 \\ 1 \\ 1 \end{array} \right)$ so that $\left\lbrack D(f \circ g) \right\rbrack_{1,1} = \begin{bmatrix} 0 & 1 & 2 \\ 2 & 3 & - 1 \end{bmatrix}\begin{bmatrix} 2 & - 2 \\ 1 & 1 \\ 3 & - 2 \end{bmatrix} = \begin{bmatrix} 7 & - 3 \\ 4 & 1 \end{bmatrix}$ ; C) $- 7 - ( - 3) = - 4$ [PROBLEM 2.]{.underline} A) $\lbrack Dg\rbrack_{0} = \begin{bmatrix} 2 & 0 & 0 \\ 2 & - 1 & 3 \\ 3 & - 4 & 0 \end{bmatrix}$; B) $\lbrack Df\rbrack = \begin{bmatrix} b/a & a/b & 0 \\ bc & ab & ab \\ 0 & 2b & 5 \end{bmatrix}$ ; C) $g\left( \begin{array}{r} 1 \\ 0 \\ 0 \end{array} \right) = \left( \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \right)$, so by the Chain Rule, $\left\lbrack D(f \circ g) \right\rbrack_{1,0,0} = \begin{bmatrix} 1 & 1/2 & 0 \\ 6 & 3 & 2 \\ 0 & 4 & 5 \end{bmatrix}\begin{bmatrix} 2 & 0 & 0 \\ 2 & - 1 & 3 \\ 3 & - 4 & 0 \end{bmatrix} = \begin{bmatrix} 3 & - 1/2 & 3/2 \\ 24 & - 11 & 9 \\ 23 & - 24 & 12 \end{bmatrix}$ [PROBLEM 3.]{.underline} A) $\lbrack Df\rbrack = \begin{bmatrix} 2 & - 3 \\ - (u - v)^{- 2} & (u - v)^{- 2} \\ v & u \end{bmatrix}$ ; B) $\lbrack Dg\rbrack = \begin{bmatrix} 1 & 2 \\ 2(x - 3y) & - 6(x - 3y) \end{bmatrix}$ ; C) since $g\left( \begin{array}{r} 1 \\ 1 \end{array} \right) = \left( \begin{array}{r} 3 \\ 4 \end{array} \right)$, by the Chain Rule, $$\left\lbrack D(f \circ g) \right\rbrack_{1,1} = \begin{bmatrix} 2 & - 3 \\ - ( - 1)^{- 2} & ( - 1)^{- 2} \\ 4 & 3 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ - 4 & 12 \end{bmatrix} = \begin{bmatrix} 14 & - 32 \\ - 5 & 10 \\ - 8 & 44 \end{bmatrix}$$ [PROBLEM 4.]{.underline} A) $\lbrack Dh\rbrack = \begin{bmatrix} - 4 & 2t \\ - (1 - s)^{- 1} & 0 \\ e^{s - t^{2}} & - 2te^{s - t^{2}} \end{bmatrix}$ ; B) $\lbrack Dg\rbrack = \begin{bmatrix} 3 & 2 \\ - 1 & 2y \end{bmatrix}$ ; C) $g\left( \begin{array}{r} 0 \\ 0 \end{array} \right) = \left( \begin{array}{r} 0 \\ 1 \end{array} \right)$, so, by the Chain Rule $\left\lbrack D(h \circ g) \right\rbrack_{0} = \begin{bmatrix} - 4 & 2 \\ - 1 & 0 \\ 1/e & - 2/e \end{bmatrix}\begin{bmatrix} 3 & 2 \\ - 1 & 0 \end{bmatrix} = \begin{bmatrix} - 14 & - 8 \\ - 3 & - 2 \\ 5/e & 2/e \end{bmatrix}$ [PROBLEM 5.]{.underline} A) 2 inputs, 3 outputs ; B) $\begin{bmatrix} 7 & 2 \\ - 20 & - 13 \\ - 12 & - 13 \end{bmatrix}$ ; C) 2^nd^ output [PROBLEM 6.]{.underline} A) $f \circ g$, $g \circ g$, $h \circ h$ illegal ; B) $\begin{bmatrix} - 7 & 0 & - 3 \\ 0 & - 2 & - 6 \\ - 2 & 2 & - 2 \\ 2 & 0 & - 11 \end{bmatrix}$ ; C) $\begin{bmatrix} 2 & 2 & 6 \\ 1 & - 1 & 4 \\ 0 & 3 & 8 \end{bmatrix}$ [PROBLEM 7.]{.underline} A) $h \circ g,\ f \circ h,\ h \circ f$ illegal ; B) $\begin{bmatrix} - 1 & 4 & - 6 & - 8 \\ 4 & 0 & - 2 & 0 \\ 7 & - 3 & - 8 & - 14 \\ 2 & 7 & - 3 & 6 \end{bmatrix}$ ; C) 3^rd^ output [PROBLEM 8.]{.underline} A) $\lbrack Df\rbrack = \begin{bmatrix} 2u & - 3w & - 3v \\ v^{2} - vw & 2uv - uw & - uv \end{bmatrix},\ \lbrack Dg\rbrack =
\begin{bmatrix} 1 & - 1 \\ 2x & - 2y \\ 2 & - 3 \end{bmatrix}$; B) $\left( \begin{array}{r} x \\ y \end{array} \right) \mapsto \left( \begin{array}{r} s \\ t \end{array} \right)$ ; C) $g\left( \begin{array}{r} 2 \\ 2 \end{array} \right) = \left( \begin{array}{r} 0 \\ 0 \\ - 2 \end{array} \right)$, so $\left\lbrack D(f \circ g) \right\rbrack_{2,2} = \begin{bmatrix} 0 & 6 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & - 1 \\ 4 & - 4 \\ 2 & - 3 \end{bmatrix} = \begin{bmatrix} 24 & - 24 \\ 0 & 0 \end{bmatrix}$ ; D) maybe, maybe not -- the Inverse Function Theorem is inconclusive [PROBLEM 9.]{.underline} A) 2 inputs, 2 outputs ; B) $\begin{bmatrix} - 9 & 3 \\ 2 & - 6 \end{bmatrix}$ ; C) $\left\lbrack Dh^{- 1} \right\rbrack_{0} = \lbrack Dh\rbrack_{0}^{- 1} = \frac{1}{6}\begin{bmatrix} 0 & - 3 \\ 2 & 1 \end{bmatrix}$ (note that $h$ sends the origin to itself, which is important) [PROBLEM 10.]{.underline} A) $\lbrack Df\rbrack = \begin{bmatrix} 2u & - 3 & 0 \\ v & u & - 2w \end{bmatrix}$ and $\lbrack Dg\rbrack = \begin{bmatrix} y^{2} & 2xy \\ 2x & - 1 \\ 0 & 3 \end{bmatrix}$ ; B) $\lbrack Dg\rbrack_{1,1} = \begin{bmatrix} 1 & 2 \\ 2 & - 1 \\ 0 & 3 \end{bmatrix}$ and $g\left( \begin{array}{r} 1 \\ 1 \end{array} \right) = \left( \begin{array}{r} 1 \\ 0 \\ 3 \end{array} \right)$, so $\lbrack Df\rbrack_{1,0,3} = \begin{bmatrix} 2 & - 3 & 0 \\ 0 & 1 & - 6 \end{bmatrix}$ and, via the Chain Rule, $\left\lbrack D(f \circ g) \right\rbrack_{1,1} = \begin{bmatrix} 2 & - 3 & 0 \\ 0 & 1 & - 6 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 2 & - 1 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} - 4 & 7 \\ 2 & - 19 \end{bmatrix}$ ; C) $\left\lbrack D(f \circ g) \right\rbrack_{1,1}\left( \begin{array}{r} - 1 \\ - 1 \end{array} \right) = \begin{bmatrix} - 4 & 7 \\ 2 & - 19 \end{bmatrix}\left( \begin{array}{r} - 1 \\ - 1 \end{array} \right) = \left( \begin{array}{r} - 3 \\ 17 \end{array} \right)$ [PROBLEM 11.]{.underline} A) $\lbrack Df\rbrack_{0} = \begin{bmatrix} 0 & 0 & - 2 \\ 1 & 0 & 0 \\ 1 & - 1 & 1 \end{bmatrix}$ and $\lbrack Dg\rbrack_{0} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 2 & 3 \end{bmatrix}$ ; B) by the Chain Rule, $\left\lbrack D(f \circ g) \right\rbrack_{0} = \begin{bmatrix} 0 & 0 & - 2 \\ 1 & 0 & 0 \\ 1 & - 1 & 1 \end{bmatrix}\begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 2 & 3 \end{bmatrix} = \begin{bmatrix} 0 & - 4 & - 6 \\ 2 & 0 & 0 \\ 2 & 1 & 2 \end{bmatrix}$ ; C) $\left\lbrack Df^{- 1} \right\rbrack_{0} = \lbrack Df\rbrack_{0}^{- 1}$ (note that $f$ sends the origin to itself, which is important) [PROBLEM 12.]{.underline} $\lbrack DG\rbrack_{0} = \begin{bmatrix} 0 & 0 & 2 & 3 \\ 0 & 0 & 0 & 1 \\ - 1 & 2 & 0 & 0 \\ 1 & - 2 & 0 & 0 \end{bmatrix}$ which has determinant $\left| \begin{matrix} 2 & 3 \\ 0 & 1 \end{matrix} \right|\left| \begin{matrix} - 1 & 2 \\ 1 & - 2 \end{matrix} \right| = 0$. Inverse Function Theorem fails and invertibility is uncertain [PROBLEM 13.]{.underline} $\lbrack DF\rbrack_{0} = \begin{bmatrix} 0 & 0 & - 1 & 2 \\ 0 & 0 & 1 & - 1 \\ 3 & - 2 & 0 & 0 \\ 2 & 1 & 0 & 0 \end{bmatrix}$ which has determinant $\left| \begin{matrix} 3 & - 2 \\ 2 & 1 \end{matrix} \right|\left| \begin{matrix} - 1 & 2 \\ 1 & - 1 \end{matrix} \right| = - 7 \neq 0$.
Inverse Function Theorem implies local invertibility near the origin [PROBLEM 14.]{.underline} A) $\lbrack Df\rbrack = \begin{bmatrix} y & x + z & y \\ z & z & x + y \end{bmatrix},\ \lbrack Dg\rbrack = \begin{bmatrix} 2s\cos t & - s^{2}\sin t \\ 2s\sin t & s^{2}\cos t \\ 1 & 0 \end{bmatrix}$ ; B) since $g\left( \begin{array}{r} 2 \\ \frac{\pi}{2} \end{array} \right) = \left( \begin{array}{r} 0 \\ 4 \\ 2 \end{array} \right)$, evaluate $\lbrack Df\rbrack_{0,4,2} = \begin{bmatrix} 4 & 2 & 4 \\ 2 & 2 & 4 \end{bmatrix}$ and $\lbrack Dg\rbrack_{2,\frac{\pi}{2}} = \begin{bmatrix} 0 & - 4 \\ 4 & 0 \\ 1 & 0 \end{bmatrix}$ using the Chain Rule to determine $\left\lbrack D(f \circ g) \right\rbrack_{2,\frac{\pi}{2}}\ = \begin{bmatrix} 4 & 2 & 4 \\ 2 & 2 & 4 \end{bmatrix}\begin{bmatrix} 0 & - 4 \\ 4 & 0 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 12 & - 16 \\ 12 & - 8 \end{bmatrix}$, which has determinant non-zero; invertibility follows from the Inverse Function Theorem [PROBLEM 15.]{.underline} A) $\lbrack Df\rbrack_{0} = \begin{bmatrix} 1 & 2 & 0 \\ 3 & 0 & 0 \\ 3 & 0 & - 2 \end{bmatrix}$ ; B) $\det\lbrack Df\rbrack_{0} = 12 \neq 0$, thus, locally invertible [PROBLEM 16.]{.underline} The function is $F = \left( \begin{array}{r} 2uv - \sin(x - 2y) \\ 4x + y^{2} - 3u - e^{v} \end{array} \right)$ and its derivative with respect to the $(x,y)$ variables at the origin is $\begin{bmatrix} - 1 & 2 \\ 4 & 0 \end{bmatrix}$, the determinant of which is nonzero. ## ## Week 7 : Approximation ### OUTLINE MATERIALS: Calculus BLUE : Vol 2 : Chapters 9-13 TOPICS: - Level sets of scalar-valued functions - Gradients of scalar-valued functions - Tangent planes to surfaces via the derivative/gradient - Differentials and approximations - Relative rates of change and approximations via linearization - Taylor expansion as polynomial approximation - Multi-index notation for Taylor expansion - Mixed higher-order partial derivatives - The second derivative \[Hessian\] of a scalar-valued function as a matrix LEARNING OBJECTIVES: - Use the notation for level sets and describe/draw simple level sets - Compute gradients of scalar-valued functions - Relate gradients, derivatives, and differentials - Compute tangent planes to implicit and parametrized surfaces - Compute $df$ for a scalar-valued $f$ via implicit differentiation - Compute and interpret relative rates of change via differentials - Linearly approximate multivariate functions via differentials - Recognize and use multi-index notation in the context of Taylor series - Compose single-variable Taylor series to expand multivariate functions - Organize terms in multivariate Taylor series by degree - Determine partial derivatives of a function based on its Taylor expansion ### PRIMER This is the week where we return to Geometry as a source of meaning and intuition for derivatives. LEVEL SETS. For the next several weeks, we will restrict attention to functions that have a single output. We will sometimes (though more frequently in Weeks 12-14) call these *scalar fields*: a scalar is assigned to every point. Such scalar-valued functions can be more easily visualized than their multi-valued counterparts by thinking in terms of *level sets*. The level sets of a scalar-valued function $f:\mathbb{R}^{n}\mathbb{\rightarrow R}$ are subsets of the domain on which $f$ is a fixed value. One can think of constant-temperature curves (*isotherms*) on a weather map or perhaps constant-height (*contour*) curves on a topographic map.
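Such contour pictures can be generated live in class. A minimal sketch, assuming Python with numpy and matplotlib at hand (the sample field is the one that reappears in PROBLEM 1 below):

```python
import numpy as np
import matplotlib.pyplot as plt

# level sets of the sample field f(x, y) = x**2/4 + (y - 2)**2, drawn as contours
x, y = np.meshgrid(np.linspace(-5, 5, 200), np.linspace(-2, 6, 200))
plt.contour(x, y, x**2 / 4 + (y - 2)**2, levels=[0.5, 1, 2, 4, 8])
plt.gca().set_aspect('equal')  # equal axes, so the ellipses are not distorted
plt.show()
```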
The notation for a level set can seem unusual, as it resembles an inverse (which of course does not exist for $n > 1$). One denotes a level set of $f$ by: $$f^{- 1}(c) = \left\{ \mathbf{x} \in \mathbb{R}^{n}\ :f\left( \mathbf{x} \right) = c \right\}\ .$$ It perhaps helps to read this as "*the set of all inputs on which* $f$ equals $c$." For a planar function $f:\mathbb{R}^{2}\mathbb{\rightarrow R}$, the level sets partition the domain into disjoint *curves* (sometimes singular -- see next week for more on that); for $f:\mathbb{R}^{3}\mathbb{\rightarrow R}$, the level sets are typically *surfaces*. Thinking in terms of level sets gives a more visceral approach to rates of change: you can imagine moving through the domain crossing level sets or remaining tangent to them depending on the direction of change of inputs. GRADIENTS. The idea of a derivative as a linear transformation is fundamental, but in the more restricted setting of a scalar-valued function, an alternative to the derivative can be interpreted geometrically as a field of vectors: such *vector fields* will occupy our attention greatly in Weeks 12-14. Given a scalar field $f:\mathbb{R}^{n}\mathbb{\rightarrow R}$, there is an alternate notation and terminology for the collection of partial derivatives called the *gradient* of $f$: $$\nabla f = \begin{pmatrix} \partial f/\partial x_{1} \\ \vdots \\ \partial f/\partial x_{n} \end{pmatrix}\ .$$ This is interpreted as a vector at every point, in contrast to the derivative $\lbrack Df\rbrack$ which is a linear transformation (or, in the case of a scalar-valued $f$, a row vector) at every point. The relationship between the two is that of transpose: $\nabla f = \lbrack Df\rbrack^{T}$. This, then, gives a close connection with the geometry of vectors: the rate of change of the output of $f$ when the inputs are changing at rates $\mathbf{h}$, is given by: $$\lbrack Df\rbrack\mathbf{h} = \nabla f \cdot \mathbf{h\ .}$$ This is key to the interpretation and application of the gradient. Among all unit vectors $\mathbf{h}$, the dot product $\nabla f \cdot \mathbf{h}$ is maximized when $\mathbf{h}$ is aligned with $\nabla f$. In addition, if $\mathbf{h}$ is tangent to a level set of $f$, then the rate of change of $f$ is zero (since $f$ is unchanging along a level set). Thus, the dot product satisfies $\nabla f \cdot \mathbf{h} = 0$. From these observations, we note the following important interpretations: - *The gradient is orthogonal to the level sets of* $f$*.* - *The gradient always points in the direction of maximal increase of* $f$*.* This interpretation allows one to easily visualize the gradient as a field of vectors that change as one moves from point-to-point. LINEAR APPROXIMATION. The gradient is immediately useful in determining tangent planes to implicit surfaces in 3-D. Consider a level set of $f:\mathbb{R}^{3}\mathbb{\rightarrow R}$ and choose a point $\mathbf{x}_{0}$ on the surface. Since the gradient of $f$ at this point is orthogonal to the level set, we can use the formula for a plane in 3-D from Weeks 1-2 to give an equation for the tangent plane to this surface: $\nabla f\left. \ \right|_{\mathbf{x}_{0}} \cdot \left( \mathbf{x} - \mathbf{x}_{0} \right) = 0.$ This is in contrast to the case for a parametrized surface $S:\mathbb{R}^{2} \rightarrow \mathbb{R}^{3}$, in which one uses the columns of the derivative $\lbrack DS\rbrack$ as tangent vectors which span a tangent plane. 
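The implicit (gradient) approach to tangent planes automates readily, which is handy for generating or checking examples. A minimal sympy sketch, with a sphere standing in as the level set (the example is a stand-in of mine, not from the videotext):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + y**2 + z**2                   # sample scalar field: level sets are spheres
grad = [sp.diff(f, v) for v in (x, y, z)]
p = {x: 1, y: 2, z: 2}                   # a point on the level set f^{-1}(9)
n = [g.subs(p) for g in grad]            # gradient at p: normal to the level set
plane = sum(ni * (v - p[v]) for ni, v in zip(n, (x, y, z)))  # n . (x - p) = 0
print(sp.Eq(sp.expand(plane), 0))        # Eq(2*x + 4*y + 4*z - 18, 0)
```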
Both these approaches to finding planes tangent to surfaces are the beginnings of using the derivative to perform *linear approximation*. One additional notational approach to derivatives is at this point relevant. All of calculus operates with differential notation -- the differential $dx$ is used both in differentiation and integration. At first, one thinks of differential elements as "*infinitesimal changes*" or perhaps "*small linear changes*" in a quantity. This is not wrong, but there is more to the story. For a multivariate function $f$, one can use implicit differentiation to compute the differential $df$. This is not a derivative with respect to any particular variable: it is simply the *differential*. The formula for this looks more complicated than it really is: $$df = \ \sum_{i = 1}^{n}\frac{\partial f}{\partial x_{i}}dx_{i}\ .$$ When integrating in single variable calculus with $u = u(x)$, one has $du = u'(x)dx$ hardwired. Likewise, in multivariable calculus, computing $df$ gives a combination of the differentials of the input variables. This is another approach to linear approximation: when all the partial derivatives are evaluated at a point $\mathbf{a}$, replacing the differentials $dx_{i}$ with small changes in the inputs $x_{i}$ gives the resulting approximate change in $f$. This is computationally no different than the matrix-vector multiplication $\lbrack Df\rbrack_{\mathbf{a}}\mathbf{h}$, where $\mathbf{h}$ is a vector of rates of change of inputs at $\mathbf{a}$, but the differential notation has some independent utility to be seen in later weeks. As a sample of what differential notation is good for, consider the problem of estimating percent changes in an output $f$ based on percentage changes in the inputs at a point. Differentials suggest working with the following *relative rates of change* given by $$\frac{du}{u} = d\left( \ln u \right)\ .$$ The use of the logarithm for approximating percentage changes is crucial in many corners of Statistics and Data Science. Differentials make clear the relationships between linearized percentage changes and powers, thanks to logarithm rules. TAYLOR EXPANSION. Linear approximation is but the beginning of higher-order polynomial approximation via Taylor expansion. This is the same story as in single-variable calculus, but with more notation. For a function $f:\mathbb{R}^{n}\mathbb{\rightarrow R}$, the Taylor expansion of $f$ about an input $\mathbf{a}$ can be written as: $$f\left( \mathbf{x} \right) = \sum_{I}{D^{I}f\left. \ \right|_{\mathbf{a}}\frac{\left( \mathbf{x} - \mathbf{a} \right)^{I}}{I!}}\ \ \ \ \ \ \ \ :\ \ \ \ \ \ f\left( \mathbf{a} + \mathbf{h} \right) = \sum_{I}{D^{I}f\left. \ \right|_{\mathbf{a}}\frac{\mathbf{h}^{I}}{I!}}\ .$$ This requires some unpacking. The *multi-index* $I = (i_{1},\ldots,i_{n})$ is used to locate a particular monomial term in a polynomial series. For inputs $\mathbf{x} = (x_{1},\ldots,x_{n})$, the monomial power $\mathbf{x}^{I}$ is the product $\mathbf{x}^{I} = x_{1}^{i_{1}}x_{2}^{i_{2}}\cdots x_{n}^{i_{n}}$. The multi-index factorial is given by $I! = i_{1}!i_{2}!\cdots i_{n}!$ (with the usual convention that $0! = 1$).
The difficult part is the $I$^th^ derivative of $f$, $D^{I}f$, which is defined as follows: $$D^{I}f = \frac{\partial^{i_{1}}}{\partial x_{1}^{i_{1}}}\frac{\partial^{i_{2}}}{\partial x_{2}^{i_{2}}}\cdots\frac{\partial^{i_{n}}}{\partial x_{n}^{i_{n}}}f\ .$$ This means that for each $k = 1\ldots n$, you take the partial of $f$ with respect to $x_{k}$ and do this $i_{k}$ times (where taking zero derivatives means you do nothing). What saves us from an endless worry of disorder is the fact that partial differentiation operators *commute* -- the order in which you take derivatives does not matter: $$\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}} = \frac{\partial^{2}f}{\partial x_{j}\partial x_{i}}\ .$$ Using the full formula for direct computation of Taylor series is as unpleasant as it is rare. In practice, one can chain together single-variable Taylor expansions fed with multivariate inputs -- it is the Chain Rule that ensures this approach works (so long as one is careful with evaluation points). There are a few special cases where the notation is not as imposing. When dealing with a function of two inputs, we can write out the terms in long-form: $$f\left( x + h_{x},y + h_{y} \right)\ = f(x,y) + \frac{\partial f}{\partial x}h_{x} + \frac{\partial f}{\partial y}h_{y} + \frac{1}{2}\frac{\partial^{2}f}{\partial x^{2}}h_{x}^{2} + \frac{\partial^{2}f}{\partial x\ \partial y}h_{x}h_{y} + \frac{1}{2}\frac{\partial^{2}f}{\partial y^{2}}h_{y}^{2} + \cdots$$ In general, the derivatives are too many to shepherd. For low-enough orders (quadratic approximation), matrix notation once again returns. Consider the following alternate form, which has the benefit of looking more like the usual Taylor formula: $$f\left( \mathbf{a} + \mathbf{h} \right) = \ f\left( \mathbf{a} \right) + \lbrack Df\rbrack_{\mathbf{a}}\mathbf{h} + \frac{1}{2}\mathbf{h}^{T}\left\lbrack D^{2}f \right\rbrack_{\mathbf{a}}\mathbf{h} + O(\left| \mathbf{h} \right|^{3})$$ The 1^st^-order term uses the derivative, as one expects. For the 2^nd^-order term, we can build a square matrix out of all 2^nd^ partials, sometimes called the Hessian (but for us, simply the 2^nd^ derivative): $$\left\lbrack D^{2}f \right\rbrack_{ij} = \frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}\ .$$ This 2^nd^ derivative is used to define the quadratic form $Q\left( \mathbf{h} \right) = \mathbf{h}^{T}\left\lbrack D^{2}f \right\rbrack\mathbf{h}$. This will give us the ability to build a 2^nd^-derivative test for optimization problems next week. ### DISCUSSION [QUESTION 1.]{.underline} What are some examples of scalar fields in this room? What might their level sets look like? What about the gradient vector field? A typical response will be temperature. If there is a point-source light in the room (like a red dot on a smoke detector or a wireless access point), then intensity of that light (which falls off as a function of distance) gives spherical level sets and gradient fields orthogonal to those. More light-hearted answers such as "awesomeness" or "stress" can lead to interesting results. [QUESTION 2.]{.underline} Compute the gradient of the planar scalar fields $f = ax^{2} + by^{2}$ for various values of the constants $a,\ b$. What are the level sets of these functions? It's a good idea to start with both constants equal to one; then both constants positive. What happens if both constants are negative? This is a good place to remind students that for many values of $c$, the level set $f^{- 1}(c)$ may be empty, and that is ok.
A mixture of negative and positive constants leads to the most interesting case of hyperbolae. [QUESTION 3.]{.underline} At what points in the plane are the level sets of $g = x^{2} + y^{2} - 2xy$ and $f = 2y - 3x$ orthogonal? These functions are simple enough that one can draw the level sets -- straight lines of slope $3/2$ in the case of $f$. What about the level sets of $g$? This at first appears to be ellipses, but $g = (x - y)^{2}$, so that the level sets are lines of slope $1$. Oops! They are never orthogonal. Changing to $g = x^{2} + y^{2} - xy$ gives an entirely different problem that requires a different solution. Use the gradients and their dot product. This is a good time to discuss the logical progression of "if the gradients are orthogonal to each other, and the level sets are orthogonal to the gradients then..." [QUESTION 4.]{.underline} Can you explain *why* the gradient is orthogonal to the level sets? Why does it point in the direction of maximal *increase*? How would you figure out the direction of maximal *decrease*? All these questions are covered in the videotext, but it usually takes a second pass through for typical students to internalize what is going on (instead of memorizing the outcome). It is very worthwhile to write explicitly the relation: $\lbrack Df\rbrack\mathbf{h} = \nabla f \cdot \mathbf{h}$, to tie rates of change to geometry. [QUESTION 5.]{.underline} Compute tangent spaces to the following: - An implicit tangent plane to $xyz - 2xy^{3} + 3z^{2} = 0$ at the point $(3,1,1)$. - A parametrized tangent line to $\gamma(t) = \left( \begin{array}{r} t^{2} \\ - 3t \\ t^{3} \end{array} \right)$ at $t = 2$. - A parametrized tangent plane to $S\left( \begin{array}{r} t_{1} \\ t_{2} \end{array} \right) = \left( \begin{array}{r} \begin{array}{r} t_{1} + 3t_{2} \\ {t_{1}t}_{2} \end{array} \\ 2t_{1}^{2} - t_{2}^{3} \end{array} \right)$ at $t_{1} = 2,\ t_{2} = - 1$. If students are struggling with parametrized tangent planes, try doing the two tangent lines in the previous example corresponding to the $t_{1}$ and $t_{2}$ axes. [QUESTION 6.]{.underline} Find the equation of a tangent hyperplane to a unit sphere in $\mathbb{R}^{n}$ at a given point $\mathbf{x}$ on that sphere. The notation on this is cumbersome. Be sure to use the fact that $\left| \mathbf{x} \right| = 1$. For students who get confused, drop back to 2-D or 3-D using standard coordinate names ($x,\ y,\ z$) and see how it works. [QUESTION 7.]{.underline} For the function $f = 4x^{2}y^{- 1/2}z^{- 2}$, compute the differential $df$ and use this to linearly approximate the value of $f(3.1,\ 8.7,\ 2.2)$. Make sure to consider the problem of the appropriate "base point" at which to evaluate the function and its partial derivatives. If students guess at the wrong point, roll with it and compare with the better choice ex post facto. [QUESTION 8.]{.underline} In the previous problem, if each input can vary by as much as $1\%$ of its value, by what percentage is the output estimated to vary, using differentials and relative rates? Having already computed $df$, this can be quickly done the long way. As a follow-up (or perhaps after doing the next problem) one can go back and use logarithms to redo it quickly. [QUESTION 9.]{.underline} Why is it that in financial analysis and stock market data tracking, the first thing one usually does with time-series data is take its logarithm before doing any other statistics? 
Ask an MBA candidate in Finance and see whether you can find someone who says anything more than "*Yes, that's what you always do.*" Can you answer *why*? This question tends to get students' attention. If they look over their notes and make a connection with $d(\ln u)$ being the relative rate of change, probe for why that matters in stock market data... What happens if you have a portfolio with multiple stocks? Are you more interested in absolute changes or percentage changes? [QUESTION 10.]{.underline} Given the multi-index $I = (1,3,0,2)$, what is its degree, $|I|$? What is the factorial $\ I!\ $? If $\mathbf{x} = (x_{1}\ ,\ x_{2}\ ,\ x_{3}\ ,\ x_{4})$, what is the monomial term $\mathbf{x}^{I}$? Multi-index notation is a necessary evil, but it is algorithmic and straightforward, apart from the usual question of why $0! = 1$. If this is an issue for students, do not be afraid to ask why we should be so cavalier in defining $0!$ in this way. Among the many justifications one can pose, the best in the context of this week's material is to argue from a Taylor polynomial of a polynomial being itself. [QUESTION 11.]{.underline} Taylor expand the function $f(x,y) = 3 - x + 2y + 5xy - y^{2}$ about the point $(1,2)$ by computing partials the "long" way. I tend to skip this: Taylor expansion the long way is very tedious. The virtue in this problem is reminding students about getting a polynomial in $(x - 1)$ and $(y - 2)$. [QUESTION 12.]{.underline} Taylor expand the following about the origin using composition: - $\sinh\left( xy\cosh\left( z^{2} - xy \right) \right)$, including terms of degree $\leq 6$ - $\ln\left( 1 + \cos\left( y - xe^{xy} \right) \right)$, including terms of degree $\leq 3$ ### ASSESSMENT PROBLEMS [PROBLEM 1.]{.underline} Consider the function $g:\mathbb{R}^{2}\mathbb{\rightarrow R}$ given by $$g(x,y) = \frac{x^{2}}{4} + (y - 2)^{2}\ $$ A\) Draw a picture of the level set $g^{- 1}(4)$. B\) Compute the gradient $\nabla g$ and draw, on your figure from (A), the gradient vectors evaluated at several points along the level set, explaining either visually or in words the relationship between the gradients and the level set. [PROBLEM 2.]{.underline} Consider the function $f:\mathbb{R}^{4}\mathbb{\rightarrow R}$ given by $$f(x,y,z,t) = xy - xz^{2} + \frac{y}{t} - tz\ $$ A\) Compute the gradient $\nabla f$ and the derivative $\lbrack Df\rbrack$. B\) Explain in words what the difference is between $\nabla f$ and $\lbrack Df\rbrack$. C\) The level set $f^{- 1}(0)$ passes through the point $a$ where all inputs of $f$ equal one. For what value(s) of $C$ is $\mathbf{v} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ C \end{pmatrix}$ tangent to this level set at this point $a$? [PROBLEM 3.]{.underline} Consider the level set $f^{- 1}( - 4)$ in $\mathbb{R}^{3}$ given by $$f(x,y,z) = 3yz + xz^{2} - x^{3}z = - 4$$ A\) Find the $z$-coordinate(s) of the point(s) on this level set where $x = 1$, $y = 2$. B\) Give a vector that is orthogonal to this level set at a point you found in (A). C\) Using your results from (A) and (B), write down and fully simplify an equation of a plane tangent to the level set at the point. [PROBLEM 4.]{.underline} Let's say you know that the derivatives of functions $f$ and $g$ are $$\lbrack Df\rbrack = \begin{bmatrix} 2x & - 2y & 3 \end{bmatrix}\ \ \ \ :\ \ \ \lbrack Dg\rbrack = \begin{bmatrix} y & x & - 2z \end{bmatrix}$$ A\) Compute the gradients $\nabla f$ and $\nabla g$. B\) Compute the gradient $\nabla h$ of the function $h = 2f - g$.
C\) Give the equation of the tangent plane to the level set $f^{- 1}(c)$ passing through the point $(1,2, - 3)$. [PROBLEM 5.]{.underline} Consider the surface in $\mathbb{R}^{3}$ parametrized by $$f\begin{pmatrix} t_{1} \\ t_{2} \end{pmatrix} = \begin{pmatrix} t_{1}^{2}\ t_{2} \\ t_{1}\ t_{2} \\ t_{1}\ t_{2}^{2} \end{pmatrix}$$ A\) Compute the derivative $\lbrack Df\rbrack$. B\) Give a parametrization of the tangent plane to this surface at $\begin{pmatrix} 12 \\ 6 \\ 18 \end{pmatrix} = f\begin{pmatrix} 2 \\ 3 \end{pmatrix}$, using variables $s_{1}$ and $s_{2}$ as parameters for the tangent plane. [PROBLEM 6.]{.underline} Consider the function $f(x,y,z) = (3x - yz)(xy - z)$. A\) Compute the gradient $\nabla f$. B\) Write down the equation of a tangent plane to the level set $f^{- 1}(15)$ at the point where $x = 1,\ y = 2,\ z = - 1$. C\) Fill in the blank: *The gradient* $\nabla f$ *points in the direction of \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_* [PROBLEM 7.]{.underline} Consider the function given by $$f(x,y,z) = {\frac{1}{4}x}^{2} + y^{2} + \frac{1}{9}z^{2}$$ A\) Show that the level set $f^{- 1}(1)$ (that is, the set of inputs where $f = 1$) contains the point $\left( \sqrt{2},\ 0,\frac{3}{2}\sqrt{2} \right)$. B\) Find a vector that is orthogonal (or normal) to the level set from (A) at the point $\left( \sqrt{2},\ 0,\frac{3}{2}\sqrt{2} \right)$. C\) Write down and simplify an equation of the tangent plane to the level set from (A) at the point $\left( \sqrt{2},\ 0,\frac{3}{2}\sqrt{2} \right)$. [PROBLEM 8.]{.underline} Consider the function $f:\mathbb{R}^{4}\mathbb{\rightarrow R}$ given by $$f\left( x_{1},x_{2},\ x_{3},x_{4} \right) = x_{3}^{2}x_{4} - x_{2}^{3} + x_{1}^{4}\ $$ A\) Compute the derivative $\lbrack Df\rbrack$. B\) Describe briefly in words what is meant by the level set $f^{- 1}(11)$. C\) Find a vector that is orthogonal to the level set of $f$ at the point $(0,1,2,3)$. D\) Find a vector that is tangent to the level set of $f$ at the point $(0,1,2,3)$. [PROBLEM 9.]{.underline} Consider the following two scalar-valued functions on the plane: $$h(x,y) = x^{2} - 3x + 2y - y^{2} + 2\ \ \ \ :\ \ \ g(x,y) = (x + 2)(y - 1)$$ A\) Compute the gradients of $h$ and of $g$. B\) Complete the sentence: *the gradient of a function* $f$ *points in the direction...* C\) At the origin, give a vector $\mathbf{v}$ such that changing the inputs of both $g$ and $h$ (as above) in the direction of $\mathbf{v}$ increases the outputs of both $g$ and $h$. D\) Locate where in the plane the level sets of $g$ and $h$ are orthogonal to each other. [PROBLEM 10.]{.underline} Consider the scalar-valued function $$G(u,v,w) = \frac{3uw^{2}}{v}$$ A\) Compute the differential $dG$, using differential notation. B\) Compute and simplify as much as possible the linear approximation to the percent change in $G$ given by $dG/G$. [PROBLEM 11.]{.underline} Consider the scalar-valued function $$f(x,y,z) = x^{2}\left( \sqrt[3]{yz^{2}} \right) = x^{2}y^{1/3}z^{2/3}$$ A\) Compute $df$ using differentials. B\) Use the result of (A) to approximate to 1^st^ order $(3.02)^{2}\left( \sqrt[3]{(1.04){(8.05)}^{2}} \right)$ [PROBLEM 12.]{.underline} Consider the scalar-valued function $$f\left( x_{1},x_{2},x_{3},x_{4} \right) = \frac{x_{1}^{2}\sqrt{x_{2}x_{3}^{3}}}{x_{4}^{2}}$$ A\) Compute $df$ using differentials. B\) Use the result of (A) to approximate to 1^st^ order $\left( \frac{2.01}{1.99} \right)^{2}\sqrt{(4.02)(3.99)^{3}}$. 
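Answers to differential-approximation problems like PROBLEMS 11 and 12 are easy to verify numerically when grading. A minimal sympy sketch, using the function and base point from PROBLEM 12 (the step sizes are read off from its inputs):

```python
import sympy as sp

x1, x2, x3, x4 = sp.symbols('x1:5')
f = x1**2 * sp.sqrt(x2 * x3**3) / x4**2            # the function from PROBLEM 12
base = {x1: 2, x2: 4, x3: 4, x4: 2}                # convenient base point
step = {x1: 0.01, x2: 0.02, x3: -0.01, x4: -0.01}  # small changes in the inputs
# df = sum of the partial derivatives (at the base point) times the input changes
df = sum(sp.diff(f, v).subs(base) * step[v] for v in base)
print(f.subs(base) + df)                                 # 16.30, the linear estimate
print(f.subs({x1: 2.01, x2: 4.02, x3: 3.99, x4: 1.99}))  # about 16.3026, the true value
```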
[PROBLEM 13.]{.underline} Consider the function $$f\begin{pmatrix} x \\ y \\ s \\ t \end{pmatrix} = \frac{3\sqrt[3]{s}t^{2}}{\sqrt{x^{3}y}}$$ If each input can vary by $1\%$, then by what percentage can the output of $f$ vary? Use differentials to linearly approximate. [PROBLEM 14.]{.underline} Consider the scalar-valued function $$F(x,y,z) = \frac{2x^{3}z^{2}}{\sqrt{y}}$$ A\) Compute the differential $dF$, using differential notation. B\) Compute and simplify as much as possible the linear approximation to the percent change in $F$ given by $dF/F$. C\) If each input to $F$ can vary by as much as 0.5%, by what percentage can the output of $F$ vary, according to your linear approximation from part (B)? [PROBLEM 15.]{.underline} Compute the Taylor series about the origin of $$f(x,y,z) = \frac{z\sin(xy)}{1 - x^{2} - y^{3}} = z\left( \sin(xy) \right)\frac{1}{1 - x^{2} - y^{3}}$$ up to and including terms of degree seven. [PROBLEM 16.]{.underline} Compute the Taylor series about the origin of $$f(x,y) = e^{y^{2} - x^{2}}\ln(1 - xy)\ $$ up to and including terms of degree seven. [PROBLEM 17.]{.underline} Compute the Taylor series about the origin of $$f(u,v,w) = 2u\ln{(1 + w^{3} - v^{2})} + v\cos{(2\sqrt{uw})}$$ up to and including terms of degree five. [PROBLEM 18.]{.underline} Compute the Taylor series about the origin of $$f(x,y) = \sin{(x\ln(1 - xy) - 3xy)}$$ up to and including terms of degree five. [PROBLEM 19.]{.underline} Consider the function $f$ whose Taylor series at the origin is: $$f(x,y,z) = 5 + x - 2y + 3x^{2} + xy - \frac{1}{3}yz - z^{2} + 2xy^{2} - \frac{5}{7}yz^{2} + x^{3}z + \frac{6}{7}xyz^{2} + \cdots$$ A\) Which terms of this series have degree three? B\) What is the gradient of $f$ evaluated at the origin? C\) Which term in the series above corresponds to the multi-index $I = (0,1,2)$? [PROBLEM 20.]{.underline} A function $f(x,y)$ has this Taylor expansion at the origin: $$3 + 2x - \frac{5}{2}x^{2} + xy - \frac{3}{2}y^{2} + \frac{1}{3}x^{3} - \frac{1}{2}x^{2}y + xy^{2} - \frac{2}{5}y^{3} + O\left( \left| (x,y) \right|^{4} \right)$$ (that last part simply means "higher order terms" -- don't worry about it) A\) Find the $I$^th^ derivative of $f$ at the origin, where $I = (2,1)$ is a multi-index. B\) What is the Taylor expansion of $\partial f/\partial x$ at the origin? Of how many terms can you be confident?
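Multivariate Taylor expansions such as those in PROBLEMS 15-18 can be machine-checked. One bookkeeping trick of mine (not from the videotext): scale every input by a single variable $t$; the coefficient of $t^{k}$ in the resulting one-variable expansion collects exactly the degree-$k$ terms. In sympy, with PROBLEM 16's function:

```python
import sympy as sp

x, y, t = sp.symbols('x y t')
f = sp.exp(y**2 - x**2) * sp.log(1 - x*y)    # the function from PROBLEM 16
# scale both inputs by t; expanding in t to order 8 captures all terms
# of total degree <= 7 in (x, y) as coefficients of powers of t
series_t = sp.series(f.subs({x: t*x, y: t*y}), t, 0, 8).removeO()
print(sp.expand(series_t.subs(t, 1)))        # the Taylor polynomial through degree 7
```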
### ANSWERS & HINTS [PROBLEM 1.]{.underline} A) an ellipse centered at $(0,2)$ with $x$-radius $4$ and $y$-radius $2$ ; B) $\nabla g$ points out away from the center of the ellipse, orthogonal to it [PROBLEM 2.]{.underline} A) $\nabla f = \left( \begin{array}{r} y - z^{2} \\ x + t^{- 1} \\ - 2xz - t \\ - yt^{- 2} - z \end{array} \right)$, $\lbrack Df\rbrack = \begin{bmatrix} y - z^{2} & x + t^{- 1} & - 2xz - t & - yt^{- 2} - z \end{bmatrix}$; B) in general $\nabla f = \lbrack Df\rbrack^{T}$ ; C) where $\nabla f_{a} \cdot \mathbf{v} = 0$, namely $C = - 5/2$ [PROBLEM 3.]{.underline} A) $z = - 1$ or $z = - 4$ ; B) $\left( \begin{array}{r} 4 \\ - 3 \\ 3 \end{array} \right)$ or $\left( \begin{array}{r} 28 \\ - 12 \\ - 3 \end{array} \right)$ (using the gradient at the point); C) $4x - 3y + 3z = - 5$ or $28x - 12y - 3z = 16$ respectively [PROBLEM 4.]{.underline} A/B) $\nabla h = 2\nabla f - \nabla g = \left( \begin{array}{r} 4x - y \\ - 4y - x \\ 6 + 2z \end{array} \right)$ ; C) $2x - 4y + 3z = - 15$ [PROBLEM 5.]{.underline} A) $\lbrack Df\rbrack = \begin{bmatrix} 2t_{1}t_{2} & t_{1}^{2} \\ t_{2} & t_{1} \\ t_{2}^{2} & 2t_{1}t_{2} \end{bmatrix}$ ; B) $G\left( \begin{array}{r} s_{1} \\ s_{2} \end{array} \right) = \left( \begin{array}{r} 12 \\ 6 \\ 18 \end{array} \right) + s_{1}\left( \begin{array}{r} 12 \\ 3 \\ 9 \end{array} \right) + s_{2}\left( \begin{array}{r} 4 \\ 2 \\ 12 \end{array} \right)$ [PROBLEM 6.]{.underline} A) $\nabla f = \left( \begin{array}{r} 6xy - 3z - y^{2}z \\ 3x^{2} - 2xyz + z^{2} \\ - 3x - xy^{2} + 2yz \end{array} \right)$; B) $\nabla f_{1,2, - 1} = \left( \begin{array}{r} 19 \\ 8 \\ - 11 \end{array} \right)$, so tangent plane is $19(x - 1) + 8(y - 2) - 11(z + 1) = 0$ or $19x + 8y - 11z = 46$ [PROBLEM 7.]{.underline} B) $\nabla f = \left( \begin{array}{r} \sqrt{2}/2 \\ 0 \\ \sqrt{2}/3 \end{array} \right)$ ; C) $\frac{\sqrt{2}}{2}\left( x - \sqrt{2} \right) + \frac{\sqrt{2}}{3}\left( z - \frac{3\sqrt{2}}{2} \right) = 0$ [PROBLEM 8.]{.underline} A) $\lbrack\begin{matrix} 4x_{1}^{3} & - 3x_{2}^{2} & 2x_{3}x_{4} & x_{3}^{2} \end{matrix}\rbrack$ ; C) evaluate $\nabla f$ ; D) choose $\mathbf{v}$ with $\mathbf{v} \cdot \nabla f = 0$ [PROBLEM 9.]{.underline} A) $\nabla h = \left( \begin{array}{r} 2x - 3 \\ 2 - 2y \end{array} \right)\ ,\ \nabla g = \left( \begin{array}{r} y - 1 \\ x + 2 \end{array} \right)$ ; C) choose $\mathbf{v}$ such that $\left( \begin{array}{r} - 3 \\ 2 \end{array} \right) \cdot \mathbf{v} > 0$ and $\left( \begin{array}{r} - 1 \\ 2 \end{array} \right) \cdot \mathbf{v} > 0$ ; D) along the line $y = 1$ [PROBLEM 10.]{.underline} A) $dG = 3w^{2}v^{- 1}du - 3uv^{- 2}w^{2}dv + 6uv^{- 1}w\ dw$ ; B) $$\frac{dG}{G} = \frac{du}{u} - \frac{dv}{v} + \frac{2dw}{w}$$ [PROBLEM 11.]{.underline} A) $df = 2xy^{1/3}z^{2/3}dx + \frac{1}{3}x^{2}y^{- 2/3}z^{2/3}dy + \frac{2}{3}x^{2}y^{1/3}z^{- 1/3}dz$ ; B) use point $x = 3,\ y = 1,\ z = 8$ and $dx = .02,\ dy = .04,\ dz = .05$ to get approximation $36 + 0.48 + 0.48 + 0.15 = 37.11$ [PROBLEM 12.]{.underline} A) $$df = 2x_{1}x_{2}^{\frac{1}{2}}x_{3}^{\frac{3}{2}}x_{4}^{- 2}dx_{1} + \frac{1}{2}x_{1}^{2}x_{2}^{- \frac{1}{2}}x_{3}^{\frac{3}{2}}x_{4}^{- 2}dx_{2} + \frac{3}{2}x_{1}^{2}x_{2}^{\frac{1}{2}}x_{3}^{\frac{1}{2}}x_{4}^{- 2}dx_{3} - 2x_{1}^{2}x_{2}^{\frac{1}{2}}x_{3}^{\frac{3}{2}}x_{4}^{- 3}dx_{4}$$ ; B) $16.3 = 16 + 16(.01) + 2(.02) - 6(.01) + 16(.01)$ via (A), using the base point $x_{1} = 2,\ x_{2} = 4,\ x_{3} = 4,\ x_{4} = 2$ and $dx_{1} = .01,\ dx_{2} = .02,\ dx_{3} = - 0.01,\ dx_{4} = - .01$ [PROBLEM 
13.]{.underline} $13/3$ % via $$\frac{df}{f} = \ - \frac{3}{2}\frac{dx}{x} - \frac{1}{2}\frac{dy}{y} + \frac{1}{3}\frac{ds}{s} + 2\frac{dt}{t}\ $$ [PROBLEM 14.]{.underline} A) $dF = 6x^{2}y^{- \frac{1}{2}}z^{2}\ dx - x^{3}y^{- \frac{3}{2}}z^{2}\ dy + 4x^{3}y^{- \frac{1}{2}}z\ dz$ ; B) $$\frac{dF}{F} = 3\frac{dx}{x} - \frac{1}{2}\frac{dy}{y} + 2\frac{dz}{z}$$ ; C) 2.75% via (B) [PROBLEM 15.]{.underline} using the series for $\sin Z$ and $(1 + Z)^{- 1}$, $${f(x,y,z) = z\left( xy - \frac{x^{3}y^{3}}{6} + \cdots \right)\left( 1 + \left( x^{2} + y^{3} \right) + \left( x^{2} + y^{3} \right)^{2} + \cdots \right) }{= xyz + x^{3}yz + xy^{4}z + x^{5}yz - \frac{1}{6}x^{3}y^{3}z}$$ [PROBLEM 16.]{.underline} using the series for $e^{Z}$ and $\ln(1 + Z)$, $${f(x,y) = \left( 1 + y^{2} - x^{2} + \frac{\left( y^{2} - x^{2} \right)^{2}}{2} + \frac{\left( y^{2} - x^{2} \right)^{3}}{6} + \cdots \right)\left( - xy + \frac{x^{2}y^{2}}{2} - \frac{x^{3}y^{3}}{3} + \cdots \right) }{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ = - xy + x^{3}y - xy^{3} - \frac{x^{2}y^{2}}{2} - \frac{x^{2}y^{4}}{2} + \frac{x^{4}y^{2}}{2} - \frac{xy^{5}}{2} - \frac{x^{5}y}{2} + \frac{2x^{3}y^{3}}{3} + \cdots}$$ [PROBLEM 17.]{.underline} using the series for $\cos Z$ and $\ln(1 + Z)$, $${f(u,v,w) = 2u\left( w^{3} - v^{2} - \frac{1}{2}\left( v^{4} + \cdots \right) \right) + v\left( 1 - 2uw + \frac{2u^{2}w^{2}}{3} + \cdots \right) }{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ = v - 2uvw - 2uv^{2} + 2uw^{3} - uv^{4} + \frac{2}{3}u^{2}vw^{2} + \cdots}$$ [PROBLEM 18.]{.underline} using the series for $\sin Z$ and $\ln(1 + Z)$, $$\sin\left( x\left( - xy - \frac{x^{2}y^{2}}{2} + \ldots \right) - 3xy \right) = - 3xy - x^{2}y - \frac{1}{2}x^{3}y^{2} + \ldots$$ [PROBLEM 19.]{.underline} A) $2xy^{2}, - \frac{5}{7}yz^{2}$ ; B) $\nabla f_{0} = \left( \begin{array}{r} 1 \\ - 2 \\ 0 \end{array} \right)$ ; C) $- \frac{5}{7}yz^{2}$ [PROBLEM 20.]{.underline} A) $D^{I}f = - \frac{1}{2}I! = - 1$; B) differentiate the series directly to obtain $$\frac{\partial f}{\partial x} = 2 - 5x + y + x^{2} - xy + y^{2} + O\left( \left| (x,y) \right|^{3} \right)\ $$ ## ## Week 8 : Optimization ### OUTLINE MATERIALS: Calculus BLUE : Vol 2 : Chapters 14-18 TOPICS: - Critical points and extremization of scalar-valued functions - Classification of critical points via 2^nd^ derivatives for planar functions - Boundary conditions of scalar optimization problems - Constrained optimization via substitution - Constrained optimization via the Lagrange multiplier - BONUS : linear regression formulae via optimization - BONUS : Nash equilibria for symmetric 2-player games LEARNING OBJECTIVES: - Find critical points of scalar-valued functions - Classify critical points in 2-D via the 2^nd^ derivative - Recognize saddle points in extremization problems - Reason about boundary conditions for optimization problems - Distinguish between local and global extrema - Identify cost and constraint functions in constrained optimization - Convert constrained to unconstrained optimization via parametrization - Set up and solve the Lagrange equations for a single constraint function ### PRIMER Our work on approximating functions via Taylor expansion is about to pay off, as we generalize the max-min problems of single-variable calculus to the multivariate setting. CRITICAL POINTS. Our first approach to optimization follows exactly the script from single-variable calculus. Consider a scalar-valued function $f:\mathbb{R}^{n}\mathbb{\rightarrow R}$.
A *critical point* of $f$ is any input point $\mathbf{a}$ at which the derivative vanishes (or is undefined). Note that a vanishing derivative means that *all* partial derivatives evaluate to zero -- not just one. In addition, if the function domain is not all of $\mathbb{R}^{n}$ but rather some domain $D \subset \mathbb{R}^{n}$, then any point on the boundary of $D$ is critical, as the derivative is technically non-existent there. With this in place, the familiar lemma holds: all extreme values of $f$ on $D$ must reside at critical points. The new twist in multivariate optimization comes in the form of classifying critical points. Local minima and local maxima are as in the single-variable case; *saddle points* are critical points which are *minima* with respect to changes along some direction and *maxima* along some different direction. A general classification scheme is not here given: such requires *eigenvalues* and other ideas from more advanced linear algebra. For this course, it suffices to remain in the 2-D case, in which everything is determined by the 2^nd^ derivative $\lbrack D^{2}f\rbrack$ evaluated at the critical point. In the case of a negative determinant, the critical point is a saddle; in the case of a positive determinant, the type of critical point is determined by the *trace* of $\lbrack D^{2}f\rbrack$ (the sum of the diagonal elements): if the trace is negative, the critical point is a local max; if positive, it is a local min, much like the 1-D version of the 2^nd^ derivative test. CONSTRAINED OPTIMIZATION. It is often the case that a quantity must be extremized subject to a constraint -- a fixed amount of money, time, or space is common. Constraints also arise as a boundary condition on an unconstrained optimization problem on a domain. Whatever the cause, constrained optimization is an important subclass of problems. Two approaches are here covered. The first involves parametrization of the constraint set and substitution into the cost function $F$ -- converting the constrained problem into an unconstrained problem on the parameters. The second approach is more novel and uses an implicit representation of the constraint set, thinking of it as the level set of some function $G$. The resulting *Lagrange equations* specify that the constrained extrema lie where the level sets of $F$ and $G$ are tangent; thus, where $\nabla F$ and $\nabla G$ are parallel. Since two vectors are parallel if they are the same up to a constant of proportionality, we have an equation -- the eponymous equation of Lagrange: $$\nabla F = \lambda\nabla G\ \ \ \ \ \ or\ \ equivalently\ \ \ \ \ \ \lbrack DF\rbrack = \lambda\lbrack DG\rbrack\ .$$ This constant $\lambda$ is called the *Lagrange multiplier*. A quick lemma involving the Implicit Function Theorem reveals that $\lambda$ measures the rate of change of the extremal value of $F$ with respect to the constraint value of $G$. (Such interpretations arise in Economics as *shadow prices*, for example.) Unfortunately, this class does not have enough bandwidth to cover how to classify the extrema found: this will have to wait until eigenvalues are taught in linear algebra. Another topic that lies just past the bounds of this course is that of multiple constraints and multiple Lagrange multipliers.
If instead of a single constraint equation, one encodes several constraints as a multi-output equation $G$ and then constrains the optimization to the level sets $G = \mathbf{c}$, then the generalized Lagrange equations use a vector $\mathbf{\lambda}$ of Lagrange multipliers (one for each constraint): $$\lbrack DF\rbrack = \mathbf{\lambda}^{T}\lbrack DG\rbrack.\ $$ This is one of the less obvious benefits of using derivatives and matrix notation over gradients: it is much simpler to set up higher-dimensional optimization problems. Both the single- and multiple-constraint versions of the Lagrange equations work by converting the constrained optimization problem to an unconstrained problem on a higher-dimensional space using the modified function $$L = F - \mathbf{\lambda}^{T}G\ .$$ \[BONUS\] DATA ANALYSIS. One important application of optimization comes from basic statistics and elucidates the *best fit line* problems of elementary data analysis. Given a collection of data points of the form $(x_{i},y_{i})$, the problem of finding a best fit line of the form $y = mx + b$ can be phrased as an optimization problem. The cost function $$f(m,b) = \sum_{i}^{}\left( y_{i} - \left( mx_{i} + b \right) \right)^{2}$$ records the net square distances from the idealized line to the data points, as measured vertically. It is a challenging but useful exercise to find the critical point of this function and classify it as a minimum. This is the beginning of the subject of *linear regression*, which extends both to multivariate data and to nonlinear representations. As the reader might guess, such generalizations are informed by Taylor series. More generally, data analysis in the context of machine learning, AI, deep learning, and the like is highly influenced by optimization, whether it is in training a neural network, optimizing a discriminator, or other convergent phenomena. The interested reader would do well to take a dedicated course in optimization theory. \[BONUS\] NASH EQUILIBRIA. Game theory provides another motivation for optimization, generally, and saddle points, specifically. In a symmetric 2-player game, each player chooses a strategy from a finite set of options (perhaps different sets between players). The game executes and, based on the strategies chosen, a certain payout from a matrix $P$ (indexed by player strategies) is transferred from loser to winner. If each player chooses a strategy at random based on an individual probability distribution, then, over time, there is an average payout function whose inputs are the terms of the probability distributions and whose output is the average (or *expected*) payout. A *Nash equilibrium* for the game is a saddle point of this expected payout function: a choice of probability distributions which, if held, maximizes the gain of the winner (with respect to the loser changing strategies) and minimizes loss of the loser (with respect to the winner changing strategies). It is fascinating that one can analyze best-case scenarios and expected average payouts for repeated-play games using probabilities as the optimization variables. ### DISCUSSION [QUESTION 1:]{.underline} Find and classify the critical points of the following: - $f(x,y) = y^{4} - 2xy^{2} + x^{3} - x$ - $g(x,y) = \ e^{- y}(x^{2} - y^{2})$ Problems like this are an excellent opportunity to discuss the hidden logical operators in solving nonlinear equations: be careful with how AND and OR operations compose. 
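If a machine check is wanted when preparing such exercises, sympy can find the critical points and apply the determinant/trace test from the primer. A minimal sketch, assuming sympy and using the first function from QUESTION 1 above:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = y**4 - 2*x*y**2 + x**3 - x               # the first function in QUESTION 1
grad = [sp.diff(f, v) for v in (x, y)]
H = sp.hessian(f, (x, y))                    # the 2nd derivative [D^2 f]
for pt in sp.solve(grad, (x, y), dict=True): # real critical points
    d, tr = H.subs(pt).det(), H.subs(pt).trace()
    if d < 0:
        kind = 'saddle'
    elif d > 0:
        kind = 'local min' if tr > 0 else 'local max'
    else:
        kind = 'degenerate'
    print(pt, kind)
```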
[QUESTION 2:]{.underline} For what values of $C$ will the function $f(x,y) = Cx^{2} + 4xy + Cy^{2}$ have a local max at $(0,0)$? Min? Saddle? Classify as a function of $C$. The classification algorithm is useful, as always, but could one figure this out without using that 2^nd^ derivative test? [QUESTION 3:]{.underline} Compute the second derivative matrix \[*Hessian*\] of the following function of six variables: $$f(x,y,z,u,v,w)\ = \ u^{3} - 3uv^{2} + v^{4} + w^{3} - 3wx^{2} + x^{4} + y^{3} - 3yz^{2} + z^{4}$$ This is a good problem for conceptualizing multiple variables. As a follow-up, consider what the 2^nd^ derivative can say about the critical point at the origin. Why is this not a surprise? [QUESTION 4:]{.underline} A function $f(x,y)$ has the following Taylor expansion about $(0,0)$: $$5 - x^{2} + \frac{3}{2}xy - \frac{5}{4}y^{2} + \frac{2}{9}x^{3} - x^{2}y + \frac{2}{5}xy^{2} + \frac{1}{3}y^{3} + \cdots$$ Is the origin a critical point? If so, of what type? This is a good way to recall Taylor series from the previous week, as well as to emphasize that the 2^nd^ derivative test comes from a Taylor expansion. [QUESTION 5.]{.underline} \[*Draw a simple closed curve in the plane*\] Let's say you have a scalar-valued function on the plane constrained to this curve. What are the critical points of the function $f(x,y) = y$? What about $g(x,y) = x$? What about $h(x,y) = x - y$? Get students to see the idea of tangent level sets, then remind them of last week's work on gradients being orthogonal to level sets. What do tangent level sets mean in terms of gradients? This is a good way to naturally bring up the Lagrange equations. [QUESTION 6:]{.underline} Consider the cost function $F = x^{2} + y^{2}$ and the constraint function $G = xy$. Draw pictures of the level sets of both functions and solve for maxima and minima graphically. Interpret the Lagrange multiplier in terms of rates of change of the optimal value. *Students may find this challenging, but it is good practice for working with level sets.* [QUESTION 7:]{.underline} Use a Lagrange multiplier to find the critical points of the function $f(x,y,z)\ = \ x^{2}\ + \ y^{2}\ + \ z^{2}$ on the plane $ax + by + cz\ = \ 1$. *What is the geometry of this problem? Can you classify any critical points you found? How could you extend this problem from 3-D to* $n$*-D?* [QUESTION 8:]{.underline} Use a Lagrange multiplier to re-derive the formula for the minimal distance from a point $\left( x_{0},\ y_{0} \right)$ in the plane to the line $ax + by = c$. *Hint:* minimize the square of the distance, using $f(x,y)\ = \ \left( x - x_{0} \right)^{2}\ + \ \left( y - y_{0} \right)^{2}$ subject to the constraint of being on the line. Then, after finding this minimal distance-squared, take its square root. *Ask students if they would find it difficult to do this in arbitrary dimensions with a hyperplane.* [QUESTION 9:]{.underline} Recall the Cobb-Douglas model of production in economics *\[Week 5 Question 7\].* It says that the production $P$ of an industrial process depends on the amount of labor $L$ and materials $M$ via: $$P\ = \kappa L^{\alpha}M^{\beta}$$ where $\alpha + \beta = 1$. Assume that labor costs $A$ dollars per unit, and materials cost $B$ dollars per unit; use a Lagrange multiplier to determine how you should allocate a fixed amount of $C$ dollars so as to maximize production. Does your answer make sense? How do things change as worker costs increase?
If there were a commodity shock of rapidly increasing prices for raw materials, how would it impact unemployment? What would you do to model two classes of labor (workers/management) and different types of raw materials? [QUESTION 10:]{.underline} Consider the cost functions $F_{1}$ and $F_{2}$ with potential constraint functions $G_{1}$ and $G_{2}$ where: $$F_{1} = 2x^{3} + 3y^{2} - 6\ \ \ \ \ :\ \ \ \ F_{2} = x^{2} + y^{2} - 4$$ Compare and contrast extremizing $F_{1}$ subject to $F_{2} = 0$ versus extremizing $F_{2}$ subject to $F_{1} = 0$. Are the extrema the same? Which problems would work better via parametrization versus via the Lagrange method? What are the advantages and disadvantages of each approach? This is an opportunity once again to discuss parametrized versus implicit representations. The ability to classify a critical point is -- at present -- an advantage for the parametrized approach. [QUESTION 11:]{.underline} Observe that the following has a critical point at the origin: $$f(x,y,z)\ = \ 5x^{2} + 5y^{2} + 9z^{2} - 2xz - 2yz$$ Since it has three variables, you cannot use the 2-d method we covered in class to classify it, so what do you do? Asking the class might yield answers such as "Try using a computer to plot something" -- that's not bad. If desired, one can mention eigenvalues (to come) or Sylvester's criterion (hopefully not to come). The following approach is something that the class can do with just a little leading. Note that $f$ can be factored into a sum of squares. First, work with $x$ and $z$: $$f(x,y,z) = \ \left( 5x^{2}\ - \ 2xz\ + \frac{1}{5}z^{2} \right)\ + \frac{44}{5}z^{2}\ - 2yz\ + \ 5y^{2}\ = \ \left( \ \sqrt{5}x\ - \frac{z}{\sqrt{5}} \right)^{2}\ + \frac{44}{5}z^{2}\ - 2yz\ + \ 5y^{2}$$ If you keep going, you will have expressed $f$ near the origin as a sum of squares, with positive coefficients in front of each square term: thus you have a minimum. This method works in general for a Hessian quadratic form near a critical point. It's not fun, but it does work! [QUESTION 12:]{.underline} \[OPTIONAL : multiple Lagrange multipliers\] Use two Lagrange multipliers to solve the following: what points on the intersection of the cylinder $x^{2} + y^{2} = 4$ and the plane $2x + 2y + z = 2$ are closest and farthest from the origin? Hint: as before, you might find it helpful to maximize/minimize the square of the distance to the origin, and then take the square root of the resulting figures. For two constraints $G_{1} = 0$ and $G_{2} = 0$, the Lagrange equations take the form: $$\lbrack Df\rbrack\ = \lambda_{1}\lbrack DG_{1}\rbrack\ + \lambda_{2}\left\lbrack DG_{2} \right\rbrack$$ These, together with the two constraint equations, allow one to solve for critical point(s). [QUESTION 13:]{.underline} What are some examples of Nash equilibria being used? In what contexts are these saddle points important? This is a good opportunity to make connections to other disciplines and raise philosophical and moral questions. Possible tangents include: - The use of Nash equilibria in betting markets, evolution, warfare... what do you think? - GANs \[generative adversarial networks\] pit a generator and discriminator neural network in a 2-player game, with the Nash equilibrium providing a convincing generator (such as thispersondoesnotexist.com). What happens when you do this with a "fake news detector" neural network?
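The Nash computations in the next two questions amount to finding the saddle of a bilinear expected-payout function, and the algebra can be delegated to a computer. A minimal sympy sketch (the 2-by-2 payout matrix here is the one from QUESTION 14 below):

```python
import sympy as sp

p, q = sp.symbols('p q')           # P(row 1) for player A, P(column 1) for player B
P = sp.Matrix([[-1, 2], [3, -2]])  # the payout matrix from QUESTION 14
a = sp.Matrix([p, 1 - p])          # player A's mixed strategy
b = sp.Matrix([q, 1 - q])          # player B's mixed strategy
E = sp.expand((a.T * P * b)[0])    # expected payout to player A
crit = sp.solve([sp.diff(E, p), sp.diff(E, q)], (p, q))
print(crit)                        # {p: 5/8, q: 1/2}: the Nash equilibrium
print(E.subs(crit))                # 1/2: player A wins this much on average
```

The positive expected payout is what substantiates the claim below that player A has the advantage.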
[QUESTION 14:]{.underline} Compute the Nash equilibrium for the following 2-player game, where, recall, the convention is for player A to receive from player B the amount in the corresponding entry of the matrix. $$P = \begin{bmatrix} - 1 & 2 \\ 3 & - 2 \end{bmatrix}$$ Compute the expected payout at the Nash equilibrium. Is there an advantage to being player A or B? In practice, it's clear that the first player has an advantage. What is not clear is how much of an advantage it is and with what probability the strategies should be played. [QUESTION 15:]{.underline} Compute the Nash equilibrium and expected payout for a strange variant of rock-paper-scissors, where the payout matrix is given by $$P = \begin{bmatrix} 2 & - 3 & 1 \\ - 3 & 5 & 0 \\ 4 & 0 & - 3 \end{bmatrix}$$ For this problem, player A will choose strategies with a probability distribution $(a,\ b,\ 1 - a - b)^{T}$ and player B with $(c,\ d,\ 1 - c - d)^{T}$. The average payoff function will (hopefully!) have a single equilibrium: a saddle. Finding the equilibrium is an algebraic challenge, but it uses 2-by-2 inverses and provides a nice review of skills. The resulting answer is a worthwhile surprise. ### ASSESSMENT PROBLEMS [PROBLEM 1.]{.underline} Consider the function $f(x,y) = \ Cx^{2} - \ 8xy\ + \ 2y^{2}\ $ A\) For what value(s) of $C$ does $f$ have more than one critical point? B\) For what value(s) of $C$ does $f$ have a saddle point at the origin? C\) For what value(s) of $C$ does $f$ have a local maximum at the origin? [PROBLEM 2.]{.underline} Consider the function $f(x,y) = \cos x + \sin y$ A\) Locate the critical points in the region where $- \frac{\pi}{2} < x < \frac{\pi}{2}\ $and $- \pi < y < \pi$. B\) Compute the second derivative $\lbrack D^{2}f\rbrack$. C\) Use the second derivative $\lbrack D^{2}f\rbrack$ to classify the critical points from (A). [PROBLEM 3.]{.underline} Consider the function $$f(x,y) = C_{1}x^{3} + C_{2}xy^{2} + C_{3}y^{3} + C_{4}x^{2} + C_{5}xy + C_{6}y^{2} + C_{7}x + C_{8}y + C_{9}$$ where the nine coefficients $C_{i}$ are all constants. A\) Compute the derivative $\lbrack Df\rbrack$ at the origin and explain: under what conditions on the constants is the origin a critical point? B\) Compute the 2^nd^ derivative (or Hessian) $\lbrack D^{2}f\rbrack$ of $f$ at the origin. Assuming the origin is a critical point, under what conditions on the constants $C_{i}$ is it a (local) minimum? C\) If $C_{4} = - C_{6}$, what can you say about the critical point at the origin? [PROBLEM 4.]{.underline} Consider the function $f\left( x_{1},x_{2},x_{3},x_{4} \right) = 5 + x_{1}^{2} - 2x_{2}^{2} + 3x_{3}^{2} - 4x_{4}^{2}.$ A\) Compute the 1^st^ derivative $\lbrack Df\rbrack$ and 2^nd^ derivative (i.e., Hessian) $\lbrack D^{2}f\rbrack$. B\) Find all critical points of $f$. C\) Choose one of the critical points you found in part (B) and argue whether it should be a max, min, saddle, or degenerate point by thinking about what the function does nearby. [PROBLEM 5.]{.underline} Consider the function $f(x,y) = x^{4}y^{2} - xy^{3} - 8x + 8y + 13$. A\) Show that $f$ has a critical point at $x = 1,\ y = 2$. B\) Compute the second derivative $\lbrack D^{2}f\rbrack$. C\) Classify the critical point at $(1,2)$. [PROBLEM 6.]{.underline} Find and classify the two critical points of the function $$f(u,v) = u^{3} - v^{3} + uv - 7\ .$$ [PROBLEM 7.]{.underline} Consider the function $f(x,y) = y^{2} - 2y + x^{2} - xy - 4x - 2$ A\) Compute the first and second derivatives of $f$.
B\) Determine the global maximum and minimum of $f$ over the domain in the plane where $x \geq 0$. [PROBLEM 8.]{.underline} Consider the function $f(x,y) = x + 2xy - y^{2}$ on the unit disc $x^{2} + y^{2} \leq 1$ in the plane. A\) Find and classify all critical points in the interior of the disc (that is, ignoring the boundary). B\) Set up but do not solve the equations for finding the critical points on the boundary of this disc. C\) Will the global max be on the boundary or the interior? What about the global min? [PROBLEM 9.]{.underline} Consider the function $f = x^{2} - y^{2} + z^{2}$ constrained to the plane given by $x + 2y - z = 1$. A\) Use the method of Lagrange to find the critical point of this constrained $f$. B\) What is the value of the Lagrange multiplier $\lambda$ you found in part (A)? C\) What is the value of $f$ at the critical point you found in part (A)? D\) Do you suspect this is a max or a min or a saddle? [PROBLEM 10.]{.underline} Consider the function $f(x,y) = x + xy$ restricted to the circle $x^{2} + y^{2} = 1$. A\) Write out and simplify the Lagrange equations for this optimization problem. B\) Use the Lagrange equations to solve for the critical points. Your final answer should find three critical points on the circle. C\) Which critical point is the global maximum? [PROBLEM 11.]{.underline} Consider the function $f(x,y) = \ - 3x^{2} + 4xy$ A\) Does $f$ have a global maximum? B\) Find the critical point of $f$ constrained to the line $3y - 2x = 9$. [PROBLEM 12.]{.underline} Consider the function $f(x,y) = {2x}^{2} - 6x + 5 + 2xy^{2} + y^{2}$ A\) Find and classify all the critical points of $f$. B\) If you constrain the function $f$ to the line $2x - 3y = 5$, then there is a single minimum. Explain the process you would use to find that critical point. [PROBLEM 13.]{.underline} Consider the function $f(y,z) = 6y^{3/5}z^{2/5}$ restricted to the line $y + z = 10$ where $y,\ z$ are both $> 0$. Your goal is to maximize $f$ restricted to this domain. A\) Write down in full the Lagrange equations for this optimization problem. B\) Use the Lagrange method to find the unique constrained critical point. C\) Comment briefly on how you could argue that this is a maximum. [PROBLEM 14.]{.underline} Find and classify the extrema of the function $f(x,y) = \sqrt{3xy}$ subject to the constraint that $3x + 4y = 18$. [PROBLEM 15.]{.underline} Consider the function $g(x,y) = \frac{2}{3}x^{3} - 2x^{2}y + 5xy + \frac{1}{2}y^{2}$ A\) Compute the gradient $\nabla g$ B\) Find a point $(x,y)$ where the gradient is zero. C\) Near this point, what does $g$ "look like"? Is it a local max, min, etc? D\) How many nonzero terms does the Taylor series of $g$ about the origin have? [PROBLEM 16.]{.underline} Consider the function $f = (x - 3y)\cos\left( x^{2} + y^{2} \right)\sin(2x + y)$. A\) Taylor expand this about the origin, including only terms of degree $\leq 2$. B\) This function has a critical point at the origin. Classify it. [PROBLEM 17.]{.underline} A) The area of an ellipse with semi-major axis length $B > 0$ and semi-minor axis length $C > 0$ is equal to $A = \pi BC$. Use a Lagrange multiplier to show that a circle (i.e., an ellipse where $B = C$) maximizes the area subject to the constraint that $B + C$ is a fixed number. B\) Repeat the same argument for a solid ellipsoid in 3-d with semi-axis lengths $B,\ C,\ D$ and volume $V = \frac{4}{3}\pi BCD$, subject to the constraint $B + C + D$ is constant: show, using the Lagrange method, that the maximum volume happens when $B = C = D$.
C\) In these problems above, why is it that the critical point you found corresponds to the maximal area/volume? [PROBLEM 18.]{.underline} Consider the functions $$f(x,y,z) = x^{2} + 2y^{2} + 3z^{2} - xy - yz\ \ \ \ \ :\ \ \ \ \ g(x,y,z) = f(y,z,x) = y^{2} + 2z^{2} + 3x^{2} - yz - xz$$ A\) Compute the gradients $\nabla f$ and $\nabla g$. B\) Write down an explicit set of equations that could be used to optimize the value of $f$ along the surface where $g(x,y,z) = 4$. C\) Explain in words what the gradient vectors of $f$ and $g$ look like evaluated at a solution to the equations you wrote down in part (B). [PROBLEM 19.]{.underline} Assume two quantities, $x$ and $y$, of items which cost \$2 and \$5 per unit respectively. Assume the total cost is \$50, and you want to maximize the function $$F(x,y) = 2\ \sqrt{x}\ \sqrt[3]{y} = 2x^{1/2}y^{1/3}$$ A\) What is the constraint equation on $x$ and $y$? B\) Use the method of a Lagrange multiplier to solve for the optimal $x$ and $y$. You do not need to prove that it maximizes $F$, but you must use the Lagrange method & show work. [PROBLEM 20.]{.underline} Consider the function $$f(x,y,z) = 2z + xy - 3xz + \frac{1}{2}z^{2} + xy^{2} + xyz - x^{2}y + \frac{1}{6}z^{3}$$ A\) Compute the gradient of $f$. B\) If you constrain $f$ to the plane $z = 0$, then $f$ has critical points at $y = 0$ and $x = 0$, or, $y = 0\ $and $x = 1$. Classify these two constrained critical points. C\) Is the constrained critical point at the origin an unconstrained critical point of $f$? ### ANSWERS & HINTS [PROBLEM 1.]{.underline} A) $C = 8$ ; B) $C < 8$ ; C) none: must satisfy $C > 8$ and $2C + 4 < 0$ [PROBLEM 2.]{.underline} A) $\left( 0, \pm \frac{\pi}{2} \right)$ ; B) $\left\lbrack D^{2}f \right\rbrack = \begin{bmatrix} - \cos x & 0 \\ 0 & - \sin y \end{bmatrix}$ ; C) $\left( 0,\frac{\pi}{2} \right)$ max, $\left( 0, - \frac{\pi}{2} \right)$ saddle [PROBLEM 3.]{.underline} A) $\lbrack Df\rbrack_{0} = \begin{bmatrix} C_{7} & C_{8} \end{bmatrix}$; B) $\left\lbrack D^{2}f \right\rbrack_{0} = \begin{bmatrix} 2C_{4} & C_{5} \\ C_{5} & 2C_{6} \end{bmatrix}$, so it is a local minimum if $C_{4},\ C_{6} > 0$ and $C_{5}^{2} < 4C_{4}C_{6}$ ; C) it is a saddle [PROBLEM 4.]{.underline} A) $\lbrack Df\rbrack = \begin{bmatrix} 2x_{1} & - 4x_{2} & 6x_{3} & - 8x_{4} \end{bmatrix}$ ; $\left\lbrack D^{2}f \right\rbrack = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & - 4 & 0 & 0 \\ 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & - 8 \end{bmatrix}$ ; B) the origin is the only critical point ; C) it is a saddle point [PROBLEM 5.]{.underline} B) $\left\lbrack D^{2}f \right\rbrack_{1,2} = \begin{bmatrix} 48 & 4 \\ 4 & - 10 \end{bmatrix}$ ; C) thus, saddle [PROBLEM 6.]{.underline} $(0,0)$ is a saddle; $\left( \frac{1}{3}, - \frac{1}{3} \right)$ is a local minimum (not global) [PROBLEM 7.]{.underline} A) $\lbrack Df\rbrack = \begin{bmatrix} 2x - y - 4 & 2y - x - 2 \end{bmatrix},$ $\left\lbrack D^{2}f \right\rbrack = \begin{bmatrix} 2 & - 1 \\ - 1 & 2 \end{bmatrix}$ ; B) $\left( \frac{10}{3},\frac{8}{3} \right)$ is the global minimum; along the edge $x = 0$, there is a local minimum at $(0,1)$; there is no global maximum since $f \rightarrow + \infty$ as $x,y \rightarrow + \infty$ [PROBLEM 8.]{.underline} A) $\left( - \frac{1}{2}, - \frac{1}{2} \right)$ saddle ; B) can use Lagrange or parametrize the circle ; C) since the interior critical point is a saddle, all global extrema are on the boundary [PROBLEM 9.]{.underline} A) $\left( - \frac{1}{2},\ 1,\frac{1}{2} \right)$ ; B) $\lambda = - 1$ ; C) $f = - \frac{1}{2}$ ; D) a constrained minimum (certainly not a max)
[PROBLEM 10.]{.underline} A) $\left( \begin{array}{r} 1 + y \\ x \end{array} \right) = \lambda\left( \begin{array}{r} 2x \\ 2y \end{array} \right)\ $; B) $(0, - 1)$ and $\left( \pm \frac{\sqrt{3}}{2},\frac{1}{2} \right)$ ; C) $\left( \frac{\sqrt{3}}{2},\frac{1}{2} \right)$ is the max [PROBLEM 11.]{.underline} A) nope ; B) $(18,15)$ [PROBLEM 12.]{.underline} A) $\left( \frac{3}{2},0 \right)$ is a local minimum ; $\left( - \frac{1}{2}, \pm 2 \right)$ are saddles ; B) use Lagrange or parametrize [PROBLEM 13.]{.underline} A) $\ \left( \begin{array}{r} \frac{18}{5}y^{- \frac{2}{5}}z^{\frac{2}{5}} \\ \frac{12}{5}y^{\frac{3}{5}}z^{- \frac{3}{5}} \end{array} \right) = \lambda\left( \begin{array}{r} 1 \\ 1 \end{array} \right)$ ; B) $y = 6,\ z = 4$ [PROBLEM 14.]{.underline} via Lagrange or parametrization, $x = 3,\ y = \frac{9}{4}$ and $f = \frac{9}{2}$ ; do not forget the endpoints at $(6,\ 0)$ and $\left( 0,\ \ \frac{9}{2} \right)$ where $f = 0$; the interior critical point is the global max [PROBLEM 15.]{.underline} B) origin ; C) saddle ; D) four, of course [PROBLEM 16.]{.underline} A) $(x - 3y)(1)(2x + y) = 2x^{2} - 5xy - 3y^{2}$ ; B) saddle [PROBLEM 17.]{.underline} symmetry is wonderful [PROBLEM 18.]{.underline} A/B) $\left( \begin{array}{r} 2x - y \\ 4y - x - z \\ 6z - y \end{array} \right) = \lambda\left( \begin{array}{r} 6x - z \\ 2y - z \\ 4z - y - x \end{array} \right)$ ; C) must be parallel [PROBLEM 19.]{.underline} A) $2x + 5y = 50$ ; B) $x = 15,\ y = 4$ [PROBLEM 20.]{.underline} A) $\nabla f = \left( \begin{array}{r} y - 3z + y^{2} + yz - 2xy\ \\ x + 2xy + xz - x^{2} \\ 2 - 3x + z + xy + \frac{1}{2}z^{2} \end{array} \right)$ ; B) they are both saddles, since on this plane $\left\lbrack D^{2}f \right\rbrack_{z = 0}\ = \ \begin{bmatrix} - 2y & 1 + 2y - 2x \\ 1 + 2y - 2x & 2x \end{bmatrix}$ ; C) nope ## # VOLUME III : INTEGRALS ## ## Week 9 : Integrals & Averages ### OUTLINE MATERIALS: Calculus BLUE : Vol 3 : Chapters 1-5 TOPICS: - Multivariate integrals as limits of Riemann sums - Interpretation of an integrand as a signed density - Interpretation of an integral as a signed mass - The Fubini Theorem and iterated anti-differentiation - Integration domains and limits for multiple integrals - Changing order-of-integration - Visualizing double and triple integrals - The use of integrals for computing areas and volumes - Averages of scalar-valued functions via integrals LEARNING OBJECTIVES: - Use the Fubini Theorem to evaluate multiple integrals - Compute areas and volumes by setting up and solving integrals - Change limits of integration under change of order of integration - Infer planar projections of a 3-D domain of integration based on limits - Infer limits of integration based on planar projections of the domain - Set up and evaluate averages of functions over domains - Estimate whether an integral is positive, negative, or zero - Use properties of even and odd functions to simplify integrals - Use the additivity property of integrals ### PRIMER This begins the third quarter of our story, moving from differentiation to integration. DEFINITIONS. The single-variable story of working with indefinite integrals as anti-derivatives and definite integrals as area-under-a-curve does not generalize in a straightforward fashion. More salient is the notion of a *Riemann sum* of an integrand $f:\mathbb{R}^{n}\mathbb{\rightarrow R}$ restricted to some (reasonably nice $n$-dimensional) region of the domain $D \subset \mathbb{R}^{n}$.
The mechanics of partitioning a domain and checking that a Riemann sum converges to a well-defined integral are both byzantine and unwelcome in an applications-facing course. Let us assume that every integrand that is continuous and bounded on a bounded domain is *integrable*, having Riemann sums that converge as the partition element sizes shrink to zero. INTERPRETATIONS. It is possible to compute geometric quantities by means of integrals, much as in single-variable calculus, by using an integrand of $1$. In the language of differential elements, this is summing up volume elements $$dV = dx_{1}dx_{2}\cdots dx_{n - 1}dx_{n}$$ to compute $n$-dimensional volume. The familiar setting of $n = 1,\ 2,\ 3$ gives length, area, and volume respectively. When integrating $f\left( \mathbf{x} \right)dV$, one is tempted to imagine the integral as the "volume" "under" the "graph" of $f$. This is inconvenient. It is better to think of the domain $D$ as a massive body with density $f$ (which is permitted to become zero or even negative), in which case $dM = f\left( \mathbf{x} \right)dV$ is a mass element and the integral is mass. This will be of explicit importance next week; for this week, density and mass are helpful metaphors. FUBINI. The great coup of single variable calculus is the swap from definite to indefinite integrals flowing from the Fundamental Theorem of Integral Calculus. That theorem is as yet too lofty for us to attain in the multivariate setting (though the end of our story beckons). Our best tool at present for evaluating integrals is the Fubini Theorem which allows for iterated anti-differentiation: $$\int{\cdots\int{f(\mathbf{x})\ dx_{1}dx_{2}\cdots dx_{n - 1}dx_{n}}} = \int{\left( \int{\left( \cdots\int{\left( \int{f\left( \mathbf{x} \right)\ dx_{1}} \right)dx_{2}\cdots} \right)dx_{n - 1}} \right)dx_{n}}$$ One must exercise care with the limits of integration, but the order of integration does not change the result. (Though in practice, reordering may be advantageous for computation.) DOUBLES & TRIPLES. An application of Fubini in $\mathbb{R}^{2}$ or $\mathbb{R}^{3}$ leads respectively to *double* or *triple* integrals. These are immediately useful for computing areas, volumes, masses, and more. For our purposes this week, there are a few challenges that will be considered carefully in the context of double and triple integrals. - *Setting boundaries* : given a domain $D$, choose an order of the integration variables and set up the limits of integration to integrate over the interior of the domain. - *Inferring geometry* : the often-harder inverse problem is to determine the geometric shape over which one is integrating given a particular ordering of variables and set limits of integration. - *Reordering variables* : the previous problem is usually a precursor to the challenge of reordering the variables with which a multiple integral is computed. In this course, the goals for integration are less about intricate techniques for anti-differentiating a difficult integrand (always a danger!) and more about setting up and manipulating limits of integration. It is not possible to write out a general algorithm for inferring geometry or reordering variables for a triple integral: these can be very challenging. It is best to practice with double integrals at first, then work up to handling triple integrals by ignoring the "inner" variable and working with the planar projection to the outer two variables' plane. AVERAGES.
Double and triple integrals have obvious motivations arising from geometry. Of what possible use could integration over $\mathbb{R}^{n}$ be for $n > 3$? *Averages* are the simplest and strongest application of higher-dimensional integrals. The same formula from single variable calculus holds here: for $f:\mathbb{R}^{n}\mathbb{\rightarrow R}$, the average of $f$ over an $n$-dimensional region $D$ in the domain is given by $$\overline{f} = \frac{\int_{D}^{}{f\ dV}}{\int_{D}^{}{dV}} = \frac{1}{V}\int_{D}^{}{f\ dV}\ ,$$ where $dV$ equals the $n$-dimensional volume element $dx_{1}dx_{2}\cdots dx_{n - 1}dx_{n}$ and $V$ is the volume of the region $D$ (area in 2-D, length in 1-D, *etc.*). Next week this idea will be expanded greatly to characterize the geometry and physics of massive bodies, as well as give a quick pass over probability. ### DISCUSSION [QUESTION 1:]{.underline} Compute the following double integral over a rectangle: $$\int_{\pi/6}^{\pi/3}{\int_{0}^{\pi/2}{\cos{(x - y)}\ dy\ dx}}$$ This is a simple problem, but good for recalling the basics. [QUESTION 2:]{.underline} Compute the integral over the unit $n$-dimensional cube: $$\int_{0}^{1}{\int_{0}^{1}{\cdots\int_{0}^{1}{x_{1}\ dx_{n}dx_{n - 1}\cdots dx_{2}dx_{1}}}}$$ Encourage students to do it in two different orders to see how to use Fubini... Does the final answer make sense? Why? It is useful to look at the simplest case of $n = 1$; perhaps returning to this at the end of the session, after focusing more on averages. This gives a foreshadowing of Week 10 centroids. [QUESTION 3:]{.underline} Reverse the order of integration of the following double integrals: $$\int_{0}^{4}{\int_{\sqrt{u}}^{2}{u^{2} + v^{2}\ dv\ du}\ }\ \ \ \ \ :\ \ \ \ \int_{0}^{1}{\int_{0}^{2x}{f(x,y)\ dy\ dx}}\ \ \ \ \ :\ \ \ \int_{0}^{2}{\int_{0}^{x^{2}}{x^{2} - y^{2}\ dy\ dx}}$$ What do you notice? Is there a pattern for how to change orders of integration? It is best to encourage students to use elements to think about limits: memorizing patterns does not work in general. [QUESTION 4:]{.underline} Set up a triple integral that computes the volume of the region that is bounded by the paraboloid $x = 4 - y^{2} - z^{2}$ for $x \geq 0$ and by the ball of radius 2 at the origin for $x \leq 0$. Notice how careful the wording on this is -- do not say "the region between". After setting up this integral, ask how much effort it would take to solve it and compute the volume. Are there any ways to make this volume computation simpler? This emphasizes additivity of integrals. [QUESTION 5:]{.underline} Compute the following triple integral: $$\int_{x = 0}^{2}{\int_{y = 0}^{x}{\int_{z = x}^{y}{\ xyz\ dz\ dy\ dx}}}$$ Why is the answer negative when the integrand is everywhere positive!? Have we made a mistake? Encourage students once again to begin by drawing a picture of the integration domain in the $(x,y)$ plane. What can be said about the relative sizes of $x$ and $y$? This is an opportunity to emphasize orientation and the single-variable case of reversing limits. This foreshadows the integration of 1-form fields later: the $dz$ evaluates to a negative term since the $z$-coordinate is decreasing. [QUESTION 6:]{.underline} Compute the value of the improper integral $$\int_{0}^{\infty}{\int_{0}^{\infty}{e^{- ax - by}\ dy\ dx}}$$ This is a good time to remind students of how improper integrals operate, as well as how the Fubini theorem operates on a product of "independent" functions over a product domain.
Begin by doing this double integral "the long way"; then use Fubini to split it into the product of two cognate integrals. (For $a,\ b > 0$, both routes yield $\frac{1}{ab}$.) End with asking what happens to this if instead of a double or a triple integral one has arbitrary numbers of variables? These examples will be important in Week 10 when doing multivariate probability. [QUESTION 7:]{.underline} Under what conditions on the domain can one assert that: $$\int{\cdots\int{f_{1}\left( x_{1} \right)f_{2}\left( x_{2} \right)\cdots f_{n}\left( x_{n} \right)\ d\mathbf{x}}}\ = \left( \int{f_{1}\ dx_{1}} \right)\left( \int{f_{2}\ dx_{2}} \right)\cdots\left( \int{f_{n}\ dx_{n}} \right)$$ It is worth going over why it does split in the case of a rectangular axis-aligned prism and why it does not work in general. What if you have a cube that is rotated? As always, if looking for counterexamples, 2-D is enough. [QUESTION 8:]{.underline} The following integral is rather difficult to compute directly, but it can be done without work if you think correctly. What is the right argument? $$\int_{x = - 1}^{1}{\int_{y = 0}^{\sqrt{1 - x^{2}}}{\int_{z = 0}^{\sqrt{1 - x^{2} - y^{2}}}{(1 + x)\ dz\ dy\ dx}}}\ = \ \frac{\pi}{3}$$ This looks intimidating. The right way to start is by asking "What is the domain?" After discerning it is a quarter of a unit ball, using linearity, one can split the integral into one term that computes volume, and another that adds up the $x$-values. Ahha! The integral seems to evaluate to the volume...why? Use this to recall even and odd functions and how to simplify integrals based on symmetry. It is important to think of integrals in terms of Riemann sums: you can justify the cancellation of an odd integrand over a symmetric domain. [QUESTION 9:]{.underline} What does it mean to compute the integral of $f$ over a single point? Is it always zero? What additional information about $f$ might you need? Get students to the point where they can imagine the integral of $f:\mathbb{R}^{0} \rightarrow \mathbb{R}^{1}$ as a Riemann sum (of sorts), arguing for the integral as being evaluation. If students have trouble with this, give a follow-up question about the integral of $f:\mathbb{R}^{2} \rightarrow \mathbb{R}^{1}$ over an interval (say, along the $x$-axis). Why must this be zero? [QUESTION 10:]{.underline} Compute the average of the function $f(x,y,z) = x^{2} + y^{2} + z^{2}$ over the cube defined by $- 1 \leq x,y,z \leq 1$. What kinds of symmetries are present in this problem? Would it suffice to average $x^{2}$ over the interval $0 \leq x \leq 1$? Why or why not? [QUESTION 11:]{.underline} Compute the average of the function $f = xy^{3}$ over the domain that lies outside the unit disc and inside the square $0 \leq x,y \leq 2$. This type of problem is important, as it emphasizes additivity of integrals and weighted averages. Begin by asking how hard this would be if the domain were simply the unit disc, and whether that could be useful. This is a good problem for remembering that averages require normalization by the volume (in this case area) of the domain. [QUESTION 12:]{.underline} Compute the average dot product of two unit-length vectors in $\mathbb{R}^{2}$. Do you expect this average to be negative, zero, or positive? Why? Does the problem become easier or harder if instead you restrict the vectors to lie in the first quadrant? The very important issue is: what is the domain of integration?
If using angles as coordinates, the pair of unit vectors in the first quadrant is determined by a point in the square $\lbrack 0,\pi/2\rbrack \times \lbrack 0,\pi/2\rbrack$. Remember the formula for the dot product in terms of an angle between the vectors? Ahha... What would happen to the answer if instead of using angles, you average over the $x$-coordinates of the vectors yielding the unit square $\lbrack 0,1\rbrack \times \lbrack 0,1\rbrack$ and inferring the $y$-coordinates by the unit length constraint? Why does this not give the same answer? This is all a good foreshadowing of probability next week. ### ASSESSMENT PROBLEMS [PROBLEM 1.]{.underline} Compute the volume of the domain in $\mathbb{R}^{3}$ defined by inequalities $0 \leq x \leq y^{2}\ ;\ \ 0 \leq y \leq 2\ \ \ ;\ \ 0 \leq z \leq y$. [PROBLEM 2.]{.underline} Use a triple integral to compute the volume of the domain in 3-D given by the inequalities: $0 \leq x \leq 2\ \ \ :\ \ - 1 \leq y \leq 1\ \ \ :\ \ \ 1 \leq z \leq (6 - 2x - y^{3})$ [PROBLEM 3.]{.underline} Consider the double integral $$\int_{- 1}^{8}{\int_{\sqrt[3]{y}}^{2}{\frac{y^{2}}{e^{x}}\ dx\ dy}}$$ A\) Draw a careful picture of the domain of integration. B\) Reverse the order of integration (but do not evaluate the integral). [PROBLEM 4.]{.underline} Compute the following double integral: $$\int_{x = 0}^{2}{\int_{y = x^{2}}^{4}\frac{x^{3}}{\sqrt{x^{4} + y^{2}}}}\ dy\ dx$$ [PROBLEM 5.]{.underline} Consider the following triple integral. $$\int_{0}^{1}{\int_{y^{2}}^{y}{\int_{y^{2}}^{x^{2}}{x^{2} + y^{2}\ dz\ dx\ dy}}}$$ A\) Evaluate it, showing all steps. B\) Fill in the limits of integration if the order of integration is changed like so: $$\int_{\ \ }^{\ \ }{\int_{\ \ }^{\ \ }{\int_{\ \ }^{\ \ }{x^{2} + y^{2}\ dz\ dy\ dx}}}$$ [PROBLEM 6.]{.underline} Consider the triple integral $$\int_{z = 0}^{2}{\int_{y = 0}^{\sqrt{2z}}{\int_{x = 2y}^{z + 4}{x - y\ dx\ dy\ dz}}}$$ A\) Do not evaluate the integral; rather, argue whether or not the value of this integral is positive, negative, zero, or impossible to tell (without evaluation). B\) Draw a picture of the domain of integration projected to the $(y,z)$ plane. C\) Determine the limits of integration of this integral under the reordering: $$\int_{\ \ \ }^{\ \ \ \ }{\ \ \ \int_{\ \ \ }^{\ }{\ \ \ \ \int_{\ \ \ }^{\ }{\ \ \ \ \ x - y\ dx\ dz\ dy}}}$$ [PROBLEM 7.]{.underline} Consider the following triple integral $$\int_{- 2}^{4}{\int_{z/2}^{2}{\int_{- x}^{\ \ 2 - x^{2}}{y - 3x^{2} + 2z\ dy\ dx\ dz}}}$$ A\) Draw a picture of the domain of integration projected onto both the $(x,y)$ and $(x,z)$ planes. B\) Fill in the limits of integration if the order is changed like so: $$\int_{\ \ \ \ \ \ }^{\ \ \ \ }{\int_{\ \ \ \ \ \ \ \ }^{\ \ \ \ \ \ }{\int_{\ \ \ \ \ \ \ }^{\ \ }{\ \ \ y - 3x^{2} + 2z\ dy\ dz\ dx}}}$$ [PROBLEM 8.]{.underline} Consider the 3-D domain whose projections to the coordinate planes are as shown \[*figure*\]. Compute the volume of this solid domain using a triple integral. [PROBLEM 9.]{.underline} Consider the 3-D domain whose projections to the coordinate planes are as shown \[*figure*\]. Compute the volume of this solid domain using a triple integral. Assume a unit grid in the graphs above. [PROBLEM 10.]{.underline} Evaluate carefully the following integral, showing all steps. $$\int_{x_{4} = - 1}^{1}{\int_{x_{3} = 1}^{2}{\int_{x_{2} = 0}^{x_{3}}{\int_{x_{1} = x_{2}}^{x_{3}}{x_{1} + x_{4}\ dx_{1}dx_{2}dx_{3}dx_{4}}}}}$$
[PROBLEM 11.]{.underline} Compute the average of the function $f(x,y) = x^{3} + 2y$ over the domain $D$ given by $1 \leq x + y \leq 2$ ; $0 \leq x \leq 2$. [PROBLEM 12.]{.underline} Consider the following triple integral. $$\int_{- 3}^{3}{\int_{0}^{\sqrt{9 - y^{2}}}{\int_{0}^{\sqrt{9 - x^{2} - y^{2}}}{f(x,y,z)\ dz\ dx\ dy}}}$$ A\) Describe and draw a picture of the domain of integration of the integral above. B\) If this integral evaluates to $24\pi$, what is the average $\overline{f}$ of $f$ over the domain? [PROBLEM 13.]{.underline} Consider the function $f:\mathbb{R}^{4}\mathbb{\rightarrow R}$ given by $$f(x,y,u,v) = xu - y^{2}v^{2}$$ A\) Compute and simplify the average of $f$ over the 4-D cube where each of the four variables goes from $0$ to $2$. B\) Does your answer make sense? [PROBLEM 14.]{.underline} Consider the rectangle with corners at $(1,1),\ (1,0),\ (3,0),(3,1)$. A\) Compute the average of the function $f(x,y) = x^{2}e^{- 2y}$ over this rectangle. B\) Give an example of a non-constant function $g(x,y)$ that has an average of zero on this rectangle. [PROBLEM 15.]{.underline} Consider the 4-D domain $D$ given by the inequalities $$x_{1}^{2} + x_{2}^{2} \leq 4\ \ \ \ \ :\ \ \ - 1 \leq x_{3} \leq 1\ \ \ \ :\ \ \ 0 \leq x_{4} \leq 3$$ A\) Draw pictures of $D$ projected to the $(x_{1},x_{2})$ and $(x_{2},x_{3})$ planes. B\) Evaluate the integral of $f = x_{1} + x_{4}$ over $D$. [PROBLEM 16.]{.underline} Consider the triple integral $$\int_{x = 0}^{1}{\int_{y = \sqrt{1 - x^{2}}}^{1 - x}{\int_{z = x}^{y}{dz}dy\ dx}}$$ A\) Draw a careful picture of the projection of its domain onto the $(x,y)$ plane. B\) Fill in the blanks to reverse the order of integration to the following: $$\int_{\ \ \ \ \ \ \ \ \ \ \ \ \ }^{\ \ \ \ \ \ \ \ \ \ \ \ }{\int_{\ \ \ \ \ \ \ \ \ \ \ \ \ }^{\ \ \ \ \ \ \ \ \ \ \ \ \ \ }{\int_{\ \ \ \ \ \ \ \ \ \ \ }^{\ \ \ \ \ \ \ \ \ \ \ }{dz\ dx\ dy}\ }}$$ C\) Without computing it, is the integral positive, negative, or zero? [PROBLEM 17.]{.underline} For a certain $f = f(x,y)$, assume you are given the following: $$\int_{- 1}^{1}{\int_{x}^{1}{f(x,y)\ dy\ dx}} = 7\ \ \ \ \ :\ \ \ \ \int_{- 1}^{1}{\int_{- 1}^{1}{f(x,y)\ dx\ dy}} = 5$$ A\) Draw pictures of the domains of integration of the two integrals above. B\) Determine the value of the following integral, explaining your answer. $$\int_{- 1}^{1}{\int_{y}^{1}{f(x,y)dx\ dy}}$$ [PROBLEM 18.]{.underline} For a certain $f = f(x,y,z)$, assume you are given the following: $$\int_{- 1}^{1}{\int_{- 1}^{1}{\int_{- 1}^{y^{2}}{f\ dz\ dy\ dx}}} = 9\ \ \ \ \ :\ \ \ \ \int_{- 1}^{1}{\int_{- 1}^{1}{\int_{y^{2}}^{1}{f\ dz\ dy\ dx}}} = 6$$ Determine the average $\overline{f}$ over the cube where $x,\ y,\ z$ range from $- 1$ to $1$.
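Several of the iterated integrals above (Problem 10, for instance) have variable limits that are easy to mis-set. A minimal sympy sketch (assuming sympy is available) can serve as an instructor-side check, with the innermost integral listed first:

```python
import sympy as sp

x1, x2, x3, x4 = sp.symbols('x1 x2 x3 x4')

# Problem 10's iterated integral: integrate x1 first, then x2, x3, x4
val = sp.integrate(x1 + x4,
                   (x1, x2, x3), (x2, 0, x3), (x3, 1, 2), (x4, -1, 1))
print(val)  # 5/2
```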
### ANSWERS & HINTS [PROBLEM 1.]{.underline} the volume equals $4$ [PROBLEM 2.]{.underline} the volume equals $12$ [PROBLEM 3.]{.underline} changing order yields $$\int_{x = - 1}^{2}{\int_{y = - 1}^{x^{3}}{\frac{y^{2}}{e^{x}}\ dy\ dx}}$$ [PROBLEM 4.]{.underline} reverse order and substitute $u = x^{4} + y^{2}$ to get $4(\sqrt{2} - 1)$ [PROBLEM 5.]{.underline} A) $\frac{1}{30} - \frac{1}{55} - \frac{1}{6} + \frac{1}{7}$ ; B) $z = y^{2}\ldots x^{2};\ y = x\ldots\sqrt{x}\ ;x = 0\ldots 1$ [PROBLEM 6.]{.underline} A) positive ; C) $x = 2y\ldots z + 4\ ;z = \frac{1}{2}y^{2}\ldots 2\ ;\ y = 0\ldots 2$ [PROBLEM 7.]{.underline} $$\int_{x = - 1}^{2}{\int_{z = - 2}^{2x}{\int_{y = - x}^{\ \ 2 - x^{2}}{\ \ \ y - 3x^{2} + 2z\ dy\ dz\ dx}}}$$ [PROBLEM 8.]{.underline} the volume equals $8/3$ [PROBLEM 9.]{.underline} the volume equals $24$ [PROBLEM 10.]{.underline} the integral evaluates to $5/2$ [PROBLEM 11.]{.underline} compute $$\frac{1}{2}\int_{0}^{2}{\int_{1 - x}^{2 - x}{x^{3} + 2y\ dy\ dx}} = 3$$ [PROBLEM 12.]{.underline} A) this is the quarter of the solid ball of radius 3 where $x \geq 0$ and $z \geq 0$ ; B) $\overline{f} = 24\pi/9\pi = 8/3$ [PROBLEM 13.]{.underline} $\overline{f} = \left( \frac{1}{16} \right)\left( - \frac{112}{9} \right) = - \frac{7}{9}$ [PROBLEM 14.]{.underline} $\overline{f} = \left( \frac{1}{2} \right)\left( \frac{13}{3}\left( 1 - e^{- 2} \right) \right) = \frac{13}{6}\left( 1 - e^{- 2} \right)$ [PROBLEM 15.]{.underline} A) disc of radius 2; rectangle $4 \times 2$; B) $36\pi$ [PROBLEM 16.]{.underline} B) $z = x\ldots y\ ;x = \sqrt{1 - y^{2}}\ldots 1 - y\ ;\ y = 0\ldots 1$ ; C) zero, by symmetry [PROBLEM 17.]{.underline} B) by additivity, it equals $5 - 7 = - 2$ [PROBLEM 18.]{.underline} the two integral domains partition the cube, so by additivity the integral of $f$ over the whole cube equals $9 + 6 = 15$ and $$\overline{f} = \frac{15}{8}\ .$$ ## ## Week 10 : Mass & Probability ### OUTLINE MATERIALS: Calculus BLUE : Vol 3 : Chapters 6-12 TOPICS: - Mass as the integral of a density integrand - Centroids and centers of mass - Moment of inertia of a 2-D or 3-D massive body about an axis - Radius of gyration - BONUS : The inertia matrix of a 3-D massive body; mixed moments - BONUS : Basic solid body mechanics using the inertia matrix - Multivariate probability density functions - Probability as the integral of a probability density integrand - Expectation and variance of a random variable - Standard deviation - Independent random variables; covariance - Marginalization - BONUS : Covariance matrices and their applications LEARNING OBJECTIVES: - Compute masses, centroids, and centers of mass via densities - Set up moment of inertia elements and compute moments of inertia - Use the Parallel Axis Theorem to simplify moment of inertia integrals - Compute the radius of gyration given moment of inertia and total mass - Use and interpret multivariate probability density functions - Set up and compute probabilities given a density - Apply additivity of integrals to compute probabilities - Set up expectation and variance integrals of random variables - Compute standard deviation given variance - Marginalize a multivariate probability density via integration & Fubini ### PRIMER This week is dedicated to applications of integrals involving mass in various forms. It serves as motivation for and an opportunity to practice computing integrals. CENTROIDS AND CENTERS.
The most common use of multivariate integrals is not the computation of area or volume but rather the computation of averages. In a geometric context, the centroid of a body $D \subset \mathbb{R}^{n}$ is the point $\overline{\mathbf{x}} = ({\overline{x}}_{1},{\overline{x}}_{2},\ldots,\ {\overline{x}}_{n}) \in \mathbb{R}^{n}$ whose coordinates are the average $x_{i}$ coordinates over $D$. The student who has seen centroids covered in single variable calculus may have found it odd that the single-integral formulae for centroid coordinates $\overline{x}$ and $\overline{y}$ of the region between two graphs $y = f(x)$ and $y = g(x) \leq f(x)$ of area $A$ are structurally dissimilar: $$\overline{x} = \frac{1}{A}\int_{a}^{b}{x\left( f(x) - g(x) \right)dx} = \frac{1}{A}\iint_{}^{}{x\ dA}\ \ \ :\ \ \overline{y} = \frac{1}{2A}\int_{a}^{b}{\left( f(x)^{2} - g(x)^{2} \right)dx} = \frac{1}{A}\iint_{}^{}{y\ dA}$$ The reason, of course, is that these are double integrals of $x$ and $y$ respectively. The addition of a variable density $\rho$ simply changes the volume element $dV$ to the mass element $dM = \rho\ dV$, though it can complicate explicit computations. Centers of mass can be computed as density-weighted averages. MOMENT OF INERTIA. There is a form of rotational mass that registers resistance to rotation about an axis in the same manner as (ordinary) mass resists linear translation via Newton's 2^nd^ Law of Motion. This rotational mass is the *moment of inertia*, $I$, and it depends not only on the body but also on the axis. The moment of inertia $I$ is the integral of the *inertia element* $dI = r^{2}dM$ where $dM$ is the mass element and $r$ is the (shortest/orthogonal) distance to the axis of rotation. In 3-D $(x,y,z)$ coordinates, rotation about the $z$-axis has element $dI = \left( x^{2} + y^{2} \right)dM$. A single point of mass $M$ and distance $R$ to the axis has moment of inertia $I = R^{2}M$. As such, it is sometimes worthwhile to compute for a body of total mass $M$ and total moment of inertia $I$ the *radius of gyration*, $R_{g}$, representing the distance to the axis that would result from collapsing the body to a single point of total mass $M$ without changing its total moment of inertia. This yields the formula $R_{g} = \sqrt{I/M}$. One method of simplification is worth mentioning. If one knows the moment of inertia $I_{0}$ through the center of mass of the object and one wishes to change the axis of rotation to one that is parallel, then the integrals can be simplified. The *Parallel Axis Theorem* states that $I = I_{0} + MR^{2}$, where $R$ is the distance of the new axis from the old one through the center of mass. \[BONUS\] INERTIA MATRIX. Changing the orientation of the axis of rotation is no mean feat via direct computation. Fortunately, linear algebra comes to the rescue. The *inertia matrix* of a body in $\mathbb{R}^{3}$ is a 3-by-3 matrix whose diagonal entries are the moments of inertia about the $x$, $y$, $z$ axes respectively, denoted $I_{xx},\ I_{yy},$ and $I_{zz}$: $$\lbrack I\rbrack = \begin{bmatrix} I_{xx} & I_{xy} & I_{xz} \\ I_{yx} & I_{yy} & I_{yz} \\ I_{zx} & I_{zy} & I_{zz} \end{bmatrix}\ \ \ \ \ :\ \ I_{xx} = \int_{}^{}{y^{2} + z^{2}\ dM}\ \ :\ \ I_{xy} = \int_{}^{}{- xy\ dM}\ \ \ \lbrack etc\rbrack$$ (The term *inertia tensor* is more commonly used, but this is superfluous and intimidating.) The off-diagonal terms are called *mixed moments*, and they capture a type of asymmetry in how the mass is distributed.
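As a concrete worked instance: for a unit-density solid box $\lbrack 0,a\rbrack \times \lbrack 0,b\rbrack \times \lbrack 0,c\rbrack$ of mass $M = abc$, taking the axes through the corner at the origin, the integrals above evaluate to $$\lbrack I\rbrack = M\begin{bmatrix} \frac{1}{3}\left( b^{2} + c^{2} \right) & - \frac{1}{4}ab & - \frac{1}{4}ac \\ - \frac{1}{4}ab & \frac{1}{3}\left( a^{2} + c^{2} \right) & - \frac{1}{4}bc \\ - \frac{1}{4}ac & - \frac{1}{4}bc & \frac{1}{3}\left( a^{2} + b^{2} \right) \end{bmatrix}\ ,$$ with every mixed moment nonzero, since all of the mass sits in a single octant.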
The inertia matrix serves as a coefficient matrix for writing the moment of inertia as a quadratic form: given any unit vector $\mathbf{u}$, the moment of inertia $I_{uu}$ of the body about the axis through the center of mass in the direction of $\mathbf{u}$ is given by $$I_{uu} = \mathbf{u}^{T}\lbrack I\rbrack\mathbf{u}\ ,$$ which makes sense given the diagonal entries of $\lbrack I\rbrack$. This, together with the Parallel Axis Theorem, allows one to compute all possible moments of inertia. \[BONUS\] SOLID BODY MECHANICS. Many of the most confusing topics in the basic mechanics of solid bodies become much simpler with the addition of linear algebra and the inertia matrix. Consider a solid body in 3-D with inertia matrix $\lbrack I\rbrack$ rotating about some axis through its center of mass. Given a small mass element $dM$, let the position vector $\mathbf{r}$ denote position relative to the center of mass, and let $\mathbf{v}$ be its time derivative: the velocity vector of the mass element. The following quantities encode the angular motion of the rotating body. - *Angular velocity* is a vector, $\mathbf{\omega}$, satisfying $\mathbf{v} = \mathbf{\omega} \times \mathbf{r}$**.** - *Angular momentum* is a vector, $\mathbf{L}$, given by $\mathbf{L} = \lbrack I\rbrack\mathbf{\omega}$**.** If you are familiar with the unusual properties of angular momentum -- how it jerks a spinning wheel away when the axis of rotation is changed -- there is no mystery: there is only matrix-vector multiplication, with no guarantee that $\mathbf{L}$ must be parallel to $\mathbf{\omega}$. There is much more one can do with these vectors, such as defining angular *torque* $\mathbf{\tau}$ and angular *kinetic energy* $K$ via: $$\mathbf{\tau} = \frac{d\mathbf{L}}{dt} = \lbrack I\rbrack\frac{d\mathbf{\omega}}{dt}\ \ \ \ \ \ \ \ \ :\ \ \ \ \ \ \ K = \frac{1}{2}\mathbf{\omega}^{T}\lbrack I\rbrack\mathbf{\omega}\ .$$ The inertia matrix unlocks the ability to "see" the familiar linear-motion formulae (force is mass times acceleration; kinetic energy is one-half mass times velocity-squared) in angular form. ◊ ◊ ◊ ◊ ◊ ◊ ◊ and now for a change in perspective ◊ ◊ ◊ ◊ ◊ ◊ ◊ PROBABILITY DENSITY. Mass density appeals to physical intuition and is a clear use-case for double and triple integrals. Higher-dimensional integrals are well-suited to applications in probability. Given a domain $D \subset \mathbb{R}^{n},$ one speaks of choosing a point in $D$ "at random". Whether this is done uniformly or with a bias towards certain regions of $D$ is encoded in a probability density. A *probability density* $\rho$ on a domain $D \subset \mathbb{R}^{n}$ is a nonnegative scalar-valued function which, like mass density, can be integrated to obtain *probability mass*, with the condition that the total probability mass of $D$ equals one. To determine if a point chosen at random in $D$ lies within a subset $A \subset D$, one integrates the probability element $d\mathbb{P} = \rho\ dV$, where $dV$ is the volume element in $\mathbb{R}^{n}$, like so: $$\mathbb{P}\left( \mathbf{x} \in A \right) = \int_{A}^{}{d\mathbb{P}} = \int_{A}^{}{\rho\left( \mathbf{x} \right)\ dV}\ .$$ One notes that $\mathbb{P}\left( \mathbf{x} \in D \right) = \int_{D}^{}\rho = 1$ means that the odds of a randomly chosen point in $D$ lying in $D$ are 100%.
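For instance, on the unit square $D = \lbrack 0,1\rbrack \times \lbrack 0,1\rbrack$, the function $\rho(x,y) = 4xy$ is a probability density (it is nonnegative and integrates to one), and $$\mathbb{P}(x + y \leq 1) = \int_{0}^{1}{\int_{0}^{1 - x}{4xy\ dy\ dx}} = \int_{0}^{1}{2x(1 - x)^{2}\ dx} = \frac{1}{6}\ ,$$ noticeably less than the $1/2$ that the uniform density would give, since this $\rho$ biases points towards the far corner $(1,1)$.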
The additivity of integration means that one can convert logical operations (*e.g.,* the probability that $\mathbf{x}$ lies in $A$ and $B$ but not in $C$) to an integral over a domain governed by set-theoretic operations (*e.g.,* integrate over the set $(A \cap B) - C$). Instead of thinking in terms of picking a point out of $D$ at random, one often reasons in terms of the individual coordinates, $x_{1},\ \ldots,\ x_{n}$, as random variables, here written with capital letters $X_{1},\ldots,\ X_{n}$. The probability density is then referred to as a *joint density*, relating the different random variables. This is particularly useful in applications involving multiple dependent or independent random processes. Again, one converts questions such as "*What is the probability that* $X_{1} \leq X_{2}$*?*" to the appropriate domain over which to integrate. Any joint probability density on a domain $D \subset \mathbb{R}^{n}$ can be converted to a single-variable density via a process called *marginalization*. Via the Fubini Theorem, one can integrate out all the variables but one: $$\rho_{X_{i}}\left( x_{i} \right) = \int_{D}^{}{\rho\ dx_{1}\ dx_{2}\cdots dx_{i - 1}\ dx_{i + 1}\cdots dx_{n}}\ .$$ One can check (Fubini again) that this is a probability density. This yields a notion of independence of random variables -- a very useful idea in Statistics. One says that two random variables $X$ and $Y$ are *independent* if their joint probability density is the product of their marginalizations: $$\rho(x,y) = \ \rho_{X}(x)\rho_{Y}(y)\ .$$ This connotes the idea that determining probabilities on the random variable $X$ does not depend on the value of the random variable $Y$. EXPECTATION AND VARIANCE. The analogy between mass and probability density continues to other features. We begin by recollecting the single-variable case. The generalization of the center of mass, $\overline{x}$, to a random variable $X$ is called the *expectation* $\mathbb{E}(X)$ or *mean*: it is simply the average of $x$ with respect to the probability density. The generalization of the moment of inertia about the center of mass is called the *variance* of a random variable, $\mathbb{V}(X)$, and its square root (the analogue of radius of gyration) is the familiar *standard deviation* $\sigma_{X}$: $$\mathbb{E}(X) = \int_{}^{}{x\ d\mathbb{P}}\mathbb{\ \ \ \ \ :\ \ \ \ V}(X) = \int_{}^{}{\left( x - \mathbb{E}(X) \right)^{2}d\mathbb{P}}\ \ \ \ \ :\ \ \ \ \sigma_{X} = \sqrt{\mathbb{V}(X)}\ .$$ All of the ideas of single-variable probability extend quickly to the multivariate case with a joint density $\rho$: the formulae for $\mathbb{E},\ \mathbb{V},$ and $\sigma$ above can be used with a marginalized probability density $\rho_{X}$. Or, as preferred, one can speak of the full expectation $\mathbb{E}$ with coordinates $\mathbb{E}\left( X_{i} \right)$, much as one speaks of the coordinates of a center of mass. These ideas -- besides being interesting on their own -- unlock a number of avenues for exploration which must await a full probability course to unwind. In short, one can define a covariance -- a degree of non-independence -- of two random variables as follows: $$cov(X,Y) = \ \int_{}^{}{\left( x - \mathbb{E}(X) \right)\left( y - \mathbb{E}(Y) \right)\ d\mathbb{P}}\ .$$ The covariance vanishes whenever $X$ and $Y$ are independent. The familiar concept of *correlation* in Statistics is a rescaled covariance: $cor(X,Y) = cov(X,Y)/\sigma_{X}\sigma_{Y}$. \[BONUS\] COVARIANCE MATRIX.
In the same manner that one uses an inertia matrix to encode the distribution of mass rotating about all possible axes through the center of mass, there is a matrix for keeping track of variances and covariances between all variables. The *covariance matrix* is of immense importance in data science, machine learning, statistics, and much more: $$\left\lbrack \mathbb{V} \right\rbrack = \begin{bmatrix} \mathbb{V(}X_{1}) & cov(X_{1},X_{2}) & \cdots & cov(X_{1},X_{n}) \\ cov(X_{2},X_{1}) & \mathbb{V(}X_{2}) & \cdots & cov(X_{2},X_{n}) \\ \vdots & \vdots & \ddots & \vdots \\ cov(X_{n},X_{1}) & cov(X_{n},X_{2}) & \cdots & \mathbb{V(}X_{n}) \end{bmatrix}\ .$$ This matrix encodes all of the interdependencies between the random variables while enabling matrix algebra in computations. For example, when changing from one set of random variables $\mathbf{X} = \left( X_{1},\ldots,X_{n} \right)^{T}$ to another set $\mathbf{Y} = \left( Y_{1},\ldots Y_{m} \right)^{T}$ by means of a linear transformation $A$, so that $\mathbf{Y} = A\mathbf{X}$, the expectation and covariance transform linearly and quadratically respectively: $$\mathbb{E}_{Y} = A\mathbb{E}_{X}\ \ \ \ \ :\ \ \ \ \ \left\lbrack \mathbb{V}_{Y} \right\rbrack = A\left\lbrack \mathbb{V}_{X} \right\rbrack A^{T}\ .$$ Such is very useful in data science (see the Week 11 bonus material), but it does exceed the bounds of a Calculus course. ### DISCUSSION > These questions are divided into the physical & the probabilistic [QUESTION 1:]{.underline} Consider a solid uniform-density cylinder of radius $r$ and height $h$, arranged so that the centroid is at the origin and the height is aligned with the $z$-axis. Now imagine that you slice off the top of this object with a tilted plane \[*draw figure*\]. Which coordinates of the centroid of the remaining solid object *must* have changed? *This is a good visualization challenge, and a reminder of symmetry even/odd principles.* [QUESTION 2:]{.underline} Find the centroid of the following shape: take a solid cube of side-length four centered at the origin, remove a concentric solid ball of radius two, then add a solid cylinder of radius one along the $z$-axis, as $- 3 \leq z \leq 3$. This is a strange problem that would be impossible to compute "by hand" without using symmetry. What if anything changes when the density switches from constant to $x^{2} + y^{2} + z^{2}$? [QUESTION 3:]{.underline} Compute the moment of inertia of a solid cube centered at the origin and rotated about any of the three principal axes through the centroid with density function $f = x^{2} + y^{2} + z^{2}$. This can and should be done explicitly, noting why the choice of axis does not matter. Would things change if the axis was a skew axis through the center? It makes no difference for a constant density, as per the videos, but here? [QUESTION 4:]{.underline} If you know that the disc of radius $R$ in the plane has polar inertia (rotated about the center point) $I_{0} = MR^{2}/2$, then what else can you do? Can you compute the moment of inertia of a cylindrical shell? A solid cylinder? A solid cone? The cone is the key example: if you think of the moment of inertia of a slice disc orthogonal to the rotation axis, then the general case of a solid of revolution about that axis reveals itself, as in the worked example below. This can lead to all sorts of interesting follow-up examples.
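For the solid cone, the slice-disc argument runs as follows: orient a uniform-density cone of base radius $R$ and height $h$ with apex at the origin and axis along the $z$-axis, so that the slice at height $z$ is a disc of radius $r(z) = Rz/h$. Using the disc fact $dI = \frac{1}{2}r^{2}\ dM$ with $dM = \rho\pi r^{2}\ dz$: $$I = \frac{\pi\rho}{2}\int_{0}^{h}{\left( \frac{Rz}{h} \right)^{4}dz} = \frac{\pi\rho R^{4}h}{10} = \frac{3}{10}MR^{2}\ ,$$ since $M = \frac{1}{3}\rho\pi R^{2}h$.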
[QUESTION 5:]{.underline} Here are some facts: for constant-density spherical balls (solid) and spherical shells (hollow) of radius $R$ and total mass $M$, the moments of inertia are $I_{b} = \frac{2}{5}MR^{2}$ versus $I_{s} = \frac{2}{3}MR^{2}$. How would you in practice compute these integrals? Try setting them up. Begin with the solid ball. Students may be frustrated that this integral seems so hard to compute when the answer is so clean. This foreshadows coordinate changes in Week 11. As for the spherical shell, how would one compute that? It needs a surface integral -- an integral with respect to the surface area element. One learned how to compute this for a surface of revolution (which this is) in single-variable calculus; but see the next problem for a better way. Now think about this physically -- the solid ball and the spherical shell. Is it harder to rotate the solid ball or the shell? Can we even compare? Are the units different? What depends on $M$? [QUESTION 6:]{.underline} Recall from the previous problem the moments of inertia for a uniform density solid ball versus a shell. Use additivity of integrals to compute this moment of inertia. Let $\epsilon > 0$ be a small number and consider removing the ball of radius $R - \epsilon$ from the ball of radius $R$; computing the moment of inertia as a difference; then taking a limit as $\epsilon \rightarrow 0^{+}$. Wait a minute... this seems to give the wrong answer since the difference in moments of inertia is $$\frac{2}{5}MR^{2} - \frac{2}{5}M(R - \epsilon)^{2} = \frac{4}{5}MR\epsilon - \frac{2}{5}M\epsilon^{2}$$ That cannot be right! (It is not, in fact, right: why not? Ah, the $M$). Replacing the mass with volume (assuming unit density) gives $$I_{s} = \frac{2}{5}\left( \frac{4}{3}\pi \right)\left( R^{5} - (R - \epsilon)^{5} \right) = \frac{8}{3}\pi R^{4}\epsilon + O\left( \epsilon^{2} \right) = \frac{2}{3}\left( 4\pi R^{2}\epsilon \right)R^{2} + O\left( \epsilon^{2} \right)\ .$$ Taking the limit as $\epsilon \rightarrow 0^{+}$ gives the desired result, using the mass (volume) of the thin shell. [QUESTION 7:]{.underline} For a solid spherical ball of radius $R$ and mass $M$, the moment of inertia is $I_{b} = 2MR^{2}/5$. How could you compute the moment of inertia of a hemi-spherical ball of uniform density? Does it depend on the orientation of the axis (assumed to again pass through the origin or center of the solid ball)? Sometimes it is surprising how much discussion this question can generate. Ask probing questions about what happens when you put two half-balls together and rotate; that helps... For a surprise, tell students that taking a wedge of this ball of angle $\theta$ and rotating that about the axis/edge through the center of the ball gives $I = 2MR^{2}/5$. This is a good chance to remind students about what happens when you normalize by mass... [QUESTION 8:]{.underline} Compute the moment of inertia of a unit density solid cube of side length $s$ in $\mathbb{R}^{n}$ through the centroid (using an axis-aligned rotation axis, of course). This is good practice at working out the $r^{2}$ term, as students tend to want to memorize a formula. [QUESTION 9:]{.underline} \[OPTIONAL: inertia matrices\] For an asymmetric rectangular axis-aligned prism, which axis through the centroid maximizes or minimizes the moment of inertia? This is a difficult though great problem, since the function to be optimized is a quadratic form $Q(x) = \mathbf{x}^{T}\lbrack I\rbrack\mathbf{x}$, with $|\mathbf{x}| = 1$.
The major and minor principal axes appear naturally. What happens with the intermediate axis? Clever students can see that it gives a natural example of a saddle point. At this point, one could lead a tangential discussion of the classic Intermediate Axis Theorem by acquiring a physical prismatic solid and rotating it in the air, explaining the relation to stability. This is far from the main storyline however. [QUESTION 10:]{.underline} \[OPTIONAL: inertia matrices\] Compute the 2-by-2 case of an inertia matrix in 2-D for a rectangular plate of width $w$ and height $h$. Then use this to rotate it about the diagonal. Ex post, one should compare to the degenerate 3-D example from the video for consistency. [QUESTION 11:]{.underline} \[OPTIONAL: inertia matrices / angular momentum\] In the case of a rectangular prism of side lengths $a,\ b,\ c,$ what is the angular momentum if the angular velocity is $\mathbf{\omega} = (a,b,c)^{T}$? [QUESTION 12:]{.underline} Recall the exponential probability density: $\rho = \alpha e^{- \alpha x}$ defined on the domain $D = \left\{ x \geq 0 \right\}.$ What are its expectation and variance? This is meant as a review of 1-D probability, which many students may need. [QUESTION 13]{.underline}: Say $D = \{ 0 \leq x \leq 3,\ 0 \leq y \leq 1\}$ and $\rho = C(x^{2} + 2y^{2})$. What value of $C$ makes this a probability density on $D$? What is the probability that $x \leq y$ for a randomly chosen point $(x,y)$? How does this translate to a statement about the random variables $X$ and $Y$? What are the integrals to compute $\mathbb{E}$ and $\lbrack\mathbb{V}\rbrack$? This is a straightforward but important problem, as many students struggle with the basics of multivariate probability. In practice, computing the expectation and (especially) the variances is too involved to do in class (or on a quiz). Setting them up is about as far as is practical except for very simple cases. [QUESTION 14:]{.underline} Consider the joint independent density: $\rho = \Pi_{i}\alpha_{i}e^{- \alpha_{i}x_{i}}$ on $n$ random variables $\{ X_{i}\}$. Say that this models a set of wait times for $n$ people waiting in $n$ queues. Set up (and perhaps solve if you can?) the following probabilities: - Person 1 is done before person 2. *(In other words,* $X_{1} < X_{2}$*)* - $X_{1} \leq X_{2}$ and all $X_{i} \leq 1$. *(Translate this & following into statements...)* - $X_{1} \leq X_{i}$ for all $i$. *(This is very challenging!)* - ($X_{1} \geq 1$ and $X_{2} \leq X_{3}$) or ($X_{1} \leq 1/2$ and $X_{2} \geq X_{3}$) *(or similar complex conditions)* The goal of such convoluted examples is to suggest that in probability, the logical or complex constraints can be translated into (the Boolean algebra of) geometric domains of integration. It is possible to spin off an interesting discussion of Boolean algebra/logic here. [QUESTION 15:]{.underline} Given a joint pdf on two variables, what is the difference between the probability $\mathbb{P(}X_{1} \leq X_{2})$ and the probability $\mathbb{P(}X_{1} < X_{2})$? If this causes confusion, then perhaps back up to the 1-D question of what is $\mathbb{P(}X = c)$ for any constant $c$? Encourage students to set this up as an integral. The idea that the probability \[mass\] of any single point vanishes but the probability \[mass\] of a domain is positive is a key idea in integral Calculus.
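For the first bullet of Question 14 (and for seeing that the strict versus non-strict inequalities of Question 15 make no difference to the value), a minimal sympy sketch (assuming sympy is available) confirms the classical answer $\mathbb{P}(X_{1} < X_{2}) = \alpha_{1}/(\alpha_{1} + \alpha_{2})$:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', nonnegative=True)
a1, a2 = sp.symbols('alpha1 alpha2', positive=True)

# joint density of two independent exponential wait times
rho = a1 * sp.exp(-a1 * x1) * a2 * sp.exp(-a2 * x2)

# P(X1 < X2): integrate over the domain 0 <= x1 < x2 < oo
p = sp.integrate(rho, (x1, 0, x2), (x2, 0, sp.oo))
print(sp.simplify(p))  # alpha1/(alpha1 + alpha2)
```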
[QUESTION 16:]{.underline} \[OPTIONAL : covariance matrices\] What is the covariance matrix for the transformation of three independent random variables $(X,Y,Z)$ into the sums $(X + Y,\ Y + Z,\ X + Z)$? Recall from the lectures that $\lbrack\mathbb{V}(A\mathbf{X})\rbrack = A\lbrack\mathbb{V}\rbrack A^{T}$ for a linear transformation $A$. What happens with more variables? Is there a recognizable pattern? ### ASSESSMENT PROBLEMS [PROBLEM 1.]{.underline} Consider the bounded region $D$ cut out by the graphs of $y = x^{1/3}$ and $y = x^{2}$. A\) Compute $\overline{y}$, the $y$-coordinate of the centroid of $D$. B\) Does your answer make sense? Why or why not? [PROBLEM 2.]{.underline} Consider the bounded region $D$ in the plane given by the inequalities $0 \leq y \leq \sqrt{x}\ $ and $y \geq x^{3} \geq 0$. If the density of this domain $D$ is given by $\rho(x,y) = x^{- 1/2}$, then compute $\overline{x}$, the $x$-coordinate of the center of mass of $D$. [PROBLEM 3.]{.underline} Consider the domain $D$ in the plane defined logically by: $$\left( |x| \leq 3\ \ and\ \ |y| \leq 5 \right)\ \ but\ not\ \ \left( |x| \leq 1\ \ and\ \ |y| \leq 3 \right)\ .$$ This looks like a 6-x-10 rectangle at the origin with a 2-x-6 rectangle removed. A\) Compute the mass of this object if the density is $\rho(x,y) = x^{2} + y^{2}$. B\) Where is the center of mass of this object? [PROBLEM 4.]{.underline} Consider a rectangle given by $0 \leq x \leq A$ and $0 \leq y \leq B$ with density $\rho(x,y) = xy.\ $ A\) Compute the mass, $M$, of this object. B\) Compute the moment of inertia $I$ of this object rotated about the $y$-axis. C\) Rewrite your answer to (B) using the mass: your answer should be of the form $I = CMA^{2}$ for some constant $C$. What is it? [PROBLEM 5.]{.underline} Consider a rectangular prism of *unit-density* that has one corner at the origin and edges along the $x,\ y,\ $and $z$ axes. The lengths of this object along the $y$ and $z$ axes are $L$ and the length along the $x$ axis is $2L.\ $ A\) What is the mass, $M$, of this object? B\) Compute the moment of inertia $I$ of this object rotated *about the* $y$*-axis*, by setting up and solving a triple integral. C\) Rewrite your answer to (B) using the mass: your answer should be of the form $I = CML^{2}$ for some constant $C$. What is it? D\) Set up but ***do not solve*** an integral to compute $\overline{y}$, the $y$-coordinate of the centroid (or center of mass : same thing in this case). [PROBLEM 6.]{.underline} Fact: the moment of inertia of a flat unit-density circular disc of radius $r$ in the $(x,y)$ plane about the $z$-axis equals $I = \pi r^{4}/2$. A\) Use this fact to compute the moment of inertia of a solid unit-density ball of radius $R$ at the origin, rotating about the $z$-axis. (*Hint: slice orthogonally to the axis of rotation and integrate*...) B\) Rewrite your answer to (A) in terms of the mass $M$ of the solid ball. [PROBLEM 7.]{.underline} Recall that the moment of inertia of a constant-density ball of radius $R$ through its center equals $I_{0} = \frac{2}{5}MR^{2}.$ A\) What is the radius of gyration $R_{g}$ of this rotating solid ball? B\) Use the Parallel Axis Theorem to compute the moment of inertia $I_{g}$ of this ball rotated about a parallel axis that is a distance $R_{g}$ from the center -- rotating the ball about the gyration axis. Write your answer as $I_{g} = CMR^{2}$ for some $C$. [PROBLEM 8.]{.underline} Consider a cube of side length $L$ that has one corner at the origin and edges along the positive $x,\ y,\ $and $z$ axes.
The cube has density $\rho = x$. A\) What is the mass, $M$, of this cube? B\) Compute the moment of inertia $I$ of this cube rotated about the $z$-axis, by setting up and solving a triple integral. C\) Compute the radius of gyration of this rotating cube, based on (A) and (B). [PROBLEM 9.]{.underline} Consider a unit density triangle in the plane with vertices at $(0,0)$, $(2,0)$, and $(0,4)$. A\) What is the mass, $M$, of this triangle? B\) Compute the moment of inertia $I$ of this triangle rotated about the $y$-axis. C\) Without doing more computations, would the moment of inertia of this triangle about the $x$-axis be greater than, less than, or the same as that about the $y$-axis from part (B). Explain. [PROBLEM 10.]{.underline} Consider a joint probability density function on two variables, $X$ and $Y$, of the form $\rho(x,y) = C(x^{2} + y^{3})$, where the domain $D$ is defined by $0 \leq x \leq 2$ and $0 \leq y \leq 1$. A\) What must the value of $C$ be so that $\rho$ is a probability density function on $D$? B\) What is the probability that $X \geq Y$? C\) Does your answer to part (B) make sense? Why or why not? [PROBLEM 11.]{.underline} Consider a joint probability density function on two random variables, $X$ and $Y$, of the form $\rho(x,y) = C(x^{2}y + y^{2})$, where the domain $D$ is defined by $- 1 \leq x \leq 1$ and $0 \leq y \leq 1$. A\) What must the value of $C$ be so that $\rho$ is a probability density function on $D$? B\) What is the probability that $X \leq 0$ and $Y \geq 1/2$? [PROBLEM 12.]{.underline} Consider a joint probability density function on two variables, $X$ and $Y$, of the form $\rho(x,y) = C(xy)$, where the domain $D$ is defined by $0 \leq x \leq L$ and $0 \leq y \leq L$ for some constant $L > 0$. A\) What must the value of $C$ be so that $\rho$ is a probability density function on $D$? B\) What is the probability that $X + Y \leq L$? [PROBLEM 13.]{.underline} Consider the domain $D$ given by $0 \leq x \leq 1$ and $0 \leq y \leq 4$, with joint probability density on random variables $X,\ Y$ on $D$ given by $$\rho(x,y) = \frac{9}{16}x^{2}\sqrt{y}$$ A\) Set up and compute the integral to find $\mathbb{E(}X)$, the expected value of $X$. B\) Does your answer to part (A) make sense? [PROBLEM 14.]{.underline} Two stocks have performance modelled as random variables, $X$ and $Y$, taking values in the interval $\lbrack - 1,1\rbrack$. Their joint probability density is: $$\rho(x,y) = \ C(2 - x^{2} - y^{4})$$ for $- 1 \leq x \leq 1,\ - 1 \leq y \leq 1$ and $C > 0$ some constant. A\) What must the value of the constant $C$ be for $\rho$ to be a probability density? B\) Compute $\mathbb{P(}X > Y)$, the probability that stock $X$ performs better than $Y$. [PROBLEM 15.]{.underline} Consider the following joint probability density function for random variables $X$ and $Y$ on the domain $0 \leq x \leq 2$ and $1 \leq y < \infty$ in $\mathbb{R}^{2}$: $$\rho = \frac{Cx^{2}}{y^{3}}\ $$ A\) For what value of constant $C$ is this a probability density? B\) Using this value of $C$ for $\rho$, compute the probability that $X \leq 1$ and $Y \geq 2$. [PROBLEM 16.]{.underline} Consider the probability density function $$\rho(x,y) = C(x^{2} + y^{2})$$ on the domain given by $0 \leq x \leq 1$ and $0 \leq y \leq 2x$. A\) For what value of constant $C$ is this a probability density function? 
B\) Compute the probability that a randomly chosen point $(x,y)$ on this domain with this probability density satisfies $x \geq \frac{1}{2}$. C\) Compute the marginal density function $\rho_{Y}(y)$.

[PROBLEM 17.]{.underline} Consider a joint probability density function of the form $\rho(x,y,z) = C\left( x^{2} + y^{3} + z \right)\ $on a domain $D$ defined by $0 \leq x \leq 2$, $0 \leq y \leq 1$, and $0 \leq z \leq 3$. A\) What must the value of $C$ be so that $\rho$ is a probability density function on $D$? B\) What is the probability that a randomly chosen point in $D$ with respect to this probability density satisfies $x \geq y$? C\) What is the probability that $x \leq y$?

[PROBLEM 18.]{.underline} Consider a uniform probability density function $\rho(x,y) = C$ on two variables, $X$ and $Y$, where the domain $D$ is defined by $x/2 \leq y \leq \sqrt{2x}$. A\) What must the value of $C$ be so that $\rho$ is a probability density function on $D$? B\) Compute the marginal densities $\rho_{X}(x)$ and $\rho_{Y}(y)$. What are their domains? Can you explain why these marginalized densities are not uniform?

### ANSWERS & HINTS

[PROBLEM 1.]{.underline} $\overline{y} = \frac{12}{25}$

[PROBLEM 2.]{.underline} $\overline{x} = \frac{7}{18}$

[PROBLEM 3.]{.underline} A) $M = 680 - 40 = 640$ ; B) at the origin, thanks to symmetry

[PROBLEM 4.]{.underline} A) $M = \frac{A^{2}B^{2}}{4}$ ; B) $I = \frac{A^{4}B^{2}}{8}$ ; C) $C = \frac{1}{2}$

[PROBLEM 5.]{.underline} A) $M = 2L^{3}$ ; B) $I = \frac{10L^{5}}{3}$ ; C) $C = \frac{5}{3}$

[PROBLEM 6.]{.underline} $I = \frac{8}{15}\pi R^{5} = \frac{2}{5}MR^{2}$

[PROBLEM 7.]{.underline} A) $R_{g} = \sqrt{\frac{2}{5}}R$ ; B) $I_{g} = \frac{2}{5}MR^{2} + \frac{2}{5}MR^{2} \Rightarrow C = \frac{4}{5}$

[PROBLEM 8.]{.underline} A) $M = \frac{L^{4}}{2}$ ; B) $I = \frac{5L^{6}}{12}$ ; C) $R_{g} = L\sqrt{5/6}$

[PROBLEM 9.]{.underline} A) $M = 4$ ; B) $I = \frac{8}{3}$ ; C) larger $I$ to rotate about the $x$-axis

[PROBLEM 10.]{.underline} A) $C = \frac{6}{19}$ ; B) $\mathbb{P} = C\left( \frac{37}{12} - \frac{1}{5} \right) = C\ \frac{173}{60} = \frac{173}{190}\ $

[PROBLEM 11.]{.underline} A) $C = 1$ :-) B) $\mathbb{P} = 5/12$

[PROBLEM 12.]{.underline} A) $C = \frac{4}{L^{4}}$ ; B) $\mathbb{P} = \frac{CL^{4}}{24}\ = \frac{1}{6}$

[PROBLEM 13.]{.underline} $\mathbb{E}(X) = \overline{x} = \frac{3}{4}$ (and, for comparison, $\overline{y} = \frac{12}{5}$)

[PROBLEM 14.]{.underline} A) $C = \frac{15}{88}$ ; B) $\mathbb{P} = C\left( \frac{44}{15} \right) = \frac{1}{2}$

[PROBLEM 15.]{.underline} A) $C = \frac{3}{4}$ ; B) $\mathbb{P} = \frac{1}{32}$

[PROBLEM 16.]{.underline} A) $C = \frac{6}{7}$; B) $\mathbb{P} = \frac{15}{16}$ ; C) $\rho_{Y}(y) = \frac{2}{7} + \frac{6}{7}y^{2} - \frac{13}{28}y^{3}$ on $\lbrack 0,2\rbrack$

[PROBLEM 17.]{.underline} A) $C = \frac{2}{37}$ ; B) $\mathbb{P} = \frac{154}{185}$ ; C) $\mathbb{P} = \frac{31}{185}$

[PROBLEM 18.]{.underline} A) $C = \frac{3}{16}$ ; B) $\rho_{X} = \frac{3}{16}\left( \sqrt{2x} - \frac{x}{2} \right)$ on $\lbrack 0,8\rbrack$ ; $\rho_{Y} = \frac{3}{16}\left( 2y - \frac{y^{2}}{2} \right)$ on $\lbrack 0,4\rbrack$
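Most of the normalizations and probabilities above are one-line computations in a computer algebra system, which makes for quick in-class verification. A minimal sympy sketch checking Problem 10, using the density and domain as stated in that problem:

```python
import sympy as sp

x, y = sp.symbols('x y', nonnegative=True)
rho = x**2 + y**3          # Problem 10 density, up to the constant C

# normalize over the rectangle 0 <= x <= 2, 0 <= y <= 1
C = 1 / sp.integrate(rho, (x, 0, 2), (y, 0, 1))
print(C)                   # 6/19

# P(X >= Y): for each y in [0,1], x runs from y up to 2
P = sp.integrate(C * rho, (x, y, 2), (y, 0, 1))
print(P)                   # 173/190
```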
## ## Week 11 : Changing Coordinates

### OUTLINE

MATERIALS: Calculus BLUE : Vol 3 : Chapters 13-18

TOPICS:

- Polar and cylindrical coordinates: notation, area/volume form
- Gaussians and their applications
- Spherical coordinates: notation, volume form, applications
- Arbitrary coordinate changes
- The Change of Variables Theorem and its uses
- Methods for choosing coordinates
- Surface area and surface integrals
- BONUS : High-dimensional spheres and balls
- BONUS : Gaussians and the Kalman filter in data science

LEARNING OBJECTIVES:

- Demonstrate proper use of polar/cylindrical coordinates
- Demonstrate proper use of spherical coordinates
- Apply the volume elements for cylindrical/spherical coordinates
- Distinguish when to use cylindrical versus spherical coordinates
- Use the Change of Variables Theorem to transform integrals
- Discern the proper coordinate change to transform integrals
- Compute surface area via the surface area element
- Set up and compute surface integrals for implicit/parametrized surfaces

### PRIMER

Many of the applications of integrals from the previous week or two lead quickly to seemingly impossible integration problems. The goal for this week is understanding and using one integration technique that is particularly helpful. In single variable calculus, you probably called this *u-substitution*.

POLAR & CYLINDRICAL COORDINATES. For a simple example, consider the polar coordinate transformation that exchanges Euclidean $(x,y)$ and polar $(r,\theta)$ via: $$P\begin{pmatrix} r \\ \theta \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}$$ What does this do to area? Perhaps you recall tricks in single-variable calculus. The better approach is direct conversion of the area element: $dA = dx\ dy = r\ dr\ d\theta$. Where does the $r$ come from? Often, units are invoked to justify the additional $r$ paired with $d\theta$. Examining the geometry of polar coordinates also helps. In 3-D, adding the usual $z$-axis yields cylindrical coordinates, with corresponding volume element $dV = dx\ dy\ dz = r\ dr\ d\theta\ dz$. For domains that are well-suited to either of these coordinate systems, integrals can be done much more simply.

SPHERICAL COORDINATES. Cylindrical coordinates are not the only useful system available. Spherical coordinates come in competing notational variants; in this course, $\rho$ is used for the radial coordinate, $\theta$ is the usual polar angle from $0\ldots 2\pi$, and $0 \leq \phi \leq \pi$ is the angle from the positive $z$-axis. The corresponding spherical coordinate transformation is: $$S\begin{pmatrix} \rho \\ \theta \\ \phi \end{pmatrix} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \rho\cos\theta\sin\phi \\ \rho\sin\theta\sin\phi \\ \rho\cos\phi \end{pmatrix}$$ What does this do to volume? This is not so obvious. In spherical coordinates, the volume element transforms as $dV = dx\ dy\ dz = \rho^{2}\sin\phi\ d\rho\ d\phi\ d\theta$. Arguing via geometry is much more difficult; arguing via units explains only the $\rho^{2}$ term. What one cannot argue with is how singularly effective spherical coordinates can be at solving very difficult integration problems adapted to spherical geometry.

CHANGE of VARIABLES THEOREM. The answer to what happens in cylindrical and spherical coordinates comes from a fundamental result.
Given a change of coordinates on $\mathbb{R}^{n}$ from $\mathbf{u} = (u_{1},\ldots,\ u_{n})$ to $\mathbf{x} = (x_{1},\ldots,\ x_{n})$ via $\mathbf{u} = F(\mathbf{x})$, the volume element transforms as $$d\mathbf{u} = \ du_{1}\ du_{2}\cdots du_{n} = \left| \det\lbrack DF\rbrack \right|\ dx_{1}\ dx_{2}\cdots dx_{n} = \left| \det\lbrack DF\rbrack \right|\ d\mathbf{x}$$ This is the essential part of the Change of Variables Theorem, and it is the true multivariate version of the "u-subs" method learned in elementary integration. After verifying what happens with cylindrical and spherical coordinates, this deep result is worth contemplation. Recall from Week 4 how we interpreted the determinant as the change-in-volume induced by a linear transformation? Recall from Weeks 5-7 how the derivative $\lbrack DF\rbrack$ is a linear transformation and the linear approximation to the nonlinear $F$? The Change of Variables Theorem ties together so much of what has been learned this semester to give an integration technique for difficult problems.

SURFACE INTEGRALS. There are several instances in which one might want to compute the surface area of a curved surface in 3-D. Instead of a volume element, there is a surface area element which can be integrated. For example, based on what we know of the volume element in spherical coordinates, we can fix a sphere of radius $R$ and reduce the spherical volume element to $R^{2}\sin\phi\ d\phi\ d\theta$. Given a parametrized surface defined by $G:\mathbb{R}^{2} \rightarrow \mathbb{R}^{3}$, the two column vectors of the derivative $\lbrack DG\rbrack$ span an infinitesimal parallelogram (the *surface area element* $d\sigma$) on the tangent plane to the surface. Recall from Week 2 that the area of a parallelogram in 3-D equals the length of the cross product of the spanning vectors. This gives an effective formula for the surface area element: $$d\sigma = \left| \frac{\partial G}{\partial s} \times \frac{\partial G}{\partial t} \right|ds\ dt\ .$$ In the case of an implicitly defined surface of the form $z = z(x,y)$, one shows that the above formula specializes to $$d\sigma = \sqrt{1 + \left( \frac{\partial z}{\partial x} \right)^{2} + \left( \frac{\partial z}{\partial y} \right)^{2}}\ dx\ dy\ .$$ The surface area element can be integrated to compute surface area; or it can be used in *surface integrals* to compute centroids, moments, and other features of surfaces in 3-D. Both formulae above are unsatisfying, as they are explicitly three-dimensional, relying on constructs like the cross product. A much more general formula for the surface area element exists for $G:\mathbb{R}^{2} \rightarrow \mathbb{R}^{n}$ which reveals the deep connection to the Change of Variables Theorem: $$d\sigma = \sqrt{\det\left( \lbrack DG\rbrack^{T}\lbrack DG\rbrack \right)}\ ds\ dt\ .$$

\[BONUS\] GAUSSIANS & DATA FUSION. The most iconic probability density is a *Gaussian*. The standard zero-mean Gaussian on $n$ variables with unit variances and all variables independent (pairwise covariances vanish) is given by $$g\left( \mathbf{x} \right) = \frac{1}{\sqrt{(2\pi)^{n}}}e^{- \frac{1}{2}\mathbf{x \cdot x}}\mathbf{\ ,}$$ which one can show to be of unit total mass on $\mathbb{R}^{n}$ by a combination of Fubini and polar coordinates in the $n = 2$ case.
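Both claims -- the spherical volume element and the unit mass of the standard Gaussian -- can be verified live with a computer algebra system. A minimal sympy sketch (nothing beyond standard sympy is assumed):

```python
import sympy as sp

rho, phi, theta, r = sp.symbols('rho phi theta r', positive=True)

# spherical coordinate map S(rho, phi, theta), as in the primer
S = sp.Matrix([rho*sp.cos(theta)*sp.sin(phi),
               rho*sp.sin(theta)*sp.sin(phi),
               rho*sp.cos(phi)])

# Jacobian determinant of S: simplifies to rho**2 * sin(phi)
J = S.jacobian([rho, phi, theta])
print(sp.simplify(J.det()))              # rho**2*sin(phi)

# unit mass of the standard 2-D Gaussian, computed in polar coordinates:
# density (1/2pi) exp(-r^2/2) times the area element r dr dtheta
mass = sp.integrate(sp.exp(-r**2/2)/(2*sp.pi) * r,
                    (r, 0, sp.oo), (theta, 0, 2*sp.pi))
print(mass)                              # 1
```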
It is easy and elegant to change the mean to $\mathbb{E} \in \mathbb{R}^{n}$ and all the variances and covariances by using the covariance matrix $\lbrack\mathbb{V}\rbrack$ from Week 10, obtaining the following general Gaussian: $$g\left( \mathbf{x} \right) = \frac{1}{\sqrt{(2\pi)^{n}\det\lbrack\mathbb{V}\rbrack}}e^{- \frac{1}{2}\left( \mathbf{x -}\mathbb{E} \right)^{\mathbf{T}}\left\lbrack \mathbb{V} \right\rbrack^{\mathbf{- 1}}\mathbf{(x -}\mathbb{E)}}\mathbf{\ .}$$ This is not a formula one memorizes, but general Gaussians are central to modern applications in data science. Consider, *e.g.*, data filtering and fusion, in which one estimates $n$ variables $x_{1},x_{2},\ldots,\ x_{n}$ with a temporal evolution model and updates at various times. Instead of trying to track the exact values of the $x_{i}$, one estimates the probability density of these random variables. Using a Gaussian is very efficient, as one need only keep track of $\mathbb{E}$ and $\lbrack\mathbb{V}\rbrack$.

A classic example of data fusion is the *Kalman filter*, which consists of three steps: (1) a model step, which predicts the next state based on the present state, $\mathbf{y} = F(\mathbf{x})$; (2) a measurement step, which estimates the mean and covariance matrix; and (3) a fusion step, which takes the predicted and measured densities and multiplies them together. Using the fact that the product of two general Gaussians is again (up to rescaling) a general Gaussian, one has the following formula for the Kalman filter acting on initial mean and covariance $\left( \mathbb{E}_{0}\ ,\ \left\lbrack \mathbb{V}_{0} \right\rbrack \right)$ and measured mean and covariance $\left( \mathbb{E}_{m}\ ,\ \left\lbrack \mathbb{V}_{m} \right\rbrack \right)$: $$predicted\ \ :\ \mathbb{E}_{p} = F\left( \mathbb{E}_{0} \right)\ \ :\ \ \left\lbrack \mathbb{V}_{p} \right\rbrack = \lbrack DF\rbrack\left\lbrack \mathbb{V}_{0} \right\rbrack\lbrack DF\rbrack^{T}$$ $$fused\ mean\ \ :\ \ \mathbb{E}_{f} = \ \left\lbrack \mathbb{V}_{m} \right\rbrack\left( \left\lbrack \mathbb{V}_{p} \right\rbrack + \left\lbrack \mathbb{V}_{m} \right\rbrack \right)^{- 1}\mathbb{E}_{p}\ + \ \left\lbrack \mathbb{V}_{p} \right\rbrack\left( \left\lbrack \mathbb{V}_{p} \right\rbrack + \left\lbrack \mathbb{V}_{m} \right\rbrack \right)^{- 1}\mathbb{E}_{m}$$ $$fused\ covariance\ \ :\ \left\lbrack \mathbb{V}_{f} \right\rbrack = \left\lbrack \mathbb{V}_{m} \right\rbrack\left( \left\lbrack \mathbb{V}_{p} \right\rbrack + \left\lbrack \mathbb{V}_{m} \right\rbrack \right)^{- 1}\left\lbrack \mathbb{V}_{p} \right\rbrack$$ This looks complicated, but note that the two matrix weights in the fused mean sum to the identity, so the fusion is a matrix-weighted average of the predicted and measured means. It is only a little beyond the bounds of this course, and it is very much at the heart of modern applications in control, estimation, and modelling.
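For instructors who want to make this concrete, here is a minimal numpy sketch of a single predict-and-fuse cycle, following the formulas above. The linear model $F(\mathbf{x}) = A\mathbf{x}$ (so that $\lbrack DF\rbrack = A$) and all numerical values are illustrative assumptions, not data from the videotext:

```python
import numpy as np

# one predict-and-fuse cycle of the Kalman filter, following the formulas
# in the primer; model matrix and measurements are made-up examples
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])                       # linear model F(x) = A x

E0, V0 = np.array([0.0, 1.0]), np.eye(2)         # initial mean & covariance
Em, Vm = np.array([1.2, 0.9]), 0.5 * np.eye(2)   # measured mean & covariance

# predicted: E_p = F(E_0) ; V_p = [DF] V_0 [DF]^T
Ep = A @ E0
Vp = A @ V0 @ A.T

# fused: the weights V_m (V_p+V_m)^{-1} and V_p (V_p+V_m)^{-1} sum to I
W = np.linalg.inv(Vp + Vm)
Ef = Vm @ W @ Ep + Vp @ W @ Em
Vf = Vm @ W @ Vp

print(Ef, Vf, sep='\n')
```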
With a lot of detailed work involving high-dimensional spherical coordinates and the classic gamma function $\Gamma(z)$ from single-variable calculus, we have the following formulae for the volume of the radius $R$ ball $B_{n}(R)$ and the surface volume of its boundary sphere $S_{n}(R)$ in $\mathbb{R}^{n}$: $${vol}_{n}\left( B_{n}(R) \right) = \frac{2\pi^{\frac{n}{2}}R^{n}}{n\Gamma\left( \frac{n}{2} \right)}\ \ \ \ \ \ :\ \ \ \ \ {vol}_{n - 1}\left( S_{n}(R) \right) = \frac{2\pi^{\frac{n}{2}}R^{n - 1}}{\Gamma\left( \frac{n}{2} \right)}\ .$$ What is interesting about this is that the gamma function -- recall it is a proxy for the factorial -- grows very rapidly, so that for any fixed radius, the volumes go to zero rapidly as the dimension increases. This means that, in the context of statistics, the probability of being within distance $R$ of the mean of a standard Gaussian goes rapidly to zero as the dimension increases. No matter how many standard deviations away from the mean one looks, there is hardly any mass there in sufficiently large dimensions. This is a disquieting result, one of many such in high-dimensional data. Resolving this paradox must await a more advanced course, in which one can prove that a unit-variance Gaussian in dimension $n$ has most of its mass accumulated near a sphere about the mean of radius $\sqrt{n}$.

### DISCUSSION

> This week's questions should echo mass/probability concepts...

[QUESTION 1:]{.underline} Where is the centroid of a solid uniform-density hemisphere?

This is a good opening question. Get students to think in terms of the right coordinate system. Where should the origin be? Where should the axes be situated? What does symmetry tell you?

[QUESTION 2:]{.underline} Consider the uniform probability density on the unit hemisphere with $z \geq 0$. What is this density? What is the expectation $\mathbb{E}$ of the vector of random variables $(X,Y,Z)$? (If Question 1 has been done, then this is worth asking!)

Think about the expectation of the variable $Z$, $\mathbb{E}(Z)$; then consider what happens with the expected value of the spherical coordinates. Of course, $\mathbb{E}(\rho)$ makes sense, but what is it? Is it the same as $\mathbb{E}(Z)$? Students may guess that since the centroid is along the z-axis where $\phi = 0$, $\mathbb{E}(\rho) = \mathbb{E}(Z)$; however, this does not hold, as can be seen by comparing the integrals of $\sin\phi\ d\phi$ and $\cos\phi\sin\phi\ d\phi$. This is a good chance to rethink level sets where $z$ is constant (which have the largest mass?) and where $\rho\ $is constant (now, which have the largest mass?). One can and should spend a long time on this problem.

[QUESTION 3:]{.underline} Derive the results from last week's discussion about the moment of inertia of a uniform density ball/shell of radius $R$ rotated about the centroid: $$I_{b} = \frac{2}{5}MR^{2}\ \ \ \ \ :\ \ \ \ I_{s} = \frac{2}{3}MR^{2}$$

That is by no means an easy problem -- the integrals require remembering some trig formulae. However, it is clear that setting up these integrals using spherical coordinates is helpful.

[QUESTION 4:]{.underline} Consider the probability density $f = C(x^{2} + y^{2})$ on a domain $D$ given by a solid cylinder of unit radius about the $z$-axis for $- 1 \leq z \leq 1$. Where is the mean $\mathbb{E}$? Compute the variance $\mathbb{V}$ of this joint pdf.

Symmetry should be used to place the mean at the origin. For the variance, this is a nice integral in cylindrical coordinates.
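For Question 4, the normalization and the variance can be checked symbolically in cylindrical coordinates. A minimal sympy sketch, interpreting the variance $\mathbb{V}$ as the covariance matrix of $(X,Y,Z)$ -- which is diagonal here, by symmetry of the density:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
z = sp.symbols('z', real=True)

# density f = C*(x^2 + y^2) = C*r^2 on the cylinder r <= 1, -1 <= z <= 1;
# every integral picks up the cylindrical volume element r dr dtheta dz
def cyl_integral(g):
    return sp.integrate(g * r, (r, 0, 1), (th, 0, 2*sp.pi), (z, -1, 1))

C = 1 / cyl_integral(r**2)              # normalization: C = 1/pi
x = r * sp.cos(th)

# diagonal entries of the covariance matrix (mean is at the origin)
Exx = C * cyl_integral(r**2 * x**2)     # E(X^2) = 1/3 ; E(Y^2) matches
Ezz = C * cyl_integral(r**2 * z**2)     # E(Z^2) = 1/3
print(C, Exx, Ezz)                      # 1/pi, 1/3, 1/3
```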
[QUESTION 5:]{.underline} Consider a cube in 3-D centered at the origin whose side length is 2 and which has the following as its density function: $$f(x,y,z) = \frac{1}{\left( \sqrt{x^{2} + y^{2} + z^{2}} \right)^{\alpha}}$$ for some constant $\alpha > 0$. This is rather dangerous, as the density becomes infinite at the center of the cube. For which values of $\alpha$ does the cube have a total mass that is finite?

This is a very good discussion question, since it suggests the use of spherical coordinates for the density, but the limits of integration on the cube are awful in spherical coordinates. Try to get the students to come to the realization that the mass outside a small ball is finite; thus, one can compute the mass of a ball about the origin. This then becomes a good way of seeing how critical the volume element is in dealing with this infinite density.

[QUESTION 6:]{.underline} The standard Gaussian in 2-D is a product of standard Gaussians in 1-D; thus, via Fubini: $$\iint_{\mathbb{R}^{2}}^{\ }{\frac{1}{2\pi}e^{- \left( x^{2} + y^{2} \right)/2}}dA\ = \ \ \left( \int_{\mathbb{R}}^{\ }{\frac{1}{\sqrt{2\pi}}e^{- x^{2}/2}}dx \right)\left( \int_{\mathbb{R}}^{\ }{\frac{1}{\sqrt{2\pi}}e^{- y^{2}/2}}dy \right) = 1$$ Use polar coordinates to compute the mass of a 2-D Gaussian within 1, 2, and 3 standard deviations of the mean. What domains of integration are these?

This is covered in the bonus videos on Gaussians but is very valuable to do live. Be sure to remind students of the oft-memorized 68-95-99.7 rule from statistics. Students will need to use a computer to get the estimated values of the double integral over discs of radius 1, 2, and 3. For students who are curious about what happens in higher dimensions, please refer them to the bonus videos.

[QUESTION 7:]{.underline} What coordinate change would you use to evaluate $$\iint_{}^{}{xy\ \left( x^{2} + y^{2} \right)\ dx\ dy}$$ over the domain given by $1 \leq xy \leq 4$ and $1 \leq x^{2} - y^{2} \leq 3$? An alternative option is $$\iint_{}^{}{x^{2}y^{2}\ \left( y^{2} - x^{2} \right)\ dx\ dy}$$ over the domain given by $1 \leq xy \leq 4$ and $1 \leq y - x \leq 3$.

Discuss that sometimes the coordinate transformation presents itself; other times, not so much. This problem warns against trying too hard to simplify the integrand before computing how the volume element changes. (Students may wonder whether there's a typo with the integrand...)

[QUESTION 8:]{.underline} Use a change of coordinates to evaluate the challenging integral $$\int_{y = 0}^{1}{\int_{x = 0}^{1 - y}{e^{{(x + y)}^{2}}dx\ dy}}$$

This presents several difficulties -- the choice of $u = x + y$ and $v = y\ $should not be too hard to see. However, the transformed integrand appears impossible and requires changing the order from $du\ dv$ to $dv\ du$. The other difficulty in this problem is the limits of integration, which are not rectangular. If $u = x + y$, then the limits on $u$ run from $v$ to $1$. This is challenging.

[QUESTION 9:]{.underline} The following coordinate change gives "toroidal" coordinates about a circle of radius 3 in the $(x,y)$ plane, with a longitudinal angle about the circle, $\psi$, and cross-sectional polar coordinates $(r,\phi)$: $$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} (3 + r\cos\phi)\cos\psi \\ \left( 3 + r\cos\phi \right)\sin\psi \\ r\sin\phi \end{pmatrix}$$ What is the resulting volume element?
This is not hard to set up; however, the determinant computation is very involved, and simplifying the resulting volume element is a challenge. This is best left for the most eager students to pursue.

[QUESTION 10:]{.underline} What is the total charge on a sphere when the surface charge density is of the form $\kappa z^{2}$ for a constant $\kappa$?

Remember to start with the spherical volume element $dV = \rho^{2}\sin\phi\ d\rho\ d\phi\ d\theta$ and reduce to the surface area element $d\sigma = R^{2}\sin\phi\ d\phi\ d\theta$, spending lots of time on this step if needed. Students often struggle with the surface area element -- cylindrical and spherical coordinates are excellent special cases that can assist with intuition.

[QUESTION 11:]{.underline} Compute $d\sigma$ for a torus of major radius 3 and minor radius 1, using the surface parametrization from Question 9: $$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} (3 + \cos\phi)\cos\psi \\ \left( 3 + \cos\phi \right)\sin\psi \\ \sin\phi \end{pmatrix}$$

This is, as with most surface area elements, ugly, but with some decent simplifications possible.

[QUESTION 12:]{.underline} Integrate the function $f = z(x^{2} + y^{2})$ over the parametrized surface given by $x = u\cos v\ ;\ y = u\sin v\ ;\ z = u$ ; $0 \leq u,\ v \leq 1\ .$

[QUESTION 13:]{.underline} Compute the surface area of the cone $z = \sqrt{x^{2} + y^{2}}$ for $z \leq 4$.

These types of surface integral problems are usually difficult to solve explicitly. At least one implicit and one parametrized problem should be practiced.

### ASSESSMENT PROBLEMS

[PROBLEM 1.]{.underline} Use spherical coordinates to compute the volume of the solid conical domain as shown.

[PROBLEM 2.]{.underline} Compute the volume of the solid $D$ given in spherical coordinates by: $$0 \leq \rho \leq 1 + \cos\phi\ \ \ \ \ :\ \ \ \ \ 0 \leq \phi \leq \pi\ \ \ \ \ \ :\ \ \ \ \ \ 0 \leq \theta \leq \pi/2$$

[PROBLEM 3.]{.underline} Convert the following integral to cylindrical coordinates: $$\int_{x = 0}^{1}{\int_{y = - \sqrt{1 - x^{2}}}^{\sqrt{1 - x^{2}}}{\int_{z = 0}^{\sqrt{4 - x^{2} - y^{2}}}{x^{2}\ dz\ dy\ dx}}}$$ Do not evaluate the integral: just set it up carefully.

[PROBLEM 4.]{.underline} Consider the region $R$ in the plane defined by: $$R_{1}^{2} \leq x^{2} + y^{2} \leq R_{2}^{2}$$ A\) Compute the moment of inertia of this unit-density plate rotated in the $(x,y)$ plane about the origin. B\) Compute the radius of gyration of this (unit density) plate.

[PROBLEM 5.]{.underline} Consider the region that is described in spherical coordinates as $$0 \leq \rho \leq 2\ \ \ \ \ :\ \ \ \ \frac{\pi}{2} \leq \phi \leq \pi\ \ \ \ \ :\ \ \ \ \ 0 \leq \theta \leq \frac{\pi}{2}$$ A\) Describe carefully and/or draw a careful picture of this domain. B\) Compute the average of the function $f = 1/\rho$ on this domain.

[PROBLEM 6.]{.underline} Consider the unit-density solid $D$ (centered at the origin) given by the hemispherical region between a ball of radius 2 and a ball of radius 1, both with $z \geq 0$. A\) Compute $\overline{z}$, the z-coordinate of the centroid of $D$. B\) Does your answer to part (A) make sense?

[PROBLEM 7.]{.underline} Consider the bounded region $D$ in the plane given by $$x^{2} + y^{2} \leq R^{2}\ \ \ \ :\ \ \ x \geq 0\ \ \ \ :\ \ \ y \geq x$$ A\) What is the area of this region? B\) Compute the centroid of $D$.
[PROBLEM 8.]{.underline} Consider a joint probability density function on three variables, $X,\ Y$ and $Z$, of the form $\rho(x,y,z) = e^{- Cz}(x^{2} + y^{2})$, where the domain $D$ is the infinite cylinder where $x^{2} + y^{2} \leq 1$ and $z \geq 0$. A\) What must the value of $C$ be so that $\rho$ is a probability density function on $D$? B\) What is the probability that $Z \leq 1$?

[PROBLEM 9.]{.underline} Consider the surface of a cone given by $$z^{2}\ = \frac{1}{3}\ \left( x^{2}\ + \ y^{2} \right)\ .$$ A\) Argue that this cone makes a $60^{\circ}$ angle with the positive z-axis. B\) What is the volume element in spherical coordinates? C\) Compute the volume of the domain given by $$x \geq 0\ \ \ ,\ \ \ y \geq 0\ \ \ ,\ \ \ z \geq 0\ \ \ ,\ \ \frac{1}{4} \leq x^{2} + y^{2} + z^{2} \leq 1\ \ \ ,\ \ \ \ z^{2} \leq \frac{1}{3}(x^{2} + y^{2})$$

[PROBLEM 10.]{.underline} Compute the volume of the portion of the ball of radius 5 centered at the origin that satisfies $x^{2} + y^{2} \leq 9$, using cylindrical coordinates.

[PROBLEM 11.]{.underline} Use the change of variables $$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = F\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x^{2} + y^{2} \\ y^{2} - 2x^{2} \\ 1 - \frac{z}{3} \end{pmatrix}$$ to convert to $(u,v,w)$ coordinates the triple integral $$\iiint_{\ D}^{\ }{x^{3}y + xy^{3}\ dx\ dy\ dz}$$ Do not evaluate the transformed integral or worry about the limits of integration. Do convert the integrand and volume element in full detail.

[PROBLEM 12.]{.underline} Consider the domain $D$ satisfying $u \geq 0\ \ ;\ \ v \geq 0\ \ ;\ \ 0 \leq u - v \leq 2\ \ $ and $\ 4 \leq u^{2} + v^{2} \leq 9.$ A\) What would be an appropriate change of variables from $(u,\ v)$ coordinates to $(x,y)$ coordinates to make this domain *nice*? B\) Compute the integral $$\iint_{D}^{\ }{u^{2} - v^{2}\ du\ dv\ }$$ using the change of variables from part (A).

[PROBLEM 13.]{.underline} Consider the following variation of cylindrical coordinates: $$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = F\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} 2u\cos v \\ 3u\sin v \\ w \end{pmatrix}$$ Use the Change of Variables Theorem to convert the volume element $dV = dx\ dy\ dz$ into $(u,\ v,\ w)$ coordinates.

[PROBLEM 14.]{.underline} Use the change of variables $$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = F\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x^{2} - y^{2} \\ z(x + y) \\ 1 + 2z \end{pmatrix}$$ to convert to $(u,v,w)$ coordinates the triple integral $$\iiint_{\ D}^{\ }{z^{2}(x + y)^{3}(x - y)\ dx\ dy\ dz}$$ Do not evaluate the transformed integral or give the limits of integration.

[PROBLEM 15.]{.underline} Use the change of variables $$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = F\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} z - x \\ y^{3} - x^{3} \\ y + z \end{pmatrix}$$ to convert to $(u,v,w)$ coordinates the triple integral $$\iiint_{\ D}^{\ }{(x^{2} + y^{2})(x - z)\ dx\ dy\ dz}$$ Do not evaluate the transformed integral or give the limits of integration.

[PROBLEM 16.]{.underline} Consider the domain $D$ given by $$3 + x^{3} \leq \ y \leq 5 + x^{3}\ \ \ \ \ and\ \ \ \ \ \ 1 \leq xy \leq 2\ ,\ \ \ \ \ \ x,y > 0.$$ A\) If we let $s = y - x^{3}$, what is a good choice for $t$ so that the bounds on $s$ and $t$ for $D$ are all constants? B\) Use the Change of Variables Theorem to express the area element $ds\ dt$ in terms of $dx\ dy$.
C\) Use parts (A) and (B) to compute the following integral: $$\iint_{D}^{\ }{\frac{3x^{2}}{y} + \frac{1}{x}\ dx\ dy}$$

[PROBLEM 17.]{.underline} Consider the domain $D$ given by $$\ln x \leq \ y \leq 1 + \ln x\ \ \ \ \ and\ \ \ \ \ \ 1 \leq xy^{3} \leq 3\ ,\ \ \ \ \ \ x,y > 0.$$ A\) If we let $u = y - \ln x$, what is a good choice for $v$ so that the bounds for $u$ and $v$ for $D$ are all constants? B\) Use the Change of Variables Theorem to express the area element $du\ dv$ in terms of $dx\ dy$. C\) Compute the integral of $e^{y}(3y^{5} + y^{6})$ on $D$.

[PROBLEM 18.]{.underline} Compute the integral $$\iint_{D}^{\ }{\frac{1}{x}\ dx\ dy}$$ where $D$ is the domain given by the inequalities $1 \leq y/x \leq 2$ and $1 \leq x + y \leq 4$.

[PROBLEM 19.]{.underline} Compute the area of the domain $D$ in the plane given by the inequalities $- 1 \leq x^{2} - y^{2} \leq 1$ and $1 \leq x + y \leq 10$.

[PROBLEM 20.]{.underline} Let $D$ be the surface in 3-D parametrized as follows: $$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = S\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 2 + 2u \\ u - v \\ 1 - 3v \end{pmatrix}\ \ \ \ \ :\ \ \ \ \ 0 \leq u \leq 3\ \ ;\ \ - 1 \leq v \leq 1$$ A\) What is the surface area element $d\sigma$ of this surface? B\) Use this to compute the surface area of $D$.

[PROBLEM 21.]{.underline} Let $S$ be the surface in 3-D parametrized as follows: $$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = F\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 1 - u + v \\ 2 + 3u - 2v \\ 5 + 2u - v \end{pmatrix}\ \ \ \ \ :\ \ \ \ \ u^{2} + v^{2} \leq 1$$ A\) Compute and simplify the surface area element $d\sigma$ of this surface. B\) Use this to compute the surface area of $S$.

[PROBLEM 22.]{.underline} Consider the parametrized surface in 3-D given by $$S\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} v^{2} \\ u^{2} - v^{2} \\ u^{2} \end{pmatrix}\ \ \ \ :\ \ \ \ 0 \leq u \leq 2\ \ \ \ :\ \ \ \ 0 \leq v \leq 3$$ A\) Compute and simplify the surface area element $d\sigma$, showing all steps. B\) Set up but do not solve a surface integral to compute the $z$-coordinate of the centroid of this surface.
### ANSWERS & HINTS

[PROBLEM 1.]{.underline} the volume is $8\pi/3$

[PROBLEM 2.]{.underline} the volume is $2\pi/3$

[PROBLEM 3.]{.underline} $$\int_{x = 0}^{1}{\int_{y = - \sqrt{1 - x^{2}}}^{\sqrt{1 - x^{2}}}{\int_{z = 0}^{\sqrt{4 - x^{2} - y^{2}}}{x^{2}\ dz\ dy\ dx}}}\ \ = \int_{\theta = - \frac{\pi}{2}}^{\frac{\pi}{2}}{\int_{r = 0}^{1}{\int_{z = 0}^{\sqrt{4 - r^{2}}}{r^{3}\cos^{2}\theta\ dz\ dr\ d\theta}}}$$

[PROBLEM 4.]{.underline} A) $I = \frac{1}{2}\pi\left( R_{2}^{4} - R_{1}^{4} \right) = \frac{1}{2}M\left( R_{2}^{2} + R_{1}^{2} \right)$ ; B) $R_{g} = \sqrt{\frac{R_{1}^{2} + R_{2}^{2}}{2}}$

[PROBLEM 5.]{.underline} this is one-eighth of a ball of radius two, with $$\overline{f} = \frac{3}{4\pi}\int_{\ }^{\ }{\frac{1}{\rho}dV} = \frac{3}{4}$$

[PROBLEM 6.]{.underline} $$\overline{z} = \frac{1}{V}\int_{D}^{\ }{z\ dV} = \left( \frac{3}{14\pi} \right)\left( \frac{15\pi}{4} \right) = \frac{45}{56}$$

[PROBLEM 7.]{.underline} A) $A = \pi R^{2}/8$ ; B) $$\overline{x} = \frac{4R}{3\pi}\left( 2 - \sqrt{2} \right)\ \ \ :\ \ \overline{y} = \frac{4R}{3\pi}\sqrt{2}$$

[PROBLEM 8.]{.underline} A) $C = \pi/2$ ; B) $\mathbb{P} = 1 - e^{- \pi/2}$

[PROBLEM 9.]{.underline} C) $$V = \int_{0}^{\frac{\pi}{2}}{\int_{\frac{\pi}{3}}^{\frac{\pi}{2}}{\int_{\frac{1}{2}}^{1}{\rho^{2}\sin\phi\ d\rho\ d\phi\ d\theta}}} = \frac{7\pi}{96}$$

[PROBLEM 10.]{.underline} $$V = \int_{0}^{2\pi}{\int_{0}^{3}{\int_{- \sqrt{25 - r^{2}}}^{\sqrt{25 - r^{2}}}{r\ dz\ dr\ d\theta}}} = \frac{244\pi}{3}$$

[PROBLEM 11.]{.underline} using $du\ dv\ dw = 4xy\ dx\ dy\ dz$, $$\iiint_{\ }^{\ }{x^{3}y + xy^{3}\ dx\ dy\ dz} = \iiint_{\ }^{\ }{\frac{u}{4}\ du\ dv\ dw}$$

[PROBLEM 12.]{.underline} A) $x\ = \ u - v$ and $y = u^{2} + v^{2}$; B) using $dx\ dy = 2|u + v|\ du\ dv$, $$\iint_{D}^{\ }{u^{2} - v^{2}\ du\ dv} = \int_{4}^{9}{\int_{0}^{2}{\frac{x}{2}\ dx\ dy}} = 5$$

[PROBLEM 13.]{.underline} using the Change of Variables Theorem, $$dx\ dy\ dz = \left| \det\begin{bmatrix} 2\cos v & - 2u\sin v & 0 \\ 3\sin{v\ } & 3u\cos v & 0 \\ 0 & 0 & 1 \end{bmatrix} \right|\ du\ dv\ dw = 6|u|\ du\ dv\ dw$$

[PROBLEM 14.]{.underline} using the Change of Variables Theorem, $$\iiint_{\ D}^{\ }{z^{2}(x + y)^{3}(x - y)\ dx\ dy\ dz} = \iiint_{\ }^{\ }{\frac{uv}{4}\ du\ dv\ dw}$$

[PROBLEM 15.]{.underline} using the Change of Variables Theorem, $$\iiint_{\ D}^{\ }{(x^{2} + y^{2})(x - z)\ dx\ dy\ dz} = \iiint_{\ \ }^{\ }{- \frac{u}{3}\ du\ dv\ dw}$$

[PROBLEM 16.]{.underline} A) $t = xy$ ; B) $ds\ dt\ = \ \left( 3x^{3} + y \right)\ dx\ dy$ ; C) $$\iint_{D}^{\ }{\frac{3x^{2}}{y} + \frac{1}{x}\ dx\ dy} = \ \int_{1}^{2}{\int_{3}^{5}\frac{ds\ dt}{t}} = 2\ln 2$$

[PROBLEM 17.]{.underline} A) $v = xy^{3}$ ; B) $du\ dv\ = \ \left( 3y^{2} + y^{3} \right)dx\ dy$ ; C) $$\iint_{D}^{\ }{e^{y}\left( 3y^{5} + y^{6} \right)\ dx\ dy} = \int_{1}^{3}{\int_{0}^{1}{e^{u}v\ du\ dv}} = 4(e - 1)$$

[PROBLEM 18.]{.underline} with $u = y/x\ $and $v = x + y$, compute $du\ dv = |x + y|/x^{2}\ dx\ dy$, yielding $$\int_{1}^{4}{\int_{1}^{2}{\frac{1}{1 + u}\ du\ dv}} = 3\ln\frac{3}{2}$$

[PROBLEM 19.]{.underline} with $u = x^{2} - y^{2}\ $and $v = x + y$, compute $du\ dv = 2|x + y|\ dx\ dy$, yielding $$\int_{1}^{10}{\int_{- 1}^{1}{\frac{1}{2v}\ du\ dv}} = \ln 10$$

[PROBLEM 20.]{.underline} A) $d\sigma = 7\ du\ dv$ ; B) $42$

[PROBLEM 21.]{.underline} A) $d\sigma = \sqrt{3}\ du\ dv\ $; B) $\pi\sqrt{3}$

[PROBLEM 22.]{.underline} A) $d\sigma = 4\sqrt{3}\ uv\ du\ dv$; B) $$\overline{z} = \frac{\int_{0}^{2}{\int_{0}^{3}{4\sqrt{3}}\ u^{3}v\ dv\ du}}{\int_{0}^{2}{\int_{0}^{3}{4\sqrt{3}}\ uv\ dv\ du}}$$

## # 
VOLUME IV : FIELDS

## ## Week 12 : Path Integrals

### OUTLINE

MATERIALS: Calculus BLUE : Vol 4 : Chapters 1-5

TOPICS:

- Scalar and vector fields
- 1-forms and 1-form fields
- Scalar path integrals
- Gradient 1-form fields
- Path integrals and 1-form fields
- Independence of Path Theorem and potential functions
- Work and flux 1-forms and path integrals

LEARNING OBJECTIVES:

- Identify different types of fields : scalar, vector, 1-form
- Set up and compute scalar path integrals
- Evaluate 1-forms and 1-form fields on $\mathbb{R}^{n}$
- Integrate 1-form fields over parametrized paths
- Identify gradient 1-form fields
- Determine a potential function from a gradient 1-form field
- Use the Independence of Path Theorem to compute path integrals
- Interpret work and flux 1-forms in the plane

### PRIMER

This begins the fourth quarter of our story, concerning *fields*.

FIELDS. A *scalar field* is nothing more than the usual scalar-valued functions we have been working with all along: $f:\mathbb{R}^{n}\mathbb{\rightarrow R}$. From now on, we are going to think of this as a *field* -- an assignment of a scalar value to every point in the domain. There are other types of fields, determined by what type of object is assigned to every point in the domain. For example, the gradient of $f$, $\nabla f$, is an example of a *vector field*: at each point in space a vector is assigned. Vector fields are very useful in Mathematics (geometry and differential equations in particular). They also hold a prominent role in Physics: electric, gravitational, and velocity fields are all important vector fields.

SCALAR PATH INTEGRALS. We begin the calculus of fields with integration. Given a scalar field $f$ on $\mathbb{R}^{n}$, we already know how to integrate it with respect to the volume element $dV$ over an $n$-dimensional domain $D \subset \mathbb{R}^{n}$. Consider what happens with a path $\gamma:\lbrack a,b\rbrack \rightarrow \mathbb{R}^{n}$. If we try to integrate $f$ over the image of $\gamma$ with respect to $dV$, we get zero (unless $n = 1$ of course). However, it should be possible to integrate the scalar field $f$ over the path with respect to the "internal" arclength element. This can be done by pulling back the scalar field to the parameter domain of $\gamma$. We define the scalar path integral of $f$ over $\gamma$ as follows, using the arclength element $d\mathcal{l}$ from Chapter 2: $$\int_{\gamma}^{\ }{f\ d\mathcal{l}}\ = \ \int_{t = a}^{b}{f\left( \gamma(t) \right)\left| \gamma'(t) \right|dt}\ .$$ The crucial result -- which is an immediate consequence of the Change of Variables Theorem of last week -- is that this integral is *independent of the parametrization* of the path (and orientation). Only the geometric path matters.

1-FORMS. Recall from Week 7 that for a scalar field $f$ on $\mathbb{R}^{n}$, we can interpret the *gradient* $\nabla f$ as a vector field. One could likewise interpret the derivative $\lbrack Df\rbrack$ as a "matrix field," since the linear transformation depends on the point of evaluation. There is one more type of object associated with the derivative of a scalar field: recall also from Week 7 our use of *differentials* and our suspicious lack of formal definitions. We invoked "implicit differentiation" to make sense of $$df = \frac{\partial f}{\partial x_{1}}dx_{1} + \frac{\partial f}{\partial x_{2}}dx_{2} + \ \cdots\ + \frac{\partial f}{\partial x_{n}}dx_{n}\ .$$ This is now to be classified as a new type of field: a *1-form field*.
On $\mathbb{R}^{n}$, the *basis 1-forms* $dx_{i}$ are objects that accept a vector in $\mathbb{R}^{n}$ and return the *i^th^* component: they are *projections* onto the $x_{i}$ axis. One can combine these basis forms linearly, in the same way that one combines basis vectors. By having coordinate-dependent coefficients in front of the basis 1-forms, one obtains a 1-form field that varies from point-to-point. The gradient 1-form field $df$ is such an object; other 1-form fields are not of the form $df$ for any function $f$ (see *Independence of Path*, below).

INTEGRATING 1-FORMS. Of the many ways to think about 1-form fields -- as objects like vector fields or (locally) like gradients -- there is one pseudo-definition that is tied to integration: a 1-form field $\alpha$ is an object that wishes to be integrated over a path. It is no coincidence that the integrands from single-variable calculus are 1-form fields on $\mathbb{R}^{1}$ of the form $f(x)dx$, and we know how to integrate such (via a limit of Riemann sums). For a general 1-form field $\alpha$ on $\mathbb{R}^{n}$ and a given parametrized path $\gamma:\lbrack a,b\rbrack \rightarrow \mathbb{R}^{n}$, one defines the integral via: $$\int_{\gamma}^{\ }\alpha\ = \ \int_{t = a}^{b}{\alpha\left. \ \right|_{\gamma(t)}\left( \gamma'(t) \right)dt}\ .$$ That is, at each point along the path, one feeds the velocity vector of the path to the 1-form evaluated at that point. Add these values up along the path, and this is the integral. This is independent of the parametrization, thanks to the Change of Variables Theorem. Changing the orientation (one could write $- \gamma$) is equivalent to reversing the parameter $t \mapsto - t$ and, since $dt \mapsto - dt,\ $this yields a minus sign.

INDEPENDENCE of PATH THEOREM. The first fundamental theorem of this Volume ties together 1-form fields, derivatives, and integrals. It states that for a gradient 1-form field $df\ $and a path $\gamma:\lbrack a,b\rbrack \rightarrow \mathbb{R}^{n}$, the path integral is $$\int_{\gamma}^{\ }{df}\ = \ f\left( \gamma(b) \right) - f\left( \gamma(a) \right).$$ This is, of course, the Fundamental Theorem of Integral Calculus, writ in terms of 1-form fields and path integrals.

When is a 1-form field a gradient? This is nontrivial. In single-variable calculus, every \[integrable\] function $f(x)$ is the derivative of the definite integral $\int_{a}^{x}{f(t)dt}$; thus, every 1-form field on $\mathbb{R}^{1}$ is a gradient 1-form. Beginning with dimension two, there are (many!) 1-form fields $\alpha$ which cannot be expressed as a gradient. The simple example $\alpha = y\ dx - x\ dy$ is illustrative of the general case: if $\alpha = df$, then $y = \partial f/\partial x$ and $- x = \partial f/\partial y$. However, this cannot be, since then $$- 1 = \frac{\partial}{\partial x}( - x) = \frac{\partial}{\partial x}\left( \frac{\partial f}{\partial y} \right) = \frac{\partial^{2}f}{\partial x\ \partial y} = \frac{\partial^{2}f}{\partial y\ \partial x} = \frac{\partial}{\partial y}\left( \frac{\partial f}{\partial x} \right) = \frac{\partial}{\partial y}(y) = 1\ ,$$ a contradiction, since the mixed second partial derivatives of a (twice continuously differentiable) potential must agree. It is a fascinating result that this matching of partial derivatives is an if-and-only-if obstruction to being a gradient. If $\alpha$ is a 1-form field whose pairwise partial derivatives all match, then $\alpha = df$ is the gradient of some scalar field \[a "*potential*"\].
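The mixed-partials test and the search for a potential are both easy to demonstrate symbolically. A minimal sympy sketch in the plane, testing $y\ dx - x\ dy$ against the gradient example $y\ dx + x\ dy$, and recovering the potential of the latter by partial integration and matching:

```python
import sympy as sp

x, y = sp.symbols('x y')

# alpha = P dx + Q dy is a gradient (on all of R^2) iff dP/dy == dQ/dx
def is_gradient(P, Q):
    return sp.simplify(sp.diff(P, y) - sp.diff(Q, x)) == 0

print(is_gradient(y, -x))   # False : y dx - x dy is not a gradient
print(is_gradient(y,  x))   # True  : y dx + x dy = d(xy)

# find a potential for y dx + x dy by partial integration and matching
P, Q = y, x
f = sp.integrate(P, x)                               # f = x*y + g(y)
g = sp.integrate(sp.simplify(Q - sp.diff(f, y)), y)  # match df/dy with Q
print(f + g)                                         # x*y
```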
The problem of finding such a potential is an interesting challenge of computing the antiderivatives of the components of $\alpha$ and trying to match them up to a consistent scalar field. Is that approach better than taking an educated guess and checking whether it works? Any potential that works is a good potential.

WORK & FLUX. Why are we computing path integrals? There are two motivations for doing so in the plane, where we will initially focus. Given a planar vector field $\overrightarrow{F}$, consider the following two strangely (skew-)symmetric 1-form fields:

- *Work 1-form* : $\alpha_{\overrightarrow{F}} = F_{x}dx + F_{y}dy$.
- *Flux 1-form* : $\phi_{\overrightarrow{F}} = F_{x}dy - F_{y}dx$.

Integrating the work 1-form $\alpha_{\overrightarrow{F}}$ over a curve measures the work done by the vector field $\overrightarrow{F}$ along the curve: think of how the wind impacts the time and fuel needs of a cross-country flight. Integrating the flux 1-form $\phi_{\overrightarrow{F}}$ across the curve measures how much the vector field $\overrightarrow{F}$ "pushes" stuff across the curve. Orientation matters here: the signs on the flux 1-form are chosen so that if you integrate $\phi_{\overrightarrow{F}}$ along a counterclockwise closed loop (*e.g.*, a circle), you get the *outward* flux of the vector field across the curve, measuring what flows from inside to outside. Work and flux are primary motivations for integrating form fields.

The work 1-form extends naturally to vector fields on $\mathbb{R}^{n}$ via an analogous work 1-form. The flux 1-form does not extend to a 1-form on $\mathbb{R}^{3}$, since a curve does not separate space into an *inside* and *outside* the way it does in the plane. The problem of how to efficiently compute work and flux, as well as how to generalize flux forms to 3-D, will be addressed next week.

### DISCUSSION

[QUESTION 1:]{.underline} \[*speaking to a full classroom*\] "Can you collectively give an example of a vector field by using your arms as vectors?"

Students will point in random directions. Ask if this is a continuous vector field, whatever that might mean. Follow up by asking students to work together to make a continuous vector field. They will likely all point at the speaker; or perhaps all in a consistent direction. This is a very good way to discuss continuity of vector fields without belaboring the definitions.

[QUESTION 2:]{.underline} What is the simplest parametrization of a straight line between two points in $\mathbb{R}^{3}$? After figuring this out, redo the problem for arbitrary dimensions: how much harder was that?

[QUESTION 3:]{.underline} Integrate the scalar field $f = e^{- x^{2} - y^{2}}$ over the circle of radius $R$ about the origin. What value of $R$ maximizes this integral?

This leads to some interesting discussions: can you maximize without doing all the work explicitly? This foreshadows Green's theorem.

[QUESTION 4:]{.underline} Can you think of an example of a scalar field whose integral would be finite on the hyperbola $xy = 1$ where $x,\ y \geq 0$?

This is good for reviewing asymptotic thinking and the arclength element. When students suggest using zero or mixed positive/negative values, refine the question to strictly positive fields.

[QUESTION 5:]{.underline} Compute the centroid of the helical arc given by $$\gamma(t) = \left( \begin{array}{r} \cos t \\ \sin t \\ t \end{array} \right)\ \ \ \ \ :\ \ \ \ 0 \leq t \leq \pi$$ How much of this can be determined without doing the scalar path integral?
[QUESTION 6:]{.underline} Compute the integral of $y^{2}dx$ over the graph of $y = x^{2}$ for $- 1 \leq x \leq 1$. What happens if instead you use the straight path between endpoints? Explain the difference between the answers. Which was easier to compute?

This is a good problem for emphasizing path dependence, as well as building intuition for what integration of a 1-form means: one is adding up values of $y^{2}$ times the infinitesimal change in $x$.

[QUESTION 7:]{.underline} Integrate the 1-form field $x\ dy$ along a path in the plane from $(0,0)$ to $(a,b)$ by (1) a straight line; and (2) a horizontal line followed by a vertical line.

This has a bit of foreshadowing for Green's Theorem. It is very good to dwell on the observation that the horizontal path is "invisible" to $dy$. The next question is a good follow-up.

[QUESTION 8:]{.underline} Consider again the 1-form field $x\ dy$ and think of it as a "sensor" that detects someone moving along a path. Given any two points in the plane, is it possible to design a path between them so that $x\ dy$ never detects the motion?

With the previous problem done, the horizontal component is clear, and the vertical component along the y-axis is not hard to figure out either. Ask students as a follow-up if such motion is possible for a typical or even arbitrary 1-form field in the plane.

[QUESTION 9:]{.underline} Compute the integral of $x^{2}dx + yz\ dy + \frac{1}{2}y^{2}dz$ over the path from the origin to $(0,0,10)$ given by $\gamma(t)$ where $x(t) = e^{t}\sin{4\pi t}$; $y(t) = \ t(t - 1)\cos^{2}t$, and $z(t) = 10t^{10}$ for $t = 0...1$.

Moral: whenever the integral looks impossible, look to a Theorem to get you out of trouble...

[QUESTION 10:]{.underline} Integrate the 1-form field on $\mathbb{R}^{n}$ given by $\sum_{i}^{}{x_{i + 1}dx_{i}}$ (cyclic ordering) over the straight path from the origin $\mathbf{0}$ to the point $\mathbf{1} = (1,\ldots,1)$. What does this integral measure? Does this integral exhibit path independence?

[QUESTION 11:]{.underline} Is the following vector field a gradient? $$\overrightarrow{V} = \left( e^{xy} + xye^{xy} \right)\widehat{i} + \left( x^{2}e^{xy} + ze^{- yz} \right)\widehat{j} + \left( ye^{- yz} \right)\widehat{k}$$ What is its potential field?

Use this to practice partial-integration versus outright guessing. In this case, it is probably easier to find the potential than it is to do the six partial derivatives to check for whether the potential exists.

[QUESTION 12:]{.underline} Compute the flux of the vector field $\overrightarrow{F} = x\widehat{i} + y\widehat{j}$ across the loop given by the square connecting (in order) the vertices $(0,0),\ (L,0),\ (L,L),\ (0,L)$.

This is a good setup for Green's Theorem next week. This problem emphasizes the need to break this loop into segments, as well as the benefit in thinking before integrating, as the first and fourth path segments have the field tangent to the path & thus have no flux. One can also profitably compute the work along this loop as an exercise.

[QUESTION 13:]{.underline} Compute the work $W$ done by the vector field $\overrightarrow{F}\ = \ xy\ \widehat{i}\ + \ yz\mathbf{\ }\widehat{j}\ + \ xz\ \widehat{k}$ along the straight-line path from $(1,2,0)$ to $(4,3, - 1)$.

Can one make sense of computing the flux of the field along this curve? Why not? Why is it that work does make sense in 3-D or any dimension, computed via a single (work) 1-form?
[QUESTION 14:]{.underline} How do you remember the formulae for work and flux 1-forms of a planar vector field $\overrightarrow{F} = F_{x}\widehat{i} + F_{y}\widehat{j}$ ?

See if students can remember. The work is the easier of the two: $\alpha_{\overrightarrow{F}} = F_{x}\ dx + F_{y}\ dy$. For the flux 1-form $\phi_{\overrightarrow{F}} = F_{x}\ dy - F_{y}\ dx$, point out the lexicographic ordering of the $x$ and $y$ terms. Is this a trick? No, it is not, as shall be seen after doing differential forms in subsequent weeks.

### ASSESSMENT PROBLEMS

[PROBLEM 1.]{.underline} Compute the scalar path integral $$\int_{\gamma}^{\ }x^{2}d\mathcal{l}$$ where $\gamma$ is a path that follows the graph of the function $y = \ln x$ for $0 < x \leq \sqrt{3}\ $.

[PROBLEM 2.]{.underline} Consider a unit-density semicircular wire $\gamma\ $given by $x^{2} + y^{2} = R^{2}$ with $y \geq 0$. The moments of inertia of this wire about the $x$- and $y$-axes are, respectively, $$I_{x} = \int_{\gamma}^{\ }y^{2}d\mathcal{l\ \ \ \ \ \ \ \ \ \ \ \ :\ \ \ \ \ \ \ \ \ \ \ }I_{y} = \int_{\gamma}^{\ }x^{2}d\mathcal{l\ }$$ where $d\mathcal{l}$ is the arclength element. Compute both these scalar path integrals and comment on which moment of inertia is larger, if there is a larger one.

[PROBLEM 3.]{.underline} Consider the scalar field $f(x,y,z) = x^{2} - y + z^{2}$. A\) Compute the scalar path integral $$\int_{\gamma}^{\ }{f\ d\mathcal{l}}\ \ \ \ \ \ :\ \ \ \ \ \gamma(t) = \begin{pmatrix} \cos{2t} \\ - 2t \\ \sin{2t} \end{pmatrix}\ \ \ \ \ :\ \ \ \ \ t = 0\ldots\pi$$ B\) Explain what happens to the value of this integral if the path is instead $$\widetilde{\gamma}(t) = \begin{pmatrix} \cos t \\ - t \\ \sin t \end{pmatrix}\ \ \ \ \ :\ \ \ \ \ t = 0\ldots 2\pi$$

[PROBLEM 4.]{.underline} Consider the scalar field in the plane $f(x,y) = x^{2} + \frac{1}{2}y^{2} - 1$, and the scalar path integral $I = \int_{\gamma}^{\ }{f\ d\mathcal{l}},\ $where $\gamma$ is the straight-line path from the origin to the point $(3,\ 4)$. A\) Give an explicit parametrization of $\gamma$, using a parameter $t$. B\) Compute the value of $I$. C\) Is the value you found for $I$ positive, negative, or zero; and does that make sense?

[PROBLEM 5.]{.underline} Compute the scalar path integral $$\int_{\gamma}^{\ }{x + y^{2} + z^{3}\ d\mathcal{l}}$$ over the straight-line path from the origin to the point $(6,\ 3,\ - 2)$.

[PROBLEM 6.]{.underline} Consider the scalar field $f(x,y,z) = \sqrt{x} - y + z$. A\) Compute the scalar path integral $$\int_{\gamma}^{\ }{f\ d\mathcal{l}}\ \ \ \ \ \ :\ \ \ \ \ \gamma(t) = \begin{pmatrix} t^{2} \\ 1 - 2t^{2} \\ 2t^{2} \end{pmatrix}\ \ \ \ \ \ :\ \ \ \ \ t = 0\ldots 1$$ B\) Explain what happens to the value of this integral if you reverse the direction of the path; e.g., by letting $t$ go from $1$ to $0$ instead.

[PROBLEM 7.]{.underline} Consider the scalar field $f(x,y,z) = x^{2} + y^{2} + 3z$. A\) Compute the scalar path integral $$\int_{\gamma}^{\ }{f\ d\mathcal{l}}\ \ \ \ \ \ :\ \ \ \ \ \gamma(t) = \begin{pmatrix} 2\cos t \\ - 2\sin t \\ 3t \end{pmatrix}\ \ \ \ \ :\ \ \ \ \ t = 0\ldots\pi$$ B\) Explain what happens to the value of this integral if you reverse the direction of the path; e.g., by letting $t$ go from $\pi$ to $0$ instead.
[PROBLEM 8.]{.underline} Compute directly the work done by the vector field $$\overrightarrow{F} = \ \left( x^{2}z \right)\widehat{i} + \left( yz^{2} \right)\widehat{j} + (xyz)\widehat{k}$$ along the path $\gamma$ given by $$\gamma(t) = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} t^{1/2} \\ t^{2} \\ t^{3/2} \end{pmatrix}\ \ \ \ \ \ :\ \ \ \ \ 0 \leq t \leq 2$$

[PROBLEM 9.]{.underline} Use the Independence of Path Theorem to compute the work done by the gradient vector field $$\nabla f = \left( yz(y - z) \right)\widehat{i} + \left( xz(2y - z) \right)\widehat{j} + \left( xy(y - 2z) \right)\ \widehat{k}$$ over the path $\gamma$ given by $$\gamma(t) = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} t^{2} - 5t + 1 \\ 4t - 1 \\ t^{2} - 2 \end{pmatrix}\ \ \ \ \ :\ \ \ \ \ 0 \leq t \leq 1$$

[PROBLEM 10.]{.underline} Compute the integral $$\int_{\gamma}^{\ }{y^{2}dx + \left( 2xy + y^{2} + z \right)dy + y\ dz}\ \ \ \ \ \ \ \ :\ \ \ \ \ \ \ \ \gamma(t) = \begin{pmatrix} t^{2} \\ \sqrt{t} \\ 1 + t \end{pmatrix}\ \ \ :\ \ 0 \leq t \leq 4$$

[PROBLEM 11.]{.underline} Consider the constant vector field $\overrightarrow{F} = 2\ \widehat{i} - 3\ \widehat{j}$. A\) Write down the work 1-form $\alpha_{\overrightarrow{F}}$ and the flux 1-form $\phi_{\overrightarrow{F}}$ associated to $\overrightarrow{F}$. B\) Let $\gamma$ be the typical counterclockwise unit circle in the plane. Compute both the flux of the field $\overrightarrow{F}$ across $\gamma$ and the work done by $\overrightarrow{F}$ along $\gamma$. C\) Explain briefly why both the work and the flux were zero.

[PROBLEM 12.]{.underline} Consider the vector field $\overrightarrow{F} = (y - 3x)\widehat{i} + (x - 5y)\widehat{j}$. A\) What is the work 1-form $\alpha_{\overrightarrow{F}}$ associated to $\overrightarrow{F}$? B\) Let $\gamma$ be a path that traces out a clockwise circle of radius 2 centered at the origin. Give an explicit parametrization of this path $\gamma$, using $t$ as a parameter. C\) Compute the work done by the field $\overrightarrow{F}$ along the path $\gamma$.

[PROBLEM 13.]{.underline} Consider the following 1-form field: $$\alpha = y\ dx + xz\ dy + x^{2}\ dz$$ A\) What is the value of $\alpha$ at $(2,\ 3,\ - 1)$ evaluated on the vector $v = \begin{pmatrix} - 2 \\ 4 \\ 1 \end{pmatrix}$? B\) Integrate $\alpha$ over the path $\gamma(t) = \begin{pmatrix} t \\ t^{2} \\ t^{3} \end{pmatrix}\ :\ 0 \leq t \leq 1$. C\) This $\alpha$ is the work 1-form of what vector field on $\mathbb{R}^{3}$?

[PROBLEM 14.]{.underline} Consider the vector field $\overrightarrow{F} = \left( x^{2} - \frac{y^{2}}{x} \right)\widehat{i} + (xy)\widehat{j}$. A\) Let $\gamma$ be the path which goes from $(0,\ 0)$ to $(1,\ 1)$ along the graph of $y = x^{k}$, where $k > 0$ is a constant. Give an explicit parametrization of this path, using $t$ as a parameter. B\) Compute the work done by the field $\overrightarrow{F}$ along the path $\gamma$ (both as above). C\) What is the limit of this work as $k \rightarrow \infty$ ?

[PROBLEM 15.]{.underline} Consider the following 1-form fields: $$\alpha_{1} = 2y\ dx - dy + x\ dz\ \ \ \ \ :\ \ \alpha_{2} = z\ dx - x^{2}dy + y^{2}dz\ $$ A\) Evaluate both these 1-form fields$\ $at the point $( - 1,\ 2,\ 3)$. B\) Which of these 1-form fields has the smaller integral over the straight-line path from the origin to the point $(1,\ 2,\ - 2)$?
[PROBLEM 16.]{.underline} Consider the 1-form field $$\alpha = 3y\ dx + (3x + 2y)dy + 2z\ dz$$ A\) What is the value of $\alpha$ at $(2,\ 1,\ - 1)$ evaluated on the vector $v = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$? B\) Find a potential function $f$ for $\alpha$, so that $\alpha = df$. C\) Compute the integral of $\alpha\ $along the straight-line path from $(0,1,2)$ to $( - 1,2,3)$.

[PROBLEM 17.]{.underline} Compute -- directly or via finding a potential -- the integral $$\int_{\gamma}^{\ }{z\ dx - 2yz\ dy + \left( 1 - y^{2} + x \right)\ dz}$$ over the straight-line path $\gamma$ from $(0,\ 1,\ 2)$ to $(1,\ 2,\ 3)$.

[PROBLEM 18.]{.underline} Consider the planar vector field $\overrightarrow{F} = y\ \widehat{i} - x\ \widehat{j}$. A\) Draw a picture of this vector field and describe what it looks like. B\) What is the circulation of this vector field along the (counterclockwise) unit circle centered at the origin? C\) Explain why the flux of this vector field across the (counterclockwise) unit circle centered at the origin is exactly zero.

[PROBLEM 19.]{.underline} Compute the integral $$\int_{\gamma}^{\ }{ye^{xy}\ dx + \left( xe^{xy} - ze^{- yz} \right)dy + (e^{z} - ye^{- yz})\ dz}\ \ \ \ \ \ :\ \ \ \ \gamma(t) = \begin{pmatrix} t \\ 2t \\ 3t \end{pmatrix}\ \ \ :\ \ 0 \leq t \leq 1$$

[PROBLEM 20.]{.underline} Compute the integral $$\int_{\gamma}^{\ }{(z - y^{2})dx + (2y - 2xy)dy + x\ dz}\ \ \ \ \ \ \ \ :\ \ \ \ \ \ \ \ \gamma(s) = \begin{pmatrix} s^{2/3} \\ \sqrt{s/2} \\ s - 4 \end{pmatrix}\ \ \ :\ \ 0 \leq s \leq 8$$

[PROBLEM 21.]{.underline} Consider the following planar vector field $$\overrightarrow{F} = \left( C^{2}x - y \right)\widehat{i} + (2Cx - Cy)\widehat{j}$$ which depends on some constant $C$. Which value of $C$ produces zero work done by this field over the path given by the graph of $y = x^{2}$ for $- 1 \leq x \leq 1$?

[PROBLEM 22.]{.underline} Consider the following vector field on the $(x,y)$ plane: $$\overrightarrow{F} = \left( x^{2} + 1 \right)\widehat{i} + (x + y)\widehat{j}\ $$ A\) What is the value of $\overrightarrow{F}$ at the point $(3,\ 1)$? B\) What is the work 1-form $\alpha_{\overrightarrow{F}}$ associated with $\overrightarrow{F}$? C\) What is the flux 1-form $\phi_{\overrightarrow{F}}$ associated with $\overrightarrow{F}$? D\) If you walk along a straight-line path from the origin to the point $(3,\ 1)$, is the amount of work done by the field along the path positive, negative, zero, or undetermined?

[PROBLEM 23.]{.underline} Consider the planar 1-form field $\alpha$ and vector field $\overrightarrow{V}$ given by $$\alpha = 2y\ dx - x^{2}dy\ \ \ \ \ \ \ \ :\ \ \ \ \ \ \ \ \overrightarrow{V} = - x\ \widehat{i} + y\ \widehat{j}\ $$ A\) Evaluate both $\alpha$ and $\overrightarrow{V}$ at the point $(2, - 1)$. B\) Is $\alpha$ the gradient 1-form of a potential function $f(x,y)$? C\) Draw a picture of $\overrightarrow{V}$ near the origin.

[PROBLEM 24.]{.underline} Use the Independence of Path Theorem to integrate $$\left( \frac{2x}{y} - 1 \right)dx + \left( 3y^{2} - \frac{x^{2}}{y^{2}} \right)dy$$ over the path $\gamma$ given by $$\gamma(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} 1 + \arctan{(t^{2} - t)} \\ t + 2\cos{3\pi t} \end{pmatrix}\ \ \ \ \ :\ \ \ \ \ 0 \leq t \leq 1$$

[PROBLEM 25.]{.underline} Consider the vector field $\overrightarrow{F} = \left( x^{2} + y \right)\widehat{i} + (2xy - 1)\widehat{j}$. A\) What is the work 1-form $\alpha_{\overrightarrow{F}}$ associated to $\overrightarrow{F}$?
B\) Let $\gamma$ be the path in the plane which goes from $( - 1,0)$ to $(1,0)$ along the graph of $y = 1 - x^{2}$. Give an explicit parametrization of this path, using $t$ as a parameter. C\) Compute the work done by the field $\overrightarrow{F}$ along the path $\gamma$.

[PROBLEM 26.]{.underline} Use the Independence of Path Theorem to compute the work done by the vector field $$\overrightarrow{F} = \left( 1 - 2\sqrt{yz} \right)\widehat{i} + \left( 2 - \frac{x\sqrt{z}}{\sqrt{y}} \right)\widehat{j} + \left( 3 - \frac{x\sqrt{y}}{\sqrt{z}} \right)\widehat{k}$$ over the path $\gamma$ given by $$\gamma(t) = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} (t - 1)^{2} \\ 1 + 3t \\ 4 + 5t \end{pmatrix}\ \ \ \ \ :\ \ \ \ \ 0 \leq t \leq 1$$

[PROBLEM 27.]{.underline} Consider the vector field $\overrightarrow{F} = \left( x^{2} + y^{2} \right)\widehat{i} + (2xy - 1)\widehat{j}$. A\) What is the flux 1-form $\phi_{\overrightarrow{F}}$ associated to $\overrightarrow{F}$? B\) Let $\gamma$ be the path in the plane which traces out a counterclockwise circle of radius 2 centered at the origin. Give an explicit parametrization of this path, using $t$ as a parameter. C\) Compute the flux of the field $\overrightarrow{F}$ across the path $\gamma$.

[PROBLEM 28.]{.underline} Consider the vector field $\overrightarrow{F} = (x + y)\widehat{i} + (x - y)\widehat{j}$. A\) What is the work 1-form $\alpha_{\overrightarrow{F}}$ associated to $\overrightarrow{F}$? B\) Let $\gamma$ be the straight-line path in the plane which goes from $( - 1,2)$ to $(3,0)$. Give an explicit parametrization of this path, using $t$ as a parameter. C\) Compute the work done by the field $\overrightarrow{F}$ along the path $\gamma$.

### ANSWERS & HINTS

[PROBLEM 1.]{.underline} $7/3$

[PROBLEM 2.]{.underline} $I_{x} = I_{y} = \frac{1}{2}\pi R^{3}$

[PROBLEM 3.]{.underline} A) $2\sqrt{2}\pi(1 + \pi)$; B) same path, same integral

[PROBLEM 4.]{.underline} A) $\gamma(t) = \left( \begin{array}{r} 3t \\ 4t \end{array} \right),\ 0 \leq t \leq 1$ ; B) $I = 70/3$

[PROBLEM 5.]{.underline} $d\mathcal{l} = 7\ dt$ and $\int_{\gamma}^{}{x + y^{2} + z^{3}\ d\mathcal{l}} = 28$

[PROBLEM 6.]{.underline} $d\mathcal{l} = 6t\ dt$ and $\int_{\gamma}^{}{\sqrt{x} - y + z\ d\mathcal{l}} = 5$

[PROBLEM 7.]{.underline} $d\mathcal{l} = \sqrt{13}\ dt$ and $\int_{\gamma}^{}{x^{2} + y^{2} + 3z\ d\mathcal{l}} = \sqrt{13}\ \pi\left( 4 + \frac{9}{2}\pi \right)$

[PROBLEM 8.]{.underline} net work equals $\int_{\gamma}^{}{x^{2}z\ dx + yz^{2}dy + xyz\ dz} = \frac{4}{3} + \frac{256}{7} + \frac{96}{11}\sqrt{2}$ (ugh)

[PROBLEM 9.]{.underline} $36 - 2 = 34$, using the potential function $f = xyz(y - z)$

[PROBLEM 10.]{.underline} $\int_{\gamma}^{\ }{y^{2}dx + \left( 2xy + y^{2} + z \right)dy + y\ dz} = 230/3$

[PROBLEM 11.]{.underline} work and flux both zero since constant field

[PROBLEM 12.]{.underline} work equals zero: this is a gradient 1-form and the path is a loop

[PROBLEM 13.]{.underline} A) $- 10\ $; B) $19/15$ ; C) $y\widehat{i}\ + \ xz\widehat{j} + x^{2}\widehat{k}$

[PROBLEM 14.]{.underline} work equals $\frac{1}{3} - \frac{1}{2k} + \frac{k}{2k + 1}$ which limits to $\frac{5}{6}$ as $k \rightarrow \infty$

[PROBLEM 15.]{.underline} A) $\alpha_{1} = 4\ dx - dy - dz$ ; $\alpha_{2} = 3\ dx - dy + 4\ dz$ ; B) $\int_{\gamma}^{}\alpha_{1} = - 1$ and $\int_{\gamma}^{}\alpha_{2} = - 13/3$, so $\alpha_{2}$ has the smaller integral

[PROBLEM 16.]{.underline} A) $19\ $; B) $f = 3xy + y^{2} + z^{2}$ ; C) $2$

[PROBLEM 17.]{.underline} $- 6$, directly, or via potential $f = z(x - y^{2} + 1)$

[PROBLEM 18.]{.underline} A) a clockwise rotation about the origin ; B) circulation $= - 2\pi$ ; C) the field is everywhere tangent to the circle, so nothing flows across it

[PROBLEM 19.]{.underline} $e^{2} + e^{- 6} + e^{3} - 3$, via potential $f = e^{xy} + e^{- yz} + e^{z}$
6} + e^{3} - 3$, via potential $f = e^{xy} + e^{- yz} + e^{z}$ [PROBLEM 20.]{.underline} $4,$ via potential $f = z - xy^{2} + y^{2}$ [PROBLEM 21.]{.underline} $C = 1/4$ since integral evaluates to $\frac{1}{3}(8C - 2)$ [PROBLEM 22.]{.underline} A) $\overrightarrow{F} = 10\widehat{i} + 4\widehat{j}$ ; B) $\alpha_{\overrightarrow{F}} = \left( x^{2} + 1 \right)dx + (x + y)dy$ ; C) $\phi_{\overrightarrow{F}} = \left( x^{2} + 1 \right)dy - (x + y)dx$ ; D) positive [PROBLEM 23.]{.underline} A) $\alpha = - 2\ dx - 4\ dy$ ; $\overrightarrow{V} = - 2\widehat{i} - \widehat{j}$ ; B) nope [PROBLEM 24.]{.underline} $- 21/5$, via potential $f = x^{2}/y\ - x\ + \ y^{3}$ [PROBLEM 25.]{.underline} the net work equals $\int_{\gamma}^{}{\left( x^{2} + y \right)dx + (2xy - 1)dy} = 14/15$ [PROBLEM 26.]{.underline} $24$, via potential $f = x - 2x\sqrt{yz} + 2y + 3z$ [PROBLEM 27.]{.underline} the net flux equals $\int_{\gamma}^{}{\left( x^{2} + y^{2} \right)dy - (2xy - 1)dx} = 0$ [PROBLEM 28.]{.underline} the net work equals $\int_{\gamma}^{}{(x + y)dx + (x - y)dy} = 8$ ## ## Week 13 : Differential Forms ### OUTLINE MATERIALS: Calculus BLUE : Vol 4 : Chapters 6-8 TOPICS: - Green's Theorem in the plane - Work versus flux versions of Green's Theorem - Path-dependence and orientation-dependence in Green's Theorem - Curl and divergence of planar vector fields - Curl and divergence of vector fields in 3-D - Basis differential forms on $\mathbb{R}^{3}$ via determinants - Differential form fields on $\mathbb{R}^{3}$ - Definition of the flux 2-form of a vector field in 3-D - The wedge product $\land$ on forms and form fields - The exterior derivative operator $d$ on form fields - Vanishing theorems for grad, curl, and div LEARNING OBJECTIVES: - Use Green's Theorem to compute 1-form path integrals - Use Green's Theorem to compute work/flux of vector fields - Orient boundaries of planar domains - Interpret and compute curl and divergence of vector fields - Evaluate basis $k$-forms on vectors via determinants - Relate basis $2$-forms to oriented projected areas - Evaluate arbitrary form fields at points - Compute and simplify the derivative $d$ of a $k$-form field - Compute and simplify the product $\land$ of a pair of form fields - Use the vanishing theorems $\nabla \times \nabla f = 0$ and $\nabla \cdot \nabla \times \overrightarrow{F} = 0$ ### PRIMER This is a critical week in which the first of our Fundamental Theorems -- the Independence of Path Theorem -- is generalized. GREEN's THEOREM. Some unusual coincidences arise when computing integrals of 1-form fields over a loop. When integrating, e.g., $x\ dy$ over any simple closed loop in the plane, one always seems to obtain the area of the enclosed region (up to sign, depending on the orientation of the curve). This points to a deep result. Green's Theorem: If $\gamma$ is the boundary of a domain $D \subset \mathbb{R}^{2}$, oriented counterclockwise, and $f$ and $g$ are $C^{1}$ (continuously differentiable) functions on $D$, then $$\int_{\gamma}^{\ }{f\ dx + g\ dy}\ = \ \iint_{D}^{\ }{\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\ \ dA}$$ Of course, in the case of $x\ dy$, the right-hand integrand evaluates to $1$, yielding the area as a double integral. It will be convenient to describe the relationship between the domain $D$ and its boundary curve $\gamma$ symbolically. For multiple reasons, we use the following cryptic shorthand: $\gamma = \partial D$, read "$\gamma$ is the boundary of $D$."
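As a quick worked check of this claim (a supplement to the videotext): parametrize the counterclockwise unit circle as $\gamma(t) = (\cos t,\ \sin t)$, $0 \leq t \leq 2\pi$, so that $$\int_{\gamma}^{\ }{x\ dy} = \int_{0}^{2\pi}{\cos t\ (\cos t)\ dt} = \int_{0}^{2\pi}{\cos^{2}t\ \ dt} = \pi\ ,$$ the area of the unit disc. On the right-hand side of Green's Theorem, $f = 0$ and $g = x$ give the integrand $\partial g/\partial x - \partial f/\partial y = 1$, whose double integral over the disc is likewise the area.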
In the special case of a rectangular domain, the proof follows from the Fundamental Theorem of Integral Calculus, applied one variable at a time. In the special case where we are computing circulation or flux of a planar vector field $\overrightarrow{F} = F_{x}\widehat{i} + F_{y}\widehat{j}\ $, Green's Theorem takes on two subtly symmetrical forms: $$circulation\ = \ \int_{\gamma}^{\ }{F_{x}\ dx + F_{y}\ dy}\ = \ \iint_{D}^{\ }{\frac{\partial F_{y}}{\partial x} - \frac{\partial F_{x}}{\partial y}\ \ dA}$$ $$flux\ = \ \int_{\gamma}^{\ }{F_{x}\ dy - F_{y}\ dx}\ = \ \iint_{D}^{\ }{\frac{\partial F_{x}}{\partial x} + \frac{\partial F_{y}}{\partial y}\ \ dA}$$ The dual integrands on the right-hand sides are two very special types of derivatives associated to a planar vector field. GRAD, CURL, DIV. Given a scalar field $f$ on $\mathbb{R}^{n}$, we can understand its derivative in multiple ways: as the field of linear transformations $\lbrack Df\rbrack$; as the vector field $\nabla f$; or as the 1-form field $df$. What does it mean to take the derivative of a vector field or a 1-form field? Let's begin with a planar vector field $\overrightarrow{F} = F_{x}\widehat{i} + F_{y}\widehat{j}$. There are two distinct ways of differentiating such a field, each yielding a scalar field on $\mathbb{R}^{2}$. $$curl\ \overrightarrow{F} = \ \frac{\partial F_{y}}{\partial x} - \frac{\partial F_{x}}{\partial y}\ \ \ \ \ \ \ \ :\ \ \ \ \ \ div\ \overrightarrow{F}\ = \ \frac{\partial F_{x}}{\partial x} + \frac{\partial F_{y}}{\partial y}$$ These have, thanks to Green's Theorem, interpretations in terms of infinitesimal circulation \[curl\] and infinitesimal flux \[divergence\]. Rotational or spinning vector fields have a nonzero curl (with sign denoting the orientation of the rotation). A positive divergence means that a vector field is locally expansive; local contraction is indicated by a negative divergence. With this new language, Green's Theorem is a local-to-global result: net circulation along a boundary is the sum of all the infinitesimal circulations (curl) over the interior; net flux across a boundary is the sum of all infinitesimal fluxes (divergence) over the interior. Next week, we will lift Green's Theorem to 3-D. To do so, we need notions of curl and divergence for vector fields $\overrightarrow{F} = F_{x}\widehat{i} + F_{y}\widehat{j} + F_{z}\widehat{k}$. These are a bit more complex, as the curl is a vector field, while the divergence is a scalar field: $$curl\ \overrightarrow{F}\ = \ \nabla \times \overrightarrow{F}\ = \left( \frac{\partial F_{z}}{\partial y} - \frac{\partial F_{y}}{\partial z}\ \right)\widehat{i} + \left( \frac{\partial F_{x}}{\partial z} - \frac{\partial F_{z}}{\partial x}\ \right)\widehat{j} + \left( \frac{\partial F_{y}}{\partial x} - \frac{\partial F_{x}}{\partial y}\ \right)\widehat{k}\ $$ $$div\ \overrightarrow{F}\ = \ \nabla \cdot \overrightarrow{F}\ = \ \frac{\partial F_{x}}{\partial x} + \frac{\partial F_{y}}{\partial y} + \frac{\partial F_{z}}{\partial z}$$ The interpretations of div and curl in 3-D are similar to the 2-D case. Divergence measures the expansion (positive) or contraction (negative) of volumes induced by the vector field. The curl has three components, each measuring a 2-D rotation in one of the three coordinate planes, as seen from the formula. Together, the curl vector gives an axis of infinitesimal rotation (the direction of the curl vector) and a strength of rotation (the length of the curl).
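A quick worked pair, for reference: the horizontal rotation $\overrightarrow{F} = - y\widehat{i} + x\widehat{j}$ and the radial field $\overrightarrow{G} = x\widehat{i} + y\widehat{j} + z\widehat{k}$ satisfy $$\nabla \times \overrightarrow{F} = 2\widehat{k}\ \ ,\ \ \nabla \cdot \overrightarrow{F} = 0\ \ \ \ \ \ \ \ :\ \ \ \ \ \ \ \ \nabla \times \overrightarrow{G} = 0\ \ ,\ \ \nabla \cdot \overrightarrow{G} = 3\ .$$ The first is an infinitesimal rotation about the $z$-axis with no expansion; the second expands volumes uniformly with no rotation at all.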
The symbolic notation used is a holdover from Physics, where one imagines $\nabla$ as a "*vector of partial differentiation operators*", $$\nabla\ = \begin{pmatrix} \frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z} \end{pmatrix}\ ,$$ with which one can dot or cross. It is good to know the notation, though a more modern approach exists and has several advantages. DIFFERENTIAL FORMS. Why is it that we call 1-forms, *1-forms*? The prefix is a premonition of a deeper algebra of forms. For this and the next week, we will restrict attention to Euclidean $\mathbb{R}^{3}$ with a fixed basis of coordinates $(x,\ y,\ z)$, sometimes working with $(x_{1},x_{2},x_{3})$ for generality. The basis 1-forms are $dx$, $dy$, and $dz$ (or $dx_{i}$ in general): these take in a vector and return a scalar by projecting the vector to the $x$, $y$, or $z$ axis respectively. Basis 1-forms generate (linear) 1-forms at a point and (nonlinear) 1-form fields on $\mathbb{R}^{3}$. We define basis 2-forms on $\mathbb{R}^{3}$ using a *wedge product* notation: $dx \land dy$, $dy \land dz$, and $dz \land dx$. These objects eat an ordered pair of vectors and return a scalar in a manner best defined using determinants and indexed coordinates $(x_{1},x_{2},x_{3})$: $$\left( dx_{i} \land dx_{j} \right)\left( \mathbf{u},\ \mathbf{v} \right) = \det\begin{bmatrix} u_{i} & v_{i} \\ u_{j} & v_{j} \end{bmatrix}\ .$$ One thinks of the $dx_{i}$ terms as meaning "return the $x_{i}$ coordinate" and the wedge product $\land$ as "build a determinant". This definition yields a few simple algebraic rules for the wedge: - Antisymmetry : $dx_{i} \land dx_{j} = - dx_{j} \land dx_{i}$ for all $i,\ j$ ; - Nilpotency : $dx_{i} \land dx_{i} = 0$ for all $i$. Two-forms, being based on determinants, have geometric meaning: they capture *oriented projected area*. Given a pair of vectors $\mathbf{u}$ and $\mathbf{v}$, the value of $dx \land dy$ on the pair $\left( \mathbf{u},\ \mathbf{v} \right)$ is precisely the area of the parallelogram spanned by these vectors and projected to the $(x,y)$ plane, with a +/- sign depending on orientation -- which vector comes first. Basis 2-forms in $\mathbb{R}^{3}$ lead to 2-form fields that vary from point-to-point. What would this be good for? Consider the problem of computing *flux* in 3-D. Given, say, a horizontal vector field $\overrightarrow{F} = F_{x}\widehat{i}$, one can compute the flux through a small window of area; however, this is only detected by the projection of this window to the $(y,z)$ plane. As well, since inside-vs-outside is also an orientation, flux in this case is a 2-form $F_{x}dy \land dz$. For a general vector field $\overrightarrow{F}$ in 3-D, the *flux 2-form*, $\Phi_{\overrightarrow{F}} = F_{x}dy \land dz + F_{y}dz \land dx + F_{z}dx \land dy$, captures the flow of a vector field $\overrightarrow{F}$ as a function of location, intensity, and the infinitesimal area patch defined by a pair of vectors $\left( \mathbf{u},\ \mathbf{v} \right)$ at a point. We will compute flux in the next Chapter. The basis 3-form $dx \land dy \land dz$ is defined analogously: it eats an ordered triple of vectors and returns the determinant of the 3-by-3 matrix of their coordinates -- the signed volume of the parallelepiped they span. Is that all? Almost. Following the pattern we have established, any $k$-form vanishes for $k > 3$, since one of the basis 1-forms must be repeated and the resulting determinant will have identical rows. However, one can make sense of a 0-form as being an object that eats zero vectors and returns a scalar -- such an object is itself simply a scalar. A *0-form field* is a cognate of a scalar field.
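A quick worked evaluation, for reference: with $\mathbf{u} = (1,2,3)$ and $\mathbf{v} = (4,5,6)$, $$\left( dx \land dy \right)\left( \mathbf{u},\ \mathbf{v} \right) = \det\begin{bmatrix} 1 & 4 \\ 2 & 5 \end{bmatrix} = - 3\ \ \ \ :\ \ \ \ \left( dy \land dz \right)\left( \mathbf{u},\ \mathbf{v} \right) = \det\begin{bmatrix} 2 & 5 \\ 3 & 6 \end{bmatrix} = - 3\ ,$$ and swapping the order of $\mathbf{u}$ and $\mathbf{v}$ flips both signs, as antisymmetry demands.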
ALGEBRA & CALCULUS of FORMS. Our goal is to do calculus with form fields. To do so, we must first master their algebra. Any two $k$-form fields on $\mathbb{R}^{3}$ can be added or subtracted. Multiplication is subtly oriented. Based on how we defined basis 2- and 3-forms via the $\land$ symbol, we define the wedge product $\alpha \land \beta$ of two form fields $\alpha$ and $\beta$ via extension from basis forms. A $j$-form wedged with a $k$-form yields a $(j + k)$-form. One must be careful with signs, recalling that for 1-forms the wedge is antisymmetric; this is not necessarily so for forms of other degrees. Wedging with a 0-form field is the usual multiplication by a scalar field. Differentiation and integration are the missing ingredients for a calculus of forms. The implicit differentiation operator $d$ we have used throughout Calculus now ascends to a greater place: on a $k$-form field, $d$ differentiates each coefficient function and wedges the resulting 1-form onto the basis term, as in $d\left( f\ dx_{i} \land dx_{j} \right) = df \land dx_{i} \land dx_{j}$. There are a few interesting patterns which appear to tie together the gradient, curl, and divergence operators. One can check via direct computation that, for any scalar field $f$ and any vector field $\overrightarrow{F}$ on $\mathbb{R}^{3}$, $$\nabla \times \nabla f = 0\ \ \ \ \ \ \ \ and\ \ \ \ \ \ \ \nabla \cdot (\nabla \times \overrightarrow{F}) = 0\ .$$ Is this a coincidence? Rewriting in terms of form fields deepens the mystery: for any 0-form field $f$ and any 1-form field $\alpha$, $$d(df) = 0\ \ \ \ \ \ \ \ and\ \ \ \ \ \ \ d(d\alpha) = 0\ .$$ For instance, with $f = xy$: $d(df) = d(y\ dx + x\ dy) = dy \land dx + dx \land dy = 0$. This is often written in shorthand notation as $d^{2} = 0$, where the superscript denotes composition of differentiation, and the "$0$" means the vanishing form field in the appropriate dimension. Such a simplification of complicated combinations of partial derivatives points to something deeper than coincidence. ### DISCUSSION [QUESTION 1:]{.underline} What is the flux 1-form of $\overrightarrow{F} = \left( x^{2} + 4y \right)\widehat{i}\ + \ \left( x + y^{2} \right)\widehat{j}\ $? Use this to compute the flux of this vector field across the square in the plane with corners at $( - 1,2)$ and $(2,5)$. *Try doing it without Green's Theorem and then with it... Beware of orientations! Which way is easier?* [QUESTION 2:]{.underline} Compute the circulation of a fluid with velocity field $\overrightarrow{V} = (xy + y^{2})\widehat{i} + (x - y)\widehat{j}$ along the counterclockwise curve bounded by the graphs of $y = x^{2}$ and $x = y^{2}$. Solved directly, this would require two path integrals, one along each parabolic arc. [QUESTION 3:]{.underline} Integrate $\alpha = \left( \cos x + {3x}^{2}y - 2y \right)dx + \left( x^{3} + 4x - e^{2y} \right)dy$ over a counterclockwise circle in the plane centered at $(x_{0},\ y_{0})$. When a problem seems unreasonably difficult (as this would be to compute directly), look for a Big Theorem to assist. The fact that the circle is arbitrarily positioned is a hint that the right-hand side of Green's Theorem will have a simple -- in this case constant -- integrand. [QUESTION 4:]{.underline} Use Green's Theorem to show that the centroid of a region $D \subset \mathbb{R}^{2}$ is given by the path integrals $$\overline{x}\ = \ \frac{1}{2A}\int_{\partial D}^{}{x^{2}\ dy}\ \ \ \ \ \ \ :\ \ \ \ \ \overline{y}\ = \ \frac{1}{2A}\int_{\partial D}^{}{- y^{2}\ dx}\ \ $$ where $A$ is the area of $D$. Students find this challenging, depending on how it is phrased. Write out the formulae for the centroid coordinates as double integrals; then use Green's Theorem to convert the putative right-hand-side path integrals. Then, look back at the original formulation.
Why would anyone want to compute the centroid this way? Consider the problem of determining the centroid of a large complex parcel of land (say a dense forest) without satellite imagery. One could have a drone circumnavigate the boundary, keeping track of coordinates via GPS. Oh, but how can you get the area to normalize? Try to see if students can figure out how to use Green's Theorem to compute the area in a similar manner. This foreshadows Volume 4 Chapter 15 on applications to data science. [QUESTION 5:]{.underline} Draw a very complicated multiply-connected region in the plane and ask for the appropriate orientations at various points. This leads to a lot of questions about how orientation works, as well as some misery. [QUESTION 6:]{.underline} (for students curious about the previous problem) Draw a very *very* convoluted simple closed curve in the plane. Pick a point somewhere deep inside the maze and ask "Is this on the inside or the outside?" Hopefully, students will ask whether the question is well-defined. For students who can quickly navigate a maze in their head, ask what they would do if the maze-like curve were 10 or 100 times as large and complex. With a little prompting, students can figure out the algorithm of drawing a transverse curve to the outside and counting intersections mod 2. This is a good time to advertise other areas of Mathematics which curious students may wish to investigate. [QUESTION 7:]{.underline} Consider the planar vector field $\overrightarrow{F} = (ax + by)\widehat{i} + (ay - bx)\widehat{j}$, for $a$ and $b$ constants. What are the curl and divergence of this field? It is perhaps best to begin with the case where $b = 0$ or $a = 0$ and see what these special cases entail. This does not have an "aha" solution -- one explores until satisfied. As a follow-up question, what changes upon the addition of $+ \ cz\ \widehat{k}$ to the vector field (for $c$ a constant)? [QUESTION 8:]{.underline} Consider the 2-form field $\beta = \left( x^{2} - 2y \right)dy \land dz + (3y - 2z)dx \land dy$. - Find a point (not at the origin) at which $\beta$ vanishes. *(What does that mean?)* - Find a point at which $\beta$ is a positive multiple of $dx \land dy$. - Find a point at which $\beta$ is a multiple of $dx \land dz$. *(This is impossible: why?)* - Find a point at which $\beta$ equals $dy \land dz + dx \land dy$. - Find a point at which $d\beta = dx \land dy \land dz$. [QUESTION 9:]{.underline} Consider the scalar fields $f = 3x + 2y - z$ and $g = 5y - 4z$. These have constant derivative 1-forms $df,\ dg$. What does the wedge $df \land dg$ mean? This is very open-ended as worded. One can do a computation and move on; or, with more careful observation, one considers the wedge 2-form as the flux 2-form of some vector field. With a little hint, one sees the cross product emerge, and an answer to the old question from Week 2 about why the cross product only works in 3-D (but the wedge product works in all dimensions). [QUESTION 10:]{.underline} Compute the derivative of the 1-form $$\alpha_{\overrightarrow{F}} = (x - y)dx + (y - z)dy + (z - x)dz$$ The derivative is a 2-form field and, like every 2-form field, is the flux 2-form of some vector field $\overrightarrow{V}$: what is this $\overrightarrow{V}$? It is, as per the lectures, the curl of $\overrightarrow{F}$; but it helps to do the computations directly. [QUESTION 11:]{.underline} Explain what is meant by the following claim: *every basis* $k$*-form is a determinant*.
It's best to start with $k = 3$, in which case the claim is clear. Then do $dx \land dy$: this eats a pair of vectors in $\mathbb{R}^{3}$, rips out the $x$ and $y$ components, then stacks them in a 2-by-2 matrix. How many ways are there to turn a pair of vectors into such a matrix? End with the idea that the basis 1-forms are, trivially, determinants as well. [QUESTION 12:]{.underline} Follow-up to previous question: how do you interpret the wedge $\land$ in terms of determinants? This is best begun in the context of going from 1-forms to 2-forms; then 2-forms to 3-forms. Try to get students to interpret the algebraic rules for $\land$ in terms of what we've learned about determinants in Week 4. In 3-D, it's a bit trivial: a clever student may ask whether you can wedge together 2-forms on $\mathbb{R}^{4}$ to get a nonzero 4-form. [QUESTION 13:]{.underline} Why is it that there are no nonzero 4-forms on $\mathbb{R}^{3}$? Thinking in terms of determinants is a good idea, as ever. [QUESTION 14:]{.underline} How do you remember Green's Theorem? How is it related to what we have learned about 2-forms? This is an essential problem to do live, since this is intentionally left out of the videotext. Get students to work through the derivative of the 1-form $\alpha = f\ dx + g\ dy$. What is the relationship between the 2-form $dx \land dy = - dy \land dx$ and the area form $dA = dx\ dy = dy\ dx$? Get to the point of being able to write down $\int_{\partial D}^{\ }\alpha = \int_{D}^{\ }{d\alpha}$ and hint that this is very close to the climax of the story. This is an excellent setup for Week 14. ### ASSESSMENT PROBLEMS [PROBLEM 1.]{.underline} Consider the vector field $\overrightarrow{F} = \left( x^{4} - 2y \right)\widehat{i} + \left( y - y^{3} + x^{2} \right)\widehat{j}$. Use Green's Theorem to compute the circulation of $\overrightarrow{F}$ along the circle given by the equation $x^{2} + (y + 3)^{2} = 9$. Assume a counterclockwise orientation to the curve. [PROBLEM 2.]{.underline} Use Green's Theorem to compute the work done by the planar vector field $\overrightarrow{F} = \left( x^{2} - 3y \right)\widehat{i} + \left( 2x + y^{3} \right)\widehat{j}$ along the path that: > 1\) starts at the origin;\ > 2) follows the curve $y = x^{2}$ to $(1,\ 1)$;\ > 3) then follows the curve $y = \sqrt{x}$ back to the origin. [PROBLEM 3.]{.underline} A\) State a version of Green's Theorem (any version is fine, as long as you explain what the various terms are). B\) Use Green's Theorem to rewrite $$\int_{\gamma}^{\ }{e^{xy}\ dx + \cos^{2}{3y}\ dy}\ \ \ \ \ \ \ \ $$ as an explicit double integral with careful bounds, for the curve $\gamma$ as shown. Do not solve the double integral. [PROBLEM 4.]{.underline} Consider the vector field $\overrightarrow{F} = \left( x^{3} - 3y \right)\widehat{i} + \left( x^{2} - y^{3} \right)\widehat{j}$. Use Green's Theorem to compute the circulation of $\overrightarrow{F}$ along the parametrized curve given by $$\gamma(t) = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3\cos{(t^{2})} \\ 3\sin{(t^{2})} \end{pmatrix}\ \ \ :\ \ \ 0 \leq t \leq \sqrt{2\pi}$$ [PROBLEM 5.]{.underline} Use Green's Theorem to compute the integral $$\int_{\gamma}^{\ }{\left( x^{2} + y^{2} - 2y \right)dx + (2xy)dy}$$ where $\gamma$ is the (counterclockwise oriented) curve given by the boundary of the rectangle $0 \leq x \leq 2$, $- 1 \leq y \leq 3$.
[PROBLEM 6.]{.underline} Use Green's Theorem to compute the integral of the 1-form field $$\alpha = \left( x\cos x - y\left( 2 - 3x^{2} \right) \right)dx + \left( x^{3} - 3y^{4} \right)dy$$ over the closed curve $\gamma$ given by the boundary of the triangle with corners at $( - 1,\ 0)$, $(3,\ 0)$, and $(3,\ 5)$, following the points in that order. [PROBLEM 7.]{.underline} Use Green's Theorem to compute the integral of the 1-form field $\alpha = \left( \cos x + 2y \right)dx + (x - \sin y)dy$ over the curve $\gamma$ parametrized as $$\gamma(t) = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 + 3\cos{(t)} \\ 3\sin{(t)} - 2 \end{pmatrix}\ \ \ :\ \ \ 0 \leq t \leq 2\pi$$ [PROBLEM 8.]{.underline} Use Green's Theorem to compute the circulation of the planar vector field $\overrightarrow{F} = \left( x^{2}y - x^{5} \right)\widehat{i}\ + \ \left( y^{3} - xy^{2} \right)\widehat{j}$ along the (counterclockwise) circle of radius two centered at the origin. [PROBLEM 9.]{.underline} Use Green's Theorem to compute the integral of the 1-form field $\alpha = \left( e^{x} - 2y \right)dx + (x^{2} - y^{2})dy$ over the closed curve $\gamma$ given by the rectangle with corners at $( - 1, - 2)$, $(3, - 2)$, $(3,\ 1)$, and $( - 1,\ 1)$, following the points in that order. [PROBLEM 10.]{.underline} Compute the flux of the vector field $$\overrightarrow{F} = (3xy - 5)\widehat{i} + \left( x^{2} + 4y \right)\widehat{j}$$ across the boundary of the domain in the plane satisfying $0 \leq x \leq 1 - y^{2}$. Assume a counterclockwise boundary. [PROBLEM 11.]{.underline} Consider the planar vector field $$\overrightarrow{F} = x\left( 1 - x^{2} - y^{2} \right)\widehat{i} + y\left( 1 - x^{2} - y^{2} \right)\widehat{j}$$ A\) Let $\gamma$ be the path which goes from $(R,\ 0)$ to $( - R,0)$ along a semicircle of radius $R$ in the upper half-plane where $y \geq 0$. Use Green's Theorem to compute the flux of the field $\overrightarrow{F}$ across the path $\gamma$ (using the usual orientation). B\) For what value of $R > 0$ is this flux zero? [PROBLEM 12.]{.underline} Consider the vector field in 3-D $$\overrightarrow{F} = \left( 2x - y^{2} \right)\widehat{i} + \left( z + x^{2} \right)\widehat{j} + \left( xy^{2} \right)\widehat{k}$$ A\) Compute and simplify the divergence $\nabla \cdot \overrightarrow{F}$ of this vector field. B\) What does the divergence you computed in part (A) tell you? C\) Compute and simplify the curl $\nabla \times \overrightarrow{F}$ of this vector field. [PROBLEM 13.]{.underline} Consider the vector field in 3-D $$\overrightarrow{F} = (x - y)\widehat{i} + (z + x)\widehat{j} + z\ \widehat{k}$$ A\) Compute the divergence $\nabla \cdot \overrightarrow{F}$ of this vector field. B\) Compute the curl $\nabla \times \overrightarrow{F}$ of this vector field. C\) What does your answer in (B) tell you about the field? [PROBLEM 14.]{.underline} Consider the following 1-form field $$\alpha = \left( e^{x} + y^{2} \right)dx + \left( z - \sin x \right)dy - y\ dz$$ A\) Compute and simplify the derivative $d\alpha$. B\) Compute the integral of $\alpha$ over the unit circle in the $(y,\ z)$ plane. [PROBLEM 15.]{.underline} Let $\alpha$ and $\beta$ be the following form fields on $\mathbb{R}^{3}$: $$\alpha = xy\ dx - 2z^{3}dy + (x - y)^{2}dz\ \ \ \ \ \ :\ \ \ \ \beta = - 3x\ dy \land dz + \frac{1}{z}dx \land dz$$ A\) What is the value of $\beta$ at the point $(2,5, - 1)$? B\) What is the value of $\alpha$ at the point $(1,1,1)$ evaluated on the vector $v = \begin{pmatrix} 1 \\ 3 \\ - 2 \end{pmatrix}$?
C\) Compute and simplify $d\alpha$, the derivative of $\alpha$. D\) Compute and simplify the product $\alpha \land \beta$. [PROBLEM 16.]{.underline} Let $\alpha$ and $\beta$ be the following form fields on $\mathbb{R}^{3}$: $$\alpha = \ 3\ dx + z\ dy + y^{2}dz\ \ \ \ \ \ \ :\ \ \ \ \ \beta = 2\ dy \land dz$$ A\) What is the value of $\alpha$ at the point $(1,\ 2,\ 3)$? B\) Compute and simplify $d\alpha$, the derivative of $\alpha$. C\) What does $\beta$ measure? D\) Compute and simplify the product $\alpha \land \beta$. [PROBLEM 17.]{.underline} Consider the following fields on $\mathbb{R}^{3}$: $$f = xy^{2} - yz^{2}\ \ \ \ \ :\ \ \ \ \ \overrightarrow{V} = \left( x^{2} \right)\widehat{i} + (xyz)\widehat{j} + \left( z^{2} - 3xy \right)\widehat{k}$$ A\) Compute and simplify $\nabla \cdot \nabla f$, the divergence of the gradient of $f$. B\) Compute and simplify $\ \nabla(\nabla \cdot \overrightarrow{V})$, the gradient of the divergence of $\overrightarrow{V}.$ C\) Compute and simplify $\nabla \times \nabla f$, the curl of the gradient of $f$. [PROBLEM 18.]{.underline} Consider the vector field $$\overrightarrow{F} = (x + z)\widehat{i} + (2y)\widehat{j} + (y - z)\widehat{k}$$ A\) Write out carefully the work 1-form $\alpha_{\overrightarrow{F}}$ associated with the field $\overrightarrow{F}$. B\) Write out carefully the flux 2-form $\Phi_{\overrightarrow{F}}$ associated with the field $\overrightarrow{F}$. C\) Compute the derivative $d(\alpha_{\overrightarrow{F}})$ of the work 1-form $\alpha_{\overrightarrow{F}}$ associated with the field $\overrightarrow{F}$. D\) What is the relationship between $\overrightarrow{F}$ and $d\alpha_{\overrightarrow{F}}$? [PROBLEM 19.]{.underline} Consider the following fields on $\mathbb{R}^{3}$: $$g = xz - xyz^{2}\ \ \ \ \ :\ \ \ \ \ \overrightarrow{F} = \left( x^{2}y \right)\widehat{i} + \left( y^{2}z \right)\widehat{j} + \left( x^{3} + y^{3} + z^{3} \right)\widehat{k}$$ A\) Compute and simplify $\nabla \cdot \nabla g$, the divergence of the gradient of $g$, if possible. B\) Compute and simplify $\ \nabla(\nabla \cdot \overrightarrow{F})$, the gradient of the divergence of $\overrightarrow{F}$, if possible. C\) Compute and simplify $\nabla \times \nabla g$, the curl of the gradient of $g$, if possible. D\) Compute and simplify $\nabla \times g$, the curl of $g$, if possible. [PROBLEM 20.]{.underline} Consider the following 1-form fields on $\mathbb{R}^{3}$: $$\alpha = \ dx + \left( 3x^{2} - 4y \right)dy + e^{z}\ dz\ \ \ \ \ \ :\ \ \ \ \widetilde{\alpha} = 2y\ dx + dy + x\ dz$$ A\) Is there any point at which these two 1-forms are equal? B\) Compute and simplify the derivative $d\alpha$. C\) Compute and simplify the product $\widetilde{\alpha} \land d\alpha$. D\) Why is $d\alpha \land d\widetilde{\alpha} = 0$? [PROBLEM 21.]{.underline} Consider the following vector fields on $\mathbb{R}^{3}$: $\overrightarrow{F} = (x - y)\widehat{i} + z\ \widehat{j} + (z - x)\widehat{k}$ : $\overrightarrow{G} = \left( x^{2}(y - z) \right)\widehat{i} - \left( xy^{2} - z^{2} \right)\widehat{j} + \left( 3xz^{2} \right)\widehat{k}$ A\) Compute and simplify $\nabla \cdot \overrightarrow{G}$, the divergence of $\overrightarrow{G}$. B\) Compute and simplify $\ \nabla \times \overrightarrow{F}$, the curl of $\overrightarrow{F}$. C\) Compute and simplify $d\Phi_{\overrightarrow{G}}$, the derivative of the flux 2-form field of $\overrightarrow{G}$.
[PROBLEM 22.]{.underline} Consider the following scalar fields on $\mathbb{R}^{3}$: $$f = \ \ x^{2}y - y\ \ \ \ \ :\ \ \ \ g = 2xz + y^{2}$$ A\) Compute the gradient 1-form fields $df$ and $dg$. B\) Compute and simplify the product $df \land dg$ and show that it equals: $$df \land dg = \left( 4xy^{2} - 2x^{2}z + 2z \right)\ dx \land dy + (2x^{3} - 2x)\ dy \land dz - 4x^{2}y\ dz \land dx$$ C\) Find a point not at the origin at which $df \land dg = 0$. D\) The product $df \land dg$ is the flux 2-form of some vector field $\overrightarrow{V}$. What is $\overrightarrow{V}$? [PROBLEM 23.]{.underline} Consider the 1-form fields $$\alpha_{1} = (y - e^{y^{2}})dx + x^{3}\cos y\ dy\ \ \ \ \ \ :\ \ \ \ \ \alpha_{2} = - 3x^{2}\sin y\ dx\ + \ 2xye^{y^{2}}dy$$ A\) Compute the derivatives $d\alpha_{1}$ and $d\alpha_{2}$. B\) Integrate $\alpha_{1}$ over the straight-line path from $(0,0)$ to $(1,0)$. C\) Let $\gamma$ be the counterclockwise path around a circle of radius five centered at the origin. Which is bigger: $\int_{\gamma}^{}\alpha_{1}$ or $\int_{\gamma}^{}\alpha_{2}$? [PROBLEM 24.]{.underline} Let $$f = xz^{2} - y^{2}\ \ \ :\ \ \ \alpha = 3\ dx - x^{2}dy\ \ \ :\ \ \ \beta = \ x^{2}y\ dy \land dz + xy^{2}\ dz \land dx + dx \land dy$$ A\) Fill in the blanks: 1\) $f$ is a \_\_\_\_-form\ 2) $df$ is a \_\_\_\_-form\ 3) $d\beta$ is a \_\_\_\_-form\ 4) $df \land \alpha$ is a \_\_\_\_-form B\) Calculate and simplify $d\beta$ as much as possible, showing work. C\) Calculate and simplify $df \land \alpha$ as much as possible, showing all steps below. ### ANSWERS & HINTS [PROBLEM 1.]{.underline} by Green's Theorem $$\int_{\partial D}^{\ }{\left( x^{4} - 2y \right)dx + \left( y - y^{3} + x^{2} \right)dy} = \iint_{D}^{\ }{2x + 2}\ dA = 18\pi$$ [PROBLEM 2.]{.underline} by Green's Theorem $$\int_{\partial D}^{\ }{\left( x^{2} - 3y \right)dx + \left( 2x + y^{3} \right)dy} = \iint_{D}^{\ }5\ dA = \frac{5}{3}$$ [PROBLEM 3.]{.underline} by Green's Theorem $$\int_{\partial D}^{\ }{e^{xy}dx + \cos^{2}{3y}dy} = \int_{- \frac{3\pi}{4}}^{\frac{\pi}{4}}{\int_{\sin x}^{\cos x}{- x\ e^{xy}\ dy\ dx}}$$ [PROBLEM 4.]{.underline} by Green's Theorem $$\int_{\partial D}^{\ }{\left( x^{3} - 3y \right)dx + \left( x^{2} - y^{3} \right)dy} = \iint_{D}^{\ }{2x + 3}\ dA = 27\pi$$ [PROBLEM 5.]{.underline} by Green's Theorem $$\int_{\partial D}^{\ }{\left( x^{2} + y^{2} - 2y \right)dx + (2xy)dy} = \iint_{D}^{\ }2\ dA = 16$$ [PROBLEM 6.]{.underline} by Green's Theorem $$\int_{\partial D}^{\ }{\left( x\cos x - y\left( 2 - 3x^{2} \right) \right)dx + \left( x^{3} - 3y^{4} \right)dy} = \iint_{D}^{\ }2\ dA = 20$$ [PROBLEM 7.]{.underline} by Green's Theorem $$\int_{\partial D}^{\ }{\left( \cos x + 2y \right)dx + \left( x - \sin y \right)dy\ } = \iint_{D}^{\ }{- 1}\ dA = - 9\pi$$ [PROBLEM 8.]{.underline} by Green's Theorem $$\int_{\partial D}^{\ }{\left( x^{2}y - x^{5} \right)dx + \left( y^{3} - xy^{2} \right)dy\ } = \iint_{D}^{\ }{- (x^{2} + y^{2})}\ dA = - 8\pi$$ [PROBLEM 9.]{.underline} by Green's Theorem $$\int_{\partial D}^{\ }{\left( e^{x} - 2y \right)dx + \left( x^{2} - y^{2} \right)dy\ } = \iint_{D}^{\ }{2x + 2}\ dA = 48$$ [PROBLEM 10.]{.underline} by Green's Theorem $$\int_{\partial D}^{\ }\phi_{\overrightarrow{F}} = \int_{\partial D}^{\ }{(3xy - 5)dy - \left( x^{2} + 4y \right)dx\ } = \iint_{D}^{\ }{3y + 4}\ dA = 4A = \frac{16}{3}$$ [PROBLEM 11.]{.underline} complete the arc to the half-disc, using a path along the $x$-axis from $( - R,0)$ back to $(R,0)$; there is zero flux along this line since $y = 0$; by Green's Theorem
$$\int_{\partial D}^{\ }\phi_{\overrightarrow{F}} = \int_{\partial D}^{\ }{\left( x\left( 1 - r^{2} \right) \right)dy - \left( y\left( 1 - r^{2} \right) \right)dx\ } = \iint_{D}^{\ }{2 - 4r^{2}}\ dA = \pi R^{2}\left( 1 - R^{2} \right)$$ which is zero precisely when $R = 1$. [PROBLEM 12.]{.underline} A) $\nabla \cdot \overrightarrow{F} = 2$ ; C) $\nabla \times \overrightarrow{F} = \ (2xy - 1)\widehat{i} - y^{2}\widehat{j} + 2(x + y)\widehat{k}$ [PROBLEM 13.]{.underline} A) $\nabla \cdot \overrightarrow{F} = 2$ ; B) $\nabla \times \overrightarrow{F} = - \widehat{i} + 2\widehat{k} = \left( \begin{array}{r} - 1 \\ 0 \\ 2 \end{array} \right)$ constant [PROBLEM 14.]{.underline} A) $$d\alpha = \left( e^{x}\ dx + 2y\ dy \right) \land dx + \left( dz - \cos x\ dx \right) \land dy - dy \land dz = \left( - 2y - \cos x \right)dx \land dy - 2\ dy \land dz$$ B\) by Green's Theorem in the $(y,z)$ plane, $$\int_{\partial D}^{\ }{\left( e^{x} + y^{2} \right)dx + \left( z - \sin x \right)dy - y\ dz} = \iint_{D}^{\ }{- 2}\ dy\ dz = - 2\pi$$ [PROBLEM 15.]{.underline} A) $- 6\ dy \land dz - dx \land dz$ ; B) $- 5$ ; C/D) $$d\alpha = - x\ dx \land dy + \left( 6z^{2} - 2(x - y) \right)dy \land dz + 2(x - y)dx \land dz$$ $$\alpha \land \beta = \left( 2z^{2} - 3x^{2}y \right)dx \land dy \land dz$$ [PROBLEM 16.]{.underline} A) $3\ dx + 3\ dy + 4\ dz$ ; B) $d\alpha = (2y - 1)dy \land dz$ ; C) twice the oriented projected area in the $(y,z)$ plane ; D) $\alpha \land \beta = 6\ dx \land dy \land dz$ [PROBLEM 17.]{.underline} $$\nabla \cdot \nabla f = \nabla \cdot \left( y^{2}\widehat{i} + \left( 2xy - z^{2} \right)\widehat{j} - 2yz\widehat{k} \right) = 2(x - y)$$ $$\nabla \cdot \overrightarrow{V} = 2x + xz + 2z$$ $$\nabla\left( \nabla \cdot \overrightarrow{V} \right) = (2 + z)\widehat{i} + (2 + x)\widehat{k}$$ $$\nabla \times \nabla f = 0$$ [PROBLEM 18.]{.underline} $$\alpha_{\overrightarrow{F}} = (x + z)dx + (2y)dy + (y - z)dz$$ $$\Phi_{\overrightarrow{F}} = (x + z)dy \land dz + (2y)dz \land dx + (y - z)dx \land dy$$ $$d\alpha_{\overrightarrow{F}} = dy \land dz - dx \land dz = \Phi_{\nabla \times \overrightarrow{F}}$$ [PROBLEM 19.]{.underline} $$\nabla \cdot \nabla g = \nabla \cdot \left( \left( z - yz^{2} \right)\widehat{i} - xz^{2}\widehat{j} + (x - 2xyz)\widehat{k} \right) = - 2xy$$ $$\nabla\left( \nabla \cdot \overrightarrow{F} \right) = \nabla\left( 2xy + 2yz + 3z^{2} \right) = (2y)\widehat{i} + (2x + 2z)\widehat{j} + (2y + 6z)\widehat{k}$$ $$\nabla \times \nabla g = 0$$ [PROBLEM 20.]{.underline} A) $\alpha = \widetilde{\alpha}$ at $\left( 1,\ \frac{1}{2},\ 0 \right)$ ; B) $d\alpha = 6x\ dx \land dy$ ; C) $\widetilde{\alpha} \land d\alpha = 6x^{2}dx \land dy \land dz$ ; D) no nonzero 4-forms on $\mathbb{R}^{3}$ [PROBLEM 21.]{.underline} $$\nabla \cdot \overrightarrow{G} = 2x(y - z) - 2xy + 6xz$$ $$\nabla \times \overrightarrow{F} = - \widehat{i} + \widehat{j} + \widehat{k} = \left( \begin{array}{r} - 1 \\ 1 \\ 1 \end{array} \right)$$ $$d\Phi_{\overrightarrow{G}} = (\nabla \cdot \overrightarrow{G})dx \land dy \land dz = \left( 2x(y - z) - 2xy + 6xz \right)dx \land dy \land dz$$ [PROBLEM 22.]{.underline} A) $df = 2xy\ dx + (x^{2} - 1)dy$ ; $dg = 2z\ dx + 2y\ dy + 2x\ dz$ ; C) anything of the form $(0,\ C,\ 0)$ or $(1,\ 0,\ C)$ ; D) $\overrightarrow{V} = \left( 2x^{3} - 2x \right)\widehat{i} - 4x^{2}y\widehat{j} + \left( 4xy^{2} - 2x^{2}z + 2z \right)\widehat{k}\ $ [PROBLEM 23.]{.underline} A) $$d\alpha_{1} = \left( 3x^{2}\cos y + 2ye^{y^{2}} - 1 \right)\ dx \land dy\ \ \ \ :\ \ \
d\alpha_{2} = \left( 3x^{2}\cos y + 2ye^{y^{2}} \right)\ dx \land dy$$ B\) $\int_{\ }^{}\alpha_{1} = - 1$ ; C) by Green's Theorem and linearity $$\int_{\gamma = \partial D}^{}{\alpha_{2} - \alpha_{1}} = \iint_{D}^{\ }{d\alpha_{2} - d\alpha_{1}} = \iint_{D}^{\ }{dx\ dy} = 25\pi > 0$$ thus the integral of $\alpha_{2}$ is larger [PROBLEM 24.]{.underline} A) 0, 1, 3, 2; $$d\beta = 4xy\ dx \land dy \land dz$$ $$df \land \alpha = \left( - x^{2}z^{2} + 6y \right)dx \land dy + 6xz\ dz \land dx + 2x^{3}z\ dy \land dz$$ ## ## Week 14 : The Fundamental Theorem ### OUTLINE MATERIALS: Calculus BLUE : Vol 4 : Chapters 9-12 TOPICS: - Integration of 2-form fields over parametrized surfaces - Flux of a vector field across a surface - Gauss' Theorem for flux across a boundary surface - Stokes' Theorem for circulation along a boundary loop - Orientation and induced orientation on a boundary - The differential forms version of Green/Gauss/Stokes/IoP - The relationship between the Fundamental Theorems of Calculus LEARNING OBJECTIVES: - Integrate a 2-form field over a parametrized surface - Interpret the integral of a 2-form as flux across an oriented surface - Use Gauss' Theorem to simplify/compute integrals - Use Stokes' Theorem to simplify/compute integrals - Determine the induced boundary orientation of an oriented surface - Choose the correct Fundamental Theorem to solve a given integral ### PRIMER This is the end of our story. The main results of this final week begin as generalizations of Green's Theorem from 2-D to 3-D. In the same way that there are two interpretations of Green's Theorem -- work and flux -- there are two distinct generalizations to 3-D. These entwine the parallel notions of curl and divergence, while tying together all we have learned about differential forms. INTEGRATING 2-FORM FIELDS. In Week 12, we interpreted 1-form fields as measuring the work done by a vector field $\overrightarrow{F}$ in a particular direction. Given an oriented path $\gamma$, the 1-form takes in the tangent vector along the curve and returns a scalar -- the work done by $\overrightarrow{F}$ along the tangent vector to the path. Thus, 1-form fields are integrated over paths and return net work. Working with 2-form fields is similar. In 3-D, we interpret a 2-form field as the flux 2-form $\Phi_{\overrightarrow{F}}$ of a vector field $\overrightarrow{F}$. We do not compute flux across a curve in 3-D; rather, flux is computed across an oriented surface $S$. In the vector notation of Physics, one computes the flux of $\overrightarrow{F}$ across $S$ as a surface integral (*cf.* Week 11) using the surface area element $d\sigma$ and a field $\widehat{n}$ of unit vectors *normal* (that is, orthogonal) to the surface. The dot product between $\overrightarrow{F}$ and $\widehat{n}$ gives infinitesimal flux, which can then be integrated over the surface. This will be our motivation to integrate 2-form fields: $$flux = \ \iint_{S}^{\ }{\overrightarrow{F} \cdot \widehat{n}\ d\sigma}\ = \ \ \int_{S}^{\ }\Phi_{\overrightarrow{F}}$$ In practice, it can be difficult to compute both $\widehat{n}$ and $d\sigma$, and the direct integration of the flux 2-form field is often cleaner. For a parametrized surface given by $S:\mathbb{R}^{2} \rightarrow \mathbb{R}^{3}$ with parameters $s$ and $t$, the tangent plane to the surface is spanned by the two columns of $\lbrack DS\rbrack$ -- the partial derivatives $\partial S/\partial s$ and $\partial S/\partial t$.
These are the vectors which (in that order) are taken in by the flux 2-form $\Phi_{\overrightarrow{F}}$, returning an infinitesimal flux across the surface at that point. By integrating this over the parameter plane $(s,t)$ we obtain the net flux. The Change of Variables Theorem implies that only the surface matters, not the way in which it is parametrized (so long as orientations match). There is no need to cast things exclusively in terms of flux. In the same way that work serves as motivation to define the integral of a 1-form field $\alpha$ over a path $\gamma$, one can define the integral of *any* 2-form field $\beta$ over a surface parametrized by $S = S(s,t)$ by feeding the columns of the derivative $\lbrack DS\rbrack$ into $\beta$ and integrating with respect to the area element $dA = ds\ dt$ in the parameter plane: $$\int_{S}^{\ }\beta\ = \iint_{}^{}{\beta_{S}\lbrack DS\rbrack\ dA}\ = \ \iint_{s,\ t}^{\ }{\beta_{S(s,t)}\left( \frac{\partial S}{\partial s},\frac{\partial S}{\partial t} \right)}\ ds\ dt$$ This looks more complex than it is -- one simply integrates the values of $\beta$ over the parameter plane. A single integral sign is used to denote the integral of a $k$-form field, as opposed to the single, double, or triple integral used on the parameter space when evaluating the integral explicitly. THE GAUSS THEOREM. The flux form of Green's Theorem can be lifted to 3-D. The Physics/vector notation version of the theorem states that for a vector field $\overrightarrow{F}$ on $\mathbb{R}^{3}$, the flux of $\overrightarrow{F}$ across the (oriented) boundary of a solid domain $D$ is the net divergence of $\overrightarrow{F}$ within the interior of $D$: $$flux\ = \iint_{\partial D}^{\ }{\overrightarrow{F} \cdot \widehat{n}\ d\sigma}\ = \ \ \iiint_{D}^{\ }{\nabla \cdot \overrightarrow{F}}\ dV\ = \ net\ div\ .$$ This replacement of a surface integral with a triple integral is often advantageous, given the complexities we have seen with surface area elements. One tradeoff comes in keeping track of orientations. For a solid domain $D$, the boundary surface $\partial D$ has an *outward-pointing* unit normal vector field $\widehat{n}$. This can be an issue with more complicated solid bodies having tunnels or interior cavities: the normal vector should always point from the inside to the outside. When written in the language of differential forms, Gauss' Theorem becomes much easier to write and remember. Recall from last week that the derivative of a flux 2-form is the divergence of the vector field times the usual volume 3-form. The left-hand side of the Gauss Theorem becomes the integral of a flux 2-form field, and the right-hand side becomes the integral of its derivative: $$\int_{\partial D}^{}\Phi_{\overrightarrow{F}}\ \ = \ \ \int_{D}^{}{d\Phi_{\overrightarrow{F}}}\ .$$ It is not a coincidence that this is the same as the differential forms version of Green's Theorem but in 3-D with 2-forms instead of 1-forms. THE STOKES THEOREM. The circulation form of Green's Theorem can be lifted to 3-D. In Physics/vector notation, the theorem is stated thusly: $$circ\ = \int_{\partial D}^{}{\overrightarrow{F} \cdot d\mathbf{x}}\ = \ \ \iint_{D}^{}{(\nabla \times \overrightarrow{F}) \cdot \widehat{n}\ d\sigma}\ = \ flux\ of\ curl\ $$ This has a curious corollary.
If you are tasked with computing the flux of the curl of a vector field (see next week's Epilogue for why this might happen in fluid dynamics or electromagnetics) across a complicated surface, you can simplify the surface as you wish, so long as the boundary remains the same. This is similar in spirit to the Independence of Path Theorem -- integrating a derivative over the interior is the same as evaluating the primitive on the boundary, so that only the boundary matters. In the case of Stokes' Theorem, we might call this feature an "*Independence of Surface*" result. In the language of differential forms, we once again have a vast simplification: $$\int_{\partial D}^{}\alpha_{\overrightarrow{F}}\ \ = \ \ \int_{D}^{}{d\alpha_{\overrightarrow{F}}}\ .$$ Unlike Gauss' Theorem (which usually goes in one direction -- replacing a flux integral with a net divergence integral), Stokes' Theorem can be useful in either direction, to compute a circulation via a surface integral, or to compute a flux of a curl as a circulation. One must, as always, worry about the correct orientation. **THE** FUNDAMENTAL THEOREM. Written in the language of differential forms, all the major theorems of vector calculus have the same form: $$\int_{\partial D}^{}\omega\ = \ \ \int_{D}^{}{d\omega}\ .$$ This, the Generalized Stokes' Theorem, holds for any $k$-form field $\omega$ defined on an oriented $(k + 1)$-dimensional domain $D$ with oriented boundary $\partial D$. This is the end of our story and the goal to which we have worked. With the proper language of differential forms, the unity of Green, Gauss, Stokes, and the Independence of Path Theorems is manifest. They are all -- in substance and in proof -- the Fundamental Theorem of Integral Calculus: $$\ \int_{\lbrack a,b\rbrack}^{}{df} = \ \int_{\partial\lbrack a,b\rbrack}^{}f\ = \ \ f(b) - f(a).$$ By interpreting the integral of a $0$-form field $f$ over a $0$-dimensional point $x$ as the evaluation $f(x)$; and by using $+ / -$ orientation as a sign on the integral (as always), the integral of $f$ over the boundary of an interval $\lbrack a,b\rbrack$ is the evaluation at the "positive" endpoint $b$ minus the evaluation at the negative endpoint $a$. This, the FTIC, is both the prototype of the generalized Stokes' Theorem and the core ingredient of its proof. ### DISCUSSION [QUESTION 1:]{.underline} Compute the integral of the 2-form $\beta = z\ dx \land dy\ - \ x^{2}\ dy \land dz$ over the surface given by $z\ = \ 4 - x^{2} - y^{2}$ with $z \geq 0$. Use the positive $z$-axis to orient. Begin by parametrizing the surface? That is one approach and makes for a good exercise in the definitions; but consider the symmetry of this object and what the $dx \land dy$ versus $dy \land dz$ terms do on this particular surface. This can be integrated very nicely without explicit parametrization or worrying about integrating 2-forms at all. Why is the final answer related to an enclosed volume? [QUESTION 2:]{.underline} What is the integral of the 2-form field $\beta = e^{- z}\ dx \land dy$ over the paraboloid given by $z\ = \ x^{2} + y^{2}$? Use the positive $z$-axis to orient. This seems impossible, as the paraboloid extends infinitely; however, this gives a finite integral that can be easily computed either via polar coordinates or via remembering basic Gaussians.
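For reference, a sketch of that computation via the graph parametrization $S(u,v) = \left( u,\ v,\ u^{2} + v^{2} \right)$: the columns of $\lbrack DS\rbrack$ are $(1,0,2u)$ and $(0,1,2v)$, on which $dx \land dy$ evaluates to $1$, so $$\int_{S}^{\ }\beta = \iint_{\mathbb{R}^{2}}^{\ }{e^{- \left( u^{2} + v^{2} \right)}\ du\ dv} = \int_{0}^{2\pi}{\int_{0}^{\infty}{e^{- r^{2}}\ r\ dr\ d\theta}} = \pi\ .$$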
[QUESTION 3:]{.underline} What is the flux of $\overrightarrow{F} = x\ \widehat{i}\ + \ y^{2}\ \widehat{j} + (z + y)\widehat{k}$ across the boundary of the cylindrical solid within $x^{2} + y^{2} \leq \ 4$, below $z = 8$, and above $z = x$? Use an outward-pointing normal. This includes all three boundary components and is clearly set up to be a Gauss Theorem problem. A little bit of reasoning with symmetry suffices to make short work of the triple integral. [QUESTION 4:]{.underline} What is the flux of the field $\overrightarrow{F} = y^{2}\widehat{i} - x^{2}\widehat{j} + 2\widehat{k}$ across the upper hemisphere of radius $R$ at the origin? Orient it with the positive $z$-axis. This is one of the few flux problems in which computing the unit normal field and taking the dot product is perhaps helpful, though this problem can be done in multiple ways. Try doing it with the direct computation (using the surface area element from spherical coordinates if needed); then, try with Gauss' Theorem by completing to a solid hemisphere, noting that the divergence is zero, then computing the flux across the disc in the $(x,y)$ plane. Since the divergence of $\overrightarrow{F}$ is zero, we can conclude that it is a curl: $\overrightarrow{F} = \nabla \times \overrightarrow{V}$ for some $\overrightarrow{V}$. Can you figure out what that field would be? (This is perhaps not so easy...) If so, one could get the flux of $\overrightarrow{F}$ by computing the circulation of $\overrightarrow{V}$ along the circle of radius $R$ in the $(x,y)$ plane. [QUESTION 5:]{.underline} Compute the following circulation: $$\int_{\gamma}^{}{\left( e^{z} + 3y\cos x \right)dx + \left( 3\sin x \right)dy + \left( 2 + xe^{z} \right)dz}\ \ $$ where the curve $\gamma$ is parametrized as $$\ \ \gamma = \left( \begin{array}{r} \left( 5 + \cos{3t}\ \right)\cos t \\ \left( 5 + \cos{3t} \right)\sin t \\ \sin t \end{array} \right)\ \ ;\ \ 0 \leq t \leq 2\pi$$ The first step is to notice that a direct computation of this circulation is practically impossible. This suggests trying Stokes' Theorem. However, that too seems daunting, as the curve is not easily seen to be the boundary of a "nice" surface. Perhaps some trick involving the independence of surface property of Stokes? No. The lesson here is to go ahead and compute $d\alpha$ (or the curl of the associated vector field) and see that it vanishes, obviating the need for dealing with the geometry of the surface. [QUESTION 6:]{.underline} Compute the circulation of the field $e^{- x}\widehat{i} + e^{x}\widehat{j} + e^{z}\widehat{k}$ along the triangle in the first octant (where $x,\ y,\ z \geq 0$) cut out by the plane $2x + y + 2z = 2$, oriented by traversing the vertices as they go from $x$-axis to $y$-axis to $z$-axis, then back to the $x$-axis. Clearly a problem meant to suggest Stokes' Theorem, though it would be possible to do this directly, by parametrizing three straight-line segments and substituting. However, Stokes' is the superior play, as the curl of this vector field is $e^{x}\widehat{k}$ (or, perhaps better, the derivative of the work 1-form equals $e^{x}dx \land dy$). In either case, it is clear that one must compute the flux across this triangular surface. Try to reason out why it suffices to integrate over the projection of this triangle to the $(x,y)$ plane. Unlike many Green/Gauss/Stokes problems, the resulting integral is not immediately trivial to compute -- the $e^{x}$ along with the triangular domain is a good review of double integrals.
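For reference, the final double integral in Question 6: once the flux of the curl is reduced to the projected triangle with vertices $(0,0)$, $(1,0)$, and $(0,2)$ in the $(x,y)$ plane (assuming the stated orientation), $$\iint_{\ }^{\ }{e^{x}\ dA} = \int_{0}^{1}{\int_{0}^{2 - 2x}{e^{x}\ dy\ dx}} = \int_{0}^{1}{(2 - 2x)\,e^{x}\ dx} = 2e - 4\ .$$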
[QUESTION 7:]{.underline} What is the flux of the field $$\overrightarrow{F} = \left( \cos z + xy^{2} \right)\widehat{i} + \left( x\ e^{- z} \right)\widehat{j} + \left( \sin y + x^{2}z \right)\widehat{k}$$ out of the surface given by the paraboloid $z = x^{2} + y^{2}$ satisfying $z \leq 4$. This is a difficult problem since it is not set up directly for Gauss. Try computing the flux across the top disc at $z = 4$ directly (not trivial, but do use symmetry to eliminate the odd $\sin y$ term); then use Gauss to get the total flux across both boundary components. As the divergence is $x^{2} + y^{2}$, this problem uses cylindrical coordinates in an essential way and gives a good review. [QUESTION 8:]{.underline} Compute the flux of the curl of $$\overrightarrow{F}\ = \ \left( y + x\sin x^{2} \right)\widehat{i} + \ \left( x^{2} + e^{y^{2} - 5y} \right)\widehat{j} + \left( x^{2} + y^{2} \right)\widehat{k}$$ across the graph of the function $z = \cos^{3}\left( \pi\left( x^{2} + y^{2} \right)/2 \right)$ for $x^{2} + y^{2} \leq 1$, oriented by the positive $z$-axis. The phrase "flux of the curl" is a trigger for Stokes' Theorem to be used. However, in this case, computing the circulation of $\overrightarrow{F}$ along the boundary is not going to work (as students should discover the hard way, time permitting). Computing the derivative of the work 1-form of $\overrightarrow{F}$ gives $$d\alpha_{\overrightarrow{F}} = (2x - 1)dx \land dy + 2x\ dx \land dz + 2y\ dy \land dz\ .$$ This is helpful. Using the independence-of-surface property, one can integrate over the unit disc in the $(x,y)$ plane. Symmetry and area give a quick clean answer. [QUESTION 9:]{.underline} Recall Week 13 Question 4, in which Green's Theorem was used to compute centroids of a 2-D domain based on information along the boundary: $$\overline{x}\ = \ \frac{1}{2A}\int_{\partial D}^{}{x^{2}\ dy}\ \ \ \ \ \ :\ \ \ \ \overline{y}\ = \ \frac{1}{2A}\int_{\partial D}^{}{- y^{2}\ dx}\ \ \ \ \ :\ \ \ \ A = \int_{\partial D}^{}{x\ dy} = - \int_{\partial D}^{}{y\ dx}$$ Can you generalize this to a 3-D domain $D$ using its 2-D boundary $\partial D$? This is not a difficult problem: a little pattern-matching and guess-and-check suffices. The difficult parts of this are the ambiguity (there are multiple integrands that will work) and the motivation (why would one want to compute a centroid in this way). For students who think that the Gauss Theorem always goes in one direction -- from a difficult double integral to a cleaner triple integral -- this is a good problem. For motivation, see the discretized version in Volume 4 Chapter 15 and its uses in medical imaging. [QUESTION 10]{.underline}: Computing induced orientations on boundary curves can be hard to visualize: here is an exercise. Take a sphere of radius 2 centered at the origin, oriented with an outward-pointing normal. From this, remove the three unit-radius solid cylinders about the $x$, $y$, and $z$ axes. How many boundary components does the resulting surface have, and what are their orientations? This is challenging both to explain in words and to draw pictures of: see what students come up with in terms of clever ways to think about this problem. [QUESTION 11]{.underline}: What is Gauss's Theorem good for? Recall Archimedes and the old story of the tub... *Eureka!* Remember the principle that the weight of a floating body equals the weight of the fluid displaced by it. What is the buoyant force on a floating body and how does it relate to weight and volume and *forms*? 
*This requires a little bit of Physics. A fluid has particles bouncing around at random, applying pressure to a submerged surface; this pressure acts orthogonally to the surface itself. The force applied by the fluid is this pressure times the surface area element. Since the floating body does not fly off, this means that the horizontal components of the fluid force all cancel, and the vertical component equals the weight of the body. Assuming the water is of constant density, the magnitude of the pressure force in the z-direction is* $\rho z$ *times the projected area in the* $(x,y)$ *plane, where* $\rho$ *is the density. Try to lead students to the observation that the buoyant force field is really a 2-form field* $\beta = \rho z\ dx \land dy$ *(with the proper orientation). Next step: what is the net buoyant force? Ah, that's an integral over the portion of the surface that is submerged. Be sure to emphasize to students that the resulting integral gives the weight of the displaced liquid, and uniform density of the floating body is never assumed. This can be paired with a physical demo & makes a great application of the divergence theorem. What about the case where the density of the fluid varies? It is not pretty, but it is doable.* ### ASSESSMENT PROBLEMS [PROBLEM 1.]{.underline} Integrate directly the 2-form field $\beta = x\ dx \land dy + z\ dy \land dz$ over the parametrized surface $G$ given by $$G\begin{pmatrix} s \\ t \end{pmatrix} = \begin{pmatrix} s^{2} + t \\ s - t^{2} \\ s^{2} - t^{2} \end{pmatrix}\ \ \ \ \ :\ \ \ - 1 \leq s \leq 1\ \ \ \ ,\ \ \ 0 \leq t \leq 1$$ [PROBLEM 2.]{.underline} Integrate directly the 2-form field $\beta = x\ dy \land dz + y\ dz \land dx$ over the parametrized surface $S$ given by $$S\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} uv \\ u + v \\ u - v \end{pmatrix}\ \ \ \ \ :\ \ \ u^{2} + v^{2} \leq 4$$ [PROBLEM 3.]{.underline} Use the Gauss \[divergence\] theorem to compute the flux of the vector field $\overrightarrow{F} = \ \left( x^{3} - z^{5} \right)\widehat{i} + \left( y^{3} + z^{4} \right)\widehat{j} + \left( 1 + z^{3} \right)\widehat{k}$ across the full boundary of the solid hemisphere of radius $2$ opening up along the $z$-axis. Assume an outward-pointing normal and that the hemisphere boundary includes the bottom disc in the $(x,y)$ plane. [PROBLEM 4.]{.underline} Use the Gauss \[divergence\] theorem to compute the flux of the vector field $\overrightarrow{F} = \ \left( x^{3} - z^{5} \right)\widehat{i} + \left( y^{3} + z^{4} \right)\widehat{j} + \left( x^{5} - y^{3} \right)\widehat{k}$ across the boundary of the cylinder of radius 2 opening up along the $z$-axis from $z = 0$ to $z = 4.$ Assume an outward-pointing normal. [PROBLEM 5.]{.underline} Consider the 3-d cube with corners at $(0,0,0)$ and $(2,2,2)$. If I tell you that the flux of the vector field $$\overrightarrow{F} = \ \left( x^{2} - y^{2} \right)\widehat{i} + \left( x^{2} + y^{2} \right)\widehat{j} + \left( z^{2} \right)\widehat{k}$$ across the top face of the cube (where $z = 2$) is equal to $16$, then use Gauss' Theorem to compute the flux out of the four sides of the cube where $0 < z < 2$. [PROBLEM 6.]{.underline} Use the Gauss \[divergence\] theorem to compute the flux of $$\overrightarrow{V} = \left( xy^{2} \right)\widehat{i} + \left( yz^{2} + x^{3} \right)\widehat{j} + \left( x^{2}\left( z - y^{2} \right) \right)\widehat{k}$$ across a sphere of radius two centered at the origin. Assume an outward-pointing normal.
[PROBLEM 7.]{.underline} Let $S$ be the closed surface which forms the boundary of the solid domain $0 \leq z \leq 9 - x^{2} - y^{2}$. Use the Gauss \[divergence\] theorem to compute the flux of the vector field $\overrightarrow{F} = xy^{2}\ \widehat{i} + x^{2}y\ \widehat{j} + \left( z - e^{xy} \right)\widehat{k}\ $across this surface $S$, using an outward-pointing normal. [PROBLEM 8.]{.underline} Consider the surface $S$ which forms the boundary of the solid domain $x^{2} + y^{2} - 4 \leq z \leq 4 - x^{2} - y^{2}$. Compute the flux of the vector field $\overrightarrow{F} = - ye^{z}\ \widehat{i} + xe^{z}\ \widehat{j} + (x + y + z)\ \widehat{k}$ across this surface $S$, using an outward-pointing normal. [PROBLEM 9.]{.underline} Compute the flux of the vector field $$\overrightarrow{V} = \left( x - y^{2} \right)\widehat{i} + \left( z^{2} + y^{2} \right)\widehat{j} + (xy + z)\widehat{k}$$ across the cube in 3-D with opposite corner points at $( - 1, - 2, - 3)$ and $(3,\ 2,\ 1)$. Assume an outward-pointing normal. [PROBLEM 10.]{.underline} Consider the solid upper hemisphere $H$ given by $x^{2} + y^{2} + z^{2} \leq 1\ \ :\ \ z \geq 0\ .\ $ Let $S$ denote the upper hemispherical surface of $H$. Let $D$ denote the bottom of $H$: that is, the unit disc in the $(x,y)$ plane. Let $\overrightarrow{V}$ denote the vector field $\overrightarrow{V} = y^{2}\widehat{i} + z^{2}\widehat{j} + (x^{2} + y^{2})\widehat{k}$. A\) Compute the flux of $\overrightarrow{V}$ across the entire boundary of $H$; that is, across both $S$ and $D$, using the usual outward-pointing normals. B\) Compute the flux of $\overrightarrow{V}$ across the upper hemisphere $S$. Use upward-pointing normals. [PROBLEM 11.]{.underline} Use Stokes' Theorem to compute the flux of the curl of the vector field $$\overrightarrow{F} = \left( x^{2}z - y \right)\widehat{i} + \left( x - yz^{2} \right)\widehat{j} + \left( \sqrt{xyz} \right)\widehat{k}$$ across the upper hemisphere $$x^{2} + y^{2} + z^{2} = 9\ \ \ :\ \ \ \ z \geq 0$$ oriented via the positive $z$-axis. [PROBLEM 12.]{.underline} Use Stokes' Theorem to compute the flux of the curl of the vector field $$\overrightarrow{F} = \left( z^{2} - y \right)\widehat{i} + \left( x - z^{2} \right)\widehat{j} + \left( x^{2} + y^{2} + z^{2} \right)\widehat{k}$$ across the surface given by the graph of $$z = x^{2} + y^{2} - 4\ \ \ \ :\ \ \ \ z \leq 0$$ oriented via the positive $z$-axis. [PROBLEM 13.]{.underline} Use Stokes' Theorem to compute the flux of the curl of the vector field $\overrightarrow{F} = ( - zy)\widehat{i} + (zx)\widehat{j} + \left( xy\ \cos^{2}z \right)\widehat{k}\ $ across the surface $S$ given by the formula $z = \sqrt{5 - x^{2} - y^{2}}\ $ and $x^{2} + y^{2} \leq 1$, using an upward-pointing normal (along the $z$-axis). [PROBLEM 14.]{.underline} Use Stokes' Theorem to compute the flux of the curl of the vector field $\overrightarrow{F} = y\widehat{i} - x\widehat{j} + z(x^{3} - y^{3})\ \widehat{k}$ across the surface $S$ parametrized by $$S\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} u \\ v \\ 1 - u^{2} - v^{2} \end{pmatrix}\ \ \ \ \ :\ \ \ \ u^{2} + v^{2} \leq 1$$ [PROBLEM 15.]{.underline} Consider the 1-form field $$\alpha = \left( x^{2} - 1 \right)dx + (z + 1)dy + (2 - y)dz\ .$$ Consider the surface $S$ given by the portion of the unit sphere in the positive octant of $\mathbb{R}^{3}$ where $x,\ y,\ z \geq 0$. Let $\gamma$ be the closed curve given by the boundary of $S$, as shown. A\) Of which vector field is $\alpha$ the work 1-form?
B\) Compute the integral of $\alpha$ along $\gamma$, the boundary of $S$.

C\) Is $\alpha$ the gradient 1-form of some potential function $f$? *Why or why not?*

[PROBLEM 16.]{.underline} Consider the surface $S$ given by $$x^{2} + y^{2} + z^{2} = 25\ \ \ \ :\ \ \ \ z \leq 0\ \ \ \ :\ \ \ \ x^{2} + y^{2} \leq 16$$ This can be described as a circular region about the *south pole* of a sphere of radius 5 where $z \leq - 3$. Compute the flux of the curl of the vector field $$\overrightarrow{F} = - yz^{2}\ \widehat{i} + xz^{2}\ \widehat{j} + e^{- xyz}\ \widehat{k}$$ across this surface $S$, using an upward-pointing normal (along the $+ z$ axis).

[PROBLEM 17.]{.underline} Compute the flux of the curl of the vector field $$\overrightarrow{F} = \left( x^{3} - 4y \right)\widehat{i} + y^{2}\widehat{j} - z^{5}\ \widehat{k}$$ across the oriented surface given by the following parametrization: $$S\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} u \\ v \\ u^{2} - v^{2} \end{pmatrix}\ \ \ :\ \ \ u^{2} + v^{2} \leq 4$$

[PROBLEM 18.]{.underline} Compute the circulation of the vector field $$\overrightarrow{F} = (4y - xy)\widehat{i} + y^{3}\widehat{j} + \cos z\ \widehat{k}$$ along the loop given by the intersection of the cylinder $x^{2} + y^{2} = 4$ and the plane $z = x$, oriented any way you prefer.

### ANSWERS & HINTS

[PROBLEM 1.]{.underline} $$\int_{G}\beta = \int_{0}^{1}\int_{- 1}^{1}\left( s^{2} + t \right)( - 4st - 1) + \left( s^{2} - t^{2} \right)(4st - 2t)\ ds\ dt = - \frac{4}{3}$$

[PROBLEM 2.]{.underline} $$\int_{S}\beta = \iint_{u^{2} + v^{2} \leq 4}u^{2} + v^{2}\ du\ dv = \int_{0}^{2\pi}\int_{0}^{2}r^{3}\ dr\ d\theta = 8\pi$$

[PROBLEM 3.]{.underline} by Gauss's Theorem $$\int_{\partial D}\Phi_{\overrightarrow{F}} = \iiint_{D}3\rho^{2}\ dV = \int_{0}^{2\pi}\int_{0}^{\pi/2}\int_{0}^{2}3\rho^{4}\sin\phi\ d\rho\ d\phi\ d\theta = \frac{192}{5}\pi$$

[PROBLEM 4.]{.underline} by Gauss's Theorem $$\int_{\partial D}\Phi_{\overrightarrow{F}} = \iiint_{D}3r^{2}\ dV = \int_{0}^{2\pi}\int_{0}^{2}\int_{0}^{4}3r^{3}\ dz\ dr\ d\theta = 96\pi$$

[PROBLEM 5.]{.underline} by Gauss's Theorem, the flux out of the full cube boundary is $$\int_{\partial D}\Phi_{\overrightarrow{F}} = \iiint_{D}2(x + y + z)\ dV = 2\int_{0}^{2}\int_{0}^{2}\int_{0}^{2}x + y + z\ dx\ dy\ dz = 48$$ as the flux on the bottom face is zero and the top is $16$, the side flux is $32$

[PROBLEM 6.]{.underline} by Gauss's Theorem $$\int_{\partial D}\Phi_{\overrightarrow{V}} = \iiint_{D}\rho^{2}\ dV = \int_{0}^{2\pi}\int_{0}^{\pi}\int_{0}^{2}\rho^{4}\sin\phi\ d\rho\ d\phi\ d\theta = \frac{128}{5}\pi$$

[PROBLEM 7.]{.underline} by Gauss's Theorem $$\int_{\partial D}\Phi_{\overrightarrow{F}} = \iiint_{D}1 + r^{2}\ dV = \int_{0}^{2\pi}\int_{0}^{3}\int_{0}^{9 - r^{2}}r + r^{3}\ dz\ dr\ d\theta = 162\pi$$

[PROBLEM 8.]{.underline} by Gauss's Theorem $$\int_{\partial D}\Phi_{\overrightarrow{F}} = \iiint_{D}dV = \int_{0}^{2\pi}\int_{0}^{2}\int_{r^{2} - 4}^{4 - r^{2}}r\ dz\ dr\ d\theta = 16\pi$$

[PROBLEM 9.]{.underline} by Gauss's Theorem, the flux out of the full cube boundary is $$\int_{\partial D}\Phi_{\overrightarrow{V}} = \iiint_{D}2 + 2y\ dV = 2V = 128$$ since the $2y$ term integrates to zero by symmetry and the cube has volume $V = 64$

[PROBLEM 10.]{.underline} A) by Gauss's Theorem, the flux out of the full boundary is zero, since the divergence of $\overrightarrow{V}$ vanishes ; B) integrate $r^{2}$ over $D$ to get $\pi/2$
[PROBLEM 11.]{.underline} By Stokes' Theorem using the boundary circle where $z = 0$, $$\int_{D}\Phi_{\nabla \times \overrightarrow{F}} = \int_{\partial D}\alpha_{\overrightarrow{F}} = \int_{\partial D}{- y\ dx + x\ dy} = 18\pi$$

[PROBLEM 12.]{.underline} By Stokes' Theorem using the boundary circle where $z = 0$, $$\int_{D}\Phi_{\nabla \times \overrightarrow{F}} = \int_{\partial D}\alpha_{\overrightarrow{F}} = \int_{\partial D}{- y\ dx + x\ dy + \left( x^{2} + y^{2} \right)dz} = 8\pi$$

[PROBLEM 13.]{.underline} By Stokes' Theorem using the boundary circle where $z = 2$, $$\int_{S}\Phi_{\nabla \times \overrightarrow{F}} = \int_{\partial S}\alpha_{\overrightarrow{F}} = \int_{\partial S}{- 2y\ dx + 2x\ dy + \left( xy\cos^{2}2 \right)dz} = 4\pi$$ (the $dz$ term contributes nothing along this circle)

[PROBLEM 14.]{.underline} By Stokes' Theorem using the boundary circle where $z = 0$, $$\int_{S}\Phi_{\nabla \times \overrightarrow{F}} = \int_{\partial S}\alpha_{\overrightarrow{F}} = \int_{\partial S}{y\ dx - x\ dy} = - 2\pi$$

[PROBLEM 15.]{.underline} A) $\alpha$ is the work 1-form of $\overrightarrow{F} = \left( x^{2} - 1 \right)\widehat{i} + (z + 1)\widehat{j} + (2 - y)\widehat{k}$ ; B) By Stokes' Theorem $$\int_{\partial S}\alpha = \int_{S}d\alpha = \int_{S}{dz \land dy - dy \land dz} = - \frac{\pi}{2}$$ using the projected oriented area of the surface onto the quarter-disc in the $(y,z)$ plane ; C) this $\alpha$ is not a gradient since its derivative is nonzero (curl of grad is always zero)

[PROBLEM 16.]{.underline} By Stokes' Theorem using the boundary circle of radius $4$ where $z = - 3$, $$\int_{S}\Phi_{\nabla \times \overrightarrow{F}} = \int_{\partial S}\alpha_{\overrightarrow{F}} = \int_{\partial S}{- 9y\ dx + 9x\ dy + e^{3xy}dz} = 288\pi$$

[PROBLEM 17.]{.underline} Compute the curl as $4\widehat{k}$ directly or use forms to integrate $$\int_{S}\Phi_{\nabla \times \overrightarrow{F}} = \int_{S}d\left( \left( x^{3} - 4y \right)dx + y^{2}dy - z^{5}dz \right) = \int_{S}4\ dx \land dy = 16\pi$$

[PROBLEM 18.]{.underline} By Stokes' Theorem, $$\int_{\partial S}{(4y - xy)dx + y^{3}dy + \cos z\ dz} = \int_{S}{(x - 4)\ dx \land dy} = - 16\pi$$ with an upward-oriented spanning surface; the opposite orientation yields $+ 16\pi$

# EPILOGUE : BEYOND CALCULUS

### OUTLINE

MATERIALS: Calculus BLUE : Vol 4 : Chapters 13-18

TOPICS:

- Differential forms in geometric data analysis
- Differential forms in fluid dynamics
- Differential forms in electromagnetics
- Differential forms and calculus on $\mathbb{R}^{n}$
- The general Stokes Theorem on $\mathbb{R}^{n}$
- Integration by parts via differential forms
- Differential forms in time series analysis
- Mathematics beyond Calculus

LEARNING OBJECTIVES:

- *Inspiration*

### PRIMER

There's more to the Story than has been told. For those called to explore past the bounds of this course, there are several chapters of bonus material to give an idea of what differential forms are good for and how the calculus of fields and forms extends beyond 3-D.

\[BONUS\] FORMS & SPATIAL DATA. One of the best, simplest applications of Green's Theorem is in geometric data analysis in 2-D. Given a domain $D$ in the plane, one can compute its area by adding up changes in the positions of points along the boundary curve $\gamma = \partial D$: $$A = \int_{D}{dx \land dy} = \frac{1}{2}\int_{\gamma}{x\ dy - y\ dx}\ .$$ This is of considerable practical use: say, when a medical technician traces the outline of a domain on an ultrasound, or a drone traces the boundary about a hazardous interior region in the plane. In practice, the boundary curve is most likely to be discretized -- sampled along a cyclic sequence of points. How can the path integral best be approximated?
Consider breaking the boundary curve $\gamma$ into a sequence of straight-line paths $\{\gamma^{i}\}$ for $i = 1\ldots N$, each segment consistently oriented, running from start $(x_{1}^{i},\ y_{1}^{i})$ to finish $(x_{2}^{i},\ y_{2}^{i})$. By parametrizing each segment and integrating, one derives the following useful approximation to the area of $D$:

$$A \approx \sum_{i = 1}^{N}\left( \frac{1}{2}\int_{\gamma^{i}}{x\ dy - y\ dx} \right) = \frac{1}{2}\sum_{i = 1}^{N}\left( x_{1}^{i}y_{2}^{i} - y_{1}^{i}x_{2}^{i} \right)\ .$$

This is a subtle formula: each summand is half of a $2$-by-$2$ determinant -- the signed area of the triangle spanned by the origin and the segment's endpoints -- yet, summed over the closed loop, the total is independent of the choice of origin.

What about the case of 3-D data, where a point-cloud approximates the boundary $\partial D$ of a solid body $D$? Triangulate the surface with a collection of oriented triangles $T^{i}$, each with three vertices located at $(x_{j}^{i},y_{j}^{i},z_{j}^{i})$ for $j = 1\ldots 3$ and $i = 1\ldots N$. In the case of a smooth boundary $\partial D$, the theorem of Gauss states that the enclosed volume is computed via:

$$V = \int_{D}{dx \land dy \land dz} = \frac{1}{3}\int_{\partial D}{x\ dy \land dz + y\ dz \land dx + z\ dx \land dy}\ .$$

This latter surface integral can be discretized into the sums of integrals over the triangles $T^{i}$ to yield:

$$V \approx \sum_{i = 1}^{N}\left( \frac{1}{3}\int_{T^{i}}{x\ dy \land dz + y\ dz \land dx + z\ dx \land dy} \right) = \frac{1}{6}\sum_{i = 1}^{N}\left( x_{1}^{i}y_{2}^{i}z_{3}^{i} + x_{2}^{i}y_{3}^{i}z_{1}^{i} + x_{3}^{i}y_{1}^{i}z_{2}^{i} - x_{1}^{i}y_{3}^{i}z_{2}^{i} - x_{2}^{i}y_{1}^{i}z_{3}^{i} - x_{3}^{i}y_{2}^{i}z_{1}^{i} \right)\ .$$

Each summand is, as in the 2-D case, a determinant -- here, one-sixth the signed volume of the tetrahedron spanned by the origin and the triangle's vertices.

The above formulae are classical and well-known. Less familiar are analogous formulae for computing centroids and moments of inertia using boundary point data and differential forms. The videotext details an entirely novel application involving estimation of surface area on the surface of the earth (assumed spherical) using boundary point data and differential forms in spherical coordinates: the key is the following computation of surface area on a radius $R$ sphere:

$$S = \int_{D}{R^{2}\sin\phi\ d\phi \land d\theta} = \int_{\partial D}{- R^{2}\cos\phi\ d\theta}\ .$$

This follows not from Green's or Gauss's Theorem, but from Stokes's, and is just the beginning of a beautiful story of using differential forms in geometric data analysis.

\[BONUS\] FORMS & FLUIDS. A mathematical *fluid* is a field of particles that flows according to its *velocity field*, $\overrightarrow{V}$, the vector field that registers the rate of change of position of each fluid particle as a function of position and time, meaning that $\overrightarrow{V} = \overrightarrow{V}(x,y,z,t)$. The *vorticity* of the fluid, $\overrightarrow{W} = \nabla \times \overrightarrow{V}$, is the curl of the velocity and measures how rotational the fluid is at a given location and time. The nicest examples of fluids are the *perfect fluids*, which have no friction (viscosity): these satisfy the *Euler equations* of motion, written using the *material derivative* from the bonus material in Week 6. The velocity field $\overrightarrow{V}$ is that of a perfect fluid if its work 1-form and flux 2-form satisfy $$\frac{D\alpha_{\overrightarrow{V}}}{Dt} = - dh\ \ \ \ :\ \ \ d\Phi_{\overrightarrow{V}} = 0\ ,$$ for some scalar field $h$, which encodes pressure and other physical aspects of the fluid.
The latter equation means that the fluid is *incompressible* (volume-preserving or zero-divergence), and the former equation means that the time-derivative of the work 1-form is a gradient for some potential $h$. The fundamental theorems of Weeks 12-14 can be immediately put to work. The first major theorem of perfect fluids -- *Kelvin's Theorem* -- concerns the circulation of a fluid along a loop. Kelvin's Theorem states that, in a perfect fluid, the circulation $C_{\gamma}$ along a loop of particles $\gamma$ does not change over time. Remember, the particles and thus the loop are changing over time, so that:

$$\frac{d}{dt}C_{\gamma} = \frac{d}{dt}\int_{\gamma(t)}\alpha_{\overrightarrow{V}} = \int_{\gamma(t)}\frac{D\alpha_{\overrightarrow{V}}}{Dt} = \int_{\gamma(t)}{- dh} = 0\ .$$

This is a combination of the Euler equation with the Independence of Path Theorem. The second major result of mathematical fluid dynamics is *Helmholtz's Theorem*, which combines the vorticity 2-form $\omega = d\alpha_{\overrightarrow{V}}$ and Stokes' Theorem with the idea of a *vortex tube* that measures how the fluid twists: see the videotext for details.

\[BONUS\] FORMS & ELECTROMAGNETICS. Vector fields are the traditional language for electromagnetic fields and forces, all connected by the familiar operations of grad, curl, and div, thanks to *Maxwell's equations*. In 3-D, one has the following (time-varying) vector and scalar fields:

- The *electric field* $\overrightarrow{E}$
- The *magnetic field* $\overrightarrow{B}$
- The *current field* $\overrightarrow{J}$
- The *charge density* $\rho$

Maxwell's equations relate these quantities via the following formulae:

$$\nabla \times \overrightarrow{E} = - \frac{\partial\overrightarrow{B}}{\partial t}\ \ \ \ \ ;\ \ \ \ \nabla \cdot \overrightarrow{E} = \rho\ \ \ \ \ ;\ \ \ \ \nabla \times \overrightarrow{B} = \overrightarrow{J} + \frac{\partial\overrightarrow{E}}{\partial t}\ \ \ \ \ ;\ \ \ \ \nabla \cdot \overrightarrow{B} = 0\ .$$

The theorems of Gauss and Stokes then translate into statements such as:

> The flux of the electric field through a closed surface\
> equals the net charge enclosed by the surface.
>
> There is no magnetic charge : the flux of the magnetic field\
> across a closed surface is always zero.
>
> The circulation of the electric field along a loop\
> depends on the flux of how the magnetic field changes over time.

This is fine, but there is a better approach using modern terminology. If one converts the electric and magnetic data into 2-forms and the current/charge into a 3-form, a great simplification arises. Let $\mathcal{F} = \alpha_{\overrightarrow{E}} \land dt + \Phi_{\overrightarrow{B}}$ be the *Faraday form* and $\mathcal{M} = \alpha_{\overrightarrow{B}} \land dt - \Phi_{\overrightarrow{E}}$ the *Maxwell form*. These are 2-forms on the 4-dimensional space-time with coordinates $(\mathbf{x},t)$. The current-charge form is the 3-form given by $\mathcal{J} = \Phi_{\overrightarrow{J}} \land dt - \rho\ dx \land dy \land dz$. Then, Maxwell's equations can be written compactly as

$$d\mathcal{F} = 0\ \ ;\ \ d\mathcal{M} = \mathcal{J}\ .$$

The utility of this formulation (besides concision) is the connection to the geometry of spacetime, as $\mathcal{F}$ can be seen as the curvature of the electromagnetic potential. This is the beginning of a much longer story in Physics.
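To see how much the compact notation carries, expand $d\mathcal{F}$ directly -- a sketch, using only the field/form correspondences above and the splitting of the space-time derivative into a spatial part plus $dt \land \partial/\partial t$:

$$d\mathcal{F} = d\left( \alpha_{\overrightarrow{E}} \land dt \right) + d\Phi_{\overrightarrow{B}} = dt \land \left( \Phi_{\nabla \times \overrightarrow{E}} + \Phi_{\partial\overrightarrow{B}/\partial t} \right) + \left( \nabla \cdot \overrightarrow{B} \right)\ dx \land dy \land dz\ ,$$

using the fact that $\Phi \land dt = dt \land \Phi$ for any 2-form $\Phi$. Setting $d\mathcal{F} = 0$ thus captures precisely the pair $\nabla \times \overrightarrow{E} = - \partial\overrightarrow{B}/\partial t$ and $\nabla \cdot \overrightarrow{B} = 0$; the remaining pair of Maxwell's equations is, likewise, the content of $d\mathcal{M} = \mathcal{J}$.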
\[BONUS\] BEYOND 3-D. The above applications to fluids and electromagnetics both work with time-dependent fields, and the resulting differential forms live on $\mathbb{R}^{4}$. This prompts the question of how to extend the calculus of form fields from $\mathbb{R}^{3}$ to $\mathbb{R}^{n}$. Many features are a simple translation: 0-forms are scalar fields, and 1-forms are generated from the basis $dx_{1},\ dx_{2},\ldots,\ dx_{n}$. The basis $k$-forms (for any $k$) are generated from the wedge product $\land$ and a few simple rules involving determinants. A basis $k$-form $dx_{i_{1}} \land \cdots \land dx_{i_{k}}$ eats an ordered $k$-tuple of vectors $(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{k})$ in $\mathbb{R}^{n}$, stacks them into an $n$-by-$k$ matrix, selects $k$ rows (based on the $i_{1}\cdots i_{k}$ indices), and takes the determinant of the resulting $k$-by-$k$ matrix. It follows that the basis $k$-forms are linear functions of each input vector and are antisymmetric:

- switching any two $dx_{*}$ terms reverses the sign; and
- repeating any two $dx_{*}$ terms yields zero.

This implies that all $k$-forms on $\mathbb{R}^{n}$ for $k > n$ vanish. The basis $k$-forms are used to build $k$-form fields. Work is the natural motivation for $1$-form fields; flux arises as an $(n - 1)$-form field, where, for a vector field $\overrightarrow{F} = \sum_{i}{F_{i}\ {\widehat{e}}_{i}}$ on $\mathbb{R}^{n}$,

$$\Phi_{\overrightarrow{F}} = \sum_{i = 1}^{n}F_{i}\ dx_{i + 1} \land \cdots \land dx_{n} \land dx_{1} \land \cdots \land dx_{i - 1}\ .$$

Form fields can be differentiated, with $d$ taking a $k$-form field $\omega$ to a $(k + 1)$-form field $d\omega$, following the usual pattern of $d(f\varepsilon) = df \land \varepsilon$ for any basis form $\varepsilon$. The differentiation operator $d$ satisfies a product rule given by:

$$d(\alpha \land \beta) = d\alpha \land \beta + ( - 1)^{p}\alpha \land d\beta\ ,$$

where $p$ is the degree of $\alpha$; that is, $\alpha$ is a $p$-form. This asymmetry comes from the fact that for any $p$-form $\alpha$ and $q$-form $\beta$, $\alpha \land \beta = ( - 1)^{pq}\beta \land \alpha$. Derivatives of form fields satisfy the all-important lemma:

$$d^{2} = d \circ d = 0\ ,$$

generalizing the 3-D results about curl-of-grad and div-of-curl vanishing.

The Fundamental Theorem -- *Stokes' Theorem* -- uses derivatives and integrals of form fields over generalizations of curves and surfaces in $\mathbb{R}^{n}$. These generalized surfaces are called *manifolds* and are the subject of a great deal of more advanced Mathematics. For the present, the term "$k$-*dimensional domain*" will refer to such an object, a generalized surface that locally "looks like" $\mathbb{R}^{k}$ in the same way that a curve is everywhere locally a line, or a surface is locally a plane -- no kinks or singularities. Integration of form fields proceeds following the pattern in $\mathbb{R}^{3}$: 1-form fields are integrated over 1-dimensional domains; 2-form fields are integrated over 2-dimensional domains; $k$-form fields are integrated over $k$-dimensional domains. Such domains are parametrized by functions $G:\mathbb{R}^{k} \rightarrow \mathbb{R}^{n}$ of $k$ parameters $t_{1},\ \ldots,\ t_{k}$. To integrate a $k$-form field $\omega$ over $G$, one feeds the columns of the derivative $\lbrack DG\rbrack$ into $\omega$ to obtain a scalar field, which is then integrated over the parameter domain in $\mathbb{R}^{k}$. The usual Change of Variables argument reveals that the integral is independent of the parametrization, so long as the orientation is consistent.
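Here is a minimal computational sketch of this recipe (in Python with sympy; a sketch, not part of the videotext), applied to the parametrized surface and 2-form field of Problem 1 from Week 14 -- it reproduces the value $-4/3$ found there.

```python
# integrate a 2-form field over a parametrized surface G : R^2 -> R^3
# by feeding the columns of the derivative [DG] into the form,
# then integrating the resulting scalar over the parameter domain
import sympy as sp

s, t = sp.symbols('s t')

# the surface from Week 14, Problem 1, and its derivative matrix [DG]
G = sp.Matrix([s**2 + t, s - t**2, s**2 - t**2])
DG = G.jacobian([s, t])

def wedge(i, j):
    """dx_i ^ dx_j applied to the columns of [DG]: select rows i and j,
    then take the 2-by-2 determinant."""
    return DG[i, 0] * DG[j, 1] - DG[i, 1] * DG[j, 0]

x, y, z = G                                     # coordinates along the surface
integrand = x * wedge(0, 1) + z * wedge(1, 2)   # beta = x dx^dy + z dy^dz

print(sp.integrate(integrand, (s, -1, 1), (t, 0, 1)))   # -4/3
```

The same pattern works for any $k$ and $n$: a basis $k$-form selects $k$ rows of the $n$-by-$k$ matrix $\lbrack DG\rbrack$ and takes a determinant.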
With these tools in place, one can state the Fundamental Theorem:

THE GENERALIZED STOKES' THEOREM

$$\int_{\partial D}\omega\ = \ \int_{D}d\omega\ .$$

This holds for any $k$-form field $\omega$ defined on an oriented $(k + 1)$-dimensional domain $D$ with oriented $k$-dimensional boundary $\partial D$. This is the end of the Story, but not the last word. This Fundamental Theorem is as useful as it is beautiful, and it is very beautiful. The applications are not obvious and can require additional background; to that end, we close with a few elementary applications of forms in higher dimensions.

\[BONUS\] PARTS. The only integration technique we dwelt on in this course was substitution -- this was the content of the Change of Variables Theorem in Week 11. Among the other integration techniques that you have seen in the past, there is one that is directly related to Stokes' Theorem: integration by parts. As with the single-variable version, one integrates the product rule for derivatives: for $\alpha$ a $p$-form field, $\beta$ a $q$-form field, and $D$ a $(p + q + 1)$-dimensional domain with $(p + q)$-dimensional boundary,

$$\int_{\partial D}{\alpha \land \beta} = \int_{D}{d(\alpha \land \beta)} = \int_{D}{d\alpha \land \beta + ( - 1)^{p}\alpha \land d\beta}\ .$$

This has numerous applications in partial differential equations and analysis in the guise of the so-called *Green's Identities*.

\[BONUS\] GEOMETRIC OPTICS. Why would one care about form fields in higher dimensions? One can argue for 4-D calculus based on space-time, as in the case of fluids and electromagnetics; higher dimensions are not as obvious. One class of examples that is both physical and easily imagined comes from optics and the study of light rays, used in both Cosmology and Computer Graphics. Consider a ray of light -- a straight line in 3-D space, coordinatized as a $z$-axis, with an orthogonal $(x,y)$ plane that coordinatizes translations of the light ray. Changing the direction of a light ray is controlled by two angles $(\varphi,\psi)$ which rotate in the $(x,z)$ and $(y,z)$ planes respectively. The space of rays is thus 5-dimensional. Differential forms provide a convenient language for working with the geometry of light rays. One can describe a distribution of light rays using a *Hamiltonian* scalar field $H(x,y,\varphi,\psi)$ via the following 2-form field:

$$\Omega = d\varphi \land dx + d\psi \land dy + dz \land dH\ .$$

The *brightness* of a ray distribution is a 4-form field $B\ \Omega \land \Omega$ for $B$ a scalar field. The integral of this brightness 4-form over a 4-D domain that represents a lens gives the *throughput* or *étendue* of the lens. Other fundamental concepts in geometric optics overlap with Hamiltonian mechanics and dynamics, for which differential forms are foundational.

\[BONUS\] STOKES' THEOREM & TIME SERIES DATA. The following is a novel application of differential forms and Stokes' Theorem in $\mathbb{R}^{n}$ to data analysis, based on work with Yuliy Baryshnikov. Consider a collection of time-series data -- one can think of $N$ signals $x_{i}(t)$ for $i = 1\ldots N$ as functions of time $t$. Assume that the signals $x_{i}$ are measuring phenomena that are time-periodic, such as the swinging of a pendulum, the boom-bust business cycle, or any number of biological signals based on circadian rhythms. If we have several such signals with the same period, $x_{i}(t + P) = x_{i}(t)$, then one might want to know whether one is a *leading* or a *lagging* indicator -- this is especially useful in Economics and in Biology. For purely periodic functions, there are many ways to discern this order (*harmonic analysis* is one such). Those methods are, however, very sensitive to reparametrizations of the time axis. Many real-life phenomena are *cyclic* without being rigidly *periodic*. Cardiac rhythms, musculo-skeletal movements exercised during a gait, population dynamics in closed ecosystems, business cycles, neural responses, and more are examples of cyclic yet aperiodic processes.

The key insight is that each pair of (roughly) time-periodic signals $\left( x_{i}(t),\ x_{j}(t) \right)$ traces out a closed curve in the $\left( x_{i},x_{j} \right)$ plane whose oriented area reveals leading or lagging behaviors and the degree thereof. This oriented area is measured by the 2-form $dx_{i} \land dx_{j}$ in the full signal space $\mathbb{R}^{n}$. A simple application of Stokes' Theorem means that one can compute this oriented projected area by integrating over the boundary curve:

$$A_{ij} = \int_{D}{dx_{i} \land dx_{j}} = \frac{1}{2}\int_{\partial D}{x_{i}\ dx_{j} - x_{j}\ dx_{i}}\ .$$

The curve is automatically time-parametrized by $x_{i}(t)$ and $x_{j}(t)$. For temporally discretized data (such as might occur in the experimental or social sciences), an estimate based on piecewise-linear paths, as at the beginning of this Epilogue, is effective and robust with respect to nonuniformities in the sampling of points.
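To see the estimator in action, here is a tiny numerical sketch (in Python with numpy; the signals are synthetic stand-ins for real data, and the names are illustrative):

```python
# a sketch of the piecewise-linear oriented-area estimator
#   A ~ (1/2) sum ( x_1 y_2 - y_1 x_2 )
# from the start of this Epilogue, applied to the closed curve
# traced out by a pair of cyclic signals
import numpy as np

def oriented_area(x, y):
    """Estimate the oriented area enclosed by the closed curve sampled
    at the points (x[k], y[k]), with segments running k -> k+1 (mod N)."""
    x2, y2 = np.roll(x, -1), np.roll(y, -1)   # finish points of each segment
    return 0.5 * np.sum(x * y2 - y * x2)

# two synthetic cyclic signals; the second lags the first by phase phi
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
phi = 0.8
xa, xb = np.cos(t), np.cos(t - phi)

# about pi * sin(phi) > 0 : with this sign convention, xa leads xb
print(oriented_area(xa, xb))
```

Swapping the two signals flips the sign of the output -- exactly the antisymmetry recorded by the lead-lag matrix assembled below.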
One can assemble all these path integrals into an antisymmetric *lead-lag matrix* $A = \lbrack A_{ij}\rbrack$ with $A_{ji} = - A_{ij}$ that records signed lead-lag behaviors, all independent of time parametrization, thanks to Stokes's Theorem.

\[BONUS\] BEYOND CALCULUS. Differential forms point the way to a universe of mathematical subjects of current and enduring interest in research:

- In *differential geometry*, forms are the language for describing all types of curvature on geometric (Riemannian) manifolds.
- In *algebraic topology*, forms give a fundamental example of *cohomology* and Stokes' Theorem gives a precise mechanism for *Poincaré duality*.
- In *real analysis* and *differential equations*, forms and Stokes' Theorem provide an elegant generalization of *Green's Identities*.
- In *complex analysis*, forms and Green's Theorem are the basis for deep integration results, including *Cauchy's Integral Theorem*.
- In *algebraic geometry*, forms are a crucial ingredient in *Hodge Theory* and many other approaches to classifying regular and singular behavior in solutions to polynomial equations.

*There is so much more that you can learn and do\
with the background you now possess.*

⛤ ⛤ ⛤

**ABOUT THE AUTHOR**

Robert Ghrist (Ph.D., Cornell, Applied Mathematics, 1995) is the Andrea Mitchell PIK Professor of Mathematics and Electrical & Systems Engineering at the University of Pennsylvania. He is a recognized leader in the field of Applied Algebraic Topology, working in sensor networks, robotics, signal processing, data analysis, optimization, and more.
He is an award-winning researcher, teacher, and expositor of Mathematics and its applications,\
currently serving as the Associate Dean of Undergraduate Education\
in the School of Engineering & Applied Sciences\
at the University of Pennsylvania.

He is the author of several books, including\
*Elementary Applied Topology* and *Linear Algebra: Essence & Form*,\
and is the creator of multiple video-text series on YouTube, including\
*Calculus BLUE*, *Calculus GREEN*, and *Applied Dynamical Systems*.

Ghrist has been an invited speaker at two International Congresses of Mathematicians: once (Madrid, 2006) for research and once (Seoul, 2014) for education. Ghrist is a dedicated expositor and communicator of Mathematics, with teaching awards that include the MAA James Crawford Prize, Penn's Lindback Award, and the S. Reid Warren award in Engineering at Penn.

In his spare time\
he publishes mathematical art and animation\
under the moniker *colimit*\
colimit.eth.xyz\
objkt.com/profile/colimit/created