%header file for amslatex (using latex2e)
\documentclass[12pt]{article}
%Options: draft shows overfull lines, reqno-leqno puts eq numbers on right/left
%\documentclass[11pt,draft, reqno,a4paper,psamsfonts]{amsart}
\usepackage{amsmath} % preloaded by amsart above
\usepackage{amssymb}
%\usepackage{eufrak} % preloaded by amsart(?) above
%\usepackage[mathscr]{eucal}
\usepackage{pslatex}
\newcommand\eurfamily{\usefont{U}{eur}{m}{n}} % Fancy script font
\DeclareTextFontCommand{\texteur}{\eurfamily}
\usepackage{amscd} % for commutative diagrams
\usepackage[dvips]{epsfig} % to include PostScript figures
% *** CHANGE DIMENSIONS ***
\voffset=-0.3truein % LaTeX has too much space at page top
% \advance\hoffset by .375 in
% \advance\voffset by .125 in
\addtolength{\textheight}{0.3truein}
\addtolength{\textheight}{\topmargin}
\addtolength{\topmargin}{-\topmargin}
%\showthe\textheight
\textwidth 6.0in % LaTeX article default 360pt=4.98''
\oddsidemargin 0pt % \oddsidemargin .35in % default is 21.0 pt
\evensidemargin 0pt % \evensidemargin .35in % default is 59.0 pt
\mathsurround 1pt
%\parindent=20pt
% *** SWITCH BACK TO PLAIN.TeX USAGE ***
\renewcommand\smallskip{\vskip\smallskipamount}
\renewcommand\medskip{\vskip\medskipamount}
\renewcommand\bigskip{\vskip\bigskipamount}
% *** ENVIRONMENT ***
%\theoremstyle{plain}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{cor}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{prop}[theorem]{Proposition}
%\theoremstyle{definition}
\newtheorem{definition}{Definition}
\newtheorem{defn}[theorem]{Definition}
%\newcommand{\Proof}{\noindent{\sc Proof:} }
\newcounter{bean}
\newcounter{bacon}
\newcounter{butter}
% *** FONT STUFF ***
% Font Commands (and examples): Example
\newcommand{\trm}{\textrm}% text roman \trm{Rush}
\newcommand{\tbf}{\textbf}% text bold \tbf{Rush}
\newcommand{\tit}{\textit}% text italic \tit{Rush}
\newcommand{\tsl}{\textsl}% text slanted etc.
\newcommand{\ttt}{\texttt}% text typewriter
\newcommand{\tsc}{\textsc}% text small caps
\newcommand{\tsf}{\textsf}% text sans serif
\newcommand{\tup}{\textup}% text upright
\newcommand{\mrm}{\mathrm}% math roman
\newcommand{\mbf}{\mathbf}% math bold
\renewcommand{\mit}{\mathit}% math italic % undefined by amsart
%\newcommand{\mit}{\mathit}% math italic
\newcommand{\msf}{\mathsf}% math sans serif
\newcommand{\mtt}{\mathtt}% math typewriter
\newcommand{\Cal}{\mathcal} % Calligraphic - caps only
\newcommand{\BB}{\mathbb} % Blackboard Bold - caps only
% the old \frak is obsolete so reuse its name
\renewcommand{\frak}{\mathfrak} % Math Fraktur (= Gothic;
\newcommand{\BM}{\boldsymbol} % Bold math symbol Ital
\DeclareMathAlphabet{\BMi}{OT1}{cmm}{b}{it} % Bold math italic
\newcommand{\fancy}{\texteur} % Fancy script
% Examples: (assumed enclosed by $ $ )
% \Cal{A} Calligraphic A
% \BB{C} Blackbord Bold C
% \frak{g} Fraktur (= Gothic) g
% \BM{\alpha} bold Greek alpha
% \BM{\Cal{A}} bold Calligraphic A
% \fancy{\Large Nice} Large size fancy: Nice ($ $ not needed)
% \BM{R} bold math italic R
% {\mathversion{bold} u'' = f(x)} prints formula bold
% *** MACROS ***
%MATH Macros
\newcommand{\R}{\mathbb{R}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\abs}[1]{\lvert #1 \rvert} % absolute value
\newcommand{\norm}[1]{\lVert #1 \rVert} % norm
\newcommand{\ip}[2]{\langle #1,\, #2\rangle} % ip = inner product
\newcommand{\brak}[2]{[#1,\, #2]} % brak = bracket
\newcommand{\brac}[1]{\{\,#1\,\}} % brac = brace { }
\newcommand{\vect}[1]{\mathbf{#1}} % for bold face vectors
\newcommand{\medstrut}{\rule[-5pt]{0pt}{25pt}}
\newcommand{\bigstrut}{\rule[-5pt]{0pt}{25pt}}
\newcommand{\Strut}{\rule[-6pt]{0pt}{20pt}}
\newcounter{numb}
\newcommand{\Numb}{\stepcounter{numb}\arabic{numb}}
\newcommand{\Numbb}{{\arabic{numb}}{\usecounter{numb}}}
\def\bigprime{\kern-12pt\vphantom{\sum}^{'}}
%The following environment is for sets of problems.
\newenvironment{problems}{\begin{list}{\arabic{numb}.}{\usecounter{numb}
\setlength{\leftmargin}{20pt}
\setlength{\labelwidth}{15pt}
\setlength{\labelsep}{5pt}
\setlength{\itemsep}{ 15.0pt plus 2.5pt minus 1.0pt}
}
}
{\end{list}}
%The next environment is for the parts a), b), ... of a problem.
\newenvironment{parts}{\begin{list}{\alph{bean}) }{\usecounter{bean}
\setlength{\leftmargin}{20pt}
\setlength{\labelwidth}{15pt}
\setlength{\labelsep}{5pt}
\setlength{\topsep}{0pt}
\setlength{\partopsep}{0pt}
}
}
{\end{list}}
%==================== END PREAMBLE =========================================
\begin{document}
% Begin body of article here.
%\pagestyle{empty}
\parindent=0pt
\vspace*{-2cm}
{ Math 210 \hfill Jerry L. Kazdan}
\vskip 15pt
\centerline{\Large Vectors --- and an Application to Least Squares}
\medskip
\begin{quote}
\small This brief review of vectors assumes you have seen
the basic properties of vectors previously.
\end{quote}
\bigskip
We can write a point in $\R^n$ as $X = (x_1,\ldots,x_n)$. This point
is often called a {\bf vector}. Frequently it is useful to think of it
as an arrow pointing from the origin to the point. Thus, in the plane
$\R^2$, $X = (1, -2)$ can be thought of as an arrow from the origin to
the point $(1, -2)$.
\medskip
{\bf \qquad Algebraic Properties}
\smallskip
{\bf Alg-1.} {\sc addition: } If $Y = (y_1,\ldots,y_n)$, then $X+Y =
(x_1 + y_1,\ldots,x_n + y_n)$.
\smallskip
{\it Example}: In $\R^4$, $(1, 2, -2,0) + ( -1, 2, 3, 4) = (0,4,1,4)$.
\medskip
{\bf Alg-2.} {\sc multiplication by a constant: } $cX = (cx_1,\ldots,
cx_n)$.
\smallskip
{\it Example}: In $\R^4$, if $X= (1, 2, -2,0)$, then $-3X = (-3, -6, 6, 0)$.
\medskip
{\bf Alg-3.} {\sc distributive property: }$ c(X+Y) = cX + cY$. This is obvious
if one writes it out using components. For instance, in $\R^2$:
$$
c(X+Y) = c(x_1 +y_1, x_2 + y_2) = (cx_1 +cy_1, cx_2 + cy_2)
= (cx_1,cx_2) + (cy_1,cy_2) = cX + cY.
$$
\medskip
{\bf \qquad Length and Inner Product}
\smallskip
{\bf IP-1.} $\norm{X} := \sqrt{x_1^2 + \cdots + x_n^2}$ is the {\it distance}
from $X$ to the origin. We will also refer to $\norm{X}$ as the {\it
length} or {\it norm} of $X$. Similarly $\norm{X - Y}$ is the {\it
distance between $X$ and $Y$}.
Note that $\norm{X} = 0$ if and only if $X=0$, and also that for any
constant $c$ we have $\norm{cX} = \abs{c} \norm{X}$. Thus, $\norm{-2X}
= \norm{2X} = 2\norm{X}$.
\medskip
{\bf IP-2.} The {\it inner product} of vectors $X$ and $Y$ in $\R^n$
is, by definition,
\begin{equation} \label{eq:ip}
\ip{X}{Y} := x_1y_1 + x_2y_2 + \cdots + x_ny_n.
\end{equation}
This is also called the {\it dot product} and written $X \cdot Y$. The
inner product of two vectors is a number, {\it not} another vector.
In particular, we have the vital identity $\norm{X}^2 = \ip{X}{X}$
relating the inner product and norm. For added clarity, it is
sometimes useful to write the inner product in $\R^n$ as
$\ip{X}{Y}_{\R^n}$.
\smallskip
{\it Example}: In $\R^4$, if $X= (1, 2, -2,0)$ and $Y = ( -1, 2, 3,
4)$, then $\ip{X}{Y} = (1)(-1) + (2)(2) + (-2)(3) + (0)(4) = -3$.
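The arithmetic in this example is easy to check mechanically. A minimal Python sketch (the function name `ip` mirrors the notation above and is our own, not part of the text):

```python
def ip(X, Y):
    """Inner product <X, Y> = x_1*y_1 + x_2*y_2 + ... + x_n*y_n."""
    assert len(X) == len(Y)
    return sum(x * y for x, y in zip(X, Y))

X = (1, 2, -2, 0)
Y = (-1, 2, 3, 4)
print(ip(X, Y))   # (1)(-1) + (2)(2) + (-2)(3) + (0)(4) = -3
```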
\medskip
{\bf IP-3.} {\sc algebraic properties of the inner product}. The
following are obvious from the above definition of $\ip{X}{Y}$:
{\parindent 20pt \obeylines
i).\ \ $\ip{X}{X}\ge 0$,\ with\quad$\ip{X}{X}=0$ \ if (and only if) \ \ $X=0$,
ii).\ $\ip{X+Y}{W}= \ip{X}{W} + \ip{Y}{W}$,
iii).\ $\ip{cX}{Y} = c\ip{X}{Y}$,
\nobreak iv).\ $\ip{Y}{X} = \ip{X}{Y}$.
}
These four properties can be viewed as the \emph{axioms} for an
inner product of real vectors.
\medskip
{\small {\sc Remark:}
If one works with vectors $Z:= (z_1,\,z_2,\ldots,z_n)$, having
\emph{complex numbers} $z_j$ as elements, then the
definition of the inner product must be modified since, for a complex
number $z:=x+iy$ we have $\abs{z}^2=x^2+y^2 = z\overline{z}$, where
$\overline{z}:=x-iy$ is the \emph{complex conjugate} of $z$. Using this we
define the \emph{Hermitian inner product} by
\begin{equation} \label{complexip}
\ip{W}{Z} := w_1\overline{z}_1 + w_2\overline{z}_2 + \cdots +
w_n\overline{z}_n.
\end{equation}
(Note: many people put the complex conjugate on the first factor, $w_j$, instead of
on the $z_j$.) The purpose is to ensure that the fundamental property
$\norm{Z}^2 = \ip{Z}{Z} \ge 0$ still holds. Note, however, that the
symmetry property
$\ip{Y}{X} = \ip{X}{Y}$ is now {\it replaced} by
$\ip{Z}{W} = \overline{\ip{W}{Z}}$, and hence, as the following proof
shows, $\ip{W}{cZ}=\bar{c}\ip{W}{Z}$:
{\sc Proof:}
$\ \ip{W}{cZ}=\overline{\ip{cZ}{W}} = \overline{c\ip{Z}{W}}
= \overline{c}\,\overline{\ip{Z}{W}}= \overline{c}\ip{W}{Z}$.
For complex vectors or matrices one {\it always} uses a Hermitian inner
product.
}
\bigskip
{\bf IP-4.} {\sc geometric interpretation:} The definition
\eqref{eq:ip} of the inner product is easy to compute. However, it
is not at all obvious that the inner product is useful -- until one
interprets it geometrically:
\hbox{\vbox{\hangindent=-2.20in \hangafter=0
\begin{equation}\label{ipG}
\ip{X}{Y} = \norm{X}\norm{Y} \cos\theta,
\end{equation}
where $\theta$ is the angle between $X$ and $Y$. Since $\cos(-\theta) =
\cos\theta$, the sense in which we measure the angle does not matter.}
\kern +0.40in%\vskip20pt
\raise -10pt\vbox{\epsfig{figure=ip0.eps,scale=0.80}\vskip 10pt}}
To prove \eqref{ipG}, we can restrict our attention to the two
dimensional plane containing $X$ and $Y$. Thus, we need consider only
vectors in $\R^2$. Assume we are not in the trivial case where $X$ or
$Y$ are zero. Let $\alpha$ and $\beta$ be the angles that
$X=(x_1,x_2)$ and $Y = (y_1,y_2)$ make with the horizontal axis, so
$\theta = \beta -\alpha$. Then
$$
x_1 = \norm{X}\cos\alpha \qquad \text{and} \qquad x_2 = \norm{X}\sin\alpha.
$$
Similarly, $y_1 = \norm{Y}\cos\beta $ and $y_2 = \norm{Y}\sin\beta$.
Therefore
\begin{align*}
\ip{X}{Y} =& x_1y_1 +x_2y_2
= \norm{X}\norm{Y}(\cos\alpha \cos\beta + \sin\alpha\sin\beta)\\
=& \norm{X}\norm{Y} \cos (\beta - \alpha) = \norm{X}\norm{Y} \cos \theta.
\end{align*}
This is what we wanted. Alternatively, the equivalence of
\eqref{eq:ip} and \eqref{ipG} can be seen as just a restatement of the
law of cosines from trigonometry.
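The identity $\ip{X}{Y} = \norm{X}\norm{Y}\cos\theta$ can also be spot-checked numerically. A short Python sketch (helper names are ours), using $X$ at angle $0$ and $Y$ at angle $\pi/4$, so $\theta = \pi/4$:

```python
import math

def ip(U, V):
    """Inner product of two vectors."""
    return sum(u * v for u, v in zip(U, V))

def norm(V):
    """Length ||V|| = sqrt(<V, V>)."""
    return math.sqrt(ip(V, V))

X = (1.0, 0.0)        # makes angle alpha = 0 with the x-axis
Y = (1.0, 1.0)        # makes angle beta = pi/4, so theta = pi/4
theta = math.pi / 4
# <X, Y> should equal ||X|| ||Y|| cos(theta), up to rounding:
assert abs(ip(X, Y) - norm(X) * norm(Y) * math.cos(theta)) < 1e-12
```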
\medskip
{\bf IP-5.} {\sc geometric consequence:} $X$ and $Y$ are perpendicular
if and only if $\ip{X}{Y} = 0$, since this means the angle $\theta$
between them is $90$ degrees so $\cos\theta = 0$. We often use the
word {\it orthogonal} as a synonym for {\it perpendicular}.
\medskip
{\it Example}: The vectors $X=(1, 2, 4)$ and $Y=(0, -2, 1)$ are orthogonal,
since $\ip{X}{Y} = 0 -4 +4 = 0$.
\medskip
\hbox{\vbox{\hangindent=-2.1in \hangafter=-5
{\it Example}: The straight line $-x + 3y = 0$ through the origin can
be written as $\ip{N}{X} = 0$, where $N = (-1,3)$ and $X = (x,y)$ is a
point on the line. Thus we can interpret this line as being the points
perpendicular to the vector $N$.
The line $-x + 3y = 7$ is parallel to the line $-x + 3y = 0$, except
that it does not pass through the origin. This same vector $N$ is
perpendicular to it. If $X_0$ is a point on the line $\ip{N}{X} = c$,
so $\ip{N}{X_0} = c$, then we can rewrite its equation as
$\ip{N}{X-X_0} = 0$, showing analytically that $N$ is perpendicular to
$X-X_0$.
}
\kern -5.2cm\vbox{\epsfig{figure=stline.eps,scale=0.70}\vskip 1.5cm}}
\bigskip
Many formulas involving $\norm{X}$ are simplest if one rewrites them
immediately in terms of the inner product. The following example uses
this approach.
\medskip
{\it Example}: {\sc [Pythagorean theorem]} If $X$ and $Y$ are
orthogonal vectors, then the \break Pythagorean law holds:
$$
\norm{X+Y}^2 = \norm{X}^2 + \norm{Y}^2.
$$
Since $X$ and $Y$ are orthogonal, then $\ip{X}{Y} = \ip{Y}{X} = 0$,
so, as asserted
\begin{align*}
\norm{X+Y}^2 = & \ip{X+Y}{X+Y} \\
= &\, \ip{X}{X} + \ip{X}{Y} + \ip{Y}{X} + \ip{Y}{Y} \\
= &\, \norm{X}^2 + \norm{Y}^2.
\end{align*}
\medskip
{\sc Remark}: Observe that the zero vector is orthogonal to all
vectors. It is the \emph{only} such vector: if $\ip{Z}{V} = 0 $
for {\it all} vectors $V$, then $Z=0$. To prove this, note that since we
may pick any vector for $V$, in particular we may take $V=Z$. But then
$\norm{Z}^2 = \ip{Z}{Z} = 0$, so the only possibility is $Z=0$.
\medskip
{\bf IP-6.} {\sc matrices and the inner product:} If $A$ is a $k\times
n$ matrix ($k$ rows, $n$ columns so $A:\R^n\to \R^k$), we want to
compute $\ip{AX}{Y}_{\R^k}$ for vectors $X\in \R^n$ and $Y\in\R^k$ in
order to introduce the concept of the \emph{adjoint} of a matrix.
Let $e_1 = (1,0,0,\ldots,0)$, \ldots, $e_n = (0,0,\ldots,0,1)$, be the
usual standard basis vectors in $\R^n$ and $\epsilon_1 =
(1,0,0,\ldots,0)$,\ldots,$\epsilon_k:=(0,\ldots,0,1)$ be the usual
basis vectors in $\R^k$. Recall that in matrix notation, we usually
think of vectors as \emph{column vectors}. If $A =(a_{ij})$, it is
easy to see that $Ae_1$ is the first column of $A$, $Ae_2$ the second
column of $A$, and so on. For instance,
\begin{equation}\label{im-e2}
Ae_2 = \begin{pmatrix}
a_{11} & a_{12} &\ldots & a_{1n}\\
a_{21} & a_{22} &\ldots & a_{2n}\\
\vdots & \vdots &\ddots & \vdots\\
a_{k1} & a_{k2} &\ldots & a_{kn}
\end{pmatrix}
\begin{pmatrix}0\\1\\ \vdots\\ 0 \end{pmatrix}
= \begin{pmatrix}a_{12}\\a_{22}\\ \vdots\\a_{k2} \end{pmatrix}.
\end{equation}
In words, the image of $e_2$ is the second column of $A$, just as
asserted.
Using this observation it is clear that
$\ip{Ae_2}{\epsilon_1}_{\R^k} = a_{12}$. Similarly,
\begin{equation} \label{aij}
\ip{Ae_i}{\epsilon_j}_{\R^k} = a_{ji}.
\end{equation}
We use this to define the {\it adjoint} of the matrix $A$, written
$A^*$. It is defined by requiring that
\begin{equation} \label{eq:adj}
\ip{AX}{Y} = \ip{X}{A^*Y} \quad\text{or, more formally,}\quad
\ip{AX}{Y}_{\R^k} = \ip{X}{A^*Y}_{\R^n}
\end{equation}
for all vectors $X\in\R^n$ and $Y\in\R^k$.
The formula \eqref{eq:adj} looks abstract but is easy to use -- although at this stage
it is not at all evident that it is useful. For the moment, write $B=A^*$, so
\eqref{eq:adj} says $ \ip{AX}{Y} = \ip{X}{BY}$. Say the elements of
$B$ are $b_{ij}$. We would like to compute the $b_{ij}$'s in terms of
the known elements $a_{ij}$ of $A$. From \eqref{im-e2} applied to $B$,
we know that $B\epsilon_1$ is the first column of $B$. Thus
$\ip{e_2}{B\epsilon_1}=b_{21}$. But by the definition we have
$\ip{X}{BY} = \ip{AX}{Y}$ so
$$
b_{21}=\ip{e_2}{B\epsilon_1} = \ip{Ae_2}{\epsilon_1} = a_{12}.
$$
In the same way, $b_{ij} = a_{ji}$ for all $i=1,2,\ldots, n$,
$j=1,2,\ldots, k$. In other words, the first row of $B=A^*$ is simply
the first column of $A$, etc. Thus we interchange the rows and columns
of $A$ to get $A^*$. For this reason $A^*$ is often called the {\it
transpose} of $A$ and written $A^T$.
\medskip
{\sc Example}
\begin{equation}\label{trans}
\text{if}\quad A:=
\begin{pmatrix}a_{11} & a_{12} & a_{13}\\
a_{21} & a_{22} & a_{23}\end{pmatrix},
\quad\text{then}\quad A^* = A^T=
\begin{pmatrix}a_{11} & a_{21} \\
a_{12} & a_{22}\\
a_{13} & a_{23}\end{pmatrix}.
\end{equation}
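The defining property $\ip{AX}{Y} = \ip{X}{A^*Y}$ is also easy to verify numerically for a small example. A pure-Python sketch (the helpers `matvec`, `transpose`, and `ip` are ours), using an arbitrary $2\times 3$ matrix:

```python
def matvec(A, X):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def transpose(A):
    """A* = A^T: interchange rows and columns."""
    return [list(col) for col in zip(*A)]

def ip(X, Y):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(X, Y))

A = [[1, 2, 3],
     [4, 5, 6]]       # A : R^3 -> R^2
X = [1, -1, 2]        # X in R^3
Y = [3, 7]            # Y in R^2
# The defining property <AX, Y>_{R^2} = <X, A^T Y>_{R^3}:
assert ip(matvec(A, X), Y) == ip(X, matvec(transpose(A), Y))
```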
\smallskip
A square matrix $A$ is called {\it self adjoint} or {\it symmetric} if
$A = A^*$. It is called {\it skew-adjoint} or {\it anti-symmetric} if
$A = -A^*$. An obvious property is that $A^{**} = (A^*)^* = A$.
As an example, let's obtain the property $(AB)^* = B^*A^*$. We begin
using the definition \eqref{eq:adj} applied to $AB$:
\begin{equation} \label{(AB)*}
\ip{(AB)^*X}{Y} = \ip{X}{(AB)Y}.
\end{equation}
But $(AB)Y = A(BY)$ so
\begin{equation} \label{(AB)*2}
\ip{X}{(AB)Y} =\ip{X}{A(BY)} = \ip{A^*X}{BY} = \ip{B^*(A^*X)}{Y}
= \ip{(B^*A^*)X}{Y}.
\end{equation}
Comparing \eqref{(AB)*} and \eqref{(AB)*2} we find that $(AB)^* =B^*A^*$.
\medskip
One consequence is that $A^*A$ is a symmetric matrix, even if $A$ is
not a square matrix, because $(A^*A)^*
= A^*A^{**} = A^*A$. In particular $A^*A$ is a square matrix.
Similarly $AA^*$ is a symmetric matrix. For many applications it is
useful to notice that $\ip{A^*AX}{X} = \ip{AX}{AX} = \norm{AX}^2 \ge
0$ for {\it all} $X$.
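Both facts, $(AB)^* = B^*A^*$ and the symmetry of $A^*A$, can be spot-checked in a few lines. A pure-Python sketch (matrix helpers are ours; for real matrices the adjoint is the transpose):

```python
def transpose(M):
    """M^T: interchange rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    """Matrix product MN."""
    return [[sum(m * n for m, n in zip(row, col)) for col in zip(*N)]
            for row in M]

A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2, not square
B = [[1, -1], [2, 0]]          # 2 x 2
# (AB)^* = B^* A^*:
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
# A^* A is square and symmetric even though A is not square:
AtA = matmul(transpose(A), A)
assert AtA == transpose(AtA)
```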
\medskip
{\sc Remark:} If, as is usual, we think of a vector
$X:=\begin{pmatrix} x_1\\ \vdots \\ x_n\end{pmatrix}$ as a column vector,
then we can treat it as an $n\times 1$ matrix and observe that the inner
product is $\ip{X}{Y}= X^TY$, which is often useful. Also
$\ip{X}{AY}=X^TAY$ so computing inner products is now under the umbrella
of matrix multiplication. This observation is quite valuable in
computations.
\bigskip
{\bf \qquad Derivatives of Vectors}
\smallskip
{\bf D-1.} If $X(t) = (x_1(t),\ldots,x_n(t))$ describes a curve in
$\R^n$, then its {\it derivative} is
$$
X'(t) = \frac{dX(t)}{dt} = (x_1'(t),\ldots,x_n'(t)).
$$
One can think of this as the {\it velocity vector}. It is tangent to
the curve.
\smallskip
{\it Example}: If $X(t) = (2\cos t, 2\sin t)$, then this curve is a
circle of radius $2$, traversed counterclockwise. Its velocity is
$X'(t) = (-2\sin t, 2\cos t)$ and its {\it speed} $\norm{X'(t)} = 2$.
For instance, $X'(0) = (0,2)$ is the tangent vector at $X(0) = (2,0)$.
The curve $Y(t) =(2\cos 3t, 2\sin 3t)$ also describes the motion of a
particle around a circle of radius $2$, but in this case the speed is
$\norm{Y'(t)} = 6$.
\medskip
{\bf D-2.} {\sc derivative of the inner product:} If $X(t)$ and $Y(t)$
are two curves, then
\begin{equation}\label{eq:derivip}
\frac{d}{dt} \ip{X(t)}{Y(t)}= \ip{\frac{dX(t)}{dt}}{Y(t)}
+\ip{X(t)}{\frac{dY(t)}{dt}},
\end{equation}
or, more briefly, $\ip{X}{Y}' = \ip{X'}{Y} + \ip{X}{Y'}$.
To prove this one simply uses the rule for the derivative of a product
of functions. Thus
\begin{align*}
\frac{d}{dt} \ip{X(t)}{Y(t)} &= \frac{d}{dt}(x_1y_1 + x_2y_2 + \cdots) \\
&= (x_1'y_1 + x_1y_1') + (x_2'y_2 + x_2y_2') + \cdots \\
&= (x_1'y_1 + x_2'y_2 + \cdots) + (x_1y_1' + x_2y_2' + \cdots ) \\
&= \ip{X'}{Y} + \ip{X}{Y'}.
\end{align*}
\smallskip
{\it Example}:
\begin{equation} \label{deriv_norm}
\frac{d}{dt}\norm{X(t)}^2 = \frac{d}{dt}\ip{X(t)}{X(t)} = 2\ip{X(t)}{X'(t)}.
\end{equation}
\smallskip
As a special case, if a particle moves at a constant distance $c$ from
the origin, $\norm{X(t)} = c$, then $0 = dc^2/dt = d\norm{X(t)}^2/dt =
2\ip{X(t)}{X'(t)}$. In particular, if a particle moves on a circle or
a sphere, then the position vector $X(t)$ is always perpendicular to
the velocity $X'(t)$. This also shows that the tangent to a circle,
$X'(t)$, is perpendicular to the radius vector, $X(t)$.
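The circle example can be checked mechanically. A short Python sketch (function names ours): for $X(t) = (2\cos t, 2\sin t)$ we verify $\ip{X(t)}{X'(t)} = 0$ and constant speed $2$ at a few sample times.

```python
import math

def X(t):
    """Position on the circle of radius 2."""
    return (2 * math.cos(t), 2 * math.sin(t))

def Xdot(t):
    """Velocity X'(t)."""
    return (-2 * math.sin(t), 2 * math.cos(t))

def ip(U, V):
    """Inner product of two vectors."""
    return sum(u * v for u, v in zip(U, V))

for t in (0.0, 0.7, 2.5):
    # Position is perpendicular to velocity on the circle:
    assert abs(ip(X(t), Xdot(t))) < 1e-12
    # Constant speed ||X'(t)|| = 2:
    assert abs(math.sqrt(ip(Xdot(t), Xdot(t))) - 2.0) < 1e-12
```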
\medbreak
{\bf \qquad Orthogonal Projections}
\smallskip
{\bf Proj-1.} {\sc Orthogonal projection onto a line:}
Let $X$ and $Y$ be given vectors. We would like to write $Y$ in the
form $Y = cX + V$, where $V$ is perpendicular to $X$. Then the vector
$cX$ is the {\bf orthogonal projection} of $Y$ onto the line determined by the
vector $X$.
\vskip 12pt
\hbox{\vbox{\hangindent=-2.0in \hangafter=1
How can we find the constant $c$ and the vector $V$? We use the only
fact we know: that $V$ is supposed to be perpendicular to $X$. Thus we take
the inner product of $Y = cX + V$ with $X$ and conclude that
$\langle X,\,Y \rangle = c\langle X,\,X\rangle$, that is
$$
c = \frac{\ip{X}{Y}}{\norm{X}^2}.
$$}
\kern -1.8in\vbox{\epsfig{figure=ip1.eps,scale=0.80}\vskip 10pt}}
Now that we know $c$, we can simply define $V$ by the obvious formula
$V = Y -cX$.
At first this may seem circular. To convince yourself that this works, let
$X = (1,1)$, and $Y= (2,3)$. Then compute $c$ and $V$ and draw a sketch
showing $X,\,Y,\,cX$, and $V$.
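For the suggested check with $X=(1,1)$ and $Y=(2,3)$, a short Python sketch (names ours) computes $c$ and $V$ and confirms that $V$ is perpendicular to $X$:

```python
def ip(U, V):
    """Inner product of two vectors."""
    return sum(u * v for u, v in zip(U, V))

X = (1.0, 1.0)
Y = (2.0, 3.0)
c = ip(X, Y) / ip(X, X)                       # c = <X,Y>/||X||^2 = 5/2
V = tuple(y - c * x for x, y in zip(X, Y))    # V = Y - cX = (-1/2, 1/2)
assert abs(ip(V, X)) < 1e-12                  # V is perpendicular to X
```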
Since $cX \perp V$, we can use the Pythagorean Theorem to conclude that
$$
\|Y\|^2 = c^2\|X\|^2 + \|V\|^2 \geq c^2\|X\|^2.
$$
From this, using the explicit value of $c$ found above, we conclude that
$$
\|Y\|^2 \geq \left(\frac{\ip{X}{Y}}{\norm{X}^2}\right)^2\norm{X}^2
$$
and obtain the {\it Schwarz inequality\/}
\begin{equation}\label{schwarz}
|\langle X,\, Y\rangle| \leq \|X\|\|Y\|.
\end{equation}
Notice that this was done without trigonometry. It used only the
properties of the inner product.
\medskip
{\bf Proj-2.} {\sc Orthogonal projection into a subspace}.
If a linear space has an inner product and $S$ is a subspace of it, we
can discuss the orthogonal projection of a vector into that subspace.
Given a vector $Y$, if we can write
$$
Y = U + V,
$$
where $U$ is in $S$ and $V$ is perpendicular to $S$, then we call $U$
the projection of $Y$ into $S$ and $V$ the projection of $Y$
perpendicular to $S$. The notation $U=P_SY,\ V=P_S^\perp Y$ is
frequently used for this projection $U$.
\begin{center}
{\vbox{\epsfig{figure=project2b.eps,scale=0.70}}}
\end{center}
By the Pythagorean theorem
$$
\|Y\|^2 = \|U\|^2 + \|V\|^2, \qquad (U =P_SY,\ V=P_S^\perp Y).
$$
It is easy to show that {\em the projection $P_SY$
is at least as close to $Y$ as any other point in $S$}. In other words,
$$
\|Y - P_SY\| \leq \|Y - X\| \qquad \hbox{\rm for all $X$ in $S$}.
$$
To see this, given any $X \in S$ write $Y - X = (Y-P_SY) + (P_SY -X)$
and observe that $Y-P_SY$ is perpendicular to $S$ while $P_SY$ and
$X$, and hence $P_SY -X$ are in $S$. Thus by the Pythagorean Theorem
$$
\|Y - X\|^2 = \|Y - P_SY\|^2 + \|P_SY -X\|^2 \geq \|Y - P_SY\|^2.
$$
This is what we asserted.
\vskip 25pt
\bigbreak
\begin{center}
{\large \bf Problems on Vectors}
\end{center}
\begin{problems}
\item
\begin{parts}
\item For which values of the constants $a$ and $b$ are the vectors
$U = (1+a, -2b, 4)$ and $V = (2, 1, -1)$ perpendicular?
\item For which values of the constants $a$ and $b$ is the above vector
$U$ perpendicular to both $V$ and the vector $W = (1, 1, 0)$?
\end{parts}
\item Let $X = (3, 4, 0)$ and $Y = (1,-,1)$.
\begin{parts}
\item Write the vector $Y$ in the form $Y = cX + V$, where $V$ is
orthogonal to $X$. Thus, you need to find the constant $c$ and the
vector $V$.
\item Compute $\norm{X}$, $\norm{Y}$, and $\norm{V}$ and verify the
Pythagorean relation
$$
\norm{Y}^2 = \norm{cX}^2 + \norm{V}^2.
$$
\end{parts}
\item {\sc [converse of the Pythagorean theorem]} If $X$ and $Y$ are real
vectors with the property that the Pythagorean law holds: $\norm{X}^2
+ \norm{Y}^2 = \norm{X+Y}^2$, then $X$ and $Y$ are orthogonal.
\item If a vector $X$ is written as $X = aU + bV$, where
$U$ and $V$ are non-zero orthogonal vectors, show that $a =
\ip{X}{U}/\norm{U}^2$ and $b = \ip{X}{V}/\norm{V}^2$.
\item The origin and the vectors $X$, $Y$, and $X+Y$ define
a parallelogram whose diagonals have lengths $\norm{X+Y}$ and $\norm{X-Y}$. Prove
the {\it parallelogram law}
$$
\norm{X+Y}^2 + \norm{X-Y}^2 = 2\norm{X}^2 + 2\norm{Y}^2.
$$
This states that in a parallelogram, the sum of the squares of the
lengths of the diagonals equals the sum of the squares of the four
sides.
\item
\begin{parts}
\item Find the distance from the point $(2,\,-1)$ to the straight line
$3x -4y = 0$.
\item Find the distance from the straight line $3x -4y = 10$ to the
origin.
\item Find the distance from the straight line $ax + by = c$ to the
origin.
\item Find the distance between the parallel lines $ax + by = c$ and
$ax + by = \gamma$.
\item Find the distance from the plane $ax +by +cz = d$ to the origin.
\end{parts}
\item The equation of a straight line in $\R^3$ can be written as $X(t)
= X_0 + tV$, $-\infty < t < \infty$, where $X_0$ is a point on the line
and $V$ is a vector along the line (in a physical setting, $V$ might be
the {\it velocity} vector).
\begin{parts}
\item Find the distance from this line to the origin.
\item If $Y(s) = Y_0 +sW$, $-\infty < s < \infty$, is another straight
line, find the distance between these straight lines.
\end{parts}
\item Let $P_1$, $P_2$, \ldots, $P_k$ be points in $\R^n$. For
$X\in\R^n$ let
$$
Q(X):= \norm{X-P_1}^2 + \norm{X-P_2}^2 + \cdots + \norm{X-P_k}^2.
$$
Determine the point $X$ that minimizes $Q(X)$.
\item
\begin{parts}
\item If $X$ and $Y$ are real vectors, show that
$$
\ip{X}{Y} =\frac{1}{4}\left(\norm{X+Y}^2 - \norm{X-Y}^2\right).
$$
This formula is the simplest way to recover properties of the inner
product from the norm.
\item As an application, show that if a square matrix $R$ has the property
that it preserves length, so $\norm{RX} = \norm{X}$ for every vector
$X$, then it preserves the inner product, that is, $\ip{RX}{RY} =
\ip{X}{Y}$ for all vectors $X$ and $Y$.
\end{parts}
\item If one uses the complex inner product \eqref{complexip},
show that the elements of $A^*$ are the conjugate transpose, $A^* =
(\overline{a}_{\ell k})$, of the elements of $A = (a_{k \ell})$.
\item
\begin{parts}
\item If a certain matrix $C$ satisfies $\ip{X}{CY} = 0$ for {\it all}
vectors $X$ and $Y$, show that $C=0$.
\item If the matrices $A$ and $B$ satisfy $\ip{X}{AY}= \ip{X}{BY}$ for
all vectors $X$ and $Y$, show that $A=B$.
\end{parts}
\item
\begin{parts}
\item Give an example of a $3 \times 3$ anti-symmetric matrix.
\item If $A$ is any anti-symmetric matrix, show that $\ip{X}{AX} = 0$
for all vectors $X$.
\end{parts}
\item Say $X(t)$ is a solution of the differential equation
$\dfrac{dX}{dt} = AX$, where $A$ is an {\it anti-symmetric} matrix.
Show that $\norm{X(t)} = \text{constant}$.
\end{problems}
\vskip20pt
{\bf \qquad Application to the Method of Least Squares}
\smallskip
{\sc The Problem}. Say you have done an experiment and obtained the data
points $(-1,1)$, $(0, -1)$, $(1,-1)$, and $(2,3)$. Based on some
other evidence you believe this data should fit a curve of the form $y
= a + bx^2$. If you substitute your data $(x_j,y_j)$ into this
equation you find
\begin{align} \label{lsexamp1}
a + b(-1)^2 &= \phantom{-}1 \notag \\
a + b(0)^2\phantom{-} &= -1\\
a + b(1)^2\phantom{-} &= -1 \notag\\
a + b(2)^2\phantom{-} &= \phantom{-}3 \notag
\end{align}
This system of equations is \emph{overdetermined} since there are
more equations (four) than unknowns (two: $a$ and $b$).
As is the case with almost all overdetermined systems,
it is unlikely they can be solved exactly.
We rewrite these equations in the matrix form $AV = W$, where
$$
A = \begin{pmatrix}
1 & 1 \\
1 & 0 \\
1 & 1 \\
1 & 4
\end{pmatrix}, \quad
V = \begin{pmatrix}
a \\
b
\end{pmatrix},
\quad \text{and} \quad
W = \begin{pmatrix}
\phantom{-}1 \\
-1 \\
-1 \\
\phantom{-}3
\end{pmatrix}.
$$
We refer to $A$ as the {\it data matrix} and $W$ as the {\it
observation vector}.
\smallskip
Instead of the probably hopeless task of solving $AV = W$ exactly, we
seek a vector $V$ that minimizes the error (actually, the square of the
error):
$$
Q(V) := \norm{AV - W}^2.
$$
If we are fortunate and find an exact solution of $AV = W$, so much the
better since then $Q(V) = 0$. We will find this error minimizing
solution in two different ways, one using calculus, another using
projections.
\medbreak
{\bf Summary.} The general problem we are facing is:
\begin{description}
\item [\qquad Given:] A data matrix $A$ and an observation vector $W$,
\item [\qquad To find:] The ``best solution'' of $AV = W$. For us,
``best'' means minimizing the error $Q(V)= \norm{AV - W}^2$.
\end{description}
\bigbreak
{\sc Solution Using Calculus.}
One approach is to use calculus to find the minimum by taking the first
derivative and setting it to zero. We will do this here only using
calculus of one variable (so we won't use partial derivatives, although
using these gives an entirely equivalent approach).
\smallskip
Say $V$ (this is what we want to compute) gives the minimum, so $Q(X)
\ge Q(V)$ for all $X$. We pick an arbitrary vector $Z$ and use the
special family of vectors $X(t) = V+tZ$. Let
$$
f(t) := Q(X(t)) = \norm{AX(t) - W}^2.
$$
Since $Q(X(t)) \ge Q(V) = Q(X(0))$ we know that $f(t) \ge f(0)$ so $f$
has its minimum at $t=0$. Thus $f'(0)=0$. We compute this. From
\eqref{deriv_norm}
$$
f'(t) = 2\ip{AX(t) - W}{AX'(t)} = 2\ip{AX(t) - W}{AZ}.
$$
In particular,
$$
0 = f'(0) = 2\ip{AV - W}{AZ}.
$$
We use \eqref{eq:adj} to rewrite this as $\ip{A^*(AV - W)}{Z} = 0$
(historically, this was one of the first places where the adjoint of a
matrix was used). But
now since $Z$ can be {\it any} vector, by the {\sc Remark} at the end
of property {\bf IP-5} above, we see that the desired $V$ must satisfy
$$
A^*(AV - W) = 0,
$$
that is,
\begin{equation} \label{normal_eq}
\boxed{A^*AV = A^*W}.
\end{equation}
These are the desired equations to compute $V$. As observed above,
the matrix $A^*A$ is always a square matrix. The fundamental equation
\eqref{normal_eq} is called the {\it normal equation}.
\medskip
{\it Example}: We apply this idea to \eqref{lsexamp1}. Since
$$
A^* = \begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 0 & 1 & 4
\end{pmatrix},
$$
then
$$
A^*A =\begin{pmatrix}
4 & 6 \\
6 & 18
\end{pmatrix} \qquad \text{and} \qquad
A^*W =\begin{pmatrix}
2 \\
12
\end{pmatrix}.
$$
The normal equations $A^*AV = A^*W$ are then
\begin{align*}
4a + \phantom{1}6b =& 2 \\
6a +18b =& 12.
\end{align*}
Their solution is $a=-1$, $b=1$. Thus the desired curve $y = a+bx^2$
that best fits your data points is $y = -1 +x^2$.
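The whole computation for this example can be reproduced in a few lines of Python. This is a sketch, not part of the handout: the $2\times 2$ normal equations are solved by Cramer's rule, and all variable names are ours.

```python
# Least-squares fit of y = a + b x^2 to the data (-1,1), (0,-1), (1,-1), (2,3).
# The data matrix A has rows (1, x_j^2); solve the normal equations
# A^T A V = A^T W.
xs = [-1, 0, 1, 2]
ws = [1, -1, -1, 3]
A = [[1, x * x] for x in xs]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    return [[sum(m * n for m, n in zip(row, col)) for col in zip(*N)]
            for row in M]

At = transpose(A)
AtA = matmul(At, A)                                        # [[4, 6], [6, 18]]
AtW = [sum(r * w for r, w in zip(row, ws)) for row in At]  # [2, 12]

# Solve the 2x2 system A^T A (a, b) = A^T W by Cramer's rule.
(p, q), (r, s) = AtA
det = p * s - q * r
a = (AtW[0] * s - q * AtW[1]) / det
b = (p * AtW[1] - AtW[0] * r) / det
print(a, b)   # -1.0 1.0, so the best-fit curve is y = -1 + x^2
```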
\bigbreak
\noindent {\sc Solution Using Projections.}
As above, given a matrix $A$ and a vector $W$ we want $V$ that
minimizes the error:
$$
Q(V) = \|AV - W\|^2.
$$
Thus, we want to pick $V$ so that the vector $U:=AV$ is as close as
possible to $W$. Notice that $U$ must be in the image of $A$. From the
discussion of projections (see {\bf Proj-2} above), we want to let $U$
be the orthogonal projection of $W$ into the image of $A$.
\medskip
How can we compute this? Notice that $AV - W$ will then be
perpendicular to the image of $A$. In other words, $AV - W$ will be
perpendicular to all vectors of the form $AZ$ for any vector $Z$.
Thus by \eqref{eq:adj} above
$$
0 = \ip{AZ}{AV - W} = \ip{Z}{A^*(AV - W)}.
$$
But now since the right side holds for {\it all\/} vectors $Z$ we can
apply the {\sc Remark} at the end of {\bf IP-5} above to conclude that
\begin{equation} \label{eq:normal}
A^*AV = A^*W.
\end{equation}
These again are the {\bf normal equations\/} for $V$ and are what we
sought. Of course they are identical to those obtained above
using calculus. Although this may seem abstract, it is easy to compute
this explicitly.
\bigskip
{\it Example}: Here is a standard example using the normal equations.
Say we are given $n$ experimental data points $(x_1,\,y_1),\ (x_2,\,
y_2),\ldots, (x_n,\,y_n)$ and want to find the straight line $y = a +
bx$ that fits this data best. How should we proceed? Ideally we want
to pick the coefficients $a$ and $b$ so that
\begin{eqnarray*}
a + bx_1 & = & y_1 \\
a + bx_2 & = & y_2 \\
\vdots & & \\
a + bx_n & = & y_n.
\end{eqnarray*}
These are $n$ equations for the two unknowns $a,\,b$. If $n > 2$ it
is unlikely that we can solve them exactly. We write the above
equations in matrix notation as $AV = Y$, that is,
$$
AV = \left(\begin{array}{cc}
1 & x_1 \\
1 & x_2 \\
\vdots & \vdots \\
1 & x_n
\end{array}\right)
\left(\begin{array}{c}
a \\
b
\end{array}\right) =\left(\begin{array}{c}
y_1 \\
y_2 \\
\vdots \\
y_n
\end{array}\right) = Y.
$$
Next we want the normal equations $A^*AV = A^*Y$. Now
$$
A^*A = \left(\begin{array}{cccc}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_n
\end{array}\right)
\left(\begin{array}{cc}
1 & x_1 \\
1 & x_2 \\
\vdots & \vdots \\
1 & x_n
\end{array}\right)
= \left(\begin{array}{cc}
n & \ \sum x_j \\
\sum x_j & \ \sum x_j^2
\end{array}\right).
$$
The computation of $A^*Y$ is equally straightforward so the normal equations
are two equations in two unknowns:
\begin{equation}\label{best-line}
\left(\begin{array}{cc}
n & \ \sum x_j \\
\sum x_j & \ \sum x_j^2
\end{array}\right)
\left(\begin{array}{c}
a \\
b
\end{array} \right) =
\left(\begin{array}{c}
\sum y_j \\
\sum x_jy_j
\end{array}\right).
\end{equation}
These can be solved using high school algebra. The solution is:
\begin{equation} \label{eq:ls-linear}
y - \overline{y} = m(x - \overline{x}),
\end{equation}
where
$${\overline{x} = \frac{1}{n}\sum_{1 \le j \le n} x_j, \qquad
\overline{y} = \frac{1}{n}\sum_{1 \le j \le n} y_j}, \quad \text{and} \quad
m = \frac{\sum (x_j-\overline{x})(y_j-\overline{y})}{\sum (x_j-\overline{x})^2}.
$$
Notice that the straight line \eqref{eq:ls-linear} passes through
$(\overline{x},\overline{y})$. The equations \eqref{best-line} are particularly
simple to solve if $\overline{x}=0$ and $\overline{y}=0$. The general case is
reduced to this special case by the natural substitution
\ $\hat{x}_j=x_j-\overline{x}, \quad \hat{y}_j=y_j-\overline{y}$. I used this to
get \eqref{eq:ls-linear}.
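As a numerical check on the normal equations \eqref{best-line}, the following
is a minimal sketch (Python; the data points are made up for illustration, not
taken from the text) that solves the $2\times 2$ system by Cramer's rule and
verifies that the fitted line passes through $(\overline{x},\overline{y})$:

```python
# A minimal sketch with made-up data: solve the 2x2 normal equations
# for the best-fit line y = a + b x by Cramer's rule.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 1.9, 3.2, 3.8]
n = len(xs)

Sx  = sum(xs)
Sxx = sum(x * x for x in xs)
Sy  = sum(ys)
Sxy = sum(x * y for x, y in zip(xs, ys))

# Normal equations:  [n   Sx ] [a]   [Sy ]
#                    [Sx  Sxx] [b] = [Sxy]
det = n * Sxx - Sx * Sx
a = (Sy * Sxx - Sx * Sxy) / det
b = (n * Sxy - Sx * Sy) / det

# Sanity check: the fitted line passes through the point of averages.
xbar, ybar = Sx / n, Sy / n
assert abs((a + b * xbar) - ybar) < 1e-12
```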
\smallskip
In these and related computations it is useful to introduce the data as
vectors:
$$
x = (x_1,x_2,\ldots,x_n) \quad \text{and} \quad
y = (y_1,y_2,\ldots,y_n)
$$
and, in occasionally confusing notation, identify the average
$\overline{x}$ with the vector $\overline{x} = (\overline{x},\ldots,\overline{x})$ having
$n$ equal components $\overline{x}$. We also use the ``data inner product'' and
``data norm''
$$
\ll \!x,y\!\gg = x_1y_1+x_2y_2 + \cdots + x_ny_n \qquad
\vrule height8pt width 1.0pt \,x\, \vrule height8pt width 1.0pt{\,}^2 = \ll
\!x,x\!\gg.
$$
In statistics, $\ll \!x-\overline{x},y-\overline{y}\!\gg $ is called the
{\it covariance of $x$ and $y$} and is written $\text{Cov}\,(x,y)$. Using this
notation the slope of the above line is $m = \ll
\!x-\overline{x},y-\overline{y}\!\gg\! /\, \vrule height8pt width 1.0pt
\,x-\overline{x}\, \vrule height8pt width 1.0pt{\,}^2$. Of special
importance is the {\it correlation coefficient}
$$
r(x,y) = \frac{\ll \!x-\overline x,y -\overline y\!\gg}
{\vrule height8pt width 1.0pt \,x-\overline x\, \vrule height8pt width 1.0pt \,
\vrule height8pt width 1.0pt \,y-\overline y\, \vrule height8pt width 1.0pt}.
$$
This measures how closely the data points $(x_j,\,y_j)$ fit the
straight line. The Schwarz inequality asserts that $\abs{r(x,y)} \le
1$. If $r(x,y) = +1$ the data lies along a straight line with positive
slope, while if $r(x,y) = -1$ the data lies along a straight line with
negative slope. If $r(x,y) = 0$ the data forms a cloud and does not
really seem to lie along any straight line. See most statistics books
for a more adequate discussion along with useful examples.
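As a small illustration, the covariance, the slope $m$, and the correlation
coefficient can be computed directly from their definitions. This sketch is
in Python with made-up data points (not data from the text):

```python
# A sketch with made-up data: the data inner product, Cov(x, y),
# the slope m, and the correlation coefficient r(x, y).
from math import sqrt

xs = [-2.0, -1.0, 0.0, 2.0]
ys = [4.0, 3.0, 1.0, 0.0]
n = len(xs)

xbar, ybar = sum(xs) / n, sum(ys) / n
dx = [x - xbar for x in xs]
dy = [y - ybar for y in ys]

cov = sum(u * v for u, v in zip(dx, dy))   # << x - xbar, y - ybar >>
m = cov / sum(u * u for u in dx)           # slope of least-squares line
r = cov / (sqrt(sum(u * u for u in dx)) * sqrt(sum(v * v for v in dy)))
assert -1.0 <= r <= 1.0                    # Schwarz inequality
```

Here $r \approx -0.96$, consistent with data that lies close to a line of
negative slope.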
\medskip
Identical methods can be used to find, for instance, the cubic
polynomial $y = a + bx + cx^2 + dx^3$ that best fits some data, or the
plane $z = a + bx + cy$ that best fits given data. The technique of
least squares is widely used in all areas where one has experimental
data. The key feature is that the equations be {\it linear} in the
unknown coefficients $a$, $b$, etc. However, even if the equations
are not linear in the unknown coefficients $a$, $b$, etc., frequently
one can find an equivalent problem to which the techniques apply. The
following example illustrates this.
\medskip
{\it Example}: Say we are given $n$ experimental data points
$(x_1,\,y_1)$, $(x_2,\, y_2)$, \ldots, $(x_n,\,y_n)$ and seek an
exponential curve $y = ae^{bx}$ that best fits this data. Ideally we
want to pick the coefficients $a$ and $b$ so that
\begin{eqnarray*}
ae^{bx_1} & = & y_1 \\
ae^{bx_2} & = & y_2 \\
\cdots & & \\
ae^{bx_n} & = & y_n.
\end{eqnarray*}
These are $n$ equations for the two unknowns $a,\,b$. However, they
are nonlinear in $b$ so the method of least squares does not directly
apply. To get around this we take the (natural) logarithm of each of
these equations and obtain
\begin{eqnarray*}
\alpha + bx_1 & = & \ln y_1 \\
\alpha + bx_2 & = & \ln y_2 \\
\cdots & & \\
\alpha + bx_n & = & \ln y_n,
\end{eqnarray*}
where $\alpha = \ln a$. These modified equations are {\it linear} in
the unknowns $\alpha$ and $b$, so we can apply the method of least
squares. After we know $\alpha$, we can recover $a$ simply from
$a=e^\alpha$.
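A quick numerical sketch of this log-linearization (Python; the data here is
synthetic, generated exactly from $y = 2e^{0.5x}$, so the fit recovers the
parameters exactly):

```python
# A sketch with synthetic exact data: fit y = a e^{bx} by working with
# ln y, so the unknowns alpha = ln a and b appear linearly.
from math import exp, log

a_true, b_true = 2.0, 0.5
xs = [0.0, 1.0, 2.0, 3.0]
ys = [a_true * exp(b_true * x) for x in xs]
n = len(xs)

# Least-squares line  ln y = alpha + b x  via the normal equations.
ly = [log(y) for y in ys]
Sx, Sxx = sum(xs), sum(x * x for x in xs)
Sl, Sxl = sum(ly), sum(x * l for x, l in zip(xs, ly))
det = n * Sxx - Sx * Sx
alpha = (Sl * Sxx - Sx * Sxl) / det
b = (n * Sxl - Sx * Sl) / det
a = exp(alpha)                 # recover a from alpha = ln a
assert abs(a - a_true) < 1e-9 and abs(b - b_true) < 1e-9
```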
\smallskip
{\sc Remark.} Say one wants to fit data to the related curve $y =
ae^{bx} + c$. I don't know any way to do this using least squares,
where one eventually solves a linear system of equations (the normal
equations). For this problem it seems that one must solve a {\it
nonlinear} system of equations, which is much more difficult.
\bigskip
{\it Example}: This is similar to the previous example. Say we are
given $n$ experimental data points $(x_1,\,y_1)$, $(x_2,\, y_2)$,
\ldots, $(x_n,\,y_n)$ and seek a curve of the form $y = \dfrac{ax}{1 + bx^2}$
that best fits this data. Ideally we want to pick the coefficients
$a$ and $b$ so that
\begin{eqnarray*}
\frac{ax_1}{1+bx_1^2} & = & y_1 \\
\frac{ax_2}{1+bx_2^2} & = & y_2 \\
\cdots & & \\
\frac{ax_n}{1+bx_n^2} & = & y_n.
\end{eqnarray*}
These are $n$ equations for the two unknowns $a,\,b$. However, they
are nonlinear in $b$ so the method of least squares does not apply
directly. To get around this we rewrite the curve as $y(1+bx^2) = ax$,
that is, $ax -bx^2y = y$. This equation is now {\it linear} in the
unknown coefficients $a$ and $b$. We want to pick these to solve the
equations
\begin{eqnarray*}
ax_1 -bx_1^2y_1 & = & y_1 \\
ax_2 -bx_2^2y_2 & = & y_2 \\
\cdots & & \\
ax_n -bx_n^2y_n & = & y_n
\end{eqnarray*}
with the least error. These are linear equations of the form $AV = W$,
where the data matrix is
$$
A = \begin{pmatrix} x_1 & -x_1^2y_1 \\
x_2 & -x_2^2y_2 \\
\cdots & \cdots \\
x_n & -x_n^2y_n
\end{pmatrix}
$$
so we solve the normal equations $A^*AV = A^*W$ as before.
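A numerical sketch of this linearization (Python; the data is synthetic,
generated exactly from $y = 3x/(1 + 0.5x^2)$, so least squares recovers the
parameters):

```python
# A sketch with synthetic exact data: fit y = ax/(1 + b x^2) via the
# linearized equations a x_j - b x_j^2 y_j = y_j and the normal equations.
a_true, b_true = 3.0, 0.5
xs = [0.5, 1.0, 1.5, 2.0]
ys = [a_true * x / (1.0 + b_true * x * x) for x in xs]

# Columns of the data matrix A, with right-hand side W = y.
c1 = xs
c2 = [-x * x * y for x, y in zip(xs, ys)]

# Normal equations A*A V = A*W for V = (a, b), solved by Cramer's rule.
S11 = sum(u * u for u in c1)
S12 = sum(u * v for u, v in zip(c1, c2))
S22 = sum(v * v for v in c2)
W1 = sum(u * y for u, y in zip(c1, ys))
W2 = sum(v * y for v, y in zip(c2, ys))
det = S11 * S22 - S12 * S12
a = (W1 * S22 - S12 * W2) / det
b = (S11 * W2 - S12 * W1) / det
assert abs(a - a_true) < 1e-8 and abs(b - b_true) < 1e-8
```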
\vskip 25pt
\bigbreak
\begin{center}
{\large \bf Problems Using Least Squares}
\end{center}
\begin{problems}
\item Use the Method of Least Squares to find the straight line $y = ax +b$
that best fits the following data given by the following four points
$(x_j,y_j)$, $j = 1,\ldots,4$:
$$
(-2,4), \qquad (-1,3), \qquad (0,1), \qquad (2,0).
$$
Ideally, you'd like to pick the coefficients $a$ and $b$ so that the
four equations $ax_j + b = y_j$, $j=1,\ldots,4$ are all satisfied.
Since this probably can't be done, one uses least squares to find the
best possible $a$ and $b$.
\medskip
\item Find a curve of the form $y = a + bx + cx^2$ that best fits the
following data
\begin{center}
\begin{tabular}{|l ||c|c|c|c|c|c|c|}\hline
$\Strut x$ & -2 & -1 & 0 & 1 & 2 & 3 & 4\\ \hline
$\Strut y$ & 4 & 1.1 & -0.5 & 1.0 & 4.3 & 8.1 & 17.5\\ \hline
\end{tabular}
\end{center}
\medskip
\item Find a plane of the form $z = ax + by + c$ that best fits the
following data
\begin{center}
\begin{tabular}{|l ||c|c|c|c|c|c|c|}\hline
$\Strut x$ & 0 & 1 & 0 & 1 & 0 \\ \hline
$\Strut y$ & 0 & 1 & 1 & 0 & -1 \\ \hline
$\Strut z$ & 1.1 & 2 & -0.1 & 3 & 2.2 \\ \hline
\end{tabular}
\end{center}
\medskip
\item The water level in the North Sea is mainly determined by the
so-called M2 tide, whose period is about 12 hours. The height $H(t)$
thus roughly has the form
$$
H(t) = c + a \sin (2 \pi t/12) + b \cos(2 \pi t/12),
$$
where time $t$ is measured in hours (note $\sin (2 \pi t/12)$ and $\cos
(2 \pi t/12)$ are periodic with period $12$ hours). Say one has the
following measurements:
\smallskip
\begin{center}
\begin{tabular}{|l |c|c|c|c|c|c|}\hline
$\Strut\quad t$ \ (hours) \quad & 0 & 2 & 4 & 6 & 8 & 10 \\ \hline
$\Strut H(t)$ \ (meters) & 1.0 & 1.6 & 1.4 & 0.6 & 0.2 & 0.8 \\ \hline
\end{tabular}
\end{center}
\smallskip
Use the method of least squares with these measurements to find the
constants $a$, $b$, and $c$ in $H(t)$ for this data.
\medskip
\item a). Some experimental data $(x_i,\,y_i)$ is believed to fit a curve
of the form
$$
y = \frac{1+x}{a + bx^2},
$$
where the parameters $a$ and $b$ are to be determined from the data.
The method of least squares does not apply directly to this since the
parameters $a$ and $b$ do not appear linearly. Show how to find a
modified equation to which the method of least squares does apply.
\smallskip
b). Repeat part a) for the curve $\displaystyle y = \frac{1}{a + bx}$.
\smallskip
c). Repeat part a) for the curve $\displaystyle y = \frac{x}{a + bx}$.
\smallskip
d). Repeat part a) for the curve $y =ax^b$.
\smallskip
e). Repeat part a) for the {\it logistic curve}
$\displaystyle y = \frac{L}{1 + e^{a- bx}}$. Here the constant $L$ is
assumed to be known. {\small [If $b > 0$, then $y$ converges to $L$ as
$x$ increases. Thus the value of $L$ can often be estimated simply by
eye-balling a plot of the data for large $x$.]}
\smallskip
f). Repeat part a) for the curve $y =1 -e^{-ax^b}$.
\smallskip
g). Repeat part a) for the curve $\displaystyle y = \frac{a + mx}{b + x}$
assuming the constant $m$ is known. [One might find $m$ from the data
since $y$ tends to $m$ for $x$ large.]
\smallskip
h). Repeat part a) for the curve $\displaystyle y = \frac{a}{1 + b
\sin x}$.
\medskip
\item The comet Tentax, discovered only in 1968, moves within the solar
system. The following are observations of its position $(r, \theta)$
in a polar coordinate system with center at the sun:
\smallskip
\begin{center}
\begin{tabular}{|l |c|c|c|c|c|}\hline
$\Strut \quad r \quad $ & 2.70 & 2.00 & 1.61 & 1.20 & 1.02 \\ \hline
$\Strut\quad \theta\quad $ & 48 & 67 & 83 & 108 & 126\\ \hline
\end{tabular}
\end{center}
(Here $\theta$ is an angle measured in degrees.)
By Kepler's first law the comet should move in a plane orbit whose
shape is either an ellipse, hyperbola, or parabola (this assumes the
gravitational influence of the planets is neglected). Thus the polar
coordinates $(r,\theta)$ satisfy
$$
r = \frac{p}{1 - e \cos \theta}
$$
where $p$ and the eccentricity $e$ are parameters describing the orbit.
Use the data to estimate $p$ and $e$ by the method of least squares.
Hint: Make some (simple) preliminary manipulation so the parameters $p$
and $e$ appear {\it linearly}; then apply the method of least squares.
%\begin{minipage}[t]{5.4in}
%\hbox{\vbox{\hangindent=-1.80in \hangafter=0
\item {\bf Plotting graphs}% (see the graph on the right).
This problem concerns the straight line in the plane that passes
through the two points $(4,0)$ and $(0,2)$ (draw a sketch). This will
be useful for the next problem.
\begin{parts}
\item If the horizontal axis is $x$ and the vertical axis $y$, what
is the equation for $y$ as a function of $x$?
\item If the horizontal axis is $\log x$ and the vertical axis $y$,
what is the equation for $y$ as a function of $x$?
\item If the horizontal axis is $x$ and the vertical axis $\log y$,
what is the equation for $y$ as a function of $x$?
\item If the horizontal axis is $\log x$ and the vertical axis $\log
y$, what is the equation for $y$ as a function of $x$?
\end{parts}
%\end{minipage}
%\hskip -5.0cm \vbox{\scalebox{1.0}{\includegraphics{1-8}}}
\item For each of the seven closest planets, Kepler, using data from
Tycho Brahe, knew the distance $r$ from the planet to the sun (in million
km) and the time $T$ it takes to orbit the sun (the length in earth
days of a year on that planet).
\begin{center}
\begin{tabular}{|| c | c | c | c | c | c | c | c ||}\hline
\medstrut & Mercury & Venus & Earth & Mars & Jupiter & Saturn &
Uranus\\ \hline
$\Strut r$ & 60 & 110 & 150 & 230 & 780 & 1430 & 2870\\ \hline
$\Strut T$ & 90 & 225 & 365 & 690 & 4330 & 10750 & 30650\\ \hline
\end{tabular}
\end{center}
Kepler sought a formula relating $r$ and $T$. It took him a
long time; he did not have logarithms. Guided by the idea of using
graphs
as in the previous problem, you can do this fairly easily.
Make four experimental graphs of this data (as in the previous problem
just above). The hope is that one of these four graphs looks roughly
like a straight line. If it does, then use least squares to find the
``best'' straight line -- and from it the desired formula for the
relation between $r$ and $T$.
[Since the data is only approximate and since we anticipate a
``simple'' answer, you may find it appropriate to use your numerical
results to lead you to a simpler formula.]
\item Let $A:\R^n \to \R^k$ be a linear map. If $A$ is not
one-to-one, but the equation $Ax=y$ has some solution, then it has
many. Is there a ``best'' possible answer? What can one say? Think
about this before reading the next paragraph.
If there is some solution of $Ax=y$, show there is exactly one
solution
$x_1$ of the form $x_1=A^*w$ for some $w$, so $AA^*w=y$. Moreover, of
all the solutions $x$ of $Ax=y$, show that $x_1$ is closest to the
origin (in the Euclidean distance). [{\sc Remark:} This situation is
related to the case where $A$ is not onto, so there may not be a
solution --- but the method of least squares gives a ``best''
approximation to a solution.]
\item Let $P_1$, $P_2$,\dots, $P_k$ be $k$ points (think of them as
{\it data}) in $\R^3$ and let $\Cal{S}$ be the plane
$$
\Cal{S} := \left\{X\in \R^3\,:\, \ip{X}{N}=c\right\},
$$
where $N\ne 0$ is a unit vector normal to the plane and $c$ is a real
constant.
This problem outlines how to find the plane that {\it best
approximates
the data points} in the sense that it minimizes the function
$$
Q(N,c):= \sum_{j=1}^k \text{distance}\,(P_j,\Cal{S})^2.
$$
Determining this plane means finding $N$ and $c$.
\begin{parts}
\item Show that for any point $P$,
$$
\text{distance}\,(P,\Cal{S})
=\abs{\ip{P-X}{N}}= \left|\ip{P}{N} - c\,\right|,
$$
where $X$ is any point in $\Cal{S}$.
\item First do the special case where the center of mass
$\overline{P}:=\frac{1}{k}\sum_{j=1}^kP_j$ is at the origin, so\,
$\overline{P}=0$. Show that for any $P$, $\ip{P}{N}^2=\ip{N}{PP^*N}$.
Here view $P$ as a column vector so $PP^*$ is a $3 \times 3$ matrix.
Use this to observe that the desired plane $\Cal{S}$ is determined
by letting $N$ be an eigenvector of the matrix
$$
A:= \sum_{j=1}^k P_jP_j^T
$$
corresponding to its smallest eigenvalue. What is $c$ in this case?
\item Reduce the general case to the previous case by letting
$V_j=P_j-\overline{P}$.
\item Find the equation of the line $ax+by=c$ that, in the above
sense, best fits the data points
%$(-1,3.1)$, $(0,1)$, $(1,-1.2)$, $(2,-3)$.
$(-1,3)$, $(0,1)$, $(1,-1)$, $(2,-3)$.
\item Let $P_j := (p_{j1},\ldots,p_{j3})$, $j=1,\ldots,k$ be the
coordinates of the $j^{\text{th}}$ data point and
$Z_\ell := (p_{1\ell},\ldots,p_{k\ell})$,
$\ell=1,\ldots,3$ be the vector of $\ell^{\text{th}}$ coordinates.
If $a_{ij}$ is the $ij$ element of $A$, show that
$a_{ij} = \ip{Z_i}{Z_j}$. Note that this exhibits $A$ as a
{\it Gram matrix }. % (see Problem \ref{Gram}).
\item Generalize to where $P_1$, $P_2$,\dots, $P_k$ are $k$ points in
$\R^n$.
\end{parts}
\end{problems}
\end{document}