We have already written \(g(f(x))\) as a linear function of \(x\) plus something, so it suffices to show that the something decays faster than linearly. In this case, this means showing that \(||\operatorname{I}(x-p)||\) and \(||\operatorname{II}(x-p)||\) can be made smaller than any fixed \(\epsilon\) by taking \(x\) sufficiently close to \(p\).
Because \(R_p(x-p) \rightarrow 0\) as \(x \rightarrow p\), there exists \(\delta_1\) such that \(||x-p|| < \delta_1\) implies
\[ ||R_p (x-p)|| < \frac{\epsilon}{2 (||D_q g|| + 1)}. \]
For any such \(x\), it follows that \(||\operatorname{I}(x-p)|| < \epsilon/2\).
Just by boundedness of \(D_p f\) and the fact that \(||R_p (x-p)|| \leq 1\) for \(x\) sufficiently near \(p\), there exists \(\delta_2\) such that \(||x-p|| < \delta_2\) implies that
\[ \begin{aligned} || f(x) - f(p) || &= \Big|\Big| D_p f (x-p) + R_p(x-p) ||x-p|| \Big|\Big| \\ &\leq ( ||D_p f|| + 1) ||x-p||. \end{aligned} \]
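The bound above can be sanity-checked numerically. The following is only a sketch under stated assumptions: the map \(f(x,y) = (xy,\ x + y^2)\) and the point \(p = (1,2)\) are hypothetical choices made for illustration and do not come from the text. For this particular \(f\), the remainder works out to \(||R_p(h)|| = |h_2|\), so \(\delta_2 = 1\) would suffice.

```python
import math

def f(x, y):
    # Hypothetical example map f: R^2 -> R^2 (an assumption, not from the text).
    return (x * y, x + y * y)

p = (1.0, 2.0)
fp = f(*p)

# Jacobian D_p f = [[y, x], [1, 2y]] evaluated at p, computed by hand.
a, b, c, d = p[1], p[0], 1.0, 2.0 * p[1]

def operator_norm_2x2(a, b, c, d):
    # Largest singular value of [[a, b], [c, d]]:
    # the square root of the largest eigenvalue of A^T A.
    trace = a * a + b * b + c * c + d * d
    det = (a * d - b * c) ** 2
    return math.sqrt((trace + math.sqrt(trace * trace - 4.0 * det)) / 2.0)

norm_Dpf = operator_norm_2x2(a, b, c, d)

def bound_holds(h):
    # Check ||f(p + h) - f(p)|| <= (||D_p f|| + 1) ||h||.
    fx = f(p[0] + h[0], p[1] + h[1])
    lhs = math.hypot(fx[0] - fp[0], fx[1] - fp[1])
    return lhs <= (norm_Dpf + 1.0) * math.hypot(h[0], h[1])

# Sample displacements h of length t in 12 evenly spaced directions.
samples = [
    (t * math.cos(k * math.pi / 6), t * math.sin(k * math.pi / 6))
    for t in (0.5, 0.1, 0.01)
    for k in range(12)
]
all_hold = all(bound_holds(h) for h in samples)
```

A finite sample like this cannot prove the inequality, but a failure at small \(||h||\) would signal a mistake in the hand-computed Jacobian or norm.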
Because \(\tilde R_q(y-q) \rightarrow 0\) as \(y \rightarrow q\), there exists \(\eta\) such that \(||y-q|| < \eta\) implies
\[ ||\tilde R_q (y-q)|| < \frac{\epsilon}{2(1 + ||D_p f||)}. \]
So if
\[ ||x-p|| < \min \left\{ \delta_1,\delta_2, \frac{\eta}{1 + ||D_p f||} \right\}, \]
then
\[ ||f(x) - q|| = \Big|\Big| D_p f (x-p) + R_p(x-p) ||x-p|| \Big|\Big| \]
is less than \(\eta\) (because it is at most \((||D_p f||+1) ||x-p||\)), so in particular \(||f(x) - q||\) is small enough that \(||\tilde R_q(f(x) - q)|| < \epsilon / (2 (1 + ||D_p f||))\). Combining these observations gives that \(||\operatorname{II}(x-p)|| \leq \epsilon/2\). Consequently, for all such \(x\neq p\), it follows that
\[ \frac{\left| \left| g(f(x)) - g(q) - (D_q g D_p f)(x-p)\right|\right|}{||x-p||} < \epsilon. \]
This is exactly what is asserted by the Chain Rule (because the total derivative is unique when it exists).
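As a numerical illustration of the final inequality, the sketch below evaluates the quotient \(||g(f(x)) - g(q) - (D_q g D_p f)(x-p)|| / ||x-p||\) along a sequence of points approaching \(p\). The maps \(f(x,y) = (xy,\ x+y^2)\) and \(g(u,v) = u^2 + \sin v\), the point \(p = (1,2)\), and the direction of approach are all hypothetical choices made for this example; the Jacobians are computed by hand rather than taken from the text.

```python
import math

def f(x, y):
    # Hypothetical inner map f: R^2 -> R^2 (an assumption for illustration).
    return (x * y, x + y * y)

def g(u, v):
    # Hypothetical outer map g: R^2 -> R (an assumption for illustration).
    return u * u + math.sin(v)

p = (1.0, 2.0)
q = f(*p)  # q = f(p) = (2.0, 5.0)

# Hand-computed Jacobians: D_p f = [[y, x], [1, 2y]] at p, D_q g = [2u, cos v] at q.
Dpf = ((p[1], p[0]), (1.0, 2.0 * p[1]))
Dqg = (2.0 * q[0], math.cos(q[1]))

# Row vector representing the composite linear map D_q g D_p f.
DgDf = (
    Dqg[0] * Dpf[0][0] + Dqg[1] * Dpf[1][0],
    Dqg[0] * Dpf[0][1] + Dqg[1] * Dpf[1][1],
)

def chain_rule_ratio(h):
    # |g(f(p + h)) - g(q) - (D_q g D_p f) h| / ||h||
    x = (p[0] + h[0], p[1] + h[1])
    linear = DgDf[0] * h[0] + DgDf[1] * h[1]
    remainder = g(*f(*x)) - g(*q) - linear
    return abs(remainder) / math.hypot(h[0], h[1])

# Approach p along the unit direction (0.6, 0.8) with shrinking step t.
ratios = [chain_rule_ratio((0.6 * t, 0.8 * t)) for t in (1e-1, 1e-2, 1e-3, 1e-4)]
```

If the hand-computed derivative is right, the entries of `ratios` should shrink roughly in proportion to the step size, which is the behaviour the \(\epsilon\)-argument guarantees.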