Multivariable Calculus, Total Derivative

Total Derivative

Let p be a point in n-dimensional space, and let f be a real-valued function defined in a neighborhood about p. Let v be a vector in n-space that starts at p and points "up hill", according to the surface defined by f. The first-order Taylor approximation in n dimensions is f(p+x) = f(p) + x.v + e(x), where x is a vector heading away from p, x.v is the dot product of the two vectors, and e is the error term. In other words, the components of v tell us how much to rise or fall for each of the orthogonal directions we might travel. Multiply each change in elevation by the component of travel in that direction, and add it all up to get the total change in elevation. You'll be off by a bit, which is what the error term is for.
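A quick numerical sketch of the approximation, using a hypothetical function f(x,y) = x² + 3y at the point p = (1,2), whose up-hill vector v works out to (2,3):

```python
import numpy as np

# Hypothetical example: f(x, y) = x**2 + 3*y, at the point p = (1, 2).
def f(q):
    return q[0]**2 + 3*q[1]

p = np.array([1.0, 2.0])
v = np.array([2.0, 3.0])      # the "up hill" vector at p: (2x, 3) = (2, 3)

x = np.array([0.01, -0.02])   # a small step away from p
approx = f(p) + x.dot(v)      # first-order Taylor approximation
error = f(p + x) - approx     # the error term e(x)
print(abs(error))             # tiny compared to the step itself
```

The error here is about 10⁻⁴ for a step of length about 0.02, which is the behavior the error term is meant to capture.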

If this error term approaches 0 as x approaches 0, then f is continuous at p. If, moreover, the error is an ever-decreasing percentage of the length of x as x approaches 0, then f is differentiable at p. Thus differentiable implies continuous. To state it analytically, f is differentiable at p if e(x)/|x| approaches 0 as |x| approaches 0, where |x| is the length of the vector x, i.e. the distance from p.

Sometimes we fold the division by |x| into the error term and write f(p+x) = f(p) + x.v + |x|×e(x). Under this convention f is differentiable if e(x) approaches 0 as |x| approaches 0. It's the same definition, just a different convention.

Let u be a unit vector, indicating the direction of travel, and let h be a real number, indicating the distance traveled. Thus x is h×u, and |x| = h. What about f(p+x)-f(p) over h? Using the latter definition of e, the limit becomes h×u.v+h×e(x) over h, or u.v+e(x). Yet e approaches 0 for small h, so the directional derivative exists, and is the dot product of u and v. If we are moving in the general direction of v, we are moving up hill. If we are moving away from v, we are moving down hill. If we are moving perpendicular to v, the elevation is constant, as though we were walking along the side of a mountain.

The n partial derivatives are equal to v dotted with the n unit vectors that point along the axes. In other words, the n partials are the n components of the vector v.

Suppose there are two vectors, v and w, that both demonstrate differentiability at the point p. Each has its own error function, which approaches 0 as we stay close to p. Let v and w differ in the jth component. Now v and w give different answers for the partial of f with respect to the jth coordinate. There is only one partial in this direction, one right answer, hence there is only one vector v that establishes differentiability of f at p. Call this vector the gradient of f at p. This is written∇f(p). It represents the derivative, or the total derivative, of f at p.

If f is differentiable, we know what the gradient is; its components must be the partials of f with respect to the coordinates in n space. Let's consider an example.

If f(x,y) = xy, ∇f = y,x. If you're looking east, the slope up, or down, is determined by the y coordinate. And as you look north, the slope is determined by the x coordinate. The gradient becomes the vector y,x. At the origin, the gradient is 0, and the surface is flat. Of course it does curve up as you move into the first quadrant, and down as you move into the second quadrant, but if you're small enough, like a tiny ant, the surface looks flat at the origin.

Note that a nontrivial gradient implies an "up hill" direction, precluding a local minimum or maximum. This is analogous to the one dimensional case, where a local maximum or minimum requires a derivative of 0.