Metric Spaces, Cauchy Schwarz Inequality, Triangular Inequality

Cauchy Schwarz Inequality, Triangular Inequality

In the last section we proved the triangular inequality for n dimensional space, but that was rather indirect. A finite product of arbitrary metric spaces is again a metric space, and when this general result is applied to the coordinate axes you get n dimensional space with its Euclidean metric. Hence the triangular inequality is valid in Rn. But that proof is hard to follow without a substantial background in point-set topology, and it doesn't address the Cauchy Schwarz inequality. In this section we will prove the triangular inequality and the Cauchy Schwarz inequality using pure algebra.

Remember that the triangular inequality says that one side of a triangle is never longer than the sum of the other two sides. Well, this is obvious really. Try drawing a triangle whose sides are 2, 3, and 17. But sometimes we need to prove the obvious, because sometimes the obvious isn't true in higher dimensions, or in spaces that are substantially different from the world around us. An electron obviously can't pass through two slits at once, but it does. So let's prove the triangular inequality in n dimensions. You'll sleep better at night.

Follow Your Nose

I'd like to begin with a simple, follow-your-nose proof, where you assume the triangular inequality, do some algebra, and run into something that has to be true; hence the original inequality is true. I'll also restrict attention to the xy plane, where the algebra is simpler. Keep in mind, however, that this proof does not generalize to other spaces, such as the continuous functions on a closed interval. (We'll deal with those spaces later.)

Although this proof is restricted to the xy plane, it is actually more powerful than it first appears. Start with any triangle in 3 space, or in n space for that matter, and move it down into the xy plane. You can always move the triangle, as a rigid body, without changing the lengths of its sides. So if the triangular inequality holds in the xy plane, it applies to the same triangle floating in 3 space, or n space. This simple assertion masks a great deal of linear algebra, but the intuition is so compelling, you almost don't need to prove it. We'll just assume you can move the triangle down into the xy plane, or if you prefer, relabel the coordinates of n space so that the (new) xy plane contains the triangle. Prove the theorem in R2, and you have proved it for Rn as well.

Given any triangle in the xy plane, move the triangle, or relabel the coordinates, so that one of the three corners is at the origin, another is at (a,0) on the positive x axis (so a > 0), and the third is at (b,c). Now the sides have lengths:

a
sqrt(b^2 + c^2)
sqrt((a-b)^2 + c^2)

We want to show that the first distance plus the second is at least as large as the third (and usually larger).

a + sqrt(b^2 + c^2) ≥ sqrt((a-b)^2 + c^2)

Both sides are lengths, hence nonnegative, and for nonnegative numbers, as a number gets larger its square gets larger. So we can square both sides without disturbing the inequality.

a^2 + b^2 + c^2 + 2a×sqrt(b^2 + c^2) ≥ a^2 + b^2 + c^2 - 2ab

Cancel a^2 + b^2 + c^2 from both sides and divide through by 2a (remember a is positive). This simplifies to:

sqrt(b^2 + c^2) ≥ -b

The left side is nonnegative. If b is nonnegative we're done. So let b be negative. Now the right side is the absolute value of b. The left side would be the same, if c were 0. If c is nonzero then b^2 + c^2 is larger, and its square root is larger than |b|. The inequality holds.

It is a precise equality when c is 0 and b is negative (or zero). This makes sense; just draw the triangle. It's flat, and the sum of two of the sides equals the third.
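
If you want to see the inequality in action before we discuss reversibility, here is a quick numeric check in Python. The coordinates are random sample values, not anything dictated by the proof; the last lines exercise the flat triangle where the bound becomes an equality.

import math, random

def sides(a, b, c):
    # triangle with corners (0,0), (a,0), and (b,c)
    s1 = a                                # from (0,0) to (a,0)
    s2 = math.sqrt(b*b + c*c)             # from (0,0) to (b,c)
    s3 = math.sqrt((a-b)**2 + c*c)        # from (a,0) to (b,c)
    return s1, s2, s3

for _ in range(10000):
    a = random.uniform(0.1, 10.0)         # a is a positive length
    b = random.uniform(-10.0, 10.0)
    c = random.uniform(-10.0, 10.0)
    s1, s2, s3 = sides(a, b, c)
    assert s1 + s2 >= s3 - 1e-12          # the triangular inequality

s1, s2, s3 = sides(3.0, -4.0, 0.0)        # b negative, c zero: a flat triangle
print(s1 + s2, s3)                        # both are 7.0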

The thing about a follow-your-nose proof is, you start by assuming what you want to prove, then run into something that has to be true. This is ok, as long as you know all the steps are reversible, so that the statement that is always true implies what you want to prove. In this case everything is reversible; the steps are all iff, and we're ok.

Here's a follow-your-nose proof that doesn't work. Start with 3 = -3, something we want to prove. Square both sides, and sure enough, 9 = 9, which is always true. So I guess 3 = -3, right? Wrong, because the steps cannot be reversed. But we can reverse the steps in the triangular inequality proof, because I knew the distances were all positive from the outset. It's ok to take the square root of both sides of the second inequality, to get back to the triangular inequality. We'll see this again in the next section, when we connect the triangular inequality with Cauchy Schwarz. All steps are reversible, and either implies the other.

Cauchy Schwarz Inequality

The Cauchy Schwarz inequality is equivalent to the triangular inequality, at least in n space. But Cauchy Schwarz generalizes to other spaces, such as complex continuous functions on closed intervals. That's why advanced textbooks often prove Cauchy Schwarz first, then extend this result to various abstract spaces, then bring the triangular inequality back in, almost as a corollary. Let's give it a whirl.

Let a1 a2 … an and b1 b2 … bn be two vectors in n space. Let p be the sum of ai^2, let q be the sum of ai×bi, and let r be the sum of bi^2. Using the notation of dot products, p = a.a, q = a.b, and r = b.b. We will show that q^2 ≤ p×r. This is the Cauchy Schwarz inequality.
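
In code, p, q, and r are nothing more than dot products. Here is a minimal sketch in Python; the two vectors are arbitrary sample values, not anything special.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a = [1.0, 2.0, -3.0, 0.5]
b = [4.0, -1.0, 2.0, 2.5]

p = dot(a, a)        # sum of ai^2
q = dot(a, b)        # sum of ai*bi
r = dot(b, b)        # sum of bi^2

print(q*q <= p*r)    # True, as Cauchy Schwarz promises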

Let x be any real number. Consider the sum of (ai×x+bi)^2. Since this is a sum of squares, it is 0 only if each term is 0, that is, only if bi = -x×ai for every i, making the vector b a multiple of the vector a. In every other case the sum is positive.

Write our nonnegative sum as px^2 + 2qx + r, where p, q, and r are defined as above. Note that p and r are sums of squares, and cannot be negative. If either p or r is 0, then a or b is the zero vector, q is 0, and we have q^2 = pr.

With p nonzero, the quadratic expression attains its minimum when x = -q/p. Substitute x = -q/p: the expression becomes q^2/p - 2q^2/p + r, that is, r - q^2/p, and it is still nonnegative. Multiply through by p, which is positive, to obtain pr - q^2 ≥ 0. This gives q^2 ≤ pr, which proves the Cauchy Schwarz inequality.
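
You can watch the quadratic do its job. The vectors below are again arbitrary samples; the point is that the value at x = -q/p equals r - q^2/p, and that value is never negative.

a = [2.0, -1.0, 3.0]
b = [0.5, 4.0, 1.0]
p = sum(ai*ai for ai in a)
q = sum(ai*bi for ai, bi in zip(a, b))
r = sum(bi*bi for bi in b)

x = -q / p                        # where the quadratic bottoms out
m = p*x*x + 2*q*x + r             # the minimum value, r - q*q/p
print(m >= 0)                     # True; multiply by p to recover q*q <= p*r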

As mentioned before, the inequality is strict, unless one vector is a scalar multiple of the other.

So what does this have to do with the triangular inequality? Let a and b be vectors in n space. We would like to show that the length of a+b is less than or equal to the length of a plus the length of b. In fact they are only equal if one vector is a nonnegative scalar multiple of the other. Square |a+b| ≤ |a| + |b|, and cancel the squared terms ai^2 and bi^2 from each side. The left side is twice the sum of ai×bi. This was 2q in our earlier notation. Continuing that notation, the right side becomes 2sqrt(pr). Square both sides again to get q^2 ≤ pr. Reverse these steps to show Cauchy Schwarz implies the triangular inequality in n space.
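
Here is the same bookkeeping in Python, with arbitrary sample vectors; the assertions mirror the chain of equalities and inequalities above.

import math

a = [1.0, -2.0, 0.5, 3.0]
b = [2.0, 1.0, -1.5, 0.5]

p = sum(x*x for x in a)
q = sum(x*y for x, y in zip(a, b))
r = sum(y*y for y in b)

lhs = math.sqrt(sum((x + y)**2 for x, y in zip(a, b)))   # |a+b|
rhs = math.sqrt(p) + math.sqrt(r)                        # |a| + |b|

assert abs(lhs*lhs - (p + 2*q + r)) < 1e-9   # |a+b|^2 = p + 2q + r
assert q*q <= p*r                            # Cauchy Schwarz
assert lhs <= rhs + 1e-12                    # hence the triangular inequality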

Verify that the set of absolutely convergent series is a real vector space. For any two such series a and b, we can build the product series with terms ai×bi. Once the terms of a and b drop below 1 in absolute value, each of them dominates the corresponding term of the product series, hence the product series converges absolutely. The same is true when the terms of a series are squared (the special case b = a).

To find the distance between two series, subtract them, giving another absolutely convergent series, square the terms, take the sum, and take the square root. The distance is 0 only if the two series are identical.

Extend the definitions of p, q, and r above. We are still adding ai^2, ai×bi, and bi^2, but this time the sums are infinite. We would like to compare q^2 and p×r.

Start with the sum of (ai×x+bi)^2 as before, but this time the sum is infinite. Rewrite the sum as px^2 + 2qx + r ≥ 0. This rearrangement is possible thanks to absolute convergence. From here the proof of Cauchy Schwarz, and the triangular inequality, is the same. Therefore the absolutely convergent series form a metric space.
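
A sketch for series, with sample terms ai = 1/2^i and bi = (-1/3)^i; truncating at a few hundred terms stands in for the infinite sums, which is an assumption of the sketch, not part of the proof.

import math

N = 400                                   # truncation point for the sketch
a = [0.5**i for i in range(1, N)]
b = [(-1.0/3.0)**i for i in range(1, N)]

p = sum(x*x for x in a)
q = sum(x*y for x, y in zip(a, b))
r = sum(y*y for y in b)

print(q*q <= p*r)                                          # Cauchy Schwarz
dist = math.sqrt(sum((x - y)**2 for x, y in zip(a, b)))    # distance between the series
print(dist)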

Next consider continuous functions on a closed interval. The product of two continuous functions is continuous, and hence integrable. If p, q, and r are the integrals of a^2, a×b, and b^2, we can prove Cauchy Schwarz. Divide the interval into n subintervals; Cauchy Schwarz holds for each Riemann sum, so the inequality must hold in the limit. Use continuity to show equality holds iff the functions a and b are scalar multiples of each other. Thus we have a metric space, where distance is given by the square root of the integral of (a-b)^2.
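
A Riemann-sum sketch on [0,1], using the sample functions a(x) = sin(x) and b(x) = x^2; the number of subintervals is arbitrary.

import math

n = 10000
dx = 1.0 / n
xs = [(i + 0.5) * dx for i in range(n)]        # midpoints of the subintervals

fa = [math.sin(x) for x in xs]
fb = [x*x for x in xs]

p = dx * sum(u*u for u in fa)                  # approximates the integral of a^2
q = dx * sum(u*v for u, v in zip(fa, fb))      # approximates the integral of a*b
r = dx * sum(v*v for v in fb)                  # approximates the integral of b^2

print(q*q <= p*r)                              # Cauchy Schwarz for the Riemann sums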

This theorem generalizes to piecewise continuous functions over a closed interval. We leave the functions undefined at the borders, where the continuous pieces touch. Thus two functions are equal across the entire interval iff their distance is 0.

Sometimes we can measure the distance between functions over an open or infinite interval, using improper integrals, but keep in mind, the product of two integrable functions need not be integrable. This is illustrated by 1/sqrt(x): it is integrable on (0,1], yet its product with itself, 1/x, is not.

Vectors in Complex Space

Let a and b be vectors in complex space, i.e. arrays of complex numbers. In this case the dot product, a.b, is the sum of (ai × bi conjugate). Thus a.a is a real number, in fact a nonnegative real number, and sqrt(a.a) is a good candidate for our distance function.

Realize that an array of n complex numbers is also an array of 2n real numbers; just take the real and imaginary parts separately. This provides a 1-1 correspondence between complex space and real space, the latter having twice as many dimensions as the former. The distance functions in the two spaces give exactly the same answer; both give the square root of the sum of the squares of the components. And subtraction (the difference between two points) also gives the same vector in both spaces. Since the triangular inequality holds in real space, it is valid in complex space.
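
The correspondence is easy to check in Python; the complex vector below is an arbitrary sample.

import math

a = [3+4j, 1-2j, -0.5+1j]

# length via the complex dot product a.a (conjugate the second factor)
len_complex = math.sqrt(sum((z * z.conjugate()).real for z in a))

# length of the corresponding real vector with twice as many components
flat = [t for z in a for t in (z.real, z.imag)]
len_real = math.sqrt(sum(t*t for t in flat))

print(len_complex, len_real)                  # the same number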

Moving from the triangular inequality back to Cauchy Schwarz is a bit trickier in the world of complex numbers. Let's review our notation. Remember that p = a.a, which is the sum of ai times ai conjugate. And r is still b.b. No problem there. Let q1 equal a.b, which is the sum of ai times bi conjugate. Let q2 be the conjugate of q1, that is, the sum of bi times ai conjugate. Now start again with the triangular inequality.

|a+b| ≤ |a| + |b|

Note that this is an equality iff one vector is a nonnegative real multiple of the other.

Square the triangular inequality and subtract a.a and b.b from both sides. The left side becomes the sum of ai times bi conjugate, plus the sum of bi times ai conjugate. In other words, the left side is q1+q2.

The right side is twice the square root of (a.a)×(b.b). Put this all together to get the following.

q1 + q2 ≤ 2sqrt(pr)

Divide through by 2, and remember that q1 and q2 are conjugate. So the left side becomes the real component of q1, or q2 if you prefer.

re(q1) ≤ sqrt(pr)

Square both sides again and get complex Cauchy Schwarz. (If re(q1) happens to be negative, run the same argument with -b in place of b; this bounds |re(q1)| by sqrt(pr), so the squared inequality still holds.)

re(q1)^2 ≤ pr

Again, we have equality only when b is a real scalar multiple of a.

Can we say anything about re(a.b)? If one of the components of a is u+vi, and the corresponding component of b is x+yi, multiply the former by the conjugate of the latter and get a complex number whose real component is ux+vy. This begins to look like the traditional dot product. In fact, the real part of a.b is exactly the same as the traditional dot product of a and b when they are viewed as real vectors, having twice as many dimensions. In this sense, Cauchy Schwarz is no surprise at all. We have derived exactly the same inequality. We just took a walk through complex arithmetic to get there.
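
To close the loop, here is a small Python check that re(a.b) agrees with the traditional dot product of the flattened real vectors, and that its square is bounded by p×r; the vectors are arbitrary samples.

a = [1+2j, -3+0.5j, 2-1j]
b = [0.5-1j, 2+2j, -1+3j]

q1 = sum(x * y.conjugate() for x, y in zip(a, b))      # a.b
p = sum((x * x.conjugate()).real for x in a)           # a.a
r = sum((y * y.conjugate()).real for y in b)           # b.b

# flatten each complex vector into 2n real components
ra = [t for z in a for t in (z.real, z.imag)]
rb = [t for z in b for t in (z.real, z.imag)]
real_dot = sum(x*y for x, y in zip(ra, rb))

print(abs(q1.real - real_dot) < 1e-12)    # re(a.b) is the traditional dot product
print(q1.real**2 <= p * r)                # complex Cauchy Schwarz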