Measure Theory, Normed Linear Space

Normed Linear Space

Before we begin, you will want to review the properties of a normed linear space. For example, the continuous functions on [0,1] form a normed linear space, with the square root of the integral of f² acting as the norm. The proof is straightforward, except for the triangular inequality, which is tedious to prove algebraically, but is nonetheless intuitive. The integral is the limit of the riemann sum, and the triangular inequality holds in finite dimensional space. It holds for each riemann sum, and it holds in the limit. This section generalizes the above, by placing a norm on the lebesgue integrable functions over a domain X.
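Here is a minimal numerical sketch of that argument. It assumes midpoint riemann sums on a uniform grid of n points; the helper riemann_l2_norm and the sample functions are illustrative choices, not taken from the text. For each fixed n the sum is the euclidean norm of the sampled values, scaled by the square root of the step, so the triangular inequality holds one riemann sum at a time.

```python
import math

def riemann_l2_norm(f, n=10_000):
    """Square root of a midpoint riemann sum for the integral of f(x)**2 over [0,1]."""
    h = 1.0 / n
    total = math.fsum(f((i + 0.5) * h) ** 2 for i in range(n)) * h
    return math.sqrt(total)

f = math.sin
g = lambda x: math.exp(-x) - 0.5

# For each fixed n the sum is a euclidean norm of sample values (scaled by sqrt(h)),
# so the triangular inequality holds for every riemann sum, and hence in the limit.
lhs = riemann_l2_norm(lambda x: f(x) + g(x))
rhs = riemann_l2_norm(f) + riemann_l2_norm(g)
print(lhs <= rhs)
```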

Equivalence Classes

Let L(X,σ,μ) be the set of integrable functions on X. We have to be a bit careful, since function addition is not always defined: if f = ∞ and g = -∞ at some point, f+g has no value there, and f-f is not 0 wherever f is infinite. Whence this set isn't even a module, much less a vector space. There is a way around this little problem. Clump the functions together into equivalence classes, using "equal almost everywhere" as the equivalence relation. Remember that f is integrable, hence finite almost everywhere, so there is no harm in squashing the infinite values of f down to 0. The result lies in the same equivalence class. Now prove that addition and scaling are well defined on equivalence classes, that addition is associative, and that scaling distributes over addition. We have a vector space.

The p Norm, and Lp Space

Fix a real number p ≥ 1. The p norm of f, written norm_p(f), is the integral of |f|^p, raised to the 1/p power. Of course g = |f| is a measurable function in L(X,σ,μ) with a finite integral, but what about g^p?
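In symbols, writing norm_p(f) as ‖f‖_p (a notational choice made only for this display):

```latex
\|f\|_p = \Bigl( \int_X |f|^p \, d\mu \Bigr)^{1/p}, \qquad p \ge 1.
```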

Let h be a continuous, nondecreasing function from the nonnegative reals into the nonnegative reals, such as y^p, and consider h(g(x)). Let u₁, u₂, u₃, … be an increasing sequence of simple functions that approach g. Such a sequence can always be built. Each h(u_i) is a simple function. Since h is continuous, h(u_i) approaches h(g), and since h is nondecreasing, the sequence h(u_i) is increasing. By the monotone convergence theorem, h(g) is measurable, and the integral of h(g) is the limit of the integrals of h(u_i).
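One standard way to build the increasing sequence is shown pointwise below as a sketch. The dyadic construction (round g(x) down to a multiple of 1/2^i, then cap at i) and the choice h(y) = y^2.5 are assumptions for illustration. Because h is nondecreasing, applying it keeps the sequence increasing, which is what the monotone convergence theorem needs.

```python
import math

def simple_approx(y, i):
    """i-th term of a standard increasing sequence of simple functions approaching y >= 0:
    round y down to a multiple of 1/2**i, then cap the result at i."""
    return min(float(i), math.floor(y * 2 ** i) / 2 ** i)

def h(y, p=2.5):
    """A continuous, nondecreasing h on the nonnegative reals (here y**p, p chosen arbitrarily)."""
    return y ** p

y = math.pi                      # the value g(x) at some fixed point x
u = [simple_approx(y, i) for i in range(1, 15)]
hu = [h(t) for t in u]
print(u[-1], hu[-1], h(y))       # u_i rises toward g(x), h(u_i) rises toward h(g(x))
```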

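It's tempting to say the integral of h(g) is finite, since the integral of g is finite, but this need not be the case. Let h = y², thus building the traditional norm. Let g be 1 on an interval of length 1/2, then 2 on a disjoint interval of length 1/8, then 4 on an interval of length 1/32, then 8 on an interval of length 1/128, and so on, with g = 0 elsewhere. The value 2^k sits on an interval of length 1/2^(2k+1), contributing 1/2^(k+1) to the integral of g, so the integral of g is 1/2 + 1/4 + 1/8 + … = 1, which is certainly finite. Now square g. Each piece contributes 2^(2k)/2^(2k+1) = 1/2, and the resulting integral is infinite. The 2 norm of g is infinite, so the norm is not defined as a real number.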
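The counterexample can be checked with exact rational arithmetic. The sketch below uses a hypothetical function that sums the first n pieces of g, where piece k has value 2^k on an interval of length 1/2^(2k+1):

```python
from fractions import Fraction

def partial_integral(n_pieces, p):
    """Integral of g**p over the first n_pieces steps of g, where step k has
    value 2**k on an interval of length 1/2**(2k+1)."""
    total = Fraction(0)
    for k in range(n_pieces):
        value = Fraction(2) ** k
        length = Fraction(1, 2 ** (2 * k + 1))
        total += value ** p * length
    return total

for n in (1, 5, 10):
    print(n, partial_integral(n, 1), partial_integral(n, 2))
```

The integral of g creeps up toward 1, as 1 - 1/2^n, while the integral of g² grows by exactly 1/2 with each piece.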

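We get around this problem in a rather crude way. With p fixed, only measurable functions with finite p norms are considered. For p > 1 such a function need not have a finite integral: on [1,∞), the function 1/x has a finite 2 norm, since the integral of 1/x² is 1, but the integral of 1/x is infinite. This collection of equivalence classes of functions is called Lp space, or Lp(X,σ,μ), if you want to be technically correct.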

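If q ≤ p, f is in Lp space, and μ(X) is finite, then f is in Lq space. Where |f| ≥ 1, |f|^p dominates |f|^q, and where |f| < 1, |f|^q is bounded by 1. Thus the integral of |f|^q is at most μ(X) plus the integral of |f|^p, which is finite. Without the finiteness of μ(X) the inclusion can fail; for the counting measure below, it runs the other way.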

Before we get lost in the details, consider a special case that you have probably seen before. Let X be the nonnegative integers, and give X the discrete topology, so that every subset belongs to σ. Let μ be the counting measure, the number of points in the set. Now the integral of f is the infinite sum f(0) + f(1) + f(2) + …, when f is viewed as a series. The integral is well defined, i.e. f belongs to L(X,σ,μ), iff the series is absolutely convergent. (You can prove all this if you like; it's a good exercise.) Set p = 2 to get the traditional norm. Thus L2(X,σ,μ) is the space of square-summable series. We prove this is a normed linear space, in fact a separable hilbert space, in another section. Measure theory generalizes this in a big way: to any space X, any σ, any μ, and any p ≥ 1. The result will still be a complete normed linear space, which is a banach space.
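For the counting measure the integral is just a sum, so membership in L2 can be explored directly. In this small sketch the series is truncated at a finite number of terms, an assumption made only so the code terminates. The series f(n) = 1/(n+1) is square summable but not absolutely summable, so it lies in L2(X,σ,μ) even though its integral is infinite.

```python
import math

def counting_integral(f, n_terms):
    """Integral of f against counting measure, truncated to {0, 1, ..., n_terms - 1}:
    just the finite sum f(0) + f(1) + ... ."""
    return math.fsum(f(n) for n in range(n_terms))

f = lambda n: 1.0 / (n + 1)
abs_sum = counting_integral(lambda n: abs(f(n)), 10_000)   # harmonic series: grows like log
sq_sum = counting_integral(lambda n: f(n) ** 2, 10_000)     # bounded by pi**2 / 6
print(abs_sum, sq_sum)
```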

Let's begin by proving Lp is a linear space. That is, the sum of functions, or scale multiples of a function, still have finite p norms. The latter is easy. Let g be the absolute value of f, multiply g by |c|, raise to the p, take the integral (which is |c|^p times the integral of g^p), then raise this to the 1/p. The norm is |c| times the original norm. This is what we expect from a norm, and it guarantees a finite integral, whence c×f belongs to Lp space, as is required of a vector space.
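A quick numerical check of the scaling rule follows. It assumes a simple function is represented by its values on finitely many regions together with the measures of those regions, a hypothetical representation chosen for the sketch:

```python
import math

def norm_p(values, measures, p):
    """p norm of a simple function equal to values[j] on a region of measure measures[j]."""
    return math.fsum(m * abs(v) ** p for v, m in zip(values, measures)) ** (1.0 / p)

f = [1.5, -2.0, 0.25]
mu = [0.2, 0.5, 1.3]
p, c = 3.0, -4.0
scaled = norm_p([c * v for v in f], mu, p)
print(math.isclose(scaled, abs(c) * norm_p(f, mu, p)))
```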

For the sum, let f and g live in Lp space. Since |f+g| ≤ |f|+|g|, the p norm of f+g can only grow larger if we replace f and g with their absolute values, so assume f and g are nonnegative. Let h be the max of f and g. Remember that h is measurable. At each point h^p equals f^p or g^p, so h^p is bounded by f^p+g^p; thus h^p has a finite integral, and h belongs to Lp space. Multiply h by 2 and find another function in Lp space. This bounds f+g, hence f+g lives in Lp space. The result is indeed a linear space.
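The argument rests on a pointwise bound, |f+g|^p ≤ (2h)^p ≤ 2^p (|f|^p + |g|^p) with h = max(|f|, |g|). The sketch below spot-checks it at random real values; random sampling is only a sanity check, not a proof, and the function name is hypothetical.

```python
import random

def sum_bound_holds(a, b, p, tol=1e-12):
    """Check |a+b|**p <= (2*h)**p <= 2**p * (|a|**p + |b|**p), where h = max(|a|, |b|)."""
    h = max(abs(a), abs(b))
    lhs = abs(a + b) ** p
    middle = (2 * h) ** p
    rhs = 2 ** p * (abs(a) ** p + abs(b) ** p)
    # A small relative tolerance absorbs floating-point rounding in the powers.
    return lhs <= middle * (1 + tol) and middle <= rhs * (1 + tol)

rng = random.Random(0)
samples = [(rng.uniform(-10, 10), rng.uniform(-10, 10), rng.uniform(1, 6)) for _ in range(1000)]
print(all(sum_bound_holds(a, b, p) for a, b, p in samples))
```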

Let's look at the properties of a norm. When is the norm of f equal to 0? Let g = |f|, which has the same norm, and suppose norm_p(g) = 0. Choose any c > 0 and consider the set where g(x) = c. This preimage is a measurable set, and if it has a nonzero measure, then g^p has a positive integral. This is a contradiction, hence the preimage of c has measure 0. Crunch g down to 0 across this set, and the resulting function lies in the same equivalence class. In other words, it represents the same element of Lp space.

Apply the above procedure to the range ≥ 1, i.e. the set where g(x) ≥ 1. The preimage has measure 0; otherwise the simple function u that is 1 across this set and 0 elsewhere would lie below g^p, and its positive integral would be contained within the integral of g^p. Thus the preimage has measure 0, and we can crunch g down to 0 on this part of the domain. Do the same for the range [1/2,1], then the range [1/4,1/2], and so on. The preimages have measure 0, and their countable union has measure 0. Crunch g down to 0 across all these preimages simultaneously. This is still a set of measure 0, hence we have not changed the equivalence class of g. The result is the 0 function. Therefore, the only function with norm 0 is 0, as is required by the definition of a norm.
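The same conclusion can be packed into one line. The simple function equal to t^p on the set where g ≥ t, and 0 elsewhere, lies below g^p, which gives the chebyshev (or markov) inequality on the left. The set where g > 0 is the countable union on the right. If norm_p(g) = 0, every set in the union has measure 0.

```latex
\mu\{x : g(x) \ge t\} \;\le\; \frac{1}{t^p} \int_X g^p \, d\mu \quad (t > 0),
\qquad
\{x : g(x) > 0\} = \bigcup_{k \ge 0} \{x : g(x) \ge 2^{-k}\}.
```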

We already handled scaling by c. The triangular inequality will have to wait, since it follows from holder's inequality.

Holder's Inequality

Let p and q be real numbers > 1 with 1/p + 1/q = 1. If f is in Lp space, and g is in Lq space, holder's inequality states that norm₁(fg) ≤ norm_p(f) × norm_q(g).

The constraint 1/p + 1/q = 1 cannot be dropped. Let f = g = 1 on a set of measure 1/2, and 0 elsewhere, and take p = q = 1. Then norm₁(fg) is 1/2, while norm₁(f) × norm₁(g) is 1/4. Any correct proof must use the constraint somewhere.
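Here is a small sketch for simple functions. It assumes each function is given by its constant values on finitely many regions together with the region measures, a hypothetical representation. It evaluates both sides of holder's inequality, showing the inequality failing for p = q = 1 when f = g = 1 on a set of measure 1/2, while it holds for a conjugate pair such as p = 3, q = 3/2:

```python
import math

def holder_sides(mu, f, g, p, q):
    """Return (norm_1(fg), norm_p(f) * norm_q(g)) for nonnegative simple functions.

    mu[j] is the measure of region j; f[j] and g[j] are the values there."""
    lhs = math.fsum(m * a * b for m, a, b in zip(mu, f, g))
    norm_f = math.fsum(m * a ** p for m, a in zip(mu, f)) ** (1 / p)
    norm_g = math.fsum(m * b ** q for m, b in zip(mu, g)) ** (1 / q)
    return lhs, norm_f * norm_g

# f = g = 1 on a region of measure 1/2, 0 on another region of measure 1/2.
print(holder_sides([0.5, 0.5], [1, 0], [1, 0], 1, 1))        # (0.5, 0.25): fails
print(holder_sides([0.3, 1.2, 0.5], [2.0, 0.5, 1.0], [0.1, 3.0, 2.0], 3, 1.5))
```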

Nothing changes if we replace f with its absolute value, and g with its absolute value, so assume f and g are nonnegative. Now the left side is simply the integral of fg.

Imagine we have already proved holder's inequality for simple functions. Let u_i be an increasing sequence of simple functions approaching f, and let v_i be an increasing sequence of simple functions approaching g. Such a sequence can always be built. The product of limits is the limit of the product, hence u_iv_i approaches fg, and since all these functions are nonnegative, u_iv_i is increasing. Apply the monotone convergence theorem, and the integral of u_iv_i approaches the integral of fg. For each i, the integral is less than or equal to norm_p(u_i) × norm_q(v_i). Another application of the monotone convergence theorem, this time to u_i^p and v_i^q, shows these norms approach the norm of f and the norm of g respectively. Put this all together and find holder's inequality for f and g. Therefore, it is sufficient to prove the relationship for two simple functions u and v.

Subdivide the domain X into finitely many regions r_j, so that u and v are constant on each region, with values u_j and v_j. Regions where u and v are both 0 contribute nothing to any of the sums below, so discard them; every remaining region has finite measure μ_j, since u is in Lp and v is in Lq. The left side, which is the integral of uv, is now the sum of μ_ju_jv_j over these regions. Move to the right side and build similar expressions for the two integrals, this time using u^p and v^q. These sums are raised to the 1/p and 1/q respectively, then they are multiplied together. This is a bit awkward, so raise both sides to the pq. (Both sides are nonnegative, so that does not change the direction of the inequality.) Now we have a finite sum of μ_ju_jv_j, raised to the pq, and this is compared to the sum of μ_ju_j^p, to the q, times the sum of μ_jv_j^q, to the p.

As a warmup exercise, assume there is but one region. In other words, u and v are constant over all of X. Thus there is one element in each sum.

(μ(X)uv)^pq ≤ (μ(X)u^p)^q × (μ(X)v^q)^p

The "area" of X, denoted μ(X), appears to the pq on the left and to the q+p on the right. Since 1/p + 1/q = 1, multiplying through by pq gives q + p = pq, so the two powers agree and μ(X) drops out. This leaves uv to the pq on both sides, and our inequality is actually an equation. Of course, things get harder when we have multiple regions, and u and v turn into finite sums. The first thing we want to do is get rid of μ, as we did above. This can be done if each region r_j is the same size, i.e. each μ_j is the same. That's not likely, but if each μ_j is rational, with a common denominator d, then X can be subdivided into regions of size 1/d. This doesn't change the totals of the finite sums, though there are more terms in each sum, since there are more regions. Each sum now has a common factor of 1/d, which comes out as 1/d to the pq on the left and 1/d to the q+p on the right; once again the powers agree, and 1/d drops out. Don't worry about whether there is actually a region in X of size 1/d. We have left the realm of measure theory, and are now solving a problem in algebra, with real numbers.
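Written out for the subdivided regions (indexed by k here, an index introduced only for this display), each sum carries the common factor 1/d, which comes out of the two sides with different exponents; the exponents agree exactly when p and q are conjugate:

```latex
\Bigl(\sum_{k} \tfrac{1}{d}\, u_k v_k\Bigr)^{pq} = d^{-pq} \Bigl(\sum_{k} u_k v_k\Bigr)^{pq},
\qquad
\Bigl(\sum_{k} \tfrac{1}{d}\, u_k^{p}\Bigr)^{q} \Bigl(\sum_{k} \tfrac{1}{d}\, v_k^{q}\Bigr)^{p}
  = d^{-(q+p)} \Bigl(\sum_{k} u_k^{p}\Bigr)^{q} \Bigl(\sum_{k} v_k^{q}\Bigr)^{p},
\qquad
pq = p + q \iff \tfrac{1}{p} + \tfrac{1}{q} = 1.
```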

Of course, the coefficients μ_j may not be rational. Assume we have proved the inequality when the coefficients are rational, and create a sequence of finer rational approximations to the real coefficients μ_j. In other words, d is approaching infinity. For convenience, let d step through the powers of 2, so that our net becomes twice as fine with each step. All functions are continuous, so each finite sum approaches its true value as d approaches infinity; and the left and right sides approach their true values as well. At each step, the left side is less than or equal to the right side. The same relationship holds in the limit. Therefore, it is enough to prove holder's inequality when μ_j is rational, and as mentioned above, μ_j can be pulled out, once we rewrite the finite sums in terms of the common denominator d. This leaves the following.
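Here is a sketch of the limiting step, with made-up region measures and values (all names are illustrative). It rounds each μ_j down to a multiple of 1/d with d = 2^k and evaluates both sides of the inequality raised to the pq. Both sides settle toward their true values, and the inequality holds at every step. The exponents p = 3 and q = 3/2 are conjugate.

```python
import math

def holder_power_sides(mu, u, v, p, q):
    """Both sides of (sum mu_j u_j v_j)^pq <= (sum mu_j u_j^p)^q * (sum mu_j v_j^q)^p."""
    lhs = math.fsum(m * a * b for m, a, b in zip(mu, u, v)) ** (p * q)
    rhs = (math.fsum(m * a ** p for m, a in zip(mu, u)) ** q
           * math.fsum(m * b ** q for m, b in zip(mu, v)) ** p)
    return lhs, rhs

def dyadic_approx(mu, k):
    """Round each measure down to a multiple of 1/2**k, the common denominator d."""
    d = 2 ** k
    return [math.floor(m * d) / d for m in mu]

mu = [1 / 3, math.sqrt(2) / 3, 1 / math.pi]   # example region measures
u, v = [1.0, 2.0, 0.5], [3.0, 0.25, 1.5]
p, q = 3.0, 1.5                                # conjugate: 1/3 + 1/1.5 = 1
for k in (2, 8, 20, 40):
    print(k, holder_power_sides(dyadic_approx(mu, k), u, v, p, q))
print("limit", holder_power_sides(mu, u, v, p, q))
```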

(∑ u_jv_j)^pq ≤ (∑ u_j^p)^q × (∑ v_j^q)^p

If you expand (a+b)^p by the binomial theorem, the result is a^p+b^p plus some other stuff that is nonnegative, as long as a and b are nonnegative. So we only make things smaller by pulling the exponent down to each term. That's clear when p is an integer; let's prove it when p is real.

Let f be a function on the nonnegative reals with f(0) = 0, whose first and second derivatives are positive for x > 0, such as x^p. (Remember that p > 1. When p = 1 we have equality.) Suppose 0 < a ≤ b, let c = a+b, and suppose, for contradiction, that f(a)+f(b) ≥ f(c). Draw a line from b,f(b) to c,f(c). Its slope, (f(c)-f(b))/a, is at most f(a)/a. By the mean value theorem, f′(e) ≤ f(a)/a for some e strictly between b and c. At the same time, since f(0) = 0, f′(d) = f(a)/a for some d strictly between 0 and a. Since f′′ is positive, f′ is strictly increasing, and d < a ≤ b < e gives f′(d) < f′(e) ≤ f(a)/a = f′(d), which is a contradiction. Therefore, f(a) + f(b) < f(a+b). If we allow a or b to be 0, we have f(a) + f(b) ≤ f(a+b).

Given three values, write f(a+b) + f(c) ≤ f(a+b+c). Replace f(a+b), and get f(a) + f(b) + f(c) ≤ f(a+b+c). By induction, the same holds for any finite sum.
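The lemma is easy to spot-check. The sketch below (function name hypothetical) measures how much the left side exceeds the right. The gap is positive when two or more terms are positive and p > 1, and 0 when p = 1 or only one term is nonzero.

```python
import math

def superadditive_gap(values, p):
    """(a_1 + ... + a_n)**p - (a_1**p + ... + a_n**p); nonnegative when every a_j >= 0 and p >= 1."""
    return math.fsum(values) ** p - math.fsum(a ** p for a in values)

for values, p in [([1.0, 1.0], 2.0), ([0.5, 1.5, 2.0], 2.5), ([2.0, 3.0], 1.0), ([0.0, 4.0], 3.7)]:
    print(values, p, superadditive_gap(values, p))
```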

Review the earlier inequality. It is tempting to apply this relationship to make the right side (possibly) smaller, turning (∑ u_j^p)^q into ∑ u_j^pq and (∑ v_j^q)^p into ∑ v_j^qp, and then to prove (∑ u_jv_j)^pq ≤ (∑ u_j^pq) × (∑ v_j^qp). Unfortunately that inequality is false. Take two regions, let u = v = 1 on both, and set p = q = 2. The left side is 2^4 = 16, while the right side is 2 × 2 = 4. The multinomial expansion of the left side contains mixed terms, such as (u₁v₁)²(u₂v₂)² with coefficient 6, while ∑ u_j^pq contains only pure powers like u₁^pq, so the mixed terms have nothing to match on the right.

Instead, use young's inequality: for nonnegative a and b, ab ≤ a^p/p + b^q/q. If a or b is 0 this is clear. Otherwise, log is concave, and 1/p and 1/q are positive weights that sum to 1, so log(a^p/p + b^q/q) ≥ (1/p)log(a^p) + (1/q)log(b^q) = log(ab).

Let A = (∑ u_j^p)^(1/p) and B = (∑ v_j^q)^(1/q). If A = 0 then every u_j is 0, both sides of the earlier inequality are 0, and there is nothing to prove; the same holds if B = 0. Otherwise apply young's inequality to u_j/A and v_j/B, and add up over j.

∑ u_jv_j/(AB) ≤ ∑ u_j^p/(pA^p) + ∑ v_j^q/(qB^q) = 1/p + 1/q = 1

Thus ∑ u_jv_j ≤ AB. Raise both sides to the pq, and recover the earlier inequality. That completes the proof.

Minkowski's Inequality

Minkowski's inequality is the triangular inequality applied to Lp space. This will complete the characterization of norm_p as a norm, whence Lp space is indeed a normed linear space. The proof is based on holder's inequality, so here we go.

First, consider p = 1, and verify norm₁(f+g) ≤ norm₁(f) + norm₁(g). Norms are simply integrals, and |f+g| is bounded by |f|+|g|, so the theorem is true by domination.

Given p > 1, choose q so that 1/q + 1/p = 1. We are interested in the norm of f+g, which brings in the integral of |f+g|^p. (Here |·| is the traditional absolute value.) Rewrite the integrand as |f+g|^(p-1) times |f+g|. The second factor can only increase if we change it to |f|+|g|. (This is the triangular inequality for real numbers.) The new integrand dominates the old, and the integral can only increase.

Separate this into the sum of two integrals: |f+g|^(p-1) times |f|, and |f+g|^(p-1) times |g|.

Look at the first integral. The integrand is a product of functions, namely |f+g|^(p-1) and |f|. This happens to be norm₁(|f+g|^(p-1)×|f|). By holder's inequality, we can replace it with a product of norms, and find something at least as large. Take norm_q of the first factor, and norm_p of the second.

{∫ |f+g|^((p-1)q)}^(1/q) × {∫ |f|^p}^(1/p)

Remember that 1/p + 1/q = 1, hence p+q = pq. Expand (p-1)×q and get pq-q = p. At the same time, 1/q = 1-1/p.

{∫ |f+g|^p}^(1-1/p) × norm_p(f)

We already proved f+g lives in Lp space, so we're ok. Now, the first factor is rather awkward, so call it z. Remember that there were two integrals, and we only expanded the first. Expand them both, and get an inequality like this.

∫ |f+g|^p ≤ z×norm_p(f) + z×norm_p(g)

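If z = 0, then f+g is 0 almost everywhere and there is nothing to prove. Otherwise divide through by z. The integral of |f+g|^p, divided by its own 1-1/p power, leaves its 1/p power, which is norm_p(f+g). The result is minkowski's inequality.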

norm_p(f+g) ≤ norm_p(f) + norm_p(g)

Isn't that beautiful! That completes the characterization of Lp space as a normed linear space.
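A numerical spot-check on simple functions follows, using the same hypothetical representation as before (values on regions plus region measures, made-up numbers):

```python
import math

def norm_p(values, measures, p):
    """p norm of a simple function equal to values[j] on a region of measure measures[j]."""
    return math.fsum(m * abs(x) ** p for x, m in zip(values, measures)) ** (1 / p)

mu = [0.4, 1.0, 2.5, 0.1]          # region measures (example values)
f = [3.0, -1.0, 0.5, 4.0]
g = [-2.0, 2.5, 0.5, -1.0]
f_plus_g = [a + b for a, b in zip(f, g)]
for p in (1.0, 1.5, 2.0, 4.0):
    print(p, norm_p(f_plus_g, mu, p) <= norm_p(f, mu, p) + norm_p(g, mu, p))
```

For p < 1 the same formula violates the inequality. With f = [1, 0], g = [0, 1], unit measures, and p = 1/2, it gives 4 for f+g against 1 + 1 = 2, which is one reason p ≥ 1 is assumed.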

Complete

The norm turns Lp space into a metric space; is it complete? Let f₁ f₂ f₃ … be a cauchy sequence in Lp space, so that norm_p(f_i-f_j) approaches 0 as i and j approach infinity. Remember that infinite values have been squashed down to 0, so that each f_i maps X into the reals. For any x in X, consider the sequence f_i(x). If this is cauchy, let g(x) be the limit. (There is always a limit, since the reals are complete.) If the sequence is not cauchy, let g(x) = 0.

Let e_n be the supremum of |f_i-f_j| for i and j ≥ n. The absolute value of the difference between any f_i and f_j is a measurable function. Take the max of two such differences and find another measurable function, since max is measurable. Take the max of this function with another difference, and so on through a countable set of differences. This is a monotonically increasing sequence of measurable functions, and its limit, e_n, is measurable, since a pointwise limit of measurable functions is measurable.

Note that e_n is nonincreasing as n grows, since each e_n is a supremum over fewer pairs than the one before. Since e_n is nonnegative, it approaches a limit that I will call c. A descending sequence of measurable functions has a measurable limit, just like an ascending sequence, hence c is measurable.

Note that c measures the convergence, or cauchiness, of the sequence f_n(x) at every point x. If c(x) is 0, then e_n(x) approaches 0, the differences beyond f_n approach 0, and f_n(x) is a cauchy sequence. Conversely, if e_n(x) approaches y, for y > 0, then for every n there are i and j ≥ n with |f_i(x)-f_j(x)| > y/2, and the sequence is not cauchy at x.

Let D be the set of points x in X with c(x) > 0. These are the points where f_n(x) is not cauchy. Since c is measurable, D is a measurable set. The non-cauchy region is measurable.

From here I will merely outline the proof. Perhaps you can fill in the details.

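D need not have measure 0 for the original sequence. On [0,1], let the f_n be the indicator functions of [0,1], [0,1/2], [1/2,1], [0,1/4], [1/4,1/2], and so on. Their p norms approach 0, so they form a cauchy sequence; yet at every x, f_n(x) is 1 infinitely often and 0 infinitely often, so D is all of [0,1]. Therefore, pass to a subsequence first, still called f_n, with norm_p(f_{n+1}-f_n) < 1/2^n, and apply the above to this subsequence.

Let G = |f₁| + ∑ |f_{n+1}-f_n|. By minkowski's inequality and the monotone convergence theorem, norm_p(G) ≤ norm_p(f₁) + 1, so G is finite almost everywhere. Wherever G is finite, the series f₁ + ∑ (f_{n+1}-f_n) converges absolutely, and its partial sums are the values f_n(x), so f_n(x) is cauchy. Hence D has measure 0.

Set f_n(D) = 0 for all n. Since D has measure 0, this does not change any of the integrals.

Set g(D) = 0, so that f_n approaches g everywhere.

Every |f_n| is bounded by G, and so is |g|, hence |f_n-g|^p is bounded by 2^p G^p, which has a finite integral. Apply the dominated convergence theorem, and the integral of |f_n-g|^p approaches 0. In other words, g is in Lp space, and norm_p(f_n-g) approaches 0. The subsequence converges to g, and a cauchy sequence with a convergent subsequence converges to the same limit, so the original sequence converges to g. Lp space is complete, and since it is also a normed linear space, it is a banach space.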
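A caution that can be checked numerically: a cauchy sequence in Lp need not converge pointwise outside a set of measure 0, which is why the standard proof builds the limit from a rapidly converging subsequence. The sketch below (names hypothetical) encodes the sequence of indicator functions of [0,1], [0,1/2], [1/2,1], [0,1/4], …. Their p norms shrink to 0, so the sequence is cauchy, yet at x = 0.3 the values 0 and 1 both keep recurring.

```python
def typewriter_interval(n):
    """Endpoints of the interval whose indicator is f_n (n = 0, 1, 2, ...):
    [0,1], [0,1/2], [1/2,1], [0,1/4], [1/4,1/2], ..."""
    level = 0
    while n >= 2 ** level:          # skip the 2**level intervals of each finished level
        n -= 2 ** level
        level += 1
    width = 1 / 2 ** level
    return n * width, (n + 1) * width

def indicator_norm(n, p):
    """p norm of f_n under lebesgue measure on [0,1]: (interval length)**(1/p)."""
    left, right = typewriter_interval(n)
    return (right - left) ** (1 / p)

def value_at(n, x):
    """f_n(x): 1 on the closed interval, 0 elsewhere."""
    left, right = typewriter_interval(n)
    return 1 if left <= x <= right else 0

print(indicator_norm(1500, 2))                                 # norms shrink toward 0
print(sorted({value_at(n, 0.3) for n in range(1000, 2000)}))   # both 0 and 1 still occur
```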

Coming Soon

There is much more, including L∞ space, and the completion of measures. I may get round to these theorems, but for now, I'm kinda bored with this topic, and I need to move on to something else.