The Hahn Banach Theorem

Banach Spaces, The Hahn Banach Theorem

Functional

A functional f is a linear map (respecting scaling and addition) from a real vector space S into the reals.  If you think of S as an R module, then a functional on S belongs to the dual of S.  The following theorem extends a functional from a subspace T up to a larger space S.

The Hahn Banach Theorem

Let S be a normed vector space and let T be a linear subspace of S.  If f is a linear functional from T into the reals satisfying f(x) ≤ |x|, then f can be extended to all of S, with the same constraint f(x) ≤ |x|.  This is the Hahn (biography) Banach (biography) theorem.

By zorn's lemma, let U be the largest subspace with f(U) extending f(T).  Suppose U is not all of S, so that y is a point in S-U.  For any two points U1 and U2 in U:

f(U1) + f(U2) =
f(U1+U2) ≤
|U1+U2| =
|U1-y + U2+y| ≤
|U1-y| + |U2+y|

Put this all together and get this.

f(U1) + f(U2) ≤ |U1-y| + |U2+y|

f(U1) - |U1-y| ≤ |U2+y| - f(U2)

View the left side as a function of U, and the right side as a function of U, and consider the respective images of U in R.  The image of the left function is below, or at worst shares a boundary point with, the image of the right function.  Choose a real number e between these images.  If the two images share a boundary point, let e be this boundary point.

Set f(y) = e.  By linearity, this defines f on the span of U and y.

Let's check our constraint.  In the following, x is any point in U, and c is positive.  Verify the top half of S, x+cy, then the bottom half of S, x-cy.

f(cy+x) =
ce+f(x) =
c × (e + f(x/c)) ≤
c × (|x/c+y| - f(x/c) + f(x/c)) =
c × |x/c+y| =
|x+cy|

f(x+cy) ≤ |x+cy|

f(x-cy) =
-ce+f(x) =
c × (-e + f(x/c)) ≤
c × (|x/c-y| - f(x/c) + f(x/c)) =
c × |x/c-y| =
|x-cy|

f(x-cy) ≤ |x-cy|

We have extended f to a subspace beyond U, and this contradicts the maximality of U.  Therefore f extends to all of S, and f(x) ≤ |x| everywhere.

Extending f through a Monoid of Operators

In the above, the selection of e might be forced, but more often there is a gap between the left image and the right image, providing some wiggle room.  We can generalize this theorem by adding more requirements to f.  This may close the gap, but it is still possible to select e, and extend f to all of S.

Let S, T, and f(T) be as above.  In addition, there is an abelian monoid of bounded linear operators from S into itself.  Don't worry about the word monoid; it just means operators can be composed in the usual way.  However, it's an abelian monoid, so two operators can be composed in either order, and the result is the same.  If they are implemented by matrices, for example, the matrices must commute.  That's not typical, but there are sets of matrices that commute, such as the symmetric matrices.  Multiply any two symmetric matrices, in either order, and get the same result, which happens to be another symmetric matrix.

Assume these commuting operators are bounded by 1, i.e. they never expand distance in S.  Since the composition of two such operators is still bounded by 1, we're all right.

The operators also map T into itself, and preserve the value of f.  If a is one of our operators, write the constraints this way.

|a(x)| ≤ |x|
x ∈ T → a(x) ∈ T
x ∈ T → f(a(x)) = f(x)

There is an extension of f to all of S, with f(x) ≤ |x|, and f(a(x)) = f(x), for every x in S and all the operators a in our monoid.

Wow - that was just the statement of the theorem - now for the proof.

Consider all finite sets of operators taken from the monoid.  Given a set of n operators, find the n images of x, add them up, take the norm in S, and divide by n.  Let q(x) be the lower bound of this "average", across all finite sets of operators.  Since norm is at least 0, the average is at least 0, and the greatest lower bound is well defined.

The monoid includes the identity map.  Select this single operator, and the average norm is simply |x|.  Therefore q(x) ≤ |x|.

We will see that q(x) has most of the properties of a norm.  Since q is derived from the (true) norm, q(cx) = |c|×q(x).  Showing the triangular inequality requires more work.

Choose a finite set of n operators that keeps the average of x below q(x)+ε.  Find a second set of m operators that keeps the average of y below q(y)+ε.  Now build a set of m×n operators that is the cross product of these, an operator from the first set composed with an operator from the second.  This is a finite set from our monoid, so compute the sum, then take the norm, then divide by mn.

The operators are linear, so replace the sum with two sums, using x, and then y.  Instead of taking the norm of the entire sum, take the norm of the first sum, over x, then take the norm of the second sum, over y, then add these norms together, then divide by mn.  Thanks to the triangular inequality, this can only make things bigger.

The monoid is abelian, so we can apply the operators in either order.  When the sum is applied to x, apply the n operators from the first set and add up the images of x.  Call this intermediate result w.  By assumption, |w|/n < q(x)+ε.  Now each operator from the second set is applied to w, and the images are added.  Once again, things only get bigger if we run w through each operator in turn, take norms, and add up the results.  Since the operators are bounded by 1, the norm is not increased by any operator in the second set.  We can just skip that step.  Thus the norm of the sum over x is below mn×(q(x)+ε).

Similar reasoning shows the norm of the sum over y is below mn×(q(y)+ε).

Add these together and divide by mn, and the result is below q(x)+q(y)+2ε.  Let ε approach 0, and q(x+y) ≤ q(x)+q(y).

Watch what happens when x lies in T.  Consider a finite set of n operators.  Let w be the sum of the images of x under these operators.  Since the operators do not change the value of f, f(x) = f(w)/n.  We also know that f(w) ≤ |w|.  Therefore f(x) ≤ |w|/n.  This holds for all finite sets of operators from the monoid, hence f(x) ≤ q(x).

Go back up to the top of this page, and apply the Hahn Banach theorem, using q(x) in place of the norm of S.  You'll see that q has all the properties we need - the triangular inequality, scaling by positive constants, and f(x) ≤ q(x) on T.  Therefore f extends to a linear function on S with f(x) ≤ q(x).

Since q(x) ≤ |x|, we have f(x) ≤ |x|.  We only need show f is preserved by the operators in our monoid.

Let a() be any operator, and x any element of S.  Select a finite set of n operators 1, a, a2, a3, … an-1.  Apply these operators to x, add up the images, take the norm, and divide by n.  This gives the "average", and it bounds q(x).  This holds for any x, so apply the inequality to a(x)-x.  The left side becomes q(a(x)-x).  The right side telescopes down to two terms, and looks like this.

q(a(x)-x) ≤ |an(x)-x| / n

The right side only gets bigger if we replace the norm with |an(x)|+|x|.

Since a is bounded by 1, |an(x)| is no larger than |x|.  This gives 2×|x|/n, which approaches 0 for large n.  Therefore q(a(x)-x) ≤ 0.

Since f is bounded by q, f(a(x)-x) ≤ 0.  The same is true of -x, and since all functions are linear, we can pull -1 out, giving -f(a(x)-x) ≤ 0.  This means f(a(x)-x) ≥ 0.  combine these results and f(a(x)-x) = 0.  This implies f(a(x)) = f(x), and that completes the proof.