A software engineer will write a subroutine once, then use it over and over again in many computer programs.  In the same way, mathematicians like to generalize, so that a theorem can be proved once and used again and again thereafter.  Let's be honest - most of this book is a compendium of useful definitions and theorems, somewhat like the standard C library.  But it is possible to overgeneralize, is it not?  The integers, for instance, form a ring, and a ring is about as abstract as it gets.  A ring has addition and multiplication, with associative and distributive properties, and that's it.  A ring, by itself, doesn't have enough structure to prove unique factorization.  We really want a layer of abstraction somewhere between the integers and the ring.  A euclidean domain, for instance, encompasses more than just the integers, yet every euclidean domain supports unique factorization.  This is the sweet spot for many theorems.

n dimensional space has real valued coordinates, like x y and z, and a notion of distance, and a topology of open sets based on open balls.  At the other extreme, a topological space is any set with certain subsets designated as open, provided these open sets follow a few rules, as described in the previous chapter.  This is a bit like the ring - as general as it gets.  Fortunately, there is an intermediate level of abstraction that supports more theorems, a category of spaces that is a bit closer to real space.

A metric space has a notion of distance, like real space, but it could have nothing to do with real variables.  The p-adic numbers form a metric space, even though they are based on the integers mod p.  There is a distance metric, and a topology based on open balls.  The metric space is an intermediate layer of topological abstraction, and it is the sweet spot for more than a few theorems.

A metric is a distance function on a set of points, mapping pairs of points into the nonnegative reals.  You can use functional notation, d(x,y), or you can write |x,y| for the distance between x and y, with |x| giving the distance from x to the origin.  The metric must obey three properties:

|x,y| ≥ 0, with |x,y| = 0 iff x = y
|x,y| = |y,x| (symmetry)
|x,z| ≤ |x,y| + |y,z| (the triangular inequality)
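The three axioms can be spot-checked numerically.  Here is a small Python sketch, using the standard p-adic distance on the integers, where |x| is p to the power −v and p to the power v is the largest power of p dividing x.  The function names are illustrative, not standard.

```python
import itertools

def padic_abs(x, p=5):
    """Standard p-adic absolute value on the integers: |0| = 0,
    otherwise p**(-v), where p**v is the largest power of p dividing x."""
    if x == 0:
        return 0.0
    v = 0
    while x % p == 0:
        x //= p
        v += 1
    return p ** (-v)

def padic_dist(x, y, p=5):
    """The induced metric: distance is the p-adic size of the difference."""
    return padic_abs(x - y, p)

# Check the three metric axioms on a small range of integers.
pts = range(-12, 13)
for x, y, z in itertools.product(pts, repeat=3):
    d = padic_dist(x, y)
    assert d >= 0 and (d == 0) == (x == y)          # positivity
    assert d == padic_dist(y, x)                    # symmetry
    assert padic_dist(x, z) <= padic_dist(x, y) + padic_dist(y, z)  # triangular inequality
print("all three axioms hold")
```

In fact the p-adic metric satisfies a stronger "ultrametric" triangular inequality, but the ordinary one is all we need here.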

A pseudo metric may have |x,y| = 0, even when x ≠ y.  For a given point x, all the points that are 0 distance from x are 0 distance from each other by the triangular inequality.  Thus each point is a representative of a set of points, all zero distance apart, and all occupying the same location in our topological space.  Clump all these points together into a single point.  The distance between two separate clumps is the distance between a point in the first and a point in the second.  Use the triangular inequality to show this is well defined; it doesn't matter which point is selected on either side.  The quotient set, the set of zero-distance aggregates, now has a valid metric.  Map the first space onto the second by carrying each point to its zero-distance cluster.  Open balls in both worlds correspond, hence the function is bicontinuous and onto, giving a quotient space.  I will assume this "clumping" has already been accomplished, hence d() is a metric, not a pseudo metric, and the points of our metric space are topologically distinct.

If a set of points has a valid metric, as described in the previous section, then the set has an induced topology.  The set, with its metric topology, is called a metric space.  The topology is also called the open ball topology, and is defined as follows.

For each point p and each positive ε, let the set of points satisfying |x,p| < ε be a base open set.  This is the open ball centered at p with radius ε.  We need to show this is a valid base.  Let p be a point inside two balls, with radii q and r.  Thus p is within distance q of the center of the first ball, and within distance r of the center of the second.  Let d be the difference between q and the distance from p to the center of the first ball.  Think of d as the distance from p to the edge of the first ball.  Find a second such distance separating p from the edge of the second ball.  Let ε be smaller than both of these distances.  Now all the points within ε of p are contained within the two larger balls, thanks to the triangular inequality.  This establishes a valid base, and a valid topology.

Don't forget to define the empty set and the entire space as open; they aren't characterized by open balls.

You can restrict radii to rational numbers; the topology is unchanged.  Consider an open ball with radius r, where r is an irrational number.  Every point p in the ball is a certain distance away from the edge of the ball, and can be enclosed in a smaller ball with a rational radius.  Thus our ball of radius r is covered by open sets, and remains open, reproducing the original topology.

A metric space, like any space, is separable if it possesses a countable dense set.  This is equivalent to having a countable base.  Let the metric space T have a countable dense set D.  Let the base be the set of open balls with centers at D and rational radii, a countable set.  We need to show that such a base covers every open set.  Take any point x in an arbitrary open set W, build a ball of radius e about x, such that the ball is inside W, find a point y from D within e/3 of x, choose a rational radius between e/3 and e/2, and construct a ball, centered at y, that contains x, and is wholly contained in W.  Thus W is covered by base sets, and the metric space has a countable base.

Conversely, given a countable base, choose any point from each base set to build the dense set D.  Every open set contains a base set, and consequently, a point from D.

The discrete topology can be given the metric 1 or 0, for points different or the same respectively.  Every point is an open set, with ε set to ½.  More generally, whenever |x,y| is bounded below by some positive ε, where ε may depend on x, every point is open and the topology is discrete.

If a metric exists for a given space and a given topology, and that metric defines the same topology, then the space is metrizable.  We may as well think of it as a metric space.

If f() is a distance metric, and g() is a monotonically increasing real-valued function, with g(0) = 0 and g′′ ≤ 0, then g(f()) also determines a metric.  An example is g(x) = log(x+1).  The only tricky part is the triangular inequality.

Let a b and c be the lengths of the sides of a triangle, thus c ≤ a+b.  Since g is increasing, g(c) ≤ g(a+b).  We only need show that g(a) + g(b) is at least as large as g(a+b).

Hold a fixed and let x be a continuous variable that starts at 0 and increases through b and beyond.  Let h(x) = g(a) + g(x) - g(a+x).  Note that h(0) = 0.  The derivative of h is g′(x) - g′(a+x).  If this is negative then g′(a+x) is larger than g′(x).  By the mean value theorem, g′′ is positive somewhere between x and a+x.  Yet g′′ is never positive, hence h′ is never negative.  Again, the mean value theorem tells us h can never get smaller, else h′ would be negative.  Thus h(b) is at least as large as h(0), which is 0.  Thus g(a)+g(b) ≥ g(a+b), and the triangular inequality holds.
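The argument above can be sanity-checked numerically.  This sketch assumes the ordinary metric |x − y| on the real line and takes g(t) = log(1 + t), which is increasing, concave, and 0 at 0:

```python
import math
import random

def g(t):
    # increasing, g(0) = 0, concave (second derivative <= 0)
    return math.log(1 + t)

def d(x, y):
    return abs(x - y)

def dg(x, y):
    # the transformed metric g(f()), with f the usual distance on the reals
    return g(d(x, y))

random.seed(1)
for _ in range(10000):
    x, y, z = (random.uniform(-100, 100) for _ in range(3))
    # triangular inequality for the transformed metric (tiny float tolerance)
    assert dg(x, z) <= dg(x, y) + dg(y, z) + 1e-12
print("triangular inequality survives g")
```

A random search proves nothing, of course, but it is a quick way to catch a false conjecture before investing in a proof.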

If g is strictly increasing, with g′ = 0 only at isolated points (if ever), then g is invertible.  The set of points that are less than ε distance from a given point p, under the original metric, can also be described as the set of points that are less than g(ε) distance from p, in the new metric.  The open balls are the same, and the topology is the same.

If you want to make an unbounded space bounded, set g(x) = x/(1+x), or any other bounded, increasing, concave function.

Within the xy plane, the euclidean distance function acts as a metric, and defines a metric space.  The base for the topology consists of open disks of all radii and centers.  However, many other bases are possible.  For instance, one can use squares instead of circles.  Build a base using the interior of squares of all sizes, and at all locations throughout the plane.  Given an open disk, and a point p inside that disk, p is always a certain distance from the edge, and can be enclosed in one of our squares, which is wholly contained in the disk.  Thus the disk is an open set, and similarly for all other disks.  By the same reasoning, each open square can be covered by open disks.  That reproduces our original topology.

Generalizing to higher dimensions, a topology based on blocks, i.e. hypercubes, is the same as the topology based on open balls.  A ball can always be covered with tiny blocks, and an open block can be covered with open balls.

In fact we can use any shape at all, as long as it is open and bounded.  Draw the shape of a turtle in the plane, and build a base consisting of the interior of turtles, all sizes, all locations.  Sure enough, each open disk can be covered by open turtles, and each turtle can be covered by open disks, so the topologies are the same.  Besides, you can tell your friends, "It's turtles all the way down."

Given two metric spaces, let the set S be the direct product of the two spaces, i.e. points in the first cross points in the second.  Given two points in the product space, compute their distance as follows.  Take the distance between the components in the first metric space and square it, then add the square of the distance between the components in the second metric space, then take the square root.  The obvious example is the xy plane.  The x and y axes are metric spaces on their own, and their product gives the xy plane, as a set, with a metric that is the same as euclidean distance.  The distance = the square root of the change in x squared + the change in y squared.  It's just the Pythagorean theorem.

Take the product of a dozen axes, and find 12 dimensional space with the usual distance metric.  As a variation on this theme, take the product of various dimensional spaces, some lines, some planes, etc, and the result is n space with the usual distance metric, where n is the sum of all the dimensions.  The distance is the square root of the sum of (xi−yi)², no matter how you slice it.
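As a sketch, the product metric might be coded like this, with each component space supplying its own distance function (the names are illustrative):

```python
import math

def product_dist(dists, u, v):
    """Distance in a finite product of metric spaces: the square root of
    the sum of the squares of the component distances.
    dists is a list of distance functions, u and v are tuples of points,
    one component per factor space."""
    return math.sqrt(sum(d(a, b) ** 2 for d, a, b in zip(dists, u, v)))

# The xy plane as the product of two copies of the real line.
line = lambda a, b: abs(a - b)
p, q = (0, 0), (3, 4)
print(product_dist([line, line], p, q))   # the 3-4-5 triangle: prints 5.0
```

Feeding in a dozen copies of `line` would give the usual metric on 12 dimensional space, exactly as described above.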

There are still a couple of issues to deal with.  Is the new distance function a valid metric?  Realize that the distance function is the same, whether we take a finite product of metric spaces all at once, or whether we build the product sequentially, bringing in one space at a time.  Either way you get the square root of the sum of the squares of the individual distances.  Therefore we only need prove our results for the product of two spaces, and it will generalize to all finite products.

The only tricky part is the triangular inequality.  Assume a+b ≥ c, three distances in the first space that obey the triangular inequality.  Similarly, x+y ≥ z in the second space.  We would like to prove the following.

sqrt(a² + x²) + sqrt(b² + y²) ≥ sqrt(c² + z²)

The right side can only increase if we replace c with a+b, and z with x+y.  Let's prove this more ambitious inequality.  Square both sides; this preserves the direction of inequality.  The terms a², b², x², and y² drop out, leaving the following.

sqrt(a²b² + a²y² + x²b² + x²y²) ≥ ab + xy

Square again to get this.

a²y² + b²x² ≥ 2abxy

Move everything to the left, and the result is (ay−bx)², a perfect square, which is greater than or equal to 0, and that is always true.  The distance function is a metric, and the product is a metric space.
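The inequality just proved can be spot-checked on random triangles.  Here both component spaces are the real line, so the side lengths a, b, c and x, y, z automatically satisfy the triangular inequality in each factor:

```python
import math
import random

random.seed(2)
for _ in range(10000):
    # three points in each of two metric spaces (here, two copies of the real line)
    p1, p2, p3 = (random.uniform(-10, 10) for _ in range(3))
    q1, q2, q3 = (random.uniform(-10, 10) for _ in range(3))
    a, b, c = abs(p1 - p2), abs(p2 - p3), abs(p1 - p3)   # a + b >= c
    x, y, z = abs(q1 - q2), abs(q2 - q3), abs(q1 - q3)   # x + y >= z
    lhs = math.sqrt(a*a + x*x) + math.sqrt(b*b + y*y)
    rhs = math.sqrt(c*c + z*z)
    assert lhs + 1e-9 >= rhs   # the product-metric triangular inequality
print("product triangular inequality verified")
```

The small tolerance absorbs floating point rounding; the mathematical inequality itself is exact.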

But there's another problem; we've already defined the topology for a product space.  Open sets in the product are the product of open sets in the individual spaces.  Does this agree with the topology induced by our metric?  It does.  I'll illustrate with the xy plane.  This is the product of the x and y axes, and the standard product topology has a base of open rectangles, all sizes, all locations.  This is what you get when you cross x intervals with y intervals, the base sets for the x and y axes.  Take the product of the x and y axes as metric spaces, and the base becomes open disks, all sizes, all locations.  In the last section we proved these two topologies are the same.  Disks can be covered with rectangles and vice versa.  This is harder to visualize when the product is not as simple as the xy plane, but the idea is the same.

Start with a ball of radius r about the point p,q, and let v be a point in this ball.  There is a distance e from v to the edge of the ball, so put v at the center of a ball with radius e.  Allow each component to stray no farther than e/2 from v1 and v2 respectively.  Points in this box, this cross product, are less than e distance from v.  In fact they are bounded by e/sqrt(2).  The ball centered at p,q has been covered with open boxes, and is open in the product topology.

Conversely, consider the box whose first component is within r of p, and second component is within r of q.  Put v in the box, whence v is a certain distance e from either edge.  Place v in a ball of radius e, and the ball is wholly contained in the box.  The box is covered by open balls, and the topologies are the same.

In the last section we proved the triangular inequality for n dimensional space, but that was rather indirect.  Finite products of arbitrary metric spaces produce a valid metric, and when this general result is applied to coordinate axes you get n dimensional space with its euclidean metric.  Hence the triangular inequality is valid in Rn.  But that proof requires a background in point-set topology, and it doesn't address the Cauchy Schwarz inequality.  In this section I will prove the triangular inequality and the Cauchy Schwarz inequality using pure algebra.

Remember that the triangular inequality says that one side of a triangle is never longer than the sum of the other two sides.  Well this is obvious really.  Try drawing a triangle whose sides are 2 3 and 17.  But sometimes we need to prove the obvious, because sometimes the obvious isn't true in higher dimensions or in spaces that are substantially different from the world around us.  An electron obviously can't pass through two slits at once, but it does.  So let's prove the triangular inequality in n dimensions.  You'll sleep better at night.

This is a simple, follow-your-nose proof, where you assume the triangular inequality, do some algebra, and run into something that has to be true.  Since the steps are reversible, the original inequality is true.

This takes place in the xy plane, where the algebra is simpler, but the results hold in n dimensions nonetheless.  Start with any triangle in n space, and move it down into the xy plane through a rotation, which does not change lengths or angles.  Prove the theorem in R2, and you have proved it for Rn as well.

Given any triangle in the xy plane, move the triangle, or relabel the coordinates, so that one of the three corners is at the origin, another is at a,0 (on the x axis), and the third is at b,c.  Now the sides have lengths:

a
sqrt(b²+c²)
sqrt((a−b)²+c²)

We want to show that the first distance plus the second is at least as large as the third.

a + sqrt(b²+c²) ≥ sqrt((a−b)²+c²)

Both sides are nonnegative, and squaring preserves the order of nonnegative numbers.  So we can square both sides without disturbing the inequality.

a² + b² + c² + 2a·sqrt(b² + c²) ≥ a² + b² + c² − 2ab

This simplifies to:

sqrt(b²+c²) ≥ −b

The left side is nonnegative.  If b is positive or zero we're done.  So let b be negative.  Now a and b run in opposite directions.  The right side is the absolute value of b.  The left side would be the same, if c were 0.  If c is nonzero then b² + c² is larger, and its square root is larger than |b|.  The inequality holds.

It is a precise equality when b is negative and c is 0.  This makes sense; just draw the triangle.  It's flat, and the sum of the two lengths equals the third.

The Cauchy Schwarz inequality is equivalent to the triangular inequality, at least in n space.  But Cauchy Schwarz generalizes to other spaces, such as continuous functions on closed intervals.  That's why some textbooks present Cauchy Schwarz first, then bring the triangular inequality in, almost as a corollary.  Let's give it a whirl.

Let a1 a2 … an and b1 b2 … bn be two vectors in n space.  Let p be the sum of ai², let q be the sum of aibi, and let r be the sum of bi².  Using the notation of dot products, p = a·a, q = a·b, and r = b·b.  We will show that q² ≤ pr.  This is the Cauchy Schwarz inequality.

Let x be any real number.  Consider the sum of (aix + bi)².  Since this is a sum of squares, it is 0 only if each term is 0, that is, only if bi = −aix for every i, making b a scalar multiple of a.  For all other values of x, and all other vector combinations, the sum is positive.

Write our nonnegative sum as px² + 2qx + r, where p, q, and r are defined as above.  Note that p and r are sums of squares, and cannot be negative.  If either p or r is 0, then a or b is the zero vector, q is 0, and we have q² = pr.

With p nonzero, the quadratic expression attains its minimum when x = −q/p.  Substitute x = −q/p to get r − q²/p ≥ 0, and multiply through by the positive number p to obtain pr − q² ≥ 0.  This gives q² ≤ pr, which proves the Cauchy Schwarz inequality.
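A numeric spot check of the quadratic argument, with variable names following the text's p, q, r:

```python
import random

random.seed(3)
for _ in range(1000):
    n = random.randint(1, 8)
    a = [random.uniform(-5, 5) for _ in range(n)]
    b = [random.uniform(-5, 5) for _ in range(n)]
    p = sum(ai * ai for ai in a)             # a.a
    q = sum(ai * bi for ai, bi in zip(a, b)) # a.b
    r = sum(bi * bi for bi in b)             # b.b
    assert q * q <= p * r + 1e-6             # Cauchy Schwarz
    # the quadratic px^2 + 2qx + r is nonnegative, even at its minimum x = -q/p
    x = -q / p
    assert p * x * x + 2 * q * x + r >= -1e-6
print("q^2 <= pr holds")
```

Note that for n = 1 the two sides are equal, since any two one dimensional "vectors" are scalar multiples of each other; the tolerance absorbs the resulting floating point noise.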

As mentioned before, the inequality is strict, unless one vector is a scalar multiple of the other.

So what does this have to do with the triangular inequality?  Let a and b be vectors in n space.  We would like to show that the length of a+b is less than or equal to the length of a plus the length of b.  Square |a+b| ≤ |a| + |b|, and cancel the squared terms from each side.  The left side becomes twice the sum of aibi; this was 2q in our earlier notation.  Continuing that notation, the right side becomes 2sqrt(pr).  If q is negative the inequality is automatic, so assume q ≥ 0 and square both sides again to get q² ≤ pr.  Reverse these steps to show Cauchy Schwarz implies the triangular inequality in n space.  Furthermore, the triangular inequality can only be an equality when Cauchy Schwarz is an equality, that is, when a and b are collinear.  Even then, q must be the positive square root of pr; if a and b point in opposite directions, q is negative, and the triangular inequality is strict.  This reaffirms the geometric intuition: the length of a+b equals the length of a plus the length of b exactly when a and b point in the same direction, flattening the triangle.

Let's extend this to domains other than n space.  Verify that the set of absolutely convergent series is a real vector space.  You can add them together, or scale them by real constants.  For any two series a and b, build the product series with terms aibi.  Once the terms of a and b drop below 1 in absolute value, |aibi| ≤ |ai|, hence the product series also converges absolutely.  The same is true when the terms of a series are squared (the special case a = b).

To find the distance between two series, subtract them, giving another absolutely convergent series, square the terms, take the sum, and take the square root.  The distance is 0 only if the two series are identical.

Extend the definitions of p, q, and r above.  We are still adding ai², aibi, and bi², but this time the sums are infinite.  Now compare q² and pr.

Start with the sum of (aix+bi)² as before, but this time the sum is infinite.  Rewrite the sum as px² + 2qx + r ≥ 0.  This rearrangement is possible thanks to absolute convergence.  From here the proof of Cauchy Schwarz, and the triangular inequality, is the same.  Therefore the absolutely convergent series form a metric space.
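As an illustration, here is the distance between two absolutely convergent series, with the infinite sums approximated by truncation.  The truncation level is an artifact of this sketch, not part of the theory; for the geometric series below the neglected tail is vanishingly small.

```python
import math

def series_dist(a, b, terms=200):
    """Distance between two series, given as functions i -> i-th term:
    subtract termwise, square, sum, and take the square root.
    The sum is truncated after `terms` terms for this sketch."""
    return math.sqrt(sum((a(i) - b(i)) ** 2 for i in range(terms)))

a = lambda i: 2.0 ** -(i + 1)      # 1/2 + 1/4 + 1/8 + ...
b = lambda i: 3.0 ** -(i + 1)      # 1/3 + 1/9 + 1/27 + ...
zero = lambda i: 0.0               # the zero series, our origin

# the triangular inequality among the three series
assert series_dist(a, zero) <= series_dist(a, b) + series_dist(b, zero)
print(round(series_dist(a, b), 6))
```

The distance from a to the origin is the square root of 1/4 + 1/16 + …, i.e. sqrt(1/3), which the truncated sum reproduces to machine precision.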

Next consider continuous functions on a closed interval.  The product of two functions is continuous, and integrable.  If p, q, and r are the integrals of a², ab, and b², we can prove Cauchy Schwarz.  Divide the interval into n subintervals, and Cauchy Schwarz holds for each Riemann sum.  The inequality must hold in the limit.  Use continuity to show equality iff the functions a and b are scale multiples of each other.  This gives another metric space, where distance is given by the square root of the integral of (a−b)².
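The Riemann-sum argument can be illustrated numerically; the choice of sin and exp on [0,1] is arbitrary.  Cauchy Schwarz holds exactly for each Riemann sum, since it is just the finite-dimensional inequality applied to the sample values.

```python
import math

def riemann(f, lo, hi, n=10000):
    """Midpoint Riemann sum of f on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

a, b = math.sin, math.exp
lo, hi = 0.0, 1.0
p = riemann(lambda t: a(t) ** 2, lo, hi)        # integral of a^2
q = riemann(lambda t: a(t) * b(t), lo, hi)      # integral of ab
r = riemann(lambda t: b(t) ** 2, lo, hi)        # integral of b^2
assert q * q <= p * r                            # Cauchy Schwarz

# the induced distance between sin and exp on [0,1]
dist = math.sqrt(riemann(lambda t: (a(t) - b(t)) ** 2, lo, hi))
print(dist > 0)    # sin and exp differ, so the distance is positive
```

Refining n drives the Riemann sums toward the true integrals, and the inequality survives the limit, exactly as the text argues.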

This theorem generalizes to piecewise continuous functions on a closed interval.  Leave the functions undefined at the borders, where the continuous pieces meet.  Thus two functions are equal across the entire interval iff their distance is 0.

Sometimes you can measure the distance between functions over an open or infinite interval, using improper integrals, but keep in mind, the product of two functions may not be integrable.  This is illustrated by 1/sqrt(x), which is integrable on (0,1], yet its product with itself, 1/x, is not.

Generalized euclidean space was introduced before, but let's look at it again, as a metric space.  Take the direct sum of arbitrarily many copies of the real line, using zero as the preferred element.  As you recall, every point in this composite space has finitely many nonzero coordinates, all the other coordinates are 0.  Call this space Ej, for generalized euclidean space.  By default, Ej joins a countable collection of real lines together, but higher cardinalities are possible.  Of course, if you join a finite number of real lines together you simply get Rn, i.e. n dimensional space.

Construct a metric on Ej as follows.  Take any two points and note that their coordinates agree almost everywhere.  In fact their coordinates are 0 almost everywhere.  Subtract the coordinates that disagree, square the differences, add them up, and take the square root.  Since two points can only disagree on a finite number of coordinates, this is well defined.
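A sparse representation makes this metric easy to sketch.  Here points are stored as dictionaries from coordinate index to value, with absent coordinates equal to 0; the representation is mine, not the text's.

```python
import math

def ej_dist(x, y):
    """Distance in generalized euclidean space Ej.  Points are dicts
    mapping coordinate index -> value; absent coordinates are 0, so a
    point has only finitely many nonzero coordinates."""
    keys = set(x) | set(y)   # the finitely many coordinates that could disagree
    return math.sqrt(sum((x.get(k, 0.0) - y.get(k, 0.0)) ** 2 for k in keys))

p = {0: 3.0}          # 3 on the first axis
q = {1: 4.0}          # 4 on the second axis
r = {10**9: 2.0}      # nonzero only on a very distant axis, still a valid point
print(ej_dist(p, q))  # the 3-4-5 triangle again: prints 5.0
print(ej_dist(p, r))  # sqrt(9 + 4)
```

Any finite set of points involves only finitely many coordinates, so every such computation really takes place in some Rn, just as the next paragraph observes.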

Take any three points and derive their distances, using the above formula.  Note that all three points live in Rn for some integer n.  Furthermore, our distance metric is identical to the metric on n space, therefore the triangular inequality holds.  The metric is valid across all of Ej.  This is a metric space.  Use open balls to build a base for the topology as usual.

Wait a minute!  The direct sum of spaces already has its own topology.  Does this metric give the same topology?

Let Q be a base open set in Ej, according to the direct sum topology.  Thus Q restricts a finite set of coordinates to various open intervals, and leaves the other coordinates unconstrained.  Let p be a point in this open set, and let p act as the center of an open ball, using our metric.  If Q constrains the ith coordinate, and pi is a distance ri from the nearer end of the open interval, let the radius of our ball be anything less than the smallest ri, across all of the constrained coordinates.  Since there are finitely many intervals, this minimum is positive, and the radius is well defined.  The resulting ball lies entirely inside Q, and Q is covered by open balls.  The metric produces a topology that is as strong as the direct sum topology.

Next suppose some base set Q fits into the ball of radius 1.  Let z be any point in Q and change one of its unconstrained coordinates from 0 to 2.  This is still a point in Q, still in the direct sum, yet it is 2 away from z, hence more than 1 from the center of the ball, and outside our open ball.  No base set fits inside the ball, so Ej has open sets that are not present in the direct sum topology.  Its topology is stronger.

The countable product of metric spaces, using the weak topology, is metrizable.  The metric is, obviously, different from the metric of Ej, which is only defined for the direct sum, and even in that context it produces open sets outside of the topological product.

Let M1 M2 M3 etc be metric spaces, and let P be their product, using the weak product topology.  Compress the metric of each component space Mi, e.g. via g(x) = x/(1+x) as above, so that |x,y| is bounded by 1.  The diameter of Mi, the greatest distance between any two of its points, is now bounded by 1.  Since the component topologies have not changed, the topology of P has not changed either.

Scale the metrics again, so that the diameter of Mi is 2⁻ⁱ.  Each component space is half the size of the space that came before, using the term "half" loosely.

Given two points in P, project them into each Mi, find the distance within each Mi, and add up the results.  The maximum distance between two points is ½ + ¼ + ⅛ + … = 1.  Distance is well defined.

The triangular inequality holds in each component space Mi, hence it holds across each partial sum, and finally, it holds for the infinite sum.  The metric is valid.
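Here is a sketch of the construction.  Each component metric is compressed with x/(1+x), and the i-th component (counting from 0) is scaled by 2 to the power −(i+1), matching the ½ + ¼ + … bound above.  The truncation to finitely many components is an artifact of the sketch.

```python
import math

def compress(d):
    """Bound a metric by 1: d/(1+d) is again a metric, same topology."""
    return lambda x, y: d(x, y) / (1 + d(x, y))

def countable_product_dist(dists, u, v):
    """Metric on a product of metric spaces: compress the i-th metric,
    scale it by 2**-(i+1), and sum.  With countably many components the
    series converges; this sketch just uses the finitely many given."""
    return sum(2.0 ** -(i + 1) * compress(dists[i])(u[i], v[i])
               for i in range(len(dists)))

line = lambda a, b: abs(a - b)
dists = [line] * 10            # ten copies of the real line
u = [0.0] * 10
v = [1.0] * 10
d = countable_product_dist(dists, u, v)
print(0 < d < 1)   # every distance stays below 1/2 + 1/4 + ... = 1
```

Notice how the rapidly shrinking weights make far-out components nearly irrelevant, which is exactly why this metric reproduces the weak product topology rather than something stronger.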

To show the topologies are the same, base open sets under our metric must cover each base open set in the product, and conversely.  Begin with a base open set in P, as defined by the product topology.  Some of the components are restricted to open balls; most are left unconstrained.  Let z be a point in this set.  Let d be the shortest distance that zi might stray, before running into the edge of the open ball in Mi, for those components that are constrained.  Place a ball of radius d about z.  No coordinate of z can stray by more than d, and the ball fits inside the box.  The metric topology covers the product topology.

Now for the converse.  Let z be a point inside an open ball, using our new metric.  Let z be a distance d from the edge of the open ball.  In fact, make z the center of a new ball with radius d.  Find l such that 2⁻ˡ is less than half of d.  If the components beyond l are allowed to run freely, the result will stray from z by a distance of at most 2⁻ˡ, less than half of d.  For the components 1 through l, restrict Mi to a ball about zi with radius d/(2l).  The sum of these deviations is less than half of d.  Put this all together and the points in this base open set, according to the product topology, are less than d away from z, and are wholly contained in the open ball at z.  Each topology implies the other, and the product P is metrizable.

A set S is bounded if there exists m such that |x,y| ≤ m for all x and y in S.  In other words, S does not stretch out to infinity.  If S contains 0, or some special point denoted as the origin, then |x| ≤ m.  Conversely, if |x| ≤ m, then |x,y| ≤ 2m by the triangular inequality.  Thus it is equivalent to assert |x| is bounded.

The diameter of S is the least upper bound of |x,y| for x and y in S.  Note that the diameter of a circle or sphere agrees with its traditional definition.  The diameter of an n dimensional unit cube is sqrt(n).  The diameter of an equilateral triangle, one unit on a side, is 1, even though this triangle will not fit into a circle of diameter 1.  A set with diameter d always fits in a ball of radius d.

A set is totally bounded if it has a finite cover of ε-balls for every ε > 0.  A totally bounded set is always bounded.  If the set is covered by k balls of radius 1, find the ball farthest from the origin, and the far edge of that ball is the bound for the set.

Let's cover a region in Rn with balls of radius ε.  Let t be anything less than half of ε/sqrt(n) and build a lattice, a regular grid, with spacing t.  Each point lies in the center of a cube of width t, and each point is "responsible" for this cube.  The ball of radius ε covers this cube, and then some.  In fact the point can move about within this cube, to coincide with a point of S, and the ball still covers the cube.  If S does not intersect this cube, then you don't need this point, or its ball.  All these balls cover S.  A larger ball of radius r can be covered with finitely many ε balls for any ε, hence every ball, and every bounded set, is totally bounded.
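The lattice construction can be carried out concretely in the plane.  This sketch counts the balls needed to cover the unit square, with the grid spacing chosen per the text, i.e. less than half of ε/sqrt(n):

```python
import math

def cover_unit_square(eps):
    """Cover [0,1]^2 with balls of radius eps centered on a regular grid.
    Grid spacing t is less than half of eps/sqrt(n), so each ball covers
    its own t-by-t cell with room to spare, and finitely many cells
    cover the square."""
    n = 2                                  # dimension of the space
    t = 0.4 * eps / math.sqrt(n)           # lattice spacing, per the text
    cells = math.ceil(1.0 / t)             # cells needed along each axis
    return cells * cells                   # one ball per cell

for eps in (0.5, 0.1, 0.01):
    print(eps, cover_unit_square(eps))     # always a finite count
```

The count grows like 1/ε², but it is finite for every ε, which is the definition of totally bounded.  In Ej, as the next paragraph shows, no finite count works at all.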

Switch to Ej, and it's a different story.  Try to cover the ball of radius 3, centered at the origin, with balls of radius 1.  Build a sequence of points by setting each coordinate in turn to 2, while all the others are 0.  All these points lie within the ball of radius 3, and each point is 2√2 ≈ 2.83 away from the others.  Each must be covered by a separate ball of radius 1, requiring infinitely many balls.  Within this metric space, a bounded set need not be totally bounded.

The point p in a metric space is a limit point of the set s if members of s exist arbitrarily close to p.  This is consistent with the topological definition of limit point.  Every open set that contains p intersects s iff there are points of s arbitrarily close to p.  Remember, the base open sets in a metric space are balls of ever shrinking radii, thus pulling s towards p.

Next let s be a sequence in a metric space.  Recall the definitions of cluster point and limit point when dealing with a sequence.  These definitions have equivalent formulations in a metric space.  A sequence converges to p if for every ε there is an index n such that the entire sequence, beyond index n, is within ε of p.  This is the same as saying every open set containing p contains most of s.  Similarly, p is a cluster point if for every ε there are infinitely many points of s within ε of p.  This is almost the same as saying p is a limit point of the set s.  If s does not contain p, they are in fact the same.  Keep shrinking ε to find new points, closer and closer to p.  This builds an infinite sequence headed towards p.  Of course if s contains p then p is a limit point of s, but need not be a cluster point.

We can now define continuity in terms of limits.  This is probably the definition you've seen before; the definition you are used to.  As you might surmise, it coincides with the topological definition.  Let f be a function from one metric space into another, and let f(p) = q.  Recall that continuity means every open set about q admits an open set about p whose image is wholly contained in the open set about q.  Open sets are now open balls with specific radii, so here is the same definition.  For every ε there is a δ, such that |p,x| < δ implies |q,f(x)| < ε.

Here is yet another equivalent definition of continuity.  The function f is continuous at p if every sequence s in the domain with limit p maps to a sequence t in the range with limit q.  Let's see why this is the same as the δ ε definition.

Assume the distance restriction is in force, and let s be a sequence that approaches p.  Let t be the image sequence and suppose it does not approach q.  Let ε be the radius of a ball about q that does not contain most of t.  That is, the sequence t lies outside the ε ball infinitely often.  However, continuity provides a δ for this ε, and the open ball of radius δ about p contains most of s, since s approaches p.  That means most of t actually lies in the ε ball about q, which is a contradiction.

Conversely, let every sequence with limit p map to a sequence with limit q, and suppose an ε ball about q does not have a δ ball about p in its preimage.  Step δ through the reciprocals of the integers.  At each step there is at least one point within δ of p whose image is more than ε away from q.  These points form a sequence that converges to p, yet the image sequence does not converge to q.  This is a contradiction, hence there is a δ-ball in the preimage of every ε-ball.  The open set definition, the δ ε definition, and the convergent sequence definition are all equivalent.

As before, f is continuous throughout an open region if it is continuous at every point in the region.  This is the definition of regional continuity, but it also means the preimage of every open set is open.

Adopt the δ ε definition of continuity, as described in the previous section.  A function f is uniformly continuous, or uniform, throughout a region R if one δ fits all.  In other words, δ is a function of ε, but does not depend on the point p in the domain.

Every uniform function is continuous, but the converse is not true.  Let f(x) = 1/x on the open interval (0,1).  Choose δ small, as small as you like, and consider the image of the points in the interval (0,δ).  These images climb the y axis without bound, so two points within δ of each other can have images arbitrarily far apart, certainly not within ε.  This function, and any other function that approaches infinity on a bounded region, is not uniform.
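The failure of uniform continuity for 1/x can be demonstrated directly.  Fix ε = 1 and watch every candidate δ fail near 0:

```python
def f(x):
    return 1.0 / x

# For uniform continuity, one delta per epsilon must work everywhere.
# Take eps = 1 and test delta = 1/n near the left end of (0,1):
# the points x = 1/n and y = 1/(2n) are within delta of each other,
# yet |f(x) - f(y)| = n, which exceeds eps for every n > 1.
eps = 1.0
for n in (2, 10, 100, 1000):
    delta = 1.0 / n
    x, y = 1.0 / n, 1.0 / (2 * n)
    assert abs(x - y) < delta          # the two points are delta-close
    gap = abs(f(x) - f(y))             # but their images are n apart
    print(delta, gap)
assert gap > eps                        # so no single delta serves eps = 1
```

At each individual point of (0,1) a suitable δ exists, so f is continuous; it is only the single, uniform δ that cannot be found.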

If f and g are continuous, consider f(g()).  Pull an ε neighborhood in f(g()) back to a δ neighborhood in g(), back to a γ neighborhood in the domain.  Thus the composition of continuous functions is continuous.  But we already knew that; we proved it using open sets.  The new result is that the composition of uniform functions is uniform.  Regardless of where we are in the domain, ε determines δ determines γ, hence the composition is uniform.

Several metric spaces are quite common, and notation has evolved to meet the need.  We've been using some of this notation all along, but here it is for the record.

Rn is n dimensional space, the product of n copies of the real line.  Note that Rn is more than just a set of points, each identified by n coordinates; Rn is a metric space with an implied topology.  The metric is the euclidean distance metric, although other metrics, such as the manhattan metric, will produce the same topology.
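The claim that both metrics give the same topology follows from the standard comparison de ≤ dm ≤ √n·de between the euclidean and manhattan distances, which nests each metric's open balls inside the other's.  A Python sketch, illustrative only, checks the inequality on random points.

```python
import math
import random

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

# check  d_euclidean <= d_manhattan <= sqrt(n) * d_euclidean  on random
# points; this inequality is why both metrics give Rn the same topology
random.seed(0)
n = 3
for _ in range(1000):
    p = [random.uniform(-10, 10) for _ in range(n)]
    q = [random.uniform(-10, 10) for _ in range(n)]
    de, dm = euclidean(p, q), manhattan(p, q)
    assert de <= dm <= math.sqrt(n) * de + 1e-9
```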

Ej is generalized euclidean space, as described earlier in this chapter.

Cn is n dimensional complex space, where each coordinate takes on a complex number.  If you aren't multiplying points together via complex multiplication, Cn is indistinguishable from R2n.  The points are the same; the metric is the same; the topology is the same.

The set of vectors in Rn+1 with norm 1 is denoted Sn, the n-sphere.  The circle in the plane is denoted S1, and the sphere in 3 space is denoted S2.  The vectors in Rn that have norm ≤ 1 form Dn, the n dimensional disk.  Thus the boundary of D2 is S1.

The direct product of S1 n times gives the n-torus, or Tn.  T2 is the traditional torus, like an inner tube.  One circle runs around the circumference of the tube and the other defines the cross section of the tube.  The product of these two circles defines the entire torus, with the product topology.

Take the nonzero points of Rn+1 and identify a point with all its scalar multiples to get real projective space, denoted RPn, or simply Pn.  This is equivalent to identifying antipodal points on Sn.  Similarly, identify scalar multiples in Cn+1 to get complex projective space, or CPn.

Every normal, second countable space is metrizable.  This is Urysohn's metrization criterion.

This condition is sufficient, but not necessary.  Every metric space is normal, but it may not be second countable, as shown by the discrete topology on an uncountable set.  Each isolated point is open, and is the union of base sets, hence each of the uncountably many points requires its own base open set, which contradicts second countability.

Let S be a space that is normal and second countable.  Enumerate the base sets and consider them one at a time.  Let O be a base open set and let C be its complement.  If O is the empty set, or all of S, skip it; we don't need it.

Let U be a base open set in O, such that the closure of U is also in O.  Let D be the closure of U.  Since C and D are disjoint closed sets, and S is normal, use Urysohn's lemma to build a continuous function fO,U from S into [0,1] that is 0 on C and 1 on D.

Build such a function fO,U for every nested pair of base sets O and U in S.  This is a countable collection of functions, and they can be enumerated as fj, as j runs from 1 to infinity.  Each function is a continuous map from S into [0,1].  The composite function f maps S into the product space P, a hypercube with countable dimensions.  Give P the weak product topology.  Since each component function fj is continuous, the composite function f is continuous.  We have built a continuous map from S into P.

Consider two points x and y in S.  Separate x and y in disjoint open sets, and these may as well be base open sets.  Let x be contained in O, whence y is in C, the complement of O.  Since x and C are disjoint closed sets, separate them in disjoint open sets.  Let U be a base open set containing x.  Let D be the closure of U.  D misses the open set containing C, and C itself.  Once again C and D are disjoint closed sets.  There is some fj, derived from the open sets O and U, whereupon fj(y) = 0 and fj(x) = 1.  For any two points x and y, some function fj in the list maps them to different values in [0,1].  Therefore the composite function f is injective.

We know f is continuous; let's show it is bicontinuous.  It is enough to show that the image of O, for any base set O, is the union of open sets in P.

Let C be the complement of O.  Select any x in O and build U and D as we did before.  Let fj map C to 0 and D to 1.  Consider the open set in P produced by restricting the jth coordinate to values greater than one half.  The other coordinates are unconstrained.  Intersect this open set with f(S) to get an open set W in the image of S.  Let V be the preimage of W.

A point z lies in V iff fj(z) exceeds one half.  Clearly z cannot lie in C, hence V is wholly contained in O.  The image of O under f covers all of W.  Since fj(x) = 1, x is in V.  We have accounted for the image of x.  Do this for all x in O, and O is the union of open sets Vx, each having an open image Wx in f(S).  Put this all together and the image of O is an open set.  Therefore f is bicontinuous, and a homeomorphism.

For all practical purposes, S is a subspace of the space P.  Since P is the countable product of metric spaces, P is metrizable, as demonstrated earlier.  This metric gives S its topology.  Therefore S is metrizable.

A sequence s in a metric space is Cauchy if every ε implies an n such that all points beyond sn are within ε of sn.  The tail of the sequence can always be contained in a small open set about sn.

There is another definition of Cauchy: any two points at or beyond sn are within ε of each other.  In particular, sn and sj are within ε of each other, hence this definition implies the previous.  Conversely, if p and q are within ε/2 of sn, they are within ε of each other by the triangular inequality.  The two definitions are equivalent; use whichever one you like.

Real functions, defined for x ≥ 0, can also be Cauchy.  Every ε implies y such that |f(x),f(y)| < ε for x ≥ y.  Most of the time we deal with sequences.

Set ε to 1 to show that a Cauchy sequence, or the tail of a Cauchy function, is bounded.

If there is a smallest accessible positive distance ε in our metric space, restrict a Cauchy sequence to ε/2 to show that it becomes constant.  Clearly such a sequence converges to sn.

In fact, every convergent sequence is Cauchy.  If p is the limit point, find n such that everything beyond sn is within ε/2 of p.  By the triangular inequality, everything beyond sn is within ε of sn.
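The triangle argument can be watched in action.  A Python sketch, illustrative only, uses the convergent sequence 1 + (-1)^n/n (my example): once every tail term is within ε/2 of the limit, any two tail terms are within ε of each other.

```python
# s_n = 1 + (-1)^n / n converges to 1, hence it is Cauchy.  Beyond n = 201
# every term is within eps/2 of the limit 1, so the triangle inequality
# keeps any two tail terms within eps of each other.
def s(n):
    return 1 + (-1) ** n / n

eps = 0.01
n0 = 201                  # 1/n < eps/2 from this index on
for i in range(n0, n0 + 300):
    for j in range(i, n0 + 300):
        assert abs(s(i) - s(j)) < eps
```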

A metric space is complete if every Cauchy sequence converges to some point in the space.  In such a space, a sequence is Cauchy iff it is convergent.

Before we apply this to metric spaces in general, let's show that the reals are complete.  After all, metric spaces are based on distance, which is measured using real numbers, so we'd better get a handle on the reals first.

What is a real number anyway?  Thus far I have only defined the nonnegative integers with any rigor.  These are the finite ordinals, built from the ground up as sets.  Where do we go from here?  A negative number is a positive number with a minus sign prepended.  How is that done?  There are many ways, equally valid; here is one.  An ordered pair x,y can be represented as {{x}, {x,y}}.  This is the set containing the set containing x, and the set containing x and y.  If x = y, as in the ordered pair 3,3, then the entire set collapses to {{x}}.  That's ok; we can read, unambiguously, the ordered pair from the set, whether it be x,y or x,x.  Pair a positive number with either 0 or 1 to indicate a positive or a negative number.  Math functions can now operate on signed integers as sets.  Then pair a signed integer with a nonzero unsigned integer to get a fraction.  Rational numbers are born.

It's not quite that simple however, because a rational number is actually an equivalence class of fractions, typically represented by the fraction in lowest terms.  Thus a rational number is a set of fractions, such that for any two fractions a/b and c/d in the set, ad = bc.  That's a lot of set manipulation just for a rational number.  Let's press on.

A set of ordered pairs is a relation.  Again, we are still doing everything with sets.  A relation that does not duplicate anything in the first component is a function.  A function from the positive integers into the rationals is a sequence of rational numbers.  Finally, a sequence of rational numbers that is Cauchy, using rational values of ε of course, is a real number.  Cauchy must be expressed as a formula in logic on sets: {for every positive rational ε {there is an index n such that {for every j > n {abs(sj - sn) < ε}}}}.  Even this is a simplification of how the formula would be written in ZF set theory.  And after all that, Cauchy sequences clump together into equivalence classes, as we did with fractions, because, as illustrated below, many different Cauchy sequences converge to the same real number.

Pause for a moment and think about all the machinery that goes into building a real number as a set.  It's like looking at the molecular chemistry of the cement in the foundation of your house.  Most of the time we work at a much higher level, hammering nails and putting up dry wall.  This book does not spend much time on the foundation of mathematics, but it is there nonetheless.  I hope this brief description gives you a small taste of the underlying complexity.

A real number is a Cauchy sequence of rationals, which is often indicated by an infinite decimal number.  This is one way to write a Cauchy sequence.  π is approached by the following sequence.

3, 3.1, 3.14, 3.141, 3.1415, 3.14159, 3.141592, …

This sequence is rather special, since it is monotonically increasing, approaching π from below.  A Cauchy sequence need not be strictly increasing or decreasing.  It could bounce around as it approaches its limit.  A Cauchy sequence for π/4 is given by the Taylor expansion of atan(1).  This is an infinite series, where the Cauchy sequence consists of the partial sums.  Here is the series, followed by the Cauchy sequence of partial sums.

1 - 1/3 + 1/5 - 1/7 + 1/9 - 1/11 + …

1, 2/3, 13/15, 76/105, 789/945, …
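These partial sums can be generated exactly with Python's Fraction type.  A small sketch; note that Fraction reduces automatically, so 789/945 appears in lowest terms as 263/315.

```python
from fractions import Fraction

# Exact partial sums of 1 - 1/3 + 1/5 - 1/7 + 1/9, the Cauchy sequence
# for pi/4.  The sums bounce above and below the limit as they converge.
sums, total = [], Fraction(0)
for k in range(5):
    total += Fraction((-1) ** k, 2 * k + 1)
    sums.append(total)

print(sums)   # 1, 2/3, 13/15, 76/105, 263/315 (= 789/945 reduced)
```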

The rationals approach the square root of 2 via the Cauchy sequence 1.4142135…, yet the square root of 2 is not rational.  Thus the metric space of rational numbers, with the open interval topology, is not complete.  There are Cauchy sequences of rational numbers that do not approach a rational number.  However, the real numbers are complete.  A Cauchy sequence of real numbers always approaches a real number.  The proof is rather technical, so hang on.  It builds the real numbers up as a set, a group under addition, a Q vector space, a ring, a field with continuous operations, and finally a complete metric space.

Embed the rationals in the reals by mapping x to the constant sequence x,x,x,x… etc.  Such a sequence is Cauchy, and approaches x.

Let s and t be two real numbers, two Cauchy sequences of rationals.  Let u be the difference sequence: uj = sj-tj.  Given ε, find n so that sj is within ε/2 of sn and tj is within ε/2 of tn for all j > n.  Now uj is within ε of un, hence u is a Cauchy sequence of rationals.  We can add or subtract real numbers and the result is Cauchy.  Furthermore, adding or subtracting rationals, embedded in the reals, gives the expected result.  The constant sequence 2.5 + the constant sequence 4.3 gives the constant sequence 6.8, hence 2.5+4.3 = 6.8.

Let two sequences be equivalent if their difference is a sequence that approaches the rational number 0.  This seems like circular reasoning, but it's not.  We can talk about convergence using only rational numbers.  For each rational ε there is n such that everything beyond un is between -ε and ε, hence u approaches 0.  A formula, strictly within the rationals, determines whether two real numbers are equivalent.

If s-t approaches 0 then negate all the terms to show that t-s approaches 0.  The relation is symmetric.  Also, restrict r-s and s-t to ε/2 and add together to restrict r-t to ε.  If r and s are equivalent and s and t are equivalent then r and t are equivalent.  This establishes an equivalence relation, so clump all the equivalent sequences together to make one real number.  You've seen this before.  3.699999… is the same as 3.700000…, because their difference gives a sequence .01, .001, .0001, .00001, and so on, that approaches 0.  They are actually the same real number.  And these aren't the only sequences that approach 3.7, they are just the simplest examples.  Other sequences include 3.7 - 1/n and 3.7 + 2^-n.
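The equivalence test is computable term by term.  A sketch with Python's exact rationals, comparing the truncation sequence for 3.7 with the sequence 3.7 + 2^-n:

```python
from fractions import Fraction

# Two Cauchy sequences of rationals representing the same real number 3.7:
# s_n = 3.7 - 10^-n, the 3.6999... truncations, and t_n = 3.7 + 2^-n.
# Their term-by-term difference approaches 0, so they are equivalent.
def s(n):
    return Fraction(37, 10) - Fraction(1, 10 ** n)

def t(n):
    return Fraction(37, 10) + Fraction(1, 2 ** n)

diffs = [t(n) - s(n) for n in (1, 5, 10, 20)]
assert all(d > 0 for d in diffs)          # t runs above 3.7, s runs below
assert diffs[-1] < Fraction(1, 10 ** 5)   # and the gap shrinks toward 0
```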

Return to the definition of addition and subtraction.  Show that the result does not depend on the particular sequence used to represent the real number.  Start with u = s+t, as real numbers or as sequences.  Let e be a sequence that approaches 0 and add it to s, thus creating an equivalent sequence to s.  This adds e to u, term by term.  The new sequence u+e is equivalent to u by definition, thus the result has not changed.  Addition and subtraction of real numbers is well defined.

If c is rational and s is real, cs is the real number produced by the sequence csn.  This is a scale multiple of s.  If e approaches 0, add e to s.  Of course ce still approaches 0, so the result is equivalent to cs.  Scaling by rationals is well defined.  Therefore the reals form a vector space over the rationals.

If r is a real number other than 0 then its sequence contains infinitely many terms outside of ε, for some rational ε > 0.  Yet the sequence is Cauchy, so beyond some n, any two terms are within ε/2 of each other.  This tail includes some rj at least ε away from 0, hence the entire tail is at least ε/2 away from 0.  In other words, most of r is positive or negative, and bounded away from 0.  Let the real number r be positive or negative, according to its tail.  This is consistent with the positive and negative rationals as they embed in the reals.

As a corollary, the absolute value of a real number is found by taking the absolute value of its sequence.  If the sequence approaches 0 it still approaches 0, and if its tail went negative, that tail is now positive.

Define an order s < t if t-s is positive.  Note that adding e to s or t does not change the fact that t-s is positive, hence the ordering is well defined on real numbers.  Verify that this is a linear ordering on the reals.  That is, r<s and s<t implies r<t, and every pair of numbers is equal, or one is less than the other.  This ordering agrees with the ordering on the rationals.

Now turn the reals into a metric space.  The distance between s and t is the absolute value of s-t.  Two reals will not have distance 0 unless their difference is equivalent to 0, which makes s and t equivalent - hence the same real number.  t-s produces the negated sequence of s-t, and when you take the absolute value, you obtain the same distance.  The only thing left is the triangular inequality.  If r s and t are sequences defining real numbers, the triangular inequality holds term by term.  That is, |rj-tj| ≤ |rj-sj| + |sj-tj|.  The second sequence dominates the first, and is at least as large as the first.  So distance is a valid metric, giving a metric space.

Base open sets are open intervals of all sizes about all points.  This is the linear topology.

Let s and t be distinct real numbers with s-t positive.  If u is the difference sequence, there is an integer n and a positive rational b such that uj exceeds b for all j > n.  We saw this earlier; a positive real number has a tail that is positive, and bounded away from 0.  Let c = tn+b/2.  This is at least b/2 above the tail of t, and at least b/2 below the tail of s.  Thus c is a rational number between s and t.  If we want an irrational number, find a rational, not the square of another rational, whose square root is less than b/2, and add that square root to c.  The result is an irrational number between s and t.  The rationals and the irrationals are both dense in the reals.

Consider a new base for the topology of the reals, open intervals with rational end points.  If x lies inside an arbitrary open interval, there is a rational b between x and the lesser endpoint and a rational c between x and the greater end point.  The interval (b,c) contains x and is contained in the larger interval.  The entire interval is covered by open rational intervals, and remains open.  Rational intervals are sufficient to build the topology of the reals.

If r is a real number, let s be the tail of r, with finitely many lead terms discarded.  Let the difference sequence e be given by ej = rj-sj, or rj-rj+i for some fixed i.  Since r is Cauchy, rj and rj+i approach each other.  Thus e is a sequence that approaches 0, and s represents the same real number as r.  If r is nonzero, restrict r to a tail, so that all terms have the same sign and are bounded away from 0.

Implement multiplication and division of real numbers by multiplying or dividing the terms of their sequences.  The terms of a nonzero dividend are bounded away from 0 as described above, so division per term is well defined.  Division by the real number 0 is of course undefined.

If the sequence s is bounded by b, and t approaches 0, then bt also approaches 0.  Since bt dominates st, st = 0.  In other words, 0 times any real number is 0.

When s and t are nonzero, let b act as a bound on s.  Since bt is Cauchy, st is trapped within bt, and is also Cauchy.  As we saw above, we can add a 0 sequence e to t, which adds the sequence se to the result.  Since se is dominated by be, it approaches 0, so multiplication does not depend on the particular sequence; it is well defined on real numbers.

If t is nonzero, consider the inverse of t, implemented by taking the inverse of each term.  Let the terms of t all exceed b, for some rational b > 0.  Beyond tn, let the terms of t be trapped between c and c+ε.  Consider 1/c - 1/(c+ε).  This becomes ε/(c(c+ε)).  We only make this term larger by replacing the denominator with b².  Now ε/b² shrinks as ε shrinks, hence 1/t is Cauchy.  If the sequence e approaches 0, add e to t and consider 1/(t+e).  Subtract 1/tj from 1/(tj+ej).  Once again the difference is bounded by ej divided by b², which goes to 0 as j approaches infinity.  The inverse of a real number does not depend on the sequence representing that real number.

As an exercise, you may want to prove the following about real numbers.  These properties are inherited from the rationals, as you operate term by term on the representative sequences.

You may not be used to treating mathematical operators as continuous functions, so let me illustrate with 100×400 = 40,000.  Let 100 deviate by ε/400 and let 400 deviate by ε/100, hence the product deviates by no more than 2ε + ε²/40,000.  Assuming ε < 1, this is less than 3ε.  We can always restrict the domain, so that the product is arbitrarily close to a given value, hence multiplication is continuous.

The reals form a metric space, but is it complete?  Does every Cauchy sequence converge to a real number?

Let r1 r2 r3 … be a Cauchy sequence of real numbers, i.e. a Cauchy sequence of Cauchy sequences of rationals.  Their limits cluster together.  Think of r as a matrix, where ri,j is the jth term in the ith sequence.  Build a new sequence s as follows.  For each n, find the tail of rn, such that every pair of terms contained in that tail has a separation less than 1/n.  If c is the first term of this tail, let sn = c.  If you like, select c farther along in rn, or get rid of some of the lead terms of rn, so that c = rn,n.  Now s is the diagonal of the matrix r.  This doesn't really change the proof; it just makes it easier to visualize.

We need to show s is Cauchy.  Given an ε, find k such that the limit of ri and the limit of rj differ by less than ε whenever i and j are greater than k.  Remember that the rows of r are themselves Cauchy.  Then find n such that 1/n is less than ε.  If k is less than n, advance k to n.  Now assume k ≤ i ≤ j.  We know that the tail of ri, and hence the limit of ri, is trapped within 1/n of si, hence within ε of si.  The limit of ri and the limit of rj are within ε of each other.  And the limit of rj is within ε of sj.  Put these three together and |si - sj| is no more than 3ε, for all i and j beyond k.  Let ε shrink to 0, and s is Cauchy.  Hence s represents a real number.

Is s the limit of our sequence of real numbers?  Given ε, compute k so that 1/k is less than ε.  Then increase k as necessary, so that the terms of s, beyond k, are within ε of each other.  This can be done since s is Cauchy.  Now let j be any index beyond k.  Let e be the difference sequence s-rj.  For i beyond k, look at the term si-rj,i.  We know si and sj are within ε of each other, and as long as we stay within the tail of rj, sj and rj,i are within ε of each other.  Differences are bounded by 2ε, so the limit of s-rj is bounded by 2ε.  This holds for all j beyond k, and for any ε.  Therefore the real number s is the limit of our sequence of real numbers, and the reals are complete.

Given a metric space S, let T be the set of Cauchy sequences taken from S.  Embed S in T using constant sequences, just as we carried the rationals into the reals.

Let a and b be two points of T, that is, two Cauchy sequences taken from S.  Let d(a,b) be the sequence dj = |aj,bj|.  Given ε, choose n so that both tails are within ε of an and bn respectively.  Now dj is within 2ε of dn.  In other words, d is a Cauchy sequence of real numbers.  Since the reals are complete, d converges to a real number.  Let this limit be the distance between a and b.

Verify that the triangular inequality holds; it is inherited from S.  And the distance from a to b is the distance from b to a.  This gives a valid pseudo metric.  Turn T into a proper metric space by clumping equivalent sequences together.  In other words, all sequences that are distance 0 apart become one point.  We did the same thing for the reals.  The result is a metric space, the completion of S.
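The distance in the completion can be sketched numerically.  Here floats stand in for rationals (my example sequences approach √2 and √3, which play the role of the completed points):

```python
import math

# The distance between two points of the completion is the limit of the
# term-by-term distances.  a and b are Cauchy sequences approaching
# sqrt(2) and sqrt(3); the distance sequence d_n = |a_n - b_n| is itself
# Cauchy, and its limit is the distance between the two completed points.
def a(n):
    return round(math.sqrt(2), n)   # truncations of sqrt(2)

def b(n):
    return round(math.sqrt(3), n)   # truncations of sqrt(3)

d = [abs(a(n) - b(n)) for n in range(1, 15)]
target = math.sqrt(3) - math.sqrt(2)
assert abs(d[-1] - target) < 1e-12
```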

Is T a complete metric space?  It is, and the proof is the same as that shown in the previous section.  Go back and apply it to a general metric space.  Therefore the completion of any metric space is complete.

Let S be a set of real numbers, bounded above by the real number b.  There is a smallest real number u that bounds all of S.  Most books call this an axiom, the beginning of real analysis, but it is actually a theorem if real numbers are defined properly.  I'll prove the existence of the greatest lower bound.  Multiply through by -1 to parlay this into a proof of least upper bound.

Let b be a real number that is less than or equal to every number in the set S, and let q be any point in S.  Rescale, so that b is 0 and q is 1.  Now all of S is bounded below by 0, and at least one point in S is as high as 1.  Build a binary string t as follows.

Let t0 = 0.  Follow this with a radix point, so that t represents a real number somewhere between 0 and 1.  This is similar to a decimal expansion, but in base 2.  If 1/2 is a lower bound for all of S, let t1 = 1, else let t1 = 0.  The string t now represents 0 or 1/2.  The next digit represents 1/4.  Add 1/4 to the previous rational number (either 0 or 1/2), and if that is a lower bound for S then the next binary digit t2 is 1, else it is 0.  The next step adds 1/8 to the lower bound, if possible, or it leaves the lower bound where it is.  Thus t3 is 1 or 0 accordingly.  Continue this process, building the binary representation of a real number g.
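The digit-by-digit construction is effectively an algorithm.  Here is a Python sketch for a finite set rescaled into [0,1], where the greatest lower bound is simply the minimum; illustrative only.

```python
# Binary-digit construction of the greatest lower bound.  At step i, try
# to raise the running bound by 2^-i, keeping the raise only if the new
# value is still a lower bound for every element of S.
def greatest_lower_bound(S, steps=50):
    g = 0.0
    for i in range(1, steps + 1):
        candidate = g + 2.0 ** (-i)
        if all(candidate <= x for x in S):
            g = candidate
    return g

print(greatest_lower_bound([0.625, 0.8, 1.0]))   # 0.625, i.e. binary 0.101
```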

Suppose S contains a real number q that is less than g.  The fractions whose denominators are powers of 2, such as 21/32, are dense in the reals, so let c be such a fraction somewhere between q and g.  Let c have l binary digits, commensurate with the power of 2 in its denominator.  Since c is less than g, the first l digits of t already represent a value at least as large as c, hence larger than q.  Yet every digit of t was chosen to keep t a lower bound for S, so t can never climb above q.  This is a contradiction, therefore g is a lower bound for all of S.

Suppose some larger real number h is a lower bound for S.  Again choose c between g and h, a fraction with l binary digits.  g is less than c in its first l digits.  For some i ≤ l, ti = 0 and ci = 1.  However, c is a lower bound for S, and by construction, ti should have been set to 1.  This is a contradiction, hence g is the greatest lower bound for the set S.

Consider the function y = x² - 1/2 as x runs from 0 to 1.  This is a continuous path whose y coordinate runs from -1/2 to +1/2.  Is there some x with f(x) = 0?

If the domain and range are the rationals, the answer is no.  There is no rational square root of 1/2.  The path is continuous in this metric space, but the curve does not attain the value of 0, because the rationals are not complete.  If the metric space is complete however, every intermediate value is attained.  I'll prove this for y = 0; it is true in general.

Let S be the set of points with y < 0.  These points have x coordinates bounded by 1.  Let u be the least upper bound of this set.  Since the reals are complete, u is well defined.  Let v = f(u).  If v is positive then values of x near u have f(x) positive, and u is not the least upper bound.  If v is negative then values of x beyond u still have f(x) negative, and u is not a valid upper bound.  Therefore v = 0.  Every intermediate value has some x with f(x) equal to that value.
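The least upper bound argument is the classical bisection method in disguise.  A Python sketch, illustrative only:

```python
# Bisection on f(x) = x^2 - 1/2 over [0, 1].  Maintain an interval with
# f(lo) < 0 <= f(hi); hi converges to the least upper bound of the set
# {x : f(x) < 0}, which is the root sqrt(1/2) = 0.7071...
def f(x):
    return x * x - 0.5

lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if f(mid) < 0:
        lo = mid      # mid still lies in the set where f is negative
    else:
        hi = mid      # mid is an upper bound for that set
print(hi)             # approaches sqrt(1/2)
```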

This theorem remains valid if the domain is n dimensional space, or any space that is path connected.  If the bottom of the hill is at sea level, and the top of the hill is 1,000 meters high, there is some point on the hill with elevation 374 meters.  Draw a path, any path, from bottom to top, and apply f, the elevation function, to this path, giving a function from [0,1] into [0,1000].  Now apply the previous theorem to find a point on the hill that is 374 meters high.

The function 1/x is continuous on (0,1), yet the Cauchy sequence 1/n does not map to a Cauchy sequence in the range.  A Cauchy sequence does map to a Cauchy sequence however, when its limit point p is in the domain.  This is an equivalent criterion for continuity in a metric space, as described in an earlier section.

Let f be a uniformly continuous function from one metric space into another, and let s be a Cauchy sequence in the domain, with image sequence t in the range.  Select an ε and find the corresponding δ, such that points within δ of x map to points within ε of f(x), for every x in the domain.  Find n such that points in s beyond sn are never more than δ apart.  In the image sequence, points beyond tn are never more than ε apart.  A uniform map carries Cauchy sequences to Cauchy sequences.

Extend f() to g(), from the completion of the domain to the completion of the range.  The first task is to show that g is well defined.  Different Cauchy sequences could represent the same point in the complete metric space, as when 3.4999999… = 3.5000000…  Let the Cauchy sequences s and t in the domain map to u and v in the range, respectively, and assume the distance sequence |sn,tn| approaches 0, whence s and t are equivalent.  Select ε and δ as before, and find n such that the distances |sj,tj| are bounded below δ for j > n.  Apply f to build the image sequences u and v, and the distance sequence |uj,vj| is bounded by ε, for j > n.  The image sequences approach each other, and represent the same point in the completion of the range.  The map g is well defined.

If x is in the domain of f, then the constant sequence sn = x is in the domain of g, and maps to the constant sequence f(x) in the range.  In other words, g is backward compatible with f.  The action of g on x is simply f(x) when x lies in the domain of f.

Finally, show that g is uniform.  Start with a sequence s in the domain, which maps to u in the range.  Given ε, select δ according to the dictates of f.  Consider any other sequence t such that |s,t| < δ.  Beyond some n, the terms of s are within δ of the terms of t.  Moving to the image space, the terms of u are within ε of the terms of v, and |u,v| ≤ ε.  (Technically, we're looking for strict inequality, but we can always choose the δ that corresponds to ε/2, and ε/2 is strictly less than ε.)  The selection of δ does not depend on s, thus g is uniform.

Extending f to the closure of S is much like the above, but there are a few more details.  S could be the open disk in the plane, and f extends to the closed disk, as long as f is uniform.

Let S and T be metric spaces, where f is a uniform map from U, a subset of S, into T.  If the range space T is not complete, embed T in its completion, which does not change f.  Let p be a limit point of U that is not in U.  Find a sequence in U that converges to p.  This sequence is Cauchy, and its image in T is Cauchy, with a limit point y.  Declare f(p) = y.

Is this well defined?  If two different sequences approach p, the points in these sequences must approach each other by the triangular inequality.  The distances between corresponding terms goes to 0, and when we apply the uniform map f, distances still go to 0.  The two image sequences are 0 distance apart, and approach the same point in the complete metric space T.  The value of y is well defined.

Note that f(p) really can't be anything else if f is to remain continuous at p.  The extension of f at p is well defined and unique.  Make a similar extension of f for all the points in the closure of U.
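Here is a numerical sketch of the extension.  The function sin(x)/x (my example, not from the text) is uniform on (0,1] but undefined at 0; pushing any sequence that converges to 0 through f produces a Cauchy image sequence whose limit, 1, is the forced value of the extension.

```python
import math

# f(x) = sin(x)/x is uniformly continuous on (0, 1] but undefined at 0.
# Push a sequence converging to 0 through f; the image sequence is Cauchy,
# and its limit is the only continuous choice for f(0).
def f(x):
    return math.sin(x) / x

images = [f(1.0 / n) for n in (10, 100, 1000, 10000)]
assert abs(images[-1] - 1.0) < 1e-6   # the extension must set f(0) = 1
```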

Finally, show that f is uniform.  Demonstrate a viable δ for an arbitrary ε, which applies across all of U closure.  Find δ for ε across U, which can be done since f is uniform on U.  There is no trouble within U, so let p be a limit point of U.  Let q be a point of U that is within δ of p.  Picture a Cauchy sequence, drawn from U, approaching p.  Remember, the limit of the image of this sequence in T is y, and it doesn't matter which sequence we use.  Any sequence will do.  When the sequence is within δ of p, include q in the sequence.  Use f to map this Cauchy sequence, including q, over to T.  Terms of the image sequence, at and beyond f(q), are within ε of each other, and so f(q) is at most ε away from y.

Next let q be some other point in U closure, with f(q) = z.  The distance from p to q is strictly less than δ, so there is some r in U, quite close to p, that is still within δ of q.  The image of r is within ε of both y and z, hence |y,z| ≤ 2ε.  That completes the proof.

The Lipschitz constant is a property of a function f from one metric space into another.  It is technically accurate to refer to this constant as kf, since it depends on f, although most textbooks simply use the letter k, where the function f is understood.  As we shall see, there are some functions with no Lipschitz constant.  In this case k is undefined.  Here is the formal definition.

Let f be a function from one metric space into another.  The nonnegative real number k acts as a Lipschitz constant for the function f if, for every x and y in the domain, the distance from f(x) to f(y) is no larger than k times the distance from x to y.  Write the relationship this way.

|f(x),f(y)| ≤ k |x,y|

Technically, the Lipschitz constant for f is the greatest lower bound of all k satisfying the above criterion.  The condition fails when k is negative.  The smallest possible value of k is 0.  This can happen when f maps the entire domain onto one point.  Since the values of k are bounded below, let's see why the greatest lower bound is also a Lipschitz constant.  Let k be the greatest lower bound, so that k+ε satisfies the above inequality, for any ε > 0.  Suppose there is a pathological x and y such that |f(x),f(y)| exceeds k times the distance from x to y.  In fact, let the distance from f(x) to f(y) be k+δ times the distance from x to y, where δ > 0.  Set ε to half of δ, and k+ε is not a Lipschitz constant, whence the greatest lower bound should be at least k+ε.  Therefore the greatest lower bound is a valid Lipschitz constant.  If any k satisfies the above inequality for all of f, then f has a unique Lipschitz constant, determined by taking the greatest lower bound of all such k.

In practice we rarely need to find the minimum Lipschitz constant.  It is often sufficient to demonstrate the existence of such a constant, whence any k will do.  In other applications, any k less than 1 will suffice.

If f has a Lipschitz constant k then f is uniform.  Select x in the domain and y in the range such that y = f(x).  Restrict attention to an open ball about y, with radius ε.  The open ball about x, with radius δ, where δ = ε/k (if k = 0, any δ will do), maps into the open ball about y.  So far this looks like the definition of continuity, but nothing in this argument depends on x.  It works for every x in the domain, hence δ is determined by ε, and the function is uniform.

As a corollary, a function that is not uniformly continuous, even a continuous function that is not uniformly continuous, cannot have a lipschitz constant.  An example is the function 1/x on the open interval (0,1); the slope between 1/n and 1/(2n) is exactly 2n², which grows without bound.

If f is a real valued function, and f is everywhere differentiable, and the derivatives are bounded (in absolute value) by k, then k acts as a lipschitz constant.

Suppose x₁ and x₂ are two points that violate the lipschitz constant.  In other words, the slope from (x₁,y₁) to (x₂,y₂) exceeds k in absolute value.  By the mean value theorem, |f′(c)| exceeds k for some c between x₁ and x₂.  This contradicts the bound on the derivatives, hence k is a valid lipschitz constant.

Continuing the above example, let u be the least upper bound of |f′(x)|, where f is continuously differentiable.  Suppose f has some lipschitz constant k < u.  Since u is the least upper bound, there is a point s with |f′(s)| > k; write |f′(s)| = k+g, where g > 0.  Because f′ is continuous, select a neighborhood about s in which the derivative changes by less than g/2.  Throughout this neighborhood, |f′| exceeds k+g/2.  Take any two distinct points x₁ and x₂ in this neighborhood.  The lipschitz constant tells us the slope from (x₁,y₁) to (x₂,y₂) is at most k in absolute value, and the mean value theorem then produces a point between x₁ and x₂ whose derivative is bounded by k, when in fact every derivative in the neighborhood exceeds k+g/2.  This contradiction means the lipschitz constant can be no smaller than u.  When f is continuously differentiable, the lipschitz constant is precisely the least upper bound of the absolute value of f′(x).
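As a concrete check of this result, one can sample slopes of sin(x), whose derivative cos(x) has least upper bound 1 in absolute value.  A sketch, with names of my own choosing; refining the grid pushes the largest observed slope toward the true lipschitz constant.

```python
import math

def max_sampled_slope(f, points):
    # largest slope between consecutive sample points; for a continuously
    # differentiable f this approaches the least upper bound of |f'|
    return max(abs(f(b) - f(a)) / (b - a)
               for a, b in zip(points, points[1:]))

pts = [i / 1000 for i in range(-3142, 3143)]   # a grid spanning roughly [-pi, pi]
print(max_sampled_slope(math.sin, pts))        # just below 1, the supremum of |cos|
```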

This result generalizes to n dimensional space.  Again, f is continuously differentiable across the domain.  This time we are interested in the largest directional derivative.  At each point, the direction of greatest magnification, and the change in output relative to the change in input, can be calculated via the jacobian; the magnification factor is the operator norm of the jacobian, i.e. its largest singular value.  The least upper bound of this factor, across the entire domain, is the lipschitz constant.

A function f that takes one metric space into another is a contraction map if it has a lipschitz constant less than 1.  Such a map is automatically uniformly continuous.
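To make the n dimensional case concrete: for a linear map the jacobian is constant, and the magnification factor is its largest singular value.  The sketch below recovers it by power iteration on JᵀJ, using the hypothetical map f(x,y) = (x+2y, 3x+y); all function names are my own.

```python
import math

# jacobian of the (linear) map f(x, y) = (x + 2y, 3x + y)
J = [[1.0, 2.0],
     [3.0, 1.0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def largest_singular_value(M, iters=200):
    """Power iteration on (M transpose)(M); the square root of its top
    eigenvalue is the operator norm of M, the magnification factor."""
    v = [1.0] * len(M[0])
    for _ in range(iters):
        w = matvec(transpose(M), matvec(M, v))
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    w = matvec(M, v)
    return math.sqrt(sum(x * x for x in w))

print(largest_singular_value(J))   # about 3.618 for this jacobian
```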

Let f be a contraction map from a complete metric space into itself.  We will prove the existence of an s such that f(s) = s.  This is the fixed point.  Furthermore, repeated invocations of f, starting at any point x, move inexorably towards s.  Thus s is an attractor.

For any starting point a₀, the sequence defined by aᵢ₊₁ = f(aᵢ) is cauchy.  Remember that the lipschitz constant k is less than 1.  Let d be the distance from a₀ to a₁.  Then the distance from a₁ to a₂ is at most dk, the distance from a₂ to a₃ is at most dk², and so on.  The distance between aₘ and aₙ, for n > m, is no larger (by the triangle inequality) than the distance from aₘ to aₘ₊₁, plus the distance from aₘ₊₁ to aₘ₊₂, plus the distance from aₘ₊₂ to aₘ₊₃, and so on.  This is bounded by the tail of the geometric series, dkᵐ + dkᵐ⁺¹ + dkᵐ⁺² …, which equals dkᵐ/(1-k).  As m increases, the terms of the sequence cluster together, and the sequence is cauchy.
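The tail bound dkᵐ/(1-k) is easy to watch numerically.  The contraction below, with k = 1/2 and fixed point 2, is an illustrative choice in which every quantity is exact in floating point.

```python
k, d = 0.5, 1.0
f = lambda x: x / 2 + 1      # contraction with constant k = 1/2, fixed point 2

seq = [0.0]                  # a0 = 0, so a1 = 1 and the distance d = 1
for _ in range(30):
    seq.append(f(seq[-1]))

# every pair m <= n obeys the bound |a_m - a_n| <= d * k**m / (1 - k)
ok = all(abs(seq[m] - seq[n]) <= d * k ** m / (1 - k)
         for m in range(len(seq)) for n in range(m, len(seq)))
print(ok)   # True
```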

Remember that the domain is complete, hence the sequence approaches a point s.  Apply f to the sequence, and to s.  By continuity, f(s) is the limit of the sequence f(aᵢ).  But applying f to the sequence merely shifts it one step to the left, dropping the first term a₀, which does not change the limit.  Therefore f(s) = s.

The approach to s is geometric.  The distance from aₘ₊₁ to s is bounded by k times the distance from aₘ to s.

Finally, the fixed point is unique.  If s and t are distinct fixed points, then f(s) and f(t) are closer to each other than s and t, which is impossible if s and t are fixed.  Thus s is an attractor for the entire domain.
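To see the theorem in action, here is a minimal Python sketch; the map x → (x + 2/x)/2 is a hypothetical example that carries the complete space [1,2] into itself with lipschitz constant 1/2 (its derivative is (1 - 2/x²)/2), so iteration from any starting point converges to the fixed point, the square root of 2.

```python
import math

def fixed_point(f, x, tol=1e-12, max_iter=1000):
    """Iterate x -> f(x) until successive values agree within tol."""
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence")

# contraction on [1,2] whose fixed point is the square root of 2
s = fixed_point(lambda x: (x + 2 / x) / 2, 1.0)
print(s)   # 1.414213562...
```

Each application of the map at least halves the distance to the fixed point, so a handful of iterations already gives many digits.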

As we saw in the previous section, a lipschitz constant can often be inferred from the derivative of f.  If the derivative is bounded below 1 in absolute value, then f is a contraction map.  Start anywhere, repeatedly invoke f, and approach the fixed point.  Furthermore, the algorithm is efficient, since the approach is geometric; each step adds a certain number of digits of precision.  For example, solve cos(x)-3x = 0 numerically by repeatedly applying the map x → cos(x)/3.  The derivative of cos(x)/3 is bounded by 1/3 in absolute value, so the procedure is sure to work.

Let a descending chain of closed sets in a complete metric space have diameters that approach 0.  As intuition suggests, they intersect in a single point.

Select any point from each set; the shrinking diameters make this a cauchy sequence, which implies a limit point p.  Suppose p is not in one of the closed sets C.  Since the chain is descending, the sequence lies entirely in C from some index on; discard the earlier terms, and what remains is a cauchy sequence in C with the same limit p.  Thus p is in the closure of C, and since C is closed, p belongs to C.  This holds for every closed set in the chain, hence p is in the intersection.

Any other point q lies at some positive distance from p, and cannot belong to the sets whose diameters fall below |p,q|, since those sets contain p.  The intersection is precisely p.

This result does not hold for open sets, as illustrated by the intervals (0, 1/n) in R¹.  Similarly, closed sets in an incomplete metric space may close in on a missing point, as when shrinking intervals of rational numbers approach the square root of 2.

There are several important theorems, such as the baire category theorem, that lie at the confluence of metric and compact spaces.  These will be presented at the end of the next chapter.