Abstract Algebra and Discrete Mathematics

Chapter 38, Quadratic Forms


Introduction

This chapter is a significant departure from the previous topics.  Some would argue, and reasonably so, that it doesn't belong in this book at all.  I'm including it anyways, just because I like it.

Some of these sections require a basic understanding of calculus.  If you can take the first derivative of a function to find the line tangent to a curve, then you're in good shape.  If these concepts are foreign to you, then you might want to skip ahead to the next chapter.  This material is not used elsewhere in the book.

A quadratic equation has a variable x that is squared, and possibly a linear term, and possibly a constant term.  In general, a quadratic form has many variables, where each term has degree 2 or less.  An example is shown below.

7x^2 + 13y^2 - 17z^2 + 11xy - 5xz + 9x + 4y - 67 = 0

This is a 3 dimensional example; it defines a quadratic surface in 3 space.  You are probably more familiar with quadratic forms in two variables.  These generate the conic sections in the plane: parabola, ellipse, and hyperbola.  We'll get to all that later, but first, let's talk about quadratic forms and matrices.

Let M be a fixed matrix of real numbers and let x be an unspecified row vector in R^n.  (You can use any field, not just the reals.)  The components of x are indeterminates, i.e. variables, x_1, x_2, x_3, etc.  The quadratic form associated with M is x*M*x^T, using standard matrix multiplication.  Expand this all out, and the coefficient on the squared term x_i^2 is M_{i,i}, while the coefficient on the mixed term x_i x_j is M_{i,j} + M_{j,i}.

Different matrices can produce the same quadratic form.  Subtract 1 from M_{i,j}, and add 1 to M_{j,i}, and the result is the same.  By convention, M is a symmetric matrix.  Then there is no ambiguity.

Conversely, every quadratic form comes from a symmetric matrix M.  Take the coefficient on x_i^2 and place it in M_{i,i}.  Take half the coefficient of x_i x_j and place this in M_{i,j} and M_{j,i}.  (If the field has characteristic 2, let M be upper triangular.)  Quadratic forms and symmetric matrices correspond 1 for 1.
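The correspondence can be sketched in a few lines of Python.  This is only an illustration: the function names symmetric_matrix and evaluate are made up here, and passing the coefficients as a dictionary keyed by index pairs is just one convenient assumption.  Only the degree 2 terms of the earlier example are represented, since x*M*x^T produces no linear or constant terms.

```python
from fractions import Fraction

def symmetric_matrix(n, coeffs):
    """Build the symmetric matrix of a quadratic form in n variables.

    coeffs maps an index pair (i, j), with i <= j, to the coefficient
    of x_i * x_j.  (Hypothetical input format, chosen for this sketch.)
    """
    M = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), coeff in coeffs.items():
        if i == j:
            M[i][i] = Fraction(coeff)                 # squared term: diagonal
        else:
            M[i][j] = M[j][i] = Fraction(coeff) / 2   # mixed term: split in half
    return M

def evaluate(M, x):
    """Compute x * M * x^T for a row vector x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

# Degree 2 part of the example: 7x^2 + 13y^2 - 17z^2 + 11xy - 5xz
coeffs = {(0, 0): 7, (1, 1): 13, (2, 2): -17, (0, 1): 11, (0, 2): -5}
M = symmetric_matrix(3, coeffs)

x, y, z = 2, -1, 3
direct = 7*x*x + 13*y*y - 17*z*z + 11*x*y - 5*x*z
```

Evaluating the matrix product at (2, -1, 3) agrees with plugging the same numbers into the polynomial directly, and the xy coefficient 11 shows up as 11/2 in two symmetric positions.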

If M is diagonal, the result is a diagonal form.  There are no mixed terms, no linear terms, no constant terms; just a series of squared terms.  If we can characterize the diagonal forms, we're halfway home.  This is because a rigid rotation turns the quadratic form into a diagonal form.  More on this later.

Conic Sections

As you recall from the introduction, a diagonal form is an equation whose terms are all squares.  This is produced by x*M*x^T, where x is a row of variables and M is a fixed diagonal matrix.  We will explore the diagonal forms in 2 and 3 dimensions, and then generalize to n dimensions.

Diagonal forms in 2 dimensions are called conic sections, for reasons that will be explained later.  They take the form:

ax^2 + by^2 = c

Set c = 0 for the degenerate conics.  Let's consider those first.

If b = 0 the solution is the y axis; if a = 0 the solution is the x axis.  If both a and b are zero the form is not quadratic.  At least one term must have degree 2.

If a and b are both positive, or both negative, the only solution is the origin.

If a and b have opposite sign, replace the coefficients with their square roots and rewrite the equation as follows.

a^2 x^2 - b^2 y^2 = 0

(ax+by) (ax-by) = 0

The solution is a pair of lines that intersect at the origin and have opposite slope.  We've drawn a letter X on the plane.

Next assume c is nonzero.  This gives the "true" conic sections.  Divide through by c, so that the constant term is 1, and scale a and b accordingly.

ax^2 + by^2 = 1

Again, either a or b is nonzero.  If b = 0 the solution is empty (for a < 0), or two lines parallel to the y axis (for a > 0).  A similar result holds for a = 0, with lines parallel to the x axis.

If a and b are negative there is no solution.

Ellipse

If a and b are positive, replace the coefficients with their inverse square roots and write:

x^2/a^2 + y^2/b^2 = 1

The curve intersects the x axis at a, and the y axis at b.  In fact, we can rescale x by a, and y by b, and obtain the unit circle.  Therefore this shape, which is called an ellipse, is a stretched out version of the circle.  The circle is multiplied by a in the x direction and b in the y direction.  It is therefore described by the parametric equation:

x(t) ← a cos(t)
y(t) ← b sin(t)

Verify that x(t)^2/a^2 + y(t)^2/b^2 equals 1, as it should.

Assume, without loss of generality, that a is greater than b.  In other words, the ellipse is oriented along the x axis.  The major axis (horizontal) has length 2a, and the minor axis (vertical) has length 2b.  The semimajor axis (from the origin to the right or left) has length a, and the semiminor axis (from the origin to the top or bottom) has length b.

The ellipse has a focus at the point (c,0) on the x axis, where c^2 = a^2 - b^2.  It has another focus at x = -c.  Together these are the two foci (the plural of focus).

My apologies for using the letter c again.  At the start of this chapter I used c as part of the equation of the ellipse.  But we normalized that away, and scaled the values of a and b accordingly.  From here on in, c is a value that is derived from a and b, according to the formula c^2 = a^2 - b^2.  It lies on the major axis of the ellipse, which is the x axis in our example.  If a = b, a circle, then a^2 - b^2 = 0, and c^2 = 0, and the focus is at the origin.  If a is substantially larger than b, then subtracting b^2 hardly matters, and c is almost a.  The two foci are very close to the left and right edge.  So the location of the focus, between the origin and the right edge, is an indicator of the roundness, or flatness, of the ellipse.  I'll make this more precise in the next paragraph.  Note, it is called a focus because light is focused onto this point by an elliptical mirror.  We'll get to that later.

The eccentricity of the ellipse is the ratio c/a.  This can range from 0, when the ellipse is a circle, up to nearly 1, when the ellipse is very long and skinny.

If p is any point on our ellipse, the distance from p to one focus, plus the distance from p to the other focus, is always 2a.  This is clear when p is one of the two points on the x axis, or one of the two points on the y axis.  In fact it's true all the way around, as though you had tied two ends of a string of length 2a to the foci and traced the curve with a pencil, keeping the string taut.

Let q be the distance to the first focus, and r the distance to the second.

q = sqrt((x-c)^2 + y^2)

r = sqrt((x+c)^2 + y^2)

q + r = 2a (start with the conclusion and work backwards)

q = 2a - r

q^2 = 4a^2 - 4ar + r^2

-4cx = 4a^2 - 4ar

ar = a^2 + cx

a^2 r^2 = a^4 + 2a^2 cx + c^2 x^2

a^2 (x^2 + 2cx + c^2 + y^2) = a^4 + 2a^2 cx + c^2 x^2

a^2 x^2 + a^2 c^2 + a^2 y^2 = a^4 + c^2 x^2

a^2 - c^2 = b^2 (it's how c was defined)

b^2 x^2 + a^2 c^2 + a^2 y^2 = a^4

b^2 x^2 + a^2 y^2 = a^2 b^2

x^2/a^2 + y^2/b^2 = 1 (formula for the ellipse)

If two points s and t are 2c units apart, and a is any distance greater than c, draw the points whose distance to s, plus the distance to t, equals 2a.  Choose coordinates so that s and t are on the x axis, with the origin in the middle.  This gives an ellipse with the formula shown above.  The major axis is an extension of the segment st, and the minor axis is perpendicular to st.
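The string property is easy to spot-check numerically.  The sketch below assumes the parameterization x = a cos(t), y = b sin(t) given earlier; the function name focal_sum and the sample values a = 5, b = 3 are chosen only for illustration.

```python
import math

def focal_sum(a, b, t):
    """Sum of the distances from (a cos t, b sin t) to the foci (+c, 0) and (-c, 0)."""
    c = math.sqrt(a*a - b*b)            # assumes a >= b, as in the text
    x, y = a * math.cos(t), b * math.sin(t)
    q = math.hypot(x - c, y)            # distance to the focus at +c
    r = math.hypot(x + c, y)            # distance to the focus at -c
    return q + r

a, b = 5.0, 3.0
# Twelve sample points spread around the ellipse.
sums = [focal_sum(a, b, 2 * math.pi * k / 12) for k in range(12)]
```

Every entry of sums should equal 2a, up to rounding error.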

Hyperbola

By now you've probably forgotten how we got to the ellipse.  We were looking at all possible forms of ax^2 + by^2 = 1, and the ellipse arose when a and b are positive.  Of course we have a circle when a and b are equal.

The last case to consider is a positive and b negative.  (We could have b positive and a negative, but that just reverses the roles of x and y.)  Replace the coefficients with their inverse square roots and write:

x^2/a^2 - y^2/b^2 = 1

Once again the curve intersects the x axis at a, but it doesn't intersect the y axis at all.  This is obviously not an ellipse.  The curve, called a hyperbola, is disconnected, with one branch to the left of the y axis and one branch to the right.

For large x and y, the constant term 1 is almost insignificant.  In other words, the hyperbola approaches the degenerate conic x^2/a^2 - y^2/b^2 = 0.  This conic was described earlier.  The result was two lines that intersect at the origin, like an infinite letter X.  From far away, the two branches of the hyperbola approach the left and right sides of the letter X.  The lines of the letter X are called the asymptotes.  Therefore the hyperbola approaches its asymptotes.

The noun asymptote has been turned into an adverb, asymptotically, as in, "The curve approaches the line y=4 asymptotically."  Sometimes the curve is a hyperbola, but it could be any curve at all.  We could write, for instance, "The exponential curve approaches the x axis asymptotically as x approaches -infinity."  So the more general meaning of asymptote is a line that a curve approaches, steadily and monotonically.

Like an ellipse, the hyperbola has two foci.  Set c^2 = a^2 + b^2 and plot the points +c and -c on the x axis.  These are farther from the origin than a, and they lie inside the two branches of the hyperbola.

The eccentricity is, once again, c/a, but this time the eccentricity is greater than 1.

Like the ellipse, the hyperbola defines, and is defined by, a distance relationship.  Let p be a point on the hyperbola, and let q be the distance from p to +c, and let r be the distance from p to -c.  On the right branch, r-q = 2a, and on the left branch, q-r = 2a.  The algebra is almost identical to that shown for the ellipse, but let's run through it again, for the left branch.

q - r = 2a (start with the conclusion and work backwards)

q = 2a + r

q^2 = 4a^2 + 4ar + r^2

-4cx = 4a^2 + 4ar

-ar = a^2 + cx

a^2 r^2 = a^4 + 2a^2 cx + c^2 x^2

a^2 (x^2 + 2cx + c^2 + y^2) = a^4 + 2a^2 cx + c^2 x^2

a^2 x^2 + a^2 c^2 + a^2 y^2 = a^4 + c^2 x^2

a^2 - c^2 = -b^2 (it's how c was defined)

-b^2 x^2 + a^2 c^2 + a^2 y^2 = a^4

-b^2 x^2 + a^2 y^2 = -a^2 b^2

b^2 x^2 - a^2 y^2 = a^2 b^2

x^2/a^2 - y^2/b^2 = 1 (formula for the hyperbola)

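Let s and t be two points in the plane, 2c units apart, and let a be any positive distance less than c.  Draw the points whose distance to s, minus the distance to t, equals 2a in absolute value.  This reproduces the hyperbola, one branch for each sign of the difference.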
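A numerical spot check of the distance relationship follows.  It is a sketch under one assumption not made in the text: the points of the hyperbola are generated with the parameterization x = ±a cosh(t), y = b sinh(t), which satisfies x^2/a^2 - y^2/b^2 = 1.  The function name focal_difference is illustrative.

```python
import math

def focal_difference(a, b, t, branch=+1):
    """Return r - q for a point on the hyperbola.

    q is the distance to the focus at +c, r the distance to the focus at -c.
    branch=+1 selects the right branch, branch=-1 the left branch.
    """
    c = math.sqrt(a*a + b*b)
    x, y = branch * a * math.cosh(t), b * math.sinh(t)
    q = math.hypot(x - c, y)
    r = math.hypot(x + c, y)
    return r - q

samples = (-2.0, -1.0, 0.0, 0.5, 1.5)
right = [focal_difference(3.0, 4.0, t, +1) for t in samples]   # expect  2a
left = [focal_difference(3.0, 4.0, t, -1) for t in samples]    # expect -2a
```

With a = 3 and b = 4, so that c = 5, the difference r - q should come out as 6 on the right branch and -6 on the left.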

Parabola

The parabola has a squared term in one variable and a linear term in the other.  Assume x is squared and y is linear.  Divide through by the coefficient of y, so it is simply y.  If there is a constant term c, replace y with y+c, so that the constant term goes away.  This moves the curve up or down in the xy plane, but doesn't change its shape.  Therefore the equation for a parabola looks like this.

ax^2 = y

This shape does not close up like an ellipse, nor does it approach any asymptotes, like a hyperbola.  It is the conic in between.

As you might guess, the parabola has a focus.  It lies on the y axis, just above the bottom of the curve (taking a to be positive).  Set c to 1 over 4a, and place the focus at +c on the y axis.  Draw a line parallel to the x axis, at y = -c.  This is called the directrix.

Like the ellipse and hyperbola, the parabola defines, and is defined by, a distance relationship.  A point p is on the parabola iff its distance to the focus equals its distance to the directrix.  Let q be the distance to the focus.  The distance to the directrix is obviously y+c.

q = y + c (start with the conclusion and work backwards)

q^2 = y^2 + 2cy + c^2

x^2 + y^2 - 2cy + c^2 = y^2 + 2cy + c^2

x^2 = 4cy

ax^2 = y (equation for the parabola)
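The same relationship can be checked numerically.  The sketch below takes a point on y = ax^2 and compares its distance to the focus (0, c) with its distance to the directrix y = -c, where c = 1/(4a).  The function name and the sample value a = 0.5 are illustrative choices.

```python
import math

def focus_and_directrix_distances(a, x):
    """Distances from the point (x, a x^2) to the focus and to the directrix."""
    c = 1.0 / (4.0 * a)                 # focus at (0, c), directrix at y = -c
    y = a * x * x
    to_focus = math.hypot(x, y - c)
    to_directrix = y + c                # vertical distance down to y = -c
    return to_focus, to_directrix

pairs = [focus_and_directrix_distances(0.5, x) for x in (-3.0, -1.0, 0.0, 0.25, 2.0)]
```

Each pair should contain two equal numbers, up to rounding error.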

Discriminant

If the equation has not been put into normal form, and it looks like ax^2 + bxy + cy^2 + dx + ey + f = 0, can you tell at a glance whether it is an ellipse, a parabola, or a hyperbola?  You can, just by checking the discriminant.  Evaluate b^2 - 4ac.  If it is negative you have an ellipse.  If it is positive you have a hyperbola.  If it is 0 you have a parabola.  Verify this when b = 0, as per the previous sections.

ax^2 + cy^2 + dx + ey + f = 0
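The discriminant test is short enough to write down as code.  This is a sketch that assumes the conic is not degenerate; the function name classify is illustrative, and the linear and constant coefficients d, e, f play no part in the test.

```python
def classify(a, b, c):
    """Classify a x^2 + b xy + c y^2 + ... = 0 by the sign of b^2 - 4ac.

    Assumes a nondegenerate conic; degenerate cases (a point, lines,
    or no solution at all) are not detected here.
    """
    if a == 0 and b == 0 and c == 0:
        raise ValueError("no term of degree 2, so the equation is not quadratic")
    disc = b * b - 4 * a * c
    if disc < 0:
        return "ellipse"
    if disc > 0:
        return "hyperbola"
    return "parabola"
```

For instance, classify(4, 0, 1) reports an ellipse, classify(0, 1, 0) (the curve xy = 1) reports a hyperbola, and classify(1, 0, 0) reports a parabola.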

Rotate the plane through an angle of θ, which does not change the size or shape of the conic section.  This transformation is accomplished by an orthonormal matrix.  Let u and v be the alternate coordinate system, and rotate the uv plane to coincide with the xy plane.

[u  v]  *  |  cos(θ)  sin(θ) |  =  [cos(θ)u - sin(θ)v    sin(θ)u + cos(θ)v]
           | -sin(θ)  cos(θ) |

As u and v rotate on to x and y, the following substitutions are made.

x ← cos(θ)u - sin(θ)v
y ← sin(θ)u + cos(θ)v

If a b and c are 0, the equation dx + ey + f = 0 is that of a line.  The rotation of a line is a line, and as if in confirmation, the replacement of x and y with linear combinations of u and v produces another linear combination of u and v, thus a line in the uv plane.  Rotation cannot magically turn a line into a quadratic form.  It maps a line to a line.  Nor can it turn a quadratic form into a line, for if it did, the rotation through -θ would turn the line back into a quadratic form, which is impossible.  If the original shape is quadratic, then any rotation of that shape is quadratic.

Substitute for x and y and derive the new coefficients for u^2, uv, and v^2.  Then evaluate the new discriminant, using a program if you like; you will find that b^2 - 4ac is multiplied by (sin(θ)^2 + cos(θ)^2)^2, which is 1.  The discriminant is constant through every rotation.
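Here is one such program, as a sketch.  The formulas inside rotate_coefficients come from expanding the substitution x = cos(θ)u - sin(θ)v, y = sin(θ)u + cos(θ)v by hand; the function names and the sample coefficients 7, 11, 13 are illustrative.

```python
import math

def rotate_coefficients(a, b, c, theta):
    """Coefficients of u^2, uv, v^2 after substituting
    x = cos(theta) u - sin(theta) v,  y = sin(theta) u + cos(theta) v
    into a x^2 + b xy + c y^2."""
    cs, sn = math.cos(theta), math.sin(theta)
    a_new = a*cs*cs + b*cs*sn + c*sn*sn
    b_new = b*(cs*cs - sn*sn) + 2*(c - a)*cs*sn
    c_new = a*sn*sn - b*cs*sn + c*cs*cs
    return a_new, b_new, c_new

def discriminant(a, b, c):
    return b*b - 4*a*c

before = discriminant(7, 11, 13)
after = discriminant(*rotate_coefficients(7, 11, 13, 0.7))
```

The two values before and after should agree up to rounding error, whatever angle is used.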

The coefficient on uv is b cos(2θ) + (c-a) sin(2θ).  Set this to 0, and tan(2θ) = b/(a-c).  Reverse this algebra, setting θ to one of 4 angles 90 degrees apart, so that tan(2θ) = b/(a-c), and rotation through θ makes the uv term go away.  If a = c, putting 0 in the denominator, then let θ be 45, 135, 225, or 315 degrees, so that cos(2θ) = 0, and again the uv term goes away.

It is not surprising that there are 4 angles to choose from.  Picture the ellipse in the plane, aligned with the x and y axes, as described in the previous sections.  Rotate this through 90 degrees, and the ellipse is once again aligned with the axes.  Its major axis is now vertical, rather than horizontal.  Still, its equation does not have an xy term.  Rotate by another 90 degrees and again the conic section is aligned, and so on.  There are four choices.  If the conic section starts out with an xy term, and you want the uv term to go away, then you might rotate through 37, 127, 217, or 307 degrees.  In any case, there is an angle of rotation that rewrites the conic section in u and v, such that the uv term drops out.  As with any rotation, the discriminant is unchanged.
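One way to pick such an angle in code is sketched below.  It uses atan2 in place of the tangent formula, so the case a = c needs no special handling; that substitution is a choice made for this sketch, not something taken from the text, and the function names are illustrative.

```python
import math

def uv_coefficient(a, b, c, theta):
    """Coefficient of the uv term after rotating through theta."""
    return b * math.cos(2 * theta) + (c - a) * math.sin(2 * theta)

def aligning_angle(a, b, c):
    """One angle theta, in radians, with tan(2 theta) = b/(a-c).

    atan2(b, a - c) also covers a == c, where 2 theta is 90 degrees.
    """
    return 0.5 * math.atan2(b, a - c)

theta = aligning_angle(7, 11, 13)
# The four aligning angles are 90 degrees apart.
leftovers = [uv_coefficient(7, 11, 13, theta + k * math.pi / 2) for k in range(4)]
```

All four entries of leftovers should be zero up to rounding error, and aligning_angle(0, 1, 0), the case xy = 1, comes out as 45 degrees.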

The final steps of normalization are easy, and once again they do not change the discriminant; in fact they don't change a or c at all.  The conic section is aligned with the u and v axes; slide it horizontally and vertically, so that the origin is in the center.  Here is an example.

4u^2 + v^2 + 8u - 12v + 24 = 0

4(u+1)^2 + (v-6)^2 = 16

The ellipse is centered at (-1,6).  Make that the origin, or equivalently, think of u as u+1 and v as v-6.

4u^2 + v^2 = 16

u^2/4 + v^2/16 = 1

The major axis is vertical, from -4 to 4, and the minor axis is horizontal, from -2 to 2.  Of course this ellipse, in standard form, is a rotated and translated image of the original.
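The centering step is just completing the square in each variable, which can be sketched as follows.  The function name center_and_constant is illustrative, and the sketch assumes both squared coefficients are nonzero, so it covers ellipses and hyperbolas but not parabolas.

```python
from fractions import Fraction

def center_and_constant(a, c, d, e, f):
    """Complete the square in a u^2 + c v^2 + d u + e v + f = 0.

    Returns (u0, v0, k) such that the equation becomes
    a (u - u0)^2 + c (v - v0)^2 = k.  Assumes a and c are nonzero.
    """
    a, c, d, e, f = (Fraction(t) for t in (a, c, d, e, f))
    u0 = -d / (2 * a)
    v0 = -e / (2 * c)
    k = d * d / (4 * a) + e * e / (4 * c) - f
    return u0, v0, k

# The example from the text: 4u^2 + v^2 + 8u - 12v + 24 = 0
u0, v0, k = center_and_constant(4, 1, 8, -12, 24)
```

For the example this gives the center (-1, 6) and the right-hand side 16, matching the hand computation.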

Here is another example.  Start with xy = 1.  Graph this in the xy plane as y = 1/x.  This curve approaches 0 for large x, and it approaches infinity for small x near 0.  It has two asymptotes, the x and y axes.  Since it is a quadratic form, it has to be a conic section.  The only conic section with asymptotes is a hyperbola, so you already know it's going to be a hyperbola.  Or you could evaluate the discriminant and get 1, which is positive, thus a hyperbola.  Its two branches are in the first and third quadrants.  The closest points to the origin are sqrt(2) away, and the two asymptotes meet at 90 degrees.  Let's see if these characteristics are preserved after rotation and translation.

What is the angle of rotation?  This is the special case where a = c, where the tangent of 2θ has to equal infinity.  Thus the cosine of 2θ is 0, giving 45 degrees, or 45 plus any multiple of 90.  Let k be the square root of 1/2, the sine and cosine of 45 degrees, thus x = ku - kv, and y = ku + kv.  Substitute and get this.

u^2/2 - v^2/2 = 1

u^2/sqrt(2)^2 - v^2/sqrt(2)^2 = 1

Happily, this is already centered at the origin, so there is no need for translation.  The axis through the two branches runs from u = -sqrt(2) to sqrt(2).  These are the closest points to the origin.  The branches open out to the left and right, and the asymptotes have slope 1 and -1, meeting at an angle of 90 degrees.  The foci are at u = 2 and u = -2.  The eccentricity is sqrt(2).

Like its cousins the ellipse and the hyperbola, a parabola would also rotate and translate into place.  After normalization, its base lies at the origin, and it opens up, down, left, or right.  Any equation in two variables that is quadratic is a conic section: an ellipse, a hyperbola, or a parabola.  Of course the graph could be degenerate, giving a point, a line, two parallel lines, two intersecting lines, or nothing at all.

So - Why are They Called Conic Sections Anyways

I know, you're still wondering why these are called conic sections.  That's because each is the intersection of a cone and a plane.  The cone has to be an infinite double cone, like two cones joined at their tips, one pointing up and one pointing down.  The plane intersects the cone in a slice, or section, thus a conic section.

A plane can cut through one cone, say the top cone, and create a circle or an ellipse.  Tilt the plane up at a steeper angle, so that it intersects the top cone and runs parallel to the bottom cone.  This perfectly balanced conic section is the parabola.  Tilt the plane just a bit more, so it intersects both cones, and find the two branches of a hyperbola.  That's the idea; the proof is presented below.  But you hardly need a proof.  The cone is a quadratic form in 3 variables: v z^2 = x^2 + y^2, where v sets the angle of the cone.  Replace z with a linear combination of x and y, as defined by the plane, and the result is a quadratic form in x and y.  It has to be a conic section.
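The substitution can be carried out explicitly.  As a sketch, assume the plane is written z = mx + ny + h, where m, n, and h are names introduced here and not in the text.  Substituting into v z^2 = x^2 + y^2 gives squared and mixed coefficients v m^2 - 1, 2vmn, and v n^2 - 1, and the discriminant test from earlier in the chapter then sorts the three cases.  Strictly speaking this classifies the shadow of the curve on the xy plane, which has the same type as the curve itself as long as the plane is not vertical.

```python
def section_type(v, m, n):
    """Type of conic cut from the cone v z^2 = x^2 + y^2 by the plane
    z = m x + n y + h.  The offset h moves the curve (and h = 0 gives a
    degenerate section through the apex) but does not enter the test."""
    a = v * m * m - 1          # coefficient of x^2 after substitution
    b = 2 * v * m * n          # coefficient of xy
    c = v * n * n - 1          # coefficient of y^2
    disc = b * b - 4 * a * c   # works out to 4 (v (m^2 + n^2) - 1)
    if disc < 0:
        return "ellipse"
    if disc > 0:
        return "hyperbola"
    return "parabola"
```

With v = 1 the cone has slope 1, so a plane of slope 0.5 gives an ellipse, slope exactly 1 gives a parabola, and slope 2 gives a hyperbola, which matches the tilting picture described above.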

The degenerate conic sections also result from a plane and a double cone.  Cut across the apex, where the two cones meet, and find a single point.  Let the plane run straight up through both cones, through the common apex, and find two intersecting lines.  Finally, a plane can run tangent to the left of the lower cone and the right of the upper cone, producing a single line.  The one exception is two parallel lines.  A plane and a cone cannot create two parallel lines, even though this is a degenerate conic section.  However, a plane and a cylinder can produce two parallel lines, or a single line of tangency, or a circle, or an ellipse.  It's the same story.  The cylinder has the equation x^2 + y^2 = c.  Replace x with a linear combination of y and z, as defined by the plane, and find a quadratic form in y and z, i.e. a conic section.

Ice Cream Cone Proof

Let a double cone point up from below and down from above, meeting at a common apex, as described in the previous section.  Let a horizontal plane cut straight across the top cone, intersecting in a circle.

Place a small sphere inside the cone, just below the plane.  In fact the plane and the sphere are tangent at the point q.  Then place a larger sphere in the cone, above the plane.  This sphere is also tangent to the plane at q.  Hence both spheres touch at the point q.  These spheres resemble two blobs of ice cream, scooped into an up-turned cone, hence this is called the ice cream cone proof.

The lower sphere intersects the cone in a lower ring, and the upper sphere intersects the cone in an upper ring.  Let p be any point on the circle, where the cone and the plane intersect.  Draw a segment from p down to the lower ring, and another segment up to the upper ring.  These segments add up to a fixed distance d, the distance between the parallel rings.  The segments are also equal to the distance from p to q.  Thus the distance from p to q is half of d, all the way around the circle.

This isn't very interesting, until we tilt the plane.  Tip the plane at an angle, so that it cuts the upper cone in an ellipse.  Embed a small sphere below the plane, and a larger sphere above the plane.  The spheres are tangent to the plane at the points q1 and q2.  These become the foci for the ellipse.

Let p be any point on the ellipse.  The distance from p to the lower ring, plus the distance from p to the upper ring, is a fixed distance d.  Now the distance from p to the lower ring is the same as the distance from p to q1, and the distance from p to the upper ring equals the distance from p to q2.  (If this is less than obvious, I'll prove it in a minute.)  Therefore the distance from p to the first focus, plus the distance from p to the second focus, is constant.  This defines an ellipse.

Ok, why is the distance from p to the upper ring equal to the distance from p to q2?  In each case p is the start of a ray that runs tangent to the upper sphere.  In one case the ray runs straight up the cone, and in the other the ray runs along the intersecting plane and just touches the sphere at q2.  Move away from the ice cream cone and prove equality in its own setting.  Let p be the head of two rays that run tangent to a common sphere.  The points of tangency are x and y.  The three points p x y determine a plane.  Restrict attention to this plane, hence our sphere becomes a circle.  This need not be the equator of the sphere; it could be a tiny circle near the north pole.  No matter.  The segments px and py are now tangent to the circle.  Let c be the center of the circle and draw radii from c to x and y.  Then draw the segment from c to p.  This produces two right triangles with a common hypotenuse.  Since the two legs formed by the radii are equal, the triangles are congruent, and px = py.
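The equal tangent lemma can also be checked numerically.  The sketch below makes some simplifying assumptions of its own: the sphere has radius R and is centered at the origin, and the outside point p sits on the z axis at height d > R.  The points of tangency then form a ring at height R^2/d, and the function names are illustrative.

```python
import math

def tangent_point(R, d, azimuth):
    """A point where a line from p = (0, 0, d) touches the sphere of
    radius R centered at the origin (requires d > R)."""
    z = R * R / d
    rho = R * math.sqrt(1.0 - (R / d) ** 2)   # radius of the ring of tangent points
    return (rho * math.cos(azimuth), rho * math.sin(azimuth), z)

def tangent_length(R, d, azimuth):
    """Length of the tangent segment from p to that point."""
    x, y, z = tangent_point(R, d, azimuth)
    return math.sqrt(x * x + y * y + (z - d) ** 2)

def tangency_defect(R, d, azimuth):
    """Dot product of the radius with the tangent segment; zero means perpendicular."""
    x, y, z = tangent_point(R, d, azimuth)
    return x * x + y * y + z * (z - d)

lengths = [tangent_length(1.0, 3.0, k * 0.9) for k in range(7)]
```

Every length should equal sqrt(d^2 - R^2), here sqrt(8), no matter which tangent line is chosen.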

The ice cream cone proof also works for a cylinder.  Let a plane cut across a vertical cylinder at an angle, intersecting in an ellipse.  Place two spheres inside the cylinder, above and below the plane.  The points of tangency become the two foci.  The spheres intersect the cylinder in an upper and lower ring.  From here the proof is the same as the one above.  Therefore a cylinder and a plane intersect in an ellipse.

The ice cream cone proof can also be applied to the hyperbola.  Tip the plane up so that it intersects both cones.  You can picture the plane as vertical, or nearly so.  Place a sphere in the upper cone, just below the plane.  The sphere is tangent to the plane at the point q1, and the sphere intersects the cone in the upper ring.  Let p be a point on the upper branch of the hyperbola, where the plane and the cone intersect.  The distance from p to q1 is the distance from p to the upper ring.

Place another sphere in the lower cone, tangent to the plane at q2.  The tangent points q1 and q2 are the foci of the hyperbola.  The lower sphere also creates a lower ring, where it intersects the lower cone.  Let d be the distance from the upper ring, passing through the apex, and down to the lower ring on the opposite side of the cone.

Travel from p all the way down to q2, drawing a line segment in the plane.  This is a line from p tangent to the lower sphere.  Any other line from p tangent to this sphere has the same length.  So, draw a line from p along the cone through the apex and down to the lower ring.  Then subtract the distance from p to the upper ring, and find a fixed distance d between the two rings.  Yet the distance from p to the upper ring is the distance from p to q1.  Put this all together and |p,q2| - |p,q1| = d.  This defines a hyperbola.

Parabolic Mirror

Aside from flat mirrors, the parabolic mirror is the most common.  It appears in everything from the household flashlight to the Hubble Space Telescope.

Place a light source at the focus of a parabolic mirror, and the light rays are reflected into a straight parallel beam, heading away from the mirror.  Most of the light streams forward, with a small percentage spreading out spherically from the front of the bulb.  Aim your flashlight at a target 30 feet away, and most of the light reaching that target actually comes from the sides and back of the bulb, bouncing off the mirror.  Car headlights are similarly designed.

Conversely, light streaming in from a far away source is reflected, by a parabolic mirror, toward the focus.  This is how a large telescope concentrates and magnifies the light from a distant galaxy.  A solar cooker also uses a parabolic mirror to concentrate the sun's rays on the object of your lunch.  Of course you need to point your cooker at the sun, or nearly so, to properly gather and focus the heat.

Consider the curve y = ax^2, the equation of a parabola.  It doesn't really matter whether we measure distance in inches or meters or miles, so scale x and y by 1/a.  The new equation becomes y = x^2.  The focus of this parabola is on the y axis, at y = 1/4.

Let p be a point on the parabola, and drop a perpendicular from p down to the directrix at y = -1/4, and call this point q.  Light leaves the focus f, strikes the parabola at p, and is reflected straight up along the line qp.

Draw a line tangent to the mirror at p, and let this line intersect the directrix at r.  Now ∠fpr is the angle of incidence, and using the principle of vertical angles, ∠rpq is the angle of reflection.  We need to show these angles are equal.  If they are, then the light beam must be reflected along the line qp.

If the triangles qpr and rpf are congruent, then the two angles are equal.  They both share the common side pr, and the segment pq equals the segment pf.  (We established this earlier; distance to the focus equals distance to the directrix.)  If the third sides are equal, the triangles are congruent, and the angle of incidence equals the angle of reflection.

Let p be the point (s, s^2), hence q = (s, -1/4).  Of course the focus f is at (0, 1/4).  We only need find r.

The tangent line through p has slope 2s.  It has the following equation.

y - s^2 = 2s(x-s)

Set y = -1/4, and x becomes (-1/4 - s^2)/(2s) + s.  Subtract this from s, and the distance from q to r is (s^2 + 1/4)/(2s).  For large values of s, this is about half of s, which is what we would expect.

Use the pythagorean theorem to find the distance from r to f.  Rewrite x as (-1/4 + s^2)/(2s).  Square this and add the square of 1/2, giving the following.

((1/4)^2 - (1/2)s^2 + s^4) over 4s^2 + 1/4

((1/4)^2 + (1/2)s^2 + s^4) over 4s^2

Take the square root to get the distance (1/4 + s^2)/(2s), which equals the distance from q to r.  The triangles are congruent, the angles are equal, and the parabola reflects light from the focus onto a distant object, or light from a distant object back to the focus.
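The reflection can also be checked directly with vectors.  This sketch uses the standard mirror rule, which keeps the component of the ray along the tangent and reverses the component across it; that rule and the function names are assumptions of the sketch, not steps from the proof above.  The second function recomputes the two side lengths |qr| and |rf| used in the congruence argument.

```python
import math

FOCUS = (0.0, 0.25)      # focus of y = x^2
DIRECTRIX_Y = -0.25      # the directrix is the line y = -1/4

def reflected_direction(s):
    """Direction of a ray that leaves the focus and bounces off y = x^2 at (s, s^2)."""
    dx, dy = s - FOCUS[0], s * s - FOCUS[1]    # incoming ray, focus to mirror
    tx, ty = 1.0, 2.0 * s                      # tangent vector, slope 2s
    # Mirror rule: reflected = 2 * (projection of d onto the tangent) - d
    k = 2.0 * (dx * tx + dy * ty) / (tx * tx + ty * ty)
    return k * tx - dx, k * ty - dy

def side_lengths(s):
    """Return (|qr|, |rf|) for nonzero s, where r is the point at which
    the tangent line meets the directrix and q = (s, -1/4)."""
    r_x = s - (s * s + 0.25) / (2.0 * s)
    qr = abs(s - r_x)
    rf = math.hypot(r_x - FOCUS[0], DIRECTRIX_Y - FOCUS[1])
    return qr, rf

directions = [reflected_direction(s) for s in (-2.0, -0.5, 0.0, 0.3, 1.0, 3.0)]
```

Each reflected direction should have a horizontal component of zero and a positive vertical component, meaning the ray heads straight up, and the two side lengths should agree for every nonzero s.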

Applications are not restricted to visible light.  Look at any satellite dish; it has the shape of a parabola.  It concentrates the signal from the transmitting satellite onto the receiver, then down to your tv.

The same principle works for sound.  You've probably stood in an exploratorium whispering into a large parabolic dish on the wall.  Your sounds are reflected straight behind you, across a crowded, noisy room, to another parabolic dish where your friend is standing.  You can hear each other easily, as though you were enclosed in a long invisible tube.  It sounds like you are face to face, even though you are back to back and 50 meters apart.  Meantime, the other patrons, milling about in the room, cannot hear your conversation at all, because your voice does not spread out in all directions as it normally would.

Elliptical Mirror

In the last section we discussed the parabolic mirror, which focuses light from a distant object onto a point.  This point is called the focus of the parabola.  In this section we turn to the elliptical mirror, which reflects the light from one focus onto the other.  Place a light bulb at one focus, and all the energy from that bulb is directed towards the other focus.

If you stand at one focus of a giant elliptical room with mirrors all around, and your friend stands at the other focus, you will see her everywhere you look.  If you and your friend are standing back to back, then you can see the back of her head reflected in the curved mirror in front of you.  Turn your head just a bit and you will see her right ear, and then her profile, as though you were walking around her.  At 180 degrees about you can see a giant image of her face in the far mirror, perhaps interrupted by the back of her head, which you see directly.  At every angle it's her, nothing but her, and at every angle she sees nothing but you.

Let the major axis of an ellipse run along the x axis, from -a to +a, and let the minor axis run up the y axis from -b to +b.  The foci are at +f and -f on the x axis, where f^2 = a^2 - b^2.

Imagine a particle moving along the ellipse.  Its position, as a function of time, is a parameterization of the ellipse.  Let's not worry about the precise parameterization for now.  We simply have x(t),y(t), and the particle's velocity is x′(t),y′(t).  For notational convenience, let u = x′(t) and v = y′(t).  Remember that the velocity vector is also the tangent vector, so the line tangent to the ellipse is defined by the vector u,v.

Stop the particle at a point p, with coordinates x,y and tangent vector u,v.  Draw the segments from p to the two foci.  If you recall our notation from an earlier section, the distance from p to +f is q, and the distance from p to -f is r.  This is summarized below.

q = sqrt((x-f)^2 + y^2)

r = sqrt((x+f)^2 + y^2)

If the elliptical mirror reflects light from one focus to the other, then the two lines, from p to the two foci, must make the same incident angle with the tangent vector u,v.  In other words, angle of incidence equals angle of reflection.  If we can prove this, then our elliptical mirror behaves as described above.

Remember that the angle can be characterized by the dot product.  In other words, the two angles are equal iff the dot product of u,v with either segment, divided by the length of that segment, gives the same number in absolute value.

(u,v) . (x-f,y) over q = (u,v) . (x+f,y) over r

Cross multiply and expand the dot products.

(u(x-f) + vy) r = (u(x+f) + vy) q

Square both sides and substitute for q^2 and r^2.

(u(x-f) + vy)^2 ((x+f)^2 + y^2) = (u(x+f) + vy)^2 ((x-f)^2 + y^2)

I used a program to expand and simplify.  After clearing out the common factor 4fy, this remains.

(v^2 - u^2)xy + uv(x^2 - y^2) - f^2 uv = 0

Now pick a parameterization.  For notational convenience, let c be the cosine of θ and let s be the sine of θ.  Thus x = ac and y = bs.  Differentiate to find u = -as and v = bc.  Substitute for x y u and v in the above.  Clear out the common factor of abcs and get this.

-a^2(c^2 + s^2) + b^2(c^2 + s^2) + f^2 = 0

Remember that c^2 + s^2 = 1, and a^2 - b^2 = f^2, hence the above equation is always true.  That completes the proof.
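A numerical version of the same claim is sketched below.  A ray leaves the focus at +f, strikes the ellipse at the point with parameter t, and is reflected using the standard mirror rule (keep the component along the tangent, reverse the rest).  The rule, the function name, and the sample values a = 5, b = 3 are assumptions of the sketch.

```python
import math

def reflection_check(a, b, t):
    """Reflect a ray from the focus (+f, 0) off the ellipse at (a cos t, b sin t).

    Returns (cross, dot) of the reflected direction with the direction from
    the point to the other focus (-f, 0).  cross == 0 and dot > 0 together
    mean the reflected ray heads straight for the other focus.
    """
    f = math.sqrt(a * a - b * b)
    x, y = a * math.cos(t), b * math.sin(t)
    u, v = -a * math.sin(t), b * math.cos(t)    # tangent vector at the point
    dx, dy = x - f, y                           # incoming ray, focus to mirror
    k = 2.0 * (dx * u + dy * v) / (u * u + v * v)
    rx, ry = k * u - dx, k * v - dy             # reflected direction
    gx, gy = -f - x, -y                         # direction to the other focus
    return rx * gy - ry * gx, rx * gx + ry * gy

checks = [reflection_check(5.0, 3.0, 0.4 * k) for k in range(16)]
```

Every cross product in checks should vanish up to rounding error, with a positive dot product, all the way around the ellipse.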

A Hall of Mirrors

A funhouse is a house where every wall is a mirror.  If you shine a flashlight directly into a wall, any wall, the light beam bounces straight back to you.  If you point your flashlight at an angle, the beam of light bounces all around the house faster than you can imagine.  In fact it probably makes its way into every nook and cranny.

Assume the fun house is closed to the outside world, so that light never leaves the house.  Yet inside, none of the rooms are closed off.  In other words, every room has an open door leading to another room.  The interior forms a connected set.  There is one light bulb in the house.  When it is turned on, does light spread throughout the entire house?  Or is there a small room somewhere that remains in darkness?

If all the walls consist of straight lines, we're pretty sure the entire house is lit.  Nobody has proved this yet, but that's the way it appears.  However, if some of your walls are curved, e.g. circular arcs, then some of the rooms might remain dark.

Place the light bulb at the center of your picture and draw a circle of radius 1 around it.  Then draw another circle of radius 2 around that.  Beams of light leave the bulb, bounce straight back from the inner circle, and return to the bulb.  The outer room, the annulus between the two circles, remains dark, but that's because it is closed off.  Cut a small door in the inner circle.  Light now streams out through this door, bounces off the outer circle, and returns to the light bulb.  As long as you're not standing near the doorway, the outer room remains dark.

This is a simple construction in theory, but it is very hard to build in practice.  If a person or object is placed in the inner room, some of the light bounces off that object and leaves the inner room at a different angle.  This light does not bounce straight back into the inner room.  Instead it bounces away and spreads throughout the annulus.  The slightest speck of dust could disperse light throughout both rooms.  I like the following solution better, because you could actually build it in a science museum, and it would work, even if people were standing inside to watch the action.  This is my original idea, so please do not implement it without permission.

Near the bottom of your paper, draw a horizontal line 6 inches long.  Then draw four vertical line segments 2 inches high, standing up from the horizontal base at 2 inch intervals.  You have built three rooms, each room 2 inches square.  Now draw the top half of an ellipse, joining the two outer walls.  This closes off the house.  Let the two foci of the ellipse be the top end points of the two inner walls.  If a beam of light bounces off the end point of either of these inner walls, it is actually leaving one focus of the ellipse.  It bounces off the curve of the ellipse and returns to the other focus, the other inside wall.  If a beam of light leaves the middle room in any other way, passing between the two foci, it bounces off the curve and returns to the middle room.  The left and right rooms remain dark.  You could walk out of the middle room, pass through the elliptical hallway, and step into one of the outer rooms, a step into total darkness.

Alternatively, place the light in the left room, and it will bounce over to the right room, and back again, leaving the middle room in darkness.  The middle room and the outer rooms are optically independent of one another, and remain so even if people and furniture are inside.

A pitch black room, adjacent to a room that is fully lit, separated by an open doorway, is a great effect, but if you are concerned about liability - leaving your patrons in complete darkness - you can place a red light in the middle room and a green light in the left room.  Visitors will see either a green world or a red world, depending on which room they are standing in.  Not quite as dramatic, but it still gets the point across.

Hyperbolic Mirror

Recall that an elliptical mirror reflects light from one focus to the other.  In contrast, a hyperbolic mirror reflects light from one focus away from the other.  After the light has been reflected, pull it back through the mirror, and it converges at the other focus.  Put a light bulb at the front focus, and it looks like it is shining from the back focus, from behind the mirror.

Once again, the angle of incidence must equal the angle of reflection.  The incident angle is determined by the tangent line at p and the line joining p to the front focus.  The reflected angle is determined by the tangent line and the line joining p to the back focus.  These angles must agree, for every point p on the hyperbola.

If you haven't done so already, please review the section on elliptical mirrors; the math here is virtually the same.  In fact the proofs are identical up to this equation.

(v^2 - u^2)xy + uv(x^2 - y^2) - f^2 uv = 0

At this point we need a parameterization for the curve, and sine and cosine just won't do.  Instead, use secant and tangent.  For notational convenience, let s = sec(θ) and let t = tan(θ).  Now x = as, y = bt, u = ast, and v = bs^2.  Substitute and expand, and clear out the common factor ab s^3 t, yielding the following equation.

a^2(s^2 - t^2) + b^2(s^2 - t^2) - f^2 = 0

Since s^2 - t^2 = 1, and f^2 = a^2 + b^2, this equation is always true.  That completes the proof.
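The equal-angle condition can also be checked directly, without the algebraic simplification.  The sketch below computes the absolute cosine of the angle between the tangent and each focal segment at a point of the hyperbola.  The function name focal_cosines and the sample axes a = 4, b = 3 are assumptions made for illustration.

```python
import math

def focal_cosines(a, b, theta):
    """Absolute cosine of the angle between the tangent and each focal segment."""
    f = math.sqrt(a * a + b * b)                 # f^2 = a^2 + b^2 for a hyperbola
    s, t = 1.0 / math.cos(theta), math.tan(theta)
    x, y = a * s, b * t                          # point on the right branch
    u, v = a * s * t, b * s * s                  # tangent vector (the derivative)
    tangent_len = math.hypot(u, v)
    q = math.hypot(x - f, y)                     # distance to the front focus
    r = math.hypot(x + f, y)                     # distance to the back focus
    front = abs(u * (x - f) + v * y) / (q * tangent_len)
    back = abs(u * (x + f) + v * y) / (r * tangent_len)
    return front, back

for theta in (-1.0, -0.3, 0.2, 0.9):
    front, back = focal_cosines(4.0, 3.0, theta)
    print(theta, abs(front - back) < 1e-9)
```

The two cosines agree at each sampled point, which is the statement that the incident and reflected angles are equal.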

Surfaces in 3 Space

A reflecting telescope, with its parabolic mirror, is obviously a 3 dimensional object, but so far we've only described conic sections in 2 dimensions.  Rotate a conic section in 2 dimensions to create a quadratic surface in 3 dimensions.  For instance, rotate the parabola about its axis to get a paraboloid.  Its equation is x^2 + y^2 = z.  In other words, the distance from the z axis, when squared, gives the height (i.e. the z coordinate).  This is the paraboloid, the shape of the parabolic mirror in your flashlight, and in the Space Telescope.  If a light ray approaches the mirror running parallel to the z axis, restrict attention to the plane containing the z axis and the light ray.  This plane intersects the paraboloid in a parabola, which reflects the light ray towards the focus as described earlier.  Thus the paraboloid has the same optical properties as the parabola, but in 3 dimensions.

Similarly, an ellipse, rotated about its major axis, produces a prolate spheroid, like a rounded football, and light emanating from one focus is reflected to the other.  Spin the ellipse about its minor axis to get an oblate spheroid, like a rounded top, or perhaps a flying saucer.  Are there any other quadratic surfaces?

In this section, I will assume the equation has been normalized, i.e. rotated and translated, so there are no mixed terms, and no squared and linear terms in the same variable.  This is similar to the rotation by θ that was done in the plane to normalize a conic section.  The higher dimensional process (rotation and translation in n space) will be described later.  For now let's just assume it has been done.

If one of the variables is missing, e.g. there is no term associated with z, then we have a conic section in the xy plane as before, and the same curve appears for every z coordinate.  A circle extends to an infinite cylinder, and an ellipse becomes an elliptical cylinder.  The parabola stretches out into a trough, and the hyperbola becomes two branch surfaces, with intersecting planes as asymptotes.  The degenerate conics extend as well.  Parallel lines become parallel planes, and intersecting lines become intersecting planes.  With this case behind us, assume all variables are present, in squared or linear form.

If two of the variables are linear then the third must be squared, else the equation is simply a plane.  Consider the following example.

x^2 + 5y + 7z = 0

Let u = 5y+7z and let v = 7y-5z.  Our equation becomes x^2 + u = 0, and v does not participate.  This is an extended parabola, as described above.  Whenever there are two linear variables, apply a change of basis to produce an extended parabola.

Next assume there is one linear variable and two squares, as in:

ax^2 + by^2 = z

If there is a constant c, fold it into the variable z.  This shifts the surface up or down, but does not change its shape.

There is a conic section for each level z, and a degenerate conic when z = 0.  Consider a = b = 1.  This produces a point at the origin, and a circle for each positive z, with radius sqrt(z).  This is the paraboloid discussed earlier.  When a and b are positive but not equal, the resulting surface is an elliptical paraboloid.  Each cross section is an ellipse.

If a and b have opposite sign, then each cross section is a hyperbola, except z = 0, which yields two intersecting lines.  As you move down towards z = 0, the branches of the hyperbola squeeze closer and closer to their asymptotes.  At z = 0 they reach their asymptotes.  As z goes negative the branches move to the other sides of their asymptotes, pulling farther away as z approaches -infinity.  This is called a hyperbolic paraboloid.  You can see the hyperbolas, but you have to turn your head a bit to see the parabolas.  Restrict attention to the plane y = cx for some constant c.  Now some multiple of x^2 equals z, and that's a parabola.

This shape is also called a saddle, since it curves upward in front and back, and down on the sides - just like a real saddle.  A more ubiquitous example might be the Pringles potato chip.  You may have seen this shape in another form, namely z = xy.  When rotated by 45 degrees, this becomes x^2 - y^2 = 2z.
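The 45 degree rotation of z = xy can be confirmed at a few sample points.  This is a minimal sketch; the helper name rotate45 and the new coordinate names u and v are chosen here for illustration.

```python
import math

def rotate45(u, v):
    """Map rotated coordinates (u, v) back to the original (x, y)."""
    k = math.sqrt(0.5)
    return k * (u - v), k * (u + v)

# On the surface z = xy, the rotated coordinates satisfy u^2 - v^2 = 2z.
for u, v in [(1.0, 2.0), (-3.0, 0.5), (2.5, 2.5)]:
    x, y = rotate45(u, v)
    z = x * y
    assert abs((u * u - v * v) - 2 * z) < 1e-12
```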

Finally assume all three variables are squared.  We can't simply fold the constant into z any more, so there are actually four coefficients to consider.

ax^2 + by^2 + cz^2 = d

Start with d = 0, the degenerate surfaces.  At least two coefficients have the same sign, so assume a and b are positive.  If c is positive, only the origin will do, but if c is negative, the result is an elliptical cone.  Set a = b to produce a traditional cone, circular in cross section.  The distance from the z axis, squared, is proportional to the z coordinate squared, hence the radius is proportional to the height, and that's a cone.

If d is nonzero, divide through so that d = 1.  If a, b, and c are all negative we are out of luck; there are no solutions.  If they are all positive we have an ellipsoid.  This is a version of the sphere scaled along the x, y, and z axes.  If the three coefficients are equal the surface is a sphere.  If exactly two of the three coefficients are equal the surface is a spheroid.

If a and b are positive and c is negative, move cz^2 to the other side.  As z increases or decreases, the ellipse in the xy plane expands according to 1 - cz^2, which grows with |z| because c is negative.  If the ellipse is a circle, i.e. a = b, the surface is a hyperboloid of one sheet.  It has a double cone as its asymptote.  This is a surface of revolution.  Place the branches of your favorite hyperbola to the left and right of the y axis, with a giant letter X acting as asymptotes, and spin the whole thing around the y axis.  The X becomes the double cone, and the branches form the hyperboloid.  If the base conic is an ellipse, rather than a circle, the surface is not a surface of revolution any more, since each cross section is an ellipse, not a circle.  Yet the surface is still called a hyperboloid, or perhaps an elliptical hyperboloid if you want to be precise.  It has an elliptical cone as asymptote.

Last but not least, two of the coefficients could be negative.  Rewrite the equation in the following form, using all positive coefficients.

ax^2 + by^2 = cz^2 - 1

When z is close to 0, there is no solution.  When z = ±sqrt(1/c), x = y = 0, a single point.  As z moves farther away from 0 the ellipse grows larger.  This is a hyperboloid, or elliptical hyperboloid, of two sheets.  The term "two sheets" indicates two surfaces that are not connected.  One lies above the xy plane and the other lies below the xy plane.  When the cross section is a circle we have a surface of revolution.  Draw the capital X as before, but put the branches of the hyperbola above and below the X.  Now rotate about the y axis.  The X creates the double cone, the upper branch becomes an upper sheet inside the top cone, and the lower branch becomes a lower sheet inside the bottom cone.

These are all the quadratic surfaces in 3 space.
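The case analysis for the fully squared equation ax^2 + by^2 + cz^2 = d can be sketched as a small function.  The name classify_quadric and the returned labels are illustrative; the function assumes a, b, and c are all nonzero, and it does not cover the cylinders, troughs, and paraboloids handled earlier in this section.

```python
def classify_quadric(a, b, c, d):
    """Classify a*x^2 + b*y^2 + c*z^2 = d, assuming a, b, c are all nonzero."""
    if 0 in (a, b, c):
        raise ValueError("all three squared terms must be present")
    if d == 0:
        positives = sum(1 for k in (a, b, c) if k > 0)
        # All coefficients of one sign force x = y = z = 0.
        if positives in (0, 3):
            return "single point (the origin)"
        return "elliptical cone"
    # Divide through by d, so the right side becomes 1.
    positives = sum(1 for k in (a, b, c) if k / d > 0)
    if positives == 3:
        return "ellipsoid"
    if positives == 2:
        return "hyperboloid of one sheet"
    if positives == 1:
        return "hyperboloid of two sheets"
    return "no real points"

print(classify_quadric(1, 2, 3, 6))
print(classify_quadric(1, 1, -1, 1))
print(classify_quadric(-2, -1, 4, 1))
```

Counting the positive coefficients after dividing by d is all that matters for the type; the magnitudes only set the proportions of the surface.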

Translation

If a quadratic form includes the terms x^2 and 52x, replace x with u-26.  This is called a translation.  The origin is moved 26 units in the x direction, making the surface easier to analyze.  We can always put it back, once we understand it.  The substitution eliminates the linear term, leaving u^2 - 676.  The number 676 is subtracted from the constant term that was present in the original quadratic form.
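Written out, the substitution x = u - 26 is just completing the square:

```latex
\[
x^2 + 52x = (u-26)^2 + 52(u-26)
          = u^2 - 52u + 676 + 52u - 1352
          = u^2 - 676 .
\]
```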

In general, the linear terms that are associated with quadratics can all be eliminated, simply by moving the origin.  This does not change the shape of the surface.  Once this is done, each variable appears squared, or linear, but not both.  And as we saw in the last section, several linear variables can be combined into one.  Assuming there are no mixed terms (see the next section), the quadratic form has become a combination of squared variables, plus a constant, plus at most one linear variable.

Rotation

The complete analysis of an arbitrary quadratic form in n dimensions involves matrices, eigen vectors, and eigen values.  These are used to rotate the surface into position, so that the axes of the surface line up with the coordinate axes.  Turn your head, and suddenly the surface looks familiar.

The equation xy = 1 is a perfect example.  It's a hyperbola, but how do you know that?  It doesn't look like ax^2 - by^2 = 1.  Well - you can transform the equation into standard form by turning your head 45 degrees.  A rotation causes the mixed term xy to go away, leaving the terms u^2 and v^2.  This example was analyzed in detail in the earlier section on conics and discriminants.  Happily, the same thing happens in n dimensions.  A rotation produces a quadratic form with no mixed terms.  Follow this up with a translation, as described in the previous section, to write a quadratic equation in standard form that represents the same shape.

Start with a quadratic form in n variables.  This is an equation of degree 2, i.e. at least one term has degree 2.  Move the linear and constant terms to the side.  We'll bring them back later.  This leaves a series of squared and mixed terms in the variables x1 x2 x3 … xn.  In the introduction, which was a ways back I know, I represented this expression using a symmetric matrix M.  The sum of squared and mixed terms is equal to xMxT, where x is a row vector containing the n variables, and M is the corresponding symmetric matrix.  The shape, in n space, is the set of points x such that x*M*xT = 0.  Let P be a matrix that is a change of basis.  A vector u, run through P, gives a vector x, and conversely, x has some preimage u under P.  The set of points in our quadratic surface, using the new coordinates, consists of the points u that correspond to the points x.  In other words, u*P*M*(u*P)T = 0.  In the special case where P is orthonormal, P transpose is P inverse, thus u*P*M/P*uT = 0.  We are looking for that special matrix P.

Apply this to the 2 dimensional example xy = 1.  We found, from first principles, that a rotation of 45 degrees did the trick.  With k as sqrt(1/2), the rotation matrix P had [k,k] on top and [-k,k] below.  Instead of rotating and substituting for x and y in the quadratic, as we did before, let's use the rotation to change the matrix M, relative to the new coordinates.  The symmetric matrix M corresponding to xy has 1/2 in the upper right and the lower left.  Q, the inverse of P, is P transpose, moving -k from the lower left to the upper right.  Evaluate PMQ, and get 1/2 in the upper left and -1/2 in the lower right, thus x^2/2 - y^2/2 = 1.

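\[
\begin{pmatrix} k & k \\ -k & k \end{pmatrix}
\begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix}
\begin{pmatrix} k & -k \\ k & k \end{pmatrix}
=
\begin{pmatrix} 1/2 & 0 \\ 0 & -1/2 \end{pmatrix},
\qquad k = \sqrt{1/2}
\]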
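The product PMQ for the example xy = 1 can be multiplied out numerically.  This sketch takes k = sqrt(1/2) and puts 1/2 in the off-diagonal entries of M, the values implied by a 45 degree rotation and the form xy; the helper matmul is written out here so that nothing beyond the standard library is needed.

```python
import math

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

k = math.sqrt(0.5)
P = [[k, k], [-k, k]]                                 # rotation by 45 degrees
M = [[0.0, 0.5], [0.5, 0.0]]                          # symmetric matrix of the form xy
Q = [[P[j][i] for j in range(2)] for i in range(2)]   # transpose of P, which is its inverse

D = matmul(matmul(P, M), Q)
print(D)   # approximately [[0.5, 0], [0, -0.5]]
```

The off-diagonal entries vanish up to rounding, so the mixed term is gone and the diagonal carries the coefficients of the two squares.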

Let M be a symmetric matrix representing a quadratic form in n dimensions.  Use Schur's theorem to transform M into a lower triangular matrix.  Recall that the change of basis in that theorem is orthonormal, i.e. a rigid rotation, that does not change the shape of the quadratic surface.  This is almost what we need, but two pieces of the puzzle are missing.  The triangular matrix must come out diagonal, and the eigen values must all be real, so that the rotation takes place in real space.  You don't have to dip into complex numbers to turn your head.  A symmetric matrix has both these properties.  Its eigen values are real, and its transformation produces a diagonal matrix.  That is less than obvious, but after we prove it, the resulting quadratic form comprises quadratic terms, with no mixed terms.  Bring the linear terms back in, translate the origin, and write the quadratic in standard form.  That ties it all together.  We only need prove a symmetric matrix has these properties.

Normal Matrix

A matrix that builds a quadratic form is real and symmetric, but in this section I will consider all matrices over the complex numbers C.  The dot product u.v is defined as ∑ ui conj(vi), the sum of each ui times the conjugate of vi, which is backward compatible with u.v over the reals.  Dot product is no longer commutative, but that isn't a showstopper.  In fact, v.u is the conjugate of u.v.  With this in mind, u.u is its own conjugate, and is real.  Furthermore, u.u is positive as long as u is nonzero.  It is the sum of the squares of the real and imaginary components of the entries in u.

The conjugate of 0 is 0, thus u.v is 0 iff v.u is 0.  An orthonormal matrix is well defined over C.  Gram Schmidt and Schur's theorem generalize to complex numbers, so that M is unitarily similar to a lower triangular matrix.  Write PMQ = T, where T is triangular, and P and Q are inverse orthonormal (also known as unitary) matrices over C.

The transpose of a matrix is its reflection through the main diagonal, but the tranjugate is the conjugate of the transpose.  The transpose of M is written MT, and the tranjugate is written M*.  If M is real then these operations are the same.

If P is orthonormal, using the complex dot product, then P times its transpose is no longer the identity matrix; you need to use the tranjugate.  In other words, P* is the inverse of P.

A normal matrix, not to be confused with orthonormal, commutes with its tranjugate.  That is, MM* = M*M.

If M is real and symmetric, then M* = M, and M is normal.

Let M be an orthonormal matrix, hence M and M* are inverses.  A left inverse is a right inverse, so M and M* commute, and M is normal.

A diagonal matrix is normal.  Each diagonal entry is multiplied by its conjugate, and the order doesn't matter.

Let M be diagonalizable.  In other words, PM/P is a diagonal matrix D, where P is nonsingular.  Further assume P is unitary, so that M is unitarily similar to a diagonal matrix.  Remember that Q, the inverse of P, is also the tranjugate of P, and is orthonormal.  Q and P are inverse rotations.  Swapping the names of P and Q if necessary, write M = PDQ.  The tranjugate of PDQ is Q*D*P*.  The tranjugate of P is Q, and the tranjugate of Q is P.  Thus (PDQ)* = PD*Q.  Use this in the following.

MM* =
(PDQ) * (PDQ)* =
PDQ * PD*Q =
PDD*Q =
PD*DQ =
PD*Q * PDQ =
M*M

A unitarily diagonalizable matrix is normal.  Conversely, let M be a normal matrix.  If P is orthonormal, then PMQ is also normal.  Multiply it out and see.  With this in mind, apply Schur's theorem, making PMQ a normal, lower triangular matrix T.  Thus T commutes with its tranjugate.  We want to show that T is in fact diagonal.

Let U be the tranjugate of T, so that U is upper triangular.  Let c be the upper left entry of T, hence the conjugate of c is the upper left entry of U.  Consider the product TU.  The top row of the product is c times the top row of U.  The upper left entry is c.c, which is the real part of c squared plus the imaginary part of c squared.  Multiply U by T and get the same upper left entry, since T is normal.  In UT, this entry is the left column of T dotted with itself, so that dot product equals c.c.  The dot product increases as you bring in more nonzero entries.  It starts out as c.c, and can't get any larger after that.  Thus all the other squares are 0, and the left column of T is 0 except for the upper left entry.  Since U is T*, the top row of U is 0, except for the upper left entry.

The same analysis holds for the second column of T, and the third, and so on down the line, until T is diagonal.  Therefore, M is normal iff it is unitarily diagonalizable.
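The definition of normal is easy to test numerically.  The sketch below checks whether a small matrix commutes with its tranjugate; the helper names tranjugate, matmul, and is_normal are chosen here, and the sample matrices are illustrative.  The triangular example fails the test, in line with the argument that a normal triangular matrix must be diagonal.

```python
def tranjugate(M):
    """Conjugate of the transpose of M."""
    return [[complex(M[j][i]).conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_normal(M, tol=1e-12):
    """True if M times its tranjugate equals its tranjugate times M."""
    S = tranjugate(M)
    left, right = matmul(M, S), matmul(S, M)
    n = len(M)
    return all(abs(left[i][j] - right[i][j]) <= tol
               for i in range(n) for j in range(n))

print(is_normal([[2, 1], [1, 3]]))             # real symmetric
print(is_normal([[2, 1 + 1j], [1 - 1j, 3]]))   # hermitian
print(is_normal([[0, -1], [1, 0]]))            # rotation, orthonormal
print(is_normal([[1, 0], [1, 1]]))             # lower triangular, not diagonal
```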

There is another criterion for normal.  M is normal iff M has n orthogonal eigen vectors.  One direction is easy.  Let M have orthogonal eigen vectors, and scale them so they form an orthonormal matrix P.  Multiply P by M, and run the rows of P, the eigen vectors, through M.  The result is a matrix whose rows are scalar multiples of the rows of P.  Multiply this by Q, the inverse of P, and get a diagonal matrix whose entries are the eigen values.  Thus M is unitarily diagonalizable.

Conversely, if M is normal, write M = PDQ, where D is diagonal.  The rows of Q are eigen vectors of M, with eigen values in D.  The rows of Q are orthonormal, hence M has n orthogonal eigen vectors.

Hermitian Operator

A hermitian operator f is a function on Rn, or Cn, such that for any two vectors x and y, f(x).y = x.f(y).  I could write several chapters on hermitian operators, but the key observation is that a hermitian matrix, i.e. a matrix that equals its tranjugate, implements a hermitian operator.  Let M be hermitian and start with xM.y.  Remember that the dot product conjugates the second argument.  Thus xM.y is x times M times the conjugate of y, where x is a row vector and the conjugate of y is written as a column vector.  By associativity, this is x times (M times the conjugate of y).  Take the transpose of this last column and get the conjugate of y, as a row vector, times MT.  So x times the column is x dotted with the conjugate of that row, which is y times the tranjugate of M.  Since M is hermitian, the tranjugate of M is M, and the result is x.yM.  As a function from complex space into itself, M is a hermitian operator.

Let v be an eigen vector of M, with eigen value s.  Write s as sv.v over v.v.  The denominator is a real number, and is not affected by conjugation.  Therefore the conjugate of s is v.sv over v.v.  If f is the function implemented by M, then sv.v is f(v).v, and v.sv = v.f(v).  Since f is hermitian, these are equal.  The conjugate of s = s, and s is real.
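For a concrete instance, the eigen values of a 2 by 2 hermitian matrix can be computed from its characteristic polynomial and seen to be real.  This is a sketch under the assumption of a 2 by 2 matrix only; the function name eigenvalues_2x2 and the sample matrix H are illustrative.

```python
import cmath

def eigenvalues_2x2(M):
    """Roots of the characteristic polynomial t^2 - trace*t + det of a 2x2 matrix."""
    (p, q), (r, s) = M
    trace = p + s
    det = p * s - q * r
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace - root) / 2, (trace + root) / 2

# A hermitian matrix: it equals the conjugate of its transpose.
H = [[2, 1 + 1j], [1 - 1j, 3]]
low, high = eigenvalues_2x2(H)
print(low, high)   # both imaginary parts are zero
```

The off-diagonal entries are complex, yet the trace and determinant are real and the discriminant is nonnegative, so both roots land on the real line.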

Finally we can rewrite an arbitrary quadratic equation in n variables.  The matrix M is real and symmetric, hence hermitian.  Its eigen values are real.  It is also normal, and unitarily diagonalizable through its real eigen values.  A real orthonormal matrix P implements a change of basis, such that PMQ is diagonal.  The resulting quadratic form has squared terms and no mixed terms.  Translate the origin, so that it lies at the center of the quadratic surface.  That puts the equation in standard form.

If you are willing to stretch or shrink the shape in n dimensions, and perhaps stretch it back when you are done, all the nonzero coefficients can be scaled to 1 or -1.  Now there are only a handful of cases to consider, as we did in 2 and 3 dimensions.  For instance, set all coefficients to 1, including the constant, to get the unit sphere.  Scale the sphere by positive numbers in n dimensions, stretching or shrinking along each axis, to get the generalized ellipsoid.

Intersecting with a Plane

Let f be a quadratic form in n space.  In other words, f is a quadratic expression in n variables.  Let h be a hyperplane running through n space.  In other words, h is a linear expression in n variables.  What is f∩h?

Find a rigid rotation / translation that carries the hyperplane h to the plane z = 0.  Apply the same rotation to f.  This replaces each variable with a linear combination of variables.  For instance, x might be replaced with 3x+5y-7z.  After substitution and expansion, f is still a quadratic form.  Each term still has degree 2 or less.  Therefore f∩h is a quadratic form inside h, establishing a quadratic surface in a lower dimension.

Recall that a conic section is the intersection of a cone and a plane.  We proved this geometrically using the ice cream cone proof, but there is also an algebraic justification.  The (possibly elliptical) cone has the equation z2 = ax2 + by2.  This is a quadratic form.  When it is cut by a plane, the result is a quadratic form in two dimensions.  This is a conic section: an ellipse, a hyperbola, or a parabola.  The same holds for any slice of a quadratic surface in 3 space.  Intersect a hyperboloid with a plane, and you're sure to get a conic section of some sort.
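As a small illustration of the algebraic argument, the sketch below cuts the circular cone z^2 = x^2 + y^2 with the plane z = mx + h and names the resulting curve.  It takes a shortcut rather than the rotation described above: it substitutes for z, giving (1 - m^2)x^2 + y^2 - 2mhx - h^2 = 0, and inspects this shadow in the xy plane, which has the same conic type because projecting the plane onto the xy plane is an invertible affine map.  The function name cone_slice_type and the parameters m and h are hypothetical, and h is assumed nonzero so the slice misses the apex and is not degenerate.

```python
def cone_slice_type(m, h=1.0):
    """Type of the curve where the plane z = m*x + h cuts the cone z^2 = x^2 + y^2."""
    if h == 0:
        raise ValueError("h must be nonzero; a plane through the apex gives a degenerate slice")
    # Coefficients of x^2, xy, and y^2 after substituting z = m*x + h.
    # The value of h affects only the linear and constant terms, not the type.
    A, B, C = 1.0 - m * m, 0.0, 1.0
    disc = B * B - 4.0 * A * C
    if disc < 0:
        return "ellipse"
    if disc == 0:
        return "parabola"
    return "hyperbola"

print(cone_slice_type(0.5))   # shallow plane
print(cone_slice_type(1.0))   # plane parallel to the side of the cone
print(cone_slice_type(2.0))   # steep plane
```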

Unchanging Sign

Let x be a row vector of variables x1 x2 x3 … xn, and let q(x) be a quadratic form in x.  Ignore any linear terms; they only move the origin to a new location.  And don't worry about the constant term.  We simply have terms of degree 2.  Thus q(x) = x*M*xT for a suitable symmetric matrix M.

Let u be a unit vector in n space with q(u) positive.  Now q(2u) is 4 times as large as q(u).  The same holds for q(-2u).  It follows that q(u) is positive along the entire line determined by u, except for q(0), which equals 0.

The sign of q(x) depends only on the direction of x.  It is enough to establish the positive and negative regions on the unit sphere; these extend to all of Rn.

Assume q(x) never changes sign.  If it is positive along one vector, it is never negative along another.  An example is the form x^2/4 + y^2/9.  Set this equal to a positive constant c and find an ellipse, i.e. the curve x^2/4 + y^2/9 = c.  As c increases the ellipse grows larger, but it doesn't change shape.  The shape of the conic section is captured by the quadratic form q(x).  The constant c only changes the size.

When c = 1 the ellipse surrounds the origin, and that means q(x) is positive along every vector.  Thus q is a quadratic form that does not change sign.

We know q is never negative, but is it ever zero?  We would like to show that q vanishes precisely on the kernel of M.  One direction is easy.  If x is in the kernel of M, then x*M*xT is zero.  Because M is symmetric, the kernel from the left is the kernel from the right.  Thus x*M = 0 iff M*xT = 0.  Is there some other x with q(x) = 0?

Let q(x) = 0 for some unit vector x.  Let y be any nonzero vector.  Compute q(x+sy) for an arbitrary scaling factor s.  Expand q(x+sy) using matrix notation.

q(x+sy) =
(x+sy) * M * (x+sy)^T =
x*M*x^T + s y*M*x^T + s x*M*y^T + s^2 y*M*y^T =
q(x) + s y*M*x^T + s x*M*y^T + s^2 y*M*y^T =
0 + s y*M*x^T + s x*M*y^T + s^2 y*M*y^T =
2s y*M*x^T + s^2 q(y) { M is hermitian }

Let s range from -∞ to +∞.  We have a quadratic expression in s that is nowhere negative, since q is nowhere negative.  Furthermore, when s = 0, q(x+sy) = 0.  The parabola in s attains its minimum at s = 0.  The first derivative at 0 must be 0, implying y*M*xT = 0.  The choice of y was arbitrary, so when y = M*xT, (M*xT).(M*xT) = 0.  The length of M*xT is 0, hence M*xT is the zero vector.  The quadratic form vanishes on the vector x iff x is in the kernel of M.
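The derivative step can be spelled out.  Writing g(s) for q(x+sy), a name introduced here only for this display:

```latex
\[
g(s) = q(x+sy) = 2s\,(y M x^{T}) + s^{2}\,q(y), \qquad
g'(s) = 2\,(y M x^{T}) + 2s\,q(y), \qquad
g'(0) = 2\,(y M x^{T}) = 0 .
\]
```

Since g is never negative and g(0) = 0, the point s = 0 is a minimum, which is what forces g'(0) to vanish.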

The above proof can be generalized just a bit.  Let q(x) = 0, and let q have the same sign around x.  This is a local property; it holds in a neighborhood of the unit sphere about x.  Within some open set, q is not both positive and negative.  Let y be any vector and evaluate q(x+sy).  For small s, q(x+sy) is always ≥ 0, or ≤ 0, and q(x+0) is still 0.  The parabola in s still has its minimum or maximum at 0, the first derivative is 0, and x is in the kernel of M.  If this local property holds everywhere, i.e. q has unchanging sign throughout the unit sphere, then q vanishes precisely on the kernel of M.

If q(x) has unchanging sign, what does the surface look like?  Rotate coordinates, so that q is in standard form.  In other words, M is a diagonal matrix.  Some of the diagonal entries may be zero.  The coordinates corresponding to these 0 entries span the kernel of M, and represent the subspace where q(x) is zero.  The remaining entries on the main diagonal are all positive, or all negative.  Set q(x) equal to a constant of the same sign and find an ellipsoid in n space.  The entries of M establish the shape of the ellipsoid.  If they are all equal, for instance, the ellipsoid becomes a sphere.

In contrast, let q = xy.  This is a quadratic form in 2 dimensions with changing sign.  It is positive in the first and third quadrants, negative in the second and fourth quadrants, and 0 on the axes.  Restrict attention to the unit circle, and q is positive or negative on the open 90 degree arcs, and 0 on the 4 points north south east and west.  At x = 1, q is positive above and negative below.  Perhaps q vanishes on something other than the kernel of M.  Indeed it does.  M is the symmetric matrix [0,1/2] [1/2,0].  This has a nonzero determinant, and a 0 kernel, yet q vanishes on two lines in the plane.

Extremal Values

Let q(x) be a quadratic form in n dimensions. The sphere is a closed bounded region in Rn, hence q, a continuous function, attains its minimum and maximum somewhere on the sphere.  Let e be the minimum value of q on the sphere.  Let u be a unit vector such that q(u) = e.

Note that e = e*1 = e*x.x (for x on the unit sphere) = x.ex.  Thus q(x) ≥ x*e*xT, for any unit vector x on the sphere.  Write the inequality this way.

q(x) - e ≥ 0

q(x) - x*e*xT ≥ 0

x*M*xT - x*e*xT ≥ 0

x*(M-e)*xT ≥ 0

Here M-e is notation for the matrix M with e subtracted from the main diagonal.  Since M-e is symmetric, it defines a new quadratic form, which I will call r(x).  Note that r(x) is never negative.  It is positive where q(x) exceeds e, and 0 where q(x) = e.  Apply the previous theorem to show that r vanishes on x iff x is in the kernel of M-e.  This means x is an eigen vector of M, with eigen value e.  In particular, u, with r(u) = 0, is an eigen vector of M with eigen value e.

If u points to the maximum value e, you can run the same proof.  The new quadratic form r(x) becomes nonpositive instead of nonnegative, but that's ok.  The previous theorem still holds, and once again r vanishes at u, so u lies in the kernel of M-e, and u is an eigen vector of M, with eigen value e.

To summarize, the minimum and maximum of q on the unit sphere are eigen values of the underlying matrix M, and the vectors that point to these extremal values are eigen vectors of M.
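This summary can be illustrated by brute force in 2 dimensions.  The sketch below samples the unit circle for the symmetric matrix with rows [2,1] and [1,2], an example chosen here because its eigen values are 1 and 3, and checks that the extremal directions are eigen vectors.  The names q, times_M, u_min, and u_max are illustrative, and sampling is only an approximation of the true minimum and maximum.

```python
import math

M = [[2.0, 1.0], [1.0, 2.0]]   # illustrative symmetric matrix

def q(x):
    """The quadratic form x*M*x^T for a row vector x."""
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))

def times_M(x):
    """The row vector x times M."""
    return [sum(x[i] * M[i][j] for i in range(2)) for j in range(2)]

N = 3600
samples = [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
           for k in range(N)]
u_min = min(samples, key=q)    # direction of the smallest sampled value
u_max = max(samples, key=q)    # direction of the largest sampled value
e_min, e_max = q(u_min), q(u_max)

print(e_min, e_max)            # close to the eigen values 1 and 3
print(times_M(u_min), [e_min * t for t in u_min])   # nearly equal vectors
```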

Apply this result recursively to find all the axes of the surface.  Start by finding an orthogonal set of eigen vectors for the underlying matrix M.  This can be done because M is unitarily diagonalizable.  Let e be the minimum of q on the unit sphere.  If none of the eigen vectors has eigen value e, then a new eigen vector u, with the distinct eigen value e, is independent of the others.  We can't have n+1 independent vectors, so this is a contradiction.  One of our original eigen vectors has eigen value e.  Call it u.

Restrict attention to the space perpendicular to u.  All the remaining eigen vectors lie in this perpendicular subspace.  Once again q(x) has some minimum on the unit sphere contained in this subspace, and it is attained at another eigen vector.  Continue through all n dimensions, finding eigen values and eigen vectors.  Thus all eigen values lie between the minimum and maximum of q(x) inclusive.

For a more intuitive proof, rotate coordinates so that M becomes diagonal.  This does not change the maximum or minimum of q on the sphere, nor does it change the eigen values.  Now the diagonal entries are the eigen values.  q(x) is everywhere positive on the unit sphere iff all eigen values are positive, and the same for q(x) negative.  Furthermore, q(x), on the unit sphere, is trapped between the smallest and largest entry on the main diagonal, i.e. the least and greatest eigen value.  Assume all eigen values are positive, thus an ellipsoid.  Travel along the ith axis, the ith coordinate, and the value of q on the unit sphere, in that direction, is the coefficient on x_i^2, or the eigen value M_{i,i}.

Try to picture all of this geometrically.  An ellipsoid in 3 space might take the form x^2 + 2y^2 + 3z^2.  The minimum value occurs on the x axis, and the maximum value occurs on the z axis.  These values are 1 and 3, and they are eigen values of the diagonal matrix with entries 1 2 and 3.  The y axis represents the remaining eigen vector.  This points to the minimum or maximum if we restrict attention to the yz or xy plane respectively.