Matrix Polynomials, Power Series and Exponential Functions

Power Series

Let f be an analytic function whose power series, f(z) = a₀ + a₁z + a₂z² + …, has radius of convergence r. Let M be a square matrix with spectral radius less than r. Expand f into its power series and evaluate it at M. Does it converge?

Let a nonsingular matrix P convert M to Jordan canonical form. Apply P to the power series that defines f(M), conjugating each term by P. The original series converges iff the new series converges. If the original sum was l, the new sum is PlP⁻¹.

Concentrate on a simple Jordan block with eigenvalue z. Remember that z is inside our circle of convergence. The main diagonal becomes f(z), which is convergent. If there are ones below the main diagonal, apply the binomial theorem to the n^th power of the block. The subdiagonal of the n^th term is a_n·n·z^(n-1). Add these up and the subdiagonal becomes f′(z). Since f is analytic, the derivative exists, and its series has the same radius of convergence, hence the subdiagonal converges. The next diagonal converges to f′′(z)/2!, and so on. Put this all together and f(M) converges.
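For concreteness, here is the 3×3 case written out as a worked equation. It is only an illustration of the argument above; the block size 3 and the name J for the Jordan block are assumptions made for the example, while a_n denotes the coefficients of the power series of f.

```latex
% J is a 3x3 Jordan block with eigenvalue z and ones below the main diagonal.
\[
J = \begin{pmatrix} z & 0 & 0 \\ 1 & z & 0 \\ 0 & 1 & z \end{pmatrix},
\qquad
J^n = \begin{pmatrix}
  z^n & 0 & 0 \\
  n z^{n-1} & z^n & 0 \\
  \binom{n}{2} z^{n-2} & n z^{n-1} & z^n
\end{pmatrix}
\]
% Summing a_n J^n entry by entry gives f and its scaled derivatives.
\[
f(J) = \sum_{n \ge 0} a_n J^n = \begin{pmatrix}
  f(z) & 0 & 0 \\
  f'(z) & f(z) & 0 \\
  f''(z)/2! & f'(z) & f(z)
\end{pmatrix}
\]
```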

Exponential

The exponential of a matrix M, written exp(M), or E^M, is computed using the power series for E^z. The circle of convergence is the entire plane, hence E^M converges for every matrix M.
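A minimal sketch of the definition, assuming matrices are stored as plain Python lists of rows. The helper names matmul and expm are hypothetical, and the cutoff of 40 terms is an arbitrary choice that is adequate only for matrices with small entries. The second example uses the standard fact that the exponential of the rotation generator is a rotation matrix.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(M, terms=40):
    """Partial sum I + M + M^2/2! + ... of the exponential series (hypothetical helper)."""
    n = len(M)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]  # current term M^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

# A nilpotent matrix: every term after the second is zero.
shear = expm([[0.0, 1.0], [0.0, 0.0]])

# The generator of plane rotations: its exponential is rotation by theta.
theta = 0.75
rotation = expm([[0.0, -theta], [theta, 0.0]])
expected = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
```

A production routine would not sum the raw series, since it loses accuracy for matrices with large entries; the sketch is meant only to mirror the definition.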

When A and B Commute

It doesn't happen often, but assume the matrices A and B commute, and expand E^(A+B). Then expand each term using the binomial theorem, which is valid for matrices precisely because A and B commute. On the other side, expand E^A and E^B, and multiply these series together. This gives a product series, which converges, regardless of the order of the terms, because E^A and E^B converge absolutely.

How do we make the terms correspond? Arrange the products in a grid, with the terms of E^A along one side and the terms of E^B along the other, and consider the n^th diagonal of this grid. This includes Aⁿ/n! times 1, and 1 times Bⁿ/n!, and everything in between. Multiply this by n!, for our convenience. The result is the sum of (n:i)AⁱB^(n-i), which is simply (A+B)ⁿ. Divide by n! and get (A+B)ⁿ/n!. This is the n^th term of E^(A+B). The terms correspond, and E^(A+B) = E^A E^B.
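The following sketch checks the identity numerically on one commuting pair, and shows it failing on one non-commuting pair. The helpers matmul, expm, madd and max_diff are hypothetical names, the series is truncated at 40 terms, and the example matrices are chosen only for illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(M, terms=40):
    """Partial sum of the exponential series (hypothetical helper)."""
    n = len(M)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

def madd(A, B):
    """Entrywise sum of two matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def max_diff(A, B):
    """Largest absolute entrywise difference."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Commuting pair: each is a multiple of the identity plus a multiple of the same nilpotent matrix.
A = [[1.0, 2.0], [0.0, 1.0]]
B = [[3.0, -1.0], [0.0, 3.0]]
commuting_gap = max_diff(expm(madd(A, B)), matmul(expm(A), expm(B)))

# Non-commuting pair: here C*D differs from D*C.
C = [[0.0, 1.0], [0.0, 0.0]]
D = [[0.0, 0.0], [1.0, 0.0]]
noncommuting_gap = max_diff(expm(madd(C, D)), matmul(expm(C), expm(D)))
```

The first gap is at the level of rounding error, while the second is of order one, which suggests the commuting hypothesis is doing real work.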

Using Jordan Form to Compute the Exponential

Given a matrix M, convert to Jordan form, evaluate E^M block by block, then apply the inverse transformation to recover the true sum.

Let's consider a simple Jordan block. If the eigenvalue is l, the diagonal becomes E^l, and the k^th subdiagonal converges to the k^th derivative of the exponential at l, over k!, which is E^l/k!.
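As a quick numerical check of the block formula, the sketch below sums the series for a 4×4 Jordan block with ones below the diagonal and compares each entry with E^l/k!. The helper names, the block size 4, and the eigenvalue 0.5 are all assumptions made for the example.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(M, terms=40):
    """Partial sum of the exponential series (hypothetical helper)."""
    n = len(M)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

size, lam = 4, 0.5

# Jordan block: eigenvalue on the diagonal, ones on the first subdiagonal.
J = [[lam if i == j else (1.0 if i == j + 1 else 0.0) for j in range(size)]
     for i in range(size)]

computed = expm(J)

# Predicted entries: E^lam / k! on the k-th subdiagonal (k = i - j), zero above the diagonal.
predicted = [[math.exp(lam) / math.factorial(i - j) if i >= j else 0.0
              for j in range(size)] for i in range(size)]

worst = max(abs(computed[i][j] - predicted[i][j])
            for i in range(size) for j in range(size))
```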

Adjusting the Exponential by t

Hold M fixed, and let s and t be scalars, i.e. complex numbers, or if you prefer, scaled versions of the identity matrix. Consider E^(sM).

Use P to convert M to Jordan form, then multiply by s. Concentrate on a Jordan block with eigenvalue z. Multiply by s and evaluate, and the diagonal becomes E^(sz).

Move to the subdiagonal. Remember that the diagonal of the Jordan block has become sz, and the subdiagonal has become s, instead of 1. Now the subdiagonal of the n^th term becomes s·n·(sz)^(n-1)/n!. Add these terms together, factor out s, and get s times the derivative of the exponential, evaluated at sz. This is sE^(sz). The k^th subdiagonal becomes s^k E^(sz)/k!.

Use the same procedure to find E^(tM), and consider E^(sM) times E^(tM). Base change commutes with product, so apply the transformation that turns M into Jordan form. Look at the product of two corresponding blocks. The diagonals are, respectively, E^(sz) and E^(tz). The diagonal of the product is E^(sz)E^(tz), or E^((s+t)z).

Bravely move on to the subdiagonal. The subdiagonal of the product is the main diagonal of the first block times the subdiagonal of the second, plus the subdiagonal of the first times the main diagonal of the second. This is E^(sz)·tE^(tz) + sE^(sz)·E^(tz), which simplifies to (s+t)E^((s+t)z). This agrees with the subdiagonal that results from E^((s+t)M).

You can probably see where this is going, but let's check the next diagonal down. The common factor E^((s+t)z) is multiplied by s²/2+st+t²/2, or (s+t)²/2. This agrees with the evaluation of E^((s+t)M).

You'll need some combinatorics to verify the k^th subdiagonal. Pull out the common factor E^((s+t)z), and look at the remaining sum. When multiplied by k!, you have the sum of (k:i)s^(k-i)tⁱ. This is (s+t)^k. Divide by k! and the expression becomes (s+t)^k E^((s+t)z)/k!, which agrees with E^((s+t)M). In conclusion, E^(sM)E^(tM) = E^((s+t)M).
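The combinatorial step can be written out as one line. This is a sketch of the computation just described, using the fact that the k^th subdiagonal of a product of two such blocks collects the (k-i)^th subdiagonal of the first times the i^th subdiagonal of the second, for i from 0 to k.

```latex
\[
\sum_{i=0}^{k} \frac{s^{k-i} E^{sz}}{(k-i)!} \cdot \frac{t^{i} E^{tz}}{i!}
= \frac{E^{(s+t)z}}{k!} \sum_{i=0}^{k} \binom{k}{i} s^{k-i} t^{i}
= \frac{(s+t)^{k} E^{(s+t)z}}{k!}
\]
```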

Of course we didn't have to work this hard. Since sM and tM commute, we can invoke the earlier theorem, whence E^(sM+tM) = E^(sM)E^(tM).

As a corollary, set s = 1 and t = -1. Then E^M E^(-M) = E^0, which is the identity matrix, so E^M is invertible, with inverse E^(-M).
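Both the addition rule and the corollary are easy to test numerically. The sketch below does so for one arbitrarily chosen matrix and pair of scalars; the helper names and the sample values are assumptions made for the example.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(M, terms=40):
    """Partial sum of the exponential series (hypothetical helper)."""
    n = len(M)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

def scaled(c, A):
    """Scalar multiple c*A."""
    return [[c * a for a in row] for row in A]

def max_diff(A, B):
    """Largest absolute entrywise difference."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

M = [[0.5, 1.0], [-1.0, 0.25]]
s, t = 0.3, -0.7

# E^(sM) E^(tM) compared with E^((s+t)M).
product_gap = max_diff(matmul(expm(scaled(s, M)), expm(scaled(t, M))),
                       expm(scaled(s + t, M)))

# E^M E^(-M) compared with the identity matrix.
identity = [[1.0, 0.0], [0.0, 1.0]]
inverse_gap = max_diff(matmul(expm(M), expm(scaled(-1.0, M))), identity)
```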

Making t a Variable

Instead of being a fixed scalar, let t be a real or complex variable. Consider E^(Mt) as a matrix function of t, and differentiate. Expand E^(Mt) using its power series, which converges for every t, so that each entry of the matrix is an analytic function of t. The entry functions can be differentiated term by term, hence each matrix function in the power series can be differentiated term by term. The n^th term becomes Mⁿt^(n-1)/(n-1)!. The first term was constant, and dropped out, so we're really starting with n = 1. Reindex, and start with n = 0. Now the terms are Mⁿ⁺¹tⁿ/n!. This is M times E^(Mt).

Since M commutes with Mⁿ, M commutes with E^(Mt). Therefore the derivative of E^(Mt) can be expressed as ME^(Mt), or E^(Mt)M.
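A finite-difference check makes the derivative formula tangible. The sketch below estimates the derivative of E^(Mt) with a central difference and compares it with both ME^(Mt) and E^(Mt)M. The helper names, the sample matrix, the point t = 0.4 and the step h are all assumptions chosen for illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(M, terms=40):
    """Partial sum of the exponential series (hypothetical helper)."""
    n = len(M)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

def max_diff(A, B):
    """Largest absolute entrywise difference."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

M = [[0.0, 1.0], [-2.0, -3.0]]
t, h = 0.4, 1e-5

def exp_at(x):
    """E^(Mx) for a scalar x."""
    return expm([[x * m for m in row] for row in M])

# Central difference approximation to the derivative at t.
plus, minus = exp_at(t + h), exp_at(t - h)
numeric = [[(p - q) / (2 * h) for p, q in zip(rp, rq)] for rp, rq in zip(plus, minus)]

left_gap = max_diff(numeric, matmul(M, exp_at(t)))    # against M E^(Mt)
right_gap = max_diff(numeric, matmul(exp_at(t), M))   # against E^(Mt) M
```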

Differential Equation

Consider the differential equation y′(t) = y(t)M, where M is a fixed matrix, and y(t) is a matrix function of t. This looks like one differential equation in one variable, when written in matrix notation, but there are really many functions of t, one for each entry in the matrix, and many equations that relate these functions to their derivatives. So this is really a system of linear differential equations. Whatever you call it, let's look for solutions.

The function y(t) = E^(Mt) is a solution, as shown above. Furthermore, if c is a constant matrix, cE^(Mt) is a solution. If matrices are n×n, then c provides n² linearly independent solutions. This makes sense, since there are n² individual functions of t inside the matrix function y(t). To complete the characterization, we need to show there are no other solutions.

Suppose there is some other solution z(t). Since E^(Mt) is everywhere invertible, with inverse E^(-Mt), divide z by E^(Mt) on the right and call the quotient q, so that q = zE^(-Mt). In other words, our solution is q(t)E^(Mt). Use the product rule to differentiate, and set the result equal to zM, which is qE^(Mt)M.

q′E^(Mt) + qE^(Mt)M = qE^(Mt)M

q′E^(Mt) = 0

q′ = 0 (multiplying both sides on the right by E^(-Mt))

q = c (for a constant matrix c)

All solutions have been identified, and the solutions form a vector space of dimension n², as dictated by the constant matrix c.

The analogous differential equation y′ = My has solutions E^(Mt)c, where the constant matrix c is multiplied on the right.
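One way to see the solution formula at work is to integrate y′ = yM numerically and compare with cE^(Mt). The sketch below uses the classical fourth-order Runge-Kutta method, which is a swapped-in numerical technique and not part of the argument above. The helper names, the sample matrices M and c, and the step count are assumptions made for the example.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(M, terms=40):
    """Partial sum of the exponential series (hypothetical helper)."""
    n = len(M)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

def axpy(A, B, scale):
    """Entrywise A + scale * B."""
    return [[a + scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

M = [[0.0, 1.0], [-1.0, 0.0]]
c = [[1.0, 2.0], [3.0, 4.0]]   # initial value y(0) = c

def rhs(y):
    """Right-hand side of y' = y M."""
    return matmul(y, M)

steps = 200
dt = 1.0 / steps
y = [row[:] for row in c]
for _ in range(steps):          # integrate from t = 0 to t = 1
    k1 = rhs(y)
    k2 = rhs(axpy(y, k1, dt / 2))
    k3 = rhs(axpy(y, k2, dt / 2))
    k4 = rhs(axpy(y, k3, dt))
    slope_sum = axpy(axpy(k1, k2, 2.0), axpy(k4, k3, 2.0), 1.0)  # k1 + 2k2 + 2k3 + k4
    y = axpy(y, slope_sum, dt / 6)

exact = matmul(c, expm(M))      # c E^(Mt) at t = 1
ode_gap = max(abs(a - b) for ra, rb in zip(y, exact) for a, b in zip(ra, rb))
```

Replacing rhs with matmul(M, y) and exact with matmul(expm(M), c) gives the corresponding check for y′ = My.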

Generalizing Mt

If M is a matrix of constants, the derivative of Mt, with respect to t, is M. Thus the derivative of E^(Mt) is E^(Mt) times the derivative of the exponent. This begins to look like the familiar formula from calculus. Let's try to generalize this to arbitrary functions in the exponent.

Let U(t) be a matrix of functions in t, where each function is differentiable, or analytic if t is complex. Consider the derivative of E^(U(t)), at a particular time t, which I will call t = 0 for convenience. Let U′(0) = M, a matrix of constants.

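Start with the difference quotient: (E^(U(h))-E^(U(0)))/h. Focus on the first exponent U(h). Replace this with h×(U(h)-U(0))/h + U(0). The first term becomes h times something arbitrarily close to M, say h times Q, where Q is M±ε. To split the exponential of this sum into a product, invoke the earlier theorem, which requires hQ and U(0) to commute. Assume, then, that the values of U commute with one another, as they do when U(t) = Mt, or when U(t) is a scalar function of t times a fixed matrix. Now the difference quotient looks like this.

{[E^(hQ) × E^(U(0))] - E^(U(0))} / h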

Replace the first exponential with its power series, and pull 1 out. This yields 1 times E^U(0), which cancels -E^U(0) in the numerator. That leaves us free to divide through by h.

{Q + hQ²/2 + h²Q³/6 + …} × E^(U(0))

Now it's a matter of continuity. As h moves to 0, all the terms, other than Q, move to 0. To be rigorous, bound the absolute values of all the entries of Q by some positive number b, which is possible because Q stays close to M. Then each entry of Q² is at most nb² in absolute value, each entry of Q³ is at most n²b³, and so on. Absolute convergence is no problem, since the series of bounds is dominated by E^(nb). Every term after Q carries a factor of h, which is bigger than any higher power of h for small h, so the tail is at most h times a bounded quantity. Then let h go to 0 and the tail goes to 0. That leaves Q, which goes to M as h approaches 0. Therefore the derivative of E^U is ME^U, or U′E^U.

It is easy to run the same proof with the terms in a different order, so that the derivative becomes E^U U′.
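The proof splits E^(hQ + U(0)) into E^(hQ) times E^(U(0)), a step that the commuting theorem from earlier justifies when the two exponents commute. The sketch below, offered as an illustration and not as part of the proof, checks the formula U′E^U by finite differences in a case where all values of U commute, and then measures what happens in a case where they do not. The helper names and both sample functions U are assumptions made for the example.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(M, terms=40):
    """Partial sum of the exponential series (hypothetical helper)."""
    n = len(M)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

def max_diff(A, B):
    """Largest absolute entrywise difference."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def exp_derivative(U, t, h=1e-5):
    """Central-difference estimate of the derivative of E^(U(t))."""
    plus, minus = expm(U(t + h)), expm(U(t - h))
    return [[(p - q) / (2 * h) for p, q in zip(rp, rq)] for rp, rq in zip(plus, minus)]

# Commuting case: U(t) = sin(t) * M, so U'(t) = cos(t) * M and all values of U commute.
M = [[1.0, 2.0], [0.0, -1.0]]
def U_comm(t):
    return [[math.sin(t) * m for m in row] for row in M]

t0 = 0.3
U_prime = [[math.cos(t0) * m for m in row] for row in M]
commuting_gap = max_diff(exp_derivative(U_comm, t0), matmul(U_prime, expm(U_comm(t0))))

# Non-commuting case: U(t) = A + t*B, so U'(0) = B, but A*B differs from B*A.
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
def U_non(t):
    return [[a + t * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

noncommuting_gap = max_diff(exp_derivative(U_non, 0.0), matmul(B, expm(A)))
```

In the commuting case the gap is at the level of discretization error, while in the non-commuting case it is of order one, so the formula should be applied with the commuting condition in mind.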

Now return to our differential equation. This time consider y′ = yV, where V is a function of t. A proof, similar to the one shown above, characterizes the solutions as cE^U, where c is a matrix of constants, and U′ = V, wherever the derivative formula for E^U applies. Similarly, the solutions to y′ = Vy are E^U c.

This E^U result is bizarre! And beautiful! And useful! Useful, because it is essential for the proof of the fundamental theorem of ordinary differential equations. If the n^th derivative of y is a linear combination of lesser derivatives of y, where functions of x act as coefficients, then there is indeed a unique solution for given initial conditions. The connection to E^U is not obvious at all until you have seen the proof; then it all kind of makes sense.