Let a nonsingular matrix P convert M to Jordan canonical form. Apply P, as a change of basis, to each term of the power series that defines f(M). The original series converges iff the new series converges. If the original sum was l, the new sum is $P l P^{-1}$.
Concentrate on a simple Jordan block with eigenvalue z. Remember that z is inside our circle of convergence. The main diagonal becomes f(z), which is convergent. If there are ones below the main diagonal, apply the binomial theorem. The subdiagonal of the nth term is $a_n n z^{n-1}$. Add these up and the subdiagonal becomes f′(z). Since f is analytic, the derivative exists, hence the subdiagonal converges. The next diagonal converges to f′′(z)/2!, and so on. Put this all together and f(M) converges.
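As a numerical sanity check of this diagonal pattern, here is a minimal sketch in plain Python. It assumes the geometric series f(x) = 1 + x + x^2 + … = 1/(1-x), whose radius of convergence is 1, and a 3×3 Jordan block with eigenvalue z = 0.5. The function name `series_on_block` and these particular choices are illustrative, not taken from the text.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def series_on_block(z, size=3, terms=200):
    """Partial sum of I + J + J^2 + ... for a Jordan block J.

    J has z on the main diagonal and ones just below it.
    """
    J = [[z if i == j else (1.0 if i == j + 1 else 0.0) for j in range(size)]
         for i in range(size)]
    total = [[0.0] * size for _ in range(size)]
    power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(terms):
        total = [[total[i][j] + power[i][j] for j in range(size)]
                 for i in range(size)]
        power = mat_mul(power, J)
    return total


z = 0.5
F = series_on_block(z)
f0 = 1 / (1 - z)        # f(z)
f1 = 1 / (1 - z) ** 2   # f'(z)
f2 = 1 / (1 - z) ** 3   # f''(z) / 2!
print(F[0][0], f0)  # main diagonal versus f(z)
print(F[1][0], f1)  # subdiagonal versus f'(z)
print(F[2][0], f2)  # second subdiagonal versus f''(z)/2!
```

Each printed pair should agree to within rounding error, and the entries above the main diagonal stay at zero.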
How do we make the terms correspond? Consider the nth diagonal on the right. This includes $A^n/n!$ times 1, and 1 times $B^n/n!$, and everything in between. Multiply this by n!, for our convenience. The result is the sum of $\binom{n}{i} A^i B^{n-i}$, which, because A and B commute, is simply $(A+B)^n$. Divide by n! and get $(A+B)^n/n!$. This is the nth term of $E^{A+B}$. The terms correspond, and $E^{A+B} = E^A E^B$.
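The role of commutativity can be seen numerically. The sketch below, in plain Python, sums the exponential series directly (the helper names `mat_exp` and `max_diff`, the 40-term cutoff, and the sample matrices are all assumptions made for illustration). It compares the exponential of a sum with the product of the exponentials, first for a commuting pair, then for a pair that does not commute.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_exp(m, terms=40):
    """Partial sum of the series I + m + m^2/2! + ... (fine for small matrices)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = mat_add(result, term)
    return result


def max_diff(a, b):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Commuting pair: B is a polynomial in A, namely A^2 + 2I.
A = [[0.1, 0.2], [0.3, 0.4]]
A2 = mat_mul(A, A)
B = [[A2[i][j] + (2.0 if i == j else 0.0) for j in range(2)] for i in range(2)]
commuting_gap = max_diff(mat_exp(mat_add(A, B)),
                         mat_mul(mat_exp(A), mat_exp(B)))

# Pair that does not commute: two different nilpotent matrices.
C = [[0.0, 1.0], [0.0, 0.0]]
D = [[0.0, 0.0], [1.0, 0.0]]
noncommuting_gap = max_diff(mat_exp(mat_add(C, D)),
                            mat_mul(mat_exp(C), mat_exp(D)))

print(commuting_gap)     # rounding error only
print(noncommuting_gap)  # clearly nonzero
```

The second gap is not a numerical artifact: without commutativity the binomial expansion of the nth diagonal no longer collapses to a power of the sum.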
Let's consider a simple Jordan block. If the eigenvalue is l, the diagonal becomes $E^l$, and the kth subdiagonal converges to the kth derivative over k!, which is $E^l/k!$.
Use P to convert M to Jordan form, then multiply by s. Concentrate on a Jordan block with eigenvalue z. Multiply by s and evaluate, and the diagonal becomes $E^{sz}$.
Move to the subdiagonal. Remember that the diagonal of the Jordan block has become sz, and the subdiagonal has become s, instead of 1. Now the subdiagonal of the nth term becomes $s\,n\,(sz)^{n-1}/n!$. Add these terms together, factor out s, and get s times the derivative of the exponential, evaluated at sz. This is $sE^{sz}$. The kth subdiagonal becomes $s^k E^{sz}/k!$.
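For a concrete picture, here is the 3×3 case written out. It assumes a block with z on the main diagonal and ones on the subdiagonal; the name J is introduced only for this illustration.

```latex
\[
J = \begin{pmatrix} z & 0 & 0 \\ 1 & z & 0 \\ 0 & 1 & z \end{pmatrix},
\qquad
E^{sJ} = E^{sz}
\begin{pmatrix} 1 & 0 & 0 \\ s & 1 & 0 \\ s^2/2! & s & 1 \end{pmatrix}.
\]
```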
Use the same procedure to find $E^{tM}$, and consider $E^{sM}$ times $E^{tM}$. Base change commutes with product, so apply the transformation that turns M into Jordan form. Look at the product of two corresponding blocks. The diagonals are, respectively, $E^{sz}$ and $E^{tz}$. The diagonal of the product is $E^{sz}E^{tz}$, or $E^{(s+t)z}$.
Bravely move on to the subdiagonal. The subdiagonal of the product is the main diagonal of the first block times the subdiagonal of the second, plus the subdiagonal of the first times the main diagonal of the second, that is, $E^{sz} \cdot tE^{tz} + sE^{sz} \cdot E^{tz}$. This simplifies to $(s+t)E^{(s+t)z}$. This happens to agree with the subdiagonal that results from $E^{(s+t)M}$.
You can probably see where this is going, but let's check the next diagonal down. The common factor $E^{(s+t)z}$ is multiplied by $s^2/2 + st + t^2/2$, or $(s+t)^2/2$. This agrees with the evaluation of $E^{(s+t)M}$.
You'll need some combinatorics to verify the kth subdiagonal. Pull out the common factor $E^{(s+t)z}$, and look at the remaining sum. When multiplied by k!, you have the sum of $\binom{k}{i} s^{k-i} t^i$. This is $(s+t)^k$. Divide by k! and the expression becomes $(s+t)^k E^{(s+t)z}/k!$, which agrees with $E^{(s+t)M}$. In conclusion, $E^{sM}E^{tM} = E^{(s+t)M}$.
Of course we didn't have to work this hard. Since sM and tM commute, we can invoke the earlier theorem, whence $E^{sM+tM} = E^{sM}E^{tM}$.
As a corollary, set s = 1 and t = -1, and $E^M$ is invertible, with inverse $E^{-M}$.
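The block-by-block argument can be checked numerically. The following sketch, in plain Python, builds a 4×4 Jordan block with ones below the diagonal and sums the exponential series directly. The helper names and the sample values of z, s, and t are assumptions chosen for illustration.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scale(m, s):
    """Multiply every entry of m by the scalar s."""
    return [[s * x for x in row] for row in m]


def mat_exp(m, terms=40):
    """Partial sum of the series I + m + m^2/2! + ... (fine for small matrices)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def jordan_block(z, size):
    """Jordan block with z on the diagonal and ones on the subdiagonal."""
    return [[z if i == j else (1.0 if i == j + 1 else 0.0) for j in range(size)]
            for i in range(size)]


z, s, t, size = 0.3, 0.7, 1.1, 4
J = jordan_block(z, size)
Es, Et = mat_exp(scale(J, s)), mat_exp(scale(J, t))
Est = mat_exp(scale(J, s + t))
product = mat_mul(Es, Et)

# Entry (k, 0) lies on the kth subdiagonal; compare with s^k e^{sz} / k!.
sub_err = max(abs(Es[k][0] - s ** k * math.exp(s * z) / math.factorial(k))
              for k in range(size))
# Product of the two exponentials versus the exponential of (s+t)J.
prod_err = max(abs(product[i][j] - Est[i][j])
               for i in range(size) for j in range(size))
# s = 1, t = -1 should give the identity matrix.
inv = mat_mul(mat_exp(J), mat_exp(scale(J, -1.0)))
inv_err = max(abs(inv[i][j] - (1.0 if i == j else 0.0))
              for i in range(size) for j in range(size))
print(sub_err, prod_err, inv_err)
```

All three errors should be at the level of floating-point rounding.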
Since M commutes with $M^n$, M commutes with $E^{Mt}$. Therefore the derivative of $E^{Mt}$ can be expressed as $ME^{Mt}$, or $E^{Mt}M$.
The function y(t) = $E^{Mt}$ is a solution of y′ = yM, as shown above. Furthermore, if c is a constant matrix, $cE^{Mt}$ is a solution. If matrices are n×n, then c provides $n^2$ linearly independent solutions. This makes sense, since there are $n^2$ individual functions of t inside the matrix function y(t). To complete the characterization, we need to show there are no other solutions.
Suppose there is some other solution z(t). Since $E^{Mt}$ is invertible for every t, multiply z on the right by $E^{-Mt}$ and call the result q. In other words, our solution is $q(t)E^{Mt}$. Use the product rule to differentiate.
$q'E^{Mt} + qE^{Mt}M = qE^{Mt}M$
$q'E^{Mt} = 0$
$q' = 0$ (multiplying both sides on the right by $E^{-Mt}$)
q = c (for a constant matrix c)
All solutions have been identified, and the solutions form a vector space of dimension $n^2$, as dictated by the constant matrix c.
The analogous differential equation y′ = My has solutions $E^{Mt}c$, where the constant matrix c is multiplied on the right.
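Both families of solutions can be tested with a finite-difference derivative. This is a minimal sketch in plain Python; the matrices M and c, the step size h, and the helper names are assumptions chosen for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scale(m, s):
    """Multiply every entry of m by the scalar s."""
    return [[s * x for x in row] for row in m]


def mat_exp(m, terms=40):
    """Partial sum of the series I + m + m^2/2! + ... (fine for small matrices)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def max_diff(a, b):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def derivative(y, t, h=1e-5):
    """Central-difference approximation to y'(t) for a matrix-valued y."""
    yp, ym = y(t + h), y(t - h)
    return [[(p - m) / (2 * h) for p, m in zip(rp, rm)] for rp, rm in zip(yp, ym)]


M = [[0.0, 1.0], [-2.0, -3.0]]
c = [[1.0, 2.0], [3.0, 4.0]]


def y_right(t):
    """c E^{Mt}, the candidate solution of y' = yM."""
    return mat_mul(c, mat_exp(scale(M, t)))


def y_left(t):
    """E^{Mt} c, the candidate solution of y' = My."""
    return mat_mul(mat_exp(scale(M, t)), c)


t0 = 0.5
res_right = max_diff(derivative(y_right, t0), mat_mul(y_right(t0), M))
res_left = max_diff(derivative(y_left, t0), mat_mul(M, y_left(t0)))
print(res_right, res_left)  # both residuals should be tiny
```

The residuals are limited by the finite-difference step, not by the formula, so they are small but not exactly zero.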
Let U(t) be a matrix of functions in t, where each function is differentiable, or analytic if t is complex. Consider the derivative of $E^{U(t)}$, at a particular time t, which I will call t = 0 for convenience. Let U′(0) = M, a matrix of constants.
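Start with the difference quotient: $(E^{U(h)} - E^{U(0)})/h$. Focus on the first exponent U(h). Replace this with h×(U(h)-U(0))/h + U(0). The first term becomes h times something arbitrarily close to M, say h times Q, where Q is M±ε. Split the exponential of the sum into a product; this uses the earlier theorem, so it assumes that hQ commutes with U(0), in other words that U(h) commutes with U(0). Now the difference quotient looks like this.
$\{[E^{hQ} \times E^{U(0)}] - E^{U(0)}\} / h$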
Replace the first exponential with its power series, and pull 1 out. This yields 1 times $E^{U(0)}$, which cancels $-E^{U(0)}$ in the numerator. That leaves us free to divide through by h.
$\{Q + hQ^2/2 + h^2Q^3/6 + \dots\} \times E^{U(0)}$
Now it's a matter of continuity. As h moves to 0, all the terms, other than Q, move to 0. To be rigorous, bound the absolute values of all the entries of Q by some positive b. Then each entry of $Q^2$ is at most $nb^2$ in absolute value, each entry of $Q^3$ is at most $n^2b^3$, and so on. Absolute convergence is no problem, since the series is bounded by $E^{nb}$. Every term after Q carries a factor of h, and for small h, h is bigger than any higher power of h, so the tail is at most h times $E^{nb}$. Then let h go to 0 and the tail goes to 0. That leaves Q, which goes to M as h approaches 0. Therefore the derivative of $E^U$ is $ME^U$, or $U'E^U$.
It is easy to run the same proof with the terms in a different order, so that the derivative becomes $E^U U'$.
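The proof splits the exponential of a sum into a product of exponentials, which the earlier theorem grants only for commuting matrices. The sketch below, in plain Python, compares a finite-difference derivative of the exponential of U(t) with the formula U′ times the exponential of U, once for a U(t) that commutes with U′(t) and once for one that does not. Both sample functions U, the step size, and the helper names are assumptions chosen for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=40):
    """Partial sum of the series I + m + m^2/2! + ... (fine for small matrices)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def max_diff(a, b):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def derivative(f, t, h=1e-5):
    """Central-difference approximation to f'(t) for a matrix-valued f."""
    fp, fm = f(t + h), f(t - h)
    return [[(p - m) / (2 * h) for p, m in zip(rp, rm)] for rp, rm in zip(fp, fm)]


A = [[0.1, 0.2], [0.3, 0.4]]
A2 = mat_mul(A, A)


def U_comm(t):
    """U(t) = tA + t^2 A^2, which commutes with its own derivative."""
    return [[t * A[i][j] + t * t * A2[i][j] for j in range(2)] for i in range(2)]


def dU_comm(t):
    """U'(t) = A + 2t A^2."""
    return [[A[i][j] + 2 * t * A2[i][j] for j in range(2)] for i in range(2)]


def U_non(t):
    """A U(t) that does not commute with its derivative when t is nonzero."""
    return [[0.0, t], [t * t, 0.0]]


def dU_non(t):
    """Entrywise derivative of U_non."""
    return [[0.0, 1.0], [2 * t, 0.0]]


def gap(U, dU, t):
    """Distance between the numeric derivative of E^U and the formula U' E^U."""
    numeric = derivative(lambda x: mat_exp(U(x)), t)
    formula = mat_mul(dU(t), mat_exp(U(t)))
    return max_diff(numeric, formula)


comm_gap = gap(U_comm, dU_comm, 1.0)
non_gap = gap(U_non, dU_non, 1.0)
print(comm_gap)  # finite-difference error only
print(non_gap)   # clearly nonzero
```

In the commuting case the two sides agree up to finite-difference error; in the second case they differ by an amount far larger than any numerical error.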
Now return to our differential equation. This time consider y′ = yV, where V is a function of t. A proof, similar to the one shown above, characterizes the solutions as $cE^U$, where c is a matrix of constants, and U′ = V. Similarly, the solutions to y′ = Vy are $E^Uc$. This relies on the derivative formula for $E^U$, so U must commute with U′ = V.
This $E^U$ result is bizarre! And beautiful! And useful! Useful, because it is essential for the proof of the fundamental theorem of ordinary differential equations. If the nth derivative of y is a linear combination of lesser derivatives of y, where functions of x act as coefficients, then there is indeed a unique solution. The connection to $E^U$ is not obvious at all until you have seen the proof; then it all kind of makes sense.