Notice that each term, each product of 3 matrix elements, draws from each row once and each column once. We don't go straight down or straight across; we multiply along diagonal lines. For the 3×3 case, there are 6 ways to do this, hence 6 terms in the formula. Choose any entry from the first row and mark that column as "used". Then choose an entry from the second row, avoiding the used column. Mark that column as used as well, and in the third row only one entry remains available. That's 3×2×1, or 6, combinations.
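The choose-and-mark procedure can be sketched in Python as a small recursive generator. This is only an illustration: the function name column_choices is hypothetical, and columns are numbered 1 through n to match the text. Each tuple lists the column used by row 1, row 2, and so on.

```python
def column_choices(n):
    """Yield every way to pick one column per row without reusing a column.

    The tuple lists the column chosen by row 1, row 2, ..., row n
    (columns numbered 1..n as in the text).
    """
    def extend(used):
        if len(used) == n:            # every row has picked its column
            yield tuple(used)
            return
        for col in range(1, n + 1):
            if col not in used:       # skip columns already marked "used"
                yield from extend(used + [col])
    yield from extend([])

choices = list(column_choices(3))
print(len(choices))   # 6
print(choices)        # [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
```

The first row has 3 options, the second 2, the third 1, so the generator produces exactly 3×2×1 tuples.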
If you write down the column numbers in the order the rows "use" them, you get a permutation of the numbers 1 through 3 (1 through n in general). Each product is associated with one of these permutations, and the product is negated exactly when the permutation is odd. At least that's how it looks in the 3×3 case.
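One way to check the sign pattern is to compute parity by counting inversions (pairs that appear out of order) and compare the signed sum against the expanded 3×3 formula aei + bfg + cdh − ceg − bdi − afh. The helper names parity and signed_terms_3x3, and the sample matrix, are illustrative choices, not part of the text.

```python
from itertools import permutations

def parity(perm):
    """Return 0 for an even permutation, 1 for an odd one (inversion count mod 2)."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for k in range(i + 1, n) if perm[i] > perm[k])
    return inversions % 2

def signed_terms_3x3(m):
    """Pair each column permutation with its product, negated when the permutation is odd."""
    terms = []
    for cols in permutations((1, 2, 3)):          # 1-based column numbers
        product = m[0][cols[0] - 1] * m[1][cols[1] - 1] * m[2][cols[2] - 1]
        terms.append((cols, -product if parity(cols) else product))
    return terms

m = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
(a, b, c), (d, e, f), (g, h, i) = m
expanded = a*e*i + b*f*g + c*d*h - c*e*g - b*d*i - a*f*h
print(sum(t for _, t in signed_terms_3x3(m)), expanded)   # 18 18
```

The three odd permutations, (1,3,2), (2,1,3), and (3,2,1), are exactly the ones attached to the negated products afh, bdi, and ceg.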
If we derive a formula for the determinant of a 4×4 matrix, it will have 24 terms, each a product of 4 entries chosen according to a permutation of the 4 columns. Half the terms are negated, according to the parity of their permutations. A 5×5 matrix gives a formula with 120 terms, and so on. This quickly becomes impractical, but let's prove that the pattern holds for every n anyway.
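A quick way to see the n! growth and the even/odd split is to count the permutations directly. The function count_terms is a hypothetical helper introduced only for this check.

```python
from itertools import permutations

def parity(perm):
    """Return 0 for an even permutation, 1 for an odd one (inversion count mod 2)."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for k in range(i + 1, n) if perm[i] > perm[k])
    return inversions % 2

def count_terms(n):
    """Return (number of terms, number of negated terms) in the n×n permutation formula."""
    perms = list(permutations(range(1, n + 1)))
    negated = sum(parity(p) for p in perms)
    return len(perms), negated

for n in range(2, 6):
    terms, negated = count_terms(n)
    print(f"{n}x{n}: {terms} terms, {negated} negated")
# 2x2: 2 terms, 1 negated
# 3x3: 6 terms, 3 negated
# 4x4: 24 terms, 12 negated
# 5x5: 120 terms, 60 negated
```

The loop starts at 2 because a 1×1 matrix has a single, positive term.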
Let det2(M) be the sum of all permutation products, where a product is negated when its associated permutation is odd. This definition agrees with det1 for n = 1, 2, and 3. Proceed by induction on n, assuming det1 and det2 agree on every (n-1)×(n-1) matrix.
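Before the proof, a numerical spot check can be reassuring. det1 is defined earlier in the document; this sketch assumes it is expansion along the first row, multiplying the entry in row 1, column j by the subdeterminant and negating when j is even, which is how the argument uses it. det2 follows the definition above. Integer matrices keep the arithmetic exact; the random sampling is only a sanity check, not a proof.

```python
import random
from itertools import permutations
from math import prod

def parity(perm):
    """Return 0 for an even permutation, 1 for an odd one (inversion count mod 2)."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for k in range(i + 1, n) if perm[i] > perm[k])
    return inversions % 2

def det1(m):
    """Expansion along the first row (assumed to match the document's det1)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):                                   # 0-based; column j+1 in the text
        minor = [row[:j] + row[j + 1:] for row in m[1:]] # delete row 1 and that column
        sign = -1 if j % 2 == 1 else 1                   # text's column j+1 even <=> j odd
        total += sign * m[0][j] * det1(minor)
    return total

def det2(m):
    """Sum of permutation products, negating those whose permutation is odd."""
    n = len(m)
    total = 0
    for cols in permutations(range(n)):                  # cols[row] = column used by row
        term = prod(m[row][cols[row]] for row in range(n))
        total += -term if parity(cols) else term
    return total

rng = random.Random(0)
for n in range(1, 6):
    for _ in range(20):
        m = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        assert det1(m) == det2(m)
print("det1 and det2 agree on all samples")
```

Using 0-based column numbers inside det2 does not change any parity, since shifting every entry by one leaves the inversions unchanged.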
Given an n×n matrix, delete the first row and the jth column, and, by the induction hypothesis, write the subdeterminant as a sum of permutation products. In each permutation, add 1 to every column number that is j or larger. We are pushing those columns over to make room for column j. Now place j at the front of each permutation. After all, det1 says to multiply the subdeterminant by the entry in row 1, column j. Have we changed the parity of the permutation by putting j in front? Count inversions: j now precedes each of the j-1 smaller numbers 1 through j-1, which adds exactly j-1 inversions, while the remaining numbers, 1 through j-1 and j+1 through n, keep their relative order and hence their preexisting parity. Put another way, moving j from its home in position j to the front in position 1 takes j-1 adjacent transpositions; when j is 4, swap it with 3, 2, and 1, giving 3 transpositions. So the parity flips exactly when j-1 is odd, that is, when j is even. As if in confirmation, det1 tells us to negate the product precisely when j is even.
Finally, every permutation of 1 through n arises exactly once in this way: its first entry determines j, and the rest, with the numbers above j shifted down by 1, is a permutation of 1 through n-1. So applying det1 to M produces each term of det2(M) exactly once, with the correct sign, and the two formulas are equivalent.
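The two facts the induction step relies on can be checked exhaustively for small n: prepending j flips the parity exactly when j is even, and the n·(n-1)! pairs (j, τ) produce n! distinct permutations, so every permutation appears once. The names prepend_column and check_expansion are hypothetical helpers for this sketch.

```python
from itertools import permutations
from math import factorial

def parity(perm):
    """Return 0 for an even permutation, 1 for an odd one (inversion count mod 2)."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for k in range(i + 1, n) if perm[i] > perm[k])
    return inversions % 2

def prepend_column(j, tau):
    """Shift entries >= j up by 1 to make room for column j, then put j in front."""
    return (j,) + tuple(c + 1 if c >= j else c for c in tau)

def check_expansion(n):
    """Verify the parity rule for all j and tau; return how many distinct permutations arise."""
    produced = set()
    for j in range(1, n + 1):
        for tau in permutations(range(1, n)):            # permutations of the minor's columns
            sigma = prepend_column(j, tau)
            flipped = parity(sigma) != parity(tau)
            assert flipped == (j % 2 == 0)               # parity flips iff j is even
            produced.add(sigma)
    # n * (n-1)! pairs yield n! distinct results, so no permutation is missed or repeated
    assert len(produced) == factorial(n)
    return len(produced)

for n in range(1, 7):
    print(n, check_expansion(n))
```

For example, prepend_column(2, (1, 3, 2)) gives (2, 1, 4, 3): the odd permutation (1, 3, 2) becomes an even one, as expected for j = 2.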