The approach is straightforward. Select an x at random, and raise it to various small powers, such as the first thousand primes. You'll want to repeat some of the smaller exponents. After all, x could be an 8th root of 1 mod p, whence x should be squared 3 times. A good rule of thumb is to raise x to the u, where u is the lcm of the first b integers, where b depends on your patience, and the computer resources you have at hand.
Every now ant then, stop and run gcd(xu-1,n), to see if you've reached 1 mod p. If you reach 1 mod n, then all quotients have become 1 simultaneously, and you need to back up and run gcd on the previous result. If all quotients become 1 at the same time, select a different x and start again.
If qk is the largest prime factor of p-1, you're not likely to stumble upon a qth root of 1 by accident, so b has to equal qk.
As a first approximation, the running time is proportional to the minimum over all p dividing n, of the maximum of all q dividing p-1. Of course we usually measure performance against the size of the problem, which is the number of digits of n, or the log of n, so as you might expect, this algorithm is exponential in time.
Of course we don't know p in advance, and in particular, we don't know that p = 3 mod 4. So adjoin the square root of d for various values of d. After two tries (on average), d will be a nonsquare mod p, and off you go.
If p2+p+1 is smooth, adjoin e, where e is the root of an irreducible cubic mod p. Everything else carries forward. However, this is an interesting mathematical exercise with little practical utility. If p-1 is not smooth, and p+1 is not smooth, higher degree expressions in p are not likely to be smooth either.
If p+1 is smooth, adjoin j, where j2 is a nonsquare mod p. This gives the finite field of order p2, with p2-1 units. If p+1 is smooth, raising x to small powers produces a p-1 root of 1. This belongs to a lesser subfield, i.e. the integers mod p. In other words, it is real, with no imaginary component. Take the gcd of the coefficient on j with n and find p. This is the p+1 method.
Extend Zp by an irreducible cubic, and there are p3-1 units. If p2+p+1 is smooth, we will once again find a p-1 root of 1. Use the coefficient on j or j2; either one will work, as they are both 0 mod p.
Create an extension of dimension 4 by adjoining j (where j2 is an integer), and then k, where k2 = j. This has p4-1 units, and if p2+1 is smooth, you wind up in the subfield of dimension 2, spanned by 1 and j. Use the coefficient on k to find p. This continues through higher extensions, hoping that fragments of the cyclotomic polynomials in p will be smooth. These fragments are at least quadratic, and p is already large, so these numbers are very unlikely to be smooth; and they're the only game in town. Better to use elliptic curve factoring, where the numbers (hopefuly smooth) are many to choose from, and close to p.
Must finite rings always behave this badly? I spent a considerable amount of time on the quaternions mod p, hoping a noncommutative example would be more promising, but once again we find orders of p, p-1, and p+1. Can't I go "off the board"? The answer seems to be no. Here's why.
Let R be a finite ring of characteristic p. Since multiplication by an integer is repeated addition, the integers commute with everything in R. Thus R is a ring extension of Zp. It is also a Zp vector space, and that means R, under addition, looks like so many copies of Zp running in parallel. Not very interesting.
The units of R form a group, with various subgroups; yet the size of any of these groups determines, and is determined by, the order of the individual elements of R. Let x be such an element and adjoin it to the integers, creating S, a finite commutative ring inside R. The units of S are a subgroup of the units of R.
Since the extension is finite, x is algebraic over its base field. Let f be the least polynomial that satisfies x. Remember that these polynomials form a pid. Split f uniquely into a product of polynomials, where each factor is an irreducible polynomial or a power of an irreducible polynomial. Then apply the chinese remainder theorem. S is now the direct product of smaller ring extensions. Its units are the direct product of the units in the component rings. If we can get a handle on one of these extensions, we will understand S, and since x is arbitrary, we understand all of R.
Adjoin y to Zp, where y satisfies the irreducible polynomial g(y). This is a finite field extension, and the multiplicative group has order pd-1, where d is the degree of g. This is territory that has already been explored. We're not likely to find anything new here.
Next let y satisfy the polynomial g(y)k. Mod out by g, and get the previous case. Thus a ring homomorphism has the aforementioned finite field as its image. The kernel is the ideal generated by g. This is basically g times the powers of y, up to gk. As an additive group, the coefficients on the powers of y form parallel cycles of length p. But we are more interested in the multiplicative group. Again, the image is a finite field, whose multiplicative group is cyclic of order pd-1. The kernel consists of polynomials that are 1 mod g. As a warm up, let k = 2. Multiply ag+1 times bg+1 and get 1 + (a+b)g + abg2. (Here a and b are polynomials.) Thus group multiplication carries a and b to a+b. The kernel is several p cycles running in parallel, according to the degree of g.
What about larger values of k? Write a polynomial in base g. If the constant term is 1, it lives in the kernel. Raise to the p, using the frobenius homomorphism. Since k is much less than p, all the g terms drop out, leaving 1. Everything has order p, and the kernel is d×(k-1) p cycles running in parallel. So everything is p, or 1 less than a power of p. This holds for elements, subgroups, quotient groups, and even the number of sylow subgroups, as this is equal to the index of the normalizer in the larger group.