yes, my proposed algorithm will build the full p-1 entry table of inverses in O(p) arithmetic operations and using basically no memory. Methods like extended euclidean algorithm, continued fraction alg, (p-2)-power alg, and "brute force" trying all possible candidate-reciprocals, all take longer than linear time, ranging from p*log(p) to p^2 time, plus mine is easier (or not much harder) to program. I also saw advocated building tables of discrete logs then using them to compute the inverses, which achieves O(p) time again (with a larger constant) but only at the cost of using much more memory and more code complexity. For example, these bad approaches were advocated on the Numerical Recipes blog. However, if you do not want the full table, only want one, then yes, I agree with Gosper the continued fraction trick is very elegant, in fact more elegant than I had realized.