The LU Factorization Algorithm

I want to write up the details of the algorithm and its proof.
In each iteration of the Doolittle algorithm, the elements below the main diagonal in one column of A are eliminated. The result is an upper triangular matrix U, and the complexity is (2/3)n**3 floating-point operations (yet to be verified).
An atomic (lower triangular) matrix is one with 1s on the diagonal in which all off-diagonal entries are zero except for those in a single column. In each step of the algorithm, the output of the previous iteration is multiplied by an atomic lower triangular matrix L.
A(n) = L(n)A(n-1), where A(0) = A
If A is an NxN matrix, the nth iteration eliminates the nth column's sub-diagonal entries, for n = 1, ..., N-1. As the Nth column has no entries below the diagonal, A(N-1) = U.
A(1) = L(1)A => A = L(1)**(-1)A(1)
. . .
A(N-1) = L(N-1)L(N-2)...L(1)A
A = L(1)**(-1)L(2)**(-1)...L(N-1)**(-1)A(N-1)
If we can show that L(1)**(-1)...L(N-1)**(-1) is lower triangular, where L(i)**(-1) is the inverse of L(i), we have found an LU factorization for A. In fact, since the product of lower triangular matrices is lower triangular, and the inverse of a lower triangular matrix is lower triangular, the above product is a lower triangular matrix. Proofs will be given later.
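Pending the proofs, the closure fact invoked here can be spot-checked numerically. This is a pure-Python sketch (the matrices are arbitrary example values, not taken from the algorithm), not a proof:

```python
# Numeric spot-check (not a proof) of the closure fact used above:
# the product of two lower triangular matrices is lower triangular.

def matmul(X, Y):
    """Naive product of square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Two arbitrary lower triangular matrices (example values).
X = [[1, 0, 0],
     [2, 1, 0],
     [3, 4, 1]]
Y = [[1, 0, 0],
     [5, 1, 0],
     [6, 7, 1]]

P = matmul(X, Y)
above_diagonal = [P[i][j] for i in range(3) for j in range(3) if j > i]
print(above_diagonal)  # all zeros: the product is lower triangular
```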
We now need to determine L(i) at each iteration of the algorithm, and then obtain its inverse. We want L(i) to be an atomic matrix, so only one column will contain non-zero entries off its diagonal. Since we multiply from the left, those non-zero entries need only sit below the diagonal of that column, as we want the sub-diagonal entries of that column of the product to be zero.
Thinking in terms of Gaussian elimination: to zero out the element in the ith row and nth column, we would multiply the row containing that column's diagonal entry (the nth row) by (-a(in)/a(nn)) and add it to the ith row. We can effect this by creating a matrix L(n) with 1s on the diagonal and (-a(in)/a(nn)) in the ith row of the nth column, below the diagonal, where n is the iteration number. In the product, the entry in row i, column n is the diagonal 1 of L(n) times a(in), plus the multiplier (-a(in)/a(nn)) times a(nn), which sums to 0. Thus the element a(in) in the nth iteration's output matrix A(n) will be 0. e.g.
A is:
1 2 3
4 5 6
7 8 9
Then L(1) is
1 0 0
-4 1 0
-7 0 1
Proceeding in this way for each iteration, we can eliminate all elements below the diagonal and arrive at an upper triangular matrix A(N-1). A full matrix multiplication costs O(N**3) and occurs O(N) times, which at first suggests O(N**4) for the entire algorithm. The resolution is that multiplying by an atomic L(n) is not a full matrix multiplication: it only adds a multiple of the nth row to each of the rows below it, so step n costs O((N-n)N) operations, and summing over n gives O(N**3) in total (counting more carefully, about (2/3)N**3 flops).
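The whole elimination can be sketched as follows (pure Python, no pivoting; `doolittle` is a name chosen here, not a library function). It never forms the L(n) matrices explicitly: each step just updates the rows below the pivot, which is where the O(N**3) total cost comes from. It also records each multiplier m at position (i, k); that these entries form exactly the sub-diagonal of the final L is a standard result, consistent with the inverse-product construction discussed below.

```python
# Sketch of the full elimination, exploiting the structure of each
# atomic L(n): each step only updates the rows below the pivot.

def doolittle(A):
    """Return (L, U) with L unit lower triangular and U upper triangular.
    Assumes no pivot is zero (this sketch does no pivoting)."""
    n = len(A)
    U = [row[:] for row in A]                       # work on a copy of A
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n - 1):                          # step k eliminates column k
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]                   # multiplier a(ik)/a(kk)
            L[i][k] = m                             # record it in L
            for j in range(k, n):
                U[i][j] -= m * U[k][j]              # row_i <- row_i - m * row_k
    return L, U

L, U = doolittle([[2.0, 1.0],
                  [6.0, 4.0]])
print(L)  # unit lower triangular
print(U)  # upper triangular
```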
Thus, we obtain A(N-1)=L(N-1)L(N-2)...L(1)A
To obtain the L matrix in LU, we need L(1)**(-1)L(2)**(-1)...L(N-1)**(-1). Note that, if the product of any two lower triangular matrices is a lower triangular matrix, we can easily show by induction that the product of N-1 such matrices is lower triangular. Thus, as long as each matrix inverse above is lower triangular (asserted above, with proof deferred), we will have the desired lower triangular matrix L.
Each L**(-1) can be obtained via one of several methods for inverting a matrix. One simple way is to solve LL**(-1) = I for the elements of L**(-1), which for a lower triangular L amounts to forward substitution. Another method is Gauss-Jordan elimination, which follows the process:
[L | I] => L**(-1)[L | I] = [I | L**(-1)]
Once the individual L(i)**(-1) are obtained, we simply find their product and are finished.
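In fact, for an atomic L(i) there is a shortcut that avoids any explicit inversion (a standard property, not derived above): its inverse is the same matrix with the off-diagonal entries negated. A minimal pure-Python check on the L(1) from the earlier example:

```python
# The inverse of an atomic lower triangular matrix is obtained by
# negating its off-diagonal entries; check this for L(1) above.

def matmul(X, Y):
    """Naive product of square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

L1     = [[1, 0, 0], [-4, 1, 0], [-7, 0, 1]]
L1_inv = [[1, 0, 0], [ 4, 1, 0], [ 7, 0, 1]]   # off-diagonal signs flipped

product = matmul(L1, L1_inv)
print(product)  # the identity matrix
```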