The "Normal Equation" method is another way of minimizing J, besides gradient descent. In the "Normal Equation" method, we minimize J by explicitly taking its derivatives with respect to the θj's and setting them to zero. This allows us to find the optimum θ without iteration:
θ = (XᵀX)⁻¹ Xᵀ y
There is no need to do feature scaling with the normal equation.
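As an illustrative sketch in Python with NumPy (not part of the original notes), the closed-form solution θ = (XᵀX)⁻¹ Xᵀ y can be computed as below. The toy data, the variable names, and the helper `normal_equation` are assumptions made for this example. The features are deliberately left unscaled (one is in the thousands, the other is a single digit) to show that no scaling step is needed. The sketch solves the linear system rather than forming the inverse explicitly, which gives the same θ and is usually numerically safer.

```python
import numpy as np

def normal_equation(X, y):
    """Return theta solving (X^T X) theta = X^T y (hypothetical helper)."""
    # Solving the linear system is equivalent to applying (X^T X)^-1 to X^T y.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy, made-up data: two unscaled features on very different ranges.
x1 = np.array([2104.0, 1416.0, 1534.0, 852.0])
x2 = np.array([5.0, 3.0, 3.0, 2.0])

# Design matrix with a leading column of ones for the intercept term.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Generate targets from known parameters so the result can be checked.
true_theta = np.array([50.0, 0.1, 20.0])
y = X @ true_theta

theta = normal_equation(X, y)
print(theta)
```

Because the targets here are generated without noise, the recovered `theta` should match `true_theta` up to floating-point error.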
Tip 1: With the normal equation, computing the inverse has complexity O(n³), where n is the number of features.
So if we have a very large number of features, the normal equation will be slow.
In practice, when n exceeds 10,000 it might be a good time to switch from the normal equation to an iterative method such as gradient descent.
Tip 2: If XᵀX is noninvertible, the common causes might be:
- Redundant features, where two features are very closely related (i.e. they are linearly dependent)
- Too many features (e.g. m ≤ n, where m is the number of training examples and n is the number of features). In this case, delete some features or use "regularization" (to be explained in a later lesson).
Solutions to the above problems include deleting a feature that is linearly dependent on another, or deleting one or more features when there are too many.
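The redundant-feature case can be sketched in Python with NumPy as below. This is an illustration under assumed toy data, not part of the original notes: the second feature is exactly twice the first, so the design matrix is rank deficient and XᵀX has no inverse. The sketch checks the rank, uses the Moore-Penrose pseudoinverse (`np.linalg.pinv`) to still obtain a θ that fits the data, and then shows that deleting the dependent feature restores full column rank.

```python
import numpy as np

# Toy, made-up data: x2 is an exact multiple of x1, so it is redundant.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = 2.0 * x1
y = 3.0 + 2.0 * x1

X = np.column_stack([np.ones_like(x1), x1, x2])

# rank(X^T X) equals rank(X); a rank below the number of columns
# means X^T X is singular (noninvertible).
rank_full = np.linalg.matrix_rank(X)
print(rank_full, "of", X.shape[1], "columns are linearly independent")

# The pseudoinverse still returns a least-squares solution
# (the minimum-norm one) even though X^T X cannot be inverted.
theta = np.linalg.pinv(X) @ y
print(X @ theta)

# Remedy from the notes: delete the linearly dependent feature.
X_reduced = X[:, :2]
rank_reduced = np.linalg.matrix_rank(X_reduced)
theta_reduced = np.linalg.solve(X_reduced.T @ X_reduced, X_reduced.T @ y)
print(theta_reduced)
```

With the redundant column removed, the plain normal equation applies again; with it in place, many different θ give the same predictions, and the pseudoinverse simply picks one of them.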