Probabilistic Interpretation of the Sum-of-Squares Loss Function
Square Loss Function (in Linear Regression)

For linear regression, the method we used to find the optimal parameters $\overrightarrow{\theta}$ is gradient descent, which seeks the $\overrightarrow{\theta}$ that minimizes the loss function:

$$
\mathcal{J}(\theta) = \frac{1}{2} \sum_{i=1}^{n}\left(y^{(i)} - \theta^T x^{(i)}\right)^2
$$

That is:

$$
\hat \theta = \underset{\theta}{\mathrm{argmin}}\left[\frac{1}{2} \sum_{i=1}^{n}\left(y^{(i)} - \theta^T x^{(i)}\right)^2\right]
$$

Interpret the Loss Function as MLE

In linear regression, we assume the model to be:

$$
y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}
$$

where $\epsilon^{(i)}$ is called the error term, which consists of unmodelled factors and random noise....
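To make the argmin above concrete, here is a minimal NumPy sketch (not from the original text) of batch gradient descent on $\mathcal{J}(\theta)$; the synthetic data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

# Illustrative synthetic data (assumed): y = 1 + 2*x plus Gaussian noise,
# with a bias column prepended to x so theta = [intercept, slope].
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(-1, 1, 100)])  # n x 2 design matrix
theta_true = np.array([1.0, 2.0])
y = X @ theta_true + rng.normal(0, 0.1, 100)

def loss(theta, X, y):
    """J(theta) = 1/2 * sum_i (y_i - theta^T x_i)^2"""
    r = y - X @ theta
    return 0.5 * r @ r

def gradient(theta, X, y):
    """Gradient of J with respect to theta: -X^T (y - X theta)."""
    return -X.T @ (y - X @ theta)

# Batch gradient descent: theta <- theta - alpha * grad J(theta)
theta = np.zeros(2)
alpha = 0.01            # learning rate (assumed)
for _ in range(1000):   # iteration count (assumed)
    theta -= alpha * gradient(theta, X, y)

print("theta from gradient descent:", theta)
print("final loss:", loss(theta, X, y))
```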
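As a sanity check of the probabilistic interpretation, and assuming (as such derivations typically do) that each $\epsilon^{(i)}$ is i.i.d. Gaussian, the sketch below numerically maximizes the Gaussian log-likelihood and compares the result with the closed-form least-squares minimizer of $\mathcal{J}(\theta)$; the data and optimizer settings are illustrative, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data under the assumed model y = theta^T x + eps, eps ~ N(0, sigma^2)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
theta_true, sigma = np.array([0.5, -1.5]), 0.3
y = X @ theta_true + rng.normal(0, sigma, 200)

def neg_log_likelihood(theta, X, y, sigma=sigma):
    """Negative Gaussian log-likelihood; up to additive constants it equals J(theta)/sigma^2."""
    r = y - X @ theta
    return 0.5 * len(y) * np.log(2 * np.pi * sigma**2) + (r @ r) / (2 * sigma**2)

# Maximize the likelihood (i.e. minimize its negative) numerically.
theta_mle = minimize(neg_log_likelihood, x0=np.zeros(2), args=(X, y)).x

# Closed-form least-squares minimizer of J(theta).
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print("MLE theta:", theta_mle)
print("OLS theta:", theta_ls)  # the two estimates agree up to numerical tolerance
```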