Probabilistic Interpretation of Sum of Square Loss Function

Square Loss Function (in Linear Regression) For linear regression, one way to find the optimal parameters $\overrightarrow \theta$ is gradient descent, in which we seek the $\overrightarrow \theta$ that minimizes the loss function: $$ \mathcal{J}(\theta) = \frac{1}{2} \sum_{i=1}^{n}(y^{(i)} - \theta^T x^{(i)})^2 $$ That is: $$ \hat \theta = \underset{\theta}{\mathrm{argmin}}\left[\frac{1}{2} \sum_{i=1}^{n}(y^{(i)} - \theta^T x^{(i)})^2\right] $$ Interpret the Loss Function as MLE In linear regression, we assume the model to be: $$ y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)} $$ where $\epsilon^{(i)}$ is the error term, which consists of unmodelled factors and random noise....
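As a quick illustration of the argmin above (not from the original post; the data, learning rate, and iteration count are illustrative assumptions), here is a minimal sketch that fits $\theta$ by batch gradient descent on synthetic data drawn from $y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$ with Gaussian noise:

```python
import numpy as np

# Synthetic data from the assumed model y = theta^T x + eps, eps ~ N(0, 0.5^2).
rng = np.random.default_rng(0)
n, d = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one feature
theta_true = np.array([1.0, 2.0])
y = X @ theta_true + rng.normal(scale=0.5, size=n)

# Batch gradient descent on J(theta) = 1/2 * sum_i (y_i - theta^T x_i)^2.
theta = np.zeros(d)
lr = 0.01
for _ in range(5000):
    grad = -X.T @ (y - X @ theta)  # gradient of J(theta)
    theta -= lr * grad / n         # average the gradient for a stable step size

print(theta)  # close to theta_true and to np.linalg.lstsq(X, y, rcond=None)[0]
```

Under the Gaussian-noise assumption, this minimizer coincides with the maximum-likelihood estimate, which is the point of the post.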

April 13, 2020 · 2 min · Yiheng "Terry" Li

Positive Semidefinite Matrix in Machine Learning

What is Positive Semidefinite (PSD) Matrix Definition A matrix $A \in \mathbb{R}^{n\times n}$ is positive semi-definite (PSD), denoted $A \succeq 0$, if: $A = A^{T}$ ($A$ is symmetric) and $x^{T}Ax \geq 0$ for all $x \in \mathbb{R}^{n}$. From this definition, we can infer some properties of PSD matrices. Properties If $A \succ 0$ (positive definite, not merely PSD), then $A$ is invertible and $A^{-1} \succ 0$. If $A \succeq 0$, then $\forall Q \in \mathbb{R}^{n\times n}$, we have $Q^{T}AQ \succeq 0$....
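As a quick sanity check of the definition and the congruence property (not from the original post; the matrix sizes and tolerance are arbitrary choices), a minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def is_psd(A, tol=1e-10):
    """A is PSD iff it is symmetric and all its eigenvalues are nonnegative."""
    return np.allclose(A, A.T) and np.all(np.linalg.eigvalsh(A) >= -tol)

B = rng.normal(size=(4, 4))
A = B.T @ B                 # Gram matrices B^T B are always PSD
Q = rng.normal(size=(4, 4))

print(is_psd(A))            # True
print(is_psd(Q.T @ A @ Q))  # True: x^T (Q^T A Q) x = (Qx)^T A (Qx) >= 0
```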

April 12, 2020 · 3 min · Yiheng "Terry" Li

SMOTE with Python

Motivation When working on classification problems, especially in medical practice, we often face datasets in which the classes are imbalanced. While model builders in other domains might get away with overlooking this, addressing the imbalanced data problem is essential in medical applications, because in most scenarios predictive accuracy on the minority classes matters far more than accuracy on the most common ones....

April 11, 2020 · 3 min · Yiheng "Terry" Li