Survival Analysis -- the Basics

What is Survival Analysis and When to Use It? Survival analysis can be generalized as time-to-event analysis: analyzing the time until a certain event occurs (e.g. disease, death, or some positive event). Survival analysis gives us estimates of, for example: the time it takes before some event occurs; the probability of having experienced a certain event by a certain time point; or which factors have what effect on the time to a certain event....
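The central quantity this post refers to, the probability of surviving past a given time, can be sketched with a tiny Kaplan-Meier estimator (a hypothetical pure-Python illustration with made-up data, not code from the post):

```python
# Minimal Kaplan-Meier sketch. Each subject is a (time, event) pair:
# event=1 means the event occurred at that time, event=0 means the
# subject was censored (lost to follow-up) at that time.

def kaplan_meier(samples):
    """Return [(time, survival_probability)] at each observed event time."""
    samples = sorted(samples)
    n_at_risk = len(samples)
    surv = 1.0
    curve = []
    i = 0
    while i < len(samples):
        t = samples[i][0]
        deaths = 0
        removed = 0
        # group all subjects whose time equals t
        while i < len(samples) and samples[i][0] == t:
            deaths += samples[i][1]
            removed += 1
            i += 1
        if deaths:
            # multiply in the conditional survival probability at time t
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

data = [(2, 1), (3, 0), (5, 1), (5, 1), (8, 0), (9, 1)]
print(kaplan_meier(data))
```

With this toy data the estimate drops at times 2, 5, and 9, and the censored subjects at times 3 and 8 only shrink the at-risk set.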

June 14, 2020 · 8 min · Yiheng "Terry" Li

EM Algorithm Notes

EM And GMM The expectation-maximization (EM) algorithm provides a way to do MLE or MAP estimation when there are incomplete data or unobserved latent variables. NOTE: EM is a general way of obtaining MLE or MAP estimates, not necessarily for clustering. A Gaussian mixture model (GMM) is a statistical model that can serve as a clustering algorithm. It assumes the data points come from several Gaussian distributions and uses the EM algorithm to obtain the MLE estimates of those Gaussians....
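The E-step/M-step loop for a GMM can be sketched in the one-dimensional, two-component case (an illustrative toy, not the post's code; the initialization heuristic and data are invented here):

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, iters=100):
    # Crude initialization: put the two means at the data extremes.
    m1, m2 = min(xs), max(xs)
    v1 = v2 = (max(xs) - min(xs)) ** 2 / 4
    w = 0.5  # mixing weight of component 1
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        r = []
        for x in xs:
            p1 = w * normal_pdf(x, m1, v1)
            p2 = (1 - w) * normal_pdf(x, m2, v2)
            r.append(p1 / (p1 + p2))
        # M-step: re-estimate parameters from the responsibilities
        n1 = sum(r)
        n2 = len(xs) - n1
        m1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        m2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        v1 = sum(ri * (x - m1) ** 2 for ri, x in zip(r, xs)) / n1 + 1e-6
        v2 = sum((1 - ri) * (x - m2) ** 2 for ri, x in zip(r, xs)) / n2 + 1e-6
        w = n1 / len(xs)
    return (m1, v1), (m2, v2), w

data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.8, 5.1]
(c1, c2, w) = em_gmm_1d(data)
```

On this toy data the two recovered means land near the two clusters around 1 and 5, with a mixing weight near 0.5.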

June 1, 2020 · 5 min · Yiheng "Terry" Li

Kernel Method Note

Motivation of Kernel Method In classification, it is often the case that we want a non-linear decision boundary. For example, for this problem (figure 2), we want a decision boundary that is roughly a circle, but our model only yields linear boundaries. To give our model more flexibility without changing the basic algorithms, we can apply a transformation to the feature space $X$, as in the figure on the right....
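The transformation idea can be made concrete with a toy feature map (a hypothetical example; `phi` and `linear_decision` are names invented here): mapping $(x_1, x_2) \mapsto (x_1^2, x_2^2)$ turns the circular boundary $x_1^2 + x_2^2 = 1$ into the straight line $z_1 + z_2 = 1$ in the transformed space:

```python
def phi(x1, x2):
    # feature map into the transformed space
    return (x1 ** 2, x2 ** 2)

def linear_decision(z1, z2):
    # a purely linear boundary z1 + z2 = 1 in the transformed space
    return 1 if z1 + z2 < 1.0 else -1

inside = [(0.1, 0.2), (-0.3, 0.4)]   # inside the unit circle -> class +1
outside = [(1.5, 0.0), (-1.0, 1.0)]  # outside the unit circle -> class -1
print([linear_decision(*phi(*p)) for p in inside + outside])  # → [1, 1, -1, -1]
```

The classifier itself never stops being linear; only the features change, which is exactly the flexibility the excerpt describes.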

May 12, 2020 · 7 min · Yiheng "Terry" Li

Generative Models -- Gaussian Discriminant Analysis

Brief Introduction to Generative Models To talk about generative models (vs. discriminative models), we can start with a story: A father has two kids, Kid A and Kid B. Kid A has a special trait: he can learn everything in depth. Kid B has a special trait: he can only learn the differences between the things he sees. One fine day, the father takes his two kids (Kid A and Kid B) to a zoo....

April 24, 2020 · 10 min · Yiheng "Terry" Li

Logistic Regression Updated with Newton's Method

Logistic regression is a very important binary classification algorithm; in this article, some essential details of the algorithm will be discussed. Plain language is used throughout, so that beginners in machine learning can easily get the idea. Assumptions of Logistic Regression Logistic regression does not require as many assumptions as linear regression. There are a few of interest, which we will discuss shortly....
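The Newton update the title refers to can be sketched for a single feature plus an intercept (a minimal illustration with invented data; the post's own derivation and notation may differ):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_logistic(X, y, iters=10):
    # X: list of [1, x] rows (intercept column included); y: 0/1 labels
    theta = [0.0, 0.0]
    for _ in range(iters):
        # gradient of the log-likelihood and its 2x2 Hessian
        g = [0.0, 0.0]
        H = [[0.0, 0.0], [0.0, 0.0]]
        for xi, yi in zip(X, y):
            p = sigmoid(theta[0] * xi[0] + theta[1] * xi[1])
            for j in range(2):
                g[j] += (yi - p) * xi[j]
                for k in range(2):
                    H[j][k] -= p * (1 - p) * xi[j] * xi[k]
        # Newton step: theta <- theta - H^{-1} g (2x2 inverse in closed form)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        theta[0] -= (H[1][1] * g[0] - H[0][1] * g[1]) / det
        theta[1] -= (-H[1][0] * g[0] + H[0][0] * g[1]) / det
    return theta

X = [[1, 0], [1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]
y = [0, 0, 1, 0, 1, 1]
theta = newton_logistic(X, y)
```

Because Newton's method uses curvature (the Hessian) as well as the gradient, it typically converges in far fewer iterations than plain gradient ascent on this objective.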

April 20, 2020 · 5 min · Yiheng "Terry" Li

Store Almost Any Objects of Python in Files

The module pickle implements binary protocols for serializing and de-serializing a Python object structure. We can store almost any type of Python object using pickle. Quick Example For example, suppose we want to save a dictionary dict_obj to a file.

```python
import pickle

# Save file
def saveFile(obj):
    out_file = open('obj.pickle', 'wb')
    pickle.dump(obj, out_file)
    out_file.close()

# Read file
def readFile(obj_file_name):
    file = open(obj_file_name, 'rb')
    obj = pickle.load(file)
    return obj

>>> dict_obj = {'itemA': ['item', 'A'], 'itemB': [1, 3]}
>>> saveFile(dict_obj)
>>> obj_file_name = 'obj....
```

April 15, 2020 · 2 min · Yiheng "Terry" Li

Probabilistic Interpretation of Sum of Square Loss Function

Square Loss Function (in Linear Regression) For linear regression, the way we find the optimal parameters $\overrightarrow \theta$ is called gradient descent, in which we seek the $\overrightarrow \theta$ that minimizes the loss function: $$ \mathcal{J}(\theta) = \frac{1}{2} \sum_{i=1}^{n}(y^{(i)} - \theta^T x^{(i)})^2 $$ That is: $$ \hat \theta = \underset{\theta}{\mathrm{argmin}}\left[\frac{1}{2} \sum_{i=1}^{n}(y^{(i)} - \theta^T x^{(i)})^2\right] $$ Interpret the Loss Function as MLE In linear regression, we assume the model to be: $$ y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)} $$ where $\epsilon^{(i)}$ is called the error term, which comprises unmodelled factors and random noise....
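Under the usual additional assumption that the errors are i.i.d. Gaussian, $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$, the MLE interpretation sketched above can be completed as follows (a standard derivation, kept in the excerpt's notation):

$$
p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)
$$

so the log-likelihood of the data is

$$
\ell(\theta) = \sum_{i=1}^{n} \log p(y^{(i)} \mid x^{(i)}; \theta) = n \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y^{(i)} - \theta^T x^{(i)})^2
$$

Maximizing $\ell(\theta)$ over $\theta$ therefore amounts to minimizing $\frac{1}{2}\sum_{i=1}^{n}(y^{(i)} - \theta^T x^{(i)})^2$, which is exactly $\mathcal{J}(\theta)$.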

April 13, 2020 · 2 min · Yiheng "Terry" Li