Further Discussion of Relative Importance

This article discusses two more methods that consider not only the linear correlation of a single predictor variable with the dependent variable but also the intertwined effects. These two methods are Commonality Analysis (CA) and Dominance Analysis (DA). They share some similarities yet differ in how they analyze relative importance. Commonality Analysis Idea Commonality Analysis is a statistical technique within multiple linear regression that decomposes a model’s $R^2$ statistic (i....
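As a rough illustration of the $R^2$ decomposition mentioned in this excerpt, here is a minimal sketch for the two-predictor case. It assumes the standard textbook partition (unique part of each predictor plus a common part); the helper names `r_squared` and `commonality_two_predictors` and the synthetic data are hypothetical and not taken from the article.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def commonality_two_predictors(x1, x2, y):
    """Split the full-model R^2 into unique and common parts (two predictors)."""
    r2_1 = r_squared(x1, y)                          # model with x1 only
    r2_2 = r_squared(x2, y)                          # model with x2 only
    r2_12 = r_squared(np.column_stack([x1, x2]), y)  # full model
    return {
        "unique_x1": r2_12 - r2_2,        # what x1 adds beyond x2
        "unique_x2": r2_12 - r2_1,        # what x2 adds beyond x1
        "common": r2_1 + r2_2 - r2_12,    # shared by x1 and x2
        "total": r2_12,
    }

# Synthetic, correlated predictors (illustrative data only)
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 0.7 * x1 + rng.normal(size=500)
y = x1 + x2 + rng.normal(size=500)

parts = commonality_two_predictors(x1, x2, y)
for name, value in parts.items():
    print(f"{name}: {value:.3f}")
```

By construction the three parts add up to the full-model $R^2$, which is what makes the decomposition readable as shares of explained variance.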

August 10, 2020 · 10 min · Yiheng "Terry" Li

Multivariate Linear Regression -- Collinearity and Feature Importance

This article discusses some problems in implementing multivariate linear regression: What should be done about collinearity in linear regression? How can the feature importance of a linear regression be extracted, and is it just the model's coefficients? Collinearity What is collinearity Collinearity is a condition in which some of the independent variables are highly correlated. Collinearity occurs more often when the number of predictors is very large, since there is a higher chance that some predictors are correlated with others....
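To make "highly correlated predictors" concrete, the sketch below checks a small synthetic design matrix in two ways: the pairwise correlation matrix, and the variance inflation factor (VIF). VIF is a common diagnostic brought in here as an assumption; the excerpt does not say which diagnostic the article uses. The function name `vif` and the data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R^2_j),
    where R^2_j comes from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Illustrative data: b is nearly a copy of a, c is independent
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = a + 0.1 * rng.normal(size=300)
c = rng.normal(size=300)
X = np.column_stack([a, b, c])

corr = np.corrcoef(X, rowvar=False)  # pairwise correlations between columns
factors = vif(X)
print(np.round(corr, 3))
print(np.round(factors, 2))
```

A large VIF for a column means the other predictors can nearly reproduce it, which is the situation in which individual coefficients become unstable.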

July 15, 2020 · 12 min · Yiheng "Terry" Li

CS229 Problem Sets Spring 2020

CS229 Machine Learning at Stanford has been an inspiring course that built the foundation of my machine learning knowledge. I would like to record my answers to all the problem sets from the Spring 2020 quarter. My answers have two parts: Theory questions: answers were handwritten and are presented in PDF format below, with the problem-set questions included; Coding questions: written in Python and will be included in my GitHub repo....

June 28, 2020 · 1 min · Yiheng "Terry" Li

Survival Analysis Case Study Using R

In this article, we will use R to perform survival analysis on a data set, in order to demonstrate some syntax and show the procedure of survival analysis in R. The Data A classic data set is used for demonstration. Here is a glance at the data set.

    > library(survival)
    > data(lung)
    > head(lung)
      inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
    1    3  306      2  74   1       1       90       100     1175      NA
    2    3  455      2  68   1       0       90        90     1225      15
    3    3 1010      1  56   1       0       90        90       NA      15
    4    5  210      2  57   1       1       90        60     1150      11
    5    1  883      2  60   1       0      100        90       NA       0
    6   12 1022      1  74   1       1       50        80      513       0

This data set is about “survival in patients with advanced lung cancer from the North Central Cancer Treatment Group....

June 16, 2020 · 5 min · Yiheng "Terry" Li

Survival Analysis -- the Basics

What is Survival Analysis and When to Use It? Survival analysis can be generalized as time-to-event analysis, that is, analyzing the time until a certain event occurs (e.g. disease, death, or some positive event). Survival analysis gives us estimates of, for example: the time it takes before some event occurs; the probability of having experienced a certain event at a certain time point; or which factors have what effect on the time to a certain event....

June 14, 2020 · 8 min · Yiheng "Terry" Li

EM Algorithm Notes

EM And GMM The expectation-maximization (EM) algorithm provides a way to do MLE or MAP estimation when there is incomplete data or unobserved latent variables. NOTE: EM is a general way of obtaining MLE or MAP estimates, not necessarily for clustering. A Gaussian mixture model (GMM) is a statistical model that can serve as a clustering algorithm. It assumes the data points come from several Gaussian distributions and uses the EM algorithm to obtain the MLE estimates of those Gaussians....
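The way EM fits a Gaussian mixture can be sketched in a few lines for one-dimensional data. This is a minimal illustration under simplifying assumptions (a fixed number of iterations, quantile-based initialisation, no safeguards against collapsing variances), not the implementation from the notes; the function name `em_gmm_1d` and the synthetic data are hypothetical.

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture by EM (maximum likelihood)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Initialisation: means at data quantiles, shared variance, equal weights
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from each component
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        weighted = weights * dens
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments
        nk = resp.sum(axis=0)
        weights = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return weights, mu, var

# Illustrative data: two well-separated Gaussians centred at -3 and 3
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

weights, mu, var = em_gmm_1d(data, k=2)
print("weights:", np.round(weights, 3))
print("means:", np.round(mu, 3))
print("variances:", np.round(var, 3))
```

The responsibilities `resp` play the role of the unobserved cluster labels: the E-step estimates them, and the M-step uses them as weights in otherwise ordinary Gaussian estimates.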

June 1, 2020 · 5 min · Yiheng "Terry" Li

Kernel Method Note

Motivation of Kernel Method In classification, we often want a non-linear decision boundary. For example, for this problem (figure 2), we want a decision boundary that is roughly a circle, but our model only yields linear boundaries. To give our model more flexibility without changing the basic algorithm, we can apply a transformation to the feature space $X$, as in the figure on the right....
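The circular-boundary example can be sketched with an explicit feature transformation. The code below assumes a unit circle as the true boundary and a hypothetical map `feature_map` that squares each coordinate; under that map the circle $x_1^2 + x_2^2 = 1$ becomes the straight line $z_1 + z_2 = 1$. This only illustrates the motivation; a kernel method proper would avoid computing the transformed features explicitly.

```python
import numpy as np

def feature_map(X):
    """Hypothetical transformation: (x1, x2) -> (x1^2, x2^2)."""
    return X ** 2

# Illustrative data: points in a square, labelled 1 inside the unit circle
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(400, 2))
labels = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)

# In the transformed space a linear rule w . z + b > 0 describes the circle
Z = feature_map(X)
w = np.array([-1.0, -1.0])
b = 1.0
pred = (Z @ w + b > 0).astype(int)

accuracy = (pred == labels).mean()
print("accuracy of the linear rule in transformed space:", accuracy)
```

The classifier itself stays linear; only the space it operates in has changed, which is the flexibility the excerpt refers to.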

May 12, 2020 · 7 min · Yiheng "Terry" Li