Brief Walkthrough of Famous ImageNet Contenders

"Image nets" usually refers to neural networks that take in a single image (usually RGB) and output the class of the object shown in it. Many well-known image nets have been published. They were pre-trained on slightly different datasets and developed by different teams at different times, but all are widely used, not only for object classification but also in many other applications. This article will go through several famous image neural networks (AlexNet, VGG, ResNet, InceptionNet, EfficientNet)....

July 24, 2021 · 8 min · Yiheng "Terry" Li

Using HDF5 format for python file saving and loading

What are the advantages of using HDF5 for file saving and loading? I wrote about pickle and JSON before, which are Python packages for serialization. More specifically, pickle is a binary serialization format for Python objects: it saves objects to a file that is not human-readable, which can be loaded back from Python but is not sharable with other programming languages. JSON is a text serialization format that saves Python dictionaries, text, and list-like objects in a readable format....

April 21, 2021 · 4 min · Yiheng "Terry" Li
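
The contrast between pickle and JSON described in the excerpt above can be illustrated with a minimal sketch using only the standard library. The `record` and `point` values are hypothetical and made up for illustration; they do not come from the post itself.

```python
import json
import pickle

# Hypothetical sample record, made up for illustration only.
record = {"name": "sample", "values": [1, 2, 3]}

# pickle: binary bytes, Python-specific, not human-readable.
pickled = pickle.dumps(record)

# JSON: plain text, readable by people and by other languages.
as_json = json.dumps(record)

# Both formats round-trip this simple dictionary.
restored_from_pickle = pickle.loads(pickled)
restored_from_json = json.loads(as_json)

# JSON has no tuple type, so a tuple comes back as a list,
# while pickle preserves the original Python type.
point = (1, 2)
point_via_json = json.loads(json.dumps(point))
point_via_pickle = pickle.loads(pickle.dumps(point))
```

The tuple case is one small example of why the two formats are not interchangeable: JSON trades Python-specific fidelity for readability and portability.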

LSTM Walk Through

Thanks to colah’s blog for the nice illustrative pictures of LSTMs and RNNs. Recurrent neural networks (RNNs) use the same set of parameters to process sequential inputs. Inputs are usually broken into parts of the same length and fed into the RNN one after another. In this way, the model learns and preserves information from sequences of arbitrary length. This trait is very useful in natural language use cases, where a model capable of handling sentences of any length is needed....

February 6, 2021 · 5 min · Yiheng "Terry" Li

Notes About the Logics Behind the Development of Tree-Based Models

Tree-based methods contain a lot of tricks that are often tested in data/machine learning interviews but are very often mixed up. Going through these tricks while knowing the reasons behind them can be very helpful for understanding and memorization. Overview of Tree-based Methods Generally speaking, simple decision/regression trees offer better interpretability (as they can be visualized), at some cost in performance (when compared to regression with regularization and non-linear regression methods, e....

December 8, 2020 · 6 min · Yiheng "Terry" Li

Pyradiomics Simple Usage

Pyradiomics is an open-source Python package for extracting radiomics data from medical images. Image loading and preprocessing (e.g. resampling and cropping) are first done using SimpleITK. The loaded data is then converted into numpy arrays for further calculation using multiple feature classes. Optional filters are also built in. Ways to Deal with Medical Image Data The reasons for choosing pyradiomics include its openness, wide recognition, and reasonably good performance, with a variety of features available....

September 11, 2020 · 5 min · Yiheng "Terry" Li

Further Discussion of Relative Importance

This article discusses two more methods that consider not only the linear correlation of a single predictor variable with the dependent variable, but also the intertwined effects among predictors. These two methods are Commonality Analysis (CA) and Dominance Analysis (DA). They share some similarities yet differ in how they approach the analysis of relative importance. Commonality Analysis Idea Commonality Analysis is a statistical technique within multiple linear regression that decomposes a model’s R² statistic (i....

August 10, 2020 · 10 min · Yiheng "Terry" Li

Multivariate Linear Regression -- Collinearity and Feature Importance

This article discusses two problems in implementing multivariate linear regression: What should we do about collinearity in linear regression? How do we extract feature importance from a linear regression, and is it just the model's coefficients? Collinearity What is collinearity Collinearity is a condition in which some of the independent variables are highly correlated. It occurs more often when the number of predictors is very large, since there is a higher chance that some predictors are correlated with others....

July 15, 2020 · 12 min · Yiheng "Terry" Li
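
As a rough illustration of the collinearity described in the excerpt above, the sketch below measures how strongly two predictors are correlated. The toy data and the `pearson` helper are hypothetical and written for this example; they are not taken from the post. With exactly two predictors, the variance inflation factor reduces to 1 / (1 - r²), which is the form used here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy predictors, made up for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]  # roughly 2 * x1, so nearly collinear with x1
x3 = [3.0, 1.0, 4.0, 1.0, 5.0]   # much less related to x1

r12 = pearson(x1, x2)
r13 = pearson(x1, x3)

# Two-predictor case only: VIF = 1 / (1 - r^2).
vif_12 = 1.0 / (1.0 - r12 ** 2)
vif_13 = 1.0 / (1.0 - r13 ** 2)

print(f"r(x1, x2) = {r12:.3f}, VIF = {vif_12:.1f}")
print(f"r(x1, x3) = {r13:.3f}, VIF = {vif_13:.1f}")
```

A large VIF for the `x1`/`x2` pair signals that the two predictors carry almost the same information, which is the situation in which individual regression coefficients become unstable.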