Using HDF5 format for python file saving and loading

What are the advantages of using HDF5 for file saving and loading? I wrote about pickle and JSON before, both of which are available as Python modules for serialization. More specifically, pickle is a binary serialization format for Python objects: it saves objects to a file that is not human-readable, is intended to be loaded back by Python, and cannot be shared with other programming languages. JSON is a text serialization format that saves Python dictionaries, strings, and list-like objects in a readable form....

April 21, 2021 · 4 min · Yiheng "Terry" Li
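
The pickle versus JSON contrast described in the excerpt above can be illustrated with a minimal sketch that uses only the standard library. The `record` dictionary is a hypothetical sample invented for illustration, not data from the original post.

```python
import json
import pickle

# Hypothetical sample record, used only for illustration.
record = {"name": "sample", "values": [1, 2, 3], "point": (4, 5)}

# pickle: produces binary bytes and restores Python types (such as tuples) as they were.
pickled = pickle.dumps(record)
from_pickle = pickle.loads(pickled)

# JSON: produces human-readable text, but tuples come back as lists.
as_json = json.dumps(record)
from_json = json.loads(as_json)

print(type(pickled).__name__, type(as_json).__name__)
print(from_pickle["point"], from_json["point"])
```

The round trip through pickle preserves the tuple, while the JSON round trip turns it into a list. That is one practical consequence of JSON being a language-neutral text format rather than a Python-specific one.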

Pyradiomics Simple Usage

Pyradiomics is an open-source Python package for extracting radiomics features from medical images. Image loading and preprocessing (e.g. resampling and cropping) are first done using SimpleITK. The loaded data is then converted into NumPy arrays for further calculation by multiple feature classes. Optional filters are also built in.

Ways to Deal with Medical Image Data

We might choose pyradiomics for its openness, its wide recognition, and its reasonably good performance, with a variety of features available....

September 11, 2020 · 5 min · Yiheng "Terry" Li

Store Almost Any Objects of Python in Files

The pickle module implements binary protocols for serializing and de-serializing a Python object structure. We can store almost any type of Python object using pickle.

Quick Example

For example, suppose we want to save a dictionary dict_obj to a file.

    import pickle

    # Save file
    def saveFile(obj):
        # The with-block closes the file even if dump() raises
        with open('obj.pickle', 'wb') as out_file:
            pickle.dump(obj, out_file)

    # Read file
    def readFile(obj_file_name):
        with open(obj_file_name, 'rb') as file:
            return pickle.load(file)

    >>> dict_obj = {'itemA': ['item', 'A'], 'itemB': [1, 3]}
    >>> saveFile(dict_obj)
    >>> obj_file_name = 'obj....

April 15, 2020 · 2 min · Yiheng "Terry" Li

SMOTE with Python

Motivation

When working on classification problems, especially with medical data, we often face imbalanced classes: some categories have far fewer cases than others. While it might be acceptable for other machine learning model builders to overlook this, it is essential to pay attention to the imbalanced data problem in medical applications, because in most scenarios, prediction accuracy on the minority categories is far more important than on the most common classes....

April 11, 2020 · 3 min · Yiheng "Terry" Li