Tips when running ETL
1. Store all non-numerical columns in JSON format. (Data types can differ between columns, and raw text may contain spaces or blanks that break splits, etc.)
2. Test with a small sample case first, and use assert statements and try-except blocks liberally to catch as many corner cases as possible.
3. Also check that the record count stays consistent.
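
The three tips above can be sketched together in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline these notes were written for: the helper names `encode_row` and `run_etl` and the sample rows are hypothetical, and "numerical" is taken to mean int or float.

```python
import json


def encode_row(row):
    """JSON-encode every non-numerical value so embedded blanks survive later splits."""
    encoded = {}
    for key, value in row.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        encoded[key] = value if is_number else json.dumps(value)
    return encoded


def run_etl(rows):
    """Encode rows, collecting failures instead of stopping at the first one."""
    good, bad = [], []
    for row in rows:
        try:
            out = encode_row(row)
            assert set(out) == set(row), "column set changed"
            good.append(out)
        except (TypeError, ValueError, AssertionError) as exc:
            bad.append((row, repr(exc)))
    # Record-count check: every input row must be accounted for.
    assert len(good) + len(bad) == len(rows), "record count mismatch"
    return good, bad


# Small sample case (hypothetical data) covering a few corner cases.
sample = [
    {"id": 1, "name": "Ada Lovelace", "tags": ["a b", "c"]},
    {"id": 2, "name": "  padded  ", "tags": []},
    {"id": 3, "name": {1, 2}, "tags": None},  # a set is not JSON-serializable
]
good, bad = run_etl(sample)
```

Decoding with `json.loads` restores the original value, including leading and trailing blanks, which is the main reason for encoding text columns before any delimiter-based split.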
Tips in MySQL
Store all non-numerical columns in JSON format.
Global variables in Spark
A corner case in which a global variable does not work in Spark.
PDB generator
A script to generate a valid PDB file from the coordinates.
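
As a rough idea of what such a generator involves, the sketch below writes fixed-width ATOM records following the column layout of the PDB format. It is not the script referred to above: the function names `format_atom_line` and `coordinates_to_pdb` are hypothetical, and it assumes, purely for illustration, that every coordinate is a CA atom of its own glycine residue on chain A.

```python
def format_atom_line(serial, name, res_name, chain_id, res_seq,
                     x, y, z, element, occupancy=1.0, temp_factor=0.0):
    """Format one fixed-width PDB ATOM record, padded to 80 columns."""
    # Atom names shorter than 4 characters conventionally start in column 14.
    padded_name = name if len(name) == 4 else (" " + name).ljust(4)
    line = (
        f"ATOM  {serial:5d} {padded_name}"                 # columns 1-16
        f" {res_name:>3s} {chain_id:1s}{res_seq:4d}    "   # columns 17-30
        f"{x:8.3f}{y:8.3f}{z:8.3f}"                        # columns 31-54
        f"{occupancy:6.2f}{temp_factor:6.2f}"              # columns 55-66
        f"          {element:>2s}"                         # columns 67-78
    )
    return line.ljust(80)


def coordinates_to_pdb(coords, atom_name="CA", res_name="GLY",
                       chain_id="A", element="C"):
    """Turn a list of (x, y, z) tuples into PDB text, one residue per point."""
    lines = [
        format_atom_line(i, atom_name, res_name, chain_id, i, x, y, z, element)
        for i, (x, y, z) in enumerate(coords, start=1)
    ]
    lines.append("END")
    return "\n".join(lines) + "\n"


pdb_text = coordinates_to_pdb([(1.0, 2.0, 3.0), (-4.5, 0.25, 10.125)])
```

The fixed widths are what keep the file valid for most readers: coordinates outside the range that fits in 8 characters, or more than 99999 atoms, would overflow their columns and need separate handling.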
Pydicom & Dicom
Some differences between pydicom and dicom.