Tips when running ETL

1. Set all non-numerical columns to JSON format. (Their data types differ, and spaces or blank values can break splits, etc.)

2. Test with a mini case, but use assert and try-except liberally to catch as many corner cases as possible.

3. Also check that the record count is consistent.
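
A minimal sketch of tips 2 and 3 in Python. The `transform` and `run_etl` functions, the row fields `id` and `name`, and the rule that "consistent" means every input row ends up either loaded or rejected are all assumptions made for illustration, not part of any particular pipeline.

```python
import json


def transform(row):
    """Hypothetical transform: cast the id and store the text field as JSON."""
    assert "id" in row, "every row needs an id"
    # json.dumps keeps spaces, empty strings and None unambiguous.
    return {"id": int(row["id"]), "payload": json.dumps(row.get("name"))}


def run_etl(rows):
    """Run transform on each row, collecting failures instead of crashing."""
    good, bad = [], []
    for row in rows:
        try:
            good.append(transform(row))
        except (AssertionError, ValueError, TypeError) as exc:
            bad.append((row, repr(exc)))
    # Record-count check: no row may silently disappear.
    assert len(good) + len(bad) == len(rows), "record count mismatch"
    return good, bad


# Mini case: one clean row, one bad id, one missing id.
mini_case = [{"id": "1", "name": "a b"}, {"id": "x", "name": ""}, {"name": None}]
good, bad = run_etl(mini_case)
print(len(good), "loaded,", len(bad), "rejected")
```

Note that `assert` statements are skipped when Python runs with `-O`, so checks that must always run in production are better written as explicit `if ... raise`.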

Tips in MySQL

Set all non-numerical columns to JSON format.

Global variables in Spark

A corner case in which a global variable doesn't work in Spark.

PDB generator

A script to generate a valid PDB file from the coordinates.
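
As a rough illustration of what such a script has to do, the sketch below formats coordinates into fixed-width ATOM records, assuming "PDB" means the Protein Data Bank format. The function names `atom_line` and `coords_to_pdb`, and the defaults (every point written as a CA atom of a GLY residue on chain A, occupancy 1.00, B-factor 0.00), are assumptions for illustration and not the original script.

```python
def atom_line(serial, name, res_name, chain, res_seq, x, y, z, element):
    """Format one fixed-width ATOM record (occupancy 1.00, B-factor 0.00)."""
    # By convention, names of one-letter elements start in column 14.
    if len(element) == 1 and len(name) < 4:
        name_field = f" {name:<3}"
    else:
        name_field = f"{name:<4}"
    return (
        f"{'ATOM':<6}{serial:>5} {name_field} {res_name:>3} {chain}"
        f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"
        f"{1.0:6.2f}{0.0:6.2f}          {element:>2}"
    )


def coords_to_pdb(coords, name="CA", res_name="GLY", chain="A", element="C"):
    """Turn a list of (x, y, z) tuples into PDB text, one residue per point."""
    lines = [
        atom_line(i, name, res_name, chain, i, x, y, z, element)
        for i, (x, y, z) in enumerate(coords, start=1)
    ]
    lines.append("END")
    return "\n".join(lines) + "\n"


pdb_text = coords_to_pdb([(1.0, 2.0, 3.0), (4.5, -5.25, 6.125)])
print(pdb_text, end="")
```

The column positions matter more than the values: readers of the format locate fields by character offset, so a coordinate that is shifted by one column can be misread or rejected.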

Pydicom & Dicom

Some differences between pydicom and dicom.


Copyright © 2016 - 2019, Long Wang. All rights reserved.