This is a log of my pledge to code or study machine learning for a minimum of 1 hour every day for the next 100 days.

Based on Siraj Raval’s 100 Days of ML Code Challenge.

### Day 1: Nov 19, 2018

**Progress:** Continued reading about assessing model accuracy in *An Introduction to Statistical Learning*.

**Thoughts:** Good model performance requires a method with both low variance and low squared bias. There is a trade-off to manage: it is easy to get low bias with high variance, or low variance with high bias. The challenge is finding a method where both are low.
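
For reference, here is the decomposition behind this trade-off, as I understand it from the book. The symbols are $x_0$ for a test point, $y_0$ for its response, $\hat{f}$ for the fitted model, and $\epsilon$ for the irreducible noise:

$$
E\big[(y_0 - \hat{f}(x_0))^2\big] = \mathrm{Var}\big(\hat{f}(x_0)\big) + \big[\mathrm{Bias}\big(\hat{f}(x_0)\big)\big]^2 + \mathrm{Var}(\epsilon)
$$

Variance and squared bias are both non-negative, so the expected test error cannot drop below $\mathrm{Var}(\epsilon)$. Flexible methods tend to shrink the bias term while inflating the variance term, and rigid methods tend to do the opposite.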

### Day 2: Nov 20, 2018

**Progress:** Worked on code for the Kaggle competition *Google Analytics Customer Revenue Prediction*.

**Thoughts:** The competition updated its data sets. The training set alone is now 24 GB in size. I will alter the data processing to accommodate the large file size. My computer can’t
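
One way to handle a file that does not fit in memory is to stream it in chunks and keep only running aggregates. The sketch below is an illustration, not the processing I actually used: the helper names, the column names (`visitor_id`, `revenue`), and the tiny in-memory sample are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any row iterator."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def total_by_key(file_obj, key_col, value_col, chunk_size=100_000):
    """Sum value_col per key_col while holding only one chunk in memory."""
    totals = defaultdict(float)
    reader = csv.DictReader(file_obj)
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            # Treat an empty cell as zero.
            totals[row[key_col]] += float(row[value_col] or 0)
    return dict(totals)


# Tiny in-memory stand-in for the real file; column names are hypothetical.
sample = io.StringIO("visitor_id,revenue\na,1.5\nb,2.0\na,0.5\nb,\n")
totals = total_by_key(sample, "visitor_id", "revenue", chunk_size=2)
```

With pandas, `read_csv` accepts a `chunksize` argument that returns an iterator of DataFrames, which supports the same pattern.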

### Day 3: Nov 21, 2018

**Progress:** Watched *How to become a Kaggle #1: An introduction to model stacking* by Marios Michailidis.

**Thoughts:** The talk covered advanced methods for building winning Kaggle models: deliberately creating hundreds of models whose predictions *feed* additional models. With the emphasis on predictive power, I wonder what the trade-off in model interpretability is with this ensemble method.
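
To make the idea of models feeding other models concrete, here is a minimal two-level stacking sketch with NumPy. It is my own illustration, not the method from the talk: the synthetic data, the choice of least-squares base models, and the feature subsets are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (an assumption made purely for illustration).
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict with coefficients produced by fit_linear."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


# Level 0: each base model sees a different subset of the features.
feature_sets = [[0], [1, 2]]
folds = np.array_split(rng.permutation(len(X)), 5)

# Out-of-fold predictions: every row is predicted by a model that never saw it.
oof = np.zeros((len(X), len(feature_sets)))
for held_out in folds:
    train = np.setdiff1d(np.arange(len(X)), held_out)
    for j, cols in enumerate(feature_sets):
        coef = fit_linear(X[train][:, cols], y[train])
        oof[held_out, j] = predict_linear(coef, X[held_out][:, cols])

# Level 1: the meta-model learns how to combine the base predictions.
meta_coef = fit_linear(oof, y)
stacked_pred = predict_linear(meta_coef, oof)

# Note: stacked_mse is measured on the rows the meta-model was fit on.
base_mse = [float(np.mean((oof[:, j] - y) ** 2)) for j in range(oof.shape[1])]
stacked_mse = float(np.mean((stacked_pred - y) ** 2))
```

The interpretability cost shows up even in this toy version: the final prediction is a weighted blend of other models' outputs, so explaining it means explaining every layer.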

### Day 4: Nov 22, 2018

**Progress:** Performed data analysis on a survey dataset that asks people about their favorite Thanksgiving dishes.

**Thoughts:** I find data formatting and writing up the analysis to be the most time-consuming parts of the data workflow. Formatting takes time because you have to think about how you are going to use the data and whether it needs to be transformed more than once (e.g., for EDA versus modeling). Writing the analysis takes time because you have to keep relating what you say back to the objective while keeping the language clear and easy for the reader to understand.

**Codebase:** *Coming soon*

### Day 5: Nov 23, 2018

**Progress:** Completed the data analysis on a survey dataset that asks people about their favorite Thanksgiving dishes.

**Thoughts:** I put extra effort into communicating the results of the analysis with graphics. I think data storytelling is important for engaging the reader.

**Codebase:** *Coming soon*

### Day 6: Nov 24, 2018

**Progress:** Worked on code for the Kaggle competition *Google Analytics Customer Revenue Prediction*.

### Day 7: Nov 25, 2018

**Progress:** Worked on code for the Kaggle competition *Google Analytics Customer Revenue Prediction*.

### Day 8: Nov 26, 2018

**Progress:** Worked on code for the Kaggle competition *Google Analytics Customer Revenue Prediction*.

**Thoughts:** I have decided to stop working on this Kaggle competition. There was an issue with people using public data to assist with prediction. A new dataset was published, but unfortunately it has unresolved issues. One of the biggest is that two columns have almost identical target variable names.

### Day 9: Nov 27, 2018

**Progress:** Linear algebra for machine learning refresher.

**Thoughts:** Linear algebra is the study of vectors, vector spaces, and linear mappings between vector spaces.
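
As a small illustration of a mapping between vector spaces, the sketch below treats a 2×3 matrix as a map from R^3 to R^2 and checks the defining linearity property, f(a·u + b·v) = a·f(u) + b·f(v). The matrix, vectors, and the `matvec` helper are arbitrary choices of mine, with integers used so the arithmetic is exact.

```python
def matvec(A, x):
    """Apply the linear map represented by matrix A (a list of rows) to vector x."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)


A = [[1, 0, 2],
     [0, 3, -1]]  # maps R^3 -> R^2
u, v = (1, 2, 3), (4, 0, -1)
a, b = 2, -3

# Left side: combine the vectors first, then apply the map.
combo = tuple(a * ui + b * vi for ui, vi in zip(u, v))
lhs = matvec(A, combo)

# Right side: apply the map to each vector, then combine the results.
rhs = tuple(a * p + b * q for p, q in zip(matvec(A, u), matvec(A, v)))
```

Both sides come out to the same vector, which is what it means for the map to be linear.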

### Day 10: Nov 28, 2018

**Progress:** Linear algebra for machine learning refresher.

**Thoughts:** Vector addition is both associative and commutative. When you add multiple vectors, neither the grouping nor the ordering matters.
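
A quick check of these properties, using integer vectors so the arithmetic is exact. Associativity is about how the additions are grouped, and commutativity is about the order of the operands. The `add` helper and the example vectors are my own.

```python
def add(u, v):
    """Component-wise sum of two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return tuple(a + b for a, b in zip(u, v))


u, v, w = (1, 2), (3, -1), (-4, 5)

# Associativity: (u + v) + w equals u + (v + w).
assoc_left = add(add(u, v), w)
assoc_right = add(u, add(v, w))

# Commutativity: u + v equals v + u.
comm_left = add(u, v)
comm_right = add(v, u)
```

With floating-point components the two groupings can differ by rounding error, which is why the example sticks to integers.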

### Day 11: Nov 29, 2018

**Progress:** Linear algebra for machine learning refresher.

### Day 12: Nov 30, 2018

**Progress:** Read about the k-means clustering algorithm.

**Thoughts:** The k-means clustering algorithm groups similar points together based on Euclidean distance: each point is assigned to the cluster whose centroid is nearest.
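
Here is a minimal sketch of the standard assign-then-update loop (Lloyd's algorithm) in plain Python. Starting the centroids at the first k points is a simplification I made to keep the example deterministic; random or k-means++ initialization is more common in practice. The sample points are made up.

```python
import math


def kmeans(points, k, n_iter=100):
    """Cluster points with Lloyd's algorithm and Euclidean distance."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
          (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```

On this toy data the points near the origin end up in one cluster and the points near (10, 10) in the other.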