Hi Data Friends,

A few people have been raving about the “Machine Learning A-Z™: Hands-On Python & R In Data Science” course on Udemy:

https://www.udemy.com/machinelearning/

I thought I’d check it out over the weekend, but it is a massive course: we are talking 41 hours of video, so at the moment I am only 51% through it. I’ll continue the review if I have some time next weekend.

Here’s what I have covered so far:

Part 1 Data Preprocessing

Handling missing data

Categorical data

Splitting into train and test datasets

Feature scaling

Part 2 Regression

Simple linear regression

Multiple linear regression

Polynomial regression

Support Vector Regression

Decision Tree Regression

Random Forest Regression

Evaluating Regression Model Performance

Part 3 Classification

Logistic Regression

K-Nearest Neighbors (KNN)

Support Vector Machine

Kernel SVM

Naive Bayes

Decision Tree Classification

Random Forest Classification

Evaluating Classification Model Performance

Part 10 Model Selection

Model Selection

Firstly, how am I able to get through 20 hours of videos on the weekend?

Simple: I get up early and do a few hours before everyone else is up. You can paste the following JavaScript snippet into your browser’s console to speed up the videos:

document.getElementsByTagName("video")[0].playbackRate = x

My x is usually 3, but sometimes I go up to 4 if I am familiar with a concept. Say I’m averaging about 3.2x: 20 hours of video then takes about 6 hours and 15 minutes.

Should you buy this course?

So far I think it is insanely good value, and yes, you should definitely buy it.

I don’t get any commissions, kickbacks or anything at all for recommending it to you; I just think it is great.

Pros

It takes an almost “cheat sheet” approach, quickly covering all the algorithms you’d need.

The course is for the practitioner, so it takes a no-nonsense approach to getting the algorithms working quickly. There’s no heavy math; it aims to be practical.

The explanation and intuition videos are short, well illustrated and to the point.

They plot the decision boundaries of the algorithms they implement. This is a great way to see what the algorithms are doing, how they differ, and how issues like over-fitting come into play.

Python and R are covered in the course.

You could apply what you learn to problems at work.

Suggestions

The course has been great so far, but here are a few small things I made a note of.

For me, some of the code practices leave a bit to be desired. I imagine teaching good coding practices as well as the algorithms might have been a bit too much. For instance, rather than copy/pasting the same code for the training and test datasets, in practice you’d write a function and pass the train and test datasets as arguments to it.

https://google.github.io/styleguide/Rguide.xml#functiondefinition

http://columbia-applied-data-science.github.io/pages/lowclass-python-style-guide.html
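As a rough sketch of what I mean (the dataset and column names here are invented, not from the course), the repeated scaling code can become one function applied to both splits:

```python
import pandas as pd

def scale_features(df, means, stds):
    """Standardise columns using statistics fitted on the training set."""
    return (df - means) / stds

# Toy data standing in for the course's datasets
train = pd.DataFrame({"age": [25, 32, 47], "salary": [48000, 54000, 61000]})
test = pd.DataFrame({"age": [38], "salary": [52000]})

# Fit the statistics on the training data only, then reuse them for both
# splits -- the test set must never influence the scaling parameters.
means, stds = train.mean(), train.std()
train_scaled = scale_features(train, means, stds)
test_scaled = scale_features(test, means, stds)
```

One function, called twice, instead of the same block pasted twice; and it makes the train-only fitting of the statistics explicit.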

It would have been great to have a few words about documentation, such as using R Markdown or Jupyter notebooks. Maybe not even a video; just a link or two would be good.

https://rmarkdown.rstudio.com/lesson-15.html

https://www.datacamp.com/community/blog/jupyter-notebook-cheat-sheet

There is a section on optimizing model hyperparameters at the end of the course, in Part 10. I’d really like to see it moved up to just after the regression section. It would be a shame if someone abandoned the course without learning about cross validation and model tuning.

It would be great if there were more exercises, or different datasets to work on. I’m not a fan of “follow-along” coding as a way to learn; I aim to grasp the concepts as quickly as I can and then work on exercises or new problems.

Preprocessing datasets is such a fundamental part of data science, but doing the course you get the feeling it is a non-event, whereas in practice it can take a great deal of time, expertise and thought to wrangle a dataset into shape for model development.

http://caret.r-forge.r-project.org/

https://pandas.pydata.org/pandas-docs/stable/
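Even the two chores the course does cover, missing values and categorical encoding, hide a lot of decisions. A minimal pandas sketch (the toy data is mine, not the course’s):

```python
import pandas as pd

# Invented toy dataset with a missing category and a missing number
df = pd.DataFrame({
    "country": ["France", "Spain", None, "France"],
    "age": [44.0, None, 30.0, 38.0],
})

# Fill missing numeric values with the column mean (one of several
# reasonable strategies; median or model-based imputation are others)
df["age"] = df["age"].fillna(df["age"].mean())

# Fill missing categories with the most frequent value, then one-hot encode
df["country"] = df["country"].fillna(df["country"].mode()[0])
encoded = pd.get_dummies(df, columns=["country"])
```

Each of those choices (mean vs. median, mode vs. a dedicated “missing” category) can change the model downstream, which is exactly why this step deserves more airtime.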

Notably missing were sections on the ridge and lasso regression techniques; they were mentioned but not really covered. I understand that there is only so much you can pack into one course. If you are curious, head here:

https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about

If the section on model selection appeared earlier, perhaps an exercise could be to tune each model with caret in R or GridSearchCV in Python.
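On the Python side, such an exercise might look something like this sketch (using the iris data as a stand-in, since the course’s own datasets aren’t to hand):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Exhaustively search a small hyperparameter grid with 5-fold
# cross validation; each candidate is scored on held-out folds.
grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```

Repeating this for each classifier in the course would make the differences between the algorithms, and the effect of their hyperparameters, much more concrete.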

Exploratory data analysis is such a crucial step in the data science pipeline, yet unfortunately there was very little EDA in the course:

http://r4ds.had.co.nz/exploratory-data-analysis.html

https://www.datacamp.com/community/tutorials/exploratory-data-analysis-python
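Even a first-look pass like the one below (again sketched on iris, not a course dataset) surfaces most of what you need to know before modelling:

```python
from sklearn.datasets import load_iris

# Load iris as a DataFrame for a quick first look
df = load_iris(as_frame=True).frame

print(df.describe())                 # summary statistics per column
print(df.isna().sum())               # missing values per column
print(df["target"].value_counts())   # class balance
print(df.corr())                     # pairwise correlations
```

Four lines of output that tell you about scale, missingness, class imbalance and collinearity, all of which bear directly on which preprocessing and which models make sense.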

Stay tuned for Part 2.
