Machine Learning A-Z™: Hands-On Python & R In Data Science Part 1

September 30, 2018

Hi Data Friends,

 

A few people have been raving about the “Machine Learning A-Z™: Hands-On Python & R In Data Science” course available on Udemy:

 

https://www.udemy.com/machinelearning/

 

I thought I’d check it out over the weekend, but it is a massive course: we are talking 41 hours of video, so at the moment I am only 51% of the way through. I’ll continue the review if I have some time next weekend.

 

Here’s what I have covered so far:

 

  • Part 1 Data Preprocessing

    • Handling missing data

    • Categorical data

    • Splitting into train and test datasets

    • Feature scaling

 

  • Part 2 Regression

    • Simple linear regression

    • Multiple linear regression

    • Polynomial regression

    • Support Vector Regression

    • Decision Tree Regression

    • Random Forest Regression

    • Evaluating Regression Model Performance

 

  • Part 3 Classification

    • Logistic Regression

    • K-Nearest Neighbors (KNN)

    • Support Vector Machine

    • Kernel SVM

    • Naive Bayes

    • Decision Tree Classification

    • Random Forest Classification

    • Evaluating Classification Model Performance

 

  • Part 10 Model Selection

    • Model Selection
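As a taste of Part 1, the train/test split and feature-scaling steps come down to a few lines of Python (a sketch using scikit-learn with made-up toy numbers, not the course’s dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix and target for illustration
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]])
y = np.array([0, 0, 1, 1])

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

Fitting the scaler on the training set only, and reusing those statistics on the test set, is the part worth internalizing.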

 

Firstly, how am I able to get through 20 hours of video in a weekend?

 

Simple: I get up early and do a few hours before everyone else is up. You can paste the following JavaScript snippet into your browser’s developer console to speed up the videos:

 

document.getElementsByTagName("video")[0].playbackRate = x

 

My x is usually 3, but sometimes I go up to 4 if I am familiar with a concept. So say I’m averaging about 3.2x: 20 hours of video then takes about 6 hours and 15 minutes (20 / 3.2 = 6.25 hours).
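The back-of-the-envelope figure checks out:

```python
# 20 hours of video, watched at an average of 3.2x speed
total_minutes = round(20 * 60 / 3.2)
hours, minutes = divmod(total_minutes, 60)
print(hours, minutes)  # 6 hours 15 minutes
```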

 

Should you buy this course?

 

So far I think it is insanely good value, and yes, you should definitely buy it.

 

I don’t get any commissions, kickbacks or anything at all for recommending it to you; I just think it is great.

 

Pros

 

  • It takes an almost “cheat sheet” approach, quickly covering all the algorithms you’d need

 

  • The course is for the practitioner, so it takes a no-nonsense approach to getting the algorithms working quickly. There’s no heavy math; it aims to be practical

 

  • The videos offering explanations and intuition are short, well illustrated and to the point.

 

  • They plot decision boundaries of the algorithms they implement. This is a great way to see what the algorithms are doing, how they differ, and how issues like over-fitting come into play.

 

  • Python and R are covered in the course.

 

  • You could apply what you learn to problems at work.

 

Suggestions

 

The course has been great, but here are a few small things I made a note of.

 

  • For me, some of the code practices leave a bit to be desired. I imagine teaching good code practices as well as the algorithms might have been a bit too much. For instance, rather than copy/pasting the same code for the training and test datasets, in practice you’d write a function and pass the train and test datasets as arguments to it.

 

https://google.github.io/styleguide/Rguide.xml#functiondefinition

 

http://columbia-applied-data-science.github.io/pages/lowclass-python-style-guide.html

 

  • It would have been great to have a few words about documentation, such as the use of R Markdown or Jupyter notebooks. Maybe not even a video; just a link or two would be good.

 

https://rmarkdown.rstudio.com/lesson-15.html

 

https://www.datacamp.com/community/blog/jupyter-notebook-cheat-sheet

 

  • There is a section on optimizing model hyperparameters at the end of the course, in Section 10. I’d really like to see this moved up to just after the regression section; it would be a real shame if someone abandoned the course without learning about cross-validation and model tuning.

 

  • It would be great if there were more exercises, or different datasets to work on. I’m not a fan of “follow-along” coding as a way to learn; I aim to grasp the concepts as quickly as I can and then work on exercises or new problems.

 

  • Preprocessing of datasets is such a fundamental part of data science, but doing the course you get the feeling it is a non-event, whereas in practice it can take a great deal of time, expertise and thought to wrangle a dataset into shape for model development.

 

http://caret.r-forge.r-project.org/

 

https://pandas.pydata.org/pandas-docs/stable/
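For a flavour, a tiny pandas sketch of that kind of wrangling (the DataFrame and column names are invented for illustration):

```python
import pandas as pd

# Invented toy data with a missing value and a categorical column
df = pd.DataFrame({
    "age": [25, None, 40],
    "country": ["France", "Spain", "France"],
    "purchased": [0, 1, 1],
})

# Fill missing numeric values with the column mean
df["age"] = df["age"].fillna(df["age"].mean())

# One-hot encode the categorical column
df = pd.get_dummies(df, columns=["country"])
```

Real datasets need many more decisions than this (outliers, leakage, inconsistent codings), which is exactly why the step deserves more airtime.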

 

  • Notably missing were sections on the ridge and lasso regression techniques; they were mentioned but not really covered. I understand that there is only so much you can pack into one course. If you are curious, head here:

 

https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about
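For reference, both techniques are available in scikit-learn; a minimal sketch on made-up toy data (the alpha values here are arbitrary, not recommendations):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Toy data for illustration
X = np.array([[1.0, 0.1], [2.0, 0.2], [3.0, 0.3], [4.0, 0.4]])
y = np.array([2.0, 4.0, 6.0, 8.0])

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty can zero coefficients out
```

In practice you would tune alpha with cross-validation rather than pick it by hand.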

 

  • If the section on model selection appeared earlier, perhaps an exercise could be to tune each model with caret in R or GridSearchCV in Python.
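In Python that exercise might look something like this (a sketch using scikit-learn’s built-in iris dataset and an SVM, both my own choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 5-fold cross-validated search over a small hyperparameter grid
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # best combination found by cross-validation
```

Repeating this for each classifier in the course would make a nice capstone exercise.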

 

  • Exploratory data analysis is such a crucial step in the data science pipeline, but unfortunately there was very little in the way of EDA in the course:

 

http://r4ds.had.co.nz/exploratory-data-analysis.html

 

https://www.datacamp.com/community/tutorials/exploratory-data-analysis-python
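Even a few lines of pandas go a long way as a first look at a dataset (a sketch on an invented DataFrame):

```python
import pandas as pd

df = pd.DataFrame({
    "salary": [45000, 52000, 61000, 58000],
    "department": ["sales", "sales", "engineering", "engineering"],
})

print(df.describe())                              # summary stats for numeric columns
print(df["department"].value_counts())            # frequency of each category
print(df.groupby("department")["salary"].mean())  # per-group averages
```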

 

Stay tuned for Part 2.

©2018 BY DATAFRIENDS.