Machine Learning A-Z™: Hands-On Python & R In Data Science Part 2

Hi Data Friends!

So, I was able to finish the second half of the course. I missed a week when my folks visited and we went on a holiday.

I was at 51% before, but powered through the rest on the weekend.

What I covered

  • Part 4: Clustering
      • K-Means Clustering
      • Hierarchical Clustering

  • Part 5: Association Rule Learning
      • Apriori
      • Eclat

  • Part 6: Reinforcement Learning
      • Upper Confidence Bound (UCB)
      • Thompson Sampling

  • Part 7: Natural Language Processing

  • Part 8: Deep Learning
      • Artificial Neural Networks
      • Convolutional Neural Networks

  • Part 9: Dimensionality Reduction
      • Principal Component Analysis
      • Linear Discriminant Analysis
      • Kernel PCA

  • Part 10: Model Selection and Boosting
      • XGBoost

What did I get out of it?

So, as a data scientist who has been doing this stuff for a while, was the course worth doing? Absolutely yes! When you work for a company you tend to focus on your problem domain and become an expert in a niche area. A few years ago I doubt anyone spent more time than I did building credit risk models for loan decisions, using bank statement transactional data as features. Natural language processing of bank statement data was my world, which is fine, but it is a very niche focus.

This course has been brilliant for taking a helicopter view of the different techniques available to data scientists. There are some I had never encountered before, and some I had completely forgotten about. So I feel that I once again have a set of tools at my fingertips to work on real problems.

Not wanting to preach here, or get philosophical, but the whole "learn, unlearn, relearn" idea comes to mind. So, although this is a beginner's course, I think even people who have been in the industry for a while can benefit. If all you are doing is building regression models (and there is nothing wrong with that; plenty of people make careers out of regression), you might want to look at the course as a quick survey of other techniques.

What were the highlights?

  • I have briefly worked in online advertising, and I encountered the dreaded cold-start problem: you really don't know which ad to show a customer when there is insufficient history for that ad. The Upper Confidence Bound / Thompson Sampling approach would have saved me a lot of pain back in the day. I think we eventually settled on some kind of t-test to work out whether an advertisement had been seen by a sufficient audience.

  • The intuitive explanation of CNNs, especially the four steps, was great: 1) convolution, 2) pooling, 3) flattening, 4) full connection. Again, I wish I had seen these videos before I jumped into the Chollet book. The explanations of the techniques before the practical videos were all great.

  • The final practical lectures on XGBoost were brilliant; they brought everything together in one place: the technique, cross-validation, and parameter tuning as an exercise. Fantastic stuff!
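To make the first highlight concrete, here is a minimal sketch of the UCB idea for the ad cold-start problem, using hypothetical click-through rates (the ad indices, round count, and CTR values are all made up for illustration):

```python
import math
import random

def ucb_select(counts, rewards, t):
    """Pick the ad with the highest upper confidence bound at round t.

    counts[i]  -- times ad i has been shown so far
    rewards[i] -- total clicks ad i has received so far
    """
    best_ad, best_bound = 0, float("-inf")
    for i in range(len(counts)):
        if counts[i] == 0:
            return i  # show every ad at least once before trusting the bounds
        mean = rewards[i] / counts[i]
        # Exploration bonus shrinks as an ad accumulates history
        bound = mean + math.sqrt(2 * math.log(t) / counts[i])
        if bound > best_bound:
            best_ad, best_bound = i, bound
    return best_ad

# Simulate 3 ads with hypothetical true click-through rates
random.seed(42)
true_ctr = [0.05, 0.12, 0.20]
counts = [0] * 3
rewards = [0] * 3
for t in range(1, 5001):
    ad = ucb_select(counts, rewards, t)
    counts[ad] += 1
    rewards[ad] += 1 if random.random() < true_ctr[ad] else 0
```

After a few thousand rounds the best ad (index 2 here) ends up with the bulk of the impressions, without ever needing an up-front audience-size test.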
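The four CNN steps from the second highlight can be sketched with plain NumPy (toy 8x8 "image", random kernel, and hypothetical dense weights — just to show the shapes at each stage, not a trained network):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))          # toy grayscale "image"
kernel = rng.random((3, 3))         # one 3x3 feature detector

# 1) Convolution: slide the kernel over the image (valid padding -> 6x6)
conv = np.zeros((6, 6))
for i in range(6):
    for j in range(6):
        conv[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)
conv = np.maximum(conv, 0)          # ReLU non-linearity

# 2) Pooling: take the max of each 2x2 window -> 3x3
pooled = conv.reshape(3, 2, 3, 2).max(axis=(1, 3))

# 3) Flattening: unroll the feature map into a vector of length 9
flat = pooled.reshape(-1)

# 4) Full connection: one dense layer with hypothetical weights
weights = rng.random((9, 2))
logits = flat @ weights             # two class scores
```

In the course (and in Keras) each of these steps maps onto a layer — Conv2D, MaxPooling2D, Flatten, Dense — but seeing the raw array operations once makes the layer names much less mysterious.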
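The workflow from the XGBoost highlight — fit, cross-validate, grid-search the parameters — looks roughly like this. The course uses the XGBoost library; since its scikit-learn wrapper follows the same pattern, this sketch uses scikit-learn's built-in GradientBoostingClassifier as a stand-in (swap in xgboost.XGBClassifier if you have it installed), on a synthetic dataset rather than the course's data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic dataset standing in for the course data
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# k-fold cross-validation to estimate accuracy
model = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(model, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Grid search over a couple of boosting parameters
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    cv=3,
)
grid.fit(X, y)
print("best params:", grid.best_params_)
```

Cross-validated scores plus a parameter grid is exactly the "bring it all together" exercise the final lectures walk through.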

Should you buy the course?

For the price point, the value the course gives you is just nuts. I'm keen to take a look at some of the other courses these guys have. I had flicked through this one as a reference before, but taking the time to go end to end with it sure has been fun.

I would recommend beginners go through something like this course to get across the techniques; I don't think diving into the maths from day 1 is a great idea for most people. The gap some people have mentioned is insufficient Python / R experience, so maybe take an introduction to Python or R course first and then hit up this one. It would pay to know how to work with either pandas DataFrames or R's data.frame.

A couple of other courses I have are Deep Learning A-Z and Artificial Intelligence A-Z. I’ll take a good look at them soon.