Machine Learning: Don't Focus on the Icing on the Top

Many years ago I was consulting, building credit and risk models for banks and lenders in Australia. I remember we used to receive perfectly cut data sets ready for model development from the client. I was incredulous. It was written in the contract that the client would provide the dataset, a data dictionary and a performance/outcome variable for us to model.

"I don't understand: 90% of the work has already been done for us. Why don't they just fit the model in-house?" I asked my boss one day.

"I literally have no idea at all," he told me.

Now, many years older, I see junior and aspiring data scientists making the same mistake. The reality is that the bulk of the work is in:

  • understanding the business problem

  • speccing the solutions

  • whiteboarding different scenarios: how to cut the data, what will solve the problem, and how best to implement the solution

  • cutting the data set and preparing the variables for model development

  • exploratory data analysis

  • prototype model development

  • and then the actual model development is just the icing on the cake.

I have seen PhDs and brilliant people who lacked a basic (and I mean BASIC) level of data manipulation skills. Don't be that guy/gal. Don't focus your study on the 'icing on the cake'. Learn how to bake the whole cake.

Once you give enough thought to the other components of the project, all the rest follows. As one of my buddies says (speaking about full-stack dev):

"You need to front-load the thought in the spec/whiteboard stage before you write a line of code."