80/20 Data Science

Hi Data Friends,

As someone who has worked in analytics for a long time, it frustrates me that unrealistic expectations are placed on new data scientists. Many people find it hard to get that first start simply because the expectations of the job are crazy!

To put it in perspective: as a guy with 10+ years in the field, I probably wouldn’t meet the expectations of half the grad jobs I see, and I have some serious game!

So, let's get real about credentials for new data scientists. By “getting real” I mean asking what matters in data science right now: what is the everyday bread and butter of the job? Deep Learning will come, but it isn’t here yet. So, let’s just think about the 80/20 rule.

  • What are the 20% of skills that are going to get you the 80% of jobs?

  • What are the 80% of companies that would need these skills?

I know there are some amazing places doing computer vision, self-driving cars, etc. (20% or less), but to me the other 80% of companies need this:

1. You know a bit about the command line:

Something like this should be fine to get you started:

The Learn Enough Society – Learn Enough Command Line to be Dangerous

2. You know a bit about Git

Check out this course from Coursera; it should be good!

Coursera – Version Control with Git

3. You know a bit about SQL, because even in 2018 most of the data you will see is in a database and you'll access it via SQL:

Coursera – SQL for Data Science
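
To give a feel for the level of SQL I have in mind, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The `orders` table and its columns are made up for the example; a real company database will look different, but the filter, group, aggregate, sort pattern is the part worth getting comfortable with.

```python
import sqlite3

# In-memory database so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table, invented purely for illustration.
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 12.5), ("carol", 8.0)],
)

# The bread-and-butter pattern: filter, group, aggregate, sort.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE amount > 10
    GROUP BY customer
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)
```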

4. You can read data into R or Python

a) Reading data into R:

Datacamp – R Data Import Tutorial

b) Reading data into Python:

Linear Data Blog – How do you import data into Python?
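
For the Python side, a small sketch of what "reading data in" usually looks like, assuming pandas is installed. The CSV content here is invented and held in memory so the example runs on its own; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Stand-in for a real file; with a file on disk you would pass its path instead.
csv_text = io.StringIO(
    "date,region,sales\n"
    "2018-01-01,north,120\n"
    "2018-01-02,south,95\n"
    "2018-01-03,north,\n"
)

# parse_dates turns the date column into real timestamps rather than strings.
df = pd.read_csv(csv_text, parse_dates=["date"])

print(df.dtypes)
print(df.isna().sum())  # quick check for missing values after loading
```

Checking the column types and missing values straight after loading is a cheap habit that catches a lot of problems early.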

5. You know how to manipulate data in either R or Python

a) Manipulate data in R:

Coursera – Getting and Cleaning Data

b) Manipulate data in Python:

Coursera – Introduction to Python for Data Analysis – Data Analysis with Python and Pandas
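
As a rough idea of the day-to-day manipulation this covers, here is a minimal pandas sketch: derive a column, filter rows, then summarise by group. The data frame and its column names are hypothetical, made up for the example.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration.
df = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south", "east"],
        "units": [10, 4, 6, 8, 3],
        "price": [2.5, 3.0, 2.5, 3.0, 4.0],
    }
)

# Derive a new column, filter rows, then aggregate by group.
df["revenue"] = df["units"] * df["price"]
summary = (
    df[df["units"] >= 4]
    .groupby("region", as_index=False)
    .agg(total_units=("units", "sum"), total_revenue=("revenue", "sum"))
    .sort_values("total_revenue", ascending=False)
    .reset_index(drop=True)
)

print(summary)
```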

6. You can do some plotting in R or Python

a) Plotting in R:

Coursera – Exploratory Data Analysis

b) Plotting in Python:

Coursera – Applied Plotting, Charting & Data Representation in Python
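
"Some plotting" really can be this simple. A minimal sketch, assuming matplotlib is installed; the numbers and the output file name are made up for the example, and the image is written to the system temp directory so nothing lands in your project folder.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# Made-up data for illustration.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, sales, marker="o")
ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
fig.tight_layout()

# Hypothetical output file name, saved to the temp directory.
out_path = os.path.join(tempfile.gettempdir(), "monthly_sales.png")
fig.savefig(out_path)
plt.close(fig)
```

A title and labelled axes already put a chart ahead of a lot of what gets shown in meetings.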

7. You can fit basic models like GLMs in R or Python.

a) Modelling in R:

Coursera – Regression Models

b) Modelling in Python:

Coursera – Regression Modeling in Practice

8. You can document your modelling results, ideally using reproducible research ideas, in either R or Python

a) Reproducible Research in R:

Coursera – Reproducible Research

b) Reproducible research in Python:

Data Carpentry – Reproducible Research using Jupyter Notebooks

9. You can talk about and present results (hard to link resources here). Most of this is practice but here are a few things that could help:

Coursera – Effective Business Presentations with Powerpoint

Kaggle – Communicating Data Science: A Guide to Presenting Your Work

That's it!

We need to get real with our aspiring data scientists. This is a fair bit to learn and master, but if you do, it will really place you in the upper end of analysts out there.

It kind of seems like the basics, but there is still a lot to it as you can see.