PATHWAYS TO DATA SCIENCE
Do It Yourself
Your pathway to data science will depend on your experience and where you want to go. Below you will find six different pathways with links to relevant courses.
Please note that this page contains affiliate links, but only ones that I think are awesome and worthwhile.
I have used the following courses for reference at various times. I haven’t signed up for the accreditation they offer, but note that you have the option to audit these courses for free. Auditing lets you watch the videos and get the information, although assignments are often locked for students who audit.
If, however, you want the accreditation as a signal of your mastery of the subject to a future employer, by all means go ahead and sign up for the paid track.
The 80/20 Data Science Path:
I have worked in the analytics field for a long time, and it frustrates me that unrealistic expectations are placed on new data scientists. Many people find it hard to land that first role simply because the expectations of the job are crazy!
To put it in perspective: as a guy with 10+ years in the field, I probably wouldn’t meet the expectations of half the grad jobs I see, and I have some serious game!
So, let’s get real about credentials for new data scientists. By “getting real” I mean: what is important in data science right now, and what is the everyday, bread-and-butter data science? Deep learning will come, but it isn’t here yet. So, let’s just think about the 80/20 rule.
What are the 20% of skills that are going to get you the 80% of jobs?
What are the 80% of companies that would need these skills?
I know there are some amazing places doing computer vision, self-driving cars, etc. (20% or less), but as I see it, the other 80% of companies need this:
1. You know a bit about the command line:
Something like this should be fine to get you started:
The Learn Enough Society – Learn Enough Command Line to be Dangerous
2. You know a bit about Git:
Check this course out from Coursera
3. You know a bit about SQL, because even in 2018 most of the data you will see is in a database and you'll access it via SQL:
Coursera – SQL for Data Science
4. You can read data into R or Python:
a) Reading data into R:
Datacamp – R Data Import Tutorial
b) Reading data into Python:
Linear Data Blog – How do you import data into Python?
5. You know how to manipulate data in either R or Python:
a) Manipulate data in R:
Coursera – Getting and Cleaning Data
b) Manipulate data in Python:
Coursera – Introduction to Python for Data Analysis
pythonprogramming.net – Data Analysis with Python and Pandas
6. You can do some plotting in R or Python:
a) Plotting in R:
Coursera – Exploratory Data Analysis
b) Plotting in Python:
7. You can fit basic models, such as generalized linear models (GLMs), in R or Python:
a) Modelling in R:
Coursera – Regression Models
b) Modelling in Python:
Coursera – Regression Modeling in Practice
8. You can document your modelling results, ideally using reproducible research practices, in either R or Python:
a) Reproducible Research in R:
Coursera – Reproducible Research
b) Reproducible research in Python:
Data Carpentry – Reproducible Research using Jupyter Notebooks
9. You can talk about and present results (it’s hard to link resources here). Most of this comes down to practice, but here are a few things that could help:
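To make items 3, 4, 5 and 7 a bit more concrete, here is a minimal Python sketch of how they chain together. It assumes pandas and NumPy are installed, and the `sales` table and its numbers are made up purely for illustration. It pulls data out of a database with SQL, loads it into a pandas DataFrame, summarises it, and fits the simplest GLM, ordinary least squares (Gaussian family, identity link), with NumPy.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a table in a company database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, ad_spend REAL, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", 1.0, 3.1),
        ("north", 2.0, 4.9),
        ("south", 3.0, 7.2),
        ("south", 4.0, 8.8),
    ],
)

# Items 3 and 4: query the database with SQL and read the result into a DataFrame.
df = pd.read_sql_query("SELECT region, ad_spend, revenue FROM sales", conn)

# Item 5: manipulate the data, here the mean revenue per region.
summary = df.groupby("region", as_index=False)["revenue"].mean()

# Item 7: fit revenue = intercept + slope * ad_spend by least squares,
# i.e. a GLM with a Gaussian family and identity link.
X = np.column_stack([np.ones(len(df)), df["ad_spend"].to_numpy()])
coef, *_ = np.linalg.lstsq(X, df["revenue"].to_numpy(), rcond=None)
intercept, slope = coef

print(summary)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
conn.close()
```

In practice you would connect to your organisation’s real database rather than an in-memory SQLite one, and for other GLM families (e.g. logistic regression) you would reach for a modelling library such as statsmodels; the hand-rolled least-squares fit just keeps the sketch light on dependencies.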
The 80/20 Data Analyst Path:
We all have to start somewhere. I was originally a spreadsheet guy, but there are better ways of working with and automating Excel.
Power BI is great software: it’s an excellent and affordable way to democratize data in an organisation, and you can leverage what you have learned in Excel.
SQL is one of the most important tools for a data analyst, and even for a data scientist.
Doing the analysis is great, but how do you get buy-in from your stakeholders to make sure your insights and data products make an impact?
I know programming, but I am weak on statistics
Well, my friend, Python is the language for you. You will be able to work with it very quickly and easily, whereas R would feel a bit quirky.
Here’s what I think is the best option for you to get you up to speed quickly:
With this series you will learn basic data munging, data visualization and model fitting in Python, as well as text mining and social network analysis, which are both really interesting applications of machine learning.
I know statistics, but I am weak in programming
You need to know version control to work effectively in a team: it is critical for working with others on projects.
If you don’t know SQL, well… I mean you just have to know SQL.
The R programming language is going to be easier for you to use than Python. It will get you up and running sooner, and it has a great ecosystem of libraries to help you with your analysis. This series of courses from Johns Hopkins is, end to end, simply a superb introduction to data science in R.
I want to get some credentials in Data Science using Microsoft Products
This program takes you through the Microsoft stack. It begins with Excel or Power BI, goes through the essential math/stats you need to know, then teaches you data manipulation, analysis and modeling in either R or Python. It even teaches you about building and deploying models in Azure, and big data technologies like Spark, Microsoft R Server and RevoScaleR.
I’m a Data Science Manager, give me the info but not the code