Hi Data Friends,

As someone who has worked in analytics for a long time, it frustrates me that such unrealistic expectations are placed on new data scientists. Many people find it hard to get that first start simply because the expectations of the job are crazy!

To put it in perspective: as a guy with 10+ years in the field, I probably wouldn’t meet the expectations of half the grad jobs I see – and I have some serious game!

So, let's get real about credentials for new data scientists. By “getting real” I mean asking what is important in data science right now: what is the everyday bread and butter of the job? Deep Learning will come, but it isn’t here yet. So, let’s just think about the 80/20 rule.

What are the 20% of skills that are going to get you the 80% of jobs?

What are the 80% of companies that would need these skills?

I know there are some amazing places doing computer vision, self-driving cars and so on (20% or less), but to me the other 80% of companies need this:

1. You know a bit about the command line:

Something like this should be fine to get you started:

The Learn Enough Society – Learn Enough Command Line to be Dangerous

https://www.learnenough.com/command-line-tutorial

2. You know a bit about Git

Check this course out from Coursera, it should be good!

Coursera – Version Control with Git

https://www.coursera.org/learn/version-control-with-git

3. You know a bit about SQL, because even in 2018 most of the data you will see is in a database and you'll access it via SQL:

Coursera - SQL for Data Science:

https://www.coursera.org/learn/sql-for-data-science
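
To give a feel for the level I mean, here is a minimal sketch of an everyday query, run from Python with the built-in sqlite3 module. The "orders" table and its rows are made up purely for illustration; in a real job you would be pointing at the company database instead.

```python
import sqlite3

# In-memory database with a made-up "orders" table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 12.5)],
)

# Total spend per customer, largest first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

print(totals)
```

If you are comfortable with SELECT, WHERE, GROUP BY and a JOIN or two, you are most of the way there.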

4. You can read data into R or Python

a) Reading data into R:

Datacamp – R Data Import Tutorial

https://www.datacamp.com/community/tutorials/r-data-import-tutorial

b) Reading data into Python:

Linear Data Blog – How do you import data into Python?

http://lineardata.net/how-do-you-import-data-into-python/
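
As a rough sketch of what this looks like in Python, assuming you use pandas: the CSV text below is invented and stands in for a file on disk, so the column names are only placeholders.

```python
import io

import pandas as pd

# Stand-in for a file on disk (made-up data); in practice you would pass
# a path to pd.read_csv instead of this in-memory buffer.
csv_text = io.StringIO(
    "date,region,sales\n"
    "2018-01-01,north,100\n"
    "2018-01-02,south,150\n"
    "2018-01-03,north,\n"
)

# Parse the date column as real dates rather than plain strings.
df = pd.read_csv(csv_text, parse_dates=["date"])

# First checks after any import: column types and missing values.
print(df.dtypes)
print(df.isna().sum())
```

Checking types and missing values straight after the import is a habit worth building early.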

5. You know how to manipulate data in either R or Python

a) Manipulate data in R:

Coursera - Getting and Cleaning Data:

https://www.coursera.org/learn/data-cleaning

b) Manipulate data in Python:

Coursera – Introduction to Python for Data Analysis

https://www.coursera.org/learn/python-data-analysis

pythonprogramming.net – Data Analysis with Python and Pandas

https://www.youtube.com/watch?v=Iqjy9UqKKuo&list=PLQVvvaa0QuDc-3szzjeP6N6b0aDrrKyL-
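
By "manipulate" I mostly mean the filter, derive, group and summarise routine. Here is a small sketch with pandas, using a made-up sales table (the column names are just for illustration):

```python
import pandas as pd

# Made-up sales data, purely for illustration.
df = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 5, 7, 3],
    "price": [2.0, 4.0, 2.0, 4.0],
})

# Derive a new column from existing ones.
df["revenue"] = df["units"] * df["price"]

# Keep the larger orders, then total revenue per region, biggest first.
summary = (
    df[df["units"] > 4]
    .groupby("region", as_index=False)["revenue"]
    .sum()
    .sort_values("revenue", ascending=False)
)

print(summary)
```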

6. You can do some plotting in R or Python

a) Plotting in R:

Coursera – Exploratory Data Analysis

https://www.coursera.org/learn/exploratory-data-analysis

b) Plotting in Python:

Coursera – Applied Plotting, Charting & Data Representation in Python

https://www.coursera.org/learn/python-plotting
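
For a sense of scale, a basic labelled scatter plot is about this much code. This is a sketch assuming matplotlib; the numbers and axis names are invented.

```python
import io

import matplotlib
matplotlib.use("Agg")  # draw without needing a display window
import matplotlib.pyplot as plt

# Made-up numbers, purely for illustration.
ad_spend = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 8.1, 9.8]

fig, ax = plt.subplots()
ax.scatter(ad_spend, sales)
ax.set_xlabel("Ad spend")
ax.set_ylabel("Sales")
ax.set_title("Sales vs ad spend")

# Written to an in-memory buffer here; pass a filename such as
# "sales.png" instead to save the chart to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

Labelled axes and a title sound trivial, but they are what make a chart usable by someone other than you.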

7. You can fit basic models like GLMs in R or Python.

a) Modelling in R:

Coursera – Regression Models

https://www.coursera.org/learn/regression-models

b) Modelling in Python:

Coursera – Regression Modeling in Practice:

https://www.coursera.org/learn/regression-modeling-practice

8. You can document your modelling results ideally using reproducible research ideas in either R or Python

a) Reproducible Research in R:

Coursera – Reproducible Research

https://www.coursera.org/learn/reproducible-research

b) Reproducible research in Python:

Data Carpentry – Reproducible Research using Jupyter Notebooks

https://reproducible-science-curriculum.github.io/workshop-RR-Jupyter/

9. You can talk about and present results (hard to link resources here). Most of this is practice but here are a few things that could help:

Coursera - Effective Business Presentations with Powerpoint

https://www.coursera.org/learn/powerpoint-presentations

Kaggle – Communicating Data Science: A Guide to Presenting Your Work

http://blog.kaggle.com/2016/06/29/communicating-data-science-a-guide-to-presenting-your-work/

That's it!

We need to get real with our aspiring data scientists. This is a fair bit to learn and master, but if you do so it will really place you in the upper end of analysts out there.

It kind of seems like the basics, but there is still a lot to it as you can see.

Nic
