Data scientists are not software developers, but I’d argue that as time goes on they have more and more in common with them. Many data scientists ship data products into production systems, but even for those who only extract insights, a working knowledge of version control is critical.
I have explained in previous posts the central role version control plays in executing the Microsoft Team Data Science Process (which I consider to be the best way to run a data science team).
There is also a strong argument for preserving your own sanity. If you have ever seen a directory holding multiple versions of the same file, or if you are emailing code for someone else to use, a Git repository for your team could save you a lot of time and pain.
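To make that concrete, here is a minimal sketch of how a Git repository replaces a folder full of renamed copies. It assumes Git is installed and on your path. The file name `analysis.py`, the commit messages, and the identity values are placeholders I made up for illustration, and the repository lives in a throwaway temporary directory.

```shell
# Create a throwaway repository in a temporary directory.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Identity for this repository only (placeholder values, not real settings).
git config user.name "Example Analyst"
git config user.email "analyst@example.com"

# Save the first version of a hypothetical analysis script.
echo 'print("model v1")' > analysis.py
git add analysis.py
git commit --quiet -m "Add first version of analysis"

# Edit the same file in place instead of creating a renamed copy.
echo 'print("model v2")' > analysis.py
git commit --quiet -am "Update model"

# Both versions remain recoverable from the history.
git log --oneline
git show HEAD~1:analysis.py
```

The last command prints the first version of the file, even though only the latest version sits in the working directory. Sharing the repository through a host such as GitHub then replaces emailing code around.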
Overview of Git
Here is one of the best and most straightforward explanations of Git I have seen. The diagrams are great, and the lecturer covers a lot of ground very quickly. It is an excellent high-level overview of Git. Watch the first 30 minutes of this lecture:
CS 50 Web Development Course: Lecture 0
Simple Summary of Git
Then you should take a look at this great and hilarious summary of Git:
Git - the Simple Guide
More Detailed Summary of Git, GitHub and Setup
Of course, Hadley Wickham has put together a fantastic summary of Git and GitHub. There is a bit about RStudio, but push through it, because there are command-line equivalents. He does a good job of introducing more complex topics like pull requests.
Hadley’s Git Guide
Practice with Git Commands
This website covers more than just Git branching: it is a collection of practical exercises that you can run in your browser. Working through them is well worth your time.
Learn Git Branching
Now, if you want to get your hands dirty outside the browser, I’d recommend Coursera’s Git Course.