I’m not going to learn Julia yet, but I am interested in seeing what happens with the community and the data science ecosystem. Please let me know if you agree or disagree.
Julia version 1.0 released
On 8th August 2018, version 1.0 of the Julia programming language was released. The aims of the contributors are ambitious and noble at the same time:
“We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.”
So, should you learn Julia?
To me, the utility of a language for data science is in its modules and libraries. If it can make my life easier then great, I’ll use it; otherwise I may not. You can call R and Python libraries from Julia, so I’d suggest this is the way to go. However, you need some familiarity with what is available in R and Python, so you should probably learn R and Python first.
Julia is going to be very familiar to anyone who has used Python before, so I would argue that learning Python is preferable to learning Julia.
There are so many great tutorials out there for R and Python, plus a whole heap of MOOCs, YouTube videos, etc., that learning is easy and relatively cheap. Julia isn’t as widespread in the analytics community, so there simply aren’t as many resources for it. The Python and R communities are both huge, and that’s an advantage.
If you are interested in Julia
Have a read of this tutorial on using Julia for Data Science:
A Comprehensive Tutorial to Learn Data Science with Julia from Scratch
This cheatsheet will provide you with an idea of Julia’s features and syntax:
The Fast Track to Julia
If you are interested, there are more learning resources for Julia at:
There is also a Coursera course available:
To me, Julia is definitely one to watch, but if I were new to data science I wouldn’t learn Julia in place of either R or Python.
Some other languages for Data Science
Back in 2010, Ross Ihaka, one of the creators of R, suggested that rather than fixing R’s problems it might be better to start again:
R: Lessons Learned, Directions for the Future
One of the possible replacements suggested at the time was Clojure with Incanter. Clojure allowed access to the JVM, with a nice Lisp-like syntax. It seemed an attractive way to move people from R to Clojure:
Ross Ihaka was thinking along the same lines back in 2008, when he wrote “Back to the Future: Lisp as a Base for a Statistical Computing System”.
However, Incanter does not seem to be under active development at the moment, so I wouldn’t encourage people to look at it.
F# for data science has been advocated by Microsoft, and it may not be a bad way for C# developers who are comfortable with the Visual Studio IDE to take their first steps into data science and functional programming.
A good resource I have found about machine learning in F# is the following:
However, F# and Incanter aren’t on my list of languages to learn either.
Matlab is a great language for understanding machine learning algorithms. It’s no accident that Andrew Ng uses Octave/Matlab in his Machine Learning course on Coursera:
The only problem is that most places that employ data scientists tend to use R or Python, so to maximise your job opportunities I’d again say go for R or Python.
So, what am I doing?
For the time being I am sticking with R and Python. I might dabble a bit with Julia, but the data science ecosystem available in R and Python is just too handy, and knowing a scripting language like R or Python is enough to get the job done at the moment. That’s not to say that Julia isn’t something to look into in the future. It is certainly on my “to learn” list, but it isn’t at the top yet.
I’m interested in some of the Google Cloud technologies for “No Ops” Data Science. By “No Ops” I mean you create a Google Cloud Storage bucket, suck in your data, run a process and walk away. Google handles scaling, load balancing, performance and everything else I don’t understand. The APIs for speech, image and text look to be absolutely awesome too. Google AI is also solving a problem I see a lot: different companies, and sometimes even different business units within the same company, have different ways of deploying machine learning models.
A link to the Google Cloud Specialization is here. I’m working through it at one module per day after work, starting with:
Data Engineering on Google Cloud Platform Specialization
Architecting with Google Cloud Platform Specialization