Learn about data science and Julia while solving real-life problems
Read our book "Data Science in Julia for Hackers" at: https://datasciencejuliahackers.com/
We put together this post to share the release of our first book on data science methods, which focuses on solving real-life problems. This release is actually a beta version, as we are looking to receive constructive criticism and feedback, in order to improve the book until the first revised version is ready.
We come from different backgrounds. Federico Carrone is a developer with over 15 years of experience and founder of a startup that is running since 2014, who is currently studying towards a degree in Mathematics. Herman Obst is an Industrial Engineer who is just learning to program as is our Physicist, Mariano Nicolini. Martina Cantaro coordinated the writing process and our technical consultants, Manuel Puebla and Lucas Fernández Piana, are making sure our definitions are simple to understand yet accurate.
We do not come from academia, nor are we experts in statistics, which may make some people wonder whether we have enough authority to concern ourselves with such complex issues. But we have one thing in common: above all, we are doers. Collectively, we have experience in setting ourselves big goals and achieving them through, learning, hacking, tinkering and thinking. Data science is a toolkit that has enabled us to solve different real-life problems and we want to share what we have learned in the process.
Speaking of real-life problems, few disciplines have as much impact when it comes to solving them as data science. Currently, our society is constantly generating massive amounts of information encoding complex behaviors and relationships from a wide array of fields. And people are developing the tools to use that information to our benefit.
That was why we were so struck by the fact that almost all the books on data science and statistics had a very theoretical approach, focusing on understanding the mathematics of the algorithms and never talking about their applications to situations we might encounter in work or life in general.
Because of this (and because of our love for making and generating things) we decided to embark on the adventure of writing a book. A book whose first and foremost premise was to propose diverse and interesting problems, and solve them using the ingenuity and tools of data science. And theory does not play a minor role, not at all, but it is developed only to the extent that the resolution of the problem requires it. In this way, a real connection between theory and reality is achieved.
What we want you to learn
One of the ideas we are most interested in conveying is that of taking action. Nowadays there is a dominant way of thinking, widespread in the academic world, which maintains that before being able to carry any meaningful work we must first acquire a broad theoretical knowledge of the subject. We think this can be counterproductive, as it makes people afraid of taking risks and daring to immerse themselves in practice.
We decided to disprove this way of thinking by making a book about Bayesian statistics, machine learning and artificial intelligence, without (at first) knowing much about any of these fields.
That was our goal. We figured out the rest as we went along.
What we learned
With our goal always in mind, we started searching and solving a wide variety of problems that could be interesting to tackle using Bayesian inference, which at the beginning was the only topic we planned to talk about. That way we got used to the Bayesian mindset and the different probabilistic programming tools in Julia, our language of choice.
But as we profressed, we changed the focus of the book from something like “Bayesian methods from a practical perspective” to include data science topics in general, from time series prediction to the powerful Scientific Machine Learning ecosystem. Finally, although we wanted to focus on these -not so widespread- methods, we thought it pertinent to add a section on more classical methods, such as deep learning and machine learning — the latter currently in production.
As the dificulty of the scenarios we created began to increase (we often found that a problem was way more complicated than we initially thought), it became clear that we had to incorporate a little more solid knowledge about Bayesianism. That’s where reading Bayesian Methods for Hackers and Statistical Rethinking gave us a much better understanding and tools to deal with the complexity.
Our writing process also got better over time. Although we already had some experience writing blog posts, writing a book was a brand new challenge for everyone, especially since we are not native English speakers.
At the beginning, our explanations of models and the theory behind them were somewhat lacking. Several pages were discarded, and the rest were re-written several times until we found them passable. It was a process that involved some deal of frustration, but as we iterated over them, the quality of the pages increased substantially. Finally, we found a comfort zone in terms of diagraming, coding and writing the chapters.
The road was winding and some keys to keep the progress up were to not let ourselves be overwhelmed by the enormous task, to divide the work, to go chapter by chapter and, above all, to always keep moving forward. One strategy we used was to complete the writing of the chapters until we felt they were 80% complete. That way, the progress curve always kept a good positive slope, since (by Pareto law) that last 20% would take 80% of the time to complete. Only when we had completed 80% of the book, we went back chapter by chapter to finish polishing it.
Where you can read it
And that’s it! As of now, the book is up for everyone to read at:
We hope it will be useful to as many people as possible and we hope to receive lots of feedback and constructive criticism, so we can keep improving edition after edition.