Data Science deals with processing, analysing, modelling and visualizing data. It requires knowledge and skills from many different domains, and it is almost impossible to be an expert in all of them. This path will give you a solid foundation, so you can decide which subfield of Data Science interests you the most and continue your studies in that direction.
This path will take you roughly 450 hours of work.
To better understand the problems and capabilities of Data Science, watch this playlist of several TED Talks.
Mathematics is essential to formalize the description of processes and behaviours. You will learn the mathematical concepts that will enable you to study and fully understand Data Analysis.
In Data Science we often deal with objects described by many attributes (e.g. people with age, gender, education, etc.). We can think of them as elements of a multidimensional space, where each coordinate describes one attribute. Linear Algebra is the mathematical tool used to formalize and work with those spaces.
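As a minimal sketch of this idea, the Python snippet below encodes two hypothetical people as vectors of three attributes (age, years of education, and gender coded as 0 or 1) and measures how far apart they are in that space. The attribute choice, the encoding and the values are assumptions made up for illustration.

```python
import math

# Each person is a point in 3-dimensional space:
# (age in years, years of education, gender encoded as 0 or 1).
# The values are invented for illustration.
alice = (34.0, 16.0, 0.0)
bob = (29.0, 12.0, 1.0)

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def distance(u, v):
    """Euclidean distance between two points."""
    diff = tuple(a - b for a, b in zip(u, v))
    return math.sqrt(dot(diff, diff))

print(distance(alice, bob))
```

In practice the attributes would usually be rescaled first, since a raw distance like this one is dominated by whichever attribute has the largest numeric range.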
Probability and Statistics
The next field of mathematics that is heavily used to analyze data is statistics.
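For a first taste of what this looks like in practice, here is a small sketch using Python's standard `statistics` module to summarize a sample. The list of ages is a made-up example, not data from any of the courses.

```python
import statistics

# Hypothetical sample: ages of eight survey respondents.
ages = [23, 25, 31, 35, 35, 41, 47, 52]

mean_age = statistics.mean(ages)      # arithmetic mean
median_age = statistics.median(ages)  # middle value of the sorted sample
stdev_age = statistics.stdev(ages)    # sample standard deviation (divides by n - 1)

print(mean_age, median_age, stdev_age)
```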
Josh Wills defined a data scientist as: "Person who is better at statistics than any software engineer and better at software engineering than any statistician".
In the following courses you will learn the areas of Computer Science that are crucial in Data Science.
Even though you don't have to use a UNIX-based operating system to work as a data scientist, it is definitely more convenient. This tutorial will introduce you to the basic UNIX commands.
Version control helps you organize your code, and Git is one of the most widely used tools for it.
Intro to Computer Science
CS50 is the famous Harvard introductory course in Computer Science. It covers the basics of algorithms, data structures, programming in C and web development.
Python is a very popular programming language that is also widely used among data scientists.
Data has to be stored somewhere, and relational databases typically serve that purpose. However, there are also other ways to store and query data, which we will learn about.
Data Analysis is the main skill that every data scientist needs.
This course will give you a solid foundation in Data Analysis as well as basic knowledge of the R programming language. Next to Python, it is the most popular language among data scientists.
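To make the idea of storing and querying relational data concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `people` table, its columns and its rows are hypothetical, chosen only to illustrate a grouped query.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Ann", 34, "Boston"), ("Ben", 29, "Austin"), ("Cara", 41, "Boston")],
)

# Average age per city, computed by the database engine.
rows = conn.execute(
    "SELECT city, AVG(age) FROM people GROUP BY city ORDER BY city"
).fetchall()
print(rows)
conn.close()
```

The `?` placeholders let the database driver handle quoting, which is safer than building SQL strings by hand.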
Now we will focus purely on Machine Learning: we will get to know the typical algorithms and a little of the theory behind learning. Next we will introduce sklearn, the Python library for Machine Learning.
Kaggle is a platform that hosts machine learning competitions. Just choose a topic that interests you and participate in one of the competitions. It will give you the chance to use what you've learned and compare yourself with other data scientists.
Data Scientists should be able to present their insights in a way that others will understand. One approach is Data Visualization. It also helps you explore data.
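As a rough illustration of what a typical learning algorithm does, the sketch below hand-writes a k-nearest-neighbours classifier in plain Python. It is not the library's implementation, and the tiny training set, the labels and the `predict` helper are assumptions invented for this example.

```python
import math

# Toy training set: (height in cm, weight in kg) -> label. Values are made up.
train = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 80.0), "large"),
    ((185.0, 90.0), "large"),
]

def predict(point, k=3):
    """Classify `point` by majority vote among its k nearest training examples."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

print(predict((152.0, 53.0)))
print(predict((182.0, 85.0)))
```

The model here is just the stored training data: prediction means finding the closest known examples and letting them vote. Note that `math.dist` requires Python 3.8 or newer.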
Let's start with some examples of presenting data.
CS171 is a wonderful course from Harvard that will guide you through Data Visualization. Make sure to visit the course website for additional information: http://www.cs171.org/2014/
There is no course here. Just go into the world and do something fun. Find an interesting dataset (or even better - obtain it), analyze it, prepare a visualization or build a product based on that data.
Some ideas to help you start:
- use the Twitter API to visualize user activity on a particular topic (your favourite sports team?),
- predict the rating of a movie based on its description, actors, etc.,
- get data from data.gov and prepare an interactive visualization to find the best city to live in
Additional resources useful for studying Data Science.
This path was last modified: April 2, 2016, 10:44 p.m.