Data Science


Data Science deals with processing, analysing, modelling and visualizing data. It requires knowledge and skills from many different domains, and it is almost impossible to be an expert in all of them. This path will give you a solid foundation, so you can decide which subfield of Data Science interests you the most and continue your studies in that direction.

This path will take you roughly 450 hours of work.

Introduction


To better understand the problems and capabilities of Data Science, watch this playlist of several TED Talks.

Mathematics


Mathematics is essential for formalizing the description of processes and behaviours. You will learn the mathematical concepts that will enable you to study and fully understand Data Analysis.

Linear Algebra


In Data Science we often deal with objects described by many attributes (e.g. people with age, gender, education, etc.). We can think of them as elements of a multidimensional space, where each coordinate describes one attribute. Linear Algebra is the mathematical tool for formalizing and working with those spaces.

■ □ □
20 h
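
As a rough illustration of treating objects as points in a multidimensional space, the sketch below represents three made-up people as vectors of (age, years of education) and compares them by Euclidean distance. The names, the attribute values and the helper `euclidean_distance` are all assumptions chosen for this example, not part of any course in this path.

```python
import math

# Hypothetical people, each described by two attributes:
# (age in years, years of education). Each list is one vector.
alice = [34.0, 16.0]
bob = [36.0, 18.0]
carol = [61.0, 12.0]


def euclidean_distance(u, v):
    """Return the length of the difference vector u - v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


d_ab = euclidean_distance(alice, bob)
d_ac = euclidean_distance(alice, carol)

# A smaller distance means the two people are more alike
# with respect to the chosen attributes.
print("alice-bob:", d_ab)
print("alice-carol:", d_ac)
```

In practice the attributes would usually be rescaled first, since otherwise the attribute with the largest numeric range dominates the distance.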

Probability and Statistics


The next field of mathematics that is heavily used to analyze data is statistics.
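
To give a flavour of what analyzing data with statistics can look like, here is a minimal sketch that summarizes a small sample with Python's standard `statistics` module. The list `ages` is invented purely for illustration.

```python
import statistics

# Hypothetical sample: ages of seven survey respondents.
ages = [23, 35, 31, 44, 52, 29, 38]

mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value of the sorted sample
stdev_age = statistics.stdev(ages)    # sample standard deviation

print("mean:", mean_age)
print("median:", median_age)
print("standard deviation:", stdev_age)
```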

Computer Science


Josh Wills defined a data scientist as a "Person who is better at statistics than any software engineer and better at software engineering than any statistician".

In the following courses you will learn some areas of Computer Science that are crucial in Data Science.

UNIX


Even though you don't have to use a UNIX-based operating system to work as a data scientist, it is definitely more convenient. This tutorial will introduce you to the basic UNIX commands.

■ □ □
2 h

Git


Version control helps you organize your code, and Git is one of the most widely used tools for it.

■ □ □
1 h

Intro to Computer Science


CS50 is the famous Harvard introductory course in Computer Science. It covers the basics of algorithms, computer structures, programming in C and web development.

Python


Python is a very popular programming language that is widely used among data scientists.

■ ■ □
3 h
■ ■ □
5 h

Databases


Data has to be stored somewhere, and typically relational databases serve that purpose. However, there are also other ways to store and query data, which we will learn about.

■ □ □
1 h
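
As a small taste of storing and querying data in a relational database, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table `people`, its columns and its rows are assumptions made up for this example.

```python
import sqlite3

# An in-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table with one row per person.
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Ann", 30, "Boston"),
        ("Ben", 40, "Boston"),
        ("Cid", 25, "Denver"),
    ],
)

# Query: average age per city, sorted by city name.
rows = conn.execute(
    "SELECT city, AVG(age) FROM people GROUP BY city ORDER BY city"
).fetchall()
conn.close()

for city, average_age in rows:
    print(city, average_age)
```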

Data Analysis


Data Analysis is the main skill that every data scientist has.

Data Analysis


This course will give you a solid foundation in Data Analysis as well as basic knowledge of the R programming language. Next to Python, R is the most popular language among data scientists.

■ ■ □
100 h

Machine Learning


Now we will focus purely on Machine Learning, get to know typical algorithms and a little of the theory behind learning. Next we will introduce sklearn (scikit-learn), a Python library for Machine Learning.

■ ■ ■
50 h
■ ■ □
10 h
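
To make "typical algorithms" a little more concrete, here is a minimal sketch of one of the simplest classifiers, nearest neighbour, written in plain Python so that it runs without extra libraries. It does not use sklearn; the training points, labels and the function `predict` are assumptions invented for this example. sklearn provides a more complete version of the same idea as `KNeighborsClassifier`.

```python
import math

# Hypothetical labelled training examples: ((feature_1, feature_2), label).
train = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
]


def predict(point):
    """Return the label of the training example closest to `point`."""
    _, nearest_label = min(train, key=lambda ex: math.dist(ex[0], point))
    return nearest_label


print(predict((2.0, 1.0)))
print(predict((7.0, 9.0)))
```

The "learning" here is just memorizing the examples; most other algorithms fit a compact model instead, which is where the theory covered in the courses comes in.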

Kaggle competition


Kaggle is a platform that hosts several machine learning competitions. Just choose a topic that interests you and participate in one of the competitions. It will give you the chance to use what you've learned and compare yourself with other data scientists.

■ ■ □
20 h

Data Visualization


Data Scientists should be able to present their insights in a way that others will understand. One approach is Data Visualization, which also helps you explore data.

Introduction


Let's start with some examples of presenting data.

Visualization


CS171 is a wonderful course from Harvard which will guide you through Data Visualization. Make sure to visit the course website for additional information: http://www.cs171.org/2014/

■ □ □
40 h

Final Project


There is no course here. Just go out into the world and do something fun. Find an interesting dataset (or, even better, obtain one yourself), analyze it, and prepare a visualization or build a product based on the data.

Some ideas to help you start:
- use the Twitter API to visualize user activity on a particular topic (your favourite sports team?),
- predict the rating of a movie based on its description, actors, etc.,
- get data from data.gov and prepare an interactive visualization to find the best city to live in.

Additional resources


Additional resources useful to study Data Science.

This path was last modified: April 2, 2016, 10:44 p.m.