Data Scientist I
Welcome to the Machine Learning Track!
Description:
# Welcome to the Machine Learning Track!
This track introduces a broad range of concepts that are applied in the day-to-day work of a Data Analyst. It follows a learning-by-doing methodology, with many exercises and milestone labs for practicing the concepts previously learned. The objective of this track is to develop the data analysis skills needed to collect, manipulate, and present data for easy consumption by business users.
## Data Life Cycle and the role of a Data Analyst

Intro to Python
Description:
# Intro to Python
Python is a simple, easy-to-learn programming language that resembles pseudocode. It offers the features of an object-oriented language. Scientists and mathematicians have used Python since its inception, and hence it is popular for analytical tasks. Python is also known for its brevity.
## Why is Python a Favorite of Data Scientists?

Python Advanced
Description:
# Python Advanced
This lesson continues the previous lesson, "Intro to Python", and introduces more advanced concepts.
## Operators

Dataframes
Description:
# Dataframes
## Dataframe Basics
### The Pandas Library

SQL in Python.
Description:
# SQL in Python.
As mentioned in earlier lessons, Python is very flexible and has a wide range of libraries and third-party modules to support many operations. SQL (Structured Query Language) can be executed from within Python using sqlite3. The sqlite3 module lets you connect to a SQLite database and execute SQL queries. However, SQLite does not offer the complete querying capability of a typical server-based SQL engine; it works as a lightweight, embedded querying engine. Other modules, such as MySQLdb (provided by the mysql-python package), offer a more extensive range of functions and query processing abilities. We will be discussing the sqlite3 module, as it is widely used. Though it is a lightweight module, it supports almost all basic SQL operations and can be used with a database of up to 140 terabytes in size.
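As a minimal sketch of the workflow described above, the following uses Python's standard-library `sqlite3` module with an in-memory database. The table name `movies`, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Connect to an in-memory database; pass a file path instead to persist data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, used only for illustration.
cur.execute("CREATE TABLE movies (title TEXT, rating REAL)")
cur.executemany(
    "INSERT INTO movies (title, rating) VALUES (?, ?)",
    [("Movie A", 4.5), ("Movie B", 3.0), ("Movie C", 4.0)],
)
conn.commit()

# Parameterized query: the ? placeholder keeps values out of the SQL string.
cur.execute(
    "SELECT title FROM movies WHERE rating >= ? ORDER BY title",
    (4.0,),
)
high_rated = [row[0] for row in cur.fetchall()]
print(high_rated)

conn.close()
```

Using `?` placeholders rather than string formatting is the usual way to pass values to a query, since it avoids quoting problems and SQL injection.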

Advanced SQL
Description:
# Advanced SQL
After handling basic SQL queries through Python, let us look at slightly more advanced SQL queries and their execution through Python's sqlite3 module. Nested queries, merges, and join operations are among the basic concepts of advanced SQL. To execute queries and practice these concepts, we first need to load data. Most of these concepts involve more than one table, so we will load and work with data across two tables.
## Introduction to Fandango dataset
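To illustrate the join and nested-query concepts mentioned above, here is a small sketch using `sqlite3`. The two tables (`films` and `scores`) and their contents are hypothetical stand-ins; they are not the lesson's dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical two-table schema, not the lesson's actual dataset.
cur.execute("CREATE TABLE films (id INTEGER PRIMARY KEY, title TEXT)")
cur.execute("CREATE TABLE scores (film_id INTEGER, stars REAL)")
cur.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [(1, "Film A"), (2, "Film B"), (3, "Film C")],
)
cur.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [(1, 4.5), (2, 2.5), (3, 4.0)],
)

# Join: pair each film with its score using the shared key.
cur.execute(
    "SELECT f.title, s.stars FROM films AS f "
    "JOIN scores AS s ON s.film_id = f.id ORDER BY f.id"
)
joined = cur.fetchall()

# Nested query: films whose score is above the average score.
cur.execute(
    "SELECT title FROM films WHERE id IN "
    "(SELECT film_id FROM scores "
    " WHERE stars > (SELECT AVG(stars) FROM scores)) "
    "ORDER BY title"
)
above_average = [row[0] for row in cur.fetchall()]

conn.close()
```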

Data Modeling
Description:
# Data Modeling
Data modeling means giving data a structure that allows efficient storage, easy access, and better comprehension. One of the fundamental concepts of data modeling is the entity-relationship model.
## The Entity-Relationship Model

Introduction to Statistics
Description:
# Introduction to Statistics
## Mean, Median and Mode
When working with a large data set, it can be useful to represent the entire data set with a single value that describes the "middle" or "average" value of the set. In statistics, that single value is called the central tendency; the mean, median, and mode are all ways to describe it.
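A short sketch of the three measures, using Python's standard-library `statistics` module. The sample values are hypothetical.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # hypothetical sample data

mean = statistics.mean(values)      # sum of the values divided by their count
median = statistics.median(values)  # middle value; with an even count, the
                                    # average of the two middle values
mode = statistics.mode(values)      # most frequently occurring value

print(mean, median, mode)
```

Note that the three measures can differ for the same data: the large value 10 pulls the mean above the median, while the mode reflects only the repeated value.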

Linear Algebra - Basics
Description:
# Linear Algebra - Basics
# Introduction to Linear Algebra
Linear algebra is a branch of mathematics that deals with linear equations, such as the equations of straight lines. A line is made up of multiple points. A point in two-dimensional (2D) space is represented using two coordinates (x, y).
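As a small illustration of points and lines in 2D, the sketch below finds the line y = m*x + c through two points and checks whether a third point lies on it. The points and the helper name `on_line` are hypothetical, and the sketch assumes the line is not vertical.

```python
# Two hypothetical points in 2D space, each written as (x, y).
p1 = (1.0, 3.0)
p2 = (3.0, 7.0)

# Slope m and intercept c of the line y = m*x + c through both points.
# Assumes p1 and p2 have different x values (the line is not vertical).
m = (p2[1] - p1[1]) / (p2[0] - p1[0])
c = p1[1] - m * p1[0]


def on_line(point, tol=1e-9):
    """Return True if the point satisfies y = m*x + c within a tolerance."""
    x, y = point
    return abs(y - (m * x + c)) <= tol


print(m, c, on_line((2.0, 5.0)), on_line((2.0, 6.0)))
```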

Numpy
Description:
# Numpy
## Arrays and Lists
Before introducing the NumPy library, we need to be familiar with a very widely used data structure called an 'array'. An array is a collection of homogeneous variables. Here, homogeneous means variables of the same data type. So an array can be a collection of integers (int datatype), a collection of fractional/decimal values (float datatype), or a collection of characters (char datatype), also referred to as a string.
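The sketch below contrasts a Python list, which may mix types, with NumPy arrays, which hold a single data type. The variable names and values are hypothetical.

```python
import numpy as np

# A Python list may hold values of different types.
mixed_list = [1, 2.5, "three"]

# A NumPy array holds elements of one data type (its dtype).
ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.5, 3.0])

# Mixing integers and floats upcasts every element to float,
# so the array stays homogeneous.
promoted = np.array([1, 2.5])

print(ints.dtype, floats.dtype, promoted.dtype)
```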

Array Shape Manipulation
Description:
# Array Shape Manipulation
Sometimes a Data Analyst needs to change the shape or dimensions of an array so that it can be merged or concatenated with another array. It is therefore important to know how to perform these operations in Python. Array shape manipulation is performed by the following functions -
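The lesson's own list of functions is not shown in this excerpt. As an illustrative sketch only, the following uses a few common NumPy shape operations (`reshape`, `ravel`, `.T`, and `concatenate`); this selection is an assumption and may not match the lesson's list.

```python
import numpy as np

a = np.arange(6)             # 1D array with the values 0..5
grid = a.reshape(2, 3)       # same data viewed as 2 rows x 3 columns
flat = grid.ravel()          # flattened back to 1D
transposed = grid.T          # rows and columns swapped: shape (3, 2)

# A second array with the same shape, so the two can be concatenated.
b = np.arange(6, 12).reshape(2, 3)
stacked_rows = np.concatenate([grid, b], axis=0)  # shape (4, 3)
stacked_cols = np.concatenate([grid, b], axis=1)  # shape (2, 6)

print(grid.shape, transposed.shape, stacked_rows.shape, stacked_cols.shape)
```

Concatenation requires the arrays to agree on every dimension except the one being joined, which is why reshaping often comes first.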

N-dimensional array in Python
Description:
# N-dimensional array in Python
An array is a list or collection of homogeneous elements, i.e., items of the same type. An N-dimensional array is a collection of such arrays and, in the simplest terms, can be described as an array of arrays. A two-dimensional array, also called a matrix (plural: matrices), is very common and most of us would be familiar with it. An array of matrices can be visualized as a three-dimensional array. An array can be defined using the 'array' function of the numpy module. Attributes such as dtype, shape, and size describe various properties of the array.
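A brief sketch of the ideas above: a matrix built with `np.array`, a 3D array built from two matrices, and the attributes that describe them. The values are hypothetical.

```python
import numpy as np

# A two-dimensional array (matrix): 2 rows, 3 columns.
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# An array of two matrices forms a three-dimensional array.
cube = np.array([matrix, matrix * 10])

# ndim, shape, size and dtype are attributes, read without parentheses.
print(matrix.ndim, matrix.shape, matrix.size, matrix.dtype)
print(cube.ndim, cube.shape, cube.size)
```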

Machine Learning
Description:
# Machine Learning
## Introduction to Machine Learning
### Machine Learning

Data Visualization
Description:
# Data Visualization
## Data Visualization
### Data Visualization

Introduction to Linear Regression
Description:
# Introduction to Linear Regression
Linear regression is a supervised learning algorithm. Given a single feature, a line is fit that best predicts the dependent variable. When many features are involved, a hyperplane is fit that minimizes the error between the predicted values and the ground truth. Given an input vector $X = (X_1, X_2, \ldots, X_n)$ that we want to use to predict the output $y$, the regression equation is given by:
$$y=\beta_0 + \sum_{i=1}^nX_i\beta_i$$
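As a sketch of how the coefficients in the regression equation can be estimated, the example below fits them by ordinary least squares with `np.linalg.lstsq`. The data are hypothetical and generated without noise from known coefficients, and the helper name `predict` is an assumption for illustration; least squares is one common fitting method, not necessarily the one the lesson uses.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 1 + 2*X_1 + 3*X_2.
X = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 1.0],
])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so that beta[0] plays the role of beta_0.
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: minimizes the squared error between
# the predicted values and the ground truth.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)


def predict(x_new):
    """Evaluate beta_0 + sum_i x_i * beta_i for one input vector."""
    return beta[0] + np.dot(x_new, beta[1:])


print(beta, predict(np.array([3.0, 2.0])))
```

Because the data contain no noise, the fitted coefficients recover the generating values up to floating-point error; with real data they would only approximate the underlying relationship.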

You are Done!
Description:
# You are Done!
## Structured vs Unstructured Data
As discussed in the initial lesson, there are millions of sources of data today. A few decades ago, there were fewer sources of information, and the type of data collected or generated was very static and consistent in nature. This allowed the data to be stored easily in a given structure. With progress in data collection, more and more varied data is being collected today. Multimedia data such as images, audio, and video are also collected, stored, and analyzed. JSON, XML objects, and BLOBs are some of the newer data formats. The combination of all the structured and unstructured data collected by all data sources, both private and public, is termed 'Big Data'.

Lab: Gender Analysis of Twitter Dataset
Description:
# Lab: Gender Analysis of Twitter Dataset
## History
This data set was used to train a CrowdFlower AI gender predictor. Contributors were asked to simply view a Twitter profile and judge whether the user was male, female, or a brand (non-individual). The dataset contains 20,000 rows, each with a user name, a random tweet, account profile and image, location, and even the link and sidebar colors.