Data Scientist II
Description: # Intro to Python
Python is a simple, easy-to-learn programming language that reads like pseudo-code. It is rich with all the features of an object-oriented language. Scientists and mathematicians have used Python since its inception, and hence it is popular for analytical tasks. Python is also known for its brevity.
## Why is Python a Favorite of Data Scientists?

Description: # Numpy
## Arrays and Lists
Before getting introduced to the NumPy library, we need to be familiar with a very widely used data structure called an 'array'. An array is a collection of homogeneous variables, that is, variables of the same data type. An array can therefore be a collection of integers (int datatype), a collection of fractions/decimal values (float datatype), or a collection of characters (char datatype), the last of which is also referred to as a string.
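As a minimal sketch of the distinction above (the variable names are illustrative): a plain Python list may mix types, while a NumPy array enforces a single data type for all of its elements.

```python
import numpy as np

# A plain Python list can be heterogeneous.
mixed = [1, 2.5, 'three']

# A NumPy array is homogeneous: one dtype for every element.
ints = np.array([1, 2, 3])          # all integers
floats = np.array([1.0, 2.5, 3.7])  # all floats

ints.dtype    # an integer dtype (e.g. int64 on most platforms)
floats.dtype  # a float dtype (e.g. float64)
```

Passing mixed types to `np.array` forces a conversion to one common dtype, which is exactly the homogeneity property described above.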

Description: # Dataframes
## Dataframe Basics
### The Pandas Library

Description: # Linear Algebra - Basics
# Introduction to Linear Algebra
Linear algebra is a branch of mathematics that deals with equations of straight lines. A line is made up of multiple points. A point in two-dimensional (2D) space is represented using two coordinates (x, y).
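As a concrete illustration (this worked example is not from the original text): the straight line through the points (0, 1) and (1, 3) can be found from the slope-intercept form.

$$y = mx + c, \quad m = \frac{3 - 1}{1 - 0} = 2, \quad c = 1 \;\Rightarrow\; y = 2x + 1$$

Every point (x, y) satisfying this equation lies on the line, which is the sense in which a line is "made up of multiple points."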

Description: # Data Science workflow
## Origins of Data Science
### History

Description: # Machine Learning
## Introduction to Machine Learning
### Machine Learning

Description: # Data Visualization
## Data Visualization
### Data Visualization

Description: # Introduction to Linear Regression
Linear regression is a supervised learning algorithm. Given a single feature, a line is fit that best predicts the dependent variable. When many features are involved, a hyperplane is fit that minimizes the error between the predicted values and the ground truth. Given an input vector $X = (X_1, X_2, \ldots, X_n)$ that we want to use to predict the output $y$, the regression equation is given by: $$y=\beta_0 + \sum_{i=1}^nX_i\beta_i$$
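The equation above can be fit by ordinary least squares. A minimal NumPy sketch with a single feature (the toy data below is invented: points lying roughly on y = 2x + 1):

```python
import numpy as np

# Invented toy data, approximately y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 3.1, 4.9, 7.2, 9.0])

# Prepend a column of ones so the intercept beta_0 is learned too.
A = np.hstack([np.ones((X.shape[0], 1)), X])

# Least-squares solution minimizes the squared error between
# predictions A @ beta and the ground truth y.
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
# beta[0] is the intercept beta_0, beta[1] is the slope beta_1.
```

With more features, `X` simply gains more columns and `beta` gains one coefficient per column, matching the summation in the regression equation.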

Description: # Logistic Regression
## Logistic Regression
### Introduction to Classification
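As a small sketch of the core idea this module introduces: classification predicts a discrete label, and logistic regression does so by squashing a linear score through the sigmoid function into a probability between 0 and 1.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

sigmoid(0.0)   # 0.5 — a score of zero means either class is equally likely
sigmoid(3.0)   # close to 1 — strongly positive score favors the positive class
```

Thresholding this probability (typically at 0.5) turns the continuous output into a class label.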

Description: # Logistic Regression: Model Building and Implementation
## Titanic Survivors - Data Selection & Preparation
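A hedged sketch of the kind of preparation this module performs. The miniature table below is invented for illustration (the real dataset would be loaded with `pd.read_csv`); the column names follow the commonly used Titanic schema, which is an assumption.

```python
import pandas as pd

# Invented miniature stand-in for the Titanic data.
df = pd.DataFrame({
    'Survived': [0, 1, 1, 0],
    'Pclass':   [3, 1, 2, 3],
    'Sex':      ['male', 'female', 'female', 'male'],
    'Age':      [22.0, 38.0, None, 35.0],
})

# Typical preparation: encode the categorical column numerically
# and fill missing ages with the median.
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
df['Age'] = df['Age'].fillna(df['Age'].median())

# Select features and target for the model.
X = df[['Pclass', 'Sex', 'Age']]
y = df['Survived']
```

After these steps the feature matrix is fully numeric with no missing values, which is what a logistic regression implementation expects.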

Description: # Support Vector Machines (SVMs)
## Introduction
Support Vector Machines are classifiers that separate datasets by introducing an optimal hyperplane between the multi-dimensional data points. A hyperplane is a multi-dimensional structure that generalizes a two-dimensional plane. If the dataset is two-dimensional, a line is fit that provides the best classification of the dataset. By "best classification" we do not necessarily mean a plane that classifies every point in the training dataset perfectly, but one that satisfies the criterion of lying farthest from the nearest points of each class. You can see from the figure below that a hyperplane classifies the dataset as shown.
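To make the "farthest hyperplane" idea concrete, here is a minimal NumPy-only sketch of a linear soft-margin SVM trained by sub-gradient descent on the hinge loss. This is an illustrative toy implementation, not the library approach a real project would use (e.g. scikit-learn's `SVC`); the data and hyperparameters are invented.

```python
import numpy as np

# Invented, linearly separable 2-D toy data: two well-separated clusters.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on the regularized hinge loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Points inside the margin (or misclassified) pull the
            # hyperplane toward them; others only shrink w (regularization).
            if yi * (xi @ w + b) < 1:
                w += lr * (yi * xi - 2 * lam * w)
                b += lr * yi
            else:
                w -= lr * 2 * lam * w
    return w, b

w, b = train_linear_svm(X, y)
preds = np.sign(X @ w + b)  # the fitted hyperplane separates both clusters
```

The regularization term is what pushes the solution toward the maximum-margin hyperplane rather than just any separating line.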