Data Intelligence Workshop
# The Data Story

## Clean and Messy Data

Here are a few examples of the kinds of data we encounter:

# Advanced k-means

## Preparing the 'Normal' Data

To start with clustering, let us consider datasets that follow a Gaussian ('normal') distribution with low variance. We can synthesize such a dataset using scikit-learn's `make_blobs` function. The centers of the Gaussian blobs need to be specified; in two dimensions, we specify the centers, the standard deviation, and the number of samples (2000). Here is the Gaussian normal distribution function:
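A minimal sketch of the data-preparation step described above, assuming scikit-learn; the specific centers, standard deviation, and number of blobs are illustrative choices, not values from the workshop:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthesize three Gaussian blobs in 2D: specify the centers,
# a low standard deviation, and 2000 samples in total.
centers = [(-4, -4), (0, 0), (4, 4)]
X, y_true = make_blobs(
    n_samples=2000,
    centers=centers,
    cluster_std=0.6,  # low variance -> well-separated 'normal' blobs
    random_state=42,
)

# Fit k-means with k equal to the number of blobs.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(X.shape)                        # (2000, 2)
print(kmeans.cluster_centers_.shape)  # (3, 2)
```

With such well-separated, low-variance blobs, the fitted centroids land close to the specified centers.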

# Maximum Likelihood Estimation (MLE)

* A model hypothesizes a relation between the unknown parameter(s) and the observed data.
* The goal of a statistical analysis is to estimate the unknown parameter(s) in the hypothesized model.
* The likelihood function provides a popular and widely used method for estimating unknown parameters.
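As a concrete illustration of the idea (not an example from the workshop): for a Gaussian model, maximizing the likelihood has a closed-form solution, so the estimates can be computed directly. The true parameters below are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data, assumed drawn from a Gaussian with unknown
# mean and variance (hypothetical values for illustration).
data = rng.normal(loc=5.0, scale=2.0, size=10_000)

# For a Gaussian, the likelihood is maximized by the sample mean
# and the (biased, 1/n) sample variance.
mu_hat = data.mean()
sigma2_hat = ((data - mu_hat) ** 2).mean()

print(mu_hat, sigma2_hat)  # close to the true values 5.0 and 4.0
```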

# Gaussian Mixture Models (GMMs)

## The Three Archers - Not So 'Normal' Data

In an archery training session, three archers are told to shoot at the same target. Assume the archers are shooting identical arrows, and at the end of the session they need to count their scores. What would be the best estimate of each archer's score? Assume the inner yellow ring has the highest score, and the score decreases as the ring your arrow lands in moves away from the center.
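The archery scenario above is a mixture of three roughly Gaussian point clouds, which is exactly what a GMM models. A minimal sketch with scikit-learn, using made-up aim points and spreads rather than data from the workshop:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical arrow landing points: each archer's shots scatter
# roughly Gaussian-like around a different aim point.
shots = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(200, 2)),
    rng.normal(loc=(2.0, 2.0), scale=0.5, size=(200, 2)),
    rng.normal(loc=(-2.0, 2.0), scale=0.5, size=(200, 2)),
])

# A 3-component GMM recovers the three archers' distributions and
# assigns each arrow to its most likely archer.
gmm = GaussianMixture(n_components=3, random_state=0).fit(shots)
labels = gmm.predict(shots)

print(gmm.means_.round(1))  # close to the three aim points
```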

# Linearly Inseparable Datasets

## Non-Convex Regions

# DBSCAN

DBSCAN has the ability to capture densely packed data points: it groups points that have enough neighbors within a given radius, somewhat like a nearest-neighbors search with tunable density parameters.
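A minimal sketch of DBSCAN on densely packed, non-convex data, assuming scikit-learn; the `eps` (neighborhood radius) and `min_samples` (density threshold) values are illustrative:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: dense, non-convex clusters that
# centroid-based methods like k-means cannot separate.
X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)

# eps is the neighborhood radius; min_samples is the minimum number
# of neighbors a point needs to be treated as a core point.
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = db.labels_  # -1 marks noise points

n_clusters = len(set(labels) - {-1})
print(n_clusters)  # 2
```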

# Spectral Clustering

Spectral clustering works by transforming the data into a subspace prior to clustering. This is incredibly useful when the data is high-dimensional, since it saves us the effort of running PCA or another dimensionality-reduction step ourselves before clustering. Spectral clustering works by computing an affinity matrix between the data points: the data is represented as a graph, and the affinity matrix encodes the edge weights. For the affinity function, we can use the RBF kernel or nearest neighbors.
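A minimal sketch with scikit-learn's `SpectralClustering`, using the nearest-neighbors affinity mentioned above (`affinity="rbf"` is the other option); the dataset and parameter values are illustrative:

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles

# Two concentric circles: the affinity graph separates them even
# though no linear boundary in the original space can.
X, _ = make_circles(n_samples=400, factor=0.5, noise=0.05, random_state=0)

# Build the affinity matrix from each point's nearest neighbors,
# then cluster in the spectral embedding of that graph.
sc = SpectralClustering(
    n_clusters=2,
    affinity="nearest_neighbors",
    n_neighbors=10,
    assign_labels="kmeans",
    random_state=0,
)
labels = sc.fit_predict(X)

print(labels.shape)  # (400,)
```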

# Agglomerative Clustering

## Algorithm
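A minimal sketch of agglomerative clustering with scikit-learn: start with every point as its own cluster and repeatedly merge the closest pair until the requested number of clusters remains. The synthetic dataset and the choice of Ward linkage are illustrative assumptions, not taken from the workshop.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.7, random_state=0)

# Bottom-up (agglomerative) merging; Ward linkage merges the pair of
# clusters that yields the smallest increase in within-cluster variance.
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)

print(len(set(labels)))  # 3
```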