# Graphical Models with pgmpy
# Probability and Bayesian Theory

## Additive and Multiplicative Rules of Probability

Let $A$ and $B$ be two events, and let $P$ denote the probability of occurrence of an event.
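The two rules named in the heading can be stated as follows (standard results, shown here for reference):

```latex
% Additive rule, for any two events A and B
P(A \cup B) = P(A) + P(B) - P(A \cap B)

% Multiplicative rule (the second equality holds when A and B are independent)
P(A \cap B) = P(A)\,P(B \mid A) = P(A)\,P(B) \quad \text{(if $A$, $B$ independent)}
```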

# Maximum Likelihood Estimation (MLE)

* A model hypothesizes a relation between the unknown parameter(s) and the observed data.
* The goal of a statistical analysis is to estimate the unknown parameter(s) in the hypothesized model.
* The likelihood function is a popular and widely used method for estimating unknown parameters.
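As a minimal sketch of the idea, consider estimating the parameter of a Bernoulli (coin-toss) model: the MLE has the closed form of the sample mean, and it maximizes the log-likelihood of the observed data. The data values below are illustrative.

```python
import math

def bernoulli_log_likelihood(p, data):
    """Log-likelihood of i.i.d. Bernoulli observations under parameter p."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)

def bernoulli_mle(data):
    """The MLE of a Bernoulli parameter is the sample mean (argmax of the likelihood)."""
    return sum(data) / len(data)

data = [1, 1, 1, 1, 0]  # 4 heads, 1 tail
p_hat = bernoulli_mle(data)
print(p_hat)  # 0.8
# The MLE maximizes the log-likelihood: compare against a nearby candidate value.
assert bernoulli_log_likelihood(p_hat, data) >= bernoulli_log_likelihood(0.5, data)
```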

# Conjugate Priors & The Beta Distribution

Consider a scenario where we count the number of heads in a coin toss experiment that follows a Bernoulli distribution. Suppose the number of samples is low, such as an experiment with 5 coin tosses in which we obtain 4 heads and 1 tail, or all 5 heads. The Maximum Likelihood Estimate (MLE) of the Bernoulli parameter is then 0.8 or 1.0 respectively. However, we know that an unbiased coin has a probability of 0.5, so the MLE is not an accurate estimate and is in fact far from it. Therefore, we must search for a probability distribution over the coin toss parameter so that our final estimate is more realistic. One way to obtain accurate estimates is to conduct more experiments, in our case more coin tosses, so that the estimated probability converges to its true value. Alternatively, we can assume that the probability of heads itself arises from a distribution. This prior distribution is known as a conjugate prior. A conjugate prior has the interesting property that it has the same functional form as the likelihood, so the posterior belongs to the same family; for the Bernoulli/binomial likelihood, the conjugate prior is the Beta distribution. We can now give this probability a Bayesian treatment. With data $x$ (the outcome of $N$ toss experiments with $k$ heads) and parameter $\theta$ (the probability of heads), the posterior is given by:

$$p(\theta \mid x) = \cfrac{p(x \mid \theta)\,p(\theta)}{\int_0^1 p(x \mid \theta')\,p(\theta')\,d\theta'}$$
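A sketch of Beta-Binomial conjugacy for the 4-heads-1-tail example above. With a Beta(a, b) prior, the posterior after observing heads and tails is Beta(a + heads, b + tails), so no integral needs to be evaluated; the Beta(2, 2) prior below is an illustrative assumption (a weak prior centred at 0.5).

```python
def beta_binomial_posterior(a, b, heads, tails):
    """Beta(a, b) is conjugate to the Bernoulli/binomial likelihood:
    the posterior is Beta(a + heads, b + tails), no integration needed."""
    return a + heads, b + tails

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# 5 tosses: 4 heads, 1 tail, with an illustrative Beta(2, 2) prior centred at 0.5
a_post, b_post = beta_binomial_posterior(2, 2, 4, 1)
print(beta_mean(a_post, b_post))  # posterior mean 6/9, about 0.667
print(4 / 5)                      # MLE 0.8, pulled toward 0.5 by the prior
```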

# Bayesian Theory Problems

### Binary Communication System

Consider a binary communication system where the input is a 1 with probability $p$ (and a 0 with probability $1-p$). The receiver makes an error with probability $\epsilon$, which means the received bit gets flipped [Alberto-1]. This can be illustrated as shown in the figure below:
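A plain-Python sketch of the posterior for this channel via Bayes' rule: given a received bit, what is the probability a 1 was sent? The numeric values for $p$ and $\epsilon$ are illustrative assumptions.

```python
def posterior_sent_one(p1, eps, received):
    """P(sent = 1 | received) for a binary symmetric channel where
    the transmitted bit is flipped with probability eps."""
    # Likelihood of the observation under each hypothesis about the sent bit
    like_if_1 = (1 - eps) if received == 1 else eps
    like_if_0 = eps if received == 1 else (1 - eps)
    prior1, prior0 = p1, 1 - p1
    evidence = like_if_1 * prior1 + like_if_0 * prior0  # law of total probability
    return like_if_1 * prior1 / evidence                # Bayes' rule

# Illustrative numbers: P(sent = 1) = 0.5, error probability 0.1
print(posterior_sent_one(0.5, 0.1, 1))  # 0.9
```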

# Joint Distributions

Consider two discrete random variables $X$ and $Y$. The function given by $f(x, y) = P(X = x, Y = y)$ for each pair of values $(x, y)$ within the ranges of $X$ and $Y$ is called the joint probability distribution of $X$ and $Y$.
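A tiny sketch of a joint pmf represented as a dictionary over $(x, y)$ pairs, with a marginal recovered by summing over the other variable; the probability values are made up for illustration.

```python
# Hypothetical joint pmf f(x, y) = P(X = x, Y = y) over a 2x2 range
joint = {
    (0, 0): 0.3, (0, 1): 0.2,
    (1, 0): 0.1, (1, 1): 0.4,
}
assert abs(sum(joint.values()) - 1.0) < 1e-12  # a valid pmf sums to 1

def marginal_x(joint, x):
    """P(X = x), obtained by summing the joint over all values of y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

print(marginal_x(joint, 0))  # 0.5
print(marginal_x(joint, 1))  # 0.5
```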

# Independent and Identical Random Variables

When observations drawn from a random sample are independent of each other but have the same probability distribution, the observations are said to be independent and identically distributed random variables. Let's say that from a population of transactions we draw a random sample and observe whether each transaction is fraudulent or not. Assuming that the probability of a fraudulent transaction is $p$, each transaction in the random sample has the same probability distribution, but the transactions themselves are independent of each other. Hence, these are said to be independent and identical random variables. An interesting point to note is that random variables are independent only when their joint probability distribution is the product of the individual marginal probability distributions. (For further reading refer to the following link: https://inst.eecs.berkeley.edu/~cs70/sp13/notes/n17.pdf)
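The product rule for independent variables means the joint probability of an i.i.d. Bernoulli sample factorizes into a product of identical marginals, as in this minimal sketch (the sample and the value of $p$ are illustrative):

```python
def joint_prob_iid(p, sample):
    """For i.i.d. observations the joint pmf factorizes into a product of
    identical marginals: P(x1, ..., xn) = prod_i P(xi)."""
    prob = 1.0
    for x in sample:
        prob *= p if x == 1 else (1 - p)
    return prob

# Hypothetical fraud indicators for 4 sampled transactions, with P(fraud) = 0.1
sample = [0, 1, 0, 0]
print(joint_prob_iid(0.1, sample))  # 0.9 * 0.1 * 0.9 * 0.9 = 0.0729
```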

# Bayesian Networks

# Fraud Modeling Example with pgmpy

pgmpy is one of the popular packages for Bayesian Network modeling. We shall continue to use the fraud modeling example to visualize our network. pgmpy is good for simpler problems, for visualizing the independencies and CPDs. It doesn't work very well for high-dimensional problems. There are other toolkits available, such as:

* WinMine Toolkit by Microsoft: https://www.microsoft.com/en-us/research/project/winmine-toolkit/

# Features of a Bayesian Network

So far we have seen that:

* A Bayesian Network represents a joint probability distribution over a set of random variables.
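The joint distribution a Bayesian Network encodes factorizes over the graph, with one conditional probability distribution per node given its parents:

```latex
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Parents}(X_i)\big)
```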

# Credit Approval Model using a Bayesian Network

Let us look at a credit approval process example. Please note that the model/process shown here does not closely follow any real-life approval process. This model is generated completely from scratch, solely for the purposes of practice and easy explanation. There are two factors, Outstanding Loan (OL) and Payment History (PH), which are independent of each other and influence another factor, Credit Rating (CR). Credit Rating and Income Level (IL) are in turn two independent factors which influence the Interest Rate (IR) of a credit line that would be extended to a customer. Depending upon CR and IL, a customer may receive a credit/loan at a premium, par, or discounted interest rate.
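The network structure above implies the factorization $P(OL, PH, CR, IL, IR) = P(OL)\,P(PH)\,P(CR \mid OL, PH)\,P(IL)\,P(IR \mid CR, IL)$. Below is a plain-Python sketch of that factorization (in pgmpy these tables would live in `TabularCPD` objects). All CPD numbers are hypothetical, and the variables are simplified to binary states for brevity, although the text describes three interest-rate levels.

```python
from itertools import product

# Hypothetical CPDs, binary for brevity (0 = good/low/discounted, 1 = bad/high/premium)
P_OL = {0: 0.7, 1: 0.3}
P_PH = {0: 0.8, 1: 0.2}
P_IL = {0: 0.5, 1: 0.5}
P_CR_good = {(0, 0): 0.9, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.1}   # P(CR=good | OL, PH)
P_IR_disc = {(0, 0): 0.8, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.05}  # P(IR=disc | CR, IL)

def joint(ol, ph, cr, il, ir):
    """P(OL, PH, CR, IL, IR) as the product of the CPDs along the graph."""
    p_cr = P_CR_good[(ol, ph)] if cr == 0 else 1 - P_CR_good[(ol, ph)]
    p_ir = P_IR_disc[(cr, il)] if ir == 0 else 1 - P_IR_disc[(cr, il)]
    return P_OL[ol] * P_PH[ph] * p_cr * P_IL[il] * p_ir

# Sanity check: the joint sums to 1 over all assignments
total = sum(joint(*a) for a in product((0, 1), repeat=5))
print(round(total, 10))  # 1.0
```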

# Gibbs Sampling

Bayesian inference generates a full posterior probability distribution over a set of random variables. The Gibbs Sampling algorithm is based on the Markov Chain Monte Carlo (MCMC) technique. The underlying logic of MCMC sampling is that we can estimate any desired expectation by ergodic averages [Gibbs].

## Gibbs Sampling in pgmpy
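Before turning to pgmpy's sampler, here is a minimal hand-rolled Gibbs sampler over two binary variables with a known joint (the probability table is illustrative). Each sweep resamples one variable from its conditional given the other; the ergodic average of the samples then estimates the marginal $P(A = 1)$.

```python
import random

# Illustrative joint over two binary variables A, B
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def cond(var, other_val):
    """P(var = 1 | other = other_val), read off from the joint table."""
    if var == "A":
        num = joint[(1, other_val)]
        den = joint[(0, other_val)] + joint[(1, other_val)]
    else:
        num = joint[(other_val, 1)]
        den = joint[(other_val, 0)] + joint[(other_val, 1)]
    return num / den

def gibbs(n_samples, burn_in=1000, seed=0):
    """Alternately resample A | B and B | A; keep samples after burn-in."""
    rng = random.Random(seed)
    a, b = 0, 0
    samples = []
    for t in range(burn_in + n_samples):
        a = 1 if rng.random() < cond("A", b) else 0  # resample A given B
        b = 1 if rng.random() < cond("B", a) else 0  # resample B given A
        if t >= burn_in:
            samples.append((a, b))
    return samples

samples = gibbs(20000)
est = sum(a for a, _ in samples) / len(samples)
print(est)  # close to the true marginal P(A = 1) = 0.5
```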

# Gaussian Mixture Models (GMMs)

## The Three Archers - Not So 'normal' Data

In an archery training session, three archers are told to shoot at the target. Assume that the archers are shooting the same arrows, and that at the end of the competition they need to count their scores. What would be the best estimate of their scores? Assume that the inner yellow ring has the highest score and that the score decreases as the ring your arrow lands in moves away from the center.

# Jensen's Inequality that Guarantees Convergence of the EM Algorithm

Jensen's Inequality states that given $g$, a strictly convex function, and $X$, a random variable, then

$$E[g(X)] \geq g(E[X])$$
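A familiar special case makes the inequality concrete: taking the strictly convex function $g(x) = x^2$ recovers the fact that variance is non-negative.

```latex
E[X^2] \;\geq\; (E[X])^2
\quad\Longleftrightarrow\quad
\operatorname{Var}(X) \;=\; E[X^2] - (E[X])^2 \;\geq\; 0
```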

# Generative Models

Based on their application, machine learning models can be broadly divided into two categories: Discriminative and Generative. A Discriminative Model is one which classifies, segregates, or differentiates the data. A Generative Model is one which, given a training dataset, generates new sample data following the same distribution. Generative models belong to the class of unsupervised learning.

<img src="../images/types_of_models.png">

# Markov Network

## Introduction to Markov Random Fields

A Markov Network, or Markov Random Field, is an undirected graph where the nodes represent random variables and the edges represent connections between the random variables. These graphs can be cyclic, unlike Bayesian networks.
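Because the graph is undirected, a Markov network's joint distribution is defined not by per-node CPDs but as a product of non-negative potential functions over cliques, normalized by a partition function $Z$. Here is a plain-Python sketch for the smallest possible case, a single pairwise potential over two binary nodes; the potential values are invented for illustration.

```python
from itertools import product

# Hypothetical pairwise potential psi(x, y) over two binary nodes (unnormalized;
# the larger values on (0,0) and (1,1) encode that the nodes "prefer" to agree)
psi = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}

# Partition function Z normalizes the product of potentials into a distribution
Z = sum(psi[a] for a in product((0, 1), repeat=2))

def joint(x, y):
    """P(x, y) = psi(x, y) / Z for this single-clique graph."""
    return psi[(x, y)] / Z

# Marginalize over y to get P(X = x)
p_x0 = joint(0, 0) + joint(0, 1)
print(p_x0)  # 5/10 = 0.5
```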

# Probabilistic Inferences in Graphical Models

The advantage of PGMs over standard probabilistic ways of determining conditional probability distributions is that a PGM expresses the model as a joint distribution over all random variables. This then allows us to marginalize over the random variables to determine quantities of interest. The joint probability distribution associated with a given graph can be expressed as a product of potential functions associated with subsets of nodes in the graph.

## Bayesian Networks