Graphical Models with PGMPY

Probability and Bayesian Theory | Continue |
---|

Description: # Probability and Bayesian Theory ## Additive and Multiplicative Rules of Probability Let $A$ and $B$ be two events, and let $P$ denote the probability of occurrence of an event.
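
As a minimal sketch of the two rules named in this module, the snippet below checks the additive rule $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ and the multiplicative rule $P(A \cap B) = P(A)\,P(B|A)$ on a single roll of a fair die. The die, the events `A` and `B`, and the helper names are illustrative assumptions, not part of the course material.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (illustrative assumption).
OUTCOMES = set(range(1, 7))
A = {o for o in OUTCOMES if o % 2 == 0}  # event A: the roll is even
B = {o for o in OUTCOMES if o > 3}       # event B: the roll is greater than 3


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(OUTCOMES))


def cond_prob(event, given):
    """Conditional probability P(event | given)."""
    return Fraction(len(event & given), len(given))


# Additive rule: P(A or B) = P(A) + P(B) - P(A and B)
additive_lhs = prob(A | B)
additive_rhs = prob(A) + prob(B) - prob(A & B)

# Multiplicative rule: P(A and B) = P(A) * P(B | A)
mult_lhs = prob(A & B)
mult_rhs = prob(A) * cond_prob(B, A)

print(additive_lhs, additive_rhs)
print(mult_lhs, mult_rhs)
```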

Maximum Likelihood Estimation (MLE) | Continue |
---|

Description: # Maximum Likelihood Estimation (MLE) * A model hypothesizes a relation between the unknown parameter(s) and the observed data. * The goal of a statistical analysis is to estimate the unknown parameter(s) in the hypothesized model. * Maximizing the likelihood function is a popular method for estimating unknown parameters.
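
A small sketch of the idea, assuming a Bernoulli model (a coin with unknown probability of heads `theta`) and a made-up data set of five tosses: the estimate is the value of `theta` that maximizes the log-likelihood of the observed data. The grid search is only for illustration; for this model the maximizer is also available in closed form as the sample mean.

```python
import math


def bernoulli_log_likelihood(theta, data):
    """Log-likelihood of 0/1 observations under a Bernoulli(theta) model."""
    return sum(math.log(theta) if x == 1 else math.log(1 - theta) for x in data)


# Hypothetical observations: 1 = heads, 0 = tails.
data = [1, 1, 1, 0, 1]

# Evaluate the log-likelihood on a grid of candidate parameter values.
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: bernoulli_log_likelihood(t, data))

# Closed-form maximizer for the Bernoulli model: the fraction of heads.
theta_closed_form = sum(data) / len(data)

print(theta_grid, theta_closed_form)
```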

Conjugate Priors & The Beta Distribution | Continue |
---|

Description: # Conjugate Priors & The Beta Distribution Consider a scenario where we count the number of heads in a series of coin tosses, each of which follows a Bernoulli distribution. Suppose the number of samples is small, such as an experiment with 5 coin tosses in which we obtain 4 heads and 1 tail, or all 5 heads. The Maximum Likelihood Estimate (MLE) of the Bernoulli parameter then puts the probability of heads at 0.8 or 1.0 respectively. However, we know that an unbiased coin has a probability of heads of 0.5, so the MLE is not an accurate estimate here and is in fact far from the true value. One way to obtain a more accurate estimate is to conduct more experiments, in our case more coin tosses, so that the estimated probability converges to its true value. When more data is not available, we can instead assume that the probability of heads itself arises from a probability distribution, so that our final estimate is more realistic. This distribution is the prior, and it is called a conjugate prior when the resulting posterior belongs to the same family of distributions as the prior. For a Bernoulli or binomial likelihood the conjugate prior is the Beta distribution, whose density has the same functional form in $\theta$ as the binomial likelihood. We can now give this problem a Bayesian treatment. Let $\theta$ denote the probability of heads and $x$ the observed data, that is, $k$ heads in $N$ tosses. The posterior distribution of $\theta$ is given by: $$ p(\theta | x) = \cfrac{p(x|\theta)p(\theta)}{\int_0^1 p(x|\theta')p(\theta')\,d\theta'}$$
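
The following sketch illustrates the conjugate update for the two small experiments mentioned above. It relies on the standard result that a Beta($\alpha$, $\beta$) prior combined with $k$ heads and $N - k$ tails gives a Beta($\alpha + k$, $\beta + N - k$) posterior. The choice of a Beta(2, 2) prior and the function names are assumptions made for illustration.

```python
def beta_posterior(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing Bernoulli outcomes."""
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Assumed prior: Beta(2, 2), symmetric around 0.5 (a mild belief in a fair coin).
PRIOR = (2.0, 2.0)

results = {}
for heads, tails in [(4, 1), (5, 0)]:
    mle = heads / (heads + tails)
    post_alpha, post_beta = beta_posterior(*PRIOR, heads, tails)
    results[(heads, tails)] = (mle, beta_mean(post_alpha, post_beta))
    print(heads, tails, mle, beta_mean(post_alpha, post_beta))
```

With so few tosses the posterior mean is pulled toward the prior mean of 0.5, which is the behavior the text motivates; with many tosses the data would dominate the prior.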

Bayesian Theory Problems | Continue |
---|

Description: # Bayesian Theory Problems ### Binary Communication System Consider a binary communication system where the input is either a 0 or a 1 with probability p. The receiver makes an error with probability $\epsilon$, in which case the received bit is flipped [Alberto-1]. This can be illustrated as shown in the figure below:
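
A minimal sketch of the kind of question this setup leads to: given that a 1 was received, how likely is it that a 1 was sent? The snippet assumes that a 1 is sent with probability `p`, a 0 with probability `1 - p`, and that either bit is flipped with the same probability `eps`. These readings of the problem statement, the numeric values, and the function name are assumptions for illustration.

```python
def posterior_sent_one(p, eps):
    """P(sent = 1 | received = 1) by Bayes' theorem.

    Assumes a 1 is sent with probability p, a 0 with probability 1 - p,
    and each bit is flipped independently with probability eps.
    """
    sent_one_received_one = (1 - eps) * p   # 1 sent and not flipped
    sent_zero_received_one = eps * (1 - p)  # 0 sent and flipped to 1
    return sent_one_received_one / (sent_one_received_one + sent_zero_received_one)


# Hypothetical values: equally likely inputs and a 10% error rate.
posterior = posterior_sent_one(p=0.5, eps=0.1)
print(posterior)
```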

Joint Distributions | Continue |
---|

Description: # Joint Distributions Consider two discrete random variables X and Y. The function given by f(x, y) = P(X = x, Y = y) for each pair of values (x, y) within the range of X and Y is called the joint probability distribution of X and Y.
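
To make the definition concrete, here is a small sketch that stores a joint distribution f(x, y) as a table and recovers the marginal distributions of X and Y by summing over the other variable. The probability values are made up for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution f(x, y) = P(X = x, Y = y) for binary X and Y.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(4, 10),
}


def marginal(joint_table, axis):
    """Sum the joint table over the other variable (axis 0 is X, axis 1 is Y)."""
    out = defaultdict(Fraction)
    for values, p in joint_table.items():
        out[values[axis]] += p
    return dict(out)


p_x = marginal(joint, 0)
p_y = marginal(joint, 1)
print(p_x, p_y)
```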

Independent and Identical Random Variables | Continue |
---|

Description: # Independent and Identical Random Variables When observations drawn from a random sample are independent of each other but have the same probability distribution, the observations are said to be independent and identically distributed (i.i.d.) random variables. Let's say that, from a population of transactions, we draw a random sample and observe whether each transaction is fraudulent or not. Assuming that the probability of a transaction being fraudulent is 'p', each transaction in the random sample has the same probability distribution, but the transactions themselves are independent of each other. Hence, these are said to be independent and identical random variables. An interesting point to note is that random variables are independent only when their joint probability distribution is the product of their individual marginal probability distributions. (For further reading refer to the following link: https://inst.eecs.berkeley.edu/~cs70/sp13/notes/n17.pdf)
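
As a worked form of the product rule for the transaction example, suppose (as a notational assumption) that $X_i = 1$ when the $i$-th sampled transaction is fraudulent and $X_i = 0$ otherwise, with $P(X_i = 1) = p$ for every $i$. Independence lets the joint distribution of the sample factor into the marginals, and the identical distribution makes every factor depend on the same $p$:

$$ P(X_1 = x_1, \dots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i) = p^{k}(1-p)^{n-k}, \qquad k = \sum_{i=1}^{n} x_i $$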

Bayesian Networks | Continue |
---|

Fraud Modeling Example with pgmpy | Continue |
---|

Description: # Fraud Modeling Example with pgmpy pgmpy is one of the popular packages for Bayesian Network modeling. We shall continue to use the fraud modeling example to visualize our network. pgmpy works well for simpler problems and for visualizing the independencies and CPDs, but it does not work very well for high-dimensional problems. Other toolkits are available, such as: * WINMINE by Microsoft: https://www.microsoft.com/en-us/research/project/winmine-toolkit/

Features of a Bayesian Network | Continue |
---|

Description: # Features of a Bayesian Network So far we have seen that: * A Bayesian Network represents a joint probability distribution over a set of random variables.

Credit Approval Model using a Bayesian Network | Continue |
---|

Description: # Credit Approval Model using a Bayesian Network Let us look at a credit approval process example. Please note that the model/process shown here does not closely follow any real-life approval process. The model is made up entirely from scratch, solely for the purpose of practice and easy explanation. There are two factors, Outstanding Loan (OL) and Payment History (PH), which are independent of each other and influence another factor, Credit Rating (CR). Credit Rating and Income Level (IL) are in turn two independent factors which influence the Interest Rate (IR) of a credit line that would be extended to a customer. Depending upon CR and IL, a customer may receive a credit/loan at a premium, par, or discounted interest rate.
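
Reading the dependencies described above as a directed graph (OL and PH pointing to CR, and CR and IL pointing to IR), the joint distribution factorizes into one term per variable conditioned on its parents. This is a sketch derived only from the structure stated in the description:

$$ P(OL, PH, CR, IL, IR) = P(OL)\,P(PH)\,P(CR \mid OL, PH)\,P(IL)\,P(IR \mid CR, IL) $$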

Gibbs Sampling | Continue |
---|

Description: # Gibbs Sampling Bayesian inference generates a full posterior probability distribution over a set of random variables. The Gibbs Sampling algorithm is based on the Markov Chain Monte Carlo (MCMC) technique. The underlying logic of MCMC sampling is that we can estimate any desired expectation by ergodic averages [Gibbs]. ## Gibbs Sampling in pgmpy
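
Before turning to pgmpy, the idea of estimating an expectation by an ergodic average can be sketched in plain Python. The sampler below alternates between drawing X from P(X | Y) and Y from P(Y | X) for two binary variables, then averages the sampled values of X. The joint table, burn-in length, and seed are assumptions for illustration, and the code does not use or mirror the pgmpy API.

```python
import random

# Hypothetical joint distribution over two binary variables (X, Y).
JOINT = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}


def sample_x_given_y(y, rng):
    """Draw X from P(X | Y = y), computed from the joint table."""
    p_x1 = JOINT[(1, y)] / (JOINT[(0, y)] + JOINT[(1, y)])
    return 1 if rng.random() < p_x1 else 0


def sample_y_given_x(x, rng):
    """Draw Y from P(Y | X = x), computed from the joint table."""
    p_y1 = JOINT[(x, 1)] / (JOINT[(x, 0)] + JOINT[(x, 1)])
    return 1 if rng.random() < p_y1 else 0


def gibbs(n_samples, burn_in=1000, seed=0):
    """Run a two-variable Gibbs sampler and return the post burn-in samples."""
    rng = random.Random(seed)
    x, y = 0, 0
    samples = []
    for step in range(burn_in + n_samples):
        x = sample_x_given_y(y, rng)  # update X given the current Y
        y = sample_y_given_x(x, rng)  # update Y given the new X
        if step >= burn_in:
            samples.append((x, y))
    return samples


samples = gibbs(20000)
# Ergodic average of X; the exact value from the table is 0.1 + 0.4 = 0.5.
est_p_x1 = sum(x for x, _ in samples) / len(samples)
print(est_p_x1)
```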

Gaussian Mixture Models (GMMs) | Continue |
---|

Description: # Gaussian Mixture Models (GMMs) ## The Three Archers - Not So 'normal' Data In an archery training session, three archers are told to shoot at the target. Assume that the archers are shooting the same arrows and that, at the end of the session, they need to count their scores. What would be the best estimate of their scores? Assume that the inner yellow circle has the highest score and that the score decreases as the circle the arrow lands in moves away from the center.

Jensen's Inequality that Guarantees Convergence of the EM Algorithm | Continue |
---|

Description: # Jensen's Inequality that Guarantees Convergence of the EM Algorithm Jensen's Inequality states that, given a convex function $g$ and a random variable $X$, $$ E[g(X)] \geq g(E[X]) $$
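
A quick numerical check of the inequality, assuming the convex function $g(x) = x^2$ and a random variable that takes four made-up values with equal probability:

```python
def g(x):
    """A convex function chosen for illustration."""
    return x * x


def mean(values):
    """Expectation under equally likely values."""
    return sum(values) / len(values)


# Hypothetical values of X, each with probability 1/4.
xs = [1.0, 2.0, 3.0, 10.0]

expected_g = mean([g(x) for x in xs])  # E[g(X)]
g_of_expected = g(mean(xs))            # g(E[X])
print(expected_g, g_of_expected)
```

For this particular choice of $g$, the gap between the two sides equals the variance of $X$.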

Generative Models | Continue |
---|

Description: # Generative Models Based on application, machine learning models can be broadly divided into two categories: Discriminative and Generative. A Discriminative Model is one which classifies, segregates or differentiates the data. A Generative Model is one which, given a training dataset, generates new sample data following the same distribution. It is a class of models belonging to unsupervised learning. <img src="../images/types_of_models.png">

Markov Network | Continue |
---|

Description: # Markov Network ## Introduction to Markov Random Fields A Markov Network or Markov Random Field is an undirected graph where the nodes represent the random variables and the edges represent the connections between the random variables. Unlike Bayesian networks, these graphs may contain cycles.

Probabilistic Inference in Graphical Models | Continue |
---|

Description: # Probabilistic Inference in Graphical Models The advantage of PGMs over standard probabilistic ways of determining conditional probability distributions is that they allow the graphical model to be expressed as a joint distribution over all random variables. This in turn allows us to marginalize over the random variables to determine quantities of interest. The joint probability distribution associated with a given graph can be expressed as a product over potential functions associated with subsets of nodes in the graph. ## Bayesian Networks
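
The two steps described above, building the joint distribution as a product of potential functions and then marginalizing, can be sketched by brute force on a tiny undirected chain A - B - C of binary variables. The graph, the potential values, and the function names are assumptions for illustration; enumeration like this is only feasible for very small models.

```python
from itertools import product


# Hypothetical pairwise potentials on the edges A - B and B - C.
def phi_ab(a, b):
    return 3.0 if a == b else 1.0


def phi_bc(b, c):
    return 2.0 if b == c else 1.0


# Unnormalized joint: product of the potentials for each assignment (a, b, c).
states = list(product([0, 1], repeat=3))
unnormalized = {s: phi_ab(s[0], s[1]) * phi_bc(s[1], s[2]) for s in states}

# Partition function Z normalizes the product into a probability distribution.
Z = sum(unnormalized.values())
joint = {s: value / Z for s, value in unnormalized.items()}

# Marginalize over B and C to obtain P(A = 1).
p_a1 = sum(p for (a, _, _), p in joint.items() if a == 1)
print(Z, p_a1)
```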

Markov Models | Continue |
---|

Description: # Markov Models ## Introduction to Markov Model A Markov Model is a stochastic model that models sequential or temporal data. In other words, it is used for modelling events that may occur repeatedly over time or predictable events that occur over time.
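
As a minimal sketch of modelling a sequence over time, the snippet below assumes a first-order Markov chain with two made-up weather states, where the distribution of the next state depends only on the current state. The state names and transition probabilities are hypothetical.

```python
STATES = ["sunny", "rainy"]

# Hypothetical transition probabilities P(next state | current state).
TRANSITIONS = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}


def step(dist):
    """Propagate a distribution over states one time step forward."""
    return {
        nxt: sum(dist[cur] * TRANSITIONS[cur][nxt] for cur in STATES)
        for nxt in STATES
    }


# Start from a certainly sunny day and move 50 steps forward in time.
dist = {"sunny": 1.0, "rainy": 0.0}
for _ in range(50):
    dist = step(dist)

print(dist)
```

After many steps the distribution settles near the chain's stationary distribution, which for these numbers puts probability 5/6 on "sunny".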

Boltzmann Machines | Continue |
---|

Description: # Boltzmann Machines A Boltzmann machine is a stochastic, recurrent neural network consisting of at least two layers, visible and hidden. The visible and hidden layers consist of visible nodes (**V<sub>1</sub>, V<sub>2</sub>, V<sub>3</sub>...V<sub>n</sub>**) and hidden nodes (**H<sub>1</sub>, H<sub>2</sub>, H<sub>3</sub>...H<sub>m</sub>**) respectively. Each node in the visible layer is connected to every node in the hidden layer. A Restricted Boltzmann machine is a type of Boltzmann machine in which visible-visible and hidden-hidden connections, i.e., connections within a layer, are not allowed. The visible layer is the input layer. This restriction reduces the total number of connections in the network and hence enables practical application of a Boltzmann machine. <img src="../images/BM.png" style="height:50vh;">
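
To give a sense of how much the restriction removes, the sketch below counts connections for n visible and m hidden nodes. It assumes the unrestricted machine connects every pair of nodes, while the restricted machine keeps only the visible-to-hidden connections. The layer sizes and function names are hypothetical.

```python
def full_connections(n_visible, n_hidden):
    """Connections when every pair of nodes is linked (assumed unrestricted case)."""
    total = n_visible + n_hidden
    return total * (total - 1) // 2


def restricted_connections(n_visible, n_hidden):
    """Connections when only visible-to-hidden links are kept."""
    return n_visible * n_hidden


# Hypothetical layer sizes.
n, m = 4, 3
print(full_connections(n, m), restricted_connections(n, m))
```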
