Probability is a measure of how likely it is that an event will occur. The probability that event $A$ occurs is denoted by $P(A)$ or $\Pr(A)$.
Probability axioms:
$$ 0 \leq P(E)\leq 1, \quad \forall E \in \{\text{events}\} $$
$$ P(\Omega) = 1 $$
$$ P(A \cup B) = P(A) + P(B) - P(A \cap B), $$ which reduces to the additivity axiom $P(A \cup B) = P(A) + P(B)$ when $A$ and $B$ are mutually exclusive ($P(A \cap B)=0$).
The probability that two independent events $E$ and $F$ (e.g., two separate coin flips) both happen is given by
$$ P(E, F) = P(E) * P(F) $$
More generally, the joint probability can be written in terms of the probability of $E$ given that the event $F$ happens:
$$ P(E, F) = P(E|F) * P(F) \neq P(E) * P(F) $$
where $P(E|F)$ is the probability of $E$, "conditional on $F$".
Based on the fact $$ P(E, F) = P(F, E), $$ one finds the relation $$ P(E|F) * P(F) = P(F|E) * P(E) $$ which can be rewritten as
\begin{align} P(E|F) &= \dfrac{P(F|E) * P(E)}{P(F)} \nonumber\\ &= \dfrac{P(F|E) * P(E)}{P(F|E)*P(E) + P(F|\bar E)*P(\bar E)}, \end{align}
with $P(\bar E) = 1 - P(E)$.
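As a numerical illustration, here is a minimal sketch of Bayes' theorem with made-up numbers: a hypothetical test for a condition $E$ with sensitivity $P(F|E)=0.99$, false-positive rate $P(F|\bar E)=0.05$, and prior $P(E)=0.01$.

p_E = 0.01             # prior P(E) (assumed for illustration)
p_F_given_E = 0.99     # P(F|E), test sensitivity (assumed)
p_F_given_notE = 0.05  # P(F|~E), false-positive rate (assumed)

# denominator: total probability P(F) = P(F|E)P(E) + P(F|~E)P(~E)
p_F = p_F_given_E * p_E + p_F_given_notE * (1 - p_E)
p_E_given_F = p_F_given_E * p_E / p_F  # posterior P(E|F)
print(p_E_given_F)  # ~0.167: even a positive test leaves P(E|F) well below 1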
For a continuous random variable, the probability of the variable falling within a particular range of values is given by the integral of the variable's probability density function (pdf) over that range.
The normal distribution $N(\mu, \sigma^2)$ has probability density: $$ f(x) = \frac{1}{\sqrt{2\pi}\sigma}\; e^{-(x-\mu)^2/(2\sigma^2)}. $$
A pdf $f(x)$ must satisfy:
$$ f(x)\ge0,\quad \int^{\infty}_{-\infty} f(x)\, dx =1. $$
$$ P(a\leq x \leq b) = \int^b_a f(x)dx $$
The cumulative distribution function (cdf) gives the probability of $X$ taking a value less than or equal to $x$:
$$ F_X(x) = P(X\leq x)=\int^x_{-\infty} f_X(t)dt $$
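These definitions can be checked numerically with scipy (a minimal sketch; the interval $[-1, 1]$ and the standard normal are arbitrary choices):

from scipy.stats import norm
from scipy.integrate import quad

a, b = -1.0, 1.0
p_int, _ = quad(norm.pdf, a, b)    # P(a <= x <= b) by integrating the pdf
p_cdf = norm.cdf(b) - norm.cdf(a)  # the same probability from the cdf
print(p_int, p_cdf)                # both ~0.6827 for the standard normal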
If a random variable $x$ is given and its distribution admits a probability density function $f(x)$, then the expected (mean) value of $x$ (if the expected value exists) can be calculated as $$ \operatorname{E}[x] = \int_{-\infty}^\infty x\,f(x)\,dx. $$
The variance of the random variable $x$ is defined as $$ V[x] = E[(x-E[x])^2] = E[x^2]-E^2[x], $$ which satisfies $V[cx] = c^2 V[x]$ and $V[x+c] = V[x]$, where $c$ is a constant.
The standard deviation is defined as $$ D[x] = \sqrt{V[x]}. $$
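A quick numerical sanity check of $V[x]=E[x^2]-E^2[x]$ and $D[x]=\sqrt{V[x]}$ on standard normal samples (a sketch; the sample size is arbitrary):

import numpy as np

samples = np.random.randn(1_000_000)                     # draws from N(0, 1)
var_def = np.mean((samples - samples.mean())**2)         # E[(x-E[x])^2]
var_moments = np.mean(samples**2) - np.mean(samples)**2  # E[x^2] - E^2[x]
print(var_def, var_moments, np.sqrt(var_def))            # all three ~1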
The $k$-th (raw) moment of $x$ is defined as $$ \mu_k = E[x^k]. $$
The moment-generating function is defined as $$ M_x(t) = E[e^{tx}]=\sum_x e^{tx}f(x), $$ where $$ e^{tx} = 1 + tx + \dfrac{t^2x^2}{2!}+\cdots. $$ One can prove that $\dfrac{d^{k}}{dt^k}M_x(t)\Big\vert_{t=0}=\mu_k$, from which one finds all the moments of the distribution.
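This moment relation can be verified symbolically for a Bernoulli variable, whose MGF is $M(t) = (1-p) + pe^t$ (a sketch using sympy, which is not otherwise used in this notebook; the symbols t_, p_ are named to avoid shadowing scipy.stats.t below):

import sympy as sp

t_, p_ = sp.symbols('t p')
M = (1 - p_) + p_ * sp.exp(t_)   # Bernoulli MGF: E[e^{tx}]
for k in (1, 2, 3):
    mu_k = sp.diff(M, t_, k).subs(t_, 0)
    print(k, sp.simplify(mu_k))  # every raw moment equals p, since x^k = x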
The central limit theorem states that when successive random samples of size $n$ are drawn from a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample means becomes approximately normal, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, as the sample size $n$ becomes larger, irrespective of the shape of the population distribution.
$$ z = \dfrac{\bar x-\mu}{\sigma/\sqrt{n}} \sim N(0,1) $$
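A minimal simulation of this statement (a sketch; the exponential population, which is strongly skewed, and the sample sizes are arbitrary choices):

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
pop_mu = pop_sigma = 1.0  # mean and std of Exponential(1)
for n in (2, 30, 500):
    xbar = rng.exponential(1.0, size=(100_000, n)).mean(axis=1)  # sample means
    z = (xbar - pop_mu) / (pop_sigma / np.sqrt(n))
    print(n, round(skew(z), 2))  # skewness ~ 2/sqrt(n) -> 0: z approaches N(0,1)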
import matplotlib.pyplot as plt
from scipy.stats import norm
import numpy as np
import collections
A Bernoulli random variable $X$ takes the value 1 with probability $p$ and the value 0 with probability $q = 1-p$: $$ \Pr(X=1) = p = 1 - \Pr(X=0) = 1 - q. $$
$$ f(k, p) = \begin{cases} p & \text{if }k=1, \\ q = 1-p & \text {if } k = 0. \end{cases} $$
Note: $\{X_1, X_2,\cdots, X_n\}$ are $n$ independent Bernoulli variables, and their sum $X = \sum^n_{i=1} X_i$ follows the binomial distribution $B(n, p)$. Thus
\begin{eqnarray} \mu_1 = E[X] = \sum^n_{k=0} k \cdot f(k, n, p) = np, \end{eqnarray} where the probability of getting exactly $k$ successes ($X_i=1$) in $n$ trials is given by the probability mass function $$ f(k, n, p) = C^k_n p^k (1-p)^{n-k}. $$
The above relation can be derived in a simpler way as $$ \mu = E[X] = E[X_1+X_2+\cdots+X_n]= \sum^n_{i=1}E[X_i]=np. $$
\begin{align} V[X] \equiv E[X^2] - E^2[X] = \sum^n_{i=1} (E[X^2_i] -E^2[X_i]) &=\sum^n_{i=1} V[X_i] \nonumber\\ &= \sum^n_{i=1} p(1-p)\nonumber\\ &=np(1-p). \end{align}
The moment-generating function $M_x(t)$ of $B(n,p)$ can be obtained by using the binomial theorem,
$$ M_x(t) = \sum_x e^{tx}f(x) =\sum^n_{x=0} e^{tx} C^x_n p^x q^{n-x} =\sum^n_{x=0} C^x_n (pe^{t})^x q^{n-x} =(pe^t+q)^n. $$ The mean and variance are simply given by $$ \mu = M^{(1)}(t)\vert_{t=0} = np, \qquad V[x] = \mu_2 - \mu^2 = np(1-p), $$ where $$ \mu_2=M^{(2)}(t)\vert_{t=0}=n(n-1)p^2+np. $$
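These moments can be checked directly against scipy.stats.binom (a sketch; $n$ and $p$ are arbitrary example values):

from scipy.stats import binom

n_trials, p_succ = 100, 0.3
mean, var = binom.stats(n_trials, p_succ, moments='mv')
print(mean, var)  # ~30.0, ~21.0
print(n_trials * p_succ, n_trials * p_succ * (1 - p_succ))  # np, np(1-p)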
# Bernoulli trial: returns 1 with probability p and 0 with probability (1-p).
# np.random.random() returns a random float in the half-open interval [0.0, 1.0).
def bernoulli_trial(p):
    return 1 if np.random.random() < p else 0
# binomial(n, p) returns the number of successes in n Bernoulli trials,
# i.e. sum_{i=1}^{n} x_i, where each x_i is 1 or 0.
def binomial(n, p):
    return sum(bernoulli_trial(p) for _ in range(n))  # use _ to ignore the index
def make_hist(p, n, num_points):
    # data collects num_points samples drawn from binomial(n, p)
    data = [binomial(n, p) for _ in range(num_points)]
    # histogram counts how often each value occurs
    histogram = collections.Counter(data)
    print(len(data))
    # plot the relative frequency (count / num_points) of each value
    plt.bar([x - 0.4 for x in histogram.keys()],
            [v / num_points for v in histogram.values()])
    # overlay the normal approximation N(np, np(1-p)) suggested by the CLT
    z = np.linspace(norm.ppf(0.0001), norm.ppf(0.99999), 100)  # ppf: percent point function (inverse of the cdf)
    fz = norm.pdf(z)
    mu = n * p
    sig2 = n * p * (1.0 - p)
    x = np.sqrt(sig2) * z + mu  # change of variable x = sigma*z + mu
    fx = fz / np.sqrt(sig2)     # the pdf transforms as f(x) = f(z)/sigma
    plt.plot(x, fx, 'r-', lw=5, alpha=0.6, label='norm pdf')
make_hist(0.5,100,10000) # p=0.5, q=0.5, n=100
def make_hist_abs(p, n, num_points):
    # same as make_hist, but plots absolute counts instead of frequencies
    data = [binomial(n, p) for _ in range(num_points)]  # num_points samples
    histogram = collections.Counter(data)
    print(len(data))
    plt.bar([x - 0.4 for x in histogram.keys()],
            [v for v in histogram.values()])
make_hist_abs(0.2,100,10000)
fig, (ax1, ax2) = plt.subplots(2, 1)
x = np.linspace(norm.ppf(0.01), norm.ppf(0.99), 100)  # ppf: percent point function (inverse of the cdf)
ax1.plot(x, norm.pdf(x), 'r-', lw=5, alpha=0.6, label='norm pdf')
ax1.set_ylabel('pdf', size=20)
ax2.plot(x, norm.cdf(x))
ax2.set_ylabel('cdf', size=20)
ax2.set_xlabel('x', size=20)
Student's $t$-distribution (or simply the $t$-distribution) is any member of a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown.
Let $X_1, \cdots, X_n$ be independent and identically distributed as $N(\mu, \sigma^2)$, i.e. this is a sample of size $n$ from a normally distributed population with expected mean value $\mu$ and variance $\sigma^2$.
The sample mean: $\bar X = \frac 1 n \sum_{i=1}^n X_i$
The sample variance: $S^2 = \frac 1 {n-1} \sum_{i=1}^n (X_i - \bar X)^2$
Then the random variable $z=\frac{ \bar X - \mu } { \sigma /\sqrt{n}}$ has a standard normal distribution (i.e. normal with expected value 0 and variance 1), and the random variable $t=\frac{ \bar X - \mu} {S /\sqrt{n}}$ (where $S$ has been substituted for $\sigma$) has a Student's $t$-distribution with $n-1$ degrees of freedom.
$$ f(t, \nu) = \frac{\Gamma(\frac{\nu+1}{2})} {\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})} \left(1+\frac{t^2}{\nu} \right)^{\!-\frac{\nu+1}{2}} $$
where $\nu$ is the number of degrees of freedom and $\Gamma$ is the gamma function.
As $\nu \to \infty$ (cf. the central limit theorem), the pdf of the $t$-distribution approaches the standard normal distribution $$ f(t, \infty) = \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}}. $$ Note: the variance of the $t$-distribution depends on the degrees of freedom $\nu$.
from scipy.stats import t
# degree of freedom nu
fig,(ax1,ax2)=plt.subplots(1,2,figsize=(14,6))
dofs = [2,10,20,100]
t_all = np.linspace(-10,10,100)
linestyles = ['-','--',':','-.']
variance=[]
for line, dof in zip(linestyles, dofs):
    # 'm' = mean, 'v' = variance, 's' = (Fisher's) skew, 'k' = (Fisher's) kurtosis (default='mv')
    mean, var, skew, kurt = t.stats(dof, moments='mvsk')
    variance.append(var)
    print("For nu={}, mean={}, variance={}, skew={}, kurt={}".format(dof, mean, var, skew, kurt))
    ax1.plot(t_all, t.pdf(t_all, dof), line, label=r"$\nu$={}".format(dof))
ax1.plot(t_all,norm.pdf(t_all),'k',label="normal")
ax1.legend(loc=0)
ax1.set_xlim(-5,5)
ax1.set_ylabel(r'$f(t,\nu)$',size=14)
ax1.set_xlabel(r'$t$',size=14)
ax2.plot(dofs,variance)
ax2.set_ylim(1,1.3)
ax2.set_xticks(dofs)
ax2.set_ylabel('variance',size=14)
ax2.set_xlabel(r'$\nu$',size=14)
The upper-tail probability $\alpha$ defines the critical value $t_{\alpha}(\nu)$: $$ \alpha = P(t>t_{\alpha}(\nu)) = \int^\infty_{t_{\alpha}(\nu)} f(t, \nu)\, dt $$
from scipy.integrate import simps
t_all = np.linspace(-10,10,100)
dof = 10
plt.title('t distribution',size=20)
plt.plot(t_all, t.pdf(t_all,dof),'k-', lw=5, label=r'$\nu={}$'.format(dof))
plt.xlim(-4,+4)
plt.ylim(0,0.6)
# fill the upper-tail area t > t_alpha and integrate it numerically
alpha = 0.025
t_alpha = round(-t.ppf(alpha, dof), 3)  # ppf(alpha) < 0; flip the sign for the upper tail
section = np.arange(t_alpha, 30, 1/2000)
percent_data = round(simps(t.pdf(section, dof), section), 3)  # Simpson's rule; ~alpha
plt.fill_between(section, t.pdf(section,dof),0)
plt.annotate(r'$\alpha$ = {}'.format(percent_data), xy=(3, 0.1), ha='center',size=14)
plt.xticks([0, t_alpha], [0, r'$t_\alpha={}$'.format(t_alpha)] )
plt.legend(loc=0)
dof = 10
alpha = [0.005, 0.025, 0.05]
t_alpha = -t.ppf(alpha, dof)  # upper-tail critical values t_alpha(nu)
print(t_alpha)
Independence: $f(x,y)=g(x)*h(y)$.
The marginal densities are obtained by integrating out the other variable: $$ g(x) = \int dy\, f(x,y), \qquad h(y)=\int dx\, f(x,y). $$
The conditional density of $x$ given $y$: $$ g(x|y)=\dfrac{f(x,y)}{h(y)} $$
A contingency table summarizes the information of multiple discrete random variables.
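A small made-up joint table illustrates the marginals and a conditional distribution (a sketch; the probabilities are invented for illustration):

import numpy as np

# made-up joint probabilities f(x, y): rows index x, columns index y
f_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])
g = f_xy.sum(axis=1)  # marginal g(x): sum over y
h = f_xy.sum(axis=0)  # marginal h(y): sum over x
print(g, h)
print(f_xy[:, 0] / h[0])                  # conditional g(x|y=0)
print(np.allclose(f_xy, np.outer(g, h)))  # False: x and y are not independent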
The posterior probability ${\rm Pr}(x|y)$ of $x$ is calculated by $$ {\rm Pr}(x|y)=\dfrac{{\rm Pr}(y|x)*{\rm Pr}(x)}{{\rm Pr}(y)} =\dfrac{{\rm Pr}(y|x)*{\rm Pr}(x)}{{\rm Pr}(y|x){\rm Pr}(x) +{\rm Pr}(y|\bar x){\rm Pr}(\bar x)} $$ where ${\rm Pr}(x)$ is called prior probability of $x$, the probability of cause $x$ before effect $y$ is known.
$$ V[x+y]=V[x]+V[y]+2{\rm cov}[x,y], $$ where ${\rm cov}[x,y]$ is the covariance of $x$ and $y$: $$ {\rm cov}[x,y] = E\big[(x-E[x])(y-E[y])\big]. $$
Generally, assuming $$\mathbf{X}=(X_1, X_2, ... , X_n)^{\mathrm T}$$ are random variables, each with finite variance, then the variance–covariance matrix $\operatorname{K}_{\mathbf{X}\mathbf{X}}$ is the matrix whose $(i,j)$ entry is the covariance:
$$ \operatorname{K}_{X_i X_j} = \operatorname{cov}[X_i, X_j] = \operatorname{E}[(X_i - \operatorname{E}[X_i])(X_j - \operatorname{E}[X_j])] $$ where the operator $\operatorname{E}$ denotes the expected value (mean) of its argument.
For $$\operatorname{K}_{\mathbf{X}\mathbf{X}}=\operatorname{V}(\mathbf{X}) = \operatorname{E} \left[ \left( \mathbf{X} - \operatorname{E}[\mathbf{X}] \right) \left( \mathbf{X} - \operatorname{E}[\mathbf{X}] \right)^{\rm T} \right]$$
and $\boldsymbol{\mu}_{\mathbf{X}} = \operatorname{E}[\mathbf{X}]$, where $\mathbf{X} = (X_1,\ldots,X_n)$ is an $n$-dimensional random variable, the following basic property applies:
$$\operatorname{K}_{\mathbf{X}\mathbf{X}} = \operatorname{E}(\mathbf{X X^{\rm T}}) - \mathbf{\mu_X}\mathbf{\mu_X}^{\rm T} $$
The correlation coefficient $\rho_{x,y}$ between $x$ and $y$ is $$ \rho_{x,y}=\rho=\dfrac{K_{xy}}{\sqrt{K_{xx}}\cdot \sqrt{K_{yy}}}=\dfrac{K_{xy}}{D[x]\cdot D[y]}. $$
It may be the case that variables are correlated (not independent).
Suppose one constructs an order-$n$ Gaussian vector out of random variables $(x_1,\ldots,x_n)$ with means $(\mu_1, \ldots, \mu_n)$. Let the covariance matrix be denoted by $\mathit\Sigma$.
The joint probability density function of these $n$ random variables is then given by:
$$ f(x_1,\ldots,x_n)=\frac{1}{(2\pi)^{n/2}\sqrt{\text{det}(\mathit\Sigma)}} \exp\left( -\frac{1}{2} \left[x_1-\mu_1,\ldots,x_n-\mu_n\right]\mathit\Sigma^{-1}\left[x_1-\mu_1,\ldots,x_n-\mu_n\right]^\mathrm{T} \right) $$
If the correlation coefficient $\rho=0$, the two variables become independent $$ f(x,y) = \frac{1}{2\pi \sigma_x \sigma_y} \exp\left[ -\frac{1}{2} \left(\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right) \right] =f(x)*f(y) $$
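Sampling from a bivariate normal and recovering the correlation numerically (a sketch; the means, standard deviations, and $\rho$ are arbitrary example values):

import numpy as np

rng = np.random.default_rng(0)
mu_vec = [0.0, 1.0]
rho, sx, sy = 0.8, 1.0, 2.0             # assumed correlation and std devs
Sigma = [[sx**2,      rho*sx*sy],
         [rho*sx*sy,  sy**2]]           # covariance matrix
xy = rng.multivariate_normal(mu_vec, Sigma, size=100_000)
print(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])  # ~0.8, recovers rho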
Consider a $d$-sided die, with the probability of obtaining each side given by $\mathbf{p} = (p_1, \cdots, p_d )^T$. Let $\mathbf{x} = (x^{(1)},\cdots, x^{(d)})^T$ be the number of times each side appears when the die is thrown $n$ times, where $$ \sum_i x^{(i)} = n. $$ The probability distribution that $\mathbf{x}$ follows is the multinomial distribution, denoted by Mult($n$, $\mathbf{p}$):
$$ f(\mathbf{x}) =\dfrac{n!}{x^{(1)}!\, x^{(2)}!\cdots x^{(d)}!}\, p^{x^{(1)}}_1 p^{x^{(2)}}_2\cdots p^{x^{(d)}}_d $$
When $d = 2$, Mult($n$, $\mathbf{p}$) reduces to Bi($n$, $p_1$).
normalization: $$ \sum_{\mathbf{x}} f(\mathbf{x}) = (p_1+p_2+\cdots p_d)^n = 1. $$
moment-generating function of Mult(n, p): $$ M_{\mathbf{x}}(t) = E[e^{\mathbf{t}^T\mathbf{x}}]=\sum_{\mathbf{x}} f(\mathbf{x})e^{\mathbf{t}^T\mathbf{x}} =(p_1 e^{t_1}+p_2e^{t_2}+\cdots+p_de^{t_d})^n. $$
$$ {\rm Cov}[x^{(i)}, x^{(j)}] =\begin{cases} np_i(1-p_i), & i=j \\ -np_ip_j, & i\neq j \end{cases} $$
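These covariance formulas can be checked by simulation with np.random.multinomial (a sketch; the fair three-sided die and $n=60$ are arbitrary choices):

import numpy as np

n, p_vec = 60, [1/3, 1/3, 1/3]
draws = np.random.multinomial(n, p_vec, size=200_000)  # each row is one x
print(np.cov(draws, rowvar=False).round(2))
# diagonal ~ n*p_i*(1-p_i) = 13.33; off-diagonal ~ -n*p_i*p_j = -6.67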
The random variable $X$ is the sum of the two face values, each from 1 to 6. Thus the random variable $X$ takes the values $2,3,4,\ldots,12$ with the probabilities $(1/6)^2, 2\times (1/6)^2, 3\times (1/6)^2, \ldots, 6\times (1/6)^2, \ldots, 2\times (1/6)^2, (1/6)^2$, rising to $6/36$ at $X=7$ and falling symmetrically thereafter.
\begin{align} E[X] &= 2\times (1/36) + 3\times (2/36) + 4\times (3/36) + 5\times (4/36) + 6\times (5/36) \nonumber\\ &+ 7\times (6/36) + 8\times (5/36)+ 9\times (4/36)+ 10\times (3/36)+ 11\times (2/36) + 12\times (1/36) \nonumber\\ &=7. \end{align}
Or it can be calculated simply as $E[X]= 2\,E[X_1] = 2\times \dfrac{1+2+\cdots+6}{6}=2\times\dfrac{(1+6)\times 6/2}{6}=7$.
$$ \sigma^2 = \sum_i (X_i-E[X])^2 \times p_i. $$ Note: $p_i$ is used here instead of $1/N$ because the probability $p_i$ of each value $X_i$ is different.
import pandas as pd

X_dist = pd.DataFrame(index=[2,3,4,5,6,7,8,9,10,11,12])
X_dist['prob'] = [1,2,3,4,5,6,5,4,3,2,1]  # number of ways to roll each sum
X_dist['prob'] = X_dist['prob']/36        # convert counts to probabilities
X_dist
Compute the population mean and population variance:
mean = pd.Series(X_dist.index*X_dist.prob).sum()
variance = pd.Series((X_dist.index-mean)**2*X_dist.prob).sum()
mean,variance
Sampling 5000 times generates 5000 samples; each time, two values are drawn at random (with replacement) from $[1,2,3,4,5,6]$ and their sum is calculated.
import pandas as pd
die = pd.DataFrame([1, 2, 3, 4, 5, 6])
trial = 5000
results = [die.sample(2, replace=True).sum().loc[0] for i in range(trial)]
results[:10]
Compute the sample mean and sample variance using the pd.Series() methods .mean() and .var():
pd.Series(results).mean(), pd.Series(results).var()
freq = pd.DataFrame(results)[0].value_counts()
sort_result=freq.sort_index()
sort_result
sort_result.plot(kind='bar')