Introduction to probability

Probability

Probability is a measure of the likelihood that an event will occur. The probability that event $A$ occurs is denoted by $P(A)$ or $\Pr(A)$.

Probability axioms:

  • Non-negative:

$$ 0 \leq P(E)\leq 1\quad, \forall E \in \{events\} $$

  • Unitarity: for the entire sample space $\Omega$,

$$ P(\Omega) = 1 $$

  • Additivity: for any two events $A$ and $B$,

$$ P(A \cup B) = P(A) + P(B) - P(A \cap B) $$

Independence

The probability that two independent events $E$ and $F$ (e.g., two coin flips) both happen is given by

$$ P(E, F) = P(E) * P(F) $$

Dependence and conditional probability

Given that the event $F$ happens, the probability that the event $E$ also happens reads

$$ P(E, F) = P(E|F) * P(F) \neq P(E) * P(F) $$

where $P(E|F)$ is the probability of $E$, "conditional on $F$".

Bayes

Based on the fact $$ P(E, F) = P(F, E), $$ one finds the relation $$ P(E|F) * P(F) = P(F|E) * P(E) $$ which can be rewritten as

\begin{align} P(E|F) &= \dfrac{P(F|E) * P(E)}{P(F)} \nonumber\\ &= \dfrac{P(F|E) * P(E)}{P(F|E)*P(E) + P(F|\bar E)*P(\bar E)}, \end{align}

with $P(\bar E) = 1 - P(E)$.
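As a quick numerical illustration of the formula above, here is a minimal sketch with made-up numbers (prior $P(E)=0.01$, likelihood $P(F|E)=0.95$, false-positive rate $P(F|\bar E)=0.05$):

In [ ]:
# hypothetical numbers, chosen only for illustration
p_E = 0.01             # prior P(E)
p_F_given_E = 0.95     # P(F|E)
p_F_given_notE = 0.05  # P(F|not E)

# total probability of F (the denominator of Bayes' theorem)
p_F = p_F_given_E * p_E + p_F_given_notE * (1 - p_E)

# posterior P(E|F)
p_E_given_F = p_F_given_E * p_E / p_F
print(p_E_given_F)     # roughly 0.16: observing F still leaves E fairly unlikely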

Probability density (mass) function (pdf)

The probability of a continuous random variable falling within a particular range of values is given by the integral of the variable's pdf over that range.

The normal distribution $N(\mu,\sigma^2)$ has probability density: $$ f(x) = \frac{1}{\sqrt{2\pi}\sigma}\; e^{-(x-\mu)^2/(2\sigma^2)}. $$

The $f(x)$ should satisfy:

$$ f(x)\ge0,\quad \int f(x) dx =1. $$

$$ P(a\leq x \leq b) = \int^b_a f(x)dx $$

Cumulative distribution function (cdf)

The probability of the random variable $X$ taking a value less than or equal to $x$:

$$ F_X(x) = P(X\leq x)=\int^x_{-\infty} f_X(t)dt $$
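A minimal sketch, assuming scipy is available, checking that the cdf of the standard normal at $x$ equals the integral of its pdf up to $x$:

In [ ]:
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

x = 1.0
integral, _ = quad(norm.pdf, -np.inf, x)  # integrate the pdf from -infinity to x
print(integral, norm.cdf(x))              # both should be about 0.8413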

Mean, variance and standard deviation

If a random variable $x$ is given and its distribution admits a probability density function $f(x)$, then the expected (mean) value of $x$ (if the expected value exists) can be calculated as $$ \operatorname{E}[x] = \int_{-\infty}^\infty x\,f(x)\,dx. $$

  • $E[c*x] = c*E[x]$
  • $E[x+b] = E[x]+b$

The variance of the random variable $X$ is defined as $$ V[x] = E[(x-E[x])^2] = E[x^2]-E^2[x] $$

  • $V[c] = 0$
  • $V[x+c] = V[x]$
  • $V[c*x] = c^2V[x]$

where $c$ is a constant.

The standard deviation is defined as $$ D[x] = \sqrt{V[x]}. $$
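These properties of the mean, variance, and standard deviation can be checked numerically on a random sample; a minimal sketch (the sample parameters are arbitrary):

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)  # arbitrary sample with mu=2, sigma=3
c, b = 5.0, 1.0

# E[c*x] = c*E[x]  and  E[x+b] = E[x] + b
print(np.mean(c*x), c*np.mean(x))
print(np.mean(x + b), np.mean(x) + b)

# V[x+c] = V[x],  V[c*x] = c^2 V[x],  D[x] = sqrt(V[x])
print(np.var(x + c), np.var(x))
print(np.var(c*x), c**2 * np.var(x))
print(np.std(x), np.sqrt(np.var(x)))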

The skewness and kurtosis

  • skewness: $E[(x-E[x])^3]/D^3[x]$
  • kurtosis: $E[(x-E[x])^4]/D^4[x]-3$, where $-3$ is introduced so that the kurtosis of the normal distribution is zero.

Moments

  • The $k$-th moment:

$$ \mu_k = E[x^k] $$

  • The $k$-th central moment: a moment of the probability distribution of a random variable about the random variable's mean (a numerical check of both kinds of moments follows below) $$ \nu_k = E[(x-E[x])^k] $$
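A minimal numerical check of the raw and central moments, together with the skewness and (excess) kurtosis defined above, using a standard normal sample:

In [ ]:
import numpy as np
from scipy.stats import skew, kurtosis, moment

rng = np.random.default_rng(1)
x = rng.normal(size=200_000)   # standard normal sample

# raw moments mu_k = E[x^k]
mu1, mu2 = np.mean(x), np.mean(x**2)
print(mu1, mu2 - mu1**2)            # mean ~ 0, variance ~ 1

# central moments nu_k = E[(x - E[x])^k]
print(moment(x, 2), moment(x, 3))   # nu_2 ~ 1, nu_3 ~ 0

# scipy's kurtosis uses the excess ("-3") convention, so both should be ~ 0 here
print(skew(x), kurtosis(x))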

The moment-generating function

$$ M_x(t) = E[e^{tx}]=\sum_x e^{tx}f(x), $$ where $$ e^{tx} = 1 + tx + \dfrac{t^2x^2}{2!}+\cdots $$ One can show that $\dfrac{d^{k}}{dt^k}M_x(t)\Big\vert_{t=0}=\mu_k$, from which one finds

  • mean $E[X]=\mu_1$
  • variance $V[X]=E[X^2]-E^2[X]=\mu_2-\mu^2_1$

The central limit theorem

The central limit theorem states that when successive random samples of size $n$ are taken from a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample means becomes approximately normal, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, as the sample size $n$ becomes large, irrespective of the shape of the population distribution.

$$ z = \dfrac{\bar x-\mu}{\sigma/\sqrt{n}} \sim N(0,1) $$
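A minimal sketch of the theorem using a decidedly non-normal population, the uniform distribution on $[0,1]$ (so $\mu=1/2$, $\sigma=1/\sqrt{12}$), and comparing the standardized sample means with $N(0,1)$:

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(42)
n = 50                            # sample size
mu, sigma = 0.5, 1/np.sqrt(12)    # mean and standard deviation of Uniform(0,1)

# 10000 sample means, standardized as z = (xbar - mu) / (sigma / sqrt(n))
xbar = rng.uniform(0, 1, size=(10_000, n)).mean(axis=1)
z = (xbar - mu) / (sigma / np.sqrt(n))

plt.hist(z, bins=50, density=True, alpha=0.6, label='standardized sample means')
zgrid = np.linspace(-4, 4, 200)
plt.plot(zgrid, norm.pdf(zgrid), 'r-', lw=2, label='N(0,1)')
plt.legend()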

In [3]:
import matplotlib.pyplot as plt
from scipy.stats import norm
import numpy as np
import collections

An example: Binomial distribution

  1. Binomial distribution: the discrete probability distribution of the number of successes in a sequence of $n$ independent experiments, each asking a yes–no question whose outcome is a random variable carrying a single bit of information: success/yes/true/one (with probability $p$) or failure/no/false/zero (with probability $q = 1 - p$).
  2. For $n=1$ (a single success/failure experiment), the binomial distribution reduces to the Bernoulli distribution. If $X$ is a random variable with this distribution, then:

$$ \Pr(X=1) = p = 1 - \Pr(X=0) = 1 - q. $$

  • The pdf of this distribution over possible outcomes $X=k$, is

$$ f(k, p) = \begin{cases} p & \text{if }k=1, \\ q = 1-p & \text {if } k = 0. \end{cases} $$

  • The mean $\mu_1=E[X]=p$ and the variance $$ V[X] = E[X^2]-E^2[X] = E[X^2] - \mu_1^2 = \sum_{k=0,1} f(k, p)\cdot k^2 -p^2 = p - p^2 = p(1-p). $$
  3. Consider the sum $X=\sum^n_{i=1} X_i$ of $n$ such Bernoulli random variables $\{X_1, X_2,\cdots, X_n\}$; $X$ follows the binomial distribution, denoted $X\sim B(n,p)$. Each variable $X_i$ has probability $p$ of taking the value 1 and probability $q=1-p$ of taking the value 0.

Note: $\{X_1, X_2,\cdots, X_n\}$ are $n$ independent variables and thus

\begin{align} E[X] = E[\sum_i X_i] = \sum_i E[X_i], \\ V [X] = V [\sum_i X_i] = \sum_i V[X_i]. \end{align}
  • The mean of $X$ is

\begin{eqnarray} \mu = E[X] = \sum^n_{k=0} k \cdot f(k, n, p), \end{eqnarray} where the probability of getting exactly $k$ successes ($X_i=1$) in $n$ trials is given by the probability mass function $$ f(k, n, p) = C^k_n p^k (1-p)^{n-k}. $$

\begin{align} \mu &= \sum_{k=0}^n k f(k)=\sum_{k=0}^n k\binom nk p^k (1-p)^{n-k}\\ &= \sum_{k=0}^n k\frac{n(n-1)!}{(n-k)!k!}p\cdot p^{k-1} (1-p)^{(n-1)-(k-1)}\\ &= np\sum_{k=1}^n \frac{(n-1)!}{((n-1)-(k-1))!(k-1)!}p^{k-1} (1-p)^{(n-1)-(k-1)}\\ &= np\sum_{\ell=0}^{n-1} \binom{n-1}\ell p^\ell (1-p)^{(n-1)-\ell} && \text{with } \ell:=k-1\\ &= np(p+(1-p))^{n-1} \\ &=np \end{align}

The above relation can be derived more simply as $$ \mu = E[X] = E[X_1+X_2+\cdots+X_n]= \sum^n_{i=1}E[X_i]=np. $$

  • The variance of the binomially distributed variable reads

\begin{align} V[X] \equiv E[X^2] - E^2[X] = \sum^n_{i=1} (E[X^2_i] -E^2[X_i]) &=\sum^n_{i=1} V[X_i] \nonumber\\ &= \sum^n_{i=1} p(1-p)\nonumber\\ &=np(1-p). \end{align}

The moment-generating function $M_x(t)$ of $B(n,p)$ can be obtained by using the binomial theorem,

$$ M_x(t) = \sum_x e^{tx}f(x) =\sum^n_{x=0} e^{tx} C^x_n p^x q^{n-x} =\sum^n_{x=0} C^x_n (pe^{t})^x q^{n-x} =(pe^t+q)^n. $$ The mean and variance are simply given by

  • $E[X]=\mu_1=M^{(1)}(t)\vert_{t=0}=n(pe^t+q)^{n-1}p\vert_{t=0}=np$
  • $V[X]=\mu_2-\mu^2_1=n(n-1)p^2+np -n^2p^2 = np-np^2 = np(1-p)=npq$,

where $$ \mu_2=M^{(2)}(t)\vert_{t=0}=n(n-1)p^2+np $$
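These derivatives can be checked symbolically; a minimal sketch, assuming sympy is available:

In [ ]:
import sympy as sp

t, p, q, n = sp.symbols('t p q n', positive=True)
M = (p*sp.exp(t) + q)**n                      # MGF of B(n, p), with q = 1 - p

mu1 = sp.diff(M, t, 1).subs(t, 0)             # first derivative at t = 0
mu2 = sp.diff(M, t, 2).subs(t, 0)             # second derivative at t = 0

print(sp.simplify(mu1.subs(q, 1 - p)))        # n*p
print(sp.simplify((mu2 - mu1**2).subs(q, 1 - p)))  # np(1-p), possibly in an equivalent factored form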

In [1]:
# =1 with probability p 
# =0 with probability (1-p)
# use the np.random.random() to return random floats in the half-open interval [0.0, 1.0)
def bernoulli_trial(p):
    return 1 if np.random.random()<p else 0

# binomial(n,p) returns sum_{i=1}^{n} x_i, where each x_i = 1 or 0.
def binomial(n,p):
    return sum(bernoulli_trial(p) for _ in range(n)) # n Bernoulli trials; _ ignores the loop index

def make_hist(p,n,num_points):
    # data collects num_points samples drawn from binomial(n,p); the Counter below tallies how often each value occurs
    data = [binomial(n,p) for _ in range(num_points)]  
    histogram = collections.Counter(data)
    print(len(data))
    plt.bar([x-0.4 for x in histogram.keys()],
           [v/num_points for v in histogram.values()])

    z = np.linspace(norm.ppf(0.0001), norm.ppf(0.99999), 100) # Percent point function (inverse of cdf — percentiles).
    fz  = norm.pdf(z)
    mu  = n*p
    sig2 = n*p*(1.0-p) 
    x   = np.array(np.sqrt(sig2)*z+mu)
    fx  = fz/np.sqrt(sig2) 
    plt.plot(x, fx,'r-', lw=5, alpha=0.6, label='norm pdf')
In [35]:
make_hist(0.5,100,10000) # p=0.5, q=0.5, n=100
10000
In [54]:
def make_hist_abs(p,n,num_points):
    data = [binomial(n,p) for _ in range(num_points)] # num_points of data
    histogram = collections.Counter(data)
    print(len(data))
    plt.bar([x-0.4 for x in histogram.keys()],
           [v for v in histogram.values()]) 
    
In [56]:
make_hist_abs(0.2,100,10000)
10000
In [28]:
fig,(ax1,ax2)= plt.subplots(2,1)
x = np.linspace(norm.ppf(0.01), norm.ppf(0.99), 100) # Percent point function (inverse of cdf — percentiles).
ax1.plot(x, norm.pdf(x),'r-', lw=5, alpha=0.6, label='norm pdf')
ax1.set_ylabel('pdf',size=20)
ax2.plot(x, norm.cdf(x))
ax2.set_ylabel('cdf',size=20)
ax2.set_xlabel('x',size=20)
Out[28]:
Text(0.5,0,'x')

$t$ distribution

Student's $t$-distribution (or simply the $t$-distribution) is any member of a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and population standard deviation is unknown.

Let $X_1, \cdots, X_n$ be independent and identically distributed as $N(\mu, \sigma^2)$, i.e. this is a sample of size $n$ from a normally distributed population with expected mean value $\mu$ and variance $\sigma^2$.

  • The sample mean: $\bar X = \frac 1 n \sum_{i=1}^n X_i$

  • The sample variance: $S^2 = \frac 1 {n-1} \sum_{i=1}^n (X_i - \bar X)^2$

Then the random variable $z=\frac{ \bar X - \mu } { \sigma /\sqrt{n}}$ has a standard normal distribution (i.e. normal with expected value 0 and variance 1), while the random variable $t=\frac{ \bar X - \mu} {S /\sqrt{n}}$ (where $S$ has been substituted for $\sigma$) has a Student's $t$-distribution with $n-1$ degrees of freedom.
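A minimal simulation sketch of this statement: draw many small samples from $N(\mu,\sigma^2)$, form the statistic with $S$ in place of $\sigma$, and compare its histogram with the $t_{n-1}$ and $N(0,1)$ pdfs (the sample parameters below are arbitrary):

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, t as t_dist

rng = np.random.default_rng(7)
mu, sigma, n = 5.0, 2.0, 6                 # small sample size, arbitrary mu and sigma
samples = rng.normal(mu, sigma, size=(20_000, n))

xbar = samples.mean(axis=1)
S = samples.std(axis=1, ddof=1)            # sample standard deviation (n-1 in the denominator)
t_stat = (xbar - mu) / (S / np.sqrt(n))    # sigma replaced by S -> Student's t with n-1 dof

grid = np.linspace(-5, 5, 200)
plt.hist(t_stat, bins=100, range=(-5, 5), density=True, alpha=0.5, label='simulated t')
plt.plot(grid, t_dist.pdf(grid, n - 1), 'r-', label=r'$t_{n-1}$ pdf')
plt.plot(grid, norm.pdf(grid), 'k--', label='N(0,1)')
plt.legend()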

The pdf of the Student's $t$-distribution

$$ f(t, \nu) = \frac{\Gamma(\frac{\nu+1}{2})} {\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})} \left(1+\frac{t^2}{\nu} \right)^{\!-\frac{\nu+1}{2}} $$

where $\nu$ is the number of degrees of freedom (statistics) and $\Gamma$ is the gamma function.

As $\nu \to \infty$ (central limit theorem), the pdf of the $t$-distribution approaches the standard normal distribution $$ f(t, \infty) = \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}}. $$ Note: the variance of the $t$-distribution depends on the degrees of freedom $\nu$.

In [36]:
from scipy.stats import t

# degree of freedom nu

fig,(ax1,ax2)=plt.subplots(1,2,figsize=(14,6))
dofs = [2,10,20,100]
t_all = np.linspace(-10,10,100) 
linestyles = ['-','--',':','-.']
variance=[]
for line,dof in zip(linestyles,dofs): 
    #  ‘m’ = mean, ‘v’ = variance, ‘s’ = (Fisher’s) skew and ‘k’ = (Fisher’s) kurtosis. (default=’mv’)
    mean, var, skew, kurt = t.stats(dof, moments='mvsk')
    variance.append(var)
    print("For nu={}, mean={},variance={},skew={},kurt={}".format(dof,mean, var, skew, kurt))
    ax1.plot(t_all,t.pdf(t_all,dof),line,label=r"$\nu$={}".format(dof)) 

ax1.plot(t_all,norm.pdf(t_all),'k',label="normal")
ax1.legend(loc=0)
ax1.set_xlim(-5,5)
ax1.set_ylabel(r'$f(t,\nu)$',size=14)
ax1.set_xlabel(r'$t$',size=14)
ax2.plot(dofs,variance)
ax2.set_ylim(1,1.3) 
ax2.set_xticks(dofs)
ax2.set_ylabel('variance',size=14)
ax2.set_xlabel(r'$\nu$',size=14)
For nu=2, mean=0.0,variance=inf,skew=nan,kurt=nan
For nu=10, mean=0.0,variance=1.25,skew=0.0,kurt=1.0
For nu=20, mean=0.0,variance=1.1111111111111112,skew=0.0,kurt=0.375
For nu=100, mean=0.0,variance=1.0204081632653061,skew=0.0,kurt=0.0625
Out[36]:
Text(0.5,0,'$\\nu$')

$\alpha$ value of $t$ distribution

$$ \alpha = P(t>t_{\alpha}(\nu)) = \int^\infty_{t_{\alpha}(\nu)} f(t, \nu) dt $$

In [58]:
from scipy.integrate import simps

t_all = np.linspace(-10,10,100) 
dof = 10
plt.title('t distribution',size=20)
plt.plot(t_all, t.pdf(t_all,dof),'k-', lw=5, label=r'$\nu={}$'.format(dof)) 
plt.xlim(-4,+4)
plt.ylim(0,0.6)

#Fills the area under the curve

alpha=0.025
t_alpha = round(-t.ppf(alpha,dof),3)
section = np.arange(t_alpha, 30, 1/2000) 
percent_data = round(simps(t.pdf(section,dof),section),3)


plt.fill_between(section, t.pdf(section,dof),0)
plt.annotate(r'$\alpha$ = {}'.format(percent_data), xy=(3, 0.1), ha='center',size=14)


plt.xticks([0, t_alpha],  [0, r'$t_\alpha={}$'.format(t_alpha)] )
plt.legend(loc=0)
Out[58]:
<matplotlib.legend.Legend at 0x1a1b356c88>
In [59]:
dof=10
alpha=[0.005,0.025,0.05]
z=t.ppf(alpha,dof)
print(z)
[-3.16927267 -2.22813885 -1.81246112]

Multi-dimensional distributions

Joint probability distribution

  • $f(x,y)\geq0$
  • $\sum_{x,y}f(x,y)=1$

independence: $f(x,y)=g(x)*h(y)$.

Marginal probability density functions

$$ g(x) = \int dy f(x,y), \\ h(y)=\int dx f(x,y) $$

Conditional probability distribution

$$ g(x|y)=\dfrac{f(x,y)}{h(y)} $$

  • mean: $E[x|y]=\int x g(x|y) dx$
  • variance: $V[x|y]=E[(x-E[x|y])^2|y]$

Contingency table

A contingency table summarizes the joint frequencies of multiple discrete random variables, as in the sketch below.
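A minimal sketch with made-up categorical data, using pandas.crosstab to build such a table and to turn the counts into joint probabilities:

In [ ]:
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# hypothetical data for two discrete random variables
df = pd.DataFrame({
    'x': rng.choice(['A', 'B'], size=1000, p=[0.3, 0.7]),
    'y': rng.choice(['yes', 'no'], size=1000, p=[0.6, 0.4]),
})

# contingency table of counts, with row/column totals (the margins)
print(pd.crosstab(df['x'], df['y'], margins=True))

# normalize='all' turns the counts into estimates of the joint probabilities f(x, y)
print(pd.crosstab(df['x'], df['y'], normalize='all'))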

Bayes' theorem

The posterior probability ${\rm Pr}(x|y)$ of $x$ is calculated by $$ {\rm Pr}(x|y)=\dfrac{{\rm Pr}(y|x)*{\rm Pr}(x)}{{\rm Pr}(y)} =\dfrac{{\rm Pr}(y|x)*{\rm Pr}(x)}{{\rm Pr}(y|x){\rm Pr}(x) +{\rm Pr}(y|\bar x){\rm Pr}(\bar x)}, $$ where ${\rm Pr}(x)$ is called the prior probability of $x$: the probability of the cause $x$ before the effect $y$ is known.

Covariance and correlation

$$ V[x+y]=V[x]+V[y]+2{\rm cov}[x,y], $$ where ${\rm cov}[x,y]$ is the covariance of $x$ and $y$: $$ {\rm cov}[x,y] = E[(x-E[x])(y-E[y])] $$

  • positively correlated: ${\rm cov}[x,y]>0$
  • negatively correlated: ${\rm cov}[x,y]<0$
  • uncorrelated: ${\rm cov}[x,y]=0$

Generally, assuming $$\mathbf{X}=(X_1, X_2, ... , X_n)^{\mathrm T}$$ are random variables, each with finite variance, then the variance–covariance matrix $\operatorname{K}_{\mathbf{X}\mathbf{X}}$ is the matrix whose $(i,j)$ entry is the covariance:

$$ \operatorname{K}_{X_i X_j} = \operatorname{cov}[X_i, X_j] = \operatorname{E}[(X_i - \operatorname{E}[X_i])(X_j - \operatorname{E}[X_j])] $$ where the operator $\operatorname{E}$ denotes the expected value (mean) of its argument.

For $$\operatorname{K}_{\mathbf{X}\mathbf{X}}=\operatorname{V}(\mathbf{X}) = \operatorname{E} \left[ \left( \mathbf{X} - \operatorname{E}[\mathbf{X}] \right) \left( \mathbf{X} - \operatorname{E}[\mathbf{X}] \right)^{\rm T} \right]$$

and $\mathbf{\mu_X} = \operatorname{E}[\mathbf{X}]$, where $\mathbf{X} = (X_1,\ldots,X_n)$ is an $n$-dimensional random variable, the following basic property applies

$$\operatorname{K}_{\mathbf{X}\mathbf{X}} = \operatorname{E}(\mathbf{X X^{\rm T}}) - \mathbf{\mu_X}\mathbf{\mu_X}^{\rm T} $$

The correlation coefficient $\rho_{x,y}$ between $x$ and $y$: $$ \rho_{x,y}=\rho=\dfrac{K_{xy}}{\sqrt{K_{xx}}\cdot \sqrt{K_{yy}}}=\dfrac{K_{xy}}{D[x]\cdot D[y]} $$
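A minimal numerical sketch of these definitions using np.cov and np.corrcoef (the relation between x and y below is constructed arbitrarily so that they are positively correlated):

In [ ]:
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=10_000)
y = 0.8 * x + 0.6 * rng.normal(size=10_000)   # positively correlated with x

K = np.cov(x, y)                 # 2x2 variance-covariance matrix
rho = np.corrcoef(x, y)[0, 1]    # correlation coefficient
print(K)
print(rho, K[0, 1] / np.sqrt(K[0, 0] * K[1, 1]))   # the two should agree

# V[x+y] = V[x] + V[y] + 2 cov[x, y]
print(np.var(x + y, ddof=1), K[0, 0] + K[1, 1] + 2 * K[0, 1])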

Non-independent variables

It may be the case that variables are correlated (not independent).

Suppose one constructs an order-$n$ Gaussian vector out of random variables $(x_1,\ldots,x_n)$ whose means are given by $(\mu_1, \ldots, \mu_n)$. Let the covariance matrix be denoted by $\mathit\Sigma$.

The joint probability density function of these $n$ random variables is then given by:

$$ f(x_1,\ldots,x_n)=\frac{1}{(2\pi)^{n/2}\sqrt{\text{det}(\mathit\Sigma)}} \exp\left( -\frac{1}{2} \left[x_1-\mu_1,\ldots,x_n-\mu_n\right]\mathit\Sigma^{-1}\left[x_1-\mu_1,\ldots,x_n-\mu_n\right]^\mathrm{T} \right) $$

  • In the two-variable case, $$ \mathit\Sigma = \begin{bmatrix} K_{\mathit{XX}} & K_{\mathit{XY}} \\ K_{\mathit{YX}} & K_{\mathit{YY}} \end{bmatrix} = \begin{bmatrix} \sigma^2_x & \operatorname{cov}(X,Y) \\ \operatorname{cov}(X,Y) & \sigma^2_y \end{bmatrix}, \qquad \mathit\Sigma^{-1} = \dfrac{1}{\text{det}(\mathit\Sigma)} \begin{bmatrix} \sigma^2_y & -\operatorname{cov}(X,Y) \\ -\operatorname{cov}(X,Y) & \sigma^2_x \end{bmatrix}, $$ where $\operatorname{cov}(X,Y)=\rho \sigma_x\sigma_y$ and $\text{det}(\mathit\Sigma)=\sigma^2_x\sigma^2_y(1-\rho^2)$. The joint probability density function is given by: $$f(x,y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1-\rho^2}} \exp\left[ -\frac{1}{2(1-\rho^2)} \left(\frac{(x-\mu_x)^2}{\sigma_x^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right) \right] $$

If the correlation coefficient $\rho=0$, the two variables become independent $$ f(x,y) = \frac{1}{2\pi \sigma_x \sigma_y} \exp\left[ -\frac{1}{2} \left(\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right) \right] =f(x)*f(y) $$
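A minimal sketch comparing scipy's multivariate normal pdf with the explicit two-variable formula above (the chosen means, widths, and $\rho$ are arbitrary):

In [ ]:
import numpy as np
from scipy.stats import multivariate_normal

mu_x, mu_y = 1.0, -0.5
sigma_x, sigma_y, rho = 1.0, 2.0, 0.6
Sigma = np.array([[sigma_x**2,           rho*sigma_x*sigma_y],
                  [rho*sigma_x*sigma_y,  sigma_y**2         ]])

rv = multivariate_normal(mean=[mu_x, mu_y], cov=Sigma)

# evaluate the joint pdf at one point and compare with the explicit bivariate formula
x, y = 1.5, 0.0
dx, dy = x - mu_x, y - mu_y
quad_form = (dx**2/sigma_x**2 - 2*rho*dx*dy/(sigma_x*sigma_y) + dy**2/sigma_y**2) / (1 - rho**2)
f_explicit = np.exp(-0.5*quad_form) / (2*np.pi*sigma_x*sigma_y*np.sqrt(1 - rho**2))
print(rv.pdf([x, y]), f_explicit)   # the two values should agree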

An example: $d$-sided dice

Consider a $d$-sided die, with the probability of obtaining each side given by $\mathbf{p} = (p_1, \cdots, p_d )^T$, where

  • $0\leq p_i \leq1$.
  • $\sum^d_i p_i=1$

Let $\mathbf{x} = (x^{(1)},\cdots, x^{(d)})^T$ be the number of times each side appears when the die is thrown $n$ times, where

$$ \sum_i x^{(i)} = n. $$ The probability distribution that $\mathbf{x}$ follows is the multinomial distribution, denoted by Mult(n, p):

$$ f(\mathbf{x}) =\dfrac{n!}{x^{(1)}!\, x^{(2)}!\cdots x^{(d)}!} p^{x^{(1)}}_1p^{x^{(2)}}_2\cdots p^{x^{(d)}}_d $$

When $d = 2$, Mult(n, p) is reduced to Bi(n, $p_1$).

normalization: $$ \sum_{\mathbf{x}} f(\mathbf{x}) = (p_1+p_2+\cdots p_d)^n = 1. $$

moment-generating function of Mult(n, p): $$ M_{\mathbf{x}}(t) = E[e^{\mathbf{t}^T\mathbf{x}}]=\sum_{\mathbf{x}} f(\mathbf{x})e^{\mathbf{t}^T\mathbf{x}} =(p_1 e^{t_1}+p_2e^{t_2}+\cdots+p_de^{t_d})^n. $$

  • Mean: $E[x^{(j)}]=np_j$
  • Covariance (checked numerically in the sketch below):

$$ {\rm Cov}[x^{(i)}, x^{(j)}] =\begin{cases} np_i(1-p_i), & i=j \\ -np_ip_j, & i\neq j \end{cases} $$
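A minimal sketch drawing from Mult(n, p) with numpy and checking the mean and covariance formulas above (the chosen $n$ and $\mathbf{p}$ are arbitrary):

In [ ]:
import numpy as np

rng = np.random.default_rng(9)
p = np.array([0.1, 0.2, 0.3, 0.4])   # probabilities for a 4-sided die
n = 60

# 100000 draws from Mult(n, p); each row is a count vector x
X = rng.multinomial(n, p, size=100_000)

print(X.mean(axis=0), n * p)          # E[x^(j)] = n p_j
K = np.cov(X, rowvar=False)           # empirical covariance matrix
print(np.diag(K), n * p * (1 - p))    # diagonal: n p_i (1 - p_i)
print(K[0, 1], -n * p[0] * p[1])      # off-diagonal: -n p_i p_j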

Rolling dice twice

The random variable $X$ is the sum of the two face values, each from 1 to 6. Thus $X$ takes the values $2, 3, 4, \ldots, 12$ with probabilities $1\times(1/6)^2, 2\times (1/6)^2, 3\times (1/6)^2, \ldots$, rising to $6\times(1/6)^2$ at $X=7$ and falling back to $1\times(1/6)^2$ at $X=12$.

  • The mean value is given by

\begin{align} E[X] &= 2\times (1/36) + 3\times (2/36) + 4\times (3/36) + 5\times (4/36) + 6\times (5/36) \nonumber\\ &+ 7\times (6/36) + 8\times (5/36)+ 9\times (4/36)+ 10\times (3/36)+ 11\times (2/36) + 12\times (1/36) \nonumber\\ &=7. \end{align}

Or it can be calculated simply as $E[X]= 2\,E[X_1] = 2\times [1+2+3+\cdots+6]/6=2\times 3.5=7$.

  • The variance is given by

$$ \sigma^2 = \sum_i (X_i-E[X])^2 \times p_i. $$ Note: $p_i$, instead of $1/N$ is used here because the probability $p_i$ of each $X_i$ is different.

In [26]:
import pandas as pd  # imported again in a later cell (In [18]); included here so this cell runs on its own

X_dist = pd.DataFrame(index=[2,3,4,5,6,7,8,9,10,11,12])
X_dist['prob'] = [1,2,3,4,5,6,5,4,3,2,1]   # relative frequencies (out of 36) of the sums 2..12
X_dist['prob'] = X_dist['prob']/36         # normalize to probabilities
In [27]:
X_dist
Out[27]:
prob
2 0.027778
3 0.055556
4 0.083333
5 0.111111
6 0.138889
7 0.166667
8 0.138889
9 0.111111
10 0.083333
11 0.055556
12 0.027778

compute the population mean and population variance

In [29]:
mean = pd.Series(X_dist.index*X_dist.prob).sum()
variance = pd.Series((X_dist.index-mean)**2*X_dist.prob).sum()
mean,variance
Out[29]:
(6.999999999999998, 5.833333333333333)

Sampling 5000 times generates 5000 samples

Each time, two values are drawn randomly (with replacement) from [1, 2, 3, 4, 5, 6] and their sum is computed.

In [18]:
import pandas as pd

die = pd.DataFrame([1, 2, 3, 4, 5, 6])
trial = 5000

results = [die.sample(2, replace=True).sum().loc[0] for i in range(trial)] 
In [50]:
results[:10]
Out[50]:
[7, 7, 4, 4, 9, 5, 5, 6, 12, 6]

Compute the sample mean and sample variance using the pd.Series methods .mean() and .var():

In [35]:
pd.Series(results).mean(), pd.Series(results).var()
Out[35]:
(7.041, 5.804079815963295)
In [19]:
freq = pd.DataFrame(results)[0].value_counts()
sort_result=freq.sort_index()
sort_result
Out[19]:
2     129
3     268
4     429
5     551
6     670
7     812
8     703
9     569
10    458
11    289
12    122
Name: 0, dtype: int64
In [21]:
sort_result.plot(kind='bar')
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x121b19400>