from IPython.display import Image
Image("images/statistical_test.jpg")
Parametric statistics is a branch of statistics which assumes that sample data comes from a population that can be adequately modelled by a probability distribution that has a fixed set of parameters.
Parametric tests make certain assumptions about a data set; namely, that the data are drawn from a population with a specific (normal) distribution. Non-parametric tests make fewer assumptions about the data set.
The majority of elementary statistical methods are parametric, and parametric tests generally have higher statistical power. If the necessary assumptions cannot be made about a data set, non-parametric tests can be used.
For example:
Before discussing the details of the parametric statistical tests, we should first recall a few basic results: the central limit theorem, the normal distribution, and Student's $t$-distribution.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
Let $\{X_1, \cdots, X_n\}$ be a random sample of size n—that is, a sequence of independent and identically distributed random variables drawn from a distribution of expected value given by $\mu$ and finite variance given by $\sigma^2$.
Suppose we are interested in the sample average, $$ S_n = \dfrac{1}{n}\sum_{i=1}^n X_i, $$ of these random variables.
For large enough $n$, the distribution of the sample average $S_n$ is close to the normal distribution with mean $\mu$ and variance $\sigma^2/n$.
Proof:
The mean of the sample average is given by $$ E(S_n) = E\left(\dfrac{1}{n}\sum_{i=1}^n X_i\right) = \dfrac{1}{n}\sum_{i=1}^n E(X_i) = \dfrac{1}{n}\sum_{i=1}^n \mu = \mu. $$
The variance of the sample average (using the independence of the $X_i$) is given by $$ Var(S_n) = Var\left(\dfrac{1}{n}\sum_{i=1}^n X_i\right) = \dfrac{1}{n^2}\sum_{i=1}^n Var(X_i) = \dfrac{1}{n^2}\sum_{i=1}^n \sigma^2 = \dfrac{1}{n}\sigma^2. $$
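As a quick numerical illustration (a minimal sketch, assuming for illustration an exponential population with $\mu = 1$ and $\sigma^2 = 1$, which is clearly non-normal), the histogram of sample averages is well described by $N(\mu, \sigma^2/n)$:

# Sketch: empirical check of the central limit theorem with a non-normal population.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

n, n_repeats = 50, 10000
# exponential population with mean mu=1 and variance sigma^2=1
samples = np.random.exponential(scale=1.0, size=(n_repeats, n))
sample_means = samples.mean(axis=1)

# the sample averages should be approximately N(mu, sigma^2/n) = N(1, 1/50)
print("mean of averages = {:.4f} (expect ~1)".format(sample_means.mean()))
print("var  of averages = {:.4f} (expect ~{:.4f})".format(sample_means.var(), 1.0/n))

plt.hist(sample_means, bins=50, density=True, alpha=0.6)
x = np.linspace(sample_means.min(), sample_means.max(), 200)
plt.plot(x, norm.pdf(x, loc=1.0, scale=np.sqrt(1.0/n)), 'r-')
plt.show()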
Let $\{X_1, \ldots, X_n \}$ be independent and identically distributed as $N(\mu, \sigma^2)$, i.e. this is a sample of size $n$ from a normally distributed population with expected value $\mu$ and variance $\sigma^2$.
Let
$$ \bar X = \frac 1 n \sum_{i=1}^n X_i $$
be the sample mean and let
$$ S^2 = \frac 1 {n-1} \sum_{i=1}^n (X_i - \bar X)^2 $$
be the sample variance.
Then the standardized statistic $$ Z = \dfrac{\bar X - \mu}{\sigma/\sqrt{n}} $$ has a standard normal distribution (i.e. normal with expected value 0 and variance 1), while $$ T = \dfrac{\bar X - \mu}{S/\sqrt{n}} $$ follows Student's $t$-distribution with $\nu = n-1$ degrees of freedom.
Student's $t$-distribution has the probability density function given by
$$ f(t) = \frac{\Gamma(\frac{\nu+1}{2})} {\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})} \left(1+\frac{t^2}{\nu} \right)^{\!-\frac{\nu+1}{2}}, $$
where $\nu$ is the number of degrees of freedom and $\Gamma$ is the gamma function.
For large values of $\nu$ ($\nu \ge 30$), $f(t)$ closely approaches the standardized normal density:
$$ f(t) \to \dfrac{1}{(2\pi)^{1/2}} e^{-t^2/2}. $$
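A quick visual check (a sketch using scipy.stats.t): for $\nu = 30$ the $t$ density is already very close to the standard normal density.

# Compare the Student's t density with the standard normal density.
import numpy as np
from scipy.stats import t, norm
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 400)
for nu in (1, 5, 30):
    plt.plot(x, t.pdf(x, df=nu), label=r"$t$, $\nu$={}".format(nu))
plt.plot(x, norm.pdf(x), 'k--', label="N(0,1)")
plt.legend()
plt.show()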
A $z$-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution.
Because of the central limit theorem, many test statistics are approximately normally distributed for large samples.
The proportion test belongs to the $z$-tests because the parameters of the distribution (in particular the variance of the random variable) are known and given by $$ \mu = p, \quad \sigma^2 = p (1-p). $$
One can use the proportion $z$-test to carry out an A/B test.
The test statistic for the sample mean is the $z$-score: $$ z = \dfrac{\widehat \mu - \mu_0}{\sigma/\sqrt{n}} = \dfrac{\widehat p - p_0}{\sigma/\sqrt{n}}, $$
where $\widehat\mu$ (or $\widehat p$) is the sample estimate, $\mu_0$ (or $p_0$) is the value claimed under the null hypothesis, $\sigma$ is the known population standard deviation, and $n$ is the sample size.
Question: Consider a population that is normally distributed with $\sigma^2=100$ and true mean $\mu=100$. Use a sample of size $25$ to test claimed values of the mean (below, $\mu_0=105$ and $\mu_0=101$).
# generate a set of random variables from the normal distribution
# N(mu=100, sigma=10)
data = 10*np.random.randn(25)+100 # mu=100, sigma=10
plt.hist(data)
print("The sample mean={} and sample variance={}".format(data.mean(),data.var()))
# build a z-test function to test the mean value
def ztest4mean(data, mu_0, var_0, alpha, sides):
    """
    data  -- an array of the sample data
    mu_0  -- the null hypothesis value for the (claimed) mean
    var_0 -- the known variance (sigma^2) of the population
    alpha -- significance level
    sides -- 1 for one-tailed; 2 for two-tailed test
    """
    # sample size
    n = len(data)
    # sample mean
    mean = data.mean()
    # standard error of the mean, using the known population variance
    se = np.sqrt(var_0/n)
    if abs(se) < 1.e-20:
        print("error in the data")
        return None, None
    stats = abs(mean - mu_0)/se  # the z-score
    p_value = norm.sf(abs(stats))*sides  # sf(x) = 1 - cdf(x) gives the tail probability
    if p_value > alpha:
        print('Fail to reject H0')
    else:
        print('Reject H0')
    return stats, p_value
# hypothesis on the mean
mu_0 = 105
# known variance of the population
var_0 = 100
# confidence level: 1-\alpha
alpha=0.05
# one-tail or two-tail
sides = 2
stats,p = ztest4mean(data,mu_0,var_0,alpha,sides)
print("z score={}, p value={}".format(stats,p))
# hypothesis on the mean
mu_0 = 101
# known variance of the population
var_0 = 100
# confidence level: 1-\alpha
alpha=0.05
# one-tail or two-tail
sides = 2
stats,p = ztest4mean(data,mu_0,var_0,alpha,sides)
print("z score={}, p value={}".format(stats,p))
The XSORT method of gender selection is believed to increase the likelihood of birthing a girl.
14 couples used the XSORT method, resulting in the birth of 13 girls and 1 boy.
Using a 0.05 significance level, test the claim that the XSORT method increases the birth rate of girls.
(Assume the normal birthrate of girls is 0.5)
Analysis: under $H_0$ the test statistic $$ z = \dfrac{p^G - p^G_0}{\sigma/\sqrt{n}} $$ approximately follows the normal distribution $N(0,1)$, where $\sigma^2 = p^G_0 (1-p^G_0) = 0.25$ comes from the binomial distribution with $n=1$ (i.e. the Bernoulli distribution).
Attention: All Bernoulli distributions are binomial distributions, but most binomial distributions are not Bernoulli distributions.
If
$$ X=\begin{cases} 1, & {\rm with ~~ probability} ~~ p,\\ 0, & {\rm with ~~ probability} ~~ 1-p, \end{cases} $$
then the probability distribution of the random variable $X$ is a Bernoulli distribution.
If $X = X_1 + \cdots + X_n$ and each of $X_1, \cdots, X_n$ has a Bernoulli distribution with the same value of $p$ and they are independent, then $X$ has a binomial distribution, and the possible values of $X$ are $\{0, 1, 2, 3, \cdots, n\}$.
If $n=1$, then that binomial distribution is a Bernoulli distribution.
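A small numerical check of these statements (a sketch using scipy.stats, with an arbitrary choice of $p=0.3$ and $n=10$):

from scipy.stats import bernoulli, binom
import numpy as np

p, n = 0.3, 10

# a binomial distribution with n=1 has the same pmf as a Bernoulli distribution
print("Bernoulli pmf   :", bernoulli.pmf([0, 1], p))
print("Binomial(1) pmf :", binom.pmf([0, 1], 1, p))

# a sum of n independent Bernoulli(p) variables follows Binomial(n, p)
draws = bernoulli.rvs(p, size=(100000, n)).sum(axis=1)
print("empirical mean = {:.3f}, n*p = {:.3f}".format(draws.mean(), n*p))
print("empirical var  = {:.3f}, n*p*(1-p) = {:.3f}".format(draws.var(), n*p*(1-p)))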
def ztest4Proportion(size, prob_s, prob_0, alpha, sides):
    """
    size   -- sample size
    prob_s -- sample proportion
    prob_0 -- the null hypothesis value for the proportion
    alpha  -- significance level
    sides  -- 1 for one-tailed; 2 for two-tailed test
    """
    # standard error of the sample proportion under H0
    sigma = np.sqrt(prob_0*(1.0 - prob_0))/np.sqrt(size)
    stats = abs(prob_s - prob_0)/sigma  # the z-score
    p_value = norm.sf(abs(stats))*sides  # tail probability of the standard normal
    if p_value > alpha:
        print('Fail to reject H0')
    else:
        print('Reject H0')
    return stats, p_value
stats,p_value = ztest4Proportion(100,0.45,0.5,0.05,1)
print("stats={}, p value={}".format(stats,p_value))
Outline for A/B Tests
Control Group (A) and Test Group (B):
np.random.randn(10)
def generate_data4ABtest(n_A, n_B):
    # require reasonably large samples for the normal approximation
    assert min(n_A, n_B) > 40
    # simulate binary conversions: a standard normal draw above 0.5
    # counts as a conversion (probability ~0.31 per user)
    rand1 = np.random.randn(n_A)
    data1 = (rand1 > 0.5).astype(int)
    rand2 = np.random.randn(n_B)
    data2 = (rand2 > 0.5).astype(int)
    return data1.tolist(), data2.tolist()
n_A = 150
n_B = 150
data1,data2=generate_data4ABtest(n_A, n_B)
A_conv_rate = data1.count(1)/len(data1)
B_conv_rate = data2.count(1)/len(data2)
A_conv_rate,B_conv_rate
According to the properties of the Bernoulli distribution for $X$, the mean and variance are as follows: $$ E(X) = p, ~~ var(X)=p(1-p). $$
According to the central limit theorem, by calculating many sample means we can approximate the true mean of the population from which the control-group data were drawn. The sample proportion $\widehat p$ will be approximately normally distributed around the true proportion $p$, with a standard deviation equal to the standard error of the mean:
$$ \widehat p \sim N(\mu=p, \sigma^2/n=\dfrac{p(1-p)}{n}). $$
For the two-sample case in the A/B test, the null hypothesis $H_0$ is that the two conversion rates are equal, $p_A = p_B$.
Under $H_0$, the variance of the difference $\widehat p_A - \widehat p_B$ is $$ \sigma^2/n = \dfrac{p_A(1-p_A)}{n_A} + \dfrac{p_B(1-p_B)}{n_B} \approx \widehat p(1-\widehat p) \left(\dfrac{1}{n_A}+\dfrac{1}{n_B}\right),\quad \widehat p = \dfrac{p_A n_A + p_B n_B}{n_A+n_B}. $$
Therefore the test statistic $$ z = \dfrac{p_A-p_B}{\sigma/\sqrt{n}} $$ approximately follows $N(0,1)$ under $H_0$.
def ztest4ABtest(p1, p2, n1, n2, alpha, sides):
    """
    two-sample proportion z-test (the sample sizes may differ).
    p1    -- conversion rate for A
    p2    -- conversion rate for B
    n1    -- sample size for A
    n2    -- sample size for B
    alpha -- significance level
    sides -- 1 for one-tailed; 2 for two-tailed test
    """
    # avoid division by zero
    assert n1 > 0 and n2 > 0
    # pooled proportion under H0
    prob = (p1*n1 + p2*n2)/(n1 + n2)
    # standard error of the difference in proportions
    SE = np.sqrt(prob*(1 - prob)*(1/n1 + 1/n2))
    stats = abs(p1 - p2)/SE  # the z-score
    p_value = norm.sf(abs(stats))*sides  # tail probability of the standard normal
    if p_value > alpha:
        print('Fail to reject H0: the two performances are similar at the C.L. of {}%'.format(100*(1-alpha)))
    else:
        print('Reject H0: the two performances are different at the C.L. of {}%'.format(100*(1-alpha)))
    return stats, p_value
stats,p = ztest4ABtest(A_conv_rate,B_conv_rate,n_A,n_B,0.05, 2)
print("stats={}, p value={}".format(stats,p))
The A/B test can also be performed using the chi-squared test on the table of conversion counts.
from scipy.stats import chi2_contingency
# 2x2 contingency table of conversion counts for groups A and B
data = np.array([[data1.count(1), data1.count(0)],
                 [data2.count(1), data2.count(0)]])
chi, p_value = chi2_contingency(data)[:2]
alpha = 0.05
if p_value > alpha:
    print('Fail to reject H0')
else:
    print('Reject H0')
print("stats={}, p value={}".format(chi, p_value))
As shown above, the z-test for the difference in conversion rates and the chi-squared test on the count table lead to the same conclusion; the two are in fact closely related, as the check below illustrates.
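For a 2×2 table, the chi-squared statistic computed without Yates' continuity correction equals the square of the pooled two-proportion $z$ statistic. A quick check, reusing the objects defined above:

# without the continuity correction, chi2 equals z^2 for a 2x2 table
chi_nc, p_nc = chi2_contingency(data, correction=False)[:2]
z, p_z = ztest4ABtest(A_conv_rate, B_conv_rate, n_A, n_B, 0.05, 2)
print("chi2 (no correction) = {:.4f}, z^2 = {:.4f}".format(chi_nc, z**2))
print("chi2 p value = {:.4f}, z-test p value = {:.4f}".format(p_nc, p_z))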
Assumptions on the data:
We first build our own $t$-test function, and then use the scipy.stats method ttest_1samp().
# build a t-test function to test the mean value
from scipy import stats
def ttest4mean(data, mu_0, alpha, sides):
    """
    data  -- an array of the sample data
    mu_0  -- the null hypothesis value for the (claimed) mean
    alpha -- significance level
    sides -- 1 for one-tailed; 2 for two-tailed test
    """
    # sample variance: degrees of freedom = n-1
    sample_var = data.var(ddof=1)
    # sample size
    n = len(data)
    # degrees of freedom
    dof = n - 1
    # sample mean
    mean = data.mean()
    # estimated standard error of the mean
    se = np.sqrt(sample_var/n)
    if abs(se) < 1.e-20:
        print("error in the data")
        return None, None
    t_score = abs(mean - mu_0)/se  # the t-score
    p_value = stats.t.sf(np.abs(t_score), dof)*sides  # tail probability of the t-distribution
    if p_value > alpha:
        print('Fail to reject H0')
    else:
        print('Reject H0')
    return t_score, p_value
# generate data
data = 10*np.random.randn(25)+100 # mu=100, sigma=10
# hypothesis on the mean
mu_0 = 105
# confidence level: 1-\alpha
alpha=0.05
# one-tail or two-tail
sides = 2
t_score,p = ttest4mean(data,mu_0,alpha,sides)
print("t score={}, p value={}".format(t_score,p))
Use the ttest_1samp function from the scipy.stats package:
from scipy.stats import ttest_1samp
# scipy.stats.ttest_1samp(a, popmean, axis=0, nan_policy='propagate')
# Returns
# -------
# statistic : float or array
# t-statistic
# pvalue : float or array
# two-tailed p-value
mu_0 = 105
t_stat, p = ttest_1samp(data, mu_0)   # avoid shadowing the scipy.stats module imported above
alpha = 0.05
print('t score={}, p value={}'.format(t_stat, p))
if p > alpha:
    print('Fail to reject H0: the sample mean is consistent with mu_0')
else:
    print('Reject H0: the sample mean differs from mu_0')
Two-sample $t$-tests are often referred to as "unpaired" or "independent samples" $t$-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.
https://pythonfordatascience.org/welch-t-test-python-pandas/
Welch’s t-test Assumptions:
Keep in mind: always use Welch's $t$-test to avoid having to check whether the two samples come from populations with the same variance.
$$ t \quad = \quad {\; \overline{X}_1 - \overline{X}_2 \; \over \sqrt{ \; {s_1^2 \over N_1} \; + \; {s_2^2 \over N_2} \quad }} $$
where $\overline{X}_1, s_1^2$ and $N_1$ are the 1st sample mean, sample variance and sample size, respectively.
The degrees of freedom $\nu$ associated with this variance estimate are approximated using the Welch–Satterthwaite equation:
$$ \nu \quad \approx \quad {{\left( \; {s_1^2 \over N_1} \; + \; {s_2^2 \over N_2} \; \right)^2 } \over { \quad {s_1^4 \over N_1^2 \nu_1} \; + \; {s_2^4 \over N_2^2 \nu_2 } \quad }} $$
Here $\nu_1 = N_1 - 1$ and $\nu_2 = N_2 - 1$ are the degrees of freedom associated with the first and second sample variance estimates, respectively.
To conduct a Welch's $t$-test, one uses the stats.ttest_ind() method and passes equal_var=False as an argument.
data1 = 5* np.random.randn(100)+50
data2 = 5* np.random.randn(100)+52
# scipy.stats.ttest_ind (a, b, axis=0, equal_var=True, nan_policy='propagate')
from scipy.stats import ttest_ind
t_stat, p = ttest_ind(data1, data2, equal_var=False)   # avoid shadowing the scipy.stats module
alpha = 0.05
print("Welch's t-test={}\n".format(t_stat))
print("Welch's p-value={}\n".format(p))
if p > alpha:
    print('Same distribution (fail to reject H0)')
else:
    print('Different distribution (reject H0)')
def welch_ttest(x, y):
    # Welch-Satterthwaite degrees of freedom, using unbiased sample variances (ddof=1)
    vx, vy = x.var(ddof=1), y.var(ddof=1)
    dof = (vx/x.size + vy/y.size)**2 / ((vx/x.size)**2/(x.size - 1) + (vy/y.size)**2/(y.size - 1))
    t, p = ttest_ind(x, y, equal_var=False)
    print("\n",
          f"Welch's t-test = {t:.4f}", "\n",
          f"p-value = {p:.4f}", "\n",
          f"Welch-Satterthwaite degrees of freedom = {dof:.4f}")
welch_ttest(data1, data2)
For example:
Sample one $\{X_1, X_2, \cdots, X_n\}$ and sample two $\{Y_1, Y_2, \cdots, Y_n\}$ are paired. In this case, one can create a new sample of the differences, namely
$\{X_1 - Y_1, X_2 - Y_2, \cdots, X_n - Y_n\}$, and use a one-sample $t$-test to test whether the mean of the differences is $\mu = 0$, as sketched below.
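A minimal sketch of this approach with made-up before/after measurements (hypothetical data); the one-sample $t$-test on the differences agrees with scipy.stats.ttest_rel:

from scipy.stats import ttest_1samp, ttest_rel
import numpy as np

# hypothetical paired measurements, e.g. before/after for the same subjects
x = np.array([88., 92., 79., 85., 90., 84., 91., 87.])
y = np.array([85., 90., 80., 83., 91., 80., 90., 84.])

# one-sample t-test on the differences against mu_0 = 0 ...
t1, p1 = ttest_1samp(x - y, 0.0)
# ... is equivalent to the paired (related-samples) t-test
t2, p2 = ttest_rel(x, y)
print("one-sample t-test on differences: t={:.4f}, p={:.4f}".format(t1, p1))
print("paired t-test (ttest_rel)       : t={:.4f}, p={:.4f}".format(t2, p2))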
An Analysis of Variance Test or an ANOVA is a generalization of the t-tests to more than 2 groups.
Our null hypothesis states that the means of the populations from which the groups of data were sampled are equal, $$ \mu_1 = \mu_2 = \cdots = \mu_k, $$
for $k$ groups of data. Our alternative hypothesis is that at least one of the equalities above fails to hold.
The ANOVA assumptions:
The following example is from https://pythonfordatascience.org/anova-python/
import pandas as pd
import scipy.stats as stats
import researchpy as rp
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
# Loading data
df = pd.read_csv("https://raw.githubusercontent.com/Opensourcefordatascience/Data-sets/master/difficile.csv")
df.drop('person', axis= 1, inplace= True)
# Recoding value from numeric to string
df['dose'].replace({1: 'placebo', 2: 'low', 3: 'high'}, inplace= True)
# Getting summary statistics
rp.summary_cont(df['libido'])
rp.summary_cont(df['libido'].groupby(df['dose']))
stats.f_oneway(df['libido'][df['dose'] == 'high'],
df['libido'][df['dose'] == 'low'],
df['libido'][df['dose'] == 'placebo'])
results = ols('libido ~ C(dose)', data=df).fit()
results.summary()
Overall the model is significant, F(2,12) = 5.12, p = 0.0247. This tells us that there is a significant difference between the group means.
aov_table = sm.stats.anova_lm(results, typ=2)
aov_table
stats.levene(df['libido'][df['dose'] == 'placebo'],
df['libido'][df['dose'] == 'low'],
df['libido'][df['dose'] == 'high'])
Levene’s test for homogeneity of variance is not significant, which indicates that the groups have approximately equal variances.
In order to use parametric statistical tests, one should first check the following assumptions:
It assumes that the continuous variables to be used in the analysis are normally distributed. Normal distributions are symmetric around the center (i.e. the mean) and follow a bell-shaped curve. Normality is typically assessed in the examination of mean differences (e.g., t-tests and analyses of variance – ANOVAs/MANOVAs) and prediction analyses (e.g., linear regression analyses). Normality can be examined by several methods, two of which are the Kolmogorov–Smirnov (KS) test and the examination of skew and kurtosis. The KS test compares the empirical distribution of the data with a normal distribution, and if the corresponding p value is less than .05 (statistical significance), then the assumption of normality is not met. Also, normality can be judged by skew within ±2.0 and kurtosis within ±7.0; if the observed values exceed these boundaries, then the assumption of normality is not met.
Python packages:
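For normality checks, scipy.stats provides kstest (and shapiro), together with skew and kurtosis. A minimal sketch on simulated data:

from scipy.stats import kstest, skew, kurtosis
import numpy as np

x = 10*np.random.randn(200) + 100   # simulated continuous variable

# KS test against a normal with the sample mean and standard deviation
stat, p = kstest(x, 'norm', args=(x.mean(), x.std(ddof=1)))
print("KS statistic={:.4f}, p value={:.4f}".format(stat, p))

# rule-of-thumb checks: skew within +-2 and (excess) kurtosis within +-7
print("skew={:.3f}, kurtosis={:.3f}".format(skew(x), kurtosis(x)))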
It refers to a linear, or straight line, relationship between the independent variable(s) and the dependent variable. If the assumption of linearity is not met, then predictions may be inaccurate. Linearity is typically assessed in Pearson correlation analyses and regression analyses. Linearity can be assessed by the examination of scatter plots.
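For instance, a minimal sketch using scipy.stats.pearsonr and a scatter plot on simulated data:

from scipy.stats import pearsonr
import numpy as np
import matplotlib.pyplot as plt

# simulated independent (x) and dependent (y) variables with a linear relationship
x = np.random.randn(200)
y = 2.0*x + 0.5*np.random.randn(200)

r, p = pearsonr(x, y)
print("Pearson r={:.3f}, p value={:.4f}".format(r, p))

plt.scatter(x, y, s=10)   # a roughly straight-line cloud of points suggests linearity
plt.xlabel("x"); plt.ylabel("y")
plt.show()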
It refers to equal variances across different groups or samples. Equality of variance is usually examined when assessing for mean differences on an independent grouping variable (e.g., t-tests and analyses of variance – ANOVAs/MANOVAs). Equality of variance can be assessed by utilizing Levene’s test for each continuous, dependent variable. Levene’s test utilizes the F test, and if the corresponding p value is less than .05 (statistical significance), then the assumption of equality of variance is not met.
Python packages:
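For homogeneity of variance, scipy.stats provides levene (and bartlett). A minimal sketch on simulated groups:

from scipy.stats import levene
import numpy as np

# simulated groups with similar spread
g1 = 2*np.random.randn(50) + 10
g2 = 2*np.random.randn(50) + 12
g3 = 2*np.random.randn(50) + 11

stat, p = levene(g1, g2, g3)
print("Levene statistic={:.4f}, p value={:.4f}".format(stat, p))
if p < 0.05:
    print("The equal-variance assumption is not met")
else:
    print("No evidence against equal variances")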