Nonparametric Statistical Significance Tests in Python

Nonparametric statistics are methods that do not assume a specific distribution for the data.

Often, the term refers to statistical methods that do not assume a Gaussian distribution. They were developed for use with ordinal or interval data, but in practice they can also be applied to real-valued observations by working on the ranks of the observations in a data sample rather than on the observation values themselves.
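
To make the idea of working on ranks concrete, the sketch below (with made-up values, purely for illustration) ranks a small sample with scipy.stats.rankdata; the tests below perform this ranking internally, so this step is never needed explicitly.
In [ ]:
# rank a small, made-up sample; tied values receive the average of their ranks
from scipy.stats import rankdata
values = [3.2, 1.5, 4.8, 1.5, 2.9]
print(rankdata(values))  # ranks: 4, 1.5, 5, 1.5, 3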

  • Nonparametric statistical significance tests are used to test whether two data samples have the same or different distributions.

  • The null hypothesis of these tests is often the assumption that both samples were drawn from a population with the same distribution, and therefore the same population parameters, such as mean or median.

  • If the null hypothesis is rejected after calculating the significance test on two or more samples, there is evidence to suggest that the samples were drawn from different populations, and in turn that the difference between sample estimates of population parameters, such as means or medians, may be significant.

  • Tests also return a $p$-value that can be used to interpret the result of the test. The $p$-value can be thought of as the probability of observing the two data samples given the base assumption (null hypothesis) that the two samples were drawn from a population with the same distribution.

  • The $p$-value can be interpreted in the context of a chosen significance level called $\alpha$. A common value for $\alpha$ is 5%, or 0.05. If the $p$-value is below the significance level, then the test says there is enough evidence to reject the null hypothesis and that the samples were likely drawn from populations with differing distributions.

  • $p \leq \alpha$: reject H0, different distribution.

  • $p > \alpha$: fail to reject H0, same distribution.
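
Every example below repeats this decision rule; as a minimal sketch (the helper name interpret_test is an illustrative choice, not from any library), it can be written once:
In [ ]:
# decide between the hypotheses by comparing the p-value to alpha
def interpret_test(p, alpha=0.05):
    if p <= alpha:
        print('Different distribution (reject H0)')
    else:
        print('Same distribution (fail to reject H0)')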
In [19]:
# generate gaussian data samples
from numpy.random import seed
from numpy.random import randn
from numpy import mean
from numpy import std

# seed the random number generator
seed(1)
# generate two sets of univariate observations
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51
# summarize
print('data1: mean=%.3f stdv=%.3f' % (mean(data1), std(data1)))
print('data2: mean=%.3f stdv=%.3f' % (mean(data2), std(data2)))
data1: mean=50.303 stdv=4.426
data2: mean=51.764 stdv=4.660

Mann-Whitney U Test (for two independent samples)

  • The Mann-Whitney U test is a nonparametric statistical significance test for determining whether two independent samples were drawn from a population with the same distribution.

More specifically, the test determines whether it is equally likely that a randomly selected observation from one sample will be greater than or less than a randomly selected observation from the other sample. If this assumption is violated, it suggests differing distributions (the sketch after the example below illustrates this probability interpretation).

  • Fail to Reject H0: Sample distributions are equal.
  • Reject H0: Sample distributions are not equal.
In [20]:
from scipy.stats import mannwhitneyu
In [21]:
# compare samples
stat, p = mannwhitneyu(data1, data2)
print('Statistics=%.3f, p=%.3f' % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
    print('Same distribution (fail to reject H0)')
else:
    print('Different distribution (reject H0)')
Statistics=4025.000, p=0.009
Different distribution (reject H0)
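
To connect the reported U statistic back to the "equally likely greater or less" interpretation, the sketch below estimates the probability that an observation from data1 exceeds one from data2 directly from the pooled ranks. This reading of U is standard, but the code is only an illustration, not part of the mannwhitneyu API.
In [ ]:
from numpy import concatenate
from scipy.stats import rankdata

# rank the pooled observations, then recover the U statistic for data1
ranks = rankdata(concatenate((data1, data2)))
n1, n2 = len(data1), len(data2)
u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2
# under H0 this probability is about 0.5
print('P(data1 > data2) is approximately %.3f' % (u1 / (n1 * n2)))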

Wilcoxon Signed-Rank Test (for two paired samples / one sample)

The samples are related or matched in some way, for example because they represent two measurements of the same subjects (such as before and after a treatment). The observations within each sample are independent, but the two samples come from the same population and cannot be treated as independent of each other.

  • The Wilcoxon signed-rank test is a nonparametric statistical procedure for comparing two samples that are paired, or related.
  • The parametric equivalent of the Wilcoxon signed-rank test goes by names such as the Student's t-test for matched pairs, the t-test for paired samples, or the t-test for dependent samples.

The default assumption for the test, the null hypothesis, is that the two samples have the same distribution.

  • Fail to Reject H0: Sample distributions are equal.
  • Reject H0: Sample distributions are not equal.

For the test to be effective, it requires at least 20 observations in each data sample.

two paired samples (e.g. before and after treatments)

In [24]:
from scipy.stats import wilcoxon
# compare samples
stat, p = wilcoxon(data1, data2)
print('Statistics=%.3f, p=%.3f' % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
    print('Same distribution (fail to reject H0)')
else:
    print('Different distribution (reject H0)')
Statistics=1886.000, p=0.028
Different distribution (reject H0)
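
Note that data1 and data2 above were generated independently, so they only stand in for paired data. A sketch with genuinely paired measurements might look like the following (the before/after values are hypothetical, invented purely for illustration):
In [ ]:
# hypothetical paired data: the same 30 subjects measured before and after a treatment
before = 5 * randn(30) + 50
after = before - 1 + randn(30)  # a small treatment effect plus measurement noise
w_stat, w_p = wilcoxon(before, after)
print('Statistics=%.3f, p=%.3f' % (w_stat, w_p))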

one sample: testing the differences between the two samples (equivalent to the paired call above)

In [25]:
stat, p = wilcoxon(data1-data2)

print('Statistics=%.3f, p=%.3f' % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
    print('Same distribution (fail to reject H0)')
else:
    print('Different distribution (reject H0)')
Statistics=1886.000, p=0.028
Different distribution (reject H0)

Kruskal-Wallis H Test (nonparametric ANOVA: for more than two independent samples)

The Kruskal-Wallis test is a nonparametric version of the one-way analysis of variance test, or ANOVA for short. It is named for the developers of the method, William Kruskal and W. Allen Wallis. The test can be used to determine whether more than two independent samples have different distributions, and it can be thought of as a generalization of the Mann-Whitney U test to more than two samples.

When the Kruskal-Wallis H test leads to significant results, then at least one of the samples is different from the other samples. However, the test does not identify where the difference(s) occur, nor how many differences occur. To identify the particular differences between sample pairs, a researcher can use sample contrasts, or post hoc tests, to analyze the specific sample pairs for significant difference(s). The Mann-Whitney U test is a useful method for performing sample contrasts between individual sample pairs (a sketch follows the example below).

  • Fail to Reject H0: All sample distributions are equal.
  • Reject H0: One or more sample distributions are not equal.
In [10]:
import numpy as np
data1 = np.random.randn(100)*5 + 50
data2 = np.random.randn(100)*5 + 50
data3 = np.random.randn(100)*5 + 52
In [12]:
from scipy.stats import kruskal

# compare samples
stat, p = kruskal(data1, data2, data3)
print('Statistics=%.3f, p=%.3f' % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
    print('Same distribution (fail to reject H0)')
else:
    print('Different distribution (reject H0)')
Statistics=1886.000, p=0.001
Different distribution (reject H0)
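
Because the test does not say which samples differ, a common follow-up is a set of pairwise Mann-Whitney U contrasts. The minimal sketch below uses a Bonferroni-adjusted alpha, which is one reasonable correction among several, not the only option:
In [ ]:
from itertools import combinations

# pairwise Mann-Whitney U contrasts with a Bonferroni-adjusted significance level
samples = {'data1': data1, 'data2': data2, 'data3': data3}
pairs = list(combinations(samples, 2))
alpha_adj = 0.05 / len(pairs)
for a, b in pairs:
    stat, p = mannwhitneyu(samples[a], samples[b])
    result = 'reject H0' if p <= alpha_adj else 'fail to reject H0'
    print('%s vs %s: p=%.3f (%s)' % (a, b, p, result))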

Friedman Test (nonparametric repeated measures ANOVA: for more than two paired samples)

The test assumes two or more paired data samples with 10 or more observations per group.

The Friedman test is a nonparametric statistical procedure for comparing more than two samples that are related. The parametric equivalent to this test is the repeated measures analysis of variance (ANOVA). When the Friedman test leads to significant results, at least one of the samples is different from the other samples.

The default assumption, or null hypothesis, is that the multiple paired samples have the same distribution. A rejection of the null hypothesis indicates that one or more of the paired samples has a different distribution.

  • Fail to Reject H0: Paired sample distributions are equal.
  • Reject H0: Paired sample distributions are not equal.
In [13]:
from scipy.stats import friedmanchisquare

# seed the random number generator
seed(1)
# generate three independent samples
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 50
data3 = 5 * randn(100) + 52
# compare samples
stat, p = friedmanchisquare(data1, data2, data3)
print('Statistics=%.3f, p=%.3f' % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
    print('Same distributions (fail to reject H0)')
else:
    print('Different distributions (reject H0)')
Statistics=9.360, p=0.009
Different distributions (reject H0)
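
As in the Wilcoxon example, the three samples above were generated independently, so they only stand in for repeated measures. A sketch with genuinely related samples might look like the following (hypothetical scores of the same 20 subjects under three conditions, invented purely for illustration):
In [ ]:
# hypothetical repeated measures: the same 20 subjects under three conditions
baseline = 5 * randn(20) + 50
cond_b = baseline + 1 + randn(20)
cond_c = baseline + 2 + randn(20)
stat, p = friedmanchisquare(baseline, cond_b, cond_c)
print('Statistics=%.3f, p=%.3f' % (stat, p))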