
P-Value Calculator

Calculate p-values for Z-tests, T-tests, and Chi-square tests — one-tailed or two-tailed. Includes a live distribution curve with the rejection region, significance verdict at α=0.05 and 0.01, and plain-English interpretation.

P-value definitions
Z-test:  p = 2Φ(−|z|)  (two-tail)
T-test:  p = Iₓ(df/2, ½), with x = df / (df + t²)  (two-tail)
χ²:    p = 1 − γ(df/2, χ²/2) / Γ(df/2)  (upper tail; γ is the lower incomplete gamma function)
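The definitions above can be sketched in plain Python as follows. This is a minimal illustration, not the calculator's actual implementation: the function names are hypothetical, the two-tailed t formula is taken as p = Iₓ(df/2, ½) with x = df / (df + t²), and the chi-square formula is read with the lower incomplete gamma function. The incomplete beta function uses a standard continued-fraction expansion and the incomplete gamma function uses a power series.

```python
import math

def z_two_tailed_p(z):
    """p = 2*Phi(-|z|), using Phi(-a) = 0.5 * erfc(a / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

def _beta_cf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered coefficient of the continued fraction.
        aa = m * (b - m) * x / ((a + m2 - 1.0) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered coefficient.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def regularized_beta(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    # Use the symmetry I_x(a, b) = 1 - I_(1-x)(b, a) for faster convergence.
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def t_two_tailed_p(t, df):
    """Two-tailed p-value for a t-statistic with df degrees of freedom."""
    return regularized_beta(df / (df + t * t), df / 2.0, 0.5)

def regularized_lower_gamma(a, x, max_terms=100_000, tol=1e-16):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def chi_square_p(stat, df):
    """Upper-tail p-value for a chi-square statistic."""
    return max(0.0, 1.0 - regularized_lower_gamma(df / 2.0, stat / 2.0))
```

For very large chi-square statistics the subtraction `1 - P` loses precision, so a production implementation would more likely compute the upper tail directly or rely on an established statistics library.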

Enter your test statistic

Select the test type and tail direction, then enter your statistic.

Z-statistic: calculated as (x̄ − μ₀) / (σ / √n)
T-statistic: calculated as (x̄ − μ₀) / (s / √n)
Degrees of freedom for the t-test (positive integer): one-sample n − 1; two-sample n₁ + n₂ − 2
Chi-square statistic (non-negative): Σ[(O − E)² / E]
Degrees of freedom for the chi-square test (positive integer): (rows − 1) × (cols − 1), or k − 1
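As a rough sketch of how the statistics above could be computed from raw inputs, the following Python functions apply the same formulas. The function and parameter names are hypothetical and chosen only for this illustration.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """z = (x̄ − μ₀) / (σ / √n), with σ the known population standard deviation."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t = (x̄ − μ₀) / (s / √n); returns (t, df) with df = n − 1."""
    return (sample_mean - mu0) / (s / math.sqrt(n)), n - 1

def chi_square_statistic(observed, expected):
    """χ² = Σ (O − E)² / E over matching observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up inputs, for illustration only.
z = z_statistic(sample_mean=52.0, mu0=50.0, sigma=4.0, n=16)
t, df = t_statistic(sample_mean=52.0, mu0=50.0, s=4.0, n=16)
chi2 = chi_square_statistic([18, 22, 20], [20, 20, 20])
```

With these inputs the standard error is 4 / √16 = 1, so both z and t equal 2.0 and the t-test has 15 degrees of freedom.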

[Figure: distribution curve; the shaded region is the p-value area, with the test statistic marked]
How it works

Understanding p-values

What a p-value is

The p-value is the probability of observing results at least as extreme as yours, assuming the null hypothesis (H₀) is true. It is not the probability that H₀ is true, nor the probability that the result is due to chance. A small p-value (conventionally below 0.05) means the data would be unlikely under H₀, which counts as evidence against H₀.
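One way to make this definition concrete is a small simulation: draw many z-statistics from a world where H₀ is true and count how often they are at least as extreme as the observed one. The sketch below assumes a standard normal null distribution and a two-tailed test; the function name, the number of simulations, and the seed are arbitrary choices for this illustration.

```python
import math
import random

def simulated_two_tailed_p(z_observed, n_sims=200_000, seed=0):
    """Fraction of null-distributed z values at least as extreme as z_observed."""
    rng = random.Random(seed)
    threshold = abs(z_observed)
    extreme = sum(1 for _ in range(n_sims) if abs(rng.gauss(0.0, 1.0)) >= threshold)
    return extreme / n_sims

# Exact two-tailed p-value for comparison: 2 * Phi(-|z|) = erfc(|z| / sqrt(2)).
exact = math.erfc(1.96 / math.sqrt(2))
approx = simulated_two_tailed_p(1.96)
```

The simulated fraction should land close to the exact value of about 0.05, which shows that the p-value describes how the data behave under H₀ rather than how likely H₀ itself is.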

Significance level vs p-value

The significance level α (commonly 0.05 or 0.01) is set before collecting data. You reject H₀ when p < α. Choosing α = 0.05 means you accept a 5% chance of a false positive (Type I error) when H₀ is actually true. A smaller α reduces false positives but increases the risk of missing a real effect (Type II error), which lowers the power of the test.
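The decision rule p < α can be written as a very small function. This is only a sketch: the name `significance_verdict` is hypothetical, and the default levels are the two mentioned on this page.

```python
def significance_verdict(p_value, alphas=(0.05, 0.01)):
    """Map each significance level to True if H0 is rejected (p < alpha)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return {alpha: p_value < alpha for alpha in alphas}

verdict = significance_verdict(0.03)
# verdict[0.05] is True (reject H0), verdict[0.01] is False (fail to reject H0).
```

The comparison is strict, so a p-value exactly equal to α does not lead to rejection here; that matches the rule as stated, though some texts use p ≤ α instead.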

Hypothesis testing in Six Sigma

In the DMAIC Analyse and Improve phases, hypothesis tests determine whether a factor has a statistically significant effect on the output. Common tests are the one-sample t-test (is the mean at target?), the two-sample t-test (do two groups differ?), and the chi-square test (are defect counts independent of category?). A significance level of α = 0.05 is the usual standard; for critical decisions, use α = 0.01.
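To illustrate the chi-square case, here is a sketch of an independence test on a small table of defect counts. The counts are invented for the example (rows could be two shifts, columns three defect types), and the function names are hypothetical. The p-value uses the closed-form upper tail that holds only when the degrees of freedom are even, which keeps the example free of special-function code.

```python
import math

def chi_square_independence(table):
    """Return (statistic, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_p_even_df(stat, df):
    """Upper-tail p-value, valid only for even df: e^(-x/2) * sum((x/2)^k / k!)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = stat / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Hypothetical defect counts: 2 shifts (rows) by 3 defect types (columns).
counts = [[30, 20, 10],
          [20, 20, 20]]
stat, df = chi_square_independence(counts)
p = chi_square_p_even_df(stat, df)
```

For these made-up counts the statistic is 16/3 with 2 degrees of freedom, giving p of roughly 0.069, so the apparent difference between shifts would not be declared significant at α = 0.05.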