Complete guide
Use the calculator above to upload a CSV or Excel file, select a numeric column and get a full Minitab-style analysis: descriptive statistics, an Anderson-Darling normality test, a histogram, a box plot and a Q-Q plot, plus specific guidance if your data is non-normal. It works as a first-pass diagnostic before any deeper Six Sigma analysis.
What it is
What is dataset analysis?
Dataset analysis is the structured exploratory phase of a statistical study. It combines descriptive statistics (mean, median, standard deviation σ, skewness, kurtosis), a formal normality test, and visualisations (histogram, box plot, Q-Q plot) to characterise the data before any modelling, capability calculation or hypothesis test is run.
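As a rough illustration of the descriptive part, the sketch below computes these summaries with the Python standard library. The function name `describe` is hypothetical, and the skewness and kurtosis use plain moment-based formulas, which is an assumption: Minitab and similar packages apply small-sample adjustments, so their figures can differ slightly.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Quartiles use the 'exclusive' method, the default in statistics.quantiles
    q1, _, q3 = statistics.quantiles(values, n=4)
    # Central moments (divide by n) for the shape measures
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "range": max(values) - min(values),
        "skewness": m3 / m2 ** 1.5,          # 0 for symmetric data
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }


if __name__ == "__main__":
    for name, value in describe([2, 4, 4, 4, 5, 5, 7, 9]).items():
        print(f"{name}: {value}")
```

Positive skewness points to a longer right tail, and excess kurtosis well above zero points to heavier tails than a normal distribution would produce.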
Calculation logic
How the calculation works
The tool computes the standard descriptive statistics, runs the Anderson-Darling test for normality (which weights the tails of the distribution more heavily than the Kolmogorov-Smirnov test), and generates three plots. If the data fails the normality test, it provides specific guidance: try a Box-Cox transformation, switch to non-parametric tests, or use non-normal capability methods.
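The normality step can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: it assumes the mean and standard deviation are estimated from the sample, applies the usual small-sample adjustment, and uses the piecewise p-value approximation attributed to D'Agostino and Stephens. The function name `anderson_darling_normal` is hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def anderson_darling_normal(values):
    """Return (adjusted A-squared, approximate p-value) for H0: data is normal."""
    x = sorted(values)
    n = len(x)
    mu, sigma = fmean(x), stdev(x)  # parameters estimated from the sample
    std_normal = NormalDist()
    eps = 1e-12  # keeps log() away from 0 for extreme points
    cdf = [min(max(std_normal.cdf((v - mu) / sigma), eps), 1 - eps) for v in x]

    # A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) * [ln F(z_i) + ln(1 - F(z_{n+1-i}))]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)  # i is 0-based here, so 2i + 1 plays the role of 2i - 1
    )
    a2 = -n - total / n
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment

    # Piecewise p-value approximation (assumed, see the note above)
    if a2_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj ** 2)
    elif a2_adj > 0.34:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj ** 2)
    elif a2_adj > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj ** 2)
    return a2_adj, min(max(p, 0.0), 1.0)


if __name__ == "__main__":
    # Strongly right-skewed illustrative data (exponential quantiles)
    skewed = [-math.log(1 - (i + 0.5) / 50) for i in range(50)]
    a2, p = anderson_darling_normal(skewed)
    print(f"A-squared = {a2:.3f}, p = {p:.4f}")
```

A p-value below the chosen significance level (0.05 is typical) is evidence against normality. A larger p-value only means the test found no significant departure.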
Common mistakes
Watch-outs before using dataset analysis
- Skipping the diagnostic step and running parametric tests on non-normal data.
- Stripping outliers without first checking they are real values rather than data-entry errors.
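- Reporting mean and standard deviation on heavily skewed data, where the median and IQR describe the typical value and spread more faithfully.
- Treating a normality test pass as proof of normality. A pass only means the test found no significant evidence of non-normality, which is easy to get with a small sample.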
- Using normal-based capability indices (Cpk, Ppk) on visibly non-normal data without transformation.
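Two of the watch-outs above (outliers and skewed summaries) can be checked with a few lines of code. The sketch below is an illustration under assumptions: it flags points outside the conventional 1.5 × IQR box-plot fences for review rather than removing them, the function name `robust_summary` is hypothetical, and the sample values are made up.

```python
import statistics


def robust_summary(values):
    """Compare mean/SD with median/IQR and flag points beyond the box-plot fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # 'exclusive' quartiles
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # conventional 1.5 x IQR fences
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": iqr,
        "fences": (lower, upper),
        # Flagged for review, not deleted: check each one against the source record
        "flagged": [v for v in values if v < lower or v > upper],
    }


if __name__ == "__main__":
    cycle_times = [10, 11, 12, 12, 13, 13, 14, 15, 95]  # made-up example values
    for name, value in robust_summary(cycle_times).items():
        print(f"{name}: {value}")
```

In this example a single extreme value pulls the mean well above the upper quartile while the median barely moves, which is why the median and IQR are the safer summary until the flagged point has been verified.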
What to do next
Turn the result into action
If the data is normal, proceed with standard capability, hypothesis tests and DOE. If non-normal, transform (Box-Cox or log), switch to non-parametric methods, or use non-normal capability techniques. Either way, document the choice in the project file.
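For the transformation route, here is a minimal sketch of a Box-Cox transform with a grid search for λ. It assumes strictly positive data and picks λ by maximising the profile log-likelihood over a coarse grid. The function names and grid limits are illustrative choices, not the calculator's own settings.

```python
import math
from statistics import fmean


def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or ln(x) when lam is 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def box_cox_log_likelihood(values, lam):
    """Profile log-likelihood of lam, up to an additive constant."""
    y = box_cox(values, lam)
    n = len(y)
    mean_y = fmean(y)
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    return -0.5 * n * math.log(var_y) + (lam - 1) * sum(math.log(v) for v in values)


def best_lambda(values, lo=-2.0, hi=2.0, steps=40):
    """Return the grid value of lam with the highest log-likelihood."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda lam: box_cox_log_likelihood(values, lam))


if __name__ == "__main__":
    data = [1.2, 1.5, 1.9, 2.4, 3.1, 4.0, 5.6, 8.3, 13.0, 27.5]  # made-up values
    lam = best_lambda(data)
    print("lambda:", round(lam, 2))
    print("transformed:", [round(v, 3) for v in box_cox(data, lam)])
```

A λ near 0 corresponds to a log transform and a λ near 1 means no transformation is needed. Whichever value is used, re-run the normality check on the transformed data and record the λ in the project file.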
What are descriptive statistics?
A summary of a data set using measures of central tendency (mean, median), spread (σ, IQR, range) and shape (skewness, kurtosis). It is the starting point of any analysis.
What is the Anderson-Darling test?
A formal hypothesis test for normality that gives more weight to the tails of the distribution than the Kolmogorov-Smirnov test. A small p-value (typically < 0.05) is evidence that the data is not normally distributed.
What if my data is non-normal?
Three options: (1) transform the data (Box-Cox, log), (2) switch to non-parametric tests (Mann-Whitney, Kruskal-Wallis), or (3) use non-normal capability methods. The tool flags the recommended path.
Why use a Q-Q plot?
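A Q-Q plot reveals where data deviates from a reference distribution. Points that fall close to a straight line are consistent with normality; systematic bends in the tails reveal skew or heavy tails. Because it shows where and how the data departs from normal, it often tells you more than a single test statistic.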
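The coordinates behind a normal Q-Q plot are straightforward to compute. The sketch below pairs each sorted observation with a standard normal quantile and reports the correlation between the two as a rough straightness measure. The plotting position (i − 0.5)/n is one common choice and an assumption here, since other packages use slightly different formulas, and both function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def normal_qq_points(values):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    x = sorted(values)
    n = len(x)
    std_normal = NormalDist()
    # Plotting position (i - 0.5) / n for i = 1..n, written with a 0-based index
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, x))


def qq_correlation(points):
    """Pearson correlation of the Q-Q points; values near 1 mean a near-straight line."""
    tx = [p[0] for p in points]
    ox = [p[1] for p in points]
    mt, mo = fmean(tx), fmean(ox)
    cov = sum((a - mt) * (b - mo) for a, b in points)
    return cov / math.sqrt(
        sum((a - mt) ** 2 for a in tx) * sum((b - mo) ** 2 for b in ox)
    )


if __name__ == "__main__":
    skewed = [-math.log(1 - (i + 0.5) / 30) for i in range(30)]  # illustrative data
    points = normal_qq_points(skewed)
    print("first point:", points[0])
    print("last point:", points[-1])
    print("straightness r:", round(qq_correlation(points), 4))
```

The correlation is only a summary. Plotting the pairs and looking at where they leave the line (lower tail, upper tail, or both) is what makes the Q-Q plot useful.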
Should I always transform non-normal data?
Not always. If the analysis you plan is robust to non-normality (e.g. ANOVA on large samples), transformation may be unnecessary. The right answer is to match the method to the data, not force the data to fit the method.