Analysis and Inference
Significance testing, Bayesian methods, sharper predictions, and what it takes to build a real theory.
Choose your analysis framework before you collect data, and commit to it. This page covers the two dominant statistical frameworks, how to make your predictions more informative, and what separates a theory from a description.
Two Statistical Frameworks
Null Hypothesis Significance Testing
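NHST asks: if there were truly no effect, how surprising are the data I observed? The answer is the p-value: the probability of obtaining data at least as extreme as those observed, assuming the null hypothesis is true. A small p-value (below a threshold chosen in advance, such as .05 or .01) leads you to reject the null hypothesis.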
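One way to make the p-value concrete is an exact permutation test, sketched below with the Python standard library only. It is an illustration rather than a recommended default analysis: the function name `exact_permutation_p` and the two small score lists are invented for this example. The idea is that if there were truly no effect, the group labels would be arbitrary, so the p-value is the share of all possible relabelings that produce a difference in means at least as large as the observed one.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    count = 0
    # Under the null, every way of assigning n_a of the pooled values
    # to group A is equally likely, so enumerate all of them.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so floating-point ties count as "as extreme".
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical scores, invented for illustration only.
control = [1, 2, 3, 4, 5]
treatment = [6, 7, 8, 9, 10]

p_value = exact_permutation_p(control, treatment)
print(f"p = {p_value:.4f}")  # 2 of 252 relabelings are this extreme: p = 0.0079
```

Full enumeration is only practical for small samples; with larger groups a random subset of relabelings is usually drawn instead.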
Two errors are always in play:
- Type I error (false positive): concluding there is an effect when there is not. Controlled by your alpha (for example, 5%).
- Type II error (false negative): missing an effect that is really there. Its rate is called beta, and statistical power is 1 minus beta: the probability of detecting a true effect of a given size.
A p-value tells you how incompatible the data are with the null hypothesis. It is not the probability that your hypothesis is true, and statistical significance is not the same as practical importance.
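The two error rates listed above can be estimated by simulation. The sketch below assumes a deliberately simple setup that is not taken from this page: a two-sided z-test of a mean against zero, with a known standard deviation of 1, 30 observations per study, and a hypothetical true effect of 0.5. The helper names `z_test_p` and `rejection_rate` are made up for the example.

```python
import random
from statistics import NormalDist, fmean


def z_test_p(sample, sigma=1.0):
    """Two-sided p-value for H0: mean = 0, assuming a known sigma."""
    z = fmean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mean, n=30, alpha=0.05, reps=2000, seed=1):
    """Share of simulated studies that reject H0 at the given alpha."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p(sample) < alpha:
            rejections += 1
    return rejections / reps


# Null is true: every rejection is a false positive (Type I error).
type_i_rate = rejection_rate(true_mean=0.0)

# Effect is real (assumed size 0.5): rejections are correct detections.
power = rejection_rate(true_mean=0.5)
type_ii_rate = 1 - power

print(f"Type I rate ~ {type_i_rate:.3f}")
print(f"Power ~ {power:.3f}, Type II rate ~ {type_ii_rate:.3f}")
```

Under these assumptions the Type I rate should land near alpha, while the Type II rate depends on the effect size and sample size you assume. That dependence is why power analysis needs an expected effect size as an input.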
Bayesian Statistics
Bayesian analysis asks a different question: given the data, how much should I believe each competing hypothesis? The Bayes factor quantifies the relative evidence for one hypothesis over another: it is the ratio of how well each hypothesis predicted the observed data.
A key advantage: Bayesian methods can provide evidence for a null (no-effect) hypothesis, not just fail to reject it. A nonsignificant NHST result never confirms "no effect"; it only fails to find one. When distinguishing "no evidence of an effect" from "evidence of no effect" matters, Bayesian methods are the better fit.
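As a small illustration of evidence for the null, the sketch below computes a Bayes factor for a yes/no outcome. The setup is an assumption chosen because it has a closed form, not a recommendation: the null says the success rate is exactly 0.5, and the alternative places a uniform prior on the rate between 0 and 1. The function name `bf01_binomial` and the counts are hypothetical.

```python
from math import comb


def bf01_binomial(successes, trials):
    """Bayes factor for H0: rate = 0.5 versus H1: rate ~ Uniform(0, 1).

    Values above 1 favor the null; values below 1 favor the alternative.
    """
    # Probability of the observed count if the rate is exactly 0.5.
    likelihood_h0 = comb(trials, successes) * 0.5 ** trials
    # Probability of the observed count averaged over the uniform prior.
    # For a binomial count this integral equals 1 / (trials + 1).
    marginal_h1 = 1 / (trials + 1)
    return likelihood_h0 / marginal_h1


# Hypothetical counts, invented for illustration only.
near_chance = bf01_binomial(52, 100)      # data close to 50%
far_from_chance = bf01_binomial(70, 100)  # data far from 50%

print(f"BF01 with 52/100 successes: {near_chance:.2f}")
print(f"BF01 with 70/100 successes: {far_from_chance:.4f}")
```

With 52 successes in 100 trials the factor comes out well above 1, so the data count as positive evidence for "no effect" rather than a mere failure to reject. With 70 successes it falls far below 1. The result depends on the prior chosen for the alternative, which is one reason to state priors in a preregistration.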
| | NHST | Bayesian |
|---|---|---|
| Core output | p-value | Bayes factor / posterior |
| Can support the null? | No | Yes |
| Needs priors? | No | Yes |
| Common tools | Most stats packages, G*Power for power | JASP, R, Stan |
Neither framework is universally correct. Pick the one that answers your question, and state it in your preregistration.
Make Sharper Predictions
A theory that predicts only "there will be a difference" is hard to falsify, because almost any non-zero result confirms it. You make your theory more useful, and more testable, by predicting the size of an effect:
- Point prediction: the effect will be approximately X.
- Range prediction: the effect will fall between X and Y.
Specific predictions give clear falsification criteria and stop you from reporting trivially small but statistically significant results as meaningful.
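A range prediction can be checked mechanically once the effect is on a standardized scale. The sketch below uses Cohen's d with a pooled standard deviation as the effect size. The predicted range of 0.3 to 0.7, the function names, and the tiny data sets are assumptions made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def within_predicted_range(effect, low, high):
    """True if the observed effect falls inside the preregistered range."""
    return low <= effect <= high


# Hypothetical data, invented for illustration only.
treatment = [2.0, 4.0, 6.0]
control = [1.0, 3.0, 5.0]

d = cohens_d(treatment, control)  # means 4 and 3, pooled SD 2, so d = 0.5
consistent = within_predicted_range(d, low=0.3, high=0.7)

print(f"d = {d:.2f}, inside predicted range: {consistent}")
```

A point estimate inside the range is only a first check. A fuller analysis would also ask whether the interval estimate around d sits inside the predicted range, since a small sample can land there by chance.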
From Findings to Theory
A single significant result is a finding, not a theory. Aim for converging evidence: results that hold across different methods, measures, and samples. Triangulating with multiple approaches guards against any one method's quirks.
A genuine theory explains more than what happened. It accounts for:
- Why the effect occurs (the mechanism).
- How it operates (the process).
- When it holds and when it breaks down (the boundary conditions).
Raw data, a list of variables, and a bare hypothesis are all inputs to a theory, not the theory itself.