The concept of size effect in the light of Neyman-Pearson’s theory of testing statistical hypothesis


  • Wiesław Szymczak, Uniwersytet Łódzki, Wydział Nauk o Wychowaniu, Instytut Psychologii, Zakład Metodologii Badań Psychologicznych i Statystyki



theories of statistical hypothesis testing, probability, power of test, empirical power of test, effect size


The aim of this study is to draw the attention of researchers who use statistical methods to analyse their results to the fact that current practice combines two different theories of statistical hypothesis testing: Fisher’s theory and Neyman-Pearson’s theory. Because the statistical tools in present use incorporate ideas from both theories, the vast majority of researchers accept, without a moment’s thought, that the smaller the probability, the stronger the relationship. The study presents the weaknesses of Neyman-Pearson’s theory and the resulting problems with decisions based on the tests conducted. These problems justify the search for more reliable solutions. However, the proposed measures of effect size rest, on the one hand, on the dogma that links the probability obtained in a test to the strength of the relationship and, on the other, lack any theoretical basis, so they seem to be yet another pseudo-solution to real problems. Moreover, the use of effect size measures seems to be an attempt to free researchers from thinking deeply about the results of their statistical analyses. A trivial recipe has been established: a given value of the measure immediately implies the strength of the relationship. This approach seems unworthy of a researcher.
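The belief that a smaller probability means a stronger relationship can be checked with a small numerical sketch. The example below is an illustration only and is not taken from the study: it assumes a one-sample z-test with a known standard deviation, and the function name `two_sided_p_value` and all numbers are hypothetical. The same observed difference is tested at three sample sizes.

```python
import math


def two_sided_p_value(mean_diff, sd, n):
    """Two-sided p-value of a one-sample z-test (sd assumed known)."""
    z = mean_diff / (sd / math.sqrt(n))
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical numbers: the observed difference and sd never change,
# only the sample size does.
MEAN_DIFF = 0.2
SD = 1.0

p_values = {n: two_sided_p_value(MEAN_DIFF, SD, n) for n in (25, 100, 400)}

for n, p in p_values.items():
    print(f"n = {n:4d}  difference = {MEAN_DIFF}  p = {p:.5f}")
```

Under these assumptions the difference stays at 0.2 in every row, yet the probability falls from above 0.05 to far below it as n grows, so the probability alone cannot be read as the strength of the relationship.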





How to Cite

Szymczak, W. (2015). The concept of size effect in the light of Neyman-Pearson’s theory of testing statistical hypothesis. Acta Universitatis Lodziensis. Folia Psychologica, (19), 5–41.


