{PRDA} performs a prospective or retrospective design analysis to evaluate inferential risks (i.e., power, Type M error, and Type S error) in studies considering Pearson’s correlation between two variables or mean comparisons (one-sample, paired, two-sample, and Welch’s t-test).

Introduction to design analysis

The term Design Analysis was introduced by Gelman and Carlin (2014) as a broader definition of Power Analysis. Traditional power analysis has a narrow focus on statistical significance. Design analysis, instead, evaluates other inferential risks (i.e., Type M error and Type S error) together with power, to assess the uncertainty of estimates under hypothetical replications of a study.

Given a hypothetical effect size and the study characteristics (i.e., sample size, statistical test directionality, \(\alpha\) level), design analysis evaluates the following quantities (illustrated by the simulation sketch after the list):

  • Power: the probability of the test rejecting the null hypothesis, given the hypothetical population effect.
  • Type M error (Magnitude): the factor by which a statistically significant effect is on average exaggerated, also known as Exaggeration Ratio.
  • Type S error (Sign): the probability of finding a statistically significant result in the opposite direction to the hypothetical effect.
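
These three inferential risks can be approximated by simulating many hypothetical replications of a study. The following minimal sketch (plain R, not {PRDA} code) does so for a two-sample t-test with a hypothesized Cohen’s d of .25 and 30 participants per group, the same scenario discussed further below:

# Minimal simulation sketch (not {PRDA} code): approximate power, Type M,
# and Type S errors for a two-sample t-test with a small hypothesized effect.
set.seed(2020)

d     <- .25    # hypothetical population effect size (Cohen's d)
n     <- 30     # participants per group
n_sim <- 1e4    # number of simulated replications

est <- p_val <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  g1 <- rnorm(n, mean = d)   # group 1 (sd = 1, so the mean difference equals d)
  g2 <- rnorm(n, mean = 0)   # group 2
  est[i]   <- mean(g1) - mean(g2)
  p_val[i] <- t.test(g1, g2, var.equal = TRUE)$p.value
}

sig <- p_val < .05
c(power = mean(sig),                 # proportion of significant replications
  typeM = mean(abs(est[sig])) / d,   # average exaggeration of significant estimates
  typeS = mean(est[sig] < 0))        # proportion of significant results with the wrong sign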

Moreover, Gelman and Carlin (2014) distinguished between two types of design analysis according to the study phase:

  • Prospective Design Analysis: if the analysis is performed in the planning stage of a study to define the sample size needed to obtain a required level of power.
  • Retrospective Design Analysis: if the analysis is performed in a later stage when the data have already been collected. This is still useful to evaluate the inferential risks associated with the study.

It is important not to mistake a retrospective design analysis for a post-hoc power analysis. The former defines the hypothetical effect size according to previous results in the literature or expert indications, whereas the latter defines the hypothetical effect size based on the results of the study itself, a widely deprecated practice (Goodman 1994; Lenth 2007; Senn 2002).

Enhancing researchers’ awareness

Although Type M error and Type S error depend directly on the power level, they highlight valuable information about the uncertainty of the estimates that would otherwise be overlooked. This enhances researchers’ awareness of the inferential risks related to their studies and helps them interpret the results.

Even in an underpowered study (e.g., power = 20%), a statistically significant result can still be obtained, although with low probability. This might seem a promising finding: researchers might think that reaching statistical significance despite the low power makes the result particularly reliable, and thus feel even more confident when interpreting it.

However, in this scenario statistically significant results are almost certain to be an overestimation of the population effect. As pointed out by Gelman, Hill, and Vehtari (2020), “a key risk for a low-power study is not so much that it has a small chance of succeeding, but rather that an apparent success merely masks a larger failure” (p. 292). This is also referred to as the “Winner’s curse”, indicating that the apparent win in terms of a statistically significant result is an actual loss, as the obtained estimate is inflated.

For example, in a study considering a two-sample t-test with 30 participants per group, if the hypothetical population effect size is small (e.g., a Cohen’s d of .25) the actual power is only 16%. The associated Type M error is around 2.60 and the Type S error is 0.01. This means that statistically significant results are on average an overestimation of 160% of the hypothesized population effect, and there is a 1% probability of obtaining a statistically significant result in the opposite direction.
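
These values can also be obtained with {PRDA} itself. The call below is only a sketch: it assumes that, analogously to the Pearson example in the case study further below, the t-test case is requested via test_method = "two_sample" and the second group size via sample_n2 (see ?retrospective for the exact argument specification).

# Sketch (argument values assumed; see ?retrospective): retrospective design
# analysis for a two-sample t-test, hypothesizing Cohen's d = .25
# with 30 participants per group.
retrospective(effect_size = .25, sample_n1 = 30, sample_n2 = 30,
              test_method = "two_sample")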

In this scenario, knowing the Type M and Type S errors, researchers would be much more cautious in interpreting the results and might consider carrying out a replication study to obtain more reliable results.

More on design analysis

To learn more about design analysis, see Gelman and Carlin (2014). For an introduction to design analysis with examples in psychology, see Altoè et al. (2020) and Bertoldo, Zandonella Callegher, and Altoè (2020).

The package

Given a plausible value of effect size, {PRDA} performs a prospective or retrospective design analysis to evaluate the inferential risks (i.e., power, Type M error, and Type S error) related to the study design.

The {PRDA} package can be used for Pearson’s correlation between two variables or mean comparisons (i.e., one-sample, paired, two-sample, and Welch’s t-test), considering a hypothetical value of \(\rho\) or Cohen’s d, respectively. See vignette("retrospective") and vignette("prospective") to learn how to set the function arguments for the different effect types.

Install

You can install the released version of PRDA from CRAN with:
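
install.packages("PRDA")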

And the development version from GitHub with:

# If devtools is not installed yet:
# install.packages("devtools")
devtools::install_github("ClaudioZandonella/PRDA",
                         build_vignettes = TRUE)
library(PRDA)

Functions

In {PRDA} there are two main functions:

  • retrospective(). Given the hypothetical population effect size and the study sample size, the function retrospective() performs a retrospective design analysis. According to the defined alternative hypothesis and the significance level, the inferential risks (i.e., power, Type M error, and Type S error) are computed together with the critical effect value (i.e., the minimum absolute effect size that would be statistically significant). To know more about function arguments and examples, see the function documentation ?retrospective and vignette("retrospective").

  • prospective(). Given the hypothetical population effect size and the required power level, the function prospective() performs a prospective design analysis. According to the defined alternative hypothesis and the significance level, the required sample size is computed together with the associated Type M error, Type S error, and the critical effect value (i.e., the minimum absolute effect size that would be statistically significant). To know more about function arguments and examples, see the function documentation ?prospective and vignette("prospective"); a brief sketch follows this list.
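
For instance, a prospective design analysis for the two-sample scenario considered earlier (Cohen’s d = .25) could be sketched as follows; the test_method value is assumed, so check ?prospective for the exact argument specification.

# Sketch (argument value assumed; see ?prospective): required sample size per
# group for a two-sample t-test, hypothesizing Cohen's d = .25 and
# requiring 80% power.
prospective(effect_size = .25, power = .80, test_method = "two_sample")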

Hypothetical effect size

The hypothetical population effect size can be defined as a single value according to previous results in the literature or expert indications. Alternatively, {PRDA} allows users to specify a distribution of plausible values to account for their uncertainty about the hypothetical population effect size. To learn how to specify the hypothetical effect size as a distribution, and for an example of application, see vignette("retrospective").
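
For example, uncertainty about \(\rho\) could be expressed by sampling plausible values from a normal distribution. The call below is only a sketch, assuming that effect_size also accepts a sampling function of the form function(n), as described in vignette("retrospective"):

# Sketch (interface assumed; see vignette("retrospective")): plausible values
# of rho sampled from a normal distribution centred on .25 with SD .10.
retrospective(effect_size = function(n) rnorm(n, mean = .25, sd = .10),
              sample_n1 = 13, test_method = "pearson")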

Case study

Eisenberger, Lieberman, and Williams (2003) claimed that social and physical pain seem to share similar neural underpinnings. Their experiment included 13 participants, and they found a statistically significant correlation between perceived distress due to social exclusion and activity in the brain area associated with physical pain. However, the magnitude of the estimated correlation (\(r = .88\)) is beyond what could be considered plausible. In this field correlations are likely to be around \(\rho = .25\) (for a complete discussion see Bertoldo, Zandonella Callegher, and Altoè 2020).

Retrospective design analysis

The function retrospective() can be used to evaluate the inferential risks associated with the study.

set.seed(2020) # set seed to make results reproducible

retrospective(effect_size = .25, sample_n1 = 13, test_method = "pearson")
#> 
#>  Design Analysis
#> 
#> Hypothesized effect:  rho = 0.25 
#> 
#> Study characteristics:
#>    test_method   sample_n1   sample_n2   alternative   sig_level   df
#>    pearson       13          NULL        two_sided     0.05        11
#> 
#> Inferential risks:
#>    power   typeM   typeS
#>    0.127   2.583   0.028
#> 
#> Critical value(s): rho  =  ± 0.553

In the output, we have the summary information about the hypothesized population effect, the study characteristics, and the inferential risks. We obtained a statistical power of almost 13%, associated with a Type M error of around 2.6 and a Type S error of 0.03. This means that statistically significant results are on average an overestimation of about 160% of the hypothesized population effect, and there is a 3% probability of obtaining a statistically significant result in the opposite direction.

To know more about function arguments and examples see the function documentation ?retrospective and vignette("retrospective").

Prospective design analysis

Considering the previous results, researchers might consider planning a replication study to obtain more reliable results. The function prospective() can be used to compute the sample size needed to obtain a required level of power (e.g., power = 80%).

prospective(effect_size = .25, power = .8, test_method = "pearson", 
            display_message = FALSE)
#> 
#>  Design Analysis
#> 
#> Hypothesized effect:  rho = 0.25 
#> 
#> Study characteristics:
#>    test_method   sample_n1   sample_n2   alternative   sig_level   df 
#>    pearson       126         NULL        two_sided     0.05        124
#> 
#> Inferential risks:
#>    power   typeM   typeS
#>    0.807   1.111   0    
#> 
#> Critical value(s): rho  =  ± 0.175

In the output, we again have the summary information about the hypothesized population effect, the study characteristics, and the inferential risks. To obtain a power of around 80%, the required sample size is \(n = 126\); the associated Type M error is around 1.10 and the Type S error is approximately 0.

To know more about function arguments and examples see the function documentation ?prospective and vignette("prospective").

References

Altoè, Gianmarco, Giulia Bertoldo, Claudio Zandonella Callegher, Enrico Toffalini, Antonio Calcagnì, Livio Finos, and Massimiliano Pastore. 2020. “Enhancing Statistical Inference in Psychological Research via Prospective and Retrospective Design Analysis.” Frontiers in Psychology 10. https://doi.org/10.3389/fpsyg.2019.02893.

Bertoldo, Giulia, Claudio Zandonella Callegher, and Gianmarco Altoè. 2020. “Designing Studies and Evaluating Research Results: Type M and Type S Errors for Pearson Correlation Coefficient.” Preprint. PsyArXiv. https://doi.org/10.31234/osf.io/q9f86.

Eisenberger, Naomi I., Matthew D. Lieberman, and Kipling D. Williams. 2003. “Does Rejection Hurt? An fMRI Study of Social Exclusion.” Science 302 (5643): 290–92. https://doi.org/10.1126/science.1089134.

Gelman, Andrew, and John Carlin. 2014. “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors.” Perspectives on Psychological Science 9 (6): 641–51. https://doi.org/10.1177/1745691614551642.

Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. Analytical Methods for Social Research. Cambridge: Cambridge University Press.

Goodman, Steven N. 1994. “The Use of Predicted Confidence Intervals When Planning Experiments and the Misuse of Power When Interpreting Results.” Annals of Internal Medicine 121 (3): 200. https://doi.org/10.7326/0003-4819-121-3-199408010-00008.

Lenth, R. V. 2007. “Statistical Power Calculations.” Journal of Animal Science 85 (suppl_13): E24–E29. https://doi.org/10.2527/jas.2006-449.

Senn, Stephen J. 2002. “Power Is Indeed Irrelevant in Interpreting Completed Studies.” BMJ: British Medical Journal 325 (7375): 1304. https://doi.org/10.1136/bmj.325.7375.1304.