A question I am frequently asked by both new and experienced Partners in the assessment industry is why 30 incumbents is recommended as the minimum sufficient sample size for creating custom concurrent performance models.
The topic itself is an important one. It has to do with the technical feasibility of any criterion-related validity study. When the EEOC published its regulations back in 1978, it stated that the first condition of a meaningful study is: An adequate sample of persons available for the study to achieve findings of statistical significance.
So now the question becomes: what determines an adequate sample?
One way to answer this question comes from the US Department of Labor’s 2006 publication Testing and Assessment: A Guide to Good Practices for Workforce Investment Professionals (https://www.onetcenter.org/dl_files/proTestAsse.pdf). In this document, as well as in an earlier version published online in 2000, we find a table for interpreting validity coefficients. Very beneficial validity corresponds to a minimum correlation coefficient of 0.35 on a scale of zero to one. What many individuals might not realize is that the critical value of 0.35 for validity depends on sample size. The sample used for this table is 32 observations. Sample size in statistics is typically referred to as n, so this table has an n of 32 (n=32). Statistics textbooks also refer to a related quantity called degrees of freedom (df). For a correlation coefficient, df is calculated as sample size (n) minus 2, so this sample has 30 degrees of freedom.
It is important to know that when sample size increases, the critical value for very beneficial validity decreases. When sample size decreases, the critical value for very beneficial validity increases. The two have an inverse relationship.
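To make the relationship between sample size and critical value concrete, here is a minimal Python sketch. It rests on assumptions that are mine, not the DOL guide's: a two-tailed test at alpha = 0.05, and Fisher's z approximation in place of the exact t-distribution tables that textbooks use. The function name critical_r is hypothetical and chosen only for illustration.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Approximate the smallest |r| that reaches two-tailed significance.

    Assumption: Fisher's z transformation, where atanh(r) is roughly
    normal with standard error 1 / sqrt(n - 3). This is an approximation,
    not the exact t-based critical value.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return tanh(z_crit / sqrt(n - 3))


# Print critical values for a few sample sizes; df = n - 2 as in the text.
for n in (12, 22, 32, 52, 102):
    print(f"n={n:>3}  df={n - 2:>3}  critical r ~ {critical_r(n):.3f}")
```

Under these assumptions, n = 32 gives a critical value of about 0.35, and the value shrinks as n grows, which is the inverse relationship described above.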
Returning to the EEOC topic of an adequate sample: in the US DOL’s validity interpretation table, the critical value for very beneficial validity corresponds to 50% statistical power. 50%, that’s the number! If we achieve 50% statistical power, we should be showing an attempt to comply with governmental regulations and have a technically feasible study.
In the end, statistical power is related to type two statistical error, beta. (Type one statistical error, by the way, is called alpha.) Statistical power equals one minus beta. What is important here is that the larger the sample size, the greater the statistical power, and the smaller the type two statistical error, the greater the statistical power. If we have a very large type two statistical error (greater than 50%), we might not be able to detect meaningful differences in validity. Our chances of producing statistically significant results are diminished by increased sample-dependent error.
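The link between sample size, beta, and power can be sketched in a few lines of Python. As before, the assumptions are mine: a two-tailed test at alpha = 0.05, a true validity of 0.35, and Fisher's z approximation rather than an exact power calculation. The function name correlation_power is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(rho, n, alpha=0.05):
    """Approximate power to detect a true correlation rho with n observations.

    Assumption: Fisher's z transformation, so the test statistic is roughly
    normal with mean atanh(rho) * sqrt(n - 3) and standard deviation 1.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = atanh(rho) * sqrt(n - 3)
    upper = 1 - std_normal.cdf(z_crit - shift)  # significant in the expected direction
    lower = std_normal.cdf(-z_crit - shift)     # significant in the wrong direction
    return upper + lower


power = correlation_power(0.35, 32)
beta = 1 - power  # type two error, since power = 1 - beta
print(f"n=32: power ~ {power:.2f}, beta ~ {beta:.2f}")

# Power rises as the sample grows, holding the true validity at 0.35.
for n in (32, 50, 100):
    print(f"n={n:>3}  power ~ {correlation_power(0.35, n):.2f}")
```

With a true validity of 0.35 and n = 32, the estimate comes out close to 50%, because the true correlation sits almost exactly on the critical value. Larger samples push power higher.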
In conclusion, although no governmental agency that I’m aware of explicitly states that 50% statistical power is a stringent guideline, it may be deduced as one by referring to US DOL publications that discuss validity. Also, there’s a rule of thumb in statistics that provides a mathematical reason for having a minimum of 30 observations in a criterion-related validity study: a sample size of 30 to 40 is where we may begin to assume a normal distribution in our data. But that’s a topic for another day.