In June 2022, the U.S. Environmental Protection Agency (EPA) updated ProUCL, its environmental statistics software package. Since first releasing the software as ProUCL v.3.0 in April 2004, EPA has upgraded it over the years to help statisticians, researchers, and other environmental practitioners perform basic statistical evaluation of groundwater, surface water, air, and soil data that may contain left-censored results (i.e., non-detect values below reporting limits). The June 2022 update introduced ProUCL v.5.2.
Why is ProUCL important?
Many EPA, state, and local guidance documents refer practitioners to the ProUCL Technical Guide for detailed recommendations on the statistical methods to use for site characterization, site remediation, and risk assessment.
While guidance documents for different industries are not frequently updated and are not all updated at the same time, each ProUCL software update and associated methods take into account results from recent research on proposed statistical methods and are peer-reviewed and vetted by EPA before publication. Using the latest version of ProUCL and following the methods described in its Technical Guide can help ensure your project’s data evaluation aligns with EPA expectations.
What has changed?
The previous version of ProUCL (i.e., v.5.1) focused on achieving the desired coverage (i.e., confidence level), which may have resulted in unrealistically high estimates of upper limits for population parameters. In the ProUCL v.5.2.0 Technical Guide, EPA revised the recommended statistical methods based on additional simulation studies and proposed alternative approaches to balance desired coverage with meaningful estimates. The major changes from v.5.1 to v.5.2.0 include:
- Increasing the minimum sample size requirements from 8 to 10
- Adding a distribution-specific significance level for each of the three potential underlying distributions (i.e., normal, gamma, and lognormal) to use in goodness-of-fit tests
- Updating recommended methods used to estimate limits of prediction, tolerance, and confidence intervals
What do the ProUCL changes mean?
- New minimum sample size of 10: The minimum sample size required for any statistical evaluation depends heavily on the characteristics of the data (e.g., underlying distribution and variance) and the test(s) used. Therefore, every attempt should be made to determine and collect the number of samples needed to meet project-specific data quality objectives (DQOs). That said, the ProUCL v.5.2.0 Technical Guide recommends a strict minimum of 10 samples to perform statistical evaluations if a DQO-based sample size requirement cannot be achieved due to resource limitations or if a DQO study could not be performed. Depending on the statistical procedure and observed variance, 10 samples may not be sufficient to achieve the desired confidence level and test power under nonparametric conditions, but this sample size would help obtain reliable estimates and test results in most parametric settings. The increased sample size requirement may extend your baseline monitoring period by as little as six months, depending on your sampling frequency. Additionally, your background may not be updated until 10 additional data points are available.
- Goodness-of-fit test and distribution-specific significance levels: ProUCL helps estimate the upper limits of one-sided prediction, tolerance, and confidence intervals assuming normal, gamma, and lognormal distributions. Since the development of ProUCL v.5.1, simulations have shown that the normal distribution can serve as a good approximation, supporting an adequate estimate of the upper confidence limit (UCL) around the true population mean, in more situations than originally thought. Conversely, the simulations showed that using the lognormal distribution could return unrealistic results more often than expected. Therefore, the ProUCL v.5.2.0 Technical Guide recommends using different significance levels for each distribution when performing goodness-of-fit tests, favoring estimates based on the normal distribution and limiting those based on the lognormal to more dependable settings.
- No more Chebyshev intervals: Of the changes to the recommended methods used to estimate limits of prediction, tolerance, and confidence intervals, the most notable is that the Chebyshev method is no longer endorsed. Simulations demonstrated that Chebyshev interval limits were often unrealistically high compared to the true parameter values, thus inadequately representing actual site conditions. The ProUCL v.5.2.0 Technical Guide recommends using alternative methods to the Chebyshev approach to estimate upper prediction limits (UPLs), upper tolerance limits (UTLs), and UCLs. The UPLs, UTLs, and UCLs obtained following the new guidance are intended to be a compromise between achieving the desired coverage of the interval and providing a realistic estimate of the site conditions. Depending on the distribution, sample size, and data characteristics (e.g., the presence of non-detect results and the observed variance and skewness), the estimates obtained following the new methods may be lower than those previously obtained. The practical effect depends on the data and the limit being calculated. Lower background threshold values may result in a higher number of downgradient/point-of-compliance exceedances than previously reported. Lower UCLs, on the other hand, could result in fewer exceedances of cleanup and health standards (and potentially lead to shorter remediation timeframes) for sites in corrective action programs.
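To give a sense of why Chebyshev limits tend to run high, the sketch below computes a one-sided 95% Chebyshev UCL of the mean and compares it with a Student's t UCL for the same data. This is an illustration only, not ProUCL's code: the sample values are hypothetical, the function names are ours, and the Student's t UCL is used simply as a familiar reference point. The replacement method the Technical Guide recommends for a given dataset depends on its distribution, sample size, skewness, and non-detects.

```python
import math
import statistics

# One-sided 95% Student's t critical values for selected degrees of freedom
# (standard table values, rounded to three decimals).
T_CRIT_95 = {9: 1.833, 14: 1.761, 19: 1.729}


def student_t_ucl95(data):
    """One-sided 95% UCL of the mean, assuming approximately normal data."""
    n = len(data)
    if n - 1 not in T_CRIT_95:
        raise ValueError("no tabulated t critical value for this sample size")
    std_err = statistics.stdev(data) / math.sqrt(n)
    return statistics.mean(data) + T_CRIT_95[n - 1] * std_err


def chebyshev_ucl95(data):
    """One-sided 95% Chebyshev UCL of the mean (distribution-free)."""
    n = len(data)
    std_err = statistics.stdev(data) / math.sqrt(n)
    # Chebyshev multiplier sqrt(1/alpha - 1) with alpha = 0.05 (about 4.36).
    return statistics.mean(data) + math.sqrt(1 / 0.05 - 1) * std_err


# Hypothetical concentrations (mg/kg): 10 detected results, no non-detects.
samples = [2.1, 3.4, 1.8, 5.6, 2.9, 4.2, 3.1, 7.5, 2.4, 3.8]

t_ucl = student_t_ucl95(samples)
cheb_ucl = chebyshev_ucl95(samples)
print(f"Student's t 95% UCL: {t_ucl:.2f}")
print(f"Chebyshev 95% UCL:   {cheb_ucl:.2f}")
```

Because the Chebyshev multiplier (about 4.36) is more than twice the t critical value for 10 samples (1.833), the Chebyshev UCL sits well above the t-based UCL whenever the data show any variability.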
If I use software other than ProUCL (e.g., SAS, R, Sanitas, SPSS, Excel), should I care?
Most agencies refer to the ProUCL Technical Guide for details on statistical methods used for required environmental evaluations, regardless of the actual software used. If your agency requires the use of ProUCL methods and you want to use alternative software (e.g., SAS, R, Sanitas), you may be required to thoroughly test functions to verify that results obtained from the alternative software match those obtained using ProUCL.
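One simple way to organize that kind of verification is to run the same dataset through both tools and compare the reported statistics within a tolerance. The sketch below is a minimal illustration under our own assumptions: the function name, the statistic labels, the relative tolerance, and all numeric values are hypothetical, and the values are not real ProUCL output. An acceptable tolerance should be agreed with the reviewing agency.

```python
import math


def compare_to_proucl(reference, candidate, rel_tol=1e-3):
    """Return statistics whose candidate value is missing or differs from the reference."""
    mismatches = {}
    for name, ref_value in reference.items():
        cand_value = candidate.get(name)
        if cand_value is None or not math.isclose(ref_value, cand_value, rel_tol=rel_tol):
            mismatches[name] = (ref_value, cand_value)
    return mismatches


# Hypothetical values for illustration only; not real ProUCL output.
proucl_results = {"mean": 3.68, "sd": 1.72, "ucl95": 4.68}
other_results = {"mean": 3.68, "sd": 1.72, "ucl95": 4.91}

mismatches = compare_to_proucl(proucl_results, other_results)
for name, (expected, actual) in mismatches.items():
    print(f"{name}: ProUCL reported {expected}, alternative software reported {actual}")
```

A mismatch such as the one flagged here would typically point to a difference in the method or options selected rather than a rounding difference, and is worth resolving before results are submitted.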
How soon will this update take effect?
ProUCL v.5.2.0 has been available since June 2022, and select state and local agencies began mandating its use in August 2022. Requirements related to using the latest version may be agency-dependent as not all agencies have indicated that they now require ProUCL v.5.2.0. However, organizations may consider adopting the latest version as it corrects previously known bugs and considers the latest statistical research.
How will the ProUCL update affect my site?
Because the changes impact several steps of the statistical procedures and are data-driven, it is hard to exactly predict or make a blanket statement about their potential effect. If you currently use Trihydro for your statistical needs, we recommend reaching out to your Trihydro project manager so they can provide further insight using your data and previous results.
Can I use ProUCL v.5.2.0 but choose not to follow the new Technical Guide recommendations that support the software?
This is inadvisable. The new recommended methods are based on results from extensive simulations that used certain sets of assumptions and data handling procedures. ProUCL v.5.2.0 assumes that the input data follow its current recommendations and will use the methods described in its Technical Guide. Failure to meet any data or procedure requirement and/or assumption could result in unreliable estimates that may later be questioned by regulatory agencies. The ProUCL v.5.2.0 Technical Guide strongly discourages the “use of any portion of ProUCL that does not comply with the ProUCL Technical Guide.”
If you have questions or want to learn more about the ProUCL update and what it means for your site, please contact us.