Confidence intervals should be robust, meaning that their nominal and actual probability coverage agree closely. This article examined two ways of computing an effect size in a two-group problem: (a) the classic approach, which divides the mean difference by a single standard deviation, and (b) a variant that replaces the least squares estimators (the usual mean and variance) with robust trimmed means and a Winsorized variance. Confidence intervals were constructed with both theoretical and bootstrap critical values. Only the method that combined robust estimators with a bootstrap critical value provided generally accurate probability coverage under nonnormality and variance heterogeneity, in balanced as well as unbalanced designs.
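To make approach (b) concrete, the following is a minimal sketch of a robust standardized mean difference with a bootstrap interval. It is not the article's exact procedure, and several choices are assumptions made for illustration: 20% trimming, the rescaling constant 0.642 (approximately the 20% Winsorized standard deviation of a standard normal distribution, so that the estimate is on the scale of the classic effect size under normality), the use of the second group's Winsorized standard deviation as the single standardizer, and a simple percentile bootstrap in place of a bootstrap critical value. All function names are hypothetical.

```python
import math
import random
import statistics


def trimmed_mean(values, trim=0.2):
    """Mean after dropping the g smallest and g largest observations."""
    xs = sorted(values)
    g = int(trim * len(xs))
    return statistics.fmean(xs[g:len(xs) - g])


def winsorized_variance(values, trim=0.2):
    """Sample variance after pulling each tail in to the nearest retained value."""
    xs = sorted(values)
    n = len(xs)
    g = int(trim * n)
    wins = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    return statistics.variance(wins)


def robust_effect_size(x, y, trim=0.2, scale=0.642):
    """Trimmed-mean difference over one group's Winsorized SD.

    Assumption: the standardizer is the Winsorized SD of y alone, and
    scale=0.642 matches 20% trimming only.
    """
    diff = trimmed_mean(x, trim) - trimmed_mean(y, trim)
    return scale * diff / math.sqrt(winsorized_variance(y, trim))


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed, simplified variant)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        xb = rng.choices(x, k=len(x))
        yb = rng.choices(y, k=len(y))
        estimates.append(robust_effect_size(xb, yb))
    estimates.sort()
    lower = estimates[math.floor((alpha / 2) * (n_boot - 1))]
    upper = estimates[math.ceil((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper
```

Resampling each group separately keeps unbalanced designs unbalanced in every bootstrap sample. The sketch assumes continuous data; with heavily tied observations a resample could have zero Winsorized variance, which this code does not guard against.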