## Stratified Sampling

Figure 23.4 shows three ways that sampling might be arranged in an area. Random sampling and systematic sampling do not take account of any special features of the site, such as different soil types or different levels of contamination. Stratified sampling is used when the study area exists in two or more distinct strata, classes, or conditions (Gilbert, 1987; Mendenhall et al., 1971). Often, each class or stratum has a different inherent variability. In Figure 23.4, samples are proportionally more numerous in stratum 2 than in stratum 1 because of some known difference between the two strata.

We might want to do stratified sampling of an oil company's properties to assess compliance with a stack monitoring protocol. If there were 3 large, 30 medium-sized, and 720 small properties, these three sizes define three strata. One could sample these three strata proportionately; that is, one-third of each, which would be 1 large, 10 medium, and 240 small facilities. Alternatively, one could examine all the large facilities, half of the medium facilities, and a random sample of 50 small ones. There are many possible sampling plans, each having a different precision and a different cost. We seek a plan that is low in cost and high in information.
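The two plans described above can be tabulated with a short Python sketch. The stratum sizes come from the text; the function name `proportional_plan` and the dictionary layout are illustrative assumptions, not part of the original.

```python
# Stratum sizes from the text: 3 large, 30 medium-sized, 720 small properties.
strata_sizes = {"large": 3, "medium": 30, "small": 720}


def proportional_plan(sizes, fraction):
    """Sample the same fraction of every stratum, rounded to whole facilities."""
    return {name: round(size * fraction) for name, size in sizes.items()}


# Plan A: proportional sampling, one-third of each stratum.
plan_a = proportional_plan(strata_sizes, 1 / 3)

# Plan B: all large, half of the medium, and 50 small facilities.
plan_b = {"large": 3, "medium": 15, "small": 50}

for label, plan in (("proportional", plan_a), ("disproportionate", plan_b)):
    print(label, plan, "total =", sum(plan.values()))
```

The proportional plan inspects 251 facilities and the second plan 68, which illustrates how strongly the choice of plan drives the cost.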

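The overall population mean $\bar{y}$ is estimated as a weighted average of the estimated means for the strata:

$$\bar{y} = w_1\bar{y}_1 + w_2\bar{y}_2 + \cdots + w_{n_s}\bar{y}_{n_s} = \sum_{i=1}^{n_s} w_i \bar{y}_i$$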

FIGURE 23.4 Comparison of random, systematic, and stratified random sampling of a contaminated site (panels: Random Sampling, Systematic Sampling, Stratified Sampling). The shaded area is known to be more highly contaminated than the unshaded area.

TABLE 23.4 Data for Example 23.9

| Stratum | Observations $n_i$ | Mean $\bar{y}_i$ | Variance $s_i^2$ | Size of Stratum | Weight $w_i$ |
|---|---|---|---|---|---|
| 1 | 20 | 34 | 35.4 | 1500 | 0.5 |
| 2 | 8 | 25 | 180 | 750 | 0.25 |
| 3 | 12 | 19 | 12 | 750 | 0.25 |

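In the weighted average, $n_s$ is the number of strata, $\bar{y}_i$ is the estimated mean of stratum $i$, and the $w_i$ are weights that indicate the proportion of the population included in stratum $i$. The estimated variance of $\bar{y}$ is:

$$s_{\bar{y}}^2 = \sum_{i=1}^{n_s} w_i^2 \frac{s_i^2}{n_i}$$

where $s_i^2$ is the sample variance and $n_i$ is the number of observations in stratum $i$.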

### Example 23.9

Suppose we have the data in Table 23.4 from sampling a contaminated site that was known to have three distinct areas. There were a total of 3000 parcels (acres, cubic meters, barrels, etc.) that could have been sampled. A total of n = 40 observations were collected from randomly selected parcels within each stratum. The allocation was 20 observations in stratum 1, 8 in stratum 2, and 12 in stratum 3. Notice that one-half of the 40 observations were in stratum 1, which is also one-half of the population of 3000 sampling units, but the observations in strata 2 and 3 are not proportional to their populations. This allocation might have been made because of the relative cost of collecting the data, or because of some expected characteristic of the site that we do not know about. Or, it might just be an inefficient design. We will check that later.

The overall mean is estimated as a weighted average:

$$\bar{y} = 0.5(34) + 0.25(25) + 0.25(19) = 28$$

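The estimated variance of the overall average is the sum of the variances of the three stratum means, each weighted by the square of its population weight:

$$s_{\bar{y}}^2 = 0.5^2\left(\frac{35.4}{20}\right) + 0.25^2\left(\frac{180}{8}\right) + 0.25^2\left(\frac{12}{12}\right) = 1.91$$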

The 95% confidence interval of the mean is $\bar{y} \pm 1.96\sqrt{s_{\bar{y}}^2}$, or $28 \pm 2.7$.
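The arithmetic of this example can be checked with a short Python sketch. It assumes the Table 23.4 values (observations 20, 8, 12; means 34, 25, 19; variances 35.4, 180, 12; weights 0.5, 0.25, 0.25) and a variance estimate with no finite population correction, which reproduces the interval quoted above. The function names are illustrative.

```python
import math

# Table 23.4 rows: (n_i, stratum mean, stratum variance s_i^2, weight w_i)
strata = [
    (20, 34.0, 35.4, 0.50),
    (8, 25.0, 180.0, 0.25),
    (12, 19.0, 12.0, 0.25),
]


def stratified_mean(rows):
    """Weighted average of the stratum means: sum of w_i * ybar_i."""
    return sum(w * ybar for _, ybar, _, w in rows)


def stratified_variance(rows):
    """Variance of the stratified mean: sum of w_i^2 * s_i^2 / n_i.

    Assumption: no finite population correction is applied.
    """
    return sum(w ** 2 * s2 / n for n, _, s2, w in rows)


mean = stratified_mean(strata)
variance = stratified_variance(strata)
half_width = 1.96 * math.sqrt(variance)

print(f"mean = {mean:.1f}")
print(f"variance of mean = {variance:.2f}")
print(f"95% interval = {mean:.1f} +/- {half_width:.1f}")
```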

The confidence intervals for the randomly sampled individual strata are interpreted using familiar equations. The 95% confidence interval for stratum 2 is $\bar{y}_2 \pm 1.96\sqrt{s_2^2/n_2}$, or $25 \pm 1.96\sqrt{180/8} = 25 \pm 9.3$. This confidence interval is large because the variance is large and the sample size is small. If this had been known, or suspected, before the sampling was done, a better allocation of the $n = 40$ samples could have been made.
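A minimal Python sketch of the same calculation for all three strata makes the contrast visible. It follows the text in using the normal multiplier 1.96 even for small samples; the variances and sample sizes are the Table 23.4 values, and `stratum_half_width` is an illustrative name.

```python
import math


def stratum_half_width(s2, n, z=1.96):
    """Half-width of the interval for one stratum mean: z * sqrt(s_i^2 / n_i)."""
    return z * math.sqrt(s2 / n)


# Stratum number -> (variance s_i^2, observations n_i), from Table 23.4.
table = {1: (35.4, 20), 2: (180.0, 8), 3: (12.0, 12)}

half_widths = {i: stratum_half_width(s2, n) for i, (s2, n) in table.items()}

for i, hw in half_widths.items():
    print(f"stratum {i}: +/- {hw:.1f}")
```

Stratum 2 has by far the widest interval, which is why it deserves a larger share of the sampling effort.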

Samples should be allocated to strata according to the size of each stratum, its variance, and its cost of sampling. The cost of the sampling plan is:

$$\text{Cost} = \text{Fixed cost} + \sum_{i=1}^{n_s} c_i n_i$$

The $c_i$ are the costs to collect and analyze each specimen in stratum $i$. The optimal sample size per stratum is:

$$n_i = n\,\frac{w_i s_i/\sqrt{c_i}}{\sum_{j=1}^{n_s} w_j s_j/\sqrt{c_j}}$$

This says that the sample size in stratum $i$ will be large if the stratum is large, the variance is large, or the cost is low. If sampling costs are equal in all strata, then:

$$n_i = n\,\frac{w_i s_i}{\sum_{j=1}^{n_s} w_j s_j}$$

Using these equations requires knowing the total sample size, n. This might be constrained by budget, or it might be determined to meet an estimation error criterion for the population mean, or to have a specified variance (Gilbert, 1987).
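The allocation rule described above, in which the sample size in a stratum grows with the stratum weight and standard deviation and shrinks with the square root of the unit cost, can be sketched in Python. The two-stratum site, its costs, and the `fixed_cost` overhead term are hypothetical values chosen for illustration; they do not come from the text.

```python
import math


def optimal_allocation(n_total, weights, variances, costs=None):
    """Split n_total so that n_i is proportional to w_i * s_i / sqrt(c_i)."""
    if costs is None:
        # Equal costs: n_i is proportional to w_i * s_i.
        costs = [1.0] * len(weights)
    scores = [
        w * math.sqrt(s2) / math.sqrt(c)
        for w, s2, c in zip(weights, variances, costs)
    ]
    total = sum(scores)
    return [n_total * score / total for score in scores]


def plan_cost(n_alloc, costs, fixed_cost=0.0):
    """Total cost: assumed fixed overhead plus the sum of c_i * n_i."""
    return fixed_cost + sum(c * n for c, n in zip(costs, n_alloc))


# Hypothetical two-stratum site (not from the text).
weights = [0.6, 0.4]
variances = [4.0, 9.0]  # standard deviations 2 and 3
costs = [1.0, 4.0]      # stratum 2 costs four times as much per specimen

equal_cost_plan = optimal_allocation(30, weights, variances)
unequal_cost_plan = optimal_allocation(30, weights, variances, costs)

print([round(n, 1) for n in equal_cost_plan])
print([round(n, 1) for n in unequal_cost_plan])
print(round(plan_cost(unequal_cost_plan, costs), 1))
```

With equal costs the 30 specimens split evenly between the two strata. Making stratum 2 four times as costly per specimen shifts the split to 20 and 10. The returned sizes are not rounded, so a real plan would still need a rounding rule.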

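The sample size needed to estimate the overall mean with a specified margin of error ($E$) and an approximate probability $(1 - \alpha)100\% = 95\%$ of not exceeding that error is:

$$n = \frac{4}{E^2}\sum_{i=1}^{n_s} w_i s_i^2$$

where the factor 4 is $z_{\alpha/2}^2 = 1.96^2$, rounded up.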

### Example 23.10

Using the data from Example 23.9 (Table 23.4), design a stratified sampling plan to estimate the mean with a margin of error of 1.0 unit with 95% confidence. There are three strata, with variances $s_1^2 = 35.4$, $s_2^2 = 180$, and $s_3^2 = 12$, and having weights $w_1 = 0.5$, $w_2 = 0.25$, and $w_3 = 0.25$. Assume equal sampling costs in the three strata. The total sample size required is:

$$n = \frac{4}{1^2}\left[0.5(35.4) + 0.25(180) + 0.25(12)\right] = 4(65.7) \approx 263$$

The allocation among strata is:

$$n_i = n\,\frac{w_i s_i^2}{\sum_{j=1}^{n_s} w_j s_j^2} = 4 w_i s_i^2$$

giving $n_1 = 4(0.5)(35.4) = 71$, $n_2 = 4(0.25)(180) = 180$, and $n_3 = 4(0.25)(12) = 12$.

This large sample size results from the small margin of error (1 unit) and the large variance in stratum 2.
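The numbers in this example can be reproduced with a brief Python sketch. It assumes the multiplier 4 (the rounded value of 1.96 squared) and per-stratum sizes of 4 times the weight times the variance, divided by the squared margin of error, as the example's arithmetic indicates. `required_sample_sizes` is an illustrative name.

```python
# Weights and variances for the three strata (Table 23.4).
weights = [0.5, 0.25, 0.25]
variances = [35.4, 180.0, 12.0]


def required_sample_sizes(weights, variances, margin, z_squared=4.0):
    """Per-stratum sizes n_i = z^2 * w_i * s_i^2 / E^2 and their total."""
    per_stratum = [
        z_squared * w * s2 / margin ** 2 for w, s2 in zip(weights, variances)
    ]
    return per_stratum, sum(per_stratum)


per_stratum, n_total = required_sample_sizes(weights, variances, margin=1.0)
rounded = [round(n) for n in per_stratum]

print("per stratum:", rounded)
print("total:", round(n_total))

# Doubling the margin of error cuts the required total by a factor of four.
_, n_total_wide = required_sample_sizes(weights, variances, margin=2.0)
print("total for E = 2:", round(n_total_wide))
```

Because the margin of error enters as a square, relaxing it from 1 to 2 units reduces the required total from about 263 to about 66.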

### Example 23.11

The allocation of the $n = 40$ samples in Example 23.9 gave a 95% confidence interval of ± 2.7. The allocation according to Example 23.10 is $n_i = 40(w_i s_i^2/65.7) = 0.61 w_i s_i^2$, which gives $n_1 = 11$, $n_2 = 28$, and $n_3 = 2$. Because of rounding, this adds to $n = 41$ instead of 40.

If this allocation leads to the same sample variances listed in Table 23.4, the variance of the mean would be $s_{\bar{y}}^2 = 1.58$ and the 95% confidence interval would be $\pm 1.96\sqrt{1.58} = \pm 2.5$. This is a reduction of about 10% without doing any additional work, except to make a better sampling plan.
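The comparison between the original and the revised allocation can be sketched in Python. Rounding each stratum size up to the next whole number is an assumption made here because it reproduces the 11, 28, and 2 quoted in the example; the function names are illustrative.

```python
import math

# Weights and variances for the three strata (Table 23.4).
weights = [0.5, 0.25, 0.25]
variances = [35.4, 180.0, 12.0]


def reallocate(n_total):
    """Allocate n_i in proportion to w_i * s_i^2, rounding each size up."""
    total = sum(w * s2 for w, s2 in zip(weights, variances))  # 65.7
    return [
        math.ceil(n_total * w * s2 / total)
        for w, s2 in zip(weights, variances)
    ]


def variance_of_mean(n_alloc):
    """Variance of the stratified mean: sum of w_i^2 * s_i^2 / n_i."""
    return sum(
        w ** 2 * s2 / n for w, s2, n in zip(weights, variances, n_alloc)
    )


original = [20, 8, 12]
revised = reallocate(40)

for label, alloc in (("original", original), ("revised", revised)):
    var = variance_of_mean(alloc)
    print(label, alloc, f"variance = {var:.2f}",
          f"95% half-width = {1.96 * math.sqrt(var):.1f}")
```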
