User:RayLei/sandbox

In probability theory, if a random variable X has a binomial distribution with parameters n and p, i.e., X is distributed as the number of "successes" in n independent Bernoulli trials with probability p of success on each trial, then

P(X\leq x)=P(X<x+1)

for any x ∈ {0, 1, 2, ..., n}. If np and n(1 − p) are both large (a common rule of thumb is that each is at least 5), then the probability above is fairly well approximated by

P(Y\leq x+1/2)

where Y is a normally distributed random variable with the same expected value and the same variance as X, i.e., E(Y) = np and var(Y) = np(1 − p). This addition of 1/2 to x is a continuity correction.
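The effect of the correction can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the parameter values n = 20, p = 0.5, x = 8 are illustrative assumptions, not part of the text above.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_cdf(x: int, n: int, p: float) -> float:
    """Exact P(X <= x) for X ~ Bin(n, p), by summing the probability mass."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def normal_approx_cdf(x: int, n: int, p: float, correction: bool = True) -> float:
    """Normal approximation to P(X <= x), optionally with the +1/2 correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    shift = 0.5 if correction else 0.0
    return normal_cdf((x + shift - mean) / sd)

# Hypothetical example values (np = n(1 - p) = 10, so both are "large").
n, p, x = 20, 0.5, 8
exact = binom_cdf(x, n, p)
corrected = normal_approx_cdf(x, n, p, correction=True)
uncorrected = normal_approx_cdf(x, n, p, correction=False)

print(f"exact       P(X <= {x})     = {exact:.4f}")
print(f"corrected   P(Y <= {x}.5)   = {corrected:.4f}")
print(f"uncorrected P(Y <= {x})     = {uncorrected:.4f}")
```

For these values the corrected approximation lands much closer to the exact binomial probability than the uncorrected one.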

A continuity correction can also be applied when other discrete distributions supported on the integers are approximated by the normal distribution. For example, if X has a Poisson distribution with expected value λ then the variance of X is also λ, and

P(X\leq x)=P(X<x+1)\approx P(Y\leq x+1/2)

if Y is normally distributed with expectation and variance both λ.
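The same comparison can be sketched for the Poisson case. The values λ = 10 and x = 8 below are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import math

def normal_cdf(x: float, mean: float, var: float) -> float:
    """CDF of N(mean, var) at x, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def poisson_cdf(x: int, lam: float) -> float:
    """Exact P(X <= x) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(x + 1))

# Hypothetical example values; the mean and the variance are both lam.
lam, x = 10.0, 8
exact = poisson_cdf(x, lam)
corrected = normal_cdf(x + 0.5, lam, lam)   # with continuity correction
uncorrected = normal_cdf(x, lam, lam)       # without it

print(f"exact = {exact:.4f}, corrected = {corrected:.4f}, uncorrected = {uncorrected:.4f}")
```

With these values the corrected figure is noticeably closer to the exact Poisson probability, although the approximation is rougher than in a symmetric binomial case because the Poisson distribution is skewed.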

Reason

Consider the Bin(10, 0.5) distribution, whose mean and variance are 5 and 2.5, respectively. How well does an N(5, 2.5) distribution (mean 5, variance 2.5) approximate the Bin(10, 0.5)?

A continuous distribution, and the normal in particular, assigns probability zero to any single point, so it is reasonable to approximate the lumps of probability at the integers by areas under the normal curve. Take P(X = 3), for example. The exact binomial probability is 0.1172.

If we integrate the N(5, 2.5) density from 2 to 3, the density lies below 0.1172 everywhere on the interval, so the result is too small.

Φ((3-5)/sqrt(2.5)) - Φ((2-5)/sqrt(2.5)) = Φ(-1.265) - Φ(-1.897) = 0.1030 - 0.0289 = 0.0741

On the other hand, if we integrate the N(5, 2.5) density from 3 to 4, the density lies above 0.1172 over nearly all of the interval, so the result is too large.

Φ((4-5)/sqrt(2.5)) - Φ((3-5)/sqrt(2.5)) = Φ(-0.6325) - Φ(-1.265) = 0.2635 - 0.1030 = 0.1605

But if we integrate from 2.5 to 3.5, the result is a much closer approximation.

Φ((3.5-5)/sqrt(2.5)) - Φ((2.5-5)/sqrt(2.5)) = Φ(-0.9487) - Φ(-1.581) = 0.1714 - 0.0569 = 0.1145
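The three integrals above can be reproduced with a short script. This is a sketch using only the Python standard library; the helper names `normal_cdf` and `area` are illustrative assumptions.

```python
import math

def normal_cdf(x: float, mean: float, var: float) -> float:
    """CDF of N(mean, var) at x, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

n, p, k = 10, 0.5, 3
mean, var = n * p, n * p * (1 - p)   # 5 and 2.5

# Exact binomial probability P(X = 3).
exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)

def area(lo: float, hi: float) -> float:
    """Area under the N(mean, var) density between lo and hi."""
    return normal_cdf(hi, mean, var) - normal_cdf(lo, mean, var)

left = area(k - 1, k)             # integrate from 2 to 3
right = area(k, k + 1)            # integrate from 3 to 4
centered = area(k - 0.5, k + 0.5) # integrate from 2.5 to 3.5

print(f"exact P(X = 3)  = {exact:.4f}")
print(f"area [2, 3]     = {left:.4f}")
print(f"area [3, 4]     = {right:.4f}")
print(f"area [2.5, 3.5] = {centered:.4f}")
```

Because the script does not round the intermediate Φ values, its last digit may differ slightly from the hand calculation.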

More generally, the area under the normal curve from minus infinity to a + 0.5 approximates the binomial probability P(X ≤ a), and the area from a − 0.5 to infinity approximates P(X ≥ a). For example, with a = 3 the exact binomial calculation gives P(X ≤ 3) = 0.1719 and P(X ≥ 3) = 0.9453, while the normal approximation with continuity correction gives Φ((3.5-5)/sqrt(2.5)) = Φ(-0.9487) = 0.1714 and 1 - Φ((2.5-5)/sqrt(2.5)) = 1 - Φ(-1.581) = 1 - 0.0569 = 0.9431.
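The two tail approximations for a = 3 can be checked in the same way. Again this is only a sketch with illustrative helper names, using the Python standard library.

```python
import math

def normal_cdf(x: float, mean: float, var: float) -> float:
    """CDF of N(mean, var) at x, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

n, p, a = 10, 0.5, 3
mean, var = n * p, n * p * (1 - p)

def pmf(k: int) -> float:
    """Exact Bin(n, p) probability mass at k."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Exact tail probabilities.
exact_lower = sum(pmf(k) for k in range(0, a + 1))   # P(X <= a)
exact_upper = sum(pmf(k) for k in range(a, n + 1))   # P(X >= a)

# Normal approximations with continuity correction.
approx_lower = normal_cdf(a + 0.5, mean, var)        # area up to a + 0.5
approx_upper = 1.0 - normal_cdf(a - 0.5, mean, var)  # area from a - 0.5 upward

print(f"P(X <= {a}): exact {exact_lower:.4f}, approx {approx_lower:.4f}")
print(f"P(X >= {a}): exact {exact_upper:.4f}, approx {approx_upper:.4f}")
```

Note the direction of the shift: the cut-off moves half a unit outward from the event, so that the whole lump of probability at a is included in each tail.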

Applications

Before statistical software that can evaluate probability distribution functions accurately became readily available, continuity corrections played an important role in the practical application of statistical tests whose test statistic has a discrete distribution; they were especially important for manual calculations. A particular example is the binomial test, which involves the binomial distribution, as in checking whether a coin is fair. Where extreme accuracy is not necessary, computer calculations for some ranges of parameters may still rely on continuity corrections to improve accuracy while retaining simplicity.
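As an illustration of the coin example, the sketch below compares an exact two-sided binomial test with its continuity-corrected normal approximation. The data (60 heads in 100 flips) are hypothetical, and the two-sided p-value is taken as twice the upper tail, which is an assumption that suits the symmetric case p = 0.5.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical data: 60 heads observed in 100 flips; null hypothesis p0 = 0.5.
n, heads, p0 = 100, 60, 0.5
mean = n * p0
sd = math.sqrt(n * p0 * (1 - p0))

def pmf(k: int) -> float:
    """Exact Bin(n, p0) probability mass at k."""
    return math.comb(n, k) * p0**k * (1 - p0) ** (n - k)

# Exact two-sided p-value: double the upper tail P(X >= heads).
exact_p = min(1.0, 2.0 * sum(pmf(k) for k in range(heads, n + 1)))

# Normal approximation with continuity correction: use heads - 0.5 as the cut-off.
approx_p = 2.0 * (1.0 - normal_cdf((heads - 0.5 - mean) / sd))

# The same approximation without the correction, for comparison.
uncorrected_p = 2.0 * (1.0 - normal_cdf((heads - mean) / sd))

print(f"exact p-value       = {exact_p:.4f}")
print(f"corrected approx    = {approx_p:.4f}")
print(f"uncorrected approx  = {uncorrected_p:.4f}")
```

In this example the corrected p-value agrees closely with the exact one, while the uncorrected value is noticeably smaller and would overstate the evidence against a fair coin.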

References

Devore, Jay L., Probability and Statistics for Engineering and the Sciences, Fourth Edition, Duxbury Press, 1995.
Feller, W., On the normal approximation to the binomial distribution, The Annals of Mathematical Statistics, Vol. 16, No. 4, pp. 319-329, 1945.
Macdonald, Peter, The Continuity Correction, in Statistics 2MA3: Probability and Statistical Methods for Science, 1998-03-04. Retrieved 4 March 2012, from http://www.math.mcmaster.ca/peter/s2ma3/s2ma3_9798/cont_correct.html.

