The Central Limit Theorem in the Sociology Classroom: A Constructivist Approach

Undergraduate students of sociology, who are required to take at least one course in research methods, must negotiate a number of hurdles in order to understand the Central Limit Theorem (CLT): a lack of training in mathematics in general and mathematical statistics in particular, and a sociological aversion to the “bell curve,” or normal distribution. This paper critiques the “sweetening” approaches to classroom experiments that construct the Central Limit Theorem. It proceeds to outline a simple type of experiment based on a discrete rectangular population distribution, and offers a proof, understandable to sociology undergraduates, of how a discrete rectangular population distribution gives rise to a continuous sampling distribution.


Introduction
The Central Limit Theorem (CLT) is a fundamental theorem of mathematical statistics. It shares with fundamental theorems in other branches of mathematics the characteristic that it conveys a profound and partly surprising finding. While some statisticians point out that there are two fundamental theorems in mathematical statistics, the Law of Large Numbers (LLN) and the CLT, and refer to the CLT as the "second" fundamental theorem of statistics (e.g. Grimstead and Snell 1997), they would not dispute that the result of the CLT is much more surprising than that of the LLN (Wicklin 2014). Indeed, the LLN says that large random samples reflect the population. This result is not all that unexpected. It is also often confused with the much more extensive result of the CLT: in repeated large (simple) random samples taken from a population, the distribution of the sample mean is approximately normal, with a mean equal to that of the population (μ) and a standard deviation equal to that of the population (σ) divided by the square root of the sample size (n). The most exhilarating part of the result is that it holds regardless of the shape of the population distribution. When we consider that the normal distribution is a continuous distribution, with the range of real numbers on the horizontal axis, while the population distribution may be very non-normal (e.g. a rectangular or uniform distribution), with only a few discrete values, this result is definitely not intuitively obvious.
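For readers who prefer symbols, the verbal statement above can be written compactly as follows. This is a sketch of the standard textbook form rather than a formulation taken from the sources cited; the notation (X_1, ..., X_n for the sample, the subscript n on the sample mean) and the explicit finite-variance condition are my additions, and the `\text` and `\xrightarrow` commands assume the amsmath package.

```latex
% Assumption: X_1, ..., X_n are independent draws from one population
% with mean \mu and finite variance \sigma^2.
\[
  \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i ,
  \qquad
  \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{\;d\;} N(0,1)
  \quad \text{as } n \to \infty ,
\]
% so that, for large n, approximately
\[
  \bar{X}_n \;\approx\; N\!\left(\mu,\ \frac{\sigma^{2}}{n}\right).
\]
```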
Undergraduate students of sociology, who are required to take at least one course in research methods, must negotiate a number of hurdles in order to understand the CLT: a lack of training in mathematics in general and mathematical statistics in particular, and a sociological aversion to the "bell curve," or normal distribution. As a representation of actual populations, the normal distribution has had its heyday. Notably, the famous Belgian statistician and sociologist Adolphe Quetelet (1796-1874) used it to compute deviations in human behavior and characteristics (see Landau and Lazarsfeld 1968). Today, the average sociology student is understandably wary of the use of the normal distribution to describe human traits as average or as deviating from the average in a negative or positive direction, and the research methods professor's task partly involves reintroducing the normal distribution as vital for quantitative data analysis. The larger task is to convey the "truth" of the CLT to sociology students without having to rely on the formal proof. While some form of experimentation is often used in "constructivist" approaches that constitute a relatively easy alternative to a formal proof, this method requires a leap from the classroom experiment to the situation where many more and larger samples are taken from the population. While this leap can be taken constructively by applying combinatorics, sociology students do not typically possess skills in this area.
During the 15 years I spent as an economics professor, I found it rather easy to introduce students to the Central Limit Theorem by means of (1) a simple experiment of drawing chips or marbles of four colors from a closed container holding a fixed total number of them, usually 100; (2) using combinatorics to demonstrate that the rudimentary normal-looking curve generated via our experiment becomes more "normal" as the number of samples increases indefinitely; and (3) relying on a statistical program, such as TSP, to simulate the selection of larger samples out of non-normal populations. I often relied on a section in my own textbook (Author 1989, pp. 257-261) to show the outcome of the second step and to formulate the Central Limit Theorem. For topics other than the CLT, such as confidence intervals, it is relatively easy to incorporate computer simulations in the classroom, and these are also covered in the literature, as exemplified by Dambolena (1986), Kennedy, Olinsky, and Schumacher (1990), Ng and Wong (1995), West and Ogden (1998), and Paret and Martz (2008). A review of the relevant literature is provided by Mills (2002). The CLT is not so straightforward to convey by simulation, since the computer simulation has to include a routine that selects random samples of a given size. When students can see only the end result of such random sample selection, they may not be convinced that the outcome truly illustrates the theorem or concept at hand.
After joining the sociology group of a multidisciplinary department in 1997, I began to look for ways to explain the meaning of the CLT using essentially only the first of these three steps. At the same time, I noticed that professors teaching research methods to sociology students often "sweetened" the CLT by taking samples from a population of M&Ms, with the accompanying objective that the students can eat the experiment. Nearly two decades later, I remain convinced that M&Ms do not lend themselves well to such experiments, as explained in the next section.

Sweetening the CLT?
Let us start with the consideration that most students would want to eat only M&Ms that no one else has touched. That does not leave many options for a constructivist experiment. Suppose that we purchase a number of small packages of the popular candy, and let each student open their bag and count the number of reds out of the total. That result can then be compared with the proportion of reds in the population produced by the Mars Company, a number it is willing to share with the public. There are three problems with this experiment. The first problem is that we are making the implicit assumption that each bag constitutes a simple random sample of the total M&Ms produced. Since the bags are filled by a machine that distributes colored candies according to preset percentages for each color, this is not a valid assumption. The second problem is that we would not be able to illustrate clearly how the continuous normal distribution is associated with non-normal and discrete distributions. The third problem relates to the first: this experiment is more appropriate for constructing the Law of Large Numbers than the CLT, i.e. in repeated trials, the population relative frequency of any color of M&Ms is approximated, as the mean of the students' relative frequencies of "red" will reveal.
International Journal for Innovation Education and Research, www.ijier.net, Vol. 6, No. 06, 2018. International Educative Research Foundation and Publisher © 2018.
M&Ms can be used for classroom experiments that require each candy bag to be examined by one student only, and only once (they do get sticky after a while, despite their hard shell). For example, one can use a Chi-squared test to check whether the counts for each color per bag are the same. This is a fun exercise, and the candy does not get germ-covered or sticky in the process, culminating in an enjoyable snack in addition to increased understanding of the Chi-squared test. Paret and Martz (2008) provide a nice overview of ways in which statistics can be sweetened, using Minitab 16. Not surprisingly, the Central Limit Theorem does not feature among the five tests reviewed in their paper. This does not mean that M&Ms cannot be used at all in a constructionist experiment to demonstrate the CLT. However, the M&Ms will (1) be touched by more than one student (yuck) and (2) become sticky before the experiment is concluded. Here are the steps of one such experiment: (1) open a number of small M&M bags and empty them into a container that does not reveal the colors (there is no need to have a bag for each student); (2) count the number of M&Ms of each color in the container; (3) give a numerical code to each color (e.g. red = 1, brown = 2, and so on); (4) construct the frequency distribution for this population and compute the population mean and variance; (5) let each student draw a given number of M&Ms at random, with replacement (here is where the stickiness comes in); (6) shake the container and move on to the next student; (7) let each student compute X̄ using the numerical codes associated with each color;¹ (8) let students plot their own value of X̄ in a graph on the blackboard, and voilà. It is doubtful that students will want to eat any of the pieces of candy after the experiment is concluded.
To be fair, such an experiment can be conducted with candies wrapped in solid-color cellophane, which would allow step (9): divide the candies among the students and eat away.
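For instructors who want to rehearse the container experiment before bringing candy to class, the steps listed above can be sketched in Python. This is only an illustration: the color counts, the class size of 25, the five draws per student, and the function name `class_experiment` are hypothetical choices of mine, not data from an actual classroom.

```python
import random
from collections import Counter
from statistics import mean, pstdev, pvariance

# Hypothetical color counts for the container (in class, these come from counting).
color_counts = {"red": 30, "brown": 25, "yellow": 20, "green": 15, "blue": 10}

# Numerical code for each color: red = 1, brown = 2, and so on.
codes = {color: i for i, color in enumerate(color_counts, start=1)}

# The coded population, with its mean and variance.
population = [codes[c] for c, count in color_counts.items() for _ in range(count)]
mu = mean(population)
sigma2 = pvariance(population)

def class_experiment(n_students, n_draws, seed=0):
    """Each student draws n_draws candies with replacement and reports the sample mean."""
    rng = random.Random(seed)  # fixed seed so the rehearsal is reproducible
    return [mean(rng.choices(population, k=n_draws)) for _ in range(n_students)]

sample_means = class_experiment(n_students=25, n_draws=5)

# "Blackboard" plot: one x per student next to each observed sample mean.
for value, freq in sorted(Counter(sample_means).items()):
    print(f"{value:4.1f} {'x' * freq}")

print("population mean:", mu, " mean of sample means:", round(mean(sample_means), 3))
print("population sd:", round(sigma2 ** 0.5, 3),
      " sd of sample means:", round(pstdev(sample_means), 3))
```

With only 25 students the printed plot will look rough, which is itself a useful talking point about the leap from the classroom to many more and larger samples.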

A Useful Constructionist Approach
It is easier to do the experiment described at the end of the previous section if one uses a well-defined population that is non-normal. After all, while a normal population distribution also results in a normal sampling distribution, it is much more exhilarating for students (and their teachers) to begin with a very non-normal distribution. For CLT experiments, I keep a bag of 100 glass beads in my filing cabinet, with equal numbers of each of the colors clear (white), yellow, blue, and green.² The population is easily visualized and plotted, and its mean and standard deviation are easily computed. In fact, this is a good review for students who have difficulty computing variances and standard deviations. We assign codes as follows: clear = 1, yellow = 2, blue = 3, and green = 4. The population mean μ equals 25 × (1 + 2 + 3 + 4) divided by 100, which yields 2.5. The variance and standard deviation are then computed as in the following familiar worksheet:³

X      μ      X − μ    (X − μ)²
1      2.5    −1.5     2.25
2      2.5    −0.5     0.25
3      2.5     0.5     0.25
4      2.5     1.5     2.25
Sum            0       5

This is a good place to remind students that the sum of the deviations (X − μ) will always be zero, so we square the deviations prior to computing the variance, which is the sum of the last column divided by 4, or 5/4 = 1.25. The standard deviation is √1.25, or 1.12, which is rather large, considering that the values range from 1 to 4 and that the difference between the highest (or lowest) value and the mean is 1.5 in absolute value. How the variance compares with this difference is specific to our four-value distribution and is not to be expected for all rectangular distributions. The formula for the variance of a discrete uniform distribution is (k² − 1)/12, where k = highest value − lowest value + 1 (in our case, 4 − 1 + 1 = 4). One-twelfth of k² − 1 = 15 equals 1.25, which is the same as we computed in our little worksheet above.
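Students who want to check the worksheet arithmetic can do so in a few lines of Python. The sketch below simply rebuilds the bead population from the codes given above and compares the worksheet variance with the (k² − 1)/12 formula; the variable names are my own.

```python
from math import sqrt

codes = [1, 2, 3, 4]          # clear, yellow, blue, green
beads_per_color = 25          # 100 beads in equal numbers
population = [x for x in codes for _ in range(beads_per_color)]

N = len(population)
mu = sum(population) / N                          # population mean
deviations = [x - mu for x in population]         # X - mu for every bead
variance = sum(d ** 2 for d in deviations) / N    # mean squared deviation
sigma = sqrt(variance)                            # population standard deviation

# Variance formula for a discrete uniform distribution with k consecutive values.
k = max(codes) - min(codes) + 1
variance_formula = (k ** 2 - 1) / 12

print("mean:", mu)
print("sum of deviations:", sum(deviations))
print("variance (worksheet):", variance)
print("variance (formula):", variance_formula)
print("standard deviation:", round(sigma, 2))
```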
The beads are placed in a dark-colored envelope or bag, which is shaken before each student draws the assigned number of beads in one grab, counts the beads, records the colors, and computes her X̄ using the number codes. The results are recorded in tables, such as Tables 1-4, which are the outcome of four such experiments. We can easily see that the mean of the sampling distribution is very close to that of the population, and we can compute the standard deviation using SPSS, another statistical program, or a calculator, and see that it is indeed substantially smaller than that of the population. Plotting our results often yields a frequency distribution that indeed seems to approach a normal distribution. Let us not forget that the experiments presented in Tables 1-4 are based on very small samples. If n increases to the rule-of-thumb value for a large sample, 30, the results will be very striking indeed. However, such an experiment would require the aid of a computer program.⁴
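One way to see what n = 30 would look like, without asking a class to draw thousands of samples, is to compute the sampling distribution exactly. The sketch below is not the program referred to in the text; it assumes independent draws in which each of the four codes is equally likely (that is, sampling with replacement, which differs slightly from drawing all beads in one grab), and the function names are my own.

```python
from fractions import Fraction
from math import sqrt

def sampling_distribution(values, n):
    """Exact distribution of the mean of n independent draws, each value equally likely."""
    p = Fraction(1, len(values))
    sums = {0: Fraction(1)}                 # distribution of the running total
    for _ in range(n):
        updated = {}
        for total, prob in sums.items():
            for v in values:
                updated[total + v] = updated.get(total + v, 0) + prob * p
        sums = updated
    return {Fraction(total, n): prob for total, prob in sums.items()}

def mean_and_sd(dist):
    """Mean and standard deviation of a distribution given as {value: probability}."""
    m = sum(x * p for x, p in dist.items())
    var = sum((x - m) ** 2 * p for x, p in dist.items())
    return float(m), sqrt(var)

codes = [1, 2, 3, 4]
sigma = sqrt(1.25)                          # population standard deviation

for n in (1, 3, 30):
    m, sd = mean_and_sd(sampling_distribution(codes, n))
    print(f"n={n:2d}  mean={m:.3f}  sd={sd:.4f}  sigma/sqrt(n)={sigma / sqrt(n):.4f}")

# Text plot of the n = 30 sampling distribution (values with probability >= 1%).
dist30 = sampling_distribution(codes, 30)
for x, p in sorted(dist30.items()):
    if p >= Fraction(1, 100):
        print(f"{float(x):.2f} {'#' * round(float(p) * 300)}")
```

The text plot is crude, but the bell shape around 2.5 is already unmistakable, and the computed standard deviation can be compared directly with σ/√n.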

The Standard Deviation of the Mean, or the Standard Error
While the experiment also clearly shows that the standard error of the mean is much smaller than the population standard deviation, students are not automatically convinced that it equals σ/√n. This part of the theorem can be demonstrated very straightforwardly, however. Many sociology students possess sufficient mathematical skills to understand that the variance of a constant times a variable, Var(CX), equals the square of the constant times the variance, C² Var(X). Most also remember from their high school years the formula for the square of a sum, (a + b)² = a² + 2ab + b², so that it becomes clear that the X's need to be independent (as in simple random samples) for the covariances, which arise from the cross-product terms, to disappear. Accordingly, Var(X̄) = (1/n²) Var(X₁ + … + Xₙ) = (n/n²) Var(X) = σ²/n, so that the standard deviation of the sample mean equals σ/√n. Most of the students in a sociology research methods class can derive this result.
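Written out step by step, the derivation sketched in the paragraph above runs as follows. The intermediate line with the covariance terms is my own expansion of the argument, included to show exactly where independence is used; `\operatorname` assumes the amsmath package.

```latex
% Assumption: X_1, ..., X_n are independent, each with variance \sigma^2.
\begin{align*}
  \operatorname{Var}(\bar{X})
    &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
     = \frac{1}{n^{2}}\,\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right) \\
    &= \frac{1}{n^{2}}\left[\sum_{i=1}^{n}\operatorname{Var}(X_i)
       + 2\sum_{i<j}\operatorname{Cov}(X_i, X_j)\right] \\
    &= \frac{1}{n^{2}}\left[\,n\sigma^{2} + 0\,\right]
     = \frac{\sigma^{2}}{n},
  \qquad\text{so}\qquad
  \operatorname{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}} .
\end{align*}
```

For the bead population with σ² = 1.25 and n = 3, this gives a standard error of √(1.25/3), or about 0.65.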

Another and More Exciting Part of the Theorem
Our experiment outlined in Section 3 and illustrated by four actual classroom experiments presented in Tables 1-4 does not provide a clear way to show how the discrete population distribution gets transformed into a sampling distribution of the mean that is continuous. It is clear from the experiments, however, that increasingly large sample sizes yield an increasing number of possible values on the axis. We can do better than this, however, and introduce an actual proof that is within the realm of understanding of the typical sociology student.
The method of mathematical proof by induction relies directly on the properties of natural numbers, which progress in a clear pattern 1, 2, 3, and so on. Proof by induction can be envisaged when something needs to be proved where a progression of the natural numbers plays a crucial role. Such is the case when we think about taking larger and larger samples out of a population. The sample size thus increases from the smallest possible size of 1 to 2, 3, 4, and going on to infinity. We need to convince students that, even though there are only four distinct (discrete) values on the horizontal axis for the population, an infinite number of values on the horizontal axis results for an infinitely large sample size.
It is certainly possible to construct part of this argument experimentally. Table 1 shows that when n = 3, we have values for X̄ that include the original four values in the population, 1, 2, 3, and 4, and additional values in between them [...]. Students may have also noticed how evenly spread these "in-between" outcomes are in relation to the population values. It is easy to verify in a class experiment that n = 4 leads to n − 1 = 3 such in-between values between each pair of adjacent population values, n = 5 leads to 4 in-between values, and n = 6 to 5 in-between values. For example, in this latter case, we have between 2 and 3 the five new values 2.17, 2.33, 2.5, 2.67, and 2.83.
The formal extension and proof that each addition to the sample size also adds one additional outcome for X̄ between each pair of adjacent population values is fairly straightforward. Any proof by induction has only two steps. The first is to demonstrate the result for the smallest value of the natural number involved, and the second (harder) is to show that if the result holds for one value of the natural number, it also holds for the next. If these two steps are demonstrated, the result holds for any value of the natural number. This proof is outlined below for the case of the 4-value discrete uniform distribution, with the sample size following the progression of natural numbers, that is, a sample size of 1, or 2, or 3, or 30, or 5000, etc.
In the first step of the proof, we demonstrate that the result holds for n = 1. Indeed, when n = 1, only the outcomes 1, 2, 3, or 4 are possible, and zero additional values are possible for X̄. This completes the first step.
In the second step, we pick a sample size n = k. The sample mean X̄ is the sum of k natural numbers drawn from our discrete uniform distribution, divided by k. The smallest possible X̄ occurs when each item picked is the white one, with a value of 1. So 1 occurs k times, and X̄ = k(1)/k = 1. The next highest value for X̄ occurs when all but one of the items picked are white (value of 1), and one is yellow (value of 2), so that we get X̄ = (k − 1 + 2)/k = (k + 1)/k = 1 + 1/k. The next largest value is (k + 2)/k = 1 + 2/k, and so on, through (2k − 1)/k = 2 − 1/k, until we reach 2k/k = 2. Thus, gathering the results together, there are k − 1 new values for X̄ between the values 1 and 2. Extending the same reasoning to the possible values between 2 and 3 again yields k − 1 new values, and the same goes for the values between 3 and 4. The same count applied to a sample of size k + 1 gives k in-between values, so each addition to the sample size adds exactly one. Since we have demonstrated the result for steps 1 and 2 of the proof, we have established the proof for the rectangular distribution. The proof can easily be extended to other types of discrete distributions.
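The counting argument can also be checked by brute force for small sample sizes. The following sketch lists every possible value of the sample mean for the four-code population and counts how many fall strictly between adjacent population values; the function names are hypothetical, and the enumeration is only practical for small n.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def possible_means(values, n):
    """All distinct values the sample mean can take for a sample of size n."""
    return sorted({Fraction(sum(sample), n)
                   for sample in combinations_with_replacement(values, n)})

def in_between(means, low, high):
    """Sample means that fall strictly between two adjacent population values."""
    return [m for m in means if low < m < high]

codes = [1, 2, 3, 4]

for n in range(1, 7):
    means = possible_means(codes, n)
    counts = [len(in_between(means, a, a + 1)) for a in (1, 2, 3)]
    print(f"n={n}: {len(means)} possible means, in-between counts {counts}")

# The five in-between values between 2 and 3 when n = 6, rounded to two decimals.
print([round(float(m), 2) for m in in_between(possible_means(codes, 6), 2, 3)])
```

Exact fractions are used so that values such as 13/6 and 14/6 are never confused by rounding.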

Conclusion
The meaning of the Central Limit Theorem is notoriously difficult to convey to students who are not well versed in mathematical statistics, and undergraduate sociology students certainly are among these. A well-designed experiment, however, can help students to "construct" the theorem rather than to formally prove it.
The portion of the theorem that implies that discrete population distributions nevertheless have sampling distributions for the mean that approach a normal, and therefore continuous, distribution can be understood through the constructivist approach of the classroom experiment. Students can also learn to prove this part of the theorem with tools no more extensive than those obtained in the very basic mathematics competency program that exists in all universities.