Day
Temp
Monday
77
Tuesday
76
Wednesday
74
Thursday
78
Friday
78
This value falls so far into the tail that it cannot even be plotted on the distribution ( Figure 7.7 )! Because the result is significant, you also calculate an effect size:
The effect size you calculate is definitely large, meaning someone has some explaining to do!
Figure 7.7. Obtained z statistic. (“ Obtained z5.77 ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)
You compare your obtained z statistic, z = 5.77, to the critical value, z * = 1.645, and find that z > z *. Therefore you reject the null hypothesis, concluding:
Reject H 0 . Based on 5 observations, the average temperature ( M = 76.6 degrees) is statistically significantly higher than it is supposed to be, and the effect size was large, z = 5.77, p < .05, d = 2.60.
Example C Different Significance Level
Finally, let’s take a look at an example phrased in generic terms, rather than in the context of a specific research question, to see the individual pieces one more time. This time, however, we will use a stricter significance level, a = .01, to test the hypothesis.
We will use 60 as an arbitrary null hypothesis value:
We will assume a two-tailed test:
We have seen the critical values for z tests at a = .05 levels of significance several times. To find the values for a = .01, we will go to the Standard Normal Distribution Table and find the z score cutting off .005 (.01 divided by 2 for a two-tailed test) of the area in the tail, which is z * = ±2.575. Notice that this cutoff is much higher than it was for a = .05. This is because we need much less of the area in the tail, so we need to go very far out to find the cutoff. As a result, this will require a much larger effect or much larger sample size in order to reject the null hypothesis.
We can now calculate our test statistic. We will use s = 10 as our known population standard deviation and the following data to calculate our sample mean:
The average of these scores is M = 60.40. From this we calculate our z statistic as:
The Cohen’s d effect size calculation is:
Our obtained z statistic, z = 0.13, is very small. It is much less than our critical value of 2.575. Thus, this time, we fail to reject the null hypothesis. Our conclusion would look something like:
Fail to reject H 0 . Based on the sample of 10 scores, we cannot conclude that there is an effect causing the mean ( M = 60.40) to be statistically significantly different from 60.00, z = 0.13, p > .01, d = 0.04, and the effect size supports this interpretation.
There are several other considerations we need to keep in mind when performing hypothesis testing.
In the Physicians’ Reactions case study, the probability value associated with the significance test is .0057. Therefore, the null hypothesis was rejected, and it was concluded that physicians intend to spend less time with obese patients. Despite the low probability value, it is possible that the null hypothesis of no true difference between obese and average-weight patients is true and that the large difference between sample means occurred by chance. If this is the case, then the conclusion that physicians intend to spend less time with obese patients is in error. This type of error is called a Type I error. More generally, a Type I error occurs when a significance test results in the rejection of a true null hypothesis.
The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a Type II error . Unlike a Type I error, a Type II error is not really an error. When a statistical test is not significant, it means that the data do not provide strong evidence that the null hypothesis is false. Lack of significance does not support the conclusion that the null hypothesis is true. Therefore, a researcher should not make the mistake of incorrectly concluding that the null hypothesis is true when a statistical test was not significant. Instead, the researcher should consider the test inconclusive. Contrast this with a Type I error in which the researcher erroneously concludes that the null hypothesis is false when, in fact, it is true.
A Type II error can only occur if the null hypothesis is false. If the null hypothesis is false, then the probability of a Type II error is called b (“beta”). The probability of correctly rejecting a false null hypothesis equals 1 − b and is called statistical power . Power is simply our ability to correctly detect an effect that exists. It is influenced by the size of the effect (larger effects are easier to detect), the significance level we set (making it easier to reject the null makes it easier to detect an effect, but increases the likelihood of a Type I error), and the sample size used (larger samples make it easier to reject the null).
Misconceptions about significance testing are common. This section lists three important ones.
Your answer should include mention of the baseline assumption of no difference between the sample and the population.
Alpha is the significance level. It is the criterion we use when deciding to reject or fail to reject the null hypothesis, corresponding to a given proportion of the area under the normal distribution and a probability of finding extreme scores assuming the null hypothesis is true.
We always calculate an effect size to see if our research is practically meaningful or important. NHST (null hypothesis significance testing) is influenced by sample size but effect size is not; therefore, they provide complimentary information.
“ Null Hypothesis ” by Randall Munroe/xkcd.com is licensed under CC BY-NC 2.5 .)
Introduction to Statistics in the Psychological Sciences Copyright © 2021 by Linda R. Cote Ph.D.; Rupa G. Gordon Ph.D.; Chrislyn E. Randell Ph.D.; Judy Schmitt; and Helena Marvin is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
Setting up the hypotheses.
When setting up the hypotheses with z, the parameter is associated with a sample mean (in the previous chapter examples the parameters for the null used 0). Using z is an occasion in which the null hypothesis is a value other than 0. For example, if we are working with mothers in the U.S. whose children are at risk of low birth weight, we can use 7.47 pounds, the average birth weight in the US, as our null value and test for differences against that. For now, we will focus on testing a value of a single mean against what we expect from the population.
Using birthweight as an example, our null hypothesis takes the form: H 0 : μ = 7.47 Notice that we are testing the value for μ, the population parameter, NOT the sample statistic ̅X (or M). We are referring to the data right now in raw form (we have not standardized it using z yet). Again, using inferential statistics, we are interested in understanding the population, drawing from our sample observations. For the research question, we have a mean value from the sample to use, we have specific data is – it is observed and used as a comparison for a set point.
As mentioned earlier, the alternative hypothesis is simply the reverse of the null hypothesis, and there are three options, depending on where we expect the difference to lie. We will set the criteria for rejecting the null hypothesis based on the directionality (greater than, less than, or not equal to) of the alternative.
If we expect our obtained sample mean to be above or below the null hypothesis value (knowing which direction), we set a directional hypothesis. O ur alternative hypothesis takes the form based on the research question itself. In our example with birthweight, this could be presented as H A : μ > 7.47 or H A : μ < 7.47.
Note that we should only use a directional hypothesis if we have a good reason, based on prior observations or research, to suspect a particular direction. When we do not know the direction, such as when we are entering a new area of research, we use a non-directional alternative hypothesis. In our birthweight example, this could be set as H A : μ ≠ 7.47
In working with data for this course we will need to set a critical value of the test statistic for alpha (α) for use of test statistic tables in the back of the book. This is determining the critical rejection region that has a set critical value based on α.
We set alpha (α) before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use.
When a research hypothesis predicts an effect but does not predict a direction for the effect, it is called a non-directional hypothesis . To test the significance of a non-directional hypothesis, we have to consider the possibility that the sample could be extreme at either tail of the comparison distribution. We call this a two-tailed test .
Figure 1. showing a 2-tail test for non-directional hypothesis for z for area C is the critical rejection region.
When a research hypothesis predicts a direction for the effect, it is called a directional hypothesis . To test the significance of a directional hypothesis, we have to consider the possibility that the sample could be extreme at one-tail of the comparison distribution. We call this a one-tailed test .
Figure 2. showing a 1-tail test for a directional hypothesis (predicting an increase) for z for area C is the critical rejection region.
Determining Cutoff Scores with Two-Tailed Tests
Typically we specify an α level before analyzing the data. If the data analysis results in a probability value below the α level, then the null hypothesis is rejected; if it is not, then the null hypothesis is not rejected. In other words, if our data produce values that meet or exceed this threshold, then we have sufficient evidence to reject the null hypothesis ; if not, we fail to reject the null (we never “accept” the null). According to this perspective, if a result is significant, then it does not matter how significant it is. Moreover, if it is not significant, then it does not matter how close to being significant it is. Therefore, if the 0.05 level is being used, then probability values of 0.049 and 0.001 are treated identically. Similarly, probability values of 0.06 and 0.34 are treated identically. Note we will discuss ways to address effect size (which is related to this challenge of NHST).
When setting the probability value, there is a special complication in a two-tailed test. We have to divide the significance percentage between the two tails. For example, with a 5% significance level, we reject the null hypothesis only if the sample is so extreme that it is in either the top 2.5% or the bottom 2.5% of the comparison distribution. This keeps the overall level of significance at a total of 5%. A one-tailed test does have such an extreme value but with a one-tailed test only one side of the distribution is considered.
Figure 3. Critical value differences in one and two-tail tests. Photo Credit
Let’s re view th e set critical values for Z.
We discussed z-scores and probability in chapter 8. If we revisit the z-score for 5% and 1%, we can identify the critical regions for the critical rejection areas from the unit standard normal table.
Review: Critical values, p-values, and significance level
There are two criteria we use to assess whether our data meet the thresholds established by our chosen significance level, and they both have to do with our discussions of probability and distributions. Recall that probability refers to the likelihood of an event, given some situation or set of conditions. In hypothesis testing, that situation is the assumption that the null hypothesis value is the correct value, or that there is no effec t. The value laid out in H 0 is our condition under which we interpret our results. To reject this assumption, and thereby reject the null hypothesis, we need results that would be very unlikely if the null was true.
Now recall that values of z which fall in the tails of the standard normal distribution represent unlikely values. That is, the proportion of the area under the curve as or more extreme than z is very small as we get into the tails of the distribution. Our significance level corresponds to the area under the tail that is exactly equal to α: if we use our normal criterion of α = .05, then 5% of the area under the curve becomes what we call the rejection region (also called the critical region) of the distribution. This is illustrated in Figure 4.
Figure 4: The rejection region for a one-tailed test
The shaded rejection region takes us 5% of the area under the curve. Any result which falls in that region is sufficient evidence to reject the null hypothesis.
The rejection region is bounded by a specific z-value, as is any area under the curve. In hypothesis testing, the value corresponding to a specific rejection region is called the critical value, z crit (“z-crit”) or z* (hence the other name “critical region”). Finding the critical value works exactly the same as finding the z-score corresponding to any area under the curve like we did in Unit 1. If we go to the normal table, we will find that the z-score corresponding to 5% of the area under the curve is equal to 1.645 (z = 1.64 corresponds to 0.0405 and z = 1.65 corresponds to 0.0495, so .05 is exactly in between them) if we go to the right and -1.645 if we go to the left. The direction must be determined by your alternative hypothesis, and drawing then shading the distribution is helpful for keeping directionality straight.
Suppose, however, that we want to do a non-directional test. We need to put the critical region in both tails, but we don’t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail’s rejection region. For α = .05, this means 2.5% of the area is in each tail, which, based on the z-table, corresponds to critical values of z* = ±1.96. This is shown in Figure 5.
Figure 5: Two-tailed rejection region
Thus, any z-score falling outside ±1.96 (greater than 1.96 in absolute value) falls in the rejection region. When we use z-scores in this way, the obtained value of z (sometimes called z-obtained) is something known as a test statistic, which is simply an inferential statistic used to test a null hypothesis.
Calculate the test statistic: Z
Now that we understand setting up the hypothesis and determining the outcome, let’s examine hypothesis testing with z! The next step is to carry out the study and get the actual results for our sample. Central to hypothesis test is comparison of the population and sample means. To make our calculation and determine where the sample is in the hypothesized distribution we calculate the Z for the sample data.
Make a decision
To decide whether to reject the null hypothesis, we compare our sample’s Z score to the Z score that marks our critical boundary. If our sample Z score falls inside the rejection region of the comparison distribution (is greater than the z-score critical boundary) we reject the null hypothesis.
The formula for our z- statistic has not changed:
To formally test our hypothesis, we compare our obtained z-statistic to our critical z-value. If z obt > z crit , that means it falls in the rejection region (to see why, draw a line for z = 2.5 on Figure 1 or Figure 2) and so we reject H 0 . If z obt < z crit , we fail to reject. Remember that as z gets larger, the corresponding area under the curve beyond z gets smaller. Thus, the proportion, or p-value, will be smaller than the area for α, and if the area is smaller, the probability gets smaller. Specifically, the probability of obtaining that result, or a more extreme result, under the condition that the null hypothesis is true gets smaller.
Conversely, if we fail to reject, we know that the proportion will be larger than α because the z-statistic will not be as far into the tail. This is illustrated for a one- tailed test in Figure 6.
Figure 6. Relation between α, z obt , and p
When the null hypothesis is rejected, the effect is said to be statistically significant . Do not confuse statistical significance with practical significance. A small effect can be highly significant if the sample size is large enough.
Why does the word “significant” in the phrase “statistically significant” mean something so different from other uses of the word? Interestingly, this is because the meaning of “significant” in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was “significant” if it signified something. Thus, finding that an effect is statistically significant signifies that the effect is real and not due to chance. Over the years, the meaning of “significant” changed, leading to the potential misinterpretation.
The process of testing hypotheses follows a simple four-step procedure. This process will be what we use for the remained of the textbook and course, and though the hypothesis and statistics we use will change, this process will not.
Your hypotheses are the first thing you need to lay out. Otherwise, there is nothing to test! You have to state the null hypothesis (which is what we test) and the alternative hypothesis (which is what we expect). These should be stated mathematically as they were presented above AND in words, explaining in normal English what each one means in terms of the research question.
Next, we formally lay out the criteria we will use to test our hypotheses. There are two pieces of information that inform our critical values: α, which determines how much of the area under the curve composes our rejection region, and the directionality of the test, which determines where the region will be.
Once we have our hypotheses and the standards we use to test them, we can collect data and calculate our test statistic, in this case z . This step is where the vast majority of differences in future chapters will arise: different tests used for different data are calculated in different ways, but the way we use and interpret them remains the same.
Finally, once we have our obtained test statistic, we can compare it to our critical value and decide whether we should reject or fail to reject the null hypothesis. When we do this, we must interpret the decision in relation to our research question, stating what we concluded, what we based our conclusion on, and the specific statistics we obtained.
Let’s see how hypothesis testing works in action by working through an example. Say that a movie theater owner likes to keep a very close eye on how much popcorn goes into each bag sold, so he knows that the average bag has 8 cups of popcorn and that this varies a little bit, about half a cup. That is, the known population mean is μ = 8.00 and the known population standard deviation is σ =0.50. The owner wants to make sure that the newest employee is filling bags correctly, so over the course of a week he randomly assesses 25 bags filled by the employee to test for a difference (n = 25). He doesn’t want bags overfilled or under filled, so he looks for differences in both directions. This scenario has all of the information we need to begin our hypothesis testing procedure.
Our manager is looking for a difference in the mean cups of popcorn bags compared to the population mean of 8. We will need both a null and an alternative hypothesis written both mathematically and in words. We’ll always start with the null hypothesis:
H 0 : There is no difference in the cups of popcorn bags from this employee H 0 : μ = 8.00
Notice that we phrase the hypothesis in terms of the population parameter μ, which in this case would be the true average cups of bags filled by the new employee.
Our assumption of no difference, the null hypothesis, is that this mean is exactly
the same as the known population mean value we want it to match, 8.00. Now let’s do the alternative:
H A : There is a difference in the cups of popcorn bags from this employee H A : μ ≠ 8.00
In this case, we don’t know if the bags will be too full or not full enough, so we do a two-tailed alternative hypothesis that there is a difference.
Our critical values are based on two things: the directionality of the test and the level of significance. We decided in step 1 that a two-tailed test is the appropriate directionality. We were given no information about the level of significance, so we assume that α = 0.05 is what we will use. As stated earlier in the chapter, the critical values for a two-tailed z-test at α = 0.05 are z* = ±1.96. This will be the criteria we use to test our hypothesis. We can now draw out our distribution so we can visualize the rejection region and make sure it makes sense
Figure 7: Rejection region for z* = ±1.96
Now we come to our formal calculations. Let’s say that the manager collects data and finds that the average cups of this employee’s popcorn bags is ̅X = 7.75 cups. We can now plug this value, along with the values presented in the original problem, into our equation for z:
So our test statistic is z = -2.50, which we can draw onto our rejection region distribution:
Figure 8: Test statistic location
Looking at Figure 5, we can see that our obtained z-statistic falls in the rejection region. We can also directly compare it to our critical value: in terms of absolute value, -2.50 > -1.96, so we reject the null hypothesis. We can now write our conclusion:
When we write our conclusion, we write out the words to communicate what it actually means, but we also include the average sample size we calculated (the exact location doesn’t matter, just somewhere that flows naturally and makes sense) and the z-statistic and p-value. We don’t know the exact p-value, but we do know that because we rejected the null, it must be less than α.
When we reject the null hypothesis, we are stating that the difference we found was statistically significant, but we have mentioned several times that this tells us nothing about practical significance. To get an idea of the actual size of what we found, we can compute a new statistic called an effect size. Effect sizes give us an idea of how large, important, or meaningful a statistically significant effect is.
For mean differences like we calculated here, our effect size is Cohen’s d :
Effect sizes are incredibly useful and provide important information and clarification that overcomes some of the weakness of hypothesis testing. Whenever you find a significant result, you should always calculate an effect size
d | Interpretation |
---|---|
0.0 – 0.2 | negligible |
0.2 – 0.5 | small |
0.5 – 0.8 | medium |
0.8 – | large |
Table 1. Interpretation of Cohen’s d
Let’s do another example to solidify our understanding. Let’s say that the office building you work in is supposed to be kept at 74 degree Fahrenheit but is allowed
to vary by 1 degree in either direction. You suspect that, as a cost saving measure, the temperature was secretly set higher. You set up a formal way to test your hypothesis.
You start by laying out the null hypothesis:
H 0 : There is no difference in the average building temperature H 0 : μ = 74
Next you state the alternative hypothesis. You have reason to suspect a specific direction of change, so you make a one-tailed test:
H A : The average building temperature is higher than claimed H A : μ > 74
Now that you have everything set up, you spend one week collecting temperature data:
Day | Temp |
Monday | 77 |
Tuesday | 76 |
Wednesday | 74 |
Thursday | 78 |
Friday | 78 |
You calculate the average of these scores to be 𝑋̅ = 76.6 degrees. You use this to calculate the test statistic, using μ = 74 (the supposed average temperature), σ = 1.00 (how much the temperature should vary), and n = 5 (how many data points you collected):
z = 76.60 − 74.00 = 2.60 = 5.78
1.00/√5 0.45
This value falls so far into the tail that it cannot even be plotted on the distribution!
Figure 7: Obtained z-statistic
You compare your obtained z-statistic, z = 5.77, to the critical value, z* = 1.645, and find that z > z*. Therefore you reject the null hypothesis, concluding: Based on 5 observations, the average temperature (𝑋̅ = 76.6 degrees) is statistically significantly higher than it is supposed to be, z = 5.77, p < .05.
d = (76.60-74.00)/ 1= 2.60
The effect size you calculate is definitely large, meaning someone has some explaining to do!
Example: Different Significance Level
First, let’s take a look at an example phrased in generic terms, rather than in the context of a specific research question, to see the individual pieces one more time. This time, however, we will use a stricter significance level, α = 0.01, to test the hypothesis.
We will use 60 as an arbitrary null hypothesis value: H 0 : The average score does not differ from the population H 0 : μ = 50
We will assume a two-tailed test: H A : The average score does differ H A : μ ≠ 50
We have seen the critical values for z-tests at α = 0.05 levels of significance several times. To find the values for α = 0.01, we will go to the standard normal table and find the z-score cutting of 0.005 (0.01 divided by 2 for a two-tailed test) of the area in the tail, which is z crit * = ±2.575. Notice that this cutoff is much higher than it was for α = 0.05. This is because we need much less of the area in the tail, so we need to go very far out to find the cutoff. As a result, this will require a much larger effect or much larger sample size in order to reject the null hypothesis.
We can now calculate our test statistic. The average of 10 scores is M = 60.40 with a µ = 60. We will use σ = 10 as our known population standard deviation. From this information, we calculate our z-statistic as:
Our obtained z-statistic, z = 0.13, is very small. It is much less than our critical value of 2.575. Thus, this time, we fail to reject the null hypothesis. Our conclusion would look something like:
Notice two things about the end of the conclusion. First, we wrote that p is greater than instead of p is less than, like we did in the previous two examples. This is because we failed to reject the null hypothesis. We don’t know exactly what the p- value is, but we know it must be larger than the α level we used to test our hypothesis. Second, we used 0.01 instead of the usual 0.05, because this time we tested at a different level. The number you compare to the p-value should always be the significance level you test at. Because we did not detect a statistically significant effect, we do not need to calculate an effect size. Note: some statisticians will suggest to always calculate effects size as a possibility of Type II error. Although insignificant, calculating d = (60.4-60)/10 = .04 which suggests no effect (and not a possibility of Type II error).
Errors in hypothesis testing.
Keep in mind that rejecting the null hypothesis is not an all-or-nothing decision. The Type I error rate is affected by the α level: the lower the α level the lower the Type I error rate. It might seem that α is the probability of a Type I error. However, this is not correct. Instead, α is the probability of a Type I error given that the null hypothesis is true. If the null hypothesis is false, then it is impossible to make a Type I error. The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a Type II error.
The statistical power of a research design is the probability of rejecting the null hypothesis given the sample size and expected relationship strength. Statistical power is the complement of the probability of committing a Type II error. Clearly, researchers should be interested in the power of their research designs if they want to avoid making Type II errors. In particular, they should make sure their research design has adequate power before collecting data. A common guideline is that a power of .80 is adequate. This means that there is an 80% chance of rejecting the null hypothesis for the expected relationship strength.
Given that statistical power depends primarily on relationship strength and sample size, there are essentially two steps you can take to increase statistical power: increase the strength of the relationship or increase the sample size. Increasing the strength of the relationship can sometimes be accomplished by using a stronger manipulation or by more carefully controlling extraneous variables to reduce the amount of noise in the data (e.g., by using a within-subjects design rather than a between-subjects design). The usual strategy, however, is to increase the sample size. For any expected relationship strength, there will always be some sample large enough to achieve adequate power.
Inferential statistics uses data from a sample of individuals to reach conclusions about the whole population. The degree to which our inferences are valid depends upon how we selected the sample (sampling technique) and the characteristics (parameters) of population data. Statistical analyses assume that sample(s) and population(s) meet certain conditions called statistical assumptions.
It is easy to check assumptions when using statistical software and it is important as a researcher to check for violations; if violations of statistical assumptions are not appropriately addressed then results may be interpreted incorrectly.
Having read the chapter, students should be able to:
1. List hypotheses. Determine critical region. Calculate z. Compare z to critical region. Draw Conclusion. We calculate an effect size when we find a statistically significant result to see if our result is practically meaningful or important
5. Step 1: H 0 : μ = 42 “My average tips does not differ from other servers”, H A : μ ≠ 42 “My average tips do differ from others”
Introduction to Statistics for Psychology Copyright © 2021 by Alisa Beyer is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
Z Table. Z Score Table. Normal Distribution Table. Standard Normal Table.
Z-Test is a statistical test which let’s us approximate the distribution of the test statistic under the null hypothesis using normal distribution .
Z-Test is a test statistic commonly used in hypothesis test when the sample data is large.For carrying out the Z-Test, population parameters such as mean, variance, and standard deviation should be known.
This test is widely used to determine whether the mean of the two samples are different when the variance is known. We make use of the Z score and the Z table for running the Z-Test.
A test statistic is a random variable that we calculate from the sample data to determine whether to reject the null hypothesis. This random variable is used to calculate the P-value, which indicates how strong the evidence is against the null hypothesis. Z-Test is such a test statistic where we make use of the mean value and z score to determine the P-value. Z-Test compares the mean of two large samples taken from a population when the variance is known.
Z-Test is usually used to conduct a hypothesis test when the sample size is greater than 30. This is because of the central limit theorem where when the sample gets larger, the distributed data graph starts resembling a bell curve and is considered to be distributed normally. Since the Z-Test follows normal distribution under the null hypothesis, it is the most suitable test statistic for large sample data.
Why do we use a large sample for conducting a hypothesis test?
In a hypothesis test, we are trying to reject a null hypothesis with the evidence that we should collect from sample data which represents only a portion of the population. When the population has a large size, and the sample data is small, we will not be able to draw an accurate conclusion from the test to prove our null hypothesis is false. As sample data provide us a door to the entire population, it should be large enough for us to arrive at a significant inference. Hence a sufficiently large data should be considered for a hypothesis test especially if the population is huge.
Z-Test can be considered as a test statistic for a hypothesis test to calculate the P-value. However, there are certain conditions that should be satisfied by the sample to run the Z-Test.
This is already mentioned above. The size of the sample is an important factor in Z-Testing as the Z-Test follows a normal distribution and so should the data. If the same size is less than 30, it is recommended to use a t-test instead
Each element in the sample, when considered single should be independent and shouldn’t have a relationship with another element.
This is ensured if the sample data is large.
Each data in the population should have an equal chance to be selected as one of the sample data.
When considering multiple sample data, ensuring that the size of each sample is the same for an accurate calculation of population parameters.
The population parameter, standard deviation must be given to run a Z-Test as we cannot perform the calculation without knowing it. If it is not directly given, then it assumed that the variance of the sample data is equal to the variance of the entire population.
If the conditions are satisfied, the Z-Test can be successfully implemented.
The null hypothesis is a statement of no effect and it supports the data which is already given. It is generally represented as :
The statement that we are trying to prove is the alternate hypothesis. It is represented as:
This is the representation of a bidirectional alternate hypothesis.
This is the representation of a one-directional alternate hypothesis that is represented in the right region of the graph.
This is the representation of a one-directional alternate hypothesis that is represented in the left region of the graph.
Alpha level or significant level is the probability of rejecting the null hypothesis when it is true. It is represented by ( α ). An alpha level must be chosen wisely so as to avoid the Type I and Type II errors.
If we choose a large alpha value such as 10%, it is likely to reject a null hypothesis when it is true. There is a probability of 10% for us to reject the null hypothesis. This is an error known as the Type I error.
On the other hand, if we choose an alpha level as low as 1%, there is a chance to accept the null hypothesis even if it is false. That is we reject the alternate hypothesis to favor the null hypothesis. This is the Type II error.
Hence the alpha level should be chosen in such a way that the chance of making Type I or Type II error is minimal. For this reason, the alpha level is commonly selected as 5% which is proven best to avoid errors.
The critical value is the point in the normal distribution graph that splits the graph into two regions: the acceptance region and the rejection regions. It can be also described as the extreme value for which a null hypothesis can be accepted. This critical value of Z can be found from the Z table .
The sample data that we choose to test is converted into a single value. This is known as the test statistic. This value is compared to the null value. If the test statistic significantly differs from the null value, the null value is rejected.
Now, we have to determine whether the test statistic we have calculated comes under the acceptance region or the rejection region. For this, the test statistic is compared with the critical value to know whether we should accept or reject a null hypothesis.
Z-Test can be used to run a hypothesis test for a single sample or to compare the mean of two samples. There are two common types of Z-Test
This is the most basic type of hypothesis test that is widely used. For running an one-sample Z-Test, all we need to know is the mean and standard deviation of the population. We consider only a single sample for a one-sample Z-Test. One-sample Z-Test is used to test whether the population parameter is different from the hypothesized value i.e whether the mean of the population is equal to, less than or greater than the hypothesized value.
The equation for finding the value of Z is:
The following are the assumptions that are generally taken for a one-sampled Z-Test:
A two-sample Z-Test is used whenever there is a comparison between two independent samples. It is used to check whether the difference between the means is equal to zero or not. Suppose if we want to know whether men or women prefer to drive more in a city, we use a two-sample Z-Test as it is the comparison of two independent samples of men and women.
The following are the assumptions that are generally taken for a two-sample Z-Test:
A critical value is a line that splits a normally distributed graph into two different sections. Namely the ‘Rejection region’ and ‘Acceptance region’. If your test value falls in the ‘Rejection region’, then the null hypothesis is rejected and if your test value falls in the ‘Accepted region’, then the null hypothesis is accepted.
Critical Value Vs Significant Value
Significant level, alpha is the probability of rejecting a null hypothesis when it is actually true. While the critical value is the extreme value up to which a null hypothesis is true. There migh come a confusion regarding both of these parameters.
Critical value is a value that lies in critical region. It is in fact the boundary value of the rejection region. Also, it is the value up to which the null hypothesis is true. Hence the critical value is considered to be the point at which the null hypothesis is true or is rejected.
Critical value gives a point of extremity whose probability is indicated by the significant level. Significant level is pre-selected for a hypothesis test and critical value is calculated from this Alpha value. Critical value is a point represented as Z score and Significant level is a probability.
Z-Test are used when the sample size exceeds 30. As Z-Test follows normal distribution, large sample size can be taken for the Z-Test. Z-Test indicates the distance of a data point from the mean of the data set in terms of standard deviation. Also. this test can only be used if the standard deviation of the data set is known.
T-Test is based on T distribution in which the mean value is known and the variance could be calculated from the sample. T-Test is most preferred to know the difference between the statistical parameters of two samples as the standard deviation of the samples are not usually given in a two-sample test for running the Z-Test. Also, if the sample size is less than 30, T-Test is preferred.
This chapter presents two common tests of the hypothesis that a population mean equals a particular value and of the hypothesis that two population means are equal: the z test and the t test. These tests are approximate : They are based on approximations to the probability distribution of the test statistic when the null hypothesis is true, so their significance levels are not exactly what they claim to be. If the sample size is reasonably large and the population from which the sample is drawn has a nearly normal distribution —a notion defined in this chapter—the nominal significance levels of the tests are close to their actual significance levels. If these conditions are not met, the significance levels of the approximate tests can differ substantially from their nominal values. The z test is based on the normal approximation ; the t test is based on Student's t curve, which approximates some probability histograms better than the normal curve does. The chapter also presents the deep connection between hypothesis tests and confidence intervals, and shows how to compute approximate confidence intervals for the population mean of nearly normal populations using Student's t -curve.
where \(\phi\) is the pooled sample percentage of the two samples. The estimate of \(SE(\phi^{t-c})\) under the null hypothesis is
\[ se = s^*\times(1/n_t + 1/n_c)^{1/2}, \]
where \(n_t\) and \(n_c\) are the sizes of the two samples. If the null hypothesis is true, the Z statistic,
\[ Z=\phi^{t-c}/se, \]
is the original test statistic \(\phi^{t-c}\) in approximately standard units , and Z has a probability histogram that is approximated well by the normal curve , which allowed us to select the rejection region for the approximate test.
This strategy—transforming a test statistic approximately to standard units under the assumption that the null hypothesisis true, and then using the normal approximation to determine the rejection region for the test—works to construct approximate hypothesis tests in many other situations, too. The resulting hypothesis test is called a z test. Suppose that we are testing a null hypothesis using a test statistic \(X\) , and the following conditions hold:
Then, under the null hypothesis, the probability histogram of the Z statistic
\[ Z = (X-E(X))/se \]
is approximated well by the normal curve, and we can use the normal approximation to select the rejection region for the test using \(Z\) as the test statistic. If the null hypothesis is true,
\[ P(Z < z_a) \approx a \]
\[ P(Z > z_{1-a} ) \approx a, \]
\[ P(|Z| > z_{1-a/2} ) \approx a. \]
These three approximations yield three different z tests of the hypothesis that \(\mu = \mu_0\) at approximate significance level \(a\) :
The word "tail" refers to the tails of the normal curve: In a left-tail test, the probability of a Type I error is approximately the area of the left tail of the normal curve, from minus infinity to \(z_a\) . In a right-tail test, the probability of a Type I error is approximately the area of the right tail of the normal curve, from \(z_{1-a}\) to infinity. In a two-tail test, the probability of a Type I error is approximately the sum of the areas of both tails of the normal curve, the left tail from minus infinity to \(z_{a/2}\) and the right tail from \(z_{1-a/2}\) to infinity. All three of these tests are called z tests. The observed value of Z is called the z score .
Which of these three tests, if any, should one use? The answer depends on the probability distribution of Z when the alternative hypothesis is true. As a rule of thumb, if, under the alternative hypothesis, \(E(Z) , use the left-tail test. If, under the alternative hypothesis, \(E(Z) > 0\) , use the right-tail test. If, under the alternative hypothesis, it is possible that \(E(Z) and it is possible that \(E(Z) > 0\) , use the two-tail test. If, under the alternative hypothesis, \(E(Z) = 0\) , consult a statistician. Generally (but not always), this rule of thumb selects the test with the most power for a given significance level.
Each of the three z tests gives us a family of procedures for testing the null hypothesis at any (approximate) significance level \(a\) between 0 and 100%—we just use the appropriate quantile of the normal curve. This makes it particularly easy to find the P value for a z test. Recall that the P value is the smallest significance level for which we would reject the null hypothesis, among a family of tests of the null hypothesis at different significance levels.
Suppose the z score (the observed value of \(Z\) ) is \(x\) . In a left-tail test, the P value is the area under the normal curve to the left of \(x\) : Had we chosen the significance level \(a\) so that \(z_a=x\) , we would have rejected the null hypothesis, but we would not have rejected it for any smaller value of \(a\) , because for all smaller values of \(a\) , \(z_a . Similarly, for a right-tail z test, the P value is the area under the normal curve to the right of \(x\) : If \(x=z_{1-a}\) we would reject the null hypothesis at approximate significance level \(a\) , but not at smaller significance levels. For a two-tail z test, the P value is the sum of the area under the normal curve to the left of \(-|x|\) and the area under the normal curve to the right of \(|x|\) .
Finding P values and specifying the rejection region for the z test involves the probability distribution of \(Z\) under the assumption that the null hypothesis is true. Rarely is the alternative hypothesis sufficiently detailed to specify the probability distribution of \(Z\) completely, but often the alternative does help us choose intelligently among left-tail, right-tail, and two-tail z tests. This is perhaps the most important issue in deciding which hypothesis to take as the null hypothesis and which as the alternative: We calculate the significance level under the null hypothesis, and that calculation must be tractable.
However, to construct a z test, we need to know the expected value and SE of the test statistic under the null hypothesis. Usually it is easy to determine the expected value, but often the SE must be estimated from the data. Later in this chapter we shall see what to do if the SE cannot be estimated accurately, but the shape of the distribution of the numbers in the population is known. The next section develops z tests for the population percentage and mean, and for the difference between two population means.
The central limit theorem assures us that the probability histogram of the sample mean of random draws with replacement from a box of tickets—transformed to standard units—can be approximated increasingly well by a normal curve as the number of draws increases. In the previous section, we learned that the probability histogram of a sum or difference of independent sample means of draws with replacement also can be approximated increasingly well by a normal curve as the two sample sizes increase. We shall use these facts to derive z tests for population means and percentages and differences of population means and percentages.
Suppose we have a population of \(N\) units of which \(G\) are labeled "1" and the rest are labeled "0." Let \(p = G/N\) be the population percentage. Consider testing the null hypothesis that \(p = p_0\) against the alternative hypothesis that \(p \ne p_0\) , using a random sample of \(n\) units drawn with replacement. (We could assume instead that \(N >> n\) and allow the draws to be without replacement.)
Under the null hypothesis, the sample percentage
\[ \phi = \frac{\mbox{# tickets labeled "1" in the sample}}{n} \]
has expected value \(E(\phi) = p_0\) and standard error
\[ SE(\phi) = \sqrt{\frac{p_0 \times (1 - p_0)}{n}}. \]
Let \(Z\) be \(\phi\) transformed to standard units :
\[ Z = (\phi - p_0)/SE(\phi). \]
Provided \(n\) is large and \(p_0\) is not too close to zero or 100% (say \(n \times p > 30\) and \(n \times (1-p) > 30)\) , the probability histogram of \(Z\) will be approximated reasonably well by the normal curve, and we can use it as the Z statistic in a z test. For example, if we reject the null hypothesis when \(|Z| > 1.96\) , the significance level of the test will be about 95%.
The approach in the previous subsection applies, mutatis mutandis , to testing the hypothesis that the population mean equals a given value, even when the population contains numbers other than just 0 and 1. However, in contrast to the hypothesis that the population percentage equals a given value, the null hypothesis that a more general population mean equals a given value does not specify the SD of the population, which poses difficulties that are surmountable (by approximation and estimation) if the sample size is large enough. (There are also nonparametric methods that can be used.)
Consider testing the null hypothesis that the population mean \(\mu\) is equal to a specific null value \(\mu_0\) , against the alternative hypothesis that \(\mu , on the basis of a random sample with replacement of size \(n\) . Recall that the sample mean \(M\) of \(n\) random draws with or without replacement from a box of numbered tickets is an unbiased estimator of the population mean \(\mu\) : If
\[ M = \frac{\mbox{sum of sample values}}{n}, \]
\[ E(M) = \mu = \frac{\mbox{sum of population values}}{N}, \]
where \(N\) is the size of the population. The population mean determines the expected value of the sample mean. The SE of the sample mean of a random sample with replacement is
\[ \frac{SD(\mbox{box})}{\sqrt{n}}, \]
where SD(box) is the SD of the list of all the numbers in the box, and \(n\) is the sample size. As a special case, the sample percentage \phi of \(n\) independent random draws from a 0-1 box is an unbiased estimator of the population percentage p , with SE equal to
\[ \sqrt{\frac{p\times(1-p)}{n}}. \]
In testing the null hypothesis that a population percentage \(p\) equals \(p_0\) , the null hypothesis specifies not only the expected value of the sample percentage \phi, it automatically specifies the SE of the sample percentage as well, because the SD of the values in a 0-1 box is determined by the population percentage \(p\) :
\[ SD(box) = \sqrt{p\times(1-p)}. \]
The null hypothesis thus gives us all the information we need to standardize the sample percentage under the null hypothesis. In contrast, the SD of the values in a box of tickets labeled with arbitrary numbers bears no particular relation to the mean of the values, so the null hypothesis that the population mean \(\mu\) of a box of tickets labeled with arbitrary numbers equals a specific value \(\mu_0\) determines the expected value of the sample mean, but not the standard error of the sample mean. To standardize the sample mean to construct a z test for the value of a population mean, we need to estimate the SE of the sample mean under the null hypothesis. When the sample size is large, the sample standard deviation s> is likely to be close to the SD of the population, and
\[ se=\frac{s}{\sqrt{n}} \]
is likely to be an accurate estimate of \(SE(M)\) . The central limit theorem tells us that when the sample size \(n\) is large, the probability histogram of the sample mean, converted to standard units, is approximated well by the normal curve. Under the null hypothesis,
\[ E(M) = \mu_0, \]
and thus when \(n\) is large
\[ Z = \frac{M-\mu_0}{s/\sqrt{n}} \]
has expected value zero, and its probability histogram is approximated well by the normal curve, so we can use \(Z\) as the Z statistic in a z test. If the alternative hypothesis is true, the expected value of \(Z\) could be either greater than zero or less than zero, so it is appropriate to use a two-tail z test. If the alternative hypothesis is \(\mu > \mu_0\) , then under the alternative hypothesis, the expected value of \(Z\) is greater than zero, and it is appropriate to use a right-tail z test. If the alternative hypothesis is \(\mu , then under the alternative hypothesis, the expected value of \(Z\) is less than zero, and it is appropriate to use a left-tail z test.
Consider the problem of testing the hypothesis that two population means are equal, using random samples from the two populations. Different sampling designs lead to different hypothesis testing procedures. In this section, we consider two kinds of random samples from the two populations: paired samples and independent samples , and construct z tests appropriate for each.
Consider a population of \(N\) individuals, each of whom is labeled with two numbers. For example, the \(N\) individuals might be a group of doctors, and the two numbers that label each doctor might be the annual payments to the doctor by an HMO under the terms of the current contract and under the terms of a proposed revision of the contract. Let the two numbers associated with individual \(i\) be \(c_i\) and \(t_i\) . (Think of \(c\) as control and \(t\) as treatment . In this example, control is the current contract, and treatment is the proposed contract.) Let \(\mu_c\) be the population mean of the \(N\) values
\[ \{c_1, c_2, \ldots, c_N \}, \]
and let \(\mu_t\) be the population mean of the \(N\) values
\[ \{t_1, t_2, \ldots, t_N\}. \]
Suppose we want to test the null hypothesis that
\[ \mu = \mu_t - \mu_c = \mu_0 \]
against the alternative hypothesis that \(\mu . With \(\mu_0=\$0\) , this null hypothesis is that the average annual payment to doctors under the proposed revision would be the same as the average payment under the current contract, and the alternative is that on average doctors would be paid less under the new contract than under the current contract. With \(\mu_0=-\$5,000\) , this null hypothesis is that the proposed contract would save the HMO an average of $5,000 per doctor, compared with the current contract; the alternative is that under the proposed contract, the HMO would save even more than that. With \(\mu_0=\$1,000\) , this null hypothesis is that doctors would be paid an average of $1,000 more per year under the new contract than under the old one; the alternative hypothesis is that on average doctors would be paid less than an additional $1,000 per year under the new contract—perhaps even less than they are paid under the current contract. For the remainder of this example, we shall take \(\mu_0=\$1,000\) .
The data on which we shall base the test are observations of both \(c_i\) and \(t_i\) for a sample of \(n\) individuals chosen at random with replacement from the population of \(N\) individuals (or a simple random sample of size \(n ): We select \(n\) doctors at random from the \(N\) doctors under contract to the HMO, record the current annual payments to them, and calculate what the payments to them would be under the terms of the new contract. This is called a paired sample , because the samples from the population of control values and from the population of treatment values come in pairs: one value for control and one for treatment for each individual in the sample. Testing the hypothesis that the difference between two population means is equal to \(\mu_0\) using a paired sample is just the problem of testing the hypothesis that the population mean \(\mu\) of the set of differences
\[ d_i = t_i - c_i, \;\; i= 1, 2, \ldots, N, \]
is equal to \(\mu_0\) . Denote the \(n\) (random) observed values of \(c_i\) and \(t_i\) by \(\{C_1, C_2, \ldots, C_n\}\) and \(\{T_1, T_2, \ldots, T_n \}\) , respectively. The sample mean \(M\) of the differences between the observed values of \(t_i\) and \(c_i\) is the difference of the two sample means:
\[ M = \frac{(T_1-C_1)+(T_2-C_2) + \cdots + (T_n-C_n)}{n} = \frac{T_1+T_2+ \cdots + T_n}{n} - \frac{C_1+C_2+ \cdots + C_n}{n} \]
\[ = (\mbox{sample mean of observed values of } t_i) - (\mbox{sample mean of observed values of } c_i). \]
\(M\) is an unbiased estimator of \(\mu\) , and if n is large, the normal approximation to its probability histogram will be accurate. The SE of \(M\) is the population standard deviation of the \(N\) values \(\{d_1, d_2, \ldots, d_N\}\) , which we shall denote \(SD_d\) , divided by the square root of the sample size, \(n^{1/2}\) . Let \(sd\) denote the sample standard deviation of the \(n\) observed differences \((T_i - C_i), \;\; i=1, 2, \ldots, n\) :
\[ sd = \sqrt{\frac{(T_1-C_1-M)^2 + (T_2-C_2-M)^2 + \cdots + (T_n-C_n-M)^2}{n-1}} \]
(recall that \(M\) is the sample mean of the observed differences). If the sample size \(n\) is large, sd is very likely to be close to SD( d ), and so, under the null hypothesis,
\[ Z = \frac{M-\mu_0}{sd/n^{1/2}} \]
has expected value zero, and when \(n\) is large the probability histogram of \(Z\) can be approximated well by the normal curve. Thus we can use \(Z\) as the Z statistic in a z test of the null hypothesis that \(\mu=\mu_0\) . Under the alternative hypothesis that \(\mu (doctors on the average are paid less than an additional $1,000 per year under the new contract), the expected value of \(Z\) is less than zero, so we should use a left-tail z test. Under the alternative hypothesis \(\mu\ne\mu_0\) (on average, the difference in average annual payments to doctors is not an increase of $1,000, but some other number instead), the expected value of \(Z\) could be positive or negative, so we would use a two-tail z test. Under the alternative hypothesis that \(\mu>\mu_0\) (on average, under the new contract, doctors are paid more than an additional $1,000 per year), the expected value of \(Z\) would be greater than zero, so we should use a right-tail z test.
Consider two separate populations of numbers, with population means \(\mu_t\) and \(\mu_c\) , respectively. Let \(\mu=\mu_t-\mu_c\) be the difference between the two population means. We would like to test the null hypothesis that \(\mu=\mu_0\) against the alternative hypothesis that \(\mu>0\) . For example, let \(\mu_t\) be the average annual payment by an HMO to doctors in the Los Angeles area, and let \(\mu_c\) be the average annual payment by the same HMO to doctors in the San Francisco area. Then the null hypothesis with \(\mu_0=0\) is that the HMO pays doctors in the two regions the same amount annually, on average; the alternative hypothesis is that the average annual payment by the HMO to doctors differs between the two areas. Suppose we draw a random sample of size \(n_t\) with replacement from the first population, and independently draw a random sample of size \(n_c\) with replacement from the second population. Let \(M_t\) and \(M_c\) be the sample means of the two samples, respectively, and let
\[ M = M_t - M_c \]
be the difference between the two sample means. Because the expected value of \(M_t\) is \(\mu_t\) and the expected value of \(M_c\) is \(\mu_c\) , the expected value of \(M\) is
\[ E(M) = E(M_t - M_c) = E(M_t) - E(M_c) = \mu_t - \mu_c = \mu. \]
Because the two random samples are independent , \(M_t\) and \(-M_c\) are independent random variables, and the SE of their sum is
\[ SE(M) = (SE^2(M_t) + SE^2(M_c))^{1/2}. \]
Let \(s_t\) and \(s_c\) be the sample standard deviations of the two samples, respectively. If \(n_t\) and \(n_c\) are both very large, the two sample standard deviations are likely to be close to the standard deviations of the corresponding populations, and so \(s_t/n_t^{1/2}\) is likely to be close to \(SE(M_t)\) , and \(s_c/n_c^{1/2}\) is likely to be close to \(SE(M_c)\) . Therefore, the pooled estimate of the standard error
\[ se_\mbox{diff} = ( (s_t/n_t^{1/2})^2 + (s_c/n_c^{1/2})^2)^{1/2} = \sqrt{ s_t^2/n_t + s_c^2/n_c} \]
is likely to be close to \(SE(M)\) . Under the null hypothesis, the statistic
\[ Z = \frac{M - \mu_0}{se_\mbox{diff}} = \frac{M_1 - M_2 - \mu_0}{\sqrt{ s_t^2/n_t + s_c^2/n_c}} \]
has expected value zero and its probability histogram is approximated well by the normal curve, so we can use it as the Z statistic in a z test.
Under the alternative hypothesis
\[ \mu = \mu_t - \mu_c > \mu_0, \]
the expected value of \(Z\) is greater than zero, so it is appropriate to use a right-tail z test.
If the alternative hypothesis were \(\mu \ne \mu_0\) , under the alternative the expected value of \(Z\) could be greater than zero or less than zero, so it would be appropriate to use a two-tail z test. If the alternative hypothesis were \(\mu , under the alternative the expected value of \(Z\) would be less than zero, so it would be appropriate to use a left-tail z test.
The following exercises check that you can compute the z test for a population mean or a difference of population means. The exercises are dynamic: the data will tend to change when you reload the page.
For the nominal significance level of the z test for a population mean to be approximately correct, the sample size typically must be large. When the sample size is small, two factors limit the accuracy of the z test: the normal approximation to the probability distribution of the sample mean can be poor, and the sample standard deviation can be an inaccurate estimate of the population standard deviation, so se is not an accurate estimate of the SE of the test statistic Z . For nearly normal populations , defined in the next subsection, the probability distribution of the sample mean is nearly normal even when the sample size is small, and the uncertainty of the sample standard deviation as an estimate of the population standard deviation can be accounted for by using a curve that is broader than the normal curve to approximate the probability distribution of the (approximately) standardized test statistic. The broader curve is Student's t curve . Student's t curve depends on the sample size: The smaller the sample size, the more spread out the curve.
A list of numbers is nearly normally distributed if the fraction of values in any range is close to the area under the normal curve for the corresponding range of standard units—that is, if the list has mean \(\mu\) and standard deviation SD, and for every pair of values \(a < b\) ,
\[ \mbox{ the fraction of numbers in the list between } a \mbox{ and } b \approx \mbox{the area under the normal curve between } (a - \mu)/SD \mbox{ and } (b - \mu)/SD. \]
A list is nearly normally distributed if the normal curve is a good approximation to the histogram of the list transformed to standard units. The histogram of a list that is approximately normally distributed is (nearly) symmetric about some point, and is (nearly) bell-shaped.
No finite population can be exactly normally distributed, because the area under the normal curve between every two distinct values is strictly positive—no matter how large or small the values nor how close together they are. No population that contains only a finite number of distinct values can be exactly normally distributed, for the same reason. In particular, populations that contain only zeros and ones are not approximately normally distributed, so results for the sample mean of samples drawn from nearly normally distributed populations need not apply to the sample percentage of samples drawn from 0-1 boxes. Such results will be more accurate for the sample percentage when the population percentage is close to 50% than when the population percentage is close to 0% or 100%, because then the histogram of population values is more nearly symmetric.
Suppose a population is nearly normally distributed. Then a histogram of the population is approximately symmetric about the mean of the population. The fraction of numbers in the population within ±1 SD of the mean of the population is about 68%, the fraction of numbers within ±2 SD of the mean of the population is about 95%, and the fraction of numbers in the population within ±3 SD of the mean of the population is about 99.7%.
The following exercises check that you understand what it means for a list to be nearly normally distributed. The exercises are dynamic: the data tend to change when you reload the page.
Student's t curve is similar to the normal curve, but broader. It is positive, has a single maximum, and is symmetric about zero. The total area under Student's t curve is 100%. Student's t curve approximates some probability histograms more accurately than the normal curve does. There are actually infinitely many Student t curves, one for each positive integer value of the degrees of freedom. As the degrees of freedom increases, the difference between Student's t curve and the normal curve decreases.
Consider a population of \(N\) units labeled with numbers. Let \(\mu\) denote the population mean of the \(N\) numbers, and let SD denote the population standard deviation of the \(N\) numbers. Let \(M\) denote the sample mean of a random sample of size \(n\) drawn with replacement from a population, and let s> denote the sample standard deviation of the sample. The expected value of \(M\) is \(\mu\) , and the SE of \(M\) is \(SD/n^{1/2}\) . Let
\[ Z = (M - \mu)/(SD/n^{1/2}). \]
Then the expected value of \(Z\) is zero, the SE of \(Z\) is 1, and if \(n\) is large enough, the normal curve is a good approximation to the probability histogram of \(Z\) . The closer to normal the distribution of values in the population is, the smaller \(n\) needs to be for the normal curve to be a good approximation to the distribution of \(Z\) . Consider the statistic
\[ T = \frac{M - \mu}{s/n^{1/2}}, \]
which replaces SD by its estimated value (the sample standard deviation \(s\) ). If \(n\) is large enough, \(s\) is very likely to be close to SD, so \(T\) will be close to \(Z\) ; the normal curve will be a good approximation to the probability histogram of \(T\) ; and we can use \(T\) as the Z statistic in a z test of hypotheses about \(\mu\) .
For many populations, when the sample size is small—say less than 25, but the accuracy depends on the population—the normal curve is not a good approximation to the probability histogram of \(T\) . For nearly normally distributed populations, when the sample size is intermediate—say 25–100, but again this depends on the population—the normal curve is a good approximation to the probability histogram of \(Z\) , but not to the probability histogram of \(T\) , because of the variability of the sample standard deviation s> from sample to sample, which tends to broaden the probability distribution of \(T\) (i.e., to make \(SE(T)>1\) ).
When you first load this page, the degrees of freedom will be set to 25, and the region from -1.96 to 1.96 will be hilighted. The area under the normal curve between ±1.96 is 95%, but for Student's t curve with 25 degrees of freedom, the area is about 93.9%: Student's t curve with d.f.=25 is broader than the normal curve. Increase the degrees of freedom to 200; you will see that the Student t curve gets slightly narrower, and the area under the curve between ±1.96 is about 94.9%.
We define quantiles of Student t curves in the same way we defined quantiles of the normal curve: For any number a between 0 and 100%, the a quantile of Student's t curve with \(d.f.=d\) , \(t_{d,a}\) , is the unique value such that the area under the Student t curve with d degrees of freedom from minus infinity to \(t_{d,a}\) is equal to \(a\) . For example, \(t_{d,0.5} = 0\) for all values of \(d\) . Generally, the value of \(t_{d,a}\) depends on the degrees of freedom \(d\) . The probability calculator allows you to find quantiles of Student's t curve.
We can use Student's t curve to construct approximate tests of hypotheses about the population mean \(\mu\) when the population standard deviation is unknown, for intermediate values of the sample size \(n\) . The approach is directly analogous to the z test, but instead of using a quantile of the normal curve, we use the corresponding quantile of Student's t curve (with the appropriate number of degrees of freedom). However, for the test to be accurate when \(n\) is small or intermediate, the distribution of values in the population must be nearly normal for the test to have approximately its nominal level. This is a somewhat bizarre restriction: It may require a very large sample to detect that the population is not nearly normal—but if the sample is very large, we can use the z test instead of the t test, so we don't need to rely as much on the assumption. It is my opinion that the t test is over-taught and overused—because its assumptions are not verifiable in the situations where it is potentially useful.
Consider testing the null hypothesis that \(\mu=\mu_0\) using the sample mean \(M\) and sample standard deviation s> of a random sample of size \(n\) drawn with replacement from a population that is known to have a nearly normal distribution. Define
\[ T = \frac{M - \mu_0}{s/n^{1/2}}. \]
Under the null hypothesis, if \(n\) is not too small, Student's t curve with \(n-1\) degrees of freedom will be an accurate approximation to the probability histogram of \(T\) , so
\[ P(T < t_{n-1,a}), \]
\[ P(T > t_{n-1,1-a}), \]
\[ P(|T| > t_{n-1,1-a/2}) \]
all are approximately equal to \(a\) . As we saw earlier in this chapter for the Z statistic, these three approximations give three tests of the null hypothesis \(\mu=\mu_0\) at approximate significance level \(a\) —a left-tail t test, a right-tail t test, and a two-tail t test:
To decide which t test to use, we can apply the same rule of thumb we used for the z test:
P-values for t tests are computed in much the same way as P-values for z tests. Let t be the observed value of \(T\) (the t score). In a left-tail t test, the P-value is the area under Student's t curve with \(n-1\) degrees of freedom, from minus infinity to \(t\) . In a right-tail t test, the P-value is the area under Student's t curve with \(n-1\) degrees of freedom, from \(t\) to infinity. In a two-tail t test, the P-value is the total area under Student's t curve with \(n-1\) degrees of freedom between minus infinity and \(-|t|\) and between \(|t|\) and infinity.
There are versions of the t test for comparing two means, as well. Just like for the z test, the method depends on how the samples from the two populations are drawn. For example, if the two samples are paired (if we are sampling individuals labeled with two numbers and for each individual in the sample, we observe both numbers), we may base the t test on the sample mean of the paired differences and the sample standard deviation of the paired differences. Let \(\mu_1\) and \(\mu_2\) be the means of the two populations, and let
\[ \mu = \mu_1 - \mu_2. \]
The \(T\) statistic to test the null hypothesis that \(\mu=\mu_0\) is
\[ T = \frac{(\mbox{sample mean of differences}) - \mu_0 }{(\mbox{sample standard deviation of differences})/n^{1/2}}, \]
and the appropriate curve to use to find the rejection region for the test is Student's t curve with \(n-1\) degrees of freedom, where \(n\) is the number of individuals (differences) in the sample.
Two-sample t tests for a difference of means using independent samples depend on additional assumptions, such as equality of the two population standard deviations; we shall not present such tests here. The following exercises check your ability to compute t tests. The exercises are dynamic: the data tend to change when you reload the page.
There is a deep connection between hypothesis tests about parameters, and confidence intervals for parameters. If we have a procedure for constructing a level \(100\% \times (1-a)\) confidence interval for a parameter \(\mu\) , then the following rule is a two-sided significance level \(a\) test of the null hypothesis that \(\mu = \mu_0\) :
reject the null hypothesis if the confidence interval does not contain \(\mu_0\).
Similarly, suppose we have an hypothesis-testing procedure that lets us test the null hypothesis that \(\mu=\mu_0\) for any value of \(\mu_0\) , at significance level \(a\) . Define
\(A\) = (all values of \(\mu_0\) for which we would not reject the null hypothesis that \(\mu = \mu_0\)).
Then \(A\) is a \(100\% \times (1-a)\) confidence set for \(\mu\) :
\[ P( A \mbox{ contains the true value of } \mu ) = 100\% \times (1-a). \]
(A confidence set is a generalization of the idea of a confidence interval: a \(1-a\) confidence set for the parameter \(\mu\) is a random set that has probability \(1-a\) of containing \(\mu\) . As is the case with confidence intervals, the probability makes sense only before collecting the data.) The set \(A\) might or might not be an interval, depending on the nature of the test. If one starts with a two-tail z test or two-tail t test, one ends up with a confidence interval rather than a more general confidence set.
The t test lets us test the hypothesis that the population mean \(\mu\) is equal to \(\mu_0\) at approximate significance level a using a random sample with replacement of size n from a population with a nearly normal distribution. If the sample size n is small, the actual significance level is likely to differ considerably from the nominal significance level. Consider a two-sided t test of the hypothesis \(\mu=\mu_0\) at significance level \(a\) . If the sample mean is \(M\) and the sample standard deviation is \(s\) , we would not reject the null hypothesis at significance level \(a\) if
\[ \frac{|M-\mu_0|}{s/n^{1/2}} \le t_{n-1,1-a/2}. \]
We rearrange this inequality:
\[ -t_{n-1,1-a/2} \le \frac{M-\mu_0}{s/n^{1/2}} \le t_{n-1,1-a/2} \]
\[ -t_{n-1,1-a/2} \times s/n^{1/2} \le M - \mu_0 \le t_{n-1,1-a/2} \times s/n^{1/2} \]
\[ -M - t_{n-1,1-a/2} \times s/n^{1/2} \le - \mu_0 \le -M + t_{n-1,1-a/2} \times s/n^{1/2} \]
\[ M + t_{n-1,1-a/2} \times s/n^{1/2} \le \mu_0 \le M - t_{n-1,1-a/2} \times s/n^{1/2} \]
That is, we would not reject the hypothesis \(\mu = \mu_0\) provided \(\mu_0\) is in the interval
\[ [M - t_{n-1,1-a/2} \times s/n^{1/2}, M + t_{n-1,1-a/2} \times s/n^{1/2}]. \]
Therefore, that interval is a \(100\%-a\) confidence interval for \(\mu\) :
\[ P([M - t_{n-1,1-a/2} \times s/n^{1/2}, M + t_{n-1,1-a/2} \times s/n^{1/2}] \mbox{ contains } \mu) \approx 1-a. \]
The following exercise checks that you can use Student's t curve to construct a confidence interval for a population mean. The exercise is dynamic: the data tend to change when you reload the page.
In hypothesis testing, a Z statistic is a random variable whose probability histogram is approximated well by the normal curve if the null hypothesis is correct: If the null hypothesis is true, the expected value of a Z statistic is zero, the SE of a Z statistic is approximately 1, and the probability that a Z statistic is between \(a\) and \(b\) is approximately the area under the normal curve between \(a\) and \(b\) . Suppose that the random variable \(Z\) is a Z statistic. If, under the alternative hypothesis, \(E(Z) , the appropriate z test to test the null hypothesis at approximate significance level \(a\) is the left-tailed z test: Reject the null hypothesis if \(Z , where \(z_a\) is the \(a\) quantile of the normal curve. If, under the alternative hypothesis, \(E(Z)>0\) , the appropriate z test to test the null hypothesis at approximate significance level \(a\) is the right-tailed z test: Reject the null hypothesis if \(Z>z_{1-a}\) . If, under the alternative hypothesis, \(E(Z)\ne 0 \) but could be greater than 0 or less than 0, the appropriate z test to test the null hypothesis at approximate significance level \(a\) is the two-tailed z test: reject the null hypothesis if \(|Z|>z_{1-a/2}\) . If, under the alternative hypothesis, \(E(Z)=0\) , a z test probably is not appropriate—consult a statistician. The exact significance levels of these tests differ from \(a\) by an amount that depends on how closely the normal curve approximates the probability histogram of \(Z\) .
Z statistics often are constructed from other statistics by transforming approximately to standard units, which requires knowing the expected value and SE of the original statistic on the assumption that the null hypothesis is true. Let \(X\) be a test statistic; let \(E(X)\) be the expected value of \(X\) if the null hypothesis is true, and let \(se\) be approximately equal to the SE of \(X\) if the null hypothesis is true. If \(X\) is a sample sum of a large random sample with replacement, a sample mean of a large random sample with replacement, or a sum or difference of independent sample means of large samples with replacement,
\[ Z = \frac{X-E(X)}{se} \]
is a Z statistic.
Consider testing the null hypothesis that a population percentage \(p\) is equal to the value \(p_0\) on the basis of the sample percentage \phi of a random sample of size \(n\) with replacement. Under the null hypothesis, \(E(\phi)=p_0\) and
\[ SE(\phi) = \sqrt{\frac{p_0\times(1-p_0)}{n}}, \]
and if \(n\) is sufficiently large (say \(n \times p > 30\) and \(n \times (1-p)>30\) , but this depends on the desired accuracy), the normal approximation to
\[ Z = \frac{\phi-p_0}{\sqrt{(p_0 \times (1-p_0))/n}} \]
will be reasonably accurate, so \(Z\) can be used as the Z statistic in a z test of the null hypothesis \(p=p_0\) .
Consider testing the null hypothesis that a population mean \(\mu\) is equal to the value \(\mu_0\) , on the basis of the sample mean \(M\) of a random sample of size \(n\) with replacement. Let \(s\) denote the sample standard deviation. Under the null hypothesis, \(E(M)=\mu_0\) , and if \(n\) is large,
\[ SE(M)=SD/n^{1/2} \approx s/n^{1/2}, \]
and the normal approximation to
\[ Z = \frac{M-\mu_0}{s/n^{1/2}} \]
will be reasonably accurate, so \(Z\) can be used as the Z statistic in a z test of the null hypothesis \(\mu=\mu_0\) .
Consider a population of \(N\) individuals, each labeled with two numbers. The \(i\) th individual is labeled with the numbers \(c_i\) and \(t_i\) , \(i=1, 2, \ldots, N\) . Let \(\mu_c\) be the population mean of the \(N\) values \(\{c_1, \ldots, c_N\}\) and let \(\mu_t\) be the population mean of the \(N\) values \(\{t_1, \ldots, t_N \}\) . Let \(\mu=\mu_t-\mu_c\) be the difference between the two population means. Consider testing the null hypothesis that \(\mu=\mu_0\) on the basis of a paired random sample of size \(n\) with replacement from the population: that is, a random sample of size \(n\) is drawn with replacement from the population, and for each individual \(i\) in the sample, \(c_i\) and \(t_i\) are observed. This is equivalent to testing the hypothesis that the population mean of the \(N\) values \(\{(t_1-c_1), \ldots, (t_N-c_N)\}\) is equal to \(\mu_0\) , on the basis of the random sample of size \(n\) drawn with replacement from those \(N\) values. Let \(M_t\) be the sample mean of the \(n\) observed values of \(t_i\) and let \(M_c\) be the sample mean of the \(n\) observed values of \(c_i\) . Let \(sd\) denote the sample standard deviation of the \(n\) observed differences \(\{(t_i-c_i)\}\) . Under the null hypothesis, the expected value of \(M_t-M_c\) is \(\mu_0\) , and if \(n\) is large,
\[ SE(M_t-M_c) \approx sd/n^{1/2}, \]
and the normal approximation to the probability histogram of
\[ Z = \frac{M_t-M_c-\mu_0}{sd/n^{1/2}} \]
will be reasonably accurate, so \(Z\) can be used as the Z statistic in a z test of the null hypothesis that \(\mu_t-\mu_c=\mu_0\) .
Consider testing the hypothesis that the difference ( \(\mu_t-\mu_c\) ) between two population means, \(\mu_c\) and \(\mu_t\) , is equal to \(\mu_0\) , on the basis of the difference ( \(M_t-M_c\) ) between the sample mean \(M_c\) of a random sample of size \(n_c\) with replacement from the first population and the sample mean \(M_t\) of an independent random sample of size \(n_t\) with replacement from the second population. Let \(s_c\) denote the sample standard deviation of the sample of size \(n_c\) from the first population and let \(s_t\) denote the sample standard deviation of the sample of size \(n_t\) from the second population. If the null hypothesis is true,
\[ E(M_t-M_c)=\mu_0, \]
and if \(n_c\) and \(n_t\) are both large,
\[ SE(M_t-M_c) \approx \sqrt{s_t^2/n_t + s_c^2/n_c} \]
\[ Z = \frac{M_t-M_c-\mu_0}{\sqrt{s_t^2/n_t + s_c^2/n_c}} \]
A list of numbers is nearly normally distributed if the fraction of numbers between any pair of values, \(a , is approximately equal to the area under the normal curve between \((a-\mu)/SD\) and \((b-\mu)/SD\) , where \(\mu\) is the mean of the list and SD is the standard deviation of the list.
Student's t curve with \(d\) degrees of freedom is symmetric about 0, has a single bump centered at 0, and is broader and flatter than the normal curve. The total area under Student's t curve is 1, no matter what \(d\) is; as \(d\) increases, Student's t curve gets narrower, its peak gets higher, and it becomes closer and closer to the normal curve.
Let \(M\) be the sample mean of a random sample of size \(n\) with replacement from a population with mean \(\mu\) and a nearly normal distribution, and let \(s\) be the sample standard deviation of the random sample. For moderate values of \(n\) ( \(n or so), Student's t curve approximates the probability histogram of \((M-\mu)/(s/n^{1/2})\) better than the normal curve does, which can lead to an approximate hypothesis test about \(\mu\) that is more accurate than the z test.
Consider testing the null hypothesis that the mean \(\mu\) of a population with a nearly normal distribution is equal to \(\mu_0\) from a random sample of size \(n\) with replacement. Let
\[ T=\frac{M-\mu_0}{s/n^{1/2}}, \]
where \(M\) is the sample mean and \(s\) is the sample standard deviation. The tests that reject the null hypothesis if \(T (left-tail t test), if \(T>t_{n-1,1-a}\) (right-tail t test), or if \(|T|>t_{n-1,1-a/2}\) (two-tail t test) all have approximate significance level \(a\) . How close the nominal significance level \(a\) is to the true significance level depends on the distribution of the numbers in the population, the sample size \(n\) , and \(a\) . The same rule of thumb for selecting whether to use a left, right, or two-tailed z test (or not to use a z test at all) works to select whether to use a left, right, or two-tailed t test: If, under the alternative hypothesis, \(E(T) , use a left-tail test. If, under the alternative hypothesis, \(E(T) > 0 \) , use a right-tail test. If, under the alternative hypothesis, \(E(T)\) could be less than zero or greater than zero, use a two-tail test. If, under the alternative hypothesis, \(E(T) = 0 \) , consult an expert. Because the t test differs from the z test only when the sample size is small, and from a small sample it is not possible to tell whether the population has a nearly normal distribution, the t test should be used with caution.
A \(1-a\) confidence set for a parameter \(\mu\) is like a \(1-a\) confidence interval for a parameter \(\mu\) : It is a random set of values that has probability \(1-a\) of containing the true value of \(\mu\) . The difference is that the set need not be an interval.
There is a deep duality between hypothesis tests about a parameter \(\mu\) and confidence sets for \(\mu\) . Given a procedure for constructing a \(1-a\) confidence set for \(\mu\) , the rule reject the null hypothesis that \(\mu=\mu_0\) if the confidence set does not contain \(\mu\) is a significance level \(a\) test of the null hypothesis that \(\mu=\mu_0\) . Conversely, given a family of significance level \(a\) hypothesis tests that allow one to test the hypothesis that \(\mu=\mu_0\) for any value of \(\mu_0\) , the set of all values \(\mu_0\) for which the test does not reject the null hypothesis that \(\mu=\mu_0\) is a \(1-a\) confidence set for \(\mu\) .
IMAGES
VIDEO
COMMENTS
Learn how to use Z tests to compare group means and test hypotheses about population parameters. Find out when to use Z tests vs T tests, the assumptions, hypotheses, and formulas for one- and two-sample analyses.
Z-test is a statistical test to compare the mean of a sample with a known population mean when the sample size is large and the population standard deviation is known. Learn how to perform Z-test, its formula, types, and examples with hypothesis testing steps.
Learn how to perform a z test on one sample, two samples, or proportions using the z test formula and examples. Find out the z test statistic, the z critical value, and the decision criteria for different types of z tests.
Perform a one-sample Z-test on the population's mean using this online tool. Learn about Z-test formula, p-value, critical values, and examples.
Learn how to perform a z test to compare two proportions and test if they are the same. See the formula, steps, example, and video tutorial.
A Z-test is a statistical test for which the test statistic follows a normal distribution under the null hypothesis. Learn how to use the Z-test formula, when to apply it and how it differs from the t-test, with examples and conditions.
Learn how to conduct a Z-test to determine if the difference between the mean of a sample and a population is statistically significant. Follow the four steps of stating the null and alternate hypotheses, choosing the critical value, and calculating the Z-test statistic.
Learn how to perform a one sample z-test to test whether the mean of a population is equal to some hypothesized value. This test assumes that the population standard deviation is known and uses the z test statistic to calculate the p-value.
Equation 1. Processing alpha for a two-tailed test. Since we have calculated the alpha value for a two-tailed test, then we can determine the critical values, that is, those values that determine the rejection zone in the standard normal distribution.. To find the critical values, we look at z-table the value of z that approximates an area under the curve similar to 0.0250.
Intro :D. Hey, I'm Kay! This guide provides an introduction to the fundamental concepts of and relationships between hypothesis testing, effect size, and power analysis, using the one-sample z-test as a prime example. While the primary goal is to elucidate the idea behind hypothesis testing, this guide does try to carefully derive the math ...
Learn how to conduct a z-test to compare a sample mean with a population mean when you know the population standard deviation. See how to use the z-table, p-value, and R to make decisions and report results for one-tailed and two-tailed tests.
Learn how to conduct a Z-test, a statistical hypothesis test for the mean of a normal distribution. Find out the formula, the significance level, the critical value, and the p-value for a Z-test.
A z-score is a statistical measure that describes the position of a raw score in terms of its distance from the mean, measured in standard deviation units. Learn how to calculate, interpret, and use z-scores for probability estimation, hypothesis testing, comparing datasets, and identifying outliers.
Learn how to conduct hypothesis tests using a five step procedure: check assumptions, write hypotheses, calculate test statistic, determine p-value, and make a decision. See examples of testing mean, correlation, and proportion with StatKey software.
Z-tests are statistical hypothesis testing techniques that are used to determine whether the null hypothesis relating to comparing sample means or proportions with that of population at a given significance level can be rejected or otherwise based on the z-statistics or z-score. As a data scientist, you must get a good understanding of the z ...
Z-test is a statistical tool for hypothesis testing when the standard deviation is known and the sample size is large. Learn how to use z-test formula, calculate z-score, and interpret z-value with examples and FAQs.
6 Hypothesis Testing: the z-test | Introduction to Statistics and ...
Learn the basic logic and process of hypothesis testing, using z tests and examples from psychology. Understand the difference between null and alternative hypotheses, probability values, significance levels, and types of errors.
Learn how to perform a z-test for a single mean using a null hypothesis of 7.47 pounds for birthweight. Find the critical value, test statistic, and p-value for a one-tailed test with α = 0.05.
Learn how to use z tests to compare one score or sample with a population mean or another sample mean. See examples of z tests for height, SAT, GRE, and more.
Z-Test is a statistical test that compares the mean of two large samples from a normal population with known variance. Learn the conditions, steps, and types of Z-Test, and how to use the Z table and critical value to calculate the P-value and reject or accept the null hypothesis.
Learn how to use the z test and the t test to test hypotheses about population means based on normal or Student's t approximations. Find out the conditions, formulas, and examples of these tests and how to compute P values.
Learn how to perform a two sample z-test to compare two population means with known standard deviations. See the formula, assumptions, and an example with p-value calculation and conclusion.