
How Hypothesis Tests Work: Significance Levels (Alpha) and P values

By Jim Frost

Hypothesis testing is a vital process in inferential statistics where the goal is to use sample data to draw conclusions about an entire population. In the testing process, you use significance levels and p-values to determine whether the test results are statistically significant.

You hear about results being statistically significant all of the time. But, what do significance levels, P values, and statistical significance actually represent? Why do we even need to use hypothesis tests in statistics?

In this post, I answer all of these questions. I use graphs and concepts to explain how hypothesis tests function in order to provide a more intuitive explanation. This helps you move on to understanding your statistical results.

Hypothesis Test Example Scenario

To start, I’ll demonstrate why we need to use hypothesis tests using an example.

A researcher is studying fuel expenditures for families and wants to determine if the monthly cost has changed since last year when the average was $260 per month. The researcher draws a random sample of 25 families and enters their monthly costs for this year into statistical software. You can download the CSV data file: FuelsCosts. Below are the descriptive statistics for this year.

Table of descriptive statistics for our fuel cost example.

We’ll build on this example to answer the research question and show how hypothesis tests work.

Descriptive Statistics Alone Won’t Answer the Question

The researcher collected a random sample and found that this year’s sample mean (330.6) is greater than last year’s mean (260). Why perform a hypothesis test at all? We can see that this year’s mean is higher by $70! Isn’t that different?

Regrettably, the situation isn’t as clear as you might think because we’re analyzing a sample instead of the full population. There are huge benefits when working with samples because it is usually impossible to collect data from an entire population. However, the tradeoff for working with a manageable sample is that we need to account for sampling error.

The sampling error is the gap between the sample statistic and the population parameter. For our example, the sample statistic is the sample mean, which is 330.6. The population parameter is μ, or mu, which is the average of the entire population. Unfortunately, the value of the population parameter is not only unknown but usually unknowable. Learn more about Sampling Error.

We obtained a sample mean of 330.6. However, it’s conceivable that, due to sampling error, the population mean might still be 260. If the researcher drew another random sample, the next sample mean might be closer to 260. It’s impossible to assess this possibility by looking at only the sample mean. Hypothesis testing is a form of inferential statistics that allows us to draw conclusions about an entire population based on a representative sample. We need to use a hypothesis test to determine the likelihood of obtaining our sample mean if the population mean is 260.

Background information: The Difference between Descriptive and Inferential Statistics and Populations, Parameters, and Samples in Inferential Statistics

A Sampling Distribution Determines Whether Our Sample Mean is Unlikely

Because of sampling error, it is very unlikely for any sample mean to equal the population mean exactly. In our case, the sample mean of 330.6 almost certainly does not equal the population mean for fuel expenditures.

If we could obtain a substantial number of random samples and calculate the sample mean for each sample, we’d observe a broad spectrum of sample means. We’d even be able to graph the distribution of sample means from this process.

This type of distribution is called a sampling distribution. You obtain a sampling distribution by drawing many random samples of the same size from the same population. Why the heck would we do this?

Because sampling distributions allow you to determine the likelihood of obtaining your sample statistic and they’re crucial for performing hypothesis tests.

Luckily, we don’t need to go to the trouble of collecting numerous random samples! We can estimate the sampling distribution using the t-distribution, our sample size, and the variability in our sample.

We want to find out if the average fuel expenditure this year (330.6) is different from last year (260). To answer this question, we’ll graph the sampling distribution based on the assumption that the mean fuel cost for the entire population has not changed and is still 260. In statistics, we call this lack of effect, or no change, the null hypothesis. We use the null hypothesis value as the basis of comparison for our observed sample value.
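To make this concrete, here’s a minimal Python sketch of that estimate. The standard error of the mean (about 30.8) is an assumed value that I back-calculated from the results reported later in this post; with the actual FuelsCosts data you would compute it as the sample standard deviation divided by √25.

```python
from scipy import stats

null_mean = 260   # last year's mean (the null hypothesis value)
n = 25            # sample size
df = n - 1        # degrees of freedom for a 1-sample t-test
se_mean = 30.8    # assumed standard error of the mean (s / sqrt(n));
                  # back-calculated from this post's results, not the raw data

# Estimated sampling distribution: a t-distribution with 24 DF,
# centered on the null value and scaled into the data's units.
sampling_dist = stats.t(df, loc=null_mean, scale=se_mean)

# Under the null, sample means between these bounds would be unremarkable:
print(sampling_dist.ppf([0.01, 0.99]))  # roughly [183, 337]
```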

Sampling distributions and t-distributions are types of probability distributions.

Related posts: Sampling Distributions and Understanding Probability Distributions

Graphing our Sample Mean in the Context of the Sampling Distribution

The graph below shows which sample means are more likely and less likely if the population mean is 260. We can place our sample mean in this distribution. This larger context helps us see how unlikely our sample mean is if the null hypothesis is true (μ = 260).

Sampling distribution of means for our fuel cost data.

The graph displays the estimated distribution of sample means. The most likely values are near 260 because the plot assumes that this is the true population mean. However, given random sampling error, it would not be surprising to observe sample means ranging from 167 to 352. If the population mean is still 260, our observed sample mean (330.6) isn’t the most likely value, but it’s not completely implausible either.

The Role of Hypothesis Tests

The sampling distribution shows us that we are relatively unlikely to obtain a sample of 330.6 if the population mean is 260. Is our sample mean so unlikely that we can reject the notion that the population mean is 260?

In statistics, we call this rejecting the null hypothesis. If we reject the null for our example, the difference between the sample mean (330.6) and 260 is statistically significant. In other words, the sample data favor the hypothesis that the population average does not equal 260.

However, look at the sampling distribution chart again. Notice that there is no special location on the curve where you can definitively draw this conclusion. There is only a consistent decrease in the likelihood of observing sample means that are farther from the null hypothesis value. At what point do we decide that a sample mean is far enough away?

To answer this question, we’ll need more tools—hypothesis tests! The hypothesis testing procedure quantifies the unusualness of our sample with a probability and then compares it to an evidentiary standard. This process allows you to make an objective decision about the strength of the evidence.

We’re going to add the tools we need to make this decision to the graph—significance levels and p-values!

These tools allow us to test these two hypotheses:

  • Null hypothesis: The population mean equals the null hypothesis mean (260).
  • Alternative hypothesis: The population mean does not equal the null hypothesis mean (260).

Related post: Hypothesis Testing Overview

What are Significance Levels (Alpha)?

A significance level, also known as alpha or α, is an evidentiary standard that a researcher sets before the study. It defines how strongly the sample evidence must contradict the null hypothesis before you can reject the null hypothesis for the entire population. The strength of the evidence is defined by the probability of rejecting a null hypothesis that is true. In other words, it is the probability that you say there is an effect when there is no effect.

For instance, a significance level of 0.05 signifies a 5% risk of deciding that an effect exists when it does not exist.

Lower significance levels require stronger sample evidence to be able to reject the null hypothesis. For example, to be statistically significant at the 0.01 significance level requires more substantial evidence than the 0.05 significance level. However, there is a tradeoff in hypothesis tests. Lower significance levels also reduce the power of a hypothesis test to detect a difference that does exist.
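To see this tradeoff concretely, here’s an illustrative simulation sketch. Every value in it is assumed for demonstration (a true population mean of 330 and a standard deviation of 154, loosely modeled on this post’s fuel cost example); it is not part of the original analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, trials = 25, 10_000
rejections = {0.05: 0, 0.01: 0}

# Assumed scenario: the null (mean = 260) is false because the true
# population mean is 330 (assumed SD = 154).
for _ in range(trials):
    sample = rng.normal(loc=330, scale=154, size=n)
    result = stats.ttest_1samp(sample, popmean=260)
    for alpha in rejections:
        if result.pvalue <= alpha:
            rejections[alpha] += 1

for alpha, count in rejections.items():
    # The 0.01 test detects the real effect noticeably less often,
    # which is the power cost of the stricter evidentiary standard.
    print(alpha, count / trials)
```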

The technical nature of these types of questions can make your head spin. A picture can bring these ideas to life!

To learn a more conceptual approach to significance levels, see my post about Understanding Significance Levels.

Graphing Significance Levels as Critical Regions

On the probability distribution plot, the significance level defines how far the sample value must be from the null value before we can reject the null. The percentage of the area under the curve that is shaded equals the probability that the sample value will fall in those regions if the null hypothesis is correct.

To represent a significance level of 0.05, I’ll shade 5% of the distribution furthest from the null value.

Graph that displays a two-tailed critical region for a significance level of 0.05.

The two shaded regions in the graph are equidistant from the central value of the null hypothesis. Each region has a probability of 0.025, which sums to our desired total of 0.05. These shaded areas are called the critical region for a two-tailed hypothesis test.

The critical region defines sample values that are improbable enough to warrant rejecting the null hypothesis. If the null hypothesis is correct and the population mean is 260, random samples (n=25) from this population have means that fall in the critical region 5% of the time.

Our sample mean is statistically significant at the 0.05 level because it falls in the critical region.
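If you want to verify these boundaries yourself, here’s a short sketch using the same assumed standard error of 30.8 from earlier; the bounds it prints are approximations rather than output from the original analysis.

```python
from scipy import stats

null_mean, df, se_mean = 260, 24, 30.8   # se_mean is an assumed value
alpha = 0.05

# Two-tailed test: split alpha between the tails, then convert the
# critical t-value back into the data's units (null mean ± t * SE).
t_crit = stats.t.ppf(1 - alpha / 2, df)              # about 2.064
lower = null_mean - t_crit * se_mean                 # about 196
upper = null_mean + t_crit * se_mean                 # about 324

sample_mean = 330.6
print(sample_mean < lower or sample_mean > upper)    # True: critical region
```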

Related posts: One-Tailed and Two-Tailed Tests Explained, What Are Critical Values?, and T-distribution Table of Critical Values

Comparing Significance Levels

Let’s redo this hypothesis test using the other common significance level of 0.01 to see how it compares.

Chart that shows a two-tailed critical region for a significance level of 0.01.

This time the sum of the two shaded regions equals our new significance level of 0.01. The mean of our sample does not fall within the critical region. Consequently, we fail to reject the null hypothesis. We have the exact same sample data and the same difference between the sample mean and the null hypothesis value, but a different test result.

What happened? By specifying a lower significance level, we set a higher bar for the sample evidence. As the graph shows, lower significance levels move the critical regions further away from the null value. Consequently, lower significance levels require more extreme sample means to be statistically significant.

You must set the significance level before conducting a study. You don’t want the temptation of choosing a level after the study that yields significant results. The only reason I compared the two significance levels was to illustrate the effects and explain the differing results.

The graphical version of the 1-sample t-test we created allows us to determine statistical significance without assessing the P value. Typically, you need to compare the P value to the significance level to make this determination.

Related post: Step-by-Step Instructions for How to Do t-Tests in Excel

What Are P values?

P values are the probability that a sample will have an effect at least as extreme as the effect observed in your sample if the null hypothesis is correct.

This tortuous, technical definition for P values can make your head spin. Let’s graph it!

First, we need to calculate the effect that is present in our sample. The effect is the distance between the sample value and null value: 330.6 – 260 = 70.6. Next, I’ll shade the regions on both sides of the distribution that are at least as far away as 70.6 from the null (260 +/- 70.6). This process graphs the probability of observing a sample mean at least as extreme as our sample mean.

Probability distribution plot shows how our sample mean has a p-value of 0.031.

The total probability of the two shaded regions is 0.03112. If the null hypothesis value (260) is true and you drew many random samples, you’d expect sample means to fall in the shaded regions about 3.1% of the time. In other words, you will observe sample effects at least as large as 70.6 about 3.1% of the time if the null is true. That’s the P value!
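Here’s a sketch of that calculation. Because it reuses the assumed standard error of 30.8, it only approximately reproduces the 0.03112 shown in the graph.

```python
from scipy import stats

df, se_mean = 24, 30.8        # se_mean is an assumed value (see above)
effect = 330.6 - 260          # 70.6, the observed sample effect

t_value = effect / se_mean                # about 2.29
p_value = 2 * stats.t.sf(t_value, df)     # sum both tails (two-tailed test)
print(p_value)                            # about 0.031

# The decision rule from the next section: reject the null when p <= alpha.
print(p_value <= 0.05)                    # True at the 0.05 significance level
```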

Learn more about How to Find the P Value.

Using P values and Significance Levels Together

If your P value is less than or equal to your alpha level, reject the null hypothesis.

The P value results are consistent with our graphical representation. The P value of 0.03112 is significant at the alpha level of 0.05 but not 0.01. Again, in practice, you pick one significance level before the experiment and stick with it!

Using the significance level of 0.05, the sample effect is statistically significant. Our data support the alternative hypothesis, which states that the population mean doesn’t equal 260. We can conclude that mean fuel expenditures have increased since last year.

P values are very frequently misinterpreted as the probability of rejecting a null hypothesis that is actually true. This interpretation is wrong! To understand why, please read my post: How to Interpret P-values Correctly.

Discussion about Statistically Significant Results

Hypothesis tests determine whether your sample data provide sufficient evidence to reject the null hypothesis for the entire population. To perform this test, the procedure compares your sample statistic to the null value and determines whether it is sufficiently rare. “Sufficiently rare” is defined in a hypothesis test by:

  • Assuming that the null hypothesis is true—the graphs center on the null value.
  • The significance (alpha) level—how far out from the null value is the critical region?
  • The sample statistic—is it within the critical region?

There is no special significance level that correctly determines which studies have real population effects 100% of the time. The traditional significance levels of 0.05 and 0.01 are attempts to manage the tradeoff between having a low probability of rejecting a true null hypothesis and having adequate power to detect an effect if one actually exists.

The significance level is the rate at which you incorrectly reject null hypotheses that are actually true (type I error). For example, of all studies that use a significance level of 0.05 in which the null hypothesis is correct, you can expect 5% to have sample statistics that fall in the critical region. When this error occurs, you aren’t aware that the null hypothesis is correct, but you’ll reject it because the p-value is less than 0.05.
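A quick simulation can make that error rate tangible. This sketch is illustrative only; the population SD of 154 is an assumed value chosen so the standard error roughly matches this post’s graphs.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha, trials = 25, 0.05, 10_000
false_positives = 0

# Simulate studies in which the null hypothesis is true: every sample
# really does come from a population with a mean of 260.
for _ in range(trials):
    sample = rng.normal(loc=260, scale=154, size=n)
    if stats.ttest_1samp(sample, popmean=260).pvalue <= alpha:
        false_positives += 1

print(false_positives / trials)   # close to 0.05, the significance level
```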

This error does not indicate that the researcher made a mistake. As the graphs show, you can observe extreme sample statistics due to sampling error alone. It’s the luck of the draw!

Related posts: Statistical Significance: Definition & Meaning and Types of Errors in Hypothesis Testing

Hypothesis tests are crucial when you want to use sample data to make conclusions about a population because these tests account for sample error. Using significance levels and P values to determine when to reject the null hypothesis improves the probability that you will draw the correct conclusion.

Keep in mind that statistical significance doesn’t necessarily mean that the effect is important in a practical, real-world sense. For more information, read my post about Practical vs. Statistical Significance .

If you like this post, read the companion post: How Hypothesis Tests Work: Confidence Intervals and Confidence Levels.

You can also read my other posts that describe how other tests work:

  • How t-Tests Work
  • How the F-test works in ANOVA
  • How Chi-Squared Tests of Independence Work

To see an alternative approach to traditional hypothesis testing that does not use probability distributions and test statistics, learn about bootstrapping in statistics!


Reader Interactions


December 11, 2022 at 10:56 am

Here is a very easy way to think about the level of significance and the p-value. 1. A teacher gives a student an assignment and asks how much error the student will make. The student replies that the error will be at most 5% (that is the level of significance). After the assignment is completed, the teacher checks and finds the error really is ≤ 5% (perhaps 4%, 3%, 2%, or even less; that is the p-value), which means the results are significant. Otherwise, the error is > 5% (perhaps 6%, 7%, 8%, or more; that is the p-value), which means the results are non-significant. 2. Now the teacher asks the same question and the student promises an error of at most 1% (the level of significance). If the checked error is ≤ 1% (perhaps 0.9%, 0.8%, 0.7%, or less; the p-value), the results are significant; otherwise the error is > 1% (perhaps 1.1%, 1.5%, 2%, or more; the p-value) and the results are non-significant. Whether a p-value is significant or not depends mainly on the level of significance.


December 11, 2022 at 7:50 pm

I think that approach helps explain how to determine statistical significance–is the p-value less than or equal to the significance level. However, it doesn’t really explain what statistical significance means. I find that comparing the p-value to the significance level is the easy part. Knowing what it means and how to choose your significance level is the harder part!


December 3, 2022 at 5:54 pm

What would you say to someone who believes that a p-value higher than the level of significance (alpha) means the null hypothesis has been proven? Should you support that statement or deny it?

December 3, 2022 at 10:18 pm

Hi Emmanuel,

When the p-value is greater than the significance level, you fail to reject the null hypothesis. That is different from proving it. To learn why and what it means, click the link to read a post that I’ve written that will answer your question!


April 19, 2021 at 12:27 am

Thank you so much Sir

April 18, 2021 at 2:37 pm

Hi sir, your blogs are very helpful for clearing up the concepts of statistics; as a researcher I find them very useful. I have some queries:

1. In many research papers I have seen authors use the statement “means or values are statistically at par at p = 0.05” when they do some pairwise comparison between the treatments (a kind of post hoc) using some value of CD (critical difference), or we can say LSD, which is calculated using alpha, not p. So from this article I think this should be alpha = 0.05 or 5%, not p = 0.05. Earlier I thought p and alpha were the same; p itself is compared with alpha 0.05. Correct me if I am wrong.

2. When we can draw a conclusion using critical values (CV), which are based on alpha in different tests (e.g., in the F-test, the CV is F(0.05, t−1, error df) when alpha is 0.05, which is the table value of F and is compared with the calculated F to draw the conclusion), then why do we go for p values and draw conclusions based on p values? Many online software packages do not even give a p value; they just mention the CD (LSD).

3. Can you please help me interpret the interaction in a two-factor analysis (Factor A × Factor B) in ANOVA?

Thank You so much!

(Commenting again as I have not seen my comment in comment list; don’t know why)

April 18, 2021 at 10:57 pm

Hi Himanshu,

I manually approve comments so there will be some time lag involved before they show up.

Regarding your first question, yes, you’re correct. Test results are significant at particular significance levels or alpha. They should not use p to define the significance level. You’re also correct in that you compare p to alpha.

Critical values are a different (but related) approach for determining significance. It was more common before computer analysis took off because it reduced the calculations. Using this approach in its simplest form, you only know whether a result is significant or not at the given alpha. You just determine whether the test statistic falls within a critical region to determine whether the result is statistically significant. However, it is ok to supplement this type of result with the actual p-value. Knowing the precise p-value provides additional information that significant/not significant does not provide. The critical value and p-value approaches will always agree, too. For more information about why the exact p-value is useful, read my post about Five Tips for Interpreting P-values.

Finally, I’ve written about two-way ANOVA in my post, How to do Two-Way ANOVA in Excel. Additionally, I write about it in my Hypothesis Testing ebook.


January 28, 2021 at 3:12 pm

Thank you for your answer, Jim, I really appreciate it. I’m taking a Coursera stats course and online learning without being able to ask questions of a real teacher is not my forte!

You’re right, I don’t think I’m ready for that calculation! However, I think I’m struggling with something far more basic, perhaps even the interpretation of the t-table? I’m just not sure how you came up with the p-value as .03112, with the 24 degrees of freedom. When I pull up a t-table and look at the 24-degrees of freedom row, I’m not sure how any of those numbers correspond with your answer? Either the single tail of 0.01556 or the combined of 0.03112. What am I not getting? (which, frankly, could be a lot!!) Again, thank you SO much for your time.

January 28, 2021 at 11:19 pm

Ah ok, I see! First, let me point you to several posts I’ve written about t-values and the t-distribution. I don’t cover those in this post because I wanted to present a simplified version that just uses the data in its regular units. The basic idea is that hypothesis tests actually convert all your raw data down into one value for a test statistic, such as the t-value. And then it uses that test statistic to determine whether your results are statistically significant. To be significant, the t-value must exceed a critical value, which is what you look up in the table. Although, nowadays you’d typically let your software just tell you.

So, read the following two posts, which cover several aspects of t-values and distributions. And then if you have more questions after that, you can post them. But you’ll have a lot more information about them, and probably some of your questions will be answered! T-values and T-distributions

January 27, 2021 at 3:10 pm

Jim, just found your website and really appreciate your thoughtful, thorough way of explaining things. I feel very dumb, but I’m struggling with p-values and was hoping you could help me.

Here’s the section that’s getting me confused:

“First, we need to calculate the effect that is present in our sample. The effect is the distance between the sample value and null value: 330.6 – 260 = 70.6. Next, I’ll shade the regions on both sides of the distribution that are at least as far away as 70.6 from the null (260 +/- 70.6). This process graphs the probability of observing a sample mean at least as extreme as our sample mean.

** I’m good up to this point. Draw the picture, do the subtraction, shade the regions. BUT, I’m not sure how to figure out the area of the shaded region — even with a T-table. When I look at the T-table on 24 df, I’m not sure what to do with those numbers, as none of them seem to correspond in any way to what I’m looking at in the problem. In the end, I have no idea how you calculated each shaded area being 0.01556.

I feel like there’s a (very simple) step that everyone else knows how to do, but for some reason I’m missing it.

Again, dumb question, but I’d love your help clarifying that.

thank you, Sara

January 27, 2021 at 9:51 pm

That’s not a dumb question at all. I actually don’t show or explain the calculations for figuring out the area. The reason for that is the same reason why students never calculate the critical t-values for their tests; instead, you look them up in tables or use statistical software. The common reason for all that is that calculating these values is extremely complicated! It’s best to let software do that for you or, when looking up critical values, use the tables!

The principle, though, is that the percentage of the area under the curve equals the probability that values will fall within that range.

The equation for the t-distribution is:

f(t) = Γ((ν+1)/2) / [√(νπ) Γ(ν/2)] × (1 + t²/ν)^(−(ν+1)/2), where ν is the degrees of freedom.

And then, for this example, you’d need to figure out the area under the curve for particular ranges!


January 15, 2021 at 10:57 am

Hi Jim, I have a question related to hypothesis tests. In medical imaging, there are different ways to measure signal intensity (from a tumor lesion, for example). For the same 100 patients, I tested 4 different ways to measure tumor captation relative to the injected dose. So for the 100 patients, I got 4 linear regressions (the relation between injected dose and measured quantity at tumor sites), an output of 4 equations:

  • Condition A: output = -0.034308 + 0.0006602 × input
  • Condition B: output = 0.0117631 + 0.0005425 × input
  • Condition C: output = 0.0087871 + 0.0005563 × input
  • Condition D: output = 0.001911 + 0.0006255 × input

My question: I want to compare the 4 methods to find the best one (compared to the others). Is a hypothesis test suitable for this? If yes, I cannot find a test to perform it. Can you suggest software? I usually use JMP for my stats, but I am open to other software.

Thanks for your time, G


November 16, 2020 at 5:42 am

Thank you very much for writing about this topic!

Your explanation made much more sense to me about why we reject the null hypothesis when the p value < significance level.

Kind greetings, Jalal


September 25, 2020 at 1:04 pm

Hi Jim, Your explanations are so helpful! Thank you. I wondered about your first graph. I see that the mean of the graph is 260 from the null hypothesis, and it looks like the standard deviation of the graph is about 31. Where did you get 31 from? Thank you

September 25, 2020 at 4:08 pm

Hi Michelle,

That is a great question. Very observant. And it gets to how these tests work. The hypothesis test that I’m illustrating here is the one-sample t-test. And this graph illustrates the sampling distribution for the t-test. T-tests use the t-distribution to determine the sampling distribution. For the t-distribution, you need to specify the degrees of freedom, which entirely defines the distribution (i.e., it’s the only parameter). For 1-sample t-tests, the degrees of freedom equal the number of observations minus 1. This dataset has 25 observations. Hence, the 24 DF you see in the graph.

Unlike the normal distribution, there is no standard deviation parameter. Instead, the degrees of freedom determines the spread of the curve. Typically, with t-tests, you’ll see results discussed in terms of t-values, both for your sample and for defining the critical regions. However, for this introductory example, I’ve converted the t-values into the raw data units (t-value * SE mean).

So, the standard deviation you’re seeing in the graph is a result of the spread of the underlying t-distribution that has 24 degrees of freedom and then applying the conversion from t-values to raw values.


September 10, 2020 at 8:19 am

Your blog is incredible.

I am having difficulty understanding why the phrase ‘as extreme as’ is required in the definition of p-value (“P values are the probability that a sample will have an effect at least as extreme as the effect observed in your sample if the null hypothesis is correct.”)

Why can’t P-Values simply be defined as “The probability of sample observation if the null hypothesis is correct?”

In your other blog titled ‘Interpreting P values’ you have explained p-values as “P-values indicate the believability of the devil’s advocate case that the null hypothesis is correct given the sample data”. I understand (or accept) this explanation. How does one move from this definition to one that contains the phrase ‘as extreme as’?

September 11, 2020 at 5:05 pm

Thanks so much for your kind words! I’m glad that my website has been helpful!

The key to understanding the “at least as extreme” wording lies in the probability plots for p-values. Using probability plots for continuous data, you can calculate probabilities, but only for ranges of values. I discuss this in my post about understanding probability distributions. In a nutshell, we need a range of values for these probabilities because the probabilities are derived from the area under a distribution curve. A single value just produces a line on these graphs rather than an area. Those ranges are the shaded regions in the probability plots. For p-values, the range corresponds to the “at least as extreme” wording. That’s where it comes from. We need a range to calculate a probability. We can’t use the single value of the observed effect because it doesn’t produce an area under the curve.

I hope that helps! I think this is a particularly confusing part of understanding p-values that most people don’t understand.


August 7, 2020 at 5:45 pm

Hi Jim, thanks for the post.

Could you please clarify the following excerpt from ‘Graphing Significance Levels as Critical Regions’:

“The percentage of the area under the curve that is shaded equals the probability that the sample value will fall in those regions if the null hypothesis is correct.”

I’m not sure if I understood this correctly. If the sample value falls in one of the shaded regions, doesn’t that mean the null hypothesis can be rejected and hence is not correct?

August 7, 2020 at 10:23 pm

Think of it this way. There are two basic reasons for why a sample value could fall in a critical region:

  • The null hypothesis is correct and random chance caused the sample value to be unusual.
  • The null hypothesis is not correct.

You don’t know which one is true. Remember, just because you reject the null hypothesis it doesn’t mean the null is false. However, by using hypothesis tests to determine statistical significance, you control the chances of #1 occurring. The rate at which #1 occurs equals your significance level. On the other hand, you don’t know the probability of the sample value falling in a critical region if the alternative hypothesis is correct (#2). It depends on the precise distribution for the alternative hypothesis and you usually don’t know that, which is why you’re testing the hypotheses in the first place!

I hope I answered the question you were asking. If not, feel free to ask follow up questions. Also, this ties into how to interpret p-values. It’s not exactly straightforward. Click the link to learn more.


June 4, 2020 at 6:17 am

Hi Jim, thank you very much for your answer. You helped me a lot!

June 3, 2020 at 5:23 pm

Hi, thanks for this post. I’ve been learning a lot with you. My question is regarding lack of fit. The p-value of my lack of fit is really low, making my lack of fit significant, meaning my model does not fit well. Is my case a “false negative,” given that my pure error is really low, which drives the lack-of-fit p-value down? So it means my model is good. Below I show some information that I hope helps to clarify my question.

Source          SumSq      DF   MeanSq     F        pValue
Total           1246.5     18   69.25
Model           1241.7     6    206.94     514.43   9.3841e-14
. Linear        1196.6     3    398.87     991.53   1.2318e-14
. Nonlinear     45.046     3    15.015     37.326   2.3092e-06
Residual        4.8274     12   0.40228
. Lack of fit   4.7388     7    0.67698    38.238   0.0004787
. Pure error    0.088521   5    0.017704

June 3, 2020 at 7:53 pm

As you say, a low p-value for a lack-of-fit test indicates that the model doesn’t fit your data adequately. This is a positive result for the test, which means it can’t be a “false negative.” At best, it could be a false positive, meaning that your data actually fit the model well despite the low p-value.

I’d recommend graphing the residuals and looking for patterns. There is probably a relationship between variables that you’re not modeling correctly, such as curvature or interaction effects. There’s no way to diagnose the specific nature of the lack-of-fit problem by using the statistical output. You’ll need the graphs.

If there are no patterns in the residual plots, then your lack-of-fit results might be a false positive.

I hope this helps!


May 30, 2020 at 6:23 am

First of all, I have to say there are not many resources that explain a complicated topic in such an easy manner.

My question is, how do we arrive at “if p value is less than alpha, we reject the null hypothesis.”

Is this covered in a separate article I could read?

Thanks Shekhar


May 25, 2020 at 12:21 pm

Hi Jim, terrific website, blog, and after this I’m ordering your book. One of my biggest challenges is nomenclature, definitions, context, and formulating the hypotheses. Here’s one I want to double-be-sure I understand: From above you write: ” These tools allow us to test these two hypotheses:

Null hypothesis: The population mean equals the null hypothesis mean (260). Alternative hypothesis: The population mean does not equal the null hypothesis mean (260). ” I keep thinking that 260 is the population mean mu, the underlying population (that we never really know exactly), and that the null hypothesis is comparing mu to x-bar (the sample mean of the 25 families randomly sampled, with mean = sample mean = x-bar = 330.6).

So is the following incorrect, and if so, why? Null hypothesis: The population mean mu = 260 equals the null hypothesis mean x-bar (330.6). Alternative hypothesis: The population mean mu = 260 does not equal the null hypothesis mean x-bar (330.6).

And my thinking is that usually the formulation of null and alternative hypotheses is “test value” = “mu current of underlying population”, whereas I read the formulation on the webpage above to be the reverse.

Any comments appreciated. Many Thanks,

May 26, 2020 at 8:56 pm

The null hypothesis states that the population value equals the null value. Now, I know that’s not particularly helpful! But the null value varies based on the test and context. So, in this example, we’re setting the null value as $260, which was the mean from the previous year. So, our null hypothesis states:

Null: the population mean (mu) = 260. Alternative: the population mean ≠ 260.

These hypothesis statements are about the population parameter. For this type of one-sample analysis, the target or reference value you specify is the null hypothesis value. Additionally, you don’t include the sample estimate in these statements, which is the X-bar portion you tacked on at the end. It’s strictly about the value of the population parameter you’re testing. You don’t know the value of the underlying distribution. However, given the mutually exclusive nature of the null and alternative hypothesis, you know one or the other is correct. The null states that mu equals 260 while the alternative states that it doesn’t equal 260. The data help you decide, which brings us to . . .

However, the procedure does compare our sample data to the null hypothesis value, which is how it determines how strong our evidence is against the null hypothesis.

I hope I answered your question. If not, please let me know!


May 8, 2020 at 6:00 pm

Using the interpretation “In other words, you will observe sample effects at least as large as 70.6 about 3.1% of the time if the null is true,” our heads really seem to tie in knots. However, the reverse interpretation is much more intuitive and easier. That is, we will observe a sample effect of at least 70.6 about 96.9% of the time if the null is false (that is, our hypothesis is true).

May 8, 2020 at 7:25 pm

Your phrasing really isn’t any simpler. And it has the additional misfortune of being incorrect.

What you’re essentially doing is creating a one-sided confidence interval by using the p-value from a two-sided test. That’s incorrect in two ways.

  • Don’t mix and match one-sided and two-sided test results.
  • Confidence levels are determined by the significance level, not p-values.

So, what you need is a two-sided 95% CI (1-alpha). You could then state the results are statistically significant and you have 95% confidence that the population effect is between X and Y. If you want a lower bound as you propose, then you’ll need to use a one-sided hypothesis test with a 95% Lower Bound. That’ll give you a different value for the lower bound than the one you use.

I like confidence intervals. As I write elsewhere, I think they’re easier to understand and provide more information than a binary test result. But, you need to use them correctly!

One other point. When you are talking about p-values, it’s always under the assumption that the null hypothesis is correct. You *never* state anything about the p-value in relation to the null being false (i.e. alternative is true). But, if you want to use the type of phrasing you suggest, use it in the context of CIs and incorporate the points I cover above.


February 10, 2020 at 11:13 am

Thank you very much, professor, for sharing your knowledge. Special greetings from Colombia.


August 6, 2019 at 11:46 pm

I found this really helpful. Also, can you help me out?

I’m a little confused. Can you tell me whether the level of significance and the p-value are comparable, and if they are, what does it mean if the p-value < LS? Do we reject the null hypothesis or do we accept it?

August 7, 2019 at 12:49 am

Hi Divyanshu,

Yes, you compare the p-value to the significance level. When the p-value is less than the significance level (alpha), your results are statistically significant and you reject the null hypothesis.

I’d suggest re-reading the “Using P values and Significance Levels Together” section near the end of this post more closely. That describes the process. The next section describes what it all means.


July 1, 2019 at 4:19 am

Sure. I will use them only in my classrooms, and only offline, with due credit to your original page. I will encourage my students to visit your blog. I have purchased your eBook on regression; it is immensely useful.

July 1, 2019 at 9:52 am

Hi Narasimha, that sounds perfect. Thanks for buying my ebook as well. I’m thrilled to hear that you’ve found it to be helpful!

June 28, 2019 at 6:22 am

I have benefited a lot from your writings. Can I share them with my students in the classroom?

June 30, 2019 at 8:44 pm

Hi Narasimha,

Yes, you can certainly share with your students. Please attribute my original page. And please don’t copy whole sections of my posts onto another webpage as that can be bad with Google! Thanks!


February 11, 2019 at 7:46 pm

Hello, great site and my apologies if the answer to the following question exists already.

I’ve always wondered why we put the sampling distribution about the null hypothesis rather than simply leaving it about the observed mean. I can see that mathematically we are measuring the same distance from the null and basically can draw the same conclusions.

For example, we take a sample (say 50 people), we gather an observation (mean wage), estimate the standard error in that observation, and so can build a sampling distribution about the observed mean. That sampling distribution contains a confidence interval, where, say, I am 95% confident the true mean lies (i.e., in repeated sampling the true mean would reside within this interval 95% of the time).

When I use this for a hyp-test, am I right in saying that we place the sampling dist over the reference level simply because it’s mathematically equivalent and it just seems easier to gauge how far the observation is from 0 via t-stats or its likelihood via p-values?

It seems more natural to me to look at it the other way around: leave the sampling distribution on the observed value, and then look where the null sits. If it’s too far left or right then it is unlikely the true population parameter is what we believed it to be, because if the null were true it would only occur ~5% of the time in repeated samples, so perhaps we need to change our opinion.

Can I interpret a hyp-test that way? Or do I have a misconception?

February 12, 2019 at 8:25 pm

The short answer is that, yes, you can draw the interval around the sample mean instead. And, that is, in fact, how you construct confidence intervals. The distance around the null hypothesis for hypothesis tests and the distance around the sample for confidence intervals are the same distance, which is why the results will always agree as long as you use corresponding alpha levels and confidence levels (e.g., alpha 0.05 with a 95% confidence level). I write about how this works in a post about confidence intervals.

I prefer confidence intervals for a number of reasons. They’ll indicate whether you have significant results if they exclude the null value, and they indicate the precision of the effect size estimate. Corresponding with what you’re saying, it’s easier to gauge how far a confidence interval is from the null value (often zero) whereas a p-value doesn’t provide that information. See Practical versus Statistical Significance.

So, you don’t have any misconception at all! Just refer to it as a confidence interval rather than a hypothesis test, but, of course, they are very closely related.
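As a rough numeric illustration of that agreement, here’s a sketch based on this post’s fuel cost example; the standard error of 30.8 is again an assumed value.

```python
from scipy import stats

sample_mean, null_mean, df, se_mean = 330.6, 260, 24, 30.8  # se_mean assumed
t_crit = stats.t.ppf(0.975, df)   # the same critical t-value the test uses

# 95% confidence interval centered on the sample mean
ci = (sample_mean - t_crit * se_mean, sample_mean + t_crit * se_mean)
print(ci)                 # roughly (267, 394)
print(null_mean < ci[0])  # True: 260 is excluded, matching the significant
                          # result from the alpha = 0.05 hypothesis test
```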


January 9, 2019 at 10:37 pm

Hi Jim, nice article. I have a question. I read the central limit theorem article before this one.

Coming to this article: during almost every hypothesis test, we draw a normal distribution curve assuming there is a sampling distribution (and then we go for the test statistic, p value, etc.). Do we draw a normal distribution curve for hypothesis tests because of the central limit theorem?

Thanks in advance, Surya

January 10, 2019 at 1:57 am

These distributions are actually the t-distribution, which is different from the normal distribution. T-distributions have only one parameter–the degrees of freedom. As the DF increases, the t-distribution tightens up. Around 25 degrees of freedom, the t-distribution approximates the normal distribution. Depending on the type of t-test, this corresponds to a sample size of 26 or 27. Similarly, the sampling distribution of the means also approximates the normal distribution at around these sample sizes. With a large enough sample size, both the t-distribution and the sampling distribution converge to a normal distribution regardless (largely) of the underlying population distribution. So, yes, the central limit theorem plays a strong role in this.

It’s more accurate to say that central limit theorem causes the sampling distribution of the means to converge on the same distribution that the t-test uses, which allows you to assume that the test produces valid results. But, technically, the t-test is based on the t-distribution.

Problems can occur if the underlying distribution is non-normal and you have a small sample size. In that case, the sampling distribution of the means won’t approximate the t-distribution that the t-test uses. However, the test results will assume that it does and produce results based on that–which is why it causes problems!
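If you’d like to see that convergence numerically, here’s a small illustrative sketch (not from the original comment) comparing two-tailed 0.05 critical values:

```python
from scipy import stats

# The t-distribution's two-tailed 0.05 critical value approaches the
# normal distribution's 1.96 as the degrees of freedom grow.
for df in (5, 10, 25, 100, 1000):
    print(df, round(stats.t.ppf(0.975, df), 3))
print("normal", round(stats.norm.ppf(0.975), 3))   # 1.96
```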


November 19, 2018 at 9:15 am

Dear Jim! Thank you very much for your explanation. I need your help to understand my data. I have two samples (about 300 observations) with skewed distributions. I did the t-test and obtained the p-value, which is quite small. Can I draw the conclusion that the effect size is small even when the distribution of my data is not normal? Thank you

November 19, 2018 at 9:34 am

Hi Tetyana,

First, when you say that your p-value is small and that you want to “draw the conclusion that the effect size is small,” I assume that you mean statistically significant. When the p-value is low, the null hypothesis must go! In other words, you reject the null and conclude that there is a statistically significant effect–not a small effect.

Now, back to the question at hand! Yes, when you have a sufficiently large sample size, t-tests are robust to departures from normality. For a 2-sample t-test, you should have at least 15 samples per group, which you exceed by quite a bit. So, yes, you can reliably conclude that your results are statistically significant!

You can thank the central limit theorem! 🙂


September 10, 2018 at 12:18 am

Hello Jim, I am very sorry; I have a very elementary knowledge of stats. So, would you please explain how you got a p-value of 0.03112 in the above calculation/t-test? By looking at a chart? Would you also explain how you got the information that “you will observe sample effects at least as large as 70.6 about 3.1% of the time if the null is true”?


July 6, 2018 at 7:02 am

A quick question regarding your use of two-tailed critical regions in the article above: why? I mean, what is a real-world scenario that would warrant a two-tailed test of any kind (z, t, etc.)? And if there are none, why keep using the two-tailed scenario as an example, instead of the one-tailed which is both more intuitive and applicable to most if not all practical situations. Just curious, as one person attempting to educate people on stats to another (my take on the one vs. two-tailed tests can be seen here: http://blog.analytics-toolkit.com/2017/one-tailed-two-tailed-tests-significance-ab-testing/ )

Thanks, Georgi

July 6, 2018 at 12:05 pm

There’s the appropriate time and place for both one-tailed and two-tailed tests. I plan to write a post on this issue specifically, so I’ll keep my comments here brief.

So much of statistics is context sensitive. People often want concrete rules for how to do things in statistics but that’s often hard to provide because the answer depends on the context, goals, etc. The question of whether to use a one-tailed or two-tailed test falls firmly in this category of it depends.

I did read the article you wrote. I’ll say that I can see how in the context of A/B testing specifically there might be a propensity to use one-tailed tests. You only care about improvements. There’s probably not too much downside in only caring about one direction. In fact, in a post where I compare different tests and different options, I suggest using a one-tailed test for a similar type of case involving defects. So, I’m onboard with the idea of using one-tailed tests when they’re appropriate. However, I do think that two-tailed tests should be considered the default choice and that you need good reasons to move to a one-tailed test. Again, your A/B testing area might supply those reasons on a regular basis, but I can’t make that a blanket statement for all research areas.

I think your article mischaracterizes some of the pros and cons of both types of tests. Just a couple of for instances. In a two-tailed test, you don’t have to take the same action regardless of which direction the results are significant (example below). And, yes, you can determine the direction of the effect in a two-tailed test. You simply look at the estimated effect. Is it positive or negative?

On the other hand, I do agree that one-tailed tests don’t increase the overall Type I error. However, there is a big caveat for that. In a two-tailed test, the Type I error rate is evenly split in both tails. For a one-tailed test, the overall Type I error rate does not change, but the Type I errors are redistributed so they all occur in the direction that you are interested in rather than being split between the positive and negative directions. In other words, you’ll have twice as many Type I errors in the specific direction that you’re interested in. That’s not good.

My big concerns with one-tailed tests are that it makes it easier to obtain the results that you want to obtain. And, all of the Type I errors (false positives) are in that direction too. It’s just not a good combination.

To answer your question about when you might want to use two-tailed tests, there are plenty of reasons. For one, you might want to avoid the situation I describe above. Additionally, in a lot of scientific research, the researchers truly are interested in detecting effects in either direction for the sake of science. Even in cases with a practical application, you might want to learn about effects in either direction.

For example, I was involved in a research study that looked at the effects of an exercise intervention on bone density. The idea was that it might be a good way to prevent osteoporosis. I used a two-tailed test. Obviously, we’re hoping that there was a positive effect. However, we’d be very interested in knowing whether there was a negative effect too. And, this illustrates how you can have different actions based on both directions. If there was a positive effect, you can recommend that as a good approach and try to promote its use. If there’s a negative effect, you’d issue a warning to not do that intervention. You have the potential for learning both what is good and what is bad. The extra false positives would’ve caused problems because we’d think that there’d be health benefits for participants when those benefits don’t actually exist. Also, if we had performed only a one-tailed test and didn’t obtain significant results, we’d learn that it wasn’t a positive effect, but we would not know whether it was actually detrimental or not.

Here’s when I’d say it’s OK to use a one-tailed test. Consider a one-tailed test when you’re in situation where you truly only need to know whether an effect exists in one direction, and the extra Type I errors in that direction are an acceptable risk (false positives don’t cause problems), and there’s no benefit in determining whether an effect exists in the other direction. Those conditions really restrict when one-tailed tests are the best choice. Again, those restrictions might not be relevant for your specific field, but as for the usage of statistics as a whole, they’re absolutely crucial to consider.

On the other hand, according to this article, two-tailed tests might be important in A/B testing!


March 30, 2018 at 5:29 am

Dear Sir, please confirm whether there is an inadvertent mistake in the interpretation, “We can conclude that mean fuel expenditures have increased since last year.” Our null hypothesis is μ = 260. If found significant, it implies two possibilities: both an increase and a decrease. Please let us know if we are mistaken here. Many thanks!

March 30, 2018 at 9:59 am

Hi Khalid, the null hypothesis as it is defined for this test represents the mean monthly expenditure for the previous year (260). The mean expenditure for the current year is 330.6, whereas it was 260 for the previous year. Consequently, the mean has increased from 260 to 330.6 over the course of a year. The p-value indicates that this increase is statistically significant. This finding does not suggest both an increase and a decrease–just an increase. Keep in mind that a significant result prompts us to reject the null hypothesis. So, we reject the null that the mean equals 260.

Let’s explore the other possible findings to be sure that this makes sense. Suppose the sample mean had been closer to 260 and the p-value was greater than the significance level; those results would indicate that the results were not statistically significant. The conclusion that we’d draw is that we have insufficient evidence to conclude that mean fuel expenditures have changed since the previous year.

If the sample mean was less than the null hypothesis value (260) and the p-value was statistically significant, we’d conclude that mean fuel expenditures have decreased and that this decrease is statistically significant.

When you interpret the results, you have to be sure to understand what the null hypothesis represents. In this case, it represents the mean monthly expenditure for the previous year and we’re comparing this year’s mean to it–hence our sample suggests an increase.


P-Value in Statistical Hypothesis Tests: What is it?

P Value Definition

A p value is used in hypothesis testing to help you support or reject the null hypothesis. The p value measures the evidence against the null hypothesis: the smaller the p-value, the stronger the evidence that you should reject the null hypothesis.

P values are expressed as decimals, although it may be easier to understand what they are if you convert them to a percentage. For example, a p value of 0.0254 is 2.54%. This means that, if the null hypothesis were true, there would be a 2.54% chance of obtaining results at least as extreme as yours. That’s pretty tiny. On the other hand, a large p value of 0.9 (90%) means your data are highly consistent with the null hypothesis: you’d see results at least this extreme 90% of the time if nothing were going on in your experiment. Therefore, the smaller the p-value, the stronger the evidence against the null hypothesis and the more important (“significant”) your results.

When you run a hypothesis test, you compare the p value from your test to the alpha level you selected before running the test. Alpha levels can also be written as percentages.


P Value vs Alpha level

Alpha levels are controlled by the researcher and are related to confidence levels. You get an alpha level by subtracting your confidence level from 100%. For example, if you want to be 98 percent confident in your research, the alpha level would be 2% (100% – 98%). When you run the hypothesis test, the test will give you a value for p. Compare that value to your chosen alpha level. For example, let’s say you chose an alpha level of 5% (0.05). If the results from the test give you:

  • A small p (≤ 0.05): reject the null hypothesis. This is strong evidence against the null hypothesis.
  • A large p (> 0.05): the evidence against the null hypothesis is weak, so you do not reject the null.


What if I Don’t Have an Alpha Level?

In an ideal world, you’ll have an alpha level. But if you do not, you can still use the following rough guidelines in deciding whether to support or reject the null hypothesis:

  • If p > .10 → “not significant”
  • If p ≤ .10 → “marginally significant”
  • If p ≤ .05 → “significant”
  • If p ≤ .01 → “highly significant.”

How to Calculate a P Value on the TI 83

Example question: The average wait time to see an E.R. doctor is said to be 150 minutes. You think the wait time is actually less. You take a random sample of 30 people and find their average wait is 148 minutes with a standard deviation of 5 minutes. Assume the distribution is normal. Find the p value for this test.

  • Press STAT then arrow over to TESTS.
  • Press ENTER for Z-Test.
  • Arrow over to Stats. Press ENTER.
  • Arrow down to μ0 and type 150. This is our null hypothesis mean.
  • Arrow down to σ. Type in your std dev: 5.
  • Arrow down to xbar. Type in your sample mean: 148.
  • Arrow down to n. Type in your sample size: 30.
  • Arrow to <μ0 for a left-tail test. Press ENTER.
  • Arrow down to Calculate. Press ENTER. P is given as .014, or about 1%.

The probability that you would get a sample mean as low as 148 minutes (if the true mean really were 150) is tiny, so you should reject the null hypothesis.
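If you’d rather check this without a TI 83, here’s an equivalent sketch in Python; it reproduces the same left-tailed p value of about .014 from the example’s numbers.

```python
from math import sqrt
from scipy import stats

mu0, xbar, sigma, n = 150, 148, 5, 30   # values from the example above

z = (xbar - mu0) / (sigma / sqrt(n))    # about -2.19
p = stats.norm.cdf(z)                   # left-tailed test: area below z
print(round(p, 3))                      # 0.014, matching the TI 83 result
```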

Note: If you don’t want to run a test, you could also use the TI 83 NormCDF function to get the area (which is the same thing as the probability value).



Understanding P-values | Definition and Examples

Published on July 16, 2020 by Rebecca Bevans. Revised on June 22, 2023.

The p value is a number, calculated from a statistical test, that describes how likely you are to have found a particular set of observations if the null hypothesis were true.

P values are used in hypothesis testing to help decide whether to reject the null hypothesis. The smaller the p value, the more likely you are to reject the null hypothesis.

Table of contents

  • What is a null hypothesis?
  • What exactly is a p value?
  • How do you calculate the p value?
  • P values and statistical significance
  • Reporting p values
  • Caution when using p values
  • Other interesting articles
  • Frequently asked questions about p values

All statistical tests have a null hypothesis. For most tests, the null hypothesis is that there is no relationship between your variables of interest or that there is no difference among groups.

For example, in a two-tailed t test, the null hypothesis is that the difference between two groups is zero.

  • Null hypothesis ( H 0 ): there is no difference in longevity between the two groups.
  • Alternative hypothesis ( H A or H 1 ): there is a difference in longevity between the two groups.

Here's why students love Scribbr's proofreading services

Discover proofreading & editing

What exactly is a p value?

The p value, or probability value, tells you how likely it is that your data could have occurred under the null hypothesis. It does this by calculating the likelihood of your test statistic, which is the number calculated by a statistical test using your data.

The p value tells you how often you would expect to see a test statistic as extreme or more extreme than the one calculated by your statistical test if the null hypothesis of that test were true. The p value gets smaller as the test statistic calculated from your data gets further away from the range of test statistics predicted by the null hypothesis.

The p value is a proportion: if your p value is 0.05, that means that 5% of the time you would see a test statistic at least as extreme as the one you found if the null hypothesis was true.

How do you calculate the p value?

P values are usually automatically calculated by your statistical program (R, SPSS, etc.).

You can also find tables for estimating the p value of your test statistic online. These tables show, based on the test statistic and degrees of freedom (number of observations minus number of independent variables) of your test, how frequently you would expect to see that test statistic under the null hypothesis.

The calculation of the p value depends on the statistical test you are using to test your hypothesis:

  • Different statistical tests have different assumptions and generate different test statistics. You should choose the statistical test that best fits your data and matches the effect or relationship you want to test.
  • The number of independent variables you include in your test changes how large or small the test statistic needs to be to generate the same p value.

No matter what test you use, the p value always describes the same thing: how often you can expect to see a test statistic as extreme or more extreme than the one calculated from your test.
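
For instance, a two-tailed p value can be computed directly from a t statistic and its degrees of freedom. A minimal Python sketch with SciPy (the test statistic and degrees of freedom are hypothetical):

    from scipy.stats import t

    t_stat, df = 2.3, 24              # hypothetical test statistic and degrees of freedom
    p = 2 * t.sf(abs(t_stat), df)     # sf is the survival function, 1 - cdf
    print(round(p, 3))                # about 0.03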

P values and statistical significance

P values are most often used by researchers to say whether a certain pattern they have measured is statistically significant.

Statistical significance is another way of saying that the p value of a statistical test is small enough to reject the null hypothesis of the test.

How small is small enough? The most common threshold is p < 0.05; that is, when you would expect to find a test statistic as extreme as the one calculated by your test only 5% of the time. But the threshold depends on your field of study – some fields prefer thresholds of 0.01, or even 0.001.

The threshold value for determining statistical significance is also known as the alpha value.


Reporting p values

P values of statistical tests are usually reported in the results section of a research paper, along with the key information needed for readers to put the p values in context – for example, the correlation coefficient in a linear regression, or the average difference between treatment groups in a t test.

Caution when using p values

P values are often interpreted as your risk of rejecting the null hypothesis of your test when the null hypothesis is actually true.

In reality, the risk of rejecting the null hypothesis is often higher than the p value, especially when looking at a single study or when using small sample sizes. This is because the smaller your frame of reference, the greater the chance that you stumble across a statistically significant pattern completely by accident.

P values are also often interpreted as supporting or refuting the alternative hypothesis. This is not the case. The p value can only tell you how consistent your data are with the null hypothesis. It cannot tell you whether your alternative hypothesis is true, or why.


Frequently asked questions about p-values

What is a p-value?

A p-value, or probability value, is a number describing how likely it is that your data would have occurred under the null hypothesis of your statistical test.

How do you calculate a p-value?

P-values are usually automatically calculated by the program you use to perform your statistical test. They can also be estimated using p-value tables for the relevant test statistic.

P-values are calculated from the null distribution of the test statistic. They tell you how often a test statistic is expected to occur under the null hypothesis of the statistical test, based on where it falls in the null distribution.

If the test statistic is far from the mean of the null distribution, then the p-value will be small, showing that the test statistic is not likely to have occurred under the null hypothesis.

What is statistical significance?

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p-value, or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data is likely to occur less than 5% of the time under the null hypothesis.

When the p-value falls below the chosen alpha value, then we say the result of the test is statistically significant.

Does a p-value tell you whether your alternative hypothesis is true?

No. The p-value only tells you how likely the data you have observed is to have occurred under the null hypothesis.

If the p-value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.


p-value Calculator


Welcome to our p-value calculator! You will never again have to wonder how to find the p-value, as here you can determine the one-sided and two-sided p-values from test statistics, following all the most popular distributions: normal, t-Student, chi-squared, and Snedecor's F.

P-values appear all over science, yet many people find the concept a bit intimidating. Don't worry – in this article, we will explain not only what the p-value is but also how to interpret p-values correctly. Have you ever been curious about how to calculate the p-value by hand? We provide you with all the necessary formulae as well!

🙋 If you want to revise some basics from statistics, our normal distribution calculator is an excellent place to start.

What is p-value?

Formally, the p-value is the probability that the test statistic will produce values at least as extreme as the value it produced for your sample. It is crucial to remember that this probability is calculated under the assumption that the null hypothesis H₀ is true!

More intuitively, p-value answers the question:

Assuming that I live in a world where the null hypothesis holds, how probable is it that, for another sample, the test I'm performing will generate a value at least as extreme as the one I observed for the sample I already have?

It is the alternative hypothesis that determines what "extreme" actually means, so the p-value depends on the alternative hypothesis that you state: left-tailed, right-tailed, or two-tailed. In the formulas below, S stands for a test statistic, x for the value it produced for a given sample, and Pr(event | H₀) is the probability of an event, calculated under the assumption that H₀ is true:

Left-tailed test: p-value = Pr(S ≤ x | H₀)

Right-tailed test: p-value = Pr(S ≥ x | H₀)

Two-tailed test: p-value = 2 × min{Pr(S ≤ x | H₀), Pr(S ≥ x | H₀)}

(By min{a,b}, we denote the smaller number out of a and b.)

If the distribution of the test statistic under H₀ is symmetric about 0, then: p-value = 2 × Pr(S ≥ |x| | H₀), or, equivalently: p-value = 2 × Pr(S ≤ -|x| | H₀)

As a picture is worth a thousand words, let us illustrate these definitions. Here, we use the fact that the probability can be neatly depicted as the area under the density curve for a given distribution. We give two sets of pictures: one for a symmetric distribution and the other for a skewed (non-symmetric) distribution.

  • Symmetric case: normal distribution:

p-values for symmetric distribution — left-tailed, right-tailed, and two-tailed tests.

  • Non-symmetric case: chi-squared distribution:

p-values for non-symmetric distribution — left-tailed, right-tailed, and two-tailed tests.

In the last picture (two-tailed p-value for skewed distribution), the area of the left-hand side is equal to the area of the right-hand side.

How do I calculate p-value from test statistic?

To determine the p-value, you need to know the distribution of your test statistic under the assumption that the null hypothesis is true. Then, with the help of the cumulative distribution function (cdf) of this distribution, we can express the probability of the test statistic being at least as extreme as its value x for the sample:

Left-tailed test: p-value = cdf(x).

Right-tailed test: p-value = 1 - cdf(x).

Two-tailed test: p-value = 2 × min{cdf(x), 1 - cdf(x)}.

If the distribution of the test statistic under H₀ is symmetric about 0, then a two-sided p-value can be simplified to p-value = 2 × cdf(-|x|), or, equivalently, p-value = 2 - 2 × cdf(|x|).

The probability distributions that are most widespread in hypothesis testing tend to have complicated cdf formulae, and finding the p-value by hand may not be possible. You'll likely need to resort to a computer or to a statistical table, where people have gathered approximate cdf values.
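
Indeed, a computer handles this in a few lines. Below is a minimal Python sketch of the cdf-based formulas above (the helper function is our own), using the standard normal distribution as the example cdf:

    from scipy.stats import norm

    def p_value(x, cdf, tail="two-tailed"):
        # p value of test statistic x, given the cdf of its null distribution
        if tail == "left-tailed":
            return cdf(x)
        if tail == "right-tailed":
            return 1 - cdf(x)
        return 2 * min(cdf(x), 1 - cdf(x))     # two-tailed

    print(round(p_value(1.96, norm.cdf), 3))   # about 0.05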

Well, you now know how to calculate the p-value, but… why do you need to calculate this number in the first place? In hypothesis testing, the p-value approach is an alternative to the critical value approach. Recall that the latter requires researchers to pre-set the significance level, α, which is the probability of rejecting the null hypothesis when it is true (i.e., the probability of a type I error). Once you have your p-value, you just need to compare it with any given α to quickly decide whether or not to reject the null hypothesis at that significance level. For details, check the next section, where we explain how to interpret p-values.

How to interpret p-value

As we have mentioned above, the p-value answers the following question: assuming that the null hypothesis holds, how probable is it that, for another sample, the test will generate a value at least as extreme as the one observed for the sample you already have?

What does that mean for you? Well, you've got two options:

  • A high p-value means that your data is highly compatible with the null hypothesis; and
  • A small p-value provides evidence against the null hypothesis , as it means that your result would be very improbable if the null hypothesis were true.

However, it may happen that the null hypothesis is true, but your sample is highly unusual! For example, imagine we studied the effect of a new drug and got a p-value of 0.03. This means that in 3% of similar studies, random chance alone would still be able to produce the value of the test statistic that we obtained, or a value even more extreme, even if the drug had no effect at all!

The question "what is p-value" can also be answered as follows: p-value is the smallest level of significance at which the null hypothesis would be rejected. So, if you now want to make a decision on the null hypothesis at some significance level α , just compare your p-value with α :

  • If p-value ≤ α , then you reject the null hypothesis and accept the alternative hypothesis; and
  • If p-value ≥ α , then you don't have enough evidence to reject the null hypothesis.

Obviously, the fate of the null hypothesis depends on α . For instance, if the p-value was 0.03 , we would reject the null hypothesis at a significance level of 0.05 , but not at a level of 0.01 . That's why the significance level should be stated in advance and not adapted conveniently after the p-value has been established! A significance level of 0.05 is the most common value, but there's nothing magical about it. Here, you can see what too strong a faith in the 0.05 threshold can lead to. It's always best to report the p-value, and allow the reader to make their own conclusions.

Also, bear in mind that subject-area expertise (and common sense) is crucial. Otherwise, by mindlessly applying statistical procedures, you can easily arrive at a statistically significant result even when the conclusion is completely untrue.

How to use the p-value calculator to find p-value from test statistic

As our p-value calculator is here at your service, you no longer need to wonder how to find p-value from all those complicated test statistics! Here are the steps you need to follow:

  • Pick the alternative hypothesis: two-tailed, right-tailed, or left-tailed.
  • Tell us the distribution of your test statistic under the null hypothesis: is it N(0,1), t-Student, chi-squared, or Snedecor's F? If you are unsure, check the sections below, as they are devoted to these distributions.
  • If needed, specify the degrees of freedom of the test statistic's distribution.
  • Enter the value of the test statistic computed for your data sample.
  • By default, the calculator uses a significance level of 0.05.

Our calculator determines the p-value from the test statistic and provides the decision to be made about the null hypothesis.

How do I find p-value from z-score?

In terms of the cumulative distribution function (cdf) of the standard normal distribution, which is traditionally denoted by Φ, the p-value for a z-score Z is given by the following formulae:

Left-tailed z-test: p-value = Φ(Z)

Right-tailed z-test: p-value = 1 - Φ(Z)

Two-tailed z-test: p-value = 2 × Φ(−|Z|) or, equivalently, p-value = 2 - 2 × Φ(|Z|)

🙋 To learn more about Z-tests, head to Omni's Z-test calculator.

We use the Z-score if the test statistic approximately follows the standard normal distribution N(0,1). Thanks to the central limit theorem, you can count on the approximation if you have a large sample (say at least 50 data points) and treat your distribution as normal.

A Z-test most often refers to testing the population mean , or the difference between two population means, in particular between two proportions. You can also find Z-tests in maximum likelihood estimations.
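
In Python with SciPy, the three z-test formulas look as follows (the z-score value is hypothetical):

    from scipy.stats import norm

    z = 1.64                          # hypothetical z-score
    p_left = norm.cdf(z)              # left-tailed, about 0.95
    p_right = 1 - norm.cdf(z)         # right-tailed, about 0.05
    p_two = 2 * norm.cdf(-abs(z))     # two-tailed, about 0.10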

How do I find p-value from t?

The p-value from the t-score is given by the following formulae, in which cdf_t,d stands for the cumulative distribution function of the t-Student distribution with d degrees of freedom, and t for your t-score:

Left-tailed t-test: p-value = cdf_t,d(t)

Right-tailed t-test: p-value = 1 - cdf_t,d(t)

Two-tailed t-test: p-value = 2 × cdf_t,d(−|t|) or, equivalently, p-value = 2 - 2 × cdf_t,d(|t|)

Use the t-score option if your test statistic follows the t-Student distribution. This distribution has a shape similar to N(0,1) (bell-shaped and symmetric) but has heavier tails – the exact shape depends on the parameter called the degrees of freedom. If the number of degrees of freedom is large (>30), which generically happens for large samples, the t-Student distribution is practically indistinguishable from the normal distribution N(0,1).

The most common t-tests are those for population means with an unknown population standard deviation, or for the difference between means of two populations, with either equal or unequal yet unknown population standard deviations. There's also a t-test for paired (dependent) samples.

🙋 To get more insights into t-statistics, we recommend using our t-test calculator.
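
The t-test formulas translate to Python with SciPy as follows (the t-score and degrees of freedom are hypothetical):

    from scipy.stats import t

    t_score, d = 2.1, 15                   # hypothetical t-score and degrees of freedom
    p_two = 2 * t.cdf(-abs(t_score), d)    # two-tailed p value, about 0.05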

p-value from chi-square score (χ² score)

Use the χ²-score option when performing a test in which the test statistic follows the χ²-distribution.

This distribution arises if, for example, you take the sum of squared variables, each following the normal distribution N(0,1). Remember to check the number of degrees of freedom of the χ²-distribution of your test statistic!

How do you find the p-value from the χ²-score? With the help of the following formulae, in which cdf_χ²,d denotes the cumulative distribution function of the χ²-distribution with d degrees of freedom:

Left-tailed χ²-test: p-value = cdf_χ²,d(χ²)

Right-tailed χ²-test: p-value = 1 - cdf_χ²,d(χ²)

Remember that χ²-tests for goodness-of-fit and independence are right-tailed tests! (see below)

Two-tailed χ²-test: p-value = 2 × min{cdf_χ²,d(χ²), 1 - cdf_χ²,d(χ²)}

(By min{a,b}, we denote the smaller of the numbers a and b.)

The most popular tests which lead to a χ²-score are the following:

  • Testing whether the variance of normally distributed data has some pre-determined value. In this case, the test statistic has the χ²-distribution with n - 1 degrees of freedom, where n is the sample size. This can be a one-tailed or two-tailed test.
  • The goodness-of-fit test checks whether the empirical (sample) distribution agrees with some expected probability distribution. In this case, the test statistic follows the χ²-distribution with k - 1 degrees of freedom, where k is the number of classes into which the sample is divided. This is a right-tailed test.
  • The independence test is used to determine if there is a statistically significant relationship between two variables. In this case, its test statistic is based on the contingency table and follows the χ²-distribution with (r - 1)(c - 1) degrees of freedom, where r is the number of rows and c is the number of columns in this contingency table. This also is a right-tailed test.
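
A right-tailed χ²-test, the most common case above, in Python with SciPy (the χ²-score and degrees of freedom are hypothetical):

    from scipy.stats import chi2

    chi2_score, d = 11.07, 5                 # hypothetical χ²-score and degrees of freedom
    p_right = 1 - chi2.cdf(chi2_score, d)    # right-tailed p value, about 0.05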

p-value from F-score

Finally, the F-score option should be used when you perform a test in which the test statistic follows the F-distribution, also known as the Fisher–Snedecor distribution. The exact shape of an F-distribution depends on two degrees of freedom.

To see where those degrees of freedom come from, consider the independent random variables X and Y, which both follow the χ²-distributions with d₁ and d₂ degrees of freedom, respectively. In that case, the ratio (X/d₁)/(Y/d₂) follows the F-distribution with (d₁, d₂) degrees of freedom. For this reason, the two parameters d₁ and d₂ are also called the numerator and denominator degrees of freedom.

The p-value from the F-score is given by the following formulae, where we let cdf_F,d₁,d₂ denote the cumulative distribution function of the F-distribution with (d₁, d₂) degrees of freedom:

Left-tailed F-test: p-value = cdf_F,d₁,d₂(F)

Right-tailed F-test: p-value = 1 - cdf_F,d₁,d₂(F)

Two-tailed F-test: p-value = 2 × min{cdf_F,d₁,d₂(F), 1 - cdf_F,d₁,d₂(F)}

Below we list the most important tests that produce F-scores. All of them are right-tailed tests.

  • A test for the equality of variances in two normally distributed populations. Its test statistic follows the F-distribution with (n - 1, m - 1) degrees of freedom, where n and m are the respective sample sizes.
  • ANOVA is used to test the equality of means in three or more groups that come from normally distributed populations with equal variances. We arrive at the F-distribution with (k - 1, n - k) degrees of freedom, where k is the number of groups, and n is the total sample size (in all groups together).
  • A test for the overall significance of a regression analysis. The test statistic has an F-distribution with (k - 1, n - k) degrees of freedom, where n is the sample size, and k is the number of variables (including the intercept). Once the presence of a linear relationship has been established in your data sample with this test, you can calculate the coefficient of determination, R², which indicates the strength of the relationship. You can do it by hand or use our coefficient of determination calculator.
  • A test to compare two nested regression models. The test statistic follows the F-distribution with (k₂ - k₁, n - k₂) degrees of freedom, where k₁ and k₂ are the numbers of variables in the smaller and bigger models, respectively, and n is the sample size.

You may notice that the F-test of overall significance is a particular form of the F-test for comparing two nested models: it tests whether our model does significantly better than the model with no predictors (i.e., the intercept-only model).
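
A right-tailed F-test in Python with SciPy (the F-score and both degrees of freedom are hypothetical):

    from scipy.stats import f

    f_score, d1, d2 = 3.5, 2, 27          # hypothetical F-score, numerator and denominator df
    p_right = 1 - f.cdf(f_score, d1, d2)  # right-tailed p value, about 0.044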

Can p-value be negative?

No, the p-value cannot be negative. This is because probabilities cannot be negative, and the p-value is the probability of the test statistic satisfying certain conditions.

What does a high p-value mean?

A high p-value means that under the null hypothesis, there's a high probability that for another sample, the test statistic will generate a value at least as extreme as the one observed in the sample you already have. A high p-value doesn't allow you to reject the null hypothesis.

What does a low p-value mean?

A low p-value means that under the null hypothesis, there's little probability that for another sample, the test statistic will generate a value at least as extreme as the one observed for the sample you already have. A low p-value is evidence in favor of the alternative hypothesis – it allows you to reject the null hypothesis.


Significance level α

P-Value And Statistical Significance: What It Is & Why It Matters

Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


The p-value in statistics quantifies the evidence against a null hypothesis. A low p-value suggests the observed data are inconsistent with the null hypothesis, potentially favoring an alternative hypothesis. Common significance thresholds are 0.05 and 0.01.

P-Value Explained in Normal Distribution

Hypothesis testing

When you perform a statistical test, a p-value helps you determine the significance of your results in relation to the null hypothesis.

The null hypothesis (H0) states that no relationship exists between the two variables being studied (one variable does not affect the other). It states that the results are due to chance and do not support the idea being investigated. In other words, the null hypothesis assumes that the effect you are trying to demonstrate does not exist.

The alternative hypothesis (Ha or H1) is the one you would accept if the null hypothesis is rejected.

The alternative hypothesis states that the independent variable affected the dependent variable, and the results are significant in supporting the theory being investigated (i.e., the results are not due to random chance).

What a p-value tells you

A p-value, or probability value, is a number describing the probability of obtaining data at least as extreme as yours by random chance alone, that is, assuming the null hypothesis is true.

The level of statistical significance is often expressed as a p-value between 0 and 1.

The smaller the p-value, the less compatible the results are with random chance alone, and the stronger the evidence that you should reject the null hypothesis.

Remember, a p-value doesn’t tell you if the null hypothesis is true or false. It just tells you how likely you’d see the data you observed (or more extreme data) if the null hypothesis was true. It’s a piece of evidence, not a definitive proof.

Example: Test Statistic and p-Value

Suppose you’re conducting a study to determine whether a new drug has an effect on pain relief compared to a placebo. If the new drug has no impact, your test statistic will be close to the one predicted by the null hypothesis (no difference between the drug and placebo groups), and the resulting p-value will be close to 1. It may not be precisely 1 because real-world variations may exist. Conversely, if the new drug indeed reduces pain significantly, your test statistic will diverge further from what’s expected under the null hypothesis, and the p-value will decrease. The p-value will never reach zero because there’s always a slim possibility, though highly improbable, that the observed results occurred by random chance.

P-value interpretation

The significance level (alpha) is a set probability threshold (often 0.05), while the p-value is the probability you calculate based on your study or analysis.

A p-value less than or equal to your predetermined significance level (often 0.05 or 0.01) indicates a statistically significant result, meaning the observed data provide strong evidence against the null hypothesis.

This suggests the effect under study likely represents a real relationship rather than just random chance.

For instance, if you set α = 0.05, you would reject the null hypothesis if your p -value ≤ 0.05. 

It indicates strong evidence against the null hypothesis, as there is less than a 5% probability of obtaining results at least this extreme if the null hypothesis were true.

Therefore, we reject the null hypothesis in favor of the alternative hypothesis.

Example: Statistical Significance

Upon analyzing the pain relief effects of the new drug compared to the placebo, the computed p-value is less than 0.01, which falls well below the predetermined alpha value of 0.05. Consequently, you conclude that there is a statistically significant difference in pain relief between the new drug and the placebo.

What does a p-value of 0.001 mean?

A p-value of 0.001 is highly statistically significant beyond the commonly used 0.05 threshold. It indicates strong evidence of a real effect or difference, rather than just random variation.

Specifically, a p-value of 0.001 means there is only a 0.1% chance of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is correct.

Such a small p-value provides strong evidence against the null hypothesis, leading to rejecting the null in favor of the alternative hypothesis.

A p-value greater than the significance level (typically p > 0.05) is not statistically significant. It indicates that the evidence against the null hypothesis is insufficient, not that there is strong evidence for the null hypothesis.

This means we retain the null hypothesis. Note that you cannot accept the null hypothesis; you can only reject it or fail to reject it.

Note: even when the p-value is below your threshold of significance, it does not mean that there is a 95% probability that the alternative hypothesis is true.

One-Tailed Test

[Figure: statistical significance in a one-tailed test (A/B testing example).]

Two-Tailed Test

[Figure: statistical significance in a two-tailed test.]

How do you calculate the p-value?

Most statistical software packages like R, SPSS, and others automatically calculate your p-value. This is the easiest and most common way.

Online resources and tables are available to estimate the p-value based on your test statistic and degrees of freedom.

These tables help you understand how often you would expect to see your test statistic under the null hypothesis.
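
As a hedged illustration of this step, here is a minimal base R sketch that converts a test statistic into a two-tailed p-value; the statistic values and degrees of freedom are made up for the example.

```r
# Hypothetical test statistics, chosen only for illustration
z <- 2.1                      # a z test statistic
t_stat <- 2.1                 # a t test statistic
df <- 24                      # its degrees of freedom

# Two-tailed p-value: the probability of a statistic at least this
# extreme in either tail, assuming the null hypothesis is true
2 * pnorm(-abs(z))            # z test: ~0.036
2 * pt(-abs(t_stat), df)      # t test: ~0.046
```

Note that the same statistic yields a slightly larger p-value under the t distribution, whose heavier tails reflect the extra uncertainty from estimating the standard deviation.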

Understanding the Statistical Test:

Different statistical tests are designed to answer specific research questions or hypotheses. Each test has its own underlying assumptions and characteristics.

For example, you might use a t-test to compare means, a chi-squared test for categorical data, or a correlation test to measure the strength of a relationship between variables.

Be aware that the number of independent variables you include in your analysis can influence the magnitude of the test statistic needed to produce the same p-value.

This factor is particularly important to consider when comparing results across different analyses.

Example: Choosing a Statistical Test

If you’re comparing the effectiveness of just two different drugs in pain relief, a two-sample t-test is a suitable choice for comparing these two groups. However, when you’re examining the impact of three or more drugs, it’s more appropriate to employ an Analysis of Variance ( ANOVA) . Utilizing multiple pairwise comparisons in such cases can lead to artificially low p-values and an overestimation of the significance of differences between the drug groups.
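
A minimal R sketch of this choice, using simulated pain-relief scores; the group means and sizes are assumptions made for the example:

```r
set.seed(1)
drugA <- rnorm(30, mean = 5.0)   # simulated scores for three hypothetical drugs
drugB <- rnorm(30, mean = 4.5)
drugC <- rnorm(30, mean = 4.4)

# Two groups: a two-sample t-test
t.test(drugA, drugB)

# Three or more groups: a one-way ANOVA instead of repeated pairwise t-tests
scores <- data.frame(
  relief = c(drugA, drugB, drugC),
  drug   = factor(rep(c("A", "B", "C"), each = 30))
)
summary(aov(relief ~ drug, data = scores))
```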

How to report

A statistically significant result cannot prove that a research hypothesis is correct (which implies 100% certainty).

Instead, we may state our results “provide support for” or “give evidence for” our research hypothesis (as there is still a slight probability that the results occurred by chance and the null hypothesis was correct – e.g., less than 5%).

Example: Reporting the results

In our comparison of the pain relief effects of the new drug and the placebo, we observed that participants in the drug group experienced a significant reduction in pain (M = 3.5, SD = 0.8) compared to those in the placebo group (M = 5.2, SD = 0.7), an average difference of 1.7 points on the pain scale (t(98) = -9.36, p < .001).

The 6th edition of the APA style manual (American Psychological Association, 2010) states the following on the topic of reporting p-values:

“When reporting p values, report exact p values (e.g., p = .031) to two or three decimal places. However, report p values less than .001 as p < .001.

The tradition of reporting p values in the form p < .10, p < .05, p < .01, and so forth, was appropriate in a time when only limited tables of critical values were available.” (p. 114)

  • Do not use 0 before the decimal point for the statistical value p, as it cannot be greater than 1. In other words, write p = .001 instead of p = 0.001 (a small formatting helper is sketched after this list).
  • Please pay attention to issues of italics ( p is always italicized) and spacing (either side of the = sign).
  • p = .000 (as outputted by some statistical packages such as SPSS) is impossible and should be written as p < .001.
  • The opposite of significant is “nonsignificant,” not “insignificant.”
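
As promised above, a small R sketch of these conventions; apa_p is a made-up helper name, not a standard function:

```r
# Format p-values per the APA conventions above (illustrative helper)
apa_p <- function(p) {
  ifelse(p < .001,
         "p < .001",                                        # never report p = .000
         paste0("p = ", sub("^0", "", sprintf("%.3f", p)))) # drop the leading 0
}

apa_p(c(0.031, 0.0004, 0.2))
# "p = .031"  "p < .001"  "p = .200"
```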

Why is the p -value not enough?

A lower p-value  is sometimes interpreted as meaning there is a stronger relationship between two variables.

However, statistical significance only means that the observed data would be unlikely (e.g., a less than 5% chance) if the null hypothesis were true. It says nothing about the size of the effect.

To understand the strength of the difference between the two groups (control vs. experimental) a researcher needs to calculate the effect size .
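
For instance, here is a minimal sketch of one common effect size, Cohen's d for two independent groups. The helper is illustrative, and the group sizes of 50 are an assumption read off the df of 98 in the reporting example above.

```r
# Cohen's d with a pooled standard deviation (illustrative helper)
cohens_d <- function(x, y) {
  nx <- length(x); ny <- length(y)
  sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / sp
}

# From the summary statistics in the reporting example (M = 3.5 vs 5.2,
# SD = 0.8 and 0.7, n = 50 per group), the pooled SD is ~0.75, so
# d ~ (3.5 - 5.2) / 0.75 ~ -2.3: a very large effect.
```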

When do you reject the null hypothesis?

In statistical hypothesis testing, you reject the null hypothesis when the p-value is less than or equal to the significance level (α) you set before conducting your test. The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.

Remember, rejecting the null hypothesis doesn’t prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.

The p -value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.

What does p-value of 0.05 mean?

If your p-value is less than or equal to 0.05 (the significance level), you would conclude that your result is statistically significant. This means the evidence is strong enough to reject the null hypothesis in favor of the alternative hypothesis.

Are all p-values below 0.05 considered statistically significant?

Not necessarily. The threshold of 0.05 is commonly used, but it is just a convention; whether a p-value counts as statistically significant depends on the significance level you set before the test. How meaningful a significant result is also depends on factors like the study design, sample size, and the magnitude of the observed effect.

A p-value below 0.05 means there is evidence against the null hypothesis, suggesting a real effect. However, it’s essential to consider the context and other factors when interpreting results.

Researchers also look at effect size and confidence intervals to determine the practical significance and reliability of findings.

How does sample size affect the interpretation of p-values?

Sample size can impact the interpretation of p-values. A larger sample size provides more reliable and precise estimates of the population, leading to narrower confidence intervals.

With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values. In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.

Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.
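
A quick simulation sketch of this point; the effect size of 0.2 SD and the two sample sizes are assumptions chosen for illustration:

```r
set.seed(42)
# The same small true effect (0.2 SD) tested at two sample sizes
p_small <- replicate(1000, t.test(rnorm(20,  mean = 0.2), rnorm(20))$p.value)
p_large <- replicate(1000, t.test(rnorm(500, mean = 0.2), rnorm(500))$p.value)

mean(p_small < 0.05)   # low power: only a small fraction of runs significant
mean(p_large < 0.05)   # high power: most runs significant
```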

Can a non-significant p-value indicate that there is no effect or difference in the data?

No, a non-significant p-value does not necessarily indicate that there is no effect or difference in the data. It means that the observed data do not provide strong enough evidence to reject the null hypothesis.

There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.

Other factors like sample size, study design, and measurement precision can influence the p-value. It’s important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.

Can P values be exactly zero?

While a p-value can be extremely small, it cannot be exactly zero. When a p-value is reported as p = 0.000, the actual p-value is simply too small for the software to display. This is often interpreted as strong evidence against the null hypothesis. For p-values less than 0.001, report p < .001.

Further Information

  • P Value Calculator From T Score
  • P-Value Calculator For Chi-Square
  • P-values and significance tests (Khan Academy)
  • Hypothesis testing and p-values (Khan Academy)
  • Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond "p < 0.05". The American Statistician, 73(sup1), 1-19.
  • Criticism of using the "p < 0.05" threshold
  • Publication manual of the American Psychological Association
  • Statistics for Psychology Book Download



Interpret the key results for Normality Test

In this topic: Step 1: Determine whether the data do not follow a normal distribution. Step 2: Visualize the fit of the normal distribution.


Key Result: P-Value

In these results, the null hypothesis states that the data follow a normal distribution. Because the p-value is 0.463, which is greater than the significance level of 0.05, the decision is to fail to reject the null hypothesis. You cannot conclude that the data do not follow a normal distribution.

[Figures: examples of right-skewed and left-skewed data.]

In Minitab, hold your pointer over the fitted distribution line to see a chart of percentiles and values.

In this probability plot, the data form an approximately straight line along the line. The normal distribution appears to be a good fit to the data.


9.3 Probability Distribution Needed for Hypothesis Testing

Earlier in the course, we discussed sampling distributions. Particular distributions are associated with various types of hypothesis testing.

The following table summarizes various hypothesis tests and corresponding probability distributions that will be used to conduct the test (based on the assumptions shown below):

Type of Hypothesis Test | Population Parameter | Point Estimate | Probability Distribution Used
Test for the mean, when the population standard deviation is known | Population mean μ | Sample mean | Normal (Z) distribution
Test for the mean, when the population standard deviation is unknown and the distribution of the sample mean is approximately normal | Population mean μ | Sample mean | Student's t-distribution
Test for proportions | Population proportion p | Sample proportion | Normal (Z) distribution

Assumptions

When you perform a hypothesis test of a single population mean μ using a normal distribution (often called a z-test), you take a simple random sample from the population. The population you are testing is normally distributed , or your sample size is sufficiently large. You know the value of the population standard deviation , which, in reality, is rarely known.

When you perform a hypothesis test of a single population mean μ using a Student's t-distribution (often called a t -test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed. You use the sample standard deviation to approximate the population standard deviation. (Note that if the sample size is sufficiently large, a t -test will work even if the population is not approximately normally distributed).

When you perform a hypothesis test of a single population proportion p, you take a simple random sample from the population. You must meet the conditions for a binomial distribution: there are a certain number n of independent trials, the outcomes of any trial are success or failure, and each trial has the same probability of success p. The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (\(np > 5\) and \(nq > 5\)). Then the binomial distribution of the sample (estimated) proportion can be approximated by the normal distribution with \(\mu = p\) and \(\sigma = \sqrt{\dfrac{pq}{n}}\). Remember that \(q = 1 - p\).
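
A quick R sketch of checking these conditions; the sample size and hypothesized proportion are hypothetical:

```r
n  <- 120
p0 <- 0.30                     # hypothesized population proportion
c(n * p0, n * (1 - p0))        # 36 and 84: both greater than 5, so the
                               # normal approximation is reasonable
sqrt(p0 * (1 - p0) / n)        # sigma of the sample proportion, ~0.042
```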

Hypothesis Test for the Mean

Going back to the standardizing formula, we can derive the test statistic for testing hypotheses concerning means.

The standardizing formula cannot be solved as it is because we do not have μ, the population mean. However, if we substitute the hypothesized value of the mean, \( \mu_0 \), into the formula, we can compute a Z value. This is the test statistic for a test of hypothesis for a mean, presented in Figure 9.3:

\( Z_c = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \)

This \( Z_c \), the "calculated" Z, measures how far a sample mean \( \bar{X} \) lies from the population mean hypothesized in \( H_0 \). Figure 9.3 and Figure 9.4 show this process.

In Figure 9.3, two of the three possible outcomes are presented. \( \bar{X}_1 \) and \( \bar{X}_3 \) are in the tails of the hypothesized distribution of \( H_0 \). Notice that the horizontal axis in the top panel is labeled in \( \bar{X} \)'s. This is the theoretical sampling distribution of \( \bar{X} \), which the Central Limit Theorem tells us is normally distributed; this is why we can draw it with this shape. The horizontal axis of the bottom panel is labeled Z and is the standard normal distribution. \( Z_{\alpha/2} \) and \( -Z_{\alpha/2} \), called the critical values, are marked on the bottom panel as the Z values associated with the probability the analyst has set as the level of significance of the test, α. The probabilities in the tails of both panels are, therefore, the same.

Notice that for each \( \bar{X} \) there is an associated \( Z_c \) that comes from solving the equation above. This calculated Z is nothing more than the number of standard deviations that the hypothesized mean is from the sample mean. If the sample mean falls "too many" standard deviations from the hypothesized mean, we conclude that the sample mean could not have come from the distribution with the hypothesized mean, given our pre-set level of significance. It could have come from \( H_0 \), but it is deemed just too unlikely. In Figure 9.3, both \( \bar{X}_1 \) and \( \bar{X}_3 \) are in the tails of the distribution. They are deemed "too far" from the hypothesized value of the mean given the chosen level of alpha. If the sample mean did in fact come from \( H_0 \), just from one of its tails, then we have made a Type I error: we have rejected a good null. Our only real comfort is that we know the probability of making such an error, α, and that we can control the size of α.

Figure 9.4 shows the third possibility for the location of the sample mean, \( \bar{X} \). Here the sample mean is within the two critical values, that is, within the probability of (1 − α), and we cannot reject the null hypothesis.

This gives us the decision rule for testing a hypothesis for a two-tailed test:

Decision rule: two-tailed test
If \( |Z_c| < Z_{\alpha/2} \): do not REJECT \( H_0 \)
If \( |Z_c| > Z_{\alpha/2} \): REJECT \( H_0 \)

This rule will always be the same no matter what hypothesis we are testing or what formulas we are using to make the test. The only change will be to change the Z c to the appropriate symbol for the test statistic for the parameter being tested. Stating the decision rule another way: if the sample mean is unlikely to have come from the distribution with the hypothesized mean we cannot accept the null hypothesis. Here we define "unlikely" as having a probability less than alpha of occurring.
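
A minimal sketch of this rule in R, with a hypothetical calculated statistic:

```r
alpha  <- 0.05
z_crit <- qnorm(1 - alpha / 2)            # critical value, ~1.96
z_c    <- 2.3                             # hypothetical calculated Z

if (abs(z_c) > z_crit) "cannot accept H0" else "cannot reject H0"
```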

P-Value Approach

An alternative decision rule can be developed by calculating the probability that a sample mean could be found that would give a test statistic larger than the test statistic found from the current sample data assuming that the null hypothesis is true. Here the notion of "likely" and "unlikely" is defined by the probability of drawing a sample with a mean from a population with the hypothesized mean that is either larger or smaller than that found in the sample data. Simply stated, the p-value approach compares the desired significance level, α, to the p-value which is the probability of drawing a sample mean further from the hypothesized value than the actual sample mean. A large p -value calculated from the data indicates that we should not reject the null hypothesis . The smaller the p -value, the more unlikely the outcome, and the stronger the evidence is against the null hypothesis. We would reject the null hypothesis if the evidence is strongly against it. The relationship between the decision rule of comparing the calculated test statistics, Z c , and the Critical Value, Z α , and using the p -value can be seen in Figure 9.5 .

The calculated value of the test statistic is Z c in this example and is marked on the bottom graph of the standard normal distribution because it is a Z value. In this case the calculated value is in the tail and thus we cannot accept the null hypothesis, the associated X ¯ X ¯ is just too unusually large to believe that it came from the distribution with a mean of µ 0 with a significance level of α.

If we use the p-value decision rule, we need one more step. We find in the standard normal table the probability associated with the calculated test statistic, Zc, and compare that to the α associated with our selected level of significance. In Figure 9.5 we see that the p-value is less than α and therefore we cannot accept the null: the tail area beyond Zc is smaller than α/2, so the associated \( \bar{X} \) is just too unusually large to believe that it came from the distribution with a mean of µ0. It is important to note that two researchers drawing randomly from the same population may find two different p-values from their samples. This occurs because the p-value is calculated as the probability in the tail beyond the sample mean assuming that the null hypothesis is correct, and the two sample means will in all likelihood differ. Nevertheless, the two researchers' conclusions about the null hypothesis should differ only in borderline cases, with a probability governed by α.

Here is a systematic way to make a decision of whether you cannot accept or cannot reject a null hypothesis if using the p -value and a preset or preconceived α (the " significance level "). A preset α is the probability of a Type I error (rejecting the null hypothesis when the null hypothesis is true). It may or may not be given to you at the beginning of the problem. In any case, the value of α is the decision of the analyst. When you make a decision to reject or not reject H 0 , do as follows:

  • If α > p -value, cannot accept H 0 . The results of the sample data are significant. There is sufficient evidence to conclude that H 0 is an incorrect belief and that the alternative hypothesis , H a , may be correct.
  • If α ≤ p -value, cannot reject H 0 . The results of the sample data are not significant. There is not sufficient evidence to conclude that the alternative hypothesis, H a , may be correct. In this case the status quo stands.
  • When you "cannot reject H 0 ", it does not mean that you should believe that H 0 is true. It simply means that the sample data have failed to provide sufficient evidence to cast serious doubt about the truthfulness of H 0 . Remember that the null is the status quo and it takes high probability to overthrow the status quo. This bias in favor of the null hypothesis is what gives rise to the statement "tyranny of the status quo" when discussing hypothesis testing and the scientific method.

Both decision rules will result in the same decision and it is a matter of preference which one is used.
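
A small sketch checking this equivalence for the same hypothetical numbers used above:

```r
alpha   <- 0.05
z_c     <- 2.3                            # hypothetical calculated Z
p_value <- 2 * pnorm(-abs(z_c))           # two-tailed p-value, ~0.021

c(critical_value_rule = abs(z_c) > qnorm(1 - alpha / 2),
  p_value_rule        = alpha > p_value)  # both TRUE: same decision
```

Both rules flag the same samples, because comparing \( |Z_c| \) with \( Z_{\alpha/2} \) is the same comparison as comparing the tail area beyond \( Z_c \) with α/2.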

One and Two-tailed Tests

The discussion of Figure 9.3 - Figure 9.5 was based on the null and alternative hypothesis presented in Figure 9.3 . This was called a two-tailed test because the alternative hypothesis allowed that the mean could have come from a population which was either larger or smaller than the hypothesized mean in the null hypothesis. This could be seen by the statement of the alternative hypothesis as μ ≠ 100, in this example.

It may be that the analyst is concerned only about the value being "too high", or only about it being "too low", relative to the hypothesized value. If this is the case, it becomes a one-tailed test, and all of the alpha probability is placed in just one tail rather than split into α/2 as in the case of a two-tailed test. Any test of a claim will be a one-tailed test. For example, a car manufacturer claims that their Model 17B provides gas mileage of greater than 25 miles per gallon. The null and alternative hypothesis would be:

  • H 0 : µ ≤ 25
  • H a : µ > 25

The claim would be in the alternative hypothesis. The burden of proof in hypothesis testing is carried in the alternative. This is because rejecting the null, the status quo, must be done with 90 or 95 percent confidence that it cannot be maintained. Said another way, we want to have only a 5 or 10 percent probability of making a Type I error, rejecting a good null and overthrowing the status quo.


Figure 9.6 shows the two possible cases and the form of the null and alternative hypothesis that give rise to them.

where \( \mu_0 \) is the hypothesized value of the population mean.

Sample size | Test statistic
n < 30 (σ unknown) | \( t_c = \dfrac{\bar{X} - \mu_0}{s / \sqrt{n}} \)
n < 30 (σ known) | \( Z_c = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \)
n > 30 (σ unknown) | \( Z_c = \dfrac{\bar{X} - \mu_0}{s / \sqrt{n}} \)
n > 30 (σ known) | \( Z_c = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \)

Effects of Sample Size on Test Statistic

In developing the confidence intervals for the mean from a sample, we found that most often we would not have the population standard deviation, σ. If the sample size were less than 30, we could simply substitute the point estimate for σ, the sample standard deviation, s, and use the student's t -distribution to correct for this lack of information.

When testing hypotheses we are faced with this same problem and the solution is exactly the same. Namely: If the population standard deviation is unknown, and the sample size is less than 30, substitute s, the point estimate for the population standard deviation, σ, in the formula for the test statistic and use the student's t -distribution. All the formulas and figures above are unchanged except for this substitution and changing the Z distribution to the student's t -distribution on the graph. Remember that the student's t -distribution can only be computed knowing the proper degrees of freedom for the problem. In this case, the degrees of freedom is computed as before with confidence intervals: df = (n-1). The calculated t-value is compared to the t-value associated with the pre-set level of confidence required in the test, t α , df found in the student's t tables. If we do not know σ, but the sample size is 30 or more, we simply substitute s for σ and use the normal distribution.

Table 9.5 summarizes these rules.
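
A sketch of those rules as a small R helper; the function name is illustrative, not standard:

```r
choose_distribution <- function(n, sigma_known) {
  if (sigma_known)   "normal (Z), for any sample size"
  else if (n >= 30)  "normal (Z), substituting s for sigma"
  else               "Student's t with n - 1 degrees of freedom"
}

choose_distribution(25, sigma_known = FALSE)
# "Student's t with n - 1 degrees of freedom"
```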

A Systematic Approach for Testing a Hypothesis

A systematic approach to hypothesis testing follows these steps, in this order. This template will work for all hypotheses that you will ever test.

  • Set up the null and alternative hypothesis. This is typically the hardest part of the process. Review the question being asked: what parameter is being tested (a mean, a proportion, differences in means, etc.), and is this a one-tailed or two-tailed test?
  • Decide the level of significance required for this particular case and determine the critical value from the appropriate statistical table. The levels of confidence typical for businesses are 80, 90, 95, 98, and 99 percent. However, the level of significance is a policy decision and should be based upon the risk of making a Type I error, rejecting a good null; consider the consequences of making such an error. Then, on the basis of the hypotheses and sample size, select the appropriate test statistic and find the relevant critical value: Zα, tα, etc. Drawing the relevant probability distribution and marking the critical value is always a big help. Be sure to match the graph with the hypothesis, especially if it is a one-tailed test.
  • Take a sample (or samples) and calculate the relevant statistics: sample mean, standard deviation, or proportion. Using the formula for the test statistic from step 2, calculate the test statistic for this particular case.
  • Compare the calculated test statistic with the critical value. If the test statistic is in the tail: cannot accept the null; the probability that this sample mean (proportion) came from the hypothesized distribution is too small to believe that it is the real home of these sample data. If the test statistic is not in the tail: cannot reject the null; the sample data are compatible with the hypothesized population parameter.
  • Reach a conclusion. It is best to articulate the conclusion two different ways. First, a formal statistical conclusion such as "With a 5% level of significance we cannot accept the null hypothesis that the population mean is equal to XX (units of measurement)". Second, a less formal statement of the action, or lack of action, required. If the formal conclusion was the one above, then the informal one might be, "The machine is broken and we need to shut it down and call for repairs".

All hypotheses tested will go through this same process. The only changes are the relevant formulas and those are determined by the hypothesis required to answer the original question.

Attribution: Holmes, A., Illowsky, B., & Dean, S. Introductory Business Statistics 2e. OpenStax, Houston, Texas (Dec 13, 2023). Access for free at https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction. Section URL: https://openstax.org/books/introductory-business-statistics-2e/pages/9-3-probability-distribution-needed-for-hypothesis-testing. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License.



Data analysis: hypothesis testing


4.1 The normal distribution

Here, you will look at the concept of normal distribution and the bell-shaped curve. The peak point (the top of the bell) represents the most probable occurrences, while other possible occurrences are distributed symmetrically around the peak point, creating a downward-sloping curve on either side of the peak point.

[Figure: cartoon bell-shaped curve. The x-axis is 'How high the hill is' and the y-axis is 'Number of hills'; the peak is labelled 'Average hill' and the lower right tail 'Big hill'.]

In order to test hypotheses, you need to calculate the test statistic and compare it with the value in the bell curve. This will be done by using the concept of ‘normal distribution’.

A normal distribution is a probability distribution that is symmetric about the mean, indicating that data near the mean are more likely to occur than data far from it. In graph form, a normal distribution appears as a bell curve. The values in the x-axis of the normal distribution graph represent the z-scores. The test statistic that you wish to use to test the set of hypotheses is the z-score . A z-score is used to measure how far the observation (sample mean) is from the 0 value of the bell curve (population mean). In statistics, this distance is measured by standard deviation. Therefore, when the z-score is equal to 2, the observation is 2 standard deviations away from the value 0 in the normal distribution curve.

[Figure: a symmetrical bell-shaped curve peaking at 0 on the x-axis, labelled 'Normal distribution'.]



Why are p-values uniformly distributed under the null hypothesis?

Recently, I have found in a paper by Klammer, et al. a statement that p-values should be uniformly distributed. I believe the authors, but cannot understand why it is so.

Klammer, A. A., Park, C. Y., and Stafford Noble, W. (2009) Statistical Calibration of the SEQUEST XCorr Function . Journal of Proteome Research . 8(4): 2106–2113.


  • 31 $\begingroup$ This is immediate from the definition of the p-value as the probability integral transform of the test statistic using the distribution under the null hypothesis. The conclusion requires that the distribution be continuous. When the distribution is discrete (or has atoms), the distribution of p-values is discrete, too, and therefore can only approximately be uniform. $\endgroup$ –  whuber ♦ Commented May 10, 2011 at 18:46
  • 2 $\begingroup$ @whuber gave the answer which was something I suspected. I asked the original reference just to be sure that something was not lost in translation. Usually it does not matter whether the article is specific or not, statistical content always shows through :) $\endgroup$ –  mpiktas Commented May 10, 2011 at 18:56
  • 14 $\begingroup$ Only when $H_0$ is true ! ... and more strictly, only when continuous (though something like it is true in the non-continuous case; I don't know the right word for the most general case; it's not uniformity). Then it follows from the definition of p-value. $\endgroup$ –  Glen_b Commented Jun 7, 2013 at 1:35
  • 5 $\begingroup$ This could be seen as a variant of the fundamental statistical mechanics principle (that students often have similar difficulty accepting) that all micro-states of a physical system have equal probability. $\endgroup$ –  DWin Commented Jul 21, 2013 at 19:43
  • 7 $\begingroup$ How about the claim in this article: plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0076010 ? $\endgroup$ –  user54876 Commented Aug 28, 2014 at 18:26

5 Answers

To clarify a bit: the p-value is uniformly distributed when the null hypothesis is true and all other assumptions are met. The reason for this is really the definition of alpha as the probability of a type I error. We want the probability of rejecting a true null hypothesis to be alpha, and we reject when the observed $\text{p-value} < \alpha$; the only way this holds for every value of alpha is when the p-value comes from a uniform distribution. The whole point of using the correct distribution (normal, t, F, chi-squared, etc.) is to transform the test statistic into a uniform p-value. If the null hypothesis is false, then the distribution of the p-value will (hopefully) be weighted more towards 0.

The Pvalue.norm.sim and Pvalue.binom.sim functions in the TeachingDemos package for R will simulate several data sets, compute the p-values and plot them to demonstrate this idea.

See Murdoch, D., Tsai, Y., and Adcock, J. (2008). P-Values are Random Variables. The American Statistician, 62, 242–245, for some more details.
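
For readers without the package, here is a hedged base-R sketch of the same idea (not the Pvalue.norm.sim function itself):

```r
set.seed(123)
# Many datasets generated under a true null: p-values are ~uniform
p_null <- replicate(10000, t.test(rnorm(25), mu = 0)$p.value)
hist(p_null, breaks = 20)     # roughly flat on [0, 1]
mean(p_null < 0.05)           # ~0.05, the type I error rate

# Under a false null the p-values pile up near 0 instead
p_alt <- replicate(10000, t.test(rnorm(25, mean = 0.5), mu = 0)$p.value)
mean(p_alt < 0.05)            # well above 0.05: the test's power
```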

Since people are still reading this answer and commenting, I thought that I would address @whuber's comment.

It is true that when using a composite null hypothesis like $\mu_1 \leq \mu_2$ that the p-values will only be uniformly distributed when the 2 means are exactly equal and will not be a uniform if $\mu_1$ is any value that is less than $\mu_2$. This can easily be seen using the Pvalue.norm.sim function and setting it to do a one sided test and simulating with the simulation and hypothesized means different (but in the direction to make the null true).

As far as statistical theory goes, this does not matter. Consider if I claimed that I am taller than every member of your family, one way to test this claim would be to compare my height to the height of each member of your family one at a time. Another option would be to find the member of your family that is the tallest and compare their height with mine. If I am taller than that one person then I am taller than the rest as well and my claim is true, if I am not taller than that one person then my claim is false. Testing a composite null can be seen as a similar process, rather than testing all the possible combinations where $\mu_1 \leq \mu_2$ we can test just the equality part because if we can reject that $\mu_1 = \mu_2$ in favour of $\mu_1 > \mu_2$ then we know that we can also reject all the possibilities of $\mu_1 < \mu_2$. If we look at the distribution of p-values for cases where $\mu_1 < \mu_2$ then the distribution will not be perfectly uniform but will have more values closer to 1 than to 0 meaning that the probability of a type I error will be less than the selected $\alpha$ value making it a conservative test. The uniform becomes the limiting distribution as $\mu_1$ gets closer to $\mu_2$ (the people who are more current on the stat-theory terms could probably state this better in terms of distributional supremum or something like that). So by constructing our test assuming the equal part of the null even when the null is composite, then we are designing our test to have a probability of a type I error that is at most $\alpha$ for any conditions where the null is true.


  • 2 $\begingroup$ The article "P-Values are Random Variables" is really interesting, is there any introductory book that adheres to the principles stated in the article? $\endgroup$ –  Alessandro Jacopson Commented Jun 30, 2011 at 13:09
  • $\begingroup$ @uvts_cvs, I think most intro books follow the general idea, but I don't know of any that make it as explicit as the article. The theory books are more likely to talk about how the p-value is a transform from the statistic to something that is uniform under the null. $\endgroup$ –  Greg Snow Commented Jun 30, 2011 at 15:34
  • 12 $\begingroup$ Despite the comment I posted to the question, I have since realized that the conclusion is not true except in special cases. The problem occurs with composite hypotheses, such as $\mu_1 \le \mu_2$. "The null hypothesis is true" now covers many possibilities, such as the case $\mu_1 = \mu_2 - 10^6$. In such a case, the p-values will not be uniformly distributed. I suspect one could manufacture (somewhat artificial) situations in which, no matter what element of the null hypothesis holds, the distribution of p-values would never be anywhere near uniform. $\endgroup$ –  whuber ♦ Commented Jul 20, 2012 at 14:50
  • 2 $\begingroup$ @Greg Snow: I think that the distribution of the p-values is not always uniform, it is uniform when they are computed from a continuous distribution, but not when they are computed from a discrete distribution $\endgroup$ –  user83346 Commented Aug 16, 2015 at 16:58
  • 2 $\begingroup$ I have expanded the answer above to address the comment by @whuber. $\endgroup$ –  Greg Snow Commented Aug 17, 2015 at 15:42

Under the null hypothesis, your test statistic $T$ has the distribution $F(t)$ (e.g., standard normal). We show that the p-value $P=F(T)$ has a uniform distribution: $$\Pr(P < p) = \Pr(F(T) < p) = \Pr(T < F^{-1}(p)) = F(F^{-1}(p)) = p;$$ in other words, $P$ is distributed uniformly. This holds so long as $F(\cdot)$ is invertible, a necessary condition of which is that $T$ is not a discrete random variable.

This result is general: the distribution of an invertible CDF of a random variable is uniform on $[0,1]$.


  • 12 $\begingroup$ you might want to rephrase your last comment, which is a little confusing. Continuous CDFs do not necessarily have a (proper) inverse. (Can you think of a counterexample?) So your proof requires additional conditions to hold. The standard way to get around this is to define the pseudoinverse $F^{\,\leftarrow}(y) = \inf\{x: F(x) \geq y\}$. The argument becomes more subtle, too. $\endgroup$ –  cardinal Commented May 26, 2011 at 23:36
  • 2 $\begingroup$ Concerning working with generalized inverses, see link.springer.com/article/10.1007%2Fs00186-013-0436-7 (in particular, F(T) is only uniform if F is continuous -- doesn't matter whether F is invertible or not). Concerning your definition of a p-value: I don't think it's always 'F(T)'. It's the probability (under the null) of taking on a value more extreme than the observed one, so it could also be the survival function (just to be precise here). $\endgroup$ –  mathlete Commented Mar 5, 2016 at 9:03
  • $\begingroup$ Isn't $F(t)$ the CDF? $\endgroup$ –  zyxue Commented May 2, 2018 at 21:58
  • 1 $\begingroup$ @zyxue Yes, the cdf is sometimes referred to as the "distribution". $\endgroup$ –  mai Commented Sep 22, 2018 at 2:50
  • 2 $\begingroup$ Why is the p-value = $F(T)$? $\endgroup$ –  qwr Commented Dec 18, 2020 at 4:13

Let $T$ denote the random variable with cumulative distribution function $F(t) \equiv \Pr(T<t)$ for all $t$. Assuming that $F$ is invertible we can derive distribution of the random p-value $P = F(T)$ as follows:

$$ \Pr(P<p) = \Pr(F(T) < p) = \Pr(T < F^{-1}(p)) = F(F^{-1}(p)) = p, $$

from which we can conclude that the distribution of $P$ is uniform on $[0,1]$.

This answer is similar to Charlie's, but avoids having to define $t = F^{-1}(p)$.


  • 1 $\begingroup$ As you've defined F, isn't P = F(T) = Pr(T < T) = 0? $\endgroup$ –  TrynnaDoStat Commented Jun 27, 2019 at 19:24
  • 3 $\begingroup$ Not exactly, the "syntactic replacement" of $F(T) = \Pr(T<T)$ is somewhat misleading. Formally speaking, $F(T)$ is the random variable defined by $(F(T))(\omega) = F(T(\omega)) := \Pr(T < T(\omega))$ $\endgroup$ –  jII Commented Jun 27, 2019 at 21:17
  • 1 $\begingroup$ Isn't $F(t) = 1 - Pr(T < t)$? The derivation isn't any different, but just wondering. $\endgroup$ –  student010101 Commented Apr 9, 2021 at 18:38
  • 1 $\begingroup$ @student010101 I think it depends on whether the example in your head is a one-sided left-tail test or a one-sided right-tail test. For the right-tail test, $F(t) = 1-P(T<t)$ as you said. I suppose jll used the left-tail test for ease of derivation. See: en.wikipedia.org/wiki/P-value#Definition_and_interpretation $\endgroup$ –  EssentialAnonymity Commented Apr 9, 2021 at 20:59
  • 1 $\begingroup$ @StrugglingStudent42 Oh hah, it's just a coincidence. I asked because this is a super old post and we both commented within a couple hours of each other. $\endgroup$ –  student010101 Commented Apr 9, 2021 at 21:22

I think the answer as to " Why are p-values uniformly distributed under the null hypothesis? " has been sufficiently discussed from a mathematical perspective. What I thought is missing is a visual explanation of this and the idea of thinking of p-values as areas to the left of a set of quantiles under a given continuous distribution (probability density function). By quantiles I mean cut-off points along a distribution (in this example the standard normal distribution), which split the distribution into equal parts containing exactly the same area under the curve.

For this example, I generated 100 random data points from the standard normal distribution, $\mathcal{N}(\mu = 0, \sigma = 1)$. I plotted those points in a histogram, and we can see a bell-shaped distribution forming (Fig. 1A). Then I calculated the p-values of those points, i.e. the areas to the left of those points under the standard normal curve, and plotted those p-values in a histogram (Fig. 1B): binning the p-values in 0.1 intervals, a uniform(ish) distribution emerges.

This step, i.e. the step from Fig. 1A to Fig. 1B, is puzzling for many people, and it was for me as well for some time, until I started thinking of p-values as areas under the curve. If I split the standard normal distribution into chunks containing equal area (in this case 0.1, to match the histogram in Fig. 1B), the intervals in the tails are wider (Fig. 1C). Going back to Fig. 1A, all points ranging from -4 to -1.28 (the first interval in Fig. 1C) fall into the first bin of Fig. 1B, since they all give areas (p-values) of at most 0.1. As the density of points increases towards the mean, the intervals covering an area of 0.1 become increasingly narrower (Fig. 1C), but the number of points in each interval remains roughly equal and matches the counts in Fig. 1B.

[Figure 1: (A) histogram of 100 draws from N(0, 1); (B) histogram of their p-values; (C) the standard normal density split into intervals of equal area 0.1.]

Once I understood this, it was also easy to explain why a random sample of 100 points from a normal distribution with a mean of 0 and a standard deviation of 3, $\mathcal{N}(\mu = 0, \sigma = 3)$, results in a higher frequency of p-values near 0 and 1, i.e. in the tails (Fig. 2B). The reason is that the p-values are calculated from the standard normal distribution, yet the sample comes from a normal distribution with a standard deviation of 3, which puts many more points in the tails than a sample from the standard normal distribution would.

[Figure 2: the same construction for 100 draws from N(0, 3); the p-values pile up near 0 and 1.]

I hope this was not overly confusing and added some value to this thread.


A simple simulation of the distribution of p-values in the case of linear regression between two independent variables:


  • 10 $\begingroup$ Could you elaborate on how this answers the question? Although its output illustrates a special case of the assertion, no amount of code would be capable of addressing the question of why ? That requires additional explanation. $\endgroup$ –  whuber ♦ Commented Jun 2, 2015 at 14:11

Not the answer you're looking for? Browse other questions tagged p-value uniform-distribution or ask your own question .

  • Featured on Meta
  • Site maintenance - Mon, Sept 16 2024, 21:00 UTC to Tue, Sept 17 2024, 2:00...
  • User activation: Learnings and opportunities
  • Join Stack Overflow’s CEO and me for the first Stack IRL Community Event in...

Hot Network Questions

  • getting weird shaping using curves on objects
  • Function with memories of its past life
  • Place with signs in Chinese & Arabic
  • siunitx dollar per hour broken going from SI to qty
  • What are the pros and cons of the classic portfolio by Wealthfront?
  • How did people know that the war against the mimics was over?
  • Color nested bonds
  • How should I email HR after an unpleasant / annoying interview?
  • Сhanging borders of shared polygon shapefile features in QGIS
  • What would a planet need for rain drops to trigger explosions upon making contact with the ground?
  • What unintended side effects may arise from making bards count as both arcane and divine spellcasters?
  • Rocky Mountains Elevation Cutout
  • Doesn't nonlocality follow from nonrealism in the EPR thought experiment and Bell tests?
  • Color an item in an enumerated list (continued; not a duplicate)
  • Proving commutator of velocity differentiation and total time differentiation is position differentiation
  • Longtable goes beyond the right margin and footnote does not fit under the table
  • Does SpaceX Starship have significant methane emissions?
  • Offset+Length vs 2 Offsets
  • Convert base-10 to base-0.1
  • Do carbon fiber wings need a wing spar?
  • Why does a capacitor act as an open circuit under a DC circuit?
  • Why does Sfas Emes start his commentary on Parshat Noach by saying he doesn't know it? Is the translation faulty?
  • What’s the name of this horror movie where a girl dies and comes back to life evil?
  • Is it true that before European modernity, there were no "nations"?



8.1.2.1 - Normal Approximation Method Formulas

Here we will be using the five step hypothesis testing procedure to compare the proportion in one random sample to a specified population proportion using the normal approximation method.

In order to use the normal approximation method, the assumption is that both \(n p_0 \geq 10\) and \(n (1-p_0) \geq 10\). Recall that \(p_0\) is the population proportion in the null hypothesis.

Research Question Is the proportion different from \(p_0\)? Is the proportion greater than \(p_0\)? Is the proportion less than \(p_0\)?
Null Hypothesis, \(H_{0}\) \(p=p_0\) \(p= p_0\) \(p= p_0\)
Alternative Hypothesis, \(H_{a}\) \(p\neq p_0\) \(p> p_0\) \(p< p_0\)
Type of Hypothesis Test Two-tailed, non-directional Right-tailed, directional Left-tailed, directional

Where \(p_0\) is the hypothesized population proportion that you are comparing your sample to.

When using the normal approximation method we will be using a z test statistic. The z test statistic tells us how far our sample proportion is from the hypothesized population proportion in standard error units. Note that this formula follows the basic structure of a test statistic that you learned in the last lesson:

\(test\;statistic=\dfrac{sample\;statistic-null\;parameter}{standard\;error}\)

\( z = \dfrac{\widehat{p} - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}} \)

where \(\widehat{p}\) = sample proportion, \(p_{0}\) = hypothesized population proportion, and \(n\) = sample size.

Given that the null hypothesis is true, the p-value is the probability that a randomly selected sample of size n would have a sample proportion as different from, or more different than, the one in our sample, in the direction of the alternative hypothesis. We can find the p-value by mapping the test statistic from step 2 onto the z distribution.

Note that p-values are also symbolized by \(p\). Do not confuse this with the population proportion which shares the same symbol.

We can look up the \(p\)-value using Minitab by constructing the sampling distribution. Because we are using the normal approximation here, we have a \(z\) test statistic that we can map onto the \(z\) distribution. Recall, the z distribution is a normal distribution with a mean of 0 and standard deviation of 1. If we are conducting a one-tailed (i.e., right- or left-tailed) test, we look up the area of the sampling distribution that is beyond our test statistic. If we are conducting a two-tailed (i.e., non-directional) test, there is one additional step: we need to multiply the area by two to take into account the possibility of being in the right or left tail.

We can decide between the null and alternative hypotheses by examining our p-value. If \(p \leq \alpha\) reject the null hypothesis. If \(p>\alpha\) fail to reject the null hypothesis. Unless stated otherwise, assume that \(\alpha=.05\).

When we reject the null hypothesis, our results are said to be statistically significant.
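The decision rule itself reduces to a single comparison. A minimal sketch, continuing the hypothetical two-tailed p-value from above:

```python
alpha = 0.05
p_value = 0.0500  # hypothetical two-tailed p-value from step 3

# Reject when p <= alpha; otherwise fail to reject.
if p_value <= alpha:
    print("Reject the null hypothesis; the result is statistically significant.")
else:
    print("Fail to reject the null hypothesis.")
```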

Step 5: State a real-world conclusion.

Based on our decision in step 4, we write a sentence or two stating our conclusion in relation to the original research question.
