Null and Alternative Hypotheses | Definitions & Examples

Published on 5 October 2022 by Shaun Turney. Revised on 6 December 2022.

The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test:

  • Null hypothesis (H0): There’s no effect in the population.
  • Alternative hypothesis (HA): There’s an effect in the population.

The effect is usually the effect of the independent variable on the dependent variable.

Table of contents

  • Answering your research question with hypotheses
  • What is a null hypothesis?
  • What is an alternative hypothesis?
  • Differences between null and alternative hypotheses
  • How to write null and alternative hypotheses
  • Frequently asked questions about null and alternative hypotheses

Answering your research question with hypotheses

The null and alternative hypotheses offer competing answers to your research question. When the research question asks “Does the independent variable affect the dependent variable?”, the null hypothesis (H0) answers “No, there’s no effect in the population.” On the other hand, the alternative hypothesis (HA) answers “Yes, there is an effect in the population.”

The null and alternative are always claims about the population. That’s because the goal of hypothesis testing is to make inferences about a population based on a sample. Often, we infer whether there’s an effect in the population by looking at differences between groups or relationships between variables in the sample.

You can use a statistical test to decide whether the evidence favors the null or alternative hypothesis. Each type of statistical test comes with a specific way of phrasing the null and alternative hypothesis. However, the hypotheses can also be phrased in a general way that applies to any test.

What is a null hypothesis?

The null hypothesis is the claim that there’s no effect in the population.

If the sample provides enough evidence against the claim that there’s no effect in the population (p ≤ α), then we can reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.

Although “fail to reject” may sound awkward, it is the conventional wording (some texts say “do not reject”). Be careful not to say you “prove” or “accept” the null hypothesis.

Null hypotheses often include phrases such as “no effect”, “no difference”, or “no relationship”. When written in mathematical terms, they always include an equality (usually =, but sometimes ≥ or ≤).

Examples of null hypotheses

The table below gives examples of research questions and null hypotheses. There’s always more than one way to answer a research question, but these null hypotheses can help you get started.

Research question: Does tooth flossing affect the number of cavities?

  • General null hypothesis (H0): Tooth flossing has no effect on the number of cavities.
  • Test-specific null hypothesis (H0), t test: The mean number of cavities per person does not differ between the flossing group (µ1) and the non-flossing group (µ2) in the population; µ1 = µ2.

Research question: Does the amount of text highlighted in the textbook affect exam scores?

  • General null hypothesis (H0): The amount of text highlighted in the textbook has no effect on exam scores.
  • Test-specific null hypothesis (H0), linear regression: There is no relationship between the amount of text highlighted and exam scores in the population; β = 0.

Research question: Does daily meditation decrease the incidence of depression?

  • General null hypothesis (H0): Daily meditation does not decrease the incidence of depression.*
  • Test-specific null hypothesis (H0), two-proportions test: The proportion of people with depression in the daily-meditation group (p1) is greater than or equal to that in the no-meditation group (p2) in the population; p1 ≥ p2.

*Note that some researchers prefer to always write the null hypothesis in terms of “no effect” and “=”. It would be fine to say that daily meditation has no effect on the incidence of depression and p1 = p2.

What is an alternative hypothesis?

The alternative hypothesis (HA) is the other answer to your research question. It claims that there’s an effect in the population.

Often, your alternative hypothesis is the same as your research hypothesis. In other words, it’s the claim that you expect or hope will be true.

The alternative hypothesis is the complement to the null hypothesis. Null and alternative hypotheses are exhaustive, meaning that together they cover every possible outcome. They are also mutually exclusive, meaning that only one can be true at a time.

Alternative hypotheses often include phrases such as “an effect”, “a difference”, or “a relationship”. When alternative hypotheses are written in mathematical terms, they always include an inequality (usually ≠, but sometimes > or <). As with null hypotheses, there are many acceptable ways to phrase an alternative hypothesis.

Examples of alternative hypotheses

The table below gives examples of research questions and alternative hypotheses to help you get started with formulating your own.

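Research question: Does tooth flossing affect the number of cavities?

  • General alternative hypothesis (HA): Tooth flossing has an effect on the number of cavities.
  • Test-specific alternative hypothesis (HA), t test: The mean number of cavities per person differs between the flossing group (µ1) and the non-flossing group (µ2) in the population; µ1 ≠ µ2.

Research question: Does the amount of text highlighted in a textbook affect exam scores?

  • General alternative hypothesis (HA): The amount of text highlighted in the textbook has an effect on exam scores.
  • Test-specific alternative hypothesis (HA), linear regression: There is a relationship between the amount of text highlighted and exam scores in the population; β ≠ 0.

Research question: Does daily meditation decrease the incidence of depression?

  • General alternative hypothesis (HA): Daily meditation decreases the incidence of depression.
  • Test-specific alternative hypothesis (HA), two-proportions test: The proportion of people with depression in the daily-meditation group (p1) is less than that in the no-meditation group (p2) in the population; p1 < p2.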
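To make the meditation example concrete, here is a minimal sketch of a one-sided two-proportions z test for H0: p1 ≥ p2 against HA: p1 < p2. The counts (30 of 200 and 50 of 200) are invented for illustration, and the function name `two_proportion_z_test_less` is hypothetical. The pooled standard error used here is one common textbook choice, not necessarily the method the article has in mind.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test_less(x1, n1, x2, n2):
    """One-sided z test of H0: p1 >= p2 against HA: p1 < p2.

    x1, x2 are the numbers of people with the outcome in each group;
    n1, n2 are the group sizes. Uses the pooled standard error.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / standard_error
    # Left-tailed test: small (very negative) z values count against H0.
    p_value = NormalDist().cdf(z)
    return z, p_value


# Hypothetical data: 30 of 200 meditators and 50 of 200 non-meditators
# have depression. These numbers are made up for this sketch.
z, p_value = two_proportion_z_test_less(30, 200, 50, 200)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these made-up counts, z is about -2.5 and the p-value is about 0.006, so at α = 0.05 the null hypothesis would be rejected.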

Differences between null and alternative hypotheses

Null and alternative hypotheses are similar in some ways:

  • They’re both answers to the research question.
  • They both make claims about the population.
  • They’re both evaluated by statistical tests.

However, there are important differences between the two types of hypotheses, summarized below.

  • Definition: The null hypothesis (H0) is a claim that there is no effect in the population. The alternative hypothesis (HA) is a claim that there is an effect in the population.
  • Symbol used: H0 uses an equality symbol (=, ≥, or ≤). HA uses an inequality symbol (≠, <, or >).
  • When p ≤ α: H0 is rejected. HA is supported.
  • When p > α: H0 is not rejected (we fail to reject it). HA is not supported.

How to write null and alternative hypotheses

To help you write your hypotheses, you can use the template sentences below. If you know which statistical test you’re going to use, you can use the test-specific template sentences. Otherwise, you can use the general template sentences.

General

All you need to know to use these general template sentences are your dependent and independent variables. To write your research question, null hypothesis, and alternative hypothesis, fill in the following sentences with your variables:

Does [independent variable] affect [dependent variable]?

  • Null hypothesis (H0): [Independent variable] does not affect [dependent variable].
  • Alternative hypothesis (HA): [Independent variable] affects [dependent variable].
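The general templates above are simple enough to fill in mechanically. The sketch below shows one way to do that; the function name `general_hypotheses` and the dictionary keys are hypothetical, chosen only for this illustration.

```python
def general_hypotheses(independent_variable, dependent_variable):
    """Fill the general template sentences with the two variable names."""
    iv = independent_variable
    dv = dependent_variable
    # Capitalize only the first letter so the rest of the phrase is untouched.
    iv_sentence_start = iv[0].upper() + iv[1:]
    return {
        "research_question": f"Does {iv} affect {dv}?",
        "null": f"{iv_sentence_start} does not affect {dv}.",
        "alternative": f"{iv_sentence_start} affects {dv}.",
    }


hypotheses = general_hypotheses("tooth flossing", "the number of cavities")
for label, sentence in hypotheses.items():
    print(f"{label}: {sentence}")
```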

Test-specific

Once you know the statistical test you’ll be using, you can write your hypotheses in a more precise and mathematical way specific to the test you chose. The table below provides template sentences for common statistical tests.

Two-sample t test (two groups)

  • Null hypothesis (H0): The mean dependent variable does not differ between group 1 (µ1) and group 2 (µ2) in the population; µ1 = µ2.
  • Alternative hypothesis (HA): The mean dependent variable differs between group 1 (µ1) and group 2 (µ2) in the population; µ1 ≠ µ2.

ANOVA with three groups

  • Null hypothesis (H0): The mean dependent variable does not differ between group 1 (µ1), group 2 (µ2), and group 3 (µ3) in the population; µ1 = µ2 = µ3.
  • Alternative hypothesis (HA): The mean dependent variable of group 1 (µ1), group 2 (µ2), and group 3 (µ3) are not all equal in the population.

Correlation

  • Null hypothesis (H0): There is no correlation between independent variable and dependent variable in the population; ρ = 0.
  • Alternative hypothesis (HA): There is a correlation between independent variable and dependent variable in the population; ρ ≠ 0.

Linear regression

  • Null hypothesis (H0): There is no relationship between independent variable and dependent variable in the population; β = 0.
  • Alternative hypothesis (HA): There is a relationship between independent variable and dependent variable in the population; β ≠ 0.

Two-proportions test

  • Null hypothesis (H0): The dependent variable expressed as a proportion does not differ between group 1 (p1) and group 2 (p2) in the population; p1 = p2.
  • Alternative hypothesis (HA): The dependent variable expressed as a proportion differs between group 1 (p1) and group 2 (p2) in the population; p1 ≠ p2.

Note: The template sentences above assume that you’re performing two-tailed tests, because each alternative hypothesis is stated with ≠ (or “not all equal”) rather than with < or >. Two-tailed tests are appropriate for most studies.

Frequently asked questions about null and alternative hypotheses

The null hypothesis is often abbreviated as H0. When the null hypothesis is written using mathematical symbols, it always includes an equality symbol (usually =, but sometimes ≥ or ≤).

The alternative hypothesis is often abbreviated as Ha or H1. When the alternative hypothesis is written using mathematical symbols, it always includes an inequality symbol (usually ≠, but sometimes < or >).

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (‘x affects y because …’).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses. In a well-designed study, the statistical hypotheses correspond logically to the research hypothesis.

Cite this Scribbr article

Turney, S. (2022, December 06). Null and Alternative Hypotheses | Definitions & Examples. Scribbr. Retrieved 3 September 2024, from https://www.scribbr.co.uk/stats/null-and-alternative-hypothesis/

5.1 - Introduction to Hypothesis Testing

Previously we used confidence intervals to estimate unknown population parameters. We compared confidence intervals to specified parameter values and when the specific value was contained in the interval, we concluded that there was not sufficient evidence of a difference between the population parameter and the specified value. In other words, any values within the confidence intervals were reasonable estimates of the population parameter and any values outside of the confidence intervals were not reasonable estimates. Here, we are going to look at a more formal method for testing whether a given value is a reasonable value of a population parameter. To do this we need to have a hypothesized value of the population parameter.

In this lesson we will compare data from a sample to a hypothesized parameter. In each case, we will compute the probability that a population with the specified parameter would produce a sample statistic as extreme as or more extreme than the one we observed in our sample. This probability is known as the p-value and it is used to evaluate statistical significance.

A test is considered to be statistically significant when the p-value is less than or equal to the level of significance, also known as the alpha (\(\alpha\)) level. For this class, unless otherwise specified, \(\alpha=0.05\); this is the most frequently used alpha level in many fields.

Sample statistics vary from the population parameter randomly. When results are statistically significant, we are concluding that the difference observed between our sample statistic and the hypothesized parameter is unlikely to be due to random sampling variation alone.
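As a rough illustration of the p-value idea described above, the sketch below computes an exact upper-tail p-value for a proportion and compares it to the alpha level. The coin-flip data (15 heads in 20 flips), the hypothesized value p0 = 0.5, and the function name `binomial_upper_tail_p_value` are all assumptions made for this example and do not come from the lesson.

```python
from math import comb


def binomial_upper_tail_p_value(successes, n, p0):
    """P(X >= successes) for X ~ Binomial(n, p0), i.e. assuming H0 is true."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


alpha = 0.05  # default level of significance used in the lesson
# Hypothetical sample: 15 heads in 20 flips, testing H0: p = 0.5 vs Ha: p > 0.5.
p_value = binomial_upper_tail_p_value(15, 20, 0.5)
significant = p_value <= alpha
print(f"p-value = {p_value:.4f}, statistically significant: {significant}")
```

Here the p-value is about 0.021, which is below 0.05, so this invented result would count as statistically significant under the rule stated above.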

Chapter 8: Hypothesis Testing with One Sample

8.1 Null and Alternative Hypotheses

Learning Objectives

By the end of this section, the student should be able to:

  • Describe hypothesis testing in general and in practice.

Hypothesis Testing

The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternative hypothesis. These hypotheses contain opposing viewpoints.

H0: The null hypothesis: It is a statement about the population that either is believed to be true or is used to put forth an argument unless it can be shown to be incorrect beyond a reasonable doubt.

Ha: The alternative hypothesis: It is a claim about the population that is contradictory to H0 and what we conclude when we reject H0.

Since the null and alternative hypotheses are contradictory, you must examine evidence to decide if you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data.

After you have determined which hypothesis the sample supports, you make a decision. There are two options for a decision. They are “reject H0” if the sample information favors the alternative hypothesis or “do not reject H0” or “decline to reject H0” if the sample information is insufficient to reject the null hypothesis.

Mathematical Symbols Used in H0 and Ha:

  • If H0 uses equal (=), Ha uses not equal (≠), greater than (>), or less than (<).
  • If H0 uses greater than or equal to (≥), Ha uses less than (<).
  • If H0 uses less than or equal to (≤), Ha uses more than (>).

H0 always has a symbol with an equal in it. Ha never has a symbol with an equal in it. The choice of symbol depends on the wording of the hypothesis test. However, be aware that many researchers (including one of the co-authors in research work) use = in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because we only make the decision to reject or not reject the null hypothesis.
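The pairing of symbols described above can be written as a small lookup. This is only a sketch: the names `NULL_SYMBOL_FOR` and `hypothesis_pair` are hypothetical, and it follows the convention in which H0 takes the complementary symbol (≤ or ≥) rather than always using =.

```python
# For each symbol allowed in Ha, the complementary symbol that goes in H0.
NULL_SYMBOL_FOR = {"≠": "=", ">": "≤", "<": "≥"}


def hypothesis_pair(parameter, alternative_symbol, value):
    """Return (H0, Ha) as strings, given the symbol used in Ha."""
    null_symbol = NULL_SYMBOL_FOR[alternative_symbol]
    return (
        f"H0: {parameter} {null_symbol} {value}",
        f"Ha: {parameter} {alternative_symbol} {value}",
    )


# "College students take less than five years to graduate, on average."
h0, ha = hypothesis_pair("μ", "<", 5)
print(h0)
print(ha)
```

The key design point is that the claim containing "less than", "more than", or "different from" always goes in Ha, and H0 is derived from it.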

H0: No more than 30% of the registered voters in Santa Clara County voted in the primary election. p ≤ 0.30

Ha: More than 30% of the registered voters in Santa Clara County voted in the primary election. p > 0.30

A medical trial is conducted to test whether or not a new medicine reduces cholesterol by 25%. State the null and alternative hypotheses.

H0: The drug reduces cholesterol by 25%. p = 0.25

Ha: The drug does not reduce cholesterol by 25%. p ≠ 0.25

We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0). The null and alternative hypotheses are:

H0: μ = 2.0

Ha: μ ≠ 2.0

We want to test whether the mean height of eighth graders is 66 inches. State the null and alternative hypotheses. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H0: μ = 66
  • Ha: μ ≠ 66

We want to test if college students take less than five years to graduate from college, on the average. The null and alternative hypotheses are:

H0: μ ≥ 5

Ha: μ < 5

We want to test if it takes fewer than 45 minutes to teach a lesson plan. State the null and alternative hypotheses. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H0: μ ≥ 45
  • Ha: μ < 45

In an issue of U.S. News and World Report, an article on school standards stated that about half of all students in France, Germany, and Israel take advanced placement exams and a third pass. The same article stated that 6.6% of U.S. students take advanced placement exams and 4.4% pass. Test if the percentage of U.S. students who take advanced placement exams is more than 6.6%. State the null and alternative hypotheses.

H0: p ≤ 0.066

Ha: p > 0.066

On a state driver’s test, about 40% pass the test on the first try. We want to test if more than 40% pass on the first try. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H0: p = 0.40
  • Ha: p > 0.40

Hypothesis: a statement about the value of a population parameter. In the case of two hypotheses, the statement assumed to be true is called the null hypothesis (notation H0) and the contradictory statement is called the alternative hypothesis (notation Ha).

Introductory Statistics Copyright © 2024 by LOUIS: The Louisiana Library Network is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.

Module 9: Hypothesis Testing With One Sample

Null and Alternative Hypotheses

Learning Outcomes

  • Describe hypothesis testing in general and in practice

Concept Review

In a hypothesis test, sample data is evaluated in order to arrive at a decision about some type of claim. If certain conditions about the sample are satisfied, then the claim can be evaluated for a population. In a hypothesis test, we:

  • Evaluate the null hypothesis, typically denoted with H0. The null is not rejected unless the hypothesis test shows otherwise. The null statement must always contain some form of equality (=, ≤, or ≥).
  • Always write the alternative hypothesis, typically denoted with Ha or H1, using less than, greater than, or not equals symbols, i.e., (≠, >, or <).
  • If we reject the null hypothesis, then we can assume there is enough evidence to support the alternative hypothesis.
  • Never state that a claim is proven true or false. Keep in mind the underlying fact that hypothesis testing is based on probability laws; therefore, we can talk only in terms of non-absolute certainties.

Formula Review

H0 and Ha are contradictory.

  • OpenStax, Statistics, Null and Alternative Hypotheses. Provided by : OpenStax. Located at : http://cnx.org/contents/[email protected]:58/Introductory_Statistics . License : CC BY: Attribution
  • Introductory Statistics . Authored by : Barbara Illowski, Susan Dean. Provided by : Open Stax. Located at : http://cnx.org/contents/[email protected] . License : CC BY: Attribution . License Terms : Download for free at http://cnx.org/contents/[email protected]
  • Simple hypothesis testing | Probability and Statistics | Khan Academy. Authored by : Khan Academy. Located at : https://youtu.be/5D1gV37bKXY . License : All Rights Reserved . License Terms : Standard YouTube License

6.5 Introduction to Hypothesis Tests

Figure 6.11: Dalmatian puppy near a man sitting on the floor.

One job of a statistician is to make statistical inferences about populations based on samples taken from the population. Confidence intervals are one way to estimate a population parameter.

Another way to make a statistical inference is to make a decision about a parameter. For instance, a car dealer advertises that its new small truck gets 35 miles per gallon, on average. A tutoring service claims that its method of tutoring helps 90% of its students get an A or a B. A company says that women managers in their company earn an average of $60,000 per year. A statistician may want to make a decision about or evaluate these claims. A hypothesis test can be used to do this.

A hypothesis test involves collecting data from a sample and evaluating the data. Then, the statistician makes a decision as to whether or not there is sufficient evidence, based upon analyses of the data, to reject the null hypothesis.

In this section you will conduct hypothesis tests on single means when the population standard deviation is known.

Hypothesis testing consists of two contradictory hypotheses or statements, a decision based on the data, and a conclusion. To perform a hypothesis test, a statistician will perform some variation of these steps:

  • Define hypotheses.
  • Collect and/or use the sample data to determine the correct distribution to use.
  • Calculate the test statistic.
  • Make a decision.
  • Write a conclusion.

Defining your hypotheses

The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternative hypothesis. These hypotheses contain opposing viewpoints.

The null hypothesis (H0): It is often a statement of the accepted historical value or norm. This is your starting point that you must assume from the beginning in order to show an effect exists.

The alternative hypothesis (Ha): It is a claim about the population that is contradictory to H0 and what we conclude when we reject H0.

Since the null and alternative hypotheses are contradictory, you must examine evidence to decide if you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data.

After you have determined which hypothesis the sample supports, you make a decision. There are two options for a decision. They are “reject H0” if the sample information favors the alternative hypothesis or “do not reject H0” or “decline to reject H0” if the sample information is insufficient to reject the null hypothesis.

Mathematical symbols used in H0 and Ha:

Figure 6.12: Null and Alternative Hypotheses

  • If H0 uses equal (=), Ha uses not equal (≠), greater than (>), or less than (<).
  • If H0 uses greater than or equal to (≥), Ha uses less than (<).
  • If H0 uses less than or equal to (≤), Ha uses more than (>).

We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0). The null hypothesis is: H 0 : μ = 2.0. What is the alternative hypothesis?

A medical trial is conducted to test whether or not a new medicine reduces cholesterol by 25%. State the null and alternative hypotheses.

Using the Sample to Test the Null Hypothesis

Once you have defined your hypotheses, the next step in the process is to collect sample data. In a classroom context, most of the time the data or summary statistics will be given to you.

Then you will have to determine the correct distribution to perform the hypothesis test, given the assumptions you are able to make about the situation. Right now we are demonstrating these ideas in a test for a mean when the population standard deviation is known, using the Z distribution. We will see other scenarios in the future.

Calculating a Test Statistic

Next, you will start evaluating the data. This begins with calculating your test statistic, which is a measure of how far what you observed is from what you are assuming to be true. In this context, your test statistic, z, quantifies the number of standard deviations of the sample mean (σ/√n) between the sample mean x̄ and the hypothesized population mean µ0. Calculating the test statistic is analogous to standardizing observations with Z-scores as discussed previously:

[latex]z=\frac{\overline{x}-\mu_0}{\left(\frac{\sigma}{\sqrt{n}}\right)}[/latex]

where µ0 is the value assumed to be true in the null hypothesis, σ is the known population standard deviation, and n is the sample size.
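The formula above translates directly into code. The following is a minimal sketch; the function name `z_statistic` and the sample values (a sample mean of 52, a hypothesized mean of 50, σ = 6, n = 36) are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from math import sqrt


def z_statistic(sample_mean, mu_0, sigma, n):
    """z test statistic for one mean when the population sigma is known."""
    standard_error = sigma / sqrt(n)  # standard deviation of the sample mean
    return (sample_mean - mu_0) / standard_error


# Hypothetical numbers: the standard error is 6 / sqrt(36) = 1,
# so the sample mean sits 2 standard errors above the hypothesized mean.
z = z_statistic(sample_mean=52.0, mu_0=50.0, sigma=6.0, n=36)
print(f"z = {z:.2f}")
```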

Making a Decision

Once you have your test statistic there are two methods to use it to make your decision:

  • Critical value method – This is one way you can make a decision, but will not be discussed in detail at this time.
  • P-Value method – This is the preferred method we will focus on.

P-Value Method

To find a p-value, we use the test statistic to calculate the probability of getting a result at least as extreme as the one observed. Formally, the p-value is the probability that, if the null hypothesis is true, the results from another randomly selected sample will be as extreme as or more extreme than the results obtained from the given sample.

A large p-value calculated from the data indicates that we should not reject the null hypothesis. The smaller the p-value, the more unlikely the outcome, and the stronger the evidence is against the null hypothesis. We would reject the null hypothesis if the evidence is strongly against it.

Draw a graph that shows the p-value. The hypothesis test is easier to perform if you use a graph because you see the problem more clearly.

Suppose a baker claims that his bread height is more than 15 cm, on average. Several of his customers do not believe him. To persuade his customers that he is right, the baker decides to do a hypothesis test. He bakes 10 loaves of bread. The mean height of the sample loaves is 17 cm. The baker knows from baking hundreds of loaves of bread that the standard deviation for the height is 0.5 cm and that the distribution of heights is normal.

The null hypothesis could be H0: μ ≤ 15

The alternate hypothesis is Ha: μ > 15

The words “is more than” translate as a “>”, so “μ > 15” goes into the alternate hypothesis. The null hypothesis must contradict the alternate hypothesis.

Because the population standard deviation is known (σ = 0.5 cm) and n = 10, the standard deviation of the sample mean is:

[latex]\frac{\sigma}{\sqrt{n}}=\frac{0.5}{\sqrt{10}}\approx 0.16[/latex]

Suppose the null hypothesis is true (the mean height of the loaves is no more than 15 cm). Then is the mean height (17 cm) calculated from the sample unexpectedly large? The hypothesis test works by asking how unlikely the sample mean would be if the null hypothesis were true. The graph shows how far out the sample mean is on the normal curve. The p-value is the probability that, if we were to take other samples, any other sample mean would fall at least as far out as 17 cm.

The p-value, then, is the probability that a sample mean is the same as or greater than 17 cm when the population mean is, in fact, 15 cm. We can calculate this probability using the normal distribution for means.

Figure 6.13: Normal distribution curve of average bread heights, with 15 (the population mean) and 17 (the point used to determine the p-value) marked on the x-axis.

A p-value of approximately zero tells us that a sample mean this large would be highly unlikely if the loaves rose no more than 15 cm, on average. That is, almost 0% of all samples of 10 loaves would have a mean height of at least 17 cm purely by chance had the population mean height really been 15 cm. Because the outcome of 17 cm is so unlikely (meaning it is not happening by chance alone), we conclude that the evidence is strongly against the null hypothesis (the mean height is at most 15 cm). There is sufficient evidence that the true mean height for the population of the baker’s loaves of bread is greater than 15 cm.
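The baker example can be checked numerically. The sketch below uses only the numbers given in the example (a hypothesized mean of 15, a sample mean of 17, σ = 0.5, n = 10); the variable names are my own, and the upper tail is computed with the complementary error function so that a very small p-value is not rounded to exactly zero.

```python
from math import erfc, sqrt

# Values taken from the baker example above.
mu_0 = 15.0         # hypothesized mean height under H0 (cm)
sample_mean = 17.0  # observed mean height of the 10 loaves (cm)
sigma = 0.5         # known population standard deviation (cm)
n = 10              # number of loaves in the sample

standard_error = sigma / sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Right-tailed p-value P(Z >= z) for a standard normal Z.
p_value = 0.5 * erfc(z / sqrt(2))

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}")
print(f"p-value = {p_value:.3g}")
```

The test statistic is more than 12 standard errors above the hypothesized mean, which is why the p-value is effectively zero.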

A normal distribution has a standard deviation of 1. We want to verify a claim that the mean is greater than 12. A sample of 36 is taken with a sample mean of 12.5.

Find the p-value.
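One way to work this out is sketched below, assuming a right-tailed z test with H0: μ ≤ 12 and Ha: μ > 12. The exercise does not state the hypotheses explicitly, so this setup is an assumption based on the claim that the mean is greater than 12.

[latex]z=\frac{\overline{x}-\mu_0}{\sigma/\sqrt{n}}=\frac{12.5-12}{1/\sqrt{36}}=3[/latex]

[latex]p\text{-value}=P(Z>3)\approx 0.0013[/latex]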

Decision and conclusion

A systematic way to make a decision of whether to reject or not reject the null hypothesis is to compare the p-value and a preset or preconceived α (also called a significance level). A preset α is the probability of a Type I error (rejecting the null hypothesis when the null hypothesis is true). It may or may not be given to you at the beginning of the problem. If there is no given preconceived α, then use α = 0.05.

When you make a decision to reject or not reject H0, do as follows:

  • If α > p-value, reject H0. The results of the sample data are statistically significant. You can say there is sufficient evidence to conclude that H0 is an incorrect belief and that the alternative hypothesis, Ha, may be correct.
  • If α ≤ p-value, fail to reject H0. The results of the sample data are not significant. There is not sufficient evidence to conclude that the alternative hypothesis, Ha, may be correct.
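The decision rule above can be sketched as a small function. The name `decide` and the example p-values are hypothetical. The comparison follows the rule exactly as written in this section (reject only when α > p-value); note that some texts also reject when the p-value equals α.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when alpha > p-value."""
    if alpha > p_value:
        return "reject H0"
    return "fail to reject H0"


# Illustrative p-values, using the default alpha of 0.05.
print(decide(0.025))  # small p-value
print(decide(0.20))   # large p-value
print(decide(0.01, alpha=0.001))  # a stricter alpha changes the decision
```

Keeping α as a parameter with a default of 0.05 mirrors the advice to use 0.05 when no significance level is given.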

After you make your decision, write a thoughtful conclusion in the context of the scenario incorporating the hypotheses.

NOTE: When you “do not reject H0”, it does not mean that you should believe that H0 is true. It simply means that the sample data have failed to provide sufficient evidence to cast serious doubt about the truthfulness of H0.

When using the p-value to evaluate a hypothesis test, it is sometimes useful to use the following memory device:

If the p-value is low, the null must go.

If the p-value is high, the null must fly.

This memory aid relates a p-value less than the established alpha (the p is low) to rejecting the null hypothesis and, likewise, relates a p-value higher than the established alpha (the p is high) to not rejecting the null hypothesis.

Fill in the blanks.

Reject the null hypothesis when                            .

The results of the sample data                           .

Do not reject the null hypothesis when                           .

It’s a Boy Genetics Labs claims that its procedures improve the chances of a boy being born. The results for a test of a single population proportion are as follows:

H0: p = 0.50, Ha: p > 0.50

p-value = 0.025

Interpret the results and state a conclusion in simple, non-technical terms.

Image Credits

Figure 6.11: Alora Griffiths (2019). “Dalmation puppy near man…” Public domain. Retrieved from https://unsplash.com/photos/7aRQZtLsvqw

Figure 6.13: Kindred Grey via Virginia Tech (2020). “Figure 6.11” CC BY-SA 4.0. Retrieved from https://commons.wikimedia.org/wiki/File:Figure_6.11.png . Adaptation of Figure 5.39 from OpenStax Introductory Statistics (2013) (CC BY 4.0). Retrieved from https://openstax.org/books/statistics/pages/5-practice

A decision making procedure for determining whether sample evidence supports a hypothesis

The claim that is assumed to be true and is tested in a hypothesis test

A working hypothesis that is contradictory to the null hypothesis

A measure of how far what you observed is from the hypothesized (or claimed) value

The probability that an event will occur, assuming the null hypothesis is true

Probability that a true null hypothesis will be rejected, also known as Type I error and denoted by α

Finding sufficient evidence that the effect we see is not just due to variability, often from rejecting the null hypothesis

Significant Statistics Copyright © 2020 by John Morgan Russell, OpenStaxCollege, OpenIntro is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.


8.2 Null and Alternative Hypotheses

Learning objectives.

  • Describe hypothesis testing in general and in practice.

A hypothesis test begins by considering two hypotheses .  They are called the null hypothesis and the alternative hypothesis .  These hypotheses contain opposing viewpoints and only one of these hypotheses is true.  The hypothesis test determines which hypothesis is most likely true.

  • The null hypothesis is a claim that a population parameter equals some value.  For example, [latex]H_0: \mu=5[/latex].
  • The alternative hypothesis is a claim that a population parameter is greater than, less than, or not equal to some value.  For example, [latex]H_a: \mu>5[/latex], [latex]H_a: \mu<5[/latex], or [latex]H_a: \mu \neq 5[/latex].  The form of the alternative hypothesis depends on the wording of the hypothesis test.
  • An alternative notation for [latex]H_a[/latex] is [latex]H_1[/latex].

Because the null and alternative hypotheses are contradictory, we must examine evidence to decide if we have enough evidence to reject the null hypothesis or not reject the null hypothesis.  The evidence is in the form of sample data.  After we have determined which hypothesis the sample data supports, we make a decision.  There are two options for a decision . They are “ reject [latex]H_0[/latex] ” if the sample information favors the alternative hypothesis or “ do not reject [latex]H_0[/latex] ” if the sample information is insufficient to reject the null hypothesis.


A candidate in a local election claims that 30% of registered voters voted in a recent election.  Information provided by the returning office suggests that the percentage is higher than the 30% claimed.

The parameter under study is the proportion of registered voters, so we use [latex]p[/latex] in the statements of the hypotheses.  The hypotheses are

[latex]\begin{eqnarray*} \\ H_0: & & p=30\% \\ \\ H_a: & & p \gt 30\% \\ \\ \end{eqnarray*}[/latex]

  • The null hypothesis [latex]H_0[/latex] is the claim that the proportion of registered voters that voted equals 30%.
  • The alternative hypothesis [latex]H_a[/latex] is the claim that the proportion of registered voters that voted is greater than (i.e. higher than) 30%.
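To show how hypotheses of this form might eventually be tested, here is a hedged sketch of an upper-tailed one-proportion z test in Python. The sample counts (170 voters out of 500) are hypothetical and do not come from the text, the function name `one_proportion_z_upper` is invented for this example, and the normal approximation is assumed to be adequate.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_upper(successes: int, n: int, p0: float) -> tuple:
    """Upper-tailed z test of H0: p = p0 against Ha: p > p0.

    Uses the normal approximation to the binomial, which is only
    reasonable when n*p0 and n*(1 - p0) are both fairly large.
    """
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # computed under H0
    z = (p_hat - p0) / standard_error
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value


# Hypothetical sample (not from the text): 170 of 500 registered voters voted.
z, p_value = one_proportion_z_upper(successes=170, n=500, p0=0.30)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The alternative hypothesis determines which tail is used: because [latex]H_a[/latex] claims a proportion greater than 30%, only the area to the right of z counts toward the p -value.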

A medical researcher believes that a new medicine reduces cholesterol by 25%.  A medical trial suggests that the percent reduction is different than claimed.  State the null and alternative hypotheses.

[latex]\begin{eqnarray*} H_0: & & p=25\% \\ \\ H_a: & & p \neq 25\% \end{eqnarray*}[/latex]

We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0).  State the null and alternative hypotheses.

[latex]\begin{eqnarray*} H_0: & & \mu=2  \mbox{ points} \\ \\ H_a: & & \mu \neq 2 \mbox{ points}  \end{eqnarray*}[/latex]

We want to test whether or not the mean height of eighth graders is 66 inches.  State the null and alternative hypotheses.

[latex]\begin{eqnarray*}  H_0: & & \mu=66 \mbox{ inches} \\ \\ H_a: & & \mu \neq 66 \mbox{ inches}  \end{eqnarray*}[/latex]

We want to test if college students take less than five years to graduate from college, on the average.  The null and alternative hypotheses are:

[latex]\begin{eqnarray*} H_0: & & \mu=5 \mbox{ years} \\ \\ H_a: & & \mu \lt 5 \mbox{ years}   \end{eqnarray*}[/latex]

We want to test if it takes fewer than 45 minutes to teach a lesson plan.  State the null and alternative hypotheses.

[latex]\begin{eqnarray*}  H_0: & & \mu=45 \mbox{ minutes} \\ \\ H_a: & & \mu \lt 45 \mbox{ minutes}  \end{eqnarray*}[/latex]

In an issue of U.S. News and World Report , an article on school standards stated that about half of all students in France, Germany, and Israel take advanced placement exams and a third pass.  The same article stated that 6.6% of U.S. students take advanced placement exams and 4.4% pass.  Test if the percentage of U.S. students who take advanced placement exams is more than 6.6%.  State the null and alternative hypotheses.

[latex]\begin{eqnarray*}  H_0: & & p=6.6\% \\ \\ H_a: & & p \gt 6.6\%  \end{eqnarray*}[/latex]

On a state driver’s test, about 40% pass the test on the first try.  We want to test if more than 40% pass on the first try.   State the null and alternative hypotheses.

[latex]\begin{eqnarray*}  H_0: & & p=40\% \\ \\ H_a: & & p \gt 40\%  \end{eqnarray*}[/latex]

Concept Review

In a  hypothesis test , sample data is evaluated in order to arrive at a decision about some type of claim.  If certain conditions about the sample are satisfied, then the claim can be evaluated for a population.  In a hypothesis test, we evaluate the null hypothesis , typically denoted with [latex]H_0[/latex]. The null hypothesis is not rejected unless the hypothesis test shows otherwise.  The null hypothesis always contains an equal sign ([latex]=[/latex]).  Always write the alternative hypothesis , typically denoted with [latex]H_a[/latex] or [latex]H_1[/latex], using less than, greater than, or not equals symbols ([latex]\lt[/latex], [latex]\gt[/latex], [latex]\neq[/latex]).  If we reject the null hypothesis, then we can assume there is enough evidence to support the alternative hypothesis.  But we can never state that a claim is proven true or false.  All we can conclude from the hypothesis test is which of the hypotheses is most likely true.  Because the underlying facts about hypothesis testing are based on probability laws, we can talk only in terms of non-absolute certainties.

Attribution

“9.1 Null and Alternative Hypotheses” in Introductory Statistics by OpenStax is licensed under a Creative Commons Attribution 4.0 International License.

Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H 0 ) and an alternate hypothesis (H a  or H 1 ).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

Step 1: state your null and alternate hypothesis, step 2: collect data, step 3: perform a statistical test, step 4: decide whether to reject or fail to reject your null hypothesis, step 5: present your findings, other interesting articles, frequently asked questions about hypothesis testing.

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H 0 ) and alternate (H a ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H 0 : Men are, on average, not taller than women.
  • H a : Men are, on average, taller than women.


For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

  • an estimate of the difference in average height between the two groups.
  • a p -value showing how likely you are to see this difference if the null hypothesis of no difference is true.

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .


Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.


Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved September 3, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/


Chapter 7: Introduction to Hypothesis Testing

alternative hypothesis

critical value

effect size

null hypothesis

probability value

rejection region

significance level

statistical power

statistical significance

test statistic

Type I error

Type II error

This chapter lays out the basic logic and process of hypothesis testing. We will perform z  tests, which use the z  score formula from Chapter 6 and data from a sample mean to make an inference about a population.

Logic and Purpose of Hypothesis Testing

A hypothesis is a prediction that is tested in a research study. The statistician R. A. Fisher explained the concept of hypothesis testing with a story of a lady tasting tea. Here we will present an example based on James Bond who insisted that martinis should be shaken rather than stirred. Let’s consider a hypothetical experiment to determine whether Mr. Bond can tell the difference between a shaken martini and a stirred martini. Suppose we gave Mr. Bond a series of 16 taste tests. In each test, we flipped a fair coin to determine whether to stir or shake the martini. Then we presented the martini to Mr. Bond and asked him to decide whether it was shaken or stirred. Let’s say Mr. Bond was correct on 13 of the 16 taste tests. Does this prove that Mr. Bond has at least some ability to tell whether the martini was shaken or stirred?

This result does not prove that he does; it could be he was just lucky and guessed right 13 out of 16 times. But how plausible is the explanation that he was just lucky? To assess its plausibility, we determine the probability that someone who was just guessing would be correct 13/16 times or more. This probability can be computed to be .0106. This is a pretty low probability, and therefore someone would have to be very lucky to be correct 13 or more times out of 16 if they were just guessing. So either Mr. Bond was very lucky, or he can tell whether the drink was shaken or stirred. The hypothesis that he was guessing is not proven false, but considerable doubt is cast on it. Therefore, there is strong evidence that Mr. Bond can tell whether a drink was shaken or stirred.
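The probability quoted above can be reproduced with a short binomial calculation. The sketch below is an illustration in Python rather than the method used by the original authors; the helper name `binomial_upper_tail` is an assumption made for this example.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Probability of 13 or more correct answers out of 16 by pure guessing.
prob = binomial_upper_tail(13, 16)
print(round(prob, 4))  # 0.0106
```

Summing over 13, 14, 15, and 16 correct answers matters here: the question is how often a guesser would do at least this well, not exactly this well.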

Let’s consider another example. The case study Physicians’ Reactions sought to determine whether physicians spend less time with obese patients. Physicians were sampled randomly and each was shown a chart of a patient complaining of a migraine headache. They were then asked to estimate how long they would spend with the patient. The charts were identical except that for half the charts, the patient was obese and for the other half, the patient was of average weight. The chart a particular physician viewed was determined randomly. Thirty-three physicians viewed charts of average-weight patients and 38 physicians viewed charts of obese patients.

The mean time physicians reported that they would spend with obese patients was 24.7 minutes as compared to a mean of 31.4 minutes for normal-weight patients. How might this difference between means have occurred? One possibility is that physicians were influenced by the weight of the patients. On the other hand, perhaps by chance, the physicians who viewed charts of the obese patients tend to see patients for less time than the other physicians. Random assignment of charts does not ensure that the groups will be equal in all respects other than the chart they viewed. In fact, it is certain the groups differed in many ways by chance. The two groups could not have exactly the same mean age (if measured precisely enough such as in days). Perhaps a physician’s age affects how long the physician sees patients. There are innumerable differences between the groups that could affect how long they view patients. With this in mind, is it plausible that these chance differences are responsible for the difference in times?

To assess the plausibility of the hypothesis that the difference in mean times is due to chance, we compute the probability of getting a difference as large or larger than the observed difference (31.4 − 24.7 = 6.7 minutes) if the difference were, in fact, due solely to chance. Using methods presented in later chapters, this probability can be computed to be .0057. Since this is such a low probability, we have confidence that the difference in times is due to the patient’s weight and is not due to chance.

The Probability Value

It is very important to understand precisely what the probability values mean. In the James Bond example, the computed probability of .0106 is the probability he would be correct on 13 or more taste tests (out of 16) if he were just guessing. It is easy to mistake this probability of .0106 as the probability he cannot tell the difference. This is not at all what it means.

The probability of .0106 is the probability of a certain outcome (13 or more out of 16) assuming a certain state of the world (James Bond was only guessing). It is not the probability that a state of the world is true. Although this might seem like a distinction without a difference, consider the following example. An animal trainer claims that a trained bird can determine whether or not numbers are evenly divisible by 7. In an experiment assessing this claim, the bird is given a series of 16 test trials. On each trial, a number is displayed on a screen and the bird pecks at one of two keys to indicate its choice. The numbers are chosen in such a way that the probability of any number being evenly divisible by 7 is .50. The bird is correct on 9/16 choices. We can compute that the probability of being correct nine or more times out of 16 if one is only guessing is .40. Since a bird who is only guessing would do this well 40% of the time, these data do not provide convincing evidence that the bird can tell the difference between the two types of numbers. As a scientist, you would be very skeptical that the bird had this ability. Would you conclude that there is a .40 probability that the bird can tell the difference? Certainly not! You would think the probability is much lower than .0001.
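The .40 figure for the bird can be checked by counting outcomes directly. This is a small sketch in Python, assuming each of the 2^16 sequences of right and wrong answers is equally likely when the bird is only guessing; the variable names are chosen for this example.

```python
from math import comb

n_trials = 16

# Number of right/wrong sequences with 9 or more correct answers.
favourable = sum(comb(n_trials, k) for k in range(9, n_trials + 1))
total = 2**n_trials  # all equally likely sequences under pure guessing

prob = favourable / total
print(favourable, total, round(prob, 2))  # 26333 65536 0.4
```

The result is the probability of the outcome given that the bird is guessing. It says nothing on its own about how probable it is that the bird can actually recognize multiples of 7.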

To reiterate, the probability value is the probability of an outcome (9/16 or better) and not the probability of a particular state of the world (the bird was only guessing). In statistics, it is conventional to refer to possible states of the world as hypotheses since they are hypothesized states of the world. Using this terminology, the probability value is the probability of an outcome given the hypothesis. It is not the probability of the hypothesis given the outcome.

This is not to say that we ignore the probability of the hypothesis. If the probability of the outcome given the hypothesis is sufficiently low, we have evidence that the hypothesis is false. However, we do not compute the probability that the hypothesis is false. In the James Bond example, the hypothesis is that he cannot tell the difference between shaken and stirred martinis. The probability value is low (.0106), thus providing evidence that he can tell the difference. However, we have not computed the probability that he can tell the difference.

The Null Hypothesis

The hypothesis that an apparent effect is due to chance is called the null hypothesis , written H 0 (“ H -naught”). In the Physicians’ Reactions example, the null hypothesis is that in the population of physicians, the mean time expected to be spent with obese patients is equal to the mean time expected to be spent with average-weight patients. This null hypothesis can be written as:

[latex]H_0: \mu_{\text{obese}} - \mu_{\text{average}} = 0[/latex]

The null hypothesis in a correlational study of the relationship between high school grades and college grades would typically be that the population correlation is 0. This can be written as

[latex]H_0: \rho = 0[/latex]

Although the null hypothesis is usually that the value of a parameter is 0, there are occasions in which the null hypothesis is a value other than 0. For example, if we are working with mothers in the U.S. whose children are at risk of low birth weight, we can use 7.47 pounds, the average birth weight in the U.S., as our null value and test for differences against that.

For now, we will focus on testing a value of a single mean against what we expect from the population. Using birth weight as an example, our null hypothesis takes the form:

[latex]H_0: \mu = 7.47[/latex]

Keep in mind that the null hypothesis is typically the opposite of the researcher’s hypothesis. In the Physicians’ Reactions study, the researchers hypothesized that physicians would expect to spend less time with obese patients. The null hypothesis that the two types of patients are treated identically is put forward with the hope that it can be discredited and therefore rejected. If the null hypothesis were true, a difference as large as or larger than the sample difference of 6.7 minutes would be very unlikely to occur. Therefore, the researchers rejected the null hypothesis of no difference and concluded that in the population, physicians intend to spend less time with obese patients.

In general, the null hypothesis is the idea that nothing is going on: there is no effect of our treatment, no relationship between our variables, and no difference in our sample mean from what we expected about the population mean. This is always our baseline starting assumption, and it is what we seek to reject. If we are trying to treat depression, we want to find a difference in average symptoms between our treatment and control groups. If we are trying to predict job performance, we want to find a relationship between conscientiousness and evaluation scores. However, until we have evidence against it, we must use the null hypothesis as our starting point.

The Alternative Hypothesis

If the null hypothesis is rejected, then we will need some other explanation, which we call the alternative hypothesis, H A or H 1 . The alternative hypothesis is simply the reverse of the null hypothesis, and there are three options, depending on where we expect the difference to lie. Thus, our alternative hypothesis is the mathematical way of stating our research question. If we expect our obtained sample mean to be above or below the null hypothesis value, which we call a directional hypothesis, then our alternative hypothesis takes the form

[latex]H_A: \mu > 7.47 \quad \text{or} \quad H_A: \mu < 7.47[/latex]

based on the research question itself. We should only use a directional hypothesis if we have good reason, based on prior observations or research, to suspect a particular direction. When we do not know the direction, such as when we are entering a new area of research, we use a non-directional alternative:

[latex]H_A: \mu \neq 7.47[/latex]

We will set different criteria for rejecting the null hypothesis based on the directionality (greater than, less than, or not equal to) of the alternative. To understand why, we need to see where our criteria come from and how they relate to z  scores and distributions.

Critical Values, p Values, and Significance Level

The significance level , denoted α (alpha), is a threshold we set before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use. If our data produce values that meet or exceed this threshold, then we have sufficient evidence to reject the null hypothesis; if not, we fail to reject the null (we never “accept” the null).

Figure 7.1. The rejection region for a one-tailed test. (“ Rejection Region for One-Tailed Test ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)


The rejection region is bounded by a specific z  value, as is any area under the curve. In hypothesis testing, the value corresponding to a specific rejection region is called the critical value , z crit  (“ z  crit”), or z * (hence the other name “critical region”). Finding the critical value works exactly the same as finding the z  score corresponding to any area under the curve as we did in Unit 1 . If we go to the normal table, we will find that the z  score corresponding to 5% of the area under the curve is equal to 1.645 ( z = 1.64 corresponds to .0505 and z = 1.65 corresponds to .0495, so .05 is exactly in between them) if we go to the right and −1.645 if we go to the left. The direction must be determined by your alternative hypothesis, and drawing and shading the distribution is helpful for keeping directionality straight.
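The table lookup described above can be cross-checked numerically. The following sketch uses `statistics.NormalDist` from the Python standard library; it illustrates the lookup and is not part of the original chapter.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Upper-tail areas for the two table entries mentioned in the text.
for z in (1.64, 1.65):
    print(z, round(1 - standard_normal.cdf(z), 4))

# Critical value that leaves 5% of the area in the right tail.
z_crit = standard_normal.inv_cdf(1 - 0.05)
print(round(z_crit, 3))  # 1.645
```

For a left-tailed test the same call with `inv_cdf(0.05)` gives the mirrored value, since the standard normal distribution is symmetric about 0.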

Suppose, however, that we want to do a non-directional test. We need to put the critical region in both tails, but we don’t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail’s rejection region. For α = .05, this means 2.5% of the area is in each tail, which, based on the z  table, corresponds to critical values of z * = ±1.96. This is shown in Figure 7.2 .

Figure 7.2. Two-tailed rejection region. (“ Rejection Region for Two-Tailed Test ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)
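How the critical values depend on directionality can be summarized in a short helper. This is a sketch under assumptions: the function name `critical_values` and the direction labels `"two-sided"`, `"greater"`, and `"less"` are invented for this example.

```python
from statistics import NormalDist


def critical_values(alpha: float = 0.05, direction: str = "two-sided") -> tuple:
    """Return the z critical value(s) for a given alpha and direction."""
    z = NormalDist().inv_cdf
    if direction == "two-sided":
        upper = z(1 - alpha / 2)  # alpha is split evenly between the tails
        return (-upper, upper)
    if direction == "greater":
        return (z(1 - alpha),)
    if direction == "less":
        return (z(alpha),)
    raise ValueError("direction must be 'two-sided', 'greater', or 'less'")


lower, upper = critical_values(0.05, "two-sided")
print(round(lower, 2), round(upper, 2))  # -1.96 1.96
```

Splitting α across both tails keeps the total rejection area at 5%, which is why the two-tailed cutoffs (±1.96) sit further from zero than the one-tailed cutoff (1.645).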


Thus, any z  score falling outside ±1.96 (greater than 1.96 in absolute value) falls in the rejection region. When we use z  scores in this way, the obtained value of z (sometimes called z  obtained and abbreviated z obt ) is something known as a test statistic , which is simply an inferential statistic used to test a null hypothesis. The formula for our z  statistic has not changed:

[latex]z = \frac{M - \mu}{\sigma / \sqrt{n}}[/latex]

Figure 7.3. Relationship between α, z obt , and p . (“ Relationship between alpha, z-obt, and p ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)


When the null hypothesis is rejected, the effect is said to have statistical significance , or be statistically significant. For example, in the Physicians’ Reactions case study, the probability value is .0057. Therefore, the effect of obesity is statistically significant and the null hypothesis that obesity makes no difference is rejected. It is important to keep in mind that statistical significance means only that the null hypothesis of exactly no effect is rejected; it does not mean that the effect is important, which is what “significant” usually means. When an effect is significant, you can have confidence the effect is not exactly zero. Finding that an effect is significant does not tell you about how large or important the effect is.

Do not confuse statistical significance with practical significance. A small effect can be highly significant if the sample size is large enough.
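The point about sample size can be made concrete with a quick calculation. The sketch below assumes a one-sample z test and a hypothetical effect of 0.05 standard deviations; both the effect size and the sample sizes are invented for illustration, as is the function name `two_tailed_p`.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_p(effect_in_sd: float, n: int) -> float:
    """Two-tailed p-value for a one-sample z test.

    effect_in_sd is the difference between the sample mean and the null
    value, expressed in population standard deviations.
    """
    z = effect_in_sd * sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same small effect (0.05 standard deviations) at two sample sizes.
for n in (100, 10_000):
    print(n, two_tailed_p(0.05, n))
```

With 100 observations the effect is nowhere near significant, while with 10,000 observations the same effect yields a very small p -value. The effect itself has not become any larger or more important.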

Why does the word “significant” in the phrase “statistically significant” mean something so different from other uses of the word? Interestingly, this is because the meaning of “significant” in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was “significant” if it signified something. Thus, finding that an effect is statistically significant signifies that the effect is real and not due to chance. Over the years, the meaning of “significant” changed, leading to the potential misinterpretation.

The Hypothesis Testing Process

A four-step procedure.

The process of testing hypotheses follows a simple four-step procedure. This process will be what we use for the remainder of the textbook and course, and although the hypothesis and statistics we use will change, this process will not.

Step 1: State the Hypotheses

Your hypotheses are the first thing you need to lay out. Otherwise, there is nothing to test! You have to state the null hypothesis (which is what we test) and the alternative hypothesis (which is what we expect). These should be stated mathematically as they were presented above and in words, explaining in normal English what each one means in terms of the research question.

Step 2: Find the Critical Values

Step 3: Calculate the Test Statistic and Effect Size

Once we have our hypotheses and the standards we use to test them, we can collect data and calculate our test statistic—in this case z . This step is where the vast majority of differences in future chapters will arise: different tests used for different data are calculated in different ways, but the way we use and interpret them remains the same. As part of this step, we will also calculate effect size to better quantify the magnitude of the difference between our groups. Although effect size is not considered part of hypothesis testing, reporting it as part of the results is approved convention.

Step 4: Make the Decision

Finally, once we have our obtained test statistic, we can compare it to our critical value and decide whether we should reject or fail to reject the null hypothesis. When we do this, we must interpret the decision in relation to our research question, stating what we concluded, what we based our conclusion on, and the specific statistics we obtained.

Example A Movie Popcorn

Our manager is looking for a difference in the mean weight of popcorn bags compared to the population mean of 8. We will need both a null and an alternative hypothesis written both mathematically and in words. We’ll always start with the null hypothesis:

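H 0 : There is no difference in the mean weight of popcorn bags from this employee. H 0 : μ = 8

In this case, we don’t know if the bags will be too full or not full enough, so we use a two-tailed alternative hypothesis that there is a difference:

H A : There is a difference in the mean weight of popcorn bags from this employee. H A : μ ≠ 8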

Our critical values are based on two things: the directionality of the test and the level of significance. We decided in Step 1 that a two-tailed test is the appropriate directionality. We were given no information about the level of significance, so we assume that α = .05 is what we will use. As stated earlier in the chapter, the critical values for a two-tailed z test at α = .05 are z * = ±1.96. These will be the criteria we use to test our hypothesis. We can now draw out our distribution, as shown in Figure 7.4, so we can visualize the rejection region and make sure it makes sense.

Figure 7.4. Rejection region for z * = ±1.96. (“ Rejection Region z+-1.96 ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)

Now we come to our formal calculations. Let’s say that the manager collects data and finds that the average weight of this employee’s popcorn bags is M = 7.75 cups. We can now plug this value, along with the values presented in the original problem, into our equation for z :

z = (M − μ) / (σ / √n) = (7.75 − 8.00) / (0.50 / √25) = −0.25 / 0.10 = −2.50

So our test statistic is z = −2.50, which we can draw onto our rejection region distribution as shown in Figure 7.5 .

Figure 7.5. Test statistic location. (“ Test Statistic Location z-2.50 ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)

Effect Size

When we reject the null hypothesis, we are stating that the difference we found was statistically significant, but we have mentioned several times that this tells us nothing about practical significance. To get an idea of the actual size of what we found, we can compute a new statistic called an effect size. Effect size gives us an idea of how large, important, or meaningful a statistically significant effect is. For mean differences like we calculated here, our effect size is Cohen’s d :

d = (M − μ) / σ

This is very similar to our formula for z , but we no longer take into account the sample size (since overly large samples can make it too easy to reject the null). Cohen’s d is interpreted in units of standard deviations, just like z . For our example:

d = (7.75 − 8.00) / 0.50 = −0.25 / 0.50 = −0.50, or 0.50 in absolute value

Cohen’s d is interpreted as small, moderate, or large. Specifically, d = 0.20 is small, d = 0.50 is moderate, and d = 0.80 is large. Obviously, values can fall in between these guidelines, so we should use our best judgment and the context of the problem to make our final interpretation of size. Our effect size happens to be exactly equal to one of these, so we say that there is a moderate effect.

Effect sizes are incredibly useful and provide important information and clarification that overcomes some of the weakness of hypothesis testing. Any time you perform a hypothesis test, whether statistically significant or not, you should always calculate and report effect size.

Looking at Figure 7.5, we can see that our obtained z statistic falls in the rejection region. We can also directly compare it to our critical value: in terms of absolute value, 2.50 > 1.96, so we reject the null hypothesis. We can now write our conclusion:

Reject H 0 . Based on the sample of 25 bags, we can conclude that the average popcorn bag from this employee is smaller ( M = 7.75 cups) than the average weight of popcorn bags at this movie theater, and the effect size was moderate, z = −2.50, p < .05, d = 0.50.
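The four steps for this example can be traced in a few lines of Python. This is a sketch rather than the textbook's own procedure: the helper name is hypothetical, and the population standard deviation of 0.50 is an assumption inferred from the reported d = 0.50 and z = −2.50 with 25 bags.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu, sigma, n):
    """Return (z, d) for a one-sample z test with known population SD."""
    z = (sample_mean - mu) / (sigma / sqrt(n))  # test statistic
    d = (sample_mean - mu) / sigma              # Cohen's d (no sample size term)
    return z, d


alpha = 0.05
# Step 2: two-tailed critical value, about 1.96 at alpha = .05.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 3: sigma = 0.50 is an assumed value (see the note above).
z, d = one_sample_z(sample_mean=7.75, mu=8.00, sigma=0.50, n=25)

# Step 4: reject when the statistic is more extreme than the critical value.
reject = abs(z) > z_crit
print(f"z = {z:.2f}, z* = ±{z_crit:.2f}, |d| = {abs(d):.2f}, reject H0: {reject}")
```

Comparing absolute values handles both tails at once, which matches the two-tailed alternative chosen in Step 1.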

Example B Office Temperature

Let’s do another example to solidify our understanding. Let’s say that the office building you work in is supposed to be kept at 74 degrees Fahrenheit during the summer months but is allowed to vary by 1 degree in either direction. You suspect that, as a cost saving measure, the temperature was secretly set higher. You set up a formal way to test your hypothesis.

You start by laying out the null hypothesis:

H 0 : There is no difference in the average building temperature. H 0 : μ = 74

Next you state the alternative hypothesis. You have reason to suspect a specific direction of change, so you make a one-tailed test:

H A : The average building temperature is higher than it is supposed to be. H A : μ > 74

You know that the most common level of significance is α = .05, so you keep that the same and know that the critical value for a one-tailed z test is z * = 1.645. To keep track of the directionality of the test and rejection region, you draw out your distribution as shown in Figure 7.6.

Figure 7.6. Rejection region. (“ Rejection Region z1.645 ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)

Now that you have everything set up, you spend one week collecting temperature data:

Day          Temp (°F)
Monday       77
Tuesday      76
Wednesday    74
Thursday     78
Friday       78

M = (77 + 76 + 74 + 78 + 78) / 5 = 76.6

z = (M − μ) / (σ / √n) = (76.6 − 74) / (1 / √5) = 2.60 / 0.45 ≈ 5.77

This value falls so far into the tail that it cannot even be plotted on the distribution ( Figure 7.7 )! Because the result is significant, you also calculate an effect size:

d = (M − μ) / σ = (76.6 − 74) / 1 = 2.60

The effect size you calculate is definitely large, meaning someone has some explaining to do!

Figure 7.7. Obtained z statistic. (“ Obtained z5.77 ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)

You compare your obtained z  statistic, z = 5.77, to the critical value, z * = 1.645, and find that z > z *. Therefore you reject the null hypothesis, concluding:

Reject H 0 . Based on 5 observations, the average temperature ( M = 76.6 degrees) is statistically significantly higher than it is supposed to be, and the effect size was large, z = 5.77, p < .05, d = 2.60.
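The same calculation can be run directly on the week of temperature readings. This is a sketch under one stated assumption: the population standard deviation is taken to be 1 degree, read from "allowed to vary by 1 degree" and consistent with the reported d = 2.60. Without intermediate rounding the statistic comes out near 5.81 rather than the reported 5.77, a gap that probably reflects rounding the standard error to 0.45; the decision is the same either way.

```python
from math import sqrt
from statistics import NormalDist, mean

temps = [77, 76, 74, 78, 78]   # Monday through Friday
mu = 74                        # temperature the building is supposed to hold
sigma = 1                      # assumed population SD (see note above)
alpha = 0.05

m = mean(temps)
z = (m - mu) / (sigma / sqrt(len(temps)))
d = (m - mu) / sigma

# One-tailed test in the upper tail: all of alpha sits on one side.
z_crit = NormalDist().inv_cdf(1 - alpha)

print(f"M = {m:.1f}, z = {z:.2f}, z* = {z_crit:.3f}, d = {d:.2f}")
print("reject H0" if z > z_crit else "fail to reject H0")
```

Because the alternative is directional, the comparison uses z itself rather than its absolute value.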

Example C Different Significance Level

Finally, let’s take a look at an example phrased in generic terms, rather than in the context of a specific research question, to see the individual pieces one more time. This time, however, we will use a stricter significance level, α = .01, to test the hypothesis.

We will use 60 as an arbitrary null hypothesis value:

H 0 : μ = 60

We will assume a two-tailed test:

H A : μ ≠ 60

We have seen the critical values for z tests at α = .05 levels of significance several times. To find the values for α = .01, we will go to the Standard Normal Distribution Table and find the z score cutting off .005 (.01 divided by 2 for a two-tailed test) of the area in the tail, which is z * = ±2.575. Notice that this cutoff is much higher than it was for α = .05. This is because we need much less of the area in the tail, so we need to go very far out to find the cutoff. As a result, this will require a much larger effect or much larger sample size in order to reject the null hypothesis.
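Critical values like these can also be computed instead of looked up. The sketch below uses the inverse normal CDF from Python's standard library; the function name is a hypothetical helper. A printed table typically gives 2.575 or 2.58 for α = .01, while the computed value is closer to 2.576.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed=True):
    """Positive critical z value for a given significance level."""
    # A two-tailed test splits alpha evenly between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: two-tailed ±{z_critical(alpha):.3f}, "
          f"one-tailed {z_critical(alpha, two_tailed=False):.3f}")
```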

We can now calculate our test statistic. We will use σ = 10 as our known population standard deviation and the following data to calculate our sample mean:

(Data table: 10 sample scores.)

The average of these scores is M = 60.40. From this we calculate our z  statistic as:

z = (M − μ) / (σ / √n) = (60.40 − 60.00) / (10 / √10) = 0.40 / 3.16 = 0.13

The Cohen’s d effect size calculation is:

d = (M − μ) / σ = (60.40 − 60.00) / 10 = 0.40 / 10 = 0.04

Our obtained z  statistic, z = 0.13, is very small. It is much less than our critical value of 2.575. Thus, this time, we fail to reject the null hypothesis. Our conclusion would look something like:

Fail to reject H 0 . Based on the sample of 10 scores, we cannot conclude that there is an effect causing the mean ( M  = 60.40) to be statistically significantly different from 60.00, z = 0.13, p > .01, d = 0.04, and the effect size supports this interpretation.
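For this example, the decision can be reached either by comparing z with the critical value or by comparing a p value with the significance level. The following sketch does both from the summary values given in the text (M = 60.40, null value 60, standard deviation 10, 10 scores); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

m, mu, sigma, n = 60.40, 60.00, 10, 10
alpha = 0.01

z = (m - mu) / (sigma / sqrt(n))
d = (m - mu) / sigma

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha / 2)      # two-tailed cutoff
p_value = 2 * (1 - std_normal.cdf(abs(z)))      # two-tailed p value

print(f"z = {z:.2f}, z* = ±{z_crit:.3f}, p = {p_value:.3f}, d = {d:.2f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The two decision rules always agree: |z| exceeds the critical value exactly when the p value falls below the significance level.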

Other Considerations in Hypothesis Testing

There are several other considerations we need to keep in mind when performing hypothesis testing.

Errors in Hypothesis Testing

In the Physicians’ Reactions case study, the probability value associated with the significance test is .0057. Therefore, the null hypothesis was rejected, and it was concluded that physicians intend to spend less time with obese patients. Despite the low probability value, it is possible that the null hypothesis of no true difference between obese and average-weight patients is true and that the large difference between sample means occurred by chance. If this is the case, then the conclusion that physicians intend to spend less time with obese patients is in error. This type of error is called a Type I error. More generally, a Type I error occurs when a significance test results in the rejection of a true null hypothesis.

The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a Type II error . Unlike a Type I error, a Type II error is not really an error. When a statistical test is not significant, it means that the data do not provide strong evidence that the null hypothesis is false. Lack of significance does not support the conclusion that the null hypothesis is true. Therefore, a researcher should not make the mistake of incorrectly concluding that the null hypothesis is true when a statistical test was not significant. Instead, the researcher should consider the test inconclusive. Contrast this with a Type I error in which the researcher erroneously concludes that the null hypothesis is false when, in fact, it is true.

A Type II error can only occur if the null hypothesis is false. If the null hypothesis is false, then the probability of a Type II error is called β (“beta”). The probability of correctly rejecting a false null hypothesis equals 1 − β and is called statistical power. Power is simply our ability to correctly detect an effect that exists. It is influenced by the size of the effect (larger effects are easier to detect), the significance level we set (making it easier to reject the null makes it easier to detect an effect, but increases the likelihood of a Type I error), and the sample size used (larger samples make it easier to reject the null).
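The three influences on power can be seen in a small calculation. This is a sketch for one specific case, a two-tailed one-sample z test with known standard deviation, where the effect is expressed as a standardized difference d; the function name and the example inputs are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_tailed_z(d, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed one-sample z test.

    d is the true standardized effect, n the sample size.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n)  # where the z statistic is centered when the effect is real
    upper = 1 - std_normal.cdf(z_crit - shift)   # rejections in the upper tail
    lower = std_normal.cdf(-z_crit - shift)      # rejections in the lower tail
    return upper + lower


print(f"d = 0.5, n = 25,  alpha = .05: {power_two_tailed_z(0.5, 25):.3f}")
print(f"d = 0.5, n = 100, alpha = .05: {power_two_tailed_z(0.5, 100):.3f}")
print(f"d = 0.2, n = 25,  alpha = .05: {power_two_tailed_z(0.2, 25):.3f}")
print(f"d = 0.5, n = 25,  alpha = .01: {power_two_tailed_z(0.5, 25, alpha=0.01):.3f}")
```

When d is zero the function returns the significance level itself, which is the Type I error rate: with no true effect, every rejection is a false positive.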

Misconceptions in Hypothesis Testing

Misconceptions about significance testing are common. This section lists three important ones.

  • Misconception: The probability value ( p value) is the probability that the null hypothesis is false. Proper interpretation: The probability value ( p value) is the probability of a result as extreme or more extreme given that the null hypothesis is true. It is the probability of the data given the null hypothesis. It is not the probability that the null hypothesis is false.
  • Misconception: A low probability value indicates a large effect. Proper interpretation: A low probability value indicates that the sample outcome (or an outcome more extreme) would be very unlikely if the null hypothesis were true. A low probability value can occur with small effect sizes, particularly if the sample size is large.
  • Misconception: A non-significant outcome means that the null hypothesis is probably true. Proper interpretation: A non-significant outcome means that the data do not conclusively demonstrate that the null hypothesis is false.

Exercises

  • In your own words, explain what the null hypothesis is.
  • What are Type I and Type II errors?
  • Why do we phrase null and alternative hypotheses with population parameters and not sample means?
  • Why do we state our hypotheses and decision criteria before we collect our data?
  • Why do you calculate an effect size?
  • z = 1.99, two-tailed test at α = .05
  • z = 0.34, z * = 1.645
  • p = .03, α = .05
  • p = .015, α = .01

Answers to Odd-Numbered Exercises

Your answer should include mention of the baseline assumption of no difference between the sample and the population.

Alpha is the significance level. It is the criterion we use when deciding to reject or fail to reject the null hypothesis, corresponding to a given proportion of the area under the normal distribution and a probability of finding extreme scores assuming the null hypothesis is true.

We always calculate an effect size to see if our research is practically meaningful or important. NHST (null hypothesis significance testing) is influenced by sample size but effect size is not; therefore, they provide complementary information.

(“Null Hypothesis” by Randall Munroe/xkcd.com is licensed under CC BY-NC 2.5.)

Introduction to Statistics in the Psychological Sciences Copyright © 2021 by Linda R. Cote Ph.D.; Rupa G. Gordon Ph.D.; Chrislyn E. Randell Ph.D.; Judy Schmitt; and Helena Marvin is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

9.1 Null and Alternative Hypotheses

The actual test begins by considering two hypotheses . They are called the null hypothesis and the alternative hypothesis . These hypotheses contain opposing viewpoints.

H 0 : The null hypothesis: It is a statement of no difference between the variables—they are not related. This can often be considered the status quo and as a result if you cannot accept the null it requires some action.

H a : The alternative hypothesis: It is a claim about the population that is contradictory to H 0 and what we conclude when we reject H 0 . This is usually what the researcher is trying to prove.

Since the null and alternative hypotheses are contradictory, you must examine evidence to decide if you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data.

After you have determined which hypothesis the sample supports, you make a decision. There are two options for a decision. They are "reject H 0 " if the sample information favors the alternative hypothesis or "do not reject H 0 " or "decline to reject H 0 " if the sample information is insufficient to reject the null hypothesis.

Mathematical Symbols Used in H 0 and H a :

H 0 | H a
equal (=) | not equal (≠) or greater than (>) or less than (<)
greater than or equal to (≥) | less than (<)
less than or equal to (≤) | more than (>)

H 0 always has a symbol with an equal in it. H a never has a symbol with an equal in it. The choice of symbol depends on the wording of the hypothesis test. However, be aware that many researchers (including one of the co-authors in research work) use = in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because we only make the decision to reject or not reject the null hypothesis.

Example 9.1

H 0 : No more than 30% of the registered voters in Santa Clara County voted in the primary election. p ≤ .30
H a : More than 30% of the registered voters in Santa Clara County voted in the primary election. p > .30

A medical trial is conducted to test whether or not a new medicine reduces cholesterol by 25%. State the null and alternative hypotheses.

Example 9.2

We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0). The null and alternative hypotheses are: H 0 : μ = 2.0 H a : μ ≠ 2.0

We want to test whether the mean height of eighth graders is 66 inches. State the null and alternative hypotheses. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : μ __ 66
  • H a : μ __ 66

Example 9.3

We want to test if college students take less than five years to graduate from college, on the average. The null and alternative hypotheses are: H 0 : μ ≥ 5 H a : μ < 5

We want to test if it takes fewer than 45 minutes to teach a lesson plan. State the null and alternative hypotheses. Fill in the correct symbol ( =, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : μ __ 45
  • H a : μ __ 45

Example 9.4

In an issue of U. S. News and World Report , an article on school standards stated that about half of all students in France, Germany, and Israel take advanced placement exams and a third pass. The same article stated that 6.6% of U.S. students take advanced placement exams and 4.4% pass. Test if the percentage of U.S. students who take advanced placement exams is more than 6.6%. State the null and alternative hypotheses. H 0 : p ≤ 0.066 H a : p > 0.066

On a state driver’s test, about 40% pass the test on the first try. We want to test if more than 40% pass on the first try. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : p __ 0.40
  • H a : p __ 0.40

Collaborative Exercise

Bring to class a newspaper, some news magazines, and some Internet articles . In groups, find articles from which your group can write null and alternative hypotheses. Discuss your hypotheses with the rest of the class.

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Access for free at https://openstax.org/books/introductory-statistics-2e/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Introductory Statistics 2e
  • Publication date: Dec 13, 2023
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/introductory-statistics-2e/pages/1-introduction
  • Section URL: https://openstax.org/books/introductory-statistics-2e/pages/9-1-null-and-alternative-hypotheses

© Jul 18, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

  • Indian J Crit Care Med
  • v.23(Suppl 3); 2019 Sep

An Introduction to Statistics: Understanding Hypothesis Testing and Statistical Errors

Priya Ranganathan, CS Pramesh

1 Department of Anesthesiology, Critical Care and Pain, Tata Memorial Hospital, Mumbai, Maharashtra, India

2 Department of Surgical Oncology, Tata Memorial Centre, Mumbai, Maharashtra, India

The second article in this series on biostatistics covers the concepts of sample, population, research hypotheses and statistical errors.

How to cite this article

Ranganathan P, Pramesh CS. An Introduction to Statistics: Understanding Hypothesis Testing and Statistical Errors. Indian J Crit Care Med 2019;23(Suppl 3):S230–S231.

Two papers quoted in this issue of the Indian Journal of Critical Care Medicine report the results of studies that aim to prove that a new intervention is better than (superior to) an existing treatment. In the ABLE study, the investigators wanted to show that transfusion of fresh red blood cells would be superior to standard-issue red cells in reducing 90-day mortality in ICU patients. 1 The PROPPR study was designed to prove that transfusion of a lower ratio of plasma and platelets to red cells would be superior to a higher ratio in decreasing 24-hour and 30-day mortality in critically ill patients. 2 These studies are known as superiority studies (as opposed to noninferiority or equivalence studies which will be discussed in a subsequent article).

SAMPLE VERSUS POPULATION

A sample represents a group of participants selected from the entire population. Since studies cannot be carried out on entire populations, researchers choose samples, which are representative of the population. This is similar to walking into a grocery store and examining a few grains of rice or wheat before purchasing an entire bag; we assume that the few grains that we select (the sample) are representative of the entire sack of grains (the population).

The results of the study are then extrapolated to generate inferences about the population. We do this using a process known as hypothesis testing. This means that the results of the study may not always be identical to the results we would expect to find in the population; i.e., there is the possibility that the study results may be erroneous.

HYPOTHESIS TESTING

A clinical trial begins with an assumption or belief, and then proceeds to either prove or disprove this assumption. In statistical terms, this belief or assumption is known as a hypothesis. Counterintuitively, what the researcher believes in (or is trying to prove) is called the “alternate” hypothesis, and the opposite is called the “null” hypothesis; every study has a null hypothesis and an alternate hypothesis. For superiority studies, the alternate hypothesis states that one treatment (usually the new or experimental treatment) is superior to the other; the null hypothesis states that there is no difference between the treatments (the treatments are equal). For example, in the ABLE study, we start by stating the null hypothesis: there is no difference in mortality between groups receiving fresh RBCs and standard-issue RBCs. We then state the alternate hypothesis: there is a difference between groups receiving fresh RBCs and standard-issue RBCs. It is important to note that we have stated that the groups are different, without specifying which group will be better than the other. This is known as a two-tailed hypothesis and it allows us to test for superiority on either side (using a two-sided test). This is because, when we start a study, we are not 100% certain that the new treatment can only be better than the standard treatment; it could be worse, and if it is so, the study should pick it up as well. One-tailed hypotheses and one-sided statistical testing are used for non-inferiority studies, which will be discussed in a subsequent paper in this series.

STATISTICAL ERRORS

There are two possibilities to consider when interpreting the results of a superiority study. The first possibility is that there is truly no difference between the treatments but the study finds that they are different. This is called a Type-1 error or false-positive error or alpha error. This means falsely rejecting the null hypothesis.

The second possibility is that there is a difference between the treatments and the study does not pick up this difference. This is called a Type 2 error or false-negative error or beta error. This means falsely accepting the null hypothesis.

The power of the study is the ability to detect a difference between groups and is the converse of the beta error; i.e., power = 1-beta error. Alpha and beta errors are finalized when the protocol is written and form the basis for sample size calculation for the study. In an ideal world, we would not like any error in the results of our study; however, we would need to do the study in the entire population (infinite sample size) to be able to get a 0% alpha and beta error. These two errors enable us to do studies with realistic sample sizes, with the compromise that there is a small possibility that the results may not always reflect the truth. The basis for this will be discussed in a subsequent paper in this series dealing with sample size calculation.

Conventionally, type 1 or alpha error is set at 5%. This means that if there is truly no difference between groups, we allow only a 5% probability that the study will find one by chance (false positive). Type 2 or beta error is usually set between 10% and 20%; therefore, the power of the study is 90% or 80%. This means that if there is a difference between groups, we want to be 80% (or 90%) certain that the study will detect that difference. For example, in the ABLE study, sample size was calculated with a type 1 error of 5% (two-sided) and power of 90% (type 2 error of 10%) (1).
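To illustrate how alpha and beta feed into a sample size, here is a minimal sketch using the common normal-approximation formula for comparing two group means with equal group sizes. It is not the calculation used in the ABLE study, which compared mortality between groups; the function name and the standardized difference of 0.5 are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.90):
    """Approximate participants needed per group for a two-sided comparison
    of two means, where d is the standardized difference to detect."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided type 1 error
    z_beta = std_normal.inv_cdf(power)           # power = 1 - type 2 error
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


for power in (0.80, 0.90):
    print(f"power = {power:.0%}: {n_per_group(0.5, power=power)} per group")
```

Raising the power from 80% to 90%, or lowering alpha, increases the required sample size, which is the trade-off the protocol settles before the study starts.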

Table 1 gives a summary of the two types of statistical errors with an example.

Table 1: Statistical errors

(a) Types of statistical errors

 | Study result: null hypothesis is true | Study result: null hypothesis is false
Null hypothesis is actually true | Correct results! | Falsely rejecting null hypothesis - Type I error
Null hypothesis is actually false | Falsely accepting null hypothesis - Type II error | Correct results!

(b) Possible statistical errors in the ABLE trial

 | Study result: there is no difference in mortality between groups receiving fresh RBCs and standard-issue RBCs | Study result: there is a difference in mortality between groups receiving fresh RBCs and standard-issue RBCs
Truth: there is no difference in mortality between groups receiving fresh RBCs and standard-issue RBCs | Correct results! | Falsely rejecting null hypothesis - Type I error
Truth: there is a difference in mortality between groups receiving fresh RBCs and standard-issue RBCs | Falsely accepting null hypothesis - Type II error | Correct results!

In the next article in this series, we will look at the meaning and interpretation of ‘ p ’ value and confidence intervals for hypothesis testing.

Source of support: Nil

Conflict of interest: None

Reevaluating the Neural Noise Hypothesis in Dyslexia: Insights from EEG and 7T MRS Biomarkers

Agnieszka Glica, Katarzyna Wasilewska, Julia Jurkowska, Jarosław Żygierewicz, Bartosz Kossowski, Katarzyna Jednoróg

  • Laboratory of Language Neurobiology, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Pasteur 3 Street, 02-093 Warsaw, Poland
  • Faculty of Physics, University of Warsaw, Pasteur 5 Street, 02-093 Warsaw, Poland
  • Laboratory of Brain Imaging, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Pasteur 3 Street, 02-093 Warsaw, Poland
  • https://doi.org/10.7554/eLife.99920.1

The neural noise hypothesis of dyslexia posits an imbalance between excitatory and inhibitory (E/I) brain activity as an underlying mechanism of reading difficulties. This study provides the first direct test of this hypothesis using both indirect EEG power spectrum measures in 120 Polish adolescents and young adults (60 with dyslexia, 60 controls) and direct glutamate (Glu) and gamma-aminobutyric acid (GABA) concentrations from magnetic resonance spectroscopy (MRS) on a 7T MRI scanner in half of the sample. Our results, supported by Bayesian statistics, show no evidence of E/I balance differences between groups, challenging the hypothesis that cortical hyperexcitability underlies dyslexia. These findings suggest alternative mechanisms must be explored and highlight the need for further research into the E/I balance and its role in neurodevelopmental disorders.

eLife assessment

The authors combined neurophysiological (electroencephalography [EEG]) and neurochemical (magnetic resonance spectroscopy [MRS]) measures to empirically evaluate the neural noise hypothesis of developmental dyslexia. Their results are solid, supported by consistent findings from the two complementary methodologies and Bayesian statistics. Additional analyses, particularly on the neurochemical measures, are necessary to further substantiate the results. This study is useful for understanding the neural mechanisms of dyslexia and neural development in general.

  • https://doi.org/10.7554/eLife.99920.1.sa3

Introduction

According to the neural noise hypothesis of dyslexia, reading difficulties stem from an imbalance between excitatory and inhibitory (E/I) neural activity ( Hancock et al., 2017 ). The hypothesis predicts increased cortical excitation leading to more variable and less synchronous neural firing. This instability supposedly results in disrupted sensory representations and impedes phonological awareness and multisensory integration skills, crucial for learning to read ( Hancock et al., 2017 ). Yet, studies testing this hypothesis are lacking.

The non-invasive measurement of the E/I balance can be derived through assessment of glutamate (Glu) and gamma-aminobutyric acid (GABA) neurotransmitters concentration via magnetic resonance spectroscopy (MRS) ( Finkelman et al., 2022 ) or through global, indirect estimations from the electroencephalography (EEG) signal ( Ahmad et al., 2022 ).

Direct measurements of Glu and GABA yielded conflicting findings. Higher Glu concentrations in the midline occipital cortex correlated with poorer reading performance in children ( Del Tufo et al., 2018 ; Pugh et al., 2014 ), while elevated Glu levels in the anterior cingulate cortex (ACC) corresponded to greater phonological skills ( Lebel et al., 2016 ). Elevated GABA in the left inferior frontal gyrus was linked to reduced verbal fluency in adults ( Nakai and Okanoya, 2016 ), and increased GABA in the midline occipital cortex in children was associated with slower reaction times in a linguistic task ( Del Tufo et al., 2018 ). However, notable null findings exist regarding dyslexia status and Glu levels in the ACC among children ( Horowitz-Kraus et al., 2018 ) as well as Glu and GABA levels in the visual and temporo-parietal cortices in both children and adults ( Kossowski et al., 2019 ).

Both beta (∼13-28 Hz) and gamma (> 30 Hz) oscillations may serve as E/I balance indicators ( Ahmad et al., 2022 ), as greater GABA-ergic activity has been associated with greater beta power ( Jensen et al., 2005 ; Porjesz et al., 2002 ) and gamma power or peak frequency ( Brunel and Wang, 2003 ; Chen et al., 2017 ). Resting-state analyses often reported nonsignificant beta power associations with dyslexia ( Babiloni et al., 2012 ; Fraga González et al., 2018 ; Xue et al., 2020 ); however, one study indicated lower beta power in dyslexic compared to control boys ( Fein et al., 1986 ). Mixed results were also observed during tasks. One study found decreased beta power in the dyslexic group ( Spironelli et al., 2008 ), while another found increased beta power relative to the control group ( Rippon and Brunswick, 2000 ). A nonsignificant relationship between resting gamma power and dyslexia was reported ( Babiloni et al., 2012 ; Lasnick et al., 2023 ). When analyzing auditory steady-state responses, the dyslexic group had a lower gamma peak frequency, while no significant differences in gamma power were observed ( Rufener and Zaehle, 2021 ). Essentially, the majority of studies in dyslexia examining gamma frequencies evaluated cortical entrainment to auditory stimuli ( Lehongre et al., 2011 ; Marchesotti et al., 2020 ; Van Hirtum et al., 2019 ). Therefore, the results from these tasks do not provide direct evidence of differences in either gamma power or peak frequency between the dyslexic and control groups.

The EEG signal comprises both oscillatory, periodic activity, and aperiodic activity, characterized by a gradual decrease in power as frequencies rise (1/f signal) ( Donoghue et al., 2020 ). Recently recognized as a biomarker of E/I balance, a lower exponent of signal decay (flatter slope) indicates a greater dominance of excitation over inhibition in the brain, as shown by simulation models of local field potentials, the ratio of AMPA/GABA-A synapses in the rat hippocampus ( Gao et al., 2017 ) and recordings under propofol or ketamine in macaques and humans ( Gao et al., 2017 ; Waschke et al., 2021 ). However, there are also pharmacological studies providing mixed results ( Colombo et al., 2019 ; Salvatore et al., 2024 ). Nonetheless, the 1/f signal has shown associations with various conditions putatively characterized by changes in E/I balance, such as early development in infancy ( Schaworonkow and Voytek, 2021 ), healthy aging ( Voytek et al., 2015 ) and neurodevelopmental disorders like ADHD ( Ostlund et al., 2021 ), autism spectrum disorder ( Manyukhina et al., 2022 ) or schizophrenia ( Molina et al., 2020 ). Despite its potential relevance, the evaluation of the 1/f signal in dyslexia remains limited to one study, revealing flatter slopes among dyslexic compared to control participants at rest ( Turri et al., 2023 ), thereby lending support to the notion of neural noise in dyslexia.

Here, we examined both indirect (1/f signal, beta, and gamma oscillations during both rest and a spoken language task) and direct (Glu and GABA) biomarkers of E/I balance in participants with dyslexia and age-matched controls. The neural noise hypothesis predicts flatter slopes of the 1/f signal, decreased beta and gamma power, and higher Glu concentrations in the dyslexic group. Furthermore, we tested the relationships between different E/I measures. Flatter slopes of the 1/f signal should be related to higher Glu levels, while enhanced beta and gamma power should be related to increased GABA levels.

No evidence for group differences in the EEG E/I biomarkers

We recruited 120 Polish adolescents and young adults – 60 with a dyslexia diagnosis and 60 controls matched for sex, age, and family socio-economic status. The dyslexic group scored lower in all reading and reading-related tasks and higher in the Polish version of the Adult Reading History Questionnaire (ARHQ-PL) ( Bogdanowicz et al., 2015 ), where a higher score indicates a higher risk of dyslexia (see Table S1 in the Supplementary Material). Although all participants were within the intellectual norm, the dyslexic group scored lower on the IQ scale (including nonverbal subscale only) than the control group. However, the Bayesian statistics did not provide evidence for the difference between groups in the nonverbal IQ.

We analyzed the aperiodic (exponent and offset) components of the EEG signal at rest and during a spoken language task, where participants listened to a sentence and had to indicate its veracity. Due to a technical error, the signal from one person (a female from the dyslexic group) was not recorded during most of the language task and was excluded from the analyses. Hence, the results are provided for 119 participants – 59 in the dyslexic and 60 in the control group.

First, aperiodic parameter values were averaged across all electrodes and compared between groups (dyslexic, control) and conditions (resting state, language task) using a 2×2 repeated measures ANOVA. Age negatively correlated both with the exponent ( r = -.27, p = .003, BF 10 = 7.96) and offset ( r = -.40, p < .001, BF 10 = 3174.29), in line with previous investigations ( Cellier et al., 2021 ; McSweeney et al., 2021 ; Schaworonkow and Voytek, 2021 ; Voytek et al., 2015 ); therefore, we included age as a covariate. Post-hoc tests are reported with Bonferroni corrected p -values.

For the mean exponent, we found a significant effect of age ( F (1,116) = 8.90, p = .003, η 2 p = .071, BF incl = 10.47), while the effects of condition ( F (1,116) = 2.32, p = .131, η 2 p = .020, BF incl = 0.39) and group ( F (1,116) = 0.08, p = .779, η 2 p = .001, BF incl = 0.40) were not significant and Bayes Factor did not provide evidence for either inclusion or exclusion. Interaction between group and condition ( F (1,116) = 0.16, p = .689, η 2 p = .001, BF incl = 0.21) was not significant and Bayes Factor indicated against including it in the model.
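The partial eta squared values reported alongside each F statistic can be recovered from F and its degrees of freedom through the standard identity η²p = F·df1 / (F·df1 + df2). The helper below is a small sketch of that identity (the function name is ours), applied to the age effect on the mean exponent reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Age effect on the mean exponent: F(1,116) = 8.90
eta_age = partial_eta_squared(8.90, 1, 116)
print(round(eta_age, 3))  # 0.071
```

This kind of check is also a quick way to spot transcription errors in reported effect sizes.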

For the mean offset, we found significant effects of age ( F (1,116) = 22.57, p < .001, η 2 p = .163, BF incl = 1762.19) and condition ( F (1,116) = 23.04, p < .001, η 2 p = .166, BF incl > 10000) with post-hoc comparison indicating that the offset was lower in the resting state condition ( M = -10.80, SD = 0.21) than in the language task ( M = -10.67, SD = 0.26, p corr < .001). The effect of group ( F (1,116) = 0.00, p = .964, η 2 p = .000, BF incl = 0.54) was not significant while Bayes Factor did not provide evidence for either inclusion or exclusion. Interaction between group and condition was not significant ( F (1,116) = 0.07, p = .795, η 2 p = .001, BF incl = 0.22) and Bayes Factor indicated against including it in the model.
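The verbal labels attached to the inclusion Bayes Factors in this section are consistent with the conventional cut-offs of 3 and 1/3, although the thresholds are not stated explicitly here. The sketch below therefore treats them as an assumption and maps a Bayes Factor to one of the three verdicts used in the text; the function name and label strings are ours.

```python
def interpret_bf(bf, upper=3.0):
    """Label a Bayes Factor using symmetric cut-offs (assumed: 3 and 1/3)."""
    if bf > upper:
        return "evidence for inclusion"
    if bf < 1.0 / upper:
        return "evidence against inclusion"
    return "inconclusive"

# Values taken from the mean exponent and mean offset models above.
labels = {bf: interpret_bf(bf) for bf in (10.47, 0.54, 0.22)}
```

Under these assumed cut-offs, a nonsignificant p-value paired with a Bayes Factor between 1/3 and 3 is read as absence of evidence rather than evidence of absence.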

Next, we restricted analyses to language regions and averaged exponent and offset values from the frontal electrodes corresponding to the left (F7, FT7, FC5) and right inferior frontal gyrus (F8, FT8, FC6), as well as temporal electrodes, corresponding to the left (T7, TP7, TP9) and right superior temporal sulcus, STS (T8, TP8, TP10) ( Giacometti et al., 2014 ; Scrivener and Reader, 2022 ). A 2×2×2×2 (group, condition, hemisphere, region) repeated measures ANOVA with age as a covariate was applied. Power spectra from the left STS at rest and during the language task are presented in Figure 1A and C , while the results for the exponent, offset, and beta power are presented in Figure 1B and D .


Overview of the main results obtained in the study. (A) Power spectral densities averaged across 3 electrodes (T7, TP7, TP9) corresponding to the left superior temporal sulcus (STS) separately for dyslexic (DYS) and control (CON) groups at rest and (C) during the language task. (B) Plots illustrating results for the exponent, offset, and the beta power from the left STS electrodes at rest and (D ) during the language task. (E) Group results (CON > DYS) from the fMRI localizer task for words compared to the control stimuli (p < .05 FWE cluster threshold) and overlap of the MRS voxel placement across participants. (F) MRS spectra separately for DYS and CON groups. (G) Plots illustrating results for the Glu, GABA, Glu/GABA ratio and the Glu/GABA imbalance. (H ) Semi-partial correlation between offset at rest (left STS electrodes) and Glu controlling for age and gray matter volume (GMV).

For the exponent, there were significant effects of age ( F (1,116) = 14.00, p < .001, η 2 p = .108, BF incl = 11.46) and condition ( F (1,116) = 4.06, p = .046, η 2 p = .034, BF incl = 1.88); however, Bayesian statistics did not provide evidence for either including or excluding the condition factor. Furthermore, post-hoc comparisons did not reveal significant differences between the exponent at rest ( M = 1.51, SD = 0.17) and during the language task ( M = 1.51, SD = 0.18, p corr = .546). There was also a significant interaction between region and group, although Bayes Factor indicated against including it in the model ( F (1,116) = 4.44, p = .037, η 2 p = .037, BF incl = 0.25). Post-hoc comparisons indicated that the exponent was higher in the frontal than in the temporal region both in the dyslexic ( M frontal = 1.54, SD frontal = 0.15, M temporal = 1.49, SD temporal = 0.18, p corr < .001) and in the control group ( M frontal = 1.54, SD frontal = 0.17, M temporal = 1.46, SD temporal = 0.20, p corr < .001). The difference between groups was not significant either in the frontal ( p corr = .858) or temporal region ( p corr = .441). The effects of region ( F (1,116) = 1.17, p = .282, η 2 p = .010, BF incl > 10000) and hemisphere ( F (1,116) = 1.17, p = .282, η 2 p = .010, BF incl = 12.48) were not significant, although Bayesian statistics indicated in favor of including them in the model. Furthermore, the interactions between condition and group ( F (1,116) = 0.18, p = .673, η 2 p = .002, BF incl = 3.70), and between region, hemisphere, and condition ( F (1,116) = 0.11, p = .747, η 2 p = .001, BF incl = 7.83) were not significant; however, Bayesian statistics indicated in favor of including these interactions in the model. The effect of group ( F (1,116) = 0.12, p = .733, η 2 p = .001, BF incl = 1.19) was not significant, while Bayesian statistics did not provide evidence for either inclusion or exclusion. Any other interactions were not significant and Bayes Factor indicated against including them in the model.

In the case of offset, there were significant effects of condition ( F (1,116) = 20.88, p < .001, η 2 p = .153, BF incl > 10000) and region ( F (1,116) = 6.18, p = .014, η 2 p = .051, BF incl > 10000). For the main effect of condition, post-hoc comparison indicated that the offset was lower in the resting state condition ( M = -10.88, SD = 0.33) than in the language task ( M = -10.76, SD = 0.38, p corr < .001), while for the main effect of region, post-hoc comparison indicated that the offset was lower in the temporal ( M = -10.94, SD = 0.37) as compared to the frontal region ( M = -10.69, SD = 0.34, p corr < .001). There was also a significant effect of age ( F (1,116) = 20.84, p < .001, η 2 p = .152, BF incl = 0.23) and a significant interaction between condition and hemisphere ( F (1,116) = 4.35, p = .039, η 2 p = .036, BF incl = 0.21), although Bayes Factor indicated against including these factors in the model. Post-hoc comparisons for the condition*hemisphere interaction indicated that the offset was lower in the resting state condition than in the language task both in the left ( M rest = -10.85, SD rest = 0.34, M task = -10.73, SD task = 0.40, p corr < .001) and in the right hemisphere ( M rest = -10.91, SD rest = 0.31, M task = -10.79, SD task = 0.37, p corr < .001) and that the offset was lower in the right as compared to the left hemisphere both at rest ( p corr < .001) and during the language task ( p corr < .001). The interactions between region and condition ( F (1,116) = 1.76, p = .187, η 2 p = .015, BF incl > 10000), hemisphere and group ( F (1,116) = 1.58, p = .211, η 2 p = .013, BF incl = 1595.18), region and group ( F (1,116) = 0.27, p = .605, η 2 p = .002, BF incl = 9.32), as well as between region, condition, and group ( F (1,116) = 0.21, p = .651, η 2 p = .002, BF incl = 2867.18) were not significant, although Bayesian statistics indicated in favor of including them in the model. The effect of group ( F (1,116) = 0.18, p = .673, η 2 p = .002, BF incl < 0.00001) was not significant and Bayesian statistics indicated against including it in the model. Any other interactions were not significant and Bayesian statistics indicated against including them in the model or did not provide evidence for either inclusion or exclusion.

Then, we analyzed the aperiodic-adjusted brain oscillations. Since the algorithm did not find the gamma peak (30-43 Hz) above the aperiodic component in the majority of participants, we report the results only for the beta (14-30 Hz) power. We performed a similar regional analysis as for the exponent and offset with a 2×2×2×2 (group, condition, hemisphere, region) repeated measures ANOVA. However, we did not include age as a covariate, as it did not correlate with any of the periodic measures. The sample size was 117 (DYS n = 57, CON n = 60) since in 2 participants the algorithm did not find the beta peak above the aperiodic component in the left frontal electrodes during the task.
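One way to picture "aperiodic-adjusted" power is as the height of the log power spectrum above the fitted aperiodic line within a band. The sketch below is a deliberate simplification under that assumption: it takes the largest residual between 14 and 30 Hz and reports no peak when the residual stays below a threshold. The actual algorithm models peaks differently, and the function name, the `min_height` threshold, and the synthetic spectra are all hypothetical.

```python
import math

def beta_power_above_aperiodic(freqs, power, offset, exponent,
                               band=(14.0, 30.0), min_height=0.05):
    """Largest log10 power above the aperiodic line within `band`.

    Returns None when nothing in the band exceeds `min_height`
    (an assumed, illustrative threshold for "no peak found").
    """
    best = None
    for f, p in zip(freqs, power):
        if band[0] <= f <= band[1]:
            aperiodic = offset - exponent * math.log10(f)
            residual = math.log10(p) - aperiodic
            if best is None or residual > best:
                best = residual
    if best is None or best < min_height:
        return None
    return best

# Synthetic spectra: one with a 20 Hz bump of height 0.5, one purely aperiodic.
freqs = [float(f) for f in range(1, 44)]
off, exp_ = -10.8, 1.5
flat = [10 ** (off - exp_ * math.log10(f)) for f in freqs]
bumped = [10 ** (off - exp_ * math.log10(f)
                 + 0.5 * math.exp(-((f - 20.0) ** 2) / (2 * 2.0 ** 2)))
          for f in freqs]
peak = beta_power_above_aperiodic(freqs, bumped, off, exp_)
no_peak = beta_power_above_aperiodic(freqs, flat, off, exp_)
```

The `None` case mirrors the situation described above, where participants without a detectable peak cannot contribute a value to the analysis.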

The analysis revealed a significant effect of condition ( F (1,115) = 8.58, p = .004, η 2 p = .069, BF incl = 5.82) with post-hoc comparison indicating that the beta power was greater during the language task ( M = 0.53, SD = 0.22) than at rest ( M = 0.50, SD = 0.19, p corr = .004). There were also significant effects of region ( F (1,115) = 10.98, p = .001, η 2 p = .087, BF incl = 23.71), and hemisphere ( F (1,115) = 12.08, p < .001, η 2 p = .095, BF incl = 23.91). For the main effect of region, post-hoc comparisons indicated that the beta power was greater in the temporal ( M = 0.52, SD = 0.21) as compared to the frontal region ( M = 0.50, SD = 0.19, p corr = .001), while for the main effect of hemisphere, post-hoc comparisons indicated that the beta power was greater in the right ( M = 0.52, SD = 0.20) than in the left hemisphere ( M = 0.51, SD = 0.20, p corr < .001). There was a significant interaction between condition and region ( F (1,115) = 12.68, p < .001, η 2 p = .099, BF incl = 55.26) with greater beta power during the language task as compared to rest significant in the temporal ( M rest = 0.50, SD rest = 0.20, M task = 0.55, SD task = 0.24, p corr < .001), while not in the frontal region ( M rest = 0.49, SD rest = 0.18, M task = 0.51, SD task = 0.22, p corr = .077). Also, greater beta power in the temporal as compared to the frontal region was significant during the language task ( p corr < .001), while not at rest ( p corr = .283). The effect of group ( F (1,115) = 0.05, p = .817, η 2 p = .000, BF incl < 0.00001) was not significant and Bayes Factor indicated against including it in the model. Any other interactions were not significant and Bayesian statistics indicated against including them in the model or did not provide evidence for either inclusion or exclusion.

Additionally, building upon previous findings which demonstrated differences in dyslexia in aperiodic and periodic components within the parieto-occipital region ( Turri et al., 2023 ), we have included analyses for the same cluster of electrodes in the Supplementary Material. However, in this region, we also did not find evidence for group differences either in the exponent, offset or beta power.

No evidence for group differences in Glu and GABA concentrations in the left STS

In total, 59 out of 120 participants underwent an MRS session on a 7T MRI scanner - 29 from the dyslexic group (13 females, 16 males) and 30 from the control group (14 females, 16 males). The MRS voxel was placed in the left STS, in a region showing the highest activation for both visual and auditory words (compared to control stimuli), localized individually in each participant based on an fMRI task (see Figure 1E for overlap of the MRS voxel placement across participants and Figure 1F for MRS spectra). We decided to analyze the neurometabolites’ levels derived from the left STS, as this region is consistently related to functional and structural differences in dyslexia across languages ( Yan et al., 2021 ).

Due to insufficient magnetic homogeneity or interruption of the study by the participants, 5 participants from the dyslexic group had to be excluded. We excluded a further 4 participants due to poor quality of the obtained spectra; thus, the results for Glu are reported for 50 participants - 21 in the dyslexic (12 females, 9 males) and 29 in the control group (13 females, 16 males). In the case of GABA, we additionally excluded 3 participants based on the Cramér-Rao Lower Bounds (CRLB) > 20%. Therefore, the results for GABA, Glu/GABA ratio and Glu/GABA imbalance are reported for 47 participants - 20 in the dyslexic (12 females, 8 males) and 27 in the control group (11 females, 16 males). Demographic and behavioral characteristics for the subsample of 47 participants are provided in Table S2.

For each metabolite, we performed a separate univariate ANCOVA with the effect of group being tested and voxel’s gray matter volume (GMV) as a covariate (see Figure 1G ). For the Glu analysis, we also included age as a covariate, due to the negative correlation between these variables ( r = -.35, p = .014, BF 10 = 3.41). The analysis revealed a significant effect of GMV ( F (1,46) = 8.18, p = .006, η 2 p = .151, BF incl = 12.54), while the effects of age ( F (1,46) = 3.01, p = .090, η 2 p = .061, BF incl = 1.15) and group ( F (1,46) = 1.94, p = .170, η 2 p = .040, BF incl = 0.63) were not significant and Bayes Factor did not provide evidence for either inclusion or exclusion.

Conversely, GABA did not correlate with age ( r = -.11, p = .481, BF 10 = 0.23); thus, age was not included as a covariate. The analysis revealed a significant effect of GMV ( F (1,44) = 4.39, p = .042, η 2 p = .091, BF incl = 1.64); however, Bayes Factor did not provide evidence for either inclusion or exclusion. The effect of group was not significant ( F (1,44) = 0.49, p = .490, η 2 p = .011, BF incl = 0.35) and Bayesian statistics did not provide evidence for either inclusion or exclusion.

Also, Glu/GABA ratio did not correlate with age ( r = -.05, p = .744, BF 10 = 0.19), therefore age was not included as a covariate. The results indicated that the effect of GMV was not significant ( F (1,44) = 0.95, p = .335, η 2 p = .021, BF incl = 0.43) while Bayes Factor did not provide evidence for either inclusion or exclusion. The effect of group was not significant ( F (1,44) = 0.01, p = .933, η 2 p = .000, BF incl = 0.29) and Bayes Factor indicated against including it in the model.

Following a recent study examining developmental changes in both EEG and MRS E/I biomarkers ( McKeon et al., 2024 ), we calculated an additional measure of Glu/GABA imbalance, computed as the absolute residual value from the linear regression of Glu predicted by GABA, with greater values indicating greater Glu/GABA imbalance. As in the previous work ( McKeon et al., 2024 ), we took the square root of this value to ensure a normal distribution of the data. This measure did not correlate with age ( r = -.05, p = .719, BF 10 = 0.19); thus, age was not included as a covariate. The results indicated that the effect of GMV was not significant ( F (1,44) = 0.63, p = .430, η 2 p = .014, BF incl = 0.37) while Bayes Factor did not provide evidence for either inclusion or exclusion. The effect of group was not significant ( F (1,44) = 0.74, p = .396, η 2 p = .016, BF incl = 0.39) and Bayesian statistics did not provide evidence for either inclusion or exclusion.
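The imbalance measure described above can be sketched in a few lines: regress Glu on GABA, take the absolute residual for each participant, and apply a square root. The code below is a minimal illustration using ordinary least squares with a single predictor; the function name and the toy values are hypothetical and are not study data.

```python
import math

def glu_gaba_imbalance(glu, gaba):
    """Square root of the absolute residual from the regression Glu ~ GABA."""
    n = len(glu)
    mean_gaba, mean_glu = sum(gaba) / n, sum(glu) / n
    sxx = sum((x - mean_gaba) ** 2 for x in gaba)
    sxy = sum((x - mean_gaba) * (y - mean_glu) for x, y in zip(gaba, glu))
    slope = sxy / sxx
    intercept = mean_glu - slope * mean_gaba
    return [math.sqrt(abs(y - (intercept + slope * x)))
            for x, y in zip(gaba, glu)]

# Toy values: the fitted line is flat at 1, so the residuals are -1, 2, -1.
imbalance = glu_gaba_imbalance(glu=[0.0, 3.0, 0.0], gaba=[0.0, 1.0, 2.0])
```

Because the absolute value is taken before the square root, a participant with relatively too much Glu and one with relatively too little can receive the same imbalance score.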

Correspondence between Glu and GABA concentrations and EEG E/I biomarkers is limited

Next, we investigated correlations between Glu and GABA concentrations in the left STS and EEG markers of E/I balance. Semi-partial correlations were performed ( Table 1 ) to control for confounding variables - for Glu the effects of age and GMV were regressed, for GABA, Glu/GABA ratio and Glu/GABA imbalance the effect of GMV was regressed, while for exponents and offsets the effect of age was regressed. For zero-order correlations between variables see Table S3.
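The residualization step behind these semi-partial correlations can be sketched as follows: the covariate is regressed out of a variable, and the residuals are then correlated with the other measure. For brevity the sketch handles only a single covariate per variable (for Glu, which has two covariates, a multiple regression would be needed); the function names and the numbers are hypothetical.

```python
import math

def residualize(values, covariate):
    """Residuals of `values` after a simple linear regression on `covariate`."""
    n = len(values)
    mean_c, mean_v = sum(covariate) / n, sum(values) / n
    scc = sum((c - mean_c) ** 2 for c in covariate)
    scv = sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values))
    slope = scv / scc
    return [(v - mean_v) - slope * (c - mean_c)
            for c, v in zip(covariate, values)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: the age effect is removed from the exponent before
# correlating it with a GMV-adjusted metabolite measure.
age = [15.0, 17.0, 19.0, 21.0, 23.0, 25.0]
exponent = [1.62, 1.55, 1.57, 1.49, 1.46, 1.47]
gaba_adjusted = [0.02, -0.01, 0.03, -0.02, -0.03, 0.01]
r = pearson_r(residualize(exponent, age), gaba_adjusted)
```

By construction, the residualized variable is uncorrelated with the covariate that was regressed out.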


Semi-partial Correlations Between Direct and Indirect Markers of Excitatory-Inhibitory Balance. For Glu the Effects of Age and Gray Matter Volume (GMV) Were Regressed, for GABA, Glu/GABA Ratio and Glu/GABA Imbalance the Effect of GMV was Regressed, While for Exponents and Offsets the Effect of Age was Regressed

Glu negatively correlated with offset in the left STS both at rest ( r = -.38, p = .007, BF 10 = 6.28; Figure 1H ) and during the language task ( r = -.37, p = .009, BF 10 = 5.05), while any other correlations between Glu and EEG markers were not significant and Bayesian statistics indicated in favor of null hypothesis or provided absence of evidence for either hypothesis. Furthermore, Glu/GABA imbalance positively correlated with exponent at rest both averaged across all electrodes ( r = .29, p = .048, BF 10 = 1.21), as well as in the left STS electrodes ( r = .35, p = .017, BF 10 = 2.87) although Bayes Factor provided absence of evidence for either alternative or null hypothesis. Conversely, GABA and Glu/GABA ratio were not significantly correlated with any of the EEG markers and Bayesian statistics indicated in favor of null hypothesis or provided absence of evidence for either hypothesis.

Testing the paths from neural noise to reading

The neural noise hypothesis of dyslexia predicts that neural noise affects reading through the impairment of 1) phonological awareness, 2) lexical access and generalization and 3) multisensory integration ( Hancock et al., 2017 ). Therefore, we analyzed correlations between these variables, reading skills and direct and indirect markers of E/I balance. For the composite score of phonological awareness, we averaged z-scores from phoneme deletion, phoneme and syllable spoonerisms tasks. For the composite score of lexical access and generalization we averaged z-scores from objects, colors, letters and digits subtests from rapid automatized naming (RAN) task, while for the composite score of reading we averaged z-scores from words and pseudowords read per minute, and text reading time in reading comprehension task. The outcomes from the RAN and reading comprehension task have been transformed from raw time scores to items/time scores in order to provide the same direction of relationships for all z-scored measures, with greater values indicating better skills. For the multisensory integration score we used results from the redundant target effect task reported in our previous work ( Glica et al., 2024 ), with greater values indicating a greater magnitude of multisensory integration.
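The composite scores described above follow a simple recipe: convert any time-based outcome to an items/time rate so that larger is better, z-score each measure across participants, and average the z-scores per participant. The sketch below illustrates that recipe; the function names, the use of the sample standard deviation, the item count, and the numbers are all assumptions made for illustration.

```python
import statistics

def items_per_time(n_items, seconds):
    """Convert raw completion times to rates so that higher means better."""
    return [n_items / s for s in seconds]

def zscores(values):
    """Standardize values (sample standard deviation is an assumption here)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def composite(*measures):
    """Average the z-scores of several measures, participant by participant."""
    z_by_measure = [zscores(m) for m in measures]
    return [statistics.mean(zs) for zs in zip(*z_by_measure)]

# Hypothetical data for three participants: two naming subtests timed in
# seconds, each with an assumed 50 items.
ran_objects = items_per_time(50, [40.0, 50.0, 62.5])
ran_digits = items_per_time(50, [20.0, 25.0, 31.25])
ran_composite = composite(ran_objects, ran_digits)
```

After the rate transformation, the fastest participant ends up with the highest composite score, matching the direction of the other z-scored measures.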

Age positively correlated with multisensory integration ( r = .38, p < .001, BF 10 = 87.98), composite scores of reading ( r = .22, p = .014, BF 10 = 2.24) and phonological awareness ( r = .21, p = .021, BF 10 = 1.59), while not with the composite score of RAN ( r = .13, p = .151, BF 10 = 0.32). Hence, we regressed the effect of age from multisensory integration, reading and phonological awareness scores and performed semi-partial correlations ( Table 2 , for zero-order correlations see Table S4).


Semi-partial Correlations Between Reading, Phonological Awareness, Rapid Automatized Naming, Multisensory Integration and Markers of Excitatory-Inhibitory Balance. For Reading, Phonological Awareness and Multisensory Integration the Effect of Age was Regressed, for Glu the Effects of Age and Gray Matter Volume (GMV) Were Regressed, for GABA, Glu/GABA Ratio and Glu/GABA Imbalance the Effect of GMV was Regressed, While for Exponents and Offsets the Effect of Age was Regressed

Phonological awareness positively correlated with offset in the left STS at rest ( r = .18, p = .049, BF 10 = 0.77) and with beta power in the left STS both at rest ( r = .23, p = .011, BF 10 = 2.73; Figure 2A ) and during the language task ( r = .23, p = .011, BF 10 = 2.84; Figure 2B ), although Bayes Factor provided absence of evidence for either alternative or null hypothesis. Furthermore, multisensory integration positively correlated with GABA concentration ( r = .31, p = .034, BF 10 = 1.62) and negatively with Glu/GABA ratio ( r = -.32, p = .029, BF 10 = 1.84), although Bayes Factor provided absence of evidence for either alternative or null hypothesis. Any other correlations between reading skills and E/I balance markers were not significant and Bayesian statistics indicated in favor of null hypothesis or provided absence of evidence for either hypothesis.


Associations between beta power, phonological awareness and reading. (A) Semi-partial correlation between phonological awareness controlling for age and beta power (in the left STS electrodes) at rest and (B) during the language task. (C) Partial correlation between phonological awareness and reading controlling for age. (D) Mediation analysis results. Unstandardized b regression coefficients are presented. Age was included in the analysis as a covariate. 95% CI - 95% confidence intervals. left STS - values averaged across 3 electrodes corresponding to the left superior temporal sulcus (T7, TP7, TP9).

Given that beta power correlated with phonological awareness, and considering the prediction that neural noise impedes reading by affecting phonological awareness, we examined this relationship through a mediation model. Since phonological awareness correlated with beta power in the left STS both at rest and during the language task, the outcomes from these two conditions were averaged prior to the mediation analysis. We used the PROCESS macro v4.2 ( Hayes, 2017 ) in IBM SPSS Statistics v29 with model 4 (simple mediation) and 5000 bootstrap samples to assess the significance of the indirect effect. Since age correlated both with phonological awareness and reading, we also included age as a covariate.

The results indicated that both effects of beta power in the left STS ( b = .96, t (116) = 2.71, p = .008, BF incl = 7.53) and age ( b = .06, t (116) = 2.55, p = .012, BF incl = 5.98) on phonological awareness were significant. The effect of phonological awareness on reading was also significant ( b = .69, t (115) = 8.16, p < .001, BF incl > 10000), while the effects of beta power ( b = -.42, t (115) = -1.25, p = .213, BF incl = 0.52) and age ( b = .03, t (115) = 1.18, p = .241, BF incl = 0.49) on reading were not significant when controlling for phonological awareness. Finally, the indirect effect of beta power on reading through phonological awareness was significant ( b = .66, SE = .24, 95% CI = [.24, 1.18]), while the total effect of beta power was not significant ( b = .24, t (116) = 0.61, p = .546, BF incl = 0.41). The results from the mediation analysis are presented in Figure 2D .
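In a simple mediation model the indirect effect is the product of the two path coefficients, and the total effect is the sum of the indirect and direct effects. The values reported above follow this pattern: .96 × .69 ≈ .66, and .66 + (−.42) = .24. The sketch below illustrates the logic with a percentile bootstrap on synthetic data. It is not the PROCESS macro; it omits the age covariate, assumes a percentile interval, and all names and data are hypothetical.

```python
import random

def indirect_effect(x, m, y):
    """a * b, with a from M ~ X and b the coefficient of M in Y ~ X + M."""
    n = len(x)
    mean_x, mean_m, mean_y = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    smm = sum((mi - mean_m) ** 2 for mi in m)
    sxm = sum((xi - mean_x) * (mi - mean_m) for xi, mi in zip(x, m))
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    smy = sum((mi - mean_m) * (yi - mean_y) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def bootstrap_ci(x, m, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic data in which X influences Y only through M.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(150)]
m = [0.8 * xi + rng.gauss(0, 0.5) for xi in x]
y = [0.7 * mi + rng.gauss(0, 0.5) for mi in m]
ab = indirect_effect(x, m, y)
ci_low, ci_high = bootstrap_ci(x, m, y, n_boot=500)
```

An interval that excludes zero is what is meant above by a significant indirect effect, and it can occur even when the total effect is not significant.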

Although a similar mediation analysis could have been conducted for the Glu/GABA ratio, multisensory integration, and reading based on the correlations between these variables, we did not test this model due to the small sample size (47 participants), which resulted in insufficient statistical power.

The current study aimed to validate the neural noise hypothesis of dyslexia ( Hancock et al., 2017 ) utilizing E/I balance biomarkers from EEG power spectra and ultra-high-field MRS. Contrary to its predictions, we did not observe differences either in 1/f slope, beta power, or Glu and GABA concentrations in participants with dyslexia. Relations between E/I balance biomarkers were limited to significant correlations between Glu and the offset when controlling for age, and between Glu/GABA imbalance and the exponent.

In terms of indirect markers, our study found no evidence of group differences in the aperiodic components of the EEG signal. In most of the models, we did not find evidence for either including or excluding the effect of the group when Bayesian statistics were evaluated. The only exception was the regional analysis for the offset, where results indicated against including the group factor in the model. These findings diverge from previous research on an Italian cohort, which reported decreased exponent and offset in the dyslexic group at rest, specifically within the parieto-occipital region, but not the frontal region ( Turri et al., 2023 ). Despite our study involving twice the number of participants and utilizing a longer acquisition time, we observed no group differences, even in the same cluster of electrodes (refer to Supplementary Material). The participants in both studies were of similar ages. The only methodological difference – EEG acquisition with eyes open in our study versus both eyes-open and eyes-closed in the work by Turri and colleagues (2023) – cannot fully account for the overall lack of group differences observed. The diverging study outcomes highlight the importance of considering potential inflation of effect sizes in studies with smaller samples.

Although a lower exponent of the EEG power spectrum has been associated with other neurodevelopmental disorders, such as ADHD ( Ostlund et al., 2021 ) or ASD (but only in children with IQ below average) ( Manyukhina et al., 2022 ), our study suggests that this is not the case for dyslexia. Considering the frequent comorbidity of dyslexia and ADHD ( Germanò et al., 2010 ; Langer et al., 2019 ), increased neural noise could serve as a common underlying mechanism for both disorders. However, our specific exclusion of participants with a comorbid ADHD diagnosis indicates that the EEG spectral exponent cannot serve as a neurobiological marker for dyslexia in isolation. No information regarding such exclusion criteria was provided in the study by Turri et al. (2023) ; thus, potential comorbidity with ADHD may explain the positive findings related to dyslexia reported therein.

Regarding the aperiodic-adjusted oscillatory EEG activity, Bayesian statistics for beta power indicated in favor of excluding the group factor from the model. Non-significant group differences in beta power at rest have been previously reported in studies that did not account for aperiodic components ( Babiloni et al., 2012 ; Fraga González et al., 2018 ; Xue et al., 2020 ). This again contrasts with the study by Turri et al. (2023) , which observed lower aperiodic-adjusted beta power (at 15-25 Hz) in the dyslexic group. Concerning beta power during task, our results also contrast with previous studies which showed either reduced ( Spironelli et al., 2008 ) or increased ( Rippon and Brunswick, 2000 ) beta activity in participants with dyslexia. Nevertheless, since both of these studies employed phonological tasks and involved children’s samples, their relevance to our work is limited.

In terms of direct neurometabolite concentrations derived from the MRS, we found no evidence for group differences in either Glu, GABA or Glu/GABA imbalance in the language-sensitive left STS. Conversely, the Bayes Factor suggested against including the group factor in the model for the Glu/GABA ratio. While no previous study has localized the MRS voxel based on the individual activation levels, nonsignificant group differences in Glu and GABA concentrations within the temporo-parietal and visual cortices have been reported in both children and adults ( Kossowski et al., 2019 ), as well as in the ACC in children ( Horowitz-Kraus et al., 2018 ). Although our MRS sample size was half that of the EEG sample, previous research reporting group differences in Glu concentrations involved an even smaller dyslexic cohort (10 participants with dyslexia and 45 typical readers in Pugh et al., 2014 ). Consistent with earlier studies that identified group differences in Glu and GABA concentrations ( Del Tufo et al., 2018 ; Pugh et al., 2014 ), we reported neurometabolite levels relative to total creatine (tCr), indicating that the absence of corresponding results cannot be ascribed to reference differences. Notably, our analysis of the fMRI localizer task revealed greater activation in the control group as compared to the dyslexic group within the left STS for words compared to control stimuli (see Figure 1E and the Supplementary Material), in line with previous observations ( Blau et al., 2009 ; Dębska et al., 2021 ; Yan et al., 2021 ).

Irrespective of dyslexia status, we found negative correlations between age and exponent and offset, consistent with previous research ( Cellier et al., 2021 ; McSweeney et al., 2021 ; Schaworonkow and Voytek, 2021 ; Voytek et al., 2015 ) and providing further evidence for maturational changes in the aperiodic components (indicative of increased E/I ratio). At the same time, in line with previous MRS works ( Kossowski et al., 2019 ; Marsman et al., 2013 ), we observed a negative correlation between age and Glu concentrations. This suggests a contrasting pattern to EEG results, indicating a decrease in neuronal excitation with age. We also found a condition-dependent change in offset, with a lower offset observed at rest than during the language task. The offset value represents the uniform shift in power across frequencies ( Donoghue et al., 2020 ), with a higher offset linked to increased neuronal spiking rates ( Manning et al., 2009 ). Change in offset between conditions is consistent with observed increased alpha and beta power during the task, indicating elevated activity in both broadband (offset) and narrowband (alpha and beta oscillations) frequency ranges during the language task.

In regard to relationships between EEG and MRS E/I balance biomarkers, we observed a negative correlation between the offset in the left STS (both at rest and during the task) and Glu levels, after controlling for age and GMV. This correlation was not observed in zero-order correlations (see Supplementary Material). Contrary to our predictions, informed by previous studies linking the exponent to E/I ratio ( Colombo et al., 2019 ; Gao et al., 2017 ; Waschke et al., 2021 ), we found the correlation with Glu levels to involve the offset rather than the exponent. This outcome was unexpected, as none of the referenced studies reported results for the offset. However, given the strong correlation between the exponent and offset observed in our study ( r = .68, p < .001, BF 10 > 10000 and r = .72, p < .001, BF 10 > 10000 at rest and during the task respectively), it is conceivable that a similar association might have been identified for the offset in those studies had it been analyzed.

Nevertheless, previous studies examining relationships between EEG and MRS E/I balance biomarkers ( McKeon et al., 2024 ; van Bueren et al., 2023 ) did not identify a similar negative association between Glu and the offset. Instead, one study noted a positive correlation between the Glu/GABA ratio and the exponent ( van Bueren et al., 2023 ), which was significant in the intraparietal sulcus but not in the middle frontal gyrus. This finding presents counterintuitive evidence, suggesting that an increased E/I balance, as indicated by MRS, is associated with a higher aperiodic exponent, considered indicative of decreased E/I balance. In line with this pattern, another study discovered a positive relationship between the exponent and Glu levels in the dorsolateral prefrontal cortex ( McKeon et al., 2024 ). Furthermore, they observed a positive correlation between the exponent and the Glu/GABA imbalance measure, calculated as the absolute residual value of a linear relationship between Glu and GABA ( McKeon et al., 2024 ), a finding replicated in the current work. This implies that a higher spectral exponent might not be directly linked to MRS-derived Glu or GABA levels, but rather to a greater disproportion (in either direction) between these neurotransmitters. These findings, alongside the contrasting relationships between EEG and MRS biomarkers and age, suggest that these methods may reflect distinct biological mechanisms of E/I balance.

Evidence regarding associations between neurotransmitter levels and oscillatory activity also remains mixed. One study found a positive correlation between gamma peak frequency and GABA concentration in the visual cortex (Muthukumaraswamy et al., 2009), a finding later challenged by a study with a larger sample (Cousijn et al., 2014). Similarly, one study noted a positive correlation between GABA in the left STS and gamma power (Balz et al., 2016), whereas another found no significant relation between these measures (Wyss et al., 2017). Moreover, in a simultaneous EEG and MRS study, an event-related increase in Glu following visual stimulation was found to correlate with greater gamma power (Lally et al., 2014). We could not investigate such associations, as the algorithm failed to identify a gamma peak above the aperiodic component for the majority of participants. Also, contrary to previous findings showing associations between GABA in the motor and sensorimotor cortices and beta power (Cheng et al., 2017; Gaetz et al., 2011) or beta peak frequency (Baumgarten et al., 2016), we observed no correlation between Glu or GABA levels and beta power. However, these studies placed MRS voxels in motor regions, which are typically linked to movement-related beta activity (Baker et al., 1999; Rubino et al., 2006; Sanes and Donoghue, 1993), and did not adjust beta power for aperiodic components, which limits direct comparison with our findings.

Finally, we examined pathways posited by the neural noise hypothesis of dyslexia, through which increased neural noise may impact reading: phonological awareness, lexical access and generalization, and multisensory integration ( Hancock et al., 2017 ). Phonological awareness was positively correlated with the offset in the left STS at rest, and with beta power in the left STS, both at rest and during the task. Additionally, multisensory integration showed correlations with GABA and the Glu/GABA ratio. Since the Bayes Factor did not provide conclusive evidence supporting either the alternative or null hypothesis, these associations appear rather weak. Nonetheless, given the hypothesis’s prediction of a causal link between these variables, we further examined a mediation model involving beta power, phonological awareness, and reading skills. The results suggested a positive indirect effect of beta power on reading via phonological awareness, whereas both the direct (controlling for phonological awareness and age) and total effects (without controlling for phonological awareness) were not significant. This finding is noteworthy, considering that participants with dyslexia exhibited reduced phonological awareness and reading skills, despite no observed differences in beta power. Given the cross-sectional nature of our study, further longitudinal research is necessary to confirm the causal relation among these variables. The effects of GABA and the Glu/GABA ratio on reading, mediated by multisensory integration, warrant further investigation. Additionally, considering our finding that only males with dyslexia showed deficits in multisensory integration ( Glica et al., 2024 ), sex should be considered as a potential moderating factor in future analyses. We did not test this model here due to the smaller sample size for GABA measurements.

Our findings suggest that the neural noise hypothesis, as proposed by Hancock and colleagues (2017) , does not fully explain the reading difficulties observed in dyslexia. Despite the innovative use of both EEG and MRS biomarkers to assess excitatory-inhibitory (E/I) balance, neither method provided evidence supporting an E/I imbalance in dyslexic individuals. Importantly, our study focused on adolescents and young adults, and the EEG recordings were conducted during rest and a spoken language task. These factors may limit the generalizability of our results. Future research should include younger populations and incorporate a broader array of tasks, such as reading and phonological processing, to provide a more comprehensive evaluation of the E/I balance hypothesis. Additionally, our findings are consistent with another study by Tan et al. (2022) which found no evidence for increased variability (’noise’) in behavioral and fMRI response patterns in dyslexia. Together, these results highlight the need to explore alternative neural mechanisms underlying dyslexia and suggest that cortical hyperexcitability may not be the primary cause of reading difficulties.

In conclusion, while our study challenges the neural noise hypothesis as a sole explanatory framework for dyslexia, it also underscores the complexity of the disorder and the necessity for multifaceted research approaches. By refining our understanding of the neural underpinnings of dyslexia, we can better inform future studies and develop more effective interventions for those affected by this condition.

Materials and methods

Participants

A total of 120 Polish participants aged between 15.09 and 24.95 years ( M = 19.47, SD = 3.06) took part in the study. This included 60 individuals with a clinical diagnosis of dyslexia performed by the psychological and pedagogical counseling centers (28 females and 32 males) and 60 control participants without a history of reading difficulties (28 females and 32 males). All participants were right-handed, born at term, without any reported neurological/psychiatric diagnosis and treatment (including ADHD), without hearing impairment, with normal or corrected-to-normal vision, and IQ higher than 80 as assessed by the Polish version of the Abbreviated Battery of the Stanford-Binet Intelligence Scale-Fifth Edition (SB5) ( Roid et al., 2017 ).

The study was approved by the institutional review board at the University of Warsaw, Poland (reference number 2N/02/2021). All participants (or their parents in the case of underaged participants) provided written informed consent and received monetary remuneration for taking part in the study.

Reading and Reading-Related Tasks

Participants’ reading skills were assessed by multiple paper-pencil tasks described in detail in our previous work (Glica et al., 2024). Briefly, we evaluated words and pseudowords read in one minute (Szczerbiński and Pelc-Pękała, 2013), rapid automatized naming (Fecenec et al., 2013), and reading comprehension speed. We also assessed phonological awareness by a phoneme deletion task (Szczerbiński and Pelc-Pękała, 2013) and spoonerisms tasks (Bogdanowicz et al., 2016), as well as orthographic awareness (Awramiuk and Krasowicz-Kupis, 2013). Furthermore, we evaluated non-verbal perception speed (Ciechanowicz and Stańczak, 2006) and short-term and working memory by forward and backward conditions from the Digit Span subtest from the WAIS-R (Wechsler, 1981). We also assessed participants’ multisensory audiovisual integration by a redundant target effect task, the results of which have been reported in our previous work (Glica et al., 2024).

Electroencephalography Acquisition and Procedure

EEG was recorded from 62 scalp and 2 ear electrodes using the Brain Products system (actiCHamp Plus, Brain Products GmbH, Gilching, Germany). Data were recorded in BrainVision Recorder Software (Vers. 1.22.0002, Brain Products GmbH, Gilching, Germany) with a 500 Hz sampling rate. Electrodes were positioned in line with the extended 10-20 system. Electrode Cz served as the online reference and Fpz as the ground electrode. All electrode impedances were kept below 10 kΩ. Participants sat in a chair with their heads on a chin-rest in a dark, sound-attenuated, and electrically shielded room while the EEG was recorded during both a 5-minute eyes-open resting state and the spoken language comprehension task. The paradigm was prepared in the Presentation software (Version 20.1, Neurobehavioral Systems, Inc., Berkeley, CA, www.neurobs.com).

During rest, participants were instructed to relax and fixate their eyes on a white cross presented centrally on a black background. After 5 minutes, the spoken language comprehension task automatically started. The task consisted of 3 to 5 word-long sentences recorded in a speech synthesizer which were presented binaurally through sound-isolating earphones. After hearing a sentence, participants were asked to indicate whether the sentence was true or false by pressing a corresponding button. In total, there were 256 sentences – 128 true (e.g., “Plants need water”) and 128 false (e.g., “Dogs can fly”).

Sentences were presented in a random order in two blocks of 128 trials. At the beginning of each trial, a white fixation cross was presented centrally on a black background for 500 ms, then a blank screen appeared for either 500, 600, 700, or 800 ms (durations set randomly and equiprobably) followed by an auditory sentence presentation. The length of sentences ranged between 1.17 and 2.78 seconds and was balanced between true (M = 1.82 seconds, SD = 0.29) and false sentences (M = 1.82 seconds, SD = 0.32; t(254) = -0.21, p = .835; BF10 = 0.14). After a sentence presentation, a blank screen was displayed for 1000 ms before starting the next trial. To reduce participants’ fatigue, a 1-minute break between two blocks of trials was introduced, and it took approximately 15 minutes to complete the task.

fMRI Acquisition and Procedure

MRI data were acquired using a Siemens 3T Trio system with a 32-channel head coil. Structural data were acquired using a whole-brain 3D T1-weighted image (MP_RAGE, TI = 1100 ms, GRAPPA parallel imaging with acceleration factor PE = 2, voxel resolution = 1 mm³, dimensions = 256×256×176). Functional data were acquired using a whole-brain echo planar imaging sequence (TE = 30 ms, TR = 1410 ms, flip angle FA = 90°, FOV = 212 mm, matrix size = 92×92, 60 axial slices 2.3 mm thick, 2.3×2.3 mm in-plane resolution, multiband acceleration factor = 3). Due to a technical issue, data from two participants were acquired with a 12-channel coil (see Supplementary Material).

The fMRI task served as a localizer for later MRS voxel placement in the language-sensitive left STS. The task was prepared using Presentation software (Version 20.1, Neurobehavioral Systems, Inc., Berkeley, CA, www.neurobs.com) and consisted of three runs, each lasting 5 minutes and 9 seconds. Two runs involved the presentation of visual stimuli, while the third involved auditory stimuli. In each run, stimuli were presented in 12 blocks, with 14 stimuli per block. In visual runs, there were four blocks from each category: 1) words 3 to 4 letters long, 2) the same words presented as false font strings (BACS font) (Vidal et al., 2017), and 3) strings of 3 to 4 consonants. Similarly, in the auditory run, there were four blocks from each category: 1) words recorded in a speech synthesizer, 2) the same words presented backward, and 3) consonant strings recorded in a speech synthesizer. Stimuli within each block were presented for 800 ms with a 400 ms break in between. The duration of each block was 16.8 seconds. Between blocks, a fixation cross was displayed for 8 seconds. Participants performed a 1-back task to maintain focus. The blocks were presented in a pseudorandom order and each block included 2 to 3 repeated stimuli.
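
As a quick consistency check on the block design described above, the block count and block duration can be recomputed from the stimulus timing. This is only an arithmetic sketch. It assumes that every stimulus, including the last one in a block, is followed by the 400 ms break, and the constant names are hypothetical.

```python
STIM_MS = 800            # presentation time of one stimulus
GAP_MS = 400             # break after each stimulus (assumed to follow the last one too)
STIMULI_PER_BLOCK = 14
CATEGORIES = 3           # e.g. words, false fonts, consonant strings in visual runs
BLOCKS_PER_CATEGORY = 4

blocks_per_run = CATEGORIES * BLOCKS_PER_CATEGORY
block_s = STIMULI_PER_BLOCK * (STIM_MS + GAP_MS) / 1000

print(blocks_per_run)  # 12
print(block_s)         # 16.8
```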

MRS Acquisition and Procedure

The GE 7T system with a 32-channel coil was utilized. Structural data were acquired using whole brain 3D T1-weighted image (3D-SPGR BRAVO, TI = 450ms, TE = 2.6ms, TR = 6.6ms, flip angle = 12 deg, bandwidth = ±32.5kHz, ARC acceleration factor PE = 2, voxel resolution = 1mm, dimensions = 256 x 256 x 180). MRS spectra with 320 averages were acquired from the left STS using single-voxel spectroscopy semiLaser sequence ( Deelchand et al., 2021 ) (voxel size = 15 x 15 x 15 mm, TE = 28ms, TR = 4000ms, 4096 data points, water suppressed using VAPOR). Eight averages with unsuppressed water as a reference were collected.

To localize left STS, T1-weighted images from fMRI and MRS sessions were coregistered and fMRI peak coordinates were used as a center of voxel volume for MRS. Voxels were then adjusted to include only the brain tissue. During the acquisition, participants took part in a simple orthographic task.

Statistical Analyses

The continuous EEG signal was preprocessed in EEGLAB (Delorme and Makeig, 2004). The data were filtered between 0.5 and 45 Hz (Butterworth filter, 4th order) and re-referenced to the average of both ear electrodes. The data recorded during the break between blocks, as well as bad channels, were manually rejected. The number of rejected channels ranged between 0 and 4 (M = 0.19, SD = 0.63). Next, independent component analysis (ICA) was applied. Components were automatically labeled by ICLabel (Pion-Tonachini et al., 2019), and those classified with 50-100% source probability as eye blinks, muscle activity, heart activity, channel noise, or line noise, or with 0-50% source probability as brain activity, were excluded. Components labeled as “other” were visually inspected, and those identified as eye blinks or muscle activity were also rejected. The number of rejected components ranged between 11 and 46 (M = 28.43, SD = 7.26). Previously rejected bad channels were interpolated using the nearest neighbor spline (Perrin et al., 1989, 1987).

The preprocessed data were divided into a 5-minute resting-state signal and a signal recorded during a spoken language comprehension task using MNE ( Gramfort, 2013 ) and custom Python scripts. The signal from the task was cut up based on the event markers indicating the beginning and end of a sentence. Only trials with correct responses given between 0 and 1000 ms after the end of a sentence were included. The signals recorded during every trial were further multiplied by the Tukey window with α = 0.01 in order to normalize signal amplitudes at the beginning and end of every trial. This allowed a smooth concatenation of signals recorded during task trials, resulting in a continuous signal derived only when participants were listening to the sentences.
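
The tapering and concatenation step described above can be sketched as follows. This is a simplified, single-channel illustration and not the authors' script. The window is intended to follow the standard tapered-cosine (Tukey) definition, and the function names `tukey` and `concatenate_trials` and the example trials are hypothetical.

```python
import math


def tukey(n_samples, alpha):
    """Tukey (tapered cosine) window: flat in the middle, cosine taper at both edges."""
    if n_samples == 1 or alpha <= 0:
        return [1.0] * n_samples
    edge = alpha * (n_samples - 1) / 2      # taper length in samples (may be fractional)
    window = []
    for i in range(n_samples):
        d = min(i, n_samples - 1 - i)       # distance to the nearest edge
        if d < edge:
            window.append(0.5 * (1 - math.cos(math.pi * d / edge)))
        else:
            window.append(1.0)
    return window


def concatenate_trials(trials, alpha=0.01):
    """Taper every trial so it starts and ends at zero, then join the trials."""
    out = []
    for trial in trials:
        w = tukey(len(trial), alpha)
        out.extend(x * wi for x, wi in zip(trial, w))
    return out


# Two hypothetical 1-second trials at 500 Hz with different constant levels
trials = [[5.0] * 500, [-3.0] * 500]
signal = concatenate_trials(trials)
```

With α = 0.01 only the first and last few samples of each trial are attenuated, so the junction between trials passes through zero while almost all of the recorded signal is left unchanged.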

The continuous signal from the resting state and the language task was epoched into 2-second-long segments. An automatic rejection criterion of +/-200 μV was applied to exclude epochs with excessive amplitudes. The number of epochs retained in the analysis ranged between 140–150 ( M = 149.66, SD = 1.20) in the resting state condition and between 102–226 ( M = 178.24, SD = 28.94) in the spoken language comprehension task.
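
The epoching and amplitude-based rejection above can be sketched for a single channel. The sketch assumes that the ±200 μV criterion is applied to the absolute amplitude of each sample (rather than, say, peak-to-peak amplitude), and the function name `epoch_and_reject` and the synthetic signal are hypothetical.

```python
def epoch_and_reject(signal, sfreq=500, epoch_s=2.0, threshold_uv=200.0):
    """Cut a continuous signal into non-overlapping epochs and drop noisy ones.

    Assumption: an epoch is rejected if any sample exceeds +/- threshold_uv.
    """
    n = int(sfreq * epoch_s)
    epochs = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    return [e for e in epochs if max(abs(x) for x in e) <= threshold_uv]


# Hypothetical 5-minute recording at 500 Hz with one large artifact
signal = [10.0] * (5 * 60 * 500)
signal[12345] = 350.0
kept = epoch_and_reject(signal)
print(len(kept))  # 149
```

A 5-minute recording yields at most 150 two-second epochs, which matches the upper bound of the range reported for the resting-state condition.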

Power spectral density (PSD) for 0.5-45 Hz in 0.5 Hz increments was calculated for every artifact-free epoch using Welch’s method for 2-second-long data segments windowed with a Hamming window with no overlap. The estimated PSDs were averaged for each participant and each channel separately for the resting state condition and the language task. Aperiodic and periodic (oscillatory) components were parameterized using the FOOOF method (Donoghue et al., 2020). For each PSD, we extracted parameters for the 1-43 Hz frequency range using the following settings: peak_width_limits = [1, 12], max_n_peaks = infinite, peak_threshold = 2.0, min_peak_height = 0.0, aperiodic_mode = ‘fixed’. Apart from broad-band aperiodic parameters (exponent and offset), we also extracted power, bandwidth, and the center frequency parameters for the theta (4-7 Hz), alpha (7-14 Hz), beta (14-30 Hz) and gamma (30-43 Hz) bands. Since in the majority of participants the algorithm did not find a peak above the aperiodic component in the theta and gamma bands, we calculated the results only for the alpha and beta bands. Results for periodic parameters other than beta power are reported in Supplementary Material.
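
To make the two aperiodic parameters concrete: in the ‘fixed’ aperiodic mode the aperiodic component is a straight line in log-log space, log10(P) = offset − exponent · log10(f). The sketch below recovers both parameters from a synthetic, peak-free spectrum with an ordinary least-squares line. It is a deliberate simplification and not the FOOOF algorithm, which additionally models oscillatory peaks and fits the aperiodic component iteratively. The function name `fit_aperiodic` and the synthetic values are hypothetical.

```python
import math


def fit_aperiodic(freqs, powers):
    """Least-squares line in log-log space: log10(P) = offset - exponent * log10(f)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in powers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    offset = mean_y - slope * mean_x
    return offset, -slope   # the exponent is the negative of the log-log slope


# Synthetic spectrum over 1-43 Hz in 0.5 Hz steps with offset 1.5 and exponent 1.2
freqs = [1 + 0.5 * k for k in range(85)]
powers = [10 ** 1.5 / f ** 1.2 for f in freqs]
offset, exponent = fit_aperiodic(freqs, powers)
```

In this parameterization the offset is the log-power of the line at 1 Hz, so a steeper spectrum (larger exponent) and a broadband upward shift (larger offset) are separate quantities, even though they were strongly correlated in the present data.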

Apart from the frequentist statistics, we also performed Bayesian statistics using JASP (JASP Team, 2023). For Bayesian repeated measures ANOVA, we reported the Bayes Factor for the inclusion of a given effect (BFincl) with the ’across matched model’ option, as suggested by Keysers and colleagues (2020), calculated as a likelihood ratio of models that include a specific factor to equivalent models that differ only in the absence of that factor. For Bayesian t-tests and correlations, we reported the BF10 value, indicating the ratio of the likelihood of the data under the alternative hypothesis to their likelihood under the null hypothesis. We considered BFincl/10 > 3 and BFincl/10 < 1/3 as evidence for the alternative and null hypotheses, respectively, and 1/3 < BFincl/10 < 3 as the absence of evidence (Keysers et al., 2020).
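
The decision rule above can be written as a small helper. The function name `interpret_bf` is hypothetical, and the handling of values exactly equal to 3 or 1/3 (treated here as absence of evidence) is an assumption, since the text specifies only strict inequalities.

```python
def interpret_bf(bf):
    """Map a Bayes factor (BF10 or BFincl) to the three categories used in the text."""
    if bf > 3:
        return "evidence for the alternative hypothesis"
    if bf < 1 / 3:
        return "evidence for the null hypothesis"
    return "absence of evidence"


print(interpret_bf(0.14))   # evidence for the null hypothesis
print(interpret_bf(1.5))    # absence of evidence
print(interpret_bf(10000))  # evidence for the alternative hypothesis
```

For example, the BF10 of 0.14 reported for the comparison of true and false sentence lengths falls below 1/3 and therefore counts as evidence that the two sets did not differ in length.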

MRS voxel localization in the native space

The data were analyzed using Statistical Parametric Mapping (SPM12, Wellcome Trust Centre for Neuroimaging, London, UK) run on MATLAB R2020b (The MathWorks Inc., Natick, MA, USA). First, all functional images were realigned to the participant’s mean. Then, T1-weighted images were coregistered to functional images for each subject. Finally, fMRI data were smoothed with a 6mm isotropic Gaussian kernel.

In each subject, the left STS was localized in the native space as a cluster in the middle and posterior left superior temporal sulcus, exhibiting higher activation for visual words versus false font strings and auditory words versus backward words (logical AND conjunction) at p < .01 uncorrected. For 6 participants, the threshold was lowered to p < .05 uncorrected, while for another 6 participants, the contrast from the auditory run was changed to auditory words versus fixation cross due to a lack of activation for other contrasts.

In the Supplementary Material, we also performed the group-level analysis of the fMRI data (Tables S5-S7 and Figure S1).

MRS data were analyzed using fsl-mrs version 2.0.7 ( Clarke et al., 2021 ). Data stored in pfile format were converted into NIfTI-MRS using spec2nii tool. We then used the fsl_mrs_preproc function to automatically perform coil combination, frequency and phase alignment, bad average removal, combination of spectra, eddy current correction, shifting frequency to reference peak and phase correction.

To obtain information about the percentage of WM, GM and CSF in the voxel we used the svs_segmentation with results of fsl_anat as an input. Voxel segmentation was performed on structural images from a 3T scanner, coregistered to 7T structural images in SPM12. Next, quantitative fitting was performed using fsl_mrs function. As a basis set, we utilized a collection of 27 metabolite spectra simulated using FID-A ( Simpson et al., 2017 ) and a script tailored for our experiment. We supplemented this with synthetic macromolecule spectra provided by fsl_mrs . Signals acquired with unsuppressed water served as water reference.

Spectra underwent quantitative assessment and visual inspection and those with linewidth higher than 20Hz, %CRLB higher than 20%, and poor fit to the model were excluded from the analysis (see Table S8 in the Supplementary Material for a detailed checklist). Glu and GABA concentrations were expressed as a ratio to total-creatine (tCr; Creatine + Phosphocreatine).
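
The quantitative part of this quality control and the tCr referencing can be sketched as below. The sketch assumes that a spectrum is excluded when any one of the criteria fails, and it represents the visual judgment of model fit as a boolean flag. The function names and example values are hypothetical.

```python
def passes_qc(linewidth_hz, crlb_percent, good_fit):
    """Keep a spectrum only if linewidth and %CRLB are at most 20 and the fit is acceptable.

    Assumption: failing any single criterion excludes the spectrum.
    """
    return linewidth_hz <= 20 and crlb_percent <= 20 and good_fit


def ratio_to_tcr(metabolite, creatine, phosphocreatine):
    """Express a metabolite concentration relative to total creatine (Cr + PCr)."""
    return metabolite / (creatine + phosphocreatine)


# Hypothetical values, for illustration only
print(passes_qc(12.0, 8.0, True))    # True
print(passes_qc(25.0, 8.0, True))    # False
print(ratio_to_tcr(9.0, 4.0, 2.0))   # 1.5
```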

Data Availability Statement

Behavioral data, raw and preprocessed EEG data, second-level fMRI data, preprocessed MRS data and the Python script for the analysis of preprocessed EEG data can be found at OSF: https://osf.io/4e7ps/

Acknowledgements

This study was supported by the National Science Centre grant (2019/35/B/HS6/01763) awarded to Katarzyna Jednoróg.

We gratefully acknowledge valuable discussions with Ralph Noeske from GE Healthcare, his support in setting up the protocol for ultra-high-field MR spectroscopy, and his sharing of the set-up for basis set simulation in FID-A.

Article and author information

For correspondence: Katarzyna Jednoróg

Version history

  • Sent for peer review : June 11, 2024
  • Preprint posted : June 12, 2024
  • Reviewed Preprint version 1 : September 5, 2024

© 2024, Glica et al.

This article is distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use and redistribution provided that the original author and source are credited.

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

