Sample Size Calculator

Find Out the Sample Size

This calculator computes the minimum number of samples necessary to meet the desired statistical constraints.

Confidence Level:  
Margin of Error:  
Population Proportion: Use 50% if not sure
Population Size: Leave blank if unlimited population size.
 

Find Out the Margin of Error

This calculator gives the margin of error (confidence interval) of an observation or survey.

Confidence Level:  
Sample Size:  
Population Proportion:  
Population Size: Leave blank if unlimited population size.
 


In statistics, information is often inferred about a population by studying a finite number of individuals from that population, i.e. the population is sampled, and it is assumed that characteristics of the sample are representative of the overall population. For the following, it is assumed that there is a population of individuals where some proportion, p , of the population is distinguishable from the other 1-p in some way; e.g., p may be the proportion of individuals who have brown hair, while the remaining 1-p have black, blond, red, etc. Thus, to estimate p in the population, a sample of n individuals could be taken from the population, and the sample proportion, p̂ , calculated for sampled individuals who have brown hair. Unfortunately, unless the full population is sampled, the estimate p̂ most likely won't equal the true value p , since p̂ suffers from sampling noise, i.e. it depends on the particular individuals that were sampled. However, sampling statistics can be used to calculate what are called confidence intervals, which are an indication of how close the estimate p̂ is to the true value p .

Statistics of a Random Sample

The uncertainty in a given random sample (namely, the expectation that the proportion estimate, p̂, is a good, but not perfect, approximation of the true proportion p) can be summarized by saying that the estimate p̂ is normally distributed with mean p and variance p(1-p)/n. For an explanation of why the sample estimate is normally distributed, study the Central Limit Theorem. As defined below, confidence level, confidence intervals, and sample sizes are all calculated with respect to this sampling distribution. In short, the confidence interval gives an interval around p in which an estimate p̂ is "likely" to be. The confidence level gives just how "likely" this is – e.g., a 95% confidence level indicates that it is expected that an estimate p̂ lies in the confidence interval for 95% of the random samples that could be taken. The confidence interval depends on the sample size, n (the variance of the sampling distribution is inversely proportional to n, meaning that the estimate gets closer to the true proportion as n increases); thus, an acceptable error in the estimate, called the margin of error, ε, can also be set, and the equation solved for the sample size required for the chosen confidence interval to be smaller than ε – a calculation known as "sample size calculation."
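The sampling-distribution claim above can be checked with a short simulation: drawing many random samples of size n from a population with true proportion p, the sample proportions p̂ cluster around p with standard deviation close to √(p(1-p)/n). A minimal sketch (the true proportion 0.3, sample size 1,000, and replication count are arbitrary illustration values, not from the text):

```python
import math
import random
import statistics

random.seed(42)

p = 0.3       # true population proportion (illustrative)
n = 1_000     # sample size per simulated survey
reps = 2_000  # number of simulated surveys

# Each replicate: sample n individuals, record the sample proportion p-hat.
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

print(f"mean of p-hat:         {statistics.mean(p_hats):.4f}  (true p = {p})")
print(f"sd of p-hat:           {statistics.stdev(p_hats):.4f}")
print(f"theory sqrt(p(1-p)/n): {math.sqrt(p * (1 - p) / n):.4f}")
```

The empirical mean and standard deviation of the simulated p̂ values land close to p and √(p(1-p)/n), as the normal approximation predicts.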

Confidence Level

The confidence level is a measure of certainty regarding how accurately a sample reflects the population being studied within a chosen confidence interval. The most commonly used confidence levels are 90%, 95%, and 99%, which each have their own corresponding z-scores (which can be found using an equation or widely available tables like the one provided below) based on the chosen confidence level. Note that using z-scores assumes that the sampling distribution is normally distributed, as described above in "Statistics of a Random Sample." Given that an experiment or survey is repeated many times, the confidence level essentially indicates the percentage of the time that the resulting interval found from repeated tests will contain the true result.

Confidence Level    z-score (±)
0.70                1.04
0.75                1.15
0.80                1.28
0.85                1.44
0.92                1.75
0.95                1.96
0.96                2.05
0.98                2.33
0.99                2.58
0.999               3.29
0.9999              3.89
0.99999             4.42
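The z-scores in the table come from the inverse of the standard normal CDF: for a two-sided confidence level CL, z = Φ⁻¹((1 + CL)/2). A quick sketch using Python's standard library:

```python
from statistics import NormalDist

def z_score(confidence_level: float) -> float:
    """Two-sided z-score for a given confidence level, e.g. 0.95 -> 1.96."""
    return NormalDist().inv_cdf((1 + confidence_level) / 2)

# Reproduce a few rows of the table above.
for cl in (0.70, 0.80, 0.90, 0.95, 0.99):
    print(f"{cl:.2f} -> {z_score(cl):.3f}")
```

Rounding to two decimals recovers the table values (e.g. 0.95 → 1.96, 0.99 → 2.58).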

Confidence Interval

In statistics, a confidence interval is an estimated range of likely values for a population parameter, for example, 40 ± 2 or 40 ± 5%. Taking the commonly used 95% confidence level as an example, if the same population were sampled multiple times, and interval estimates made on each occasion, in approximately 95% of the cases, the true population parameter would be contained within the interval. Note that the 95% probability refers to the reliability of the estimation procedure and not to a specific interval. Once an interval is calculated, it either contains or does not contain the population parameter of interest. Some factors that affect the width of a confidence interval include: size of the sample, confidence level, and variability within the sample.

There are different equations that can be used to calculate confidence intervals, depending on factors such as whether the population standard deviation is known or whether smaller samples (n < 30) are involved, among others. The calculator provided on this page calculates the confidence interval for a proportion and uses the following equations:

For an unlimited population:

    CI = p̂ ± z√(p̂(1 − p̂)/n)

For a finite population of size N:

    CI = p̂ ± z√(p̂(1 − p̂)/n × (N − n)/(N − 1))

where z is the z-score, p̂ is the sample proportion, n is the sample size, and N is the population size.

Within statistics, a population is a set of events or elements that have some relevance regarding a given question or experiment. It can refer to an existing group of objects, systems, or even a hypothetical group of objects. Most commonly, however, population is used to refer to a group of people, whether they are the number of employees in a company, number of people within a certain age group of some geographic area, or number of students in a university's library at any given time.

It is important to note that the equation needs to be adjusted when considering a finite population, as shown above. The (N-n)/(N-1) term in the finite population equation is referred to as the finite population correction factor, and is necessary because it cannot be assumed that all individuals in a sample are independent. For example, if the study population involves 10 people in a room with ages ranging from 1 to 100, and one of those chosen has an age of 100, the next person chosen is more likely to have a lower age. The finite population correction factor accounts for factors such as these. Refer below for an example of calculating a confidence interval with an unlimited population.

EX: Given that 120 people work at Company Q, 85 of which drink coffee daily, find the 99% confidence interval of the true proportion of people who drink coffee at Company Q on a daily basis.

p̂ = 85/120 ≈ 0.7083

CI = 0.7083 ± 2.58 × √(0.7083 × (1 − 0.7083)/120) = 0.7083 ± 0.1071

i.e., the 99% confidence interval is approximately 70.8% ± 10.7%, or about (60.1%, 81.5%).
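The Company Q example can be reproduced in a few lines; treating the population as unlimited, the 99% interval works out to roughly 70.8% ± 10.7% (the small difference from a hand calculation with z = 2.58 is rounding, since the exact z is 2.5758):

```python
import math
from statistics import NormalDist

n, successes = 120, 85   # people asked, daily coffee drinkers
confidence = 0.99

p_hat = successes / n
z = NormalDist().inv_cdf((1 + confidence) / 2)   # ~2.576 for 99%

margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"p-hat = {p_hat:.4f}, 99% CI = {p_hat:.4f} ± {margin:.4f}")
```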

Sample Size Calculation

Sample size is a statistical concept that involves determining the number of observations or replicates (the repetition of an experimental condition used to estimate the variability of a phenomenon) that should be included in a statistical sample. It is an important aspect of any empirical study requiring that inferences be made about a population based on a sample. Essentially, sample sizes are used to represent parts of a population chosen for any given survey or experiment. To carry out this calculation, set the margin of error, ε, or the maximum distance desired for the sample estimate to deviate from the true value. To do this, use the confidence interval equation above, set the term to the right of the ± sign equal to the margin of error, and solve the resulting equation for the sample size, n. The equations for calculating sample size are shown below.

For an unlimited population:

    n = z²p̂(1 − p̂)/ε²

For a finite population of size N:

    n' = n / (1 + z²p̂(1 − p̂)/(ε²N))

where z is the z-score, ε is the margin of error, N is the population size, and p̂ is the population proportion.

EX: Determine the sample size necessary to estimate the proportion of people shopping at a supermarket in the U.S. that identify as vegan with 95% confidence, and a margin of error of 5%. Assume a population proportion of 0.5, and unlimited population size. Remember that z for a 95% confidence level is 1.96. Refer to the table provided in the confidence level section for z scores of a range of confidence levels.

n = 1.96² × 0.5 × (1 − 0.5) / 0.05² = 384.16

Thus, for the case above, a sample size of at least 385 people would be necessary. In the above example, some studies estimate that approximately 6% of the U.S. population identify as vegan, so rather than assuming 0.5 for p̂, 0.06 would be used. If it were known that 40 out of 500 people who entered a particular supermarket on a given day were vegan, p̂ would then be 0.08.
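The vegan-survey calculation can be scripted directly; note how the assumed proportion changes the answer (p̂ = 0.5 gives the conservative maximum):

```python
import math
from statistics import NormalDist

def sample_size(confidence: float, margin: float, p: float) -> int:
    """Minimum n for estimating a proportion, unlimited population."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print(sample_size(0.95, 0.05, 0.50))  # conservative assumption -> 385
print(sample_size(0.95, 0.05, 0.06))  # using the ~6% vegan estimate -> 87
```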


SAMPLE SIZE CALCULATION FOR THESIS (MD/MS/DNB)


Determination of sample size from number of cases in pilot study.

Determining the sample size from a pilot study is the easiest way to determine the sample size for your study. With this approach, three values can be obtained from the pilot study:

(1) Standard Error of the Mean, (2) Precision, (3) Sample Size

The formula used for this type of sample size determination is:

N = (Z_α² × SD²) / d²

Z_α – statistical constant (1.96 for a 5% significance level)

SD – expected standard deviation (which can be obtained from previous studies or a pilot study)

d – precision / allowable error (corresponding to the effect size)
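As a sketch of this formula in code (the SD of 2.5 and precision of 0.5 are hypothetical values for illustration, not taken from any particular pilot study):

```python
import math

z_alpha = 1.96   # statistical constant for a 5% significance level
sd = 2.5         # expected SD (hypothetical; take from a pilot study)
d = 0.5          # precision / allowable error (hypothetical)

n = math.ceil((z_alpha**2 * sd**2) / d**2)
print(n)  # -> 97
```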

Remember the following general guidelines when determining sample size from a pilot study.

1. Use a pilot study that deals with the same topic as yours. For example, if your study is about neonatal seizures, then the pilot study must also deal with neonatal seizures.

2. Use a pilot study in which the number of cases is close to the number of cases you want to include in your study. For example, if you want to include 80 cases in your study, then use a pilot study in which close to 80 patients were included.

Standard Error Of Mean:

Sample Size (n):


Disclaimer!

There are many methods of sample size determination, and it is one of the first hurdles when someone starts writing a thesis. I have tried to give the simplest way of determining sample size. You need to show the method to your PG teacher before you include it in your thesis. First confirm with your PG teacher, and only then proceed.

Sample Size Calculator

Sample size estimation in clinical research: from randomized controlled trials to observational studies.

CCF Quantitative Health Department

Introduction


Wang, X. and Ji, X., 2020. Sample size estimation in clinical research: from randomized controlled trials to observational studies. Chest, 158(1), pp.S12-S20.

Wang, X. and Ji, X., 2020. Sample size formulas for different study designs: supplement document for sample size estimation in clinical research.


  • Continuous Outcome
  • Dichotomous Outcome
  • Time-to-event

Reference Example

Chow S-C, Shao J, Wang H, Lokhnygina Y. Sample Size Calculations in Clinical Research. Third ed: Chapman and Hall/CRC; 2017.

Type I error rate, \(\alpha\)

Power, \(1-\beta\)

Ratio of case to control, \(k\)

Allowable difference, \(d=\mu_T-\mu_C\)

Expected population standard deviation, \(\text{SD}\)

\(\delta (>0)\)

Drop rate (%, 0 ~ 99)

Margin on risk difference scale (\(\delta \geq 0)\)

Margin for log-scale odds ratio (\(\delta>0)\)

Schoenfeld D. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika. 1981;68(1):316-319.

Schoenfeld D. Sample-size formula for the proportional-hazards regression model. Biometrics. 1983;39(2):499-503.

Margin for log-scale hazard ratio (\(\delta\)>0)

Accrual time period, \(T_a\)

Follow-up time period, \(T_b\)

Hazard for the control group, \(\lambda_C\)

Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions. Third ed: John Wiley & Sons; 2013.

A case-control study of the relationship between smoking and CHD is planned. A sample of men with newly diagnosed CHD will be compared for smoking status with a sample of controls. Assume an equal number of cases and controls (i.e., \(k = 1\)). Previous surveys have shown that around 0.40 of males without CHD are smokers (i.e., \(p_0 = 0.4\)). To achieve 90% power (i.e., \(1-\beta = 0.9\)) at the 5% level of significance (i.e., \(\alpha = 0.05\)), the sample size needed to detect an odds ratio of 1.5 (i.e., \(OR = 1.5\) or \(p_1 = 0.5\)) is \(519\) cases and \(519\) controls, or \(538\) cases and \(538\) controls by incorporating the continuity correction.
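These numbers can be reproduced with the standard formula for comparing two independent proportions (as in Fleiss et al.), including the continuity correction; this is a sketch of that formula rather than the tool's exact internals:

```python
import math
from statistics import NormalDist

def case_control_n(p0: float, p1: float, alpha: float, power: float):
    """Per-group n for comparing two proportions (k = 1, two-sided test),
    without and with the continuity correction."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    d = abs(p1 - p0)
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / d ** 2
    n_cc = n / 4 * (1 + math.sqrt(1 + 4 / (n * d))) ** 2  # continuity correction
    return math.ceil(n), math.ceil(n_cc)

print(case_control_n(p0=0.4, p1=0.5, alpha=0.05, power=0.9))  # -> (519, 538)
```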

Dupont WD. Power calculations for matched case-control studies. Biometrics. 1988;44(4):1157-1168.

Suppose a researcher conducts a matched case-control study to assess whether bladder cancer may be associated with past exposure to cigarette smoking. Cases will be patients with bladder cancer and controls will be patients hospitalised for injury. One case will be matched to one control (i.e., \(k = 1\)), and the correlation between case and control exposures for matched pairs is estimated to be 0.01 (low, i.e., \(r = 0.01\)). It is assumed that 20% of controls will be smokers or past smokers (i.e., \(p_0 = 0.2\)), and the researcher wishes to detect an odds ratio of 2 (i.e., \(OR = 2\) or \(p_1 = 0.67\)) with 90% power (i.e., \(1-\beta = 0.9\)). The sample size needed is \(16\) cases and \(16\) controls.

  • Independent
  • Proportional Outcome

Woodward M. Formulae for sample size, power and minimum detectable relative risk in medical studies. Journal of the Royal Statistical Society: Series D (The Statistician). 1992;41(2):185-196.

Fleiss JL, Tytun A, Ury HK. A simple approximation for calculating sample sizes for comparing independent proportions. Biometrics. 1980;36(2):343-346.

A government initiative has decided to reduce the prevalence of male smoking to 30% (i.e., \(p_1 = 0.3\)). A sample survey is planned to test, at the 0.05 level (i.e., \(\alpha = 0.05\)), the hypothesis that the percentage of smokers in the male population is 30% against the one-sided alternative that it is greater. The survey should be able to find a prevalence of 32% (i.e., \(p_0 = 0.32\)), when it is true, with 0.90 power (i.e., \(1-\beta=0.9\)). The survey needs to sample \(9158\) males pre-initiative and \(9158\) males post-initiative (or \(9257\) and \(9257\) by incorporating the continuity correction).
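The uncorrected figure in this example follows from the same two-proportion formula with a one-sided α; a sketch under that assumption:

```python
import math
from statistics import NormalDist

def two_prop_n(p0: float, p1: float, alpha: float, power: float) -> int:
    """Per-group n for a one-sided comparison of two independent proportions."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # one-sided test
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / (p0 - p1) ** 2
    return math.ceil(n)

print(two_prop_n(p0=0.32, p1=0.30, alpha=0.05, power=0.9))  # -> 9158
```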

Ratio of unexposed to exposed, \(k\)

Woodward M (2005). Epidemiology Study Design and Data Analysis. Chapman & Hall/CRC, New York, pp. 381 - 426.

Suppose we wish to test, at the 5% level of significance (i.e., \(\alpha = 0.05\)), the hypothesis that cholesterol means in a population are equal in two study years against the one-sided alternative that the mean is higher in the second of the two years. Suppose that equal sized samples will be taken in each year (i.e., \(k=1\)), but that these will not necessarily be from the same individuals (i.e., the two samples are drawn independently). Our test is to have a power of 0.95 (i.e., \(1-\beta = 0.95\)) at detecting a difference of 0.5 mmol/L (i.e., \(m_0 = 0, m_1 = 0.5\)). The standard deviation of serum cholesterol in humans is assumed to be 1.4 mmol/L (i.e., \(SD = 1.4\)). We need to test \(170\) in the first year and \(170\) in the second year.
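These numbers follow from the standard two-sample means formula, n = (z_α + z_β)² × 2SD² / d² per group, with a one-sided α; a sketch:

```python
import math
from statistics import NormalDist

def two_means_n(diff: float, sd: float, alpha: float, power: float) -> int:
    """Per-group n to detect a mean difference `diff` between two
    equal-sized independent samples (one-sided test)."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil((z_a + z_b) ** 2 * 2 * sd ** 2 / diff ** 2)

print(two_means_n(diff=0.5, sd=1.4, alpha=0.05, power=0.95))  # -> 170
```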

Breslow NE, Day NE, Heseltine E, Breslow NE. Statistical Methods in Cancer Research: The Design and Analysis of Cohort Studies. International Agency for Research on Cancer; 1987.

A matched cohort study is to be conducted to quantify the association between exposure A and an outcome B. Assume the prevalence of the event in the unexposed group is 0.60 (i.e., \(p_0 = 0.6\)) and the correlation between exposed and unexposed for matched pairs is 0.20 (moderate, i.e., \(r = 0.2\)). In order to detect a relative risk of 0.75 (i.e., \(RR=0.75\) or \(p_1 = 0.45\)) with 0.80 power (i.e., \(1-\beta = 0.8\)) using a two-sided 0.05 test (i.e., \(\alpha=0.05\)), there need to be \(1543\) unexposed and \(1543\) exposed.

Rubinstein LV, Gail MH, Santner TJ. Planning the duration of a comparative clinical trial with loss to follow-up and a period of continued observation. J Chronic Dis. 1981;34(9-10):469-479.

Suppose a two-arm prospective cohort study with a 1-year accrual period (the period of time during which patients enter the study, \(T_a = 1\)) and a 1-year follow-up period (the period of time after accrual has ended before the final analysis is conducted, \(T_b=1\)). Assume the hazard for the unexposed group is a constant risk over time at 0.5 (i.e., \(\lambda_0 = 0.5\)). To achieve 80% power (i.e., \(1-\beta=0.8\)) to detect a hazard ratio of 2 (i.e., \(HR = 2\)) in the hazard of the exposed group using a two-sided 0.05-level log-rank test (i.e., \(\alpha=0.05\)), the required sample size is \(53\) for the unexposed group and \(53\) for the exposed group.

Hazard for the unexposed group, \(\lambda_0\)

Woodward M. Formulae for sample size, power and minimum detectable relative risk in medical studies. Journal of the Royal Statistical Society: Series D (The Statistician). 1992;41(2):185-196

Suppose that the primary interest lies in comparing systolic blood pressure between two cities. Assume that simple random sampling from among 40-44-year-old men is to be used in each city, with twice as many sampled from City 1 as from City 2, so that \(k=2\). Systolic blood pressure is to be compared using a one-sided 5% significance test (i.e., \(\alpha = 0.05\)). The medical investigators wish to be 95% sure of detecting when the average blood pressure in City 1 exceeds that in City 2 by 3 mm Hg (i.e., \(1-\beta=0.95\) and \(m_1 = 3\), \(m_2 = 0\)). From published literature (Smith et al. 1989), the standard deviation of systolic blood pressure is likely to be 15.6 mm Hg (i.e., \(SD=15.6\)). The sample size required is \(878\) for City 1 and \(439\) for City 2.
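With unequal allocation (ratio k:1), the smaller group's size becomes n₂ = (1 + 1/k)(z_α + z_β)² SD² / d² and n₁ = k·n₂; a sketch of that formula reproduces the 878/439 split:

```python
import math
from statistics import NormalDist

def two_means_unequal(diff, sd, alpha, power, k):
    """Sample sizes (n1, n2) for comparing two means with allocation
    ratio n1:n2 = k:1 (one-sided test)."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    n2 = math.ceil((1 + 1 / k) * (z_a + z_b) ** 2 * sd ** 2 / diff ** 2)
    return k * n2, n2

print(two_means_unequal(diff=3, sd=15.6, alpha=0.05, power=0.95, k=2))  # -> (878, 439)
```

Setting k = 1 recovers the equal-allocation formula used in the cholesterol example above.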

Ratio of first samples to second samples, \(k\)

Suppose the estimated prevalence of smoking is higher among male students (around 50%, i.e., \(p_1 = 0.5\)) compared with female students (around 35%, i.e., \(p_2 = 0.35\)). In order to be 80% certain (i.e., \(1-\beta=0.8\)) of detecting a prevalence ratio of \(RR = 0.50 / 0.35 = 1.428\) using a 0.05 level of significance (i.e., \(\alpha =0.05\)) with an equal number of recruited males and females, the study needs to enroll \(170\) males and \(170\) females.

Cochran WG. Sampling Techniques. John Wiley & Sons; 1977.

Kotrlik, J. W., & Higgins, C. C. (2001). Organizational research: Determining appropriate sample size in survey research. Information Technology, Learning, and Performance Journal, 19(1), 43.

Suppose the researcher treats a seven (7) point scaled survey as continuous data. Suppose for the continuous variable the level of acceptable error is 3% of the scale (i.e., \(d = 0.21\)), and the estimated standard deviation of the scale is 1.167 (i.e., \(SD = 1.167\)). At the 5% Type I error rate (i.e., \(\alpha = 0.05\)), the sample size of the survey is \(119\).

Standard deviation of outcome, \(SD\)

Absolute error or precision, \(d\)

Suppose for the proportional variable the level of acceptable error is 5% (i.e., \(d = 0.05\)), and the expected proportion in the population is 0.5 (i.e., \(p = 0.5\)). At the 5% Type I error rate (i.e., \(\alpha = 0.05\)), the sample size of the survey is \(385\).
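Both survey calculations above (continuous and proportional outcomes) can be sketched with Cochran-style formulas:

```python
import math
from statistics import NormalDist

def survey_n_continuous(sd: float, d: float, alpha: float = 0.05) -> int:
    """n for estimating a continuous outcome to absolute precision d."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sd / d) ** 2)

def survey_n_proportion(p: float, d: float, alpha: float = 0.05) -> int:
    """n for estimating a proportion to absolute precision d."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

print(survey_n_continuous(sd=1.167, d=0.21))  # -> 119
print(survey_n_proportion(p=0.5, d=0.05))     # -> 385
```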

Expected proportion in population, \(p\)

  • Binary Outcome
  • Time-to-event Outcome

Riley R D, Ensor J, Snell K I E, Harrell F E, Martin G P, Reitsma J B et al. (2020). Calculating the sample size required for developing a clinical prediction model. BMJ, m441. doi: 10.1136/bmj.m441

Expected value of the (Cox-Snell) R-squared of the new model

Number of candidate predictor parameters for potential inclusion in the new model

Level of shrinkage desired at internal validation after developing the new model

Overall outcome proportion (for a prognostic model) or overall prevalence (for a diagnostic model)

C-statistic reported in an existing prediction model study

Overall event rate in the population of interest

Timepoint of interest for prediction in follow-up

Average (mean) follow-up time anticipated for individuals

Average outcome value in the population of interest

Standard deviation (SD) of outcome values in the population

Multiplicative margin of error (MMOE) acceptable for calculation of the intercept

Lu, Grace, "Sample Size Formulas For Estimating Areas Under the Receiver Operating Characteristic Curves With Precision and Assurance" (2021). Electronic Thesis and Dissertation Repository. 8045. https://ir.lib.uwo.ca/etd/8045

Area under ROC curve

Null hypothesis AUC value

Prevalence (ratio of positive cases / total sample size)

Power, \(1-\beta\)

Sample Size Calculator

Determines the minimum number of subjects for adequate study power.

Study Group Design

Two independent study groups

One study group vs. population

Primary Endpoint

Dichotomous (yes/no)

Continuous (means)

Statistical Parameters

Group 1

Standard deviation is determined by examining previous literature of a similar patient population.

Group 2
Enrollment ratio

For most studies, the enrollment ratio is 1 (ie, equal enrollment between both groups).

Some studies will have different enrollment ratios (2:1, 3:1) for additional safety data.

Group 1

Group 2
Known population

This value is determined by examining previous literature of a similar patient population.

Study group
Known population

The mean and standard deviation are determined by examining previous literature of a similar patient population.

Study group
Alpha

Most medical literature uses a value of 0.05.

Power

Most medical literature uses a value of 80-90% power (β of 0.1-0.2).


Dichotomous Endpoint, Two Independent Sample Study

Sample Size
  Group 1: 690
  Group 2: 690
  Total: 1380

Study Parameters
  Incidence, group 1: 35%
  Incidence, group 2: 28%
  Alpha: 0.05
  Beta: 0.2
  Power: 0.8


About This Calculator

This calculator uses a number of different equations to determine the minimum number of subjects that need to be enrolled in a study in order to have sufficient statistical power to detect a treatment effect.[1]

Before a study is conducted, investigators need to determine how many subjects should be included. By enrolling too few subjects, a study may not have enough statistical power to detect a difference (type II error). Enrolling too many patients can be unnecessarily costly or time-consuming.

Generally speaking, statistical power is determined by the following variables:

  • Baseline Incidence: If an outcome occurs infrequently, many more patients are needed in order to detect a difference.
  • Population Variance: The higher the variance (standard deviation), the more patients are needed to demonstrate a difference.
  • Treatment Effect Size: If the difference between two treatments is small, more patients will be required to detect a difference.
  • Alpha: The probability of a type-I error -- finding a difference when a difference does not exist. Most medical literature uses an alpha cut-off of 5% (0.05) -- indicating a 5% chance that a significant difference is actually due to chance and is not a true difference.
  • Beta: The probability of a type-II error -- not detecting a difference when one actually exists. Beta is directly related to study power (Power = 1 - β). Most medical literature uses a beta cut-off of 20% (0.2) -- indicating a 20% chance that a significant difference is missed.

Post-Hoc Power Analysis

To calculate the post-hoc statistical power of an existing trial, please visit the post-hoc power analysis calculator .

References and Additional Reading

  • Rosner B. Fundamentals of Biostatistics . 7th ed. Boston, MA: Brooks/Cole; 2011.

Related Calculators

  • Post-hoc Power Calculator




Sample Size Calculator

You can use this free sample size calculator to determine the sample size of a given survey based on the sample proportion, margin of error, and required confidence level.

Confidence Level (1 − α): 70% 75% 80% 85% 90% 91% 92% 93% 94% 95% 96% 97% 98% 99% 99.5% 99.9% 99.99%

What is Sample Size?

Some basic terms are of interest when calculating sample size. These are as follows:

Margin of Error: The margin of error is measured in percentage terms. It indicates the extent to which the outputs of the sample are reflective of the overall population. The lower the margin of error, the nearer the researcher is to having an accurate response at a given confidence level. To determine the margin of error, take a look at our margin of error calculator.

Sample Size Formula

The Sample Size Calculator uses the following formula:

n = z² × p(1 − p) / e²

where:

z is the z-score associated with the chosen confidence level,
p is the sample proportion (expressed as a decimal),
e is the margin of error (expressed as a decimal).

Example of a Sample Size Calculation:   Let's say we want to calculate the proportion of patients who have been discharged from a given hospital who are happy with the level of care they received while hospitalized at a 90% confidence level of the proportion within 4%. What sample size would we require?

where z = 1.645 for a confidence level of 90%, p = the proportion (expressed as a decimal), and e = the margin of error.
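Under the stated formula, and assuming maximum variability (p = 0.5, since the example does not state a proportion), the required sample size works out to 423:

```python
import math

z = 1.645   # z-score for a 90% confidence level
p = 0.5     # assumed proportion (maximum variability; not given in the example)
e = 0.04    # margin of error of 4%

n = math.ceil(z ** 2 * p * (1 - p) / e ** 2)
print(n)  # -> 423
```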

Desired Confidence Level    Z-Score
70%                         1.04
75%                         1.15
80%                         1.28
85%                         1.44
90%                         1.645
91%                         1.70
92%                         1.75
93%                         1.81
94%                         1.88
95%                         1.96
96%                         2.05
97%                         2.17
98%                         2.33
99%                         2.576
99.5%                       2.807
99.9%                       3.29
99.99%                      3.89

Reference: Daniel WW (1999). Biostatistics: A Foundation for Analysis in the Health Sciences. 7th edition. New York: John Wiley & Sons.

Sample Size Calculators

For designing clinical research.


  • Calculators
  • CI for proportion
  • CI for mean
  • Means - effect size
  • Means - sample size
  • Proportions - effect size
  • Proportions - sample size
  • CI for proportion - sample size
  • Survival analysis - sample size
  • CI for risk ratio
  • More calculators...


  • About calculating sample size

This project was supported by the National Center for Advancing Translational Sciences, National Institutes of Health, through UCSF-CTSI Grant Numbers UL1 TR000004 and UL1 TR001872. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.

Please cite this site wherever used in published work:

Kohn MA, Senyak J. Sample Size Calculators [website]. UCSF CTSI. 12 June 2024. Available at https://www.sample-size.net/ [Accessed 23 August 2024]

This site was last updated on June 12, 2024.


Sample Size Calculator

This calculator allows you to determine an appropriate sample size for your study, given different combinations of confidence, precision and variability.

For large populations, it uses Cochran's equation to perform the calculation.

For small populations of a known size, it uses Cochran's equation together with a population correction to calculate sample size.

Instructions

The default values we provide below will work well for many scenarios.

Precision Level is the margin of error you're prepared to tolerate - e.g., 5% means a result that is within 5 percentage points of the true population value.

Confidence Level is a measure of confidence in the precision of the result. For example, selecting 5% as the level of precision, and 95% as the confidence level, indicates a result that is within 5% of the real population value 95% of the time.

Estimated Proportion is a measure of variability. We suggest you leave this at 0.5 - maximum variability - unless you have prior knowledge about the population from which you are drawing your sample.

The final thing to note is that if you know the size of the population from which you wish to take a sample, you can select the Small Population option, and specify population size. This will result in a smaller sample.
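The two-step calculation the calculator describes can be sketched as: Cochran's n₀ first, then the small-population correction when a population size is known (the population of 2,000 used below is an arbitrary example value):

```python
import math
from statistics import NormalDist

def cochran_n(confidence: float, precision: float, p: float = 0.5,
              population=None) -> int:
    """Cochran's sample size; applies the finite-population correction
    when a population size is given."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / precision ** 2   # Cochran's equation
    if population is None:
        return math.ceil(n0)
    # Small-population correction shrinks the required sample.
    return math.ceil(n0 / (1 + (n0 - 1) / population))

print(cochran_n(0.95, 0.05))                   # large population -> 385
print(cochran_n(0.95, 0.05, population=2000))  # small population -> 323
```

As the instructions note, specifying a known population size yields a smaller required sample.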


Sample Size Calculation and Sample Size Justification

Sample size calculation is concerned with how much data we require to make a correct decision in a particular piece of research.  If we have more data, then our decision will be more accurate and there will be less error in the parameter estimate.  This doesn't necessarily mean that more is always best in sample size calculation.  A statistician with expertise in sample size calculation will need to apply statistical techniques and formulas in order to find the correct sample size accurately.

There are some basic formulas for sample size calculation, although the calculation differs from technique to technique.  For example, when comparing the means of two populations, a common rule of thumb is to use the t-test if the sample size is less than 30, and the z-test if the sample size is greater than 30.  If the population is small, then a larger fraction of it must be sampled, whereas for a large population the required sample is a smaller fraction of the population.  Sample size calculation will also differ with different margins of error.


Statistical consulting provides a priori sample size calculation, to tell you how many participants you need, and sample size justification, to justify the sample you can obtain.  Knowing the appropriate number of participants for your particular study and being able to justify your sample size is important to meeting your power and effect size requirements.  Using the appropriate power and establishing the effect size will tell you how many people you need to find statistically significant results.  Power and effect size measurements are also important to lending credibility to your study and are easily calculated by the experts at Statistics Solutions.

Sample size justification is as important as the sample size calculation.  If the sample size cannot be accurately justified, the researcher will not be able to make a valid inference.  Statistics Solutions can assist with determining the sample size / power analysis for your research study.  To learn more, visit our webpage on sample size / power analysis, or contact us today.

Tools to Calculate Sample Size and Power Analysis

Statistics Solutions offers tools to calculate sample size for populations and power analysis for your dissertation or research study.  Our sample size for populations calculator is available using the Intellectus Statistics application.  The Sample Size/Power Analysis Calculator with Write-up is a tool for anyone struggling with power analysis.  Simply identify the test to be conducted and the degrees of freedom where applicable (explained in the document), and the sample size/power analysis calculator will calculate your sample size for a power of .80 at an alpha of .05 for small, medium, and large effect sizes.  It then presents the write-up, with references, which can easily be integrated into your dissertation document.  Click here for a sample.  For questions about these or any of our products and services, please email [email protected] or call 877-437-8622.

Additional Resource Pages Related to Sample Size Calculation and Sample Size Justification:

  • Sample Size / Power Analysis
  • Statistical Power Analysis
  • Monte Carlo Methods
  • Sample Size Formula
  • Standard Error

Power Analysis Resources

Abraham, W. T., & Russell, D. W. (2008). Statistical power analysis in psychological research. Social and Personality Psychology Compass, 2 (1), 283-301.

Bausell, R. B., & Li, Y. -F. (2002). Power analysis for experimental research: A practical guide for the biological, medical and social sciences. Cambridge, UK: Cambridge University Press.

Bonett, D. G., & Seier, E. (2002). A test of normality with high uniform power. Computational Statistics & Data Analysis , 40 (3), 435-445.

Goodman, S. N., & Berlin, J. A. (1994). The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Annals of Internal Medicine, 121 (3), 200-206.

Jones, A., & Sommerlund, B. (2007). A critical discussion of null hypothesis significance testing and statistical power analysis within psychological research. Nordic Psychology, 59 (3), 223-230.

MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested covariance structure models: Power analysis and null hypotheses. Psychological Methods, 11 (1), 19-35.

Sahai, H., & Khurshid, A. (1996). Formulas and tables for the determination of sample sizes and power in clinical trials involving the difference of two populations: A review. Statistics in Medicine , 15 (1), 1-21.

Sample Size Calculator

What is sample size?

Sample size refers to the number of observations or participants in a product or web experiment. It plays a crucial role in the reliability and accuracy of tests, ensuring that results effectively represent the population you're studying.

How to calculate sample size

Calculating sample size involves considering several factors, including your confidence level, minimum detectable effect, and baseline conversion rate. Inputting these parameters into a sample size calculator helps you determine the minimum number of participants you need to detect a meaningful effect with a certain degree of certainty. This calculation ensures that your experiments are adequately powered to yield statistically significant results, providing reliable insights for decision making.
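As a concrete illustration of how those three inputs combine for a typical conversion experiment, here is a minimal sketch. It assumes a two-sided z-test on proportions with 80% power and an absolute minimum detectable effect; the function name and example numbers are illustrative, not part of any particular calculator.

```python
import math
from statistics import NormalDist

def ab_test_sample_size(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size to detect an absolute lift of `mde` over
    `baseline` (two-sided z-test on proportions, unpooled variance)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)            # e.g. 0.84 for 80% power
    p1, p2 = baseline, baseline + mde
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / mde ** 2
    return math.ceil(n)                             # round up to whole participants

# Detecting a 2-point absolute lift over a 10% baseline conversion rate:
print(ab_test_sample_size(0.10, 0.02))  # 3839 per group
```

Note how a smaller minimum detectable effect drives the required sample up quadratically: halving `mde` roughly quadruples the sample you need per group.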

GeoPoll

How to Determine Sample Size for a Research Study

Frankline Kibuacha | Apr 06, 2021 | 3 min read

This article will discuss considerations to put in place when determining your sample size and how to calculate the sample size.

Confidence Interval and Confidence Level

As we have noted before, when selecting a sample there are multiple factors that can impact the reliability and validity of results, including sampling and non-sampling errors. When thinking about sample size, the two measures of error that almost always accompany it are the confidence interval and the confidence level.

Confidence Interval (Margin of Error)

Confidence intervals measure the degree of uncertainty or certainty in a sampling method and how much uncertainty there is with any particular statistic. In simple terms, the confidence interval tells you how confident you can be that the results from a study reflect what you would expect to find if it were possible to survey the entire population being studied. The confidence interval is usually a plus or minus (±) figure. For example, if your confidence interval is 6 and 60% of your sample picks an answer, you can be confident that if you had asked the entire population, between 54% (60-6) and 66% (60+6) would have picked that answer.

Confidence Level

The confidence level refers to the percentage of probability, or certainty, that the confidence interval would contain the true population parameter if you drew random samples many times. For example, a 99% confidence level means that if you were to repeat an experiment or survey over and over again, 99 percent of the time the resulting intervals would capture the true population value.

The larger your sample size, the more confident you can be that their answers truly reflect the population. In other words, the larger your sample for a given confidence level, the smaller your confidence interval.

Standard Deviation

Another critical measure when determining the sample size is the standard deviation, which measures a data set’s distribution from its mean. In calculating the sample size, the standard deviation is useful in estimating how much the responses you receive will vary from each other and from the mean number, and the standard deviation of a sample can be used to approximate the standard deviation of a population.

The greater the variability in responses, the greater the standard deviation. For example, once you have sent out your survey, how much variance do you expect in the responses? That expected variation is what the standard deviation captures.

Population Size

As demonstrated through the calculation below, a sample size of about 385 will give you a sufficient sample size to draw assumptions of nearly any population size at the 95% confidence level with a 5% margin of error, which is why samples of 400 and 500 are often used in research. However, if you are looking to draw comparisons between different sub-groups, for example, provinces within a country, a larger sample size is required. GeoPoll typically recommends a sample size of 400 per country as the minimum viable sample for a research project, 800 per country for conducting a study with analysis by a second-level breakdown such as females versus males, and 1200+ per country for doing third-level breakdowns such as males aged 18-24 in Nairobi.

How to Calculate Sample Size

As we have defined all the necessary terms, let us briefly learn how to determine the sample size using a standard sample size formula, often called Fisher's formula (and equivalent to Cochran's sample size formula).

  • Determine the population size (if known).
  • Determine the confidence interval.
  • Determine the confidence level.
  • Determine the standard deviation (a standard deviation of 0.5 is a safe choice where the figure is unknown).
  • Convert the confidence level into a Z-score. This table shows the z-scores for the most common confidence levels:

    80% – 1.28
    85% – 1.44
    90% – 1.65
    95% – 1.96
    99% – 2.58
  • Put these figures into the sample size formula to get your sample size.

Here is an example calculation:

Say you choose to work with a 95% confidence level, a standard deviation of 0.5, and a confidence interval (margin of error) of ±5%. You just need to substitute the values into the formula:

((1.96)² × 0.5(1 − 0.5)) / (0.05)²

= (3.8416 × 0.25) / 0.0025

= 0.9604 / 0.0025

= 384.16

Rounding up, your sample size should be 385.

Fortunately, there are several online tools available to help you with this calculation. Here's an online sample size calculator from Easy Calculation. Just put in the confidence level, population size, and confidence interval, and the required sample size is calculated for you.
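The arithmetic above also translates directly into a few lines of code; here is a minimal sketch (the function name is illustrative):

```python
import math

def cochran_sample_size(z: float, p: float, e: float) -> int:
    """n = z**2 * p * (1 - p) / e**2, rounded up to a whole respondent."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

# 95% confidence (z = 1.96), p = 0.5, ±5% margin of error:
print(cochran_sample_size(1.96, 0.5, 0.05))  # 385
```

Using p = 0.5 maximizes p(1 − p), which is why it is the safe default when the true proportion is unknown.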

GeoPoll’s Sampling Techniques

With the largest mobile panel in Africa, Asia, and Latin America, and reliable mobile technologies, GeoPoll develops unique samples that accurately represent any population. See our country coverage here, or contact our team to discuss your upcoming project.

How to calculate sample size using a sample size formula.

Learn how to calculate sample size with a margin of error using these simple sample size formulas for your market research.

Anika Nishat

March 22, 2024

What is sample size determination and why is it important?

Finding an appropriate sample size, otherwise known as sample size determination, is a crucial first step in market research. Understanding why sample size is important is equally crucial. The answer: it ensures the robustness, reliability, and believability of your research findings. But how is sample size determined?

Calculating your sample size

During the course of your market research, you may be unable to reach the entire population you want to gather data about. While larger sample sizes bring you closer to a 1:1 representation of your target population, working with them can be time-consuming, expensive, and inconvenient. Small samples, on the other hand, risk yielding results that aren't representative of the target population. Determining the ideal sample size for statistical significance can be tricky, but it ensures your research yields reliable and actionable insights.

Luckily, you can easily identify an ideal subset that represents the population and produces strong, statistically significant results that don’t gobble up all of your resources. In this article, we’ll teach you how to calculate sample size with a margin of error to identify that subset.

Five steps to finding your sample size

  • Define your population size
  • Designate your margin of error
  • Determine your confidence level
  • Predict expected variance
  • Finalize your sample size

What counts as a good statistical sample size can vary depending on your research goals, but by following these five steps you'll ensure you get the right sample size for your research needs.

Define the size of your population.

Your sample size needs will differ depending on the true population size, i.e. the total number of people you're looking to draw conclusions about. That's why determining the minimum sample size for statistical significance is an important first step.

Defining the size of your population can be easier said than done. While there is a lot of population data available, you may be targeting a complex population, or one for which no reliable data currently exists.

Knowing the size of your population is more important when dealing with relatively small, easy-to-measure groups of people. If you're dealing with a larger population, take your best estimate, and roll with it.

This is the first input to a sample size formula, and a considered estimate will yield more accurate results than a rough guess, more faithfully reflecting the population.

Designate your margin of error

Random sample errors are inevitable whenever you're using a subset of your total population. Be confident that your results are accurate by designating how much error you intend to permit: that's your margin of error.

Sometimes called a "confidence interval," the margin of error indicates how much you're willing to let your sample mean differ from your population mean. It's often expressed alongside statistics as a plus-minus (±) figure, indicating a range you can be relatively certain about.

For example, say you survey a sample of your colleagues with a designated 3% margin of error and find that 65% of your sample uses some form of voice recognition technology at home. If you were to ask your entire office, you could be reasonably sure that between 62% and 68% use some form of voice recognition technology at home.
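The ± figure in that example can also be computed directly from a sample, rather than designated in advance. Here is a minimal sketch using the normal approximation; the 1,000-person sample is invented for illustration:

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation confidence interval
    for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # e.g. 1.96 for 95%
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# 65% of a hypothetical 1,000-person sample uses voice recognition at home:
moe = margin_of_error(0.65, 1000)
print(f"±{moe:.1%}")  # about ±3.0%
```

Notice that the margin of error shrinks with the square root of n: quadrupling your sample only halves the ± figure.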

Determine how confident you can be

Your confidence level reveals how certain you can be that the true proportion of the total population would pick an answer within a particular range. The most common confidence levels are 90%, 95%, and 99%. Researchers most often employ a 95% confidence level.

Don't confuse confidence levels with confidence intervals (i.e., margins of error). Remember the distinction by thinking about how the concepts relate to each other, so you can sample more confidently.

In our example from the previous step, when you put confidence levels and intervals together, you can say you're 95% certain that the true percentage of your colleagues who use voice recognition technology at home is within ± three percentage points from the sample mean of 65%, or between 62% and 68%.

Your confidence level corresponds to something called a "z-score." A z-score is a value that indicates how many standard deviations a given point lies below or above the population mean; here, it translates your chosen confidence level into a number you can plug into the sample size formula.

Z-scores for the most common confidence levels are:

  • 90% = 1.645
  • 95% = 1.96
  • 99% = 2.576

While not as commonly used, the z-score for an 80% confidence level is approximately 1.28. If you're using a different confidence level, consult a z-score table. A z-score calculator will also quickly determine the appropriate value for your chosen confidence level.
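Those tabulated values all come from the inverse CDF of the standard normal distribution, so you can reproduce them with Python's standard library rather than a lookup table:

```python
from statistics import NormalDist

# Two-sided z-scores for common confidence levels:
for level in (0.80, 0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(0.5 + level / 2)
    print(f"{level:.0%} -> {z:.3f}")
# 80% -> 1.282, 90% -> 1.645, 95% -> 1.960, 99% -> 2.576
```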

Predict variance by calculating standard deviation in a sample

The last thing you'll want to consider when calculating your sample size is the amount of variance you expect to see among participant responses.

The standard deviation in a sample measures how much individual data points deviate from the sample's average.

Don't know how much variance to expect? Use a standard deviation of 0.5 to make sure your group is large enough.

Finding your ideal sample size.

Now that you know what goes into determining sample size, you can easily calculate sample size online. Consider using a sample size calculator to ensure accuracy. Or, calculate it the old-fashioned way: by hand.

Below are two sample size calculations: one for a known population size and one for an unknown population size. (These are the standard normal-approximation formulas, where z is the z-score, p is the expected sample proportion, e is the margin of error, and N is the population size.)

Sample size for unknown population:

n = z² × p(1 − p) / e²

Sample size for known population (with finite population correction):

n = [z² × p(1 − p) / e²] / [1 + z² × p(1 − p) / (e² × N)]

Here's how the calculation works out for our voice recognition technology example in an office of 500 people, with a 95% confidence level and 5% margin of error:

(1.96² × 0.5 × 0.5) / 0.05² = 384.16, and applying the finite population correction gives 384.16 / (1 + 384.16/500) ≈ 217.2.

There you have it! Rounding up, 218 respondents are needed.

You can tweak some things if that number is too big to swallow.

Try increasing your margin of error or decreasing your confidence level. This will reduce the number of respondents necessary but, unfortunately, increase the chances of errors. Even so, understanding why trade-offs are necessary in sample size determination can help you make informed decisions.
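A minimal sketch of the full calculation, assuming Cochran's formula with a standard finite population correction (the function name and defaults are illustrative):

```python
import math
from typing import Optional

def sample_size(z: float = 1.96, p: float = 0.5, e: float = 0.05,
                population: Optional[int] = None) -> int:
    """Minimum sample size; applies the finite population correction
    when a population size is given."""
    n0 = z ** 2 * p * (1 - p) / e ** 2        # base sample size (Cochran)
    if population is not None:
        n0 = n0 / (1 + n0 / population)       # finite population correction
    return math.ceil(n0)                      # round up to whole respondents

print(sample_size())                  # unlimited population -> 385
print(sample_size(population=500))    # office of 500 people -> 218
```

Raising `e` or lowering `z` here demonstrates exactly the trade-off described above: fewer respondents required, but a wider margin of error or lower confidence.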

Summing Up Sample Size

Calculating sample size sounds complicated, but easy sample size formulas, and even calculators, are now available to make this tedious part of market research faster!

Once you've determined your sample size, you're ready to create and distribute your sample market research survey. This can be done through methods like running a focus group or even a customer satisfaction survey . Whatever you decide, you now have the information needed to make decisions with confidence.

Want to whip your research skills into shape? Check out our go-to eBook on writing discussion guides !

Determining sample size: how to make sure you get the correct sample size.

16 min read

Sample size can make or break your research project. Here's how to master the delicate art of choosing the right sample size.

What is sample size?

Sample size is the beating heart of any research project. It’s the invisible force that gives life to your data, making your findings robust, reliable and believable.

Sample size is what determines if you see a broad view or a focus on minute details; the art and science of correctly determining it involves a careful balancing act. Finding an appropriate sample size demands a clear understanding of the level of detail you wish to see in your data and the constraints you might encounter along the way.

Remember, whether you’re studying a small group or an entire population, your findings are only ever as good as the sample you choose.

Free eBook: Empower your market research efforts today

Let’s delve into the world of sampling and uncover the best practices for determining sample size for your research.

How to determine sample size

“How much sample do we need?” is one of the most commonly-asked questions and stumbling points in the early stages of  research design . Finding the right answer to it requires first understanding and answering two other questions:

  • How important is statistical significance to you and your stakeholders?
  • What are your real-world constraints?

At the heart of the first question is the goal to confidently differentiate between groups by describing meaningful differences as statistically significant. Statistical significance isn't a difficult concept, but it needs to be considered within the unique context of your research and your measures.

First, you should consider when you deem a difference to be meaningful in your area of research. While the standards for statistical significance are universal, the standards for “meaningful difference” are highly contextual.

For example, a 10% difference between groups might not be enough to merit a change in a marketing campaign for a breakfast cereal, but a 10% difference in efficacy of breast cancer treatments might quite literally be the difference between life and death for hundreds of patients. The exact same magnitude of difference has very little meaning in one context, but has extraordinary meaning in another. You ultimately need to determine the level of precision that will help you make your decision.

Within sampling, the lowest amount of magnification – or smallest sample size – could make the most sense, given the level of precision needed, as well as timeline and budgetary constraints.

If you’re able to detect statistical significance at a difference of 10%, and 10% is a meaningful difference, there is no need for a larger sample size, or higher magnification. However, if the study will only be useful if a significant difference is detected for smaller differences – say, a difference of 5% — the sample size must be larger to accommodate this needed precision. Similarly, if 5% is enough, and 3% is unnecessary, there is no need for a larger statistically significant sample size.

You should also consider how much you expect your responses to vary. When there isn't much variability in responses, any differences between groups will be small, and it takes a larger sample to be confident that those small differences are statistically significant.

For instance, it will take a much larger sample to find statistically significant differences between groups if you ask, "What month do you think Christmas is in?" than if you ask, "How many miles are there between the Earth and the moon?". In the former, nearly everybody will give the same answer, while the latter will produce a lot of variation in responses. Simply put, when your variables do not have a lot of variance, detecting differences requires larger sample sizes.

Statistical significance

The likelihood that the results of a study or experiment did not occur randomly or by chance, but are meaningful and indicate a genuine effect or relationship between variables.

Magnitude of difference

The size or extent of the difference between two or more groups or variables, providing a measure of the effect size or practical significance of the results.

Actionable insights

Valuable findings or conclusions drawn from  data analysis  that can be directly applied or implemented in decision-making processes or strategies to achieve a particular goal or outcome.

It’s crucial to understand the differences between the concepts of “statistical significance”, “magnitude of difference” and “actionable insights” – and how they can influence each other:

  • Even if there is a statistically significant difference, it doesn’t mean the magnitude of the difference is large: with a large enough sample, a 3% difference could be statistically significant
  • Even if the magnitude of the difference is large, it doesn’t guarantee that this difference is statistically significant: with a small enough sample, an 18% difference might not be statistically significant
  • Even if there is a large, statistically significant difference, it doesn’t mean there is a story, or that there are actionable insights

There is no way to guarantee statistically significant differences at the outset of a study – and that is a good thing.

Even with a sample size of a million, there simply may not be any differences – at least, any that could be described as statistically significant. And there are times when a lack of significance is positive.

Imagine if your main competitor ran a multi-million dollar ad campaign in a major city and a huge pre-post study to detect campaign effects, only to discover that there were no statistically significant differences in  brand awareness . This may be terrible news for your competitor, but it would be great news for you.

With Stats iQ™ you can analyze your research results and conduct significance testing

What are your real-world constraints?

As you determine your sample size, you should consider the real-world constraints to your research.

Factors revolving around timings, budget and target population are among the most common constraints, impacting virtually every study. But by understanding and acknowledging them, you can deftly navigate the practical constraints of your research when pulling together your sample.

Timeline constraints

Gathering a larger sample size naturally requires more time. This is particularly true for elusive audiences, those hard-to-reach groups that require special effort to engage. Your timeline could become an obstacle if it is particularly tight, causing you to rethink your sample size to meet your deadline.

Budgetary constraints

Every sample, whether large or small, inexpensive or costly, claims a portion of your budget. Samples are like an open market: some are inexpensive, others are pricey, but all have a price tag attached to them.

Population constraints

Sometimes the individuals or groups you’re interested in are difficult to reach; other times, they’re a part of an extremely small population. These factors can limit your sample size even further.

What’s a good sample size?

A good sample size really depends on the context and goals of the research. In general, a good sample size is one that accurately represents the population and allows for reliable statistical analysis.

Larger sample sizes are typically better because they reduce the likelihood of  sampling errors  and provide a more accurate representation of the population. However, larger sample sizes often increase the impact of practical considerations, like time, budget and the availability of your audience. Ultimately, you should be aiming for a sample size that provides a balance between statistical validity and practical feasibility.

4 tips for choosing the right sample size

Choosing the right sample size is an intricate balancing act, but following these four tips can take away a lot of the complexity.

1) Start with your goal

The foundation of your research is a clearly defined goal. You need to determine what you’re trying to understand or discover, and use your goal to guide your  research methods  – including your sample size.

If your aim is to get a broad overview of a topic, a larger, more diverse sample may be appropriate. However, if your goal is to explore a niche aspect of your subject, a smaller, more targeted sample might serve you better. You should always align your sample size with the objectives of your research.

2) Know that you can’t predict everything

Research is a journey into the unknown. While you may have hypotheses and predictions, it’s important to remember that you can’t foresee every outcome – and this uncertainty should be considered when choosing your sample size.

A larger sample size can help to mitigate some of the risks of unpredictability, providing a more diverse range of data and potentially more accurate results. However, you shouldn’t let the fear of the unknown push you into choosing an impractically large sample size.

3) Plan for a sample that meets your needs and considers your real-life constraints

Every research project operates within certain boundaries – commonly budget, timeline and the nature of the sample itself. When deciding on your sample size, these factors need to be taken into consideration.

Be realistic about what you can achieve with your available resources and time, and always tailor your sample size to fit your constraints – not the other way around.

4) Use best practice guidelines to calculate sample size

There are many established guidelines and formulas that can help you in determining the right sample size.

The easiest way to define your sample size is using a sample size calculator, or you can use a manual sample size calculation if you want to test your math skills. Cochran's formula is perhaps the best-known equation for calculating sample size, and it is widely used when the population is large or unknown.

Cochran's sample size formula:

n₀ = z² × p(1 − p) / e²

where z is the z-score for your confidence level, p is the estimated proportion of the population with the characteristic of interest, and e is the margin of error.

Beyond the formula, it’s vital to consider the confidence interval, which plays a significant role in determining the appropriate sample size – especially when working with a  random sample  – and the sample proportion. This represents the expected ratio of the target population that has the characteristic or response you’re interested in, and therefore has a big impact on your correct sample size.

If your population is small, or its variance is unknown, there are steps you can still take to determine the right sample size. Common approaches here include conducting a small pilot study to gain initial estimates of the population variance, and taking a conservative approach by assuming a larger variance to ensure a more representative sample size.

Empower your market research

Conducting meaningful research and extracting actionable intelligence are priceless skills in today’s ultra competitive business landscape. It’s never been more crucial to stay ahead of the curve by leveraging the power of market research to identify opportunities, mitigate risks and make informed decisions.

Equip yourself with the tools for success with our essential eBook,  “The ultimate guide to conducting market research” .

With this front-to-back guide, you’ll discover the latest strategies and best practices that are defining effective market research. Learn about practical insights and real-world applications that are demonstrating the value of research in driving business growth and innovation.

Learn how to determine sample size

To choose the correct sample size, you need to consider a few different factors that affect your research, and gain a basic understanding of the statistics involved. You’ll then be able to use a sample size formula to bring everything together and sample confidently, knowing that there is a high probability that your survey is statistically accurate.

The steps that follow are suitable for finding a sample size for continuous data, i.e. data measured on a numeric scale. They don't apply to categorical data, i.e. data sorted into categories like green, blue, male, female etc.

Stage 1: Consider your sample size variables

Before you can calculate a sample size, you need to determine a few things about the target population and the level of accuracy you need:

1. Population size

How many people are you talking about in total? To find this out, you need to be clear about who does and doesn’t fit into your group. For example, if you want to know about dog owners, you’ll include everyone who has at some point owned at least one dog. (You may include or exclude those who owned a dog in the past, depending on your research goals.) Don’t worry if you’re unable to calculate the exact number. It’s common to have an unknown number or an estimated range.

2. Margin of error (confidence interval)

Errors are inevitable – the question is how much error you’ll allow. The margin of error, also known as the confidence interval, expresses how much difference you’ll allow between the mean of your sample and the mean of your population. If you’ve ever seen a political poll on the news, you’ve seen a confidence interval and how it’s expressed. It will look something like this: “68% of voters said yes to Proposition Z, with a margin of error of +/- 5%.”

3. Confidence level

This is a separate step from the similarly named confidence interval in step 2. It deals with how confident you want to be that the actual mean falls within your margin of error. The most common confidence levels are 90%, 95%, and 99%.

4. Standard deviation

This step asks you to estimate how much the responses you receive will vary from each other and from the mean. A low standard deviation means that all the values will be clustered around the mean, whereas a high standard deviation means they are spread across a much wider range, with very small and very large outlying figures. Since you haven’t yet run your survey, a safe choice is a standard deviation of 0.5, which helps ensure your sample size is large enough.

Stage 2: Calculate sample size

Now that you’ve got answers for steps 1 – 4, you’re ready to calculate the sample size you need. This can be done using an  online sample size calculator  or with paper and pencil.

1. Find your Z-score

You need to turn your confidence level into a Z-score. Here are the Z-scores for the most common confidence levels:

  • 90% – Z Score = 1.645
  • 95% – Z Score = 1.96
  • 99% – Z Score = 2.576

If you chose a different confidence level, use this  Z-score table  (a resource owned and hosted by SJSU.edu) to find your score.
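If you’d rather compute the Z-score than look it up, the inverse normal CDF gives it directly. Here’s a minimal sketch in Python using the standard library’s `statistics` module (the function name `z_score` is our own, not from any library):

```python
from statistics import NormalDist

def z_score(confidence_level: float) -> float:
    """Two-tailed Z-score for a given confidence level (e.g. 0.95)."""
    # Split the leftover probability equally between the two tails,
    # then find the point of the standard normal that leaves the upper
    # tail beyond it.
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)

print(round(z_score(0.90), 3))  # 1.645
print(round(z_score(0.95), 3))  # 1.96
print(round(z_score(0.99), 3))  # 2.576
```

The printed values match the table above, so you can use this for any confidence level, not just the three common ones.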

2. Use the sample size formula

Plug your Z-score, standard deviation, and margin of error into the  sample size calculator  or use this sample size formula to work it out yourself:

Necessary sample size = (Z-score)² × StdDev × (1 − StdDev) / (margin of error)²

This equation is for an unknown population size or a very large population size. If your population is smaller and known, just  use the sample size calculator.

What does that look like in practice?

Here’s a worked example, assuming you chose a 95% confidence level, a 0.5 standard deviation, and a margin of error (confidence interval) of +/- 5%.

((1.96)² x 0.5(1 − 0.5)) / (0.05)²

= (3.8416 x 0.25) / 0.0025

= 0.9604 / 0.0025

= 384.16, so rounding up, 385 respondents are needed

Voila! You’ve just determined your sample size.




  • Open access
  • Published: 23 August 2024

Disrupted brain functional connectivity as early signature in cognitively healthy individuals with pathological CSF amyloid/tau

  • Abdulhakim Al-Ezzi 1 ,
  • Rebecca J. Arechavala   ORCID: orcid.org/0000-0002-9799-2610 2 ,
  • Ryan Butler 1 ,
  • Anne Nolty 3 ,
  • Jimmy J. Kang 4 ,
  • Shinsuke Shimojo   ORCID: orcid.org/0000-0002-1290-5232 5 ,
  • Daw-An Wu   ORCID: orcid.org/0000-0003-4296-3369 5 ,
  • Alfred N. Fonteh 1 ,
  • Michael T. Kleinman 2 ,
  • Robert A. Kloner 1 , 6 &
  • Xianghong Arakaki 1  

Communications Biology volume  7 , Article number:  1037 ( 2024 )


  • Cerebrospinal fluid proteins
  • Cognitive control
  • Diagnostic markers
  • Neurophysiology

Alterations in functional connectivity (FC) have been observed in individuals with Alzheimer’s disease (AD) with elevated amyloid (Aβ) and tau. However, it is not yet known whether directed FC is already influenced by Aβ and tau load in cognitively healthy (CH) individuals. A 21-channel electroencephalogram (EEG) was recorded from 46 CHs classified based on the cerebrospinal fluid (CSF) Aβ/tau ratio: pathological (CH-PAT) or normal (CH-NAT). Directed FC was estimated with Partial Directed Coherence in frontal, temporal, parietal, central, and occipital regions. We also examined the correlations between directed FC and various functional metrics, including neuropsychology, cognitive reserve, MRI volumetrics, and heart rate variability, between both groups. Compared to CH-NATs, the CH-PATs showed decreased FC from the temporal regions, indicating a loss of relative functional importance of the temporal regions. In addition, frontal regions showed enhanced FC in the CH-PATs compared to CH-NATs, suggesting neural compensation for the damage caused by the pathology. Moreover, CH-PATs showed greater FC in the frontal and occipital regions than CH-NATs. Our findings provide a useful and non-invasive EEG-based method to identify alterations in brain connectivity in CHs with a pathological versus normal CSF Aβ/tau ratio.

Introduction

Alzheimer’s disease (AD) is a neurological disorder in which progressive neurodegeneration and synaptic dysfunction result in impairments in a range of cognitive domains. With the continual rise of the global population and life expectancy, the prevalence of neurocognitive disorders or dementia is anticipated to surge substantially, reaching an estimated 74.7 million individuals by 2030 and more than 131.5 million by 2050 worldwide 1, 2. Recent research reported early impairments in executive functions and memory among individuals afflicted with Aβ and/or tau pathologies 3, 4, 5. These findings validate the notion that executive functions and episodic memory 6, 7, 8 are indeed affected during the initial stages of AD, primarily due to alteration or pathology of the frontal and temporal cortices 9, 10. More specifically, inhibitory abilities 11, attentional processes 12, 13, and visuospatial functions 14 appear to be particularly compromised. A defining feature of the progression of AD is the reduction in Aβ protein (resulting in low levels of CSF amyloid-β (Aβ)) and the rise in neuronal degeneration biomarkers (such as increased levels of CSF total tau and phosphorylated tau) in individuals with AD. The reduction in CSF Aβ levels seems to occur early in the progression of AD, becoming apparent more than twenty years before the onset of any clinical symptoms 15. In individuals with AD or those who are at risk of developing AD, the amyloid-β-to-tau ratio is often low, indicating an accumulation of Aβ plaques and/or tau tangles in the brain 6, 16, 17. Wang et al. found that lower CSF Aβ42 levels and higher tau/Aβ42 ratios were strongly correlated with a reduction in hippocampal volume and indicators of progressive atrophy of the cornu ammonis subfield in pre-clinical AD individuals, but not cognitively healthy (CH) individuals 18. 
Compared to Aβ42 and/or tau alone, the Aβ42/tau ratio demonstrated greater sensitivity in detecting pre-symptomatic AD and distinguishing it from frontotemporal dementia 19. Consequently, it is plausible that the Aβ42/tau ratio may serve as a more sensitive biomarker for detecting the earliest stages of preclinical AD than individual biomarkers. Preclinical investigations offer robust evidence supporting functional connectivity as a probable intermediary mechanism linking Aβ to tau secretion and accumulation 20. Despite the significant research dedicated to unraveling AD pathogenesis, there is currently a lack of sensitive, specific, reliable, objective, and easily scalable biomarkers or endpoints to guide clinical trials and facilitate early risk detection in clinical settings.

Several large prospective studies have attempted to characterize early diagnostic criteria in people at risk of developing AD. These assessments include pathological markers (both beta-amyloid (Aβ) and tau pathologies) 21, neuropsychological scores (Montreal Cognitive Assessment (MoCA) and Mini-Mental State Examination (MMSE)) 22, neuroimaging (magnetic resonance imaging (MRI), magnetoencephalography (MEG), and electroencephalography (EEG)) 23, and heart rate variability (HRV) 24. EEG brain connectivity, MRI brain structures, neuropsychological assessments, and HRV are intricately correlated measures that can predict early AD pathology. For example, a recent examination of HRV from the Multi-Ethnic Study of Atherosclerosis revealed a correlation between higher HRV and superior cognitive function across various cognitive domains 25. We previously reported a significant association between high resting HR and less negative alpha event-related desynchronization (ERD) during Stroop testing in individuals with pathological Aβ/tau, compared with those with normal Aβ/tau 26. These findings prompt further investigation into brain connectivity associated with pathological Aβ/tau during task-switching tasks. Therefore, we aim to integrate brain activity, neuropsychology, and HRV assessments in this study to facilitate early detection of AD risks, understand disease mechanisms, and ultimately help improve outcomes for individuals affected by AD by addressing the multifaceted nature of the disease.

The abnormality of brain connectivity measured by MRI in regions with early Aβ burden (e.g., the default mode network (DMN)) has been shown when Aβ fibrils just start to accumulate 27. However, to our knowledge, this abnormality has not been reported or tested in EEG investigations. Both Aβ and tau pathologies have been shown to impact brain networks’ structural and functional connectivity 28. Abnormal FC has been consistently identified in the early stages of AD, before the appearance of clinical symptoms or brain structural changes 29. For instance, a recent study achieved 90% accuracy in classifying brain Aβ and tau pathology in subjective cognitive decline and mild cognitive impairment (MCI) individuals using EEG coherence 30. FC studies have found that abnormal cerebrospinal fluid (CSF) levels of phosphorylated tau and Aβ in early AD are linked with disrupted cortical networks involving the anterior and posterior cingulate cortex, and temporal and frontal cortices 31. We previously reported that alpha-band ERD increased in CH with a pathological Aβ/tau ratio (CH-PATs) 32, compared to CH with a normal Aβ/tau ratio (CH-NATs). Using regional interconnectivity methods, a previous study found that the connection between temporal and frontal regions is a characteristic pattern of the pathological transition from normal to MCI, and that the density of edges in these networks differentiates HC from MCI 33. The decreased patterns of regional hemispheric interconnectivity in the metabolic network depend on pathology severity 33. Therefore, EEG can be a promising, noninvasive, cost-effective, and easily accessible diagnostic method with high temporal resolution to track and predict the severity of cognitive dysfunction in degenerative diseases. 
As memory (predominantly localized in the temporal region) and executive functions (mainly associated with the frontal region) are two sensitive cognitive functions that are abnormal in early AD 9, 10, it is compelling to study frontal and temporal FC in the early stage of the AD spectrum.

In the present study, we aimed to: (1) compare effective connectivity (EC) between CH-NATs and CH-PATs during task switching, and explore the potential contribution of task difficulty levels; (2) study the links between EC and neuropsychological measures, structural MRI brain volumes, and HRV.

Participant characteristics

The demographic and clinical characteristics of our subjects have been reported in our previous work 32. The participants’ ages in CH-PATs and CH-NATs were comparable, and both groups had similar educational levels and mean years of education. There were no differences in cognitive reserve (CR) or intelligence quotient (IQ) scores between CH-NATs and CH-PATs.

Behavioral analysis

The difference in accuracy (ACC) and reaction time (RT) across trial types (repeat or switch) was notable, with significantly faster RT ( p  < 0.0001) and higher ACC ( p  = 0.048) during repeat trials than during switch trials. In addition, the results of this study showed no significant group × trial type interaction in RT, F(1, 108) = 0.0001, p  = 0.991, or ACC, F(1, 108) = 0.003, p  = 0.960, between the CH-PAT and CH-NAT groups. A comparison of the main effect of trial types and the group × trial type interaction between CH-NATs and CH-PATs was reported previously in our work 32.

EEG power spectral density

The comparison of normalized alpha power during resting-state and task-switching at the five regions in the CH-PATs and CH-NATs is shown in Fig.  1 . The CH-NATs showed significantly stronger task-switching alpha spectral power at temporal, parietal, and occipital electrodes when compared to CH-PATs ( p  < 0.0001, p  = 0.0005, and p  < 0.0001, respectively), as shown in Fig.  1 a, b. In contrast, there were no significant differences between CH-PATs and CH-NATs in frontal or central power. In the resting state, there were no significant differences in alpha power between the groups in any brain region (Fig.  1 b, d).

figure 1

a , b A group comparison between CH-PATs and CH-NATs using a t -test in different brain regions (frontal, temporal, parietal, central, occipital) within the range 200–550 ms, where 0 ms is the onset of the stimulus, during task switching and resting-state, respectively. c , d Averaged topographical distribution of alpha power in CH-PATs and CH-NATs during task switching and resting-state, respectively. e , f Mean normalized absolute PSD for all electrodes in the frequency domain (0–50 Hz) for CH-NATs and CH-PATs during the switching task and resting state. Frequency bands are decomposed as follows: delta (0.4–4 Hz), theta (4.1–8 Hz), alpha (8.1–12 Hz), and beta (12.1–30 Hz). * P  < 0.05, ** P  < 0.01, *** P  < 0.001.

The mean partial directed coherence (PDC) FC patterns of CH-PATs and CH-NATs are shown in Fig.  2 a, b, respectively. Qualitatively, the main difference between the two groups was that CH-PATs exhibited more numerous but much weaker long-range connections from the left and right temporal cortices than CH-NATs. Specifically, compared with the CH-NATs group (0.135 ± 0.017), the averaged information flow in the alpha frequency of CH-PATs was enhanced in the frontal regions during task-switching processing (0.223 ± 0.014); t (44) = 18.06, p  < 0.0001. Conversely, CH-NATs showed increased information flow from the temporal region (0.187 ± 0.016) compared to CH-PATs (0.09 ± 0.015), t (44) = 21.06, p  < 0.001. The central, parietal, and occipital regions did not show any significant differences between CH-NATs and CH-PATs. It is also interesting to note that resting-state EC showed a significant difference between CH-NATs (0.082 ± 0.03) and CH-PATs (0.105 ± 0.036) only in the occipital cortex, t (44) = 2.36, p  = 0.023, as shown in Fig.  2 c, d. Differences in brain connectivity (task-switching connectivity values − resting-state connectivity values) were calculated across five distinct brain regions (Fig.  2 e, f). Results revealed differences between CH-NATs and CH-PATs in frontal (0.036 ± 0.041 vs. 0.084 ± 0.037, t (44) = 6.985, p  < 0.0001) and temporal (0.072 ± 0.039 vs. −0.028 ± 0.027, t (44) = 10.21, p  < 0.0001) regions, while no differences were observed in the parietal, occipital, or central regions.

figure 2

Representation of the functional networks as graphs in the Alpha frequency band at stimuli time (200–550 ms) after the onset of the stimulus. PDC from an area i to j is represented by an arrow. a , c , e Group connectivity comparison between CH-PATs and CH-NATs during task switching, resting state, and the differences between task switching and resting (Task-rest), respectively in the alpha band. b , d , f The directed connectivity of the CH-PATs (left) and CH-NATs (right) during task switching, resting state, and the differences between task switching and resting (Task-rest), respectively in the alpha band. The brain regions are graphically represented with connections depicting causal influence at (200–550 ms). The brain surface templates we used to visualize these connections in Fig.  2 are primarily generated from a commonly used template known as MNI/Talaraich (ICBM152).

To validate the outcomes concerning directed connectivity as measured by PDC, we employed multiple functional phase connectivity methodologies, including Weighted Phase Lag Index (wPLI) and Phase locking value (PLV). Although FC and EC can be associated, they estimate distinct characteristics of brain interactions, and the presence of one does not inherently imply the presence of the other. During task-switching, wPLI results showed that CH-PATs demonstrated significantly higher phase connectivity in frontal and central regions (Supplementary Fig.  1 ). In addition, PLV analysis showed decreased phase coherence in frontal and occipital regions (Supplementary Fig.  2 ). Detailed results, including additional analyses and comparisons utilizing wPLI and PLV algorithms, are presented in this manuscript’s supplementary file.
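For background, PDC is conventionally defined (in Baccalá and Sameshima's original formulation; the text above does not restate it, so this is included as standard reference rather than the authors' exact implementation) from the frequency-domain coefficients of a fitted multivariate autoregressive (MVAR) model:

```latex
\bar{A}(f) = I - \sum_{r=1}^{p} A_r \, e^{-i 2\pi f r},
\qquad
\pi_{ij}(f) = \frac{\left|\bar{A}_{ij}(f)\right|}
                   {\sqrt{\sum_{k=1}^{N} \left|\bar{A}_{kj}(f)\right|^{2}}}
```

where the $A_r$ are the MVAR coefficient matrices of model order $p$, $N$ is the number of channels, and $\pi_{ij}(f)$ quantifies the directed influence of channel $j$ on channel $i$ at frequency $f$, normalized by the total outflow from channel $j$.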

Task difficulty

To examine the potential contribution of task difficulty level to the EC differences, we selectively compared connectivity between “good performers” among CH-PATs and “bad performers” among CH-NATs, to narrow the potential performance gap and check whether the frontal and temporal EC differences remained. Task difficulty was characterized through performance indicators, namely ACC and RT, together with brain connectivity. In this context, the ACC results demonstrated a significant difference between CH-NATs (0.79 ± 0.13) and CH-PATs (0.96 ± 0.03), p  < 0.0001. Similarly, CH-NATs showed a significantly longer RT (1743.63 ± 265.6) compared to CH-PATs (1218.47 ± 167.66), p  < 0.0001. For this condition, we sorted the EC based on the ACC and RT data (i.e., low-vs-high connectivity) and performed a statistical analysis between CH-PATs and CH-NATs. Using EC based on the ACC classification, the CH-PATs showed increased frontal connectivity (0.22 ± 0.02) compared to CH-NATs (0.131 ± 0.02), p  = 0.0009, and decreased temporal connectivity (0.09 ± 0.02) compared to CH-NATs (0.187 ± 0.006), p  < 0.0001. Likewise, using EC based on the RT classification, the CH-PATs showed increased frontal connectivity (0.22 ± 0.01) compared to CH-NATs (0.13 ± 0.02), p  = 0.0008, and decreased temporal connectivity (0.09 ± 0.02) compared to CH-NATs (0.188 ± 0.012), p  < 0.0001. This comparative analysis supports that the EC differences are independent of task difficulty levels. Figure  3 illustrates mean scores for task difficulty based on ACC and RT classifications for the color-word (cW) test.

figure 3

a The Reaction Time (RT) was different between CH-NATs and CH-PATs. The values represent the best 50% performance of RT (lowest RT values) in CH-PAT participants and the worst 50% performance (Highest RT values) in CH-NATs during high-load color-word switch trials. The comparison was performed between the two groups for frontal ( b ) and temporal ( c ) connectivity for the same participants. d The ACC was significantly different between CH-NATs and CH-PATs. The values represent the good 50% performance of ACC scores (highest ACC values) in CH-PATs and the worst 50% performance (lowest ACC values) in CH-NATs during high-load color-word switch trials. The comparison was performed between the two groups for frontal ( e ) and temporal ( f ) connectivity for the same participants using a parametric t -test. * p  < 0.05, ** p  < 0.01, *** p  < 0.001, **** p  < 0.0001.

Functional connectivity and neuropsychological and cognitive reserve analysis

Supplementary Table  1 presents several correlations between task-switching brain connectivity in temporal and frontal brain regions and different neuropsychological tests (i.e., processing speed, working memory, and executive functions) in CH-NATs and CH-PATs. Processing speed tests were used to assess the ability to process information rapidly; the higher the score, the more time it has taken, and the worse the performance. Executive function and working memory tests provide an estimation of a wide range of skills (i.e., working memory and organization); the higher the score, the better the performance. While CH-PATs scored higher on processing speed tests, CH-NATs performed better on executive function and working memory tests. The present study also investigated the relationship between cognitive reserve (CR) and parietal connectivity in the two groups, CH-NATs and CH-PATs. Our findings revealed a significant negative correlation between CR and parietal connectivity in the CH-NATs group (r = −0.61, p  = 0.030), as shown in Fig.  4 , suggesting that individuals with higher CR tend to exhibit lower parietal connectivity in this group. In contrast, the CH-PATs group showed no significant correlation between CR and parietal EC.

figure 4

A linear regression model was used to estimate the coefficients of linear correlations (Confidence Intervals = 0.95) that relate a set of predictor variables to a response variable.

Functional connectivity and MRI brain volumes

In the CH-PATs group, significant negative correlations were observed during task switching between temporal EC and several brain volumetrics: Fusiform Right Side Volume ( r  = −0.42, p  = 0.043), Hippocampal Occupancy Score (HOC) Norm Percentile ( r  = −0.46, p  = 0.023), and Fusiform Asymmetry Norm Percentile ( r  = −0.41, p  = 0.049) as shown in Fig.  5 . However, CH-NATs did not show significant correlations between temporal EC and the same regions. Moreover, CH-NATs showed significant correlations between temporal EC and Entorhinal Cortex Asymmetry Norm Percentile ( r  = −0.67, p  = 0.013), Fusiform Left Percent Of intracranial volume (ICV) ( r  = −0.58, p  = 0.041), and Fusiform Asymmetry Norm Percentile ( r  = −0.67, p  = 0.015). Additional correlations between frontal and temporal connectivity with brain volumetrics in CH-NATs and CH-PATs are reported in Supplementary Table  2 .

figure 5

a Correlation analysis between Hippocampal Occupancy Score (HOC) Norm Percentile and temporal EC in two groups: CH-NATs (blue scatter plots) and CH-PATs (red scatter plots). b Correlation between Fusiform Right Side Volume and temporal EC in the two groups, CH-NATs and CH-PATs. The correlation analysis reveals the strength and direction of the association between brain volumetrics and brain connectivity. Spearman correlation was applied; p  values (< 0.05) and r values (association directionality) are shown.

Functional connectivity and HRV analysis

Spearman’s correlation analysis was also conducted in both groups to explore the relationship between HRV metrics and EC during task-switching paradigms. In the task-switching condition, CH-PATs showed a significant negative correlation between frontal EC and the root mean square of successive differences (RMSSD) ( r  = −0.52, p  = 0.020). Conversely, CH-NATs demonstrated a significant negative correlation between frontal connectivity and mean RR ( r  = −0.87, p  = 0.002), as shown in Fig.  6 , but did not show significant correlations between RMSSD measures and brain connectivity (Fig.  6 ).

figure 6

a A correlation between mean RR and frontal EC during task switching for two groups CH-NATs and CH-PATs. b A correlation between resting RMSSD and frontal connectivity for the two groups. Spearman correlation was applied and p values were set to  < 0.05 and r (association directionality values) are shown.

The main objective of the present biomarker study was to characterize the effects of the accumulation of Aβ pathologies and tau concentrations on directed brain networks in CH individuals. Participants categorized into our CH-NAT and CH-PAT groups were asymptomatic, with normal neurocognitive tests, and were classified based on CSF Aβ42 and tau measures that were within the published ranges 6. The CSF Aβ42/tau ratio outperforms the CSF Aβ42 or tau levels individually in identifying dementia and preclinical phases of AD. Abnormal amyloid levels and tau accumulation can disrupt synaptic function by interfering with neurotransmitter release and synaptic plasticity. This disruption can lead to neurotoxicity, microtubule destabilization, neuroinflammation, and alterations in the strength and efficiency of synaptic connections between neurons, ultimately affecting overall brain connectivity. In this exploratory study, we report several important findings: (1) CH-PATs, compared to CH-NATs, presented higher frontal EC and lower temporal EC, independent of task difficulty. (2) CH-PATs presented significant correlations between temporal or frontal EC and other measures, including neuropsychological measurements (i.e., processing speed, executive functions, and working memory tests), MRI regional volumetrics, and HRV, supporting compensatory mechanisms. These changes are potentially linked to a less strategic approach to performing the task in CH-PATs, with no improvement in efficiency; CH-PATs may rely on compensatory mechanisms while lacking the learning and self-improvement that support more efficient processing. A loose analogy is programming: reusing functions (temporal lobe, as in CH-NATs) improves efficiency, whereas always writing out the full code (frontal lobe) with limited reuse of functions (temporal lobe) can be exhausting, as for CH-PATs.

The identification of effective EEG biomarkers associated with AD pathology holds substantial promise in unraveling the neural mechanisms underlying this neurodegenerative disorder and facilitating its early diagnosis. Growing evidence suggests that EEG measurements reflect the capacity of AD neuropathology on brain neural signal transmission underlying cognitive processes 34 , 35 , 36 . However, the accuracy and reliability of different types of EEG biomarkers, i.e., power and entropy in facilitating the early detection and prediction of AD progression remain largely unknown. To our knowledge, this is the first study to evaluate the impact of promising EEG connectivity on detecting early AD pathology in CH individuals. We provide strong evidence supporting that the inclusion of multidimensional information (i.e., EEG biomarkers, CSF measures, brain volumetrics, and HRV) is highly effective in assessing patients’ pre-symptomatic clinical status. Taken together, our findings suggest that brain connectivity has the potential for the early detection of risk for cognitive decline in CH individuals independently or in association with other measures. Notably, our study corroborates these findings and highlights the significance of EEG metrics and connectivity as pivotal biomarkers for revealing CH-PATs.

The behavioral results of this study showed no significant differences between the CH-PAT and CH-NAT groups (reported previously in ref. 32). Additionally, the after-test survey suggested no difference in subjective difficulty between the two groups. This indicates that, at the behavioral level, there is no evidence of cognitive decline in CH-PATs at this early stage of the disease. One possible explanation is that the cognitive deficits associated with CH-PATs are not yet severe enough to manifest at the behavioral level 37. It is also possible that compensatory mechanisms and/or CR may contribute to similar behavioral performance in both groups 38. When the brain switches attention between tasks, it successfully alternates, but consistently replacing one mental task with another requires additional time and cognitive resources. This leads to switching costs, which we also observed in our study: both CH-PATs and CH-NATs exhibited longer RTs during switch trials compared to repeat trials, indicating a switching cost effect 32, 39, 40. Moreover, to investigate the possible contribution of task difficulty level to the EC differences, we selectively compared connectivity values between the best 50% of CH-PAT performers (highest accuracy scores and lowest RT scores) and the worst 50% of CH-NAT performers (lowest accuracy scores and highest RT scores) to narrow the potential differences and check whether the frontal and temporal EC differences remained. Presumably, these subsets bring the CH-PAT and CH-NAT data closer in subjective difficulty/concentration. If the connectivity analysis continues to exhibit consistent differences, it suggests that the connectivity patterns are fundamental rather than solely a result of task difficulty. Conversely, if the connectivity differences diminish, it indicates that they may indeed be influenced by subjective task difficulty. 
This approach was motivated by the desire to control for the influence of task difficulty on EC alterations and isolate the effects of intrinsic brain connectivity differences. By focusing on individuals with comparable task performance levels, we were able to minimize the confounding effects of task difficulty, ensuring that any observed EC differences were more likely attributable to inherent neurobiological factors rather than variations in task performance 41 . The good performance (increased ACC values and decreased RT values) in Fig.  3 may suggest that this group of CH-PATs did benefit from cognitive reserve, at least on neural activity during task switching. This also could be due to a compensatory increase in the number of neurons and/or synapses in CH-PATs.

During task-switching processing, compared to CH-NATs, CH-PATs exhibited higher alpha power values in the frontal region, while lower values were observed in temporal and parietal areas, as shown in Fig.  1 . These aberrations may signify two distinct pathophysiological alterations: the reduction in alpha power in AD pathology could be attributed to alterations in cortico-cortical connections 42. We previously reported increased event-related desynchronization (ERD) in the alpha band in CH-PATs compared to CH-NATs 32. In contrast to ERD (negative values calculated by wavelet transform and corrected with baseline), absolute alpha power (positive values calculated by Welch's method) is a measure of overall power that may detect changes in brain excitability but does not provide specific information about task-related processing. ERD and absolute alpha power are both measures used in EEG analysis, but they capture different aspects of brain activity. Furthermore, prior results have found higher resting-state alpha power in the frontal regions among MCI individuals compared to CH individuals 43. This increase may suggest the recruitment of compensatory mechanisms. Individuals with cognitive decline may show less vigilance to external stimuli in the resting state and may exhibit diminished capacity to recruit relevant brain regions when performing a task. Previous investigations have consistently reported slowing of EEG activity among individuals with MCI and AD. For instance, a recent work substantiates the presence of distinct resting-state EEG power rhythms in older individuals with subjective memory complaints (awareness of memory loss), notably showing greater theta power and a subtle reduction in EEG reactivity 44. In addition, decreased alpha/beta power and increased theta/delta power were reported across various brain regions, including the frontal, temporal, parietal, and occipital areas 45. 
The degeneration of cholinergic neurons in the basal forebrain projecting to the hippocampus and neocortex is believed to play a pivotal role in this process 46 . The present study also examined resting-state EEG power in CH-PATs and CH-NATs. Our findings revealed no significant differences in EEG power between the two groups during the resting state. This lack of significant differences suggests that resting-state EEG brain activity may not be substantially affected by pathological amyloid/tau. Such results may reflect compensatory mechanisms or within-group variability, which might contribute to the absence of significant differences. The absence of significance may also indicate that a cognitive challenge can help reveal subtle changes in brain activity 47 , 48 .

In our investigation, we employed directed EC measures in the alpha frequency band, which are reliable, valid, and less influenced by confounding factors such as volume conduction 49 . Unlike undirected functional connectivity measures (e.g., coherence, the phase lag index (PLI), and the phase locking value (PLV)) or structural connectivity (anatomical links between neuronal populations), effective connectivity (EC) measures (e.g., partial directed coherence (PDC)) examine the causal and directional influences between distant brain networks. Our study provides evidence of EEG changes associated with pre-clinical AD neuropathologies ( A β and tau). Specifically, we found a significant association between A β /tau in CSF and an increase in frontal alpha connectivity in CH-PATs. Reduced levels of A β peptides in the CSF indicate heightened A β deposition in the brain, while elevated levels of CSF tau protein, derived from damaged neuronal microtubules, serve as reliable biological indicators of AD and predictors of MCI conversion 50 . It has been found that synaptic dysfunction is a fundamental deficit in AD, preceding the emergence of hallmark pathological changes 51 . Soluble A β oligomers and tau fibrillar lesions disrupt synaptic plasticity and contribute to synaptic loss, resulting in the impairment of neural networks. Consequently, MCI and pre-symptomatic AD are better characterized as disruptions in the functional and structural integration of neural systems rather than as localized abnormalities. Our study observed increased frontal connectivity in CH-PATs. This finding suggests possible compensatory responses within executive networks and the presence of synaptotoxicity and neuronal dysfunction associated with presymptomatic AD-related pathology 52 .
Furthermore, preclinical studies have suggested that increased synchrony in cortical circuits among individuals with pathological A β /tau may be attributed to reduced inhibitory neurotransmission mediated by GABAergic mechanisms rather than to increased excitatory transmission 53 . In addition, tau has been hypothesized to be associated with a breakdown in predictive neural coding 54 .

Our results suggest that CH-NATs exhibited more pronounced temporal lobe connectivity in terms of causal interactions, surpassing that observed in CH-PATs. These results align with existing evidence indicating that the preclinical stages and MCI are characterized by significant atrophy and hypometabolism primarily in the posterior hippocampal, cingulate, temporal, and parietal regions. Notably, these affected regions collectively resemble the memory network and default mode network delineated in healthy individuals using task-free fMRI paradigms 55 . In summary, decreased brain connectivity is believed to be associated with memory decline 56 .

Furthermore, our analysis of EEG data using three distinct connectivity measures (PDC, PLV, and wPLI) revealed complementary insights into neural connectivity patterns 57 . PDC analysis showed significant EC between the frontal and temporal cortices, pointing to greater information flow from these regions. In contrast, PLV identified significant phase synchronization between the frontal and occipital cortices, suggesting coupled activity potentially related to top-down visual attention processes 58 . wPLI results revealed significant phase synchronization in the frontal, parietal, and central cortices, indicating inhibitory effects of visuospatial attention, inhibition of return, and inhibitory control 59 . Different brain regions might be more involved in either directional influence or synchronization depending on the cognitive task. Synchronization (measured by PLV and wPLI) between two regions could be due to a third region influencing both, or to other indirect interactions. Conversely, PDC can be established without strong functional synchronization if the causal influence is strong enough to create a statistically significant correlation between activities. The differences in how these methods handle noise, artifacts, spatial and temporal resolution, and the assumptions they make about neural dynamics can lead to variations in results. Notably, all three methods consistently identified the frontal cortex as a key hub of connectivity. These findings underscore the importance of the frontal cortex in the neural network and illustrate how different connectivity measures can provide a multifaceted understanding of brain activity 60 .

Our results revealed significant associations between brain connectivity and performance on these neuropsychological measures. Regarding memory tasks (e.g., Rey-O 3-minute delay), CH-PATs showed a strong positive correlation between frontal connectivity and performance on episodic memory tasks. This finding suggests that greater connectivity within working memory brain networks is associated with greater efficiency 61 . Furthermore, we observed a significant negative correlation between frontal connectivity and executive function tasks (i.e., the animal fluency language task), indicating that decreased connectivity in the temporal or frontal regions is associated with poorer executive function performance and may indicate a neural compensatory mechanism. CH-PATs showed a positive correlation between frontal connectivity and the Stroop color-naming task as compared to CH-NATs. This suggests that CH-PATs compensate for their processing speed decline by increasing their frontal cortex connectivity (specifically in regions linked with executive functions) to perform better on the Stroop color-naming task. Moreover, we found many negative correlations between brain connectivity and attention tasks, indicating that greater connectivity in these regions is associated with decreased attentional performance 62 . These results underscore the relationship of brain connectivity with memory and cognitive functioning 63 . The strong correlations observed between specific brain regions and performance on memory and cognitive tasks provide evidence for the role of neural networks in cognitive processes.

Furthermore, our results suggest that the relationship between cognitive reserve (CR) and parietal connectivity may vary across patient populations, with the CH-NAT group showing a distinct pattern of negative correlation. Adults with higher levels of CR are more likely to use other cognitive resources, such as memory strategies, to compensate for their memory impairments. Previous studies showed that individuals with higher CR use additional brain regions associated with better memory task performance 64 . CR is assumed to reduce the risk of cognitive decline associated with age-related brain changes by promoting the use of compensatory cognitive processes 65 . CR indicates the efficiency, capacity, and flexibility of cognitive processes in the presence of a challenge, which helps to explain an individual’s ability to cope better with brain pathology (e.g., brain aging, delay of dementia symptoms, stroke) via more adaptable functional brain processes. Although actual biomarkers of CR are still debated, a possible mechanism for CR has been hypothesized 66 . Neural reserve theory postulates that there exists inter-individual variability in the brain networks that underlie any task. In CH-NATs, higher CR was correlated with lower parietal EC (more efficient), which was not observed in CH-PATs. This result may suggest that CR may be exhausted in CH-PATs during task-switching processing.

The association between MRI structural volumes and EEG brain connectivity in the alpha band may explain how neural structures and brain functions are coupled. The negative correlations in CH-PATs between temporal EC and these brain volumes suggest that a decrease in the volume of these brain regions may be associated with an increase in EC 67 . This could be due to a compensatory increase in the number of neurons and/or synapses in these brain regions. Previous studies have found that people with AD typically have smaller hippocampi than healthy controls (HCs) 68 . This suggests that the reduction of neurons in the hippocampus may constitute one of the initial alterations observed in AD 69 . Other brain regions that are often affected in AD include the temporal, parietal, and frontal lobes 69 , 70 . In late-onset AD, cortical atrophy initiates in the temporal cortex and subsequently extends to the parietal cortex via the cingulum bundle. In contrast, in early-onset AD, cortical atrophy originates in the parietal cortex and then spreads to the temporal cortex 71 , 72 . These regions are involved in a variety of cognitive functions, including language, memory, and executive function. Furthermore, recent research observed that both AD and MCI patients showed altered FC of the fusiform gyrus in the resting state compared to normal controls 73 , 74 , which can help explain our findings in Supplementary Table  2 . As the disease progresses, the brain tissue in these regions may shrink, leading to further cognitive decline. Our findings uphold the notion that greater connectivity within the frontal regions is associated with brain compensation in pre-symptomatic AD 75 . This relationship aligns with previous studies highlighting the involvement of the frontal cortex in early AD pathology 10 . The results could provide evidence that enlarged regional volumes in CH-PATs may be linked with greater frontal EC and play a role in compensating behavioral performance in the presence of AD pathologies.

Despite the probable clinical relevance of autonomic dysfunction in CH individuals with pathological A β /tau, only a few studies have evaluated HRV in presymptomatic AD. Our study observed a significant negative correlation between frontal connectivity and RMSSD in CH-PATs but not in CH-NATs. The level of brain connectivity may serve as a predictor of cognitive flexibility during a cognitive task, whereas HRV may specifically predict cognitive flexibility when influenced by neuronal oscillations 76 . The association of HRV, which measures autonomic function, with cardiovascular disease as well as cognitive dysfunction has been evidenced. There is a strong relationship between cardiovascular risk and an elevated likelihood of developing neurodegenerative diseases 77 . During the initial phases of AD, perturbations in the autonomic nervous system play a role in sustaining chronic hypoperfusion, thereby impacting the self-regulation of the brain and the functioning of the neurovascular unit. Conversely, neurodegenerative alterations characteristic of AD can exert an influence on autonomic functions and HRV by disrupting the vegetative networks situated in the insular cortex and brainstem 78 . This is in line with previous findings that, in preclinical dementia patients, parasympathetic regulation of slow waves is strongly associated with disrupted FC in the central nervous system 79 . Our data suggested that higher HRV (mean RR or RMSSD) is related to lower temporal and frontal connectivity in CH-NATs. Our CH-PAT data similarly suggest that higher RMSSD is associated with decreased brain connectivity. Our study supports the view that memory and executive function networks are related to autonomic regulation and are affected by AD pathology.

The current study has several limitations. First, the study’s sample size was relatively modest, potentially limiting the generalizability of our findings to broader populations or distinct groups. Second, we used a 21-electrode EEG system to study brain connectivity (scalp potentials) over the cortex. Future research should employ a high-density EEG system (e.g., 64, 128, or 256 electrodes) and compare the results with our findings. Third, a notable limitation of this study is the time difference in data collection, with MRI and EEG data being acquired at different time points. This misalignment could potentially introduce confounding variables related to alterations in the participants’ electrophysiological states or external environmental factors over time. Fourth, our analysis was performed at the sensor-space level rather than at the source-space level. While sensor-level analysis offers valuable insights into neural activity patterns, it lacks the precision and specificity that source-level analysis can provide in localizing the origins of these signals within the brain. Fifth, we used CSF A β and tau to classify our cohorts. Future research may explore brain connectivity using PET or plasma A β and tau to study pre-clinical AD progression. Lastly, a significant limitation in estimating causal information flow among brain regions with measures such as PDC, particularly from multichannel non-invasive recordings, is the influence of volume conduction arising from surrounding active neuronal sources. Future investigations should explore alternative connectivity algorithms less sensitive to volume conduction and validate findings using high-spatial-resolution methodologies such as MRI or DTI.

To conclude, AD pathology manifests several years before clinical symptoms are recognized, termed the preclinical stage. We investigated this stage to assess whether brain connectivity could detect early pathophysiology. The results of this study showed the potential of EC as a noninvasive tool for distinguishing asymptomatic participants with normal CSF biomarker levels (CH-NATs) from asymptomatic participants with pathological CSF biomarkers (CH-PATs) during task switching. Reduced temporal EC and increased frontal EC were reported in CH-PATs compared to CH-NATs, independent of task difficulty. The increased frontal EC and/or decreased temporal EC in CH-PATs are linked with alterations in brain volumes, neuropsychological measures, HRV, and CR, suggesting a compensatory mechanism in the presence of AD pathology to retain the same behavioral performance. Our findings indicate that A β /tau pathology may affect specific EEG networks with systemic structural/functional compensations. Overall, EC is a useful, non-invasive tool for assessing EEG functional-network activities and provides a better understanding of the neurophysiological mechanisms underlying Alzheimer’s disease.

Participants

Forty-six cognitively healthy elderly participants were recruited through local newspapers and newsletters, the Pasadena Huntington Hospital Senior Health Network, and visits to senior centers. All participants consented via an Institutional Review Board (IRB)-approved protocol (HMRI # 33797). Assessments included collecting demographic data, physical exams, fasting blood studies, disease severity and disability scales, and CSF A β /tau measurements 6 . Inclusion criteria: over 60 years of age and classified as CH after a comprehensive neuropsychological battery, as referenced in detail 6 . Exclusion criteria: other active, untreated disease; use of anticoagulants; or other contraindications to lumbar puncture.

CSF Amyloid/tau analysis

We previously reported that a cutoff ratio of A β 42/total tau (2.7132) provided at least 85% sensitivity in discriminating AD from non-AD participants; we then used this cutoff to assign CH participants into two groups, one with a normal CSF A β /total tau ratio (CH-NATs) and the other with a pathological ratio (CH-PATs). As provisional evidence for the capacity of this CSF A β /total tau ratio to predict clinical decline, a longitudinal study found that 40% of CH-PATs declined cognitively over 4 years to MCI or AD, while none of the CH-NATs declined 39 , 40 . A detailed description of the data collection, methodological aspects of the entire process, and CSF data analysis procedures has been documented in our prior studies 6 , 32 , 39 .
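As an illustration, the cutoff-based group assignment can be sketched in a few lines of Python. This is our own sketch, not the study's code: the function name is hypothetical, and it assumes that ratios at or below the cutoff are flagged pathological, consistent with lower CSF A β 42 and higher total tau in AD.

```python
def classify_ch(abeta42, total_tau, cutoff=2.7132):
    """Assign a cognitively healthy participant to CH-NAT or CH-PAT.

    Assumption (ours): a CSF Abeta42/total-tau ratio at or below the
    published cutoff is treated as pathological (CH-PAT), since AD is
    associated with lower CSF Abeta42 and higher total tau.
    """
    ratio = abeta42 / total_tau
    return "CH-NAT" if ratio > cutoff else "CH-PAT"

# Illustrative values only (pg/mL), not from the study cohort
print(classify_ch(600, 100))  # ratio 6.0  -> CH-NAT
print(classify_ch(400, 300))  # ratio 1.33 -> CH-PAT
```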

Task switching paradigms

During the resting-state baseline, participants were instructed to remain still and relax for 5 min with their eyes open, followed by another 5 min with their eyes closed. For the task-switching testing, each trial consisted of two sequential stimuli, both presenting incongruent colored words (e.g., the word ‘Red’ in green ink or the word ‘Green’ in red ink), with or without an underline (see Fig.  7 ). Participants were instructed to press a button labeled ‘1’ for red and ‘2’ for green, indicating either the color (c) when underlined or the word (w) when not underlined. The trials were categorized into low-load repeat (color-color (cC) or word-word (wW)) or high-load switching (cW or wC) trials, with the second stimulus denoted using superscript to indicate the study target. The task-switching phase comprised three mixed blocks, each containing 64 trials. The blocks included all four conditions (cC, wW, cW, wC) in a random sequence with equal probability. Our analysis focused on the cW task due to the presence of persisting task-set inhibition 80 .

figure 7

Each trial includes two sequential stimuli. Each stimulus is an incongruently colored word. Participants were requested to respond to the word itself (no underline) or to the color of the ink (underlined) by pressing a button (“1” for red, “2” for green). Tasks include a random mixture of low-load repeat trials ( a ) or high-load switch trials ( b ). The paradigm is described in our previous work 32 .

EEG data acquisition and processing

All EEG data were collected during the resting state (eyes closed) or during the switching-task challenge 32 . A 21-head-sensor, dry electrode system (Quasar Wearable Sensing, DSI-24, San Diego, CA, USA) was used to collect EEG signals. Sensor configuration followed the international 10–20 system. EEG signals were sampled at 300 Hz and bandpass filtered between 0.4 and 45 Hz. For artifact rejection, we applied a  − 100 to 100 μV voltage threshold to detect bad epochs. In short, visual inspection of epochs was performed to ensure a minimum of artifacts (e.g., excessive muscle activity, eye blinks) and drowsiness. In our study, drowsiness was assessed in EEG signals through careful visual inspection: trained individuals examined EEG recordings for characteristic patterns associated with drowsiness, such as slowing of brainwave frequencies, increased theta activity, or intermittent bursts of alpha waves. Monitoring drowsiness is crucial for keeping participants awake and alert (via verbal notification) and for ensuring data quality, participant safety, and the validity of our experimental findings. When an adequate level of quality was not obtained, we either substituted the epochs with alternative ones or excluded the EEG data from further analysis if no sufficient epochs from the same subject were available. Data quality here refers to the standard considered acceptable for the EEG data to be deemed reliable for further analysis, encompassing signal clarity, absence of artifacts, and adherence to predefined criteria for data integrity. For better signal processing, electrooculographic, electrocardiographic, and electromyographic signals were recorded by 3 auxiliary sensors. A trigger channel encoded the time of color-word stimulus onset, the participants’ responses, and the type of test (C or W) for further analysis.
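The ±100 μV epoch-rejection rule can be sketched as follows. This is a minimal Python illustration of the thresholding step only (the study's pipeline was implemented in MATLAB); the function and variable names are ours.

```python
import numpy as np

def reject_bad_epochs(epochs, threshold_uv=100.0):
    """Drop epochs whose absolute amplitude exceeds the voltage threshold.

    epochs : array of shape (n_epochs, n_channels, n_samples), in microvolts.
    Returns the retained epochs and a boolean mask of kept epochs.
    """
    peak = np.max(np.abs(epochs), axis=(1, 2))  # per-epoch absolute peak
    keep = peak <= threshold_uv                 # within the -100..100 uV band
    return epochs[keep], keep

# Toy data: 3 epochs, 2 channels, 5 samples; epoch 1 carries a 150 uV spike
rng = np.random.default_rng(0)
epochs = rng.normal(0, 10, size=(3, 2, 5))
epochs[1, 0, 2] = 150.0                         # simulated artifact
clean, keep = reject_bad_epochs(epochs)
print(keep)  # epoch 1 is rejected
```

In practice this amplitude criterion complements, rather than replaces, the visual inspection described above.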

The continuous baseline EEG data were initially converted from the DSI-24 format to MATLAB format (R2022a) 38 . To ensure data quality and remove artifacts, a preprocessing pipeline originally designed for developmental EEG data, known as the Harvard Automated Processing Pipeline for EEG (HAPPE), was employed 38 . The montage consisted of 21 channels: frontal (Fp1, Fp2, F7, F3, Fz, F4, F8), temporal (T3, T4, T5, T6), parietal (P3, Pz, P4), occipital (O1, O2), central (C3, Cz, C4), and mastoidal (A1, A2), as shown in Fig.  8 . The EEG signals were then referenced to the two mastoid/earlobe electrodes A1 and A2. Before independent component analysis (ICA), a 0.4 Hz digital high-pass filter and a 45 Hz low-pass filter were applied to the EEG data to remove non-stationary signal drifts across the recording. HAPPE’s artifact removal steps encompassed the elimination of 60 Hz electrical noise using CleanLine’s multi-taper approach, rejection of bad channels, and removal of participant-related artifacts (e.g., eye blinks, movement, muscle activity) through ICA with automated component rejection via EEGLAB and the Multiple Artifact Rejection Algorithm (MARA) 81 . After artifact rejection, any channels removed during bad channel rejection were reconstructed using spherical interpolation to mitigate spatial bias in re-referencing. The resting-state EEG data were segmented into contiguous 2-s windows, and segments containing retained artifacts were rejected based on HAPPE’s amplitude and joint probability criteria, consistent with prior research on developmental EEG 82 . Importantly, there were no significant differences between outcome groups in terms of the mean lengths of the processed EEG data or any of the HAPPE data quality measures. Significant features ( p  < 0.05) were identified and assessed between groups using Student’s t -test 82 .

figure 8

From raw EEG signals, cortical activity is obtained by means of high-resolution EEG techniques. The figure shows the HAPPE pipeline’s pre-processing steps including ICA, the estimation of directed FC from the cortical time series, threshold application, and finally the statistical analysis. The schematic also shows the proportional threshold on the PDC metrics, maintaining a proportion p (0  < p  < 1) of the most densely connected edges and setting these connections to the same connectivity value, with all other connections set to 0 86 . The selection of the optimum thresholding value was based on global cost efficiency 96 . The brain network statistics are performed by t -test and Spearman’s rank correlation coefficient or Pearson correlation coefficient where appropriate.

A Fast Fourier Transform (FFT) with multitaper windowing was used to decompose the EEG signal into power for each 2-s segment for each of the channels of interest. For each of the four frequency bands, the summed power across all frequencies within the band was calculated as the measure of total power in that frequency band. All segmentation parameters and analysis windows are consistent with the connectivity metrics, and the FFT was conducted using a Hanning window. Each participant’s data were averaged across epochs for each electrode, and the mean power was computed for each of the following frequency bands: delta (0.4–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), and beta (12–30 Hz). We selected a time window from 200 ms to 550 ms after stimulus onset, as it captures brain responses associated with diverse cognitive functions, such as attention, working memory, decision-making, integration of incoming words, and emotion processing 83 . The data analysis process is illustrated in the block diagram in Fig.  8 .
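The band-power computation above can be sketched in Python (the study's pipeline used MATLAB). This minimal version approximates the windowed FFT with Welch's method and a Hann window for a single 2-s segment at 300 Hz; function names and the synthetic signal are ours.

```python
import numpy as np
from scipy.signal import welch

FS = 300  # sampling rate (Hz), as in the recordings
BANDS = {"delta": (0.4, 4), "theta": (4, 8), "alpha": (8, 12), "beta": (12, 30)}

def band_powers(segment, fs=FS):
    """Total power per frequency band for one 2-s EEG segment (1-D array)."""
    freqs, psd = welch(segment, fs=fs, window="hann", nperseg=len(segment))
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum()
            for name, (lo, hi) in BANDS.items()}

# Synthetic 2-s segment: a 10 Hz (alpha) sinusoid plus weak noise
t = np.arange(2 * FS) / FS
seg = np.sin(2 * np.pi * 10 * t) + 0.05 * np.random.default_rng(1).normal(size=t.size)
powers = band_powers(seg)
print(max(powers, key=powers.get))  # alpha dominates
```

Per-band totals would then be averaged across segments and epochs per electrode, as described above.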

Brain connectivity networks and information flow

Partial directed coherence (PDC) is a measure of Granger causality which provides insights into the directionality of information flow between brain nodes. PDC is based on the consideration that knowledge of the “driver’s” past improves the prediction of the “receiver’s” present state, compared to using only the receiver’s past. In the presence of volume conduction, however, all EEG channels mutually “drive” each other in this respect. PDC is derived from the coefficients of a multivariate autoregressive (MVAR) model, which additionally depends on the scaling of the data; this scaling dependency is sufficient to yield significant spurious information flow from low-variance to high-variance temporally and spatially white noise channels. The Akaike information criterion (AIC) and the Schwarz (Bayesian) information criterion were used to select the order of the MVAR model 84 , 85 ; models with lower AIC are generally preferred. In this study, the average MVAR model order p across all subjects was 7. PDC is a frequency-domain approach that denotes the direct linear relationship between two signals y i ( t ) and y j ( t ) (equation 1 ) when observed jointly with a set of other signals. Considering Y ( t ) the set of all observed time series, it can be depicted as an autoregressive model as follows:

\(Y(t)={\sum }_{k=1}^{p}{A}_{k}\,Y(t-k)+\varepsilon (t)\)

where p represents the model order, ε ( t ) is the prediction error, and A k are the coefficient matrices with elements a i j denoting the relation between signals at lag k . ε ( t ) is zero-mean white noise with covariance matrix ξ . This results in the PDC factor ( π i j ) and the partial coherence function ( ∣ k i j [ f ] ∣ 2 ), which indicate the strength and the direction of communication at frequency f .

Therefore, the PDC value from channel j to channel i can be expressed as:

\({\pi }_{ij}(f)=\frac{|{\bar{A}}_{ij}(f)|}{\sqrt{{\bar{{\bf{a}}}}_{j}^{H}(f)\,{\bar{{\bf{a}}}}_{j}(f)}}\)

where \({\bar{{\bf{a}}}}_{j}(f)\) ( j  = 1, 2, …, M ) represents the j t h column of the matrix \(\bar{A}(f)\) , and \({\pi }_{ij}(f)\) represents the strength of the causal interaction of the information flow from channel j to channel i at frequency f .

\(H(f)={\bar{A}}^{-1}(f)\) is the transfer matrix of the model. \(\bar{A}(f)=I-{\sum }_{k=1}^{p}{A}_{k}\,{e}^{-i2\pi fk}\) is obtained from the MVAR coefficients, and its element \({\bar{A}}_{ij}(f)\) represents the transfer from y j [ t ] to y i [ t ] . Finally, \({\bar{{\bf{a}}}}_{j}(f)\) is the j t h column of \(\bar{A}(f)\) , and π i is the i t h row of the PDC matrix π i j .

We applied a proportional threshold to the PDC metrics, maintaining a proportion p (0 <  p  < 1) of the most densely connected edges and setting these connections to the same connectivity value, with all other connections set to 0 86 . The selection of the optimum threshold value was based on global cost efficiency. Proportional thresholding is a commonly used analysis step in reconstructing functional brain networks to ensure equal density across patient and control samples; it highlights the most robust and significant connections while reducing visual clutter caused by weaker or less relevant connections. The MVAR model is a mathematical model commonly used in time series analysis to describe the relationship between an observation and a linear combination of its past observations. The MVAR model is fitted to the time series, and for a specific frequency f the spectral coefficient matrix is formed from the discrete-domain coefficient matrices A k , the covariance of the cross-spectral density matrix, the number of EEG channels k , and the identity matrix I . Calculation of the PDC values leads to a large matrix that describes the connectivity between the EEG channels, as shown in equation ( 1 ), where π i j (  f  ) is the individual PDC value calculated from time series j to i at frequency f . The PDC values range between 0 and 1 depending on how well one time series predicts the other. A strength of a measure such as PDC is apparent in its formulation: it is normalized with respect to the source channel. When analyzing time series, this operation is carried out in a short-time Fourier transform approach. A 50 percent overlap between windows, with a window length of 400 ms, was chosen to capture events that may fall on the border between windows. To reduce memory requirements, frequencies are divided into evenly spaced bins; here we chose 30 bins. For k channels, there is thus a k  ×  k  × 30 ×  t matrix of PDC values, where the first dimension is the source channel and the second corresponds to the destination.
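The PDC computation described above can be sketched in Python (the study's pipeline used MATLAB). This minimal numpy version assumes the MVAR coefficient matrices have already been fitted (model-order selection via AIC is not shown), and all names are illustrative.

```python
import numpy as np

def pdc(A, freqs, fs=300.0):
    """Partial directed coherence from MVAR coefficients.

    A : array (p, M, M) of MVAR coefficient matrices A_k for lags k = 1..p.
    Returns array (len(freqs), M, M): out[n, i, j] = influence of j on i.
    """
    p, M, _ = A.shape
    out = np.empty((len(freqs), M, M))
    for n, f in enumerate(freqs):
        # A_bar(f) = I - sum_k A_k * exp(-i 2 pi f k / fs)
        Abar = np.eye(M, dtype=complex)
        for k in range(1, p + 1):
            Abar = Abar - A[k - 1] * np.exp(-2j * np.pi * f * k / fs)
        # normalize each source column j by its Euclidean norm
        col_norm = np.sqrt((np.abs(Abar) ** 2).sum(axis=0))
        out[n] = np.abs(Abar) / col_norm
    return out

# Toy 2-channel MVAR(1) in which channel 0 drives channel 1
A = np.zeros((1, 2, 2))
A[0] = [[0.5, 0.0],
        [0.4, 0.5]]            # y1 depends on the past of y0 -> flow 0 -> 1
vals = pdc(A, freqs=[10.0])
print(vals[0])                  # entry [1, 0] is nonzero, [0, 1] is zero
```

Because of the source-column normalization, the squared PDC values in each column sum to 1, reflecting how a channel's outflow is distributed over the receivers.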

Brain phase synchronization

Phase synchronization analysis is crucial in understanding undirected functional connectivity in brain networks derived from EEG data. The phase lag index (PLI) is a widely used measure in neuroscience that quantifies the consistency of phase differences between neural oscillations across different brain regions. The weighted phase lag index (wPLI) extends the PLI by considering the magnitudes of the phase differences; this enhancement accounts for the strength of phase coupling between neural oscillations in addition to their consistency. The wPLI is a robust functional connectivity approach used in EEG connectivity analysis because of its high insensitivity to common sources and volume conduction effects. The formula for wPLI is given by:

\(\,\text{wPLI}\,=\frac{\left|E\left\{|\sin \Delta \phi (t)|\,\text{sgn}\,(\sin \Delta \phi (t))\right\}\right|}{E\left\{|\sin \Delta \phi (t)|\right\}}=\frac{\left|E\left\{\sin \Delta \phi (t)\right\}\right|}{E\left\{|\sin \Delta \phi (t)|\right\}}\)

where ∣ Δ ϕ ( t ) ∣ represents the magnitude of the phase difference. wPLI provides a more refined measure of functional connectivity, capturing both the consistency and strength of phase coupling between brain regions. In contrast to PLI, the wPLI adjusts the weighting of the cross-spectrum based on the magnitude of its imaginary component. It eliminates the influence of cross-spectrum elements (phase differences) near the real axis (0, π , or 2 π ), which are susceptible to small noise perturbations that might alter their true sign due to volume conduction effects.
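The wPLI estimator can be sketched as follows. This minimal Python version (the study's pipeline used MATLAB) works directly from instantaneous phases and assumes unit amplitudes, so the imaginary part of the cross-spectrum reduces to sin Δϕ; the function name is ours.

```python
import numpy as np

def wpli(phase1, phase2):
    """Weighted phase lag index from instantaneous phases (radians).

    phase1, phase2 : 1-D arrays over trials/time. Returns a value in [0, 1].
    """
    im = np.sin(phase1 - phase2)        # imaginary part of unit cross-spectrum
    denom = np.mean(np.abs(im))
    if denom == 0:
        return 0.0
    return np.abs(np.mean(im)) / denom  # |E{Im}| / E{|Im|}

rng = np.random.default_rng(2)
p1 = rng.uniform(0, 2 * np.pi, 1000)
locked = wpli(p1, p1 - np.pi / 4)       # consistent quarter-cycle lag -> 1.0
random = wpli(p1, rng.uniform(0, 2 * np.pi, 1000))  # no coupling -> near 0
print(round(locked, 2))
```

A consistent nonzero lag yields a wPLI of 1, while uncoupled phases yield a value near 0, and zero-lag (volume-conducted) coupling contributes nothing to the numerator.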

Moreover, the phase locking value (PLV) method is commonly used for calculating the correlation between two electrodes. The PLV is a statistic that can be used to investigate EEG data for task-induced changes in the long-range synchronization of neural activity. To calculate the PLV, two time series are first spectrally decomposed at a given frequency, f 0 , to obtain an instantaneous phase estimate at each time point. Phase synchronization between two narrow-band signals is frequently characterized by the PLV. Consider a pair of real signals s 1 ( t ) and s 2 ( t ) that have been band-pass filtered to a frequency range of interest. Analytic signals \({z}_{i}(t)={A}_{i}(t){e}^{j{\phi }_{i}(t)}\) for i  = {1, 2} and \(j=\sqrt{-1}\) are obtained from s i ( t ) using the Hilbert transform:

\({z}_{i}(t)={s}_{i}(t)+j\,HT({s}_{i}(t))\)

where H T ( s i ( t )) is the Hilbert transform of s i ( t ) defined as:

\(HT({s}_{i}(t))=\frac{1}{\pi }\,P.V.{\int }_{-\infty }^{\infty }\frac{{s}_{i}(\tau )}{t-\tau }\,d\tau\)

and P .  V . denotes the Cauchy principal value. Once the analytic signals are defined, the relative phase can be computed as:

\(\Delta \phi (t)={\phi }_{1}(t)-{\phi }_{2}(t)\)

The instantaneous PLV is then defined as 87 :

\(PLV(t)=\left|E\left[{e}^{\,j\Delta \phi (t)}\right]\right|\)

where E [. ] denotes the expected value. The PLV takes values on [0, 1] with 0 reflecting the case where there is no phase synchrony and 1 where the relative phase between the two signals is identical in all trials. PLV can therefore be viewed as a measure of trial-to-trial variability in the relative phases of two signals. In this work, we use the Hilbert transform, but the continuous Morlet wavelet transform can also be used to compute complex signals, producing separate band-pass signals for each scaling of the wavelet 88 . The connectivity results associated with wPLI and PLV are presented in the supplemental section (Figs.  S1 and S2 ).
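The Hilbert-transform route to the PLV can be sketched as follows. This minimal Python version (the study's pipeline used MATLAB) averages the across-trial PLV over time; the function name and synthetic trials are ours.

```python
import numpy as np
from scipy.signal import hilbert

def plv(s1, s2):
    """Phase locking value between two sets of band-passed trials.

    s1, s2 : arrays of shape (n_trials, n_samples).
    Returns |E[exp(j*delta_phi)]| across trials, averaged over time.
    """
    phi1 = np.angle(hilbert(s1, axis=-1))  # instantaneous phase via Hilbert
    phi2 = np.angle(hilbert(s2, axis=-1))
    return np.abs(np.mean(np.exp(1j * (phi1 - phi2)), axis=0)).mean()

fs = 300
t = np.arange(2 * fs) / fs                  # 2-s trials at 300 Hz
rng = np.random.default_rng(3)
offsets = rng.uniform(0, 2 * np.pi, 50)     # random phase per trial
trials = np.array([np.sin(2 * np.pi * 10 * t + o) for o in offsets])
lagged = np.array([np.sin(2 * np.pi * 10 * t + o + np.pi / 3) for o in offsets])
other = np.array([np.sin(2 * np.pi * 10 * t + o)
                  for o in rng.uniform(0, 2 * np.pi, 50)])

locked = plv(trials, lagged)   # fixed relative phase across trials -> near 1
unlocked = plv(trials, other)  # random relative phase -> near 0
```

A fixed trial-to-trial relative phase drives the PLV toward 1, while independent phases drive it toward 0, matching the interpretation in the text.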

Neuropsychological tests and cognitive reserve

Several tests of working memory, language, executive function, and processing speed were considered in our analysis. A full description of these tests and their references were reported in these studies 6 , 89 .

One crucial factor that has not been taken into account in the previously described studies on strategy use is the potential role of cognitive reserve (CR), the brain’s ability to withstand aging or pathology by employing compensatory mechanisms 90 . CR indicates the effectiveness, capability, and adaptability of cognitive processes during cognitive challenges or pathology. This phenomenon elucidates an individual’s capacity to manage brain-related issues such as aging and delayed onset of dementia symptoms. Various proxies are used to measure CR, including educational level, verbal intelligence quotient (IQ), engagement in work, social interactions, and/or participation in leisure activities 91 . Presently, composite measures such as education, occupational complexity, and leisure activities offer the most comprehensive assessment of CR. In our study, both estimated IQ and education level were used as proxies for CR. First, scores were transformed into Z-scores; subsequently, the education and IQ Z-scores were averaged into a single CR score 64 , 66 , 92 .
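The composite CR score described above can be sketched in a few lines of Python (the study's analysis was done in MATLAB); function name and the toy values are ours.

```python
import numpy as np

def cr_scores(education_years, iq):
    """Composite cognitive-reserve score: z-score each proxy, then average."""
    def z(x):
        x = np.asarray(x, dtype=float)
        return (x - x.mean()) / x.std()   # population z-score within the sample
    return (z(education_years) + z(iq)) / 2.0

# Toy cohort (illustrative values, not study data)
edu = [12, 16, 18, 20]
iq = [95, 105, 110, 120]
print(np.round(cr_scores(edu, iq), 2))
```

By construction the composite has zero mean within the sample, so scores are interpreted relative to the cohort.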

Structural MRI data acquisition

All MRI images were acquired at the Advanced Imaging and Spectroscopy Center of the Huntington Medical Research Institutes (Pasadena, CA) using a 1.5 Tesla General Electric (GE) clinical scanner with an 8-channel high-resolution head coil. A brief description of MRI and NeuroQuant (Cortechs Labs.ai Inc, San Diego, CA, USA) analyses was reported in our published work 93 . Several brain regions were selected to examine the correlation between brain connectivity and brain atrophy in CH-NATs and CH-PATs. These regions include the fusiform cortex, frontal cortex, hippocampus, entorhinal cortex, and amygdala. The normalization factors are often based on automated intracranial volume (ICV) measurements or scaling factors from skull-based or whole-head-based registration to a standard template 94 , 95 .

ECG and HRV analysis

We examined the correlation between EC and HRV measures. Raw electrocardiogram (ECG) data were collected during the task-switching using AcqKnowledge software (BIOPAC Systems, Inc., Goleta, CA). ECG and HRV recording and analysis details were reported in our previous work 32 . Correlations between brain connectivity and HRV time-domain measures (i.e., NN intervals (RR), heart rate (HR), standard deviation of NN intervals (SDNN), and root mean square of successive differences (RMSSD)) and frequency-domain measures (i.e., low-frequency (LF) and high-frequency (HF) power) were computed for CH-PATs and CH-NATs.

Statistics and reproducibility

We employed a parametric two-sample t-test with the Bonferroni–Holm correction method to assess the connectivity metrics between CH-NATs and CH-PATs. Before conducting the statistical analysis, we used the Kolmogorov–Smirnov test to check that the data were normally distributed. A p-value threshold of p < 0.05 was used to identify significant differences between CH-NATs and CH-PATs at the group level. All data are presented as mean ± SD. Finally, Spearman’s or Pearson correlation was applied to study the associations between brain connectivity and neuropsychological scores, CR scores, brain volumetrics, and HRV measures. The p-values (p < 0.05) and r values (indicating the direction of association) are reported.
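The group comparison with Holm correction can be sketched in a few lines. This is an illustrative re-implementation in Python (the study itself used GraphPad Prism, R, and MATLAB); the array layout and function name are assumptions:

```python
import numpy as np
from scipy import stats

def group_compare(metrics_a, metrics_b, alpha=0.05):
    """Two-sample t-tests across connectivity metrics with Holm correction.

    metrics_a, metrics_b: arrays of shape (n_subjects, n_metrics), one
    column per connectivity metric (illustrative layout).
    Returns a boolean array: which metrics differ significantly.
    """
    # The study first checked normality with the Kolmogorov-Smirnov test;
    # here we assume that check has already passed.
    pvals = np.array([stats.ttest_ind(metrics_a[:, j], metrics_b[:, j]).pvalue
                      for j in range(metrics_a.shape[1])])
    # Holm-Bonferroni: compare sorted p-values to alpha / (m - rank) and
    # stop at the first failure; all larger p-values are also retained.
    order = np.argsort(pvals)
    m = len(pvals)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        if pvals[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject
```

Holm's step-down procedure controls the family-wise error rate like Bonferroni but is uniformly more powerful, which matters when many connectivity metrics are tested at once.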

All statistical analyses were performed using GraphPad Prism statistics software (version 9.5.0), the R programming language (version 2023.06.0), and MATLAB (version R2022a, The MathWorks, Inc.).

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

Raw data were generated at Huntington Medical Research Institutes (HMRI). All data generated or analyzed during this study are included in this published article and its supplementary information files, specifically in Supplementary Data  1 . Derived data and MATLAB code supporting the findings of this study are available from the corresponding author A.A. upon reasonable request.

Nicaise, S. et al. Alzheimer and the Mediterranean report 2016: overview–challenges–perspectives (Monegasque Association for Research on Alzheimer’s disease (AMPA), 2016).

Prince, M. et al. World Alzheimer Report 2015. The Global Impact of Dementia: An analysis of prevalence, incidence, cost and trends . Ph.D. thesis, Alzheimer’s Disease International (2015).

Chehrehnegar, N. et al. Early detection of cognitive disturbances in mild cognitive impairment: a systematic review of observational studies. Psychogeriatrics 20 , 212–228 (2020).

Sharma, L. et al. Promising protein biomarkers in the early diagnosis of Alzheimer’s disease. Metab. Brain Dis. 37 , 1727–1744 (2022).

Leyhe, T., Müller, S., Milian, M., Eschweiler, G. W. & Saur, R. Impairment of episodic and semantic autobiographical memory in patients with mild cognitive impairment and early Alzheimer’s disease. Neuropsychologia 47 , 2464–2469 (2009).

Harrington, M. G. et al. Executive function changes before memory in preclinical Alzheimer’s pathology: a prospective, cross-sectional, case control study. PloS One 8 , e79378 (2013).

Allain, P., Etcharry-Bouyx, F. & Verny, C. Executive functions in clinical and preclinical Alzheimer’s disease. Rev. Neurol. 169 , 695–708 (2013).

Aluise, C. D. et al. Redox proteomics analysis of brains from subjects with amnestic mild cognitive impairment compared to brains from subjects with preclinical Alzheimer’s disease: insights into memory loss in MCI. J. Alzheimer’s Dis. 23 , 257–269 (2011).

Guan, Z.-Z., Zhang, X., Ravid, R. & Nordberg, A. Decreased protein levels of nicotinic receptor subunits in the hippocampus and temporal cortex of patients with Alzheimer’s disease. J. Neurochem. 74 , 237–243 (2000).

Kashani, A. et al. Loss of VGLUT1 and VGLUT2 in the prefrontal cortex is correlated with cognitive decline in Alzheimer disease. Neurobiol. Aging 29 , 1619–1630 (2008).

Amieva, H. et al. Evidencing inhibitory deficits in Alzheimer’s disease through interference effects and shifting disabilities in the Stroop test. Arch. Clin. Neuropsychol. 19 , 791–803 (2004).

Belleville, S., Chertkow, H. & Gauthier, S. Working memory and control of attention in persons with Alzheimer’s disease and mild cognitive impairment. Neuropsychology 21 , 458 (2007).

Wang, Q. et al. Risk assessment and stratification of mild cognitive impairment among the Chinese elderly: attention to modifiable risk factors. J. Epidemiol. Community Health (2023).

Lee, D., Park, J. Y. & Kim, W. J. Altered functional connectivity of the default mode and dorsal attention network in subjective cognitive decline. J. Psychiatr. Res. 159 , 165–171 (2023).

Villemagne, V. L. et al. Amyloid β deposition, neurodegeneration, and cognitive decline in sporadic Alzheimer’s disease: a prospective cohort study. Lancet Neurol. 12 , 357–367 (2013).

Park, J.-C. et al. Plasma tau/amyloid- β 1–42 ratio predicts brain tau deposition and neurodegeneration in Alzheimer’s disease. Brain 142 , 771–786 (2019).

Kwak, S. S. et al. Amyloid- β 42/40 ratio drives tau pathology in 3D human neural cell culture models of Alzheimer’s disease. Nat. Commun. 11 , 1377 (2020).

Wang, L. et al. Cerebrospinal fluid proteins predict longitudinal hippocampal degeneration in early-stage dementia of the Alzheimer type. Alzheimer Dis. Assoc. Disord. 26 , 314–321 (2012).

Fagan, A. M. et al. Comparison of analytical platforms for cerebrospinal fluid measures of β -amyloid 1-42, total tau, and p-tau181 for identifying Alzheimer disease amyloid plaque pathology. Arch. Neurol. 68 , 1137–1144 (2011).

Pichet Binette, A. et al. Amyloid-associated increases in soluble tau relate to tau aggregation rates and cognitive decline in early Alzheimer’s disease. Nat. Commun. 13 , 6635 (2022).

Rodriguez-Vieitez, E. et al. Association of cortical microstructure with amyloid- β and tau: impact on cognitive decline, neurodegeneration, and clinical progression in older adults. Mol. Psychiatry 26 , 7813–7822 (2021).

Hoops, S. et al. Validity of the MoCA and MMSE in the detection of MCI and dementia in Parkinson disease. Neurology 73 , 1738–1745 (2009).

Matsuoka, T., Imai, A. & Narumoto, J. Neuroimaging of mild behavioral impairment: a systematic review. Psychiatry Clin. Neurosci. Rep. 2 , e81 (2023).

Nicolini, P. et al. Autonomic function predicts cognitive decline in mild cognitive impairment: Evidence from power spectral analysis of heart rate variability in a longitudinal study. Front. Aging Neurosci. 14 , 886023 (2022).

Ebinger, J. E. et al. Blood pressure variability supersedes heart rate variability as a real-world measure of dementia risk. Sci. Rep. 14 , 1838 (2024).

Molloy, C. et al. Resting heart rate (variability) and cognition relationships reveal cognitively healthy individuals with pathological amyloid/tau ratio. Front. Epidemiol. 3 , 1168847 (2023).

Palmqvist, S. et al. Earliest accumulation of β -amyloid occurs within the default-mode network and concurrently affects brain connectivity. Nat. Commun. 8 , 1214 (2017).

Yu, M., Sporns, O. & Saykin, A. J. The human connectome in Alzheimer disease—relationship to biomarkers and genetics. Nat. Rev. Neurol. 17 , 545–563 (2021).

Moretti, D. V. Understanding early dementia: EEG, MRI, SPECT and memory evaluation. Transl. Neurosci. 6 , 32–46 (2015).

Kim, N. H. et al. PET-validated EEG-machine learning algorithm predicts brain amyloid pathology in pre-dementia Alzheimer’s disease. Alzheimer’s Dement. 19 , e064436 (2023).

Canuet, L. et al. Network disruption and cerebrospinal fluid amyloid-beta and phospho-tau levels in mild cognitive impairment. J. Neurosci. 35 , 10325–10330 (2015).

Arechavala, R. J. et al. Task switching reveals abnormal brain-heart electrophysiological signatures in cognitively healthy individuals with abnormal CSF amyloid/tau, a pilot study. Int. J. Psychophysiol. 170 , 102–111 (2021).

Huang, S.-Y. et al. Characteristic patterns of inter- and intra-hemispheric metabolic connectivity in patients with stable and progressive mild cognitive impairment and Alzheimer’s disease. Sci. Rep. 8 , 13807 (2018).

Vecchio, F. et al. Sustainable method for Alzheimer dementia prediction in mild cognitive impairment: electroencephalographic connectivity and graph theory combined with apolipoprotein E. Ann. Neurol. 84 , 302–314 (2018).

Moretti, D. et al. Specific EEG changes associated with atrophy of hippocampus in subjects with mild cognitive impairment and Alzheimer’s disease. Int. J. Alzheimer’s Dis. 2012 , 253153 (2012).

McBride, J. C. et al. Spectral and complexity analysis of scalp EEG characteristics for mild cognitive impairment and early Alzheimer’s disease. Comput. Methods Prog. Biomed. 114 , 153–163 (2014).

Trinh, T.-T. et al. Identifying individuals with mild cognitive impairment using working memory-induced intra-subject variability of resting-state EEGs. Front. Comput. Neurosci. 15 , 700467 (2021).

Gabard-Durnam, L. J., Mendez Leal, A. S., Wilkinson, C. L. & Levin, A. R. The Harvard automated processing pipeline for electroencephalography (HAPPE): standardized processing software for developmental and high-artifact data. Front. Neurosci. 12 , 97 (2018).

Arakaki, X. et al. Alpha desynchronization/synchronization during working memory testing is compromised in acute mild traumatic brain injury (mTBI). PloS One 13 , e0188101 (2018).

Arakaki, X. et al. A study of alpha desynchronization, heart rate, and MRI during Stroop testing unmasks pre-symptomatic Alzheimer’s disease: EEG biomarkers of Alzheimer’s disease in pre-symptomatic and symptomatic patients: multimodal validation from international projects. Alzheimer Dement. 16 , e042793 (2020).

Philiastides, M. G., Ratcliff, R. & Sajda, P. Neural representation of task difficulty and decision making during perceptual categorization: a timing diagram. J. Neurosci. 26 , 8965–8975 (2006).

Wang, R. et al. Power spectral density and coherence analysis of Alzheimer’s EEG. Cogn. Neurodyn. 9 , 291–304 (2015).

Gaubert, S. et al. EEG evidence of compensatory mechanisms in preclinical Alzheimer’s disease. Brain 142 , 2096–2112 (2019).

Perez, V. et al. EEG markers and subjective memory complaints in young and older people. Int. J. Psychophysiol. 182 , 23–31 (2022).

Jiao, B. et al. Neural biomarker diagnosis and prediction to mild cognitive impairment and Alzheimer’s disease using EEG technology. Alzheimer’s Res. Ther. 15 , 1–14 (2023).

Fahnestock, M. & Shekari, A. ProNGF and neurodegeneration in Alzheimer’s disease. Front. Neurosci. 13 , 440994 (2019).

Hearne, L. J. et al. Increased cognitive complexity reveals abnormal brain network activity in individuals with corpus callosum dysgenesis. NeuroImage: Clin. 21 , 101595 (2019).

Jeong, H. T., Youn, Y. C., Sung, H.-H. & Kim, S. Y. Power spectral changes of quantitative EEG in the subjective cognitive decline: comparison of community normal control groups. Neuropsychiatr. Dis. Treat. 21 , 2783–2790 (2021).

Briels, C. T. et al. Reproducibility of EEG functional connectivity in Alzheimer’s disease. Alzheimer’s. Res. Ther. 12 , 1–14 (2020).

Vlassenko, A. G., Benzinger, T. L. & Morris, J. C. PET amyloid-beta imaging in preclinical Alzheimer’s disease. Biochim. Biophys. Acta (BBA)-Mol. Basis Dis. 1822 , 370–379 (2012).

Grandjean, J. et al. Early alterations in functional connectivity and white matter structure in a transgenic mouse model of cerebral amyloidosis. J. Neurosci. 34 , 13780–13789 (2014).

Canuet, L. et al. Resting-state network disruption and APOE genotype in Alzheimer’s disease: a lagged functional connectivity study. PLoS ONE 7 , e46289. https://doi.org/10.1371/journal.pone.0046289 (2012).

Palop, J. J. & Mucke, L. Amyloid- β –induced neuronal dysfunction in Alzheimer’s disease: from synapses toward neural networks. Nat. Neurosci. 13 , 812–818 (2010).

Mohanta, S. et al. Receptors, circuits and neural dynamics for prediction. Available at SSRN 3659396 (2021).

Greicius, M. D., Krasnow, B., Reiss, A. L. & Menon, V. Functional connectivity in the resting brain: a network analysis of the default mode hypothesis. Proc. Natl. Acad. Sci. USA 100 , 253–258 (2003).

Farràs-Permanyer, L., Guàrdia-Olmos, J. & Peró-Cebollero, M. Mild cognitive impairment and fMRI studies of brain functional connectivity: the state of the art. Front. Psychol. 6 , 1095 (2015).

Babiloni, F. et al. Hypermethods for EEG hyperscanning. In 2006 International Conference of the IEEE Engineering in Medicine and Biology Society , 3666–3669 (IEEE, 2006).

Gilbert, C. D. & Li, W. Top-down influences on visual processing. Nat. Rev. Neurosci. 14 , 350–363 (2013).

Fernández, P. J., Vivas, A. B., Chechlacz, M. & Fuentes, L. J. The role of the parietal cortex in inhibitory processing in the vertical meridian: Evidence from elderly brain damaged patients. Aging Brain 2 , 100043 (2022).

Burgess, A. P. On the interpretation of synchronization in EEG hyperscanning studies: a cautionary note. Front. Hum. Neurosci. 7 , 881 (2013).

Trammell, J. P., MacRae, P. G., Davis, G., Bergstedt, D. & Anderson, A. E. The relationship of cognitive performance and the theta-alpha power ratio is age-dependent: an EEG study of short term memory and reasoning during task and resting-state in healthy young and old adults. Front. Aging Neurosci. 9 , 364 (2017).

Rogala, J., Kublik, E., Krauz, R. & Wróbel, A. Resting-state EEG activity predicts frontoparietal network reconfiguration and improved attentional performance. Sci. Rep. 10 , 5064 (2020).

Vecchio, F. et al. Cortical connectivity and memory performance in cognitive decline: a study via graph theory from EEG data. Neuroscience 316 , 143–150 (2016).

Bouazzaoui, B. et al. Aging and self-reported internal and external memory strategy uses: the role of executive functioning. Acta Psychol. 135 , 59–66 (2010).

Stern, Y. Cognitive reserve in ageing and Alzheimer’s disease. Lancet Neurol. 11 , 1006–1012 (2012).

Frankenmolen, N. L., Fasotti, L., Kessels, R. P. & Oosterman, J. M. The influence of cognitive reserve and age on the use of memory strategies. Exp. Aging Res. 44 , 117–134 (2018).

Scahill, R. I., Schott, J. M., Stevens, J. M., Rossor, M. N. & Fox, N. C. Mapping the evolution of regional atrophy in Alzheimer’s disease: unbiased analysis of fluid-registered serial mri. Proc. Natl. Acad. Sci. USA 99 , 4703–4707 (2002).

Barnes, J. et al. A meta-analysis of hippocampal atrophy rates in Alzheimer’s disease. Neurobiol. Aging 30 , 1711–1723 (2009).

Vemuri, P. & Jack, C. R. Role of structural MRI in Alzheimer’s disease. Alzheimer’s Res. Ther. 2 , 1–10 (2010).

Duara, R. et al. Medial temporal lobe atrophy on MRI scans and the diagnosis of Alzheimer disease. Neurology 71 , 1986–1992 (2008).

Contador, J. et al. Longitudinal brain atrophy and CSF biomarkers in early-onset Alzheimer’s disease. NeuroImage Clin. 32 , 102804 (2021).

Aoki, Y. et al. EEG resting-state networks in Alzheimer’s disease associated with clinical symptoms. Sci. Rep. 13 , 3964 (2023).

Cai, S. et al. Altered functional connectivity of fusiform gyrus in subjects with amnestic mild cognitive impairment: a resting-state fMRI study. Front. Hum. Neurosci. 9 , 471 (2015).

Ma, D. et al. The fusiform gyrus exhibits an epigenetic signature for Alzheimer’s disease. Clin. Epigenet. 12 , 1–16 (2020).

Aramadaka, S. et al. Neuroimaging in Alzheimer’s disease for early diagnosis: a comprehensive review. Cureus 15 , e38544 (2023).

Alba, G., Vila, J., Rey, B., Montoya, P. & Muñoz, M. Á. The relationship between heart rate variability and electroencephalography functional connectivity variability is associated with cognitive flexibility. Front. Hum. Neurosci. 13 , 428262 (2019).

Imbimbo, C. et al. Heart rate variability and cognitive performance in adults with cardiovascular risk. Cereb. Circul. Cogn. Behav. 3 , 100136 (2022).

Idiaquez, J. & Roman, G. C. Autonomic dysfunction in neurodegenerative dementias. J. Neurol. Sci. 305 , 22–27 (2011).

Kong, S. D. et al. Heart rate variability during slow wave sleep is linked to functional connectivity in the central autonomic network. Brain Commun. 5 , fcad129 (2023).

Wu, S. et al. The neural dynamic mechanisms of asymmetric switch costs in a combined Stroop-task-switching paradigm. Sci. Rep. 5 , 10240 (2015).

Klug, M. & Gramann, K. Identifying key factors for improving ICA-based decomposition of EEG data in mobile and stationary experiments. Eur. J. Neurosci. 54 , 8406–8420 (2021).

Gabard-Durnam, L. J. et al. Longitudinal EEG power in the first postnatal year differentiates autism outcomes. Nat. Commun. 10 , 4188 (2019).

Jia, H., Li, H. & Yu, D. The relationship between ERP components and EEG spatial complexity in a visual go/nogo task. J. Neurophysiol. 117 , 275–283 (2017).

Akaike, H. Factor analysis and AIC. Psychometrika 52 , 317–332 (1987).

Schwarz, G. Estimating the dimension of a model. Ann. Stat. 6 , 461–464 (1978).

van den Heuvel, M. P. et al. Proportional thresholding in resting-state fMRI functional connectivity networks and consequences for patient-control connectome studies: Issues and recommendations. Neuroimage 152 , 437–449 (2017).

Celka, P. Statistical analysis of the phase-locking value. IEEE Signal Process. Lett. 14 , 577–580 (2007).

Cui, G., Li, X. & Touyama, H. Emotion recognition based on group phase locking value using convolutional neural network. Sci. Rep. 13 , 3769 (2023).

Miles, S., Gnatt, I., Phillipou, A. & Nedeljkovic, M. Cognitive flexibility in acute anorexia nervosa and after recovery: a systematic review. Clin. Psychol. Rev. 81 , 101905 (2020).

Garba, A. E. et al. The influence of cognitive reserve on Alzheimer’s disease progression. Alzheimer Dement. 17 , e054537 (2021).

Šneidere, K., Mondini, S. & Stepens, A. Role of EEG in measuring cognitive reserve: a rapid review. Front. Aging Neurosci. 12 , 249 (2020).

Steffener, J. & Stern, Y. Exploring the neural basis of cognitive reserve in aging. Biochim. Biophys. Acta (BBA)-Mol. Basis Dis. 1822 , 467–473 (2012).

Wei, K. et al. White matter hypointensities and hyperintensities have equivalent correlations with age and CSF β -amyloid in the nondemented elderly. Brain Behav. 9 , e01457 (2019).

Plant, C. et al. Automated detection of brain atrophy patterns based on MRI for the prediction of Alzheimer’s disease. Neuroimage 50 , 162–174 (2010).

Rocca, M. A. et al. Brain MRI atrophy quantification in MS: from methods to clinical application. Neurology 88 , 403–413 (2017).

Chan, Y. L. et al. Automated thresholding method for fNIRS-based functional connectivity analysis: validation with a case study on Alzheimer’s disease. IEEE Trans. Neural Syst. Rehabilit. Eng. 28 , 1691–1701 (2020).

Acknowledgements

The authors thank the study participants for their altruistic participation in this research. They also thank Dr. Astrid Suchy-Dicey for her support in submitting this manuscript, as well as Cathleen Molloy for revising the manuscript and handling some data, Shant Rising, and Rachel Woo for taking part in handling some data. Some data relied on in this study were derived from research performed at HMRI by Dr. Michael G. Harrington. This work was supported by the National Institute on Aging, National Institutes of Health (NIH) (grant numbers R56AG063857 and R01AG063857).

Author information

Authors and Affiliations

Department of Neurosciences, Huntington Medical Research Institutes, Pasadena, CA, USA

Abdulhakim Al-Ezzi, Ryan Butler, Alfred N. Fonteh, Robert A. Kloner & Xianghong Arakaki

Department of Environmental and Occupational Health, Center for Occupational and Environmental Health (COEH), University of California, Irvine, CA, USA

Rebecca J. Arechavala & Michael T. Kleinman

Fuller Theological Seminary, Pasadena, CA, USA

The Hill Medical Corporation, Pasadena, CA, USA

Jimmy J. Kang

The Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA

Shinsuke Shimojo & Daw-An Wu

Department of Cardiovascular Research, Huntington Medical Research Institutes, Pasadena, CA, USA

Robert A. Kloner

Contributions

Conceived and designed the experiments: X.A.; performed the experiments: X.A.; neuropsychological data: A.N.; MRI data: HMRI Brain Imaging Center; Analyzed data: A.A.; Wrote the paper: A.A.; Behavioral analysis: D.W.; Heart rate variability analysis: M.K. and R.A.; Edited the paper: R.A., R.B., A.N., J.J.K., S.S., D.W., A.F., M.K., R.K., and X.A.; All authors contributed to the final manuscript.

Corresponding authors

Correspondence to Abdulhakim Al-Ezzi or Xianghong Arakaki .

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks Gleb Bezgin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Christian Beste and Benjamin Bessieres. A  peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer review file
Supplementary information file
Description of additional supplementary files
Supplementary Data 1
Reporting summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

Reprints and permissions

About this article

Cite this article

Al-Ezzi, A., Arechavala, R.J., Butler, R. et al. Disrupted brain functional connectivity as early signature in cognitively healthy individuals with pathological CSF amyloid/tau. Commun Biol 7 , 1037 (2024). https://doi.org/10.1038/s42003-024-06673-w

Received: 16 January 2024

Accepted: 01 August 2024

Published: 23 August 2024

DOI: https://doi.org/10.1038/s42003-024-06673-w


The Quantitative Research Sample Size Calculator

We may love our in-depth qualitative research tools, but we know the value of integrating both qual and quant methods. That’s why we have built our DIY quantitative research sample size calculator. From some basic information, this tool displays the recommended sample size required for your research to be statistically significant.

Use the calculator to work out how many people you need to complete your survey or poll to be confident in the accuracy of your results.

Not sure what values to use? This brief guide explains the terms used in our sample size calculator, in addition to providing recommended values for optimum results.

Sample Size:   Your sample size is the number of consumers in your target population that you will be researching. This calculator provides a recommended sample size – i.e. the minimum number of consumers you need to research for your results to be statistically significant within your defined parameters.

Population Size:   The population size is the approximate number of consumers in the group that you want to research. For example, if you want to understand the internet usage habits of the entire UK population, your population would be all UK consumers (64.6 million at the latest estimate).

Confidence Level:   A confidence level is defined as the statistical probability that the value of a parameter falls within a specified range of values. Therefore, a confidence level of N% means you can be N% sure that your results contain the true value for the designated population.

In market research, the most commonly used confidence level is 95%. A higher confidence level indicates a higher probability that your results are accurate, but increasing it can dramatically increase the required sample size. Finding a balance between confidence and an achievable research goal is crucial.

In this calculation, each confidence level is translated to a z-score. A z-score is a statistical method for rescaling data that helps researchers draw comparisons more easily. The following table details the z-score generated from each confidence level:

Confidence Level   Z-Score
90%                1.645
95%                1.96
99%                2.58
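These z-scores are simply two-sided critical values of the standard normal distribution, so the table can be reproduced from the inverse normal CDF. A quick sketch using SciPy (the function name is ours):

```python
from scipy.stats import norm

def z_score(confidence_level):
    """Two-sided critical z value for a confidence level given as a fraction."""
    # For 95% confidence, 2.5% of probability sits in each tail,
    # so we need the 97.5th percentile of the standard normal.
    return norm.ppf(1 - (1 - confidence_level) / 2)

for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.0%}  {z_score(cl):.3f}")
```

This reproduces the table above (the 99% entry, 2.576, is usually rounded to 2.58).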

Margin of Error:   The margin of error is the maximum acceptable difference in results between the population and sample. On a basic level, if a poll were to ask 1,000 people if they drive a car and 70% of people were to answer yes – a margin of error of +/- 5% would indicate that in the total population, between 65% and 75% would be expected to answer in the same way.

The smaller the margin of error, the more representative of the total population the results will be. However, decreasing the margin of error will also result in a sharp increase in sample size. We recommend using a 5% margin of error as standard, which should never be increased above 10%.

The DIY Sample Size Calculation

Want to work out your required sample size by hand? Our free calculator uses the following equation. Simply follow the steps below to work out how many research participants you need to complete your research.

S = (z²d(1 - d)/e²) / (1 + (z²d(1 - d))/(e²P))

S   = sample size |  P   = population size  |  z   = z-score  |  e   = margin of error  |  d  = standard deviation

Please note that our calculator assumes a standard deviation of 0.5. Use the manual equation to change the standard deviation.
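The equation above can be sketched in Python. This is an illustrative implementation under the blog's assumptions (d = 0.5 by default), not FlexMR's own calculator; it applies the finite population correction only when a population size P is supplied:

```python
from math import ceil

def sample_size(z, d, e, population=None):
    """Minimum sample size for estimating a proportion.

    z          -- z-score for the chosen confidence level (1.96 for 95%)
    d          -- assumed population proportion (0.5 is the conservative default)
    e          -- margin of error as a decimal (0.05 for +/- 5%)
    population -- finite population size P, or None for an unlimited population
    """
    s0 = z ** 2 * d * (1 - d) / e ** 2       # unlimited-population estimate
    if population is None:
        return ceil(s0)
    return ceil(s0 / (1 + s0 / population))  # finite population correction

print(sample_size(1.96, 0.5, 0.05))         # -> 385 (unlimited population)
print(sample_size(1.96, 0.5, 0.05, 10000))  # -> 370 (population of 10,000)
```

The correction matters only for small populations: as P grows, the denominator approaches 1 and the result converges on the unlimited-population figure of 385.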

Now that you have calculated the recommended number of participants required to make your results statistically significant, the next step is to invite participants. In our experience, a typical panel based survey will yield a 15%-20% response rate without an incentive, but approximately a 30% response rate when a relevant incentive is offered. There are a multitude of factors that can affect survey response rate, from length to design, accessibility to relevance.

However, by using an estimate of 30% survey completion and the minimum sample provided by our calculator, it is possible to work out how many consumers you'll need to reach. For example, if you need to reach a minimum sample of 1,000 consumers, you'll need to invite approximately 3,334 consumers within your target population (S / (1 - 0.7)).
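The invitation arithmetic above amounts to dividing the minimum sample by the expected completion rate; a quick check in Python (assuming the 30% response-rate estimate):

```python
from math import ceil

def invitations_needed(sample, response_rate):
    """Invitations required to reach a minimum sample at a given response rate."""
    return ceil(sample / response_rate)

print(invitations_needed(1000, 0.30))  # -> 3334
```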


What are your top tips for improving survey response rates? Let us know in the comments below and start a discussion.

About FlexMR

We are The Insights Empowerment Company. We help research, product and marketing teams drive informed decisions with efficient, scalable & impactful insight.

About Chris Martin

Chris is an experienced executive and marketing strategist in the insight and technology sectors. He also hosts our MRX Lab podcast.


Int J Ayurveda Res, v.1(1); Jan-Mar 2010

Sample size calculation

Prashant Kadam

Department of Clinical Pharmacology, Seth GS Medical College and KEM Hospital, Parel, Mumbai - 400 012, India

Supriya Bhalerao

1 Department of Clinical Pharmacology, TNMC and BYL Nair Hospital, Mumbai Central, Mumbai - 400 008, India

INTRODUCTION

One of the pivotal aspects of planning a clinical study is the calculation of the sample size. It is naturally neither practical nor feasible to study the whole population in any study. Hence, a set of participants is selected from the population, which is less in number (size) but adequately represents the population from which it is drawn so that true inferences about the population can be made from the results obtained. This set of individuals is known as the “sample.”

In a statistical context, the “population” is defined as the complete set of people (e.g., Indians), the “target population” is a subset of individuals with specific clinical and demographic characteristics in whom you want to study your intervention (e.g., males, between ages 45 and 60, with blood pressure above 140 mmHg systolic and 90 mmHg diastolic), and “sample” is a further subset of the target population which we would like to include in the study. Thus a “sample” is a portion, piece, or segment that is representative of a whole.

ATTRIBUTES OF A SAMPLE

  • Every individual in the chosen population should have an equal chance to be included in the sample.
  • Ideally, choice of one participant should not affect the chance of another's selection (hence we try to select the sample randomly – thus, it is important to note that random sampling does not describe the sample or its size as much as it describes how the sample is chosen).

The sample size, the topic of this article, is, simply put, the number of participants in a sample. It is a basic statistical principle that we define the sample size before we start a clinical study so as to avoid bias in interpreting results. If we include very few subjects in a study, the results cannot be generalized to the population, as this sample will not adequately represent the target population. Further, the study may then not be able to detect the difference between test groups, making the study unethical.

On the other hand, if we study more subjects than required, we put more individuals to the risk of the intervention, also making the study unethical, and waste precious resources, including the researchers’ time.

The calculation of an adequate sample size thus becomes crucial in any clinical study and is the process by which we calculate the optimum number of participants required to be able to arrive at ethically and scientifically valid results. This article describes the principles and methods used to calculate the sample size.

Generally, the sample size for any study depends on the:[ 1 ]

  • Acceptable level of significance
  • Power of the study
  • Expected effect size
  • Underlying event rate in the population
  • Standard deviation in the population.

Some more factors that can be considered while calculating the final sample size include the expected drop-out rate, an unequal allocation ratio, and the objective and design of the study.[ 2 ]

LEVEL OF SIGNIFICANCE

Everyone is familiar with the “ p ” value. This is the “level of significance” and prior to starting a study we set an acceptable value for this “ p .” When we say, for example, we will accept a p <0.05 as significant, we mean that we are ready to accept that the probability that the result is observed due to chance (and NOT due to our intervention) is 5%. To put it in different words, we are willing to accept the detection of a difference 5 out of 100 times when actually no difference exists (i.e., get a “false positive” result). Conventionally, a p value of 5% ( p = 0.05) or 1% ( p = 0.01), which means a 5% (or 1%) chance of erroneously reporting a significant effect, is accepted.

Sometimes, and exactly conversely, we may commit another type of error, where we fail to detect a difference when there actually is a difference. This is called the Type II error, which reports a false negative, as against the Type I error mentioned above, where we detect a false positive difference when no difference actually exists. We must decide what false negative rate we are willing to accept so that our study is adequately powered to accept or reject our null hypothesis accurately.

This false negative rate is the proportion of positive instances that are erroneously reported as negative and is referred to in statistics by the letter β. The “power” of the study is then equal to (1 - β) and is the probability of detecting a difference when there actually is a difference. The power of a study increases as the chances of committing a Type II error decrease.

Most studies accept a power of 80%. This means we accept that one in five times (that is, 20%) we will miss a real difference. For pivotal or large studies, the power is sometimes set at 90% to reduce the possibility of a “false negative” result to 10%.

EXPECTED EFFECT SIZE

We can understand the concept of “effect size” from day-to-day examples. If the average weight loss following one diet program is 20 kg and following another is 10 kg, the absolute effect size would be 10 kg. Similarly, one can claim that a specific teaching activity brings about a 10% improvement in examination scores. Here 10 kg and 10% are indicators of the claimed effect size.

In statistics, the difference between the value of the variable in the control group and that in the test drug group is known as effect size. This difference can be expressed as the absolute difference or the relative difference, e.g., in the weight loss example above, if the weight loss in the control group is 10 kg and in the test group it is 20 kg, the absolute effect size is 10 kg and the relative reduction with the test intervention is 10/20, or 50%.

We can estimate the effect size based on previously reported or preclinical studies. It is important to note that if the effect size is large between the study groups then the sample size required for the study is less and if the effect size between the study groups is small, the sample size required is large. In the case of observational studies, for example, if we want to find an association between smoking and lung cancer, since earlier studies have shown that there is a large effect size, a smaller sample would be needed to prove this effect. If on the other hand we want to find out the association between smoking and getting brain tumor, where the “effect” is unknown or small, the sample size required to detect an association would be larger.

UNDERLYING EVENT RATE IN THE POPULATION

The underlying event rate of the condition under study (prevalence rate) in the population is extremely important while calculating the sample size. This unlike the level of significance and power is not selected by convention. Rather, it is estimated from previously reported studies. Sometimes it so happens that after a trial is initiated, the overall event rate proves to be unexpectedly low and the sample size may have to be adjusted, with all statistical precautions.

STANDARD DEVIATION (SD or σ)

Standard deviation is the measure of dispersion or variability in the data. While calculating the sample size an investigator needs to anticipate the variation in the measures that are being studied. It is easy to understand why we would require a smaller sample if the population is more homogenous and therefore has a smaller variance or standard deviation. Suppose we are studying the effect of an intervention on the weight and consider a population with weights ranging from 45 to 100 kg. Naturally the standard deviation in this group will be great and we would need a larger sample size to detect a difference between interventions, else the difference between the two groups would be masked by the inherent difference between them because of the variance. If on the other hand, we were to take a sample from a population with weights between 80 and 100 kg we would naturally get a tighter and more homogenous group, thus reducing the standard deviation and therefore the sample size.

SAMPLE SIZE CALCULATION

There are several methods used to calculate the sample size depending on the type of data or study design. The sample size is calculated using the following formula:

n = 2(Zα + Z1-β)²σ² / Δ²

where n is the required sample size. For Zα, Z is a constant (set by convention according to the accepted α error and whether it is a one-sided or two-sided effect) as shown below:

α-error    5%      1%       0.1%
2-sided    1.96    2.5758   3.2905
1-sided    1.65    2.33

For Z1-β, Z is a constant set by convention according to the power of the study, as shown below:

Power    80%      85%      90%      95%
Value    0.8416   1.0364   1.2816   1.6449

In the above-mentioned formula, σ is the standard deviation (estimated) and Δ the difference in effect of the two interventions (the estimated effect size).

This gives the number of participants needed per arm in a controlled clinical trial.
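The Z constants in the two tables above are quantiles of the standard normal distribution, so they can be recovered with Python's standard library rather than looked up. A sketch (function names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1

def z_alpha(alpha, two_sided=True):
    """Critical value Z_alpha for the chosen significance level."""
    tail = alpha / 2 if two_sided else alpha
    return std_normal.inv_cdf(1 - tail)

def z_power(power):
    """Z_(1-beta) for the chosen study power."""
    return std_normal.inv_cdf(power)

print(round(z_alpha(0.05), 4))         # -> 1.96   (two-sided, 5%)
print(round(z_alpha(0.05, False), 4))  # -> 1.6449 (one-sided, 5%; the table rounds this to 1.65)
print(round(z_power(0.80), 4))         # -> 0.8416 (80% power)
```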

This issue of the Journal has an article describing the benefits of the ayurvedic treatment AyTP in patients with migraine, in an open uncontrolled trial design.[ 3 ] If anyone wishes to confirm these results using a randomized controlled trial design, where the effect of the ayurvedic intervention will be compared to the standard of care for headache as measured by VAS, how would we plan the sample size?

As seen above, we need the following values: Zα, Z1-β, σ (the estimated standard deviation), and Δ (the difference in effect of the two interventions). Let us assume we will accept p <0.05 and a study with 80% power; using the above tables, we get the following values: Zα is 1.96 (in this case we use a two-tailed value because the results could be bidirectional), and Z1-β is 0.8416. The standard deviation (based on the data in the published paper) would be approximately 0.7. For Δ, the paper describes that the ayurvedic therapy has given a 35% effect, and it has previously been reported that sumatriptan at 50 mg improves headache by 50%.[ 4 ] Thus, the effect size would be 15% (i.e., 0.15).

The sample size for the new study will be

n = 2(Zα + Z1-β)²σ²/Δ² = 362 per arm.

Allowing for a 10% drop-out rate, one would need to enrol approximately 400 patients per arm to be able to say with any degree of confidence whether a difference exists between the two treatments.
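The worked example can be reproduced in Python. One caveat: with σ taken as exactly 0.7, the formula evaluates to 342 per arm; the article's figure of 362 corresponds to an unrounded standard deviation of roughly 0.72 from the source data. The drop-out adjustment divides the per-arm size by the expected completion rate:

```python
from math import ceil

def trial_sample_size(z_alpha, z_beta, sigma, delta):
    """Per-arm sample size: n = 2(Z_alpha + Z_(1-beta))^2 * sigma^2 / delta^2."""
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

n = trial_sample_size(1.96, 0.8416, 0.7, 0.15)  # sigma rounded to 0.7
print(n)                                        # -> 342 per arm
print(ceil(n / 0.9))                            # inflated for a 10% drop-out rate
```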

LIMITATIONS OF THE CALCULATED SAMPLE SIZE

The sample size calculated using the above formula is based on certain conventions (Type I and II errors) and a few assumptions (effect size and standard deviation).

The sample size ALWAYS has to be calculated before initiating a study and as far as possible should not be changed during the study course.

The sample size calculation is also then influenced by a few practical issues, e.g., administrative issues and costs.

Source of Support: Nil

Conflict of Interest: None declared

REFERENCES
