What Is Extrapolation?

Extrapolation is a statistical technique used in data science to estimate values of data points beyond the range of the known values in the data set.

Abdishakur Hassan

Extrapolation is an inexpensive and effective method you can use to predict future values and trends in data, as well as gain insight into the behavior of complex environments. Extrapolation is especially helpful for time series and geospatial analysis due to the technique’s ability to take into account the impact of temporal and spatial factors on the data.

Using extrapolation techniques, you can calculate unobserved values by extending a known sequence of values in the data set.

Extrapolation vs. Interpolation: What’s the Difference?

Extrapolation is often mistaken for interpolation. The two use similar techniques to estimate unknown values but differ in one key respect: if the estimated values fall between two known values, it's interpolation; if the predicted values fall outside the range of the data set, it's extrapolation.


Common Extrapolation Methods

There are different types of extrapolation for predicting and evaluating trends in data. The following two are the most widely used extrapolation methods.

  • Linear Extrapolation: This is the most basic form of extrapolation; it uses a linear equation to predict future outcomes and is best suited for predictions close to the given data. We simply draw a tangent line at the last data point and extend it beyond the known values.
  • Polynomial Extrapolation: This method uses a polynomial equation to make predictions about future values. We use polynomial extrapolation when the data points exhibit a non-linear trend. It is more complex than linear extrapolation, but it can produce more accurate predictions when the trend is genuinely non-linear (see the sketch after this list).
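As a quick illustration of both methods, here is a minimal Python sketch using NumPy's polyfit. The data points and polynomial degrees below are invented for the example, not taken from the article.

```python
# Minimal sketch: linear and polynomial extrapolation with NumPy.
# The sample data points are made up for illustration.
import numpy as np

x = np.array([1, 2, 3, 4, 5])                # known x values (e.g., years)
y = np.array([2.1, 4.3, 9.2, 16.8, 25.4])    # known y values

# Linear extrapolation: fit a degree-1 polynomial and evaluate beyond the data.
linear_fit = np.poly1d(np.polyfit(x, y, deg=1))
print("Linear estimate at x=7:", linear_fit(7))

# Polynomial extrapolation: fit a degree-2 polynomial for a non-linear trend.
poly_fit = np.poly1d(np.polyfit(x, y, deg=2))
print("Quadratic estimate at x=7:", poly_fit(7))
```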


How Does Extrapolation Work?

Extrapolation is essentially a forecasting method, common in time series analysis. The following example uses linear extrapolation to predict sales.

Let's take a company's sales in 2020 and 2021 and extrapolate what the sales will be in 2022.

To estimate the 2022 value from the sales records of the past two years, we first need to calculate the slope.

m = (y2 - y1) / (x2 - x1)

After that, we apply a line equation. 

y = y1 + m · (x - x1)

We can then find the extrapolated value for 2022 by plugging the values into the equations above. We conclude that sales will be $15,086.
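The article does not list the underlying 2020 and 2021 sales figures, so the sketch below uses placeholder numbers purely to show the two-step calculation; the printed result will therefore differ from the $15,086 quoted above.

```python
# Minimal sketch of the slope + line-equation steps above.
# The 2020/2021 sales figures are hypothetical placeholders.
x1, y1 = 2020, 12_000   # hypothetical sales in 2020
x2, y2 = 2021, 13_500   # hypothetical sales in 2021

m = (y2 - y1) / (x2 - x1)      # slope
x = 2022
y = y1 + m * (x - x1)          # line equation
print(f"Extrapolated {x} sales: ${y:,.0f}")
```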

What Are the Benefits of Extrapolation? 

Extrapolating is a powerful tool to help us make data-informed predictions and understand trends. Here are some reasons why we use extrapolation methods.

  • If we’re worried that expert forecasts are biased but we don’t know much about the situation, extrapolation might be the best option.
  • Extrapolation is inexpensive and straightforward, which means you can rerun the modeling as often as needed to generate multiple predictions.
  • When the forecast hinges on multiple scenarios, such as economic trends or policy changes, extrapolation can help you explore them.
  • Extrapolation can help identify potential risks or future opportunities.
  • Extrapolation helps you identify patterns in data and make informed decisions.

What Are the Risks of Extrapolation? 

Although the process of extrapolation is simple and straightforward, its accuracy and reliability depend on the trends present in the data set. Thus, careful consideration of the data set and the values it contains can help you mitigate larger errors in forecasting future trends. It’s also essential to use various methods when extrapolating to help reduce errors and ensure that the extrapolation is based on a complete and accurate understanding of the data.


Extrapolation in Causal Inference

Extrapolation is the process of estimating unknown values by extending or projecting from known data points. This technique is crucial in understanding how results observed in a specific sample or experimental setting might apply to a broader population or different contexts, which relates closely to issues of external validity and the generalizability of findings.


5 Must Know Facts For Your Next Test

  • Extrapolation can introduce errors if the relationship between variables changes outside the observed range of data, potentially leading to misleading conclusions.
  • In machine learning for causal inference, extrapolation is often necessary when applying learned models to new datasets, but caution must be exercised to avoid overestimating the model's applicability.
  • External validity is fundamentally linked to extrapolation, as it assesses whether study results are applicable to settings or populations beyond those studied.
  • Understanding the limits of extrapolation is critical; for instance, applying results from a controlled environment directly to real-world situations can yield inaccurate predictions.
  • The validity of extrapolated conclusions heavily depends on the robustness of the underlying causal assumptions made during analysis.

Review Questions

  • Extrapolation can significantly impact the reliability of findings because it involves making predictions about unobserved data based on known values. If the underlying relationships remain stable across contexts, then extrapolated conclusions may hold true. However, if those relationships change or do not apply outside the studied sample, it can lead to erroneous interpretations and flawed decision-making. Thus, careful consideration of the context and assumptions is vital when relying on extrapolated results.
  • Extrapolating machine learning models poses several challenges, including overfitting and potential changes in underlying data distributions. When a model is overfit to training data, it may not perform well when applied to new datasets due to its lack of generalization. Strategies such as cross-validation, regularization techniques, and ensuring diverse training datasets can help improve model robustness and accuracy. Additionally, conducting sensitivity analyses can assess how variations in input affect output predictions, helping validate extrapolations.
  • External validity is inherently linked to extrapolation as it assesses whether research findings can be applied beyond the specific conditions of a study. If researchers fail to establish strong external validity, their ability to extrapolate results confidently to broader populations or different contexts becomes compromised. This impacts generalizability since findings that cannot be reliably extrapolated may misrepresent real-world scenarios or lead to ineffective interventions. Therefore, establishing external validity through careful study design and consideration of contextual factors is crucial for valid extrapolation.

Related terms

Generalization: The process of applying findings from a study sample to a larger population, which relies on the assumption that the sample accurately represents the population.

Overfitting: A modeling error that occurs when a machine learning model learns the details and noise in the training data to the extent that it negatively impacts its performance on new data.

Transferability: The extent to which findings from one context can be applied to another, often assessed in qualitative research settings.

" Extrapolation " also found in:

Subjects ( 34 ).

  • AP Statistics
  • Advanced quantitative methods
  • Algebra and Trigonometry
  • Approximation Theory
  • Blockchain and Cryptocurrency
  • Business Analytics
  • Business Valuation
  • College Algebra
  • College Introductory Statistics
  • Computational Mathematics
  • Contemporary Mathematics for Non-Math Majors
  • Forecasting
  • Honors Pre-Calculus
  • Honors Statistics
  • Intermediate Financial Accounting 2
  • Intro to Business Statistics
  • Introduction to Demographic Methods
  • Introduction to Econometrics
  • Introduction to Film Theory
  • Mathematical Biology
  • Mathematical Fluid Dynamics
  • Numerical Analysis I
  • Numerical Analysis for Data Science and Statistics
  • Numerical Solution of Differential Equations
  • Population and Society
  • Preparatory Statistics
  • Principles of Finance
  • Programming for Mathematical Applications
  • Screenwriting II
  • Thermodynamics I
  • Variational Analysis

© 2024 Fiveable Inc. All rights reserved.

Ap® and sat® are trademarks registered by the college board, which is not affiliated with, and does not endorse this website..

Table of Contents

What is extrapolation in data science, interpolation vs. extrapolation, extrapolation methods, what are extrapolation statistics, how to extrapolate numbers, extrapolation examples, what is extrapolation everything you need to know.

What Is Extrapolation? Everything You Need to Know

The importance of statistics is often overlooked, but it's hard to argue that they don't play a vital role in our lives. 

They help us make decisions and understand what's going on around us. We use them to calculate the risk of an operation or treatment, determine whether we need an umbrella today, and even decide what kind of ice cream flavor to get at the grocery store. 

Statistics are everywhere, and they're essential because they allow us to make informed decisions about our lives.

Extrapolation is the process of inferring values outside the range of the existing data to make predictions. Extrapolation is one of the essential methods that data scientists use to predict future trends and outcomes.

When looking at a dataset, you can use extrapolation to predict what might happen in the future. For example, suppose you have historical data about how people vote for different political parties at election time. In that case, you could use that information to predict what will happen in upcoming elections.


Interpolation is the process of estimating a value between known values. Extrapolation is the process of estimating a value beyond known values.

For example, if you wanted to estimate how much money you'll make when you retire, you might use interpolation to get an estimate. Look at how much money you make now and add it up until retirement. 

On the other hand, if you wanted to predict how many people will be using your product in 2020, it might be more helpful to extrapolate from what we know now and project how that will change over time.

Interpolation can help predict things that are likely to happen (such as future events) but not necessarily ones that are guaranteed to happen (like winning the lottery). 

Extrapolation can be used to make predictions about any kind of event—even if it's unlikely or impossible—as long as enough data is available for us to make those predictions confidently.

Linear Extrapolation

Linear extrapolation estimates the value of a variable by extending a straight-line relationship fitted to the known data. It gives good results when the predicted value is close to the available data but becomes less accurate the farther the prediction lies from the available data.

This is because linear extrapolation assumes that the relationship between the two variables does not change as you move farther away from the observed values.

Linear extrapolation can be done using a linear equation or function, which allows you to draw a tangent line at the endpoints of your graph and extend it beyond the limits of your data set.

Polynomial Extrapolation

The method of Lagrange interpolation can be used to fit a polynomial curve through known values or near the endpoints of a function; Newton's finite-difference method can also be used to fit the data. The resulting polynomial can then be used to extrapolate the data.
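As a rough illustration (the data points below are invented, and Lagrange fitting becomes numerically unstable beyond a handful of points), a Lagrange polynomial can be fitted with SciPy and then evaluated just outside the observed range:

```python
# Minimal sketch: fit a Lagrange polynomial to a few made-up points and
# evaluate it beyond the observed range.
import numpy as np
from scipy.interpolate import lagrange

x = np.array([0, 1, 2, 3])
y = np.array([1.0, 2.7, 7.4, 20.1])

poly = lagrange(x, y)        # returns a numpy.poly1d object
print("Extrapolated value at x=4:", poly(4))
```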

Conic Extrapolation

Conic extrapolation fits a conic section through five points near the end of the data set. If the fitted conic is a circle or an ellipse, the extrapolated curve will eventually loop back on itself. If it is a parabola or a hyperbola, it will not, since those curves are open relative to the x-axis.

French Curve

French curve extrapolation is a method that uses an existing set of data to predict the variable's value at a point not included in the original data.

It is useful when there is a need to extrapolate from a small number of data points because it does not require any assumptions about the relationship between the variables.

Geometric Extrapolation With Error Prediction

Geometric extrapolation is a method of estimating the value of a variable at a time in the future based on how the variable's values have changed over time. It is typically used when the estimated variable has a known relationship to another variable and is often applied to stock prices.
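One common way to implement this idea, shown as a sketch only (the price series and the simple averaging choice below are assumptions for illustration, not a formula prescribed by the article), is to estimate the average period-over-period growth factor and project it forward:

```python
# Minimal sketch of ratio-based (geometric) extrapolation: estimate the
# average period-over-period growth factor and project it one step forward.
# The price series is made up for illustration.
prices = [100.0, 104.0, 109.2, 113.6]

ratios = [b / a for a, b in zip(prices, prices[1:])]
avg_growth = sum(ratios) / len(ratios)      # mean growth factor

next_price = prices[-1] * avg_growth
print(f"Average growth factor: {avg_growth:.4f}")
print(f"Extrapolated next value: {next_price:.2f}")
```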

What Are Extrapolation Statistics?

Extrapolation statistics are used to predict future behavior based on past data. They can be used to forecast the number of customers you might expect at a given time or place, or how much money you will make in a given period. They are used in many fields, such as marketing, finance, and sports.

Extrapolation statistics use mathematical formulas that calculate the probability that a particular event will occur based on other events that have happened before. These events are called "input variables." The mathematical formula is then used to predict what will happen next or what will happen after the input variable has changed slightly.

These statistics can be handy when making important decisions about things like marketing campaigns, sales goals, or budgeting for equipment purchases.

Extrapolation Formula

In the case of linear extrapolation, given two endpoints (x1, y1) and (x2, y2) on the linear graph and a value of x beyond them, the following formula can be used:

Extrapolation formula for a linear graph:

y = y1 + ((x - x1) / (x2 - x1)) · (y2 - y1)

Extrapolation is taking a known quantity and projecting it into the future. It can be done when analyzing historical data or making predictions based on current events.

For example, if you wanted to know how much money will be spent on Christmas presents this year, you could use past data and extrapolate that into the future. You could also use current data, such as how many people have been purchasing gifts online, and extrapolate that into the future (for example, predicting that more people will shop online next year).

Extrapolation has two primary uses: forecasting and trend analysis . Forecasting involves predicting future outcomes based on past information and trends. Trend analysis consists of identifying data trends over time and using these trends to predict future results.

Enroll in the Professional Certificate Program in Data Analytics to learn more than a dozen data analytics tools and skills, and gain access to masterclasses by Purdue faculty and IBM experts, exclusive hackathons, and Ask Me Anything sessions hosted by IBM.

If you're looking to boost your career, this program is for you.

The Data Analytics Certification Program is designed to teach you how to use data analytics and predictive modeling to solve real-world problems in your organization. You'll learn how to tackle challenges with a team of experts focused on your success—and you'll get support from industry leaders like IBM.

The program features master classes, project-based learning, and hands-on experience that will prepare you for a career in data analytics.

1. What Does Extrapolation Mean?

Extrapolation is the process of making predictions based on current or past data.

It's a way of using existing information to make an educated guess about what might happen in the future.

2. What is Extrapolation With an Example?

Extrapolation is a technique that uses reasoning to predict future events by extrapolating from past occurrences. For example, if you've been keeping track of the number of cups of coffee you drink per week, and it's been steadily increasing over time, you can use extrapolation to predict that you'll drink even more next week.

3. What is Extrapolation in Statistics?

Extrapolation is a statistical technique that predicts future trends based on existing data. Based on past data, it can predict future sales, profits, or other financial performance.

4. What is an Extrapolation on a Graph?

An extrapolation on a graph is an extension of the trend line beyond the limits of the collected data. For example, if you're looking at a graph of a stock's price over time, extrapolating means extending the trend beyond the last recorded point to estimate what the price would be at values that don't appear anywhere on the graph.

5. What Is Another Word for Extrapolation?

Extrapolation is another word for prediction.

It describes the process of guessing what might happen in the future based on past events and other factors.

6. Why Do We Use Extrapolation?

An extrapolation is a trend-based approach to predicting what will happen in the future based on what has happened in the past.


What Is Generalizability in Research?

Moradeke Owa


Generalizability is making sure the conclusions and recommendations from your research apply to more than just the population you studied. Think of it as a way to figure out if your research findings apply to a larger group, not just the small population you studied.

In this guide, we explore research generalizability, factors that influence it, how to assess it, and the challenges that come with it.

So, let’s dive into the world of generalizability in research!

Defining Generalizability


Generalizability refers to the extent to which a study’s findings can be extrapolated to a larger population. It’s about making sure that your findings apply to a large number of people, rather than just a small group.

Generalizability ensures research findings are credible and reliable. If your results are only true for a small group, they might not be valid.

Also, generalizability ensures your work is relevant to as many people as possible. For example, if you tested a drug on only a small number of patients, prescribing it to all patients before you are confident it is safe for everyone could put them at risk.

Factors Influencing Generalizability

Here are some of the factors that determine if your research can be adapted to a large population or different objects:

1. Sample Selection and Size

The size of the group you study and how you choose those people affect how well your results can be applied to others. Think of it this way: asking one person out of a friendship group of 16 whether a game is fun doesn't accurately represent the opinion of the whole group.

2. Research Methods and Design 

Different methods have different levels of generalizability. For example, if you only observe people in a particular city, your findings may not apply to other locations. But if you use multiple methods, you get a better idea of the big picture.

3. Population Characteristics

Not everyone is the same. People from different countries, different age groups, or different cultures may respond differently.  That’s why the characteristics of the people you’re looking at have a significant impact on the generalizability of the results.

4. Context and Environment 

Think of your research as a weather forecast. A forecast of sunny weather in one location may not be accurate in another. Context and environment play a role in how well your results translate to other environments or contexts.

Internal vs. External Validity


You can only generalize a study when it has high validity, but there are two types of validity: internal and external. Let's see the role each plays in generalizability:

1. Understanding Internal Validity

Internal validity is a measure of how well a study has ruled out alternative explanations for its findings. For example, if a study investigates the effects of a new drug on blood pressure, internal validity would be high if the study was designed to rule out other factors that could affect blood pressure, such as exercise, diet, and other medications.

2. Understanding External Validity

External validity is the extent to which a study’s findings can be generalized to other populations, settings, and times. It focuses on how well your study’s results apply to the real world.

For example, if a new blood pressure-lowering drug were to be studied in a laboratory with a sample of young healthy adults, the study’s external validity would be limited. This is because the study doesn’t consider people outside the population such as older adults, patients with other medical conditions, and more.

3. The Relationship Between Internal and External Validity

Internal validity and external validity are often inversely related. This means that studies with high internal validity may have lower external validity, and vice versa.

For example, a study that randomly assigns participants to different treatment groups may have high internal validity, but it may have lower external validity if the participants are not representative of the population of interest.

Strategies for Enhancing Generalizability

Several strategies enable you to enhance the generalizability of your findings; here are some of them:

1. Random Sampling Techniques

This involves selecting participants from a population in a way that gives everyone an equal chance of being selected. This helps to ensure that the sample is representative of the population.

Let’s say you want to find out how people feel about a new policy. Randomly pick people from the list of people who registered to vote to ensure your sample is representative of the population.

2. Diverse Sample Selection

Choose samples that are representative of different age groups, genders, races, ethnicities, and economic backgrounds. This helps to ensure that the findings are generalizable to a wider range of people.

3. Careful Research Design

Meticulously design your studies to minimize the risk of bias and confounding variables. A confounding variable is a factor that makes it hard to tell the real cause of your results.

For example, suppose you are studying the effect of a new drug on cholesterol levels. Even if you take a random sample of participants and randomly assign them to receive either the new drug or a placebo, your results could be misleading if you don't control for the participants' diet: you could be attributing cholesterol changes to the drug when they are actually due to diet.

4. Robust Data Collection Methods

Use robust data collection methods to minimize the risk of errors and biases. This includes using well-validated measures and carefully training data collectors.

For instance, an online survey tool could be used to conduct online polls on how voters change their minds during an election cycle, rather than relying on phone interviews, which would make it harder to get repeat voters to participate in the study and revisit their views over time.

Challenges to Generalizability

1. Sample Bias

Sample bias happens when the group you study doesn't represent everyone you want to draw conclusions about. For example, if you're researching ice cream preferences and only ask your friends, your results might not apply to everyone, because your friends aren't representative of everyone who eats ice cream.

2. Ethical Considerations

Ethical considerations can limit your research's generalizability, because some studies simply wouldn't be right or fair to run. For example, it's not ethical to test a new medicine on people without their permission.

3. Resource Constraints

Having a limited budget for a project also restricts your research’s generalizability. For example, if you want to conduct a large-scale study but don’t have the resources, time, or personnel, you opt for a small-scale study, which could make your findings less likely to apply to a larger population.

4. Limitations of Research Methods

Tools are just as much a part of your research as the research itself. If you use an ineffective tool, you might not be able to apply what you've learned to other situations.

Assessing Generalizability

Evaluating generalizability allows you to understand the implications of your findings and make realistic recommendations. Here are some of the most effective ways to assess generalizability:

Statistical Measures and Techniques

Several statistical tools and methods allow you to assess the generalizability of your study. Here are the top two:

  • Confidence Interval

A confidence interval is a range of values that is likely to contain the true population value. So if a researcher looks at a test and sees that the mean score is 78 with a 95% confidence interval of 70-80, they’re 95% sure that the actual population score is between 70-80.

  • P-value

The p-value indicates the likelihood that the results of the study, or more extreme results, will be obtained if the null hypothesis holds. A null hypothesis is the supposition that there is no association between the variables being analyzed.

A good example is a researcher surveying 1,000 college students to study the relationship between study habits and GPA. The researcher finds that students who study for more hours per week have higher GPAs. 

A p-value below 0.05 indicates that there is a statistically significant association between study habits and GPA, meaning the findings of the study are unlikely to be due to chance alone.
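As a rough sketch only (the test scores, study hours, and GPAs below are made-up numbers, not data from any study mentioned above), both measures can be computed with SciPy:

```python
# Minimal sketch: a 95% confidence interval for a mean score, and a
# correlation p-value for study hours vs. GPA. All data are hypothetical.
import numpy as np
from scipy import stats

scores = np.array([72, 81, 78, 69, 85, 77, 80, 74])   # hypothetical test scores
mean = scores.mean()
ci_low, ci_high = stats.t.interval(
    0.95, len(scores) - 1, loc=mean, scale=stats.sem(scores)  # df passed positionally
)
print(f"Mean {mean:.1f}, 95% CI [{ci_low:.1f}, {ci_high:.1f}]")

hours = np.array([5, 8, 10, 12, 15, 18, 20, 25])              # hypothetical study hours
gpa = np.array([2.4, 2.6, 2.9, 3.0, 3.2, 3.4, 3.5, 3.8])      # hypothetical GPAs
r, p_value = stats.pearsonr(hours, gpa)
print(f"Correlation r={r:.2f}, p-value={p_value:.4f}")
```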

Peer Review and Expert Evaluation

Reviewers and experts can look at sample selection, study design, data collection, and analysis methods to spot areas for improvement. They can also look at the survey’s results to see if they’re reliable and if they match up with other studies.

Transparency in Reporting

Clearly and concisely report the survey design, sample selection, data collection methods, data analysis methods, and findings of the survey. This allows other researchers to assess the quality of the survey and to determine whether the results are generalizable.

The Balance Between Generalizability and Specificity


Generalizability refers to the degree to which the findings of a study can be applied to a larger population or context. Specificity, on the other hand, refers to the focus of a study on a particular population or context.

a. When Generalizability Matters Most

Generalizability comes into play when you want to make predictions about the world outside of your sample. For example, you want to look at the impact of a new viewing restrictions policy on the population as a whole.

b. Situations Where Specificity is Preferred

Specificity is important when researchers want to gain a deep understanding of a specific group or phenomenon in detail. For example, if a researcher wants to study the experiences of people with a rare disease.

Finding the Right Balance Between Generalizability and Specificity

The right balance between generalizability and specificity depends on the research question. 

Case 1: Specificity over Generalizability

Sometimes, you have to give up some generalizability to get more specific results. For example, if you are studying a rare genetic condition, you might not be able to get a sample that's representative of the general population.

Case 2: Generalizability over Specificity

In other cases, you may need to sacrifice some specificity to achieve greater generalizability. For example, when studying the effects of a new drug, you need a sample that includes a wide range of people with different characteristics.

Keep in mind that generalizability and specificity are not mutually exclusive. You can design studies that are both generalizable and specific.

Real-World Examples

Here are a few real-world examples of studies that turned out to be generalizable, as well as some that are not:

1. Case Studies of Research with High Generalizability

We’ve been talking about how important a generalizable study is and how to tell if your research is generalizable. Let’s take a look at some studies that have achieved this:

a. The Framingham Heart Study  

This is a long-running study that has been tracking the health of over 15,000 participants since 1948. The study has provided valuable insights into the risk factors for heart disease, stroke, and other chronic diseases.

The findings of the Framingham Heart Study are highly generalizable because the study participants were recruited from a representative sample of the general population.

b. The Cochrane Database of Systematic Reviews  

This is a collection of systematic reviews that evaluate the evidence for the effectiveness of different healthcare interventions. The Cochrane Database of Systematic Reviews is a highly respected source of information for healthcare professionals and policymakers. 

The findings of Cochrane reviews are highly generalizable because they are based on a comprehensive review of all available evidence.

2. Case Studies of Research with Limited Generalizability

Let’s look at some studies that would fail to prove their validity to the general population:

  • A study that examines the effects of a new drug on a small sample of participants with a rare medical condition. The findings of this study would not be generalizable to the general population because the study participants were not representative of the general population.
  • A study that investigates the relationship between culture and values using a sample of participants from a single country. The findings of this study would not be generalizable to other countries because the study participants were not representative of people from other cultures.

Implications of Generalizability in Different Fields


Research generalizability has significant effects in the real world, here are some ways to leverage it across different fields:

1. Medicine and Healthcare

Generalizability is a key concept of medicine and healthcare. For example, a single study that found a new drug to be effective in treating a specific condition in a limited number of patients might not apply to all patients.

Healthcare professionals also leverage generalizability to create guidelines for clinical practice. For example, a guideline for the treatment of diabetes may not be generalizable to all patients with diabetes if it is based on research studies that only included patients with a particular type of diabetes or a particular level of severity.

2. Social Sciences

Generalizability allows you to make accurate inferences about the behavior and attitudes of large populations. People are influenced by multiple factors, including their culture, personality, and social environment.

For example, a study that finds that a particular educational intervention is effective in improving student achievement in one school may not be generalizable to all schools.

3. Business and Economics

Generalizability also allows companies to draw conclusions about how customers and competitors behave. Factors like economic conditions, consumer tastes, and tech trends can change quickly, so it's hard to generalize results from one study to the next.

For example, a study that finds that a new marketing campaign is effective in increasing sales of a product in one region may not be generalizable to other regions. 

The Future of Generalizability in Research


Let’s take a look at new and future developments geared at improving the generalizability of research:

1. Evolving Research Methods and Technologies

The evolution of research methods and technologies is changing the way that we think about generalizability. In the past, researchers were often limited to studying small samples of people in specific settings. This made it difficult to generalize the findings to the larger population.

Today, you can use various new techniques and technologies to gather data from a larger and more varied sample size. For example, online surveys provide you with a large sample size in a very short period.

2. The Growing Emphasis on Reproducibility

The growing emphasis on reproducibility is also changing the way that we think about generalizability. Reproducibility is the ability to reproduce the results of a study by following the same methods and using a similar sample.

For example, suppose you publish a study claiming that a new drug is effective in treating a certain disease, and two other researchers replicate the study and confirm the findings. This replication helps to build confidence in the findings of the original study and makes it more likely that the drug will be approved for use.

3. The Ongoing Debate on Generalizability vs. Precision

Generalizability refers to the ability to apply the findings of a study to a wider population. Precision refers to the ability to accurately measure a particular phenomenon.

For some researchers, generalizability matters more than precision because it means their findings apply to a larger number of people and have an impact on the real world. For others, precision matters more than generalizability because it enables them to understand the underlying mechanisms of a phenomenon.

The debate over generalizability versus precision is likely to continue because both concepts are very important. However, it is important to note that the two concepts are not mutually exclusive. It is possible to achieve both generalizability and precision in research by using carefully designed methods and technologies.

Generalizability allows you to apply the findings of a study to a larger population. This is important for making informed decisions about policy and practice, identifying and addressing important social problems, and advancing scientific knowledge.

With more advanced tools such as online surveys, generalizability research is here to stay. Sign up with Formplus to seamlessly collect data from a global audience.



Extrapolate findings

An evaluation usually involves some level of generalising of the findings to other times, places or groups of people. 

For many evaluations, this simply involves generalising from data about the current situation or the recent past to the future.

For example, an evaluation might report that a practice or program has been working well (finding), therefore it is likely to work well in the future (generalisation), and therefore we should continue to do it (recommendation). In this case, it is important to understand whether or not future times are likely to be similar to the time period of the evaluation.  If the program had been successful because of support from another organisation, and this support was not going to continue, then it would not be correct to assume that the program would continue to succeed in the future.

For some evaluations, there are other types of generalising needed.  Impact evaluations which aim to learn from the evaluation of a pilot to make recommendations about scaling up must be clear about the situations and people to whom results can be generalised. 

There are often two levels of generalisation.  For example, an evaluation of a new nutrition program in Ghana collected data from a random sample of villages. This allowed statistical generalisation to the larger population of villages in Ghana.  In addition, because there was international interest in the nutrition program, many organisations, including governments in other countries, were interested to learn from the evaluation for possible implementation elsewhere.

Analytical generalisation involves making projections about the likely transferability of findings from an evaluation, based on a theoretical analysis of the factors producing outcomes and the effect of context.

Statistical generalisation involves statistically calculating the likely parameters of a population using data from a random sample of that population.

Horizontal evaluation is an approach that combines self-assessment by local participants and external review by peers.

Positive deviance (PD), a behavioural and social change approach, involves learning from those who find unique and successful solutions to problems despite facing the same challenges, constraints and resource deprivation as others.

Realist evaluation aims to identify the underlying generative causal mechanisms that explain how outcomes were caused and how context influences these.

This blog post and its associated replies, written by Jed Friedman for the World Bank, describe a process of using analytic methods to overcome some of the assumptions that must be made when extrapolating results from evaluations to other settings.



5 Methods of Data Collection for Quantitative Research


In this blog, read up on five different data collection techniques for quantitative research studies. 

Quantitative research forms the basis for many business decisions. But what is quantitative data collection, why is it important, and which data collection methods are used in quantitative research? 

Table of Contents: 

  • What is quantitative data collection?
  • The importance of quantitative data collection
  • Methods used for quantitative data collection
  • Example of a survey showing quantitative data
  • Strengths and weaknesses of quantitative data

What is quantitative data collection? 

Quantitative data collection is the gathering of numeric data that puts consumer insights into a quantifiable context. It typically involves a large number of respondents - large enough to extract statistically reliable findings that can be extrapolated to a larger population.

The actual data collection process for quantitative findings is typically done using a quantitative online questionnaire that asks respondents yes/no questions, ranking scales, rating matrices, and other quantitative question types. With these results, researchers can generate data charts to summarize the quantitative findings and generate easily digestible key takeaways. 


The importance of quantitative data collection

Quantitative data collection can confirm or deny a brand's hypothesis, guide product development, tailor marketing materials, and much more. It provides brands with reliable information to make decisions off of (i.e. 86% like lemon-lime flavor or just 12% are interested in a cinnamon-scented hand soap). 

Compared to qualitative data collection, quantitative data allows for comparison between insights given higher base sizes which leads to the ability to have statistical significance. Brands can cut and analyze their dataset in a variety of ways, looking at their findings among different demographic groups, behavioral groups, and other ways of interest. It's also generally easier and quicker to collect quantitative data than it is to gather qualitative feedback, making it an important data collection tool for brands that need quick, reliable, concrete insights. 

In order to make justified business decisions from quantitative data, brands need to recruit a high-quality sample that's reflective of their true target market (one that's comprised of all ages/genders rather than an isolated group). For example, a study into usage and attitudes around orange juice might include consumers who buy and/or drink orange juice at a certain frequency or who buy a variety of orange juice brands from different outlets. 

Methods used for quantitative data collection 

So knowing what quantitative data collection is and why it's important, how does one go about researching a large, high-quality, representative sample?

Below are five examples of how to conduct your study through various data collection methods:

Online quantitative surveys 

Online surveys are a common and effective way of collecting data from a large number of people. They tend to be made up of closed-ended questions so that responses across the sample are comparable; however, a small number of open-ended questions can be included as well (i.e. questions that require a written response rather than a selection of answers from a closed-ended list). Open-ended questions are helpful for gathering the actual language used by respondents on a certain issue, or for collecting feedback on a view that might not be captured in a set list of responses.

Online surveys are quick and easy to send out, typically done so through survey panels. They can also appear in pop-ups on websites or via a link embedded in social media. From the participant’s point of view, online surveys are convenient to complete and submit, using whichever device they prefer (mobile phone, tablet, or computer). Anonymity is also viewed as a positive: online survey software ensures respondents’ identities are kept completely confidential.

To gather respondents for online surveys, researchers have several options. Probability sampling is one route, where respondents are selected using a random selection method. As such, everyone within the population has an equal chance of getting selected to participate. 

There are four common types of probability sampling; a short code sketch illustrating each follows the list below.

  • Simple random sampling is the most straightforward approach, which involves randomly selecting individuals from the population without any specific criteria or grouping. 
  • Stratified random sampling  divides the population into subgroups (strata) and selects a random sample from each stratum. This is useful when a population includes subgroups that you want to be sure you cover in your research. 
  • Cluster sampling   divides the population into clusters and then randomly selects some of the clusters to sample in their entirety. This is useful when a population is geographically dispersed and it would be impossible to include everyone.
  • Systematic sampling  begins with a random starting point and then selects every nth member of the population after that point (i.e. every 15th respondent). 
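The sketch below is a minimal illustration of the four approaches on a made-up population of 100 member IDs; the strata, cluster sizes, and sampling interval are arbitrary choices for the example, not recommendations from the article.

```python
# Minimal sketch of the four probability sampling approaches on a
# hypothetical population of 100 member IDs.
import random

population = list(range(1, 101))
random.seed(42)

# Simple random sampling: pick 10 members at random.
simple = random.sample(population, 10)

# Stratified random sampling: split into two strata and sample 5 from each.
strata = {"low": population[:50], "high": population[50:]}
stratified = [m for group in strata.values() for m in random.sample(group, 5)]

# Cluster sampling: divide into 10 clusters of 10, then take 2 whole clusters.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [m for c in random.sample(clusters, 2) for m in c]

# Systematic sampling: random start, then every 10th member after that.
start = random.randrange(10)
systematic = population[start::10]

print(len(simple), len(stratified), len(cluster_sample), len(systematic))
```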


While online surveys are by far the most common way to collect quantitative data in today’s modern age, there are still some harder-to-reach respondents where other mediums can be beneficial; for example, those who aren’t tech-savvy or who don’t have a stable internet connection. For these audiences, offline surveys   may be needed.

Offline quantitative surveys

Offline surveys (though much rarer to come across these days) are a way of gathering respondent feedback without digital means. This could be something like postal questionnaires that are sent out to a sample population and asked to return the questionnaire by mail (like the Census) or telephone surveys where questions are asked of respondents over the phone. 

Offline surveys certainly take longer to collect data than online surveys and they can become expensive if the population is difficult to reach (requiring a higher incentive). As with online surveys, anonymity is protected, assuming the mail is not intercepted or lost.

Despite the major difference in data collection to an online survey approach, offline survey data is still reported on in an aggregated, numeric fashion. 

Interviews

In-person interviews are another popular way of researching or polling a population. They can be thought of as a survey, but in a verbal, in-person, or virtual face-to-face format. The online format of interviews is becoming more popular nowadays, as it is cheaper and logistically easier to organize than in-person face-to-face interviews, yet still allows the interviewer to see and hear from the respondent in their own words.

Though many interviews are collected for qualitative research, interviews can also be leveraged quantitatively; like a phone survey, an interviewer runs through a survey with the respondent, asking mainly closed-ended questions (yes/no, multiple choice questions, or questions with rating scales that ask how strongly the respondent agrees with statements).

The advantage of structured interviews is that the interviewer can pace the survey, making sure the respondent gives enough consideration to each question. It also adds a human touch, which can be more engaging for some respondents. On the other hand, for more sensitive issues, respondents may feel more inclined to complete a survey online for a greater sense of anonymity - so it all depends on your research questions, the survey topic, and the audience you're researching.

Observations

Observation studies in quantitative research are similar in nature to a qualitative ethnographic study (in which a researcher also observes consumers in their natural habitats), yet observation studies for quant research remain focused on the numbers - how many people do an action, how much of a product consumers pick up, etc.

For quantitative observations, researchers will record the number and types of people who do a certain action - such as choosing a specific product from a grocery shelf, speaking to a company representative at an event, or how many people pass through a certain area within a given timeframe. Observation studies are generally structured, with the observer asked to note behavior using set parameters. Structured observation means that the observer has to hone in on very specific behaviors, which can be quite nuanced. This requires the observer to use his/her own judgment about what type of behavior is being exhibited (e.g. reading labels on products before selecting them; considering different items before making the final choice; making a selection based on price).

Document reviews and secondary data sources

A fifth method of data collection for quantitative research is known as secondary research : reviewing existing research to see how it can contribute to understanding a new issue in question. This is in contrast to the primary research methods above, which is research that is specially commissioned and carried out for a research project. 

There are numerous secondary data sources that researchers can analyze such as  public records, government research, company databases, existing reports, paid-for research publications, magazines, journals, case studies, websites, books, and more.

Aside from using secondary research alone, secondary research documents can also be used in anticipation of primary research, to understand which knowledge gaps need to be filled and to nail down the issues that might be important to explore further in a primary research study.

Example of a survey showing quantitative data 

The below study shows what quantitative data might look like in a final study dashboard, taken from quantilope's Sneaker category insights study . 

The study includes a variety of usage and attitude metrics around sneaker wear, sneaker purchases, seasonality of sneakers, and more. Check out some of the data charts below showing these quantitative data findings - the first of which even cuts the quantitative data findings by demographics. 

[Chart: sneaker study data, cut by demographics]

Beyond these basic usage and attitude (or, descriptive) data metrics, quantitative data also includes advanced methods - such as implicit association testing. See what these quantitative data charts look like from the same sneaker study below:

[Chart: sneaker study implicit association results]

These are just a few examples of how a researcher or insights team might show their quantitative data findings. However, there are many ways to visualize quantitative data in an insights study, from bar charts, column charts, pie charts, donut charts, spider charts, and more, depending on what best suits the story your data is telling.

Strengths and weaknesses of quantitative data collection

Quantitative data is a great way to capture informative insights about your brand, product, category, or competitors. It's relatively quick, depending on your sample audience, and more affordable than other data collection methods such as qualitative focus groups. With quantitative panels, it's easy to access nearly any audience you might need - from something as general as the US population to something as specific as cannabis users. There are many ways to visualize quantitative findings, making it a customizable form of insights - whether you want to show the data in a bar chart, pie chart, etc.

For those looking for quick, affordable, actionable insights, quantitative studies are the way to go.  

Quantitative data collection, despite the many benefits outlined above, might also not be the right fit for your exact needs. For example, you often don't get as detailed and in-depth answers quantitatively as you would with an in-person interview, focus group, or ethnographic observation (all forms of qualitative research). When running a quantitative survey, it's best practice to review your data for quality measures to ensure all respondents are ones you want to keep in your data set. Fortunately, there are a lot of precautions research providers can take to navigate these obstacles - such as automated data cleaners and data flags. Of course, the first step to ensuring high-quality results is to use a trusted panel provider.

Quantitative research typically needs to undergo statistical analysis for it to be useful and actionable to any business. It is therefore crucial that the method of data collection, sample size, and sample criteria are considered in light of the research questions asked.

quantilope’s online platform is ideal for quantitative research studies. The online format means a large sample can be reached easily and quickly through connected respondent panels that effectively reach the desired target audience. Response rates are high, as respondents can take their survey from anywhere, using any device with internet access.

Surveys are easy to build with quantilope’s online survey builder. Simply choose questions to include from pre-designed survey templates or build your own questions using the platform’s drag & drop functionality (of which both options are fully customizable). Once the survey is live, findings update in real-time so that brands can get an idea of consumer attitudes long before the survey is complete. In addition to basic usage and attitude questions, quantilope’s suite of advanced research methodologies provides an AI-driven approach to many types of research questions. These range from exploring the features of products that drive purchase through a Key Driver Analysis , compiling the ideal portfolio of products using a TURF , or identifying the optimal price point for a product or service using a Price Sensitivity Meter (PSM) .

Depending on the type of data sought it might be worth considering a mixed-method approach, including both qual and quant in a single research study. Alongside quantitative online surveys, quantilope’s video research solution - inColor - offers qualitative research in the form of videoed responses to survey questions. inColor’s qualitative data analysis includes an AI-driven read on respondent sentiment, keyword trends, and facial expressions.

To find out more about how quantilope can help with any aspect of your research design and to start conducting high-quality, quantitative research, get in touch below:

Get in touch to learn more about quantitative research studies!


Extrapolating baseline trend in single-case data: Problems and tentative solutions

Published: 27 November 2018 · Volume 51, pages 2847–2869 (2019)


Rumen Manolov (ORCID: 0000-0002-9387-1926), Antonio Solanas & Vicenta Sierra


Single-case data often contain trends. Accordingly, to account for baseline trend, several data-analytical techniques extrapolate it into the subsequent intervention phase. Such extrapolation led to forecasts that were smaller than the minimal possible value in 40% of the studies published in 2015 that we reviewed. To avoid impossible predicted values, we propose extrapolating a damping trend, when necessary. Furthermore, we propose a criterion for determining whether extrapolation is warranted and, if so, how far out it is justified to extrapolate a baseline trend. This criterion is based on the baseline phase length and the goodness of fit of the trend line to the data. These proposals were implemented in a modified version of an analytical technique called Mean phase difference. We used both real and generated data to illustrate how unjustified extrapolations may lead to inappropriate quantifications of effect, whereas our proposals help avoid these issues. The new techniques are implemented in a user-friendly website via the Shiny application, offering both graphical and numerical information. Finally, we point to an alternative not requiring either trend line fitting or extrapolation.


Several features of single-case experimental design (SCED) data have been mentioned as potential reasons for the difficulty of analyzing such data quantitatively, for the lack of consensus regarding the most appropriate statistical analyses, and for the continued use of visual analysis (Campbell & Herzinger, 2010 ; Kratochwill, Levin, Horner, & Swoboda, 2014 ; Parker, Cryer, & Byrns, 2006 ; Smith, 2012 ). Some of the data features that have received the most attention are serial dependence (Matyas & Greenwood, 1997 ; Shadish, Rindskopf, Hedges, & Sullivan, 2013 ), the common use of counts or other outcome measures that are not continuous or normally distributed (Pustejovsky, 2015 ; Sullivan, Shadish, & Steiner, 2015 ), the shortness of the data series (Arnau & Bono, 1998 ; Huitema, McKean, & McKnight, 1999 ), and the presence of trends (Mercer & Sterling, 2012 ; Parker et al., 2006 ; Solomon, 2014 ). In the present article we focus on trends. The reason for this focus is that trend is a data feature whose presence, if not taken into account, can invalidate conclusions regarding an intervention’s effectiveness (Parker et al., 2006 ). Even when there is an intention to take the trend into account, several challenges arise. First, linear trend has been defined in several ways in the context of SCED data (Manolov, 2018 ). Second, there has been recent emphasis on the need to consider nonlinear trends (Shadish, Rindskopf, & Boyajian, 2016 ; Swan & Pustejovsky, 2018 ; Verboon & Peters, 2018 ). Third, some techniques for controlling trend may provide insufficient control (see Tarlow, 2017 , regarding Tau-U by Parker, Vannest, Davis, & Sauber, 2011 ), leading applied researchers to think that their results represent an intervention effect beyond baseline trend, which may not be justified. Fourth, other techniques may extrapolate baseline trend regardless of the degree to which the trend line is a good representation of the baseline data, and despite the possibility of impossible values being predicted (see Parker et al.’s, 2011 , comments on the regression model by Allison & Gorman, 1993 ). The latter two challenges compromise the interpretation of results.

Aim, focus, and organization of the article

The aim of the present article is to provide further discussion on four issues related to baseline trend extrapolation, based on the comments by Parker et al. ( 2011 ). As part of this discussion, we propose tentative solutions to the issues identified. Moreover, we specifically aim to improve one analytical procedure, which extrapolates baseline trend and compares this extrapolation to the actual intervention-phase data: the mean phase difference (MPD; Manolov & Solanas, 2013 ; see also the modification and extension in Manolov & Rochat, 2015 ).

Most single-case data-analytical techniques focus on linear trend, although there are certain exceptions. One exception is a regression-based analysis (Swaminathan, Rogers, Horner, Sugai, & Smolkowski, 2014 ), for which the possibility of modeling quadratic trend has been discussed explicitly. Another is Tau-U, developed by Parker et al. ( 2011 ), which deals more broadly with monotonic (not necessarily linear) trends. We stick here to linear trends and their extrapolation, a decision that reflects Chatfield’s ( 2000 ) statement that relatively simple forecasting methods are preferred, because they are potentially more easily understood. Moreover, this focus is well aligned with our willingness to improve the MPD, a procedure for fitting a linear trend line to baseline data. Despite this focus, three of the four issues identified by Parker et al. ( 2011 ), and the corresponding solutions we propose, are also applicable to nonlinear trends.

Organization

In the following sections, first we mention procedures that include extrapolating the trend line fitted in the baseline, and distinguish them from procedures that account for baseline trend but do not extrapolate it. Second, we perform a review of published research in order to explore how frequently trend extrapolation leads to out-of-bounds predicted values for the outcome variable. Third, we deal separately with the four main issues of extrapolating a baseline trend, as identified by Parker et al. ( 2011 ), and we offer tentative solutions to these issues. Fourth, on the basis of the proposals from the previous two points, we propose a modification of the MPD. In the same section, we also provide examples, based on previously published data, of the extent to which our modification helps avoid misleading results. Fifth, we include a small proof-of-concept simulation study.

Analytical techniques that entail extrapolating baseline trend

Visual analysis

When discussing how visual analysis should be carried out, Kratochwill et al. ( 2010 ) stated that “[t] he six visual analysis features are used collectively to compare the observed and projected patterns for each phase with the actual pattern observed after manipulation of the independent variable” (p. 18). Moreover, the conservative dual criteria for carrying out structured visual analysis (Fisher, Kelley, & Lomas, 2003 ) entail extrapolating split-middle trend in addition to extrapolating mean level. This procedure has received considerable attention recently as a means of improving decision accuracy (Stewart, Carr, Brandt, & McHenry, 2007 ; Wolfe & Slocum, 2015 ; Young & Daly, 2016 ).

Regression-based analyses

Among the procedures based on regression analysis, the last treatment day procedure (White, Rusch, Kazdin, & Hartmann, 1989 ) entails fitting ordinary least squares (OLS) trend lines to the baseline and intervention phases separately, and comparison between the two is performed for the last intervention phase measurement occasion. In the Allison and Gorman ( 1993 ) regression model, baseline trend is extrapolated before it is removed from both the A and B phases’ data. Apart from OLS regression, the generalized least squares proposal by Swaminathan et al. ( 2014 ) fits trend lines separately to the A and B phases, but baseline trend is still extrapolated for carrying out the comparisons. The overall effect size described by the authors entails comparing the treatment data as estimated from the treatment-phase trend line to the treatment data as estimated from the baseline-phase trend line.

Apart from the procedures based on the general linear model (assuming normal errors), generalized linear models (Fox, 2016 ) need to be mentioned as well in the present subsection. Such models can deal with count data, which are ubiquitous in single-case research (Pustejovsky, 2018a ), specifying a Poisson model (rather than a normal one) for the conditional distribution of the response variable (Shadish, Kyse, & Rindskopf, 2013 ). Other useful models are based on the binomial distribution, specifying a logistic model (Shadish et al., 2016 ), when the data are proportions that have a natural floor (0) and ceiling (100). Despite dealing with certain issues arising from single-case data, these models are not flawless. Note that a Poisson model may present limitations when the data are more variable than expected (i.e., alternative models have been proposed for overdispersed count data; Fox, 2016 ), whereas a logistic model may present the difficulty of not knowing the floor or ceiling (i.e., the upper asymptote) or of forcing artificial limits. Finally, what is most relevant to the topic of the present text is that none of these generalized linear models necessarily includes an extrapolation of baseline trend. Actually, some of them (Rindskopf & Ferron, 2014 ; Verboon & Peters, 2018 ) consider the baseline data together with the intervention-phase data in order to detect when the greatest change is produced. Other models (Shadish, Kyse, & Rindskopf, 2013 ) include an interaction term between the dummy phase variable and the time variable, making possible the estimation of change in slope.

Nonregression procedures

MPD involves estimating baseline trend and extrapolating it into the intervention phase in order to compare the predictions with the actual intervention-phase data. Another nonregression procedure, Slope and level change (SLC; Solanas, Manolov, & Onghena, 2010 ), involves estimating baseline trend and removing it from the whole series before quantifying the change in slope and the net change in level (hence, SLC). In one of the steps of the SLC, baseline trend is removed from the n A baseline measurements and the n B intervention-phase measurements by subtracting from each value ( y i ) the slope estimate ( b 1 ), multiplied by the measurement occasion ( i ). Formally, \( {\overset{\sim }{y}}_i={y}_i-i\times {b}_1;i=1,2,\dots, \left({n}_A+{n}_B\right) \) . This step does resemble extrapolating baseline trend, but there is no estimation of the intercept of the baseline trend line, and thus a trend line is not fitted to the baseline data and then extrapolated, which would lead to obtaining residuals as in Allison and Gorman’s ( 1993 ) model. Therefore, we consider that it is more accurate to conceptualize this step as removing baseline trend from the intervention-phase trend for the purpose of comparison.
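As a concrete illustration of this detrending step, the following minimal R sketch estimates the slope as the mean of the differenced baseline data and subtracts i × b1 from every measurement in both phases; the function name and the example values are hypothetical and do not reproduce the authors' code.

```r
# Minimal sketch of the SLC detrending step (illustrative, not the authors' code)
slc_detrend <- function(y_A, y_B) {
  b1 <- mean(diff(y_A))      # slope estimated from the differenced baseline data
  y  <- c(y_A, y_B)
  i  <- seq_along(y)         # measurement occasions 1, ..., n_A + n_B
  y - i * b1                 # detrended series; note that no intercept is estimated
}

# Hypothetical example: improving baseline followed by a reversal after the intervention
slc_detrend(y_A = c(8, 9, 11, 12, 14), y_B = c(13, 12, 10, 9, 8, 7))
```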

Nonoverlap indices

Among nonoverlap indices, the percentage of data points exceeding median trend (Wolery, Busick, Reichow, & Barton, 2010 ) involves fitting a split-middle (i.e., bi-split) trend line and extrapolating it into the subsequent phase. Regarding Tau-U (Parker et al., 2011 ), it only takes into account the number of baseline measurements that improve previous baseline measurements, and this number is subtracted from the number of intervention-phase values that improve the baseline-phase values. Therefore, no intercept or slope is estimated, and no trend line is fitted or extrapolated, either. The way in which trend is controlled for in Tau-U cannot be described as trend extrapolation in a strict sense.

Two other nonoverlap indices also entail baseline trend control. According to the “additional output” calculated at http://ktarlow.com/stats/tau/ , the baseline-corrected Tau (Tarlow, 2017 ) removes baseline trend from the data using the expression \( {\overset{\sim }{y}}_i={y}_i-i\times {b}_{1(TS)};i=1,2,\dots, \left({n}_A+{n}_B\right) \) , where b 1( TS ) is the Theil–Sen estimate of slope. In the percentage of nonoverlapping corrected data (Manolov & Solanas, 2009 ), baseline trend is eliminated from the n values via the same expression as for baseline-corrected Tau, \( {\overset{\sim }{y}}_i={y}_i-i\times {b}_{1(D)};i=1,2,\dots, \left({n}_A+{n}_B\right) \) , but slope is estimated via b 1( D ) (see Appendix B ) instead of via b 1( TS ) . Therefore, as we discussed above for SLC, there is actually no trend extrapolation in the baseline-corrected Tau or percentage-of-nonoverlapping-corrected data.
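The Theil–Sen variant can be sketched in the same way: the slope is the median of the slopes of all pairs of baseline points, and the same detrending expression is then applied. This is an illustration under that definition, not Tarlow's (2017) implementation.

```r
# Theil-Sen slope for the baseline: median of the slopes of all pairs of points
theil_sen_slope <- function(y) {
  idx <- combn(length(y), 2)   # all pairs of measurement occasions
  median((y[idx[2, ]] - y[idx[1, ]]) / (idx[2, ] - idx[1, ]))
}

y_A <- c(3, 5, 4, 7, 6, 8)     # hypothetical baseline measurements
y_B <- c(9, 6, 4, 3)           # hypothetical intervention-phase measurements
b1_TS <- theil_sen_slope(y_A)
detrended <- c(y_A, y_B) - seq_along(c(y_A, y_B)) * b1_TS
```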

Procedures not extrapolating trend

The analytical procedures included in the present subsection do not extrapolate baseline trend, but they do take baseline trend into account. We decided to mention these techniques for three reasons. First, we wanted to provide a broader overview of analytical techniques applicable to single-case data. Second, we wanted to make it explicit that not all analytical procedures entail baseline trend extrapolation, and therefore, such extrapolation is not an indispensable step in single-case data analysis. Stated in other words, it is possible to deal with baseline trend without extrapolating it. Third, the procedures mentioned here were those more recently developed or suggested for single-case data analysis, and so they may be less widely known. Moreover, they can be deemed more sophisticated and more strongly grounded on statistical theory than is MPD, which is the focus of the present article.

The between-case standard mean difference, also known as the d statistic (Shadish, Hedges, & Pustejovsky, 2014 ), assumes stable data, but the possibility of detrending has been mentioned (Marso & Shadish, 2015 ) if baseline trend is present. It is not clear that a regression model using time and its interaction with a dummy variable representing phase entails baseline trend extrapolation. Moreover, a different approach was suggested by Pustejovsky, Hedges, and Shadish ( 2014 ) for obtaining a d statistic—namely, in relation to multilevel analysis. In multilevel analysis, also referred to as hierarchical linear models , the trend in each phase can be modeled separately, and the slopes can be compared (Ferron, Bell, Hess, Rendina-Gobioff, & Hibbard, 2009 ). Another statistical option is to use generalized additive models (GAMs; Sullivan et al., 2015 ), in which there is greater flexibility for modeling the exact shape of the trend in each phase, without the need to specify a particular model a priori. GAMs that have been specifically suggested include the use of cubic polynomial curves fitted to different portions of the data and joined at the specific places (called knots ) that divide the data into portions. Just like when using multilevel models, trend lines are fitted separately to each phase, without the need to extrapolate baseline trend.

A review of research published in 2015

Aim of the review

It has already been stated (Parker et al., 2011 ) and illustrated (Tarlow, 2017 ) that baseline trend extrapolation can lead to impossible forecasts for the subsequent intervention-phase data. Accordingly, the research question we chose was the percentage of studies in which extrapolating the baseline trend of the data set (across several different techniques for fitting the trend line) leads to values that are below the lower bound or above the upper bound of the outcome variable.

Search strategy

We focused on the four journals that have published most SCED research, according to the review by Shadish and Sullivan ( 2011 ). These journals are Journal of Applied Behavior Analysis , Behavior Modification , Research in Autism Spectrum Disorders , and Focus on Autism and Other Developmental Disabilities . Each of these four journals published more than ten SCED studies in 2008, and the 76 studies they published represent 67% of all studies included in the Shadish and Sullivan review. Given that the bibliographic search was performed in September 2016, we focused on the year 2015 and looked for any articles using phase designs (AB designs, variations, or extensions) or alternation designs with a baseline phase and providing a graphical representation of the data, with at least three measurements in the initial baseline condition.

Techniques for finding a best fitting straight line

For the present review, we selected five techniques for finding a best-fitting straight line: OLS, split-middle, tri-split, Theil–Sen, and differencing. The motivation for this choice was that these five techniques are included in single-case data-analytical procedures (Manolov, 2018 ), and therefore, applied researchers can potentially use them. The R code used for checking whether out-of-bounds forecasts are obtained is available at https://osf.io/js3hk/ .

Upper and lower bounds

The data were retrieved using Plot Digitizer for Windows ( https://plotdigitizer.sourceforge.net ). We counted the number and percentage of studies in which values out of logical bounds were obtained after extrapolating the baseline trend, estimated either from an initial baseline phase or from a subsequent withdrawal phase (e.g., in ABAB designs) for at least one of the data sets reported graphically in the article. The “logical bounds” were defined as 0 as a minimum and 1 or 100 as a maximum, when the measurement provided was a proportion or a percentage, respectively. Additional upper bounds included the maximal scores obtainable for an exam (e.g., Cheng, Huang, & Yang, 2015 ; Knight, Wood, Spooner, Browder, & O’Brien, 2015 ), for the number of steps in a task (e.g., S. J. Gardner & Wolfe, 2015 ), for the number of trials in the session (Brandt, Dozier, Juanico, Laudont, & Mick, 2015 ; Cannella-Malone, Sabielny, & Tullis, 2015 ), or for the duration of transition between a stimulus and reaching a location (Siegel & Lien, 2015 ), or the total duration of a session, when quantifying latency (Hine, Ardoin, & Foster, 2015 ). We chose a conservative approach, and did not speculate (Footnote 1) about upper bounds for behaviors that were expressed as either a frequency (e.g., Fiske et al., 2015 ; Ledbetter-Cho et al., 2015 ) or a rate (e.g., Austin & Tiger, 2015 ; Fahmie, Iwata, & Jann, 2015 ; Rispoli et al., 2015 ; Saini, Greer, & Fisher, 2015 ). Footnote 2

Results of the review

The numbers of articles included per journal are as follows. From the Journal of Applied Behavior Analysis , 27 SCED studies were included from the 46 “research articles” published (excluding three alternating-treatment designs without a baseline), and 20 more SCED studies were included from the 30 “reports” published (excluding two alternating-treatments designs without a baseline and one changing-criterion design). From Behavior Modification , eight SCED studies were included from the 39 “articles” published (excluding two alternating-treatments design studies without a baseline, two studies with other designs without phases, one study with phases but only two measurements in the baseline phase, meta-analyses of single cases, and data analysis for single-case articles). From Research in Autism Spectrum Disorders , seven SCED studies were included from the 67 “original research articles” published (excluding one SCED study that did not have a minimum of three measurements per phase, as per Kratochwill et al., 2010 ). From Focus on Autism and Other Developmental Disabilities , six SCED studies were included from the 21 “articles” published. The references to all 68 articles reviewed are available in Appendix A at https://osf.io/js3hk/ .

The results of this review are as follows. Extrapolation led to impossibly small values for all five trend estimators in 27 studies (39.71%), in contrast to 34 studies (50.00%) in which that did not happen for any of the trend estimators. Complementarily, extrapolation led to impossibly large values for all five trend estimators in eight studies (11.76%), in contrast to 56 studies (82.35%) in which that did not happen for any of the trend estimators. In terms of when the extrapolation led to an impossible value, a summary is provided in Table 1 . Note that this table refers to the data set in each article, including the earliest out-of-bounds forecast. Thus, it can be seen that for all trend-line-fitting techniques, it was most common to have out-of-bounds forecasts already before the third intervention phase measurement occasion. This is relevant, considering that an immediate effect can be understood to refer to the first three intervention data points (Kratochwill et al., 2010 ).

These results suggest that researchers using techniques to extrapolate baseline trend should be cautious about downward trends that would apparently lead to negative values, if continued. We do not claim that the four journals and the year 2015 are representative of all published SCED research, but the evidence obtained suggests that trend extrapolation may affect the meaningfulness of the quantitative operations performed with the predicted data frequently enough for it to be considered an issue worth investigating.

Main issues when extrapolating baseline trend, and tentative solutions

The main issues when extrapolating baseline trend that were identified by Parker et al. ( 2011 ) include (a) unreliable trend lines being fitted; (b) the assumption that trends will continue unabated; (c) no consideration of the baseline phase length; and (d) the possibility of out-of-bounds forecasts. In this section, we comment on each of these four issues identified by Parker et al. ( 2011 ) separately (although they are related), and we propose tentative solutions, based on the existing literature. However, we begin by discussing in brief how these issues could be avoided rather than simply addressed.

Avoiding the issues

Three decisions can be made in relation to trend extrapolation. First, the researcher may wonder whether there is any clear trend at all. For that purpose, a tool such as a trend stability envelope (Lane & Gast, 2014 ) can be used. According to Lane and Gast, a within-phase trend would be considered stable (or clear) when at least 80% of the data points fell within the envelope defined by the split-middle trend line plus/minus 25% of the baseline median. Similarly, Mendenhall and Sincich ( 2012 ) suggested, although not in the context of single-case data, that a good fit of an OLS trend line would be represented by a coefficient of variation of 10% or smaller. We consider that either of these descriptive approaches is likely to be more reasonable than testing the statistical significance of the baseline trend before deciding whether or not to take it into account, because such a statistical test might lack power for short baselines (Tarlow, 2017 ). Using Kendall’s tau as a measure of the percentage of improving data points (Vannest, Parker, Davis, Soares, & Smith, 2012 ) would not inform one about whether a clear linear trend were present, because it refers more generally to a monotonic trend.
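To make the envelope rule concrete, the following rough R sketch checks whether at least 80% of the baseline points fall within the split-middle trend line plus/minus 25% of the baseline median. It relies on a simplified split-middle computation (the line passes through the median time point and median value of each half of the series) and is an illustration, not Lane and Gast's (2014) procedure verbatim.

```r
# Simplified split-middle trend line: fit a line through the medians of each half
split_middle_line <- function(y) {
  n  <- length(y)
  t  <- seq_len(n)
  h1 <- 1:floor(n / 2)
  h2 <- (ceiling(n / 2) + 1):n
  slope <- (median(y[h2]) - median(y[h1])) / (median(t[h2]) - median(t[h1]))
  intercept <- median(y[h1]) - slope * median(t[h1])
  intercept + slope * t                    # fitted split-middle values
}

# Stability check: at least 80% of points within +/- 25% of the baseline median
trend_is_stable <- function(y, envelope = 0.25, criterion = 0.80) {
  fitted <- split_middle_line(y)
  half_width <- envelope * median(y)
  mean(abs(y - fitted) <= half_width) >= criterion
}

trend_is_stable(c(10, 12, 11, 13, 14, 15))   # hypothetical baseline
```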

Second, if the data show considerable variability and no clear trend, it is possible to use a quantification that does not rely on (a) linear trend, (b) any specific nonlinear trend, or (c) any average level whatsoever, by using a nonoverlap index. Specifically, the nonoverlap of all pairs (NAP; Parker & Vannest, 2009 ) can be used when the baseline data do not show a natural improvement, whereas Tau-U (Parker et al., 2011 ) can be used when such an improvement is apparent but it is not necessarily linear. Footnote 3 A different approach could be to quantify the difference in level (e.g., using a d statistic) after showing that the assumption of no trend is plausible via a GAM (Sullivan et al., 2015 ). Thus, there would be no trend line fitting and no trend extrapolation.

Third, if the trend looks clear (visually or according to a formal rule) and the researcher decides to take it into account, it is also possible not to extrapolate trend lines. For instance, it is possible to fit separate trend lines to the different phases and compare the slopes and intercepts of these trend lines, as in piecewise regression (Center, Skiba, & Casey, 1985–1986 ).

Although these potential solutions seem reasonable, here we deal with another option: namely, the case in which baseline extrapolation is desired (because it is part of the analytical procedure chosen prior to data collection), but the researcher is willing to improve the way in which such extrapolation is performed.

First issue: Unreliable trend lines fitted

If an unreliable linear trend is fitted (e.g., the relation between the time variable and the measurements would be described by a small R 2 value), then the degree of confidence we have in the representation of the baseline data is reduced. If the fit of the baseline trend line to the data is poor, its extrapolation would also be problematic. It is expected that, if the amount of variability were the same, shorter baselines would result in more uncertain estimates. In that sense, this issue is related to the next one.

Focusing specifically on reliability, we advocate quantifying the amount of fit of the trend line and using this information when deciding on baseline trend extrapolation. Regarding the comparison between actual and fitted values, Hyndman and Koehler ( 2006 ) reviewed the drawbacks of several measures of forecast accuracy, including widely known options such as the mean square error ( \( \frac{1}{n}\sum_{i=1}^{n}{\left({y}_i-{\widehat{y}}_i\right)}^2 \) , based on a quadratic loss function and inversely related to R 2 ) or the mean absolute error ( \( \frac{1}{n}\sum_{i=1}^{n}\left|{y}_i-{\widehat{y}}_i\right| \) , based on a linear loss function). Hyndman and Koehler proposed the mean absolute scaled error (MASE). For a trend line fitted to the n A baseline measurements, MASE can be written as follows:
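\( MASE=\frac{\frac{1}{n_A}\sum_{i=1}^{n_A}\left|{y}_i-{\widehat{y}}_i\right|}{\frac{1}{n_A-1}\sum_{i=2}^{n_A}\left|{y}_i-{y}_{i-1}\right|} \) , that is, the mean absolute error of the fitted trend line scaled by the in-sample mean absolute error of naïve one-step (random-walk) forecasts.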

Hyndman and Koehler ( 2006 , p. 687) stated that MASE is “easily interpretable, because values of MASE greater than one indicate that the forecasts are worse, on average, than in-sample one-step forecasts from the naïve method.” (The naïve method entails predicting a value from the previous one—i.e., the random-walk model that has frequently been used to assess the degree to which more sophisticated methods provide more accurate forecasts than this simple procedure; Chatfield, 2000 .) Thus, values of MASE greater than one could be indicative that a general trend (e.g., a linear one, as in MPD) does not provide a good enough fit to the data from which it was estimated, because it does not improve on the fit of the naïve method.

Second issue: Assuming that trend continues unabated

This issue refers to treating baseline trend as if it were always the same for the whole period of extrapolation. By default, all the analytical techniques mentioned in the “Analytical Techniques That Entail Extrapolating Baseline Trend” section extrapolate baseline trend until the end of the intervention phase. Thus, one way of dealing with this issue would be to limit the extrapolation, following Rindskopf and Ferron ( 2014 ), who stated that “for a short period, behavior may show a linear trend, but we cannot project that linear behavior very far into the future” (p. 229). Similarly, when discussing the gradual-effects model, Swan and Pustejovsky ( 2018 ) also cautioned against long extrapolations, although their focus was on the intervention phase and not on the baseline phase.

An initial approach could be to decide, prior to gathering and plotting the data, how far out to extrapolate baseline trend, using a number that would be the same across studies. When discussing an approach for comparing levels when trend lines are fitted separately to each phase, it has been suggested that a comparison can be performed at the fifth intervention-phase measurement occasion (Rindskopf & Ferron, 2014 ; Swaminathan et al., 2014 ). It is possible to extend this recommendation to the present situation and state that the baseline trend should be extrapolated until the fifth intervention-phase measurement occasion. The choice of five measurements is arbitrary, but it is well-aligned with the minimal phase length required in the What Works Clearinghouse Standards (Kratochwill et al., 2010 ). Nonetheless, our review (Table 1 ) suggests that impossible extrapolations are common even before the fifth intervention-phase measurement occasion, and thus a comparison at that point might not avoid comparison with an impossible projection from the baseline. Similarly, when presenting the gradual-effects model, Swan and Pustejovsky ( 2018 ) defined the calculation of the effect size for an a priori set number of intervention-phase measurement occasions. In their study, this number depends on the actually observed intervention-phase lengths. Moreover, Swan and Pustejovsky suggested a sensitivity analysis, comparing the results of several possible a-priori-set numbers. It could be argued that a fixed choice would avoid making data-driven decisions that could favor finding results in line with the expectations of the researchers (Wicherts et al., 2016 ). A second approach would be to choose how far away to extrapolate on the basis of both a design feature (baseline phase length; see the next section) and a data feature (the amount of fit of the trend line to the data, expressed as the MASE). In the following discussion, we present a tentative solution including both these aspects.

Third issue: No consideration of baseline-phase length

Parker et al. ( 2011 ) expressed a concern that baseline trend correction procedures do not take into consideration the length of the baseline phase. The problem is that a short baseline is potentially related to unreliable trend, and it could also entail predicting many values (i.e., a longer intervention phase) from few values, which is not justified.

To take baseline length ( n A ) into account, one approach would be to limit the extrapolation of baseline trend to the first n A treatment-phase measurement occasions. This approach introduces an objective criterion based on a characteristic of the design. A conservative version of this alternative would be to estimate how far out to extrapolate using the following expression: \( {\widehat{n}}_B=\left\lfloor {n}_A\times \left(1- MASE\right)\right\rfloor \) , applying the restriction that \( 0\le {\widehat{n}}_B\le {n}_B \) . Thus, the extrapolation is determined by both the number of baseline measurements ( n A ) and the goodness of fit of the trend line to the data. When MASE  > 1, the expression for \( {\widehat{n}}_B \) would give a negative value, precluding extrapolation. For data in which MASE  < 1, the better the fit of the trend line to the data, the further out extrapolation could be considered justified. From the expression presented for \( {\widehat{n}}_B \) , it can be seen that if the result of the multiplication is not an integer, the value representing the number of intervention-phase measurement occasions to which to extend the baseline trend ( \( {\widehat{n}}_B \) ) would be truncated. Finally, note the restriction that \( {\widehat{n}}_B \) should be equal to or smaller than n B , because it is possible that the baseline is longer than the intervention phase ( n A  >  n B ) and that, even after applying the correction factor representing the fit of the trend line, \( {\widehat{n}}_B>{n}_B \) . Thus, whenever \( {\widehat{n}}_B>{n}_B \) , it is reset to \( {\widehat{n}}_B={n}_B \) .
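A small R sketch of this criterion follows, assuming that fitted baseline values are available from whichever trend-line-fitting technique is used and that MASE is computed as defined above; the function names are illustrative.

```r
# In-sample MASE: mean absolute error of the fit, scaled by the naive one-step errors
mase <- function(y, fitted) {
  mean(abs(y - fitted)) / mean(abs(diff(y)))
}

# n_B_hat = floor(n_A * (1 - MASE)), restricted to the interval [0, n_B]
extrapolation_length <- function(y_A, fitted_A, n_B) {
  n_hat <- floor(length(y_A) * (1 - mase(y_A, fitted_A)))
  min(max(n_hat, 0), n_B)
}

# Hypothetical baseline with an OLS trend line fitted to it
y_A <- c(20, 18, 19, 16, 15, 14)
fitted_A <- fitted(lm(y_A ~ seq_along(y_A)))
extrapolation_length(y_A, fitted_A, n_B = 12)
```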

Fourth issue: Out-of-bounds forecasts

Extrapolating baseline trend for five, n A , or \( {\widehat{n}}_B \) measurement occasions may make trend extrapolation more reasonable (or, at least, less unreasonable), but none of these options precludes out-of-bounds forecasts. When Parker et al. ( 2011 ) discussed the issue that certain procedures to control for baseline trend could lead to projecting trend beyond rational limits, they proposed the conservative trend correction procedure implemented in Tau-U. This procedure could be useful for statistically controlling baseline trend, although the evidence provided by Tarlow ( 2017 ) suggests that the trend control incorporated in Tau-U is insufficient (i.e., leads to false positive results), especially as compared to other procedures, including MPD. An additional limitation of this trend correction procedure is that it cannot be used when extrapolating baseline trend. Therefore, we consider other options in the following text.

Nonlinear models

One option, suggested by Rindskopf and Ferron ( 2014 ), is to use nonlinear models for representing situations in which a stable and low initial level during the baseline phase experiences a change due to the intervention (e.g., an upward trend) before settling at a stable high level. Rindskopf and Ferron suggested using logistic regression with an additional term for identifying the moment at which the response has gone halfway between the floor and the ceiling. Similarly, Shadish et al. ( 2016 ) and Verboon and Peters ( 2018 ) used a logistic model for representing data with clear floor and ceiling effects. The information that can be obtained by fitting a generalized logistic model is in terms of the floor and ceiling levels, the rate of change, and the moments at which the change from the floor to the ceiling plateau starts and stops (Verboon & Peters, 2018 ). Shadish et al. ( 2016 ) acknowledged that not all analysts are expected to be able to fit intrinsically nonlinear models and that choosing one model over another is always partly arbitrary, suggesting nonparametric smoothing as an alternative.

Focusing on the need to improve MPD, the proposals by Rindskopf and Ferron ( 2014 ) and Verboon and Peters ( 2018 ) are not applicable, since the logistic model they present deals with considering the data of a baseline phase and an intervention phase jointly, whereas in MPD baseline trend is estimated and extrapolated in order to allow for a comparison between projected and observed patterns of the outcome variable (as suggested by Kratochwill et al., 2010 , and Horner, Swaminathan, Sugai, & Smolkowski, 2012 , when performing visual analysis). In contrast, Shadish et al. ( 2016 ) used the logistic model for representing the data within one of the phases in order to explore whether any within-phase change took place, but they were not aiming to use the within-phase model for extrapolating to the subsequent phase.

Although not all systematic changes in the behavior of interest are necessarily linear, there are three drawbacks to applying nonlinear models to single-case data, or even to usually longer time-series data (Chatfield, 2000 ). First, there has not been extensive research with short-time-series data and any of the possible nonlinear models (e.g., logistic, Gompertz, or polynomial) applicable for modeling growth curves in order to ensure that known minimal and maximal values of the measurements are not exceeded. Second, it may be difficult to distinguish between a linear model with disturbance and an inherently nonlinear model. Third, a substantive justification is necessary, based either on theory or on previously fitted nonlinear models, for preferring one nonlinear model instead of another or for preferring a nonlinear model instead of the more parsimonious linear model. However, the latter two challenges are circumvented by GAMs, because they allow one to avoid the need to explicitly posit a specific model for the data (Sullivan et al., 2015 ).

Winsorizing

Faith, Allison, and Gorman ( 1997 ) suggested manually rescaling out-of-bounds predicted scores to within the limits, a manipulation similar to winsorization. Thus, a trend is extrapolated until the values predicted are no longer possible, and then a flat line is set at the minimum/maximum possible value (e.g., 0 when the aim is to eliminate a behavior, or 100% when the aim is to improve in the completion of a certain task). The “manual” rescaling of out-of-bounds forecasts could be supported by Chatfield’s ( 2000 , pp. 175–179) claim that it is possible to make judgmental adjustments to forecasts and also to use the “eyeball test” for checking whether forecasts are intuitively reasonable, given that background knowledge (albeit background as simple as knowing the bounds of the outcome variable) is part of nonautomatic univariate methods for forecasting in time-series analysis. In summary, just as in the logistic model, winsorizing the trend line depends on the data at hand. As a limitation, Parker et al. ( 2011 ) claimed that such a correction would impose an artificial ceiling on the effect size. However, it could also be argued that computing an effect size on the basis of impossible values is equally (or more) artificial, since it involves only crunching numbers, some of which (e.g., negative frequencies) are meaningless.
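As a minimal illustration of this kind of correction (with hypothetical bounds), extrapolated forecasts that fall beyond the known floor or ceiling are simply reset to the nearest bound:

```r
# Winsorize out-of-bounds forecasts: clamp predicted values to the known bounds
winsorize_forecasts <- function(forecasts, lower = 0, upper = 100) {
  pmin(pmax(forecasts, lower), upper)
}

winsorize_forecasts(c(12, 7, 2, -3, -8), lower = 0)   # percentage-type outcome with a floor of 0
```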

Damping trend

A third option arises from time-series forecasting, in which exponential smoothing is one of the methods commonly used (Billah, King, Snyder, & Koehler, 2006 ). Specifically, in double exponential smoothing, which can be seen as a special case of Holt’s ( 2004 ) linear trend procedure, it is possible to include a damping parameter (E. S. Gardner & McKenzie, 1985 ) that indicates how much the slope of the trend is reduced in subsequent time periods. According to the review performed by E. S. Gardner ( 2006 ), the damped additive trend is the model of choice when using exponential smoothing. A damped trend can be interpreted as an attenuation reflecting the gradual reduction of the trend until the behavior eventually settles at an upper or a lower asymptote. This would address Parker et al.’s ( 2011 ) concern that it may not be reasonable to consider that the baseline trend will continue unabated until the end of the intervention phase in the absence of an effect. Moreover, the behavioral progression is more gradual than the one implied when winsorizing. Furthermore, a gradual change is also the basis of recent proposals for modeling longitudinal data using generalized additive models (Bringmann et al., 2017 ).

Aiming for a tentative solution for out-of-bounds forecasts for techniques such as MPD, we consider it reasonable to borrow the idea of damping the trend from the linear trend model by Holt ( 2004 ). In contrast, the application of that model in its entirety to short SCED baselines (Shadish & Sullivan, 2011 ; Smith, 2012 ; Solomon, 2014 ) is limited by the need to estimate several parameters (a smoothing parameter for level, a smoothing parameter for trend, a damping parameter, the initial level, and the initial trend).

We consider that a gradually reduced trend conceptualization seems more substantively defensible than abruptly winsorizing the trend line. In that sense, instead of extrapolating the linear trend until the lower or upper bound is reached and then flattening the trend line, it is possible to estimate the damping coefficient in such a way as to ensure that impossible forecasts are not obtained during the period of extrapolation (i.e., in the \( {\widehat{n}}_B \) or n B measurement occasions after the last baseline data point, according to whether extrapolation is limited, as we propose here, or not). The damping parameter is usually represented by the Greek letter phi ( φ ), so that the trend line extrapolated into the intervention phase would be based on the baseline trend ( b 1 ) as follows: \( {b}_1\times {\varphi}^i;i=1,2,\dots, {\widehat{n}}_B \) , so that the first predicted intervention-phase measurement is \( {\widehat{y}}_1={\widehat{y}}_{n_A}+{b}_1\times \varphi \) , and the subsequent forecasts (for \( i=2,3,\dots, {\widehat{n}}_B \) ) are obtained via \( {\widehat{y}}_i={\widehat{y}}_{i-1}+{b}_1\times {\varphi}^i \) . The previous expressions are presented using \( {\widehat{n}}_B \) , but they can be rewritten using n B in the case that extrapolation is not limited in time. For avoiding extrapolation to impossible values, the damping parameter would be estimated from the data in such a way that the final predicted value \( {\widehat{y}}_{{\widehat{n}}_B} \) would still be within the bounds of the outcome variable. We propose an iterative process checking the values of φ from 0.05 to 1.00 in steps of 0.001, in order to identify the largest φ value k for which there are no out-of-bounds values, whereas for ( k  + 0.001) there is one or more such values. The closer φ is to 1, the farther away in the intervention phase is the first out-of-bounds forecast produced. Estimating φ from the data and not setting it to an a-priori-chosen value is in accordance with the usually recommended practice in exponential smoothing (Billah et al., 2006 ).
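The following R sketch illustrates the damped extrapolation and the grid search for φ just described, under the stated assumptions (the bounds of the outcome are supplied by the analyst, and forecasts are anchored at the fitted value for the last baseline occasion); it is an illustration, not the code behind the Shiny application.

```r
# Damped extrapolation: y_hat[i] = y_hat[i-1] + b1 * phi^i, starting from the
# fitted value at the last baseline occasion (y_hat_nA)
damped_forecasts <- function(y_hat_nA, b1, horizon, phi) {
  out <- numeric(horizon)
  prev <- y_hat_nA
  for (i in seq_len(horizon)) {
    prev <- prev + b1 * phi^i
    out[i] <- prev
  }
  out
}

# Grid search for the largest phi that keeps all forecasts within bounds
estimate_phi <- function(y_hat_nA, b1, horizon, lower = -Inf, upper = Inf,
                         grid = seq(0.05, 1, by = 0.001)) {
  ok <- vapply(grid, function(phi) {
    f <- damped_forecasts(y_hat_nA, b1, horizon, phi)
    all(f >= lower & f <= upper)
  }, logical(1))
  if (!any(ok)) min(grid) else max(grid[ok])   # illustrative fallback if no value works
}

# Hypothetical example: downward baseline trend, outcome bounded below by 0
phi <- estimate_phi(y_hat_nA = 6, b1 = -2, horizon = 8, lower = 0)
damped_forecasts(6, -2, 8, phi)
```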

Justification of the tentative solutions

Our main proposal is to combine the quantitative criterion for how far out to extrapolate baseline trend ( \( {\widehat{n}}_B \) ) with damping, in case the latter is necessary within the \( {\widehat{n}}_B \) limit. The fact that both \( {\widehat{n}}_B \) and the damping parameter φ are estimated from the data rather than being predetermined implies that this proposal is data-driven. We consider that the data-driven quantification of \( {\widehat{n}}_B \) is also not necessarily a drawback, due to three reasons: (a) An objective formula was proposed for estimating how far out it is reasonable to extrapolate the baseline trend, according to the data at hand; that is, the choice is not made subjectively by the researcher in order to favor his/her hypotheses. (b) This formula is based on both a design feature (i.e., the baseline phase length) and a data feature (i.e., the MASE as a measure of the accuracy of the trend line fitted). And (c) no substantive reason may be available a priori regarding when extrapolation becomes unjustified.

We also consider that estimating the damping parameter from the data is not a drawback, either, given that (a) φ is estimated from the data in Holt’s linear trend model for which it was proposed; (b) damping trend can be considered conceptually similar to choosing a function, in a growth curve model, that makes possible incorporating an asymptote (Chatfield, 2000 ), because both methods model decisions made by the researcher on the basis of knowing the characteristics of the data and, in both cases, the moment at which the asymptote is reached depends on the data at hand and not on a predefined criterion; and (c) the use of regression splines (Bringmann et al., 2017 ; Sullivan et al., 2015 ) for modeling a nonlinear relation is also data-driven, despite the fact that a predefined number of knots may be used.

The combined use of \( {\widehat{n}}_B \) plus the estimation of φ can be applied to the OLS baseline trend (as used in the Allison & Gorman, 1993 , model), to the split-middle trend (as used in the conservative dual criterion, Fisher et al., 2003 ; or in the percentage of data points exceeding the median trend, Wolery et al., 2010 ), or to the trend extrapolation that is part of MPD (Manolov & Solanas, 2013 ). In the following section, we focus on MPD.

The present proposal is also well-aligned with Bringmann et al.’s ( 2017 ) recommendation for models that do not require existing theories about the expected nature of the change in the behavior, excessively high computational demands, or long series of measurements. Additionally, as these authors suggested, the methods need to be readily usable by applied researchers, which is achieved by the software implementations we have created.

Limitations of the tentative solutions

As we mentioned previously, it could be argued that the tentative solutions are not necessary if the researcher simply avoids extrapolation. Moreover, we do not argue that the expressions presented for deciding whether and how far to extrapolate are the only possible, or necessarily the optimal, ones; we rather aimed at defining an objective rule on a solid, albeit arbitrary, basis. An additional limitation, as was suggested by a reviewer, is that for a baseline with no variability, MASE would not be defined. In such a case, when the same value is repeated n A times (e.g., when the value is 0 because the individual is unable to perform the action required), we do consider that an unlimited extrapolation would be warranted, because the reference to which the intervention-phase data would be compared would be clear and unambiguous.

Incorporating the tentative solutions in a data-analytical procedure

Modifying the MPD

The revised version of the MPD includes the following steps:

1. Estimate the slope of the baseline trend as the average of the differenced data ( b 1( D ) ).
2. Fit the trend line, choosing (Footnote 4) one of the three definitions of the intercept (see Appendix B at https://osf.io/js3hk/ ), according to the value of the MASE.
3. Extrapolate the baseline trend, if justified (i.e., if MASE < 1), for as many intervention-phase measurement occasions as is justified (i.e., for the first \( {\widehat{n}}_B \) measurement occasions of the intervention phase), considering the need for damping the trend to avoid out-of-bounds forecasts. The damping parameter φ would be equal to 1 when all \( {\widehat{n}}_B \) forecasts are within bounds, or φ < 1 otherwise.
4. Compute MPD as the difference between the actually obtained and the forecast first \( {\widehat{n}}_B \) intervention-phase values.
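Pulling these steps together, here is a compact, self-contained R sketch of the modified MPD. It assumes the intercept is computed as in the Theil–Sen estimator (the median of y_i − b1·i), which is one of the three definitions considered in Appendix B, and it reuses the damping search shown earlier; it is an illustration, not the authors' Shiny implementation.

```r
# Compact sketch of the modified MPD. Assumptions: slope via differencing,
# intercept as in the Theil-Sen estimator, bounds supplied by the analyst.
modified_mpd <- function(y_A, y_B, lower = -Inf, upper = Inf) {
  n_A <- length(y_A); n_B <- length(y_B)
  t_A <- seq_len(n_A)
  b1  <- mean(diff(y_A))                          # step 1: slope via differencing
  b0  <- median(y_A - b1 * t_A)                   # step 2: Theil-Sen-style intercept
  fitted_A <- b0 + b1 * t_A
  mase <- mean(abs(y_A - fitted_A)) / mean(abs(diff(y_A)))
  n_hat_B <- min(max(floor(n_A * (1 - mase)), 0), n_B)
  if (mase >= 1 || n_hat_B < 1)                   # extrapolation not justified
    return(list(mase = mase, n_hat_B = n_hat_B, mpd = NA))
  forecast <- function(phi) {                     # step 3: damped extrapolation
    f <- numeric(n_hat_B); prev <- fitted_A[n_A]
    for (i in seq_len(n_hat_B)) { prev <- prev + b1 * phi^i; f[i] <- prev }
    f
  }
  grid <- seq(0.05, 1, by = 0.001)
  ok   <- vapply(grid,
                 function(p) all(forecast(p) >= lower & forecast(p) <= upper),
                 logical(1))
  phi  <- if (any(ok)) max(grid[ok]) else min(grid)
  mpd  <- mean(y_B[seq_len(n_hat_B)] - forecast(phi))   # step 4: actual minus forecast
  list(mase = mase, n_hat_B = n_hat_B, phi = phi, mpd = mpd)
}

# Hypothetical AB data with a downward baseline trend and a floor at 0
modified_mpd(y_A = c(9, 8, 8, 6, 5), y_B = c(2, 1, 0, 0, 0, 0, 0), lower = 0)
```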

Illustration of the proposal for modifying MPD

In the present section, we chose three of the studies included in the review that we performed (all three data sets are available at https://osf.io/js3hk/ , in the format required by the Shiny application, http://manolov.shinyapps.io/MPDExtrapolation , implementing the modified version of MPD). From the illustrations it is clear that, although the focus of the present text is comparison between a pair of phases, such a comparison can be conceptualized to be part of a more appropriate design structure, such as ABAB or multiple-baseline designs (Kratochwill et al., 2010 ; Tate et al., 2013 ), by replicating the same procedure for each AB comparison. Such a means of analyzing data corresponds to the suggestion by Scruggs and Mastropieri ( 1998 ) to perform comparisons only for data that maintain the AB sequence.

The Ciullo, Falcomata, Pfannenstiel, and Billingsley ( 2015 ) data were chosen because their multiple-baseline design includes short baselines, and extrapolating the baseline trend leads to out-of-bounds forecasts (impossibly low values) for both the first tier (Footnote 5; Fig. 1 ) and the third tier. In Fig. 1 , trend extrapolation was not limited (i.e., the baseline trend was extrapolated for all n B = 7 values), to allow for comparing winsorizing and damping the trend. Limiting the extrapolation to \( {\widehat{n}}_B \) = 2 would have made either winsorizing or damping the trend unnecessary, because no out-of-bounds forecasts would have been obtained; MPD would have been equal to 40.26.

Figure 1. Results for mean phase difference (MPD) with the slope estimated through differencing and the intercept computed as in the Theil–Sen estimator. The results in the left panel are based on winsorizing the trend line when the lower bound is reached. The results in the right panel are based on damping the trend. Trend extrapolation is not limited. The data correspond to the first tier (a participant called Salvador) in the Ciullo et al. ( 2015 ) multiple-baseline design study.

Limiting the amount of extrapolation seems reasonable, because for both of these tiers the intervention phase is almost three times as long as the baseline phase; using \( {\widehat{n}}_B \) leads to avoiding impossibly low forecasts for these data and to more conservative estimates of the magnitude of the effect. Damping the trend line was necessary for three of the four tiers, where it also led to more conservative estimates, given that the out-of-bounds forecasts were in a direction opposite from the one desired with the intervention. The numerical results are available in Table 2 .

The data from Allen, Vatland, Bowen, and Burke ( 2015 ) were chosen, because this study represents a different data pattern: Longer baselines are available, which could allow for better estimation of the trend, but the baseline data are apparently very variable. Intervention phases were also longer, which required extrapolations farther out in time. Thus, we wanted to illustrate how limiting extrapolations affects the quantification of an effect.

For Tier 1, out-of-bounds forecasts (impossibly high values, in the same direction as desired for the intervention) are obtained. However, damping the trend avoided such forecasts and also led to greater estimates of the effect. For Tiers 2 and 3 (the latter is represented in Fig. 2 ), limiting the amount of extrapolation had a very strong effect: because of the high MASE values, only a very short extrapolation was justified. The limited extrapolation is also related to greater estimates of the magnitude of the effect for these two tiers.

Figure 2. Results for mean phase difference (MPD) with the slope estimated through differencing and the intercept computed as in the Theil–Sen estimator. Trend extrapolation was not limited (left) versus limited (right). Damping the trend was not necessary in either case ( φ = 1). The data correspond to the third tier of the Allen et al. ( 2015 ) multiple-baseline design study.

Therefore, using only the first \( {\widehat{n}}_B \) intervention-phase data points for the comparison reflects a reasonable doubt regarding whether the (not sufficiently clear) improving baseline trend would have continued unchanged throughout the whole intervention phase (i.e., for 23 or 16 measurement occasions, for Tiers 2 and 3, respectively). The numerical results are available in Table 3 .

The data from Eilers and Hayes ( 2015 ) were chosen because they include baselines of varying lengths, out-of-bounds forecasts for Tiers 1 and 2, and a nonlinear pattern in Tier 3 (to which a linear trend line is expected to show poor fit). For these data, damping and limiting the extrapolation, when applied separately, both correct overestimation of the effect that would arise from out-of-bounds (high) forecasts in a direction opposite from the one desired in the intervention. Such an overestimation, in the absence of damping, would lead to MPD values implying more than a 100% reduction, which is meaningless (see Fig. 3 ).

Figure 3. Results for mean phase difference (MPD) with the slope estimated through differencing and the intercept computed as in the Theil–Sen estimator. Trend was damped completely (right; φ = 0) versus not damped (left; φ = 1). Trend extrapolation is not limited in this figure. The data correspond to the second tier of the Eilers and Hayes ( 2015 ) multiple-baseline design study.

Specifically, damping the trend is necessary in Tiers 1 and 2 to avoid such forecasts. Note that for Tier 3, the fact that a straight line does not represent the baseline data well is reflected by MASE  > 1 and \( {\widehat{n}}_B<1 \) , leading to a recommendation not to extrapolate the baseline trend. The numerical results are available in Table 4 .

General comments

In general, the modifications introduced in MPD achieve the aims to (a) avoid extrapolating from a short baseline to a much longer intervention phase (Example 1); (b) avoid assuming that the trend will continue exactly the same for many measurement occasions beyond the baseline phase (Example 2); (c) follow an objective criterion regarding a baseline trend line that is not justified in being extrapolated at all (Example 3); and (d) avoid excessively large quantifications of effect when comparing to impossibly bad (countertherapeutic) forecasts in the absence of an effect (Examples 1 and 3). Furthermore, note that for all the data sets included in this illustration, the smallest MASE values were obtained using the Theil–Sen definition of the intercept.

Small-scale simulation study

To obtain additional evidence regarding the performance of the proposals, an application to generated data was a necessary complement to the application of our proposals to previously published real behavioral data. The simulation presented in this section should be understood as a proof of concept, rather than as a comprehensive source of evidence. We consider that further thought and research should be dedicated to simulating discrete bounded data (e.g., counts, percentages) and to studying the present proposals for deciding how far to extrapolate baseline trend and how to deal with impossible extrapolations.

Data generation

We simulated independent and autocorrelated count data using a Poisson model, following the article by Swan and Pustejovsky ( 2018 ) and adapting the R code available in the supplementary material to their article ( https://osf.io/gaxrv and https://www.tandfonline.com/doi/suppl/10.1080/00273171.2018.1466681 ). The adaptation consisted of adding the general trend for certain conditions (denoted here by \( \beta_1 \), whereas \( \beta_2 \) denotes the change-in-level parameter, unlike in Swan & Pustejovsky, 2018 , who denoted the change in level by \( \beta_1 \)) and simulating immediate instead of delayed effects (i.e., we set ω = 0). Given that ω = 0, the simulation model, as described by Swan and Pustejovsky, is as follows. The expected value for each measurement occasion is \( \mu_t=\exp \left({\beta}_0+{\beta}_1t+{\beta}_2D\right) \), where t is the time variable taking values 1, 2, . . . , \( n_A+n_B \), and D is a dummy variable for the change in level, taking \( n_A \) values of 0 followed by \( n_B \) values of 1. The first value, \( Y_1 \), is simulated from a Poisson distribution with mean \( {\lambda}_1={\mu}_1 \). Subsequent values ( j = 2, 3, . . . , \( n_A+n_B \) ) are simulated taking autocorrelation into account, with \( {\varphi}_j=\min \left\{\varphi, {\mu}_j/{\mu}_{j-1}\right\} \), leading to the following mean for the Poisson distribution: \( {\lambda}_j={\mu}_j-{\varphi}_j{\mu}_{j-1} \). Finally, the values from the second to the last were simulated as \( {Y}_j={X}_j+{Z}_j \), where \( Z_j \) follows a Poisson distribution with mean \( \lambda_j \), and \( X_j \) follows a binomial distribution with \( Y_{j-1} \) trials and a probability of \( \varphi_j \).

The specific simulation parameters for defining \( \mu_t \) were \( {e}^{\beta_0} \) = 50 (representing the baseline frequency), \( \beta_1 \) = 0, − 0.1, or − 0.2, \( \beta_2 \) = − 0.4 (representing the intervention effect as an immediate change in level), and autocorrelation φ = 0 or 0.4. Regarding the intervention effect, according to the formula % change = 100% × [exp( \( \beta_2 \) ) − 1] (Pustejovsky, 2018b ), the effect was a reduction of approximately 33%, or 16.5 points, from the baseline level ( \( {e}^{\beta_0} \) ), set to 50. The phase lengths ( \( n_A=n_B \) ) were 5, 7, and 10.

The specific simulation parameters β , as well as simulating the intervention effect as a reduction, were chosen in such a way as to produce a floor effect for certain simulation conditions. That is, for some of the conditions, the values of the dependent variable were equal or close to zero before the end of the intervention phase, and thus could not improve any more. For these conditions, extrapolating the baseline trend would lead to impossible negative forecasts. Such a data pattern represents well the findings from our review, according to which in almost 40% of the articles at least one AB comparison led to impossible negative predictions if the baseline trend were continued. Example data sets of the simulation conditions are presented as figures at https://osf.io/js3hk/ . A total of 10,000 iterations were performed for each condition using R code ( https://cran.r-project.org ).
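For concreteness, the following R sketch generates a single series under the model just described, using the notation of the text; it is adapted from the verbal description rather than copied from Swan and Pustejovsky's (2018) supplementary code.

```r
# One simulated AB series under the autocorrelated Poisson model described above
simulate_series <- function(n_A, n_B, beta0 = log(50), beta1 = 0, beta2 = -0.4,
                            phi = 0.4) {
  n  <- n_A + n_B
  t  <- seq_len(n)
  D  <- rep(c(0, 1), c(n_A, n_B))            # dummy variable for the change in level
  mu <- exp(beta0 + beta1 * t + beta2 * D)   # expected value per measurement occasion
  y  <- numeric(n)
  y[1] <- rpois(1, mu[1])
  for (j in 2:n) {
    phi_j    <- min(phi, mu[j] / mu[j - 1])
    lambda_j <- mu[j] - phi_j * mu[j - 1]
    y[j] <- rbinom(1, size = y[j - 1], prob = phi_j) + rpois(1, lambda_j)
  }
  y
}

set.seed(1)
simulate_series(n_A = 7, n_B = 7, beta1 = -0.2)   # a condition producing a floor effect
```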

Data analysis

Six different quantifications of the intervention effect were computed. First, an immediate effect was computed, as defined in piecewise regression (Center et al., 1985–1986 ) and by extension in multilevel models (Van den Noortgate & Onghena, 2008 ). This immediate effect represents a comparison, for the first intervention-phase measurement occasion, between the extrapolated baseline trend and the fitted intervention-phase trend. Second, an average effect was computed, as defined in the generalized least squares proposal by Swaminathan et al. ( 2014 ). This average effect ( δ AB ) is based on the expression by Rogosa ( 1980 ), initially proposed for computing an overall effect in the context of the analysis of covariance when the regression slopes were not parallel. The specific expressions are (1) for the baseline data, \( {y}_t^A={\beta}_0^A+{\beta}_1^At+{e}_t \) , where t = 1, 2, . . ., n A ; (2) for the intervention-phase data, \( {y}_t^B={\beta}_0^B+{\beta}_1^Bt+{e}_t \) , where t = n A + 1, n A + 2, . . ., n A + n B ; and (3) \( {\delta}_{AB}=\left({\beta}_0^A-{\beta}_0^B\right)+\left({\beta}_1^A-{\beta}_1^B\right)\frac{2{n}_A+{n}_B+1}{2} \) . Additionally, four versions of the MPD were computed: (a) one estimating the baseline trend line using the Theil–Sen estimator, with no limitation of the extrapolation and no correction for impossible forecasts; (b) MPD incorporating \( {\widehat{n}}_B \) for limiting the extrapolation [MPD Limited]; (c) MPD incorporating \( {\widehat{n}}_B \) and using flattening to correct impossible forecasts [MPD Limited Flat]; and (d) MPD incorporating \( {\widehat{n}}_B \) and using damping to correct impossible forecasts [MPD Limited Damping]. Finally, we obtained two additional pieces of information: the percentage of iterations in which \( {\widehat{n}}_B<1 \) (due to MASE being greater than 1) and the quartiles (plus minimum and maximum) corresponding to \( {\widehat{n}}_B \) for each experimental condition.
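To make the first two quantifications concrete, here is a minimal R sketch that computes an immediate effect and δ AB from trend lines fitted separately to each phase. Plain OLS is used for simplicity (Swaminathan et al., 2014 , proposed generalized least squares), and the example data are hypothetical.

```r
# Immediate effect (first intervention occasion) and Rogosa's overall difference delta_AB,
# both based on separate OLS trend lines for the baseline and intervention phases
phase_effects <- function(y_A, y_B) {
  n_A <- length(y_A); n_B <- length(y_B)
  t_A <- seq_len(n_A)
  t_B <- n_A + seq_len(n_B)
  cA <- unname(coef(lm(y_A ~ t_A)))   # (intercept, slope) for the baseline phase
  cB <- unname(coef(lm(y_B ~ t_B)))   # (intercept, slope) for the intervention phase
  t1 <- n_A + 1                       # first intervention-phase measurement occasion
  immediate <- (cB[1] + cB[2] * t1) - (cA[1] + cA[2] * t1)
  delta_AB  <- (cA[1] - cB[1]) + (cA[2] - cB[2]) * (2 * n_A + n_B + 1) / 2
  c(immediate = immediate, delta_AB = delta_AB)
}

# Hypothetical data with a level drop and no baseline trend
phase_effects(y_A = c(50, 52, 49, 51, 50), y_B = c(34, 33, 35, 32, 34))
```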

The results of the simulation are presented in Tables 5 , 6 , and 7 , for phase lengths of five, seven, and ten measurements, respectively. When there is an intervention effect ( β 2 = − 0.4) but no general trend ( β 1 = 0), all quantifications lead to very similar results, which are also very similar to the expected overall difference of 16.5. The most noteworthy result for these conditions is that, when there is autocorrelation, for phase lengths of seven and ten data points, the naïve method is more frequently a better model for the baseline data than the Theil–Sen trend (e.g., 17.51% for autocorrelated data vs. 6.61% for independent data when n A = n B = 10). This is logical because, according to the naïve method each data point is predicted from the previous one, and positive first-order autocorrelation entails that adjacent values are more similar to each other than would be expected by chance.

When there is a general trend and n A = n B = 5 (Table 5 ), the floor effect means that only the immediate effect remains favorable for the intervention (i.e., lower values for the dependent variable in the intervention phase). In contrast, a comparison between the baseline extrapolation and the treatment data leads to overall quantifications ( δ AB and MPD) suggesting deterioration. This is because of the impossible (negative) predicted values. The other versions of MPD entail quantifications that are less “overall” (i.e., based on \( {\widehat{n}}_B<{n}_B \) forecasts), and the MPD version that both limits extrapolation and uses damping to avoid impossible projections is the one that leads to values most similar to the immediate effect.

For conditions with n A = n B = 7 (Table 6 ), the results and the comments are equivalent. The only difference is that for a general trend expressed as β 1 = − 0.2, the baseline “spontaneous” reduction is already large enough to reach the floor values, and thus even the immediate effect is unfavorable for the intervention. The results for n A = n B = 10 (Table 7 ) are similar. For n A = n B = 10, we added another condition in which the general trend was not so pronounced (i.e., β 1 = − 0.1) as to lead to a floor effect already during the baseline. For these conditions, the results are similar to the ones for n A = n B = 5 and β 1 = − 0.2.

In summary, when there is a change in level in the absence of a general trend, the proposals for limiting the extrapolation and avoiding impossible forecasts do not affect the quantification of an overall effect. Additionally, in situations in which impossible forecasts would otherwise be obtained, these proposals lead to quantifications that better represent the data pattern. We consider that, for data patterns in which the floor is reached soon after introducing the intervention, an immediate effect and subsequent values at the floor level (e.g., as quantified by the percentage of zero data; Scotti, Evans, Meyer, & Walker, 1991 ) should be considered sufficient evidence (if they are replicated) for an intervention effect. That is, we consider that such quantifications would be a more appropriate evaluation of the data pattern than an overall quantification, such as δ AB or MPD in the absence of the proposals. Thus, we consider the proposals to be useful. Still, the specific quantifications obtained when the proposals are applied to MPD should not be considered perfect, because they will depend on the extent to which the observed data pattern matches the expected data pattern (e.g., whether a spontaneous improvement is expected, whether an immediate effect is expected) and on the type of quantification preferred (e.g., a raw difference as in MPD, a percentage change such as the one that could be obtained from the log response ratio [Pustejovsky, 2018b ], or a difference in standard deviations, such as the BC-SMD [Shadish et al., 2014 ]).

In terms of the \( {\widehat{n}}_B \) values obtained, Tables 5 , 6 , and 7 show that most typically (i.e., for the central 50%), extrapolations were considered justified from two to four measurement occasions into the intervention phase. This is well aligned with the idea of an immediate effect consisting of the first three intervention-phase measurement occasions (Kratochwill et al., 2010 ) and is broader than the immediate effect defined in piecewise regressions and multilevel models (which focus only on the first measurement occasion). Such a short extrapolation would avoid the untenable assumption that the baseline trend would continue unabated for too long. Moreover, damping the baseline trend helps identify a more appropriate reference for comparing the actual intervention data points.

General discussion

Extrapolating baseline trend: Issues, breadth of these issues, and tentative solutions

Several single-case analytical techniques entail extrapolating the baseline trend: for instance, the Allison and Gorman ( 1993 ) regression model, the nonregression technique called mean phase difference (Manolov & Solanas, 2013 ), and the nonoverlap index called the percentage of data points exceeding the median trend (Wolery et al., 2010 ). An initial aspect to take into account is that these three techniques estimate the intercept and slope of the trend line in three different ways. When a trend line is fitted to the baseline data, the degree to which the trend line fits the data has to be considered, as well as whether it is reasonable to assume that the trend will continue unchanged and whether extrapolating the trend would lead to predicted values that are impossible in real data. The latter issue appeared to be present in SCED data published in 2015, given that in approximately 10% of the studies reviewed, forecasts above the maximal possible value were obtained, and in 40% the forecasts were below the minimal possible value, for all five trend-line fitting procedures investigated. The proposals we make here take into account the length of the baseline phase, the degree of fit of the trend line to the data, and the need to avoid meaningless comparisons between actual values and impossible predicted values. Moreover, limiting the extrapolation emphasizes the idea that a linear trend is only a model that approximates how the data would behave if the baseline continued for a limited amount of time, rather than assuming that a linear trend is necessarily the correct model for the progression of the measurements in the absence of an intervention.

The examples provided with real data and the simulation results from applying the proposals to the MPD illustrate how the present proposal for correcting out-of-bounds forecasts avoids both excessively low and excessively high effect estimates when the bounds of the measurement units are considered. Moreover, the quantitative criterion for deciding how far out to extrapolate baseline trend serves as an objective rule for not extrapolating a trend line into the intervention phase when the baseline data are not represented well by such a line.

Recommendations for applied researchers

In relation to our proposals, we recommend both limiting the extrapolation and allowing for damping the trend. Limiting the extrapolation leads to a quantification that combines two criteria mentioned in the What Works Clearinghouse Standards (Kratochwill et al., 2010 ): immediate change and the comparison of the projected versus the observed data pattern, whereas damping a trend avoids completely meaningless comparisons. Moreover, in relation to the MPD, we advocate defining its intercept according to the smallest MASE value. In relation to statistical analysis in general, we do not recommend that applied researchers should necessarily always use analytical techniques that extrapolate a baseline trend (e.g., MPD, the generalized least squares analysis by Swaminathan et al., 2014 , or the Allison & Gorman, 1993 , OLS model). Rather, we caution regarding the use of such techniques for certain data sets and propose a modification of MPD that avoids quantifications of effects that are based on unreasonable comparisons. We also caution researchers that, when a trend line is fitted to the data, it is important for transparency to report the technique used for estimating the intercept and slope of this trend line, given that several such techniques are available (Manolov, 2018 ). Finally, for cases in which the data show substantial variability and are not represented well by a straight line, or even by a curved line, we recommend applying the nonoverlap of all pairs, which makes use of all the data and not only of the first \( {\widehat{n}}_B \) measurements of the intervention-phase data.

Beyond the present focus on trend, some desirable features of analytical techniques have been suggested by Wolery et al. ( 2010 ) and expanded on by Manolov, Gast, Perdices, and Evans ( 2014 ). Readers interested in broader reviews of analytical techniques can also consult Gage and Lewis ( 2013 ) and Manolov and Moeyaert ( 2017 ). In general, we echo the recommendation to use quantitative analysis together with visual analysis (e.g., Campbell & Herzinger, 2010 ; Harrington & Velicer, 2015 ; Houle, 2009 ), and we further reflect on this point in the following section.

Validating the quantifications and enhancing their interpretation: Software developments

Visual analysis is regarded as a tool for verifying the meaningfulness of the quantitative results yielded by statistical techniques (Parker et al., 2006 ). In that sense, it is crucial to represent visually the fitted and extrapolated trend line, or the transformed data after the baseline trend has been removed. Accordingly, recent efforts have focused on using visual analysis to help choose the appropriate multilevel model (Baek, Petit-Bois, Van Den Noortgate, Beretvas, & Ferron, 2016 ). To make more transparent what exactly is being done with the data to obtain the quantifications, the output of the modified MPD is both graphical and numerical (see http://manolov.shinyapps.io/MPDExtrapolation , which allows for choosing whether to limit the extrapolation of the baseline trend and whether to use damping or winsorizing in the case of out-of-bounds forecasts). For MPD, in which the quantification is the average difference between the extrapolated baseline trend and the actual intervention-phase measurements, the graphical output clearly indicates which values are forecasts (plus whether the trend is maintained or damped) and how far the baseline trend is extrapolated. Moreover, the color of the arrows from predicted to actual intervention-phase values used in the figures of this article indicates, for each comparison, whether the difference was in the desired direction (green) or not (red). In summary, the graphical representation of the comparisons performed in MPD makes it easier to use visual analysis to validate and help interpret the information obtained.

Limitations in relation to the alternatives for extrapolating linear baseline trend for forecasting

In the present study, we discussed extrapolating linear trends because the MPD, our focal analytical technique, fits a straight line to the baseline data before extrapolating it. Nevertheless, it would be possible to fit a nonlinear (e.g., logistic) model to the baseline data (Shadish et al., 2016 ). Furthermore, there are many other alternative procedures for estimating and extrapolating trend, especially in the context of time-series analysis.

Among univariate time-series procedures for forecasting, Chatfield ( 2000 ) distinguished between formal statistical models, that is, mathematical representations of reality (e.g., ARIMA; state space; growth curve models, such as logistic and Gompertz; and nonlinear models, including artificial neural networks), and ad hoc methods, that is, formulas for computing forecasts. Among the ad hoc methods, the best-known and most frequently used options are exponential smoothing (which can be expressed within the framework of state space models; De Gooijer & Hyndman, 2006 ) and the related Holt linear-trend procedure, or the Holt–Winters procedure, which adds a seasonal component. As we mentioned previously, the idea of damping a trend is borrowed from the Holt linear-trend procedure, on the basis of the work of E. S. Gardner and McKenzie ( 1985 ).

Regarding ARIMA, according to the Box–Jenkins approach already introduced in the single-case designs context, the aim is to identify the best parsimonious model by means of three steps: model identification, parameter estimation, and diagnostic checking. An appropriate model would then be used for forecasting. The difficulties of correctly identifying the ARIMA model for single-case data, via the analysis of autocorrelations and partial autocorrelations, have been documented (Velicer & Harrop, 1983 ), which has led to proposals based on a reduced set of plausible models that avoid this initial step (Velicer & McDonald, 1984 ). The simulation evidence available for these models (Harrop & Velicer, 1985 ) refers to data series of 40 measurements (i.e., 20 per phase), which is more than might be expected from typical single-case baselines (almost half of the initial baselines contained four or fewer data points) or series lengths (a median of 20, according to the review by Shadish & Sullivan, 2011 , with most series containing fewer than 40 measurements). Moreover, to the best of our knowledge, the possibility of obtaining out-of-bounds predicted values has not been discussed, nor have tentative solutions been proposed for this issue.
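
For readers unfamiliar with what such forecasting would look like in practice, the following R sketch fits a simple AR(1) model to a made-up short baseline and obtains forecasts with base R functions; the baseline values and the choice of an AR(1) structure (i.e., skipping formal model identification) are illustrative assumptions, not a recommendation.

# Illustrative ARIMA(1,0,0) forecast from a short, made-up baseline
yA <- c(8, 7, 7, 6, 5, 5, 4, 4, 3, 3)     # hypothetical baseline data
fit <- arima(yA, order = c(1, 0, 0))       # AR(1) with a mean; model identification skipped
predict(fit, n.ahead = 5)$pred             # forecasts for the next 5 measurement occasions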

Holt’s ( 2004 ) linear-trend procedure is another option for forecasting that is available in textbooks (e.g., Mendenhall & Sincich, 2012 ), and therefore is potentially accessible to applied researchers. Holt’s model is an extension of simple exponential smoothing including a linear trend. This procedure can be extended further by including a damping parameter (E. S. Gardner & McKenzie, 1985 ) that indicates how much the slope of the trend is reduced in subsequent time periods. The latter model is called the additive damped trend model , and according to the review by E. S. Gardner ( 2006 ), it is the model of choice when using exponential smoothing. The main issue with the additive damped trend model is that it requires estimating three parameters—one smoothing parameter for the level, one smoothing parameter for the trend, and the damping parameter—and it is also recommended to estimate the initial level and trend via optimization. It is unclear whether reliable estimates can be obtained with the usually short baseline phases in single-case data. We performed a small-scale check using the R code by Hyndman and Athanasopoulos ( 2013 , chap. 7.4). For instance, for the Ciullo et al. ( 2015 ) data with n A  ≤ 4 and the multiple-baseline data by Eilers and Hayes ( 2015 ) with n A equal to 3, 5, and 8, the number of measurements was not sufficient to estimate the damping parameter, and thus only a linear trend was extrapolated. The same was the case for the Allen et al. ( 2015 ) data for n A  = 5 and 9, whereas for n A  = 16, it was possible to use the additive damped trend model. Our check suggested that the minimum baseline length required for applying the additive damped trend model is 10, which is greater than (a) the value found in at least 50% of the data sets reviewed by Shadish and Sullivan ( 2011 ); (b) the modal value of six baseline data points reported in Smith’s ( 2012 ) review; and (c) the average baseline length in the Solomon ( 2014 ) review.
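
For readers who wish to reproduce this kind of check, the forecast package in R provides the holt() function with a damped-trend option. The sketch below is only an illustration of the procedure with a made-up baseline of ten points; it is not the exact code we ran, and the data values are assumptions.

# Illustrative check of the additive damped trend model on a short baseline
# (requires the forecast package; data values are made up)
library(forecast)
yA <- c(8, 7, 7, 6, 5, 5, 4, 4, 3, 3)          # hypothetical baseline of 10 points
fit_damped <- holt(yA, damped = TRUE, h = 5)    # damped linear trend, 5 forecasts
fit_linear <- holt(yA, damped = FALSE, h = 5)   # undamped linear trend, for comparison
summary(fit_damped)                             # shows the estimated damping parameter phi
plot(fit_damped)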

Therefore, the reader should be aware that there are alternatives for estimating and extrapolating trend for forecasting. However, to the best of our knowledge, none of these alternatives is directly applicable to single-case data without any issues, or without the need to explore which model or method is more appropriate, and in which circumstances, questions that do not have clear answers even for the usually longer time-series data (Chatfield, 2000 ).

Future research

One line of future research could be to test the proposals via a broader simulation, such as one applying them to different analytical techniques: for instance, to the MPD, to δ AB as computed in the context of regression analysis, and to the percentage of data points exceeding the median trend. Another line of research could focus on a comparison between the version of MPD incorporating the proposals and the recently developed generalized logistic model of Verboon and Peters ( 2018 ). Such a comparison could entail a field test and a survey among applied researchers on the perceived ease of use and the utility of the information provided.

Author note

The authors thank Patrick Onghena for his feedback on previous versions of this article.

In contrast, in the meta-analysis by Chiu and Roberts ( 2018 ), for outcomes for which there was no true maximum, the largest value actually obtained was treated as a maximum, before converting the values into percentages. If we had followed the same procedure, we would have found a greater frequency of impossibly high forecasts.

The references in this paragraph correspond to the studies included in the review and are available in Appendix A at our Open Science Framework site: https://osf.io/js3hk/ .

Note that Tarlow ( 2017 ) identified several issues with Tau-U and proposed the “baseline-corrected Tau,” which, however, corrects the data using the linear trend as estimated with the Theil–Sen estimator, and thus implicitly assumes that a straight line is a good representation of the baseline data.

It could be argued that having three different ways of defining the intercept available (i.e., in the Shiny application) may prompt applied researchers to choose the definition that favors their hypotheses or expectations. Nevertheless, we advocate using the definition of the intercept that provides a better fit to the data, both visually and quantitatively, as assessed via the MASE.

Following Tate and Perdices ( 2018 ), we use the term “tier” to refer to each AB comparison within a multiple-baseline design. Therefore, “tiers” could refer to different individuals, if the multiple-baseline design entails a staggered replication across participants, or to different behaviors or settings, if there is replication across behaviors or settings. Additionally, the term “tier” enables us to avoid confusion with the term “baseline,” which denotes only the A phase of the AB comparison.

Allen, K. D., Vatland, C., Bowen, S. L., & Burke, R. V. (2015). Parent-produced video self-modeling to improve independence in an adolescent with intellectual developmental disorder and an autism spectrum disorder: A controlled case study. Behavior Modification , 39 , 542–556.

Allison, D. B., & Gorman, B. S. (1993). Calculating effect sizes for meta-analysis: The case of the single case. Behaviour Research and Therapy , 31 , 621−631.

Arnau, J., & Bono, R. (1998). Short time series analysis: C statistic vs. Edgington model. Quality & Quantity , 32 , 63–75.

Austin, J. E., & Tiger, J. H. (2015). Providing alternative reinforcers to facilitate tolerance to delayed reinforcement following functional communication training. Journal of Applied Behavior Analysis , 48 , 663−668.

Baek, E. K., Petit-Bois, M., Van Den Noortgate, W., Beretvas, S. N., & Ferron, J. M. (2016). Using visual analysis to evaluate and refine multilevel models of single-case studies. Journal of Special Education , 50 , 18–26.

Billah, B., King, M. L., Snyder, R. D., & Koehler, A. B. (2006). Exponential smoothing model selection for forecasting. International Journal of Forecasting , 22 , 239–247.

Brandt, J. A. A., Dozier, C. L., Juanico, J. F., Laudont, C. L., & Mick, B. R. (2015). The value of choice as a reinforcer for typically developing children. Journal of Applied Behavior Analysis , 48 , 344−362.

Bringmann, L. F., Hamaker, E. L., Vigo, D. E., Aubert, A., Borsboom, D., & Tuerlinckx, F. (2017). Changing dynamics: Time-varying autoregressive models using generalized additive modeling. Psychological Methods , 22 , 409–425. https://doi.org/10.1037/met0000085

Campbell, J. M., & Herzinger, C. V. (2010). Statistics and single subject research methodology. In D. L. Gast (Ed.), Single subject research methodology in behavioral sciences (pp. 417–453). London: Routledge.

Cannella-Malone, H. I., Sabielny, L. M., & Tullis, C. A. (2015). Using eye gaze to identify reinforcers for individuals with severe multiple disabilities. Journal of Applied Behavior Analysis , 48 , 680–684. https://doi.org/10.1002/jaba.231

Center, B. A., Skiba, R. J., & Casey, A. (1985–1986). A methodology for the quantitative synthesis of intra-subject design research. Journal of Special Education , 19 , 387–400.

Chatfield, C. (2000). Time-series forecasting. London: Chapman & Hall/CRC.

Cheng, Y., Huang, C. L., & Yang, C. S. (2015). Using a 3D immersive virtual environment system to enhance social understanding and social skills for children with autism spectrum disorders. Focus on Autism and Other Developmental Disabilities , 30 , 222−236.

Chiu, M. M., & Roberts, C. A. (2018). Improved analyses of single cases: Dynamic multilevel analysis. Developmental Neurorehabilitation , 21 , 253–265.

Ciullo, S., Falcomata, T. S., Pfannenstiel, K., & Billingsley, G. (2015). Improving learning with science and social studies text using computer-based concept maps for students with disabilities. Behavior Modification , 39 , 117–135.

De Gooijer, J. G., & Hyndman, R. J. (2006). 25 years of time series forecasting. International Journal of Forecasting , 22 , 443–473.

Eilers, H. J., & Hayes, S. C. (2015). Exposure and response prevention therapy with cognitive defusion exercises to reduce repetitive and restrictive behaviors displayed by children with autism spectrum disorder. Research in Autism Spectrum Disorders , 19 , 18–31.

Fahmie, T. A., Iwata, B. A., & Jann, K. E. (2015). Comparison of edible and leisure reinforcers. Journal of Applied Behavior Analysis , 48 , 331−343.

Faith, M. S., Allison, D. B., & Gorman, D. B. (1997). Meta-analysis of single-case research. In R. D. Franklin, D. B. Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 245–277). Mahwah: Erlbaum.

Ferron, J. M., Bell, B. A., Hess, M. R., Rendina-Gobioff, G., & Hibbard, S. T. (2009). Making treatment effect inferences from multiple-baseline data: The utility of multilevel modeling approaches. Behavior Research Methods , 41 , 372–384. https://doi.org/10.3758/BRM.41.2.372

Fisher, W. W., Kelley, M. E., & Lomas, J. E. (2003). Visual aids and structured criteria for improving visual inspection and interpretation of single-case designs. Journal of Applied Behavior Analysis , 36 , 387–406.

Fiske, K. E., Isenhower, R. W., Bamond, M. J., Delmolino, L., Sloman, K. N., & LaRue, R. H. (2015). Assessing the value of token reinforcement for individuals with autism. Journal of Applied Behavior Analysis , 48 , 448−453.

Fox, J. (2016). Applied regression analysis and generalized linear models (3rd ed.). London: Sage.

Gage, N. A., & Lewis, T. J. (2013). Analysis of effect for single-case design research. Journal of Applied Sport Psychology , 25 , 46–60.

Gardner, E. S., Jr. (2006). Exponential smoothing: The state of the art—Part II. International Journal of Forecasting , 22 , 637–666.

Gardner, E. S., Jr., & McKenzie, E. (1985). Forecasting trends in time series. Management Science , 31 , 1237–1246.

Gardner, S. J., & Wolfe, P. S. (2015). Teaching students with developmental disabilities daily living skills using point-of-view modeling plus video prompting with error correction. Focus on Autism and Other Developmental Disabilities , 30 , 195−207.

Harrington, M., & Velicer, W. F. (2015). Comparing visual and statistical analysis in single-case studies using published studies. Multivariate Behavioral Research , 50 , 162–183.

Harrop, J. W., & Velicer, W. F. (1985). A comparison of alternative approaches to the analysis of interrupted time-series. Multivariate Behavioral Research , 20 , 27–44.

Hine, J. F., Ardoin, S. P., & Foster, T. E. (2015). Decreasing transition times in elementary school classrooms: Using computer-assisted instruction to automate intervention components. Journal of Applied Behavior Analysis , 48 , 495–510. https://doi.org/10.1002/jaba.233

Holt, C. C. (2004). Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting , 20 , 5–10.

Horner, R. H., Swaminathan, H., Sugai, G., & Smolkowski, K. (2012). Considerations for the systematic analysis and use of single-case research. Education and Treatment of Children , 35 , 269–290.

Houle, T. T. (2009). Statistical analyses for single-case experimental designs. In D. H. Barlow, M. K. Nock, & M. Hersen (Eds.), Single case experimental designs: Strategies for studying behavior change (3rd ed., pp. 271–305). Boston: Pearson.

Huitema, B. E., McKean, J. W., & McKnight, S. (1999). Autocorrelation effects on least-squares intervention analysis of short time series. Educational and Psychological Measurement , 59 , 767–786.

Hyndman, R. J., & Athanasopoulos, G. (2013). Forecasting: Principles and practice. Retrieved March 24, 2018, from https://www.otexts.org/fpp/7/4

Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting , 22 , 679–688.

Knight, V. F., Wood, C. L., Spooner, F., Browder, D. M., & O’Brien, C. P. (2015). An exploratory study using science eTexts with students with Autism Spectrum Disorder. Focus on Autism and Other Developmental Disabilities , 30 , 86−99.

Kratochwill, T. R., Hitchcock, J. H., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2010). Single case designs technical documentation . In the What Works Clearinghouse: Procedures and standards handbook (Version 1.0). Available at http://ies.ed.gov/ncee/wwc/pdf/reference_resources/wwc_scd.pdf

Kratochwill, T. R., Levin, J. R., Horner, R. H., & Swoboda, C. M. (2014). Visual analysis of single-case intervention research: Conceptual and methodological issues. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case intervention research: Methodological and statistical advances (pp. 91–125). Washington, DC: American Psychological Association.

Lane, J. D., & Gast, D. L. (2014). Visual analysis in single case experimental design studies: Brief review and guidelines. Neuropsychological Rehabilitation , 24 , 445–463.

Ledbetter-Cho, K., Lang, R., Davenport, K., Moore, M., Lee, A., Howell, A., . . . O’Reilly, M. (2015). Effects of script training on the peer-to-peer communication of children with autism spectrum disorder. Journal of Applied Behavior Analysis , 48 , 785−799.

Manolov, R. (2018). Linear trend in single-case visual and quantitative analyses. Behavior Modification , 42 , 684–706.

Manolov, R., Gast, D. L., Perdices, M., & Evans, J. J. (2014). Single-case experimental designs: Reflections on conduct and analysis. Neuropsychological Rehabilitation , 24 , 634−660. https://doi.org/10.1080/09602011.2014.903199

Manolov, R., & Moeyaert, M. (2017). Recommendations for choosing single-case data analytical techniques. Behavior Therapy , 48 , 97−114.

Manolov, R., & Rochat, L. (2015). Further developments in summarising and meta-analysing single-case data: An illustration with neurobehavioural interventions in acquired brain injury. Neuropsychological Rehabilitation , 25 , 637−662.

Manolov, R., & Solanas, A. (2009). Percentage of nonoverlapping corrected data. Behavior Research Methods , 41 , 1262–1271. https://doi.org/10.3758/BRM.41.4.1262

Manolov, R., & Solanas, A. (2013). A comparison of mean phase difference and generalized least squares for analyzing single-case data. Journal of School Psychology , 51 , 201−215.

Marso, D., & Shadish, W. R. (2015). Software for meta-analysis of single-case design: DHPS macro . Retrieved January 22, 2017, from http://faculty.ucmerced.edu/wshadish/software/software-meta-analysis-single-case-design

Matyas, T. A., & Greenwood, K. M. (1997). Serial dependency in single-case time series. In R. D. Franklin, D. B. Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 215–243). Mahwah: Erlbaum.

Mendenhall, W., & Sincich, T. (2012). A second course in statistics: Regression analysis (7th ed.). Boston: Prentice Hall.

Mercer, S. H., & Sterling, H. E. (2012). The impact of baseline trend control on visual analysis of single-case data. Journal of School Psychology , 50 , 403–419.

Parker, R. I., Cryer, J., & Byrns, G. (2006). Controlling baseline trend in single-case research. School Psychology Quarterly , 21 , 418−443.

Parker, R. I., & Vannest, K. (2009). An improved effect size for single-case research: Nonoverlap of all pairs. Behavior Therapy , 40 , 357–367. https://doi.org/10.1016/j.beth.2008.10.006

Parker, R. I., Vannest, K. J., Davis, J. L., & Sauber, S. B. (2011). Combining nonoverlap and trend for single-case research: Tau-U. Behavior Therapy , 42 , 284−299. https://doi.org/10.1016/j.beth.2010.08.006

Pustejovsky, J. E. (2015). Measurement-comparable effect sizes for single-case studies of free-operant behavior. Psychological Methods , 20 , 342−359.

Pustejovsky, J. E. (2018a). Procedural sensitivities of effect sizes for single-case designs with directly observed behavioral outcome measures. Psychological Methods . Advance online publication. https://doi.org/10.1037/met0000179

Pustejovsky, J. E. (2018b). Using response ratios for meta-analyzing single-case designs with behavioral outcomes. Journal of School Psychology , 68 , 99–112.

Pustejovsky, J. E., Hedges, L. V., & Shadish, W. R. (2014). Design-comparable effect sizes in multiple baseline designs: A general modeling framework. Journal of Educational and Behavioral Statistics , 39 , 368–393.

Rindskopf, D. M., & Ferron, J. M. (2014). Using multilevel models to analyze single-case design data. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case intervention research: Methodological and statistical advances (pp. 221−246). Washington, DC: American Psychological Association.

Rispoli, M., Ninci, J., Burke, M. D., Zaini, S., Hatton, H., & Sanchez, L. (2015). Evaluating the accuracy of results for teacher implemented trial-based functional analyses. Behavior Modification , 39 , 627−653.

Rogosa, D. (1980). Comparing nonparallel regression lines. Psychological Bulletin , 88 , 307–321. https://doi.org/10.1037/0033-2909.88.2.307

Saini, V., Greer, B. D., & Fisher, W. W. (2015). Clarifying inconclusive functional analysis results: Assessment and treatment of automatically reinforced aggression. Journal of Applied Behavior Analysis , 48 , 315–330. https://doi.org/10.1002/jaba.203

Scotti, J. R., Evans, I. M., Meyer, L. H., & Walker, P. (1991). A meta-analysis of intervention research with problem behavior: Treatment validity and standards of practice. American Journal on Mental Retardation , 96 , 233–256.

Scruggs, T. E., & Mastropieri, M. A. (1998). Summarizing single-subject research: Issues and applications. Behavior Modification , 22 , 221–242.

Shadish, W. R., Hedges, L. V., & Pustejovsky, J. E. (2014). Analysis and meta-analysis of single-case designs with a standardized mean difference statistic: A primer and applications. Journal of School Psychology , 52 , 123–147.

Shadish, W. R., Kyse, E. N., & Rindskopf, D. M. (2013). Analyzing data from single-case designs using multilevel models: New applications and some agenda items for future research. Psychological Methods , 18 , 385–405. https://doi.org/10.1037/a0032964

Shadish, W. R., Rindskopf, D. M., & Boyajian, J. G. (2016). Single-case experimental design yielded an effect estimate corresponding to a randomized controlled trial. Journal of Clinical Epidemiology , 76 , 82–88.

Shadish, W. R., Rindskopf, D. M., Hedges, L. V., & Sullivan, K. J. (2013). Bayesian estimates of autocorrelations in single-case designs. Behavior Research Methods , 45 , 813–821.

Shadish, W. R., & Sullivan, K. J. (2011). Characteristics of single-case designs used to assess intervention effects in 2008. Behavior Research Methods , 43 , 971−980. https://doi.org/10.3758/s13428-011-0111-y

Siegel, E. B., & Lien, S. E. (2015). Using photographs of contrasting contextual complexity to support classroom transitions for children with Autism Spectrum Disorders. Focus on Autism and Other Developmental Disabilities , 30 , 100−114.

Smith, J. D. (2012). Single-case experimental designs: A systematic review of published research and current standards. Psychological Methods , 17 , 510–550. https://doi.org/10.1037/a0029312

Solanas, A., Manolov, R., & Onghena, P. (2010). Estimating slope and level change in N = 1 designs. Behavior Modification , 34 , 195−218.

Solomon, B. G. (2014). Violations of assumptions in school-based single-case data: Implications for the selection and interpretation of effect sizes. Behavior Modification , 38 , 477−496.

Stewart, K. K., Carr, J. E., Brandt, C. W., & McHenry, M. M. (2007). An evaluation of the conservative dual-criterion method for teaching university students to visually inspect AB-design graphs. Journal of Applied Behavior Analysis , 40 , 713−718.

Sullivan, K. J., Shadish, W. R., & Steiner, P. M. (2015). An introduction to modeling longitudinal data with generalized additive models: Applications to single-case designs. Psychological Methods , 20 , 26−42. https://doi.org/10.1037/met0000020

Swaminathan, H., Rogers, H. J., Horner, R., Sugai, G., & Smolkowski, K. (2014). Regression models for the analysis of single case designs. Neuropsychological Rehabilitation , 24 , 554−571.

Swan, D. M., & Pustejovsky, J. E. (2018). A gradual effects model for single-case designs. Multivariate Behavioral Research , 53 , 574–593. https://doi.org/10.1080/00273171.2018.1466681

Tarlow, K. (2017). An improved rank correlation effect size statistic for single-case designs: Baseline corrected Tau. Behavior Modification , 41 , 427–467.

Tate, R. L., & Perdices, M. (2018). Single-case experimental designs for clinical research and neurorehabilitation settings: Planning, conduct, analysis and reporting. London: Routledge.

Tate, R. L., Perdices, M., Rosenkoetter, U., Wakima, D., Godbee, K., Togher, L., & McDonald, S. (2013). Revision of a method quality rating scale for single-case experimental designs and n -of-1 trials: The 15-item Risk of Bias in N -of-1 Trials (RoBiNT) Scale. Neuropsychological Rehabilitation , 23 , 619–638. https://doi.org/10.1080/09602011.2013.824383

Van den Noortgate, W., & Onghena, P. (2008). A multilevel meta-analysis of single-subject experimental design studies. Evidence-Based Communication Assessment and Intervention , 2 , 142–151.

Vannest, K. J., Parker, R. I., Davis, J. L., Soares, D. A., & Smith, S. L. (2012). The Theil–Sen slope for high-stakes decisions from progress monitoring. Behavioral Disorders , 37 , 271–280.

Velicer, W. F., & Harrop, J. (1983). The reliability and accuracy of time series model identification. Evaluation Review , 7 , 551–560.

Velicer, W. F., & McDonald, R. P. (1984). Time series analysis without model identification. Multivariate Behavioral Research , 19 , 33–47.

Verboon, P., & Peters, G. J. (2018). Applying the generalized logistic model in single case designs: Modeling treatment-induced shifts. Behavior Modification . Advance online publication. https://doi.org/10.1177/0145445518791255

White, D. M., Rusch, F. R., Kazdin, A. E., & Hartmann, D. P. (1989). Applications of meta-analysis in individual subject research. Behavioral Assessment , 11 , 281–296.

Wicherts, J. M., Veldkamp, C. L., Augusteijn, H. E., Bakker, M., van Aert, R. C., & Van Assen, M. A. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p -hacking. Frontiers in Psychology , 7 , 1832. https://doi.org/10.3389/fpsyg.2016.01832

Wolery, M., Busick, M., Reichow, B., & Barton, E. E. (2010). Comparison of overlap methods for quantitatively synthesizing single-subject data. Journal of Special Education , 44 , 18–29.

Wolfe, K., & Slocum, T. A. (2015). A comparison of two approaches to training visual analysis of AB graphs. Journal of Applied Behavior Analysis , 48 , 472–477. https://doi.org/10.1002/jaba.212

Young, N. D., & Daly, E. J., III. (2016). An evaluation of prompting and reinforcement for training visual analysis skills. Journal of Behavioral Education , 25 , 95–119.

Author information

Authors and affiliations

Department of Social Psychology and Quantitative Psychology, Faculty of Psychology, University of Barcelona, Barcelona, Spain

Rumen Manolov & Antonio Solanas

Department of Operations, Innovation and Data Sciences, ESADE Business School, Ramon Llull University, Barcelona, Spain

Rumen Manolov & Vicenta Sierra

Corresponding author

Correspondence to Rumen Manolov.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: References to the studies included in the present review of single-case research published in 2015 in four journals: Journal of Applied Behavior Analysis , Behavior Modification , Research in Autism Spectrum Disorders , and Focus on Autism and Other Developmental Disabilities .

Allen, K. D., Vatland, C., Bowen, S. L., & Burke, R. V. (2015). An evaluation of parent-produced video self-modeling to improve independence in an adolescent with intellectual developmental disorder and an autism spectrum disorder: A controlled case study. Behavior Modification , 39 , 542–556.

Austin, J. E., & Tiger, J. H. (2015). Providing alternative reinforcers to facilitate tolerance to delayed reinforcement following functional communication training. Journal of Applied Behavior Analysis , 48 , 663–668.

Austin, J. L., Groves, E. A., Reynish, L. C., & Francis, L. L. (2015). Validating trial-based functional analyses in mainstream primary school classrooms. Journal of Applied Behavior Analysis , 48 , 274–288.

Boudreau, B. A., Vladescu, J. C., Kodak, T. M., Argott, P. J., & Kisamore, A. N. (2015). A comparison of differential reinforcement procedures with children with autism. Journal of Applied Behavior Analysis , 48 , 918–923.

Brandt, J. A. A., Dozier, C. L., Juanico, J. F., Laudont, C. L., & Mick, B. R. (2015). The value of choice as a reinforcer for typically developing children. Journal of Applied Behavior Analysis , 48 , 344–362.

Cannella-Malone, H. I., Sabielny, L. M., & Tullis, C. A. (2015). Using eye gaze to identify reinforcers for individuals with severe multiple disabilities. Journal of Applied Behavior Analysis , 48 , 680–684.

Carroll, R. A., Joachim, B. T., St Peter, C. C., & Robinson, N. (2015). A comparison of error-correction procedures on skill acquisition during discrete-trial instruction. Journal of Applied Behavior Analysis , 48 , 257–273.

Cheng, Y., Huang, C. L., & Yang, C. S. (2015). Using a 3D immersive virtual environment system to enhance social understanding and social skills for children with autism spectrum disorders. Focus on Autism and Other Developmental Disabilities , 30 , 222–236.

Ciccone, F. J., Graff, R. B., & Ahearn, W. H. (2015). Increasing the efficiency of paired-stimulus preference assessments by identifying categories of preference. Journal of Applied Behavior Analysis , 48 , 221–226.

Ciullo, S., Falcomata, T. S., Pfannenstiel, K., & Billingsley, G. (2014). Improving learning with science and social studies text using computer-based concept maps for students with disabilities. Behavior Modification , 39 , 117–135.

Daar, J. H., Negrelli, S., & Dixon, M. R. (2015). Derived emergence of WH question–answers in children with autism. Research in Autism Spectrum Disorders , 19 , 59–71.

DeQuinzio, J. A., & Taylor, B. A. (2015). Teaching children with autism to discriminate the reinforced and nonreinforced responses of others: Implications for observational learning. Journal of Applied Behavior Analysis , 48 , 38–51.

Derosa, N. M., Fisher, W. W., & Steege, M. W. (2015). An evaluation of time in establishing operation on the effectiveness of functional communication training. Journal of Applied Behavior Analysis , 48 , 115–130.

Ditzian, K., Wilder, D. A., King, A., & Tanz, J. (2015). An evaluation of the performance diagnostic checklist–human services to assess an employee performance problem in a center-based autism treatment facility. Journal of Applied Behavior Analysis , 48 , 199–203.

Donaldson, J. M., Wiskow, K. M., & Soto, P. L. (2015). Immediate and distal effects of the good behavior game. Journal of Applied Behavior Analysis , 48 , 685–689.

Downs, H. E., Miltenberger, R., Biedronski, J., & Witherspoon, L. (2015). The effects of video self-evaluation on skill acquisition with yoga postures. Journal of Applied Behavior Analysis , 48 , 930–935.

Dupuis, D. L., Lerman, D. C., Tsami, L., & Shireman, M. L. (2015). Reduction of aggression evoked by sounds using noncontingent reinforcement and time-out. Journal of Applied Behavior Analysis , 48 , 669–674.

Engstrom, E., Mudford, O. C., & Brand, D. (2015). Replication and extension of a check-in procedure to increase activity engagement among people with severe dementia. Journal of Applied Behavior Analysis , 48 , 460–465.

Fahmie, T. A., Iwata, B. A., & Jann, K. E. (2015). Comparison of edible and leisure reinforcers. Journal of Applied Behavior Analysis , 48 , 331–343.

Fichtner, C. S., & Tiger, J. H. (2015). Teaching discriminated social approaches to individuals with Angelman syndrome. Journal of Applied Behavior Analysis , 48 , 734–748.

Fisher, W. W., Greer, B. D., Fuhrman, A. M., & Querim, A. C. (2015). Using multiple schedules during functional communication training to promote rapid transfer of treatment effects. Journal of Applied Behavior Analysis , 48 , 713–733.

Fiske, K. E., Isenhower, R. W., Bamond, M. J., Delmolino, L., Sloman, K. N., & LaRue, R. H. (2015). Assessing the value of token reinforcement for individuals with autism. Journal of Applied Behavior Analysis , 48 , 448–453.

Fox, A. E., & Belding, D. L. (2015). Reducing pawing in horses using positive reinforcement. Journal of Applied Behavior Analysis , 48 , 936–940.

Frewing, T. M., Rapp, J. T., & Pastrana, S. J. (2015). Using conditional percentages during free-operant stimulus preference assessments to predict the effects of preferred items on stereotypy preliminary findings. Behavior Modification , 39 , 740–765.

Fu, S. B., Penrod, B., Fernand, J. K., Whelan, C. M., Griffith, K., & Medved, S. (2015). The effects of modeling contingencies in the treatment of food selectivity in children with autism. Behavior Modification , 39 , 771–784.

Gardner, S. J., & Wolfe, P. S. (2014). Teaching students with developmental disabilities daily living skills using point-of-view modeling plus video prompting with error correction. Focus on Autism and Other Developmental Disabilities , 30 , 195–207.

Gilroy, S. P., Lorah, E. R., Dodge, J., & Fiorello, C. (2015). Establishing deictic repertoires in autism. Research in Autism Spectrum Disorders , 19 , 82–92.

Groskreutz, M. P., Peters, A., Groskreutz, N. C., & Higbee, T. S. (2015). Increasing play-based commenting in children with autism spectrum disorder using a novel script-frame procedure. Journal of Applied Behavior Analysis , 48 , 442–447.

Haq, S. S., & Kodak, T. (2015). Evaluating the effects of massed and distributed practice on acquisition and maintenance of tacts and textual behavior with typically developing children. Journal of Applied Behavior Analysis , 48 , 85–95.

Hayes, L. B., & Van Camp, C. M. (2015). Increasing physical activity of children during school recess. Journal of Applied Behavior Analysis , 48 , 690–695.

Hine, J. F., Ardoin, S. P., & Foster, T. E. (2015). Decreasing transition times in elementary school classrooms: Using computer-assisted instruction to automate intervention components. Journal of Applied Behavior Analysis , 48 , 495–510.

Kelley, M. E., Liddon, C. J., Ribeiro, A., Greif, A. E., & Podlesnik, C. A. (2015). Basic and translational evaluation of renewal of operant responding. Journal of Applied Behavior Analysis , 48 , 390–401.

Kodak, T., Clements, A., Paden, A. R., LeBlanc, B., Mintz, J., & Toussaint, K. A. (2015). Examination of the relation between an assessment of skills and performance on auditory–visual conditional discriminations for children with autism spectrum disorder. Journal of Applied Behavior Analysis , 48 , 52–70.

Knight, V. F., Wood, C. L., Spooner, F., Browder, D. M., & O’Brien, C. P. (2015). An exploratory study using science eTexts with students with Autism Spectrum Disorder. Focus on Autism and Other Developmental Disabilities , 30 , 86–99.

Kuhl, S., Rudrud, E. H., Witts, B. N., & Schulze, K. A. (2015). Classroom-based interdependent group contingencies increase children’s physical activity. Journal of Applied Behavior Analysis , 48 , 602–612.

Lambert, A. M., Tingstrom, D. H., Sterling, H. E., Dufrene, B. A., & Lynne, S. (2015). Effects of tootling on classwide disruptive and appropriate behavior of upper-elementary students. Behavior Modification , 39 , 413–430.

Lambert, J. M., Bloom, S. E., Samaha, A. L., Dayton, E., & Rodewald, A. M. (2015). Serial alternative response training as intervention for target response resurgence. Journal of Applied Behavior Analysis , 48 , 765–780.

Ledbetter-Cho, K., Lang, R., Davenport, K., Moore, M., Lee, A., Howell, A., . . . O’Reilly, M. (2015). Effects of script training on the peer-to-peer communication of children with autism spectrum disorder. Journal of Applied Behavior Analysis , 48 , 785–799.

Lee, G. P., Miguel, C. F., Darcey, E. K., & Jennings, A. M. (2015). A further evaluation of the effects of listener training on derived categorization and speaker behavior in children with autism. Research in Autism Spectrum Disorders , 19 , 72–81.

Lerman, D. C., Hawkins, L., Hillman, C., Shireman, M., & Nissen, M. A. (2015). Adults with autism spectrum disorder as behavior technicians for young children with autism: Outcomes of a behavioral skills training program. Journal of Applied Behavior Analysis , 48 , 233–256.

Mechling, L. C., Ayres, K. M., Foster, A. L., & Bryant, K. J. (2014). Evaluation of generalized performance across materials when using video technology by students with autism spectrum disorder and moderate intellectual disability. Focus on Autism and Other Developmental Disabilities , 30 , 208–221.

Miller, S. A., Rodriguez, N. M., & Rourke, A. J. (2015). Do mirrors facilitate acquisition of motor imitation in children diagnosed with autism? Journal of Applied Behavior Analysis , 48 , 194–198.

Mitteer, D. R., Romani, P. W., Greer, B. D., & Fisher, W. W. (2015). Assessment and treatment of pica and destruction of holiday decorations. Journal of Applied Behavior Analysis , 48 , 912–917.

Neely, L., Rispoli, M., Gerow, S., & Ninci, J. (2014). Effects of antecedent exercise on academic engagement and stereotypy during instruction. Behavior Modification , 39 , 98–116.

O’Handley, R. D., Radley, K. C., & Whipple, H. M. (2015). The relative effects of social stories and video modeling toward increasing eye contact of adolescents with autism spectrum disorder. Research in Autism Spectrum Disorders , 11 , 101–111.

Paden, A. R., & Kodak, T. (2015). The effects of reinforcement magnitude on skill acquisition for children with autism. Journal of Applied Behavior Analysis , 48 , 924–929.

Pence, S. T., & St Peter, C. C. (2015). Evaluation of treatment integrity errors on mand acquisition. Journal of Applied Behavior Analysis , 48 , 575–589. https://doi.org/10.1002/jaba.238

Peters, L. C., & Thompson, R. H. (2015). Teaching children with autism to respond to conversation partners’ interest. Journal of Applied Behavior Analysis , 48 , 544–562.

Peterson, K. M., Volkert, V. M., & Zeleny, J. R. (2015). Increasing self-drinking for children with feeding disorders. Journal of Applied Behavior Analysis , 48 , 436–441.

Protopopova, A., & Wynne, C. D. (2015). Improving in-kennel presentation of shelter dogs through response-dependent and response-independent treat delivery. Journal of Applied Behavior Analysis , 48 , 590–601.

Putnam, B. C., & Tiger, J. H. (2015). Teaching braille letters, numerals, punctuation, and contractions to sighted individuals. Journal of Applied Behavior Analysis , 48 , 466–471.

Quinn, M. J., Miltenberger, R. G., & Fogel, V. A. (2015). Using TAGteach to improve the proficiency of dance movements. Journal of Applied Behavior Analysis , 48 , 11–24.

Rispoli, M., Ninci, J., Burke, M. D., Zaini, S., Hatton, H., & Sanchez, L. (2015). Evaluating the accuracy of results for teacher implemented trial-based functional analyses. Behavior Modification , 39 , 627–653.

Rosales, R., Gongola, L., & Homlitas, C. (2015). An evaluation of video modeling with embedded instructions to teach implementation of stimulus preference assessments. Journal of Applied Behavior Analysis , 48 , 209–214.

Saini, V., Greer, B. D., & Fisher, W. W. (2015). Clarifying inconclusive functional analysis results: Assessment and treatment of automatically reinforced aggression. Journal of Applied Behavior Analysis , 48 , 315–330.

Saini, V., Gregory, M. K., Uran, K. J., & Fantetti, M. A. (2015). Parametric analysis of response interruption and redirection as treatment for stereotypy. Journal of Applied Behavior Analysis , 48 , 96–106.

Scalzo, R., Henry, K., Davis, T. N., Amos, K., Zoch, T., Turchan, S., & Wagner, T. (2015). Evaluation of interventions to reduce multiply controlled vocal stereotypy. Behavior Modification , 39 , 496–509.

Siegel, E. B., & Lien, S. E. (2014). Using photographs of contrasting contextual complexity to support classroom transitions for children with Autism Spectrum Disorders. Focus on Autism and Other Developmental Disabilities , 30 , 100–114.

Slocum, S. K., & Vollmer, T. R. (2015). A comparison of positive and negative reinforcement for compliance to treat problem behavior maintained by escape. Journal of Applied Behavior Analysis , 48 , 563–574.

Smith, K. A., Shepley, S. B., Alexander, J. L., Davis, A., & Ayres, K. M. (2015). Self-instruction using mobile technology to learn functional skills. Research in Autism Spectrum Disorders , 11 , 93–100.

Sniezyk, C. J., & Zane, T. L. (2014). Investigating the effects of sensory integration therapy in decreasing stereotypy. Focus on Autism and Other Developmental Disabilities , 30 , 13–22.

Speelman, R. C., Whiting, S. W., & Dixon, M. R. (2015). Using behavioral skills training and video rehearsal to teach blackjack skills. Journal of Applied Behavior Analysis , 48 , 632–642.

Still, K., May, R. J., Rehfeldt, R. A., Whelan, R., & Dymond, S. (2015). Facilitating derived requesting skills with a touchscreen tablet computer for children with autism spectrum disorder. Research in Autism Spectrum Disorders , 19 , 44–58.

Vargo, K. K., & Ringdahl, J. E. (2015). An evaluation of resistance to change with unconditioned and conditioned reinforcers. Journal of Applied Behavior Analysis , 48 , 643–662.

Vedora, J., & Grandelski, K. (2015). A comparison of methods for teaching receptive language to toddlers with autism. Journal of Applied Behavior Analysis , 48 , 188–193.

Wilder, D. A., Majdalany, L., Sturkie, L., & Smeltz, L. (2015). Further evaluation of the high-probability instructional sequence with and without programmed reinforcement. Journal of Applied Behavior Analysis , 48 , 511–522.

Wunderlich, K. L., & Vollmer, T. R. (2015). Data analysis of response interruption and redirection as a treatment for vocal stereotypy. Journal of Applied Behavior Analysis , 48 , 749–764.

Appendix B: Versions of the mean phase difference

In the initial proposal (Manolov & Solanas, 2013 ), MPD.2013 entails the following steps:

Estimate baseline trend as the average of the differenced baseline phase data: \( {b}_{1(D)}=\frac{\sum_{i=2}^{n_A}\left({y}_i-{y}_{i-1}\right)}{n_A-1} \) .

Extrapolate baseline trend, adding the trend estimate ( b 1( D ) ) to the last baseline phase data point ( \( {y}_{n_A} \) ) to predict the first intervention phase data point ( \( {\widehat{y}}_{n_A+1} \) ). Formally, \( {\widehat{y}}_{n_A+1}={y}_{n_A}+{b}_{1(D)} \) . This entails that the intercept of the baseline trend line is \( {b}_{0(MPD.2013)}={y}_{n_A}-{n}_A\times {b}_{1(D)} \) .

Continue extrapolating adding the trend estimate to the previously obtained forecast. Formally, \( {\widehat{y}}_{n_A+j}={\widehat{y}}_{n_A+j-1}+{b}_{1(D)};j=2,3,\dots, {n}_B \) .

Obtain MPD as the difference between the actually obtained treatment data (y j ) and the treatment measurements as predicted from baseline trend ( \( {\widehat{y}}_j \) ): \( {MPD}_{2013}=\frac{\sum_{j=1}^{n_B}\left({y}_j-{\widehat{y}}_j\right)}{n_B} \) .
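
The steps above translate directly into a few lines of R. The following function is our own minimal sketch of MPD.2013 as described in these steps, not the authors' distributed code; the function name is an assumption.

# Minimal sketch of MPD.2013 (Manolov & Solanas, 2013), following the steps above
mpd_2013 <- function(y, n_A) {
  n_B  <- length(y) - n_A
  b1_D <- mean(diff(y[1:n_A]))             # slope: average of the differenced baseline data
  # extrapolate from the last baseline data point, adding the slope at each occasion
  y_hat <- y[n_A] + b1_D * (1:n_B)
  mean(y[(n_A + 1):(n_A + n_B)] - y_hat)   # mean difference: actual minus predicted
}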

In its modified version (Manolov & Rochat, 2015 ), MPD.2015 entails the following steps:

Estimate baseline trend as the average of the differenced baseline phase data: the same b 1( D ) previously defined.

Establish the pivotal point in the baseline at the crossing of Md ( x ) =  Md (1, 2, …,  n A ) on the abscissa and \( Md(y)= Md\left({y}_1,{y}_2,\dots, {y}_{n_A}\right) \) on the ordinate.

Establish a fitted value at an existing baseline measurement occasion around Md ( y ). Formally, \( {\widehat{y}}_{\left\lfloor Md(x)\right\rfloor }= Md(y)-\left( Md(x)-\left\lfloor Md(x)\right\rfloor \right)\times {b}_{1(D)} \) .

Fit the baseline trend to the whole baseline, subtracting and adding the estimated baseline slope from the fitted value obtained in the previous step, according to the measurement occasion.

Therefore, the intercept of the baseline trend line is defined as \( {b}_{0\left( MPD.2015\right)}={\widehat{y}}_{\left\lfloor Md(x)\right\rfloor }-\left\lfloor Md(x)\right\rfloor \times {b}_{1(D)} \) .

Extrapolate the baseline trend into the treatment phase, starting from the last fitted baseline value: \( {\widehat{y}}_{n_A+1}={\widehat{y}}_{n_A}+{b}_{1(D)} \) .

Continue extrapolating adding the trend estimate to the previously obtained forecast: \( {\widehat{y}}_{n_A+j}={\widehat{y}}_{n_A+j-1}+{b}_{1(D)};j=2,3,\dots, {n}_B \) .

Obtain MPD as the difference between the actually obtained treatment data and the treatment measurements as predicted from baseline trend: \( {MPD}_{2015}=\frac{\sum_{j=1}^{n_B}\left({y}_j-{\widehat{y}}_j\right)}{n_B} \) .

We propose a third way of defining the intercept, namely, in the same way as it is estimated in the Theil–Sen estimator, that is, as the median of the differences between the actual data points and the slope multiplied by the measurement occasion: \( {b}_{0(TS)}= Md\left({y}_i-{b}_{1(D)}\times i\right),i=1,2,\dots, {n}_A \) . Note that the slope is still estimated as in the original proposal (Manolov & Solanas, 2013 ).
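
For completeness, the three intercept definitions discussed in this appendix (MPD.2013, MPD.2015, and the Theil–Sen-type median intercept) can be written in R as follows. This is a sketch based on the formulas above, with the MPD.2015 intercept reconstructed from the description of the pivotal point; the function name is an assumption.

# Sketch of the three intercept definitions for the baseline trend line,
# given the slope b1_D = mean(diff(yA)) defined earlier in this appendix
intercepts <- function(yA) {
  nA   <- length(yA)
  b1_D <- mean(diff(yA))
  b0_2013 <- yA[nA] - nA * b1_D                 # MPD.2013: line through the last baseline point
  md_x <- median(1:nA); md_y <- median(yA)
  y_piv <- md_y - (md_x - floor(md_x)) * b1_D   # fitted value at occasion floor(Md(x))
  b0_2015 <- y_piv - floor(md_x) * b1_D         # MPD.2015: line through the pivotal point
  b0_TS <- median(yA - b1_D * (1:nA))           # Theil-Sen-type median intercept
  c(MPD.2013 = b0_2013, MPD.2015 = b0_2015, TS = b0_TS)
}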

About this article

Manolov, R., Solanas, A. & Sierra, V. Extrapolating baseline trend in single-case data: Problems and tentative solutions. Behav Res 51 , 2847–2869 (2019). https://doi.org/10.3758/s13428-018-1165-x

Published: 27 November 2018

Issue Date: December 2019

DOI: https://doi.org/10.3758/s13428-018-1165-x

  • Single-case designs
  • Extrapolation
  • Forecasting

Generalisation and extrapolation of study results

Philip Sedgwick, reader in medical statistics and medical education
Centre for Medical and Healthcare Education, St George’s, University of London, Tooting, London, UK
p.sedgwick{at}sgul.ac.uk

Researchers assessed the effectiveness of peritendinous autologous blood injections in patients with mid-portion Achilles tendinopathy. A randomised double-blind controlled trial was performed. The intervention consisted of two unguided peritendinous injections with 3 mL of the patient’s whole blood given one month apart. The control group had no substance injected (needling only). Participants in both groups carried out a standardised and monitored 12 week eccentric calf training programme. 1

In total, 53 adults (mean age 49 years, 53% men) were recruited from a sports medicine clinic in New Zealand. Inclusion criteria included age over 18 years and presentation with first episode of mid-portion Achilles tendinopathy. Symptoms had to be present for at least three months, with the diagnosis confirmed by diagnostic ultrasonography.

The primary outcome measure was change in symptoms and function from baseline to six months as assessed by the Victorian Institute of Sport Assessment-Achilles (VISA-A) score. Significant improvements in the VISA-A score were seen at six months in the intervention group (change in score 18.7, 95% confidence interval 12.3 to 25.1) and control group (19.9, 13.6 to 26.2). However, the overall effect of treatment …



Extrapolating Survival from Randomized Trials Using External Data: A Review of Methods


This article describes methods used to estimate parameters governing long-term survival, or times to other events, for health economic models. Specifically, the focus is on methods that combine shorter-term individual-level survival data from randomized trials with longer-term external data, thus using the longer-term data to aid extrapolation of the short-term data. This requires assumptions about how trends in survival for each treatment arm will continue after the follow-up period of the trial. Furthermore, using external data requires assumptions about how survival differs between the populations represented by the trial and external data. Study reports from a national health technology assessment program in the United Kingdom were searched, and the findings were combined with “pearl-growing” searches of the academic literature. We categorized the methods that have been used according to the assumptions they made about how the hazards of death vary between the external and internal data and through time, and we discuss the appropriateness of the assumptions in different circumstances. Modeling choices, parameter estimation, and characterization of uncertainty are discussed, and some suggestions for future research priorities in this area are given.

Models for health economic evaluation typically use observed data from randomized controlled trials (RCTs) comparing survival (or times to other events) between competing alternative interventions. However, the choice of intervention will often affect outcomes over a longer period than the follow-up time of the RCTs. Policy makers responsible for making funding decisions will then require estimates of expected survival for a longer period, and a lifetime horizon is often appropriate. 1 If the observed follow-up time covers a sufficiently large proportion of the overall survival time, then parametric models could be used to extrapolate the observed trends in the hazard of death for each treatment arm. This is the conventional approach to long-term survival estimation in health technology assessments, 2 but it assumes that the observed hazard trends will continue into the long term, which becomes less plausible as the unobserved period increases. The extent of uncertainty surrounding any extrapolation should also be quantified, 1 , 3 and this is difficult to determine from short-term data alone for the same reason.

In general, long-term survival can be reliably estimated only if there are long-term data, since the impact of long-term modeling assumptions on the decision can be substantial. 4 Since maximum follow-up in clinical trials is typically only 1 to 5 y, some external information is required. This could be taken from a disease registry, cohort or the general population, a formally elicited expert belief, or a combination of observed data and informal assumptions. Most simply, the external “information” could consist of a defensible clinical belief that the risks of death will continue in a particular way in the long term. The National Institute for Health and Care Excellence (NICE) for England and Wales 1 recommends that any extrapolation should be assessed by “both clinical and biological plausibility of the inferred outcome as well as its coherence with external data sources,” although it does not suggest specific methods to do this. A number of other national funding agencies have a similar requirement for long-term outcomes predictions. 5 This article discusses methods that have been applied to use external data explicitly to facilitate survival extrapolation, as well as their merits in different circumstances. Below we describe the scope and provide the terminology used throughout the article.

We consider situations where we have both of the following sources of data.

  • RCTs providing estimates of the relative treatment effect on survival for the patients of interest, with individual-level survival or censoring times available for at least 1 treatment arm (either directly or estimated from published Kaplan-Meier curves 6 ).
  • Information on longer-term survival from another source, describing a population with some characteristics (to be discussed later) in common with the patients of interest. After some adjustments, these data can be used to estimate the baseline long-term survival of the patients of interest. If any treatments are given, this is unrecorded, so these data give no information about intervention effects.

We assume the trial data are representative of the population for which the decision is required. In practice, however, given the selection criteria of trials, this will not always be strictly true, 7 – 9 which we will briefly discuss at the end of the article.

The data and extrapolation problem are illustrated by the hypothetical survival curves in Figure 1. Each of the 3 “observed” curves is a representative sample of survival from the populations labeled A, B, and C. The population of interest receiving a control intervention is labeled A, the population of interest receiving the intervention of interest is labeled B, and the external population is labeled C. The survivor functions assumed to generate each data set are labeled S_A(t), S_B(t), and S_C(t), respectively. We also define the cumulative hazard H_k(t) = −log(S_k(t)) and the hazard (or mortality) h_k(t) = dH_k(t)/dt for each group k = A, B, C. The main quantity of interest, the difference in expected survival between the interventions, is

\( \int_0^{t_{\max}} \left\{ S_B(t) - S_A(t) \right\} dt, \)

which is illustrated by the shaded area between the 2 curves in Figure 1. The upper limit t_max is commonly infinite, giving the lifetime incremental survival.

Figure 1. Example survival data. The aim is to extrapolate the incremental expected survival between interventions (B–A) by using long-term data from an external population (C).
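
As a simple numeric illustration of this quantity, the following base R sketch computes the area between two hypothetical exponential survivor curves over a finite horizon; the hazards and the horizon are assumed purely for illustration.

# Area between two hypothetical survivor curves (all numbers assumed for illustration)
S_A <- function(t) exp(-0.30 * t)   # control survival, constant hazard 0.30 per year
S_B <- function(t) exp(-0.20 * t)   # intervention survival, constant hazard 0.20 per year
t_max <- 40                         # finite time horizon in years

incremental_survival <- integrate(function(t) S_B(t) - S_A(t),
                                  lower = 0, upper = t_max)$value
incremental_survival                # expected life-years gained with B rather than A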

In the conventional approach, 2 S_A(t) and S_B(t) are estimated by parametric models fitted to the A and B data for t < t_1, which are extrapolated to t > t_1 to obtain the incremental survival, without explicitly considering external long-term validity. Instead, we discuss approaches that combine the information on S_B(t) and S_A(t) for t < t_1 with external information on S_C(t) for t < t_max, through assumptions about:

  • How survival will differ between the population of interest and the external population. Specifically, how S C ( t ) compares to S B ( t ) and S A ( t ) in the interval t  <  t 1 may give information about how S C ( t ) compares to the disease population survival after t  >  t 1 .
  • How observed survival trends under each intervention will continue in the long term, that is, how S B ( t ) and S A ( t ) for t  >  t 1 are related to S B ( t ) and S A ( t ) for t  <  t 1 .

Commonly, instead of using this formula directly to calculate the incremental survival, S A ( t ) and S B ( t ) are used to obtain parameters in state-transition or similar decision-analytic models, which also allow discounted expected costs and quality-adjusted survival to be estimated for each competing alternative. In this article, we focus on how S A ( t ) and S B ( t ) themselves can be estimated using external long-term data and what assumptions are necessary to enable their estimation.

To find methods that have been used for survival extrapolation in cost-effectiveness analysis using external data, we searched the reports of studies carried out under the National Institute for Health Research Health Technology Assessment Programme in the United Kingdom and searched the academic literature, focusing on health economics and medical statistics journals, using “pearl-growing” search methods. 10 The exact search strategy, and a broad classification of the 38 relevant papers that we found, are given in the online appendix. In this article, we summarize the methods that have been used, discuss their appropriateness in different circumstances, and suggest where further research might be focused.

Potential External Data Sources

The long-term survivor function for the external data source S C ( t ) may be estimated from national administrative data on population survival, disease registries, cohort studies, or elicited expert belief. Typical life-tables published by national statistics authorities provide age, sex, country, year, and cause-specific annual survival probabilities, which can be used to estimate lifetime survival for the general population. External data may also consist of cohorts of patients who are similar to the patients of interest. This could include national or regional registries (such as cancer registries), or hospital-based cohorts including all patients with a particular condition or receiving a particular treatment, from a particular period of time. There may even be data from randomized trials in a similar population with a longer follow-up. The advantages of registry or cohort data compared to unselected national population data are that the patient population may be more representative of the target population, and relevant covariates are more likely to be recorded. However, they may not necessarily have follow-up times covering the whole lifetimes of all participants.

Framework for Survival Extrapolation Using External Data

Figure 2 illustrates the choices that need to be made when using external data for survival extrapolation. The structure is based on our categorization of different methods used in the literature and our judgment of when they are appropriate. Each of the next few sections of the article discusses a different portion of the figure in detail. Here, we give a brief overview.

Figure 2. Framework of model choices for survival extrapolation using external data. Long-term survival S for control and treatment groups A and B is estimated via assumptions about equivalence of hazards h between populations A, B, and C.

First, researchers should identify whether the external population (C) has the same mortality at all times, or at least in the long term, as the disease population receiving a control intervention (A, top-left panel) and the disease population receiving the intervention of interest (B, top-right panel). If so, the external data can be used directly to estimate each S_k(t) without adjustment.

Otherwise, the long-term mortality of populations A and C (and/or B and C) is assumed to be different but is systematically similar in such a way that the external data (C) can be adjusted to estimate the long-term mortality for the target population with the disease (A or B). The assumptions that have been used to do this are represented by the large middle panel of the figure.

Once any systematic similarity between the internal and external data has been characterized, completing the analysis requires a choice of the functional form for each of the S k ( t ) , potential covariate or subgroup adjustment, parameter estimation, uncertainty, and sensitivity analysis. These issues are discussed later. Some suggestions for future research priorities are made, concentrating on how uncertainty about assumptions is represented and the role of “soft” or elicited information.

Difference in Mortality Between the Disease and External Populations

Disease and External Populations Have the Same Mortality at All Times

Sometimes, the disease or baseline intervention of interest is not expected to affect mortality; for example, it may affect only quality of life. Then, long-term survival of the patients of interest can be assumed to be the same as that of the national population of a similar age and sex distribution and taken directly from the relevant life-table. 11 , 12

This assumption may also hold if the disease or baseline intervention affects mortality, but the external data come from a disease registry or cohort of patients having the same disease and/or intervention, so that the survival of the control group in the trial data is the same as that of the external population. 13 – 19

Disease and External Populations Have the Same Mortality after Some Time

In other cases, the disease population may have a higher initial mortality than the general population, but this decreases until, at some time (after t = t_c, say), its death rate converges to that of the general population 20 – 29 ( Figure 3 , top left).

Figure 3. Example hazards for disease and external populations as functions of time, under 4 different assumptions about how the disease population hazards relate to the external population hazards.

If t c  ≤  t 1 , where t 1 is the follow-up time of the RCT, survival for t  ≤  t c and t  >  t c can be taken directly from the trial data and the life-table data, respectively. Otherwise, if t c  >  t 1 , short-term extrapolations from parametric models fitted to the individual-level data from the RCT might be used to estimate the survival probability between t 1 and t c . 25 , 29 , 30 If the hazard is decreasing in the short term, extrapolating directly from a parametric model might then lead to hazards that are lower than those of the age/sex-matched general population, which is assumed to be implausible; therefore, using the life-table data is more appropriate. t c is sometimes interpreted as a “cure” time, so that all patients who survive this long are assumed to be “cured” and to have mortality equivalent to that of the general population. Messori and Trippoli 27 also suggested that a compromise between “cured” population survival and “uncured” extrapolated survival might sometimes be appropriate—see the models originating from Boag, 31 discussed later in this article, for examples.
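
A rough numeric sketch of this splicing idea is given below; the trial-based hazards, the life-table hazards, and the convergence time t_c are all assumed values, and hazards falling below the matched population level are replaced by the life-table values, in line with the reasoning above.

# Splicing trial-based and life-table hazards at an assumed convergence ("cure") time t_c
t       <- 1:40                          # years since randomization (annual grid)
h_trial <- 0.20 * exp(-0.3 * t)          # hypothetical decreasing hazard from a parametric fit
h_pop   <- 0.01 * exp(0.09 * t)          # hypothetical age/sex-matched population hazard
t_c     <- 6                             # assumed convergence time (here t_c > t_1)

# Before t_c use the trial-based hazard, but never let it fall below the population
# hazard (treated as implausible above); after t_c use the life-table hazard.
h_used   <- ifelse(t <= t_c, pmax(h_trial, h_pop), h_pop)
S_extrap <- exp(-cumsum(h_used))         # extrapolated survivor function on the annual grid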

Disease and External Populations Have Different Mortality in the Short and Long Term

If the mortality of patients with the disease is different from that of the population represented by the external data at all times t  <  t max , then extrapolation might be achieved by adjusting the external evidence to make it more representative of the target population. This requires an assumption that mortality is systematically different between the populations in the long term, in a way that can be determined from the short-term data or informal beliefs. For example, there may be proportional or additive hazards for all-cause or cause-specific mortality between the disease and external populations. These assumptions are discussed in detail later.

Difference in Mortality Between the Treatment and Control Populations

A similar decision should be made about the difference in mortality between the intervention and control groups (B and A, respectively). If the intervention is not expected to affect mortality (e.g., if it affects only quality of life), then S B ( t ) can be assumed to equal S A ( t ) for all times. If the relative intervention effect is expected to diminish to null soon after the end t 1 of the trial data, then h B ( t ) can be assumed to equal h A ( t ) in the long term, and it is sufficient to estimate h A ( t ) .

S B ( t ) could then be estimated by combining a published relative treatment effect from trials, 32 , 33 with the extrapolated S A ( t ) . The assumptions required to do this are analogous to those required to extrapolate differences between the disease and external populations; typically, the hazard ratio between treatment groups for all-cause or cause-specific mortality might be assumed to be constant in perpetuity. Or, if individual data are available for the intervention as well as for the control arm of the trial, S B ( t ) could be produced independently of S A ( t ) by using external data and a similar method to that used to estimate S A ( t ) . Even without external data, S B ( t ) and S A ( t ) are commonly estimated independently, by parametric extrapolation. 2 This still assumes implicitly that the short-term differences between the treatment groups are representative of the long term. Bagust and Beale 30 discuss how knowledge of the treatment’s mechanism of action might be used to guide long-term estimation; for example, the effects of a drug might take some time to achieve after starting treatment and dissipate gradually when treatment stops.

The assumption about how the relative treatment effect is likely to change as t increases from t 1 , the end of trial follow-up, to the time horizon for the decision model is likely to be an important driver of which intervention is preferred. 34 It is therefore important to consider uncertainty about this assumption. The fundamental problem is that information about this effect is available only in the trial data, not in the long-term data C. NICE 1 recommends that 3 alternative scenarios be considered, corresponding to pessimistic, optimistic, and compromise assumptions about the long-term effect of a treatment that is effective in the short term. For example, expressing the effect as a hazard ratio h B ( t )/ h A ( t ) , the effect for t  >  t 1 could be

  • null, so that h B ( t )/ h A ( t ) = 1 for t  >  t 1 ;
  • the same as in the short term, thus h B ( t )/ h A ( t ) = exp( β ) , assumed constant for all t ; or
  • diminishing in the long term, thus h B ( t )/ h A ( t ) is increasing from exp( β ) to 1.
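
These three scenarios can be sketched numerically as follows; the control hazard, the short-term log hazard ratio, and the particular decay function used for the diminishing scenario are all assumptions made for illustration, not values from the cited guidance.

# Extrapolated survival for arm B under three assumed long-term treatment-effect scenarios
h_ctrl <- 0.30          # control hazard, assumed constant for illustration
beta   <- log(0.7)      # short-term log hazard ratio observed in the trial (assumed)
t1     <- 3             # end of trial follow-up, in years (assumed)

hr_t <- function(t, scenario) {
  ifelse(t <= t1,
         exp(beta),                                          # observed short-term effect
         switch(scenario,
                null        = 1,                             # effect vanishes after t1
                constant    = exp(beta),                     # effect maintained in perpetuity
                diminishing = exp(beta * exp(-(t - t1) / 5))))  # decays towards 1 (assumed 5-y scale)
}

H_B <- function(t, scenario) sapply(t, function(ti)
  integrate(function(u) h_ctrl * hr_t(u, scenario), lower = 0, upper = ti)$value)

S_B_scen <- function(t, scenario) exp(-H_B(t, scenario))
S_B_scen(c(1, 5, 10), "diminishing")    # survival probabilities under the diminishing-effect scenario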

Beyond informal sensitivity analysis, we did not find any literature where external information, such as elicited beliefs or the effects of related treatments with longer follow-up, was used formally to quantify future changes in expected treatment effects on survival.

Adjusting External Data to Represent the Population of Interest

If patients with the disease (under either intervention) and the external population have different long-term mortality, then one of the following assumptions might be used to estimate S A ( t ) by adjusting the long-term external data, and similar methods might be used to estimate S B ( t ) .

Proportional Hazards for All-Cause Mortality between the Disease and External Populations

Several authors 35 – 37 obtained mortality rates for the disease population, h_A(t), by multiplying those estimated from life-tables, h_C(t), by a constant hazard ratio obtained from the literature or from literature combined with expert belief. 38 These studies assumed proportional hazards; that is, the hazard ratio between the disease-specific and general populations is constant over time ( Figure 3 , top right).

This is sometimes implemented approximately by assuming the probabilities of death over a short period of time (e.g., the cycle length of a state-transition model) are proportional, instead of the hazards (the instantaneous rates of death, which are not probabilities 39 ). Instead of taking the hazard ratio from the literature, Demiris and Sharples 40 estimated it using a joint statistical model for the disease-specific and external data.
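
A small sketch of this adjustment, working on the hazard scale, is given below; the life-table values and the hazard ratio are invented for illustration.

# Adjusting external life-table data by a constant all-cause hazard ratio (all inputs assumed)
age <- 60:100
q_C <- 1 - exp(-0.01 * exp(0.09 * (age - 60)))   # hypothetical annual death probabilities (life-table)
h_C <- -log(1 - q_C)                             # convert annual probabilities to hazards
hr  <- 2.5                                       # assumed constant hazard ratio, disease vs external

h_A <- hr * h_C                                  # disease-population hazard under proportional hazards
S_A <- exp(-cumsum(h_A))                         # survivor function from age 60 onwards
# Note: multiplying the probabilities q_C themselves by hr is only an approximation;
# the proportionality holds exactly on the hazard scale, as above.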

Proportional Cause-Specific Mortality

The proportional hazards assumption can be convenient since comparisons of mortality between groups are often published as hazard ratios. However, all-cause mortality may not be proportional. For example, consider the causes of death that contribute to overall mortality. Let h_A(t) = h_AD(t) + h_AO(t), where h_AD(t) is the hazard for disease-related mortality and h_AO(t) is the hazard for mortality from all other causes in population A. Similar notation is used for populations B and C. Mortality from causes unrelated to the disease of interest can typically be assumed to be the same between patients with the disease and the external population, so that

\( h_{AO}(t) = h_{BO}(t) = h_{CO}(t) = h_O(t). \)

Mortality from disease-related causes is typically higher. Suppose the hazards for disease-related mortality are proportional, so that h_AD(t) = γ h_CD(t) and hence h_A(t) = γ h_CD(t) + h_O(t) ( Figure 3 , bottom right). This is equivalent to an all-cause proportional hazards model, h_A(t) = δ h_C(t) = δ ( h_CD(t) + h_O(t) ), only if h_CD(t)/h_O(t) is independent of time. In other words, assuming proportional all-cause hazards would be valid only if disease-related mortality were a constant proportion of the overall mortality in the external population as time elapses. Benaglia and others 41 estimated the likely extent of bias in various situations when this assumption is wrongly applied.

To implement a proportional cause-specific hazards model, estimates of h CD ( t ) and h O ( t ) can often be obtained from cause-specific population mortality rates published by national agencies. As with the all-cause hazard ratio, the cause-specific hazard ratio γ for disease populations relative to the external population might be obtained from the literature or estimated from short-term comparisons between internal and external data. 42 – 44 The cause-specific hazard for the intervention group h BD ( t ) can be estimated similarly by multiplying h AD ( t ) by a published constant treatment-specific hazard ratio, representing the effect of the intervention on cause-specific mortality. This supposes, however, that the causes of death targeted by the intervention are the same as the causes that distinguish the disease population from the general population, which may need to be investigated. 41
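
The following sketch shows the arithmetic of this adjustment with invented annual rates; the hazard functions and both hazard ratios are assumptions for illustration only.

# Proportional disease-related hazards with a shared other-cause hazard (all values assumed)
age    <- 60:90
h_CD   <- 0.002 * exp(0.05 * (age - 60))   # external disease-related mortality hazard
h_O    <- 0.008 * exp(0.09 * (age - 60))   # other-cause mortality hazard, shared by all populations
gamma  <- 3.0                              # cause-specific hazard ratio, disease vs external population
hr_trt <- 0.7                              # assumed treatment effect on disease-related mortality

h_A <- gamma * h_CD + h_O                  # control arm
h_B <- hr_trt * gamma * h_CD + h_O         # intervention arm
S_A <- exp(-cumsum(h_A))
S_B <- exp(-cumsum(h_B))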

In Benaglia and others, 41 cause-specific death rates were published in the population life-tables; thus, h CD ( t ) and h O ( t ) could be obtained easily. However, they were not published in the disease-specific individual-level survival data A. To overcome this and estimate γ , since the overall hazard for the disease population is defined as h A ( t ) =  h AD ( t ) +  h O ( t ) , a poly-hazard model 45 could be applied, which decomposes the hazard for all-cause mortality as the sum of cause-specific hazards. Specifically, a poly-Weibull model was used for the internal data A, where the cause-specific hazards are both Weibull, and Weibull models were simultaneously applied to the external data. The common other-cause hazard assumption and proportional cause-specific hazard assumption then enabled the parameters of all hazard functions to be estimated through a joint model for populations A and C. This model implicitly assumes that the disease has no effect on hazards that have not been defined as disease-related in the external data, which cannot be tested unless deaths occurring in the internal trial patients also have the cause of death recorded.

A related method, originating from Boag, 31 assumes a certain proportion of patients are cured and estimates a parametric survival function for the noncured patients. The cure fraction and the parameters of the noncured survival function are estimated jointly from individual data on survival and disease status. Hisashige and others 26 and Maetani and others 46 used this approach to obtain a disease-related survival curve S_AD(t) for the patients of interest, assuming that the noncured and cured survivor functions correspond to disease-related and disease-unrelated survival, respectively. A disease-unrelated survivor function S_CO(t) is obtained from age- and sex-matched life-table data. The overall extrapolated survivor function is then calculated as the product of the disease-related and unrelated survival, assuming equivalence to the above assumptions of proportional cause-specific and identical other-cause hazards:

\( S_A(t) = S_{AD}(t) \times S_{CO}(t). \)

Additive Hazards for All-Cause Mortality between the Disease and External Populations

Instead of a constant risk ratio between internal and external data sources, some authors 47 – 49 have assumed that the disease-specific population had a constant additive excess hazard compared to the general population ( Figure 3 , bottom left).

Under this assumption, it can be shown 47 that logit(S_A(t)/S_C(t)) converges to a linear function as t increases. Thus, the slope of a linear regression fitted to the latter part of the observed data on logit(S_A(t)/S_C(t)) for t < t_1 gives an estimate of −α. Extrapolations of S_A(t) for t > t_1 can then be calculated given the estimate of α. Demiris and Sharples 40 also investigated additive hazard models within a Bayesian framework. An advantage of additive hazards is that cause-specific modeling is less important: if disease-related hazards are additive, so that h_AD(t) = h_CD(t) + α, then h_A(t) = h_CD(t) + α + h_O(t) = h_C(t) + α, so the additive all-cause hazard model also holds, and the cause-specific risk difference α is equal to the all-cause risk difference h_A(t) − h_C(t). The risk difference (or excess risk) is straightforward to interpret, and under the additive hazard model it is independent of time. A proportional hazards model, however, is multiplicative, so that the excess risk depends on the baseline risk; informally, the disease has a greater absolute effect on people (such as older people) who are already at a higher risk of death, which is typical of a chronic disease.
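
A base R sketch of this estimation and extrapolation step is shown below; the simulated survival estimates, the external survivor function, and the cut-point defining the "latter part" of follow-up are all assumptions made for illustration.

# Additive excess hazard: estimate alpha from short-term data, then extrapolate
set.seed(3)
t_obs      <- seq(0.25, 3, by = 0.25)                # trial follow-up times (t < t1 = 3 years)
S_C_fun    <- function(t) exp(-0.05 * t)             # hypothetical external survivor function
alpha_true <- 0.15
S_A_hat    <- S_C_fun(t_obs) * exp(-alpha_true * t_obs) *
              exp(rnorm(length(t_obs), 0, 0.01))     # noisy "Kaplan-Meier" style estimates (simulated)

logit <- function(p) log(p / (1 - p))
late  <- t_obs >= 1.5                                # "latter part" of follow-up (assumed cut-point)
ratio <- (S_A_hat / S_C_fun(t_obs))[late]
fit   <- lm(logit(ratio) ~ t_obs[late])
alpha_hat <- -unname(coef(fit)[2])                   # regression slope estimates -alpha

# Extrapolate: under h_A(t) = h_C(t) + alpha, S_A(t) = S_C(t) * exp(-alpha * t)
t_new      <- seq(0, 20, by = 0.5)
S_A_extrap <- S_C_fun(t_new) * exp(-alpha_hat * t_new)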

The short-term fit of either the proportional or additive hazards assumption can be checked from the data by diagnostic plots 2 , 30 or by embedding in a model that contains both as special cases, as discussed by Breslow and Day. 50 The assumptions required to apply either in the long term, however, are untestable from data.

Other Models for Parameterizing Mortality Differences between Populations

Other ways of parameterizing difference in survival between groups include accelerated failure time models, in which S A ( t ) =  S C ( δ t ) , so that the expected survival time in group C is δ times the expected survival time in group A, although we are unaware of these having been used in the context of survival extrapolation with external data. Nonproportional, nonadditive hazard models might also be used where the hazard ratio or excess hazard is a predictably varying function of time. For example, Andersson and others 51 extrapolated survival of cancer patients by combining cancer cohort and life-table data and modeling the log cumulative excess hazard for cancer patients as a cubic spline function of log time, 52 assuming a linear trend in the long term.

Survival Model Choice When Combining Internal and External Data

To complete the estimation and to characterize the long-term differences between the disease and external population survival S A ( t ) and S C ( t ) as well as between the treatment and control survival S B ( t ) and S A ( t ) , the form of each survival function needs to be specified.

Without external data, extrapolation of S_A(t) and/or S_B(t) is typically 2 based on a parametric functional form for each survival curve. With external data, a parametric function could be specified for S_C(t) and fitted to the external data, with assumptions such as proportional hazards then used to derive S_A(t) and S_B(t). To convert annual probabilities of death published in life-tables to individual-level survival times, which allows a survival model to be fitted, several authors 40 , 41 , 47 , 48 have used simulation.

Alternatively, survival extrapolation can be performed semiparametrically with external data if these are available up to t =  t max and if a systematic difference between the external and internal populations can be assumed, such as proportional or additive hazards. 40 , 47 , 48 This has the advantage of avoiding the risk of misspecifying the baseline survival function. Fang and others 47 used semiparametric models, which gave plausible estimates where even a 3-parameter generalized gamma model did not. A hybrid approach is also possible, using nonparametric estimates up to some t *  <  t 1 and parametric assumptions to extrapolate, 30 although the results can be sensitive to the arbitrary choice of t * . 53

However, if the parametric form fits well, then fully parametric models can lead to greater precision in estimates. 54 The advantages of parametric and semiparametric models are combined in a class of flexible parametric models based on modeling the log cumulative hazard as a spline, or piecewise cubic, function of log time, 52 , 55 which can adapt to represent survival arbitrarily well. Since these models are fully parametric, they enable extrapolation beyond the times observed in the data. 56 The spline function is defined to be smooth, and given a particular number of pieces, results have been shown not to be sensitive to the choice of where to subdivide the log time axis. 55 Therefore, we would expect extrapolations from this model to be more robust than those from the “hybrid” approach mentioned above. Guyot and others 56 used these models, implemented in the BUGS software, 57 for survival extrapolation using a combination of trial and long-term external data. They can also be fitted to single survival data sets using Stata 58 and R. 59 Also, unlike the Cox model, they permit nonproportional hazards to be modeled 52 and extrapolated if necessary. 51
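
As one possible implementation in R (an assumption about tooling, not necessarily the software used by the cited authors), the flexsurv package fits these spline-based models and lets the fitted survivor function be evaluated beyond the observed follow-up; the simulated single-arm data set below is hypothetical.

# Hypothetical single-arm trial: exponential event times with administrative censoring at 3 years
library(flexsurv)
set.seed(4)
trial <- data.frame(rawtime = rexp(200, rate = 0.25))
trial$status <- as.integer(trial$rawtime <= 3)       # 1 = died, 0 = censored
trial$time   <- pmin(trial$rawtime, 3)

fit_spline <- flexsurvspline(Surv(time, status) ~ 1, data = trial,
                             k = 2, scale = "hazard")  # spline on the log cumulative hazard scale

# Evaluate (and hence extrapolate) the fitted survivor function up to 30 years
summary(fit_spline, t = seq(0, 30, by = 1), type = "survival")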

The choice between alternative parametric models for extrapolation is conventionally based on fit to the short-term data A, B. 2 However, as recommended, for example, in the NICE guidelines, 1 long-term plausibility should be considered based on external information such as knowledge of the disease, treatment and trial protocol, 30 or related long-term survival data. External information could simply be used to inform the choice of model for extrapolation or to inform particular parameters of a chosen model. A plausible distribution might be chosen to represent how the hazard of death is expected to change over time. For example, the exponential distribution corresponds to a constant hazard, which is generally unrealistic in the long term as the hazard will increase as people get older. Therefore, even though data might suggest a constant hazard over the duration of the RCT, distributions that allow changes in hazards over time are likely to be more appropriate. Bagust and Beale 30 also discuss how the apparent better fit of some parametric models may be an artifact of between-patient heterogeneity; for example, a Weibull distribution with shape less than 1 could be explained by a mixture of 2 subpopulations with different constant hazards.

Once the most appropriate model family has been chosen, its parameters can be estimated; this might be done using a combination of disease-specific data A and external evidence C. For example, Nelson and others 60 used a 2-parameter Gompertz model, which has an exponentially increasing hazard, to extrapolate survival beyond the follow-up of an RCT. The parameter governing the baseline hazard was estimated using disease-specific data, and the hazard “acceleration” parameter was estimated from national population life-tables including older people.
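
The division of labor described here can be sketched as follows; the life-table hazards, the assumed trial death rate, and the matching at a single age are illustrative simplifications of the cited approach.

# Gompertz hazard h(t) = a * exp(b * t): shape b from external life-table data,
# level a anchored to the (assumed) trial data. All numbers are hypothetical.
age   <- 60:100
h_pop <- 0.01 * exp(0.09 * (age - 60))           # hypothetical population hazards by age
b_hat <- unname(coef(lm(log(h_pop) ~ age))[2])   # "acceleration": log-linear slope of hazard with age

rate_trial <- 0.05                               # assumed observed death rate in the trial at mean age 65
a_hat <- rate_trial / exp(b_hat * 65)            # anchor the level so the model matches the trial rate

S_gomp <- function(t, age0 = 65)                 # survivor function from age0, integrating the hazard
  exp(-(a_hat / b_hat) * (exp(b_hat * (age0 + t)) - exp(b_hat * age0)))
S_gomp(c(5, 10, 20))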

When long-term data are not available or sparse, expert belief about long-term survival might be elicited to either choose the parametric form or estimate particular parameters, as we discuss later.

Explaining Population Differences Through Observed Covariates

Under models such as the proportional or additive hazards specifications described above, the long-term difference between the populations underlying the trial and external data is characterized by a parameter such as the all-cause or cause-specific hazard ratio δ or γ or risk difference α . This is sufficient to estimate long-term survival of the trial population if the model assumptions hold. However, we may also want to explain this difference in terms of the characteristics of the people represented, for example, to estimate survival for subgroups of the population with certain characteristics. This is possible if relevant covariates are recorded in each source of evidence. Nelson and others, 60 for example, used a proportional hazards model in which the log hazard ratio for all-cause mortality is a linear function of the covariates that distinguish the data sets. The covariate effects were estimated using a semiparametric model fitted to the long-term external data, to obtain an expression for survival S ( t ,  x ,  β ) as a function of covariate values x and covariate effects β . The survival for group A, S A ( t ) , was estimated for all t by averaging S ( t ,  x ,  β ) over all covariate values x observed in the data A. This approach assumes that the form of the relationship with covariates is the same between populations A and C, which may not be true. For example, the relationship of the log hazard of death with age may be linear among younger people but nonlinear among older people.

It is common to assume that the increase in the hazard of death as a person gets older is fully explained by his or her increasing age. Thus, survival extrapolations often rely principally on modeling how the hazard increases with age. Population-based data commonly cover a wide range of ages and calendar periods. To exploit this diversity, Nelson and others 60 fitted joint models to a combination of RCT and cohort data in an age metric, where the t in S A ( t ) and S C ( t ) represents age rather than time since diagnosis or randomization to treatment. This assumes that hazards change through time only with increasing age, although the shape of this dependence was modeled nonparametrically, with no further distributional assumptions.

Without long-term follow-up data, age effects on mortality could be estimated from shorter-term data on individuals with widely varying ages at baseline. Speight and others 13 estimated long-term cancer survival using registry data in this way. The (within-person) increase in the risk of death as a person gets older was assumed to equal the risk ratio between people with different baseline ages.

Representing Uncertainty and Parameter Estimation

It is important to characterize uncertainty in all model inputs and “structural” model choices 3 in order to determine the uncertainty surrounding the treatment decision and assess the value of further research. In the presence of substantial decision uncertainty, the treatment might be recommended for use only in research or with otherwise limited coverage. 61 If parameters used to extrapolate survival are estimated from data, the uncertainty inherent in estimating them can be handled by probabilistic methods. For example, in Fang and others, 47 uncertainty about the estimation of the hazard increment α was propagated through the model to the estimated survival curve by bootstrapping. Alternatively, beliefs about α could be represented by a probability distribution in a standard probabilistic sensitivity analysis. Uncertainty about the choice of parametric model can be represented by choosing a sufficiently flexible model form, such as a spline-based or generalized gamma distribution, 56 and, if the level of flexibility required is uncertain and different plausible models give different results, using model averaging. 62

Bayesian methods are particularly suited to combining evidence from different sources in a model. 63 The process involves defining a joint model with shared parameters representing the aspects that the different sources of data have in common (e.g., mortality for causes other than the disease of interest) and different parameters for the parts where they are expected to differ (e.g., cause-specific mortality). The posterior distributions of model outputs (such as incremental expected survival) are estimated simultaneously conditional on all data, and the uncertainty about the model inputs is propagated to the outputs. This approach has been used for combining data in the context of survival extrapolation, 40 , 41 , 56 , 64 as well as in many other decision modeling contexts. 65 , 66 External aggregate data or expert beliefs and associated uncertainty can be included as prior distributions, for example, published hazard ratios obtained from meta-analysis. 41

A potentially more important uncertainty may arise in how the differences between the external and internal data are modeled—in other words, whether assumptions, such as those set out in this article, are valid in the long term. This is more problematic to identify from data; therefore, elicited beliefs might be used instead.

Using Elicited Beliefs in Survival Extrapolation

Expert elicitation has been used to estimate uncertain quantities in health economic models, 67 , 68 although we are unaware of this approach having been used in survival extrapolation. Here, we discuss the potential and challenges.

For example, beliefs about long-term survival might be elicited directly. Suppose that expert belief suggested that the 5-y survival probability, S (5| λ ) , (assuming t 1  <  5 ) was most likely to be around 0.2 but could be as high as 0.3 or as low as 0.1. Assuming an exponential survival model, this belief about S (5| λ ) = exp(−5 λ ) could be translated to a prior distribution for the rate λ = −log( S (5| λ ))/5 . Bayesian inference could then be used to combine this prior for long-term survival with the survival data for t  <  t 1 . More complex and realistic parametric models would be more challenging. For example, in a Weibull model, eliciting expected survival S ( t | α ,  λ ) = exp(− λ t α ) could provide a distribution for λ t α , but extra assumptions would be needed to obtain separate priors for λ and α . To our knowledge, there has been no investigation of this. Survival estimates would need to be elicited at multiple time points to provide information about multiple parameters or to suggest an appropriate distributional form. Quantities are most easily elicited if expressed on an interpretable scale. 69 Here, that could be the expected number out of 100 patients who will survive 5 y and 10 y, but it may be difficult to convert such information to priors for parameters. Expressing the elicited information as an artificial extra data set, 70 then using standard methods to analyze the original data augmented with the additional data, may be a useful technique to investigate.
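
The exponential case can be sketched in a few lines; treating the elicited values as particular quantiles and matching a lognormal prior to them is an assumption for illustration, not a method proposed in the papers cited.

# Translate an elicited 5-year survival belief into values of the exponential rate lambda
S5     <- c(optimistic = 0.3, central = 0.2, pessimistic = 0.1)  # elicited 5-y survival probabilities
lambda <- -log(S5) / 5                                           # implied exponential rates

# One simple option (an assumption): treat the central value as the prior median and the
# optimistic/pessimistic values as approximate 95% limits, then match a lognormal prior.
mu    <- log(lambda["central"])
sigma <- (log(lambda["pessimistic"]) - log(lambda["optimistic"])) / (2 * qnorm(0.975))
c(mu = unname(mu), sigma = unname(sigma))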

If some of the assumptions used to extrapolate are uncertain, then sensitivity analysis should be performed. The most basic form of sensitivity analysis is to present results under alternative scenarios and assumptions; however, scenario analyses can be difficult to interpret. Instead, the model might be extended by adding extra parameters representing these uncertain features, with prior distributions elicited from experts, then observing how the results are affected. 71 For example, to assess the assumption of a constant hazard ratio between treatment groups, the treatment effect in the extrapolated period could be represented by a parametrically decreasing function of time, and plausible values for the parameter(s) could be elicited. This allows the associated decision uncertainty to be formally quantified and “value of information” methods used to determine whether it is worth doing further research to assess the assumption. 72 Even without elicited information, informal beliefs could be used to demonstrate, for example, that the decision about which treatment would be preferred is robust within a plausible range of assumptions about some parameter. This might involve showing that the cumulative incremental net benefit of the intervention of interest is unlikely to cross the decision threshold in the period of time being extrapolated over. 73

More research and experience are needed on the accuracy (and cost) of different methods to elicit uncertain quantities, ways to combine beliefs of different experts, what quantities should be elicited in this context, how best to use elicited information in models, and how the results can be communicated to decision makers.

Summary and Research Priorities

Survival extrapolation given short-term data is a challenging task, involving prediction of data that have not been observed. Data on a related long-term population can often be exploited, but the necessary assumptions about how the populations differ, and how short-term trends might continue into the long term, must be clearly expressed and examined for plausibility and consistency with external data. This article reviews typical assumptions that might be made. However, we may sometimes not be confident in making any of these assumptions; it may be unclear whether the external data are relevant or how to explain differences between the data sets. The information required to adjust the external population to represent the internal population may not be available, for example, a marker of disease severity. In those cases, careful sensitivity analysis and characterization of uncertainty will be important. Since long-term assumptions, such as proportional hazards, are untestable from data, they should be clearly explained and justified to decision makers.

More experience is needed in situations where neither proportional nor additive hazards assumptions are appropriate to distinguish the external and disease populations, and similarly when the treatment effect or other key parameters are not constant or otherwise predictable in the long term. Important open questions concern how “soft” information, such as formally elicited beliefs or the analyst’s own assumed distribution for uncertain quantities, can be obtained and used in modeling.

Finally, we assumed that the trial data are representative of the target population that will ultimately receive the treatments of interest. This is not always true given the selection criteria of trials, although it is more plausible for the phase III, pragmatic trials that typically inform cost-effectiveness models. Various authors 7 – 9 have suggested methods and conditions for using external evidence to adjust the treatment effect from a trial to obtain the effect in an overlapping but nonidentical population. The covariate adjustment methods we discussed may also be used to explain differences in baseline survival between populations, if the relevant covariates are recorded.

Supplementary Material

Acknowledgments.

Thanks to the rest of the project team, including Alan Brennan, Patrick Fitzgerald, Miqdad Asaria, Ronan Mahon, and Steve Palmer.

This work was supported by the Medical Research Council, grant code G0902159 (“Methods of Extrapolating RCT Evidence for Economic Evaluation”). The first author also acknowledges support from Medical Research Council grant code U015232027.

Supplementary material for this article is available on the Medical Decision Making Web site at http://journals.sagepub.com/home/mdm .


USC launches large-scale nationwide study of type 1 diabetes and brain development

A new large-scale longitudinal study, led by the Keck School of Medicine of USC, will unite 12 research centers across the United States to explore how type 1 diabetes affects children during a window of time known to be critical for healthy brain development.

Ultimately, the findings could help refine clinical guidelines for managing type 1 diabetes, including what glucose levels are safest in terms of healthy brain development. The study could also aid in the creation of targeted treatments for the condition, including changes to sleep, diet, and physical activity that can help specific patients.

The five-year study is supported by a grant of more than $2.7 million from the National Institutes of Health.


Identifying and characterizing extrapolation in multivariate response data

Meridith L. Bartley, Ephraim M. Hanks, Erin M. Schliep, Patricia A. Soranno, and Tyler Wagner

Affiliations: Department of Statistics, Pennsylvania State University, University Park, Pennsylvania, United States of America; Department of Statistics, University of Missouri, Columbia, Missouri, United States of America; Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan, United States of America; U.S. Geological Survey, Pennsylvania Cooperative Fish and Wildlife Research Unit, Pennsylvania State University, University Park, Pennsylvania, United States of America

Published: December 5, 2019. https://doi.org/10.1371/journal.pone.0225715

Faced with limitations in data availability, funding, and time constraints, ecologists are often tasked with making predictions beyond the range of their data. In ecological studies, it is not always obvious when and where extrapolation occurs because of the multivariate nature of the data. Previous work on identifying extrapolation has focused on univariate response data, but these methods are not directly applicable to multivariate response data, which are common in ecological investigations. In this paper, we extend previous work that identified extrapolation by applying the predictive variance from the univariate setting to the multivariate case. We propose using the trace or determinant of the predictive variance matrix to obtain a scalar value measure that, when paired with a selected cutoff value, allows for delineation between prediction and extrapolation. We illustrate our approach through an analysis of jointly modeled lake nutrients and indicators of algal biomass and water clarity in over 7000 inland lakes from across the Northeast and Mid-west US. In addition, we outline novel exploratory approaches for identifying regions of covariate space where extrapolation is more likely to occur using classification and regression trees. The use of our Multivariate Predictive Variance (MVPV) measures and multiple cutoff values when exploring the validity of predictions made from multivariate statistical models can help guide ecological inferences.

Citation: Bartley ML, Hanks EM, Schliep EM, Soranno PA, Wagner T (2019) Identifying and characterizing extrapolation in multivariate response data. PLoS ONE 14(12): e0225715. https://doi.org/10.1371/journal.pone.0225715

Editor: Bryan C. Daniels, Arizona State University & Santa Fe Institute, UNITED STATES

Received: May 7, 2019; Accepted: November 10, 2019; Published: December 5, 2019

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability: All data and R code used in our analysis have been compiled into a GitHub repository, https://github.com/MLBartley/MV_extrapolation . Current releases are available at https://github.com/MLBartley/MV_extrapolation/releases , and a static version of the package has been publicly archived via Zenodo (DOI: 10.5281/zenodo.3523116 ). All data used and created in this analysis are archived via figshare (DOI: 10.6084/m9.figshare.10093460 ).

Funding: Funding was provided by the US NSF Macrosystems Biology Program grants, DEB-1638679; DEB-1638550, DEB-1638539, DEB-1638554 (EH, PS, TW and ES, https://www.nsf.gov/funding/pgm%20summ.jsp?pims%20id=503425 ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The use of ecological modeling to translate observable patterns in nature into quantitative predictions is vital for scientific understanding, policy making, and ecosystem management. However, generating valid predictions requires robust information across a well-sampled system which is not always feasible given constraints in gathering and accessing data. Extrapolation is defined as a prediction from a model that is a projection, extension, or expansion of an estimated model (e.g. regression equation, or Bayesian hierarchical model) beyond the range of the data set used to fit that model [ 1 ]. When we use a model fit on available data to predict a value or values at a new location, it is important to consider how dissimilar this new observation is to previously observed values. If some or many covariate values of this new point are dissimilar enough from those used when the model was fitted (i.e. either because they are outside the range of individual covariates or because they are a novel combination of covariates) predictions at this point may be unreliable. Fig 1 , adapted from work by Filstrup et al. [ 2 ], illustrates this risk with a simple linear regression between the log transformed measurements of total phosphorous (TP) and chlorophyll a (Chl a) in U.S. lakes. The data shown in blue were used to fit a linear model with the estimated regression line shown in the same color. While the selected range of data may be reasonably approximated with a linear model, the linear trend does not extend into more extreme values, and thus our model and predictions are no longer appropriate.

Fig 1. A 95% confidence interval of the mean is included around the regression line. Dashed red lines represent the 95% prediction interval. Areas shaded in darker grey indicate regions of extrapolation (using the maximum leverage value ( h ii ) to identify the boundaries).

https://doi.org/10.1371/journal.pone.0225715.g001

While ecologists and other scientists know the risks associated with extrapolating beyond the range of their data, they are often tasked with making predictions beyond the range of the available data in efforts to understand processes at broad scales, or to make predictions about the effects of different policies or management actions in new locations. Forbes and Calow [ 3 ] discuss the double-edged sword of supporting cost-effective progress while exercising caution about potentially misleading results that would hinder environmental protections. They outline the need for extrapolation to balance these goals in ecological risk assessment. Other works [ 4 – 6 ] explore strategies for addressing the problem of ecological extrapolation, often in space and time, across applications in management tools and estimation practices. Previous work on identifying extrapolation includes Cook’s early work on detecting outliers within a simple linear regression setting [ 7 ] and recent extensions to GLMs and similar models by Conn et al. [ 8 ]. The work of Conn et al. defines extrapolation as making predictions that occur outside of a generalized independent variable hull (gIVH), defined by the estimated predictive variance of the mean at observed data points. This definition allows for predictions to be either interpolations (inside the hull) or extrapolations (outside the hull).

However, the work of Conn et al. [ 8 ] is restricted to univariate response data, which does not allow for the application of these methods to multivariate response models. This is an important limitation because many ecological and environmental research problems are inherently multivariate in nature. Elith and Leathwick [ 9 ] note the need for additional extrapolation assessments of fit in the context of using species distribution models (SDMs) for forecasting across different spatial and temporal scales. Mesgaran et al. [ 10 ] developed a new tool for identifying extrapolation using the Mahalanobis distance to detect and quantify the degree of dissimilarity for points either outside the univariate range or forming novel combinations of covariates.

In our paper, we present a general framework for quantifying and evaluating extrapolation in multivariate response models that can be applied to a broad class of problems. Our approach may be succinctly summarized as follows:

  • Fit an appropriate model to available multi-response data.
  • Choose a numeric measure associated with extrapolation that provides a scalar value in a multivariate setting.
  • Choose a cutoff or range of cutoffs for extrapolation/interpolation.
  • Given a cutoff, identify locations that are extrapolations.
  • Explore where extrapolations occur. Use this knowledge to help inform future analyses and predictions.

We draw on extensive tools for measures of leverage and influential points to inform decisions of a cutoff between extrapolation and interpolation. We illustrate our framework through an application of this approach on jointly modeled lake nutrients, productivity, and water clarity variables in over 7000 inland lakes from across the Northeast and Mid-west US.
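
To make the trace-based measure concrete, here is a hedged sketch for a multivariate-response linear model with a known residual covariance; this is a deliberate simplification of the Bayesian joint model used in the paper, and all data below are simulated.

# Flag multivariate extrapolations via the trace of the predictive variance matrix
set.seed(1)
n <- 200; p <- 3; q <- 4                         # sites, covariates, response variables
X <- cbind(1, matrix(rnorm(n * p), n, p))        # design matrix at the fitted sites
Sigma <- diag(q)                                 # assumed (known) residual covariance of the q responses

XtX_inv <- solve(t(X) %*% X)
lev_obs <- rowSums((X %*% XtX_inv) * X)          # leverages x_i'(X'X)^{-1}x_i

# For this model the predictive variance of the mean at x0 is (x0'(X'X)^{-1}x0) * Sigma,
# so its trace is leverage * trace(Sigma); use the maximum over fitted sites as the cutoff.
X_new   <- cbind(1, matrix(rnorm(50 * p, sd = 2), 50, p))  # new sites, deliberately more spread out
lev_new <- rowSums((X_new %*% XtX_inv) * X_new)

mvpv_trace_new <- lev_new * sum(diag(Sigma))
cutoff         <- max(lev_obs) * sum(diag(Sigma))
extrapolation  <- mvpv_trace_new > cutoff        # TRUE = treat the prediction as an extrapolation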

Predicting lake nutrient and productivity variables

Inland lake ecosystems are threatened by cultural eutrophication, with excess nutrients such as nitrogen (N) and phosphorus (P) resulting in poor water quality, harmful algal blooms, and negative impacts to higher trophic levels [ 11 ]. Inland lakes are also critical components in the global carbon (C) cycle [ 12 ]. Understanding the water quality in lakes allows for informed ecosystem management and better predictions of the ecological impacts of environmental change. Water quality measurements are collected regularly by federal, state, local, and tribal governments, as well as citizen-science groups trained to sample water quality.

The LAGOS-NE database is a multi-scaled geospatial and temporal database for thousands of inland lakes in 17 of the most lake-rich states in the eastern Mid-west and the Northeast of the continental United States [ 13 ]. This database includes a variety of water quality measurements and variables that describe a lake’s ecological context at multiple scales and across multiple dimensions (such as hydrology, geology, land use, and climate).

Wagner and Schliep [14] jointly modeled lake nutrient, productivity, and clarity variables and found strong evidence that these variables are dependent. They also found that predictive performance was greatly enhanced by explicitly accounting for the multivariate nature of these data. Filstrup et al. [2] more closely examined the relationship between Chl a and TP and found that nonlinear models fit the data better than a log-linear model. Most notably for this work, the relationship between these variables differs at the extremes of the observed ranges; while a linear model may work over a moderate range of the data, caution is needed before extending results to more extreme values (i.e., to extremely nutrient-poor or nutrient-rich lakes).

In this study, following Wagner and Schliep, we consider four variables as joint responses of interest: total phosphorus (TP), total nitrogen (TN), Chl a, and Secchi disk depth (Secchi). Each lake may have observations for all four of these variables or only a subset. Fig 2 shows response variable availability (fully observed, partially observed, or missing) for each lake in the data set. A partially observed set of response variables for a lake indicates that at least one, but not all, of the water quality measures were sampled. We consider several covariates at the individual lake and watershed scales as explanatory variables, including maximum depth (m), mean base flow (%), mean runoff (mm/yr), road density (km/ha), elevation (m), stream density (km/ha), the ratio of watershed area to lake area, and the proportion of forested and agricultural land in each lake's watershed. One goal among many for developing this joint model is to predict TN concentrations for all lakes across this region, and eventually the entire continental US. Our objective is to identify and characterize when predictions of these multivariate lake variables are extrapolations. To this end, we review and develop methods for identifying and characterizing extrapolation in multivariate settings.


Fig 2. Left: map of inland lake locations with full, partial, or missing response variables. Missing status indicates lakes where none of the water quality measures were observed, while partial status indicates that only some lake response variables are unobserved. Covariates were quantified for all locations. Right: data status (observed or missing) for each response variable. All spatial plots in this paper were created using the maps package [15] in R to provide outlines of US states.

https://doi.org/10.1371/journal.pone.0225715.g002

Materials and methods

Review of current work: Cook's independent variable hull.


This definition remains useful without any distributional assumption on the data. For example, empirically obtained quantile cutoff values can serve reasonably well as thresholds for declaring outliers. However, for multivariate-normal data, the squared MD can be transformed into probabilities using a chi-squared cumulative distribution function [17], so that points with a very high probability of not belonging to the distribution can be classified as outliers. In either scenario, outliers can be detected using only the predictor variables by calculating x0'(X'X)^(-1)x0 for a new point x0 and comparing it with max(diag(X(X'X)^(-1)X')).
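
As a minimal base-R sketch of this leverage comparison (the design matrix and the new covariate vector x0 below are simulated, not taken from the lake data):

    # Cook-style IVH check: a new point x0 lies outside the hull when its leverage
    # exceeds the maximum leverage among the observed design points.
    set.seed(2)
    X  <- cbind(1, matrix(rnorm(100 * 2), ncol = 2))  # observed design matrix (with intercept)
    x0 <- c(1, 3, -2.5)                               # hypothetical new covariate vector

    XtX_inv <- solve(crossprod(X))                    # (X'X)^(-1)
    h_obs   <- diag(X %*% XtX_inv %*% t(X))           # leverages of the observed points
    h_new   <- drop(t(x0) %*% XtX_inv %*% x0)         # leverage of the new point

    h_new > max(h_obs)                                # TRUE indicates an extrapolation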

Conn’s generalized IVH.


Prediction variance.


Extension to the multivariate case


Predictions of different response types covary in multivariate models, which complicates our definition of a gIVH (see Eq 11), which relies on finding a maximum univariate value. Where a univariate model yields a scalar prediction variance (Eq 18), a multivariate model yields a prediction covariance matrix. We propose capturing the size of this covariance matrix with scalar summary measures; this is similar to the A-optimality and D-optimality criteria used in experimental design [19].
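
For illustration, both scalar summaries can be computed directly from a prediction covariance matrix; the 4 × 4 matrix below is a hypothetical placeholder rather than output from the fitted lake model.

    # Trace (A-optimality style) and determinant (D-optimality style) summaries
    # of a hypothetical 4 x 4 prediction covariance matrix.
    V <- matrix(c( 0.40,  0.12,  0.08, -0.05,
                   0.12,  0.35,  0.10, -0.04,
                   0.08,  0.10,  0.30, -0.03,
                  -0.05, -0.04, -0.03,  0.25), nrow = 4, byrow = TRUE)

    mvpv_tr  <- sum(diag(V))   # sum of the prediction variances
    mvpv_det <- det(V)         # generalized variance, accounts for covariances
    c(trace = mvpv_tr, determinant = mvpv_det)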

Further, using our numeric measure of extrapolation, we aim to take advantage of the multivariate response information to identify when the predictions for a new observation (i.e., the covariates for a new lake location) are extrapolations for all response values jointly. We also present an approach to identify when we cannot trust a prediction for a single response variable at either a new lake location or a currently partially sampled lake. The latter would be useful for a range of applications in ecology. For example, in the inland lakes project, one important goal is to predict TN because this essential nutrient is not well sampled across the study extent, yet it is important for understanding nutrient dynamics and for informing eutrophication management strategies for inland lakes. Because TN is not observed (i.e., sampled) as often as some other water quality variables, we can leverage the information in water quality measures that are sampled more often (e.g., Secchi disk depth [20] is a common measure of water clarity obtained on site, while other water quality measurements require samples to be sent to a lab for analysis). We first outline our approach for identifying extrapolated new observations using a measure of predictive variance for lakes that have been fully or partially sampled and used to fit the model. We then describe how this approach can be applied to the prediction of TN in lakes where it has not been sampled.

Multivariate extrapolation measures.


The trace (tr) of an n × n square matrix V is the sum of the elements on its main diagonal (the diagonal from the upper left to the lower right). The trace does not account for the correlation between variables and is not a scale-invariant measure. Because the response variables in the inland lakes example are log transformed, we chose to explore the trace as one way of obtaining a scalar extrapolation measure. The determinant (D) takes into account the correlations among pairs of variables and is scale-invariant. In this paper, we explore both approaches by quantifying extrapolation with our multivariate model of the LAGOS-NE lake data set using the following steps (a minimal sketch follows the list):

  • Finding the joint posterior distribution [B, Σ | Y].
  • Calculating the posterior predictive variance at in-sample lakes.
  • Calculating the posterior predictive variance at out-of-sample lakes.
  • Identifying extrapolations by comparing out-of-sample MVPV values to a cutoff value chosen using the in-sample values.
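
The sketch below walks through these steps on simulated data using a frequentist multivariate linear model, for which the covariance of the predicted mean at a location x0 is leverage(x0) × Σ. It is only an analogue of the workflow listed above, not the Bayesian MCMC implementation used in this paper, and every variable name, sample size, and cutoff choice is illustrative.

    # Frequentist analogue of the MVPV workflow on simulated data.
    set.seed(3)
    n <- 200; q <- 4
    X <- cbind(1, matrix(rnorm(n * 3), ncol = 3))                 # in-sample design matrix
    B <- matrix(rnorm(4 * q), nrow = 4)
    Y <- X %*% B + matrix(rnorm(n * q, sd = 0.5), ncol = q)       # four joint responses

    fit    <- lm(Y ~ X - 1)                                       # multivariate linear model
    Sigma  <- crossprod(residuals(fit)) / (n - ncol(X))           # estimated error covariance
    XtXinv <- solve(crossprod(X))
    lev    <- function(x0) drop(t(x0) %*% XtXinv %*% x0)          # leverage of a location

    mvpv_tr <- function(x0) lev(x0) * sum(diag(Sigma))            # trace-based summary
    mvpv_D  <- function(x0) lev(x0)^q * det(Sigma)                # determinant-based summary

    v_in  <- apply(X, 1, mvpv_tr)                                 # in-sample values
    Xnew  <- cbind(1, matrix(rnorm(50 * 3, sd = 1.5), ncol = 3))  # out-of-sample locations
    v_out <- apply(Xnew, 1, mvpv_tr)

    cutoff <- quantile(v_in, 0.95)                                # one of several cutoff choices
    extrapolation_index <- as.integer(v_out > cutoff)             # 1 = extrapolation, 0 = prediction
    table(extrapolation_index)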

Conditional single variable extrapolation measures.

The chosen numeric measure of multivariate extrapolation includes information from the entire set of responses. In the inland lake example, it could be used to identify unsampled lakes where the prediction of the whole vector of response variables (TN, TP, Chl a, Secchi) is an extrapolation. However, even when a joint model is appropriate, there are important scientific questions that can be answered by predicting a single variable.


Any of the four response variables may be treated as variable 1, so this general partition approach can be used for any variable conditioned on all of the others. The values of the partitioned mean vector and covariance matrix are determined by the availability of data for the three variables being conditioned on; these water quality measures can be fully, partially, or not observed.
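
As a minimal sketch of this partitioning, the standard conditional formulas for a multivariate normal vector can be applied directly; the mean vector, covariance matrix, and observed values below are hypothetical placeholders for (TN, TP, Chl a, Secchi), not estimates from the fitted model.

    # Conditional mean and variance of one response (here TN) given the others,
    # using the usual multivariate-normal partition identities.
    mu    <- c(TN = 6.5, TP = 3.2, Chl = 2.1, Secchi = 0.8)       # hypothetical marginal means
    Sigma <- matrix(c( 0.90,  0.30,  0.22, -0.20,
                       0.30,  0.80,  0.25, -0.15,
                       0.22,  0.25,  0.70, -0.18,
                      -0.20, -0.15, -0.18,  0.60), nrow = 4, byrow = TRUE,
                    dimnames = list(names(mu), names(mu)))

    i <- 1                                          # index of the response of interest (TN)
    j <- setdiff(1:4, i)                            # indices of the conditioning responses
    y_obs <- c(TP = 3.6, Chl = 2.4, Secchi = 0.7)   # observed values at a partially sampled lake

    # mu_{1|rest}  = mu_1 + Sigma_{1,rest} Sigma_{rest,rest}^(-1) (y_rest - mu_rest)
    # var_{1|rest} = Sigma_{1,1} - Sigma_{1,rest} Sigma_{rest,rest}^(-1) Sigma_{rest,1}
    S_jj_inv  <- solve(Sigma[j, j])
    cond_mean <- mu[i] + Sigma[i, j] %*% S_jj_inv %*% (y_obs - mu[j])
    cond_var  <- Sigma[i, i] - Sigma[i, j] %*% S_jj_inv %*% Sigma[j, i]
    c(mean = drop(cond_mean), variance = drop(cond_var))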


Cutoffs vs continuous measures


Identifying locations as extrapolations


Choosing IVH vs PV

With several methods of identifying extrapolations available, we now provide additional guidance on choosing among them. Cook's approach of using the maximum leverage value to define the IVH boundary may be useful for either a univariate or a joint model in a linear regression framework. However, because it depends on covariate values alone, it ignores the response data entirely. Conn et al.'s gIVH instead uses the posterior predictive variance, rather than the hat matrix, to define the hull boundary in the case of a generalized model.


Visualization and interpretation

Exploring data and taking a principled approach to identifying potential extrapolation points is often aided by visualization (and interpretation) of the data and predictions. With the LAGOS data we examine spatial plots of the lakes, with locations coded as extrapolations or predictions. Plotting this for multiple cutoff choices (as in Fig 3) shows how the choice of cutoff influences which locations are considered extrapolations. This is important from both an ecological and a management perspective. For instance, if particular areas contain many extrapolations, this might suggest that specific lake ecosystems or landscapes have characteristics influencing nutrient dynamics that are not well captured by previously collected data, and thus may require further investigation.
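
A minimal sketch of such a map, using the maps package cited above; the lake coordinates and extrapolation flags here are simulated placeholders rather than LAGOS-NE output.

    # Plot simulated lake locations, coloring extrapolations differently.
    library(maps)
    set.seed(5)
    lon    <- runif(300, -97, -67)                 # hypothetical lake longitudes
    lat    <- runif(300, 36, 49)                   # hypothetical lake latitudes
    extrap <- rbinom(300, 1, 0.05) == 1            # hypothetical extrapolation index

    map("state", xlim = c(-98, -66), ylim = c(35, 50))          # US state outlines
    points(lon[!extrap], lat[!extrap], pch = 1,  col = "blue")  # predictions
    points(lon[extrap],  lat[extrap],  pch = 17, col = "red")   # extrapolations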


Fig 3. Four cutoff approaches are compared. Lakes shown as orange diamonds and red triangles are those where predictions were beyond the 99% and 95% cutoff values, respectively, and thus considered extrapolations. The color and shape of each extrapolated lake location are determined by the cutoff value that first identifies the prediction at that location as an extrapolation.

https://doi.org/10.1371/journal.pone.0225715.g003

In addition to exploring possible extrapolation in physical space (through the plot in Fig 3), we also examine possible extrapolation in covariate space. Using either the binary or the numeric extrapolation index values, we propose a classification and regression tree (CART) analysis with the extrapolation values as the response. This classification approach allows further insight into which covariates may be influential in determining whether a newly observed location is too dissimilar to existing ones. A CART model identifies regions in covariate space where predictions are suspect and may inform future sampling efforts, since the available data have not fully characterized all lakes. A small sketch of this step follows.
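
A minimal sketch of the CART step using the rpart package; the covariates, effect sizes, and simulated extrapolation flags below are hypothetical stand-ins for the binary extrapolation index produced by the MVPV analysis.

    # Classify a binary extrapolation index by covariates to see where in
    # covariate space extrapolations concentrate. Data are simulated.
    library(rpart)
    set.seed(4)
    n <- 500
    dat <- data.frame(
      shoreline_km   = rexp(n, rate = 1 / 15),
      elevation_m    = rnorm(n, 250, 80),
      stream_density = runif(n, 0, 2)
    )
    # Hypothetical flags: long-shoreline, higher-elevation lakes are flagged more often.
    p <- plogis(-5 + 0.15 * dat$shoreline_km + 0.01 * (dat$elevation_m - 250))
    dat$extrapolation <- factor(rbinom(n, 1, p), levels = c(0, 1),
                                labels = c("prediction", "extrapolation"))

    tree <- rpart(extrapolation ~ ., data = dat, method = "class",
                  control = rpart.control(cp = 0.001, minsplit = 20))
    print(tree)   # splits (if any) show covariate regions where flagged lakes concentrate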

Model fitting


Fitting our multivariate linear model to the 8,910 lakes resulted in most lakes' predictions remaining within the extrapolation index cutoff and thus not being identified as extrapolations. We explored the use of both the trace and the determinant for obtaining a scalar representation of the multivariate posterior predictive variance, in addition to four cutoff criteria. Using MVPV(tr) with these cutoffs (max value, leverage max, 0.99 quantile, and 0.95 quantile) resulted in 0, 1, 9, and 33 multivariate response predictions being identified as extrapolations, respectively. In contrast, using MVPV(D) values combined with the four cutoffs resulted in 0, 0, 8, and 37 predictions identified as extrapolations. Unless all response variables are on the same scale we recommend the use of MVPV(D) over MVPV(tr). However, if a scale-invariant measure is not necessary, exploring MVPV(tr) in addition to MVPV(D) may reveal single response variables that merit further exploration using our conditional MVPV approach.

Fig 3 shows the spatial locations of lakes where the collective model predictions for TP, TN, Chl a, and Secchi depth were identified as extrapolations using MVPV(D) combined with the cutoff measures. As the cutoff values become more conservative, the number of identified extrapolations increases. The figure shows the cutoff level that first identifies a location as an extrapolation (e.g., red squares are locations first flagged using the 99% cutoff, but they would also be included in the extrapolations found with the 95% cutoff). This increase highlights the importance of exploring different choices of cutoff value. When the maximum value or the leverage-informed maximum of the predictive variance measure (k_max and k_lev) are used as cutoffs for determining when a prediction for an unsampled lake location should not be fully trusted, zero lakes are identified as extrapolations.

Exploratory data analysis (see S1 Fig) indicates that, with only a few exceptions, the covariate values of the lakes identified as extrapolations lie within the distribution of the data. Rather than a few key variables standing out, it appears to be a combination of variables that makes a lake an extrapolation. To further characterize the type of lake more likely to be identified as an extrapolation, we used a CART model with the binary extrapolation index results from MVPV(D) and the 0.95 quantile cutoff. This approach can help identify regions in covariate space where extrapolations are more likely to occur (Fig 4). The CART analysis suggests that the most important factors associated with extrapolation include shoreline length, elevation, stream density, and lake SDF. For example, a lake with a shoreline longer than 26 kilometers and above a certain elevation (≥ 279 m) is likely to be identified as an extrapolation when using this model to obtain predictions. This type of information is useful for ecologists trying to model lake nutrients because it suggests lakes with these characteristics may behave differently than other lakes. In fact, lake perimeter, SDF, and elevation have been shown to be associated with reservoirs relative to natural lakes [24]. Although it is beyond the scope of our paper to fully explore this notion, because our database does not differentiate between natural lakes and reservoirs, these results lend support to our approach and conclusions.


Fig 4. Each node includes the threshold and variable used to split the data. Node color indicates whether the majority of the inland lake locations sorted into that node were identified as predictions (blue) or extrapolations (red). The first row of numbers in a node indicates the number of lakes identified as predictions (right) or extrapolations (left) that were sorted into that node. The second row indicates the percentage of lakes identified as predictions (left) or extrapolations (right), with the terminal nodes (square nodes) also including the percentage of records sorted by the decision tree.

https://doi.org/10.1371/journal.pone.0225715.g004

We also employed the conditional single-variable extrapolation-through-predictive-variance approach to leverage all information known about a lake when considering whether a prediction of a single response variable (here, TN) is an extrapolation (Fig 5). The four cutoffs resulted in 0, 2, 73, and 386 lake TN predictions out of 5,031 being identified as extrapolations. To characterize the type of lake more likely to be identified as an extrapolation, we used a CART model with the 95% cutoff criterion. CART revealed that the most important factors associated with extrapolation were latitude, maximum depth, and the watershed-to-lake-area ratio. Latitude may be expected, as many of the lakes without TN measurements are located in the northern region of the study area. An additional visualization and a table exploring extrapolated lakes and their covariate values may be found in S1 Tables.


Fig 5. Four cutoff approaches are compared. Lakes shown as blue circles are locations where TN predictions were not identified as extrapolations under any cutoff choice. Lakes shown as red squares, orange triangles, and yellow diamonds are those where predictions were beyond the cutoff values and thus considered extrapolations. The color and shape of each extrapolated lake location are determined by the cutoff value that first identifies the prediction at that location as an extrapolation.

https://doi.org/10.1371/journal.pone.0225715.g005

We have presented several approaches for identifying and characterizing potential extrapolation points within multivariate response data. Ecological research is often faced with the challenge of explaining processes at broad scales with limited data. Financial, temporal, and logistical constraints often prevent research efforts from fully exploring an ecosystem or ecological setting. Instead, ecologists rely on predictions made from a limited amount of available data that may not fully represent the breadth of the system under study. By better understanding when extrapolation is occurring, scientists can avoid making unsound inferences.

In our inland lakes example we addressed the issue of large-scale predictions to fill in missing data using the joint linear model presented by Wagner and Schliep [18]. With our approach for identifying and characterizing extrapolation in a multivariate setting, we provide numeric measures associated with extrapolation (MVPV, CMVPV, R(C)MVPV) that allow the focus to be on predictions for all response variables or for a single response variable while conditioning on the others. Each of these measures, when paired with a cutoff criterion, identifies novel locations that are extrapolations. Our recommendations for visualizing and interpreting these extrapolated lakes are useful for future analyses and predictions that inform policy and management decisions. Insight into identified extrapolations and their characteristics also suggests additional sampling locations to consider in future work. In this analysis we found that certain lakes, such as those located at relatively higher elevations in our study area, are more likely to be identified as extrapolations. The available data may therefore not fully represent these types of lakes, resulting in them being poorly predicted or identified as extrapolations.

The tools outlined in this work provide novel insights into identifying and characterizing extrapolations in multivariate response settings. Several extensions of this work are possible but not explored in this paper. In addition to the A- and D-optimality approaches (trace and determinant, respectively) used to obtain scalar representations of the covariance matrices, one may also explore E-optimality (the maximum eigenvalue) as an additional criterion. This approach focuses on the variance in the first principal component of the predictive variance matrix and, like the trace, is not scale-invariant. Our work takes advantage of posterior predictive inference in a Bayesian setting to obtain an estimate of the variance of the predictive mean response vector for each lake. However, a frequentist approach using simulation-based methods could also provide an estimate of this variance through non-parametric or parametric bootstrapping (a comparison of the two for spatial abundance estimates may be found in Hedley and Buckland [25]), and the extrapolation measures could then be obtained through the trace and/or determinant of this variance.
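
For reference, the E-optimality summary mentioned above is a one-line computation on a prediction covariance matrix; the matrix here is a hypothetical placeholder.

    # E-optimality style summary: largest eigenvalue of a prediction covariance matrix.
    V <- matrix(c(0.40, 0.12, 0.08,
                  0.12, 0.35, 0.10,
                  0.08, 0.10, 0.30), nrow = 3, byrow = TRUE)
    mvpv_E <- max(eigen(V, symmetric = TRUE, only.values = TRUE)$values)
    mvpv_E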

This work results in the identification of extrapolated lake locations as well as a better understanding of the distinct covariate space they occupy. The resulting caution in using joint nutrient models to estimate water quality variables at lakes with partially or completely unsampled measures is necessary for larger goals such as estimating the overall combined levels of water quality variables in all US inland lakes. In addition, under- or overestimating concentrations of key nutrients such as TN and TP can lead to misinformed management strategies, which may have deleterious effects on water quality and lake ecosystems. Identifying the lake and landscape characteristics associated with extrapolation locations can further our understanding of natural and anthropogenic sources of nutrients in lakes that are not well represented in the sampled population. In our database, TP is sampled more often than TN, likely because of the conventional wisdom that inland waters are P-limited, with P contributing the most to eutrophication [26]. However, nitrogen has been shown to be an important nutrient in eutrophication in some lakes and regions [27], and may be just as important to sample to fully understand lake eutrophication. Our results show it is possible to predict TN when other water quality variables are available, but it would be better if TN were sampled more often.

The joint model used in this work can be improved in several regards: no spatial component is included, the response variables are averages over several years' worth of data so temporal variation is not considered, and data from different years are given equal weight. The model we use to fit these data is a simple one, but the approach presented here can be applied to more complicated models. In a sample-based approach using a Bayesian framework, the MVPV and CMVPV values are computed from the MCMC samples and are thus independent of these model design choices.

A deeper understanding of where extrapolation is occurring will allow researchers to propagate this uncertainty forward. Follow-up analyses using model-based predictions need to acknowledge that some predictions are less trustworthy than others. This approach and our analysis show that while a model may be able to produce an estimate and a confidence or prediction interval, that does not mean the truth is captured or that the assumed relationship persists, especially outside the range of the observed data. The methods outlined here will serve to guide future scientific inquiries involving joint distribution models.

Supporting information

S1 Fig. Violin plots of covariate densities with extrapolation points plotted.

https://doi.org/10.1371/journal.pone.0225715.s001

S1 Tables. Tables of covariate values for lakes identified as extrapolations using MVPV(D) and CMVPV for TN.

https://doi.org/10.1371/journal.pone.0225715.s002

Acknowledgments

We thank the LAGOS Continental Limnology Research Team for helpful discussions throughout the process of this manuscript. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

This draft manuscript is distributed solely for purposes of scientific peer review. Its content is deliberative and predecisional, so it must not be disclosed or released by reviewers. Because the manuscript has not yet been approved for publication by the US Geological Survey (USGS), it does not represent any official finding or policy.

  • 7. Cook RD. Detection of Influential Observation in Linear Regression. Technometrics. 1977.
  • 8. Conn PB, Johnson DS, Boveng PL. On extrapolating past the range of observed data when making statistical predictions in ecology. PLoS ONE. 2015.
  • 15. Becker RA, Wilks AR, Brownrigg R, Minka TP. maps: Draw Geographical Maps. R package version 2.3-6. 2013.
  • 16. Mahalanobis PC. On the generalized distance in statistics. National Institute of Science of India. 1936.
  • 17. Etherington TR. Mahalanobis distances and ecological niche modelling: correcting a chi-squared probability error. PeerJ. 2019.
  • 18. Wagner T, Schliep EM. Combining nutrient, productivity, and landscape-based regressions improves predictions of lake nutrients and provides insight into nutrient coupling at macroscales. Limnology and Oceanography. 2018.
  • 19. Gentle JE. Matrix Algebra: Theory, Computations, and Applications in Statistics; 2007.
  • 20. Lottig NR, Wagner T, Henry EN, Cheruvelil KS, Webster KE, Downing JA, et al. Long-term citizen-collected data reveal geographical patterns and temporal trends in lake water clarity. PLoS ONE. 2014.
  • 21. Cook RD. Influential observations in linear regression. Journal of the American Statistical Association. 1979.
  • 22. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2016. URL: http://www.R-project.org/.
  • 24. Doubek JP, Carey CC. Catchment, morphometric, and water quality characteristics differ between reservoirs and naturally formed lakes on a latitudinal gradient in the conterminous United States. Inland Waters. 2017.
  • 25. Hedley SL, Buckland ST. Spatial models for line transect sampling. Journal of Agricultural, Biological, and Environmental Statistics. 2004.
  • 26. Conley DJ, Paerl HW, Howarth RW, Boesch DF, Seitzinger SP, Havens KE, et al. Ecology—Controlling eutrophication: Nitrogen and phosphorus; 2009.
  • 27. Paerl HW, Xu H, McCarthy MJ, Zhu G, Qin B, Li Y, et al. Controlling harmful cyanobacterial blooms in a hyper-eutrophic lake (Lake Taihu, China): The need for a dual nutrient (N & P) management strategy. Water Research. 2011.

Think your dog can understand words? This scientist says you might be right

New study on soundboard-trained dogs offers insights on their word comprehension levels.



When Alexis Devine was having a tough time one day, her sheepadoodle dog named Bunny consoled her — just not in a way that most people might expect. 

"I was talking to my partner over FaceTime and I was crying … and Bunny pressed 'no.' I wasn't looking at her, I was just engaged in the conversation," said Devine. 

After what seemed like a plea from Bunny for her to not cry, Devine said, "There was a pause, and then she pressed 'love you.'" 

Bunny communicates with Devine, the author of I Am Bunny, through a soundboard device: a floor mat with pressable buttons that play pre-recorded words aloud when a dog presses them. To date, Devine says Bunny can "speak" about 100 words.

On TikTok, Bunny's prowess delights over 8.5 million followers, and she is among many other pet dogs on the video-sharing platform — Oski the pug, Copper the Lab, Flambo the Aussie — who are shown to communicate with their owners using soundboards. 


The rise of "talking" dogs further piqued the interest of Federico Rossano, an associate professor in the department of cognitive sciences at the University of California, San Diego, whose research interests include the cognitive abilities of different species.

"My next reaction was like, somebody should study it because I think once you have hundreds, thousands of people doing this [online], we should know what's going on," Rossano, who also leads the Comparative Cognition Lab at his university, told The Current's Matt Galloway.  

Published in the journal PLOS ONE last month, Rossano's findings suggest dogs in the study were able to understand certain words, contributing to the possibility of an enhanced bond between dogs and their owners. 

As a dog owner, Devine says the results aren't unexpected.

"I think it's not a surprise to anyone who lives with dogs that they can make associations with objects and words." 

What the findings mean 

To investigate whether dogs trained to use soundboards actually understand the words that are played when a button is pressed, Rossano and his colleagues conducted two controlled experiments with 59 dogs. 

Humans spoke, or pressed buttons that played, one of four words for the dogs to respond to, including "out/outside," "play/toy," "food/eat/dinner/hungry," and a nonsense word, "daxing."

The researchers aimed to rule out variables such as a dog responding to its owner's cues (like the owner putting on shoes to go outside), the identity of the human saying the word, or the dog having memorized the location of buttons on the board.

"Part of the new idea or challenge here was that, very often, you might think that the dog understands the word 'treat.' But really what the dog sees is you moving towards the cupboard and opening [it]," said Rossano. 

"We wanted to show that this animal is actually paying attention to the words, the sounds." 


Daphna Buchsbaum, an assistant professor of cognitive and psychological sciences at Brown University in Providence, R.I., who was not involved in the research, told CBC the study is a first step in "systematically trying to see how the dogs use these buttons, how they respond to them, and if the dogs are actually associating the buttons with particular outcomes in the world." 

"What the researchers found in this case was that for at least some of the meanings, there did seem to be a relationship between the meaning of the words to us as humans and how the dogs responded." 

Moving forward, Buchsbaum says it would be interesting to see what would happen if the button words were more abstract, or even existential. 

"I think it's a lot easier to ask questions [like] does the dog understand things about food, outside and playing, than do they understand things about outer space, or the future, or the meaning of life," Buchsbaum said. 


Greater implications 

Rossano is realistic about what he has found. At the study's start, he said that he was skeptical about finding out if dogs were using soundboards in a way that's more than "using a human as a vending machine." 

However, there were certain times during the studies that caused him to think differently about a dog's communication and comprehension capabilities — including seeing them negotiate in a human-like way.

He gave the example of how a dog could respond to a human saying that they can't go outside, and not just settle for the human's first response. 

"I see the dogs doing a 'back-and-forth' with the humans," said Rossano. 


Rossano's research is ongoing, and he sees the current findings and continuing developments in the field as an opportunity to improve the welfare of our pet dogs. 

"Hopefully [it] gives them a little more voice and control over their life, having them tell you what they need instead of us just trying to guess it." 

Buchsbaum agrees, and says that enhanced communication could be helpful with dogs who "serve many working roles in our society," including scent detection dogs and assistance dogs.



Devine says that Bunny has changed her own expectations of the special relationship that's possible with a dog. 

"We've got beautiful communication both with, and without, the buttons. I think her having the buttons really allowed me to more intentionally interpret aspects of her body language, and it definitely made me a better active listener," she said.

"In order to add words that were salient to her experience that she was going to want to use, I really had to pay attention to the things she was smelling and looking at, and things that caused any sort of emotional arousal within her." 

ABOUT THE AUTHOR


Catherine Zhu is a writer and associate producer for CBC Radio’s The Current. Her reporting interests include science, arts and culture and social justice. She holds a master's degree in journalism from the University of British Columbia. You can reach her at [email protected].

Audio produced by Alison Masemann
