
Interpreting the Intercept in a Regression Model

by Karen Grace-Martin

Interpreting the Intercept in a regression model isn’t always as straightforward as it looks.

Here’s the definition: the intercept (often labeled the constant) is the expected value of Y when all X=0. But that definition isn’t always helpful. So what does it really mean?

Regression with One Predictor X

Start with a very simple regression equation, with one predictor, X.

If X sometimes equals 0, the intercept is simply the expected value of Y at that value. In other words, it’s the mean of Y at one value of X. That’s meaningful.

If X never equals 0, then the intercept has no intrinsic meaning. You literally can’t interpret it. That’s actually fine, though. You still need that intercept to give you unbiased estimates of the slope and to calculate accurate predicted values. So while the intercept has a purpose, it’s not meaningful.

Both these scenarios are common in real data. In scientific research, the purpose of a regression model is one of two things.

One is to understand the relationship between predictors and the response. If so, and if X never equals 0, there is no interest in the intercept. It doesn't tell you anything about the relationship between X and Y.

So whether the value of the intercept is meaningful or not, many times you're just not interested in it. It's not answering an actual research question.

The other purpose is prediction. You do need the intercept to calculate predicted values.  In market research or data science, there is usually more interest in prediction, so the intercept is more important here.

When A Meaningful Intercept is Important

When X never equals 0 but you want a meaningful intercept, it's not hard to adjust the model to get one: simply center X.


If the mean of X is, say, 20, it will look something like: NewX = X – 20.

That’s it.

Just use NewX in your model instead of X. Now the intercept has a meaning. It’s the mean value of Y at the mean value of X.
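As a quick sketch (made-up numbers, with a hand-rolled least-squares fit rather than any particular package), centering leaves the slope alone and moves the intercept to the mean of Y:

```python
# Sketch (hypothetical data): centering X moves the intercept to the mean of Y
# at the mean of X, without changing the slope.

def ols(x, y):
    """Simple least-squares fit: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

x = [18, 20, 22, 24, 26]      # X never equals 0 here
y = [53, 61, 64, 71, 76]

b0, b1 = ols(x, y)            # intercept extrapolated to X = 0
mx = sum(x) / len(x)
xc = [xi - mx for xi in x]    # centered predictor: NewX = X - mean(X)
b0c, b1c = ols(xc, y)

print(b1, b1c)                # the two slopes are identical
print(b0c, sum(y) / len(y))   # centered intercept equals the mean of Y
```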

Interpreting the Intercept in Regression Models with Multiple Xs

It all gets a little trickier when you have more than one X.

The definition still holds: the intercept is the expected value of Y when all X=0.

The emphasis here is on ALL.

And this is where it gets complicated. If all Xs are numerical, it’s an uncommon (though not unheard of) situation for every X to have values of 0. This is often why you’ll hear that intercepts aren’t important or worth interpreting.

But you always have the option to center all numerical Xs to get a meaningful intercept.

And when some Xs are categorical, the situation is different. Most of the time, categorical variables are dummy coded. Dummy coded variables have values of 0 for the reference group and 1 for the comparison group. Since the intercept is the expected value of Y when X=0, it is the mean value of Y only for the reference group (when all other Xs = 0). So having dummy-coded categorical variables in your model can give the intercept more meaning.

This is especially important to consider when the dummy coded predictor is included in an interaction term.  Say for example that X1 is a continuous variable centered at its mean.  X2 is a dummy coded predictor, and the model contains an interaction term for X1*X2.

The intercept is the mean value of Y for the reference group (at the mean of X1). The mean value of Y for the comparison group is the intercept plus the coefficient for X2.

It’s hard to give an example because it really depends on how X1 and X2 are coded. So I put together six situations in this follow up article: How to Interpret the Intercept in 6 Linear Regression Examples
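Still, as a rough numerical sketch (hypothetical data, a dummy variable only, no interaction term), you can see the reference-group logic directly:

```python
# Hypothetical sketch: with a dummy-coded predictor, the intercept is the mean
# of Y for the reference group (x2 = 0), and intercept + coefficient is the
# mean of Y for the comparison group (x2 = 1).

y_ref  = [10, 12, 14]    # reference group  (x2 = 0)
y_comp = [20, 22, 24]    # comparison group (x2 = 1)

x2 = [0] * len(y_ref) + [1] * len(y_comp)
y  = y_ref + y_comp

# simple least-squares fit of y on the dummy x2
n = len(y)
mx, my = sum(x2) / n, sum(y) / n
b  = sum((xi - mx) * (yi - my) for xi, yi in zip(x2, y)) / \
     sum((xi - mx) ** 2 for xi in x2)
b0 = my - b * mx

print(b0)      # 12.0 -> mean of the reference group
print(b0 + b)  # 22.0 -> mean of the comparison group
```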


Reader Interactions


November 19, 2022 at 3:33 am

Please, how can I interpret a regression equation with a negative intercept, Y = −2.73 + 5.4X? I came across a negative intercept in a regression model.


December 21, 2022 at 3:33 pm

There’s really no difference in interpreting a negative intercept, unless negative values aren’t possible for that variable. Not every intercept has a meaningful interpretation.


October 18, 2019 at 7:23 am

Hello, I have a question for my test that I hope I will find the answer to here.

I made a calibration curve for my toxin with 6 different concentrations (in ng/mL) and got R² = 0.99 and y = 10.273x − 20.395. Is it logical that the accuracy of the intercept is higher than that of the slope? And why does this happen?


June 8, 2022 at 1:38 pm

Thank you, and what should I do now?


September 10, 2019 at 11:24 am

Thanks so much for your explanations, Karen! I have a question: can I interpret the intercept in a regression model where the intercept is significant and there are two other predictors (say X and Z), where X can never be zero but Z can be 0? In my case Y is a change score. If the intercept is not equal to zero and is significant, can I infer from this that there is an overall change?

October 28, 2019 at 10:14 am

The intercept is only interpretable if all predictors (X and Z) can be zero.


September 9, 2019 at 1:43 am

Hi, I'm running a logistic regression on my independent and dependent variables. From the regression coefficients I want to calculate a risk score of the independent variables on the dependent variable. But in the regression model, a few variables have a significant association and others have no significant relation with my dependent variable. So when calculating my score, should I use the intercept of the model with all the significant and non-significant independent variables, or should I run another logistic regression with only the significant independent variables and take that intercept value? Please guide me.


August 12, 2019 at 10:59 am

Thanks, this is useful.


March 19, 2019 at 9:12 am

Hi, please answer me: can the intercept be zero in regression analysis?

March 21, 2019 at 4:12 pm

Sure. You just don’t want to force it to be.


June 21, 2020 at 11:05 pm

Thank you Karen for this answer


October 21, 2022 at 10:49 am

Please, I would like to ask about LOD using a regression calibration line. Is it more beneficial to set x = 0 and y = 0 or not? I have not tried this in the lab, but the variables are so related theoretically that at x = 0, y must also equal 0. Yet when I do this, the regression equation still contains a value for the intercept, even though I set it to 0. Can you explain why? And is it correct to set the intercept to 0 at x = 0 when calculating the LOD?


November 30, 2018 at 3:24 am

I'm using my model to calculate predicted values, so I need to include the constant. I'm concerned, however, that while my 3 regression variables are significant, the constant is not, and that the constant being non-significant means I can't be confident about the predicted values.

Thanks John

November 30, 2018 at 11:49 am

The p-value for the constant isn’t important. It’s testing the null hypothesis that the constant = 0.

So even if the constant isn’t significantly different from 0, including it will still give you more accurate predicted values AND more accurate slopes than if you eliminate it. When you eliminate it, you set it to 0.
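A small made-up example shows how much the slope can move when the constant is dropped (forced to 0):

```python
# Sketch (made-up data): forcing the intercept to 0 changes the slope estimate.
x = [10, 11, 12, 13, 14]
y = [105, 107, 110, 112, 115]   # the true relationship has a large intercept

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# slope with an intercept
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
     sum((xi - mx) ** 2 for xi in x)

# slope when the intercept is forced to 0: b = Σxy / Σx²
b1_no_int = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

print(b1, b1_no_int)   # roughly 2.5 vs roughly 9.1 -- very different
```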


June 17, 2023 at 10:29 am

Thanks Karen!


January 17, 2018 at 4:18 am

Everything is very open with a clear explanation of the issues. It was really informative. Your site is very helpful. Many thanks for sharing!


March 21, 2017 at 10:56 am

Thank you for this. May I suggest that it may be really helpful to use an example with real data to help explain how this works? For me (and I expect for others too), it would make it much easier to understand.


September 13, 2018 at 8:19 pm

I 100% agree. YES


October 25, 2016 at 5:36 am

Why is the value of the intercept zero? If the intercept is not in the model, then what happens?


July 26, 2016 at 3:04 am

Is this possible? The intercept term of a regression model is negative? Can you make it clear with an easy example, please?


July 24, 2016 at 4:02 am

Case Summaries (Imp by Type)

Type | N | Mean
1 | 7 | 5.86
2 | 4 | 5.75
3 | 89 | 4.61
Total | 100 | 4.74

How to interpret the above? 1 = Manufacturer; 2 = Distributor; 3 = Retailer of a product.

RECODE Type (1=0) (2=0) (3=1) INTO retail.
EXECUTE.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT Imp
  /METHOD=ENTER Size retail.

Anil Bandyopadhyay


August 13, 2015 at 4:26 pm

Can we use a negative intercept? I have two negative intercepts. What can I do?

August 20, 2015 at 5:29 pm

Yes, intercepts can be negative even if Y can’t. This usually occurs when none of the X values are close to 0.
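A tiny made-up example: all the Y values are positive, but because the X values sit far from 0, the fitted line crosses the Y axis below zero:

```python
# Sketch (made-up numbers): every Y here is positive, yet the intercept is
# negative, because the X values are all far from 0 and the line is
# extrapolated back to the Y axis.
x = [50, 60, 70, 80]
y = [5, 15, 25, 35]          # follows y = x - 45 exactly

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
     sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

print(b0, b1)                 # -45.0 1.0
```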


May 19, 2015 at 3:23 am

What if the coefficient of regression is 1.797?


May 13, 2015 at 8:51 am

Why is the intercept negative anyway?


March 25, 2015 at 4:48 am

What does it mean if the intercept is not negative? I'm confused: I've found some intercepts are positive and some are negative. Please clarify.


January 9, 2015 at 1:53 pm

Hi, what if you use a tobit model, where the dependent variable takes values of zero or more, and you get a negative intercept? You run the tobit model and observe a negative constant. What does this mean in this case?


January 2, 2015 at 4:46 am

Which is the fixed and which the estimated value in the regression equation, a or b? Please reply ASAP. ☺


December 16, 2014 at 1:40 pm

Wow. Thanks, so helpful.


November 10, 2014 at 10:19 am

Does the value of the intercept ever change, for example, when you are trying to interpret significant interaction terms? Say, I have three predictors, X1, X2, and X3 in a significant interaction. X1 is a four level categorical, and the other two are centered and continuous. So, the intercept can be defined as one level of X1, X2 = mean, and X3 = mean. And I can see that there is a 3-way interaction in which for one level of X1 (relative to the intercept as defined above), as X2 goes up and X3 goes up by one unit, I need to adjust the estimate for the simple effect of that level of X1 by some amount. But I am confused how this then relates back to the intercept. Is it really still defined as above? Or once I start considering the interaction, do I also change the designation for X2 and X3 in the intercept? Thanks.


September 2, 2014 at 9:02 pm

What if your intercept isn't significant and you are using a dummy variable? Should you still use it in your prediction equation?


November 26, 2019 at 12:31 am

I have the regression equation y = 74.626 + 1.2x; how can the meaning of the y-intercept be interpreted?


May 10, 2023 at 8:48 am

Why would you use the intercept even though it is not significant? Are there any citable sources for using the intercept even though it is insignificant? Thanks


August 22, 2014 at 11:17 am

I would like to ask: when dealing with two independent variables, and our a priori expectation for our coefficients is that they are greater than zero, what if it happens that the intercept is negative? Are we saying this is significant or insignificant?


April 21, 2014 at 9:16 am

I built 3 index variables and several dummy variables, then was told to test whether there is a relationship between how satisfied employees are in their job (index variable 1) and how they see the environment around them. I ran the regression, and my two independent variable indexes were significant, but my constant was not. The adjusted R-square was .608 and the F sig. was .000.

What am I doing wrong? Or what can I interpret from my results?


March 11, 2014 at 9:52 pm

In a negative binomial regression, what would it mean if the Exp(B) value for the intercept falls below the lower limit of the 95% Confidence Interval?

April 4, 2014 at 9:57 am

Hmm, not sure I understand your question. CI for what?


December 4, 2013 at 12:55 pm

Hi! What happens if all of the variables that had a significant regression coefficient can be 0? (I have four Xs; 3 of them have a significant coefficient and can be 0, as they are either dummies or on a scale starting at 0 with 0s in the sample, but one of the Xs cannot be 0. It's also the one with the non-significant coefficient.) Thanks!

December 9, 2013 at 10:50 am

If ANY of the Xs can’t be 0, then the intercept doesn’t mean anything. Or rather, it’s just an anchor point, but it’s not directly interpretable.


November 25, 2013 at 5:20 pm

I'd also like to ask: if we decrease the sample by half, will SSE, SSR, and SST increase or decrease? A bit confused.

November 25, 2013 at 5:58 pm

They would all shrink, roughly in half: sums of squares are sums over observations, so they scale with sample size. What isn't directly affected by sample size is the mean squares (each sum of squares divided by its degrees of freedom), which estimate the same quantities regardless of n.

November 25, 2013 at 5:17 pm

Does this mean that if education equals zero, i.e. no education, then the expected mean of y = −5?


November 23, 2013 at 10:28 am

Question: if wage = −5 + 10 × years of education, and wage is measured in 1000s, how do you interpret the coefficient, and does the intercept make sense?

November 25, 2013 at 3:29 pm

This sounds like a homework question, so I’m going to try to answer only by getting you to think through it.

Since the intercept is ALWAYS the mean of Y (in 1000s of dollars, or whatever the currency is) when X=0, it will only be meaningful if X=0 is meaningful AND if there are examples in the data set. Is there anyone in the data set with years of education = 0?


February 2, 2013 at 6:25 pm

I'd like to know why there is a need for a column of ones in the model to account for the intercept. I would need a basic answer, since I'm not a mathematician. Thank you.

March 4, 2013 at 11:08 am

In the X matrix, each column is the value of the X that is multiplied by that regression coefficient.

Since the intercept isn’t multiplied by any values of X, we put in 1s.

It makes all the matrix algebra work out.
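A minimal sketch of that algebra (made-up data, the 2×2 normal equations solved by hand with Cramer's rule):

```python
# Sketch: in the design matrix X, the intercept gets a column of 1s, so it is
# treated as the coefficient of a "predictor" that always equals 1, and the
# same matrix algebra (X'X)b = X'y covers it.  Made-up data:
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]              # exactly y = 1 + 2x

X = [[1.0, xi] for xi in x]           # first column: the 1s for the intercept

# build the normal equations (X'X) b = X'y
XtX = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(2)]

# solve the 2x2 system by Cramer's rule
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
b0 = (XtX[1][1] * Xty[0] - XtX[0][1] * Xty[1]) / det   # intercept
b1 = (XtX[0][0] * Xty[1] - XtX[1][0] * Xty[0]) / det   # slope

print(b0, b1)                          # 1.0 2.0
```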



Hypothesis Test for Regression Slope

This lesson describes how to conduct a hypothesis test to determine whether there is a significant linear relationship between an independent variable X and a dependent variable Y .

The test focuses on the slope of the regression line

Y = β0 + β1X

where β0 is a constant, β1 is the slope (also called the regression coefficient), X is the value of the independent variable, and Y is the value of the dependent variable.

If we find that the slope of the regression line is significantly different from zero, we will conclude that there is a significant relationship between the independent and dependent variables.

Test Requirements

The approach described in this lesson is valid whenever the standard requirements for simple linear regression are met.

  • The dependent variable Y has a linear relationship to the independent variable X .
  • For each value of X, the probability distribution of Y has the same standard deviation σ.
  • The Y values are independent.
  • The Y values are roughly normally distributed (i.e., symmetric and unimodal). A little skewness is OK if the sample size is large.

The test procedure consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

If there is a significant linear relationship between the independent variable X and the dependent variable Y , the slope will not equal zero.

H0: β1 = 0

Ha: β1 ≠ 0

The null hypothesis states that the slope is equal to zero, and the alternative hypothesis states that the slope is not equal to zero.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. The plan should specify the following elements.

  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use a linear regression t-test (described in the next section) to determine whether the slope of the regression line differs significantly from zero.

Analyze Sample Data

Using sample data, find the standard error of the slope, the slope of the regression line, the degrees of freedom, the test statistic, and the P-value associated with the test statistic. The approach described in this section is illustrated in the sample problem at the end of this lesson.

Hypothetical regression output:

Predictor Coef SE Coef T P
Constant 76 30 2.53 0.01
X 35 20 1.75 0.04

  • Standard error. Many statistics software packages report the standard error of the slope directly. It can also be computed from the sample data:

SE = sb1 = sqrt[ Σ(yi − ŷi)² / (n − 2) ] / sqrt[ Σ(xi − x̄)² ]

  • Slope. Like the standard error, the slope of the regression line will be provided by most statistics software packages. In the hypothetical output above, the slope is equal to 35.

  • Degrees of freedom. For simple linear regression, the degrees of freedom are DF = n − 2, where n is the number of observations.

  • Test statistic. The test statistic is a t statistic: the slope divided by its standard error.

t = b1 / SE

  • P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a t statistic, use the t Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level , and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding

The local utility company surveys 101 randomly selected customers. For each survey participant, the company collects the following: annual electric bill (in dollars) and home size (in square feet). Output from a regression analysis appears below.

Annual bill = 0.55 * Home size + 15

Predictor Coef SE Coef T P
Constant 15 3 5.0 0.00
Home size 0.55 0.24 2.29 0.01

Is there a significant linear relationship between annual bill and home size? Use a 0.05 level of significance.

The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

  • State the hypotheses.

H0: The slope of the regression line is equal to zero.

Ha: The slope of the regression line is not equal to zero.

  • Formulate an analysis plan . For this analysis, the significance level is 0.05. Using sample data, we will conduct a linear regression t-test to determine whether the slope of the regression line differs significantly from zero.

  • Analyze sample data. We get the slope (b1) and the standard error (SE) from the regression output.

b1 = 0.55       SE = 0.24

We compute the degrees of freedom and the t statistic, using the following equations.

DF = n - 2 = 101 - 2 = 99

t = b1/SE = 0.55/0.24 = 2.29

where DF is the degrees of freedom, n is the number of observations in the sample, b1 is the slope of the regression line, and SE is the standard error of the slope.
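These two computations are simple enough to check by hand, or in a few lines of Python:

```python
# Recomputing the pieces of the worked example (pure arithmetic):
n  = 101    # number of surveyed customers
b1 = 0.55   # slope from the regression output
se = 0.24   # standard error of the slope

df = n - 2        # degrees of freedom for simple linear regression
t  = b1 / se      # test statistic

print(df)             # 99
print(round(t, 2))    # 2.29
```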

  • Interpret results . Since the P-value (0.0242) is less than the significance level (0.05), we reject the null hypothesis and conclude that there is a significant linear relationship between annual bill and home size.


13.6 Testing the Regression Coefficients

Learning Objectives

  • Conduct and interpret a hypothesis test on individual regression coefficients.

Previously, we learned that the population model for the multiple regression equation is

[latex]\begin{eqnarray*} y & = & \beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_kx_k +\epsilon \end{eqnarray*}[/latex]

where [latex]x_1,x_2,\ldots,x_k[/latex] are the independent variables, [latex]\beta_0,\beta_1,\ldots,\beta_k[/latex] are the population parameters of the regression coefficients, and [latex]\epsilon[/latex] is the error variable.  In multiple regression, we estimate each population regression coefficient [latex]\beta_i[/latex] with the sample regression coefficient [latex]b_i[/latex].

In the previous section, we learned how to conduct an overall model test to determine if the regression model is valid.  If the outcome of the overall model test is that the model is valid, then at least one of the independent variables is related to the dependent variable—in other words, at least one of the regression coefficients [latex]\beta_i[/latex] is not zero.  However, the overall model test does not tell us which independent variables are related to the dependent variable.  To determine which independent variables are related to the dependent variable, we must test each of the regression coefficients.

Testing the Regression Coefficients

For an individual regression coefficient, we want to test if there is a relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex].

  • No Relationship .  There is no relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex].  In this case, the regression coefficient [latex]\beta_i[/latex] is zero.  This is the claim for the null hypothesis in an individual regression coefficient test:  [latex]H_0: \beta_i=0[/latex].
  • Relationship.  There is a relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex].  In this case, the regression coefficients [latex]\beta_i[/latex] is not zero.  This is the claim for the alternative hypothesis in an individual regression coefficient test:  [latex]H_a: \beta_i \neq 0[/latex].  We are not interested if the regression coefficient [latex]\beta_i[/latex] is positive or negative, only that it is not zero.  We only need to find out if the regression coefficient is not zero to demonstrate that there is a relationship between the dependent variable and the independent variable. This makes the test on a regression coefficient a two-tailed test.

In order to conduct a hypothesis test on an individual regression coefficient [latex]\beta_i[/latex], we need to use the distribution of the sample regression coefficient [latex]b_i[/latex]:

  • The mean of the distribution of the sample regression coefficient is the population regression coefficient [latex]\beta_i[/latex].
  • The standard deviation of the distribution of the sample regression coefficient is [latex]\sigma_{b_i}[/latex].  Because we do not know the population standard deviation we must estimate [latex]\sigma_{b_i}[/latex] with the sample standard deviation [latex]s_{b_i}[/latex].
  • The distribution of the sample regression coefficient follows a normal distribution.

Steps to Conduct a Hypothesis Test on a Regression Coefficient

  • Write down the null and alternative hypotheses in terms of the regression coefficient being tested:

[latex]\begin{eqnarray*} H_0: &  &  \beta_i=0 \\ \\ H_a: &  & \beta_i \neq 0 \end{eqnarray*}[/latex]

  • Collect the sample information for the test and identify the significance level [latex]\alpha[/latex].
  • Find the p -value for the test using the [latex]t[/latex]-distribution, where the test statistic and degrees of freedom are

[latex]\begin{eqnarray*}t & = & \frac{b_i-\beta_i}{s_{b_i}} \\ \\ df &  = & n-k-1 \\  \\ \end{eqnarray*}[/latex]

  • Compare the p -value to the significance level and state the outcome of the test:
  • If the p -value [latex]\leq \alpha[/latex], reject the null hypothesis in favour of the alternative hypothesis.  The results of the sample data are significant.  There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
  • If the p -value [latex]\gt \alpha[/latex], do not reject the null hypothesis.  The results of the sample data are not significant.  There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] may be correct.
  • Write down a concluding sentence specific to the context of the question.

The required [latex]t[/latex]-score and p -value for the test can be found on the regression summary table, which we learned how to generate in Excel in a previous section.

The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income.  A sample of 25 employees at the company is taken and the data is recorded in the table below.  The employee’s income is recorded in $1000s and the job satisfaction score is out of 10, with higher values indicating greater job satisfaction.

Job Satisfaction | Hours of Unpaid Work per Week | Age | Income ($1000s)
4 | 3 | 23 | 60
5 | 8 | 32 | 114
2 | 9 | 28 | 45
6 | 4 | 60 | 187
7 | 3 | 62 | 175
8 | 1 | 43 | 125
7 | 6 | 60 | 93
3 | 3 | 37 | 57
5 | 2 | 24 | 47
5 | 5 | 64 | 128
7 | 2 | 28 | 66
8 | 1 | 66 | 146
5 | 7 | 35 | 89
2 | 5 | 37 | 56
4 | 0 | 59 | 65
6 | 2 | 32 | 95
5 | 6 | 76 | 82
7 | 5 | 25 | 90
9 | 0 | 55 | 137
8 | 3 | 34 | 91
7 | 5 | 54 | 184
9 | 1 | 57 | 60
7 | 0 | 68 | 39
10 | 2 | 66 | 187
5 | 0 | 50 | 49

Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:

[latex]\begin{eqnarray*} \hat{y} & = & 4.7993-0.3818x_1+0.0046x_2+0.0233x_3 \\ \\ \hat{y} & = & \mbox{predicted job satisfaction score} \\ x_1 & = & \mbox{hours of unpaid work per week} \\ x_2 & = & \mbox{age} \\ x_3 & = & \mbox{income (\$1000s)}\end{eqnarray*}[/latex]

At the 5% significance level, test the relationship between the dependent variable “job satisfaction” and the independent variable “hours of unpaid work per week”.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & \beta_1=0 \\   H_a: & & \beta_1 \neq 0 \end{eqnarray*}[/latex]

The regression summary table generated by Excel is shown below:

Multiple R 0.711779225
R Square 0.506629665
Adjusted R Square 0.436148189
Standard Error 1.585212784
Observations 25
Regression 3 54.189109 18.06303633 7.18812504 0.001683189
Residual 21 52.770891 2.512899571
Total 24 106.96
Intercept 4.799258185 1.197185164 4.008785216 0.00063622 2.309575344 7.288941027
Hours of Unpaid Work per Week -0.38184722 0.130750479 -2.9204269 0.008177146 -0.65375772 -0.10993671
Age 0.004555815 0.022855709 0.199329423 0.843922453 -0.04297523 0.052086864
Income ($1000s) 0.023250418 0.007610353 3.055103771 0.006012895 0.007423823 0.039077013
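As a sanity check on the table, each t statistic is just the coefficient divided by its standard error (numbers copied from the coefficient rows above):

```python
# Each t statistic in the regression summary table is coefficient / standard
# error; the values below are copied from the table.
rows = {
    "hours":  (-0.38184722, 0.130750479),
    "age":    (0.004555815, 0.022855709),
    "income": (0.023250418, 0.007610353),
}

t_stats = {name: round(coef / se, 4) for name, (coef, se) in rows.items()}
print(t_stats)   # matches the t Stat column: about -2.9204, 0.1993, 3.0551
```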

The  p -value for the test on the hours of unpaid work per week regression coefficient is in the bottom part of the table under the P-value column of the Hours of Unpaid Work per Week row .  So the  p -value=[latex]0.0082[/latex].

Conclusion:  

Because p -value[latex]=0.0082 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that there is a relationship between the dependent variable “job satisfaction” and the independent variable “hours of unpaid work per week.”

  • The null hypothesis [latex]\beta_1=0[/latex] is the claim that the regression coefficient for the independent variable [latex]x_1[/latex] is zero.  That is, the null hypothesis is the claim that there is no relationship between the dependent variable and the independent variable “hours of unpaid work per week.”
  • The alternative hypothesis is the claim that the regression coefficient for the independent variable [latex]x_1[/latex] is not zero.  The alternative hypothesis is the claim that there is a relationship between the dependent variable and the independent variable “hours of unpaid work per week.”
  • When conducting a test on a regression coefficient, make sure to use the correct subscript on [latex]\beta[/latex] to correspond to how the independent variables were defined in the regression model and which independent variable is being tested.  Here the subscript on [latex]\beta[/latex] is 1 because the “hours of unpaid work per week” is defined as [latex]x_1[/latex] in the regression model.
  • The p -value for the tests on the regression coefficients are located in the bottom part of the table under the P-value column heading in the corresponding independent variable row. 
  • Because the alternative hypothesis is a [latex]\neq[/latex], the p -value is the sum of the area in the tails of the [latex]t[/latex]-distribution.  This is the value calculated out by Excel in the regression summary table.
  • The p -value of 0.0082 is a small probability compared to the significance level, and so is unlikely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, the regression coefficient [latex]\beta_1[/latex] is not zero, and so there is a relationship between the dependent variable “job satisfaction” and the independent variable “hours of unpaid work per week.”  This means that the independent variable “hours of unpaid work per week” is useful in predicting the dependent variable.

At the 5% significance level, test the relationship between the dependent variable “job satisfaction” and the independent variable “age”.

[latex]\begin{eqnarray*} H_0: & & \beta_2=0 \\   H_a: & & \beta_2 \neq 0 \end{eqnarray*}[/latex]

The  p -value for the test on the age regression coefficient is in the bottom part of the table under the P-value column of the Age row .  So the  p -value=[latex]0.8439[/latex].

Because p -value[latex]=0.8439 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis.  At the 5% significance level there is not enough evidence to suggest that there is a relationship between the dependent variable “job satisfaction” and the independent variable “age.”

  • The null hypothesis [latex]\beta_2=0[/latex] is the claim that the regression coefficient for the independent variable [latex]x_2[/latex] is zero.  That is, the null hypothesis is the claim that there is no relationship between the dependent variable and the independent variable “age.”
  • The alternative hypothesis is the claim that the regression coefficient for the independent variable [latex]x_2[/latex] is not zero.  The alternative hypothesis is the claim that there is a relationship between the dependent variable and the independent variable “age.”
  • When conducting a test on a regression coefficient, make sure to use the correct subscript on [latex]\beta[/latex] to correspond to how the independent variables were defined in the regression model and which independent variable is being tested.  Here the subscript on [latex]\beta[/latex] is 2 because “age” is defined as [latex]x_2[/latex] in the regression model.
  • The p -value of 0.8439 is a large probability compared to the significance level, and so is likely to happen assuming the null hypothesis is true.  This means the data give no reason to doubt the null hypothesis, and so the conclusion of the test is to not reject the null hypothesis.  In other words, there is no evidence that the regression coefficient [latex]\beta_2[/latex] differs from zero, and so no evidence of a relationship between the dependent variable “job satisfaction” and the independent variable “age.”  This means that the independent variable “age” is not particularly useful in predicting the dependent variable.

At the 5% significance level, test the relationship between the dependent variable “job satisfaction” and the independent variable “income”.

[latex]\begin{eqnarray*} H_0: & & \beta_3=0 \\   H_a: & & \beta_3 \neq 0 \end{eqnarray*}[/latex]

The  p -value for the test on the income regression coefficient is in the bottom part of the table under the P-value column of the Income row .  So the  p -value=[latex]0.0060[/latex].

Because p -value[latex]=0.0060 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that there is a relationship between the dependent variable “job satisfaction” and the independent variable “income.”

  • The null hypothesis [latex]\beta_3=0[/latex] is the claim that the regression coefficient for the independent variable [latex]x_3[/latex] is zero.  That is, the null hypothesis is the claim that there is no relationship between the dependent variable and the independent variable “income.”
  • The alternative hypothesis is the claim that the regression coefficient for the independent variable [latex]x_3[/latex] is not zero.  The alternative hypothesis is the claim that there is a relationship between the dependent variable and the independent variable “income.”
  • When conducting a test on a regression coefficient, make sure to use the correct subscript on [latex]\beta[/latex] to correspond to how the independent variables were defined in the regression model and which independent variable is being tested.  Here the subscript on [latex]\beta[/latex] is 3 because “income” is defined as [latex]x_3[/latex] in the regression model.
  • The p -value of 0.0060 is a small probability compared to the significance level, and so is unlikely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, the regression coefficient [latex]\beta_3[/latex] is not zero, and so there is a relationship between the dependent variable “job satisfaction” and the independent variable “income.”  This means that the independent variable “income” is useful in predicting the dependent variable.

Concept Review

The test on a regression coefficient determines if there is a relationship between the dependent variable and the corresponding independent variable.  The p-value for the test is the sum of the area in the tails of the [latex]t[/latex]-distribution.  The p-value can be found on the regression summary table generated by Excel.

The hypothesis test for a regression coefficient is a well established process:

  • Write down the null and alternative hypotheses in terms of the regression coefficient being tested.  The null hypothesis is the claim that there is no relationship between the dependent variable and independent variable.  The alternative hypothesis is the claim that there is a relationship between the dependent variable and independent variable.
  • Collect the sample information for the test and identify the significance level.
  • The p -value is the sum of the area in the tails of the [latex]t[/latex]-distribution.  Use the regression summary table generated by Excel to find the p -value.
  • Compare the  p -value to the significance level and state the outcome of the test.
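The comparison step above can be sketched in code. A minimal illustration (Python is not part of this text's Excel-based workflow; the p-values are the ones reported in the regression summary discussed earlier):

```python
# Decision step for a test on a regression coefficient:
# reject H0 (beta = 0) exactly when the p-value is below the significance level.

def decide(p_value, alpha=0.05):
    """Return the outcome of the test on a regression coefficient."""
    if p_value < alpha:
        return "reject H0: evidence of a relationship"
    return "do not reject H0: no evidence of a relationship"

# p-values from the Excel regression summary table above
print(decide(0.8439))  # age: do not reject H0
print(decide(0.0060))  # income: reject H0
```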

Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



Student Solutions Manual t/a Basic Econometrics

Damodar Gujarati, Two-Variable Regression: Interval Estimation and Hypothesis Testing

Chapter Questions

State with reason whether the following statements are true, false, or uncertain. Be precise.
a. The $t$ test of significance discussed in this chapter requires that the sampling distributions of estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ follow the normal distribution.
b. Even though the disturbance term in the CLRM is not normally distributed, the OLS estimators are still unbiased.
c. If there is no intercept in the regression model, the estimated $u_i\left(=\hat{u}_i\right)$ will not sum to zero.
d. The $p$ value and the size of a test statistic mean the same thing.
e. In a regression model that contains the intercept, the sum of the residuals is always zero.
f. If a null hypothesis is not rejected, it is true.
g. The higher the value of $\sigma^2$, the larger is the variance of $\hat{\beta}_2$ given in (3.3.1).
h. The conditional and unconditional means of a random variable are the same things.
i. In the two-variable PRF, if the slope coefficient $\beta_2$ is zero, the intercept $\beta_1$ is estimated by the sample mean $\bar{Y}$.
j. The conditional variance, $\operatorname{var}\left(Y_i \mid X_i\right)=\sigma^2$, and the unconditional variance of $Y$, $\operatorname{var}(Y)=\sigma_Y^2$, will be the same if $X$ had no influence on $Y$.
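Statements c and e can be checked numerically. A minimal Python sketch on made-up data (the values below are hypothetical, not from the text): with an intercept, the OLS residuals sum to zero; forcing the line through the origin, they generally do not.

```python
# OLS with an intercept forces the residuals to sum to zero;
# regression through the origin does not. Illustrated on toy data.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical predictor values
ys = [2.1, 3.9, 6.2, 8.1, 9.8]   # hypothetical responses

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

b2 = sxy / sxx                 # slope
b1 = ybar - b2 * xbar          # intercept

resid_with = [y - (b1 + b2 * x) for x, y in zip(xs, ys)]
print(abs(sum(resid_with)))    # essentially zero

# Regression through the origin: slope = sum(x*y) / sum(x^2)
b_rto = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
resid_rto = [y - b_rto * x for x, y in zip(xs, ys)]
print(sum(resid_rto))          # not zero in general
```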


Set up the ANOVA table in the manner of Table 5.4 for the regression model given in (3.7.2) and test the hypothesis that there is no relationship between food expenditure and total expenditure in India.

From the data given in Table 2.6 on earnings and education, we obtained the following regression [see Eq. (3.7.3)]: $$ \begin{aligned} & \overline{\text { Meanwage }}_i=0.7437+0.6416 \text { Education }_i \\ & \mathrm{se}=(0.8355) \quad(\quad) \\ & t=(\quad) \quad(9.6536) \quad r^2=0.8944 \quad n=13 \\ & \end{aligned} $$ a. Fill in the missing numbers. b. How do you interpret the coefficient 0.6416 ? c. Would you reject the hypothesis that education has no effect whatsoever on wages? Which test do you use? And why? What is the $p$ value of your test statistic? d. Set up the ANOVA table for this example and test the hypothesis that the slope coefficient is zero. Which test do you use and why? e. Suppose in the regression given above the $r^2$ value was not given to you. Could you have obtained it from the other information given in the regression?


Let $\rho^2$ represent the true population coefficient of correlation. Suppose you want to test the hypothesis that $\rho^2=0$. Verbally explain how you would test this hypothesis. Hint: Use Eq. (3.5.11). See also exercise 5.7.


What is known as the characteristic line of modern investment analysis is simply the regression line obtained from the following model: $$ r_{it}=\alpha_i+\beta_i r_{mt}+u_t $$ where $r_{it}=$ the rate of return on the $i$th security in time $t$, $r_{mt}=$ the rate of return on the market portfolio in time $t$, and $u_t=$ stochastic disturbance term.

In this model $\beta_i$ is known as the beta coefficient of the $i$ th security, a measure of market (or systematic) risk of a security.

On the basis of 240 monthly rates of return for the period 1956-1976, Fogler and Ganapathy obtained the following characteristic line for IBM stock in relation to the market portfolio index developed at the University of Chicago*: $$ \begin{aligned} \hat{r}_{it} &= 0.7264+1.0598\, r_{mt} & r^2 &= 0.4710 \\ \text{se} &= (0.3001)\ (0.0728) & \text{df} &= 238 \\ F_{1,238} &= 211.896 \end{aligned} $$ a. A security whose beta coefficient is greater than one is said to be a volatile or aggressive security. Was IBM a volatile security in the time period under study? b. Is the intercept coefficient significantly different from zero? If it is, what is its practical meaning?


Equation (5.3.5) can also be written as $$ \operatorname{Pr}\left[\hat{\beta}_2-t_{\alpha / 2} \operatorname{se}\left(\hat{\beta}_2\right)<\beta_2<\hat{\beta}_2+t_{\alpha / 2} \operatorname{se}\left(\hat{\beta}_2\right)\right]=1-\alpha $$

That is, the weak inequality $(\leq$ ) can be replaced by the strong inequality $(<)$. Why?


R. A. Fisher has derived the sampling distribution of the correlation coefficient defined in (3.5.13). If it is assumed that the variables $X$ and $Y$ are jointly normally distributed, that is, if they come from a bivariate normal distribution (see Appendix 4A, exercise 4.1), then under the assumption that the population correlation coefficient $\rho$ is zero, it can be shown that $t=r \sqrt{n-2} / \sqrt{1-r^2}$ follows Student's $t$ distribution with $n-2$ df." Show that this $t$ value is identical with the $t$ value given in (5.3.2) under the null hypothesis that $\beta_2=0$. Hence establish that under the same null hypothesis $F=t^2$. (See Section 5.9.)
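A quick numerical check of the claim that $F=t^2$ under the null hypothesis $\beta_2=0$, using the figures quoted for the IBM characteristic line above ($r^2=0.4710$, df $=238$, $F=211.896$) and the formula $t=r\sqrt{n-2}/\sqrt{1-r^2}$ (a Python sketch):

```python
import math

r2, df, f_reported = 0.4710, 238, 211.896   # from the IBM regression output

r = math.sqrt(r2)
t = r * math.sqrt(df) / math.sqrt(1 - r2)   # t = r*sqrt(n-2)/sqrt(1-r^2)

print(round(t, 2))       # about 14.56
print(round(t ** 2, 1))  # about 211.9, essentially the reported F
```

The tiny discrepancy comes from rounding in the published $r^2$ and $F$ values.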

Consider the following regression output: $$ \begin{aligned} \hat{Y}_i &= 0.2033+0.6560 X_i \\ \text{se} &= (0.0976)\ (0.1961) \\ r^2 &= 0.397 \qquad \text{RSS}=0.0544 \qquad \text{ESS}=0.0358 \end{aligned} $$ where $Y=$ labor force participation rate (LFPR) of women in 1972 and $X=$ LFPR of women in 1968. The regression results were obtained from a sample of 19 cities in the United States. a. How do you interpret this regression? b. Test the hypothesis: $H_0: \beta_2=1$ against $H_1: \beta_2>1$. Which test do you use? And why? What are the underlying assumptions of the test(s) you use? c. Suppose that the LFPR in 1968 was 0.58 (or 58 percent). On the basis of the regression results given above, what is the mean LFPR in 1972? Establish a $95\%$ confidence interval for the mean prediction. d. How would you test the hypothesis that the error term in the population regression is normally distributed? Show the necessary calculations.

Table 5.5 gives data on average public teacher pay (annual salary in dollars) and spending on public schools per pupil (dollars) in 1985 for 50 states and the District of Columbia. TABLE CAN'T COPY. To find out if there is any relationship between teacher's pay and per pupil expenditure in public schools, the following model was suggested: $\mathrm{Pay}_i=$ $\beta_1+\beta_2$ Spend $_i+u_i$, where Pay stands for teacher's salary and Spend stands for per pupil expenditure. a. Plot the data and eyeball a regression line. b. Suppose on the basis of a you decide to estimate the above regression model. Obtain the estimates of the parameters, their standard errors, $r^2$, RSS, and ESS. c. Interpret the regression. Does it make economic sense? d. Establish a $95 \%$ confidence interval for $\beta_2$. Would you reject the hypothesis that the true slope coefficient is 3.0 ? e. Obtain the mean and individual forecast value of Pay if per pupil spending is $\$ 5000$. Also establish $95 \%$ confidence intervals for the true mean and individual values of Pay for the given spending figure. f. How would you test the assumption of the normality of the error term? Show the test(s) you use.


Refer to exercise 3.20 and set up the ANOVA tables and test the hypothesis that there is no relationship between productivity and real wage compensation. Do this for both the business and nonfarm business sectors.


Refer to exercise 1.7 . a. Plot the data with impressions on the vertical axis and advertising expenditure on the horizontal axis. What kind of relationship do you observe? b. Would it be appropriate to fit a bivariate linear regression model to the data? Why or why not? If not, what type of regression model will you fit the data to? Do we have the necessary tools to fit such a model? c. Suppose you do not plot the data and simply fit the bivariate regression model to the data. Obtain the usual regression output. Save the results for a later look at this problem.


Refer to exercise 1.1 . a. Plot the U.S. Consumer Price Index (CPI) against the Canadian CPI. What does the plot show? b. Suppose you want to predict the U.S. CPI on the basis of the Canadian CPI. Develop a suitable model. c. Test the hypothesis that there is no relationship between the two CPIs. Use $\alpha=5 \%$. If you reject the null hypothesis, does that mean the Canadian CPI "causes" the U.S. CPI? Why or why not?


Refer to exercise 3.22 . a. Estimate the two regressions given there, obtaining standard errors and the other usual output. b. Test the hypothesis that the disturbances in the two regression models are normally distributed. c. In the gold price regression, test the hypothesis that $\beta_2=1$, that is, there is a one-to-one relationship between gold prices and CPI (i.e., gold is a perfect hedge). What is the $p$ value of the estimated test statistic? d. Repeat step c for the NYSE Index regression. Is investment in the stock market a perfect hedge against inflation? What is the null hypothesis you are testing? What is its $p$ value? e. Between gold and stock, which investment would you choose? What is the basis of your decision?


Table 5.6 gives data on GNP and four definitions of the money stock for the United States for 1970-1983. Regressing GNP on the various definitions of money, we obtain the results shown in Table 5.7. The monetarists or quantity theorists maintain that nominal income (i.e., nominal GNP) is largely determined by changes in the quantity or the stock of money, although there is no consensus as to the "right" definition of money. Given the results in the preceding table, consider these questions: a. Which definition of money seems to be closely related to nominal GNP? b. Since the $r^2$ terms are uniformly high, does this fact mean that our choice for definition of money does not matter? c. If the Fed wants to control the money supply, which one of these money measures is a better target for that purpose? Can you tell from the regression results? TABLE CAN'T COPY.


Suppose the equation of an indifference curve between two goods is $$ X_i Y_i=\beta_1+\beta_2 X_i $$ TABLE CAN'T COPY. How would you estimate the parameters of this model? Apply the preceding model to the data in Table 5.8 and comment on your results.

Since 1986 the Economist has been publishing the Big Mac Index as a crude, and hilarious, measure of whether international currencies are at their "correct" exchange rate, as judged by the theory of purchasing power parity (PPP). The PPP holds that a unit of currency should be able to buy the same bundle of goods in all countries. The proponents of PPP argue that, in the long run, currencies tend to move toward their PPP. The Economist uses McDonald's Big Mac as a representative bundle and gives the information in Table 5.9. Consider the following regression model: $$ Y_i=\beta_1+\beta_2 X_i+u_i $$ where $Y=$ actual exchange rate and $X=$ implied PPP of the dollar: TABLE CAN'T COPY. a. If the PPP holds, what values of $\beta_1$ and $\beta_2$ would you expect a priori? b. Do the regression results support your expectation? What formal test do you use to test your hypothesis? c. Should the Economist continue to publish the Big Mac Index? Why or why not?

Refer to the S.A.T. data given in exercise 2.16. Suppose you want to predict the male math $(Y)$ scores on the basis of the female math scores $(X)$ by running the following regression: $$ Y_t=\beta_1+\beta_2 X_t+u_t $$ a. Estimate the preceding model. b. From the estimated residuals, find out if the normality assumption can be sustained. c. Now test the hypothesis that $\beta_2=1$, that is, there is a one-to-one correspondence between male and female math scores. d. Set up the ANOVA table for this problem.

Repeat the exercise in the preceding problem but let $Y$ and $X$ denote the male and female verbal scores, respectively.

Table 5.10 gives annual data on the Consumer Price Index (CPI) and the Wholesale Price Index (WPI), also called Producer Price Index (PPI), for the U.S. economy for the period 1960-1999. a. Plot the CPI on the vertical axis and the WPI on the horizontal axis. A priori, what kind of relationship do you expect between the two indexes? Why? $$ \begin{aligned} &\text { CPI AND WPI, UNITED STATES, 1960-1999 }\\ &\begin{array}{cccccc} \hline \text { Year } & \text { CPI } & \text { WPI } & \text { Year } & \text { CPI } & \text { WPI } \\ \hline 1960 & 29.8 & 31.7 & 1980 & 86.3 & 93.8 \\ 1961 & 30.0 & 31.6 & 1981 & 94.0 & 98.8 \\ 1962 & 30.4 & 31.6 & 1982 & 97.6 & 100.5 \\ 1963 & 30.9 & 31.6 & 1983 & 101.3 & 102.3 \\ 1964 & 31.2 & 31.7 & 1984 & 105.3 & 103.5 \\ 1965 & 31.8 & 32.8 & 1985 & 109.3 & 103.6 \\ 1966 & 32.9 & 33.3 & 1986 & 110.5 & 99.70 \\ 1967 & 33.9 & 33.7 & 1987 & 115.4 & 104.2 \\ 1968 & 35.5 & 34.6 & 1988 & 120.5 & 109.0 \\ 1969 & 37.7 & 36.3 & 1989 & 126.1 & 113.0 \\ 1970 & 39.8 & 37.1 & 1990 & 133.8 & 118.7 \\ 1971 & 41.1 & 38.6 & 1991 & 137.9 & 115.9 \\ 1972 & 42.5 & 41.1 & 1992 & 141.9 & 117.6 \\ 1973 & 46.2 & 47.4 & 1993 & 145.8 & 118.6 \\ 1974 & 51.9 & 57.3 & 1994 & 149.7 & 121.9 \\ 1975 & 55.5 & 59.7 & 1995 & 153.5 & 125.7 \\ 1976 & 58.2 & 62.5 & 1996 & 158.6 & 128.8 \\ 1977 & 62.1 & 66.2 & 1997 & 161.3 & 126.7 \\ 1978 & 67.7 & 72.7 & 1998 & 163.9 & 122.7 \\ 1979 & 76.7 & 83.4 & 1999 & 168.3 & 128.0 \\ \hline \end{array} \end{aligned} $$


2.1 - Inference for the Population Intercept and Slope

Recall that we are ultimately always interested in drawing conclusions about the population , not the particular sample we observed . In the simple regression setting, we are often interested in learning about the population intercept \(\beta_{0}\) and the population slope \(\beta_{1}\). As you know, confidence intervals and hypothesis tests are two related, but different, ways of learning about the values of population parameters. Here, we will learn how to calculate confidence intervals and conduct hypothesis tests for both \(\beta_{0}\) and \(\beta_{1}\).

Let's revisit the example concerning the relationship between skin cancer mortality and state latitude ( Skin Cancer data ). The response variable y is the mortality rate (number of deaths per 10 million people) of white males due to malignant skin melanoma from 1950-1959. The predictor variable x is the latitude (degrees North) at the center of each of the 49 states in the United States. A subset of the data looks like this:

Mortality Rate of White Males Due to Malignant Skin Melanoma

(table of latitude and mortality values for a subset of the 49 states omitted)

and a plot of the data with the estimated regression equation looks like this:

mortality vs latitude plot

Is there a relationship between state latitude and skin cancer mortality? Certainly, since the estimated slope of the line, \(b_{1}\), is -5.98, not 0, there is a relationship between state latitude and skin cancer mortality in the sample of 49 data points. But we want to know whether there is a relationship in the population of all latitudes and skin cancer mortality rates. That is, we want to know if the population slope \(\beta_{1}\) is unlikely to be 0.

(1-\(\alpha\))100% t-interval for the slope parameter \(\beta_{1}\)

The formula for the confidence interval for \(\beta_{1}\), in words, is:

Sample estimate ± (t-multiplier × standard error)

and, in notation, is:

\(b_1 \pm t_{(\alpha/2, n-2)}\times \left( \dfrac{\sqrt{MSE}}{\sqrt{\sum(x_i-\bar{x})^2}} \right)\)

The resulting confidence interval not only gives us a range of values that is likely to contain the true unknown value \(\beta_{1}\). It also allows us to answer the research question "is the predictor x linearly related to the response y?" If the confidence interval for \(\beta_{1}\) contains 0, then we conclude that there is no evidence of a linear relationship between the predictor x and the response y in the population. On the other hand, if the confidence interval for \(\beta_{1}\) does not contain 0, then we conclude that there is evidence of a linear relationship between the predictor x and the response y in the population.

An \(\alpha\)-level hypothesis test for the slope parameter \(\beta_{1}\)

We follow standard hypothesis test procedures in conducting a hypothesis test for the slope \(\beta_{1}\). First, we specify the null and alternative hypotheses:

  • Null hypothesis \(H_{0} \colon \beta_{1}\)= some number \(\beta\)
  • Alternative hypothesis \(H_{A} \colon \beta_{1}\)≠ some number \(\beta\)

The phrase "some number \(\beta\)" means that you can test whether or not the population slope takes on any value. Most often, however, we are interested in testing whether \(\beta_{1}\) is 0. By default, Minitab conducts the hypothesis test with the null hypothesis, \(\beta_{1}\) is equal to 0, and the alternative hypothesis, \(\beta_{1}\)is not equal to 0. However, we can test values other than 0 and the alternative hypothesis can also state that \(\beta_{1}\) is less than (<) some number \(\beta\) or greater than (>) some number \(\beta\).

Second, we calculate the value of the test statistic using the following formula:

\(t^*=\dfrac{b_1-\beta}{\sqrt{MSE}/\sqrt{\sum(x_i-\bar{x})^2}}=\dfrac{b_1-\beta}{se(b_1)}\)

Third, we use the resulting test statistic to calculate the P -value. As always, the P -value is the answer to the question "how likely is it that we’d get a test statistic t* as extreme as we did if the null hypothesis were true?" The P -value is determined by referring to a t- distribution with n -2 degrees of freedom.

Finally, we make a decision:

  • If the P -value is smaller than the significance level \(\alpha\), we reject the null hypothesis in favor of the alternative. We conclude that "there is sufficient evidence at the \(\alpha\) level to conclude that there is a linear relationship in the population between the predictor x and response y ."
  • If the P -value is larger than the significance level \(\alpha\), we fail to reject the null hypothesis. We conclude "there is not enough evidence at the \(\alpha\) level to conclude that there is a linear relationship in the population between the predictor x and response y ."
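As a sketch of these steps for the skin cancer example, the test statistic and two-sided P-value can be computed in Python. Rather than assuming a statistics library, the t density is integrated numerically here (a library routine such as scipy.stats.t.sf would serve the same purpose); the estimates b1 = -5.9776 and se = 0.5984 are taken from the Minitab output discussed below.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_star, df, upper=60.0, steps=20000):
    """Two-sided p-value 2*P(T > |t*|), by trapezoidal integration of the density."""
    a = abs(t_star)
    h = (upper - a) / steps
    area = 0.5 * (t_pdf(a, df) + t_pdf(upper, df))
    for i in range(1, steps):
        area += t_pdf(a + i * h, df)
    return 2 * area * h

# Skin cancer example: b1 = -5.9776, se(b1) = 0.5984, n - 2 = 47
t_star = -5.9776 / 0.5984
print(round(t_star, 2))           # -9.99, matching the Minitab output
print(two_sided_p(t_star, 47))    # far smaller than 0.001, so reject H0
```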


Drawing conclusions about the slope parameter \(\beta_{1}\) using Minitab

Let's see how we can use Minitab to calculate confidence intervals and conduct hypothesis tests for the slope \(\beta_{1}\). Minitab's regression analysis output for our skin cancer mortality and latitude example appears below.

The line pertaining to the latitude predictor, Lat , in the summary table of predictors has been bolded. It tells us that the estimated slope coefficient \(b_{1}\), under the column labeled Coef , is -5.9776 . The estimated standard error of \(b_{1}\), denoted se (\(b_{1}\)), in the column labeled SE Coef for "standard error of the coefficient," is 0.5984 .

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value
Regression 1 36464 36464 99.80 0.000
Residual Error 47 17173 365
Total 48 53637

Coefficients

Predictor Coef SE Coef T-Value P-Value
Constant 389.19 23.81 16.34 0.000
Lat -5.9776 0.5984 -9.99 0.000

Model Summary

S R-sq R-sq(adj)
19.12 68.0% 67.3%

Regression Equation

Mort = 389 - 5.98 Lat

By default, the test statistic is calculated assuming the user wants to test that the slope is 0. Dividing the estimated coefficient of -5.9776 by the estimated standard error of 0.5984, Minitab reports that the test statistic T is -9.99 .

By default, the P -value is calculated assuming the alternative hypothesis is a "two-tailed, not-equal-to" hypothesis. Upon calculating the probability that a t -random variable with n -2 = 47 degrees of freedom would be larger than 9.99, and multiplying the probability by 2, Minitab reports that P is 0.000 (to three decimal places). That is, the P -value is less than 0.001. (Note we multiply the probability by 2 since this is a two-tailed test.)

Because the P -value is so small (less than 0.001), we can reject the null hypothesis and conclude that \(\beta_{1}\) does not equal 0. There is sufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that there is a linear relationship in the population between skin cancer mortality and latitude.

It's easy to calculate a 95% confidence interval for \(\beta_{1}\) using the information in the Minitab output. You just need to use Minitab to find the t -multiplier for you. It is \(t_{\left(0.025, 47\right)} = 2.0117\). Then, the 95% confidence interval for \(\beta_{1}\)is \(-5.9776 ± 2.0117(0.5984) \) or (-7.2, -4.8). (Alternatively, Minitab can display the interval directly if you click the "Results" tab in the Regression dialog box, select "Expanded Table" and check "Coefficients.")

We can be 95% confident that the population slope is between -7.2 and -4.8. That is, we can be 95% confident that for every additional one-degree increase in latitude, the mean skin cancer mortality rate decreases between 4.8 and 7.2 deaths per 10 million people.
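The interval arithmetic can be reproduced directly (a Python sketch; the t-multiplier 2.0117 is the value quoted from Minitab):

```python
# 95% confidence interval for the slope, from the Minitab output above
b1, se_b1 = -5.9776, 0.5984
t_mult = 2.0117                           # t(0.025, 47), as reported by Minitab

lower = b1 - t_mult * se_b1
upper = b1 + t_mult * se_b1
print(round(lower, 1), round(upper, 1))   # -7.2 -4.8
```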


Factors affecting the width of a confidence interval for \(\beta_{1}\)

Recall that, in general, we want our confidence intervals to be as narrow as possible. If we know what factors affect the length of a confidence interval for the slope \(\beta_{1}\), we can control them to ensure that we obtain a narrow interval. The factors can be easily determined by studying the formula for the confidence interval:

\(b_1 \pm t_{(\alpha/2, n-2)}\times \left( \dfrac{\sqrt{MSE}}{\sqrt{\sum(x_i-\bar{x})^2}} \right)\)

First, subtracting the lower endpoint of the interval from the upper endpoint of the interval, we determine that the width of the interval is:

\(2\times t_{(\alpha/2, n-2)}\times \left( \dfrac{\sqrt{MSE}}{\sqrt{\sum(x_i-\bar{x})^2}} \right)\)

So, how can we affect the width of our resulting interval for \(\beta_{1}\)?

  • As the confidence level decreases, the width of the interval decreases. Therefore, if we decrease our confidence level, we decrease the width of our interval. Clearly, we don't want to decrease the confidence level too much. Typically, confidence levels are never set below 90%.
  • As MSE decreases, the width of the interval decreases. The value of MSE depends on only two factors — how much the responses vary naturally around the estimated regression line, and how well your regression function (line) fits the data. Clearly, you can't control the first factor all that much other than to ensure that you are not adding any unnecessary error in your measurement process. Throughout this course, we'll learn ways to make sure that the regression function fits the data as well as it can.
  • The more spread out the predictor x values, the narrower the interval. The quantity \(\sum(x_i-\bar{x})^2\) in the denominator summarizes the spread of the predictor x values. The more spread out the predictor values, the larger the denominator, and hence the narrower the interval. Therefore, we can decrease the width of our interval by ensuring that our predictor values are sufficiently spread out.
  • As the sample size increases, the width of the interval decreases. The sample size plays a role in two ways. First, recall that the t -multiplier depends on the sample size through n -2. Therefore, as the sample size increases, the t -multiplier decreases, and the length of the interval decreases. Second, the denominator \(\sum(x_i-\bar{x})^2\) also depends on n . The larger the sample size, the more terms you add to this sum, the larger the denominator, and the narrower the interval. Therefore, in general, you can ensure that your interval is narrow by having a large enough sample.
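The effect of predictor spread is easy to demonstrate. In the Python sketch below, the MSE is held at the 365 from the skin cancer example while the x values themselves are hypothetical; quadrupling the spread of the x values cuts se(\(b_{1}\)), and hence the interval width, to a quarter.

```python
import math

def slope_se(xs, mse):
    """se(b1) = sqrt(MSE) / sqrt(sum of (x - xbar)^2)."""
    xbar = sum(xs) / len(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return math.sqrt(mse) / math.sqrt(sxx)

mse = 365.0                           # MSE from the skin cancer example
narrow = [30, 32, 34, 36, 38, 40]     # hypothetical latitudes, tightly bunched
wide = [15, 23, 31, 39, 47, 55]       # same mean, four times the spread

print(slope_se(narrow, mse))          # larger standard error
print(slope_se(wide, mse))            # exactly one quarter as large

# The CI width is 2 * t-multiplier * se(b1), so it shrinks by the same factor.
```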

Six possible outcomes concerning slope \(\beta_{1}\)

There are six possible outcomes whenever we test whether there is a linear relationship between the predictor x and the response y , that is, whenever we test the null hypothesis \(H_{0} \colon \beta_{1}\) = 0 against the alternative hypothesis \(H_{A} \colon \beta_{1} ≠ 0\).

When we don't reject the null hypothesis, \(H_{0} \colon \beta_{1} = 0\), any of the following three realities are possible:

  • We committed a Type II error. That is, in reality \(\beta_{1} ≠ 0\) and our sample data just didn't provide enough evidence to conclude that \(\beta_{1} ≠ 0\).
  • There really is not much of a linear relationship between x and y .
  • There is a relationship between x and y — it is just not linear.

When we do reject the null hypothesis, \(H_{0} \colon \beta_{1} = 0\), in favor of the alternative hypothesis, \(H_{A} \colon \beta_{1} ≠ 0\), any of the following three realities are possible:

  • We committed a Type I error. That is, in reality \(\beta_{1} = 0\), but we have an unusual sample that suggests that \(\beta_{1} ≠ 0\).
  • The relationship between x and y is indeed linear.
  • A linear function fits the data, okay, but a curved ("curvilinear") function would fit the data even better.

(1-\(\alpha\))100% t-interval for intercept parameter \(\beta_{0}\)

Calculating confidence intervals and conducting hypothesis tests for the intercept parameter \(\beta_{0}\) is not done as often as it is for the slope parameter \(\beta_{1}\). The reason for this becomes clear upon reviewing the meaning of \(\beta_{0}\). The intercept parameter \(\beta_{0}\) is the mean of the responses at x = 0. If x = 0 is meaningless, as it would be, for example, if your predictor variable was height, then \(\beta_{0}\) is not meaningful. For the sake of completeness, we present the methods here for those situations in which \(\beta_{0}\) is meaningful.

The formula for the confidence interval for \(\beta_{0}\), in words, is:

Sample estimate ± (t-multiplier × standard error)

and, in notation, is:

\(b_0 \pm t_{\alpha/2, n-2} \times \sqrt{MSE} \sqrt{\dfrac{1}{n}+\dfrac{\bar{x}^2}{\sum(x_i-\bar{x})^2}}\)

The resulting confidence interval gives us a range of values that is likely to contain the true unknown value \(\beta_{0}\). The factors affecting the length of a confidence interval for \(\beta_{0}\) are identical to the factors affecting the length of a confidence interval for \(\beta_{1}\).

An \(\alpha\)-level hypothesis test for intercept parameter \(\beta_{0}\)

Again, we follow standard hypothesis test procedures. First, we specify the null and alternative hypotheses:

  • Null hypothesis \(H_{0}\): \(\beta_{0}\) = some number \(\beta\)
  • Alternative hypothesis \(H_{A}\): \(\beta_{0}\) ≠ some number \(\beta\)

The phrase "some number \(\beta\)" means that you can test whether or not the population intercept takes on any value. By default, Minitab conducts the hypothesis test for testing whether or not \(\beta_{0}\) is 0. But, the alternative hypothesis can also state that \(\beta_{0}\) is less than (<) some number \(\beta\) or greater than (>) some number \(\beta\).

Second, we calculate the value of the test statistic using the following formula:

\(t^*=\dfrac{b_0-\beta}{\sqrt{MSE} \sqrt{\dfrac{1}{n}+\dfrac{\bar{x}^2}{\sum(x_i-\bar{x})^2}}}=\dfrac{b_0-\beta}{se(b_0)}\)

Third, we use the resulting test statistic to calculate the P -value. Again, the P -value is the answer to the question "how likely is it that we’d get a test statistic t* as extreme as we did if the null hypothesis were true?" The P -value is determined by referring to a t- distribution with n -2 degrees of freedom.

Finally, we make a decision. If the P -value is smaller than the significance level \(\alpha\), we reject the null hypothesis in favor of the alternative. If we conduct a "two-tailed, not-equal-to-0" test, we conclude "there is sufficient evidence at the \(\alpha\) level to conclude that the mean of the responses is not 0 when x = 0." If the P -value is larger than the significance level \(\alpha\), we fail to reject the null hypothesis.

Drawing conclusions about intercept parameter \(\beta_{0}\) using Minitab

Let's see how we can use Minitab to calculate confidence intervals and conduct hypothesis tests for the intercept \(\beta_{0}\). Minitab's regression analysis output for our skin cancer mortality and latitude example appears below. The work involved is very similar to that for the slope \(\beta_{1}\).

The line pertaining to the intercept, which Minitab always refers to as Constant , in the summary table of predictors has been bolded. It tells us that the estimated intercept coefficient \(b_{0}\), under the column labeled Coef , is 389.19 . The estimated standard error of \(b_{0}\), denoted se (\(b_{0}\)), in the column labeled SE Coef is 23.81 .


By default, the test statistic is calculated assuming the user wants to test that the mean response is 0 when x = 0. Note that this is an ill-advised test here because the predictor values in the sample do not include a latitude of 0. That is, such a test involves extrapolating outside the scope of the model. Nonetheless, for the sake of illustration, let's proceed to assume that it is an okay thing to do.

Dividing the estimated coefficient of 389.19 by the estimated standard error of 23.81, Minitab reports that the test statistic T is 16.34 . By default, the P -value is calculated assuming the alternative hypothesis is a "two-tailed, not-equal-to-0" hypothesis. Upon calculating the probability that a t random variable with n -2 = 47 degrees of freedom would be larger than 16.34, and multiplying the probability by 2, Minitab reports that P is 0.000 (to three decimal places). That is, the P -value is less than 0.001.

Because the P -value is so small (less than 0.001), we can reject the null hypothesis and conclude that \(\beta_{0}\) does not equal 0 when x = 0. There is sufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean mortality rate at a latitude of 0 degrees North is not 0. (Again, note that we have to extrapolate in order to arrive at this conclusion, which in general is not advisable.)

Proceed as previously described to calculate a 95% confidence interval for \(\beta_{0}\), using Minitab to find the t-multiplier. Again, it is \(t_{\left(0.025, 47\right)} = 2.0117\). Then, the 95% confidence interval for \(\beta_{0}\) is \(389.19 \pm 2.0117\left(23.81\right) = \left(341.3, 437.1\right)\). (Alternatively, Minitab can display the interval directly if you click the "Results" tab in the Regression dialog box, select "Expanded Table," and check "Coefficients.") We can be 95% confident that the population intercept is between 341.3 and 437.1. That is, we can be 95% confident that the mean mortality rate at a latitude of 0 degrees North is between 341.3 and 437.1 deaths per 10 million people. (Again, it is probably not a good idea to make this claim because of the severe extrapolation involved.)
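The same interval can be computed directly, with scipy standing in for Minitab's t-multiplier lookup:

```python
# 95% confidence interval for beta_0, from the estimates above.
from scipy import stats

b0, se_b0, df = 389.19, 23.81, 47

t_mult = stats.t.ppf(0.975, df)            # ~2.0117 for df = 47
lower = b0 - t_mult * se_b0
upper = b0 + t_mult * se_b0

print(round(lower, 1), round(upper, 1))    # ~341.3 437.1
```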

Statistical inference conditions

We've made no mention yet of the conditions that must hold in order for the above confidence interval formulas and hypothesis testing procedures for \(\beta_{0}\) and \(\beta_{1}\) to be valid. In short, the "LINE" assumptions we discussed earlier (linearity, independence, normality, and equal variance) must hold. It is not a big deal if the error terms (and thus the responses) are only approximately normal, and with a large sample the error terms can deviate somewhat further from normality.

Regression Through the Origin (RTO)

In rare circumstances, it may make sense to consider a simple linear regression model in which the intercept, \(\beta_{0}\), is assumed to be exactly 0. For example, suppose we have data on the number of items produced per hour along with the number of rejects in each of those time spans. If we have a period where no items were produced, then there are obviously 0 rejects. Such a situation may indicate deleting \(\beta_{0}\) from the model since \(\beta_{0}\) reflects the amount of the response (in this case, the number of rejects) when the predictor is assumed to be 0 (in this case, the number of items produced). Thus, the model to estimate becomes

\(\begin{equation*} y_{i}=\beta_{1}x_{i}+\epsilon_{i},\end{equation*}\)

which is called a Regression Through the Origin (or RTO ) model. The estimate for \(\beta_{1}\) when using the regression through the origin model is:

\(b_{\textrm{RTO}}=\dfrac{\sum_{i=1}^{n}x_{i}y_{i}}{\sum_{i=1}^{n}x_{i}^{2}}.\)

Thus, the estimated regression equation is

\(\begin{equation*} \hat{y}_{i}=b_{\textrm{RTO}}x_{i}.\end{equation*}\)

Note that we no longer have to center (or "adjust") the \(x_{i}\)'s and \(y_{i}\)'s by their sample means (compare this estimate for \(b_{1}\) to that of the estimate found for the simple linear regression model). Since there is no intercept, there is no correction factor and no adjustment for the mean (i.e., the regression line can only pivot about the point (0,0)).
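The RTO estimate is simple enough to compute by hand. A quick sketch on a tiny made-up data set, which also shows that the residuals no longer sum to zero:

```python
# Regression through the origin: b_RTO = sum(x*y) / sum(x^2).
# The data here are purely illustrative.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.5, 5.5, 9.0]

b_rto = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x)
residuals = [yi - b_rto * xi for xi, yi in zip(x, y)]

print(round(b_rto, 4))               # 2.1167
print(round(sum(residuals), 4))      # -0.1667: not 0, unlike OLS with an intercept
```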

Generally, regression through the origin is not recommended due to the following:

  • Removal of \(\beta_{0}\) is a strong assumption that forces the line to go through the point (0,0). Imposing this restriction does not give ordinary least squares as much flexibility in finding the line of best fit for the data.
  • In a simple linear regression model, \(\sum_{i=1}^{n}(y_{i}-\hat{y}_i)=\sum_{i=1}^{n}e_{i}=0\). However, in regression through the origin, generally \(\sum_{i=1}^{n}e_{i}\neq 0\). Because of this, the SSE could actually be larger than the SSTO, thus resulting in \(r^{2}<0\).
  • Since \(r^{2}\) can be negative, the usual interpretation of this value as a measure of the strength of the linear component in the simple linear regression model cannot be used here.

If you strongly believe that a regression-through-the-origin model is appropriate for your situation, then statistical testing can help justify your decision. Moreover, if data have not been collected near \(x=0\), then forcing the regression line through the origin is likely to produce a worse-fitting model. So again, this model is not usually recommended unless there is a strong belief that it is appropriate.
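The possibility of \(r^{2}<0\) is easy to demonstrate: when the data lie far from the origin, the RTO fit can be worse than simply predicting the sample mean. A sketch with made-up numbers:

```python
# With x far from 0 and a large true intercept, forcing the line through
# the origin gives SSE > SSTO, so r^2 = 1 - SSE/SSTO is negative.
x = [10.0, 11.0, 12.0]
y = [20.0, 19.0, 21.0]

b_rto = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x)
sse = sum((yi - b_rto * xi) ** 2 for xi, yi in zip(x, y))

ybar = sum(y) / len(y)
ssto = sum((yi - ybar) ** 2 for yi in y)

r2 = 1 - sse / ssto
print(round(r2, 2))   # about -1.48: the RTO fit is worse than predicting ybar
```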

To fit a regression-through-the-origin model in Minitab, click "Model" in the regular regression window and then uncheck "Include the constant term in the model."


How to test for intercept in a regression problem in R

So for my question, it asks me to test whether the intercept for a linear regression model is greater than 9 in R. I'm not exactly sure how to do that. Any ideas? I tried the t-test manually but it's not working for me.


The R output of summary(lm(y~x)) gives a parameter estimate $b_{0}$ and a standard error se($b_{0}$) for the intercept. The typical t-test (for $H_0: \beta_0 = 0$) is obtained by comparing $T = \frac{b_{0} - 0}{\text{se}(b_{0})}$ to a t-distribution with degrees of freedom equal to the error degrees of freedom. So, how would you modify the statistic $T$ when your hypothesis is $H_0: \beta_0 = 9$?
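The modification the answer hints at can be sketched numerically (Python here rather than R, with hypothetical values standing in for what summary(lm(y ~ x)) would report):

```python
# Shift the null value in the t-statistic to test H0: beta_0 = 9.
# b0, se_b0, and df below are made-up stand-ins for real lm() output.
from scipy import stats

b0, se_b0, df = 10.2, 0.5, 20        # hypothetical estimate, SE, error df

t_stat = (b0 - 9) / se_b0            # H0: beta_0 = 9 vs. Ha: beta_0 > 9
p_value = stats.t.sf(t_stat, df)     # one-tailed, since Ha is "greater than"

print(round(t_stat, 2))   # 2.4
print(p_value < 0.05)     # True
```

The only changes from the default test are subtracting 9 instead of 0 in the numerator and using a one-tailed rather than two-tailed P-value.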


