Simple Linear Regression
Affiliate Institute
Linear regression is a method of predicting the outcome of some event based on existing data. The regression utilizes known relationships that exist between different variables to predict the future behaviour of these variables (Thompson and Borrello, 1985). The variable that is explained is usually identified as the dependent variable, also called the regressand, because it depends on changes in the independent variable, also called the regressor. Regressions can be utilized for predicting such events as future sales, stock prices, and the level of utilization of a product, among others.
One of the relationships that can be modelled using a simple linear regression is the relationship between a company's sales and its advertisements. Theoretically, one can assume that sales and advertisement are somehow interconnected and that their relationship can be predicted. In such a case, the expenditures on advertisements are assumed to relate to the outcomes of sales. Therefore, the amount of sales becomes the dependent variable, because the assumption made is that advertisement expenditures can affect it. However, simple observation cannot reveal whether such a relationship exists or, if it does exist, its direction (Aguinis, 1995). In order to determine whether the relationship is negative or positive, one must conduct a robust econometric analysis. Furthermore, the explanatory power of an independent variable such as, in this case, advertisement expenditures cannot be determined without statistical tools.
From the study of linear functions in algebra, one can recall that:
$$f(X) = mX + c$$
where X is the independent variable, f(X) is the value of the dependent variable and c is a constant value. Furthermore, m is a parameter that describes the amount of change that occurs in the dependent variable as triggered by the corresponding change in the independent variable. In simple linear regression, and specifically in econometrics, the values of the dependent variable are known and are represented in many cases by the symbol y, and the independent variable is represented by X. Thus, the model specification in the case of sales and advertisement expenditures is given below:
$$y = \beta_0 + \beta_1 X + \varepsilon$$
where y represents the dependent variable (sales), $\beta_0$ is the y-intercept of the regression line (the mean value of y when X = 0), $\beta_1$ describes the slope of the regression line, X is the independent variable, and $\varepsilon$ is the error (disturbance), which at this stage is explained as the distance between the actual data points and the corresponding points in the regression model (Tjandrawinata and Simanjuntak, 2011).
Thus, the full model becomes:
$$\text{Sales} = \beta_0 + \beta_1 \, \text{Advertisement} + \varepsilon$$
Let us use hypothetical data on sales and advertisement expenditures for a company XYZ over a period of 10 years, as shown in the table below:
YEAR | SALES (in 1000s) | ADVERTISEMENT (000 US $)
2005 | 100 | 55
2006 | 110 | 58
2007 | 112 | 60
2008 | 115 | 59
2009 | 117 | 62
2010 | 120 | 70
2011 | 115 | 60
2012 | 130 | 79
2013 | 125 | 80
2014 | 117 | 49
The data is transferred to a statistical software package such as Excel, STATA or SPSS. Next we can draw a scatter plot.
A statistical analysis conducted in the STATA software package shows the 90% confidence intervals around the fitted values. The scatter plot helps in visualizing the variables, and the trend line approximates where the real values should lie. As can be observed from the diagram above, not all values fall on the regression line, meaning that the model does not capture all of the variation in sales, either because it is incomplete or because unobserved factors and random sampling variation affect the data. This problem is common in econometrics and has led to more complex modelling that tries to minimize the effect of error terms (Uriel, 2013). The estimated model using STATA software is given in the table below:
Source | Sum of Squares | Degrees of freedom | MS
Model | 351.644308 | 1 | 351.644308
Residual | 253.255692 | 8 | 31.6569615
Total | 604.9 | 9 | 67.2111111

Number of observations | 10
F(1, 8) | 11.11
Prob > F | 0.0103
R-squared | 0.5813
Adj R-squared | 0.529
Root MSE | 5.6265

Sales | Coefficient | Std. Err. | t | P-value (two-tailed) | 95% Conf. Interval, lower | 95% Conf. Interval, upper
Advert | 0.6204028 | 0.1861472 | 3.33 | 0.01 | 0.1911465 | 1.049659
_cons | 76.89054 | 11.89829 | 6.46 | 0.000 | 49.45304 | 104.328
Interpretation:
The above table indicates that the F-statistic generated from the ANOVA table is 11.11, with 1 and 8 degrees of freedom. The R-squared of 0.5813 can be interpreted to mean that 58.13 percent of the variation in sales is explained by the variation in the amount of advertisement expenditures. Because the estimated coefficient of advertisement is also positive, one can conclude from this analysis that there is a positive correlation between the two variables. R-squared measures the overall strength of the association and does not reflect the extent to which any particular independent variable is associated with the dependent variable.
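The ANOVA quantities discussed above can be recomputed by hand as a check. The following is a minimal sketch in plain Python, assuming the ten hypothetical observations from the data table; the variable names are illustrative and are not part of the STATA output.

```python
# Hypothetical XYZ data from the table (both series are in thousands).
advert = [55, 58, 60, 59, 62, 70, 60, 79, 80, 49]
sales = [100, 110, 112, 115, 117, 120, 115, 130, 125, 117]

n = len(sales)
x_bar = sum(advert) / n
y_bar = sum(sales) / n

# Least-squares slope and intercept (closed-form solution).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advert, sales))
s_xx = sum((x - x_bar) ** 2 for x in advert)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in advert]

# ANOVA decomposition: total = model + residual sum of squares.
ss_total = sum((y - y_bar) ** 2 for y in sales)
ss_model = sum((f - y_bar) ** 2 for f in fitted)
ss_resid = sum((y - f) ** 2 for y, f in zip(sales, fitted))

df_model, df_resid = 1, n - 2
f_stat = (ss_model / df_model) / (ss_resid / df_resid)
r_squared = ss_model / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_resid
root_mse = (ss_resid / df_resid) ** 0.5

print(f"SS model = {ss_model:.4f}, SS residual = {ss_resid:.4f}")
print(f"F({df_model}, {df_resid}) = {f_stat:.2f}")
print(f"R-squared = {r_squared:.4f}, Adj R-squared = {adj_r_squared:.4f}")
print(f"Root MSE = {root_mse:.4f}")
```

If the data were entered correctly, the printed values should agree with the STATA table up to rounding.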
The typical procedure for finding the line of best fit, as observed in the diagram above, is called the method of Ordinary Least Squares (OLS). OLS is based upon the principle that the sum of squared errors (SSE) should be minimized, so that the regression line has the least possible amount of error. Analysing this trend can help in predicting the values of the dependent variable, so that a company can prepare its resources to meet the predicted events. However, extending the line beyond the observed data rests on the assumption that the underlying process actually causes the relationship to exist and that the relationship remains valid beyond the range of the sampled data (Durham College, 2011).
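The least-squares principle can be illustrated with a short sketch in plain Python. It assumes the hypothetical XYZ data from the table and uses the standard closed-form OLS solution; the helper names `ols_fit` and `sse` are introduced here only for illustration.

```python
# Hypothetical XYZ data from the table (both series are in thousands).
advert = [55, 58, 60, 59, 62, 70, 60, 79, 80, 49]
sales = [100, 110, 112, 115, 117, 120, 115, 130, 125, 117]


def ols_fit(x, y):
    """Return (intercept, slope) that minimise the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sse(x, y, intercept, slope):
    """Sum of squared errors of the line intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


intercept, slope = ols_fit(advert, sales)
best_sse = sse(advert, sales, intercept, slope)
print(f"intercept = {intercept:.5f}, slope = {slope:.7f}, SSE = {best_sse:.4f}")

# Any other line gives a larger SSE, for example one with a slightly steeper slope.
print(f"SSE with slope + 0.05 = {sse(advert, sales, intercept, slope + 0.05):.4f}")
```

The fitted intercept and slope should match the `_cons` and `Advert` coefficients in the STATA table, and the minimised SSE should match the residual sum of squares.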
The last variable, labelled _cons, is the intercept. It shows that when the company XYZ spends 0 dollars on advertisement (the independent variable is thus 0), the predicted sales are 76.89 (in thousands, the unit used in the data table). This corresponds to the case in which the company is unwilling to spend any money on advertisements. Because its reported p-value is below 0.05, the intercept is considered significantly different from 0 at the 0.05 significance level.
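As a worked illustration, the fitted line can be evaluated at two levels of advertisement expenditure. The hat notation for estimated coefficients is assumed here for convenience, and the choice of X = 70 (the 2010 advertisement figure, for which observed sales were 120) is only an example.

```latex
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 X = 76.89054 + 0.6204028\,X

X = 0:\quad  \hat{y} = 76.89054

X = 70:\quad \hat{y} = 76.89054 + 0.6204028 \times 70
                     = 76.89054 + 43.42820 \approx 120.32
```

Note that X = 0 lies well outside the observed range of advertisement expenditures (49 to 80), so the intercept is an extrapolation, whereas X = 70 lies inside that range.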
The p-value of the advertisement coefficient is 0.01. It comes from a two-tailed t-test of the null hypothesis that the coefficient (parameter) is 0 (zero), evaluated at the significance level of 0.05. For interpretation we use the coefficient of advertisement, which is 0.6204028. So, with nothing else altered, for every one-unit increase in advertisement expenditures the sales volume increases by 0.6204028 units on average, with both variables measured in thousands as in the data table. An increase in sales is plausible because, in most cases, advertisement leads to an increase in sales, since the company's visibility is increased.
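The standard error, t-statistic and confidence interval reported for the advertisement coefficient can be reproduced with the sketch below. It assumes the hypothetical XYZ data from the table; the critical value 2.306 is the two-sided 5% point of the t-distribution with 8 degrees of freedom, taken from standard tables rather than computed, since the Python standard library has no t-distribution.

```python
import math

# Hypothetical XYZ data from the table (both series are in thousands).
advert = [55, 58, 60, 59, 62, 70, 60, 79, 80, 49]
sales = [100, 110, 112, 115, 117, 120, 115, 130, 125, 117]

n = len(sales)
x_bar = sum(advert) / n
y_bar = sum(sales) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advert, sales))
s_xx = sum((x - x_bar) ** 2 for x in advert)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual variance (mean squared error) with n - 2 degrees of freedom.
ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(advert, sales))
mse = ss_resid / (n - 2)

# Standard error and t-statistic of the slope.
se_slope = math.sqrt(mse / s_xx)
t_stat = slope / se_slope

# Assumption: tabulated critical value t(0.975, df=8), not computed here.
T_CRIT = 2.306
ci_low = slope - T_CRIT * se_slope
ci_high = slope + T_CRIT * se_slope

print(f"slope = {slope:.7f}, std. err. = {se_slope:.7f}, t = {t_stat:.2f}")
print(f"95% confidence interval: [{ci_low:.4f}, {ci_high:.4f}]")
```

Because the interval excludes zero, it leads to the same conclusion as the p-value of 0.01.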
Hypothesis test:
This hypothesis test posits the condition that the dependent variable is in a linear relationship with the independent variable. Also, for every given value assigned to the independent variable, the probability distribution of the corresponding dependent variable should exhibit the same standard deviation σ. For any given value of X, the Y value should be normally distributed. Linearity and constant variance are among the Gauss-Markov conditions (Osborne, 2001), while normality is an additional assumption needed for the t and F tests to be exact in small samples. Here is a mathematical representation of the hypothesis, where $\beta_1$ is the slope of the regression line; usually, if a significant relationship really exists between the two variables, the slope cannot be equal to zero.
$$H_0: \beta_1 = 0 \qquad H_A: \beta_1 \neq 0$$
The null hypothesis states that the slope of the regression equation is equal to zero, whereas the alternative hypothesis contradicts this statement and holds that the slope is not equal to zero. Tests of hypothesis usually rely on a 95% confidence level, that is, a significance level of 0.05. The software running the test also calculates the p-value, which is the basis for deciding whether to reject the null hypothesis. In this case, the p-value generated from the regression above acts as a guide for checking the theoretical expectation that sales and advertisement are correlated variables.
Since the p-value of 0.01 is below 0.05, the null hypothesis (H0) that the coefficient of advertisement is equal to zero is rejected in favor of the alternative hypothesis (HA), which states that it is indeed different from zero. Interpreted in another way, the p-value of 0.01 indicates that if advertisement did not have any effect, a relationship at least as strong as the one observed would be obtained in only about 1% of studies as a result of random sampling error alone.
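The decision can also be written out as a worked calculation using the figures from the STATA table. The hat notation and the tabulated critical value $t_{0.025,\,8} \approx 2.306$ are assumptions introduced here for illustration.

```latex
t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}
  = \frac{0.6204028}{0.1861472} \approx 3.33

|t| = 3.33 > t_{0.025,\,8} \approx 2.306
\quad\Longrightarrow\quad \text{reject } H_0 \text{ at the 5\% level}

t^2 \approx 11.11 = F(1, 8)
```

The last line shows why, in a regression with a single independent variable, the t-test on the slope and the F-test from the ANOVA table give the same p-value.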