Description: Effect of multicollinearity on coefficients of linear model.png
English: The true parameters are a_1 = 2 and a_2 = 4. They are estimated reliably when X_1 and X_2 are uncorrelated (black points) but unreliably when X_1 and X_2 are strongly correlated (red points, correlation 0.95). Each point shows the pair of estimates from one linear fit; for each case, 1000 fits are performed on 1000 independently simulated training sets of 50 observations each.
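The widening of the red cloud can be read off the standard ordinary least squares variance formula for a model with two predictors. The following is a sketch, not part of the original file description; the symbols \sigma^2 (noise variance), r (sample correlation between X_1 and X_2) and S_{jj} (centered sum of squares of X_j) are notation introduced here, and the variances are conditional on the drawn predictors.

```latex
\operatorname{Var}(\hat a_j) = \frac{\sigma^2}{(1 - r^2)\, S_{jj}}, \qquad
S_{jj} = \sum_{i=1}^{N} (x_{ij} - \bar x_j)^2, \qquad
\operatorname{Corr}(\hat a_1, \hat a_2) = -r
```

With r close to 0.95 the factor 1/(1 - r^2) is about 1/0.0975 ≈ 10.3, so the standard deviation of each estimate is roughly 3.2 times larger than in the uncorrelated case. The negative correlation between the two estimates is why the red points fall along a downward-sloping ellipse rather than a round cloud.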
library(tidyverse)

sim <- function(rho) {
  # Number of samples to draw
  N <- 50
  # Covariance matrix of (X_1, X_2): unit variances, correlation rho
  covar <- matrix(c(1, rho, rho, 1), byrow = TRUE, nrow = 2)
  # Prepend a column of 1s (the intercept) to N draws from a
  # 2-dimensional Gaussian with covariance matrix covar
  X <- cbind(rep(1, N), MASS::mvrnorm(N, mu = c(0, 0), Sigma = covar))
  # True betas for our regression: intercept 1, a_1 = 2, a_2 = 4
  betas <- c(1, 2, 4)
  # Make the outcome, adding standard normal noise
  y <- X %*% betas + rnorm(N, 0, 1)
  # Fit a linear model (lm supplies its own intercept)
  model <- lm(y ~ X[, 2] + X[, 3])
  # Return a data frame of the two slope estimates
  tibble(a1 = coef(model)[2], a2 = coef(model)[3])
}

# Run the function 1000 times and stack the results
# (map() over 1:1000 calls sim() once per iteration)
zero_covar <- map(1:1000, ~ sim(0)) %>% bind_rows()

# Same as above, but X_1 and X_2 now have correlation 0.95
high_covar <- map(1:1000, ~ sim(0.95)) %>% bind_rows()

# Plot: correlated case in red underneath, uncorrelated case in black on top
zero_covar %>%
  ggplot(aes(a1, a2)) +
  geom_point(data = high_covar, color = 'red') +
  geom_point()
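To put numbers on what the plot shows, the spread of the estimates can be summarized directly. The following is a minimal, self-contained sketch and not part of the original script: the helper `fit_once`, the variable names and the seed are assumptions made for illustration, and it uses base R plus MASS rather than the tidyverse.

```r
library(MASS)  # for mvrnorm

# Hypothetical helper: simulate one data set with predictor correlation rho
# and return the two fitted slope estimates (a1, a2).
fit_once <- function(rho, n = 50) {
  covar <- matrix(c(1, rho, rho, 1), nrow = 2)
  x <- mvrnorm(n, mu = c(0, 0), Sigma = covar)
  y <- 1 + 2 * x[, 1] + 4 * x[, 2] + rnorm(n)
  unname(coef(lm(y ~ x))[2:3])
}

set.seed(1)  # arbitrary seed, only so that repeated runs give the same numbers

# 1000 fits per case; each row holds one (a1, a2) pair
est_zero <- t(replicate(1000, fit_once(0)))
est_high <- t(replicate(1000, fit_once(0.95)))

# Standard deviation of each slope estimate across the 1000 fits
sd_zero <- apply(est_zero, 2, sd)
sd_high <- apply(est_high, 2, sd)

# Ratio of spreads, next to the rough theoretical value 1 / sqrt(1 - rho^2)
print(sd_high / sd_zero)
print(1 / sqrt(1 - 0.95^2))

# Correlation between the two slope estimates in each case
print(cor(est_zero[, 1], est_zero[, 2]))
print(cor(est_high[, 1], est_high[, 2]))
```

The ratio of standard deviations should come out near 3, and the estimates in the correlated case should be strongly negatively correlated with each other, which matches the elongated red cloud in the figure.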
You are free:
to share – to copy, distribute and transmit the work
to remix – to adapt the work
Under the following conditions:
attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.