# Biostat 276 Homework 1 Bayesian Adaptive Lasso Solved

Consider p = 1. Simulate 5,000 Monte Carlo samples from the conditional prior β|τ2 = 1 and obtain a plot of the density using the R function density.

n <- 5000

plot(density(rnorm(n,0,1)), main=TeX(paste("$\\beta$", "marginal")))

β marginal

−4
−2
0
2
4
0.0
0.1
0.2
0.3
0.4
Density

N = 5000   Bandwidth = 0.1624

Part b.

Consider p = 1. Simulate 5,000 Monte Carlo samples from the marginal prior β, considering λ2 = 2, so that E(τ2|λ) = 1. Obtain a plot of the density as in a.

lambda <- sqrt(2) tau.sq <- rgamma(n,shape=1,rate = lambda^2/2) beta.marginal <- rnorm(n,0,sqrt(tau.sq))

plot(density(beta.marginal), main=TeX(paste("$\\lambda^2 = 2$")), xlim=c(-5,5))

−4
−2
0
2
4
0.0
0.2
0.4
0.6
λ
2
=
2
Density

N = 5000   Bandwidth = 0.1183

Part c.

Consider p = 1. Add a hyper prior on γ = 1/γ ∼ Gamma(a,rate = b). Assess how the marginal prior of β changes for a = 1 and values of b ≥ 1.

set.seed(1) par(mfrow=c(2,2)) rates <- c(1,3,5,10) for(b in rates){

lambda <- 1/rgamma(n,1,b) tau.sq <- rgamma(n,shape=1,rate = lambda^2/2) beta.marginal <- rnorm(n,0,sqrt(tau.sq)) plot(density(beta.marginal), main=paste("rate b = ",b),xlim=c(-5,5))

}

0.0
0.4
0.8
Density
0.0
1.5
Density

rate b =  1
−4      −2       0        2            4

N = 5000   Bandwidth = 0.09665

rate b =  3
−4         −2       0        2           4

N = 5000   Bandwidth = 0.03314

0
2
4
Density
0
4
8
Density

rate b =  5
−4         −2       0        2           4

N = 5000   Bandwidth = 0.01929

rate b =  10
−4          −2       0        2        4

N = 5000   Bandwidth = 0.009661

Part d.

Considering the hyper prior in c., describe a Markov Chain Monte Carlo algorithm to sample from the posterior distribution of β and σ2.

I will implement a joint Gibbs and Metropolis sampler. The model is

y|β,σ2 ∼ N(Xβ,σ2I)

Inverse-Gamma(1, λ22) λ ∼ Inverse-Gamma(a,1/b) σ2 ∼ Inverse-Gamma(0.1,0.1).

I need the full conditionals

{β1,...,βp|y,σ2,τ12,...,τp2,λ},

{σ2|y,β1,...,βp,τ12,...,τp2,λ},

{τ12,...,τp2|y,β1,...,βp,σ2,λ},

{λ|y,β1,...,βp,σ2,τ12,...,τp2}

which are all proportional to

p(y|β1,...,βp,τ12,...,τp2,σ2,λ) × p(β1,...,βp|τ12,...,τp2) × p(τ12,...,τp2|λ)p(λ)p(σ2)

p(β1,...,βp,τ12,...,τ ,...,βp,τ12,...,τp2,σ2,λ)

× p(β1,...,βp|τ12,...,τp2)

× p ,...,τp2|λ)p(λ)p(σ2).

As a function of just σ2, this is proportional to

p(y|β1,...,βp,τ12,...,τp2,σ2,λ)p(σ2) =             N(Xβ,σ2I)IG(0.1,0.1).

Time to show this is inverse-gamma distributed.

N(y;Xβ,σ2I)IG(σ2;q,r)

r

σ2

As a function of β, the conditional is proportional to

p(y|β1,...,βp,τ12,...,τp2,σ2,λ)p(β1,...,βp|τ12,...,τp2)

p

=            N(Xβ,σ2I) · YN(0,τi2)

i=1 =     N(Xβ,Σ) · N(0,Ω), where Ω = diag(τ12,...,τp2)

=       N(m,M)

because the posterior is determined by the quadratic form

(y − Xβ)Σ−1(y − Xβ) + βΩ−1β = (β − m)M−1(β − m).

Completing the square gives m = MXΣ−1y and M = (XΣ−1X + Ω−1)−1.

As a function of τ12,...,τp2, the target is proportional to

p(β1,...,βp|τ12,...,τp2)p(τ12,...,τp2|λ)

p                                             p                                 λ2

= YN(βi;0,τi2) · YIG(τi2;1, 2 )

i=1                                          i=1

Finally, as a function of λ, the target distribution is proportional to

p(τ12,...,τp2|λ)p(λ)

p                                 λ2

= YIG(τi2;1, 2 ) · IG(λ;a,b)

i=1

Now I can build an algorithm to iteratively update through these target distributions. I take the starting value of β(0) to be the least-squares solution βˆ along with σ2(0) = σˆ2, the MLE for σ2.

Result: Samples from joint posterior p(β,σ2|y) for s in # samples do

λ
else
λ
end
end
σ
2(
s
+1)
β
(
s
+1)
note: extra term due in logr due to Jacobian of transformation λ ← exp(log(λ(s)) + ε), ε ∼ N(0,δ2) logr ← logπλ(λ∗) − logπλλ(s) + logλ∗ − logλ(s) if (logunif(0,1) < logr) then

(s+1) ← λ∗

(s+1) ← λ(s) for j in 1:p do

note: extra term due in logr due to Jacobian of transformation τj2∗ ← exp(log(τj2(s)) + ε), ε ∼ N(0,δ2) logr ← logπτj2(τj2∗) − logπτj2(τj2(s)) + log(τj2∗) − log(τj2(s)) if (logunif(0,1) < logr) then

else τj2(s+1) ← τj2(s)

end
∼ IG(n/2 + a,2b + (y − Xβ(s))(y − Xβ(s))/2)

∼ N(m,M),where

M = (XΣ−1X + Ω−1)−1 and m = M(XΣ−1y)

Σ = σ2(s+1) and Ω = diag ,...,τp2(s+1)) end

Part f.

Implement such algorithm in R and compare your results with estimates obtained using glmnet(). In particular, you should test your results on the diabetes data available from lars, (use the matrix of predictors x).

## Data processing

sourceCpp("helperFunctions.cpp") set.seed(1) data("diabetes")

X <- cbind(rep(1,length(diabetes$x)),cbind(diabetes$x)); y <- diabetes$y; n <- nrow(X); p <- ncol(X); samples <- 1000 ## Initialize starting values lambda <- 1 tau2 <- rep(1000,p) beta <- solve(t(X)%*%X)%*%t(X)%*%y sigma2 <- t(y-X%*%beta)%*%(y-X%*%beta)/n sigma2.chain <- lambda.chain <- rep(0,samples) beta.chain <- tau2.chain <- matrix(0,nrow=p,ncol=samples) ## MCMC for(s in 2:samples){ lambda <- lambdaDraw(current=lambda,tau2=tau2,a=1,b=1) tau2 <- tau2Draw(current=tau2,beta=beta,lambda=lambda) sigma2 <- sigma2Draw(beta, y, X) mM <- betaMeanCov(y,X,sigma2,tau2) beta <- t(rmvnorm(n=1,mean=mM$mean,sigma=mM$cov)) lambda.chain[s] <- lambda sigma2.chain[s] <- sigma2 beta.chain[,s] <- beta tau2.chain[,s] <- tau2 } # Examine markov chains # plot(beta.chain[1,floor(samples/4):samples],type="l") # Plot table of coefficients from Glmnet and Bayesian Lasso comparison <- data.frame( "Bayesian Lasso" = rowMeans(beta.chain[,floor(samples/4):samples]), "Glmnet" = matrix(coef(glmnet(y=y,x=X),alpha=1,s=1)[-2]) ) kable(comparison, "latex", booktabs = T) Bayesian.Lasso Glmnet 152.0653282 152.13348 0.0986925 0.00000 -146.5473735 -195.89915 515.9450575 522.05142 267.6553181 296.18834 -64.7222100 -101.86185 -36.1819103 0.00000 -170.9613979 -223.22347 59.2924739 0.00000 468.2759684 513.57366 50.7368620 53.86052 • I initially notice the difference in parameterization between glment’s lasso and my Bayesian lasso. I’m viewing the coefficients for λ = 1 in glmnet and a value of b = 1 for Bayesian lasso. As I’ll show later, in this implementation of Bayesian lasso, shrinkage is very sensitive to the hyperparameter b of λ. Part g. Free λ and carry out a sensitivity analysis assessing the behavior of the posterior distribution of β and σ2, as hyper parameters a and b are changed. Explain clearly the rationale you use to assess sensitivity and provide recommendations for the analysis of the diabetes data. # Sequence of lambdas for comparison with glmnet lambdas <- seq(from=-12, to = -1, length.out = 12) # Keep track of posterior mean of beta for each fixed lambda post.means <- matrix(NA,nrow=p,ncol=length(lambdas)) for(i in 1:length(lambdas)){ for(s in 2:samples){ lambda <- exp(lambdas[i]) tau2 <- tau2Draw(current=tau2,beta=beta,lambda=lambda) sigma2 <- sigma2Draw(beta, y, X) mM <- betaMeanCov(y,X,sigma2,tau2) beta <- t(rmvnorm(n=1,mean=mM$mean,sigma=mM$cov)) sigma2.chain[s] <- sigma2 beta.chain[,s] <- beta tau2.chain[,s] <- tau2 } post.means[,i] <- matrix(rowMeans(beta.chain[,floor(samples/4):samples]),nrow = p) } −4 −2 0 2 4 −600 −200 200 600 Log Lambda Coefficients 10 10 7 4 0 −12 −10 −8 −6 −4 −2 −500 0 500 log lambdas Coefficients • Glmnet is on the left and Bayesian lasso on the right. These regularization paths look very similar. Part g. Free λ and carry out a sensitivity analysis assessing the behavior of the posterior distribution of β and σ2, as hyper parameters a and b are changed. Explain clearly the rationale you use to assess sensitivity and provide recommendations for the analysis of the diabetes data. ## Sequence of b's that define a path of hyperparameters for lambda bs <- c(seq(from=1e-5,to=20,length.out = 30)) betasB <- matrix(0,nrow=p,ncol=length(bs)) ## Keep track of posterior means for each b value whith lambda free post.means <- matrix(NA,nrow=p,ncol=length(bs)) for(j in 1:length(bs)){ for(s in 2:samples){ lambda <- lambdaDraw(current=lambda,tau2=tau2,a=1,b=1/bs[j]) tau2 <- tau2Draw(current=tau2,beta=beta, lambda=lambda) sigma2 <- sigma2Draw(beta, y, X) mM <- betaMeanCov(y,X,sigma2,tau2) beta <- t(rmvnorm(n=1,mean=mM$mean,sigma=mM\$cov)) beta.chain[,s] <- beta

} betasB[,j] <- rowMeans(beta.chain[,floor(samples/4):samples]) }
Effect of 1/b

0
5
10
15
20
−200
0
100
300
500
1
/b's
Coefficients

•     Though I didn’t include the plots, the shrinkage of coefficients seemed very robust to changes in a for fixed b, so I chose to fix a = 1 and focus on varying b. From the plot, I notice as 1/b approaches zero, the coefficients approach the least-squares estimates. As 1β increases, there’s an increasing amount of shrinkage towards zero. This matches behavior of the regularization paths in part g as lambda is fixed and increasing.

Part e.
C++
Helper functions.

#include <cmath

#include <math.h

#include <random

#include <RcppArmadillo.h using namespace Rcpp; using namespace std;

double loglambdaTarget(double lambda, vector<double tau2, double a, double b) { return((-a-1)*log(lambda)-(pow(lambda,2)/2*accumulate(tau2.begin(),tau2.end(),0))-1

}

double logtau2jTarget(double tau2j, double lambda, double betaj){ return(-log(sqrt(tau2j)) - 1.0/2.0*(1.0/tau2j*pow(betaj,2) + pow(lambda,2)*tau2j)); }

// [[Rcpp::export]]

double lambdaDraw(double current, vector<double tau2, double a, double b){ std::random_device rd; std::mt19937 mt(rd()); std::uniform_real_distribution<double dist(0, 1.0); std::normal_distribution<double norm(0, current); double proposed = exp(log(current)+norm(mt)); double logr = loglambdaTarget(proposed,tau2,1,b)loglambdaTarget(current,tau2,1,b)+log(proposed)log(current);

if(log(dist(mt))<logr){ current = proposed;

} return current;

}

// [[Rcpp::export]]

vector<double tau2Draw(vector<double current, vector<double beta, double lambda){ std::random_device rd; std::mt19937 mt(rd()); std::uniform_real_distribution<double dist(0, 1.0); std::normal_distribution<double norm(0, 1); for(int j=0;j < current.size(); j++){ double tau2j_proposed = exp(log(current[j])+norm(mt)); double logr = logtau2jTarget(tau2j_proposed, lambda, beta[j]) logtau2jTarget(current[j], lambda, beta[j]) + log(tau2j_proposed)-log(current[j]);

if(log(dist(mt)) < logr){ current[j] = tau2j_proposed; } } return current;
/(b*lambda));

}

// [[Rcpp::export]]

double sigma2Draw(const arma::vec & beta, const arma::vec & y, const arma::mat & X){ std::random_device rd; std::mt19937 mt(rd()); int n = X.n_rows; arma::colvec coef = arma::solve(X, y); arma::colvec resid = y - X*coef; double rss = arma::as_scalar(arma::trans(resid)*resid); std::gamma_distribution<double gamma(n/2.0+0.1,1.0/(rss/2.0+0.1)); return 1.0/gamma(mt);

}

// [[Rcpp::export]]

List betaMeanCov(const arma::vec & y, const arma::mat & X, double sigma2, const std::random_device rd; std::mt19937 mt(rd()); int p = X.n_cols; arma::mat I = arma::eye(p,p); arma::mat tau2_inv = arma::inv(arma::diagmat(tau2)); arma::mat M = arma::inv(X.t()*X/sigma2+tau2_inv); arma::colvec m = M*X.t()*y/sigma2; return List::create(Named("cov") = M, Named("mean")= m);

}

// List mcmc(double lambda, double sigma2, const arma::vec & tau2, const arma::vec & beta2){

//

//                       return List::create(Named("beta.chain") = M, Named("sigma2.chain")= m,

//             Named("tau2.chain") = M, Named("lambda.chain") = M); // }
arma::vec