Title: Elastic Net SearcheR
Description: Elastic net regression models are controlled by two parameters: lambda, a measure of shrinkage, and alpha, a metric defining the model's location on the spectrum between ridge and lasso regression. glmnet provides tools for selecting lambda via cross validation, but no automated method for selecting alpha. Elastic Net SearcheR automates the simultaneous selection of both lambda and alpha. Developed, in part, with support from NICHD R03 HD094912.
Authors: Peter DeWitt [aut, cre], Tell Bennett [ctb]
Maintainer: Peter DeWitt <[email protected]>
License: GPL-2
Version: 0.1.0.9001
Built: 2024-11-04 05:35:31 UTC
Source: https://github.com/dewittpe/ensr
ensr: search a grid of alpha and lambda values for the minimum mean cross validation error.
ensr(x, y, alphas = seq(0, 1, length = 10), nlambda = 100L,
     standardize = TRUE, nfolds = 10L, foldid, envir = parent.frame(), ...)
Arguments:

x: the predictor matrix.
y: the response variable.
alphas: a sequence of alpha values.
nlambda: the number of lambda values; default is 100.
standardize: logical flag for x variable standardization prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is TRUE.
nfolds: number of folds; default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), it is not recommended for large data sets; the smallest allowable value is 3.
foldid: an optional vector of values between 1 and nfolds identifying which fold each observation is in.
envir: the environment in which to evaluate the cv.glmnet call.
...: other arguments that can be passed to cv.glmnet.
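Since this page does not include an example for ensr itself, here is a minimal sketch using the tbi data documented below. The family argument is forwarded through ... to cv.glmnet, and the summary method is assumed to tabulate the cross validation results as described in the package vignettes; treat the object names as illustrative.

library(ensr)
data(tbi, package = "ensr")

# Design matrix and binary response, as in the examples elsewhere on this page
Xmat <- model.matrix( ~ . - injury1 - injury2 - injury3 - 1, data = tbi)
Yvec <- matrix(tbi$injury1, ncol = 1)

set.seed(42)  # the cross validation folds are sampled at random
fit <- ensr(x = Xmat, y = Yvec, family = "binomial")

summary(fit)     # lambda, alpha, and mean CV error for each fitted model
preferable(fit)  # the glmnet fit with the lowest mean CV error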
lambda_alpha_grid: construct a data frame of lambda and alpha values, with an indicator for whether a model is worth fitting.
lambda_alpha_grid(lambdas, alphas, nlambda = 10L, lmin_factor = 1e-04)
Arguments:

lambdas: a vector of max lambda values, one for each alpha given.
alphas: a vector of alpha values corresponding to the max lambdas.
nlambda: the number of lambdas to generate for each alpha before creating the grid.
lmin_factor: the smallest lambda value is defined as lmin_factor * max(lambdas).
data(tbi)
Xmat <- model.matrix( ~ . - injury1 - injury2 - injury3 - 1, data = tbi)
Yvec <- matrix(tbi$injury1, ncol = 1)

alphas <- seq(0, 1, length = 20)

lga <- lambda_alpha_grid(alphas = alphas,
                         lambdas = lambda_max(Yvec, Xmat, alpha = alphas))

ggplot2::ggplot() +
  ggplot2::theme_bw() +
  ggplot2::aes_string(x = "a", y = "log10(l)") +
  ggplot2::geom_path(data = lga$top) +
  ggplot2::geom_point(data = lga$lgrid,
                      mapping = ggplot2::aes(color = cos(a) + sin(log10(l)))) +
  ggplot2::geom_contour(data = lga$lgrid,
                        mapping = ggplot2::aes(z = cos(a) + sin(log10(l)))) +
  ggplot2::scale_color_gradient2(low = "blue", high = "red", mid = "grey")
lambda_max: determine the lambda_max value that would be generated from a call to glmnet, without making that call.
lambda_max(y, x, standardize = TRUE, alpha = 0, lmin_factor = 1e-04, ...)
Arguments:

y: the response vector.
x: the predictor matrix.
standardize: logical; should the x matrix be standardized?
alpha: the glmnet alpha value.
lmin_factor: the smallest lambda value, defined as lmin_factor * lambda_max.
...: other arguments.
data(tbi)
Xmat <- model.matrix( ~ . - injury1 - injury2 - injury3 - 1, data = tbi)
Yvec <- matrix(tbi$injury1, ncol = 1)

alphas <- seq(0, 1, length = 20)

lambda_max(Yvec, Xmat, alpha = alphas)

# Look at different options for standardizing the inputs.
dat <- expand.grid(standardize = c(TRUE, FALSE), alpha = alphas)

lmax <- Map(lambda_max,
            standardize = dat$standardize,
            alpha = dat$alpha,
            MoreArgs = list(y = Yvec, x = Xmat))

gmax <- Map(glmnet::glmnet,
            standardize = dat$standardize,
            alpha = dat$alpha,
            MoreArgs = list(y = Yvec, x = Xmat))

dat$gmax <- sapply(gmax, function(f) f$lambda[1])
dat$lmax <- unlist(lmax)

par(mfrow = c(1, 2))

with(subset(dat, standardize == TRUE), {
  plot(log10(gmax), log10(lmax))
  abline(0, 1)
  title(main = "standardize == TRUE")
})

with(subset(dat, standardize == FALSE), {
  plot(log10(gmax), log10(lmax))
  abline(0, 1)
  title(main = "standardize == FALSE")
})
landfill: a computer simulation of water moving through a landfill. A detailed explanation of the variables and the construction of the data set is found in vignette("ensr-datasets", package = "ensr").

Usage: landfill

Format: an object of class data.table (inherits from data.frame) with 974 rows and 48 columns.

See also: vignette("ensr-datasets", package = "ensr")
predict and coef methods for ensr objects: using either the lambda.min or the lambda.1se, find the preferable model from the ensr object and return a prediction.
## S3 method for class 'ensr'
predict(object, ...)

## S3 method for class 'ensr'
coef(object, ...)
Arguments:

object: an ensr object.
...: other arguments passed along to glmnet::predict or glmnet::coef.
The glmnet::predict argument s is ignored if it is passed via the ... argument. The value of s given to glmnet::predict is determined by the lambda.min or lambda.1se value found from a call to preferable.
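As a sketch, reusing the hypothetical fit object from the ensr example above; newx and type are among the arguments forwarded to glmnet::predict:

# s is chosen internally from the preferable model, so it is not supplied here
phat <- predict(fit, newx = Xmat, type = "response")
head(phat)

coef(fit)  # coefficients of the preferable model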
preferable: find the preferable elastic net model from an ensr object. The preferable model is defined as the model with the lowest mean cross validation error and the largest alpha value.
preferable(object, ...)
Arguments:

object: an ensr object.
...: not currently used.
Value: a glmnet object associated with the smallest cvm. If the minimum cvm is not unique, then the model with the smallest cvm and the largest alpha value is returned. If that is still not unique, then, if all the "preferable" models have zero non-zero coefficients, the model with the largest lambda and largest alpha value is returned. Lastly, if a unique model is still not identified, an error will be thrown.
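For instance, continuing with the hypothetical fit object from the ensr example above, the returned value can be used like any other glmnet object:

pref <- preferable(fit)
class(pref)                  # a glmnet object; the subclass depends on the family
plot(pref, xvar = "lambda")  # coefficient paths for the selected alpha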
standardize: center and scale vectors by mean/standard deviation or median/IQR, with the option to base the standardization only on unique observations.
standardize(x, stats = list(center = "mean", scale = "sd"), use_unique = TRUE, margin)
Arguments:

x: numeric data to standardize.
stats: a list defining the centering and scaling statistics.
use_unique: use only unique values of x when determining the centering and scaling statistics.
margin: passed to apply if x is a matrix or array.
x <- 1:100
standardize(x)
standardize(x, stats = list(center = "median", scale = "IQR"))

xmat <- matrix(1:50, nrow = 10)
standardize(xmat, margin = 0)
standardize(xmat, margin = 1)
standardize(xmat, margin = 2)

xarray <- array(1:60, dim = c(5, 2, 6))
standardize(xarray, margin = 0)
standardize(xarray, margin = 1:2)

# Standardize a data.frame
standardize(mtcars)

# a generic list object
alist <- list(x = rep(1:10, 2), y = rnorm(100), z = matrix(1:10, nrow = 2))
standardize(alist, margin = 0)
standardize(alist, margin = 1)
tbi: this data is synthetic, that is, random data generated to have properties similar to a data set used for studying traumatic brain injuries. The pcode1 ... pcode6 and ncode1 ... ncode6 columns are indicators for procedure or billing codes associated with a hospital stay for TBI.

Usage: tbi

Format: a data.table with 1323 rows and 18 columns. Each row of the tbi data.table is a unique subject.
age: age, in days
female: indicator for sex; 1 == female, 0 == male
los: length of stay in the hospital
pcode1: indicator for whether the patient had pcode1
pcode2: indicator for whether the patient had pcode2
pcode3: indicator for whether the patient had pcode3
pcode4: indicator for whether the patient had pcode4
pcode5: indicator for whether the patient had pcode5
pcode6: indicator for whether the patient had pcode6
ncode1: indicator for whether the patient had ncode1
ncode2: indicator for whether the patient had ncode2
ncode3: indicator for whether the patient had ncode3
ncode4: indicator for whether the patient had ncode4
ncode5: indicator for whether the patient had ncode5
ncode6: indicator for whether the patient had ncode6
injury1: first of three specific types of TBI
injury2: second of three specific types of TBI
injury3: third of three specific types of TBI
vignette("ensr-datasets", package = "ensr")