asgl fits penalized regression models for high-dimensional variable selection.
It supports linear (lm), quantile (qr), and logistic (logit) regression,
with a rich menu of penalizations — from plain Lasso to Adaptive Sparse Group
Lasso (ASGL) — all through a single scikit-learn compatible Regressor class.
The package is especially useful when:
- Variables have a known group structure (gene pathways, dummy-variable families, …), encoded as the `group_index` array sketched below
- You need simultaneous group- and individual-level sparsity
- You want adaptive weights to improve oracle properties
- Your design matrix `X` is a `scipy.sparse` matrix
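Group structure is passed to `fit` as a plain integer array with one entry per column of `X`. A minimal sketch (the grouping here is made up):

```python
import numpy as np

# One integer per feature; columns sharing a value form one group.
# Here: 6 features split into 2 groups of 3.
group_index = np.array([1, 1, 1, 2, 2, 2])
# Used later as: model.fit(X, y, group_index=group_index)
```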
Based on:
- Adaptive Sparse Group Lasso in Quantile Regression
- asgl: A Python Package for Penalized Linear and Quantile Regression
| Feature | Details |
|---|---|
| Models | Linear (lm), quantile (qr), logistic binary classification (logit) |
| Penalizations | lasso, ridge, gl, sgl, alasso, aridge, agl, asgl, or None |
| Sparse input | Both dense and scipy.sparse matrices accepted. |
| Multi-output Y | lm and qr accept a 2D y matrix for simultaneous multi-response fitting |
| Solver fallback | solver accepts a list; falls back through installed CVXPY solvers automatically |
| Adaptive weights | 8 built-in weight techniques: pca_pct, pca_1, pls_pct, pls_1, lasso, ridge, unpenalized, sparse_pca |
| sklearn API | Full fit / predict / score / GridSearchCV / cross_val_predict support |
```bash
pip install asgl
```

Requirements: Python >= 3.10, cvxpy >= 1.5.0, numpy >= 1.20.0, scikit-learn >= 1.6, scipy >= 1.1
To run the test suite after installation:
```bash
pytest
```

Quickstart

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from asgl import Regressor
X, y = make_regression(n_samples=500, n_features=50, n_informative=20,
noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = Regressor(model='lm', penalization='lasso', lambda1=0.1)
model.fit(X_train, y_train)
print(model.coef_)
print(mean_squared_error(y_test, model.predict(X_test)))
```

Regressor parameters

```python
from asgl import Regressor
Regressor(
model='lm', # 'lm' | 'qr' | 'logit'
penalization='lasso', # see Penalizations table below, or None
quantile=0.5, # quantile level (qr only)
fit_intercept=True,
lambda1=0.1, # penalization strength
alpha=0.5, # lasso/group-lasso tradeoff for sgl/asgl
solver='default', # str or list[str] — CVXPY solver(s)
canon_backend='CPP', # 'CPP' | 'SCIPY' | 'COO'
verbose=False,
weight_technique='pca_pct', # adaptive weight method (adaptive penalties only)
individual_power_weight=1,
group_power_weight=1,
variability_pct=0.9,
lambda1_weights=0.1,
spca_alpha=1e-5,
spca_ridge_alpha=1e-2,
individual_weights=None, # override weight estimation with custom array
group_weights=None,
tol=1e-3,
weight_tol=1e-4,
)
```
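The adaptive penalties (`alasso`, `aridge`, `agl`, `asgl`) estimate penalty weights from the data via `weight_technique`, or accept user-supplied arrays. A minimal sketch on synthetic data, assuming one weight per feature and one per group for the custom case:

```python
import numpy as np
from asgl import Regressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, :5].sum(axis=1) + rng.normal(size=200)
group_index = np.repeat(np.arange(1, 5), 5)  # 4 groups of 5 features

# Adaptive sparse group lasso with PCA-based weight estimation
model = Regressor(model='lm', penalization='asgl', lambda1=0.1, alpha=0.5,
                  weight_technique='pca_pct', variability_pct=0.9)
model.fit(X, y, group_index=group_index)

# Bypass weight estimation with custom arrays instead
custom = Regressor(model='lm', penalization='asgl', lambda1=0.1, alpha=0.5,
                   individual_weights=np.ones(20), group_weights=np.ones(4))
custom.fit(X, y, group_index=group_index)
print(model.coef_, custom.coef_, sep="\n")
```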
Penalizations

| `penalization` | Type | Group structure required |
|---|---|---|
| `None` | Unpenalized | No |
| `'lasso'` | Individual | No |
| `'ridge'` | Individual | No |
| `'gl'` | Group | Yes |
| `'sgl'` | Individual + Group | Yes |
| `'alasso'` | Adaptive individual | No |
| `'aridge'` | Adaptive individual | No |
| `'agl'` | Adaptive group | Yes |
| `'asgl'` | Adaptive individual + Group | Yes |
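For orientation, the sparse group lasso objective in its usual formulation (a sketch; `lambda1` plays the role of $\lambda$, `alpha` of $\alpha$, and $p_g$ is the size of group $g$):

$$
\min_{\beta}\; \mathcal{L}(\beta) + \lambda \left[ \alpha \lVert \beta \rVert_1 + (1-\alpha) \sum_{g=1}^{G} \sqrt{p_g}\, \lVert \beta_g \rVert_2 \right]
$$

Here $\mathcal{L}$ is the model loss (squared, quantile, or logistic). $\alpha = 1$ recovers the lasso, $\alpha = 0$ the group lasso, and the adaptive variants rescale each $\lvert \beta_j \rvert$ and $\lVert \beta_g \rVert_2$ by a data-driven weight (see `weight_technique`).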
Key methods
| Method | Description |
|---|---|
| `fit(X, y, group_index=None)` | Fit the model |
| `predict(X)` | Predict (regression output or class labels for logit) |
| `predict_proba(X)` | Class probabilities (logit only) |
| `decision_function(X)` | Raw linear scores |
| `score(X, y)` | R² (regression) or accuracy (classifier) |
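For the classifier side, a quick sketch of the three prediction methods (synthetic data, arbitrary `lambda1`):

```python
from sklearn.datasets import make_classification
from asgl import Regressor

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = Regressor(model='logit', penalization='lasso', lambda1=0.01)
clf.fit(X, y)

print(clf.predict(X[:3]))            # hard class labels
print(clf.predict_proba(X[:3]))      # class probabilities, shape (3, 2)
print(clf.decision_function(X[:3]))  # raw linear scores
```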
Fitted attributes
| Attribute | Description |
|---|---|
| `coef_` | (n_features,) or (n_features, n_outputs) coefficient array |
| `intercept_` | Intercept scalar |
| `n_features_in_` | Number of features seen during fit |
| `solver_stats_` | Dict with solver name, iterations, timing |
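These are plain attributes on the fitted estimator. A minimal sketch (synthetic data; `solver_stats_` contents per the table above):

```python
import numpy as np
from asgl import Regressor

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(size=100)

model = Regressor(model='lm', penalization='lasso', lambda1=0.1)
model.fit(X, y)
print(model.coef_.shape)     # (10,)
print(model.intercept_)      # scalar
print(model.n_features_in_)  # 10
print(model.solver_stats_)   # solver name, iterations, timing
```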
Quantile regression with cross-validation

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from asgl import Regressor
X, y = make_regression(n_samples=1000, n_features=50, n_informative=25,
noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
group_index = np.repeat(np.arange(1, 11), 5) # 10 groups of 5 features each
model = Regressor(model='qr', penalization='asgl', quantile=0.5)
param_grid = {
'lambda1': [1e-3, 1e-2, 1e-1, 1.0],
'alpha': [0.0, 0.5, 1.0],
}
cv = RandomizedSearchCV(model, param_grid, scoring='neg_median_absolute_error',
n_iter=12, cv=5)
cv.fit(X_train, y_train, group_index=group_index)
print(cv.best_params_)
print(cv.score(X_test, y_test))
```

Sparse input

```python
import scipy.sparse as sp
from sklearn.datasets import make_regression
from asgl import Regressor
X_dense, y = make_regression(n_samples=500, n_features=200, n_informative=30,
random_state=0)
X = sp.csr_matrix(X_dense)  # or pass your own scipy.sparse matrix
model = Regressor(model='lm', penalization='lasso', lambda1=0.05)
model.fit(X, y)
print(f"Non-zero coefficients: {(model.coef_ != 0).sum()}")import numpy as np
from sklearn.datasets import make_regression
from asgl import Regressor
X, y_1d = make_regression(n_samples=300, n_features=30, n_informative=10,
noise=3, random_state=7)
y = np.column_stack([y_1d, y_1d * 0.5 + np.random.randn(300) * 2]) # 2 outputs
group_index = np.repeat(np.arange(1, 6), 6) # 5 groups
model = Regressor(model='lm', penalization='gl', lambda1=0.1)
model.fit(X, y, group_index=group_index)
print(model.coef_.shape)  # (n_features, 2)
```

Solver fallback

```python
from asgl import Regressor
# Try CLARABEL first, then SCS, then let cvxpy choose
model = Regressor(model='lm', penalization='lasso',
solver=['CLARABEL', 'SCS', 'default'])
model.fit(X_train, y_train)
print(model.solver_stats_['solver_name'])
```

Logistic regression with threshold tuning

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import accuracy_score
from asgl import Regressor
X, y = make_classification(n_samples=1000, n_features=100, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
model = Regressor(model='logit', penalization='ridge')
proba_cv = cross_val_predict(model, X_train, y_train, method='predict_proba', cv=5)
# Find threshold that maximises CV accuracy
thresholds = np.linspace(0.01, 0.99, 99)
best_thr = thresholds[np.argmax(
[accuracy_score(y_train, (proba_cv[:, 1] >= t).astype(int)) for t in thresholds]
)]
model.fit(X_train, y_train)
test_preds = (model.predict_proba(X_test)[:, 1] >= best_thr).astype(int)
print(f"Test accuracy: {accuracy_score(y_test, test_preds):.3f}")If you use asgl in a scientific publication, please cite:
@article{mendez2022adaptive,
title = {Adaptive sparse group lasso in quantile regression},
author = {M{\'e}ndez-Civieta, {\'A}lvaro and Aguilera-Morillo, M Carmen and Lillo, Rosa E},
journal = {Advances in Data Analysis and Classification},
year = {2021},
doi = {10.1007/s11634-020-00413-8}
}
```

Full paper | Package paper | Towards Data Science walkthrough
Contributions are welcome! Please open an issue to discuss ideas or submit a pull request.
See CONTRIBUTORS.md for a full list of contributors.
v2.2.0 incorporates a major contribution from zeyuz35: sparse matrix support, multi-output Y regression, solver fallbacks, performance improvements (vectorized group weights, PLS optimization), and an expanded test suite. See CONTRIBUTORS.md for details.
- Sparse matrix (`scipy.sparse`) input support throughout
- Multivariate Y (multi-output) for `lm` and `qr` models
- `solver` accepts a list of names with automatic fallback
- New parameters: `verbose`, `canon_backend`
- Performance: vectorized group weights, PLS without refitting
- Internal refactor: `skmodels.py` → 5 focused modules
- Test suite: 24 → 96 test functions
- Requires Python >= 3.10
- scikit-learn estimator tag compliance
- Quantile loss optimized via residual-splitting LP (sketched below)
- Logistic model rewritten: `predict_proba` and `decision_function` added; `logit_proba` and `logit_raw` model types removed
- Ridge and adaptive ridge penalizations added (`'ridge'`, `'aridge'`)
- `Regressor` class introduced with full scikit-learn compatibility
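The residual-splitting LP above refers to the classical reformulation of the quantile (pinball) loss, sketched here rather than asgl's literal internals: each residual is split into nonnegative parts $u_i - v_i$, turning the piecewise-linear loss into a linear program (penalty terms on $\beta$ omitted):

$$
\min_{\beta,\; u \ge 0,\; v \ge 0} \;\; \frac{1}{n} \sum_{i=1}^{n} \big( \tau u_i + (1-\tau) v_i \big)
\quad \text{s.t.} \quad y_i - x_i^{\top} \beta = u_i - v_i, \quad i = 1, \dots, n
$$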
GPL-3.0 — open source; modifications must be redistributed under the same license. See LICENSE for full text.
