Hyperparameter Tuning Methods & Cross Validation#

The following demonstrates combining a search method directly with cross validation.

The previous section introduced the Elastic Net; the model actually tuned in the example below, however, is scikit-learn's GradientBoostingRegressor.

The dataset is again scikit-learn's built-in toy dataset: diabetes.

import numpy as np
import pandas as pd
from hyperopt import hp, tpe, fmin, Trials, space_eval
from hyperopt.pyll.base import scope
from scipy.stats import uniform
from scipy.stats import randint
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import train_test_split

Load the dataset

# Load the diabetes dataset
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

Split the dataset into train / test

# 80% for training data, 20% for testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# The 1st number is the number of rows, the 2nd the number of columns
print(X.shape, X_train.shape, X_test.shape)
print(y.shape, y_train.shape, y_test.shape)
(442, 10) (353, 10) (89, 10)
(442,) (353,) (89,)

Bayesian Optimization#

We use the hyperopt package to implement Bayesian optimization. Related packages include optuna, Scikit-Optimize, and others; a minimal optuna sketch is shown below for reference.
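As an aside, and not part of the original walkthrough, the same idea can be expressed with optuna. The sketch below assumes optuna is installed, reuses the X_train split and imports defined above, and mirrors the hyperparameter ranges that the hyperopt space defines in the next step.

# A hedged optuna equivalent (assumes the `optuna` package is installed)
import optuna

def optuna_objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 10, 100),
        'max_depth': trial.suggest_int('max_depth', 1, 5),
        'min_samples_split': trial.suggest_float('min_samples_split', 0.1, 1.0),
        'criterion': trial.suggest_categorical('criterion', ['friedman_mse', 'squared_error'])
    }
    model = GradientBoostingRegressor(**params)
    # Return the positive mean CV MSE, which the study minimizes
    return -np.mean(cross_val_score(model, X_train, y_train, cv=5,
                                    scoring='neg_mean_squared_error'))

study = optuna.create_study(direction='minimize')
study.optimize(optuna_objective, n_trials=24)
print(study.best_params)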

As with random search, the first step is to define the hyperparameter search space.

Here, the hyperparameter distributions must be defined with hyperopt's own functions.

For detailed usage, see the official documentation.

# Define the hyperparameter space
param_space = {
    'n_estimators': scope.int(hp.randint('n_estimators', 10, 101)),
    'max_depth': scope.int(hp.randint('max_depth', 1, 6)),
    'min_samples_split': hp.uniform('min_samples_split', 0, 1),
    'criterion': hp.choice('criterion', ['friedman_mse', 'squared_error'])
}
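As an optional sanity check, one can draw a single random sample from this space with hyperopt's hyperopt.pyll.stochastic.sample to see what a candidate configuration looks like:

# Optional: preview one random draw from the search space
from hyperopt.pyll.stochastic import sample
print(sample(param_space))
# prints a dict such as {'criterion': ..., 'max_depth': ..., 'min_samples_split': ..., 'n_estimators': ...}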

hyperopt requires an objective function to be defined.

The objective function takes a specific hyperparameter combination as input and returns a metric measuring model performance.

Pay attention to the signs here:

cross_val_score treats a larger score as better, so we must use "neg_mean_squared_error".

hyperopt, on the other hand, minimizes the metric, so we multiply by -1 at the end.

# Define objective function for hyperopt
def objective(params):
    model = GradientBoostingRegressor(**params)
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='neg_mean_squared_error')
    return -np.mean(scores)
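hyperopt also accepts an objective that returns a dict with 'loss' and 'status' keys, which is handy if you want to attach extra per-trial information. A minimal sketch of that variant, functionally equivalent to the objective above:

# Equivalent objective returning a dict, hyperopt's other supported return form
from hyperopt import STATUS_OK

def objective_dict(params):
    model = GradientBoostingRegressor(**params)
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='neg_mean_squared_error')
    return {'loss': -np.mean(scores), 'status': STATUS_OK}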

The following code runs hyperopt's hyperparameter search.

First, create a Trials object. It records the result of every trial, and the search algorithm uses this history to propose the next hyperparameter combination to try; the stored history can also be inspected after the run, as shown below.

Then call fmin, which runs the search according to its arguments and returns the best hyperparameter combination found.

The arguments are as follows:

  • space is the hyperparameter search space.

  • algo specifies the algorithm used to model the probability distribution of the objective function (here tpe.suggest, i.e. TPE).

  • max_evals is the number of trials.

  • Finally, pass in the Trials object.

# Perform Bayesian optimization
trials = Trials()

bayesian_best_params = fmin(
    objective,
    space=param_space,
    algo=tpe.suggest,
    max_evals=24,
    trials=trials,
    rstate=np.random.default_rng(42))
  0%|                                                                                   | 0/24 [00:00<?, ?trial/s, best loss=?]
  8%|████▊                                                     | 2/24 [00:00<00:01, 13.29trial/s, best loss: 3368.723874697856]
 17%|█████████▋                                                | 4/24 [00:00<00:01, 12.69trial/s, best loss: 3368.723874697856]
 25%|██████████████▌                                           | 6/24 [00:00<00:01,  9.61trial/s, best loss: 3368.723874697856]
 33%|███████████████████▎                                      | 8/24 [00:00<00:01, 10.93trial/s, best loss: 3309.551860951369]
 42%|███████████████████████▊                                 | 10/24 [00:00<00:01, 11.41trial/s, best loss: 3309.551860951369]
 50%|████████████████████████████▌                            | 12/24 [00:01<00:01, 11.15trial/s, best loss: 3309.551860951369]
 58%|█████████████████████████████████▎                       | 14/24 [00:01<00:00, 12.09trial/s, best loss: 3306.415696270635]
 67%|██████████████████████████████████████                   | 16/24 [00:01<00:00, 13.63trial/s, best loss: 3306.415696270635]
 75%|██████████████████████████████████████████▊              | 18/24 [00:01<00:00, 12.20trial/s, best loss: 3306.415696270635]
 83%|███████████████████████████████████████████████▌         | 20/24 [00:01<00:00, 12.06trial/s, best loss: 3306.415696270635]
 92%|████████████████████████████████████████████████████▎    | 22/24 [00:01<00:00, 13.58trial/s, best loss: 3306.415696270635]
100%|█████████████████████████████████████████████████████████| 24/24 [00:02<00:00, 11.10trial/s, best loss: 3306.415696270635]
100%|█████████████████████████████████████████████████████████| 24/24 [00:02<00:00, 11.70trial/s, best loss: 3306.415696270635]
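After the run, the Trials object holds the full search history; for example (an optional check, not in the original notebook):

# Optional: inspect the history stored in the Trials object
print(len(trials.trials))                   # number of trials run
print(trials.best_trial['result']['loss'])  # lowest CV MSE found
print(min(trials.losses()))                 # same value, via the list of all losses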

As noted above, fmin returns a hyperparameter combination.

Note that 'criterion' here has the value 1, which refers to the element at index 1 of the list defined in the search space.

# Print best hyperparameters
print("Best hyperparameters:", bayesian_best_params)
Best hyperparameters: {'criterion': 1, 'max_depth': 1, 'min_samples_split': 0.4671485658354062, 'n_estimators': 73}

So we use hyperopt's space_eval function to convert it back to the original values.

print(space_eval(param_space, bayesian_best_params))
{'criterion': 'squared_error', 'max_depth': 1, 'min_samples_split': 0.4671485658354062, 'n_estimators': 73}

Refit the model manually with the best hyperparameter combination.

bayesian_search = GradientBoostingRegressor(**space_eval(param_space, bayesian_best_params))
# find best hyperparameters
bayesian_search.fit(X_train, y_train)
GradientBoostingRegressor(criterion='squared_error', max_depth=1,
                          min_samples_split=0.4671485658354062,
                          n_estimators=73)
# Make predictions
y_pred_best_bayesian = bayesian_search.predict(X_test)
# Calculate Mean Squared Error (MSE) on test set
mse_best_bayesian = mean_squared_error(y_test, y_pred_best_bayesian)
print("Best Gradient Boosting Regressor MSE:", mse_best_bayesian)
Best Gradient Boosting Regressor MSE: 2761.0626899606505
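As an optional point of comparison (not in the original notebook), one could also score a GradientBoostingRegressor left at its default hyperparameters:

# Optional: baseline with default hyperparameters for comparison
baseline = GradientBoostingRegressor(random_state=42)
baseline.fit(X_train, y_train)
mse_baseline = mean_squared_error(y_test, baseline.predict(X_test))
print("Default Gradient Boosting Regressor MSE:", mse_baseline)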

Conclusion#

Bayesian optimization adds extra computation of its own; if each model trains quickly, random search may be more efficient. A sketch of the equivalent random search with RandomizedSearchCV follows below.
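For reference, here is a hedged sketch of a comparable random search over roughly the same space, using RandomizedSearchCV and the scipy.stats distributions imported at the top; results will differ from the hyperopt run.

# A sketch of the equivalent random search with RandomizedSearchCV
random_param_dist = {
    'n_estimators': randint(10, 101),
    'max_depth': randint(1, 6),
    'min_samples_split': uniform(0.01, 0.99),  # uniform(loc, scale) samples from [0.01, 1.0)
    'criterion': ['friedman_mse', 'squared_error']
}

random_search = RandomizedSearchCV(
    GradientBoostingRegressor(),
    param_distributions=random_param_dist,
    n_iter=24,
    cv=5,
    scoring='neg_mean_squared_error',
    random_state=42)
random_search.fit(X_train, y_train)

print(random_search.best_params_)
print("Random search Gradient Boosting Regressor MSE:",
      mean_squared_error(y_test, random_search.best_estimator_.predict(X_test)))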