{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tree-based models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first introduce the Decision Tree algorithm, then compare several tree-based models (random forest, gradient boosted trees, XGBoost)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "from sklearn.datasets import load_breast_cancer\n", "from sklearn.metrics import accuracy_score\n", "from sklearn.metrics import precision_score\n", "from sklearn.metrics import recall_score\n", "from sklearn.metrics import f1_score\n", "from sklearn.metrics import roc_auc_score\n", "from sklearn.metrics import average_precision_score\n", "from sklearn.metrics import confusion_matrix\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.tree import DecisionTreeClassifier\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.ensemble import GradientBoostingClassifier\n", "from sklearn.tree import plot_tree\n", "from sklearn.tree import export_graphviz\n", "import graphviz\n", "from xgboost import XGBClassifier\n", "import xgboost as xgb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start with a small made-up dataset:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data = pd.DataFrame({\n", " 'is_default': [0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1], \n", " 'is_male': [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],\n", " 'is_fullpay': [1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0],\n", " 'age': [28, 33, 22, 30, 51, 47, 49, 32, 24, 23, 42, 57]\n", "})" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | is_default | is_male | is_fullpay | age |
|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 28 |
| 1 | 1 | 1 | 1 | 33 |
| 2 | 1 | 1 | 0 | 22 |
| 3 | 1 | 1 | 0 | 30 |
| 4 | 0 | 1 | 1 | 51 |
| 5 | 0 | 1 | 1 | 47 |
| 6 | 0 | 0 | 0 | 49 |
| 7 | 0 | 0 | 1 | 32 |
| 8 | 1 | 0 | 1 | 24 |
| 9 | 0 | 0 | 1 | 23 |
| 10 | 0 | 0 | 1 | 42 |
| 11 | 1 | 0 | 0 | 57 |
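As a rough illustration of how a decision tree could score candidate splits on this toy dataset, the sketch below computes Gini impurity by hand for the two binary features. The helper names `gini` and `weighted_gini` are hypothetical and not part of the notebook; the choice of Gini as the criterion is an assumption (it is the scikit-learn default for `DecisionTreeClassifier`). The `age` column is left out because it would need a threshold search.

```python
from collections import Counter

# Toy data copied from the DataFrame above.
is_default = [0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1]
is_male = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
is_fullpay = [1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0]


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(feature, labels):
    """Size-weighted impurity of the two children after splitting on a 0/1 feature."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


parent = gini(is_default)
split_male = weighted_gini(is_male, is_default)
split_fullpay = weighted_gini(is_fullpay, is_default)

print(f"parent impurity:        {parent:.4f}")
print(f"split on is_male:       {split_male:.4f}")
print(f"split on is_fullpay:    {split_fullpay:.4f}")
```

Under these assumptions, splitting on `is_fullpay` leaves less impurity than splitting on `is_male`, so a greedy tree restricted to these two features would prefer `is_fullpay` at the root.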
DecisionTreeClassifier(random_state=42)
DecisionTreeClassifier(max_depth=2, min_samples_split=4, random_state=42)
DecisionTreeClassifier(random_state=42)
DecisionTreeClassifier(max_depth=3, random_state=42)
RandomForestClassifier(max_depth=2, n_estimators=30, random_state=42)
GradientBoostingClassifier(learning_rate=0.5, max_depth=2, n_estimators=30,
                           random_state=42)
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=0.8, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=2, max_leaves=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              multi_strategy=None, n_estimators=30, n_jobs=None,
              num_parallel_tree=None, random_state=42, ...)
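To make the comparison concrete, here is a minimal sketch that fits three of the estimators shown above with the same hyperparameters and reports test accuracy. It assumes the models are trained on the breast cancer dataset imported at the top of the notebook; the 70/30 stratified split and the `models` and `scores` names are assumptions for illustration, not taken from the notebook. `XGBClassifier` is left out so the sketch depends only on scikit-learn.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed setup: breast cancer data with a 70/30 stratified split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Hyperparameters mirror the estimator reprs shown above.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    "random_forest": RandomForestClassifier(
        max_depth=2, n_estimators=30, random_state=42
    ),
    "gradient_boosting": GradientBoostingClassifier(
        learning_rate=0.5, max_depth=2, n_estimators=30, random_state=42
    ),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Accuracy alone can hide differences on imbalanced data; the other metrics imported in the notebook (`precision_score`, `recall_score`, `roc_auc_score`) could be added to the loop in the same way.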