{ "cells": [ { "cell_type": "markdown", "id": "7744158c", "metadata": {}, "source": [ "# Intro to Pandas\n", "\n", "pandas 是使用Python進行資料分析,或是快速進行一些資料處理時必用的套件。\n", "\n", "在小型專案中,可能也適合使用pandas來建立資料流程的管線。\n", "\n", "但在較大型的專案中,資料量大或是資料處理流程的複雜度高的情況下,搭配使用其他工具可能就比較適合了(例如spark或是dbt),端看使用的情境以及專案團隊或組織的資源。\n", "\n", "總而言之,pandas是使用Python進行資料分析必學的套件,就讓我們開始吧!\n", "\n", "以下假設你已經建立好jupyter notebook的環境(也可以使用google colab),並且安裝好pandas套件。\n", "\n", "## 什麼是 DataFrame?\n", "\n", "首先先來瞭解什麼是DataFrame。\n", "\n", "DataFrame是pandas讀取資料表後的形式,跟excel資料表非常相似。\n", "\n", "首先第一步是把pandas套件import進來。" ] }, { "cell_type": "code", "execution_count": 1, "id": "a1b17460", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "id": "d8263727", "metadata": {}, "source": [ "定義一個DataFrame" ] }, { "cell_type": "code", "execution_count": 2, "id": "dbf91088", "metadata": {}, "outputs": [], "source": [ "# pd.Timestamp 是pandas建立時間資料的方法\n", "# pd.Categorical 是pandas建立類別資料的方法。類別資料與字串不同,具有其他特性,之後會說明。\n", "\n", "df = pd.DataFrame(\n", " {\n", " \"col_1\": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n", " \"col_2\": [i/10 for i in range(1,11)],\n", " \"col_3\": ['a', 'i', 'u', 'e', 'o', 'a', 'e', 'i', 'o', 'u'],\n", " \"col_4\": pd.Timestamp(\"20240101\"),\n", " \"col_5\": [True, False] * 5,\n", " \"col_6\": pd.Categorical([\"low\"] * 5 + [\"high\"] * 5)\n", " }\n", ")" ] }, { "cell_type": "markdown", "id": "b58cba39", "metadata": {}, "source": [ "直接執行它,就會印出DataFrame的內容。" ] }, { "cell_type": "code", "execution_count": 3, "id": "4ad70dee", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col_1col_2col_3col_4col_5col_6
010.1a2024-01-01Truelow
120.2i2024-01-01Falselow
230.3u2024-01-01Truelow
340.4e2024-01-01Falselow
450.5o2024-01-01Truelow
560.6a2024-01-01Falsehigh
670.7e2024-01-01Truehigh
780.8i2024-01-01Falsehigh
890.9o2024-01-01Truehigh
9101.0u2024-01-01Falsehigh
\n", "
" ], "text/plain": [ " col_1 col_2 col_3 col_4 col_5 col_6\n", "0 1 0.1 a 2024-01-01 True low\n", "1 2 0.2 i 2024-01-01 False low\n", "2 3 0.3 u 2024-01-01 True low\n", "3 4 0.4 e 2024-01-01 False low\n", "4 5 0.5 o 2024-01-01 True low\n", "5 6 0.6 a 2024-01-01 False high\n", "6 7 0.7 e 2024-01-01 True high\n", "7 8 0.8 i 2024-01-01 False high\n", "8 9 0.9 o 2024-01-01 True high\n", "9 10 1.0 u 2024-01-01 False high" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df # or display(df)" ] }, { "cell_type": "markdown", "id": "dd48dfd3", "metadata": {}, "source": [ "具有以下組成部分:\n", "\n", "- **索引(index):** 最左側數字0~1的部分" ] }, { "cell_type": "code", "execution_count": 4, "id": "4bcdaec3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=10, step=1)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ " df.index" ] }, { "cell_type": "markdown", "id": "7d5e668f", "metadata": {}, "source": [ "- **欄位(column)**:上方col_1 ~ col_6的部分" ] }, { "cell_type": "code", "execution_count": 5, "id": "3d85a93a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6'], dtype='object')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "id": "f9019300", "metadata": {}, "source": [ "- **資料(data)**:資料內容本身。" ] }, { "cell_type": "code", "execution_count": 6, "id": "63a37750", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 0.1, 'a', Timestamp('2024-01-01 00:00:00'), True, 'low'],\n", " [2, 0.2, 'i', Timestamp('2024-01-01 00:00:00'), False, 'low'],\n", " [3, 0.3, 'u', Timestamp('2024-01-01 00:00:00'), True, 'low'],\n", " [4, 0.4, 'e', Timestamp('2024-01-01 00:00:00'), False, 'low'],\n", " [5, 0.5, 'o', Timestamp('2024-01-01 00:00:00'), True, 'low'],\n", " [6, 0.6, 'a', Timestamp('2024-01-01 00:00:00'), False, 'high'],\n", " [7, 0.7, 'e', Timestamp('2024-01-01 00:00:00'), True, 'high'],\n", " [8, 0.8, 'i', Timestamp('2024-01-01 00:00:00'), False, 'high'],\n", " [9, 0.9, 'o', Timestamp('2024-01-01 00:00:00'), True, 'high'],\n", " [10, 1.0, 'u', Timestamp('2024-01-01 00:00:00'), False, 'high']],\n", " dtype=object)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.values" ] }, { "cell_type": "markdown", "id": "a16f320e", "metadata": {}, "source": [ "可以注意到印出的結果有```array```的字樣,其實pandas是基於另外一個套件numpy而延伸出來的工具。\n", "\n", "numpy也是一個python套件,提供大量的陣列運算功能,而且速度非常快。\n", "\n", "numpy的基本資料結構被稱作```array```。\n", "\n", "事實上numpy array是一個可以容納多維的陣列,稱作```ndarray``` (n-dimensional array)。" ] }, { "cell_type": "code", "execution_count": 7, "id": "31b47df4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numpy.ndarray" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(df.values)" ] }, { "cell_type": "markdown", "id": "0fbdeeb0", "metadata": {}, "source": [ "建立一個numpy array:" ] }, { "cell_type": "code", "execution_count": 8, "id": "be3c4868", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "arr = np.array([1, 2, 3, 4, 5])" ] }, { "cell_type": "markdown", "id": "cc4e70b4", "metadata": {}, "source": [ "Pandas DataFrame的每一欄具有特定的資料型態。" ] }, { "cell_type": "code", "execution_count": 9, "id": "7b81741e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "col_1 int64\n", "col_2 float64\n", "col_3 object\n", "col_4 datetime64[s]\n", "col_5 bool\n", "col_6 category\n", "dtype: object" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "id": "c809e90b", "metadata": {}, "source": [ "pandas資料型態大致如下:\n", "\n", "- Int64: integer\n", "- float64: float\n", "- object: strings (注意當資料型態混用的時候也會是object)\n", "- category: pandas特殊的資料型態,基本上也是string,但具有其他方便的特性。\n", "- bool: 布林值\n", "- datetime64[ns]: 時間\n", "\n", "## 認識你的資料\n", "\n", "以下的資料內容取自在kaggle找到的資料集:[credit_risk_customers (kaggle.com)](https://www.kaggle.com/datasets/ppb00x/credit-risk-customers/data)。可以自行下載成csv檔。\n", "\n", "這是一個跟銀行客戶信用風險相關的資料集,資料內容說明請見:[OpenML](https://www.openml.org/search?type=data&sort=runs&status=active&id=31)。\n", "\n", "首先透過以下語法讀入該csv檔:" ] }, { "cell_type": "code", "execution_count": 10, "id": "f60e4f31", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv('./data/credit_customers.csv')" ] }, { "cell_type": "markdown", "id": "0d52b8fa", "metadata": {}, "source": [ "在進行資料分析的一開始,通常會先大致瞭解一下資料的長相。\n", "\n", "### 快速查看DataFrame內容\n", "\n", "以下語法可以快速查看DataFrame的相關資訊:" ] }, { "cell_type": "code", "execution_count": 11, "id": "ae86ab54", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 1000 entries, 0 to 999\n", "Data columns (total 21 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 checking_status 1000 non-null object \n", " 1 duration 1000 non-null float64\n", " 2 credit_history 1000 non-null object \n", " 3 purpose 1000 non-null object \n", " 4 credit_amount 1000 non-null float64\n", " 5 savings_status 1000 non-null object \n", " 6 employment 1000 non-null object \n", " 7 installment_commitment 1000 non-null float64\n", " 8 personal_status 1000 non-null object \n", " 9 other_parties 93 non-null object \n", " 10 residence_since 1000 non-null float64\n", " 11 property_magnitude 1000 non-null object \n", " 12 age 1000 non-null float64\n", " 13 other_payment_plans 186 non-null object \n", " 14 housing 1000 non-null object \n", " 15 existing_credits 1000 non-null float64\n", " 16 job 1000 non-null object \n", " 17 num_dependents 1000 non-null float64\n", " 18 own_telephone 404 non-null object \n", " 19 foreign_worker 1000 non-null object \n", " 20 class 1000 non-null object \n", "dtypes: float64(7), object(14)\n", "memory usage: 164.2+ KB\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "markdown", "id": "758a9afa", "metadata": {}, "source": [ "查看DataFrame前幾筆資料,可以使用```df.head()```,預設是顯示5筆,但也可指定筆數。" ] }, { "cell_type": "code", "execution_count": 12, "id": "e9d2fa12", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_parties...property_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
0<06.0critical/other existing creditradio/tv1169.0no known savings>=74.0male singleNaN...real estate67.0NaNown2.0skilled1.0yesyesgood
10<=X<20048.0existing paidradio/tv5951.0<1001<=X<42.0female div/dep/marNaN...real estate22.0NaNown1.0skilled1.0NaNyesbad
2no checking12.0critical/other existing crediteducation2096.0<1004<=X<72.0male singleNaN...real estate49.0NaNown1.0unskilled resident2.0NaNyesgood
\n", "

3 rows × 21 columns

\n", "
" ], "text/plain": [ " checking_status duration credit_history purpose \\\n", "0 <0 6.0 critical/other existing credit radio/tv \n", "1 0<=X<200 48.0 existing paid radio/tv \n", "2 no checking 12.0 critical/other existing credit education \n", "\n", " credit_amount savings_status employment installment_commitment \\\n", "0 1169.0 no known savings >=7 4.0 \n", "1 5951.0 <100 1<=X<4 2.0 \n", "2 2096.0 <100 4<=X<7 2.0 \n", "\n", " personal_status other_parties ... property_magnitude age \\\n", "0 male single NaN ... real estate 67.0 \n", "1 female div/dep/mar NaN ... real estate 22.0 \n", "2 male single NaN ... real estate 49.0 \n", "\n", " other_payment_plans housing existing_credits job \\\n", "0 NaN own 2.0 skilled \n", "1 NaN own 1.0 skilled \n", "2 NaN own 1.0 unskilled resident \n", "\n", " num_dependents own_telephone foreign_worker class \n", "0 1.0 yes yes good \n", "1 1.0 NaN yes bad \n", "2 2.0 NaN yes good \n", "\n", "[3 rows x 21 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(3)" ] }, { "cell_type": "markdown", "id": "49a6c6cd", "metadata": {}, "source": [ "反過來就是:" ] }, { "cell_type": "code", "execution_count": 13, "id": "17061707", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_parties...property_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
995no checking12.0existing paidfurniture/equipment1736.0<1004<=X<73.0female div/dep/marNaN...real estate31.0NaNown1.0unskilled resident1.0NaNyesgood
996<030.0existing paidused car3857.0<1001<=X<44.0male div/sepNaN...life insurance40.0NaNown1.0high qualif/self emp/mgmt1.0yesyesgood
997no checking12.0existing paidradio/tv804.0<100>=74.0male singleNaN...car38.0NaNown1.0skilled1.0NaNyesgood
998<045.0existing paidradio/tv1845.0<1001<=X<44.0male singleNaN...no known property23.0NaNfor free1.0skilled1.0yesyesbad
9990<=X<20045.0critical/other existing creditused car4576.0100<=X<500unemployed3.0male singleNaN...car27.0NaNown1.0skilled1.0NaNyesgood
\n", "

5 rows × 21 columns

\n", "
" ], "text/plain": [ " checking_status duration credit_history \\\n", "995 no checking 12.0 existing paid \n", "996 <0 30.0 existing paid \n", "997 no checking 12.0 existing paid \n", "998 <0 45.0 existing paid \n", "999 0<=X<200 45.0 critical/other existing credit \n", "\n", " purpose credit_amount savings_status employment \\\n", "995 furniture/equipment 1736.0 <100 4<=X<7 \n", "996 used car 3857.0 <100 1<=X<4 \n", "997 radio/tv 804.0 <100 >=7 \n", "998 radio/tv 1845.0 <100 1<=X<4 \n", "999 used car 4576.0 100<=X<500 unemployed \n", "\n", " installment_commitment personal_status other_parties ... \\\n", "995 3.0 female div/dep/mar NaN ... \n", "996 4.0 male div/sep NaN ... \n", "997 4.0 male single NaN ... \n", "998 4.0 male single NaN ... \n", "999 3.0 male single NaN ... \n", "\n", " property_magnitude age other_payment_plans housing existing_credits \\\n", "995 real estate 31.0 NaN own 1.0 \n", "996 life insurance 40.0 NaN own 1.0 \n", "997 car 38.0 NaN own 1.0 \n", "998 no known property 23.0 NaN for free 1.0 \n", "999 car 27.0 NaN own 1.0 \n", "\n", " job num_dependents own_telephone foreign_worker \\\n", "995 unskilled resident 1.0 NaN yes \n", "996 high qualif/self emp/mgmt 1.0 yes yes \n", "997 skilled 1.0 NaN yes \n", "998 skilled 1.0 yes yes \n", "999 skilled 1.0 NaN yes \n", "\n", " class \n", "995 good \n", "996 good \n", "997 good \n", "998 bad \n", "999 good \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.tail()" ] }, { "cell_type": "markdown", "id": "728a9c4d", "metadata": {}, "source": [ "若要隨機查看的話:" ] }, { "cell_type": "code", "execution_count": 14, "id": "17d17a03", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_parties...property_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
754no checking12.0delayed previouslyrepairs1555.0>=1000>=74.0male singleNaN...no known property55.0NaNfor free2.0skilled2.0NaNyesbad
8150<=X<20036.0delayed previouslynew car7432.0<1001<=X<42.0female div/dep/marNaN...life insurance54.0NaNrent1.0skilled1.0NaNyesgood
366no checking18.0critical/other existing creditused car3850.0<1004<=X<73.0male singleNaN...car27.0NaNown2.0skilled1.0NaNyesgood
\n", "

3 rows × 21 columns

\n", "
" ], "text/plain": [ " checking_status duration credit_history purpose \\\n", "754 no checking 12.0 delayed previously repairs \n", "815 0<=X<200 36.0 delayed previously new car \n", "366 no checking 18.0 critical/other existing credit used car \n", "\n", " credit_amount savings_status employment installment_commitment \\\n", "754 1555.0 >=1000 >=7 4.0 \n", "815 7432.0 <100 1<=X<4 2.0 \n", "366 3850.0 <100 4<=X<7 3.0 \n", "\n", " personal_status other_parties ... property_magnitude age \\\n", "754 male single NaN ... no known property 55.0 \n", "815 female div/dep/mar NaN ... life insurance 54.0 \n", "366 male single NaN ... car 27.0 \n", "\n", " other_payment_plans housing existing_credits job num_dependents \\\n", "754 NaN for free 2.0 skilled 2.0 \n", "815 NaN rent 1.0 skilled 1.0 \n", "366 NaN own 2.0 skilled 1.0 \n", "\n", " own_telephone foreign_worker class \n", "754 NaN yes bad \n", "815 NaN yes good \n", "366 NaN yes good \n", "\n", "[3 rows x 21 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sample(3)" ] }, { "cell_type": "markdown", "id": "f2f88598", "metadata": {}, "source": [ "注意到,由於資料的欄位數較多,實際顯示出來的欄位數較少,中間會以\"...\"的方式略過。\n", "\n", "若想要查看完整的欄位,可以使用以下語法:" ] }, { "cell_type": "code", "execution_count": 15, "id": "25bf4b89", "metadata": {}, "outputs": [], "source": [ "pd.set_option(\"display.max_columns\", None)" ] }, { "cell_type": "markdown", "id": "5c277159", "metadata": {}, "source": [ "意思是告訴pandas要顯示出全部的欄位數:" ] }, { "cell_type": "code", "execution_count": 16, "id": "79b799f1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
0<06.0critical/other existing creditradio/tv1169.0no known savings>=74.0male singleNaN4.0real estate67.0NaNown2.0skilled1.0yesyesgood
10<=X<20048.0existing paidradio/tv5951.0<1001<=X<42.0female div/dep/marNaN2.0real estate22.0NaNown1.0skilled1.0NaNyesbad
2no checking12.0critical/other existing crediteducation2096.0<1004<=X<72.0male singleNaN3.0real estate49.0NaNown1.0unskilled resident2.0NaNyesgood
3<042.0existing paidfurniture/equipment7882.0<1004<=X<72.0male singleguarantor4.0life insurance45.0NaNfor free1.0skilled2.0NaNyesgood
4<024.0delayed previouslynew car4870.0<1001<=X<43.0male singleNaN4.0no known property53.0NaNfor free2.0skilled2.0NaNyesbad
\n", "
" ], "text/plain": [ " checking_status duration credit_history \\\n", "0 <0 6.0 critical/other existing credit \n", "1 0<=X<200 48.0 existing paid \n", "2 no checking 12.0 critical/other existing credit \n", "3 <0 42.0 existing paid \n", "4 <0 24.0 delayed previously \n", "\n", " purpose credit_amount savings_status employment \\\n", "0 radio/tv 1169.0 no known savings >=7 \n", "1 radio/tv 5951.0 <100 1<=X<4 \n", "2 education 2096.0 <100 4<=X<7 \n", "3 furniture/equipment 7882.0 <100 4<=X<7 \n", "4 new car 4870.0 <100 1<=X<4 \n", "\n", " installment_commitment personal_status other_parties residence_since \\\n", "0 4.0 male single NaN 4.0 \n", "1 2.0 female div/dep/mar NaN 2.0 \n", "2 2.0 male single NaN 3.0 \n", "3 2.0 male single guarantor 4.0 \n", "4 3.0 male single NaN 4.0 \n", "\n", " property_magnitude age other_payment_plans housing existing_credits \\\n", "0 real estate 67.0 NaN own 2.0 \n", "1 real estate 22.0 NaN own 1.0 \n", "2 real estate 49.0 NaN own 1.0 \n", "3 life insurance 45.0 NaN for free 1.0 \n", "4 no known property 53.0 NaN for free 2.0 \n", "\n", " job num_dependents own_telephone foreign_worker class \n", "0 skilled 1.0 yes yes good \n", "1 skilled 1.0 NaN yes bad \n", "2 unskilled resident 2.0 NaN yes good \n", "3 skilled 2.0 NaN yes good \n", "4 skilled 2.0 NaN yes bad " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "id": "f2ed26dd", "metadata": {}, "source": [ "查看資料的維度:" ] }, { "cell_type": "code", "execution_count": 17, "id": "4c206615", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1000, 21)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "markdown", "id": "db64ba3b", "metadata": {}, "source": [ "回傳的是一個tuple,第1個元素是index的長度,就是列(row)的數量,第2個元素是column的長度,就是欄(column)的數量。\n", "\n", "若要快速得知資料的統計資訊:" ] }, { "cell_type": "code", "execution_count": 18, "id": "19be3963", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
durationcredit_amountinstallment_commitmentresidence_sinceageexisting_creditsnum_dependents
count1000.0000001000.0000001000.0000001000.0000001000.0000001000.0000001000.000000
mean20.9030003271.2580002.9730002.84500035.5460001.4070001.155000
std12.0588142822.7368761.1187151.10371811.3754690.5776540.362086
min4.000000250.0000001.0000001.00000019.0000001.0000001.000000
25%12.0000001365.5000002.0000002.00000027.0000001.0000001.000000
50%18.0000002319.5000003.0000003.00000033.0000001.0000001.000000
75%24.0000003972.2500004.0000004.00000042.0000002.0000001.000000
max72.00000018424.0000004.0000004.00000075.0000004.0000002.000000
\n", "
" ], "text/plain": [ " duration credit_amount installment_commitment residence_since \\\n", "count 1000.000000 1000.000000 1000.000000 1000.000000 \n", "mean 20.903000 3271.258000 2.973000 2.845000 \n", "std 12.058814 2822.736876 1.118715 1.103718 \n", "min 4.000000 250.000000 1.000000 1.000000 \n", "25% 12.000000 1365.500000 2.000000 2.000000 \n", "50% 18.000000 2319.500000 3.000000 3.000000 \n", "75% 24.000000 3972.250000 4.000000 4.000000 \n", "max 72.000000 18424.000000 4.000000 4.000000 \n", "\n", " age existing_credits num_dependents \n", "count 1000.000000 1000.000000 1000.000000 \n", "mean 35.546000 1.407000 1.155000 \n", "std 11.375469 0.577654 0.362086 \n", "min 19.000000 1.000000 1.000000 \n", "25% 27.000000 1.000000 1.000000 \n", "50% 33.000000 1.000000 1.000000 \n", "75% 42.000000 2.000000 1.000000 \n", "max 75.000000 4.000000 2.000000 " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "id": "b90f3135", "metadata": {}, "source": [ "注意到只有數值型的資料才會列出。若要納入非數值型的資料的話:" ] }, { "cell_type": "code", "execution_count": 19, "id": "458882c1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
count10001000.000000100010001000.000000100010001000.0000001000931000.00000010001000.00000018610001000.00000010001000.00000040410001000
unique4NaN510NaN55NaN42NaN4NaN23NaN4NaN122
topno checkingNaNexisting paidradio/tvNaN<1001<=X<4NaNmale singleguarantorNaNcarNaNbankownNaNskilledNaNyesyesgood
freq394NaN530280NaN603339NaN54852NaN332NaN139713NaN630NaN404963700
meanNaN20.903000NaNNaN3271.258000NaNNaN2.973000NaNNaN2.845000NaN35.546000NaNNaN1.407000NaN1.155000NaNNaNNaN
stdNaN12.058814NaNNaN2822.736876NaNNaN1.118715NaNNaN1.103718NaN11.375469NaNNaN0.577654NaN0.362086NaNNaNNaN
minNaN4.000000NaNNaN250.000000NaNNaN1.000000NaNNaN1.000000NaN19.000000NaNNaN1.000000NaN1.000000NaNNaNNaN
25%NaN12.000000NaNNaN1365.500000NaNNaN2.000000NaNNaN2.000000NaN27.000000NaNNaN1.000000NaN1.000000NaNNaNNaN
50%NaN18.000000NaNNaN2319.500000NaNNaN3.000000NaNNaN3.000000NaN33.000000NaNNaN1.000000NaN1.000000NaNNaNNaN
75%NaN24.000000NaNNaN3972.250000NaNNaN4.000000NaNNaN4.000000NaN42.000000NaNNaN2.000000NaN1.000000NaNNaNNaN
maxNaN72.000000NaNNaN18424.000000NaNNaN4.000000NaNNaN4.000000NaN75.000000NaNNaN4.000000NaN2.000000NaNNaNNaN
\n", "
" ], "text/plain": [ " checking_status duration credit_history purpose credit_amount \\\n", "count 1000 1000.000000 1000 1000 1000.000000 \n", "unique 4 NaN 5 10 NaN \n", "top no checking NaN existing paid radio/tv NaN \n", "freq 394 NaN 530 280 NaN \n", "mean NaN 20.903000 NaN NaN 3271.258000 \n", "std NaN 12.058814 NaN NaN 2822.736876 \n", "min NaN 4.000000 NaN NaN 250.000000 \n", "25% NaN 12.000000 NaN NaN 1365.500000 \n", "50% NaN 18.000000 NaN NaN 2319.500000 \n", "75% NaN 24.000000 NaN NaN 3972.250000 \n", "max NaN 72.000000 NaN NaN 18424.000000 \n", "\n", " savings_status employment installment_commitment personal_status \\\n", "count 1000 1000 1000.000000 1000 \n", "unique 5 5 NaN 4 \n", "top <100 1<=X<4 NaN male single \n", "freq 603 339 NaN 548 \n", "mean NaN NaN 2.973000 NaN \n", "std NaN NaN 1.118715 NaN \n", "min NaN NaN 1.000000 NaN \n", "25% NaN NaN 2.000000 NaN \n", "50% NaN NaN 3.000000 NaN \n", "75% NaN NaN 4.000000 NaN \n", "max NaN NaN 4.000000 NaN \n", "\n", " other_parties residence_since property_magnitude age \\\n", "count 93 1000.000000 1000 1000.000000 \n", "unique 2 NaN 4 NaN \n", "top guarantor NaN car NaN \n", "freq 52 NaN 332 NaN \n", "mean NaN 2.845000 NaN 35.546000 \n", "std NaN 1.103718 NaN 11.375469 \n", "min NaN 1.000000 NaN 19.000000 \n", "25% NaN 2.000000 NaN 27.000000 \n", "50% NaN 3.000000 NaN 33.000000 \n", "75% NaN 4.000000 NaN 42.000000 \n", "max NaN 4.000000 NaN 75.000000 \n", "\n", " other_payment_plans housing existing_credits job num_dependents \\\n", "count 186 1000 1000.000000 1000 1000.000000 \n", "unique 2 3 NaN 4 NaN \n", "top bank own NaN skilled NaN \n", "freq 139 713 NaN 630 NaN \n", "mean NaN NaN 1.407000 NaN 1.155000 \n", "std NaN NaN 0.577654 NaN 0.362086 \n", "min NaN NaN 1.000000 NaN 1.000000 \n", "25% NaN NaN 1.000000 NaN 1.000000 \n", "50% NaN NaN 1.000000 NaN 1.000000 \n", "75% NaN NaN 2.000000 NaN 1.000000 \n", "max NaN NaN 4.000000 NaN 2.000000 \n", "\n", " own_telephone foreign_worker class \n", "count 404 1000 1000 \n", "unique 1 2 2 \n", "top yes yes good \n", "freq 404 963 700 \n", "mean NaN NaN NaN \n", "std NaN NaN NaN \n", "min NaN NaN NaN \n", "25% NaN NaN NaN \n", "50% NaN NaN NaN \n", "75% NaN NaN NaN \n", "max NaN NaN NaN " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe(include='all')" ] }, { "cell_type": "markdown", "id": "6de1f5da", "metadata": {}, "source": [ "### 檢查遺漏值\n", "\n", "大致瞭解資料長相之後,可以開始進一步確認資料的細節。資料是否有遺漏值?欄位值有哪些取值?欄位的最大最小值等統計資訊?等等。\n", "\n", "首先先來看跟遺漏值相關的語法:" ] }, { "cell_type": "code", "execution_count": 20, "id": "01bd0dcd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
0FalseFalseFalseFalseFalseFalseFalseFalseFalseTrueFalseFalseFalseTrueFalseFalseFalseFalseFalseFalseFalse
1FalseFalseFalseFalseFalseFalseFalseFalseFalseTrueFalseFalseFalseTrueFalseFalseFalseFalseTrueFalseFalse
2FalseFalseFalseFalseFalseFalseFalseFalseFalseTrueFalseFalseFalseTrueFalseFalseFalseFalseTrueFalseFalse
3FalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalseTrueFalseFalseFalseFalseTrueFalseFalse
4FalseFalseFalseFalseFalseFalseFalseFalseFalseTrueFalseFalseFalseTrueFalseFalseFalseFalseTrueFalseFalse
..................................................................
995FalseFalseFalseFalseFalseFalseFalseFalseFalseTrueFalseFalseFalseTrueFalseFalseFalseFalseTrueFalseFalse
996FalseFalseFalseFalseFalseFalseFalseFalseFalseTrueFalseFalseFalseTrueFalseFalseFalseFalseFalseFalseFalse
997FalseFalseFalseFalseFalseFalseFalseFalseFalseTrueFalseFalseFalseTrueFalseFalseFalseFalseTrueFalseFalse
998FalseFalseFalseFalseFalseFalseFalseFalseFalseTrueFalseFalseFalseTrueFalseFalseFalseFalseFalseFalseFalse
999FalseFalseFalseFalseFalseFalseFalseFalseFalseTrueFalseFalseFalseTrueFalseFalseFalseFalseTrueFalseFalse
\n", "

1000 rows × 21 columns

\n", "
" ], "text/plain": [ " checking_status duration credit_history purpose credit_amount \\\n", "0 False False False False False \n", "1 False False False False False \n", "2 False False False False False \n", "3 False False False False False \n", "4 False False False False False \n", ".. ... ... ... ... ... \n", "995 False False False False False \n", "996 False False False False False \n", "997 False False False False False \n", "998 False False False False False \n", "999 False False False False False \n", "\n", " savings_status employment installment_commitment personal_status \\\n", "0 False False False False \n", "1 False False False False \n", "2 False False False False \n", "3 False False False False \n", "4 False False False False \n", ".. ... ... ... ... \n", "995 False False False False \n", "996 False False False False \n", "997 False False False False \n", "998 False False False False \n", "999 False False False False \n", "\n", " other_parties residence_since property_magnitude age \\\n", "0 True False False False \n", "1 True False False False \n", "2 True False False False \n", "3 False False False False \n", "4 True False False False \n", ".. ... ... ... ... \n", "995 True False False False \n", "996 True False False False \n", "997 True False False False \n", "998 True False False False \n", "999 True False False False \n", "\n", " other_payment_plans housing existing_credits job num_dependents \\\n", "0 True False False False False \n", "1 True False False False False \n", "2 True False False False False \n", "3 True False False False False \n", "4 True False False False False \n", ".. ... ... ... ... ... \n", "995 True False False False False \n", "996 True False False False False \n", "997 True False False False False \n", "998 True False False False False \n", "999 True False False False False \n", "\n", " own_telephone foreign_worker class \n", "0 False False False \n", "1 True False False \n", "2 True False False \n", "3 True False False \n", "4 True False False \n", ".. ... ... ... \n", "995 True False False \n", "996 False False False \n", "997 True False False \n", "998 False False False \n", "999 True False False \n", "\n", "[1000 rows x 21 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.isna()" ] }, { "cell_type": "markdown", "id": "c270e214", "metadata": {}, "source": [ "以上這個語法會回傳每個欄位每一列是否為遺漏值的DataFrame,False代表該欄位值沒有遺漏值。True代表有遺漏值。\n", "\n", "但通常我們關心的是哪些欄位有遺漏值?有多少筆資料有遺漏值?\n", "\n", "所以會想要在把上面的資料根據欄位「加總」,因為True代表1,False代表0,加總後就可以知道遺漏值的數量。\n", "\n", "可以這樣寫:" ] }, { "cell_type": "code", "execution_count": 21, "id": "695979d0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "checking_status 0\n", "duration 0\n", "credit_history 0\n", "purpose 0\n", "credit_amount 0\n", "savings_status 0\n", "employment 0\n", "installment_commitment 0\n", "personal_status 0\n", "other_parties 907\n", "residence_since 0\n", "property_magnitude 0\n", "age 0\n", "other_payment_plans 814\n", "housing 0\n", "existing_credits 0\n", "job 0\n", "num_dependents 0\n", "own_telephone 596\n", "foreign_worker 0\n", "class 0\n", "dtype: int64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.isna().sum()" ] }, { "cell_type": "markdown", "id": "6b703954", "metadata": {}, "source": [ "上面的語法結構值得說明一下,由於```df.isnull()```回傳的是一個DataFrame,所以我們可以直接接著call其他DataFrame的方法。\n", "\n", "而```.sum()```是一個把DataFrame資料加總的方法,預設是by欄位加總。可以藉由調整參數``axis```來改變加總的方向。\n", "\n", "不指定axis參數的話,預設是```axis=0```,代表沿著axis 0, 把值一個一個加起來,所以是by欄位加總。\n", "\n", "```axis=1```代表by列加總,就可以知道哪幾筆資料有遺漏值。" ] }, { "cell_type": "code", "execution_count": 22, "id": "6e26aace", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2\n", "1 3\n", "2 3\n", "3 2\n", "4 3\n", " ..\n", "995 3\n", "996 2\n", "997 3\n", "998 2\n", "999 3\n", "Length: 1000, dtype: int64" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.isna().sum(axis=1)" ] }, { "cell_type": "markdown", "id": "100de84b", "metadata": {}, "source": [ "如果不關心筆數,也可以使用```.any()```,代表只要有一筆資料為True就會輸出True:" ] }, { "cell_type": "code", "execution_count": 23, "id": "ff4ffb23", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "checking_status False\n", "duration False\n", "credit_history False\n", "purpose False\n", "credit_amount False\n", "savings_status False\n", "employment False\n", "installment_commitment False\n", "personal_status False\n", "other_parties True\n", "residence_since False\n", "property_magnitude False\n", "age False\n", "other_payment_plans True\n", "housing False\n", "existing_credits False\n", "job False\n", "num_dependents False\n", "own_telephone True\n", "foreign_worker False\n", "class False\n", "dtype: bool" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.isna().any()" ] }, { "cell_type": "markdown", "id": "f02dfb42", "metadata": {}, "source": [ "如何確認資料是否有任何遺漏值?" ] }, { "cell_type": "code", "execution_count": 24, "id": "d6215a40", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.isna().any().any()" ] }, { "cell_type": "markdown", "id": "6e18fb27", "metadata": {}, "source": [ "反向的語法是:" ] }, { "cell_type": "code", "execution_count": 25, "id": "041fad2f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "checking_status 1000\n", "duration 1000\n", "credit_history 1000\n", "purpose 1000\n", "credit_amount 1000\n", "savings_status 1000\n", "employment 1000\n", "installment_commitment 1000\n", "personal_status 1000\n", "other_parties 93\n", "residence_since 1000\n", "property_magnitude 1000\n", "age 1000\n", "other_payment_plans 186\n", "housing 1000\n", "existing_credits 1000\n", "job 1000\n", "num_dependents 1000\n", "own_telephone 404\n", "foreign_worker 1000\n", "class 1000\n", "dtype: int64" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.notna().sum()" ] }, { "cell_type": "markdown", "id": "1440db67", "metadata": {}, "source": [ "### 確認欄位值\n", "\n", "如果想知道特定欄位有哪些取值,通常是要初步看一下資料的內容,以及看欄位有沒有什麼異常值。\n", "\n", "例如想知道客戶來申辦貸款的目的有哪些?可以對purpose這個欄位使用以下操作:" ] }, { "cell_type": "code", "execution_count": 26, "id": "e0f59299", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['radio/tv', 'education', 'furniture/equipment', 'new car',\n", " 'used car', 'business', 'domestic appliance', 'repairs', 'other',\n", " 'retraining'], dtype=object)" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['purpose'].unique()" ] }, { "cell_type": "markdown", "id": "79a5f542", "metadata": {}, "source": [ "就可以把資料中purpose實際上出現哪些不同值給列出來。\n", "\n", "其中,```df['purpose']```是篩選出欄位資料的語法。也可以使用:" ] }, { "cell_type": "code", "execution_count": 27, "id": "588f5ce7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 radio/tv\n", "1 radio/tv\n", "2 education\n", "3 furniture/equipment\n", "4 new car\n", " ... \n", "995 furniture/equipment\n", "996 used car\n", "997 radio/tv\n", "998 radio/tv\n", "999 used car\n", "Name: purpose, Length: 1000, dtype: object" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.purpose" ] }, { "cell_type": "markdown", "id": "31d0410f", "metadata": {}, "source": [ "注意到```.purpose```的語法事實上是把欄位當作是DataFrame的「屬性(attribute)」來取用,所以若欄位名稱與DataFrame原本就有的屬性重複的話會無法使用。\n", "\n", "補充說明一下,這裡回傳的東西,其實是pandas Series。Series是pandas的另一個資料結構,DataFrame中每一個欄位都儲存為Series。" ] }, { "cell_type": "code", "execution_count": 28, "id": "1928252a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(df.purpose)" ] }, { "cell_type": "markdown", "id": "2a1fbe46", "metadata": {}, "source": [ "可以把Series理解為單一維度的numpy array,與numpy array不同的地方在於Series內含具有標籤的索引(index)。" ] }, { "cell_type": "code", "execution_count": 29, "id": "58ad4973", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a 1\n", "b 2\n", "c 3\n", "d 4\n", "e 5\n", "dtype: int64\n", "Index(['a', 'b', 'c', 'd', 'e'], dtype='object')\n" ] } ], "source": [ "s = pd.Series([1, 2, 3, 4, 5], index=[\"a\", \"b\", \"c\", \"d\", \"e\"])\n", "\n", "print(s)\n", "print(s.index)" ] }, { "cell_type": "markdown", "id": "7d285b8f", "metadata": {}, "source": [ "除了看哪些欄位取值,還可以直接算出該欄位有幾種不同取值:" ] }, { "cell_type": "code", "execution_count": 30, "id": "eec2e29f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['purpose'].nunique()" ] }, { "cell_type": "markdown", "id": "ecf1726a", "metadata": {}, "source": [ "或是每個取值有多少筆資料:" ] }, { "cell_type": "code", "execution_count": 31, "id": "3ace4951", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "purpose\n", "radio/tv 280\n", "new car 234\n", "furniture/equipment 181\n", "used car 103\n", "business 97\n", "education 50\n", "repairs 22\n", "domestic appliance 12\n", "other 12\n", "retraining 9\n", "Name: count, dtype: int64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['purpose'].value_counts()" ] }, { "cell_type": "markdown", "id": "f465e1cb", "metadata": {}, "source": [ "進一步看佔比:" ] }, { "cell_type": "code", "execution_count": 32, "id": "15e65fd8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "purpose\n", "radio/tv 0.280\n", "new car 0.234\n", "furniture/equipment 0.181\n", "used car 0.103\n", "business 0.097\n", "education 0.050\n", "repairs 0.022\n", "domestic appliance 0.012\n", "other 0.012\n", "retraining 0.009\n", "Name: proportion, dtype: float64" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['purpose'].value_counts(normalize=True)" ] }, { "cell_type": "markdown", "id": "b648773e", "metadata": {}, "source": [ "注意到```value_counts()```會忽略空值:" ] }, { "cell_type": "code", "execution_count": 33, "id": "2e4d6191", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "other_parties\n", "guarantor 52\n", "co applicant 41\n", "Name: count, dtype: int64" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['other_parties'].value_counts() # 明顯不到1000筆資料" ] }, { "cell_type": "markdown", "id": "7b961300", "metadata": {}, "source": [ "可以使用```dropna=False```參數:" ] }, { "cell_type": "code", "execution_count": 34, "id": "f11724af", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "other_parties\n", "NaN 907\n", "guarantor 52\n", "co applicant 41\n", "Name: count, dtype: int64" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['other_parties'].value_counts(dropna=False)" ] }, { "cell_type": "markdown", "id": "c22da80e", "metadata": {}, "source": [ "針對數值型欄位可以使用```bins```參數,放入要切分的組數。\n", "這邊組別的切分是依據欄位值的全距去切指定的等份。" ] }, { "cell_type": "code", "execution_count": 35, "id": "d6346f9b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(27.0, 35.0] 297\n", "(18.942999999999998, 27.0] 291\n", "(35.0, 43.0] 194\n", "(43.0, 51.0] 113\n", "(51.0, 59.0] 54\n", "(59.0, 67.0] 41\n", "(67.0, 75.0] 10\n", "Name: count, dtype: int64" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['age'].value_counts(bins=7)" ] }, { "cell_type": "markdown", "id": "8a7d183a", "metadata": {}, "source": [ "如果要看個別欄位的前幾大或倒數前幾大的值是哪些(參數可指定列數):" ] }, { "cell_type": "code", "execution_count": 36, "id": "1845df1f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "677 72.0\n", "29 60.0\n", "134 60.0\n", "255 60.0\n", "332 60.0\n", "Name: duration, dtype: float64" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['duration'].nlargest()" ] }, { "cell_type": "code", "execution_count": 37, "id": "68684713", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "234 4.0\n", "314 4.0\n", "527 4.0\n", "734 4.0\n", "837 4.0\n", "Name: duration, dtype: float64" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['duration'].nsmallest()" ] }, { "cell_type": "markdown", "id": "86bf6a3c", "metadata": {}, "source": [ "或是可以用排序的方法:" ] }, { "cell_type": "code", "execution_count": 38, "id": "2355a264", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "677 72.0\n", "973 60.0\n", "685 60.0\n", "672 60.0\n", "714 60.0\n", "Name: duration, dtype: float64" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['duration'].sort_values(ascending=False).head()" ] }, { "cell_type": "code", "execution_count": 39, "id": "622984ec", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "314 4.0\n", "734 4.0\n", "527 4.0\n", "943 4.0\n", "837 4.0\n", "Name: duration, dtype: float64" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['duration'].sort_values().head()" ] }, { "cell_type": "markdown", "id": "273c9406", "metadata": {}, "source": [ "### 資料的統計資訊\n", "\n", "除了使用```.describe()```快速查看資料統計數據外,還可以使用各種方法來依照需求做計算。" ] }, { "cell_type": "code", "execution_count": 40, "id": "1e662fae", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.0" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['duration'].min()" ] }, { "cell_type": "code", "execution_count": 41, "id": "cce8f254", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "72.0" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['duration'].max()" ] }, { "cell_type": "code", "execution_count": 42, "id": "3a19ff44", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "20.903" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['duration'].mean()" ] }, { "cell_type": "code", "execution_count": 43, "id": "f8bd9819", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "18.0" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['duration'].median()" ] }, { "cell_type": "code", "execution_count": 44, "id": "bf6888a1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "145.41500600600602" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['duration'].var()" ] }, { "cell_type": "code", "execution_count": 45, "id": "ec6d0c6e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12.058814452756375" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['duration'].std()" ] }, { "cell_type": "code", "execution_count": 46, "id": "d0207cb3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "30.0" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['duration'].quantile(0.8)" ] }, { "cell_type": "code", "execution_count": 47, "id": "cb5ca848", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.10 9.0\n", "0.30 12.0\n", "0.45 18.0\n", "0.60 24.0\n", "0.80 30.0\n", "1.00 72.0\n", "Name: duration, dtype: float64" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['duration'].quantile([0.1,0.3,0.45,0.6,0.8,1])" ] }, { "cell_type": "markdown", "id": "563d13ae", "metadata": {}, "source": [ "還可以很方便算多個欄位的共變數、相關係數:" ] }, { "cell_type": "code", "execution_count": 48, "id": "8cc3d159", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agedurationcredit_amount
age129.401285-4.9569951.050523e+03
duration-4.956995145.4150062.127375e+04
credit_amount1050.52265521273.7497767.967843e+06
\n", "
" ], "text/plain": [ " age duration credit_amount\n", "age 129.401285 -4.956995 1.050523e+03\n", "duration -4.956995 145.415006 2.127375e+04\n", "credit_amount 1050.522655 21273.749776 7.967843e+06" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['age', 'duration', 'credit_amount']].cov()" ] }, { "cell_type": "code", "execution_count": 49, "id": "11f19b6b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agedurationcredit_amount
age1.000000-0.0361360.032716
duration-0.0361361.0000000.624984
credit_amount0.0327160.6249841.000000
\n", "
" ], "text/plain": [ " age duration credit_amount\n", "age 1.000000 -0.036136 0.032716\n", "duration -0.036136 1.000000 0.624984\n", "credit_amount 0.032716 0.624984 1.000000" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['age', 'duration', 'credit_amount']].corr()" ] }, { "cell_type": "markdown", "id": "99d61223", "metadata": {}, "source": [ "## 查看你的資料\n", "\n", "有了對資料的大致認識後,可能會想對資料的部分欄位或部份樣本做一些後續處理,例如填補遺漏值、對欄位做加工等等。\n", "\n", "但在此之前需要先知道怎麼擷取你想要的特定資料。\n", "\n", "### 抓特定欄位的資料\n", "\n", "首先第一種方法已經介紹過了,就是直接在DataFrame後面加```[想要抓的欄位]```:" ] }, { "cell_type": "code", "execution_count": 50, "id": "ca48209e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 good\n", "1 bad\n", "2 good\n", "3 good\n", "4 bad\n", " ... \n", "995 good\n", "996 good\n", "997 good\n", "998 bad\n", "999 good\n", "Name: class, Length: 1000, dtype: object" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['class']" ] }, { "cell_type": "markdown", "id": "aec525b5", "metadata": {}, "source": [ "當要一次抓取多個欄位時,必須放入一個欄位名稱的list:" ] }, { "cell_type": "code", "execution_count": 51, "id": "ef99951a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agedurationcredit_amount
067.06.01169.0
122.048.05951.0
249.012.02096.0
345.042.07882.0
453.024.04870.0
............
99531.012.01736.0
99640.030.03857.0
99738.012.0804.0
99823.045.01845.0
99927.045.04576.0
\n", "

1000 rows × 3 columns

\n", "
" ], "text/plain": [ " age duration credit_amount\n", "0 67.0 6.0 1169.0\n", "1 22.0 48.0 5951.0\n", "2 49.0 12.0 2096.0\n", "3 45.0 42.0 7882.0\n", "4 53.0 24.0 4870.0\n", ".. ... ... ...\n", "995 31.0 12.0 1736.0\n", "996 40.0 30.0 3857.0\n", "997 38.0 12.0 804.0\n", "998 23.0 45.0 1845.0\n", "999 27.0 45.0 4576.0\n", "\n", "[1000 rows x 3 columns]" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['age', 'duration', 'credit_amount']]" ] }, { "cell_type": "markdown", "id": "5c242f16", "metadata": {}, "source": [ "另一種方法是使用```.loc[]```,使用方式如下:" ] }, { "cell_type": "code", "execution_count": 52, "id": "581ae9d6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 good\n", "1 bad\n", "2 good\n", "3 good\n", "4 bad\n", " ... \n", "995 good\n", "996 good\n", "997 good\n", "998 bad\n", "999 good\n", "Name: class, Length: 1000, dtype: object" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:, 'class']" ] }, { "cell_type": "markdown", "id": "74814172", "metadata": {}, "source": [ "多個欄位的話一樣是要放list:" ] }, { "cell_type": "code", "execution_count": 53, "id": "67188d75", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agedurationcredit_amount
067.06.01169.0
122.048.05951.0
249.012.02096.0
345.042.07882.0
453.024.04870.0
............
99531.012.01736.0
99640.030.03857.0
99738.012.0804.0
99823.045.01845.0
99927.045.04576.0
\n", "

1000 rows × 3 columns

\n", "
" ], "text/plain": [ " age duration credit_amount\n", "0 67.0 6.0 1169.0\n", "1 22.0 48.0 5951.0\n", "2 49.0 12.0 2096.0\n", "3 45.0 42.0 7882.0\n", "4 53.0 24.0 4870.0\n", ".. ... ... ...\n", "995 31.0 12.0 1736.0\n", "996 40.0 30.0 3857.0\n", "997 38.0 12.0 804.0\n", "998 23.0 45.0 1845.0\n", "999 27.0 45.0 4576.0\n", "\n", "[1000 rows x 3 columns]" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:, ['age', 'duration', 'credit_amount']]" ] }, { "cell_type": "markdown", "id": "677790cf", "metadata": {}, "source": [ "與```.loc[]```類似的,```iloc[]```也可以做到篩選,只是要指定的是「第幾個」欄位:" ] }, { "cell_type": "code", "execution_count": 54, "id": "d3e9091b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agedurationcredit_amount
067.06.01169.0
122.048.05951.0
249.012.02096.0
345.042.07882.0
453.024.04870.0
............
99531.012.01736.0
99640.030.03857.0
99738.012.0804.0
99823.045.01845.0
99927.045.04576.0
\n", "

1000 rows × 3 columns

\n", "
" ], "text/plain": [ " age duration credit_amount\n", "0 67.0 6.0 1169.0\n", "1 22.0 48.0 5951.0\n", "2 49.0 12.0 2096.0\n", "3 45.0 42.0 7882.0\n", "4 53.0 24.0 4870.0\n", ".. ... ... ...\n", "995 31.0 12.0 1736.0\n", "996 40.0 30.0 3857.0\n", "997 38.0 12.0 804.0\n", "998 23.0 45.0 1845.0\n", "999 27.0 45.0 4576.0\n", "\n", "[1000 rows x 3 columns]" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[:, [12, 1, 4]]" ] }, { "cell_type": "markdown", "id": "425f826c", "metadata": {}, "source": [ "注意到```:```的使用,可以一次指定多個欄位,使用方式跟list的index一樣:" ] }, { "cell_type": "code", "execution_count": 55, "id": "43e9333c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurpose
0<06.0critical/other existing creditradio/tv
10<=X<20048.0existing paidradio/tv
2no checking12.0critical/other existing crediteducation
3<042.0existing paidfurniture/equipment
4<024.0delayed previouslynew car
...............
995no checking12.0existing paidfurniture/equipment
996<030.0existing paidused car
997no checking12.0existing paidradio/tv
998<045.0existing paidradio/tv
9990<=X<20045.0critical/other existing creditused car
\n", "

1000 rows × 4 columns

\n", "
" ], "text/plain": [ " checking_status duration credit_history \\\n", "0 <0 6.0 critical/other existing credit \n", "1 0<=X<200 48.0 existing paid \n", "2 no checking 12.0 critical/other existing credit \n", "3 <0 42.0 existing paid \n", "4 <0 24.0 delayed previously \n", ".. ... ... ... \n", "995 no checking 12.0 existing paid \n", "996 <0 30.0 existing paid \n", "997 no checking 12.0 existing paid \n", "998 <0 45.0 existing paid \n", "999 0<=X<200 45.0 critical/other existing credit \n", "\n", " purpose \n", "0 radio/tv \n", "1 radio/tv \n", "2 education \n", "3 furniture/equipment \n", "4 new car \n", ".. ... \n", "995 furniture/equipment \n", "996 used car \n", "997 radio/tv \n", "998 radio/tv \n", "999 used car \n", "\n", "[1000 rows x 4 columns]" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[:, 0:4]" ] }, { "cell_type": "markdown", "id": "9d1382fc", "metadata": {}, "source": [ "```.loc[]```也可以使用```:```,但就必須符合原本的欄位順序:" ] }, { "cell_type": "code", "execution_count": 56, "id": "c502eb8d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['checking_status', 'duration', 'credit_history', 'purpose',\n", " 'credit_amount', 'savings_status', 'employment',\n", " 'installment_commitment', 'personal_status', 'other_parties',\n", " 'residence_since', 'property_magnitude', 'age', 'other_payment_plans',\n", " 'housing', 'existing_credits', 'job', 'num_dependents', 'own_telephone',\n", " 'foreign_worker', 'class'],\n", " dtype='object')\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0
1
2
3
4
...
995
996
997
998
999
\n", "

1000 rows × 0 columns

\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: []\n", "Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]\n", "\n", "[1000 rows x 0 columns]" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(df.columns)\n", "\n", "# 順序錯了,選不到東西\n", "df.loc[:, \"purpose\":\"duration\"]" ] }, { "cell_type": "code", "execution_count": 57, "id": "0bc08e28", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
durationcredit_historypurpose
06.0critical/other existing creditradio/tv
148.0existing paidradio/tv
212.0critical/other existing crediteducation
342.0existing paidfurniture/equipment
424.0delayed previouslynew car
............
99512.0existing paidfurniture/equipment
99630.0existing paidused car
99712.0existing paidradio/tv
99845.0existing paidradio/tv
99945.0critical/other existing creditused car
\n", "

1000 rows × 3 columns

\n", "
" ], "text/plain": [ " duration credit_history purpose\n", "0 6.0 critical/other existing credit radio/tv\n", "1 48.0 existing paid radio/tv\n", "2 12.0 critical/other existing credit education\n", "3 42.0 existing paid furniture/equipment\n", "4 24.0 delayed previously new car\n", ".. ... ... ...\n", "995 12.0 existing paid furniture/equipment\n", "996 30.0 existing paid used car\n", "997 12.0 existing paid radio/tv\n", "998 45.0 existing paid radio/tv\n", "999 45.0 critical/other existing credit used car\n", "\n", "[1000 rows x 3 columns]" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:, \"duration\":\"purpose\"]" ] }, { "cell_type": "markdown", "id": "db88aa15", "metadata": {}, "source": [ "再來一種方法是使用```.filter()```搭配```items=[欄位名稱]```的參數:" ] }, { "cell_type": "code", "execution_count": 58, "id": "6e8e2a26", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agedurationcredit_amount
067.06.01169.0
122.048.05951.0
249.012.02096.0
345.042.07882.0
453.024.04870.0
............
99531.012.01736.0
99640.030.03857.0
99738.012.0804.0
99823.045.01845.0
99927.045.04576.0
\n", "

1000 rows × 3 columns

\n", "
" ], "text/plain": [ " age duration credit_amount\n", "0 67.0 6.0 1169.0\n", "1 22.0 48.0 5951.0\n", "2 49.0 12.0 2096.0\n", "3 45.0 42.0 7882.0\n", "4 53.0 24.0 4870.0\n", ".. ... ... ...\n", "995 31.0 12.0 1736.0\n", "996 40.0 30.0 3857.0\n", "997 38.0 12.0 804.0\n", "998 23.0 45.0 1845.0\n", "999 27.0 45.0 4576.0\n", "\n", "[1000 rows x 3 columns]" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.filter(items=['age', 'duration', 'credit_amount'])" ] }, { "cell_type": "markdown", "id": "3343400e", "metadata": {}, "source": [ "根據資料型態抓欄位:" ] }, { "cell_type": "code", "execution_count": 59, "id": "a1a378e2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
durationcredit_amountinstallment_commitmentresidence_sinceageexisting_creditsnum_dependents
06.01169.04.04.067.02.01.0
148.05951.02.02.022.01.01.0
212.02096.02.03.049.01.02.0
342.07882.02.04.045.01.02.0
424.04870.03.04.053.02.02.0
........................
99512.01736.03.04.031.01.01.0
99630.03857.04.04.040.01.01.0
99712.0804.04.04.038.01.01.0
99845.01845.04.04.023.01.01.0
99945.04576.03.04.027.01.01.0
\n", "

1000 rows × 7 columns

\n", "
" ], "text/plain": [ " duration credit_amount installment_commitment residence_since age \\\n", "0 6.0 1169.0 4.0 4.0 67.0 \n", "1 48.0 5951.0 2.0 2.0 22.0 \n", "2 12.0 2096.0 2.0 3.0 49.0 \n", "3 42.0 7882.0 2.0 4.0 45.0 \n", "4 24.0 4870.0 3.0 4.0 53.0 \n", ".. ... ... ... ... ... \n", "995 12.0 1736.0 3.0 4.0 31.0 \n", "996 30.0 3857.0 4.0 4.0 40.0 \n", "997 12.0 804.0 4.0 4.0 38.0 \n", "998 45.0 1845.0 4.0 4.0 23.0 \n", "999 45.0 4576.0 3.0 4.0 27.0 \n", "\n", " existing_credits num_dependents \n", "0 2.0 1.0 \n", "1 1.0 1.0 \n", "2 1.0 2.0 \n", "3 1.0 2.0 \n", "4 2.0 2.0 \n", ".. ... ... \n", "995 1.0 1.0 \n", "996 1.0 1.0 \n", "997 1.0 1.0 \n", "998 1.0 1.0 \n", "999 1.0 1.0 \n", "\n", "[1000 rows x 7 columns]" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.select_dtypes(include='float')" ] }, { "cell_type": "markdown", "id": "55c093bf", "metadata": {}, "source": [ "根據欄位名稱的一些特徵來抓資料:" ] }, { "cell_type": "code", "execution_count": 60, "id": "4dfd4397", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statuscredit_historycredit_amountsavings_statusinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeother_payment_plansexisting_creditsnum_dependentsown_telephoneforeign_worker
0<0critical/other existing credit1169.0no known savings4.0male singleNaN4.0real estateNaN2.01.0yesyes
10<=X<200existing paid5951.0<1002.0female div/dep/marNaN2.0real estateNaN1.01.0NaNyes
2no checkingcritical/other existing credit2096.0<1002.0male singleNaN3.0real estateNaN1.02.0NaNyes
3<0existing paid7882.0<1002.0male singleguarantor4.0life insuranceNaN1.02.0NaNyes
4<0delayed previously4870.0<1003.0male singleNaN4.0no known propertyNaN2.02.0NaNyes
.............................................
995no checkingexisting paid1736.0<1003.0female div/dep/marNaN4.0real estateNaN1.01.0NaNyes
996<0existing paid3857.0<1004.0male div/sepNaN4.0life insuranceNaN1.01.0yesyes
997no checkingexisting paid804.0<1004.0male singleNaN4.0carNaN1.01.0NaNyes
998<0existing paid1845.0<1004.0male singleNaN4.0no known propertyNaN1.01.0yesyes
9990<=X<200critical/other existing credit4576.0100<=X<5003.0male singleNaN4.0carNaN1.01.0NaNyes
\n", "

1000 rows × 14 columns

\n", "
" ], "text/plain": [ " checking_status credit_history credit_amount \\\n", "0 <0 critical/other existing credit 1169.0 \n", "1 0<=X<200 existing paid 5951.0 \n", "2 no checking critical/other existing credit 2096.0 \n", "3 <0 existing paid 7882.0 \n", "4 <0 delayed previously 4870.0 \n", ".. ... ... ... \n", "995 no checking existing paid 1736.0 \n", "996 <0 existing paid 3857.0 \n", "997 no checking existing paid 804.0 \n", "998 <0 existing paid 1845.0 \n", "999 0<=X<200 critical/other existing credit 4576.0 \n", "\n", " savings_status installment_commitment personal_status \\\n", "0 no known savings 4.0 male single \n", "1 <100 2.0 female div/dep/mar \n", "2 <100 2.0 male single \n", "3 <100 2.0 male single \n", "4 <100 3.0 male single \n", ".. ... ... ... \n", "995 <100 3.0 female div/dep/mar \n", "996 <100 4.0 male div/sep \n", "997 <100 4.0 male single \n", "998 <100 4.0 male single \n", "999 100<=X<500 3.0 male single \n", "\n", " other_parties residence_since property_magnitude other_payment_plans \\\n", "0 NaN 4.0 real estate NaN \n", "1 NaN 2.0 real estate NaN \n", "2 NaN 3.0 real estate NaN \n", "3 guarantor 4.0 life insurance NaN \n", "4 NaN 4.0 no known property NaN \n", ".. ... ... ... ... \n", "995 NaN 4.0 real estate NaN \n", "996 NaN 4.0 life insurance NaN \n", "997 NaN 4.0 car NaN \n", "998 NaN 4.0 no known property NaN \n", "999 NaN 4.0 car NaN \n", "\n", " existing_credits num_dependents own_telephone foreign_worker \n", "0 2.0 1.0 yes yes \n", "1 1.0 1.0 NaN yes \n", "2 1.0 2.0 NaN yes \n", "3 1.0 2.0 NaN yes \n", "4 2.0 2.0 NaN yes \n", ".. ... ... ... ... \n", "995 1.0 1.0 NaN yes \n", "996 1.0 1.0 yes yes \n", "997 1.0 1.0 NaN yes \n", "998 1.0 1.0 yes yes \n", "999 1.0 1.0 NaN yes \n", "\n", "[1000 rows x 14 columns]" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.filter(like='_')" ] }, { "cell_type": "markdown", "id": "1da0f7a8", "metadata": {}, "source": [ "若要把DataFrame欄位根據指定的順序做排序的話,就必須先設定好欄位順序在list內,然後再把資料抓出來:" ] }, { "cell_type": "code", "execution_count": 61, "id": "a3c69997", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
classcredit_amountagedurationexisting_credits
0good1169.067.06.02.0
1bad5951.022.048.01.0
2good2096.049.012.01.0
3good7882.045.042.01.0
4bad4870.053.024.02.0
..................
995good1736.031.012.01.0
996good3857.040.030.01.0
997good804.038.012.01.0
998bad1845.023.045.01.0
999good4576.027.045.01.0
\n", "

1000 rows × 5 columns

\n", "
" ], "text/plain": [ " class credit_amount age duration existing_credits\n", "0 good 1169.0 67.0 6.0 2.0\n", "1 bad 5951.0 22.0 48.0 1.0\n", "2 good 2096.0 49.0 12.0 1.0\n", "3 good 7882.0 45.0 42.0 1.0\n", "4 bad 4870.0 53.0 24.0 2.0\n", ".. ... ... ... ... ...\n", "995 good 1736.0 31.0 12.0 1.0\n", "996 good 3857.0 40.0 30.0 1.0\n", "997 good 804.0 38.0 12.0 1.0\n", "998 bad 1845.0 23.0 45.0 1.0\n", "999 good 4576.0 27.0 45.0 1.0\n", "\n", "[1000 rows x 5 columns]" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ordered_cols = ['class', 'credit_amount', 'age', 'duration', 'existing_credits']\n", "\n", "df[ordered_cols]" ] }, { "cell_type": "markdown", "id": "1f699cad", "metadata": {}, "source": [ "### 抓特定列的資料\n", "\n", "直接根據列名稱去抓資料的情境雖然比較少,但一樣可以透過```.loc[]```或```.iloc[]```做到:" ] }, { "cell_type": "code", "execution_count": 62, "id": "3cc0b9c4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
70<=X<20036.0existing paidused car6948.0<1001<=X<42.0male singleNaN2.0car35.0NaNrent1.0high qualif/self emp/mgmt1.0yesyesgood
8no checking12.0existing paidradio/tv3059.0>=10004<=X<72.0male div/sepNaN4.0real estate61.0NaNown1.0unskilled resident1.0NaNyesgood
90<=X<20030.0critical/other existing creditnew car5234.0<100unemployed4.0male mar/widNaN2.0car28.0NaNown2.0high qualif/self emp/mgmt1.0NaNyesbad
100<=X<20012.0existing paidnew car1295.0<100<13.0female div/dep/marNaN1.0car25.0NaNrent1.0skilled1.0NaNyesbad
\n", "
" ], "text/plain": [ " checking_status duration credit_history purpose \\\n", "7 0<=X<200 36.0 existing paid used car \n", "8 no checking 12.0 existing paid radio/tv \n", "9 0<=X<200 30.0 critical/other existing credit new car \n", "10 0<=X<200 12.0 existing paid new car \n", "\n", " credit_amount savings_status employment installment_commitment \\\n", "7 6948.0 <100 1<=X<4 2.0 \n", "8 3059.0 >=1000 4<=X<7 2.0 \n", "9 5234.0 <100 unemployed 4.0 \n", "10 1295.0 <100 <1 3.0 \n", "\n", " personal_status other_parties residence_since property_magnitude \\\n", "7 male single NaN 2.0 car \n", "8 male div/sep NaN 4.0 real estate \n", "9 male mar/wid NaN 2.0 car \n", "10 female div/dep/mar NaN 1.0 car \n", "\n", " age other_payment_plans housing existing_credits \\\n", "7 35.0 NaN rent 1.0 \n", "8 61.0 NaN own 1.0 \n", "9 28.0 NaN own 2.0 \n", "10 25.0 NaN rent 1.0 \n", "\n", " job num_dependents own_telephone foreign_worker \\\n", "7 high qualif/self emp/mgmt 1.0 yes yes \n", "8 unskilled resident 1.0 NaN yes \n", "9 high qualif/self emp/mgmt 1.0 NaN yes \n", "10 skilled 1.0 NaN yes \n", "\n", " class \n", "7 good \n", "8 good \n", "9 bad \n", "10 bad " ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[7:10, :]" ] }, { "cell_type": "markdown", "id": "3cee3e00", "metadata": {}, "source": [ "在這邊,注意到因為index的順序跟label名稱一樣,但```.loc[]```的行為是判斷label名稱,```.iloc[]```則是類似list做indexing的概念,並不會包含到指定的最後一個index:" ] }, { "cell_type": "code", "execution_count": 63, "id": "b6dd09f5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
70<=X<20036.0existing paidused car6948.0<1001<=X<42.0male singleNaN2.0car35.0NaNrent1.0high qualif/self emp/mgmt1.0yesyesgood
8no checking12.0existing paidradio/tv3059.0>=10004<=X<72.0male div/sepNaN4.0real estate61.0NaNown1.0unskilled resident1.0NaNyesgood
90<=X<20030.0critical/other existing creditnew car5234.0<100unemployed4.0male mar/widNaN2.0car28.0NaNown2.0high qualif/self emp/mgmt1.0NaNyesbad
\n", "
" ], "text/plain": [ " checking_status duration credit_history purpose \\\n", "7 0<=X<200 36.0 existing paid used car \n", "8 no checking 12.0 existing paid radio/tv \n", "9 0<=X<200 30.0 critical/other existing credit new car \n", "\n", " credit_amount savings_status employment installment_commitment \\\n", "7 6948.0 <100 1<=X<4 2.0 \n", "8 3059.0 >=1000 4<=X<7 2.0 \n", "9 5234.0 <100 unemployed 4.0 \n", "\n", " personal_status other_parties residence_since property_magnitude age \\\n", "7 male single NaN 2.0 car 35.0 \n", "8 male div/sep NaN 4.0 real estate 61.0 \n", "9 male mar/wid NaN 2.0 car 28.0 \n", "\n", " other_payment_plans housing existing_credits job \\\n", "7 NaN rent 1.0 high qualif/self emp/mgmt \n", "8 NaN own 1.0 unskilled resident \n", "9 NaN own 2.0 high qualif/self emp/mgmt \n", "\n", " num_dependents own_telephone foreign_worker class \n", "7 1.0 yes yes good \n", "8 1.0 NaN yes good \n", "9 1.0 NaN yes bad " ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[7:10, :]" ] }, { "cell_type": "markdown", "id": "30883681", "metadata": {}, "source": [ "### 依條件判斷抓取資料\n", "\n", "最直接的方法,例如要抓age > 40以上的資料:" ] }, { "cell_type": "code", "execution_count": 64, "id": "e7275ea8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
0<06.0critical/other existing creditradio/tv1169.0no known savings>=74.0male singleNaN4.0real estate67.0NaNown2.0skilled1.0yesyesgood
2no checking12.0critical/other existing crediteducation2096.0<1004<=X<72.0male singleNaN3.0real estate49.0NaNown1.0unskilled resident2.0NaNyesgood
3<042.0existing paidfurniture/equipment7882.0<1004<=X<72.0male singleguarantor4.0life insurance45.0NaNfor free1.0skilled2.0NaNyesgood
4<024.0delayed previouslynew car4870.0<1001<=X<43.0male singleNaN4.0no known property53.0NaNfor free2.0skilled2.0NaNyesbad
6no checking24.0existing paidfurniture/equipment2835.0500<=X<1000>=73.0male singleNaN4.0life insurance53.0NaNown1.0skilled1.0NaNyesgood
..................................................................
978no checking24.0delayed previouslynew car2538.0<100>=74.0male singleNaN4.0car47.0NaNown2.0unskilled resident2.0NaNyesbad
9800<=X<20030.0critical/other existing creditfurniture/equipment8386.0<1004<=X<72.0male singleNaN2.0life insurance49.0NaNown1.0skilled1.0NaNyesbad
987no checking13.0existing paidradio/tv1409.0100<=X<500unemployed2.0female div/dep/marNaN4.0real estate64.0NaNown1.0skilled1.0NaNyesgood
9890<=X<20024.0critical/other existing creditradio/tv1743.0<100>=74.0male singleNaN2.0life insurance48.0NaNown2.0unskilled resident1.0NaNyesgood
994no checking12.0existing paidnew car2390.0no known savings>=74.0male singleNaN3.0car50.0NaNown1.0skilled1.0yesyesgood
\n", "

274 rows × 21 columns

\n", "
" ], "text/plain": [ " checking_status duration credit_history \\\n", "0 <0 6.0 critical/other existing credit \n", "2 no checking 12.0 critical/other existing credit \n", "3 <0 42.0 existing paid \n", "4 <0 24.0 delayed previously \n", "6 no checking 24.0 existing paid \n", ".. ... ... ... \n", "978 no checking 24.0 delayed previously \n", "980 0<=X<200 30.0 critical/other existing credit \n", "987 no checking 13.0 existing paid \n", "989 0<=X<200 24.0 critical/other existing credit \n", "994 no checking 12.0 existing paid \n", "\n", " purpose credit_amount savings_status employment \\\n", "0 radio/tv 1169.0 no known savings >=7 \n", "2 education 2096.0 <100 4<=X<7 \n", "3 furniture/equipment 7882.0 <100 4<=X<7 \n", "4 new car 4870.0 <100 1<=X<4 \n", "6 furniture/equipment 2835.0 500<=X<1000 >=7 \n", ".. ... ... ... ... \n", "978 new car 2538.0 <100 >=7 \n", "980 furniture/equipment 8386.0 <100 4<=X<7 \n", "987 radio/tv 1409.0 100<=X<500 unemployed \n", "989 radio/tv 1743.0 <100 >=7 \n", "994 new car 2390.0 no known savings >=7 \n", "\n", " installment_commitment personal_status other_parties \\\n", "0 4.0 male single NaN \n", "2 2.0 male single NaN \n", "3 2.0 male single guarantor \n", "4 3.0 male single NaN \n", "6 3.0 male single NaN \n", ".. ... ... ... \n", "978 4.0 male single NaN \n", "980 2.0 male single NaN \n", "987 2.0 female div/dep/mar NaN \n", "989 4.0 male single NaN \n", "994 4.0 male single NaN \n", "\n", " residence_since property_magnitude age other_payment_plans housing \\\n", "0 4.0 real estate 67.0 NaN own \n", "2 3.0 real estate 49.0 NaN own \n", "3 4.0 life insurance 45.0 NaN for free \n", "4 4.0 no known property 53.0 NaN for free \n", "6 4.0 life insurance 53.0 NaN own \n", ".. ... ... ... ... ... \n", "978 4.0 car 47.0 NaN own \n", "980 2.0 life insurance 49.0 NaN own \n", "987 4.0 real estate 64.0 NaN own \n", "989 2.0 life insurance 48.0 NaN own \n", "994 3.0 car 50.0 NaN own \n", "\n", " existing_credits job num_dependents own_telephone \\\n", "0 2.0 skilled 1.0 yes \n", "2 1.0 unskilled resident 2.0 NaN \n", "3 1.0 skilled 2.0 NaN \n", "4 2.0 skilled 2.0 NaN \n", "6 1.0 skilled 1.0 NaN \n", ".. ... ... ... ... \n", "978 2.0 unskilled resident 2.0 NaN \n", "980 1.0 skilled 1.0 NaN \n", "987 1.0 skilled 1.0 NaN \n", "989 2.0 unskilled resident 1.0 NaN \n", "994 1.0 skilled 1.0 yes \n", "\n", " foreign_worker class \n", "0 yes good \n", "2 yes good \n", "3 yes good \n", "4 yes bad \n", "6 yes good \n", ".. ... ... \n", "978 yes bad \n", "980 yes bad \n", "987 yes good \n", "989 yes good \n", "994 yes good \n", "\n", "[274 rows x 21 columns]" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['age'] > 40]" ] }, { "cell_type": "markdown", "id": "5e1e0378", "metadata": {}, "source": [ "也可以利用```.loc[]```:" ] }, { "cell_type": "code", "execution_count": 65, "id": "11697736", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
0<06.0critical/other existing creditradio/tv1169.0no known savings>=74.0male singleNaN4.0real estate67.0NaNown2.0skilled1.0yesyesgood
2no checking12.0critical/other existing crediteducation2096.0<1004<=X<72.0male singleNaN3.0real estate49.0NaNown1.0unskilled resident2.0NaNyesgood
3<042.0existing paidfurniture/equipment7882.0<1004<=X<72.0male singleguarantor4.0life insurance45.0NaNfor free1.0skilled2.0NaNyesgood
4<024.0delayed previouslynew car4870.0<1001<=X<43.0male singleNaN4.0no known property53.0NaNfor free2.0skilled2.0NaNyesbad
6no checking24.0existing paidfurniture/equipment2835.0500<=X<1000>=73.0male singleNaN4.0life insurance53.0NaNown1.0skilled1.0NaNyesgood
..................................................................
978no checking24.0delayed previouslynew car2538.0<100>=74.0male singleNaN4.0car47.0NaNown2.0unskilled resident2.0NaNyesbad
9800<=X<20030.0critical/other existing creditfurniture/equipment8386.0<1004<=X<72.0male singleNaN2.0life insurance49.0NaNown1.0skilled1.0NaNyesbad
987no checking13.0existing paidradio/tv1409.0100<=X<500unemployed2.0female div/dep/marNaN4.0real estate64.0NaNown1.0skilled1.0NaNyesgood
9890<=X<20024.0critical/other existing creditradio/tv1743.0<100>=74.0male singleNaN2.0life insurance48.0NaNown2.0unskilled resident1.0NaNyesgood
994no checking12.0existing paidnew car2390.0no known savings>=74.0male singleNaN3.0car50.0NaNown1.0skilled1.0yesyesgood
\n", "

274 rows × 21 columns

\n", "
" ], "text/plain": [ " checking_status duration credit_history \\\n", "0 <0 6.0 critical/other existing credit \n", "2 no checking 12.0 critical/other existing credit \n", "3 <0 42.0 existing paid \n", "4 <0 24.0 delayed previously \n", "6 no checking 24.0 existing paid \n", ".. ... ... ... \n", "978 no checking 24.0 delayed previously \n", "980 0<=X<200 30.0 critical/other existing credit \n", "987 no checking 13.0 existing paid \n", "989 0<=X<200 24.0 critical/other existing credit \n", "994 no checking 12.0 existing paid \n", "\n", " purpose credit_amount savings_status employment \\\n", "0 radio/tv 1169.0 no known savings >=7 \n", "2 education 2096.0 <100 4<=X<7 \n", "3 furniture/equipment 7882.0 <100 4<=X<7 \n", "4 new car 4870.0 <100 1<=X<4 \n", "6 furniture/equipment 2835.0 500<=X<1000 >=7 \n", ".. ... ... ... ... \n", "978 new car 2538.0 <100 >=7 \n", "980 furniture/equipment 8386.0 <100 4<=X<7 \n", "987 radio/tv 1409.0 100<=X<500 unemployed \n", "989 radio/tv 1743.0 <100 >=7 \n", "994 new car 2390.0 no known savings >=7 \n", "\n", " installment_commitment personal_status other_parties \\\n", "0 4.0 male single NaN \n", "2 2.0 male single NaN \n", "3 2.0 male single guarantor \n", "4 3.0 male single NaN \n", "6 3.0 male single NaN \n", ".. ... ... ... \n", "978 4.0 male single NaN \n", "980 2.0 male single NaN \n", "987 2.0 female div/dep/mar NaN \n", "989 4.0 male single NaN \n", "994 4.0 male single NaN \n", "\n", " residence_since property_magnitude age other_payment_plans housing \\\n", "0 4.0 real estate 67.0 NaN own \n", "2 3.0 real estate 49.0 NaN own \n", "3 4.0 life insurance 45.0 NaN for free \n", "4 4.0 no known property 53.0 NaN for free \n", "6 4.0 life insurance 53.0 NaN own \n", ".. ... ... ... ... ... \n", "978 4.0 car 47.0 NaN own \n", "980 2.0 life insurance 49.0 NaN own \n", "987 4.0 real estate 64.0 NaN own \n", "989 2.0 life insurance 48.0 NaN own \n", "994 3.0 car 50.0 NaN own \n", "\n", " existing_credits job num_dependents own_telephone \\\n", "0 2.0 skilled 1.0 yes \n", "2 1.0 unskilled resident 2.0 NaN \n", "3 1.0 skilled 2.0 NaN \n", "4 2.0 skilled 2.0 NaN \n", "6 1.0 skilled 1.0 NaN \n", ".. ... ... ... ... \n", "978 2.0 unskilled resident 2.0 NaN \n", "980 1.0 skilled 1.0 NaN \n", "987 1.0 skilled 1.0 NaN \n", "989 2.0 unskilled resident 1.0 NaN \n", "994 1.0 skilled 1.0 yes \n", "\n", " foreign_worker class \n", "0 yes good \n", "2 yes good \n", "3 yes good \n", "4 yes bad \n", "6 yes good \n", ".. ... ... \n", "978 yes bad \n", "980 yes bad \n", "987 yes good \n", "989 yes good \n", "994 yes good \n", "\n", "[274 rows x 21 columns]" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['age'] > 40, :]" ] }, { "cell_type": "markdown", "id": "d08036d0", "metadata": {}, "source": [ "或是使用```.query()```方法:" ] }, { "cell_type": "code", "execution_count": 66, "id": "982b62ac", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
0<06.0critical/other existing creditradio/tv1169.0no known savings>=74.0male singleNaN4.0real estate67.0NaNown2.0skilled1.0yesyesgood
2no checking12.0critical/other existing crediteducation2096.0<1004<=X<72.0male singleNaN3.0real estate49.0NaNown1.0unskilled resident2.0NaNyesgood
3<042.0existing paidfurniture/equipment7882.0<1004<=X<72.0male singleguarantor4.0life insurance45.0NaNfor free1.0skilled2.0NaNyesgood
4<024.0delayed previouslynew car4870.0<1001<=X<43.0male singleNaN4.0no known property53.0NaNfor free2.0skilled2.0NaNyesbad
6no checking24.0existing paidfurniture/equipment2835.0500<=X<1000>=73.0male singleNaN4.0life insurance53.0NaNown1.0skilled1.0NaNyesgood
..................................................................
978no checking24.0delayed previouslynew car2538.0<100>=74.0male singleNaN4.0car47.0NaNown2.0unskilled resident2.0NaNyesbad
9800<=X<20030.0critical/other existing creditfurniture/equipment8386.0<1004<=X<72.0male singleNaN2.0life insurance49.0NaNown1.0skilled1.0NaNyesbad
987no checking13.0existing paidradio/tv1409.0100<=X<500unemployed2.0female div/dep/marNaN4.0real estate64.0NaNown1.0skilled1.0NaNyesgood
9890<=X<20024.0critical/other existing creditradio/tv1743.0<100>=74.0male singleNaN2.0life insurance48.0NaNown2.0unskilled resident1.0NaNyesgood
994no checking12.0existing paidnew car2390.0no known savings>=74.0male singleNaN3.0car50.0NaNown1.0skilled1.0yesyesgood
\n", "

274 rows × 21 columns

\n", "
" ], "text/plain": [ " checking_status duration credit_history \\\n", "0 <0 6.0 critical/other existing credit \n", "2 no checking 12.0 critical/other existing credit \n", "3 <0 42.0 existing paid \n", "4 <0 24.0 delayed previously \n", "6 no checking 24.0 existing paid \n", ".. ... ... ... \n", "978 no checking 24.0 delayed previously \n", "980 0<=X<200 30.0 critical/other existing credit \n", "987 no checking 13.0 existing paid \n", "989 0<=X<200 24.0 critical/other existing credit \n", "994 no checking 12.0 existing paid \n", "\n", " purpose credit_amount savings_status employment \\\n", "0 radio/tv 1169.0 no known savings >=7 \n", "2 education 2096.0 <100 4<=X<7 \n", "3 furniture/equipment 7882.0 <100 4<=X<7 \n", "4 new car 4870.0 <100 1<=X<4 \n", "6 furniture/equipment 2835.0 500<=X<1000 >=7 \n", ".. ... ... ... ... \n", "978 new car 2538.0 <100 >=7 \n", "980 furniture/equipment 8386.0 <100 4<=X<7 \n", "987 radio/tv 1409.0 100<=X<500 unemployed \n", "989 radio/tv 1743.0 <100 >=7 \n", "994 new car 2390.0 no known savings >=7 \n", "\n", " installment_commitment personal_status other_parties \\\n", "0 4.0 male single NaN \n", "2 2.0 male single NaN \n", "3 2.0 male single guarantor \n", "4 3.0 male single NaN \n", "6 3.0 male single NaN \n", ".. ... ... ... \n", "978 4.0 male single NaN \n", "980 2.0 male single NaN \n", "987 2.0 female div/dep/mar NaN \n", "989 4.0 male single NaN \n", "994 4.0 male single NaN \n", "\n", " residence_since property_magnitude age other_payment_plans housing \\\n", "0 4.0 real estate 67.0 NaN own \n", "2 3.0 real estate 49.0 NaN own \n", "3 4.0 life insurance 45.0 NaN for free \n", "4 4.0 no known property 53.0 NaN for free \n", "6 4.0 life insurance 53.0 NaN own \n", ".. ... ... ... ... ... \n", "978 4.0 car 47.0 NaN own \n", "980 2.0 life insurance 49.0 NaN own \n", "987 4.0 real estate 64.0 NaN own \n", "989 2.0 life insurance 48.0 NaN own \n", "994 3.0 car 50.0 NaN own \n", "\n", " existing_credits job num_dependents own_telephone \\\n", "0 2.0 skilled 1.0 yes \n", "2 1.0 unskilled resident 2.0 NaN \n", "3 1.0 skilled 2.0 NaN \n", "4 2.0 skilled 2.0 NaN \n", "6 1.0 skilled 1.0 NaN \n", ".. ... ... ... ... \n", "978 2.0 unskilled resident 2.0 NaN \n", "980 1.0 skilled 1.0 NaN \n", "987 1.0 skilled 1.0 NaN \n", "989 2.0 unskilled resident 1.0 NaN \n", "994 1.0 skilled 1.0 yes \n", "\n", " foreign_worker class \n", "0 yes good \n", "2 yes good \n", "3 yes good \n", "4 yes bad \n", "6 yes good \n", ".. ... ... \n", "978 yes bad \n", "980 yes bad \n", "987 yes good \n", "989 yes good \n", "994 yes good \n", "\n", "[274 rows x 21 columns]" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.query('age > 40')" ] }, { "cell_type": "markdown", "id": "2f22b065", "metadata": {}, "source": [ "邏輯條件可以放多個:" ] }, { "cell_type": "code", "execution_count": 67, "id": "3d9486f0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
29<060.0delayed previouslybusiness6836.0<100>=73.0male singleNaN4.0no known property63.0NaNown2.0skilled1.0yesyesbad
950<=X<20054.0no credits/all paidbusiness15945.0<100<13.0male singleNaN4.0no known property58.0NaNrent1.0skilled1.0yesyesbad
373no checking60.0critical/other existing creditnew car13756.0no known savings>=72.0male singleNaN4.0no known property63.0bankfor free1.0high qualif/self emp/mgmt1.0yesyesgood
3740<=X<20060.0all paidother14782.0100<=X<500>=73.0female div/dep/marNaN4.0no known property60.0bankfor free2.0high qualif/self emp/mgmt1.0yesyesbad
672no checking60.0existing paidnew car10366.0<100>=72.0male singleNaN4.0life insurance42.0NaNown1.0high qualif/self emp/mgmt1.0yesyesgood
9380<=X<20060.0existing paideducation6288.0<1001<=X<44.0male singleNaN4.0no known property42.0NaNfor free1.0skilled1.0NaNyesbad
\n", "
" ], "text/plain": [ " checking_status duration credit_history purpose \\\n", "29 <0 60.0 delayed previously business \n", "95 0<=X<200 54.0 no credits/all paid business \n", "373 no checking 60.0 critical/other existing credit new car \n", "374 0<=X<200 60.0 all paid other \n", "672 no checking 60.0 existing paid new car \n", "938 0<=X<200 60.0 existing paid education \n", "\n", " credit_amount savings_status employment installment_commitment \\\n", "29 6836.0 <100 >=7 3.0 \n", "95 15945.0 <100 <1 3.0 \n", "373 13756.0 no known savings >=7 2.0 \n", "374 14782.0 100<=X<500 >=7 3.0 \n", "672 10366.0 <100 >=7 2.0 \n", "938 6288.0 <100 1<=X<4 4.0 \n", "\n", " personal_status other_parties residence_since property_magnitude \\\n", "29 male single NaN 4.0 no known property \n", "95 male single NaN 4.0 no known property \n", "373 male single NaN 4.0 no known property \n", "374 female div/dep/mar NaN 4.0 no known property \n", "672 male single NaN 4.0 life insurance \n", "938 male single NaN 4.0 no known property \n", "\n", " age other_payment_plans housing existing_credits \\\n", "29 63.0 NaN own 2.0 \n", "95 58.0 NaN rent 1.0 \n", "373 63.0 bank for free 1.0 \n", "374 60.0 bank for free 2.0 \n", "672 42.0 NaN own 1.0 \n", "938 42.0 NaN for free 1.0 \n", "\n", " job num_dependents own_telephone foreign_worker \\\n", "29 skilled 1.0 yes yes \n", "95 skilled 1.0 yes yes \n", "373 high qualif/self emp/mgmt 1.0 yes yes \n", "374 high qualif/self emp/mgmt 1.0 yes yes \n", "672 high qualif/self emp/mgmt 1.0 yes yes \n", "938 skilled 1.0 NaN yes \n", "\n", " class \n", "29 bad \n", "95 bad \n", "373 good \n", "374 bad \n", "672 good \n", "938 bad " ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['age'] > 40) & (df['duration'] > 50)]" ] }, { "cell_type": "code", "execution_count": 68, "id": "3a82d10a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
29<060.0delayed previouslybusiness6836.0<100>=73.0male singleNaN4.0no known property63.0NaNown2.0skilled1.0yesyesbad
950<=X<20054.0no credits/all paidbusiness15945.0<100<13.0male singleNaN4.0no known property58.0NaNrent1.0skilled1.0yesyesbad
373no checking60.0critical/other existing creditnew car13756.0no known savings>=72.0male singleNaN4.0no known property63.0bankfor free1.0high qualif/self emp/mgmt1.0yesyesgood
3740<=X<20060.0all paidother14782.0100<=X<500>=73.0female div/dep/marNaN4.0no known property60.0bankfor free2.0high qualif/self emp/mgmt1.0yesyesbad
672no checking60.0existing paidnew car10366.0<100>=72.0male singleNaN4.0life insurance42.0NaNown1.0high qualif/self emp/mgmt1.0yesyesgood
9380<=X<20060.0existing paideducation6288.0<1001<=X<44.0male singleNaN4.0no known property42.0NaNfor free1.0skilled1.0NaNyesbad
\n", "
" ], "text/plain": [ " checking_status duration credit_history purpose \\\n", "29 <0 60.0 delayed previously business \n", "95 0<=X<200 54.0 no credits/all paid business \n", "373 no checking 60.0 critical/other existing credit new car \n", "374 0<=X<200 60.0 all paid other \n", "672 no checking 60.0 existing paid new car \n", "938 0<=X<200 60.0 existing paid education \n", "\n", " credit_amount savings_status employment installment_commitment \\\n", "29 6836.0 <100 >=7 3.0 \n", "95 15945.0 <100 <1 3.0 \n", "373 13756.0 no known savings >=7 2.0 \n", "374 14782.0 100<=X<500 >=7 3.0 \n", "672 10366.0 <100 >=7 2.0 \n", "938 6288.0 <100 1<=X<4 4.0 \n", "\n", " personal_status other_parties residence_since property_magnitude \\\n", "29 male single NaN 4.0 no known property \n", "95 male single NaN 4.0 no known property \n", "373 male single NaN 4.0 no known property \n", "374 female div/dep/mar NaN 4.0 no known property \n", "672 male single NaN 4.0 life insurance \n", "938 male single NaN 4.0 no known property \n", "\n", " age other_payment_plans housing existing_credits \\\n", "29 63.0 NaN own 2.0 \n", "95 58.0 NaN rent 1.0 \n", "373 63.0 bank for free 1.0 \n", "374 60.0 bank for free 2.0 \n", "672 42.0 NaN own 1.0 \n", "938 42.0 NaN for free 1.0 \n", "\n", " job num_dependents own_telephone foreign_worker \\\n", "29 skilled 1.0 yes yes \n", "95 skilled 1.0 yes yes \n", "373 high qualif/self emp/mgmt 1.0 yes yes \n", "374 high qualif/self emp/mgmt 1.0 yes yes \n", "672 high qualif/self emp/mgmt 1.0 yes yes \n", "938 skilled 1.0 NaN yes \n", "\n", " class \n", "29 bad \n", "95 bad \n", "373 good \n", "374 bad \n", "672 good \n", "938 bad " ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.query('age > 40 & duration > 50')" ] }, { "cell_type": "markdown", "id": "3edce8ae", "metadata": {}, "source": [ "為了避免條件太多,造成閱讀困難,可以做一些前處理:" ] }, { "cell_type": "code", "execution_count": 69, "id": "b7e5f5b6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
950<=X<20054.0no credits/all paidbusiness15945.0<100<13.0male singleNaN4.0no known property58.0NaNrent1.0skilled1.0yesyesbad
373no checking60.0critical/other existing creditnew car13756.0no known savings>=72.0male singleNaN4.0no known property63.0bankfor free1.0high qualif/self emp/mgmt1.0yesyesgood
3740<=X<20060.0all paidother14782.0100<=X<500>=73.0female div/dep/marNaN4.0no known property60.0bankfor free2.0high qualif/self emp/mgmt1.0yesyesbad
672no checking60.0existing paidnew car10366.0<100>=72.0male singleNaN4.0life insurance42.0NaNown1.0high qualif/self emp/mgmt1.0yesyesgood
\n", "
" ], "text/plain": [ " checking_status duration credit_history purpose \\\n", "95 0<=X<200 54.0 no credits/all paid business \n", "373 no checking 60.0 critical/other existing credit new car \n", "374 0<=X<200 60.0 all paid other \n", "672 no checking 60.0 existing paid new car \n", "\n", " credit_amount savings_status employment installment_commitment \\\n", "95 15945.0 <100 <1 3.0 \n", "373 13756.0 no known savings >=7 2.0 \n", "374 14782.0 100<=X<500 >=7 3.0 \n", "672 10366.0 <100 >=7 2.0 \n", "\n", " personal_status other_parties residence_since property_magnitude \\\n", "95 male single NaN 4.0 no known property \n", "373 male single NaN 4.0 no known property \n", "374 female div/dep/mar NaN 4.0 no known property \n", "672 male single NaN 4.0 life insurance \n", "\n", " age other_payment_plans housing existing_credits \\\n", "95 58.0 NaN rent 1.0 \n", "373 63.0 bank for free 1.0 \n", "374 60.0 bank for free 2.0 \n", "672 42.0 NaN own 1.0 \n", "\n", " job num_dependents own_telephone foreign_worker \\\n", "95 skilled 1.0 yes yes \n", "373 high qualif/self emp/mgmt 1.0 yes yes \n", "374 high qualif/self emp/mgmt 1.0 yes yes \n", "672 high qualif/self emp/mgmt 1.0 yes yes \n", "\n", " class \n", "95 bad \n", "373 good \n", "374 bad \n", "672 good " ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c1 = df['age'] > 40\n", "c2 = df['duration'] > 50\n", "c3 = df['credit_amount'] > 10000\n", "\n", "cond = c1 & c2 & c3\n", "\n", "df[cond]" ] }, { "cell_type": "markdown", "id": "be889b47", "metadata": {}, "source": [ "或是搭配f-string以及```.query()```:" ] }, { "cell_type": "code", "execution_count": 70, "id": "10857339", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
950<=X<20054.0no credits/all paidbusiness15945.0<100<13.0male singleNaN4.0no known property58.0NaNrent1.0skilled1.0yesyesbad
373no checking60.0critical/other existing creditnew car13756.0no known savings>=72.0male singleNaN4.0no known property63.0bankfor free1.0high qualif/self emp/mgmt1.0yesyesgood
3740<=X<20060.0all paidother14782.0100<=X<500>=73.0female div/dep/marNaN4.0no known property60.0bankfor free2.0high qualif/self emp/mgmt1.0yesyesbad
672no checking60.0existing paidnew car10366.0<100>=72.0male singleNaN4.0life insurance42.0NaNown1.0high qualif/self emp/mgmt1.0yesyesgood
\n", "
" ], "text/plain": [ " checking_status duration credit_history purpose \\\n", "95 0<=X<200 54.0 no credits/all paid business \n", "373 no checking 60.0 critical/other existing credit new car \n", "374 0<=X<200 60.0 all paid other \n", "672 no checking 60.0 existing paid new car \n", "\n", " credit_amount savings_status employment installment_commitment \\\n", "95 15945.0 <100 <1 3.0 \n", "373 13756.0 no known savings >=7 2.0 \n", "374 14782.0 100<=X<500 >=7 3.0 \n", "672 10366.0 <100 >=7 2.0 \n", "\n", " personal_status other_parties residence_since property_magnitude \\\n", "95 male single NaN 4.0 no known property \n", "373 male single NaN 4.0 no known property \n", "374 female div/dep/mar NaN 4.0 no known property \n", "672 male single NaN 4.0 life insurance \n", "\n", " age other_payment_plans housing existing_credits \\\n", "95 58.0 NaN rent 1.0 \n", "373 63.0 bank for free 1.0 \n", "374 60.0 bank for free 2.0 \n", "672 42.0 NaN own 1.0 \n", "\n", " job num_dependents own_telephone foreign_worker \\\n", "95 skilled 1.0 yes yes \n", "373 high qualif/self emp/mgmt 1.0 yes yes \n", "374 high qualif/self emp/mgmt 1.0 yes yes \n", "672 high qualif/self emp/mgmt 1.0 yes yes \n", "\n", " class \n", "95 bad \n", "373 good \n", "374 bad \n", "672 good " ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c1 = 'age > 40'\n", "c2 = 'duration > 50'\n", "c3 = 'credit_amount > 10000'\n", "\n", "cond = f'{c1} & {c2} & {c3}'\n", "\n", "df.query(cond)" ] }, { "cell_type": "markdown", "id": "ea02d1fe", "metadata": {}, "source": [ "其他根據欄位值篩選,常用的方法有:" ] }, { "cell_type": "code", "execution_count": 71, "id": "bdddcea4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
5no checking36.0existing paideducation9055.0no known savings1<=X<42.0male singleNaN4.0no known property35.0NaNfor free1.0unskilled resident2.0yesyesgood
70<=X<20036.0existing paidused car6948.0<1001<=X<42.0male singleNaN2.0car35.0NaNrent1.0high qualif/self emp/mgmt1.0yesyesgood
15<024.0existing paidradio/tv1282.0100<=X<5001<=X<44.0female div/dep/marNaN2.0car32.0NaNown1.0unskilled resident1.0NaNyesbad
19no checking24.0existing paidradio/tv3430.0500<=X<1000>=73.0male singleNaN2.0car31.0NaNown1.0skilled2.0yesyesgood
280<=X<2007.0existing paidradio/tv2415.0<1001<=X<43.0male singleguarantor2.0real estate34.0NaNown1.0skilled1.0NaNyesgood
..................................................................
984no checking24.0critical/other existing creditfurniture/equipment2028.0<1004<=X<72.0male singleNaN2.0life insurance30.0NaNown2.0unskilled resident1.0NaNyesgood
986>=20042.0no credits/all paidbusiness6289.0<100<12.0male div/sepNaN1.0life insurance33.0NaNown2.0skilled1.0NaNyesgood
991no checking15.0all paidradio/tv1569.0100<=X<500>=74.0male singleNaN4.0car34.0bankown1.0unskilled resident2.0NaNyesgood
993<036.0existing paidfurniture/equipment3959.0<100unemployed4.0male singleNaN3.0life insurance30.0NaNown1.0high qualif/self emp/mgmt1.0yesyesgood
995no checking12.0existing paidfurniture/equipment1736.0<1004<=X<73.0female div/dep/marNaN4.0real estate31.0NaNown1.0unskilled resident1.0NaNyesgood
\n", "

217 rows × 21 columns

\n", "
" ], "text/plain": [ " checking_status duration credit_history \\\n", "5 no checking 36.0 existing paid \n", "7 0<=X<200 36.0 existing paid \n", "15 <0 24.0 existing paid \n", "19 no checking 24.0 existing paid \n", "28 0<=X<200 7.0 existing paid \n", ".. ... ... ... \n", "984 no checking 24.0 critical/other existing credit \n", "986 >=200 42.0 no credits/all paid \n", "991 no checking 15.0 all paid \n", "993 <0 36.0 existing paid \n", "995 no checking 12.0 existing paid \n", "\n", " purpose credit_amount savings_status employment \\\n", "5 education 9055.0 no known savings 1<=X<4 \n", "7 used car 6948.0 <100 1<=X<4 \n", "15 radio/tv 1282.0 100<=X<500 1<=X<4 \n", "19 radio/tv 3430.0 500<=X<1000 >=7 \n", "28 radio/tv 2415.0 <100 1<=X<4 \n", ".. ... ... ... ... \n", "984 furniture/equipment 2028.0 <100 4<=X<7 \n", "986 business 6289.0 <100 <1 \n", "991 radio/tv 1569.0 100<=X<500 >=7 \n", "993 furniture/equipment 3959.0 <100 unemployed \n", "995 furniture/equipment 1736.0 <100 4<=X<7 \n", "\n", " installment_commitment personal_status other_parties \\\n", "5 2.0 male single NaN \n", "7 2.0 male single NaN \n", "15 4.0 female div/dep/mar NaN \n", "19 3.0 male single NaN \n", "28 3.0 male single guarantor \n", ".. ... ... ... \n", "984 2.0 male single NaN \n", "986 2.0 male div/sep NaN \n", "991 4.0 male single NaN \n", "993 4.0 male single NaN \n", "995 3.0 female div/dep/mar NaN \n", "\n", " residence_since property_magnitude age other_payment_plans housing \\\n", "5 4.0 no known property 35.0 NaN for free \n", "7 2.0 car 35.0 NaN rent \n", "15 2.0 car 32.0 NaN own \n", "19 2.0 car 31.0 NaN own \n", "28 2.0 real estate 34.0 NaN own \n", ".. ... ... ... ... ... \n", "984 2.0 life insurance 30.0 NaN own \n", "986 1.0 life insurance 33.0 NaN own \n", "991 4.0 car 34.0 bank own \n", "993 3.0 life insurance 30.0 NaN own \n", "995 4.0 real estate 31.0 NaN own \n", "\n", " existing_credits job num_dependents \\\n", "5 1.0 unskilled resident 2.0 \n", "7 1.0 high qualif/self emp/mgmt 1.0 \n", "15 1.0 unskilled resident 1.0 \n", "19 1.0 skilled 2.0 \n", "28 1.0 skilled 1.0 \n", ".. ... ... ... \n", "984 2.0 unskilled resident 1.0 \n", "986 2.0 skilled 1.0 \n", "991 1.0 unskilled resident 2.0 \n", "993 1.0 high qualif/self emp/mgmt 1.0 \n", "995 1.0 unskilled resident 1.0 \n", "\n", " own_telephone foreign_worker class \n", "5 yes yes good \n", "7 yes yes good \n", "15 NaN yes bad \n", "19 yes yes good \n", "28 NaN yes good \n", ".. ... ... ... \n", "984 NaN yes good \n", "986 NaN yes good \n", "991 NaN yes good \n", "993 yes yes good \n", "995 NaN yes good \n", "\n", "[217 rows x 21 columns]" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['age'].between(30, 35)]" ] }, { "cell_type": "markdown", "id": "15a64eb7", "metadata": {}, "source": [ "或是根據欄位值清單篩選:" ] }, { "cell_type": "code", "execution_count": 72, "id": "3ac61905", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
4<024.0delayed previouslynew car4870.0<1001<=X<43.0male singleNaN4.0no known property53.0NaNfor free2.0skilled2.0NaNyesbad
90<=X<20030.0critical/other existing creditnew car5234.0<100unemployed4.0male mar/widNaN2.0car28.0NaNown2.0high qualif/self emp/mgmt1.0NaNyesbad
100<=X<20012.0existing paidnew car1295.0<100<13.0female div/dep/marNaN1.0car25.0NaNrent1.0skilled1.0NaNyesbad
13<024.0critical/other existing creditnew car1199.0<100>=74.0male singleNaN4.0car60.0NaNown2.0unskilled resident1.0NaNyesbad
14<015.0existing paidnew car1403.0<1001<=X<42.0female div/dep/marNaN4.0car28.0NaNrent1.0skilled1.0NaNyesgood
..................................................................
972<024.0all paidnew car1193.0<100unemployed1.0female div/dep/marco applicant4.0no known property29.0NaNrent2.0unemp/unskilled non res1.0NaNyesbad
978no checking24.0delayed previouslynew car2538.0<100>=74.0male singleNaN4.0car47.0NaNown2.0unskilled resident2.0NaNyesbad
9790<=X<20015.0all paidnew car1264.0100<=X<5001<=X<42.0male mar/widNaN2.0life insurance25.0NaNrent1.0skilled1.0NaNyesbad
982>=20021.0existing paidnew car2923.0100<=X<5001<=X<41.0female div/dep/marNaN1.0car28.0bankown1.0high qualif/self emp/mgmt1.0yesyesgood
994no checking12.0existing paidnew car2390.0no known savings>=74.0male singleNaN3.0car50.0NaNown1.0skilled1.0yesyesgood
\n", "

234 rows × 21 columns

\n", "
" ], "text/plain": [ " checking_status duration credit_history purpose \\\n", "4 <0 24.0 delayed previously new car \n", "9 0<=X<200 30.0 critical/other existing credit new car \n", "10 0<=X<200 12.0 existing paid new car \n", "13 <0 24.0 critical/other existing credit new car \n", "14 <0 15.0 existing paid new car \n", ".. ... ... ... ... \n", "972 <0 24.0 all paid new car \n", "978 no checking 24.0 delayed previously new car \n", "979 0<=X<200 15.0 all paid new car \n", "982 >=200 21.0 existing paid new car \n", "994 no checking 12.0 existing paid new car \n", "\n", " credit_amount savings_status employment installment_commitment \\\n", "4 4870.0 <100 1<=X<4 3.0 \n", "9 5234.0 <100 unemployed 4.0 \n", "10 1295.0 <100 <1 3.0 \n", "13 1199.0 <100 >=7 4.0 \n", "14 1403.0 <100 1<=X<4 2.0 \n", ".. ... ... ... ... \n", "972 1193.0 <100 unemployed 1.0 \n", "978 2538.0 <100 >=7 4.0 \n", "979 1264.0 100<=X<500 1<=X<4 2.0 \n", "982 2923.0 100<=X<500 1<=X<4 1.0 \n", "994 2390.0 no known savings >=7 4.0 \n", "\n", " personal_status other_parties residence_since property_magnitude \\\n", "4 male single NaN 4.0 no known property \n", "9 male mar/wid NaN 2.0 car \n", "10 female div/dep/mar NaN 1.0 car \n", "13 male single NaN 4.0 car \n", "14 female div/dep/mar NaN 4.0 car \n", ".. ... ... ... ... \n", "972 female div/dep/mar co applicant 4.0 no known property \n", "978 male single NaN 4.0 car \n", "979 male mar/wid NaN 2.0 life insurance \n", "982 female div/dep/mar NaN 1.0 car \n", "994 male single NaN 3.0 car \n", "\n", " age other_payment_plans housing existing_credits \\\n", "4 53.0 NaN for free 2.0 \n", "9 28.0 NaN own 2.0 \n", "10 25.0 NaN rent 1.0 \n", "13 60.0 NaN own 2.0 \n", "14 28.0 NaN rent 1.0 \n", ".. ... ... ... ... \n", "972 29.0 NaN rent 2.0 \n", "978 47.0 NaN own 2.0 \n", "979 25.0 NaN rent 1.0 \n", "982 28.0 bank own 1.0 \n", "994 50.0 NaN own 1.0 \n", "\n", " job num_dependents own_telephone foreign_worker \\\n", "4 skilled 2.0 NaN yes \n", "9 high qualif/self emp/mgmt 1.0 NaN yes \n", "10 skilled 1.0 NaN yes \n", "13 unskilled resident 1.0 NaN yes \n", "14 skilled 1.0 NaN yes \n", ".. ... ... ... ... \n", "972 unemp/unskilled non res 1.0 NaN yes \n", "978 unskilled resident 2.0 NaN yes \n", "979 skilled 1.0 NaN yes \n", "982 high qualif/self emp/mgmt 1.0 yes yes \n", "994 skilled 1.0 yes yes \n", "\n", " class \n", "4 bad \n", "9 bad \n", "10 bad \n", "13 bad \n", "14 good \n", ".. ... \n", "972 bad \n", "978 bad \n", "979 bad \n", "982 good \n", "994 good \n", "\n", "[234 rows x 21 columns]" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['purpose'].isin(['new car', 'used card'])]" ] }, { "cell_type": "markdown", "id": "072e8269", "metadata": {}, "source": [ "### 排序資料\n", "\n", "要排序資料可以使用前面用過的```.sort_values()```語法,參數```by```指定欄位,```ascending```指定方向:" ] }, { "cell_type": "code", "execution_count": 73, "id": "05b690f0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
checking_statusdurationcredit_historypurposecredit_amountsavings_statusemploymentinstallment_commitmentpersonal_statusother_partiesresidence_sinceproperty_magnitudeageother_payment_planshousingexisting_creditsjobnum_dependentsown_telephoneforeign_workerclass
9150<=X<20048.0no credits/all paidother18424.0<1001<=X<41.0female div/dep/marNaN2.0life insurance32.0bankown1.0high qualif/self emp/mgmt1.0yesnobad
950<=X<20054.0no credits/all paidbusiness15945.0<100<13.0male singleNaN4.0no known property58.0NaNrent1.0skilled1.0yesyesbad
818<036.0existing paidother15857.0<100unemployed2.0male div/sepco applicant3.0car43.0NaNown1.0high qualif/self emp/mgmt1.0NaNyesgood
8870<=X<20048.0existing paidbusiness15672.0<1001<=X<42.0male singleNaN2.0car23.0NaNown1.0skilled1.0yesyesbad
637no checking60.0delayed previouslyradio/tv15653.0<1004<=X<72.0male singleNaN4.0car21.0NaNown2.0skilled1.0yesyesgood
..................................................................
458<06.0existing paiddomestic appliance343.0<100<14.0female div/dep/marNaN1.0real estate27.0NaNown1.0skilled1.0NaNyesgood
157<012.0all paidretraining339.0<100>=74.0male mar/widNaN1.0car45.0bankown1.0unskilled resident1.0NaNyesgood
177<06.0critical/other existing creditradio/tv338.0500<=X<1000>=74.0male singleNaN4.0car52.0NaNown2.0skilled1.0NaNyesgood
3090<=X<2009.0existing paidnew car276.0<1001<=X<44.0male mar/widNaN4.0real estate22.0NaNrent1.0unskilled resident1.0NaNyesgood
725no checking6.0critical/other existing creditnew car250.0>=10001<=X<42.0female div/dep/marNaN2.0real estate41.0bankown2.0unskilled resident1.0NaNyesgood
\n", "

1000 rows × 21 columns

\n", "
" ], "text/plain": [ " checking_status duration credit_history \\\n", "915 0<=X<200 48.0 no credits/all paid \n", "95 0<=X<200 54.0 no credits/all paid \n", "818 <0 36.0 existing paid \n", "887 0<=X<200 48.0 existing paid \n", "637 no checking 60.0 delayed previously \n", ".. ... ... ... \n", "458 <0 6.0 existing paid \n", "157 <0 12.0 all paid \n", "177 <0 6.0 critical/other existing credit \n", "309 0<=X<200 9.0 existing paid \n", "725 no checking 6.0 critical/other existing credit \n", "\n", " purpose credit_amount savings_status employment \\\n", "915 other 18424.0 <100 1<=X<4 \n", "95 business 15945.0 <100 <1 \n", "818 other 15857.0 <100 unemployed \n", "887 business 15672.0 <100 1<=X<4 \n", "637 radio/tv 15653.0 <100 4<=X<7 \n", ".. ... ... ... ... \n", "458 domestic appliance 343.0 <100 <1 \n", "157 retraining 339.0 <100 >=7 \n", "177 radio/tv 338.0 500<=X<1000 >=7 \n", "309 new car 276.0 <100 1<=X<4 \n", "725 new car 250.0 >=1000 1<=X<4 \n", "\n", " installment_commitment personal_status other_parties \\\n", "915 1.0 female div/dep/mar NaN \n", "95 3.0 male single NaN \n", "818 2.0 male div/sep co applicant \n", "887 2.0 male single NaN \n", "637 2.0 male single NaN \n", ".. ... ... ... \n", "458 4.0 female div/dep/mar NaN \n", "157 4.0 male mar/wid NaN \n", "177 4.0 male single NaN \n", "309 4.0 male mar/wid NaN \n", "725 2.0 female div/dep/mar NaN \n", "\n", " residence_since property_magnitude age other_payment_plans housing \\\n", "915 2.0 life insurance 32.0 bank own \n", "95 4.0 no known property 58.0 NaN rent \n", "818 3.0 car 43.0 NaN own \n", "887 2.0 car 23.0 NaN own \n", "637 4.0 car 21.0 NaN own \n", ".. ... ... ... ... ... \n", "458 1.0 real estate 27.0 NaN own \n", "157 1.0 car 45.0 bank own \n", "177 4.0 car 52.0 NaN own \n", "309 4.0 real estate 22.0 NaN rent \n", "725 2.0 real estate 41.0 bank own \n", "\n", " existing_credits job num_dependents \\\n", "915 1.0 high qualif/self emp/mgmt 1.0 \n", "95 1.0 skilled 1.0 \n", "818 1.0 high qualif/self emp/mgmt 1.0 \n", "887 1.0 skilled 1.0 \n", "637 2.0 skilled 1.0 \n", ".. ... ... ... \n", "458 1.0 skilled 1.0 \n", "157 1.0 unskilled resident 1.0 \n", "177 2.0 skilled 1.0 \n", "309 1.0 unskilled resident 1.0 \n", "725 2.0 unskilled resident 1.0 \n", "\n", " own_telephone foreign_worker class \n", "915 yes no bad \n", "95 yes yes bad \n", "818 NaN yes good \n", "887 yes yes bad \n", "637 yes yes good \n", ".. ... ... ... \n", "458 NaN yes good \n", "157 NaN yes good \n", "177 NaN yes good \n", "309 NaN yes good \n", "725 NaN yes good \n", "\n", "[1000 rows x 21 columns]" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values(by='credit_amount', ascending=False)" ] }, { "cell_type": "markdown", "id": "f0312953", "metadata": {}, "source": [ "另外,若要根據index排序,可以使用```.sort_index()```。例如,想將資料根據年齡統計筆數後,查看年紀最小的幾個樣本數:" ] }, { "cell_type": "code", "execution_count": 74, "id": "4c9cbfc3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "age\n", "19.0 2\n", "20.0 14\n", "21.0 14\n", "22.0 27\n", "23.0 48\n", "24.0 44\n", "25.0 41\n", "26.0 50\n", "27.0 51\n", "28.0 43\n", "Name: count, dtype: int64" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['age'].value_counts().sort_index().head(10)" ] } ], "metadata": { "jupytext": { "formats": "md:myst", "text_representation": { "extension": ".md", "format_name": "myst", "format_version": 0.13, "jupytext_version": "1.16.0" } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" }, "source_map": [ 13, 35, 37, 41, 55, 59, 61, 67, 69, 73, 75, 79, 81, 91, 93, 97, 101, 106, 108, 127, 131, 139, 141, 145, 147, 151, 153, 157, 159, 165, 167, 171, 173, 177, 179, 185, 187, 191, 193, 201, 203, 213, 215, 225, 227, 231, 233, 237, 239, 243, 245, 253, 255, 261, 263, 269, 271, 275, 280, 284, 286, 290, 292, 296, 298, 302, 304, 308, 310, 315, 317, 321, 325, 327, 331, 335, 337, 343, 347, 351, 355, 359, 363, 367, 371, 373, 377, 381, 383, 395, 397, 401, 403, 407, 409, 413, 415, 419, 421, 425, 427, 431, 438, 440, 444, 446, 450, 452, 456, 458, 462, 466, 472, 474, 478, 480, 486, 488, 492, 494, 498, 500, 504, 508, 510, 514, 522, 526, 534, 538, 540, 544, 546, 554, 556, 560 ] }, "nbformat": 4, "nbformat_minor": 5 }