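{ "cells": [ { "cell_type": "markdown", "id": "7ee4ca52", "metadata": {}, "source": [ "# Common Pandas Operations\n", "\n", "The previous chapter covered how to extract basic information about a dataset: its dimensions, a quick look at its contents, summary statistics, missing values, and whether any columns contain anomalous values.\n", "\n", "This section introduces operations and transformations on a DataFrame and its columns.\n", "\n", "We use the same dataset as before, so we start by loading it:" ] }, { "cell_type": "code", "execution_count": 1, "id": "ce23df29", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv('./data/credit_customers.csv')" ] }, { "cell_type": "markdown", "id": "4eff1ad3", "metadata": {}, "source": [ "## Basic Operations\n", "\n", "**Renaming a column**\n", "\n", "One of the columns is named \"class\", which is also a Python keyword, so it is best to rename it to avoid trouble later (for example, attribute access such as `df.class` is a syntax error).\n", "\n", "In addition, this column is the label the dataset uses to predict whether a customer defaults, so \"label\" is a fitting name." ] }, { "cell_type": "code", "execution_count": 2, "id": "43ac8314", "metadata": {}, "outputs": [ { "data": { "text/html": [ "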
| | checking_status | duration | credit_history | purpose | credit_amount | savings_status | employment | installment_commitment | personal_status | other_parties | ... | property_magnitude | age | other_payment_plans | housing | existing_credits | job | num_dependents | own_telephone | foreign_worker | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | <0 | 6.0 | critical/other existing credit | radio/tv | 1169.0 | no known savings | >=7 | 4.0 | male single | NaN | ... | real estate | 67.0 | NaN | own | 2.0 | skilled | 1.0 | yes | yes | good |
| 1 | 0<=X<200 | 48.0 | existing paid | radio/tv | 5951.0 | <100 | 1<=X<4 | 2.0 | female div/dep/mar | NaN | ... | real estate | 22.0 | NaN | own | 1.0 | skilled | 1.0 | NaN | yes | bad |
| 2 | no checking | 12.0 | critical/other existing credit | education | 2096.0 | <100 | 4<=X<7 | 2.0 | male single | NaN | ... | real estate | 49.0 | NaN | own | 1.0 | unskilled resident | 2.0 | NaN | yes | good |
| 3 | <0 | 42.0 | existing paid | furniture/equipment | 7882.0 | <100 | 4<=X<7 | 2.0 | male single | guarantor | ... | life insurance | 45.0 | NaN | for free | 1.0 | skilled | 2.0 | NaN | yes | good |
| 4 | <0 | 24.0 | delayed previously | new car | 4870.0 | <100 | 1<=X<4 | 3.0 | male single | NaN | ... | no known property | 53.0 | NaN | for free | 2.0 | skilled | 2.0 | NaN | yes | bad |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | no checking | 12.0 | existing paid | furniture/equipment | 1736.0 | <100 | 4<=X<7 | 3.0 | female div/dep/mar | NaN | ... | real estate | 31.0 | NaN | own | 1.0 | unskilled resident | 1.0 | NaN | yes | good |
| 996 | <0 | 30.0 | existing paid | used car | 3857.0 | <100 | 1<=X<4 | 4.0 | male div/sep | NaN | ... | life insurance | 40.0 | NaN | own | 1.0 | high qualif/self emp/mgmt | 1.0 | yes | yes | good |
| 997 | no checking | 12.0 | existing paid | radio/tv | 804.0 | <100 | >=7 | 4.0 | male single | NaN | ... | car | 38.0 | NaN | own | 1.0 | skilled | 1.0 | NaN | yes | good |
| 998 | <0 | 45.0 | existing paid | radio/tv | 1845.0 | <100 | 1<=X<4 | 4.0 | male single | NaN | ... | no known property | 23.0 | NaN | for free | 1.0 | skilled | 1.0 | yes | yes | bad |
| 999 | 0<=X<200 | 45.0 | critical/other existing credit | used car | 4576.0 | 100<=X<500 | unemployed | 3.0 | male single | NaN | ... | car | 27.0 | NaN | own | 1.0 | skilled | 1.0 | NaN | yes | good |
1000 rows × 21 columns
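
As a minimal sketch of the renaming step, `DataFrame.rename` with a `columns` mapping changes "class" to "label". The small `demo` frame and its values are stand-ins assumed for illustration, not the real `credit_customers.csv` data.

```python
import pandas as pd

# Hypothetical stand-in for the real data loaded from credit_customers.csv
demo = pd.DataFrame({"duration": [6.0, 48.0], "class": ["good", "bad"]})

# rename returns a new DataFrame; columns not named in the mapping are unchanged
renamed = demo.rename(columns={"class": "label"})

print(renamed.columns.tolist())
```

Because `rename` returns a new object by default, the result has to be assigned back (for example `df = df.rename(...)`) for the change to persist.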
| | label | label_new |
|---|---|---|
| 0 | good | NaN |
| 1 | bad | 1.0 |
| 2 | good | NaN |
| 3 | good | NaN |
| 4 | bad | 1.0 |
| ... | ... | ... |
| 995 | good | NaN |
| 996 | good | NaN |
| 997 | good | NaN |
| 998 | bad | 1.0 |
| 999 | good | NaN |
1000 rows × 2 columns
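
The cell that produced this table is not shown, but the pattern ("bad" becomes 1.0, "good" becomes NaN) is what `Series.map` yields when the mapping dictionary omits a value. The sketch below assumes that is the cause; the `labels` series and the 0/1 encoding are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the "label" column
labels = pd.Series(["good", "bad", "good", "good", "bad"], name="label")

# Assumption: the mapping only covers "bad", so "good" has no match and becomes NaN
partial = labels.map({"bad": 1})

# A mapping that covers every value leaves no NaN behind
full = labels.map({"good": 0, "bad": 1})

print(pd.DataFrame({"label": labels, "partial": partial, "full": full}))
```

The NaN entries also force the partially mapped column to a float dtype, which is why the table shows 1.0 rather than 1.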
\n", "