PyTorch-26H-5

Home page: https://www.freecodecamp.org/news/learn-pytorch-for-deep-learning-in-day/

YouTube: https://youtu.be/V_xro1bcAuA

GitHub: https://github.com/mrdbourke/pytorch-deep-learning

Learn PyTorch for Deep Learning: Zero to Mastery book: https://www.learnpytorch.io/

PyTorch documentation: https://pytorch.org/docs/stable/index.html

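The overall workflow: find a dataset, turn the dataset into numbers, then build a model (or find an existing one) that finds patterns in those numbers which can be used for prediction.

PyTorch has many built-in datasets used for a wide range of machine learning benchmarks, but you will often want to use your own custom dataset.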

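What is a custom dataset?

A custom dataset is a collection of data relating to the specific problem you are working on.

In essence, a custom dataset can consist of almost anything.

For example, if we were building a food image classification app like Nutrify, our custom dataset might be images of food.

Or, if we were building a model to classify whether a text-based review on a website is positive or negative, our custom dataset might be examples of existing customer reviews and their ratings.

Or, if we were building a sound classification app, our custom dataset might be sound samples alongside their sample labels.

Or, if we were building a recommendation system for customers buying things on our website, our custom dataset might be examples of products other people have bought.

PyTorch includes many existing functions for loading various custom datasets in the TorchVision, TorchText, TorchAudio and TorchRec domain libraries.

But sometimes these existing functions are not enough.

In that case, we can always subclass torch.utils.data.Dataset and customize it to our liking.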

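What we're going to cover

We will use torchvision.datasets as well as our own custom Dataset class to load images of food, then build a PyTorch computer vision model that will hopefully be able to classify them.

Topic	Contents
0. Importing PyTorch and setting up device-agnostic code	Load PyTorch and set the code up to be device-agnostic.
1. Get data	Use our own custom dataset of pizza, steak and sushi images.
2. Become one with the data (data preparation)	At the start of any new machine learning problem, it is paramount to understand the data you are working with. Here we take some steps to figure out what data we have.
3. Transforming data	Often, the data you get is not 100% ready to use with a machine learning model. Here we look at some steps for transforming the images so they can be used with a model.
4. Loading data with ImageFolder (option 1)	PyTorch has many built-in data loading functions for common data types. ImageFolder is helpful if our images are in standard image classification format.
5. Loading image data with a custom Dataset	What if PyTorch had no built-in function to load the data? This is where we can build our own custom subclass of torch.utils.data.Dataset.
6. Other forms of transforms (data augmentation)	Data augmentation is a common technique for expanding the diversity of training data. Here we explore some of torchvision's built-in data augmentation functions.
7. Model 0: TinyVGG without data augmentation	With the data ready, we build a model capable of fitting it and create some training and testing functions to train and evaluate the model.
8. Exploring loss curves	Loss curves are a great way to see how a model is training/improving over time. They are also a good way to see whether a model is underfitting or overfitting.
9. Model 1: TinyVGG with data augmentation	Having tried a model without data augmentation, how about trying one with it?
10. Compare model results	Compare the loss curves of the different models, see which performed better and discuss some options for improving performance.
11. Making a prediction on a custom image	The model is trained on a dataset of pizza, steak and sushi images. Here we cover how to use the trained model to predict on an image outside the existing dataset.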

0. Importing PyTorch and setting up device-agnostic code

import torch
from torch import nn

# Note: this notebook requires torch >= 1.10.0
torch.__version__

'2.4.1'


# Setup device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'
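As a minimal sketch of what device-agnostic code buys us, the snippet below moves a model and a batch to whichever device was selected and runs a forward pass. The nn.Linear layer and the random batch are hypothetical stand-ins, not part of the notebook; the same pattern should apply to the real model and data later on.

```python
import torch
from torch import nn

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-ins for a real model and a real batch of data.
model = nn.Linear(in_features=4, out_features=2).to(device)
batch = torch.rand(8, 4).to(device)

# Model and data live on the same device, so this runs on CPU or GPU unchanged.
logits = model(batch)
print(logits.shape, logits.device.type)
```

If the model and the data end up on different devices, PyTorch raises a runtime error, which is why both are moved with the same device variable.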

1. Get data

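Machine learning is an iterative process: start small, get something working and increase when necessary.

The data we will use is a subset of the Food101 dataset.
Food101 is a popular computer vision benchmark. It contains 1,000 images for each of 101 different kinds of food, 101,000 images in total (75,750 for training and 25,250 for testing).
Instead of starting with 101 food classes, we will start with 3: pizza, steak and sushi.
And instead of 1,000 images per class, we will start with a random 10% (start small, increase when necessary).

The original Food101 dataset and paper website.
torchvision.datasets.Food101 - the version of the data I downloaded for this notebook.
extras/04_custom_data_creation.ipynb - the notebook I used to format the Food101 dataset for use in this notebook.
data/pizza_steak_sushi.zip - the zip archive of pizza, steak and sushi images from Food101, created with the notebook linked above.

Note: The dataset we are about to use has been pre-formatted for our purposes. However, whatever problem you are working on, you will often have to format your own dataset. This is regular practice in machine learning.

The download has to be done from Google Colab.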

import requests
import zipfile
from pathlib import Path

# Setup path to data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# If the image folder doesn't exist, download it and prepare it... 
if image_path.is_dir():
    print(f"{image_path} directory exists.")
else:
    print(f"Did not find {image_path} directory, creating one...")
    image_path.mkdir(parents=True, exist_ok=True)
    
    # Download pizza, steak, sushi data
    with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
        request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
        print("Downloading pizza, steak, sushi data...")
        f.write(request.content)

    # Unzip pizza, steak, sushi data
    with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
        print("Unzipping pizza, steak, sushi data...") 
        zip_ref.extractall(image_path)


Did not find data/pizza_steak_sushi directory, creating one...
Downloading pizza, steak, sushi data...
Unzipping pizza, steak, sushi data...

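2. Become one with the data (data preparation)

Before starting a project or building any kind of model, it is important to know what data you are working with.

In our case, we have images of pizza, steak and sushi in standard image classification format.
Image classification format keeps images of different classes in separate directories, each named after its class.
For example, all images of pizza are contained in the pizza/ directory.
This format is popular across many image classification benchmarks, including ImageNet (the most popular computer vision benchmark dataset).
An example of the storage format is shown below; the image numbers are arbitrary.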

pizza_steak_sushi/ <- overall dataset folder
    train/ <- training images
        pizza/ <- class name as folder name
            image01.jpeg
            image02.jpeg
            ...
        steak/
            image24.jpeg
            image25.jpeg
            ...
        sushi/
            image37.jpeg
            ...

    test/ <- testing images
        pizza/
            image101.jpeg
            image102.jpeg
            ...
        steak/
            image154.jpeg
            image155.jpeg
            ...
        sushi/
            image167.jpeg
            ...
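To make the layout concrete, here is a small sketch that builds a throwaway copy of this folder structure and recovers each label from the name of the folder holding the image. The helper label_from_path and the empty placeholder files are assumptions for illustration only; they are not part of the notebook.

```python
import tempfile
from pathlib import Path


def label_from_path(image_path: Path) -> str:
    """Return the class name, i.e. the name of the folder holding the image."""
    return image_path.parent.name


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "pizza_steak_sushi"

    # Build a tiny fake dataset: empty files stand in for real images.
    for split in ("train", "test"):
        for class_name in ("pizza", "steak", "sushi"):
            class_dir = root / split / class_name
            class_dir.mkdir(parents=True)
            (class_dir / "image01.jpg").touch()

    # One level of "*" matches the class folder, the second part matches the image.
    train_paths = sorted((root / "train").glob("*/*.jpg"))
    labels = [label_from_path(p) for p in train_paths]

print(labels)
```

Because the label lives in the path, no separate annotation file is needed for this format.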

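The goal is to take this data storage structure and turn it into a dataset usable with PyTorch.

Note: The structure of the data you work with will vary depending on the problem you are working on. But the premise still remains: become one with the data, then find the best way to turn it into a dataset compatible with PyTorch.

We can inspect what is in our data directory by writing a small helper function that walks through each subdirectory and counts the files present.

To do so, we will use Python's built-in os.walk().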

import os
def walk_through_dir(dir_path):
  """
  Walks through dir_path returning its contents.
  Args:
    dir_path (str or pathlib.Path): target directory
  
  Returns:
    A print out of:
      number of subdirectories in dir_path
      number of images (files) in each subdirectory
      name of each subdirectory
  """
  for dirpath, dirnames, filenames in os.walk(dir_path):
    print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

walk_through_dir(image_path)

There are 2 directories and 0 images in 'data\pizza_steak_sushi'.
There are 3 directories and 0 images in 'data\pizza_steak_sushi\test'.
There are 0 directories and 25 images in 'data\pizza_steak_sushi\test\pizza'.
There are 0 directories and 19 images in 'data\pizza_steak_sushi\test\steak'.
There are 0 directories and 31 images in 'data\pizza_steak_sushi\test\sushi'.
There are 3 directories and 0 images in 'data\pizza_steak_sushi\train'.
There are 0 directories and 78 images in 'data\pizza_steak_sushi\train\pizza'.
There are 0 directories and 75 images in 'data\pizza_steak_sushi\train\steak'.
There are 0 directories and 72 images in 'data\pizza_steak_sushi\train\sushi'.

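There are about 75 images per training class and about 25 images per testing class. The images are a subset of the original Food101 dataset.

Now let's set up the training and testing paths.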

# Setup train and testing paths
train_dir = image_path / "train"
test_dir = image_path / "test"

train_dir, test_dir

(WindowsPath('data/pizza_steak_sushi/train'), WindowsPath('data/pizza_steak_sushi/test'))

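2.1 Visualize an image

1. Get all of the image paths using pathlib.Path.glob() to find all files ending in .jpg.
2. Pick a random image path using Python's random.choice().
3. Get the image class name using pathlib.Path.parent.stem.
4. Since we are working with images, open the random image path using PIL.Image.open() (PIL stands for Python Imaging Library).
5. Show the image and print some metadata.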

import random
from PIL import Image

# Set seed
random.seed(42) # <- try changing this and see what happens

# 1. Get all image paths (* means "any combination")
image_path_list = list(image_path.glob("*/*/*.jpg"))

# 2. Get random image path
random_image_path = random.choice(image_path_list)

# 3. Get image class from path name (the image class is the name of the directory where the image is stored)
image_class = random_image_path.parent.stem

# 4. Open image
img = Image.open(random_image_path)

# 5. Print metadata
print(f"Random image path: {random_image_path}")
print(f"Image class: {image_class}")
print(f"Image height: {img.height}") 
print(f"Image width: {img.width}")
img

Random image path: data\pizza_steak_sushi\test\sushi\2394442.jpg
Image class: sushi
Image height: 408
Image width: 512

We can do the same with matplotlib.pyplot.imshow(), except we have to convert the image to a NumPy array first.

import numpy as np
import matplotlib.pyplot as plt

# Turn the image into an array
img_as_array = np.asarray(img)

# Plot the image with matplotlib
plt.figure(figsize=(10, 7))
plt.imshow(img_as_array)
plt.title(f"Image class: {image_class} | Image shape: {img_as_array.shape} -> [height, width, color_channels]")
plt.axis(False);

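3. Transforming data

Before we can use our image data with PyTorch, we need to:

Turn it into tensors (numerical representations of our images).
Turn it into a torch.utils.data.Dataset and subsequently a torch.utils.data.DataLoader; we will call these Dataset and DataLoader for short.

PyTorch has several different kinds of pre-built datasets and dataset loaders, depending on the problem you are working on.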

Problem space	Pre-built datasets and functions
Vision	`torchvision.datasets`
Audio	`torchaudio.datasets`
Text	`torchtext.datasets`
Recommendation system	`torchrec.datasets`

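Since we are working on a vision problem, we will look at torchvision.datasets for our data loading functions and at torchvision.transforms for preparing our data.

Import the base libraries: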

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

3.1 Transforming data with `torchvision.transforms`

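We have folders of images, but before we can use them with PyTorch we need to convert them into tensors.
One way to do this is with the torchvision.transforms module.
torchvision.transforms contains many pre-built methods for formatting images, turning them into tensors and even manipulating them for data augmentation (the practice of altering data to make it harder for a model to learn, which we will see later).

Let's write a series of transform steps that:

1. Resize the images using transforms.Resize() (from about 512x512 to 64x64, the same shape as the images on the CNN Explainer website).
2. Flip the images randomly on the horizontal using transforms.RandomHorizontalFlip() (this can be considered a form of data augmentation because it artificially changes our image data).
3. Turn the images from PIL images into PyTorch tensors using transforms.ToTensor().

We can compile all of these steps using torchvision.transforms.Compose().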

# Write transform for image
data_transform = transforms.Compose([
    # Resize the images to 64x64
    transforms.Resize(size=(64, 64)),
    # Flip the images randomly on the horizontal
    transforms.RandomHorizontalFlip(p=0.5), # p = probability of flip, 0.5 = 50% chance
    # Turn the image into a torch.Tensor
    transforms.ToTensor() # this also converts all pixel values from 0 to 255 to be between 0.0 and 1.0
])
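To see roughly what transforms.ToTensor() does to an 8-bit RGB image, the sketch below performs the same two steps by hand on a tiny made-up image: reorder the dimensions from [height, width, color_channels] to [color_channels, height, width], then rescale the values from 0-255 to 0.0-1.0. This is an illustration with plain torch operations and hypothetical pixel values, not torchvision's implementation.

```python
import torch

# Hypothetical 2x2 RGB image laid out as [height, width, color_channels], dtype uint8.
image_hwc = torch.tensor(
    [[[0, 128, 255], [255, 255, 255]],
     [[64, 64, 64], [0, 0, 0]]],
    dtype=torch.uint8,
)

# Move the channel dimension first, convert to float and rescale to [0.0, 1.0].
image_chw = image_hwc.permute(2, 0, 1).float() / 255.0

print(image_chw.shape)
print(image_chw.dtype)
print(image_chw.min().item(), image_chw.max().item())
```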

Now that we have a composition of transforms, let's write a function to try them out on various images.

def plot_transformed_images(image_paths, transform, n=3, seed=42):
    """Plots a series of random images from image_paths.

    Will open n image paths from image_paths, transform them
    with transform and plot them side by side.

    Args:
        image_paths (list): List of target image paths. 
        transform (PyTorch Transforms): Transforms to apply to images.
        n (int, optional): Number of images to plot. Defaults to 3.
        seed (int, optional): Random seed for the random generator. Defaults to 42.
    """
    random.seed(seed)
    random_image_paths = random.sample(image_paths, k=n)
    for image_path in random_image_paths:
        with Image.open(image_path) as f:
            fig, ax = plt.subplots(1, 2)
            ax[0].imshow(f) 
            ax[0].set_title(f"Original \nSize: {f.size}")
            ax[0].axis("off")

            # Transform and plot image
            # Note: permute() will change shape of image to suit matplotlib 
            # (PyTorch default is [C, H, W] but Matplotlib is [H, W, C])
            transformed_image = transform(f).permute(1, 2, 0) 
            ax[1].imshow(transformed_image) 
            ax[1].set_title(f"Transformed \nSize: {transformed_image.shape}")
            ax[1].axis("off")

            fig.suptitle(f"Class: {image_path.parent.stem}", fontsize=16)

plot_transformed_images(image_path_list, 
                        transform=data_transform, 
                        n=3)

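We now have a way to convert our images to tensors using torchvision.transforms.

If needed, the transforms also manipulate the size and orientation of the images (some models prefer images of different sizes and shapes).
Generally, the larger the image, the more information a model can recover.
For example, an image of size [256, 256, 3] has 16 times as many values as an image of size [64, 64, 3] ((256*256*3) / (64*64*3) = 16).
The trade-off is that more pixels require more computation.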
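The size comparison above can be checked with a few lines of arithmetic. The helper name num_values is made up for this sketch.

```python
def num_values(height: int, width: int, color_channels: int = 3) -> int:
    """Number of individual values stored in an image of the given shape."""
    return height * width * color_channels


large = num_values(256, 256)  # 196608 values
small = num_values(64, 64)    # 12288 values
ratio = large / small

print(large, small, ratio)
```

Doubling the side length quadruples the number of values, so going from 64 to 256 pixels per side (4x) gives 4 * 4 = 16 times the data to process.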

Exercise: comment out one of the transforms in data_transform and run the plotting function plot_transformed_images() again. What happens?

4. Option 1: Loading Image Data Using `ImageFolder`

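It is time to turn our image data into a Dataset that can be used with PyTorch.
Since our data is in standard image classification format, we can use the class torchvision.datasets.ImageFolder.
We pass it the file path of a target image directory along with the series of transforms we would like to perform on the images.
Let's test it on our data folders train_dir and test_dir, passing in transform=data_transform to turn the images into tensors.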

# Use ImageFolder to create dataset(s)
from torchvision import datasets
train_data = datasets.ImageFolder(root=train_dir, # target folder of images
                                  transform=data_transform, # transforms to perform on data (images)
                                  target_transform=None) # transforms to perform on labels (if necessary)

test_data = datasets.ImageFolder(root=test_dir, 
                                 transform=data_transform)

print(f"Train data:\n{train_data}\nTest data:\n{test_data}")

Train data:
Dataset ImageFolder
    Number of datapoints: 225
    Root location: data\pizza_steak_sushi\train
    StandardTransform
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
               RandomHorizontalFlip(p=0.5)
               ToTensor()
           )
Test data:
Dataset ImageFolder
    Number of datapoints: 75
    Root location: data\pizza_steak_sushi\test
    StandardTransform
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
               RandomHorizontalFlip(p=0.5)
               ToTensor()
           )

Let's inspect the datasets by checking their classes and class_to_idx attributes as well as the lengths of the training and test sets.

# Get class names as a list
class_names = train_data.classes
class_names

['pizza', 'steak', 'sushi']

# Can also get class names as a dict
class_dict = train_data.class_to_idx
class_dict

{'pizza': 0, 'steak': 1, 'sushi': 2}
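A model outputs integer indexes, so it is often handy to have the reverse mapping as well. The sketch below inverts a class_to_idx dictionary; the dictionary is copied from the output above and predicted_idx is a hypothetical model prediction.

```python
# Mapping copied from the ImageFolder output above.
class_to_idx = {"pizza": 0, "steak": 1, "sushi": 2}

# Invert it so an integer prediction can be turned back into a class name.
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

predicted_idx = 2  # hypothetical model output
print(idx_to_class[predicted_idx])
```

Because ImageFolder numbers the classes in sorted order starting from 0, indexing the classes list (class_names[predicted_idx]) gives the same result.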

# Check the lengths
len(train_data), len(test_data)

(225, 75)

We can index on the train_data and test_data Datasets to find samples and their target labels.

img, label = train_data[0][0], train_data[0][1]
print(f"Image tensor:\n{img}")
print(f"Image shape: {img.shape}")
print(f"Image datatype: {img.dtype}")
print(f"Image label: {label}")
print(f"Label datatype: {type(label)}")

Image tensor:
tensor([[[0.1176, 0.1216, 0.1255,  ..., 0.0980, 0.1020, 0.1137],
         [0.1294, 0.1294, 0.1294,  ..., 0.0980, 0.0980, 0.1059],
         [0.1333, 0.1333, 0.1333,  ..., 0.0941, 0.0980, 0.1020],
         ...,
         [0.1686, 0.1647, 0.1686,  ..., 0.1255, 0.1098, 0.1098],
         [0.1686, 0.1647, 0.1686,  ..., 0.1098, 0.0941, 0.0863],
         [0.1647, 0.1647, 0.1686,  ..., 0.0980, 0.0863, 0.0863]],

        [[0.0588, 0.0588, 0.0588,  ..., 0.0745, 0.0706, 0.0745],
         [0.0627, 0.0627, 0.0627,  ..., 0.0745, 0.0706, 0.0706],
         [0.0706, 0.0706, 0.0706,  ..., 0.0745, 0.0745, 0.0706],
         ...,
         [0.2392, 0.2392, 0.2510,  ..., 0.1373, 0.1333, 0.1255],
         [0.2314, 0.2392, 0.2510,  ..., 0.1255, 0.1176, 0.1098],
         [0.2275, 0.2353, 0.2431,  ..., 0.1137, 0.1059, 0.1020]],

        [[0.0196, 0.0196, 0.0196,  ..., 0.0902, 0.0902, 0.0941],
         [0.0196, 0.0157, 0.0196,  ..., 0.0902, 0.0863, 0.0902],
         [0.0196, 0.0157, 0.0157,  ..., 0.0902, 0.0902, 0.0902],
         ...,
         [0.1804, 0.1882, 0.1961,  ..., 0.1490, 0.1333, 0.1294],
         [0.1804, 0.1843, 0.1922,  ..., 0.1255, 0.1137, 0.1098],
         [0.1765, 0.1804, 0.1843,  ..., 0.1059, 0.1020, 0.1059]]])
Image shape: torch.Size([3, 64, 64])
Image datatype: torch.float32
Image label: 0
Label datatype: <class 'int'>

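Our images are now in the form of a tensor (with shape [3, 64, 64]) and the labels are integers relating to a specific class (as referenced by the class_to_idx attribute).

How can we plot a single image tensor with matplotlib?
We first have to permute it (rearrange the order of its dimensions) so that it is compatible.
Right now the image dimensions are in CHW format (color channels, height, width), but matplotlib prefers HWC (height, width, color channels).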

# Rearrange the order of dimensions
img_permute = img.permute(1, 2, 0)

# Print out different shapes (before and after permute)
print(f"Original shape: {img.shape} -> [color_channels, height, width]")
print(f"Image permute shape: {img_permute.shape} -> [height, width, color_channels]")

# Plot the image
plt.figure(figsize=(10, 7))
plt.imshow(img.permute(1, 2, 0))
plt.axis("off")
plt.title(class_names[label], fontsize=14);

Original shape: torch.Size([3, 64, 64]) -> [color_channels, height, width]
Image permute shape: torch.Size([64, 64, 3]) -> [height, width, color_channels]

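Notice that the image is now more pixelated (lower quality).

This is because it was resized from 512x512 to 64x64 pixels.

The intuition here is that if you find it harder to recognize what is going on in the image, chances are a model will find it harder to understand too.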

4.1 Turn loaded images into DataLoader’s

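We have our images as PyTorch Datasets, so now let's turn them into DataLoaders.

We will do so using torch.utils.data.DataLoader.

Turning our Datasets into DataLoaders makes them iterable, so a model can go through them and learn the relationships between samples and targets (features and labels).

To keep things simple, we will use batch_size=1 and num_workers=1.

What is num_workers? It defines how many subprocesses will be created to load your data.
The higher num_workers is set, the more compute power PyTorch will use to load your data.
I usually set it to the total number of CPUs on my machine via Python's os.cpu_count().
This ensures the DataLoader recruits as many cores as possible to load data.

Note: See the torch.utils.data.DataLoader entry in the PyTorch documentation to get familiar with more of its parameters.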

# Turn train and test Datasets into DataLoaders
from torch.utils.data import DataLoader
train_dataloader = DataLoader(dataset=train_data, 
                              batch_size=1, # how many samples per batch?
                              num_workers=1, # how many subprocesses to use for data loading? (higher = more)
                              shuffle=True) # shuffle the data?

test_dataloader = DataLoader(dataset=test_data, 
                             batch_size=1, 
                             num_workers=1, 
                             shuffle=False) # don't usually need to shuffle testing data

train_dataloader, test_dataloader

(<torch.utils.data.dataloader.DataLoader at 0x277ca8a84c0>,
 <torch.utils.data.dataloader.DataLoader at 0x277ca8a83a0>)

Now that our data is iterable, let's try it out and check the shapes.

img, label = next(iter(train_dataloader))

# Batch size will now be 1, try changing the batch_size parameter above and see what happens
print(f"Image shape: {img.shape} -> [batch_size, color_channels, height, width]")
print(f"Label shape: {label.shape}")

Image shape: torch.Size([1, 3, 64, 64]) -> [batch_size, color_channels, height, width]
Label shape: torch.Size([1])
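To see how batch_size changes these shapes without needing the image files, here is a small sketch that uses random tensors as stand-in data. TensorDataset, the 10 fake samples and batch_size=4 are assumptions made for illustration; the notebook itself uses the ImageFolder datasets.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 10 random "images" of shape [3, 64, 64] with labels 0-2.
images = torch.rand(10, 3, 64, 64)
labels = torch.randint(low=0, high=3, size=(10,))
dataset = TensorDataset(images, labels)

# num_workers=0 loads data in the main process, which keeps the sketch portable.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)

# Collect the shape of every image batch the loader yields.
batch_shapes = [tuple(x.shape) for x, _ in loader]
print(batch_shapes)
```

With 10 samples and batch_size=4, the loader yields two full batches and one final batch of 2, since incomplete batches are kept by default (drop_last=False).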

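We could now use these DataLoaders with a training and testing loop to train a model.

But before we do, let's look at another option for loading images (or almost any other kind of data).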

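5. Option 2: Loading Image Data with a Custom `Dataset`

What if a pre-built Dataset creator like torchvision.datasets.ImageFolder() did not exist?

Or one for your specific problem did not exist? In that case, you can build your own.
What are the pros and cons of creating your own custom way to load a Dataset?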

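Pros of creating a custom Dataset	Cons of creating a custom Dataset
Can create a Dataset out of almost anything.	Even though you could create a Dataset out of almost anything, it does not mean it will work.
Not limited to PyTorch's pre-built Dataset functions.	Using a custom Dataset often means writing more code, which can be prone to errors or performance issues.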

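To see this in action, let's replicate torchvision.datasets.ImageFolder() by subclassing torch.utils.data.Dataset (the base class for all Datasets in PyTorch).

We start by importing the modules we need:

Python's os for dealing with directories (our data is stored in directories).
Python's pathlib for dealing with file paths (each of our images has a unique file path).
torch for all things PyTorch.
PIL's Image class for loading images.
torch.utils.data.Dataset to subclass and create our own custom Dataset.
torchvision.transforms to turn our images into tensors.
Various types from Python's typing module to add type hints to our code.

Note: You can customize the following steps for your own dataset. The premise remains: write code to load your data in the format you would like it.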

import os
import pathlib
import torch

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms
from typing import Tuple, Dict, List

Remember how our instances of torchvision.datasets.ImageFolder() allowed us to use the classes and class_to_idx attributes?

# Instance of torchvision.datasets.ImageFolder()
train_data.classes, train_data.class_to_idx

(['pizza', 'steak', 'sushi'], {'pizza': 0, 'steak': 1, 'sushi': 2})

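5.1 Creating a helper function to get class names

Let's write a helper function that, given a directory path, creates a list of class names and a dictionary of class names and their indexes.

1. Get the class names by traversing a target directory with os.scandir() (ideally the directory is in standard image classification format).
2. Raise an error if the class names are not found (if this happens, there might be something wrong with the directory structure).
3. Turn the class names into a dictionary of numerical labels, one for each class.

Let's look at a small example of step 1 before we write the full function.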

# Setup path for target directory
target_directory = train_dir
print(f"Target directory: {target_directory}")

# Get the class names from the target directory
class_names_found = sorted([entry.name for entry in list(os.scandir(image_path / "train"))])
print(f"Class names found: {class_names_found}")

Target directory: data\pizza_steak_sushi\train
Class names found: ['pizza', 'steak', 'sushi']

How about we turn it into a full function?

# Make function to find classes in target directory
def find_classes(directory: str) -> Tuple[List[str], Dict[str, int]]:
    """Finds the class folder names in a target directory.
    
    Assumes target directory is in standard image classification format.

    Args:
        directory (str): target directory to load classnames from.

    Returns:
        Tuple[List[str], Dict[str, int]]: (list_of_class_names, dict(class_name: idx...))
    
    Example:
        find_classes("food_images/train")
        >>> (["class_1", "class_2"], {"class_1": 0, ...})
    """
    # 1. Get the class names by scanning the target directory
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    
    # 2. Raise an error if class names not found
    if not classes:
        raise FileNotFoundError(f"Couldn't find any classes in {directory}.")
        
    # 3. Create a dictionary of index labels (computers prefer numerical rather than string labels)
    class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}
    return classes, class_to_idx

find_classes(train_dir)

(['pizza', 'steak', 'sushi'], {'pizza': 0, 'steak': 1, 'sushi': 2})
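The error path in step 2 is easy to overlook, so here is a small sketch that exercises it on a temporary directory. The find_classes definition is a compact copy of the helper above so the snippet runs on its own; the folder and file names are made up.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def find_classes(directory: str) -> Tuple[List[str], Dict[str, int]]:
    # Compact copy of the helper above so this sketch runs on its own.
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    if not classes:
        raise FileNotFoundError(f"Couldn't find any classes in {directory}.")
    return classes, {cls_name: i for i, cls_name in enumerate(classes)}


with tempfile.TemporaryDirectory() as tmp:
    # An empty directory has no class folders, so the helper should raise.
    try:
        find_classes(tmp)
        raised = False
    except FileNotFoundError:
        raised = True

    # Add two class folders plus a stray file, which entry.is_dir() filters out.
    os.mkdir(os.path.join(tmp, "sushi"))
    os.mkdir(os.path.join(tmp, "pizza"))
    open(os.path.join(tmp, "README.txt"), "w").close()
    classes, class_to_idx = find_classes(tmp)

print(raised, classes, class_to_idx)
```

Sorting the folder names keeps the index assignment stable regardless of the order in which the operating system lists the directory.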

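5.2 Create a custom `Dataset` to replicate `ImageFolder`

Now we are ready to build our own custom Dataset.
We will build one that replicates the functionality of torchvision.datasets.ImageFolder().
This is good practice, and it also reveals some of the steps required to make your own custom Dataset.

The steps are:

1. Subclass torch.utils.data.Dataset.
2. Initialize our subclass with a targ_dir parameter (the target data directory) and a transform parameter (so we have the option to transform our data if needed).
3. Create several attributes for paths (the paths of our target images), transform (the transforms we might like to use, which can be None), classes and class_to_idx (from our find_classes() function).
4. Create a function to load images from file and return them; this could use PIL or torchvision.io (for input/output of vision data).
5. Overwrite the __len__ method of torch.utils.data.Dataset to return the number of samples in the Dataset. This is recommended but not required, and it lets you call len(Dataset).
6. Overwrite the __getitem__ method of torch.utils.data.Dataset to return a single sample from the Dataset. This is required.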

# Write a custom dataset class (inherits from torch.utils.data.Dataset)
from torch.utils.data import Dataset

# 1. Subclass torch.utils.data.Dataset
class ImageFolderCustom(Dataset):
    
    # 2. Initialize with a targ_dir and transform (optional) parameter
    def __init__(self, targ_dir: str, transform=None) -> None:
        
        # 3. Create class attributes
        # Get all image paths
        self.paths = list(pathlib.Path(targ_dir).glob("*/*.jpg")) # note: you'd have to update this if you've got .png's or .jpeg's
        
        # Setup transforms
        self.transform = transform
        
        # Create classes and class_to_idx attributes
        self.classes, self.class_to_idx = find_classes(targ_dir)

    # 4. Make function to load images
    def load_image(self, index: int) -> Image.Image:
        "Opens an image via a path and returns it."
        image_path = self.paths[index]
        return Image.open(image_path) 
    
    # 5. Overwrite the __len__() method (optional but recommended for subclasses of torch.utils.data.Dataset)
    def __len__(self) -> int:
        "Returns the total number of samples."
        return len(self.paths)
    
    # 6. Overwrite the __getitem__() method (required for subclasses of torch.utils.data.Dataset)
    def __getitem__(self, index: int) -> Tuple[torch.Tensor, int]:
        "Returns one sample of data, data and label (X, y)."
        img = self.load_image(index)
        class_name  = self.paths[index].parent.name # expects path in data_folder/class_name/image.jpeg
        class_idx = self.class_to_idx[class_name]

        # Transform if necessary
        if self.transform:
            return self.transform(img), class_idx # return data, label (X, y)
        else:
            return img, class_idx # return data, label (X, y)

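That is a whole bunch of code just to load images. This is one of the downsides of creating your own custom Dataset.

However, now that we have written it once, we could move it into a .py file such as data_loader.py along with some other helpful data functions and reuse it later.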
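The custom class above only picks up .jpg files, as its inline note points out. One possible way to accept several extensions is sketched below: list everything one level under the class folders and filter by suffix. The helper name find_image_paths and the IMAGE_EXTENSIONS set are assumptions for this sketch, not part of the notebook.

```python
import tempfile
from pathlib import Path
from typing import List

# Assumed set of extensions for this sketch; extend it to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_image_paths(targ_dir: str) -> List[Path]:
    """Return sorted image paths found at targ_dir/class_name/image.ext."""
    return sorted(
        p for p in Path(targ_dir).glob("*/*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


with tempfile.TemporaryDirectory() as tmp:
    # Empty placeholder files stand in for real images.
    for name in ("pizza/a.jpg", "pizza/b.PNG", "steak/c.jpeg", "steak/notes.txt"):
        path = Path(tmp) / name
        path.parent.mkdir(exist_ok=True)
        path.touch()
    found = [p.name for p in find_image_paths(tmp)]

print(found)
```

Lower-casing the suffix lets files such as b.PNG through, while non-image files like notes.txt are skipped.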

Before we test out our new ImageFolderCustom class, let's create some transforms to prepare our images.

# Augment train data
train_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor()
])

# Don't augment test data, only reshape
test_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor()
])

Let's turn our training images (contained in train_dir) and our testing images (contained in test_dir) into Datasets using our own ImageFolderCustom class.

train_data_custom = ImageFolderCustom(targ_dir=train_dir, 
                                      transform=train_transforms)
test_data_custom = ImageFolderCustom(targ_dir=test_dir, 
                                     transform=test_transforms)
train_data_custom, test_data_custom

(<__main__.ImageFolderCustom at 0x277ca8b76a0>,
 <__main__.ImageFolderCustom at 0x277ca8b7d90>)

Let's try calling len() on our new Datasets and find the classes and class_to_idx attributes.

len(train_data_custom), len(test_data_custom)

(225, 75)

train_data_custom.classes

['pizza', 'steak', 'sushi']

train_data_custom.class_to_idx

{'pizza': 0, 'steak': 1, 'sushi': 2}

len(train_data_custom) == len(train_data) and len(test_data_custom) == len(test_data)

We can also check for equality with the Datasets created by the torchvision.datasets.ImageFolder() class.

# Check for equality amongst our custom Dataset and ImageFolder Dataset
print((len(train_data_custom) == len(train_data)) & (len(test_data_custom) == len(test_data)))
print(train_data_custom.classes == train_data.classes)
print(train_data_custom.class_to_idx == train_data.class_to_idx)


True
True
True

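How about we take it up a notch and plot some random images to test our __getitem__ override?

5.3 Create a function to display random images

Let's create a helper function called display_random_images() that helps us visualize images in our Datasets.

1. Take in a Dataset and a number of other parameters such as classes (the names of our target classes), the number of images to display (n) and a random seed.
2. To prevent the display getting out of hand, cap n at 10 images.
3. Set the random seed for reproducible plots (if seed is set).
4. Get a list of random sample indexes (we can use Python's random.sample() for this) to plot.
5. Set up a matplotlib plot.
6. Loop through the random sample indexes found in step 4 and plot them with matplotlib.
7. Make sure the sample images are of shape HWC (height, width, color channels) so we can plot them.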

# 1. Take in a Dataset as well as a list of class names
def display_random_images(dataset: torch.utils.data.dataset.Dataset,
                          classes: List[str] = None,
                          n: int = 10,
                          display_shape: bool = True,
                          seed: int = None):
    
    # 2. Adjust display if n too high
    if n > 10:
        n = 10
        display_shape = False
        print(f"For display purposes, n shouldn't be larger than 10, setting to 10 and removing shape display.")
    
    # 3. Set random seed (compare against None so that seed=0 also works)
    if seed is not None:
        random.seed(seed)

    # 4. Get random sample indexes
    random_samples_idx = random.sample(range(len(dataset)), k=n)

    # 5. Setup plot
    plt.figure(figsize=(16, 8))

    # 6. Loop through samples and display random samples 
    for i, targ_sample in enumerate(random_samples_idx):
        targ_image, targ_label = dataset[targ_sample][0], dataset[targ_sample][1]

        # 7. Adjust image tensor shape for plotting: [color_channels, height, width] -> [height, width, color_channels]
        targ_image_adjust = targ_image.permute(1, 2, 0)

        # Plot adjusted samples
        plt.subplot(1, n, i+1)
        plt.imshow(targ_image_adjust)
        plt.axis("off")
        # Only build and set a title when class names were passed in
        if classes:
            title = f"class: {classes[targ_label]}"
            if display_shape:
                title = title + f"\nshape: {targ_image_adjust.shape}"
            plt.title(title)

Let's test it first with the Dataset we created with torchvision.datasets.ImageFolder().

# Display random images from ImageFolder created Dataset
display_random_images(train_data, 
                      n=5, 
                      classes=class_names,
                      seed=None)

And now with the Dataset we created with our own ImageFolderCustom.

# Display random images from ImageFolderCustom Dataset
display_random_images(train_data_custom, 
                      n=12, 
                      classes=class_names,
                      seed=None) # Try setting the seed for reproducible images
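The effect of the seed can be checked without plotting anything. The sketch below isolates the index-sampling step; the helper name sample_indexes is made up, and 225 is the training set size reported earlier.

```python
import random
from typing import List, Optional


def sample_indexes(dataset_len: int, n: int, seed: Optional[int] = None) -> List[int]:
    """Pick n distinct random indexes in [0, dataset_len), optionally seeded."""
    if seed is not None:
        random.seed(seed)
    return random.sample(range(dataset_len), k=n)


first = sample_indexes(225, 5, seed=42)
second = sample_indexes(225, 5, seed=42)

print(first == second)       # same seed, same indexes
print(len(set(first)) == 5)  # random.sample never repeats an index
```

With the same seed the two calls should select the same images, which is what makes a plot reproducible; with seed=None each call draws a fresh selection.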

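5.4 Turn custom loaded images into DataLoader's

We now have a way to turn our raw images into Datasets (features mapped to labels, or X mapped to y) through our ImageFolderCustom class.
How can we turn our custom Datasets into DataLoaders?
By using torch.utils.data.DataLoader().
Because our custom Datasets subclass torch.utils.data.Dataset, we can use them directly with torch.utils.data.DataLoader().
We can use very similar steps to before, except this time we will use our custom-created Datasets.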

# Turn train and test custom Dataset's into DataLoader's
from torch.utils.data import DataLoader
train_dataloader_custom = DataLoader(dataset=train_data_custom, # use custom created train Dataset
                                     batch_size=1, # how many samples per batch?
                                     num_workers=0, # how many subprocesses to use for data loading? (higher = more)
                                     shuffle=True) # shuffle the data?

test_dataloader_custom = DataLoader(dataset=test_data_custom, # use custom created test Dataset
                                    batch_size=1, 
                                    num_workers=0, 
                                    shuffle=False) # don't usually need to shuffle testing data

train_dataloader_custom, test_dataloader_custom

(<torch.utils.data.dataloader.DataLoader at 0x277cc3e9580>,
 <torch.utils.data.dataloader.DataLoader at 0x277cc3e94c0>)

Do the shapes of the samples look the same?

# Get image and label from custom DataLoader
img_custom, label_custom = next(iter(train_dataloader_custom))

# Batch size will now be 1, try changing the batch_size parameter above and see what happens
print(f"Image shape: {img_custom.shape} -> [batch_size, color_channels, height, width]")
print(f"Label shape: {label_custom.shape}")

Image shape: torch.Size([1, 3, 64, 64]) -> [batch_size, color_channels, height, width]
Label shape: torch.Size([1])

Now let's take a look at some other forms of data transforms.

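6. Other forms of transforms (data augmentation)

There are many more transforms that can be applied to data; see the torchvision.transforms documentation.

The purpose of a transform is to alter an image in some way. That may be turning the image into a tensor, cropping it, randomly erasing a portion or randomly rotating it. Doing these kinds of transforms is often referred to as data augmentation.

Data augmentation is the process of altering your data in such a way that you artificially increase the diversity of your training set.
Training a model on this artificially altered dataset hopefully results in a model that generalizes better (the patterns it learns are more robust to future unseen examples).

You can see many different examples of data augmentation performed on images with torchvision.transforms in the Illustration of Transforms example.

Machine learning is all about harnessing the power of randomness, and research shows that random transforms (like transforms.RandAugment() and transforms.TrivialAugmentWide()) generally perform better than hand-picked transforms. The idea behind TrivialAugment is simple:

You have a set of transforms, you randomly pick a number of them to perform on an image, and you apply them at a random magnitude within a given range (a higher magnitude means more intense).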
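The random selection can be illustrated with a toy sketch in plain Python. It picks a single operation and a magnitude bin uniformly at random per image, which is how the TrivialAugment paper describes the method; the operation names, the helper pick_augmentation and the 0-1 strength scale are made up for illustration, and this is not torchvision's implementation.

```python
import random

# Hypothetical operation names; torchvision's TrivialAugmentWide defines its own set.
OPERATIONS = ["rotate", "shear_x", "brightness", "contrast"]


def pick_augmentation(num_magnitude_bins: int = 31, rng=random):
    """Randomly choose one operation and one magnitude bin for a single image."""
    op = rng.choice(OPERATIONS)
    magnitude_bin = rng.randrange(num_magnitude_bins)
    # Map the bin to a 0.0-1.0 strength (0.0 = weakest, 1.0 = strongest).
    strength = magnitude_bin / (num_magnitude_bins - 1)
    return op, magnitude_bin, strength


rng = random.Random(0)  # seeded generator so repeated runs pick the same values
picks = [pick_augmentation(rng=rng) for _ in range(5)]

for op, magnitude_bin, strength in picks:
    print(f"{op:<10} bin={magnitude_bin:>2} strength={strength:.2f}")
```

Nothing is tuned by hand here: every image gets a fresh random choice, which is the appeal of this family of augmentations.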

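The PyTorch team even used TrivialAugment to train their latest state-of-the-art vision models.

TrivialAugment was one of the ingredients used in a recent state-of-the-art training upgrade to various PyTorch vision models.

How about we test it out on some of our own images?

The main parameter to pay attention to in transforms.TrivialAugmentWide() is num_magnitude_bins=31. It defines how much of a range an intensity value will be picked from to apply a certain transform, with 0 being no range and 31 being the maximum range (highest chance of the highest intensity).
We can incorporate transforms.TrivialAugmentWide() into transforms.Compose().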

from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.TrivialAugmentWide(num_magnitude_bins=31), # how intense 
    transforms.ToTensor() # use ToTensor() last to get everything between 0 & 1
])

# Don't need to perform augmentation on the test data
test_transforms = transforms.Compose([
    transforms.Resize((224, 224)), 
    transforms.ToTensor()
])

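Note: You usually do not perform data augmentation on the test set. The idea of data augmentation is to artificially increase the diversity of the training set so the model predicts better on the test set.
However, you do need to make sure your test set images are turned into tensors. We also resize the test images to the same size as the training images; inference can be done on images of a different size if necessary (though this may alter performance).

We now have a training transform (with data augmentation) and a test transform (without data augmentation).

Let's test our data augmentation out: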

# Get all image paths
image_path_list = list(image_path.glob("*/*/*.jpg"))

# Plot random images
plot_transformed_images(
    image_paths=image_path_list,
    transform=train_transforms,
    n=3,
    seed=None
)

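Try running the cell above a few times and see how the original image changes as it goes through the transform.

7. Model 0: TinyVGG without data augmentation

We have seen how to turn our data from images in folders into transformed tensors.

Now let's build a computer vision model to see if we can classify whether an image is of pizza, steak or sushi.

To begin, we will start with a simple transform that only resizes the images to (64, 64) and turns them into tensors.

7.1 Creating transforms and loading data for Model 0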

# Create simple transform
simple_transform = transforms.Compose([ 
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

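Now that we have a simple transform, let's:

Load the data, first turning each of our training and test folders into a Dataset with torchvision.datasets.ImageFolder().
Then turn each Dataset into a DataLoader using torch.utils.data.DataLoader().
We will set batch_size=32 and num_workers to the number of CPUs on our machine (this depends on the machine you are using).

# 1. Load and transform data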
from torchvision import datasets
train_data_simple = datasets.ImageFolder(root=train_dir, transform=simple_transform)
test_data_simple = datasets.ImageFolder(root=test_dir, transform=simple_transform)

# 2. Turn data into DataLoaders
import os
from torch.utils.data import DataLoader

# Setup batch size and number of workers
BATCH_SIZE = 32
NUM_WORKERS = os.cpu_count()
print(f"Creating DataLoader's with batch size {BATCH_SIZE} and {NUM_WORKERS} workers.")

# Create DataLoader's
train_dataloader_simple = DataLoader(train_data_simple, 
                                     batch_size=BATCH_SIZE, 
                                     shuffle=True, 
                                     num_workers=NUM_WORKERS)

test_dataloader_simple = DataLoader(test_data_simple, 
                                    batch_size=BATCH_SIZE, 
                                    shuffle=False, 
                                    num_workers=NUM_WORKERS)

train_dataloader_simple, test_dataloader_simple

Creating DataLoader's with batch size 32 and 96 workers.
(<torch.utils.data.dataloader.DataLoader at 0x277ce4fc220>,
 <torch.utils.data.dataloader.DataLoader at 0x277ce4fcca0>)
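
The printout above shows 96 workers on the machine used here. For a dataset of a few hundred small images, starting that many worker processes can cost more than it saves, and os.cpu_count() can return None when the CPU count cannot be determined. A minimal sketch of a more defensive choice follows. The helper name pick_num_workers and the cap of 4 are hypothetical, not part of the original notebook or of PyTorch.

```python
import os

def pick_num_workers(max_workers: int = 4) -> int:
    """Return a worker count between 1 and max_workers (hypothetical helper)."""
    # os.cpu_count() may return None when the count cannot be determined
    available = os.cpu_count() or 1
    return max(1, min(max_workers, available))

NUM_WORKERS = pick_num_workers()
print(f"Using {NUM_WORKERS} DataLoader workers")
```

The returned value can be passed as num_workers to DataLoader in place of os.cpu_count(). Whether a cap helps depends on the machine and dataset, so it is worth timing both settings.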

7.2 Create TinyVGG model class

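In the previous section, we used the TinyVGG model from the CNN Explainer website.

Let's recreate the same model, except this time we'll use color images instead of grayscale (in_channels=3 instead of in_channels=1 for RGB pixels).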

class TinyVGG(nn.Module):
    """
    Model architecture copying TinyVGG from: 
    https://poloclub.github.io/cnn-explainer/
    """
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape, 
                      out_channels=hidden_units, 
                      kernel_size=3, # how big is the square that's going over the image?
                      stride=1, # default
                      padding=1), # options = "valid" (no padding) or "same" (output has same shape as input) or int for specific number 
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units, 
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2,
                         stride=2) # default stride value is same as kernel_size
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # Where did this in_features shape come from? 
            # It's because each layer of our network compresses and changes the shape of our input data.
            nn.Linear(in_features=hidden_units*16*16,
                      out_features=output_shape)
        )
    
    def forward(self, x: torch.Tensor):
        x = self.conv_block_1(x)
        # print(x.shape)
        x = self.conv_block_2(x)
        # print(x.shape)
        x = self.classifier(x)
        # print(x.shape)
        return x
        # return self.classifier(self.conv_block_2(self.conv_block_1(x))) # <- leverage the benefits of operator fusion

torch.manual_seed(42)
model_0 = TinyVGG(input_shape=3, # number of color channels (3 for RGB) 
                  hidden_units=10, 
                  output_shape=len(train_data.classes)).to(device)
model_0

TinyVGG(
  (conv_block_1): Sequential(
    (0): Conv2d(3, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv_block_2): Sequential(
    (0): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=2560, out_features=3, bias=True)
  )
)
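
The in_features=2560 in the classifier can be checked by hand. The sketch below uses the standard output-size formula for convolution and pooling layers without dilation, floor((size + 2 * padding - kernel_size) / stride) + 1, and applies it to a 64x64 input. The helper conv_out is a name made up for this illustration.

```python
def conv_out(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a conv or pooling layer on a square input (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

hidden_units = 10
size = 64  # images are resized to 64x64 by simple_transform

for block in range(2):
    size = conv_out(size, kernel_size=3, stride=1, padding=1)  # first Conv2d keeps the size
    size = conv_out(size, kernel_size=3, stride=1, padding=1)  # second Conv2d keeps the size
    size = conv_out(size, kernel_size=2, stride=2)             # MaxPool2d halves the size
    print(f"After conv_block_{block + 1}: {size}x{size}")

in_features = hidden_units * size * size
print(f"in_features: {in_features}")  # 2560
```

Each block halves the height and width (64 -> 32 -> 16), so the flattened feature vector has 10 * 16 * 16 = 2560 values, which is where hidden_units*16*16 comes from.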

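Note: One way to speed up deep learning computations on a GPU is to take advantage of operator fusion.
In the forward() method of the model above, this means that instead of calling one layer block at a time and reassigning x each time, we call the blocks in a single chained expression (see the commented-out last line of forward() above for an example).
This saves the time spent reassigning x (which is memory heavy) and focuses only on computing on x.
See Making Deep Learning Go Brrrr From First Principles by Horace He for more ways to speed up machine learning models.

Now that's a nice-looking model! How about we test it with a forward pass on a single image?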

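7.3 Try a forward pass on a single image (to test the model)

A good way to test a model is to do a forward pass on a single piece of data. It's also a handy way to check the input and output shapes of the different layers.

To do a forward pass on a single image, let's:

1. Get a batch of images and labels from the DataLoader.
2. Get a single image from the batch and unsqueeze() it so it has a batch size of 1 (so its shape fits the model).
3. Perform inference on the single image (making sure to send the image to the target device).
4. Print out what's happening and convert the model's raw output logits to prediction probabilities with torch.softmax() (since we're working with multi-class data), then convert the prediction probabilities to a prediction label with torch.argmax().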

# 1. Get a batch of images and labels from the DataLoader
img_batch, label_batch = next(iter(train_dataloader_simple))

# 2. Get a single image from the batch and unsqueeze the image so its shape fits the model
img_single, label_single = img_batch[0].unsqueeze(dim=0), label_batch[0]
print(f"Single image shape: {img_single.shape}\n")

# 3. Perform a forward pass on a single image
model_0.eval()
with torch.inference_mode():
    pred = model_0(img_single.to(device))

# 4. Print out what's happening and convert model logits -> pred probs -> pred label
print(f"Output logits:\n{pred}\n")
print(f"Output prediction probabilities:\n{torch.softmax(pred, dim=1)}\n")
print(f"Output prediction label:\n{torch.argmax(torch.softmax(pred, dim=1), dim=1)}\n")
print(f"Actual label:\n{label_single}")

Single image shape: torch.Size([1, 3, 64, 64])

Output logits:
tensor([[0.0578, 0.0634, 0.0352]], device='cuda:0')

Output prediction probabilities:
tensor([[0.3352, 0.3371, 0.3277]], device='cuda:0')

Output prediction label:
tensor([1], device='cuda:0')

Actual label:
2

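Great, it looks like our model is outputting what we'd expect it to.
You can run the cell above a few times, predicting on a different image each time.
You'll notice the predictions are often wrong.
This is to be expected: the model hasn't been trained yet, so it's essentially guessing with random weights.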
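
To make the logits -> probabilities -> label conversion concrete, here is a small sketch in plain Python that redoes the softmax and argmax by hand on the logits printed above. It only illustrates the arithmetic; in practice torch.softmax() and torch.argmax() do this for whole batches.

```python
import math

logits = [0.0578, 0.0634, 0.0352]  # copied from the "Output logits" printout above

# softmax: exponentiate each logit, then divide by the sum of the exponentials
exps = [math.exp(value) for value in logits]
probs = [e / sum(exps) for e in exps]

# argmax: index of the largest probability
pred_label = max(range(len(probs)), key=lambda i: probs[i])

print([round(p, 4) for p in probs])  # [0.3352, 0.3371, 0.3277]
print(pred_label)                    # 1
```

The three probabilities are all close to 1/3, which is what near-random weights should give on a three-class problem.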

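7.4 Use torchinfo to get an idea of the shapes going through our model

Printing out our model with print(model) gives us an idea of what's going on inside it.

We can also print out the shapes of our data throughout the forward() method.

However, a helpful way to get information from our model is to use torchinfo.

torchinfo comes with a summary() function that takes a PyTorch model as well as an input_size and returns what happens as a tensor moves through the model.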

# Install torchinfo if it's not available, import it if it is
try: 
    import torchinfo
except ImportError:
    !pip install torchinfo
    import torchinfo

# pip install torchinfo    
from torchinfo import summary
summary(model_0, input_size=[1, 3, 64, 64]) # do a test pass through of an example input size

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
TinyVGG                                  [1, 3]                    --
├─Sequential: 1-1                        [1, 10, 32, 32]           --
│    └─Conv2d: 2-1                       [1, 10, 64, 64]           280
│    └─ReLU: 2-2                         [1, 10, 64, 64]           --
│    └─Conv2d: 2-3                       [1, 10, 64, 64]           910
│    └─ReLU: 2-4                         [1, 10, 64, 64]           --
│    └─MaxPool2d: 2-5                    [1, 10, 32, 32]           --
├─Sequential: 1-2                        [1, 10, 16, 16]           --
│    └─Conv2d: 2-6                       [1, 10, 32, 32]           910
│    └─ReLU: 2-7                         [1, 10, 32, 32]           --
│    └─Conv2d: 2-8                       [1, 10, 32, 32]           910
│    └─ReLU: 2-9                         [1, 10, 32, 32]           --
│    └─MaxPool2d: 2-10                   [1, 10, 16, 16]           --
├─Sequential: 1-3                        [1, 3]                    --
│    └─Flatten: 2-11                     [1, 2560]                 --
│    └─Linear: 2-12                      [1, 3]                    7,683
==========================================================================================
Total params: 10,693
Trainable params: 10,693
Non-trainable params: 0
Total mult-adds (M): 6.75
==========================================================================================
Input size (MB): 0.05
Forward/backward pass size (MB): 0.82
Params size (MB): 0.04
Estimated Total Size (MB): 0.91
==========================================================================================

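The output of torchinfo.summary() gives us a whole bunch of information about our model.
Examples are Total params, the total number of parameters in the model, and Estimated Total Size (MB), the size of the model.
You can also see how the input and output shapes change as data of a given input_size moves through the model.
Right now, our parameter count and total model size are low.
This is because we're starting with a small model.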
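
The Param # column can be reproduced with a little arithmetic. The sketch below counts weights and biases for each layer using the layer sizes shown in the summary; the two helper functions are names made up for this illustration.

```python
def conv2d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    # each output channel has one kernel_size x kernel_size filter per input channel, plus a bias
    return (kernel_size * kernel_size * in_channels + 1) * out_channels

def linear_params(in_features: int, out_features: int) -> int:
    # each output feature has one weight per input feature, plus a bias
    return (in_features + 1) * out_features

layer_params = [
    conv2d_params(3, 10, 3),   # Conv2d: 2-1
    conv2d_params(10, 10, 3),  # Conv2d: 2-3
    conv2d_params(10, 10, 3),  # Conv2d: 2-6
    conv2d_params(10, 10, 3),  # Conv2d: 2-8
    linear_params(2560, 3),    # Linear: 2-12
]
total_params = sum(layer_params)

print(layer_params)  # [280, 910, 910, 910, 7683]
print(total_params)  # 10693
print(f"{total_params * 4 / 1e6:.2f} MB")  # float32 uses 4 bytes per parameter
```

ReLU, MaxPool2d and Flatten have no learnable parameters, which is why they show "--" in the summary. Most of the parameters sit in the final Linear layer.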

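7.5 Create train & test loop functions

We've got data and we've got a model.

Now let's make some training and test loop functions to train our model on the training data and evaluate it on the test data.

To make sure we can reuse these training and test loops, we'll functionize them.

Specifically, we're going to make three functions:

1. train_step() - takes in a model, a DataLoader, a loss function and an optimizer and trains the model on the DataLoader.
2. test_step() - takes in a model, a DataLoader and a loss function and evaluates the model on the DataLoader.
3. train() - performs 1. and 2. together for a given number of epochs and returns a results dictionary.

Note: We covered the steps in a PyTorch optimization loop in notebook 01, as well as the Unofficial PyTorch Optimization Loop Song, and we built similar functions in notebook 03.

train_step()

Because we're dealing with batches in the DataLoader, we'll accumulate the model loss and accuracy values during training (by adding them up for each batch) and then divide them by the number of batches at the end to get per-batch averages before returning them.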

def train_step(model: torch.nn.Module,
               dataloader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer):
    # Put model in train mode
    model.train()

    # Setup train loss and train accuracy values
    train_loss, train_acc = 0, 0

    # Loop through data loader data batches
    for batch, (X, y) in enumerate(dataloader):
        # Send data to target device
        X, y = X.to(device), y.to(device)

        # 1. Forward pass
        y_pred = model(X)

        # 2. Calculate and accumulate loss
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()

        # 3. Optimizer zero grad
        optimizer.zero_grad()

        # 4. Loss backward
        loss.backward()

        # 5. Optimizer step
        optimizer.step()

        # Calculate and accumulate accuracy metrics across all batches
        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item()/len(y_pred)

    # Adjust metrics to get average loss and accuracy per batch
    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
    return train_loss, train_acc
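
One thing to keep in mind when reading these numbers: train_step() averages the per-batch accuracies, so a small final batch counts as much as a full one. This dataset has 225 training images, which with a batch size of 32 gives 8 batches, the last holding a single image. The sketch below shows how the per-batch average can differ from the overall accuracy. The per-batch correct counts are hypothetical and chosen only for illustration.

```python
import math

num_samples = 225  # training images in this dataset
batch_size = 32

# With drop_last=False (the DataLoader default), len(dataloader) is the ceiling of this division
num_batches = math.ceil(num_samples / batch_size)

batch_sizes = [batch_size] * (num_samples // batch_size)
if num_samples % batch_size:
    batch_sizes.append(num_samples % batch_size)  # the smaller final batch

# Hypothetical number of correct predictions in each batch (illustration only)
correct = [16, 12, 20, 8, 16, 24, 12, 1]

mean_of_batch_acc = sum(c / n for c, n in zip(correct, batch_sizes)) / num_batches
overall_acc = sum(correct) / num_samples

print(num_batches, batch_sizes[-1])                          # 8 1
print(f"mean of per-batch accuracy: {mean_of_batch_acc:.4f}")
print(f"overall accuracy:           {overall_acc:.4f}")
```

If exact per-sample accuracy matters, one option is to accumulate the number of correct predictions and divide by the number of samples at the end instead.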

test_step()

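The main difference here is that test_step() doesn't take in an optimizer and therefore doesn't perform gradient descent.
But since we're doing inference, we'll make sure to turn on the torch.inference_mode() context manager for making predictions.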

def test_step(model: torch.nn.Module,
              dataloader: torch.utils.data.DataLoader,
              loss_fn: torch.nn.Module):
    # Put model in eval mode
    model.eval()

    # Setup test loss and test accuracy values
    test_loss, test_acc = 0, 0

    # Turn on inference context manager
    with torch.inference_mode():
        # Loop through DataLoader batches
        for batch, (X, y) in enumerate(dataloader):
            # Send data to target device
            X, y = X.to(device), y.to(device)

            # 1. Forward pass
            test_pred_logits = model(X)

            # 2. Calculate and accumulate loss
            loss = loss_fn(test_pred_logits, y)
            test_loss += loss.item()

            # Calculate and accumulate accuracy
            test_pred_labels = test_pred_logits.argmax(dim=1)
            test_acc += ((test_pred_labels == y).sum().item()/len(test_pred_labels))

    # Adjust metrics to get average loss and accuracy per batch
    test_loss = test_loss / len(dataloader)
    test_acc = test_acc / len(dataloader)
    return test_loss, test_acc
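
As a small illustration of what the torch.inference_mode() context manager changes, the sketch below runs the same forward pass with and without it. The nn.Linear layer is a stand-in for a model and is not part of the notebook.

```python
import torch
from torch import nn

layer = nn.Linear(in_features=4, out_features=2)  # stand-in for a model
x = torch.randn(3, 4)

out_tracked = layer(x)            # normal forward pass, autograd records it
with torch.inference_mode():
    out_inference = layer(x)      # no autograd tracking inside the context

print(out_tracked.requires_grad)     # True
print(out_inference.requires_grad)   # False
print(out_inference.is_inference())  # True
```

Both calls produce the same values. The difference is that the second result carries no gradient history, which is all that evaluation needs.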

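train(): creating a train() function to combine train_step() and test_step()

Now we need a way to put our train_step() and test_step() functions together.
To do so, we'll package them up in a train() function.
This function will train the model as well as evaluate it.

Specifically, it'll:

1. Take in a model, a DataLoader for each of the training and test sets, an optimizer, a loss function and the number of epochs to run the training and test steps for.
2. Create an empty results dictionary for train_loss, train_acc, test_loss and test_acc values (we can fill this up as training goes on).
3. Loop through the training and test step functions for a number of epochs.
4. Print out what's happening at the end of each epoch.
5. Update the empty results dictionary with the updated metrics each epoch.
6. Return the filled results dictionary.

To keep track of the number of epochs we've been through, let's import tqdm from tqdm.auto (tqdm is one of the most popular progress bar libraries for Python, and tqdm.auto automatically decides what kind of progress bar is best for your computing environment, e.g. Jupyter Notebook vs. Python script).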

# conda install tqdm
from tqdm.auto import tqdm

# 1. Take in various parameters required for training and test steps
def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module = nn.CrossEntropyLoss(),
          epochs: int = 5):

    # 2. Create empty results dictionary
    results = {"train_loss": [],
        "train_acc": [],
        "test_loss": [],
        "test_acc": []
    }

    # 3. Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                           dataloader=train_dataloader,
                                           loss_fn=loss_fn,
                                           optimizer=optimizer)
        test_loss, test_acc = test_step(model=model,
            dataloader=test_dataloader,
            loss_fn=loss_fn)

        # 4. Print out what's happening
        print(
            f"Epoch: {epoch+1} | "
            f"train_loss: {train_loss:.4f} | "
            f"train_acc: {train_acc:.4f} | "
            f"test_loss: {test_loss:.4f} | "
            f"test_acc: {test_acc:.4f}"
        )

        # 5. Update results dictionary
        # Ensure all data is moved to CPU and converted to float for storage
        results["train_loss"].append(train_loss.item() if isinstance(train_loss, torch.Tensor) else train_loss)
        results["train_acc"].append(train_acc.item() if isinstance(train_acc, torch.Tensor) else train_acc)
        results["test_loss"].append(test_loss.item() if isinstance(test_loss, torch.Tensor) else test_loss)
        results["test_acc"].append(test_acc.item() if isinstance(test_acc, torch.Tensor) else test_acc)

    # 6. Return the filled results at the end of the epochs
    return results

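7.6 Train and Evaluate Model 0

We now have all the ingredients we need to train and evaluate our model.
Time to put our TinyVGG model, DataLoaders and train() function together to see if we can build a model capable of telling pizza, steak and sushi apart!
Let's recreate model_0 (we don't need to, but we will for completeness), then call our train() function, passing in the necessary parameters.
To keep our experiments quick, we'll train our model for 5 epochs (you can increase this if you like).
For the loss function and optimizer, we'll use torch.nn.CrossEntropyLoss() (since we're working with multi-class classification data) and torch.optim.Adam() with a learning rate of 1e-3, respectively.
To see how long things take, we'll import Python's timeit.default_timer() function to time the training.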

# Set random seeds
torch.manual_seed(42) 
torch.cuda.manual_seed(42)

# Set number of epochs
NUM_EPOCHS = 5

# Recreate an instance of TinyVGG
model_0 = TinyVGG(input_shape=3, # number of color channels (3 for RGB) 
                  hidden_units=10, 
                  output_shape=len(train_data.classes)).to(device)

# Setup loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_0.parameters(), lr=0.001)

# Start the timer
from timeit import default_timer as timer 
start_time = timer()

# Train model_0 
model_0_results = train(model=model_0, 
                        train_dataloader=train_dataloader_simple,
                        test_dataloader=test_dataloader_simple,
                        optimizer=optimizer,
                        loss_fn=loss_fn, 
                        epochs=NUM_EPOCHS)

# End the timer and print out how long it took
end_time = timer()
print(f"Total training time: {end_time-start_time:.3f} seconds")

Epoch: 1 | train_loss: 1.1078 | train_acc: 0.2578 | test_loss: 1.1360 | test_acc: 0.2604
Epoch: 2 | train_loss: 1.0847 | train_acc: 0.4258 | test_loss: 1.1620 | test_acc: 0.1979
Epoch: 3 | train_loss: 1.1157 | train_acc: 0.2930 | test_loss: 1.1697 | test_acc: 0.1979
Epoch: 4 | train_loss: 1.0955 | train_acc: 0.4141 | test_loss: 1.1385 | test_acc: 0.1979
Epoch: 5 | train_loss: 1.0985 | train_acc: 0.2930 | test_loss: 1.1430 | test_acc: 0.1979
Total training time: 199.209 seconds

Our model performed pretty poorly. What are some ways we could improve it?

Note: See the "Improving a model (from a model perspective)" section in notebook 02 for ideas on improving our TinyVGG model.

7.7 Plot the loss curves of Model 0

From the printouts of our model_0 training, it doesn't look like it did too well.

But we can evaluate it further by plotting the model's loss curves.

Loss curves show a model's results over time.

They're a great way to see how a model performs on different datasets (e.g. training and test).

Let's create a function to plot the values in our model_0_results dictionary.

# Check the model_0_results keys
model_0_results.keys()

dict_keys(['train_loss', 'train_acc', 'test_loss', 'test_acc'])

We'll need to extract each of these keys and turn them into a plot.

from typing import Dict, List
import matplotlib.pyplot as plt

def plot_loss_curves(results: Dict[str, List[float]]):
    """Plots training curves of a results dictionary.

    Args:
        results (dict): dictionary containing list of values, e.g.
            {"train_loss": [...],
             "train_acc": [...],
             "test_loss": [...],
             "test_acc": [...]}
    """

    # Get the loss values of the results dictionary (training and test)
    loss = results['train_loss']
    test_loss = results['test_loss']

    # Get the accuracy values of the results dictionary (training and test)
    accuracy = results['train_acc']
    test_accuracy = results['test_acc']

    # Figure out how many epochs there were
    epochs = range(len(results['train_loss']))

    # Setup a plot
    plt.figure(figsize=(15, 7))

    # Plot loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, loss, label='train_loss')
    plt.plot(epochs, test_loss, label='test_loss')
    plt.title('Loss')
    plt.xlabel('Epochs')
    plt.legend()

    # Plot accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, accuracy, label='train_accuracy')
    plt.plot(epochs, test_accuracy, label='test_accuracy')
    plt.title('Accuracy')
    plt.xlabel('Epochs')
    plt.legend();

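Let's test our plot_loss_curves() function.

plot_loss_curves(model_0_results)

Looks like things are all over the place...
But we kind of knew that, since the results our model printed during training didn't show much promise.
You could try training the model for longer and see what happens when you plot a loss curve over a longer time horizon.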

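8. What should an ideal loss curve look like?

Looking at training and test loss curves is a great way to see whether a model is overfitting.
An overfitting model is one that performs better (often by a considerable margin) on the training set than on the validation/test set.
If your training loss is far lower than your test loss, your model is overfitting.
That is, it's learning the patterns in the training data too well, and those patterns aren't generalizing to the test data.
On the other hand, when your training and test loss are not as low as you'd like, this is considered underfitting.
The ideal position for training and test loss curves is for them to line up closely with each other.

Left: If your training and test loss curves aren't as low as you'd like, this is considered underfitting.
Middle: When your test/validation loss is higher than your training loss, this is considered overfitting.
Right: The ideal scenario is when your training and test loss curves line up over time. This means your model is generalizing well.
There are more combinations and other things loss curves can do; for more, see Google's Interpreting Loss Curves guide.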
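
These rules of thumb can be turned into a rough check on a results dictionary. The sketch below is only a heuristic: the function name diagnose_fit and the thresholds target_loss and max_gap are hypothetical and would need tuning for a real problem. The example values are the Model 0 results printed earlier.

```python
from typing import Dict, List

def diagnose_fit(results: Dict[str, List[float]],
                 target_loss: float = 0.5,
                 max_gap: float = 0.1) -> str:
    """Label the final epoch as underfitting, overfitting or balanced (rough heuristic)."""
    train_loss = results["train_loss"][-1]
    test_loss = results["test_loss"][-1]
    if train_loss > target_loss:
        return "underfitting"  # the model has not even fit the training data
    if test_loss - train_loss > max_gap:
        return "overfitting"   # training loss is well below test loss
    return "balanced"

# Loss values from the Model 0 training printout
model_0_losses = {
    "train_loss": [1.1078, 1.0847, 1.1157, 1.0955, 1.0985],
    "test_loss": [1.1360, 1.1620, 1.1697, 1.1385, 1.1430],
}
print(diagnose_fit(model_0_losses))  # underfitting
```

A single threshold check is no substitute for looking at the full curves, but it captures the three cases described above.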

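8.1 How to deal with overfitting

Since the main problem with overfitting is that your model is fitting the training data too well, you'll want to use techniques to "rein it in".

A common technique for preventing overfitting is known as regularization.

I like to think of this as "making our models more regular", as in, capable of fitting more kinds of data.

Here are a few methods to prevent overfitting: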

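Method to prevent overfitting	What is it?
Get more data	Having more data gives the model more opportunities to learn patterns, and those patterns may generalize better to new examples.
Simplify your model	If the current model is already overfitting the training data, it may be too complicated. This means it's learning the patterns of the data too well and isn't able to generalize to unseen data. One way to simplify a model is to reduce the number of layers it uses or the number of hidden units in each layer.
Use data augmentation	Data augmentation manipulates the training data in a way that makes it harder for the model to learn, as it artificially adds more variety to the data. If a model is able to learn patterns in augmented data, it may be able to generalize better to unseen data.
Use transfer learning	Transfer learning involves using the patterns a model has already learned (also called pretrained weights) as the foundation for your own task. In our case, we could take a computer vision model pretrained on a large variety of images and then tweak it slightly to be more specialized for food images.
Use dropout layers	Dropout layers randomly zero out a fraction of the activations passed between layers during training, effectively simplifying the model while making the remaining connections more robust. See torch.nn.Dropout() for more.
Use learning rate decay	The idea here is to slowly decrease the learning rate as a model trains. This is akin to reaching for a coin at the back of a couch: the closer you get, the smaller your steps. The same goes for the learning rate: the closer you get to convergence, the smaller you'll want your weight updates to be.
Use early stopping	Early stopping stops model training before it begins to overfit. For example, if the model's loss has stopped decreasing for the past 10 epochs (this number is arbitrary), you may want to stop training there and go with the model weights that had the lowest loss (10 epochs prior).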
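
As one concrete example from the table, learning rate decay can be sketched with torch.optim.lr_scheduler.StepLR, which multiplies the learning rate by gamma every step_size epochs. The model, the random input and the schedule values below are stand-ins chosen for illustration, not settings from this notebook.

```python
import torch
from torch import nn

model = nn.Linear(in_features=4, out_features=3)  # stand-in model for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

lrs = []
for epoch in range(6):
    # Stand-in for a full training epoch
    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).sum()
    loss.backward()
    optimizer.step()

    lrs.append(optimizer.param_groups[0]["lr"])  # learning rate used this epoch
    scheduler.step()                             # decay once per epoch, after optimizer.step()

print(lrs)  # halves every 2 epochs: 0.001, 0.001, 0.0005, 0.0005, 0.00025, 0.00025
```

In the train() function above, the scheduler.step() call would go at the end of each epoch, after train_step() has run.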

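There are more methods for dealing with overfitting, but these are some of the main ones.
As you start to build more and more deep models, you'll find that because deep learning is so good at learning patterns in data, dealing with overfitting is one of its primary problems.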
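
Early stopping from the list above can be sketched in a few lines of plain Python. The EarlyStopper class, its patience and min_delta parameters and the loss values are all hypothetical, written only to illustrate the idea; a real training loop would also save the model weights at the best epoch.

```python
class EarlyStopper:
    """Signal a stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch: int, loss: float) -> bool:
        """Record this epoch's loss and return True if training should stop."""
        if loss < self.best_loss - self.min_delta:
            self.best_loss = loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience

# Hypothetical per-epoch test losses, for illustration only
test_losses = [1.10, 0.95, 0.90, 0.92, 0.93, 0.91, 0.94]

stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, loss in enumerate(test_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(f"Stopped at epoch {stopped_at}, best epoch was {stopper.best_epoch}")
# Stopped at epoch 5, best epoch was 2
```

Here the loss last improved at epoch 2, so after three epochs without improvement the loop stops at epoch 5 and the weights from epoch 2 would be the ones to keep.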

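8.2 How to deal with underfitting

When a model is underfitting, it is considered to have poor predictive power on both the training and test sets.

In essence, an underfitting model will fail to reduce the loss values to a desired level.

Right now, looking at our current loss curves, I'd consider our TinyVGG model, model_0, to be underfitting the data.

The main idea behind dealing with underfitting is to increase your model's predictive power.

There are several ways to do this:

Method to prevent underfitting	What is it?
Add more layers/units to your model	If your model is underfitting, it may not have enough capacity to learn the patterns/weights/representations of the data required to be predictive. One way to add more predictive power is to increase the number of hidden layers or the number of units within those layers.
Tweak the learning rate	Perhaps your model's learning rate is too high to begin with. It's trying to update its weights too much each epoch and in turn not learning anything. In this case, you might lower the learning rate and see what happens.
Use transfer learning	Transfer learning can help with both overfitting and underfitting. It involves taking the patterns from a previously working model and adjusting them to your own problem.
Train for longer	Sometimes a model just needs more time to learn representations of the data. If you find in your smaller experiments that your model isn't learning anything, leaving it to train for more epochs may result in better performance.
Use less regularization	Perhaps your model is underfitting because you're trying to prevent overfitting too much. Holding back on regularization techniques can help your model fit the data better.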

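8.3 The balance between overfitting and underfitting

None of the methods discussed above are silver bullets, meaning they don't always work.

Preventing overfitting and underfitting is possibly the most active area of machine learning research.

Everyone wants their models to fit better (less underfitting), but not so well that they fail to generalize and perform poorly in the real world (less overfitting).

There's a fine line between overfitting and underfitting.

Too much of either can cause the other.

Transfer learning is perhaps one of the most powerful techniques when it comes to dealing with both overfitting and underfitting on your own problems.

Rather than handcrafting different overfitting and underfitting techniques, transfer learning lets you take an already working model from a problem space similar to yours (say one from paperswithcode.com/sota or Hugging Face models) and apply it to your own dataset.

We'll see the power of transfer learning in a later notebook.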

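9. Model 1: TinyVGG with Data Augmentation

Time to try out another model!

This time, let's load in the data and use data augmentation to see if it improves our results.

First, we'll compose a training transform that includes transforms.TrivialAugmentWide() as well as resizing the images and turning them into tensors.

We'll do the same for the test transform, except without the data augmentation.

9.1 Create transform with data augmentation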

# Create training transform with TrivialAugment
train_transform_trivial_augment = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.TrivialAugmentWide(num_magnitude_bins=31),
    transforms.ToTensor() 
])

# Create testing transform (no data augmentation)
test_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor()
])

Now let's turn our images into Datasets using torchvision.datasets.ImageFolder() and then into DataLoaders with torch.utils.data.DataLoader().

9.2 Create train and test Dataset’s and DataLoader’s

We'll make sure the training Dataset uses train_transform_trivial_augment and the test Dataset uses test_transform.

# Turn image folders into Datasets
train_data_augmented = datasets.ImageFolder(train_dir, transform=train_transform_trivial_augment)
test_data_simple = datasets.ImageFolder(test_dir, transform=test_transform)

train_data_augmented, test_data_simple

(Dataset ImageFolder
     Number of datapoints: 225
     Root location: data\pizza_steak_sushi\train
     StandardTransform
 Transform: Compose(
                Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
                TrivialAugmentWide(num_magnitude_bins=31, interpolation=InterpolationMode.NEAREST, fill=None)
                ToTensor()
            ),
 Dataset ImageFolder
     Number of datapoints: 75
     Root location: data\pizza_steak_sushi\test
     StandardTransform
 Transform: Compose(
                Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=True)
                ToTensor()
            ))

We'll set the DataLoaders' batch_size to 32 and num_workers to the number of CPUs available on our machine (we can get this using Python's os.cpu_count()).

# Turn Datasets into DataLoader's
import os
BATCH_SIZE = 32
NUM_WORKERS = os.cpu_count()

torch.manual_seed(42)
train_dataloader_augmented = DataLoader(train_data_augmented, 
                                        batch_size=BATCH_SIZE, 
                                        shuffle=True,
                                        num_workers=NUM_WORKERS)

test_dataloader_simple = DataLoader(test_data_simple, 
                                    batch_size=BATCH_SIZE, 
                                    shuffle=False, 
                                    num_workers=NUM_WORKERS)

train_dataloader_augmented, test_dataloader_simple

(<torch.utils.data.dataloader.DataLoader at 0x277ca662370>,
 <torch.utils.data.dataloader.DataLoader at 0x277ca8a83a0>)

9.3 Construct and train Model 1

To build the next model, model_1, we can reuse our TinyVGG class from before.
We'll make sure to send it to the target device.

# Create model_1 and send it to the target device
torch.manual_seed(42)
model_1 = TinyVGG(
    input_shape=3,
    hidden_units=10,
    output_shape=len(train_data_augmented.classes)).to(device)
model_1

TinyVGG(
  (conv_block_1): Sequential(
    (0): Conv2d(3, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv_block_2): Sequential(
    (0): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=2560, out_features=3, bias=True)
  )
)

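Since we've already got functions for the training loop (train_step()) and the test loop (test_step()), and a function to put them together in train(), let's reuse those.

We'll use the same setup as model_0, with only the train_dataloader parameter varying:

Train for 5 epochs.
Use train_dataloader=train_dataloader_augmented as the training data in train().
Use torch.nn.CrossEntropyLoss() as the loss function (since we're working with multi-class classification).
Use torch.optim.Adam() with lr=0.001 as the learning rate for the optimizer.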

# Set random seeds
torch.manual_seed(42) 
torch.cuda.manual_seed(42)

# Set number of epochs
NUM_EPOCHS = 5

# Setup loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_1.parameters(), lr=0.001)

# Start the timer
from timeit import default_timer as timer 
start_time = timer()

# Train model_1
model_1_results = train(model=model_1, 
                        train_dataloader=train_dataloader_augmented,
                        test_dataloader=test_dataloader_simple,
                        optimizer=optimizer,
                        loss_fn=loss_fn, 
                        epochs=NUM_EPOCHS)

# End the timer and print out how long it took
end_time = timer()
print(f"Total training time: {end_time-start_time:.3f} seconds")

Epoch: 1 | train_loss: 1.1074 | train_acc: 0.2500 | test_loss: 1.1058 | test_acc: 0.2604
Epoch: 2 | train_loss: 1.0791 | train_acc: 0.4258 | test_loss: 1.1381 | test_acc: 0.2604
Epoch: 3 | train_loss: 1.0800 | train_acc: 0.4258 | test_loss: 1.1693 | test_acc: 0.2604
Epoch: 4 | train_loss: 1.1284 | train_acc: 0.3047 | test_loss: 1.1616 | test_acc: 0.2604
Epoch: 5 | train_loss: 1.0882 | train_acc: 0.4258 | test_loss: 1.1471 | test_acc: 0.2604
Total training time: 199.736 seconds

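The results are still poor.

9.4 Plot the loss curves of Model 1

Since we've got the results of model_1 saved in the results dictionary model_1_results, we can plot them using plot_loss_curves().

plot_loss_curves(model_1_results)

Is the model underfitting or overfitting? Or both?
Ideally we'd like it to have higher accuracy and lower loss.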

10. Compare model results

Even though our models are performing quite poorly, we can still write code to compare them.

Let's first turn our model results into pandas DataFrames.

import pandas as pd
model_0_df = pd.DataFrame(model_0_results)
model_1_df = pd.DataFrame(model_1_results)
model_0_df

	train_loss	train_acc	test_loss	test_acc
0	1.107833	0.257812	1.136041	0.260417
1	1.084713	0.425781	1.162014	0.197917
2	1.115697	0.292969	1.169704	0.197917
3	1.095564	0.414062	1.138373	0.197917
4	1.098520	0.292969	1.142631	0.197917

Now we can write some plotting code using matplotlib to visualize the results of model_0 and model_1 together.

# Setup a plot 
plt.figure(figsize=(15, 10))

# Get number of epochs
epochs = range(len(model_0_df))

# Plot train loss
plt.subplot(2, 2, 1)
plt.plot(epochs, model_0_df["train_loss"], label="Model 0")
plt.plot(epochs, model_1_df["train_loss"], label="Model 1")
plt.title("Train Loss")
plt.xlabel("Epochs")
plt.legend()

# Plot test loss
plt.subplot(2, 2, 2)
plt.plot(epochs, model_0_df["test_loss"], label="Model 0")
plt.plot(epochs, model_1_df["test_loss"], label="Model 1")
plt.title("Test Loss")
plt.xlabel("Epochs")
plt.legend()

# Plot train accuracy
plt.subplot(2, 2, 3)
plt.plot(epochs, model_0_df["train_acc"], label="Model 0")
plt.plot(epochs, model_1_df["train_acc"], label="Model 1")
plt.title("Train Accuracy")
plt.xlabel("Epochs")
plt.legend()

# Plot test accuracy
plt.subplot(2, 2, 4)
plt.plot(epochs, model_0_df["test_acc"], label="Model 0")
plt.plot(epochs, model_1_df["test_acc"], label="Model 1")
plt.title("Test Accuracy")
plt.xlabel("Epochs")
plt.legend();

It looks like both models performed equally poorly and were somewhat erratic (the metrics go up and down sharply).

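11. Make a prediction on a custom image

If you've trained a model on a certain dataset, chances are you'd like to make a prediction on your own custom data.

In our case, since we've trained a model on pizza, steak and sushi images, how could we use our model to make a prediction on one of our own images?

To do so, we can load an image and then preprocess it in a way that matches the type of data our model was trained on.

In other words, we'll have to convert our own custom image to a tensor and make sure it's in the right datatype before passing it to our model.

Let's start by downloading a custom image.

Since our model predicts whether an image contains pizza, steak or sushi, let's download a photo of my Dad giving two thumbs up to a big pizza from the Learn PyTorch for Deep Learning GitHub.

We download the image using Python's requests module.

Note: If you're using Google Colab, you can also upload an image to the current session by going to the left-hand side menu -> Files -> Upload to session storage. Beware though, this image will be deleted when your Google Colab session ends.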

# Download custom image
import requests

# Setup custom image path
custom_image_path = data_path / "04-pizza-dad.jpeg"

# Download the image if it doesn't already exist
if not custom_image_path.is_file():
    print(f"Downloading {custom_image_path}...")
    # When downloading from GitHub, need to use the "raw" file link
    response = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/04-pizza-dad.jpeg")
    # Stop here on an HTTP error so an empty or broken file is never written
    response.raise_for_status()
    with open(custom_image_path, "wb") as f:
        f.write(response.content)
else:
    print(f"{custom_image_path} already exists, skipping download.")

data/04-pizza-dad.jpeg already exists, skipping download.

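11.1 Loading in a custom image with PyTorch

It looks like we've got a custom image downloaded and ready to go at data/04-pizza-dad.jpeg.

Time to load it in.

PyTorch's torchvision has several input and output ("IO" or "io" for short) methods for reading and writing images and video in torchvision.io.

Since we want to load in an image, we'll use torchvision.io.read_image().

This function reads a JPEG or PNG image and turns it into a 3-dimensional RGB or grayscale torch.Tensor with datatype uint8 and values in the range [0, 255].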

import torchvision

# Read in custom image
custom_image_uint8 = torchvision.io.read_image(str(custom_image_path))

# Print out image data
print(f"Custom image tensor:\n{custom_image_uint8}\n")
print(f"Custom image shape: {custom_image_uint8.shape}\n")
print(f"Custom image dtype: {custom_image_uint8.dtype}")

Custom image tensor:
tensor([[[154, 173, 181,  ...,  21,  18,  14],
         [146, 165, 181,  ...,  21,  18,  15],
         [124, 146, 172,  ...,  18,  17,  15],
         ...,
         [ 72,  59,  45,  ..., 152, 150, 148],
         [ 64,  55,  41,  ..., 150, 147, 144],
         [ 64,  60,  46,  ..., 149, 146, 143]],

        [[171, 190, 193,  ...,  22,  19,  15],
         [163, 182, 193,  ...,  22,  19,  16],
         [141, 163, 184,  ...,  19,  18,  16],
         ...,
         [ 55,  42,  28,  ..., 107, 104, 103],
         [ 47,  38,  24,  ..., 108, 104, 102],
         [ 47,  43,  29,  ..., 107, 104, 101]],

        [[119, 138, 147,  ...,  17,  14,  10],
         [111, 130, 145,  ...,  17,  14,  11],
         [ 87, 111, 136,  ...,  14,  13,  11],
         ...,
         [ 35,  22,   8,  ...,  52,  52,  48],
         [ 27,  18,   4,  ...,  50,  49,  44],
         [ 27,  23,   9,  ...,  49,  46,  43]]], dtype=torch.uint8)

Custom image shape: torch.Size([3, 4032, 3024])

Custom image dtype: torch.uint8

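Nice! It looks like our image is in tensor format, but is this image format compatible with our model?

Our custom_image_uint8 tensor has datatype torch.uint8 and its values are in the range [0, 255].

But our model takes image tensors of datatype torch.float32 with values in the range [0, 1].

So before we use our custom image with our model, we'll need to convert it to the same format as the data our model was trained on.

If we don't do this, our model will error.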

# Try to make a prediction on image in uint8 format (this will error)
model_1.eval()
with torch.inference_mode():
    model_1(custom_image_uint8.to(device))

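RuntimeError: Input type (unsigned char) and bias type (float) should be the same

Trying to make a prediction on an image with a different datatype from the one our model was trained on raises an error. The exact wording varies; it can also look like this:

RuntimeError: Input type (torch.cuda.ByteTensor) and weight type (torch.cuda.FloatTensor) should be the same

Let's fix this by converting our custom image to the same datatype our model was trained on (torch.float32).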

# Load in custom image and convert the tensor values to float32
custom_image = torchvision.io.read_image(str(custom_image_path)).type(torch.float32)

# Divide the image pixel values by 255 to get them between [0, 1]
custom_image = custom_image / 255. 

# Print out image data
print(f"Custom image tensor:\n{custom_image}\n")
print(f"Custom image shape: {custom_image.shape}\n")
print(f"Custom image dtype: {custom_image.dtype}")

Custom image tensor:
tensor([[[0.6039, 0.6784, 0.7098,  ..., 0.0824, 0.0706, 0.0549],
         [0.5725, 0.6471, 0.7098,  ..., 0.0824, 0.0706, 0.0588],
         [0.4863, 0.5725, 0.6745,  ..., 0.0706, 0.0667, 0.0588],
         ...,
         [0.2824, 0.2314, 0.1765,  ..., 0.5961, 0.5882, 0.5804],
         [0.2510, 0.2157, 0.1608,  ..., 0.5882, 0.5765, 0.5647],
         [0.2510, 0.2353, 0.1804,  ..., 0.5843, 0.5725, 0.5608]],

        [[0.6706, 0.7451, 0.7569,  ..., 0.0863, 0.0745, 0.0588],
         [0.6392, 0.7137, 0.7569,  ..., 0.0863, 0.0745, 0.0627],
         [0.5529, 0.6392, 0.7216,  ..., 0.0745, 0.0706, 0.0627],
         ...,
         [0.2157, 0.1647, 0.1098,  ..., 0.4196, 0.4078, 0.4039],
         [0.1843, 0.1490, 0.0941,  ..., 0.4235, 0.4078, 0.4000],
         [0.1843, 0.1686, 0.1137,  ..., 0.4196, 0.4078, 0.3961]],

        [[0.4667, 0.5412, 0.5765,  ..., 0.0667, 0.0549, 0.0392],
         [0.4353, 0.5098, 0.5686,  ..., 0.0667, 0.0549, 0.0431],
         [0.3412, 0.4353, 0.5333,  ..., 0.0549, 0.0510, 0.0431],
         ...,
         [0.1373, 0.0863, 0.0314,  ..., 0.2039, 0.2039, 0.1882],
         [0.1059, 0.0706, 0.0157,  ..., 0.1961, 0.1922, 0.1725],
         [0.1059, 0.0902, 0.0353,  ..., 0.1922, 0.1804, 0.1686]]])

Custom image shape: torch.Size([3, 4032, 3024])

Custom image dtype: torch.float32
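
As a minimal sketch of the same conversion, the snippet below applies it to a tiny synthetic tensor so the effect on dtype and value range is easy to see. The name fake_image_uint8 is a hypothetical stand-in for a loaded image, not part of the tutorial's data.

```python
import torch

# Hypothetical stand-in for an image loaded with torchvision.io.read_image():
# a tiny uint8 tensor in CHW layout with values in [0, 255].
fake_image_uint8 = torch.tensor([[[0, 128], [255, 64]]], dtype=torch.uint8)

# Convert to float32 first, then scale the values into [0, 1].
fake_image_float = fake_image_uint8.type(torch.float32) / 255.

print(fake_image_uint8.dtype, fake_image_uint8.min().item(), fake_image_uint8.max().item())
print(fake_image_float.dtype, fake_image_float.min().item(), fake_image_float.max().item())
```

The shape is untouched by this step; only the datatype and the value range change.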

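11.2 Predicting on custom images with a trained PyTorch model

It looks like our image data is now in the same format our model was trained on.

Except for one thing...

Its shape.

Our model was trained on images of shape [3, 64, 64], whereas our custom image is currently [3, 4032, 3024].

How can we make sure our custom image has the same shape as the images our model was trained on?

Are there any torchvision.transforms that could help?

Before answering that question, let's plot the image with matplotlib to make sure it looks okay. Remember that we have to permute the dimensions from CHW to HWC to suit matplotlib's requirements.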

# Plot custom image
plt.imshow(custom_image.permute(1, 2, 0)) # need to permute image dimensions from CHW -> HWC otherwise matplotlib will error
plt.title(f"Image shape: {custom_image.shape}")
plt.axis(False);
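
To make the CHW -> HWC step concrete, here is a small sketch on a synthetic tensor (the name image_chw and its size are assumptions chosen for illustration). It shows that permute() only reorders the dimensions and leaves the pixel values alone.

```python
import torch

# Hypothetical image tensor in CHW layout (3 color channels, 4 rows, 5 columns).
image_chw = torch.rand(3, 4, 5)

# permute(1, 2, 0) reorders the dimensions to (height, width, channels).
image_hwc = image_chw.permute(1, 2, 0)

print(image_chw.shape)  # torch.Size([3, 4, 5])
print(image_hwc.shape)  # torch.Size([4, 5, 3])

# The pixel values are unchanged, only the indexing order differs:
# channel 0, row 1, column 2 is now found at row 1, column 2, channel 0.
print(image_chw[0, 1, 2] == image_hwc[1, 2, 0])  # tensor(True)
```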

Two thumbs up!

Now, how can we get our image to be the same size as the images our model was trained on?

One way to do so is with torchvision.transforms.Resize().

Let's compose a transform pipeline to do so.

# Create transform pipeline to resize image
custom_image_transform = transforms.Compose([
    transforms.Resize((64, 64)),
])

# Transform target image
custom_image_transformed = custom_image_transform(custom_image)

# Print out original shape and new shape
print(f"Original shape: {custom_image.shape}")
print(f"New shape: {custom_image_transformed.shape}")

Original shape: torch.Size([3, 4032, 3024])
New shape: torch.Size([3, 64, 64])

Finally, let's make a prediction on our own custom image.

model_1.eval()
with torch.inference_mode():
    custom_image_pred = model_1(custom_image_transformed)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper_CUDA___slow_conv2d_forward)

Despite our preparations, our custom image and model are on different devices.

Let's fix that by putting custom_image_transformed on the target device.

model_1.eval()
with torch.inference_mode():
    custom_image_pred = model_1(custom_image_transformed.to(device))

RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x256 and 2560x3)

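A shape error.

We resized our custom image to the same size as the images our model was trained on...

Oh, wait...

We forgot one dimension.

The batch size.

Our model expects image tensors with a batch size dimension at the start (NCHW, where N is the batch size).

But our custom image is currently only CHW.

We can add a batch size dimension with torch.unsqueeze(dim=0), which adds an extra dimension to the image, and then finally make a prediction.

Essentially, we'll be telling our model to predict on a single image (an image with a batch_size of 1).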

model_1.eval()
with torch.inference_mode():
    # Add an extra dimension to image
    custom_image_transformed_with_batch_size = custom_image_transformed.unsqueeze(dim=0)
    
    # Print out different shapes
    print(f"Custom image transformed shape: {custom_image_transformed.shape}")
    print(f"Unsqueezed custom image shape: {custom_image_transformed_with_batch_size.shape}")
    
    # Make a prediction on image with an extra dimension
    custom_image_pred = model_1(custom_image_transformed_with_batch_size.to(device))

Custom image transformed shape: torch.Size([3, 64, 64])
Unsqueezed custom image shape: torch.Size([1, 3, 64, 64])
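
The numbers in the earlier shape error (10x256 and 2560x3) can be reproduced with a small sketch. It assumes, as the error message suggests, that the convolutional part of the model produces 10 feature maps of size 16x16, which are then flattened and fed to a linear layer with 3 outputs. The classifier below is a hypothetical stand-in for that final stage, not the tutorial's model.

```python
import torch
from torch import nn

# Hypothetical stand-in for the last stage of a TinyVGG-style model:
# 10 feature maps of size 16x16 are flattened and mapped to 3 classes.
classifier = nn.Sequential(
    nn.Flatten(),  # flattens every dimension except the first one
    nn.Linear(in_features=10 * 16 * 16, out_features=3),
)

features_no_batch = torch.rand(10, 16, 16)                # [C, H, W]
features_with_batch = features_no_batch.unsqueeze(dim=0)  # [1, C, H, W]

# Without a batch dimension, Flatten treats the 10 channels as 10 samples
# with 256 features each, which the Linear layer cannot accept.
print(nn.Flatten()(features_no_batch).shape)  # torch.Size([10, 256])
try:
    classifier(features_no_batch)
except RuntimeError as error:
    print(f"RuntimeError: {error}")

# With a batch dimension, the single sample is flattened to 2560 features.
print(nn.Flatten()(features_with_batch).shape)  # torch.Size([1, 2560])
print(classifier(features_with_batch).shape)    # torch.Size([1, 3])
```

Under that assumption, the missing batch dimension is what turns one sample of 2560 features into ten samples of 256 features.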

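Note: What we've just gone through are three of the classic and most common deep learning and PyTorch issues:

1. Wrong datatypes: our model expects torch.float32, whereas our original custom image was uint8.
2. Wrong device: our model was on the target device (in our case, the GPU), whereas our target data hadn't been moved to the target device yet.
3. Wrong shapes: our model expected an input image of shape [N, C, H, W] or [batch_size, color_channels, height, width], whereas our custom image tensor was of shape [color_channels, height, width].

Keep in mind that these errors aren't just for predicting on custom images.
They show up with almost every kind of data (text, audio, structured data) and problem you work with.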
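
One way to catch these three issues before calling the model is a small pre-flight check. The sketch below is a hypothetical helper (check_model_input is not a PyTorch function). It assumes an image model that expects [N, C, H, W] input and has at least one parameter.

```python
import torch
from torch import nn


def check_model_input(model: nn.Module, x: torch.Tensor) -> list:
    """Hypothetical helper: returns a list of likely input problems (empty if none found)."""
    problems = []
    first_param = next(model.parameters())  # assumes the model has at least one parameter

    # 1. Wrong datatype
    if x.dtype != first_param.dtype:
        problems.append(f"dtype: input is {x.dtype}, model is {first_param.dtype}")
    # 2. Wrong device
    if x.device != first_param.device:
        problems.append(f"device: input is on {x.device}, model is on {first_param.device}")
    # 3. Wrong shape (assumes the model expects [N, C, H, W])
    if x.ndim != 4:
        problems.append(f"shape: input has {x.ndim} dimensions, expected 4 ([N, C, H, W])")
    return problems


# Toy model and inputs, for illustration only
toy_model = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
bad_input = torch.randint(0, 256, (3, 64, 64), dtype=torch.uint8)
good_input = (bad_input.type(torch.float32) / 255.).unsqueeze(dim=0)

print(check_model_input(toy_model, bad_input))   # reports the dtype and shape problems
print(check_model_input(toy_model, good_input))  # []
```

A check like this only inspects dtype, device and number of dimensions, so it will not catch a wrong image size or channel count.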

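Now let's take a look at our model's predictions.

custom_image_pred

tensor([[ 0.1188, 0.0250, -0.1444]], device='cuda:0')

Alright, these are still in logit form (the raw outputs of a model are called logits).

Let's convert them from logits -> prediction probabilities -> prediction labels.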

# Print out prediction logits
print(f"Prediction logits: {custom_image_pred}")

# Convert logits -> prediction probabilities (using torch.softmax() for multi-class classification)
custom_image_pred_probs = torch.softmax(custom_image_pred, dim=1)
print(f"Prediction probabilities: {custom_image_pred_probs}")

# Convert prediction probabilities -> prediction labels
custom_image_pred_label = torch.argmax(custom_image_pred_probs, dim=1)
print(f"Prediction label: {custom_image_pred_label}")

Prediction logits: tensor([[ 0.1188,  0.0250, -0.1444]], device='cuda:0')
Prediction probabilities: tensor([[0.3733, 0.3398, 0.2869]], device='cuda:0')
Prediction label: tensor([0], device='cuda:0')
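
As a quick check of what torch.softmax() does here, the sketch below recomputes the probabilities by hand from the logits printed above. The logits are copied onto the CPU so the snippet runs without a GPU, and manual_probs is a name introduced only for this illustration.

```python
import torch

# Logits copied from the output above (kept on the CPU for this sketch)
logits = torch.tensor([[0.1188, 0.0250, -0.1444]])

# softmax exponentiates each logit and divides by the sum across the classes (dim=1)
manual_probs = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)
probs = torch.softmax(logits, dim=1)

print(manual_probs)      # matches probs
print(probs)             # approximately tensor([[0.3733, 0.3398, 0.2869]])
print(probs.sum(dim=1))  # the probabilities sum to 1 (up to floating point error)

# softmax keeps the ordering of the logits, so argmax gives the same label either way
label = torch.argmax(probs, dim=1)
print(label)                        # tensor([0])
print(torch.argmax(logits, dim=1))  # tensor([0])
```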

But of course, our prediction label is still in index/tensor form.

We can convert it to a string class name prediction by indexing on the class_names list.

# Find the predicted label
custom_image_pred_class = class_names[custom_image_pred_label.cpu()] # put pred label to CPU, otherwise will error
custom_image_pred_class

'pizza'

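It looks like the model gets the prediction right, even though it was performing poorly according to our evaluation metrics.

Note: The model in its current form will predict "pizza", "steak" or "sushi" no matter what image it's given. If you want your model to predict a different class, you have to train it to do so.

But if we check custom_image_pred_probs, we'll notice that the model gives almost equal weight (the values are similar) to every class.

# The values of the prediction probabilities are quite similar
custom_image_pred_probs

tensor([[0.3733, 0.3398, 0.2869]], device='cuda:0')

Having prediction probabilities this similar could mean a couple of things:

1. The model is trying to predict all three classes at the same time (there may be an image containing pizza, steak and sushi).
2. The model doesn't really know what it wants to predict and is in turn just assigning similar values to each of the classes.

Our case is number 2: since our model is poorly trained, it is basically guessing the prediction.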
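
One rough way to put a number on "basically guessing" is to compare the prediction to a uniform distribution. The sketch below uses normalized entropy for this. That measure is an assumption made for this illustration, not something the tutorial computes, and the probabilities are copied from the output above.

```python
import math

import torch

# Prediction probabilities copied from the output above (kept on the CPU)
probs = torch.tensor([[0.3733, 0.3398, 0.2869]])
num_classes = probs.shape[1]

# A model that guesses uniformly would assign 1 / num_classes to every class
uniform_prob = 1 / num_classes

# Entropy is highest (log(num_classes)) when all classes are equally likely,
# so dividing by that maximum gives a value between 0 and 1.
entropy = -(probs * torch.log(probs)).sum(dim=1)
normalized_entropy = (entropy / math.log(num_classes)).item()

print(f"Top probability: {probs.max().item():.4f} (uniform guess: {uniform_prob:.4f})")
print(f"Normalized entropy: {normalized_entropy:.4f} (1.0 means completely uniform)")
```

A value close to 1.0, as here, is consistent with the model spreading its probability almost evenly across the three classes.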

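11.3 Putting custom image prediction together: building a function

Doing all of the above steps every time you'd like to make a prediction on a custom image would quickly become tedious.

So let's put them all together in a function we can easily use over and over again.

Specifically, let's make a function that:

1. Takes in a target image path and converts it to the right datatype for our model (torch.float32).
2. Makes sure the target image pixel values are in the range [0, 1].
3. Transforms the target image if necessary.
4. Makes sure the model is on the target device.
5. Makes a prediction on the target image with a trained model (ensuring the image is the right size and on the same device as the model).
6. Converts the model's output logits to prediction probabilities.
7. Converts the prediction probabilities to prediction labels.
8. Plots the target image alongside the model prediction and prediction probability.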

def pred_and_plot_image(model: torch.nn.Module, 
                        image_path: str, 
                        class_names: List[str] = None, 
                        transform=None,
                        device: torch.device = device):
    """Makes a prediction on a target image and plots the image with its prediction."""
    
    # 1. Load in image and convert the tensor values to float32
    target_image = torchvision.io.read_image(str(image_path)).type(torch.float32)
    
    # 2. Divide the image pixel values by 255 to get them between [0, 1]
    target_image = target_image / 255. 
    
    # 3. Transform if necessary
    if transform:
        target_image = transform(target_image)
    
    # 4. Make sure the model is on the target device
    model.to(device)
    
    # 5. Turn on model evaluation mode and inference mode
    model.eval()
    with torch.inference_mode():
        # Add an extra dimension to the image
        target_image = target_image.unsqueeze(dim=0)
    
        # Make a prediction on image with an extra dimension and send it to the target device
        target_image_pred = model(target_image.to(device))
        
    # 6. Convert logits -> prediction probabilities (using torch.softmax() for multi-class classification)
    target_image_pred_probs = torch.softmax(target_image_pred, dim=1)

    # 7. Convert prediction probabilities -> prediction labels
    target_image_pred_label = torch.argmax(target_image_pred_probs, dim=1)
    
    # 8. Plot the image alongside the prediction and prediction probability
    plt.imshow(target_image.squeeze().permute(1, 2, 0)) # make sure it's the right size for matplotlib
    if class_names:
        title = f"Pred: {class_names[target_image_pred_label.cpu()]} | Prob: {target_image_pred_probs.max().cpu():.3f}"
    else: 
        title = f"Pred: {target_image_pred_label} | Prob: {target_image_pred_probs.max().cpu():.3f}"
    plt.title(title)
    plt.axis(False);

# Pred on our custom image
pred_and_plot_image(model=model_1,
                    image_path=custom_image_path,
                    class_names=class_names,
                    transform=custom_image_transform,
                    device=device)

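It looks like our model got the prediction right just by guessing.

This won't always be the case with other images, though...

The image is also pixelated because we resized it to [64, 64] using custom_image_transform.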

Main takeaways

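PyTorch has many built-in functions to deal with all kinds of data, from vision to text to audio to recommendation systems.
If PyTorch's built-in data loading functions don't suit your requirements, you can write code to create your own custom datasets by subclassing torch.utils.data.Dataset.
torch.utils.data.DataLoader helps turn your Dataset into an iterable that can be used when training and testing a model.
A lot of machine learning is dealing with the balance between overfitting and underfitting (we discussed different methods for each above, so a good exercise is to research more and write code to try out the different techniques).
Predicting on your own custom data with a trained model is possible, as long as you format the data into a similar format to what the model was trained on. Make sure you take care of the three big PyTorch and deep learning errors:
- Wrong datatypes: your model expected torch.float32 when your data is torch.uint8.
- Wrong data shapes: your model expected [batch_size, color_channels, height, width] when your data is [color_channels, height, width].
- Wrong devices: your model is on the GPU but your data is on the CPU.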
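
As a reminder of the Dataset and DataLoader pattern mentioned in the takeaways, here is a minimal sketch of subclassing torch.utils.data.Dataset. RandomImageDataset is a hypothetical class that serves random tensors instead of real images, so the snippet runs without any files on disk.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomImageDataset(Dataset):
    """Hypothetical dataset of random 'images', for illustration only."""

    def __init__(self, num_samples: int = 8, num_classes: int = 3):
        self.images = torch.rand(num_samples, 3, 64, 64)  # float32 values in [0, 1)
        self.labels = torch.randint(0, num_classes, (num_samples,))

    def __len__(self) -> int:
        # Number of samples in the dataset
        return len(self.images)

    def __getitem__(self, index: int):
        # Return one (image, label) pair
        return self.images[index], self.labels[index]


dataset = RandomImageDataset()
dataloader = DataLoader(dataset, batch_size=4, shuffle=False)

# The DataLoader stacks individual samples into batches and adds the batch dimension
images, labels = next(iter(dataloader))
print(images.shape)  # torch.Size([4, 3, 64, 64])
print(labels.shape)  # torch.Size([4])
```

A real custom dataset would load and transform image files inside __getitem__ rather than holding random tensors in memory.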

Exercises

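All of the exercises are focused on practicing the code in the sections above.

You should be able to complete them by referencing each section or by following the linked resources.

All exercises should be completed using device-agnostic code.

Resources:

Exercise template notebook for 04
Example solutions notebook for 04 (try the exercises before looking at this)

1. Our models are underperforming (not fitting the data well). What are 3 methods for preventing underfitting? Write them down and explain each with a sentence.
2. Recreate the data loading functions we built in sections 1, 2, 3 and 4. You should have train and test DataLoaders ready to use.
3. Recreate model_0 we built in section 7.
4. Create training and testing functions for model_0.
5. Try training the model you made in exercise 3 for 5, 20 and 50 epochs. What happens to the results?
- Use torch.optim.Adam() with a learning rate of 0.001 as the optimizer.
6. Double the number of hidden units in your model and train it for 20 epochs. What happens to the results?
7. Double the data you're using with your model and train it for 20 epochs. What happens to the results?
- Note: You can use the custom data creation notebook to scale up your Food101 dataset.
- You can also find the already formatted double data (20% instead of 10% subset) dataset on GitHub. You will need to write download code like in exercise 2 to get it into this notebook.
8. Make a prediction on your own custom image of pizza/steak/sushi (you could even download one from the internet) and share your prediction.
- Does the model you trained in exercise 7 get it right?
- If not, what do you think you could do to improve it?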

Extra-curriculum

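Practice your knowledge of PyTorch Dataset and DataLoader by going through the PyTorch datasets and dataloaders tutorial notebook.
Spend 10 minutes reading the PyTorch torchvision.transforms documentation.
- You can see demos of transforms in action in the illustrations of transforms tutorial.
Spend 10 minutes reading the PyTorch torchvision.datasets documentation.
- What are some datasets that stand out to you?
- How could you try building a model on these?
TorchData is currently in beta (as of April 2022). It is intended to be a future way of loading data in PyTorch, but you can start to check it out now.
To speed up deep learning models, you can use a few tricks to improve compute, memory and overhead computations. For more, read the post Making Deep Learning Go Brrrr From First Principles by Horace He.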