Pytorch与视觉竞赛入门3.2-使用Pytorch搭建ResNet

ResNet

原理

原理部分主要摘自:ResNet详解与分析

残差结构有什么作用可以查看之前的博客:Transformer相关——残差模块

残差模块结构图示

\(F(x)+x\)构成的block称之为Residual Block,即残差块,如下图所示,多个相似的Residual Block串联构成ResNet。

一个残差块有2条路径\(F(x)\)\(x\)\(F(x)\)路径拟合残差,不妨称之为残差路径,\(x\)路径为identity mapping恒等映射,称之为”shortcut”。图中的\(⊕\)为element-wise addition,要求参与运算的\(F(x)\)\(x\)的尺寸要相同。所以,随之而来的问题是:

  1. 残差路径如何设计?
  2. shortcut路径如何设计?
  3. Residual Block之间怎么连接?(也就是残差网络的结构)
残差路径如何设计?

在原论文中,残差路径可以大致分成2种,一种有bottleneck结构,即下图右中的1×1卷积层,用于先降维再升维,主要出于降低计算复杂度的现实考虑,称之为“bottleneck block”,另一种没有bottleneck结构,如下图左所示,称之为“basic block”。

shortcut路径如何设计?

shortcut路径大致也可以分成2种,取决于残差路径是否改变了feature map数量和尺寸,一种是将输入x原封不动地输出,另一种则需要经过1×1卷积来升维 or/and 降采样,主要作用是将输出与F(x)路径的输出保持shape一致,对网络性能的提升并不明显,两种结构如下图所示:

https://d2l.ai/chapter_convolutional-modern/resnet.html
https://d2l.ai/chapter_convolutional-modern/resnet.html

残差网络的结构

ResNet的设计有如下特点:

  1. 与plain net相比,ResNet多了很多“旁路”,即shortcut路径,其首尾圈出的layers构成一个Residual Block;
  2. ResNet中,所有的Residual Block都没有pooling层,降采样是通过conv的stride实现的
  3. 分别在conv3_1、conv4_1和conv5_1 Residual Block,降采样1倍,同时feature map数量增加1倍,如图中虚线划定的block;
  4. 通过Average Pooling得到最终的特征,而不是通过全连接层;
  5. 每个卷积层之后都紧接着BatchNorm layer,为了简化,图中并没有标出;

ResNet结构非常容易修改和扩展,通过调整block内的channel数量以及堆叠的block数量,就可以很容易地调整网络的宽度和深度,来得到不同表达能力的网络,而不用过多地担心网络的“退化”问题,只要训练数据足够,逐步加深网络,就可以获得更好的性能表现。

不断地增加ResNet的深度,甚至增加到1000层以上,残差也没有发生“退化”,可见Residual Block的有效性。ResNet的动机在于认为拟合残差比直接拟合潜在映射更容易优化,下面通过绘制error surface直观感受一下shortcut路径的作用,图片截自Loss Visualization

  • ResNet-20(no short)浅层plain net的error surface还没有很复杂,优化也会很困难,但是增加到56层后复杂程度极度上升。对于plain net,随着深度增加,error surface 迅速“恶化”
  • 引入shortcut后,error suface变得平滑很多,梯度的可预测性变得更好,显然更容易优化

改进

论文Identity Mappings in Deep Residual Networks进一步研究ResNet,通过ResNet反向传播的理论分析以及设计了以下几种不同的Residual Block结构。注意,这里的视角与之前不同,这里将shortcut路径视为主干路径,将残差路径视为旁路。

经过实验验证,(e)full pre-activation表现最好,具有更强的泛化能力,能更好地避免“退化”,堆叠大于1000层后,性能仍在变好。具体的变化在于:

  1. 通过保持shortcut路径的“纯净”,可以让信息在前向传播和反向传播中平滑传递,这点十分重要。为此,如无必要,不引入1×1卷积等操作,同时将上图灰色路径上的ReLU移到了F(x)路径上。
  2. 在残差路径上,将BN和ReLU统一放在weight前作为pre-activation,获得了“Ease of optimization”以及“Reducing overfitting”的效果。

代码实现

模型需求分析

对模型进行抽象和分析,以便写出更优美的代码~

需要实现:

  1. 两个不同残差模块的类BasicBlockBottleneck;其中的3x3的卷积层1x1的卷积层可以被进一步包装起来,注意在conv3_1、conv4_1和conv5_1 Residual Block,需要降采样1倍,但在conv2_1不需要,因此残差模块需要一个控制是否降采样的参数。
  2. ResNet模型:需要传入block类型及block的层数;
  3. 构建模型的函数:封装构建不同的ResNet。

代码

pytorch官方实现的源码

conv3x3和conv1x1

dilation表示空洞卷积间隔,空洞卷积是什么可以查看的博客:Pytorch与视觉竞赛入门2_冬于的博客-CSDN博客

from typing import Type, Any, Callable, Union, List, Optional

import torch
import torch.nn as nn
from torch import Tensor
def conv3x3(in_planes: int, out_planes: int, stride: int = 1, groups: int = 1, dilation: int = 1) -> nn.Conv2d:
"""3x3 convolution with padding"""
return nn.Conv2d(
in_planes,
out_planes,
kernel_size=3,
stride=stride,
padding=dilation,
groups=groups,
bias=False,
dilation=dilation,
)


def conv1x1(in_planes: int, out_planes: int, stride: int = 1) -> nn.Conv2d:
"""1x1 convolution"""
return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)
BasicBlock
class BasicBlock(nn.Module):
expansion: int = 1

def __init__(
self,
inplanes: int,
planes: int,
stride: int = 1,
downsample: Optional[nn.Module] = None,
groups: int = 1,
base_width: int = 64,
dilation: int = 1,
norm_layer: Optional[Callable[..., nn.Module]] = None,
) -> None:
super().__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2d
if groups != 1 or base_width != 64:
raise ValueError("BasicBlock only supports groups=1 and base_width=64")
if dilation > 1: #空洞卷积
raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
# Both self.conv1 and self.downsample layers downsample the input when stride != 1 如果stride=1就不使用下采样
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = norm_layer(planes)
self.downsample = downsample
self.stride = stride

def forward(self, x: Tensor) -> Tensor:
identity = x

out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)

out = self.conv2(out)
out = self.bn2(out)

if self.downsample is not None:
identity = self.downsample(x)

out += identity
out = self.relu(out)

return out
Bottleneck
class Bottleneck(nn.Module):
# Bottleneck in torchvision places the stride for downsampling at 3x3 convolution(self.conv2)
# while original implementation places the stride at the first 1x1 convolution(self.conv1)
# according to "Deep residual learning for image recognition"https://arxiv.org/abs/1512.03385.
# This variant is also known as ResNet V1.5 and improves accuracy according to
# https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch.

expansion: int = 4

def __init__(
self,
inplanes: int,
planes: int,
stride: int = 1,
downsample: Optional[nn.Module] = None,
groups: int = 1,
base_width: int = 64,
dilation: int = 1,
norm_layer: Optional[Callable[..., nn.Module]] = None,
) -> None:
super().__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2d
width = int(planes * (base_width / 64.0)) * groups
# Both self.conv2 and self.downsample layers downsample the input when stride != 1
self.conv1 = conv1x1(inplanes, width)
self.bn1 = norm_layer(width)
self.conv2 = conv3x3(width, width, stride, groups, dilation)
self.bn2 = norm_layer(width)
self.conv3 = conv1x1(width, planes * self.expansion)
self.bn3 = norm_layer(planes * self.expansion)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
self.stride = stride

def forward(self, x: Tensor) -> Tensor:
identity = x

out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)

out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)

out = self.conv3(out)
out = self.bn3(out)

if self.downsample is not None:
identity = self.downsample(x)

out += identity
out = self.relu(out)

return out

ResNet模型
class ResNet(nn.Module):
def __init__(
self,
block: Type[Union[BasicBlock, Bottleneck]],
layers: List[int],
num_classes: int = 1000,
zero_init_residual: bool = False,
groups: int = 1,
width_per_group: int = 64,
replace_stride_with_dilation: Optional[List[bool]] = None,
norm_layer: Optional[Callable[..., nn.Module]] = None,
) -> None:
super().__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2d
self._norm_layer = norm_layer

self.inplanes = 64
self.dilation = 1
if replace_stride_with_dilation is None:
# each element in the tuple indicates if we should replace
# the 2x2 stride with a dilated convolution instead
replace_stride_with_dilation = [False, False, False]
if len(replace_stride_with_dilation) != 3:
raise ValueError(
"replace_stride_with_dilation should be None "
f"or a 3-element tuple, got {replace_stride_with_dilation}"
)
self.groups = groups
self.base_width = width_per_group
self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)
self.bn1 = norm_layer(self.inplanes)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2, dilate=replace_stride_with_dilation[0])
self.layer3 = self._make_layer(block, 256, layers[2], stride=2, dilate=replace_stride_with_dilation[1])
self.layer4 = self._make_layer(block, 512, layers[3], stride=2, dilate=replace_stride_with_dilation[2])
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.fc = nn.Linear(512 * block.expansion, num_classes)

for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)

# Zero-initialize the last BN in each residual branch,
# so that the residual branch starts with zeros, and each residual block behaves like an identity.
# This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
if zero_init_residual:
for m in self.modules():
if isinstance(m, Bottleneck):
nn.init.constant_(m.bn3.weight, 0) # type: ignore[arg-type]
elif isinstance(m, BasicBlock):
nn.init.constant_(m.bn2.weight, 0) # type: ignore[arg-type]

def _make_layer(
self,
block: Type[Union[BasicBlock, Bottleneck]],
planes: int,
blocks: int,
stride: int = 1,
dilate: bool = False,
) -> nn.Sequential:
norm_layer = self._norm_layer
downsample = None
previous_dilation = self.dilation
if dilate:
self.dilation *= stride
stride = 1
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.Sequential(
conv1x1(self.inplanes, planes * block.expansion, stride),
norm_layer(planes * block.expansion),
)

layers = []
layers.append(
block(
self.inplanes, planes, stride, downsample, self.groups, self.base_width, previous_dilation, norm_layer
)
)
self.inplanes = planes * block.expansion
for _ in range(1, blocks): #只有layer2...layerN内部的第一个残差模块要用下采样
layers.append(
block(
self.inplanes,
planes,
groups=self.groups,
base_width=self.base_width,
dilation=self.dilation,
norm_layer=norm_layer,
)
)

return nn.Sequential(*layers)

def _forward_impl(self, x: Tensor) -> Tensor:
# See note [TorchScript super()]
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)

x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)

x = self.avgpool(x)
x = torch.flatten(x, 1)
x = self.fc(x)

return x

def forward(self, x: Tensor) -> Tensor:
return self._forward_impl(x)
构建模型
def _resnet(
arch: str,
block: Type[Union[BasicBlock, Bottleneck]],
layers: List[int],
pretrained: bool,
progress: bool,
**kwargs: Any,
) -> ResNet:
model = ResNet(block, layers, **kwargs)
if pretrained:
state_dict = load_state_dict_from_url(model_urls[arch], progress=progress)
model.load_state_dict(state_dict)
return model


def resnet18(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
r"""ResNet-18 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
"""
return _resnet("resnet18", BasicBlock, [2, 2, 2, 2], pretrained, progress, **kwargs)


def resnet34(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
r"""ResNet-34 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
"""
return _resnet("resnet34", BasicBlock, [3, 4, 6, 3], pretrained, progress, **kwargs)


def resnet50(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
r"""ResNet-50 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
"""
return _resnet("resnet50", Bottleneck, [3, 4, 6, 3], pretrained, progress, **kwargs)


def resnet101(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
r"""ResNet-101 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
"""
return _resnet("resnet101", Bottleneck, [3, 4, 23, 3], pretrained, progress, **kwargs)


def resnet152(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
r"""ResNet-152 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
"""
return _resnet("resnet152", Bottleneck, [3, 8, 36, 3], pretrained, progress, **kwargs)

打印模型

from torchsummary import summary
net = resnet34()
summary(net, (3, 224, 224))
#打印结果
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 112, 112] 9,408
BatchNorm2d-2 [-1, 64, 112, 112] 128
ReLU-3 [-1, 64, 112, 112] 0
MaxPool2d-4 [-1, 64, 56, 56] 0
Conv2d-5 [-1, 64, 56, 56] 36,864
BatchNorm2d-6 [-1, 64, 56, 56] 128
ReLU-7 [-1, 64, 56, 56] 0
Conv2d-8 [-1, 64, 56, 56] 36,864
BatchNorm2d-9 [-1, 64, 56, 56] 128
ReLU-10 [-1, 64, 56, 56] 0
BasicBlock-11 [-1, 64, 56, 56] 0
Conv2d-12 [-1, 64, 56, 56] 36,864
BatchNorm2d-13 [-1, 64, 56, 56] 128
ReLU-14 [-1, 64, 56, 56] 0
Conv2d-15 [-1, 64, 56, 56] 36,864
BatchNorm2d-16 [-1, 64, 56, 56] 128
ReLU-17 [-1, 64, 56, 56] 0
BasicBlock-18 [-1, 64, 56, 56] 0
Conv2d-19 [-1, 64, 56, 56] 36,864
BatchNorm2d-20 [-1, 64, 56, 56] 128
ReLU-21 [-1, 64, 56, 56] 0
Conv2d-22 [-1, 64, 56, 56] 36,864
BatchNorm2d-23 [-1, 64, 56, 56] 128
ReLU-24 [-1, 64, 56, 56] 0
BasicBlock-25 [-1, 64, 56, 56] 0
Conv2d-26 [-1, 128, 28, 28] 73,728
BatchNorm2d-27 [-1, 128, 28, 28] 256
ReLU-28 [-1, 128, 28, 28] 0
Conv2d-29 [-1, 128, 28, 28] 147,456
BatchNorm2d-30 [-1, 128, 28, 28] 256
Conv2d-31 [-1, 128, 28, 28] 8,192
BatchNorm2d-32 [-1, 128, 28, 28] 256
ReLU-33 [-1, 128, 28, 28] 0
BasicBlock-34 [-1, 128, 28, 28] 0
Conv2d-35 [-1, 128, 28, 28] 147,456
BatchNorm2d-36 [-1, 128, 28, 28] 256
ReLU-37 [-1, 128, 28, 28] 0
Conv2d-38 [-1, 128, 28, 28] 147,456
BatchNorm2d-39 [-1, 128, 28, 28] 256
ReLU-40 [-1, 128, 28, 28] 0
BasicBlock-41 [-1, 128, 28, 28] 0
Conv2d-42 [-1, 128, 28, 28] 147,456
BatchNorm2d-43 [-1, 128, 28, 28] 256
ReLU-44 [-1, 128, 28, 28] 0
Conv2d-45 [-1, 128, 28, 28] 147,456
BatchNorm2d-46 [-1, 128, 28, 28] 256
ReLU-47 [-1, 128, 28, 28] 0
BasicBlock-48 [-1, 128, 28, 28] 0
Conv2d-49 [-1, 128, 28, 28] 147,456
BatchNorm2d-50 [-1, 128, 28, 28] 256
ReLU-51 [-1, 128, 28, 28] 0
Conv2d-52 [-1, 128, 28, 28] 147,456
BatchNorm2d-53 [-1, 128, 28, 28] 256
ReLU-54 [-1, 128, 28, 28] 0
BasicBlock-55 [-1, 128, 28, 28] 0
Conv2d-56 [-1, 256, 14, 14] 294,912
BatchNorm2d-57 [-1, 256, 14, 14] 512
ReLU-58 [-1, 256, 14, 14] 0
Conv2d-59 [-1, 256, 14, 14] 589,824
BatchNorm2d-60 [-1, 256, 14, 14] 512
Conv2d-61 [-1, 256, 14, 14] 32,768
BatchNorm2d-62 [-1, 256, 14, 14] 512
ReLU-63 [-1, 256, 14, 14] 0
BasicBlock-64 [-1, 256, 14, 14] 0
Conv2d-65 [-1, 256, 14, 14] 589,824
BatchNorm2d-66 [-1, 256, 14, 14] 512
ReLU-67 [-1, 256, 14, 14] 0
Conv2d-68 [-1, 256, 14, 14] 589,824
BatchNorm2d-69 [-1, 256, 14, 14] 512
ReLU-70 [-1, 256, 14, 14] 0
BasicBlock-71 [-1, 256, 14, 14] 0
Conv2d-72 [-1, 256, 14, 14] 589,824
BatchNorm2d-73 [-1, 256, 14, 14] 512
ReLU-74 [-1, 256, 14, 14] 0
Conv2d-75 [-1, 256, 14, 14] 589,824
BatchNorm2d-76 [-1, 256, 14, 14] 512
ReLU-77 [-1, 256, 14, 14] 0
BasicBlock-78 [-1, 256, 14, 14] 0
Conv2d-79 [-1, 256, 14, 14] 589,824
BatchNorm2d-80 [-1, 256, 14, 14] 512
ReLU-81 [-1, 256, 14, 14] 0
Conv2d-82 [-1, 256, 14, 14] 589,824
BatchNorm2d-83 [-1, 256, 14, 14] 512
ReLU-84 [-1, 256, 14, 14] 0
BasicBlock-85 [-1, 256, 14, 14] 0
Conv2d-86 [-1, 256, 14, 14] 589,824
BatchNorm2d-87 [-1, 256, 14, 14] 512
ReLU-88 [-1, 256, 14, 14] 0
Conv2d-89 [-1, 256, 14, 14] 589,824
BatchNorm2d-90 [-1, 256, 14, 14] 512
ReLU-91 [-1, 256, 14, 14] 0
BasicBlock-92 [-1, 256, 14, 14] 0
Conv2d-93 [-1, 256, 14, 14] 589,824
BatchNorm2d-94 [-1, 256, 14, 14] 512
ReLU-95 [-1, 256, 14, 14] 0
Conv2d-96 [-1, 256, 14, 14] 589,824
BatchNorm2d-97 [-1, 256, 14, 14] 512
ReLU-98 [-1, 256, 14, 14] 0
BasicBlock-99 [-1, 256, 14, 14] 0
Conv2d-100 [-1, 512, 7, 7] 1,179,648
BatchNorm2d-101 [-1, 512, 7, 7] 1,024
ReLU-102 [-1, 512, 7, 7] 0
Conv2d-103 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-104 [-1, 512, 7, 7] 1,024
Conv2d-105 [-1, 512, 7, 7] 131,072
BatchNorm2d-106 [-1, 512, 7, 7] 1,024
ReLU-107 [-1, 512, 7, 7] 0
BasicBlock-108 [-1, 512, 7, 7] 0
Conv2d-109 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-110 [-1, 512, 7, 7] 1,024
ReLU-111 [-1, 512, 7, 7] 0
Conv2d-112 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-113 [-1, 512, 7, 7] 1,024
ReLU-114 [-1, 512, 7, 7] 0
BasicBlock-115 [-1, 512, 7, 7] 0
Conv2d-116 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-117 [-1, 512, 7, 7] 1,024
ReLU-118 [-1, 512, 7, 7] 0
Conv2d-119 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-120 [-1, 512, 7, 7] 1,024
ReLU-121 [-1, 512, 7, 7] 0
BasicBlock-122 [-1, 512, 7, 7] 0
AdaptiveAvgPool2d-123 [-1, 512, 1, 1] 0
Linear-124 [-1, 1000] 513,000
================================================================
Total params: 21,797,672
Trainable params: 21,797,672
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 96.29
Params size (MB): 83.15
Estimated Total Size (MB): 180.01
----------------------------------------------------------------

参考资料

ResNet详解与分析

pytorch官方实现的源码