Recognizing the MNIST dataset with a BP neural network in Python (mini-batch version)

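Contents:

1. The MNIST dataset

1.1 What the MNIST dataset is

1.2 Reading the MNIST dataset

2. The neural network

2.1 Batching the data

2.2 Forward propagation

2.2.1 The sigmoid and softmax functions

2.2.2 Loss functions

2.2.3 Recognition accuracy

2.3 Backpropagation

2.4 Building the neural network

3. Training the neural network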

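1. The MNIST dataset

Neural network examples in machine learning and deep learning commonly use the MNIST train_img and test_img arrays as the network input, and the corresponding train_label and test_label arrays as the supervision labels.
This section briefly introduces the MNIST dataset and how to load and preprocess it.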

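1.1 What the MNIST dataset is
MNIST is a dataset of handwritten digit images. It contains 60000 training samples and 10000 test samples: 60000 train_img images with their train_label labels, and 10000 test_img images with their test_label labels.

train_img and test_img hold the images: train_img is the training data for the network and test_img is the test data. Each image is 28×28 pixels and is flattened into 28×28 = 784 pixel values, each between 0 and 255, where the value is the gray level. One image is therefore a 1×784 matrix, which serves as the network input. The network output is a 1×10 matrix, e.g. [0.01, 0.01, 0.01, 0.04, 0.8, 0.01, 0.1, 0.01, 0.01, 0.01]. Each entry is the predicted probability of one class; here 0.8 is the probability assigned to the fifth entry.

train_label and test_label are the labels for the training and test data. Each label can be viewed as a 1×10 matrix in one-hot form (only the correct class is 1). When one_hot_label is True, the labels are returned as one-hot arrays, e.g. [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]. The 1 sits in the fifth position, i.e. index 4. Because the ten positions correspond to the digits 0 to 9, this label stands for the digit 4.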
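The one-hot representation described above can be sketched with NumPy as follows. This is only an illustration: `to_one_hot` is a hypothetical helper introduced here, not a function from mnist.py.

```python
import numpy as np

def to_one_hot(labels, num_classes=10):
    """Convert integer class labels (hypothetical helper) to one-hot rows."""
    one_hot = np.zeros((labels.size, num_classes))
    one_hot[np.arange(labels.size), labels] = 1  # set a single 1 per row
    return one_hot

labels = np.array([4, 0, 9])
encoded = to_one_hot(labels)
decoded = encoded.argmax(axis=1)  # argmax recovers the integer labels

print(encoded[0])  # the row for the digit 4 has its 1 at index 4
print(decoded)
```

The round trip through `argmax` is the same operation the loss and accuracy code later in this article uses to turn one-hot labels back into class indices.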

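1.2 Reading the MNIST dataset
load_mnist(normalize=True, flatten=True, one_hot_label=False) takes three parameters:
normalize: whether to scale the pixel values to the range 0.0~1.0 (normalizing the pixel values helps accuracy)
flatten: whether to flatten each image into a one-dimensional array
one_hot_label: whether to return the labels in one-hot form.
The source is in mnist.py at https://gitee.com/ldy1118/netural-network and can be called directly. The MNIST dataset must be downloaded beforehand from the official MNIST page, http://yann.lecun.com/exdb/mnist/ (the four files shown in red), and the files must be placed in the same directory as mnist.py: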

from mnist import load_mnist

(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, flatten=True, one_hot_label=True)
print(x_train.shape, t_train.shape, x_test.shape, t_test.shape)

Output: (60000, 784) (60000, 10) (10000, 784) (10000, 10)
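To make the effect of the flatten and normalize flags concrete, here is a small sketch on synthetic data. It assumes the flags behave as described above (reshape to 784 values, divide by 255); it is not the actual code of mnist.py, and the random `raw` array only stands in for real images.

```python
import numpy as np

# Stand-in for raw MNIST images: 5 images of 28x28 pixels with values 0..255.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

flat = raw.reshape(raw.shape[0], -1)      # flatten: (5, 28, 28) -> (5, 784)
scaled = flat.astype(np.float32) / 255.0  # normalize: 0..255 -> 0.0..1.0

print(flat.shape, scaled.dtype)
```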

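2. The neural network

2.1 Batching the data

With the dataset loaded, we build a two-layer neural network (two weight matrices and one hidden layer). The numbers of input and output nodes are fixed at 784 and 10. The number of hidden nodes is a free choice with no strict rule, so 50 is used here. With the structure fixed, the computation for a single sample $x$ (a 1×784 matrix), with $w_1$ of shape 784×50 and $w_2$ of shape 50×10, is:

$$a_1 = x w_1 + b_1, \quad z_1 = \mathrm{sigmoid}(a_1), \quad a_2 = z_1 w_2 + b_2, \quad y = \mathrm{softmax}(a_2)$$

In practice, training the network n times on each sample individually, for all 60000 samples, is a very large amount of computation. The mini-batch method is therefore used to pick the data in batches: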

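import numpy as np
from mnist import load_mnist

# Load the data:
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, flatten=True, one_hot_label=True)
train_size = x_train.shape[0]  # 60000
batch_size = 100
epoch = 20000  # number of iterations; each iteration uses one mini-batch
for i in range(epoch):
    batch_mask = np.random.choice(train_size, batch_size)  # 100 random indices in [0, 60000)
    x_batch = x_train[batch_mask]  # the rows of x_train at those indices form one batch
    y_batch = net.predict(x_batch)  # predictions for this batch (net is the TwoLayerNet from section 2.4)
    t_batch = t_train[batch_mask]  # matching labels, selected the same way as x_batch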
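The mini-batch selection can be tried without the dataset. The sketch below uses small random arrays as stand-ins for x_train and t_train (1000 rows instead of 60000, an assumption made only to keep it light). Note that random choice samples with replacement by default, so an index can appear twice in one batch.

```python
import numpy as np

rng = np.random.default_rng(0)
train_size, batch_size = 1000, 100  # 1000 stands in for the real 60000

# Stand-in arrays with the same column layout as x_train and t_train.
x_train = rng.random((train_size, 784))
t_train = np.eye(10)[rng.integers(0, 10, size=train_size)]  # one-hot labels

batch_mask = rng.choice(train_size, batch_size)  # with replacement by default
x_batch = x_train[batch_mask]  # fancy indexing picks the sampled rows
t_batch = t_train[batch_mask]  # same mask keeps images and labels aligned

print(x_batch.shape, t_batch.shape)
```

Passing `replace=False` to the sampler would guarantee distinct indices within a batch; the article's code keeps the default.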

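2.2 Forward propagation
For forward propagation we define a function that takes the input data and returns the predictions:

def predict(x):
    # w1, b1, w2, b2 are the network's weights and biases
    a1 = np.dot(x, w1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, w2) + b2
    y = softmax(a2)
    return y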

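2.2.1 The sigmoid and softmax functions

During forward propagation the input data is turned into predictions, and activation functions are needed to compute the output of each node. Here the sigmoid and softmax functions are used:

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}, \qquad \mathrm{softmax}(a)_k = \frac{e^{a_k}}{\sum_i e^{a_i}}$$

Note that y = softmax(a) does not map one input to one output: every output $y_k$ depends on all of the inputs $a_i$.
The derivative of sigmoid is needed later; it is $\mathrm{sigmoid}'(x) = (1 - \mathrm{sigmoid}(x))\,\mathrm{sigmoid}(x)$. The code for sigmoid, its derivative, and softmax is as follows: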

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))  
    
def sigmoid_grad(x):
    return (1.0 - sigmoid(x)) * sigmoid(x)

def softmax(x):
    if x.ndim == 2:
        x = x.T
        x = x - np.max(x, axis=0)  # subtract each sample's max so np.exp cannot overflow
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T 

    x = x - np.max(x) 
    return np.exp(x) / np.sum(np.exp(x))
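A quick way to gain confidence in these functions is to check two properties: each softmax row sums to 1 even for very large inputs, and sigmoid_grad agrees with a finite-difference estimate. The sketch below uses a `keepdims` variant of softmax, which is an assumed equivalent of the transpose version above rather than the article's own code.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    return (1.0 - sigmoid(x)) * sigmoid(x)

def softmax(x):
    # Row-wise softmax; subtracting the row max avoids overflow in np.exp.
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])  # a naive exp would overflow on row 2
probs = softmax(scores)
row_sums = probs.sum(axis=1)

# Central-difference check of sigmoid_grad at a few points.
x = np.array([-2.0, 0.0, 1.5])
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
max_err = np.max(np.abs(numeric - sigmoid_grad(x)))

print(row_sums, max_err)
```

Both rows give the same probabilities because softmax is unchanged when a constant is added to every input, which is exactly why subtracting the maximum is safe.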

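2.2.2 Loss functions
At this point we have the network's prediction for one sample, a 1×10 matrix. How do we measure how good the network is? This is what the loss function is for. Common choices are the mean squared error and the cross-entropy error:

$$E_{\mathrm{MSE}} = \frac{1}{2} \sum_k (y_k - t_k)^2, \qquad E_{\mathrm{CE}} = -\sum_k t_k \log y_k$$

Here $y_k$ is the prediction at the k-th output node and $t_k$ is the one-hot value at the k-th position of the label. Take the earlier example (the prediction for a handwritten digit and its label, whose 1 sits at index 4, i.e. the digit 4):
y = [0.01, 0.01, 0.01, 0.04, 0.8, 0.01, 0.1, 0.01, 0.01, 0.01]
t = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
Note that in the cross-entropy error only one $t_k$ is 1 and the rest are 0, so the cross-entropy error for this sample is E = -log 0.8 ≈ 0.223 (natural logarithm).
Cross-entropy is chosen as the loss function here, implemented as follows: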

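def loss(y, t):
    # If the labels are one-hot vectors, convert them to the indices of the correct classes
    if t.size == y.size:
        t = t.argmax(axis=1)  # index of the largest value in each row

    batch_size = y.shape[0]  # batch size, i.e. the number of rows of y
    s = y[np.arange(batch_size), t]  # predicted probability of the correct class for each sample
    # Adding 1e-7 keeps log from returning -inf when s is 0;
    # dividing by batch_size averages the loss that np.sum added up
    return -np.sum(np.log(s + 1e-7)) / batch_size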
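As a worked check of the numbers in this section, the sketch below evaluates both loss functions on the example prediction and label. The formulas are the standard mean squared error and cross-entropy definitions; the variable names are chosen here for illustration.

```python
import numpy as np

y = np.array([[0.01, 0.01, 0.01, 0.04, 0.8, 0.01, 0.1, 0.01, 0.01, 0.01]])
t = np.array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]])

# Cross-entropy: only the term where t is 1 survives, giving -log(0.8).
cross_entropy = -np.sum(t * np.log(y + 1e-7)) / y.shape[0]

# Mean squared error with the conventional factor 1/2.
mean_squared = 0.5 * np.sum((y - t) ** 2) / y.shape[0]

print(round(cross_entropy, 3))  # 0.223
```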

2.2.3 Recognition accuracy

The code is short:

def accuracy(x, t):
    y = predict(x)  # y is 100×10 for a batch of 100 samples
    p = np.argmax(y, axis=1)  # predicted class of each row, an array of length 100
    q = np.argmax(t, axis=1)  # true class of each row, an array of length 100
    acc = np.sum(p == q) / len(y)  # count the matches, then divide by the number of samples
    return acc
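The accuracy computation is easy to trace by hand on a tiny batch. The arrays below are made-up values with 3 classes and 4 samples, used only to show how argmax and the boolean comparison combine.

```python
import numpy as np

y = np.array([[0.1, 0.7, 0.2],
              [0.8, 0.1, 0.1],
              [0.3, 0.3, 0.4],
              [0.2, 0.5, 0.3]])  # predicted probabilities
t = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 1, 0]])        # one-hot labels

p = np.argmax(y, axis=1)  # predicted classes: [1, 0, 2, 1]
q = np.argmax(t, axis=1)  # true classes:      [1, 0, 1, 1]
acc = np.sum(p == q) / len(y)  # three of four rows match

print(acc)  # 0.75
```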

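This completes the forward pass. To recap: the goal is to find w1, b1, w2, b2 such that the predictions for the input data match the labels as closely as possible, and the loss function measures how far off the network is. We therefore take the partial derivatives of the loss with respect to w1, b1, w2, b2 to form the gradient; each partial derivative expresses how strongly a change in that parameter changes the loss. The parameters from the first iteration are then updated using these derivatives (not by simply adding them: each parameter is moved against its gradient, scaled by a learning rate, as shown later), and the second iteration uses the updated w1, b1, w2, b2. This step is called backpropagation. One forward pass plus one backward pass makes up one iteration. The next section covers stochastic gradient descent in backpropagation.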
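The update rule itself can be seen in isolation on a toy problem. The sketch below minimizes f(w) = sum(w**2), a function chosen here only because its gradient 2w is known exactly; it is not part of the MNIST network.

```python
import numpy as np

w = np.array([3.0, -2.0])  # starting parameters
lr = 0.1                   # learning rate

for _ in range(100):
    grad = 2 * w    # exact gradient of f(w) = sum(w**2)
    w -= lr * grad  # move against the gradient, scaled by lr

print(w)  # both entries end up very close to 0, the minimum of f
```

Each step multiplies w by (1 - 2 * lr) = 0.8, so the parameters shrink toward the minimum geometrically. The network training later in this article applies the same subtraction to w1, b1, w2, b2.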

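2.3 Backpropagation

Computing the gradient
The partial derivatives are obtained with the chain rule. The variables that lie between the prediction y and w1, b1, w2, b2 are:

$$x \xrightarrow{w_1,\, b_1} a_1 \xrightarrow{\mathrm{sigmoid}} z_1 \xrightarrow{w_2,\, b_2} a_2 \xrightarrow{\mathrm{softmax}} y \rightarrow L$$

The partial derivatives of the loss $L$ with respect to w1, b1, w2, b2, for a batch of $N$ samples, are:

$$\frac{\partial L}{\partial a_2} = \frac{y - t}{N}, \quad \frac{\partial L}{\partial w_2} = z_1^{\top} \frac{\partial L}{\partial a_2}, \quad \frac{\partial L}{\partial b_2} = \sum_{n} \left(\frac{\partial L}{\partial a_2}\right)_n$$

$$\frac{\partial L}{\partial a_1} = \left(\frac{\partial L}{\partial a_2} w_2^{\top}\right) \odot z_1 \odot (1 - z_1), \quad \frac{\partial L}{\partial w_1} = x^{\top} \frac{\partial L}{\partial a_1}, \quad \frac{\partial L}{\partial b_1} = \sum_{n} \left(\frac{\partial L}{\partial a_1}\right)_n$$

Pay attention to matrix derivatives: after computing each one, check that its shape matches the shape of the parameter. Also distinguish the matrix product (np.dot) from the element-wise product (*, written $\odot$ here).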
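A common way to validate a backpropagation derivation is a numerical gradient check: compare the analytic gradients with central finite differences on a tiny network. The sketch below does this for a sigmoid hidden layer with a softmax and cross-entropy output. The network sizes, the helper names (`forward_loss`, `analytic_grads`, `numeric_grads`) and the random data are all assumptions made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    x = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=1, keepdims=True)

def forward_loss(params, x, t):
    # Mean cross-entropy of the two-layer network on one batch.
    z1 = sigmoid(x @ params['w1'] + params['b1'])
    y = softmax(z1 @ params['w2'] + params['b2'])
    return -np.sum(t * np.log(y + 1e-7)) / x.shape[0]

def analytic_grads(params, x, t):
    # Backpropagation via the chain rule.
    z1 = sigmoid(x @ params['w1'] + params['b1'])
    y = softmax(z1 @ params['w2'] + params['b2'])
    d_a2 = (y - t) / x.shape[0]                       # softmax + cross-entropy
    d_a1 = (d_a2 @ params['w2'].T) * z1 * (1 - z1)    # through sigmoid
    return {'w1': x.T @ d_a1, 'b1': d_a1.sum(axis=0),
            'w2': z1.T @ d_a2, 'b2': d_a2.sum(axis=0)}

def numeric_grads(params, x, t, h=1e-4):
    # Central differences, perturbing one parameter entry at a time.
    grads = {}
    for key, p in params.items():
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + h
            plus = forward_loss(params, x, t)
            p[idx] = old - h
            minus = forward_loss(params, x, t)
            p[idx] = old  # restore the original value
            g[idx] = (plus - minus) / (2 * h)
        grads[key] = g
    return grads

rng = np.random.default_rng(0)
params = {'w1': 0.1 * rng.standard_normal((6, 4)), 'b1': np.zeros(4),
          'w2': 0.1 * rng.standard_normal((4, 3)), 'b2': np.zeros(3)}
x = rng.random((5, 6))
t = np.eye(3)[rng.integers(0, 3, size=5)]  # one-hot labels

ana = analytic_grads(params, x, t)
num = numeric_grads(params, x, t)
max_diff = max(np.max(np.abs(ana[k] - num[k])) for k in params)
print(max_diff)
```

A very small `max_diff` indicates that the chain-rule formulas and their shapes are right. The numerical version is far too slow for training, which is why backpropagation is used in practice.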

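2.4 Building the neural network
The previous sections defined the prediction function predict, the loss function loss, the recognition accuracy accuracy, and the gradient. Now we build a neural network class and add these as its methods: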

import numpy as np
from functions import sigmoid, sigmoid_grad, softmax, loss

class TwoLayerNet:

    def __init__(self, input_size, hidden_size, output_size, weight_init_std):
        # Initialize the weights
        self.dict = {}  # dictionary that stores w1, b1, w2, b2
        self.dict['w1'] = weight_init_std * np.random.randn(input_size, hidden_size)  
        self.dict['b1'] = np.zeros(hidden_size)  
        self.dict['w2'] = weight_init_std * np.random.randn(hidden_size, output_size) 
        self.dict['b2'] = np.zeros(output_size) 

    def predict(self, x):
        w1, w2 = self.dict['w1'], self.dict['w2']
        b1, b2 = self.dict['b1'], self.dict['b2']

        a1 = np.dot(x, w1) + b1
        z1 = sigmoid(a1)
        a2 = np.dot(z1, w2) + b2
        y = softmax(a2)

        return y
        
    def loss(self, y, t):
        # If the labels are one-hot vectors, convert them to class indices
        if t.size == y.size:
            t = t.argmax(axis=1)

        batch_size = y.shape[0]

        return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size

    def gradient(self, x, t):
        w1, w2 = self.dict['w1'], self.dict['w2']
        b1, b2 = self.dict['b1'], self.dict['b2']
        grads = {}

        a1 = np.dot(x, w1) + b1
        z1 = sigmoid(a1)
        a2 = np.dot(z1, w2) + b2
        y = softmax(a2)

        num = x.shape[0]
        dy = (y - t) / num
        grads['w2'] = np.dot(z1.T, dy)
        grads['b2'] = np.sum(dy, axis=0)

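        dz1 = np.dot(dy, w2.T)  # gradient with respect to the hidden activation z1
        da1 = sigmoid_grad(a1) * dz1  # gradient with respect to the pre-activation a1
        grads['w1'] = np.dot(x.T, da1)
        grads['b1'] = np.sum(da1, axis=0)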

        return grads

    def accuracy(self,x,t):
        y = self.predict(x)
        p = np.argmax(y, axis=1)
        q = np.argmax(t, axis=1)
        acc = np.sum(p == q) / len(y)
        return acc

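3. Training the neural network

The network can now compute predictions, the loss, the accuracy, and the gradients needed for stochastic gradient descent, so all that remains is to run the iterations. To monitor how training progresses, the accuracy on the training data and on the test data is recorded once per epoch (every 600 iterations). The implementation is as follows: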

import numpy as np
import matplotlib.pyplot as plt
from TwoLayerNet import TwoLayerNet
from mnist import load_mnist

(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)
net = TwoLayerNet(input_size=784, hidden_size=50, output_size=10, weight_init_std=0.01)

epoch = 20000  # number of iterations (mini-batch updates)
batch_size = 100
lr = 0.1  # learning rate

train_size = x_train.shape[0]  # 60000
iter_per_epoch = max(train_size // batch_size, 1)  # 600 iterations per epoch

train_loss_list = []
train_acc_list = []
test_acc_list = []

for i in range(epoch):
    batch_mask = np.random.choice(train_size, batch_size)  # 100 random indices in [0, 60000)
    x_batch = x_train[batch_mask]
    y_batch = net.predict(x_batch)
    t_batch = t_train[batch_mask]
    grad = net.gradient(x_batch, t_batch)

    # Gradient descent step on every parameter
    for key in ('w1', 'b1', 'w2', 'b2'):
        net.dict[key] -= lr * grad[key]
    loss = net.loss(y_batch, t_batch)  # loss of the predictions made before this update
    train_loss_list.append(loss)

    # Once per epoch, record the accuracies and print them with the current loss
    if i % iter_per_epoch == 0:
        train_acc = net.accuracy(x_train, t_train)
        test_acc = net.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print(f'iteration {i + 1}: train_acc, test_acc, loss = {train_acc}, {test_acc}, {loss}')

# Plot accuracy as a function of the number of epochs
x = np.arange(len(train_acc_list))
plt.plot(x, train_acc_list, label='train acc')
plt.plot(x, test_acc_list, label='test acc', linestyle='--')
plt.xlabel("epochs")
plt.ylabel("accuracy")
plt.ylim(0, 1.0)
plt.legend(loc='lower right')
plt.show()

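After running, the recognition accuracy reaches nearly 96% and levels off after about 15 epochs. Because the initial weights are random (the biases start at zero) and the mini-batches are sampled randomly, the results may differ from run to run. The source code is in my Gitee repository.

I am new to Python and deep learning, and the derivations were typed by hand. If you find errors in this article, corrections are welcome!

Reference book: 《深度学习入门：基于Python的理论与实现》 (Deep Learning from Scratch)

Article URL: https://blog.csdn.net/qq_39474621/article/details/107442903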
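If repeatable runs are wanted, one option is to fix NumPy's global seed before the weights are drawn. The sketch below shows the effect on the first weight matrix; `init_weights` is a hypothetical helper that mirrors the initialization in TwoLayerNet, and seeding is an addition not present in the original code. It also works out how many epochs the 20000 iterations correspond to.

```python
import numpy as np

def init_weights(seed, input_size=784, hidden_size=50, weight_init_std=0.01):
    # Hypothetical helper: same formula as TwoLayerNet's w1, with a fixed seed.
    np.random.seed(seed)
    return weight_init_std * np.random.randn(input_size, hidden_size)

w_a = init_weights(seed=0)
w_b = init_weights(seed=0)  # same seed, identical weights
w_c = init_weights(seed=1)  # different seed, different weights

# 20000 iterations of 100 samples each, over 60000 training samples.
iterations, batch_size, train_size = 20000, 100, 60000
epochs_seen = iterations * batch_size / train_size

print(np.array_equal(w_a, w_b), np.array_equal(w_a, w_c), epochs_seen)
```

Seeding once at the top of the training script would also fix the mini-batch selection, since it uses the same global random generator.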
