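TensorFlow2_200729 series---8. Forward propagation (tensors) in practice

I. Summary

In one sentence:

A. Manually implement (to simulate the underlying principle) the computation of a multi-layer, multi-node neural network: 784 (input) -> 256 -> 128 -> 10 (output).

B. For a network with many nodes, matrix computation is far more convenient than writing out the algebra node by node.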

for epoch in range(10): # iterate over the dataset 10 times
    for step, (x, y) in enumerate(train_db): # for every batch
        # x: [128, 28, 28]
        # y: [128]

        # reshape; -1 lets TensorFlow infer the batch dimension
        # [b, 28, 28] => [b, 28*28]
        x = tf.reshape(x, [-1, 28*28])

        # record operations for automatic differentiation
        with tf.GradientTape() as tape: # trainable tf.Variable objects are watched automatically
            # x: [b, 28*28]
            # h1 = x@w1 + b1
            # [b, 784]@[784, 256] + [256] => [b, 256] + [256] => [b, 256] + [b, 256]
            # matrix operations keep this short; written element by element
            # it would take several nested loops
            # (broadcast_to is spelled out for illustration; x@w1 + b1 broadcasts automatically)
            h1 = x@w1 + tf.broadcast_to(b1, [x.shape[0], 256])
            h1 = tf.nn.relu(h1)
            # [b, 256] => [b, 128]
            h2 = h1@w2 + b2
            h2 = tf.nn.relu(h2)
            # [b, 128] => [b, 10]
            out = h2@w3 + b3

            # compute loss
            # convert the labels to one-hot encoding
            # out: [b, 10]
            # y: [b] => [b, 10]
            y_onehot = tf.one_hot(y, depth=10)

            # mse = mean((y - out)^2)
            # [b, 10]
            loss = tf.square(y_onehot - out)
            # mean: scalar
            loss = tf.reduce_mean(loss)

        # compute gradients
        grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
        # print(grads)
        # w1 = w1 - lr * w1_grad
        # update in place: only the value changes, w1 stays a tf.Variable
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])
        w2.assign_sub(lr * grads[2])
        b2.assign_sub(lr * grads[3])
        w3.assign_sub(lr * grads[4])
        b3.assign_sub(lr * grads[5])

        if step % 100 == 0:
            print(epoch, step, 'loss:', float(loss))
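To make the remark about matrix operations concrete, here is a minimal sketch that computes one small layer twice: once with `x @ w + b` and once with explicit nested loops. The tiny shapes (2, 3, 4) and the variable names are assumptions chosen only for illustration; they are not the sizes used in this post.

```python
import tensorflow as tf

# Tiny, made-up shapes: batch of 2, 3 inputs, 4 outputs
x = tf.random.normal([2, 3])
w = tf.random.normal([3, 4])
b = tf.random.normal([4])

# Matrix form: a single line
matrix_out = (x @ w + b).numpy()

# Element-by-element form: three nested loops for the same result
x_np, w_np, b_np = x.numpy(), w.numpy(), b.numpy()
loop_out = [[0.0] * 4 for _ in range(2)]
for i in range(2):          # over samples in the batch
    for j in range(4):      # over output nodes
        total = float(b_np[j])
        for k in range(3):  # over input nodes
            total += float(x_np[i, k]) * float(w_np[k, j])
        loop_out[i][j] = total

# Largest difference between the two results (only float rounding is expected)
max_diff = max(
    abs(loop_out[i][j] - float(matrix_out[i, j]))
    for i in range(2) for j in range(4)
)
print(max_diff)
```

Both versions compute the same numbers; the matrix form simply hides the loops inside one operation.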

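1. In handwritten digit recognition the dimensions shrink step by step, [784]->[256]->[128]->[10]. What does this neural network look like?

The input is an image, which corresponds to 784 input nodes, followed by a layer of 256 nodes. The weight w of the first layer therefore has shape [784, 256], i.e. 784*256 values.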
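As a quick check of these numbers, the following sketch counts the weights and biases of every layer in the 784 -> 256 -> 128 -> 10 network. The helper name `count_parameters` is hypothetical and exists only for this illustration.

```python
# Layer sizes of the network in this post: 784 -> 256 -> 128 -> 10
layer_sizes = [784, 256, 128, 10]

def count_parameters(sizes):
    """Return a list of (weight_count, bias_count), one entry per layer."""
    counts = []
    for dim_in, dim_out in zip(sizes[:-1], sizes[1:]):
        # w has shape [dim_in, dim_out], b has shape [dim_out]
        counts.append((dim_in * dim_out, dim_out))
    return counts

per_layer = count_parameters(layer_sizes)
total = sum(w + b for w, b in per_layer)
print(per_layer)  # [(200704, 256), (32768, 128), (1280, 10)]
print(total)      # 235146
```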

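2. What is tf.random.truncated_normal()?

It samples from a truncated normal distribution: values more than two standard deviations from the mean are discarded and redrawn. With a sigmoid activation this is a better choice for initialization, because it avoids large initial weights that push the activation into the flat tails on either side, where the gradient vanishes.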
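The truncation can be observed directly. A minimal sketch, assuming the same shape and `stddev=0.1` used for `w1` in this post: every sampled value should lie within two standard deviations of the mean, i.e. within [-0.2, 0.2].

```python
import tensorflow as tf

stddev = 0.1
w = tf.random.truncated_normal([784, 256], stddev=stddev)

# Largest magnitude among all sampled values
max_abs = float(tf.reduce_max(tf.abs(w)))
print(max_abs)  # not larger than 2 * stddev
```

A plain `tf.random.normal` with the same `stddev` has no such bound, so occasional large values are possible there.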

3. Example of initializing the network parameters (the 784*256 weights and the 256 biases of the first layer)?

A. w1 = tf.Variable(tf.random.truncated_normal([784, 256], stddev=0.1))

B. b1 = tf.Variable(tf.zeros([256]))

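4. Why is the initial data wrapped in tf.Variable, e.g. w1 = tf.Variable(tf.random.truncated_normal([784, 256], stddev=0.1))?

tf.GradientTape automatically tracks gradients only for (trainable) tf.Variable objects. A plain tensor is not tracked unless it is explicitly passed to tape.watch().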
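A small sketch of this behaviour, using made-up scalar values: the gradient with respect to a `tf.Variable` is computed, while the gradient with respect to a plain constant comes back as `None` unless the tape is told to watch it.

```python
import tensorflow as tf

w_var = tf.Variable(2.0)     # tracked automatically
w_const = tf.constant(2.0)   # a plain tensor, not tracked by default

with tf.GradientTape() as tape:
    y = w_var * w_var + w_const * w_const

grad_var, grad_const = tape.gradient(y, [w_var, w_const])
print(grad_var)    # d(w^2)/dw at w = 2 is 4
print(grad_const)  # None: the constant was never watched

# The same constant becomes differentiable once it is watched explicitly
with tf.GradientTape() as tape2:
    tape2.watch(w_const)
    y2 = w_const * w_const
grad_watched = tape2.gradient(y2, w_const)
print(grad_watched)
```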

5. Computation of the first layer (h1 = relu(x@w1 + b1))?

h1 = x@w1 + tf.broadcast_to(b1, [x.shape[0], 256])

h1 = tf.nn.relu(h1)
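The explicit `tf.broadcast_to` spells out what TensorFlow would otherwise do on its own. The sketch below, with a made-up batch of 4 random inputs and a bias filled with 0.5 (both are assumptions for illustration), checks that the explicit and the implicit form give the same result.

```python
import tensorflow as tf

x = tf.random.normal([4, 784])  # hypothetical batch of 4 flattened images
w1 = tf.Variable(tf.random.truncated_normal([784, 256], stddev=0.1))
b1 = tf.Variable(tf.fill([256], 0.5))  # non-zero so the bias actually matters

# Explicit: expand b1 from [256] to [4, 256] before adding
explicit = x @ w1 + tf.broadcast_to(b1, [x.shape[0], 256])
# Implicit: let the + operator broadcast [256] against [4, 256]
implicit = x @ w1 + b1

max_diff = float(tf.reduce_max(tf.abs(explicit - implicit)))
h1 = tf.nn.relu(explicit)
print(max_diff, h1.shape)
```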

6. Code for converting labels to one-hot encoding in TensorFlow?

y_onehot = tf.one_hot(y, depth=10)
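A short sketch with three made-up labels shows what the call produces: each label becomes a row of length 10 with a single 1 at the label's index, and `tf.argmax` recovers the original labels.

```python
import tensorflow as tf

y = tf.constant([0, 2, 9], dtype=tf.int32)  # hypothetical labels
y_onehot = tf.one_hot(y, depth=10)

print(y_onehot.shape)       # (3, 10)
print(y_onehot[1].numpy())  # 1.0 at index 2, zeros elsewhere

# argmax along the class axis gives the labels back
recovered = tf.argmax(y_onehot, axis=1).numpy().tolist()
print(recovered)            # [0, 2, 9]
```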

7. Code for the update w1 = w1 - lr * w1_grad?

w1.assign_sub(lr * grads[0]) # update in place: only the value changes, the type stays tf.Variable
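Why not simply write `w1 = w1 - lr * grads[0]`? A minimal sketch with made-up numbers: the subtraction returns a plain tensor, so rebinding the name would lose the `tf.Variable` (and with it the automatic gradient tracking in the next step), whereas `assign_sub` changes the value and keeps the variable.

```python
import tensorflow as tf

w = tf.Variable([1.0, 2.0])
grad = tf.constant([10.0, 20.0])  # hypothetical gradient
lr = 0.1

# Plain arithmetic produces a new tf.Tensor, not a tf.Variable
rebound = w - lr * grad
print(isinstance(rebound, tf.Variable))  # False

# assign_sub updates the existing variable in place
w.assign_sub(lr * grad)
print(isinstance(w, tf.Variable))        # True
print(w.numpy())                         # approximately [0. 0.]
```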

II. Forward propagation (tensors) in practice


import  tensorflow as tf
from    tensorflow import keras
from    tensorflow.keras import datasets
import  os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# x: [60k, 28, 28],
# y: [60k]
(x, y), _ = datasets.mnist.load_data()

# normalize the image data
# x: [0~255] => [0~1.]
x = tf.convert_to_tensor(x, dtype=tf.float32) / 255.
y = tf.convert_to_tensor(y, dtype=tf.int32)

print(x.shape, y.shape, x.dtype, y.dtype)
print("----------------min and max of x----------------")
print(tf.reduce_min(x), tf.reduce_max(x))
print("----------------min and max of y----------------")
print(tf.reduce_min(y), tf.reduce_max(y))

(60000, 28, 28) (60000,) <dtype: 'float32'> <dtype: 'int32'>
----------------min and max of x----------------
tf.Tensor(0.0, shape=(), dtype=float32) tf.Tensor(1.0, shape=(), dtype=float32)
----------------min and max of y----------------
tf.Tensor(0, shape=(), dtype=int32) tf.Tensor(9, shape=(), dtype=int32)

In [2]:

# print(x[0])

In [3]:

# take 128 images per batch
train_db = tf.data.Dataset.from_tensor_slices((x,y)).batch(128)
train_iter = iter(train_db)
sample = next(train_iter)
print('batch:', sample[0].shape, sample[1].shape)

batch: (128, 28, 28) (128,)
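How many batches does one pass over the data produce? The sketch below uses a small synthetic dataset of 1000 zero-filled samples (an assumption, so that nothing has to be downloaded) to show that `batch(128)` keeps the final, smaller batch. Applying the same arithmetic to the 60000 MNIST training images gives 469 batches per epoch, which is why the step counter in the training log further down never reaches 500.

```python
import math
import tensorflow as tf

# Synthetic stand-in for the real data: 1000 blank 28x28 "images"
x_fake = tf.zeros([1000, 28, 28])
y_fake = tf.zeros([1000], dtype=tf.int32)

db = tf.data.Dataset.from_tensor_slices((x_fake, y_fake)).batch(128)

# Size of every batch in one pass over the dataset
sizes = [int(xb.shape[0]) for xb, yb in db]
num_batches = len(sizes)
last_batch_size = sizes[-1]
print(num_batches, last_batch_size)  # 8 104

# Same arithmetic for the 60000 MNIST training images
mnist_batches = math.ceil(60000 / 128)
print(mnist_batches)                 # 469
```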

Initialize w and b

In [5]:

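# [b, 784] => [b, 256] => [b, 128] => [b, 10]
# [dim_in, dim_out], [dim_out]
# create three (w, b) pairs of tensors

# tf.random.truncated_normal()
# samples from a truncated normal distribution
# (values more than 2 standard deviations from the mean are redrawn)
# with a sigmoid activation this is a better choice, because it avoids
# the vanishing gradients in the flat tails on either side
# w is initialized with random values, b with zeros
# the standard deviation is 0.1
w1 = tf.Variable(tf.random.truncated_normal([784, 256], stddev=0.1))
b1 = tf.Variable(tf.zeros([256]))
w2 = tf.Variable(tf.random.truncated_normal([256, 128], stddev=0.1))
b2 = tf.Variable(tf.zeros([128]))
w3 = tf.Variable(tf.random.truncated_normal([128, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))
# tf.GradientTape automatically tracks gradients only for tf.Variable objects
print(w1)
print(b1)

# In handwritten digit recognition the dimensions shrink step by step:
# [784]->[256]->[128]->[10]
# The input is an image, i.e. 784 nodes, followed by 256 nodes,
# so the weight w has shape [784, 256], i.e. 784*256 values

# This example simulates a network with a fairly large number of nodes, to show the principle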

<tf.Variable 'Variable:0' shape=(784, 256) dtype=float32, numpy=
array([[-0.01459773,  0.04212301, -0.13790604, ...,  0.08872523,
         0.0180968 , -0.05272236],
       [-0.05939614, -0.07033015,  0.11356516, ..., -0.07142108,
        -0.05276656, -0.11897977],
       [-0.13474506,  0.0867815 , -0.02494188, ...,  0.07558486,
        -0.09416623, -0.10133455],
       ...,
       [ 0.02961926,  0.00650147,  0.0724051 , ...,  0.00143908,
         0.06767387,  0.07980036],
       [-0.08844294, -0.08748388,  0.04978892, ..., -0.04194697,
        -0.19967984,  0.09004744],
       [-0.00167311, -0.00427087, -0.03273085, ..., -0.03299765,
        -0.13217148,  0.0366438 ]], dtype=float32)>
<tf.Variable 'Variable:0' shape=(256,) dtype=float32, numpy=
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0.], dtype=float32)>

In [6]:

# learning rate
lr = 1e-3

In [7]:

for epoch in range(10): # iterate over the dataset 10 times
    for step, (x, y) in enumerate(train_db): # for every batch
        # x: [128, 28, 28]
        # y: [128]

        # reshape; -1 lets TensorFlow infer the batch dimension
        # [b, 28, 28] => [b, 28*28]
        x = tf.reshape(x, [-1, 28*28])

        # record operations for automatic differentiation
        with tf.GradientTape() as tape: # trainable tf.Variable objects are watched automatically
            # x: [b, 28*28]
            # h1 = x@w1 + b1
            # [b, 784]@[784, 256] + [256] => [b, 256] + [256] => [b, 256] + [b, 256]
            # matrix operations keep this short; written element by element
            # it would take several nested loops
            # (broadcast_to is spelled out for illustration; x@w1 + b1 broadcasts automatically)
            h1 = x@w1 + tf.broadcast_to(b1, [x.shape[0], 256])
            h1 = tf.nn.relu(h1)
            # [b, 256] => [b, 128]
            h2 = h1@w2 + b2
            h2 = tf.nn.relu(h2)
            # [b, 128] => [b, 10]
            out = h2@w3 + b3

            # compute loss
            # convert the labels to one-hot encoding
            # out: [b, 10]
            # y: [b] => [b, 10]
            y_onehot = tf.one_hot(y, depth=10)

            # mse = mean((y - out)^2)
            # [b, 10]
            loss = tf.square(y_onehot - out)
            # mean: scalar
            loss = tf.reduce_mean(loss)

        # compute gradients
        grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
        # print(grads)
        # w1 = w1 - lr * w1_grad
        # update in place: only the value changes, w1 stays a tf.Variable
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])
        w2.assign_sub(lr * grads[2])
        b2.assign_sub(lr * grads[3])
        w3.assign_sub(lr * grads[4])
        b3.assign_sub(lr * grads[5])

        if step % 100 == 0:
            print(epoch, step, 'loss:', float(loss))

0 0 loss: 0.44212737679481506
0 100 loss: 0.19576606154441833
0 200 loss: 0.1763547956943512
0 300 loss: 0.14690229296684265
0 400 loss: 0.16461005806922913
1 0 loss: 0.1481630802154541
1 100 loss: 0.13947637379169464
1 200 loss: 0.14118121564388275
1 300 loss: 0.125637024641037
1 400 loss: 0.1387656033039093
2 0 loss: 0.1261889487504959
2 100 loss: 0.12272664159536362
2 200 loss: 0.12352275848388672
2 300 loss: 0.11308278143405914
2 400 loss: 0.123313769698143
3 0 loss: 0.11250965297222137
3 100 loss: 0.11181002855300903
3 200 loss: 0.11165700107812881
3 300 loss: 0.10446763038635254
3 400 loss: 0.1128990426659584
4 0 loss: 0.10315148532390594
4 100 loss: 0.10395848751068115
4 200 loss: 0.10311208665370941
4 300 loss: 0.09810370206832886
4 400 loss: 0.10533001273870468
5 0 loss: 0.09622949361801147
5 100 loss: 0.09795238822698593
5 200 loss: 0.09663848578929901
5 300 loss: 0.09307707101106644
5 400 loss: 0.09949030727148056
6 0 loss: 0.09085840731859207
6 100 loss: 0.0931745320558548
6 200 loss: 0.09150215238332748
6 300 loss: 0.08896996080875397
6 400 loss: 0.09488751739263535
7 0 loss: 0.08653409779071808
7 100 loss: 0.08927203714847565
7 200 loss: 0.08729908615350723
7 300 loss: 0.08555082231760025
7 400 loss: 0.09112046658992767
8 0 loss: 0.0829336866736412
8 100 loss: 0.08598417043685913
8 200 loss: 0.08376041799783707
8 300 loss: 0.0826173797249794
8 400 loss: 0.087943896651268
9 0 loss: 0.07984770834445953
9 100 loss: 0.08314327895641327
9 200 loss: 0.08072730898857117
9 300 loss: 0.08008376508951187
9 400 loss: 0.08518940210342407
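The loss values printed above are mean squared errors, averaged over every entry of the [b, 10] difference between the one-hot labels and the network output. A tiny worked sketch with two samples and two classes (made-up numbers, not MNIST) shows the same two lines of loss code on values that can be checked by hand.

```python
import tensorflow as tf

# Hypothetical network output for 2 samples and 2 classes
out = tf.constant([[0.5, 0.5],
                   [0.0, 1.0]])
y = tf.constant([0, 1])
y_onehot = tf.one_hot(y, depth=2)   # [[1, 0], [0, 1]]

# Squared differences: [[0.25, 0.25], [0, 0]]
squared = tf.square(y_onehot - out)
# Mean over all 4 entries: (0.25 + 0.25 + 0 + 0) / 4
loss = tf.reduce_mean(squared)
print(float(loss))  # 0.125
```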


Original article: https://www.cnblogs.com/Renyi-Fan/p/13418110.html