
Author: Faizan Shaikh; Translation: 李文婧; Proofreading: 张一豪; Editing: 黄继彦

# An Essential Read for Data Scientists: Building a Recurrent Neural Network from Scratch in Python (with Code)

#### Introduction

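Recurrent neural networks (RNNs) are used in many sequence-modeling applications, such as:

• Speech recognition

• Machine translation

• Music composition

• Handwriting recognition

• Grammar learning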

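This article assumes familiarity with the following topics:

• Fundamentals of deep learning: https://www.analyticsvidhya.com/blog/2016/03/introduction-deep-learning-fundamentals-neural-networks/

• Introduction to recurrent neural networks: https://www.analyticsvidhya.com/blog/2017/12/introduction-to-recurrent-neural-networks/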

#### 3. Coding a Recurrent Neural Network in Python

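The input to the network is a three-dimensional array of shape

`(number_of_records x length_of_sequence x types_of_sequences)`

and the output is a two-dimensional array of shape

`(number_of_records x types_of_sequences)`, where `types_of_sequences` is 1 because we model a single sequence.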

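As toy data, we use a sine wave sampled at 200 integer points:

```python
import math

import matplotlib.pyplot as plt
import numpy as np

sin_wave = np.array([math.sin(x) for x in np.arange(200)])

plt.plot(sin_wave[:50])   # plot the first 50 points of the wave
plt.show()
```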

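Each training record is a window of 50 consecutive values, and its target is the value that immediately follows the window:

```python
X = []
Y = []

seq_len = 50
num_records = len(sin_wave) - seq_len   # 150 windows in total

# the first 100 windows form the training set
for i in range(num_records - 50):
    X.append(sin_wave[i:i+seq_len])     # input: 50 consecutive values
    Y.append(sin_wave[i+seq_len])       # target: the next value

X = np.array(X)
X = np.expand_dims(X, axis=2)           # (records, seq_len) -> (records, seq_len, 1)

Y = np.array(Y)
Y = np.expand_dims(Y, axis=1)           # (records,) -> (records, 1)

X.shape, Y.shape                        # ((100, 50, 1), (100, 1))
```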

The last 50 windows are held out as the validation set:

```python
X_val = []
Y_val = []

for i in range(num_records - 50, num_records):
    X_val.append(sin_wave[i:i+seq_len])
    Y_val.append(sin_wave[i+seq_len])

X_val = np.array(X_val)
X_val = np.expand_dims(X_val, axis=2)   # (50, 50, 1)

Y_val = np.array(Y_val)
Y_val = np.expand_dims(Y_val, axis=1)   # (50, 1)
```
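The training and validation loops above differ only in which windows they keep. A minimal sketch of the same windowing as one reusable function, assuming NumPy; `make_windows`, `X_all` and `Y_all` are hypothetical names, and `np.sin` stands in for the `math.sin` list comprehension (the values agree up to floating-point rounding).

```python
import numpy as np


def make_windows(series, seq_len):
    """Slide a window of length seq_len over a 1-D series (hypothetical helper).

    Returns X with shape (num_windows, seq_len, 1) and Y with shape
    (num_windows, 1), where Y[i] is the value right after window X[i].
    """
    series = np.asarray(series, dtype=float)
    num_windows = len(series) - seq_len
    X = np.stack([series[i:i + seq_len] for i in range(num_windows)])
    Y = series[seq_len:seq_len + num_windows]
    return X[:, :, np.newaxis], Y[:, np.newaxis]


# Same split as above: the first 100 windows train, the last 50 validate
sin_wave = np.sin(np.arange(200))
X_all, Y_all = make_windows(sin_wave, seq_len=50)
X, Y = X_all[:100], Y_all[:100]
X_val, Y_val = X_all[100:], Y_all[100:]
print(X.shape, Y.shape, X_val.shape, Y_val.shape)
```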

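Next, define the hyperparameters and initialize the weights:

```python
learning_rate = 0.0001    # step size for gradient descent
nepoch = 25               # number of training epochs
T = 50                    # length of sequence
hidden_dim = 100          # number of hidden units
output_dim = 1            # one predicted value per record

bptt_truncate = 5         # how many earlier timesteps to backpropagate through
min_clip_value = -10      # gradients are clipped to [min_clip_value, max_clip_value]
max_clip_value = 10

# weights drawn uniformly from [0, 1)
U = np.random.uniform(0, 1, (hidden_dim, T))            # input -> hidden
W = np.random.uniform(0, 1, (hidden_dim, hidden_dim))   # hidden -> hidden (recurrent)
V = np.random.uniform(0, 1, (output_dim, hidden_dim))   # hidden -> output
```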

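• U is the weight matrix between the input and hidden layers

• V is the weight matrix between the hidden and output layers

• W is the shared weight matrix of the RNN (hidden) layer, applied to the previous hidden state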
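The forward pass used throughout the training code can be summarized as follows, where $\tilde{x}_t$ is the length-$T$ vector `new_input` (zero except for the entry of timestep $t$), $s_0 = \mathbf{0}$, and only the last hidden state produces the prediction. With the hyperparameters above, $U \in \mathbb{R}^{100\times 50}$, $W \in \mathbb{R}^{100\times 100}$ and $V \in \mathbb{R}^{1\times 100}$.

$$
\begin{aligned}
s_t &= \sigma\left(U\tilde{x}_t + W s_{t-1}\right), \qquad t = 1, \dots, T \\
\hat{y} &= V s_T \\
L &= \tfrac{1}{2}\left(y - \hat{y}\right)^2
\end{aligned}
$$

In the code, `mulu` is $U\tilde{x}_t$, `mulw` is $W s_{t-1}$, `s` is $s_t$, `mulv` after the last timestep is $\hat{y}$, and `loss_per_record` is $L$.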

The hidden layer uses the sigmoid activation function:

```python
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
```
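The backward pass in step 2.3.2 needs the derivative of this activation. A minimal sketch, assuming NumPy, of the identity $\sigma'(a) = \sigma(a)\,(1 - \sigma(a))$, checked against a central finite difference. Note that the formula is evaluated on the activation `s = sigmoid(a)`, not on the pre-activation `a`; the helper name `sigmoid_grad_from_output` is illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_grad_from_output(s):
    """Derivative of the sigmoid, expressed through its output s = sigmoid(a)."""
    return s * (1 - s)


# Compare against a central finite difference at a few points
a = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
eps = 1e-6
numeric = (sigmoid(a + eps) - sigmoid(a - eps)) / (2 * eps)
analytic = sigmoid_grad_from_output(sigmoid(a))
print(np.max(np.abs(numeric - analytic)))
```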

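Training repeats the following steps in every epoch:

• Step 2.1: Check the loss on the training data (Step 2.1.1: forward pass; Step 2.1.2: calculate the error)

• Step 2.2: Check the loss on the validation data (Step 2.2.1: forward pass; Step 2.2.2: calculate the error)

• Step 2.3: Start the actual training (Step 2.3.1: forward pass; Step 2.3.2: backpropagate the error; Step 2.3.3: update the weights)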

• Step 2.1: Check the loss on the training data

```python
for epoch in range(nepoch):
    # check loss on train
    loss = 0.0

    # do a forward pass to get prediction
    for i in range(Y.shape[0]):
        x, y = X[i], Y[i]                    # get input, output values of each record
        prev_s = np.zeros((hidden_dim, 1))   # prev_s holds the previous activation of the hidden layer, initialized to all zeros
        for t in range(T):
            new_input = np.zeros(x.shape)    # we then do a forward pass for every timestep in the sequence
            new_input[t] = x[t]              # for this, we define a single input for that timestep
            mulu = np.dot(U, new_input)
            mulw = np.dot(W, prev_s)
            add = mulw + mulu
            s = sigmoid(add)
            mulv = np.dot(V, s)
            prev_s = s

        # calculate error
        loss_per_record = (y - mulv)**2 / 2
        loss += loss_per_record
    loss = loss / float(y.shape[0])          # y.shape[0] is 1, so this is the loss summed over all records
```

• Step 2.2: Check the loss on the validation data

```python
    # (continued inside the `for epoch in range(nepoch):` loop)
    # check loss on val
    val_loss = 0.0
    for i in range(Y_val.shape[0]):
        x, y = X_val[i], Y_val[i]
        prev_s = np.zeros((hidden_dim, 1))
        for t in range(T):
            new_input = np.zeros(x.shape)
            new_input[t] = x[t]
            mulu = np.dot(U, new_input)
            mulw = np.dot(W, prev_s)
            add = mulw + mulu
            s = sigmoid(add)
            mulv = np.dot(V, s)
            prev_s = s

        loss_per_record = (y - mulv)**2 / 2
        val_loss += loss_per_record
    val_loss = val_loss / float(y.shape[0])

    print('Epoch: ', epoch + 1, ', Loss: ', loss, ', Val Loss: ', val_loss)
```

```
Epoch:  1 , Loss:  [[101185.61756671]] , Val Loss:  [[50591.0340148]]
...
```
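Steps 2.1 and 2.2 run the identical forward pass and loss computation, once per dataset. A sketch of how that shared logic could be factored into two functions, assuming the same shapes as above (`U`: hidden_dim x T, `W`: hidden_dim x hidden_dim, `V`: 1 x hidden_dim); the names `rnn_forward` and `dataset_loss` are hypothetical. Like the loops above, `dataset_loss` returns the loss summed over all records.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def rnn_forward(U, W, V, x):
    """Run the RNN over one record x of shape (T, 1).

    Returns the prediction from the last timestep (shape (1, 1)) and a list of
    {'s', 'prev_s'} dicts, mirroring the `layers` list in the training code.
    """
    prev_s = np.zeros((W.shape[0], 1))
    layers = []
    for t in range(x.shape[0]):
        new_input = np.zeros(x.shape)      # only timestep t is non-zero
        new_input[t] = x[t]
        s = sigmoid(np.dot(U, new_input) + np.dot(W, prev_s))
        layers.append({'s': s, 'prev_s': prev_s})
        prev_s = s
    return np.dot(V, prev_s), layers


def dataset_loss(U, W, V, X, Y):
    """Half squared error summed over all records, as in steps 2.1 and 2.2."""
    total = 0.0
    for x, y in zip(X, Y):
        pred, _ = rnn_forward(U, W, V, x)
        total += float(((y - pred) ** 2 / 2).sum())
    return total


# Tiny sanity check: with U = W = 0 every hidden unit is sigmoid(0) = 0.5
T_demo, h_demo = 5, 4
U0, W0, V0 = np.zeros((h_demo, T_demo)), np.zeros((h_demo, h_demo)), np.ones((1, h_demo))
pred0, layers0 = rnn_forward(U0, W0, V0, np.ones((T_demo, 1)))
demo_loss = dataset_loss(U0, W0, V0, np.zeros((3, T_demo, 1)), np.zeros((3, 1)))
print(pred0, demo_loss)
```

With the real data, `loss = dataset_loss(U, W, V, X, Y)` and `val_loss = dataset_loss(U, W, V, X_val, Y_val)` would compute the two quantities printed each epoch (as plain floats rather than 1 x 1 arrays).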

• Step 2.3: Start the actual training

• Step 2.3.1: Forward pass

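1. We first multiply the input by the weights between the input and hidden layers (`U`).

2. We then add the product of the recurrent weights (`W`) and the previous hidden state, because we want to capture the information from the previous timestep.

3. We pass the sum through the sigmoid activation function and multiply the result by the weights between the hidden and output layers (`V`).

4. The output layer uses a linear activation, so its value is not passed through any activation function.

5. We save the current hidden state and the previous timestep's state in a dictionary.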

```python
    # (continued inside the `for epoch in range(nepoch):` loop)
    # train model
    for i in range(Y.shape[0]):
        x, y = X[i], Y[i]

        layers = []
        prev_s = np.zeros((hidden_dim, 1))
        # gradients accumulated for this record
        dU = np.zeros(U.shape)
        dV = np.zeros(V.shape)
        dW = np.zeros(W.shape)

        # accumulators over timesteps
        dU_t = np.zeros(U.shape)
        dV_t = np.zeros(V.shape)
        dW_t = np.zeros(W.shape)

        # contributions inside the truncated look-back loop
        dU_i = np.zeros(U.shape)
        dW_i = np.zeros(W.shape)

        # forward pass
        for t in range(T):
            new_input = np.zeros(x.shape)
            new_input[t] = x[t]
            mulu = np.dot(U, new_input)
            mulw = np.dot(W, prev_s)
            add = mulw + mulu
            s = sigmoid(add)
            mulv = np.dot(V, s)
            layers.append({'s': s, 'prev_s': prev_s})
            prev_s = s
```
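Because `new_input` is zero everywhere except at position `t`, the product `np.dot(U, new_input)` only picks up column `t` of `U`, scaled by `x[t]`. A small check of that equivalence, with random matrix sizes chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, T = 6, 8                      # small sizes chosen for illustration
U = rng.uniform(0, 1, (hidden_dim, T))
x = rng.standard_normal((T, 1))

for t in range(T):
    new_input = np.zeros(x.shape)
    new_input[t] = x[t]
    masked = np.dot(U, new_input)         # what the training loop computes
    column = U[:, [t]] * x[t]             # column t of U, scaled by x[t]
    assert np.allclose(masked, column)
print("equivalent for all", T, "timesteps")
```

This shows a design choice of this implementation: a standard RNN shares one input weight matrix across all timesteps, whereas here `U` effectively holds a separate input weight column for each of the `T` positions.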

• Step 2.3.2: Backpropagate the error

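The core difference between backpropagation through time (BPTT) and ordinary backpropagation is that the backpropagation step is carried out for every timestep of the RNN layer. So if our sequence length is 50, we backpropagate through the timesteps before the current one; in this implementation, that look-back is truncated to `bptt_truncate` (here 5) timesteps.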

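```python
        # derivative of pred
        dmulv = (mulv - y)

        # backward pass
        for t in range(T):
            dV_t = np.dot(dmulv, np.transpose(layers[t]['s']))
            dsv = np.dot(np.transpose(V), dmulv)

            ds = dsv
            # note: the exact sigmoid derivative is s * (1 - s); this line applies the formula to `add`
            dadd = add * (1 - add) * ds
            dmulw = dadd * np.ones_like(mulw)
            dprev_s = np.dot(np.transpose(W), dmulw)

            # step back through at most bptt_truncate earlier timesteps
            for _ in range(t-1, max(-1, t-bptt_truncate-1), -1):
                ds = dsv + dprev_s
                dadd = add * (1 - add) * ds
                dmulw = dadd * np.ones_like(mulw)
                dmulu = dadd * np.ones_like(mulu)

                # note: dW_i and dU_i do not use the incoming gradient dadd
                dW_i = np.dot(W, layers[t]['prev_s'])
                dprev_s = np.dot(np.transpose(W), dmulw)

                new_input = np.zeros(x.shape)
                new_input[t] = x[t]
                dU_i = np.dot(U, new_input)
                dx = np.dot(np.transpose(U), dmulu)

                dU_t += dU_i
                dW_t += dW_i

            dV += dV_t
            dU += dU_t
            dW += dW_t
```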
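The loop above reproduces the original article's code, and its expressions are not the exact BPTT gradients: the sigmoid derivative is evaluated on the pre-activation `add` of the last timestep rather than on each step's activation, and `dW_i` and `dU_i` do not depend on the incoming gradient. For comparison, here is a minimal sketch of textbook truncated BPTT for the same many-to-one model (loss on the final output only), assuming the forward pass and shapes defined above; `forward` and `bptt_gradients` are hypothetical names. A finite-difference check confirms the gradients when the truncation covers the whole sequence.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def forward(U, W, V, x):
    """Same forward pass as the training loop; also stores each step's input."""
    prev_s = np.zeros((W.shape[0], 1))
    layers = []
    for t in range(x.shape[0]):
        new_input = np.zeros(x.shape)            # only timestep t is non-zero
        new_input[t] = x[t]
        s = sigmoid(np.dot(U, new_input) + np.dot(W, prev_s))
        layers.append({'s': s, 'prev_s': prev_s, 'input': new_input})
        prev_s = s
    return np.dot(V, prev_s), layers


def bptt_gradients(U, W, V, x, y, bptt_truncate):
    """Truncated BPTT for L = (y - V s_T)**2 / 2, i.e. loss on the last output only."""
    mulv, layers = forward(U, W, V, x)
    dmulv = mulv - y                             # dL/d(prediction)
    dV = np.dot(dmulv, layers[-1]['s'].T)        # only s_T feeds the output
    ds = np.dot(V.T, dmulv)                      # dL/ds_T
    dU, dW = np.zeros_like(U), np.zeros_like(W)
    T = len(layers)
    # walk back from the last timestep through at most bptt_truncate steps
    for t in range(T - 1, max(-1, T - 1 - bptt_truncate), -1):
        s = layers[t]['s']
        dadd = s * (1 - s) * ds                  # sigmoid derivative via the activation
        dW += np.dot(dadd, layers[t]['prev_s'].T)
        dU += np.dot(dadd, layers[t]['input'].T)
        ds = np.dot(W.T, dadd)                   # dL/ds_{t-1}
    return dU, dW, dV


# Finite-difference check of full (untruncated) BPTT on a tiny random problem
rng = np.random.default_rng(1)
T_chk, h = 6, 4
U_chk = rng.normal(0, 0.5, (h, T_chk))
W_chk = rng.normal(0, 0.5, (h, h))
V_chk = rng.normal(0, 0.5, (1, h))
x_chk, y_chk = rng.normal(size=(T_chk, 1)), np.array([0.3])


def half_sq_loss():
    pred, _ = forward(U_chk, W_chk, V_chk, x_chk)
    return float(((y_chk - pred) ** 2 / 2).sum())


grads = bptt_gradients(U_chk, W_chk, V_chk, x_chk, y_chk, bptt_truncate=T_chk)
max_err, eps = 0.0, 1e-6
for P, G in zip((U_chk, W_chk, V_chk), grads):
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        up = half_sq_loss()
        P[idx] = old - eps
        down = half_sq_loss()
        P[idx] = old
        max_err = max(max_err, abs((up - down) / (2 * eps) - G[idx]))
print("max |numeric - analytic| =", max_err)
```

Inside the training loop, a call such as `dU, dW, dV = bptt_gradients(U, W, V, x, y, bptt_truncate)` could take the place of the backward pass for each record, followed by the same clipping and update. With a `bptt_truncate` smaller than the sequence length, only the last `bptt_truncate` timesteps contribute to `dU` and `dW`.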

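• Step 2.3.3: Update the weights

Before the update, every gradient element is clipped to the range [min_clip_value, max_clip_value] to keep the gradients from exploding.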

```python
        # clip the accumulated gradients to [min_clip_value, max_clip_value]
        if dU.max() > max_clip_value:
            dU[dU > max_clip_value] = max_clip_value
        if dV.max() > max_clip_value:
            dV[dV > max_clip_value] = max_clip_value
        if dW.max() > max_clip_value:
            dW[dW > max_clip_value] = max_clip_value

        if dU.min() < min_clip_value:
            dU[dU < min_clip_value] = min_clip_value
        if dV.min() < min_clip_value:
            dV[dV < min_clip_value] = min_clip_value
        if dW.min() < min_clip_value:
            dW[dW < min_clip_value] = min_clip_value

        # update
        U -= learning_rate * dU
        V -= learning_rate * dV
        W -= learning_rate * dW
```
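The six `if` statements clip every gradient element into `[min_clip_value, max_clip_value]`. NumPy's `np.clip` performs the same element-wise clipping in one call per array. A small sketch with made-up gradient values; `clip_like_loop` is an illustrative helper that mirrors the training code.

```python
import numpy as np

min_clip_value, max_clip_value = -10, 10


def clip_like_loop(d):
    """Element-wise clipping written as in the training code (modifies d in place)."""
    if d.max() > max_clip_value:
        d[d > max_clip_value] = max_clip_value
    if d.min() < min_clip_value:
        d[d < min_clip_value] = min_clip_value
    return d


grad = np.array([[-25.0, -3.0], [4.5, 12.0]])   # made-up gradient values

clipped_loop = clip_like_loop(grad.copy())
clipped_np = np.clip(grad, min_clip_value, max_clip_value)   # returns a new array

# in-place variant, as it could be applied to dU, dV and dW
in_place = grad.copy()
np.clip(in_place, min_clip_value, max_clip_value, out=in_place)
print(clipped_np)
```

Element-wise clipping can change the direction of the gradient vector; clipping by the gradient's overall norm is a common alternative that preserves it.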

Training output over the 25 epochs:

```
Epoch:  1 , Loss:  [[101185.61756671]] , Val Loss:  [[50591.0340148]]
Epoch:  2 , Loss:  [[61205.46869629]] , Val Loss:  [[30601.34535365]]
Epoch:  3 , Loss:  [[31225.3198258]] , Val Loss:  [[15611.65669247]]
Epoch:  4 , Loss:  [[11245.17049551]] , Val Loss:  [[5621.96780111]]
Epoch:  5 , Loss:  [[1264.5157739]] , Val Loss:  [[632.02563908]]
Epoch:  6 , Loss:  [[20.15654115]] , Val Loss:  [[10.05477285]]
Epoch:  7 , Loss:  [[17.13622839]] , Val Loss:  [[8.55190426]]
Epoch:  8 , Loss:  [[17.38870495]] , Val Loss:  [[8.68196484]]
Epoch:  9 , Loss:  [[17.181681]] , Val Loss:  [[8.57837827]]
Epoch:  10 , Loss:  [[17.31275313]] , Val Loss:  [[8.64199652]]
Epoch:  11 , Loss:  [[17.12960034]] , Val Loss:  [[8.54768294]]
Epoch:  12 , Loss:  [[17.09020065]] , Val Loss:  [[8.52993502]]
Epoch:  13 , Loss:  [[17.17370113]] , Val Loss:  [[8.57517454]]
Epoch:  14 , Loss:  [[17.04906914]] , Val Loss:  [[8.50658127]]
Epoch:  15 , Loss:  [[16.96420184]] , Val Loss:  [[8.46794248]]
Epoch:  16 , Loss:  [[17.017519]] , Val Loss:  [[8.49241316]]
Epoch:  17 , Loss:  [[16.94199493]] , Val Loss:  [[8.45748739]]
Epoch:  18 , Loss:  [[16.99796892]] , Val Loss:  [[8.48242177]]
Epoch:  19 , Loss:  [[17.24817035]] , Val Loss:  [[8.6126231]]
Epoch:  20 , Loss:  [[17.00844599]] , Val Loss:  [[8.48682234]]
Epoch:  21 , Loss:  [[17.03943262]] , Val Loss:  [[8.50437328]]
Epoch:  22 , Loss:  [[17.01417255]] , Val Loss:  [[8.49409597]]
Epoch:  23 , Loss:  [[17.20918888]] , Val Loss:  [[8.5854792]]
Epoch:  24 , Loss:  [[16.92068017]] , Val Loss:  [[8.44794633]]
Epoch:  25 , Loss:  [[16.76856238]] , Val Loss:  [[8.37295808]]
```

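Get predictions on the training data and plot them (green) against the targets (red):

```python
preds = []
for i in range(Y.shape[0]):
    x, y = X[i], Y[i]
    prev_s = np.zeros((hidden_dim, 1))
    # Forward pass
    for t in range(T):
        new_input = np.zeros(x.shape)   # one timestep at a time, as during training
        new_input[t] = x[t]
        mulu = np.dot(U, new_input)
        mulw = np.dot(W, prev_s)
        add = mulw + mulu
        s = sigmoid(add)
        mulv = np.dot(V, s)
        prev_s = s

    preds.append(mulv)

preds = np.array(preds)                 # shape (records, 1, 1)

plt.plot(preds[:, 0, 0], 'g')
plt.plot(Y[:, 0], 'r')
plt.show()
```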

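Do the same on the validation data:

```python
preds = []
for i in range(Y_val.shape[0]):
    x, y = X_val[i], Y_val[i]
    prev_s = np.zeros((hidden_dim, 1))
    # For each time step...
    for t in range(T):
        new_input = np.zeros(x.shape)   # one timestep at a time, as during training
        new_input[t] = x[t]
        mulu = np.dot(U, new_input)
        mulw = np.dot(W, prev_s)
        add = mulw + mulu
        s = sigmoid(add)
        mulv = np.dot(V, s)
        prev_s = s

    preds.append(mulv)

preds = np.array(preds)

plt.plot(preds[:, 0, 0], 'g')
plt.plot(Y_val[:, 0], 'r')
plt.show()
```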

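Finally, compute the root-mean-square error (RMSE) of the validation predictions:

```python
from sklearn.metrics import mean_squared_error

# the sine wave was never rescaled, so the raw values are compared directly
math.sqrt(mean_squared_error(Y_val[:, 0], preds[:, 0, 0]))
```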

0.127191931509431
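The RMSE computed with `mean_squared_error` and `math.sqrt` is simply the square root of the mean squared difference. A NumPy-only sketch, assuming equally shaped inputs; the function name `rmse` is illustrative.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


print(rmse([0.0, 0.0], [3.0, 4.0]))   # sqrt((9 + 16) / 2)
# With the arrays above this would be called as rmse(Y_val[:, 0], preds[:, 0, 0])
```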

#### Conclusion

Original article: Build a Recurrent Neural Network from Scratch in Python – An Essential Read for Data Scientists

THU数据派

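THU数据派 ("Rooted in Tsinghua, looking out to the world") brings solid science and engineering expertise to the world of data. It publishes global big-data news, regularly organizes offline events, and shares frontier industry developments. To learn more about big data at Tsinghua, follow its sister account "数据派THU".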

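(Artificial) neural networks are a supervised machine learning model whose origins go back to the 1950s, when researchers conceived the idea of the "perceptron". Researchers in this field are often called "connectionists", because these models mimic the functioning of the human brain. Neural networks are usually trained with gradient descent, using the backpropagation algorithm to compute the gradients. Two major families of neural networks are convolutional neural networks (CNNs), which are feedforward, and recurrent neural networks (RNNs), which include variants such as long short-term memory (LSTM) and gated recurrent units (GRU). Deep learning refers mainly to training neural networks with many layers, which has substantially improved their results. Although neural networks are mostly used for supervised learning, there are also variants designed for unsupervised learning, such as autoencoders and generative adversarial networks (GANs).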