Author: 赵天琪

Paper Notes (3)

<Recurrent neural network based language model>

1. Definition of a language model: a language model assigns a probability P(S) to a sentence S = w_1 w_2 … w_n (the article linked below gives a fuller introduction). By the chain rule of probability:

P(S) = P(w_1) P(w_2 | w_1) ⋯ P(w_n | w_1, …, w_{n-1})

The naive assumption (as in Naive Bayes) ignores the context entirely, which means:

P(S) ≈ P(w_1) P(w_2) ⋯ P(w_n)

This is called a unigram model. Similarly, if each word is conditioned on one preceding word it is called a bigram model, and on two preceding words a trigram model; in general we have n-gram models.
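The counting idea behind n-grams can be sketched in a few lines. This is a minimal bigram example (the corpus and function names are illustrative, not from the paper), estimating P(w_i | w_{i-1}) by maximum likelihood:

```python
# Minimal bigram language model: estimate P(cur | prev) by counting.
from collections import Counter, defaultdict

def train_bigram(sentences):
    unigrams = Counter()            # counts of each context word
    bigrams = defaultdict(Counter)  # counts of (prev -> cur) pairs
    for words in sentences:
        padded = ["<s>"] + words    # sentence-start marker
        for prev, cur in zip(padded, padded[1:]):
            unigrams[prev] += 1
            bigrams[prev][cur] += 1
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, cur):
    # Maximum-likelihood estimate; real systems add smoothing for unseen pairs.
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[prev][cur] / unigrams[prev]

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram(corpus)
print(bigram_prob(uni, bi, "the", "cat"))  # 0.5: "the" is followed by "cat" half the time
```

Without smoothing, any unseen pair gets probability zero, which is one motivation for moving to neural language models.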

This is a great article introducing language models: 漫谈 Language Model (1): 原理篇

2. This paper introduces the RNNLM, usually called a simple recurrent neural network or Elman network. x(t) is the input layer, s(t) is the hidden layer (or state), y(t) is the output layer, and w(t) is the current word; all are vectors:

x(t) = w(t) + s(t-1)
s_j(t) = f( Σ_i x_i(t) u_{ji} )
y_k(t) = g( Σ_j s_j(t) v_{kj} )

The plus notation in the first line means concatenation, since w(t) and s(t-1) have different dimensions. f and g denote the sigmoid and softmax functions, so y(t) is a probability distribution over the next word.
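One forward step of this network can be sketched as follows (the weight names U, V and the tiny dimensions are illustrative; the concatenation implements x(t) = w(t) + s(t-1) from above):

```python
# One RNNLM (Elman network) forward step.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def rnnlm_step(w_t, s_prev, U, V):
    """One time step: returns new state s(t) and next-word distribution y(t)."""
    x_t = np.concatenate([w_t, s_prev])   # x(t) = [w(t); s(t-1)]
    s_t = sigmoid(U @ x_t)                # hidden state s(t)
    y_t = softmax(V @ s_t)                # output y(t)
    return s_t, y_t

rng = np.random.default_rng(0)
vocab, hidden = 5, 3
w_t = np.zeros(vocab); w_t[2] = 1.0      # one-hot current word
s_prev = np.zeros(hidden)                # initial state
U = rng.normal(0.0, 0.1, (hidden, vocab + hidden))
V = rng.normal(0.0, 0.1, (vocab, hidden))
s_t, y_t = rnnlm_step(w_t, s_prev, U, V)
print(y_t.sum())  # y(t) sums to 1: a distribution over the next word
```

Feeding s_t back in as s_prev at the next step is what gives the network its "memory" of earlier words.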

And the error function is simple:

error(t) = desired(t) − y(t)

desired(t) is a one-hot vector representing the ground truth, i.e. the actual next word w(t+1).

3. For rare words (those occurring less often than a count threshold), all such words are merged into a single rare token, and the output probability becomes:

P(w(t+1) = i | w(t), s(t-1)) = y_rare(t) / C_rare if word i is rare, and y_i(t) otherwise

where C_rare is the number of rare words.

If w(t+1) is rare, its probability is distributed uniformly within the rare class; otherwise we read y_i(t) directly.
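This lookup can be sketched as follows (the threshold is implicit here; the vocabulary, rare set, and y values are illustrative):

```python
# Rare-word scheme: rare words share one "<rare>" output unit, whose
# probability mass is split uniformly among them.
def next_word_prob(word, y, word_index, rare_words):
    """P(w(t+1) = word): y[i] for frequent words, y[<rare>]/|rare| otherwise."""
    if word in rare_words:
        return y[word_index["<rare>"]] / len(rare_words)
    return y[word_index[word]]

word_index = {"the": 0, "cat": 1, "<rare>": 2}
y = [0.5, 0.3, 0.2]                 # output distribution over [the, cat, <rare>]
rare_words = {"axolotl", "quark"}   # two rare words share y[<rare>] = 0.2
print(next_word_prob("axolotl", y, word_index, rare_words))  # 0.1
print(next_word_prob("cat", y, word_index, rare_words))      # 0.3
```

This shrinks the softmax layer and speeds up training, at the cost of not distinguishing between individual rare words.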
