I've recently started digging into relation extraction and tag recognition, which may well become the focus of my work going forward. I then stared blankly at CRFs and the like, and I still feel a bit lost; clearly I need more practice. Unfortunately, from May to December I'll mostly be preparing for the graduate entrance exam, so the pile of papers on my desk will have to wait, and recommendation-based admission looks out of reach. = =
Still, I noticed that in NER the tag assigned at the current time step depends on the tag assigned at the previous one. Intuitively, this is exactly the kind of dependency a recurrent neural network can model. However, an RNN outputs hidden states, and treating those states directly as tags is not accurate enough, so most papers add an extra layer such as an MLP on top to decode the hidden-state sequence into tags. As a TensorFlow novice, I've always just used the ready-made wrapped functions, like this:
with tf.variable_scope('GRU-cell'):
    # forward and backward GRU cells, both with orthogonal kernel initialization
    gru_cell_fw = tf.nn.rnn_cell.GRUCell(self.HIDDEN_SIZE,
                                         kernel_initializer=tf.orthogonal_initializer())
    gru_cell_bw = tf.nn.rnn_cell.GRUCell(self.HIDDEN_SIZE,
                                         kernel_initializer=tf.orthogonal_initializer())
    gru_cell_fw = tf.contrib.rnn.DropoutWrapper(gru_cell_fw,
                                                output_keep_prob=self.dropout_blstm_prob)
    gru_cell_bw = tf.contrib.rnn.DropoutWrapper(gru_cell_bw,
                                                output_keep_prob=self.dropout_blstm_prob)
with tf.variable_scope("BGRU", initializer=tf.orthogonal_initializer()):
    (fw_outputs, bw_outputs), state = tf.nn.bidirectional_dynamic_rnn(
        gru_cell_fw, gru_cell_bw, input,
        sequence_length=self.seq_len, dtype="float32")
This is extremely convenient, but for some tasks it feels too inflexible, so I decided to hand-write a recurrent neural network myself, and later a recursive (tree-structured) one as well (my feeling is that recursive networks won't express semantics as well as a plain linear chain does, since today's internet slang follows no rules anyway).
The implementation details follow. [This post borrows heavily from YJango's 超智能體 column on Zhihu; if you're interested, go read the original.] I've only changed a few things and added some comments reflecting my own understanding.
import tensorflow as tf
import config
import numpy as np
class GRU_Cell(object):
    def __init__(self, incoming, reverse=False):
        """
        :param incoming: input tensor of shape [batch, time, embedding_size]
        :param reverse: if True, reverse the sequence along the time axis,
                        which makes it easy to build a bidirectional encoder
        Weights created below:
            Wz / Uz: input / previous-hidden weights of the update gate
            Wr / Ur: input / previous-hidden weights of the reset gate
            W  / U : input / previous-hidden weights of the candidate activation
        """
        if reverse is False:    # encode the sentence left to right
            self.incoming = incoming
        elif reverse is True:   # encode the sentence right to left
            self.incoming = tf.reverse(incoming, axis=[1])
        else:
            raise ValueError("reverse must be a bool, got {}".format(reverse))
with tf.variable_scope("Gate_weight"):
self.Wz = self.orthogonal_initializer(
[config.embedding_size,config.hidden_size],"Wz")
self.W = self.orthogonal_initializer(
[config.embedding_size, config.hidden_size], "W")
self.Wr = self.orthogonal_initializer(
[config.embedding_size, config.hidden_size], "Wr")
self.Uz = self.orthogonal_initializer(
[config.hidden_size,config.hidden_size],"Uz")
self.U = self.orthogonal_initializer(
[config.hidden_size,config.hidden_size],"U")
self.Ur = self.orthogonal_initializer(
[config.hidden_size,config.hidden_size],"Ur")
self.init_state = tf.matmul(self.incoming[:, 0, :],
tf.zeros((config.embedding_size,config.hidden_size)))
self.incoming = tf.transpose(self.incoming, perm=[1,0,2])
    def orthogonal_initializer(self, shape, name, scale=1.0):
        """
        Build a trainable Variable initialized with a (scaled) orthogonal matrix.
        :param shape: variable shape
        :param name: variable name
        :param scale: scaling factor applied to the orthogonal matrix
        :return: a trainable tf.Variable of the given shape
        """
        flat_shape = (shape[0], np.prod(shape[1:]))
        a = np.random.normal(0.0, 1.0, flat_shape)
        u, _, v = np.linalg.svd(a, full_matrices=False)
        q = u if u.shape == flat_shape else v
        q = q.reshape(shape).astype(np.float32)  # cast to float32 for TensorFlow
        return tf.Variable(scale * q[:shape[0], :shape[1]],
                           trainable=True, dtype=tf.float32, name=name)
    def one_step(self, previous_state, current_x):
        # one GRU time step: maps (H(t-1), Xt) -> Ht
        with tf.variable_scope("update"):
            """
            Zt = sigmoid(Wz*Xt + Uz*H(t-1))
            """
            Zt = tf.sigmoid(tf.matmul(current_x, self.Wz)
                            + tf.matmul(previous_state, self.Uz))
        with tf.variable_scope("reset"):
            """
            Rt = sigmoid(Wr*Xt + Ur*H(t-1))
            """
            Rt = tf.sigmoid(tf.matmul(current_x, self.Wr)
                            + tf.matmul(previous_state, self.Ur))
        with tf.variable_scope("candidate"):
            """
            Ch = tanh(W*Xt + U*(Rt .* H(t-1)))
            """
            Ch = tf.tanh(tf.matmul(current_x, self.W)
                         + tf.matmul(tf.multiply(Rt, previous_state), self.U))
        with tf.variable_scope("cell"):
            """
            Ht = (1-Zt)*H(t-1) + Zt*Ch
            """
            Ht = (tf.multiply((1 - Zt), previous_state)
                  + tf.multiply(Zt, Ch))
        return Ht
    def all_step(self):
        # run one_step over every time step; tf.scan feeds the previous
        # hidden state back in as the first argument of one_step
        all_hidden_state = tf.scan(fn=self.one_step,
                                   elems=self.incoming,
                                   initializer=self.init_state)
        # back to batch-major: [time, batch, hidden] -> [batch, time, hidden]
        return tf.transpose(all_hidden_state, perm=[1, 0, 2])
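To show how the class is meant to be used, here is a minimal sketch of wiring two GRU_Cell instances into a bidirectional encoder. The placeholder and its shape are my own assumptions for illustration; config.embedding_size and config.hidden_size come from the config module imported above:

# Sketch only: build a BiGRU encoder from two GRU_Cell instances.
# The placeholder shape [batch, time, embedding_size] is an assumption.
inputs = tf.placeholder(tf.float32,
                        [None, None, config.embedding_size], name="inputs")
fw_cell = GRU_Cell(inputs, reverse=False)   # reads the sentence left to right
bw_cell = GRU_Cell(inputs, reverse=True)    # reads the sentence right to left
fw_states = fw_cell.all_step()                           # [batch, time, hidden_size]
bw_states = tf.reverse(bw_cell.all_step(), axis=[1])     # flip back to the original order
bi_states = tf.concat([fw_states, bw_states], axis=-1)   # [batch, time, 2*hidden_size]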
A quick note on tf.scan, the real workhorse here: it behaves much like a scan/fold in functional programming. fn is the function applied at every step; its first argument is the accumulated value from the previous step (here, the previous hidden state) and its second argument is the current input element, which is why one_step() takes the previous state first and the current input second; elems is the data it iterates over along the leading axis. This function is, in my view, the heart of the whole thing. The orthogonal_initializer in the middle is copied directly from someone else's implementation.
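A toy example may make the calling convention clearer: the accumulated value comes first, the current element second, exactly like one_step(previous_state, current_x).

import tensorflow as tf

# tf.scan as a running sum over the leading axis
elems = tf.constant([1., 2., 3., 4.])
running_sum = tf.scan(fn=lambda prev, cur: prev + cur,
                      elems=elems,
                      initializer=tf.constant(0.))
with tf.Session() as sess:
    print(sess.run(running_sum))   # [ 1.  3.  6. 10.]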
That's about it for this note. Most of my time from now on will go into preparing for the graduate entrance exam, and I'll also be getting ready for the summer camps. If any teachers or senior students are willing to share their experience, I'd be deeply grateful!
kris@stu.sicau.edu.cn
This is a rough post, so please do point out any problems you spot.