Source: https://www.zhihu.com/people/liu-xin-chen-64
```python
anchor_output = ...    # shape [None, 128]
positive_output = ...  # shape [None, 128]
negative_output = ...  # shape [None, 128]

d_pos = tf.reduce_sum(tf.square(anchor_output - positive_output), 1)
d_neg = tf.reduce_sum(tf.square(anchor_output - negative_output), 1)

# margin is a scalar hyperparameter
loss = tf.maximum(0.0, margin + d_pos - d_neg)
loss = tf.reduce_mean(loss)
```

Here the network extracts the embeddings of the anchor, positive, and negative samples: anchor_output, positive_output, and negative_output hold the anchor, positive, and negative embeddings respectively. This approach is simple but very inefficient, because it relies on offline triplet mining. So next, let us try an online triplet mining version of the triplet loss.

The triplet loss needs $d(a, p)$ and $d(a, n)$, so we must compute the pairwise distance matrix efficiently. For a batch of $B$ input embeddings, we want the $B \times B$ distance matrix, computed via

$$\|a - b\|^2 = \|a\|^2 - 2\,a^{\top}b + \|b\|^2$$

The parameter squared set to True computes the squared distances; set to False, it computes the Euclidean distances.

```python
def _pairwise_distance(embeddings, squared=False):
    '''Compute the pairwise distance matrix between all embeddings.

    Args:
        embeddings: feature vectors, shape (batch_size, vector_size)
        squared: if True, return squared euclidean distances;
                 if False, return euclidean distances
    Returns:
        distances: pairwise distance matrix, shape (batch_size, batch_size)
    '''
    # ||a - b||^2 = ||a||^2 - 2ab + ||b||^2, where the ab term comes from a single
    # matrix product of shape (batch_size, batch_size)
    dot_product = tf.matmul(embeddings, tf.transpose(embeddings))
    # The diagonal of dot_product is the squared norm of each embedding
    square_norm = tf.diag_part(dot_product)
    # tf.expand_dims(square_norm, axis=1) has shape (batch_size, 1) and
    # tf.expand_dims(square_norm, axis=0) has shape (1, batch_size);
    # broadcasting applies them across the columns and rows of dot_product
    distances = tf.expand_dims(square_norm, axis=1) - 2.0 * dot_product + tf.expand_dims(square_norm, axis=0)
    # Clamp small negative values caused by floating point error
    distances = tf.maximum(distances, 0.0)

    if not squared:
        # sqrt has an infinite gradient at 0, so add a small epsilon 1e-16
        # wherever the distance is exactly 0 before taking the square root
        mask = tf.to_float(tf.equal(distances, 0.0))
        distances = distances + mask * 1e-16
        distances = tf.sqrt(distances)
        # Set the masked entries back to exactly 0
        distances = distances * (1.0 - mask)

    return distances
```
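As a quick sanity check, here is a minimal sketch of calling `_pairwise_distance` on a hand-picked toy batch (TF 1.x session style; the values are illustrative, not from the article):

```python
import tensorflow as tf

# Toy batch: three 2-D embeddings; rows 0 and 2 are identical on purpose
embeddings = tf.constant([[0.0, 0.0],
                          [3.0, 4.0],
                          [0.0, 0.0]], dtype=tf.float32)

distances = _pairwise_distance(embeddings, squared=False)

with tf.Session() as sess:
    print(sess.run(distances))
    # [[0. 5. 0.]
    #  [5. 0. 5.]
    #  [0. 5. 0.]]
```

Identical embeddings come out at exactly 0 thanks to the mask trick, and $\|(0,0)-(3,4)\| = 5$ as expected.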
Online triplet mining strategy 1: batch all strategy

A triplet $(a, p, n)$ is valid when the three indices are distinct, $a$ and $p$ have the same label, and $a$ and $n$ have different labels. The mask in the code below is therefore a 3D tensor that is set to 1 at every valid triplet index and 0 everywhere else.

```python
def _get_triplet_mask(labels):
    '''Return a 3D mask [a, p, n] that is True where the triplet (a, p, n) is valid.

    Args:
        labels: labels of the training data, shape (batch_size,)
    Returns:
        mask: 3D boolean tensor, shape (batch_size, batch_size, batch_size)
    '''
    # Build a 2D matrix where entry (i, j) is True iff i != j
    indices_equal = tf.cast(tf.eye(tf.shape(labels)[0]), tf.bool)
    indices_not_equal = tf.logical_not(indices_equal)
    # The final mask is 3D over (i, j, k), so broadcast the 2D matrix along each
    # of the three axes: i_not_equal_j has shape (batch_size, batch_size, 1), etc.
    i_not_equal_j = tf.expand_dims(indices_not_equal, 2)
    i_not_equal_k = tf.expand_dims(indices_not_equal, 1)
    j_not_equal_k = tf.expand_dims(indices_not_equal, 0)
    # We want i != j != k, so AND the three conditions:
    # True only where the indices (i, j, k) are all distinct
    distinct_indices = tf.logical_and(tf.logical_and(i_not_equal_j, i_not_equal_k), j_not_equal_k)

    # Likewise from the labels: require labels[i] == labels[j] and labels[i] != labels[k]
    label_equal = tf.equal(tf.expand_dims(labels, 0), tf.expand_dims(labels, 1))
    i_equal_j = tf.expand_dims(label_equal, 2)
    i_equal_k = tf.expand_dims(label_equal, 1)
    valid_labels = tf.logical_and(i_equal_j, tf.logical_not(i_equal_k))

    # The mask must satisfy both constraints, so AND the two 3D tensors
    mask = tf.logical_and(distinct_indices, valid_labels)

    return mask
```
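To see the mask in action, a small sketch with toy labels chosen for illustration, where two of the three samples share a label:

```python
import numpy as np
import tensorflow as tf

labels = tf.constant([0, 0, 1])
mask = _get_triplet_mask(labels)

with tf.Session() as sess:
    m = sess.run(mask)
    # Only (a=0, p=1, n=2) and (a=1, p=0, n=2) are valid triplets
    print(np.argwhere(m))  # [[0 1 2]
                           #  [1 0 2]]
```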
We need a 3D tensor of shape $(B, B, B)$ whose entry $(i, j, k)$ holds the loss of triplet $(i, j, k)$. We use _get_triplet_mask to obtain the index mask of the valid triplets, keep only those triplets, count the ones whose loss is strictly positive, and finally average over them to get batch_all_triplet_loss.

```python
def batch_all_triplet_loss(labels, embeddings, margin, squared=False):
    '''Triplet loss over all valid triplets of a batch.

    Args:
        labels: labels of the batch, shape (batch_size,)
        embeddings: feature vectors, shape (batch_size, vector_size)
        margin: margin of the triplet loss, scalar
    Returns:
        triplet_loss: scalar, loss of the batch
        fraction_positive_triplets: fraction of valid triplets with a positive loss
    '''
    # Compute the pairwise distances, broadcast them into a
    # (batch_size, batch_size, batch_size) 3D tensor, then apply the valid mask
    pairwise_dist = _pairwise_distance(embeddings, squared=squared)
    # shape (batch_size, batch_size, 1): d(i, j) for anchor i and positive j
    anchor_positive_dist = tf.expand_dims(pairwise_dist, 2)
    assert anchor_positive_dist.shape[2] == 1, "{}".format(anchor_positive_dist.shape)
    # shape (batch_size, 1, batch_size): d(i, k) for anchor i and negative k
    anchor_negative_dist = tf.expand_dims(pairwise_dist, 1)
    assert anchor_negative_dist.shape[1] == 1, "{}".format(anchor_negative_dist.shape)

    # triplet_loss[i, j, k] = d(i, j) - d(i, k) + margin
    triplet_loss = anchor_positive_dist - anchor_negative_dist + margin

    # Zero out the entries of invalid triplets
    mask = _get_triplet_mask(labels)
    mask = tf.to_float(mask)
    triplet_loss = tf.multiply(mask, triplet_loss)
    triplet_loss = tf.maximum(triplet_loss, 0.0)

    # Count the valid triplets with a strictly positive loss, then average over them
    valid_triplets = tf.to_float(tf.greater(triplet_loss, 1e-16))
    num_positive_triplets = tf.reduce_sum(valid_triplets)
    num_valid_triplets = tf.reduce_sum(mask)
    fraction_positive_triplets = num_positive_triplets / (num_valid_triplets + 1e-16)
    triplet_loss = tf.reduce_sum(triplet_loss) / (num_positive_triplets + 1e-16)

    return triplet_loss, fraction_positive_triplets
```
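A minimal numeric check, with a toy batch chosen so that exactly one of the two valid triplets has positive loss (values are illustrative, not from the article):

```python
import tensorflow as tf

labels = tf.constant([0, 0, 1])
embeddings = tf.constant([[0.0, 0.0],
                          [1.0, 0.0],
                          [0.0, 1.0]], dtype=tf.float32)

loss, frac = batch_all_triplet_loss(labels, embeddings, margin=1.0, squared=True)

with tf.Session() as sess:
    print(sess.run([loss, frac]))
    # Valid triplets: (0,1,2) with loss 1 - 1 + 1 = 1, and (1,0,2) with loss 1 - 2 + 1 = 0,
    # so loss = 1.0 and fraction_positive_triplets = 0.5
```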
Online triplet mining strategy 2: batch hard strategy

In the batch hard strategy, we want to find the hardest positive and the hardest negative for each anchor.

step 1. Build the pairwise distance matrix of the embeddings.
step 2. Compute the 2D mask of valid (anchor, positive) pairs, where valid means $i \neq j$ and $i$ and $j$ have the same label. Set the distance matrix to 0 everywhere outside the mask.
step 3. Take the maximum of each row of the masked matrix; the pair it corresponds to gives that row's anchor its hardest positive.

The hardest negative is extracted analogously, except that the invalid entries of each row are inflated by adding that row's maximum, and the row minimum is then taken. With both distances in hand, the loss is

`triplet_loss = tf.maximum(hardest_positive_dist - hardest_negative_dist + margin, 0.0)`

The batch hard triplet loss is therefore implemented as follows:
```python
def batch_hard_triplet_loss(labels, embeddings, margin, squared=False):
    """Build the triplet loss over a batch of embeddings.

    For each anchor, we get the hardest positive and hardest negative to form a triplet.

    Args:
        labels: labels of the batch, of size (batch_size,)
        embeddings: tensor of shape (batch_size, embed_dim)
        margin: margin for triplet loss
        squared: Boolean. If true, output is the pairwise squared euclidean
                 distance matrix. If false, output is the pairwise euclidean
                 distance matrix.

    Returns:
        triplet_loss: scalar tensor containing the triplet loss
    """
    pairwise_dist = _pairwise_distance(embeddings, squared=squared)

    # For each anchor, get the hardest positive:
    # zero out invalid (anchor, positive) pairs, then take the row maximum
    mask_anchor_positive = _get_anchor_positive_triplet_mask(labels)
    mask_anchor_positive = tf.to_float(mask_anchor_positive)
    anchor_positive_dist = tf.multiply(mask_anchor_positive, pairwise_dist)
    hardest_positive_dist = tf.reduce_max(anchor_positive_dist, axis=1, keepdims=True)

    # For each anchor, get the hardest negative:
    # add the row maximum to invalid (anchor, negative) pairs so they can
    # never be selected, then take the row minimum
    mask_anchor_negative = _get_anchor_negative_triplet_mask(labels)
    mask_anchor_negative = tf.to_float(mask_anchor_negative)
    max_anchor_negative_dist = tf.reduce_max(pairwise_dist, axis=1, keepdims=True)
    anchor_negative_dist = pairwise_dist + max_anchor_negative_dist * (1.0 - mask_anchor_negative)
    hardest_negative_dist = tf.reduce_min(anchor_negative_dist, axis=1, keepdims=True)

    # Combine the hardest positive and hardest negative, then average over the batch
    triplet_loss = tf.maximum(hardest_positive_dist - hardest_negative_dist + margin, 0.0)
    triplet_loss = tf.reduce_mean(triplet_loss)

    return triplet_loss
```
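`_get_anchor_positive_triplet_mask` and `_get_anchor_negative_triplet_mask` are called above but never defined in the article. A minimal sketch, assuming they encode the same validity rules already used in `_get_triplet_mask` (an assumption, not necessarily the author's original code):

```python
def _get_anchor_positive_triplet_mask(labels):
    '''2D mask where mask[i, j] is True iff i != j and labels[i] == labels[j].'''
    indices_equal = tf.cast(tf.eye(tf.shape(labels)[0]), tf.bool)
    indices_not_equal = tf.logical_not(indices_equal)
    labels_equal = tf.equal(tf.expand_dims(labels, 0), tf.expand_dims(labels, 1))
    return tf.logical_and(indices_not_equal, labels_equal)


def _get_anchor_negative_triplet_mask(labels):
    '''2D mask where mask[i, k] is True iff labels[i] != labels[k].'''
    labels_equal = tf.equal(tf.expand_dims(labels, 0), tf.expand_dims(labels, 1))
    return tf.logical_not(labels_equal)
```

With these two helpers in place, batch_hard_triplet_loss runs end to end on the same toy labels and embeddings used in the earlier sketches.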