Trend-Tech Industry Primer | Deep Learning Theory and Practice: Advanced Volume (14): Mask R...

2020-12-06 36Kr

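Editor's note: This article is excerpted from the book Deep Learning Theory and Practice: Advanced Volume (《深度學習理論與實戰：提高篇》); the original is at http://fancyerii.github.io/2019/03/14/dl-book/. The author, Li Li (李理), is VP of the Easemob (環信) AI R&D Center. He has more than ten years of experience in natural language processing and AI research and development, has led the development of question-answering and dialogue systems for several smart hardware products, and is responsible for the design and development of Easemob's Chinese semantic analysis open platform and Easemob's intelligent chatbot.

The main text follows.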

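Contents

Installation
demo.ipynb
    Running
    Key code
train_shapes.ipynb
    Configuration
    Dataset
    Creating the model
    Training
    Detection
    Evaluation
inspect_data.ipynb
    Choosing a dataset
    Loading the Dataset
    Displaying samples
    Bounding Box
    Mini Masks
    Anchor
    Training data generator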

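Facebook has published its own implementation (He Kaiming and the other Mask R-CNN authors were at Facebook at the time of writing). That implementation uses Caffe2, a framework this book does not cover, so we walk through an implementation built on TensorFlow and Keras instead. Interested readers are still encouraged to try the code provided by Facebook.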

Installation

    git clone https://github.com/matterport/Mask_RCNN.git
    # or use the author's fork
    git clone https://github.com/fancyerii/Mask_RCNN.git
    cd Mask_RCNN
    # creating a virtualenv first is recommended
    pip install -r requirements.txt
    # pycocotools is also required; without it you get
    # ImportError: No module named 'pycocotools'
    # see https://github.com/matterport/Mask_RCNN/issues/6
    pip install "git+https://github.com/philferriere/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"

demo.ipynb

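1. Running

Start jupyter notebook, open the file samples/demo.ipynb, and run all cells.

2. Key code

The demo uses a pretrained model that is downloaded automatically, so the first run is slow. This is the code that downloads the model parameters:

    import os
    from mrcnn import utils  # ROOT_DIR is defined in an earlier notebook cell

    COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
    # Download COCO trained weights from Releases if needed
    if not os.path.exists(COCO_MODEL_PATH):
        utils.download_trained_weights(COCO_MODEL_PATH)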

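Create the model and load the parameters:

    import mrcnn.model as modellib

    # Create a MaskRCNN object in inference mode
    model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)
    # Load the model parameters
    model.load_weights(COCO_MODEL_PATH, by_name=True)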

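Read an image and segment it:

    import random
    import skimage.io
    from mrcnn import visualize

    # Load a random image
    file_names = next(os.walk(IMAGE_DIR))[2]
    image = skimage.io.imread(os.path.join(IMAGE_DIR, random.choice(file_names)))
    # Run object detection and segmentation
    results = model.detect([image], verbose=1)
    # Display the results
    r = results[0]
    visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                                class_names, r['scores'])

The detection result r contains rois (the RoIs), masks (for each RoI, whether each pixel belongs to the object), scores (confidence scores) and class_ids (classes).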
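To make the layout of that result dictionary concrete, here is a minimal sketch that walks over a hand-made stand-in for results[0]. The helper summarize_detections, the mock values and the class names are illustrative assumptions; only the four keys and the (y1, x1, y2, x2) box order come from the code in this article.

```python
def summarize_detections(result, class_names):
    """Return one (class_name, score, box, pixel_count) tuple per detected instance."""
    summary = []
    for i, class_id in enumerate(result["class_ids"]):
        # masks is indexed [row][col][instance]; count the pixels set for instance i
        pixels = sum(1 for row in result["masks"] for cell in row if cell[i])
        summary.append((class_names[class_id], result["scores"][i],
                        tuple(result["rois"][i]), pixels))
    return summary


# Hand-made stand-in for results[0]: a 2x3 image with two detected instances.
mock_result = {
    "rois": [[0, 0, 2, 2], [0, 2, 1, 3]],  # (y1, x1, y2, x2)
    "class_ids": [1, 2],
    "scores": [0.99, 0.80],
    "masks": [[[True, False], [True, False], [False, True]],
              [[True, False], [False, False], [False, False]]],
}
class_names = ["BG", "person", "car"]

for name, score, box, pixels in summarize_detections(mock_result, class_names):
    print(name, score, box, pixels)
```

In the real result the four entries are NumPy arrays, but they are indexed the same way: instance i is described by rois[i], class_ids[i], scores[i] and masks[:, :, i].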

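The figure below shows the result of a run. The model detects four objects and separates each object from the background with pixel-level accuracy.

Figure: Mask R-CNN detection result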

train_shapes.ipynb

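Besides using a pretrained model, we can also train on our own data. For demonstration, this notebook uses a very small shapes dataset. The dataset consists of triangles, squares and circles generated on the fly by code, so nothing needs to be downloaded.

1. Configuration

The code provides a base class Config; we only need to derive from it and change a few settings:

    from mrcnn.config import Config

    class ShapesConfig(Config):
        """Configuration for training on the shapes dataset.

        Derives from the base Config class and overrides a few settings.
        """
        # Give the configuration a recognizable name
        NAME = "shapes"

        # Train on 1 GPU with 8 images per GPU, so the batch size is 8
        # (GPUs * images/GPU).
        GPU_COUNT = 1
        IMAGES_PER_GPU = 8

        # Number of classes (must include the background class)
        NUM_CLASSES = 1 + 3  # background + 3 shapes

        # Images are a fixed 128x128
        IMAGE_MIN_DIM = 128
        IMAGE_MAX_DIM = 128

        # The images are small, so the RPN anchors are small too
        RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  # anchor side in pixels

        # RoIs per image used for training; a small value is enough for these tiny images
        TRAIN_ROIS_PER_IMAGE = 32

        # Number of training steps per epoch
        STEPS_PER_EPOCH = 100

        # Number of validation steps run at the end of each epoch
        VALIDATION_STEPS = 5

    config = ShapesConfig()
    config.display()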
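The override pattern above can be sketched with a toy base class. BaseConfig, its default values and ToyShapesConfig are hypothetical stand-ins rather than the library's Config; the sketch only shows how class attributes overridden in a subclass feed a derived value such as the batch size (GPUs * images/GPU).

```python
class BaseConfig:
    """Toy stand-in for the library's Config class (not the real one)."""
    NAME = None
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2   # hypothetical default
    NUM_CLASSES = 1

    def __init__(self):
        # Derived value computed from whatever the subclass overrode
        self.BATCH_SIZE = self.GPU_COUNT * self.IMAGES_PER_GPU

    def display(self):
        # Print every upper-case setting, including derived ones
        for key in sorted(dir(self)):
            if key.isupper():
                print("{:20} {}".format(key, getattr(self, key)))


class ToyShapesConfig(BaseConfig):
    NAME = "shapes"
    IMAGES_PER_GPU = 8
    NUM_CLASSES = 1 + 3  # background + 3 shapes


toy_config = ToyShapesConfig()
toy_config.display()
```

Keeping settings as class attributes means a new experiment is just a small subclass, and anything derived from them is recomputed when the configuration object is created.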

2. Dataset

For our own dataset we derive from utils.Dataset and override the following methods:

- load_image
- load_mask
- image_reference

Before overriding these three methods, let's look at load_shapes, which generates the data on the fly.

    class ShapesDataset(utils.Dataset):
        """Generates random shapes data: triangles, squares and circles at
        random positions. The data is generated on the fly, so no file
        access is needed.
        """

        def load_shapes(self, count, height, width):
            """Generate the requested number of images.
            count: number of images to generate
            height, width: height and width of the generated images
            """
            # Classes
            self.add_class("shapes", 1, "square")
            self.add_class("shapes", 2, "circle")
            self.add_class("shapes", 3, "triangle")

            # Note: only the image specifications are generated here
            # (shape type, color, size and position). The actual image is
            # generated on the fly in load_image() from these specifications.
            for i in range(count):
                bg_color, shapes = self.random_image(height, width)
                self.add_image("shapes", image_id=i, path=None,
                               width=width, height=height,
                               bg_color=bg_color, shapes=shapes)

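add_image is defined in the base class:

    def add_image(self, source, image_id, path, **kwargs):
        image_info = {
            "id": image_id,
            "source": source,
            "path": path,
        }
        image_info.update(kwargs)
        self.image_info.append(image_info)

It has three named parameters: source, image_id and path. source identifies where the image comes from; here it is always the fixed string "shapes". image_id is the image's id; here we use the running index i. path normally holds the image's file path; here it is None. All remaining keyword arguments are stored unchanged.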
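The way the extra keyword arguments end up in the stored record can be seen in a small standalone sketch. ToyDataset is a hypothetical stand-in that keeps only the image registry; the sample values are made up.

```python
class ToyDataset:
    """Minimal stand-in for utils.Dataset that only keeps the image registry."""

    def __init__(self):
        self.image_info = []

    def add_image(self, source, image_id, path, **kwargs):
        image_info = {"id": image_id, "source": source, "path": path}
        image_info.update(kwargs)  # extra keyword arguments are stored untouched
        self.image_info.append(image_info)


toy = ToyDataset()
toy.add_image("shapes", image_id=0, path=None, width=128, height=128,
              bg_color=(10, 20, 30),
              shapes=[("circle", (255, 0, 0), (64, 64, 20))])
print(sorted(toy.image_info[0]))
```

Because the record is a plain dictionary, load_image and load_mask can later read back whatever the dataset stored, such as info['shapes'] or info['bg_color'].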

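The random_image function randomly generates the specification of one image; please read the code comments carefully.

    def random_image(self, height, width):
        """Randomly generate the specification of one image.
        It contains the background color and the specifications of a few
        (at most 4) different shapes.
        """
        # Pick a random background color
        bg_color = np.array([random.randint(0, 255) for _ in range(3)])
        # Generate a few random shapes (at most 4)
        shapes = []
        boxes = []
        N = random.randint(1, 4)
        for _ in range(N):
            # random_shape returns a random shape (e.g. a circle), its color and its position
            shape, color, dims = self.random_shape(height, width)
            shapes.append((shape, color, dims))
            # The position is a center point plus a size (one value is enough
            # for squares, circles and equilateral triangles)
            x, y, s = dims
            # Compute the bounding box from the center point and the size
            boxes.append([y-s, x-s, y+s, x+s])
        # Use non-max suppression to drop shapes that overlap heavily
        keep_ixs = utils.non_max_suppression(np.array(boxes), np.arange(N), 0.3)
        shapes = [s for i, s in enumerate(shapes) if i in keep_ixs]
        return bg_color, shapes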
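The non-max suppression step used above can be sketched in plain Python. This is a simplified illustration of the general technique, not the library's utils.non_max_suppression; the example boxes are made up, and the scores mimic the np.arange(N) passed in random_image, so shapes generated later win.

```python
def box_area(box):
    y1, x1, y2, x2 = box
    return max(0, y2 - y1) * max(0, x2 - x1)


def iou(a, b):
    """Intersection over union of two (y1, x1, y2, x2) boxes."""
    y1, x1 = max(a[0], b[0]), max(a[1], b[1])
    y2, x2 = min(a[2], b[2]), min(a[3], b[3])
    inter = box_area((y1, x1, y2, x2))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0


def non_max_suppression(boxes, scores, threshold):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 40, 40), (5, 5, 45, 45), (60, 60, 100, 100)]
scores = [0, 1, 2]
print(non_max_suppression(boxes, scores, 0.3))  # [2, 1]
```

Boxes 0 and 1 overlap with an IoU of about 0.62, above the 0.3 threshold, so only the higher-scored one survives; box 2 overlaps nothing and is kept.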

The function that randomly generates one shape is random_shape:

    def random_shape(self, height, width):
        """Randomly generate the specification of one shape that lies
        within the given height and width. Returns a 3-tuple:
        * the shape name (square, circle, ...)
        * the shape color: a 3-tuple of RGB values
        * the shape dimensions: a tuple (x, y, s) with the center point and the size
        """
        # Pick a random shape name
        shape = random.choice(["square", "circle", "triangle"])
        # Pick a random color
        color = tuple([random.randint(0, 255) for _ in range(3)])
        # Pick a random center point in the range [buffer, height/width - buffer - 1]
        buffer = 20
        y = random.randint(buffer, height - buffer - 1)
        x = random.randint(buffer, width - buffer - 1)
        # Pick a random size
        s = random.randint(buffer, height//4)
        return shape, color, (x, y, s)

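The functions above are helpers we wrote to generate the images (or, for a real dataset, to read them from disk). Next we override the three methods listed earlier, starting with load_image:

    def load_image(self, image_id):
        """Generate the actual image from its specification.
        For a real dataset this would typically read the image from a file.
        """
        info = self.image_info[image_id]
        bg_color = np.array(info['bg_color']).reshape([1, 1, 3])
        # Fill in the background color first
        image = np.ones([info['height'], info['width'], 3], dtype=np.uint8)
        image = image * bg_color.astype(np.uint8)
        # Draw each shape in turn
        for shape, color, dims in info['shapes']:
            image = self.draw_shape(image, shape, dims, color)
        return image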

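The function above calls draw_shape to draw one shape:

    import math
    import cv2
    import numpy as np

    def draw_shape(self, image, shape, dims, color):
        """Draw a shape according to its specification."""
        # Get the center point x, y and the size s
        x, y, s = dims
        if shape == 'square':
            cv2.rectangle(image, (x-s, y-s), (x+s, y+s), color, -1)
        elif shape == "circle":
            cv2.circle(image, (x, y), s, color, -1)
        elif shape == "triangle":
            points = np.array([[(x, y-s),
                                (x-s/math.sin(math.radians(60)), y+s),
                                (x+s/math.sin(math.radians(60)), y+s),
                                ]], dtype=np.int32)
            cv2.fillPoly(image, points, color)
        return image

This function is straightforward: it uses OpenCV functions to draw on the image. The square and the circle are simple. Only the equilateral triangle needs a little plane geometry to derive the coordinates of its three vertices from the center point and the size s, which here is the vertical distance from the center point to the top vertex (half the triangle's height).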
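The geometry can be checked numerically. The sketch below recomputes the three vertices with the same expressions as draw_shape (before the cast to int32) and confirms that all three sides have the same length, 4s/sqrt(3). The helper names triangle_vertices and side_lengths are introduced here for illustration only.

```python
import math


def triangle_vertices(x, y, s):
    """Vertices used by draw_shape for a triangle with center (x, y) and size s."""
    half_base = s / math.sin(math.radians(60))
    return [(x, y - s), (x - half_base, y + s), (x + half_base, y + s)]


def side_lengths(points):
    """Lengths of the three sides of a triangle given its vertices."""
    lengths = []
    for i in range(3):
        (x1, y1), (x2, y2) = points[i], points[(i + 1) % 3]
        lengths.append(math.hypot(x2 - x1, y2 - y1))
    return lengths


vertices = triangle_vertices(64, 64, 20)
print(vertices)
print(side_lengths(vertices))
```

The height of the triangle is 2s, and an equilateral triangle of height 2s has side 2s / sin(60°), which is why half the base is s / sin(60°).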

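Next comes load_mask, which must return the masks of the objects in the image. A short explanation is in order. Instance segmentation datasets usually provide both a bounding box and a mask (whether each pixel inside the bounding box belongs to the object). To be more general, we assume here that only the mask is provided (that is, the pixels covered by the object); the bounding box is simply the smallest rectangle that contains the mask, so it does not need to be supplied.

For our randomly generated shapes, knowing the shape type, the center point and the size is enough to compute exactly which pixels the object covers. For a real dataset this is usually annotated by hand.

    def load_mask(self, image_id):
        """Generate the masks of the given image."""
        info = self.image_info[image_id]
        shapes = info['shapes']
        count = len(shapes)
        # Each object has its own mask matrix of size height x width
        mask = np.zeros([info['height'], info['width'], count], dtype=np.uint8)
        for i, (shape, _, dims) in enumerate(info['shapes']):
            # draw_shape already knows how to draw the shape; we only need
            # to pass the special color value 1.
            mask[:, :, i:i+1] = self.draw_shape(mask[:, :, i:i+1].copy(),
                                                shape, dims, 1)
        # Handle occlusions
        occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
        for i in range(count-2, -1, -1):
            mask[:, :, i] = mask[:, :, i] * occlusion
            occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
        # Map class names to class IDs
        class_ids = np.array([self.class_names.index(s[0]) for s in shapes])
        # np.bool was removed from recent NumPy releases; the builtin bool works everywhere
        return mask.astype(bool), class_ids.astype(np.int32)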
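The occlusion step in load_mask can be sketched on plain nested lists. The idea, as read from the loop above, is that shapes drawn later lie on top, so each earlier mask loses the pixels already claimed by any later shape. apply_occlusion is a hypothetical helper written for this illustration; it starts from an all-free grid instead of from the last mask, which gives the same result.

```python
def apply_occlusion(masks):
    """masks: list of equally sized 2D 0/1 grids in drawing order (last one is on top).

    Returns new grids in which every pixel belongs to at most one instance,
    namely the topmost one that covers it.
    """
    height, width = len(masks[0]), len(masks[0][0])
    # free[y][x] stays truthy while no later (higher) shape has claimed the pixel
    free = [[1] * width for _ in range(height)]
    result = [None] * len(masks)
    for i in range(len(masks) - 1, -1, -1):
        result[i] = [[masks[i][y][x] * free[y][x] for x in range(width)]
                     for y in range(height)]
        free = [[free[y][x] and not masks[i][y][x] for x in range(width)]
                for y in range(height)]
    return result


bottom = [[1, 1, 0],
          [1, 1, 0]]
top = [[0, 1, 1],
       [0, 1, 1]]
for grid in apply_occlusion([bottom, top]):
    print(grid)
```

The middle column is covered by both shapes, so it stays in the top mask and is removed from the bottom one.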

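The occlusion-handling code may look a little tricky, but that hardly matters: real training data is usually annotated by hand, and we simply read it from files. All we need to know here is the shape and meaning of the return values: a bool mask array of shape [height, width, number of instances] and an int32 array with one class ID per instance.

The last method is image_reference. Its input is an image_id and its output is a reference for that image; here it returns the shape specifications stored for the image.

    def image_reference(self, image_id):
        info = self.image_info[image_id]
        if info["source"] == "shapes":
            return info["shapes"]
        else:
            # Delegate to the base class and return its result
            return super().image_reference(image_id)

The code also checks info["source"]: if it is "shapes", the image is one we generated and the method returns its shape specifications directly; otherwise it delegates to the base-class image_reference. Now let's generate some images and take a look.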

    # Training set: 500 images
    dataset_train = ShapesDataset()
    dataset_train.load_shapes(500, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
    dataset_train.prepare()

    # Validation set: 50 images
    dataset_val = ShapesDataset()
    dataset_val.load_shapes(50, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
    dataset_val.prepare()

    image_ids = np.random.choice(dataset_train.image_ids, 4)
    for image_id in image_ids:
        image = dataset_train.load_image(image_id)
        mask, class_ids = dataset_train.load_mask(image_id)
        visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)

The randomly generated images are shown in the figure below. Because they are generated randomly each time, your results may differ. The generated image is on the left and the masks are on the right.

Figure: randomly generated shape images

3. Creating the model

    model = modellib.MaskRCNN(mode="training", config=config,
                              model_dir=MODEL_DIR)

Because we have little training data, initializing from a pretrained model (transfer learning) gives better results.

    # Initialize from the COCO model by default
    init_with = "coco"  # imagenet, coco, or last

    if init_with == "imagenet":
        model.load_weights(model.get_imagenet_weights(), by_name=True)
    elif init_with == "coco":
        # Load the COCO model parameters but skip the fully connected layer
        # (mrcnn_bbox_fc), the logits (mrcnn_class_logits), the bounding box
        # output (mrcnn_bbox) and the mask output (mrcnn_mask)
        model.load_weights(COCO_MODEL_PATH, by_name=True,
                           exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                    "mrcnn_bbox", "mrcnn_mask"])
    elif init_with == "last":
        # Initialize from the model we trained most recently
        model.load_weights(model.find_last(), by_name=True)
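The effect of by_name=True combined with exclude can be pictured as a filter over layer names. The sketch below is only an illustration with plain dictionaries: select_weights is a hypothetical helper, and "conv1" and "rpn_conv_shared" are made-up layer names; only the four excluded names come from the code above. A plausible reason for skipping these four layers is that their output sizes depend on NUM_CLASSES, which differs between COCO and the shapes dataset.

```python
def select_weights(saved_weights, model_layers, exclude=()):
    """Pick the saved weights to load: matched by layer name, minus excluded layers."""
    return {name: value for name, value in saved_weights.items()
            if name in model_layers and name not in exclude}


# Hypothetical checkpoint contents, keyed by layer name.
saved = {"conv1": "w0", "rpn_conv_shared": "w1", "mrcnn_class_logits": "w2",
         "mrcnn_bbox_fc": "w3", "mrcnn_bbox": "w4", "mrcnn_mask": "w5"}
layers = set(saved)
heads = ["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"]

print(sorted(select_weights(saved, layers, exclude=heads)))
```

The layers left out keep their fresh random initialization and are the ones that most need training on the new dataset.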

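4. Training

Training has two stages:

- heads: train only the parameters of the four layers that were not initialized above; this suits small training sets such as this example.
- all: train all parameters.

Training only the heads is enough here.

    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=1,
                layers='heads')

Saving the model parameters:

    # Saving the parameters manually is usually unnecessary because they are
    # saved automatically at the end of every epoch, so this is commented out.
    # model_path = os.path.join(MODEL_DIR, "mask_rcnn_shapes.h5")
    # model.keras_model.save_weights(model_path)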

5. Detection

We first build a Config for prediction and load the model parameters.

    class InferenceConfig(ShapesConfig):
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1

    inference_config = InferenceConfig()

    # Rebuild the model in inference mode
    model = modellib.MaskRCNN(mode="inference",
                              config=inference_config,
                              model_dir=MODEL_DIR)

    # The weights file can be given explicitly, or the model can look up
    # the most recent one by itself
    # model_path = os.path.join(ROOT_DIR, ".h5 file name here")
    model_path = model.find_last()

    # Load the model parameters
    print("Loading weights from ", model_path)
    model.load_weights(model_path, by_name=True)

We pick a random image to run detection on:

    # Pick a random image from the validation set
    image_id = random.choice(dataset_val.image_ids)
    original_image, image_meta, gt_class_id, gt_bbox, gt_mask =\
        modellib.load_image_gt(dataset_val, inference_config,
                               image_id, use_mini_mask=False)

    log("original_image", original_image)
    log("image_meta", image_meta)
    log("gt_class_id", gt_class_id)
    log("gt_bbox", gt_bbox)
    log("gt_mask", gt_mask)

    visualize.display_instances(original_image, gt_bbox, gt_mask, gt_class_id,
                                dataset_train.class_names, figsize=(8, 8))

The code above loads one image; the result is shown in the figure below. It displays the ground-truth (gold) bounding boxes and masks.

Figure: a randomly chosen test image

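Next we run the model on it:

    results = model.detect([original_image], verbose=1)
    r = results[0]
    visualize.display_instances(original_image, r['rois'], r['masks'], r['class_ids'],
                                dataset_val.class_names, r['scores'], ax=get_ax())

The prediction is shown in the figure below. Comparing the two figures shows that the model's prediction is very accurate.

Figure: the model's prediction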

6. Evaluation

So far we have only tested a single example; a more thorough evaluation is needed.

    image_ids = np.random.choice(dataset_val.image_ids, 10)
    APs = []
    for image_id in image_ids:
        # Load the image with its ground-truth bounding boxes and masks
        image, image_meta, gt_class_id, gt_bbox, gt_mask =\
            modellib.load_image_gt(dataset_val, inference_config,
                                   image_id, use_mini_mask=False)
        molded_images = np.expand_dims(modellib.mold_image(image, inference_config), 0)
        # Run detection
        results = model.detect([image], verbose=0)
        r = results[0]
        # Compute the AP
        AP, precisions, recalls, overlaps =\
            utils.compute_ap(gt_bbox, gt_class_id, gt_mask,
                             r["rois"], r["class_ids"], r["scores"], r['masks'])
        APs.append(AP)

    print("mAP: ", np.mean(APs))
    # Output: 0.95
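How an average precision value comes out of a ranked list of detections can be sketched as follows. This is a simplified VOC-style computation (area under the precision-recall envelope), not necessarily identical to utils.compute_ap, and it assumes the matching of detections to ground-truth objects has already been done: matches holds one hit flag per detection, sorted by descending score.

```python
def average_precision(matches, num_gt):
    """matches: per-detection True/False hit flags, sorted by descending score."""
    precisions, recalls = [0.0], [0.0]
    hits = 0
    for rank, hit in enumerate(matches, start=1):
        hits += bool(hit)
        precisions.append(hits / rank)
        recalls.append(hits / num_gt)
    precisions.append(0.0)
    recalls.append(1.0)
    # Make precision non-increasing from left to right (the precision envelope)
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision over each step where recall increases
    return sum((recalls[i] - recalls[i - 1]) * precisions[i]
               for i in range(1, len(recalls)))


# Two images: one with a false positive ranked second, one with perfect detections.
aps = [average_precision([True, False, True], num_gt=2),
       average_precision([True, True], num_gt=2)]
print(aps, sum(aps) / len(aps))
```

The mAP printed by the evaluation loop above is the same kind of mean, taken over the APs of the sampled validation images.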

inspect_data.ipynb

This notebook demonstrates the data preprocessing of Mask R-CNN. It can run on either the COCO dataset or the shapes dataset introduced earlier; to avoid downloading the large COCO dataset, we use the shapes dataset here.

1. Choosing a dataset

    config = ShapesConfig()

    # We comment out the code below
    # MS COCO Dataset
    #import coco
    #config = coco.CocoConfig()
    #COCO_DIR = "path to COCO dataset"  # TODO: enter value here

2. Loading the Dataset

    # Load dataset
    if config.NAME == 'shapes':
        dataset = ShapesDataset()
        dataset.load_shapes(500, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
    elif config.NAME == "coco":
        dataset = coco.CocoDataset()
        dataset.load_coco(COCO_DIR, "train")

    # prepare() must be called before the dataset is used
    dataset.prepare()

    print("Image Count: {}".format(len(dataset.image_ids)))
    print("Class Count: {}".format(dataset.num_classes))
    for i, info in enumerate(dataset.class_info):
        print("{:3}. {:50}".format(i, info['name']))

    # Output:
    # Image Count: 500
    # Class Count: 4
    #   0. BG
    #   1. square
    #   2. circle
    #   3. triangle

3. Displaying samples

We can display a few samples.

    image_ids = np.random.choice(dataset.image_ids, 4)
    for image_id in image_ids:
        image = dataset.load_image(image_id)
        mask, class_ids = dataset.load_mask(image_id)
        visualize.display_top_masks(image, mask, class_ids, dataset.class_names)

The result is shown in the figure below.

Figure: four samples with their masks

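4. Bounding Box

Datasets usually provide both bounding boxes and masks, but to keep things simple we only require masks and compute the bounding boxes from them. This has another advantage: when we rotate or scale an object, transforming the mask is easy, and we can recompute the bounding box from the new mask. Otherwise we would have to rotate and scale the bounding box accordingly, which is usually more cumbersome.

    # Load a random image and its masks
    image_id = random.choice(dataset.image_ids)
    image = dataset.load_image(image_id)
    mask, class_ids = dataset.load_mask(image_id)
    # Compute the bounding boxes
    bbox = utils.extract_bboxes(mask)

    # Print additional statistics about the image
    print("image_id ", image_id, dataset.image_reference(image_id))
    log("image", image)
    log("mask", mask)
    log("class_ids", class_ids)
    log("bbox", bbox)
    # Display the image
    visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names)

The key line is bbox = utils.extract_bboxes(mask). The resulting image is shown in the figure below.

Figure: a displayed sample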
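Computing a bounding box from a mask amounts to finding the first and last rows and columns that contain object pixels. The sketch below does this for a single 2D mask stored as nested lists; extract_bbox is a hypothetical helper, and the convention that y2 and x2 are exclusive is an assumption made for this illustration.

```python
def extract_bbox(mask):
    """mask: 2D grid of 0/1 values. Returns (y1, x1, y2, x2) with y2/x2 exclusive."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows:
        return (0, 0, 0, 0)  # empty mask: no box
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


mask = [[0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
print(extract_bbox(mask))  # (1, 1, 3, 4)
```

After a rotation or scaling of the mask, calling the same function again yields the new box, which is the convenience described above.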

Resizing images

We need to resize every image to 1024x1024 (the shapes data is generated at a fixed size, but real datasets are not like that). The aspect ratio is preserved and the longer side is scaled to 1024; for example, a 512x256 image (width x height) becomes 1024x512. The shorter dimension is then zero-padded on both sides: the 1024x512 image is padded to 1024x1024 by adding 256 rows of zeros above and below (256 rows of zeros + 512 rows of real data + 256 rows of zeros).

    # Load a random image and its masks
    image_id = np.random.choice(dataset.image_ids, 1)[0]
    image = dataset.load_image(image_id)
    mask, class_ids = dataset.load_mask(image_id)
    original_shape = image.shape
    # Resize the image
    image, window, scale, padding, _ = utils.resize_image(
        image,
        min_dim=config.IMAGE_MIN_DIM,
        max_dim=config.IMAGE_MAX_DIM,
        mode=config.IMAGE_RESIZE_MODE)
    # The masks must be resized together with the image, or the two no longer match
    mask = utils.resize_mask(mask, scale, padding)
    # Compute the bounding boxes
    bbox = utils.extract_bboxes(mask)

    # Print additional statistics about the image
    print("image_id: ", image_id, dataset.image_reference(image_id))
    print("Original shape: ", original_shape)
    log("image", image)
    log("mask", mask)
    log("class_ids", class_ids)
    log("bbox", bbox)
    # Display the image
    visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names)
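The arithmetic of the resize-and-pad step can be worked through for the 512x256 example. resize_plan is a hypothetical helper that covers only the rule described in the text (scale the longer side to max_dim, then pad the shorter side symmetrically); the library's utils.resize_image has more options and may differ in detail.

```python
def resize_plan(height, width, max_dim=1024):
    """Return (scale, new_height, new_width, padding) for a square target.

    padding is ((top, bottom), (left, right)) in pixels.
    """
    scale = max_dim / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    top = (max_dim - new_h) // 2
    left = (max_dim - new_w) // 2
    padding = ((top, max_dim - new_h - top), (left, max_dim - new_w - left))
    return scale, new_h, new_w, padding


# The example from the text: width 512, height 256
print(resize_plan(height=256, width=512))
# (2.0, 512, 1024, ((256, 256), (0, 0)))
```

The same scale and padding must then be applied to the masks, which is what the resize_mask call in the notebook code does.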

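5. Mini Masks

An image may contain several objects, and each object's mask is a bool array of size [height, width]. Everything outside the bounding box is necessarily False, so for small objects storing the full-size mask wastes a lot of space. Two improvements address this:

- We store only the mask values for the coordinates inside the bounding box.
- We shrink the mask (for example to 56x56) and enlarge it again when it is needed. This introduces some error for large objects, but (human) annotations are not that precise to begin with, so the loss is usually acceptable.

To visualize mask resizing, let's look at a few examples.

    image_id = np.random.choice(dataset.image_ids, 1)[0]
    image, image_meta, class_ids, bbox, mask = modellib.load_image_gt(
        dataset, config, image_id, use_mini_mask=False)

    log("image", image)
    log("image_meta", image_meta)
    log("class_ids", class_ids)
    log("bbox", bbox)
    log("mask", mask)

    display_images([image]+[mask[:,:,i] for i in range(min(mask.shape[-1], 7))])

    # Output:
    # image shape: (128, 128, 3) min: 4.00000 max: 241.00000 uint8
    # image_meta shape: (16,) min: 0.00000 max: 409.00000 int64
    # class_ids shape: (2,) min: 1.00000 max: 3.00000 int32
    # bbox shape: (2, 4) min: 14.00000 max: 128.00000 int32
    # mask shape: (128, 128, 2) min: 0.00000 max: 1.00000 bool

As the figure below shows, this image contains one square and one triangle.

Figure: a displayed sample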

Next we augment the image, for example by mirroring it.

    image, image_meta, class_ids, bbox, mask = modellib.load_image_gt(
        dataset, config, image_id, augment=True, use_mini_mask=True)
    log("mask", mask)
    display_images([image]+[mask[:,:,i] for i in range(min(mask.shape[-1], 7))])

The code above calls modellib.load_image_gt with use_mini_mask set to True. The result is shown in the figure below. The image has been mirrored, and the shape of mask has changed from (128, 128, 2) to (56, 56, 2); each mask now covers only the inside of its bounding box.

Figure: mini masks and augmentation
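The two ideas behind mini masks, cropping to the bounding box and shrinking to a fixed size, can be sketched on nested lists. minimize_mask is a hypothetical helper that uses nearest-neighbour sampling; the library may interpolate differently, and the 8x8 example mask is made up.

```python
def minimize_mask(mask, bbox, mini_shape):
    """Crop a 2D 0/1 mask to bbox (y1, x1, y2, x2) and resize it to mini_shape.

    Uses nearest-neighbour sampling of the cropped region.
    """
    y1, x1, y2, x2 = bbox
    box_h, box_w = y2 - y1, x2 - x1
    mini_h, mini_w = mini_shape
    return [[mask[y1 + (i * box_h) // mini_h][x1 + (j * box_w) // mini_w]
             for j in range(mini_w)]
            for i in range(mini_h)]


# An 8x8 mask: a 4x4 square at rows/cols 2..5 with its top-left quadrant missing
mask = [[1 if 2 <= y < 6 and 2 <= x < 6 and not (y < 4 and x < 4) else 0
         for x in range(8)]
        for y in range(8)]

print(minimize_mask(mask, (2, 2, 6, 6), (2, 2)))  # [[0, 1], [1, 1]]
```

Here 64 stored values shrink to 4 while the rough outline of the object survives; enlarging the mini mask back to the box size recovers an approximation of the original.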

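6. Anchor

The order of the anchors matters a great deal: training and prediction must use the same anchor sequence, and that sequence must also match the order in which the convolution produces its outputs. For an FPN, the anchors should be ordered so that the convolutional outputs can be matched directly to anchor scores and shifts. The usual order is:

- Sort by pyramid level first: all anchors of the first level, then those of the second level, and so on.
- Within a level, follow the order of the convolution: row by row from top-left to bottom-right.
- Within a cell, sort by aspect ratio.

Anchor stride: in an FPN the first few feature maps are high resolution. For a 1024x1024 input image the first-level feature map is 256x256, which yields about 200k anchors (256 x 256 x 3 = 196,608). These anchors are 32x32 pixels and their stride is 4 pixels, so they overlap heavily. Generating anchors for every other cell instead of every cell cuts the computation substantially. The text describes a stride of 2 here, unlike the 1 used in the paper (the anchor counts printed below for the shapes configuration are consistent with RPN_ANCHOR_STRIDE = 1). The code that generates the anchors is: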

    # Generate Anchors
    backbone_shapes = modellib.compute_backbone_shapes(config, config.IMAGE_SHAPE)
    anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES,
                                             config.RPN_ANCHOR_RATIOS,
                                             backbone_shapes,
                                             config.BACKBONE_STRIDES,
                                             config.RPN_ANCHOR_STRIDE)

    # Print a summary of the anchors
    num_levels = len(backbone_shapes)
    anchors_per_cell = len(config.RPN_ANCHOR_RATIOS)
    print("Count: ", anchors.shape[0])
    print("Scales: ", config.RPN_ANCHOR_SCALES)
    print("ratios: ", config.RPN_ANCHOR_RATIOS)
    print("Anchors per Cell: ", anchors_per_cell)
    print("Levels: ", num_levels)
    anchors_per_level = []
    for l in range(num_levels):
        num_cells = backbone_shapes[l][0] * backbone_shapes[l][1]
        anchors_per_level.append(anchors_per_cell * num_cells // config.RPN_ANCHOR_STRIDE**2)
        print("Anchors in Level {}: {}".format(l, anchors_per_level[l]))

The printed summary is:

    Count: 4092
    Scales: (8, 16, 32, 64, 128)
    ratios: [0.5, 1, 2]
    Anchors per Cell: 3
    Levels: 5
    Anchors in Level 0: 3072
    Anchors in Level 1: 768
    Anchors in Level 2: 192
    Anchors in Level 3: 48
    Anchors in Level 4: 12

Let's analyze this. There are 5 scales in total, one per pyramid level. At level 0 the feature map is 32x32 and each cell has 3 aspect ratios, so there are 32 x 32 x 3 = 3072 anchors; at level 1 the feature map is 16x16, giving 768 anchors. Next we look at the anchors of the center cell of each level's feature map.

    import matplotlib.pyplot as plt
    import matplotlib.patches as patches

    # Visualize anchors of one cell at the center of the feature map of a specific level

    # Load and draw random image
    image_id = np.random.choice(dataset.image_ids, 1)[0]
    image, image_meta, _, _, _ = modellib.load_image_gt(dataset, config, image_id)
    fig, ax = plt.subplots(1, figsize=(10, 10))
    ax.imshow(image)
    levels = len(backbone_shapes)

    for level in range(levels):
        colors = visualize.random_colors(levels)
        # Compute the index of the anchors at the center of the image
        level_start = sum(anchors_per_level[:level])  # sum of anchors of previous levels
        level_anchors = anchors[level_start:level_start+anchors_per_level[level]]
        print("Level {}. Anchors: {:6} Feature map Shape: {}".format(
            level, level_anchors.shape[0], backbone_shapes[level]))
        center_cell = backbone_shapes[level] // 2
        center_cell_index = (center_cell[0] * backbone_shapes[level][1] + center_cell[1])
        level_center = center_cell_index * anchors_per_cell
        center_anchor = anchors_per_cell * (
            (center_cell[0] * backbone_shapes[level][1] / config.RPN_ANCHOR_STRIDE**2)
            + center_cell[1] / config.RPN_ANCHOR_STRIDE)
        level_center = int(center_anchor)

        # Draw anchors. Brightness show the order in the array, dark to bright.
        for i, rect in enumerate(level_anchors[level_center:level_center+anchors_per_cell]):
            y1, x1, y2, x2 = rect
            p = patches.Rectangle((x1, y1), x2-x1, y2-y1, linewidth=2, facecolor='none',
                                  edgecolor=(i+1)*np.array(colors[level]) / anchors_per_cell)
            ax.add_patch(p)

The result is shown in the figure below.

Figure: Anchor
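The ordering rules and the anchor counts can be reproduced with a small generator. This is a sketch under stated assumptions, not the library's generate_pyramid_anchors: the feature strides [4, 8, 16, 32, 64] are assumed (they match the 32x32 to 2x2 feature maps of a 128x128 input), each ratio is taken as width divided by height, and anchor centers sit at cell index times feature stride.

```python
import math


def level_anchors(scale, ratios, feature_shape, feature_stride, anchor_stride=1):
    """Anchors of one pyramid level as (y1, x1, y2, x2), in row-major cell order."""
    anchors = []
    for row in range(0, feature_shape[0], anchor_stride):
        for col in range(0, feature_shape[1], anchor_stride):
            cy, cx = row * feature_stride, col * feature_stride
            for ratio in ratios:  # ratio = width / height
                h = scale / math.sqrt(ratio)
                w = scale * math.sqrt(ratio)
                anchors.append((cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2))
    return anchors


scales = (8, 16, 32, 64, 128)
ratios = (0.5, 1, 2)
feature_shapes = [(32, 32), (16, 16), (8, 8), (4, 4), (2, 2)]  # 128x128 input
feature_strides = [4, 8, 16, 32, 64]                           # assumed strides

# Level by level, then cell by cell, then ratio by ratio
pyramid = []
for scale, shape, stride in zip(scales, feature_shapes, feature_strides):
    pyramid.extend(level_anchors(scale, ratios, shape, stride))

print(len(pyramid))  # 4092
print(pyramid[:3])   # the three aspect ratios of the top-left cell of level 0
```

With these assumptions the total matches the Count of 4092 printed above, and the smallest and largest coordinates (about -90.51 and 154.51) match the anchor statistics logged in the next section.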

7. Training data generator

When training Mask R-CNN, we compute the IoU between candidate regions and the ground-truth object regions and use it to select positive and negative examples.

    random_rois = 2000
    g = modellib.data_generator(
        dataset, config, shuffle=True, random_rois=random_rois,
        batch_size=4,
        detection_targets=True)

    # Get Next Image
    if random_rois:
        [normalized_images, image_meta, rpn_match, rpn_bbox, gt_class_ids, gt_boxes, gt_masks, rpn_rois, rois], \
            [mrcnn_class_ids, mrcnn_bbox, mrcnn_mask] = next(g)

        log("rois", rois)
        log("mrcnn_class_ids", mrcnn_class_ids)
        log("mrcnn_bbox", mrcnn_bbox)
        log("mrcnn_mask", mrcnn_mask)
    else:
        # gt_class_ids is unpacked here as well, since it is logged below
        [normalized_images, image_meta, rpn_match, rpn_bbox, gt_class_ids, gt_boxes, gt_masks], _ = next(g)

    log("gt_class_ids", gt_class_ids)
    log("gt_boxes", gt_boxes)
    log("gt_masks", gt_masks)
    log("rpn_match", rpn_match)
    log("rpn_bbox", rpn_bbox)
    image_id = modellib.parse_image_meta(image_meta)["image_id"][0]
    print("image_id: ", image_id, dataset.image_reference(image_id))

    # Remove the last dim in mrcnn_class_ids. It's only added
    # to satisfy Keras restriction on target shape.
    mrcnn_class_ids = mrcnn_class_ids[:,:,0]

    b = 0

    # Restore original image (reverse normalization)
    sample_image = modellib.unmold_image(normalized_images[b], config)

    # Compute anchor shifts.
    indices = np.where(rpn_match[b] == 1)[0]
    refined_anchors = utils.apply_box_deltas(
        anchors[indices], rpn_bbox[b, :len(indices)] * config.RPN_BBOX_STD_DEV)
    log("anchors", anchors)
    log("refined_anchors", refined_anchors)

    # Get list of positive anchors
    positive_anchor_ids = np.where(rpn_match[b] == 1)[0]
    print("Positive anchors: {}".format(len(positive_anchor_ids)))
    negative_anchor_ids = np.where(rpn_match[b] == -1)[0]
    print("Negative anchors: {}".format(len(negative_anchor_ids)))
    neutral_anchor_ids = np.where(rpn_match[b] == 0)[0]
    print("Neutral anchors: {}".format(len(neutral_anchor_ids)))

    # ROI breakdown by class
    for c, n in zip(dataset.class_names, np.bincount(mrcnn_class_ids[b].flatten())):
        if n:
            print("{:23}: {}".format(c[:20], n))

    # Show positive anchors
    visualize.draw_boxes(sample_image, boxes=anchors[positive_anchor_ids],
                         refined_boxes=refined_anchors)

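The output is:

    anchors shape: (4092, 4) min: -90.50967 max: 154.50967 float64
    refined_anchors shape: (3, 4) min: 6.00000 max: 128.00000 float32
    Positive anchors: 3
    Negative anchors: 253
    Neutral anchors: 3836
    BG                     : 22
    square                 : 1
    circle                 : 9

For this randomly chosen image, 4092 anchors were generated: 3 positive examples, 253 negative examples, and the rest neutral (not used for training). The three figures below show the positive anchors, the negative anchors and the neutral anchors respectively.

Figure: positive anchors

Figure: negative anchors

Figure: neutral anchors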
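The IoU-based selection of positive and negative anchors can be sketched as follows. The thresholds 0.7 and 0.3 are assumptions chosen for illustration, the boxes are made up, and label_anchors is a hypothetical helper. The labels follow the convention of the code above: 1 positive, -1 negative, 0 neutral. The real generator appears to do more than this, for instance limiting how many anchors are used per image, which would explain why most anchors in the output above are neutral.

```python
def iou(a, b):
    """Intersection over union of two (y1, x1, y2, x2) boxes."""
    ih = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iw = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ih * iw
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def label_anchors(anchors, gt_boxes, pos_iou=0.7, neg_iou=0.3):
    """Return 1 (positive), -1 (negative) or 0 (neutral) for each anchor."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_iou:
            labels.append(1)
        elif best < neg_iou:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


gt_boxes = [(10, 10, 50, 50)]
anchors = [(10, 10, 50, 50),    # identical to the object: IoU 1.0
           (10, 30, 50, 70),    # half overlap: IoU 1/3
           (60, 60, 100, 100),  # no overlap
           (12, 12, 52, 52)]    # slightly shifted: IoU about 0.82
print(label_anchors(anchors, gt_boxes))  # [1, 0, -1, 1]
```

Anchors in the middle band are neither clearly object nor clearly background, so leaving them out of the loss avoids training on ambiguous examples.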
