Have you ever wondered how Snapchat (the "disappearing photo" sharing app created by two Stanford University students) applies its magic filters to your face? The app has been programmed to detect a set of markers on your face and to project the filter onto them. In machine learning, these markers are called facial keypoints (or facial landmarks). In this article, I will walk you through detecting these facial keypoints with machine learning.
First, I will start by importing all the libraries required for this task. In this article, I will use PyTorch to perform landmark detection with deep learning. Let's import everything we need:
import time
import cv2
import os
import random
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import imutils
import matplotlib.image as mpimg
from collections import OrderedDict
from skimage import io, transform
from math import *
import xml.etree.ElementTree as ET
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms.functional as TF
from torchvision import datasets, models, transforms
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
Downloading the DLIB Dataset

To detect facial landmarks, I will use the official dlib dataset (the iBUG 300-W large face landmark dataset), which contains more than 6,666 images of different dimensions. The code below downloads the dataset and extracts it for further exploration:
%%capture
if not os.path.exists('/content/ibug_300W_large_face_landmark_dataset'):
    # download the iBUG 300-W archive from dlib.net
    !wget http://dlib.net/files/data/ibug_300W_large_face_landmark_dataset.tar.gz
    !tar -xvzf 'ibug_300W_large_face_landmark_dataset.tar.gz'
    !rm -r 'ibug_300W_large_face_landmark_dataset.tar.gz'

Visualizing the Dataset
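Before going further, a quick sanity check can confirm that the archive extracted correctly. This is a minimal sketch that assumes the dataset was extracted into the directory used throughout this article; it simply lists the top-level folders and counts the images:

# sanity check: list top-level folders and count images in the extracted dataset
dataset_root = 'ibug_300W_large_face_landmark_dataset'
n_images = sum(
    1
    for dirpath, _, filenames in os.walk(dataset_root)
    for f in filenames
    if f.lower().endswith(('.jpg', '.png'))
)
print('Top-level entries:', sorted(os.listdir(dataset_root)))
print('Total images found:', n_images)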
Now, let's take a look at what we are working with and identify the data cleaning and preprocessing steps we will need. Below is a sample image from the dataset used for this task.
file = open('ibug_300W_large_face_landmark_dataset/helen/trainset/100032540_1.pts')
points = file.readlines()[3:-1]
landmarks = []
for point in points:
    x, y = point.split(' ')
    landmarks.append([floor(float(x)), floor(float(y[:-1]))])
landmarks = np.array(landmarks)
plt.figure(figsize=(10, 10))
plt.imshow(mpimg.imread('ibug_300W_large_face_landmark_dataset/helen/trainset/100032540_1.jpg'))
plt.scatter(landmarks[:, 0], landmarks[:, 1], s=5, c='g')
plt.show()

You can see that the face occupies only a very small part of the image. If we fed this image to a neural network as is, it would also take in the background. So, just as we prepare text data before modeling, we will also prepare this image dataset before exploring it further.
Creating the Dataset Class
Now, let's dig into the dataset's samples and labels. The labels_ibug_300W_train.xml file lists, for each input image, its 68 landmarks and the bounding box used to crop the face. I will store all of these values in lists so that we can access them easily during training.
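Before writing the dataset class, it helps to see what one entry of this XML looks like. The short sketch below is based on how the dataset class further down reads the file; it prints the filename, bounding box, and first landmark of the first training image:

# peek at one entry of labels_ibug_300W_train.xml
import xml.etree.ElementTree as ET

tree = ET.parse('ibug_300W_large_face_landmark_dataset/labels_ibug_300W_train.xml')
root = tree.getroot()

first_image = root[2][0]        # root[2] is the <images> element
box = first_image[0]            # the face bounding box for this image
print('file :', first_image.attrib['file'])
print('box  :', box.attrib)     # top, left, width, height of the face
print('parts:', len(box), 'landmarks, e.g.', box[0].attrib)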
class Transforms():
    def __init__(self):
        pass

    def rotate(self, image, landmarks, angle):
        # rotate the image and the landmarks by a random angle within [-angle, +angle]
        angle = random.uniform(-angle, +angle)

        transformation_matrix = torch.tensor([
            [+cos(radians(angle)), -sin(radians(angle))],
            [+sin(radians(angle)), +cos(radians(angle))]
        ])

        image = imutils.rotate(np.array(image), angle)

        # rotate the landmarks around the image centre (coordinates are in [0, 1])
        landmarks = landmarks - 0.5
        new_landmarks = np.matmul(landmarks, transformation_matrix)
        new_landmarks = new_landmarks + 0.5
        return Image.fromarray(image), new_landmarks

    def resize(self, image, landmarks, img_size):
        image = TF.resize(image, img_size)
        return image, landmarks

    def color_jitter(self, image, landmarks):
        color_jitter = transforms.ColorJitter(brightness=0.3, contrast=0.3,
                                              saturation=0.3, hue=0.1)
        image = color_jitter(image)
        return image, landmarks

    def crop_face(self, image, landmarks, crops):
        # crop the image to the face bounding box and rescale the landmarks to [0, 1]
        left = int(crops['left'])
        top = int(crops['top'])
        width = int(crops['width'])
        height = int(crops['height'])

        image = TF.crop(image, top, left, height, width)

        img_shape = np.array(image).shape
        landmarks = torch.tensor(landmarks) - torch.tensor([[left, top]])
        landmarks = landmarks / torch.tensor([img_shape[1], img_shape[0]])
        return image, landmarks

    def __call__(self, image, landmarks, crops):
        image = Image.fromarray(image)
        image, landmarks = self.crop_face(image, landmarks, crops)
        image, landmarks = self.resize(image, landmarks, (224, 224))
        image, landmarks = self.color_jitter(image, landmarks)
        image, landmarks = self.rotate(image, landmarks, angle=10)
        image = TF.to_tensor(image)
        image = TF.normalize(image, [0.5], [0.5])
        return image, landmarks


class FaceLandmarksDataset(Dataset):
    def __init__(self, transform=None):
        tree = ET.parse('ibug_300W_large_face_landmark_dataset/labels_ibug_300W_train.xml')
        root = tree.getroot()

        self.image_filenames = []
        self.landmarks = []
        self.crops = []
        self.transform = transform
        self.root_dir = 'ibug_300W_large_face_landmark_dataset'

        # root[2] is the <images> element; every child describes one training image
        for filename in root[2]:
            self.image_filenames.append(os.path.join(self.root_dir, filename.attrib['file']))

            self.crops.append(filename[0].attrib)

            landmark = []
            for num in range(68):
                x_coordinate = int(filename[0][num].attrib['x'])
                y_coordinate = int(filename[0][num].attrib['y'])
                landmark.append([x_coordinate, y_coordinate])
            self.landmarks.append(landmark)

        self.landmarks = np.array(self.landmarks).astype('float32')

        assert len(self.image_filenames) == len(self.landmarks)

    def __len__(self):
        return len(self.image_filenames)

    def __getitem__(self, index):
        # read the image as grayscale (single channel)
        image = cv2.imread(self.image_filenames[index], 0)
        landmarks = self.landmarks[index]

        if self.transform:
            image, landmarks = self.transform(image, landmarks, self.crops[index])

        # shift the normalized landmarks from [0, 1] to [-0.5, 0.5]
        landmarks = landmarks - 0.5

        return image, landmarks
dataset = FaceLandmarksDataset(Transforms())

Visualizing the Transformations
Now let's quickly review what we have done so far. I will visualize a sample from the dataset after applying the transformations defined in the class above:
image, landmarks = dataset[0]
landmarks = (landmarks + 0.5) * 224
plt.figure(figsize=(10, 10))
plt.imshow(image.numpy().squeeze(), cmap='gray');
plt.scatter(landmarks[:, 0], landmarks[:, 1], s=8);

Splitting the Dataset for Facial Keypoint Detection Training and Prediction
Now, to move forward, I will split the dataset into a training set and a validation set:
len_valid_set = int(0.1 * len(dataset))
len_train_set = len(dataset) - len_valid_set
print("The length of Train set is {}".format(len_train_set))print("The length of Valid set is {}".format(len_valid_set))
train_dataset, valid_dataset = torch.utils.data.random_split(dataset, [len_train_set, len_valid_set])
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)
valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=8, shuffle=True, num_workers=4)

Output:
The length of Train set is 6000
The length of Valid set is 666

Checking the shape of the input data:
images, landmarks = next(iter(train_loader))
print(images.shape)
print(landmarks.shape)

Output:
torch.Size([64, 1, 224, 224])
torch.Size([64, 68, 2])

Defining the Facial Keypoint Detection Model
Now, I will use ResNet18 as the backbone. I will modify its first and last layers so that the network fits our purpose: the first convolution accepts a single grayscale channel instead of three RGB channels, and the final fully connected layer outputs 136 values (68 keypoints × 2 coordinates):
class Network(nn.Module):
    def __init__(self, num_classes=136):
        super().__init__()
        self.model_name = 'resnet18'
        self.model = models.resnet18()
        # accept 1-channel (grayscale) input instead of 3-channel RGB
        self.model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # predict 136 values: 68 keypoints x 2 coordinates
        self.model.fc = nn.Linear(self.model.fc.in_features, num_classes)

    def forward(self, x):
        x = self.model(x)
        return x


import sys

def print_overwrite(step, total_step, loss, operation):
    # overwrite the current console line with the running loss
    sys.stdout.write('\r')
    if operation == 'train':
        sys.stdout.write("Train Steps: %d/%d Loss: %.4f " % (step, total_step, loss))
    else:
        sys.stdout.write("Valid Steps: %d/%d Loss: %.4f " % (step, total_step, loss))
    sys.stdout.flush()

Training the Neural Network for Facial Keypoint Detection
Now I will train the network, using the mean squared error between the real and predicted facial landmarks as the loss:
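Concretely, each (68, 2) landmark array is flattened into a 136-dimensional vector so it can be compared with the network's 136 outputs. A minimal sketch with dummy tensors (not real data) illustrates the shapes involved:

# dummy tensors only, to illustrate the shapes used by the loss below
dummy_landmarks = torch.rand(4, 68, 2) - 0.5   # 4 faces, 68 (x, y) targets in [-0.5, 0.5]
dummy_predictions = torch.rand(4, 136)         # 4 faces, 136 predicted values
dummy_loss = nn.MSELoss()(dummy_predictions, dummy_landmarks.view(4, -1))
print(dummy_loss.item())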
torch.autograd.set_detect_anomaly(True)

network = Network()
network.cuda()

criterion = nn.MSELoss()
optimizer = optim.Adam(network.parameters(), lr=0.0001)

loss_min = np.inf
num_epochs = 10

start_time = time.time()
for epoch in range(1, num_epochs + 1):

    loss_train = 0
    loss_valid = 0
    running_loss = 0

    network.train()
    # iterate over the DataLoader directly so every batch is seen once per epoch
    for step, (images, landmarks) in enumerate(train_loader, 1):
        images = images.cuda()
        landmarks = landmarks.view(landmarks.size(0), -1).cuda()

        predictions = network(images)

        # clear gradients, compute the loss, backpropagate and update the weights
        optimizer.zero_grad()
        loss_train_step = criterion(predictions, landmarks)
        loss_train_step.backward()
        optimizer.step()

        loss_train += loss_train_step.item()
        running_loss = loss_train / step

        print_overwrite(step, len(train_loader), running_loss, 'train')

    network.eval()
    with torch.no_grad():
        for step, (images, landmarks) in enumerate(valid_loader, 1):
            images = images.cuda()
            landmarks = landmarks.view(landmarks.size(0), -1).cuda()

            predictions = network(images)

            loss_valid_step = criterion(predictions, landmarks)

            loss_valid += loss_valid_step.item()
            running_loss = loss_valid / step

            print_overwrite(step, len(valid_loader), running_loss, 'valid')

    loss_train /= len(train_loader)
    loss_valid /= len(valid_loader)

    print('\n')
    print('Epoch: {} Train Loss: {:.4f} Valid Loss: {:.4f}'.format(epoch, loss_train, loss_valid))
    print('')

    # keep the checkpoint with the lowest validation loss
    if loss_valid < loss_min:
        loss_min = loss_valid
        torch.save(network.state_dict(), '/content/face_landmarks.pth')
        print("\nMinimum Validation Loss of {:.4f} at epoch {}/{}".format(loss_min, epoch, num_epochs))
        print('Model Saved\n')

print('Training Complete')
print("Total Elapsed Time : {} s".format(time.time() - start_time))

Facial Keypoint Prediction
Now let's use the model trained above to make predictions on images from the validation set, which were not used for training:
start_time = time.time()

with torch.no_grad():
    best_network = Network()
    best_network.cuda()
    best_network.load_state_dict(torch.load('/content/face_landmarks.pth'))
    best_network.eval()

    images, landmarks = next(iter(valid_loader))

    images = images.cuda()
    # rescale the ground-truth landmarks back to 224x224 pixel coordinates
    landmarks = (landmarks + 0.5) * 224

    predictions = (best_network(images).cpu() + 0.5) * 224
    predictions = predictions.view(-1, 68, 2)

    plt.figure(figsize=(10, 40))

    for img_num in range(8):
        plt.subplot(8, 1, img_num + 1)
        plt.imshow(images[img_num].cpu().numpy().transpose(1, 2, 0).squeeze(), cmap='gray')
        plt.scatter(predictions[img_num, :, 0], predictions[img_num, :, 1], c='r', s=5)
        plt.scatter(landmarks[img_num, :, 0], landmarks[img_num, :, 1], c='g', s=5)
print('Total number of test images: {}'.format(len(valid_dataset)))
end_time = time.time()
print("Elapsed Time : {}".format(end_time - start_time))

Below are a few of the facial keypoint detection results. As you can see, the keypoints on each face can now be detected quite accurately: the predicted points (red) line up closely with the ground truth (green).
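To complement the visual check with a rough number, the mean per-keypoint pixel error can be computed on this batch. This is a minimal sketch that assumes the predictions and landmarks tensors from the prediction block above are still in scope (both already scaled to 224x224 pixel coordinates):

# per-keypoint Euclidean distance between prediction and ground truth, in pixels
pixel_error = torch.norm(predictions - landmarks, dim=-1)
print('Mean keypoint error on this batch: {:.2f} px'.format(pixel_error.mean().item()))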