While it's not yet entirely clear where humanoid robots hold an advantage over conventional robots, one area in which they excel is entertainment. The folks at Disney Research, whose work is all about entertainment, have been at this for a very long time, and the appeal of some of their animatronic figures is quite impressive.
The next step for Disney is to get its animatronic figures, which currently feature scripted behaviors, to perform in an interactive manner with visitors. The challenge is that this is where you start to get into potential Uncanny Valley territory, which is what happens when you try to create "the illusion of life," which is exactly what Disney explicitly says it is trying to do.
In a paper presented at IROS this month, a team from Disney Research, Caltech, University of Illinois at Urbana-Champaign, and Walt Disney Imagineering is trying to nail that illusion of life with a single, and perhaps most important, social cue: eye gaze.
What, exactly, does "lifelike" mean in the context of robotic gaze? The paper abstract describes the goal as "[seeking] to create an interaction which demonstrates the illusion of life." I suppose you could think of it like a sort of old-fashioned Turing test focused on gaze: If the gaze of this robot cannot be distinguished from the gaze of a human, then victory, that's lifelike. And critically, we're talking about mutual gaze here: not just a robot gazing off into the distance, but you looking deep into the eyes of this robot and it looking right back at you just like a human would. Or, just like some humans would.
The approach that Disney is using is more animation-y than biology-y or psychology-y. In other words, they're not trying to figure out what's going on in our brains to make our eyes move the way they do when we're looking at other people and basing their control system on that; instead, Disney just wants it to look right. This "visual appeal" approach is totally fine, and there's been an enormous amount of human-robot interaction (HRI) research behind it already, albeit usually with less explicitly human-like platforms. And speaking of human-like platforms, the hardware is a "custom Walt Disney Imagineering Audio-Animatronics bust," which has DoFs that include the neck, eyes, eyelids, and eyebrows.
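To make that hardware a bit more concrete, here is a minimal sketch of how such a bust's degrees of freedom might be described to a controller. The joint names and limits are hypothetical illustrations, not specifications from the paper.

```python
from dataclasses import dataclass

@dataclass
class Joint:
    """One degree of freedom, with normalized position limits."""
    name: str
    lo: float
    hi: float

# Hypothetical DoF list for an animatronic bust; the names and
# limits are illustrative, not taken from the Disney paper.
BUST_DOFS = [
    Joint("neck_pan",  -1.0, 1.0),
    Joint("neck_tilt", -1.0, 1.0),
    Joint("eye_pan",   -1.0, 1.0),
    Joint("eye_tilt",  -1.0, 1.0),
    Joint("eyelids",    0.0, 1.0),  # 0 = fully open, 1 = fully closed
    Joint("eyebrows",  -1.0, 1.0),  # -1 = lowered, 1 = raised
]
```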
In order to decide on gaze motions, the system first identifies a person to target with its attention using an RGB-D camera. If more than one person is visible, the system calculates a curiosity score for each, currently simplified to be based on how much motion it sees. Depending on which visible person has the highest curiosity score, the system will choose from a variety of high-level gaze behavior states (sketched in code after the list), including:
Read: The Read state can be considered the "default" state of the character. When not executing another state, the robot character will return to the Read state. Here, the character will appear to read a book located at torso level.
Glance: A transition to the Glance state from the Read or Engage states occurs when the attention engine indicates that there is a stimulus with a curiosity score […] above a certain threshold.
Engage: The Engage state occurs when the attention engine indicates that there is a stimulus […] meeting a threshold, and can be triggered from both the Read and Glance states. This state causes the robot to gaze at the person-of-interest with both the eyes and head.
Acknowledge: The Acknowledge state is triggered from either the Engage or Glance states when the person-of-interest is deemed to be familiar to the robot.
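Here is a minimal sketch of what that attention-driven state selection might look like, assuming a motion-based curiosity score and made-up thresholds; none of the names or values below come from the paper itself.

```python
# Hypothetical sketch of the attention engine's state selection.
# Thresholds, score ranges, and identifiers are illustrative only.

GLANCE_THRESHOLD = 0.3  # curiosity needed to trigger a Glance
ENGAGE_THRESHOLD = 0.7  # curiosity needed to trigger Engage

def curiosity_score(person: dict) -> float:
    """The paper simplifies curiosity to observed motion; assume the
    RGB-D perception stack reports normalized motion per person."""
    return min(person["motion"], 1.0)

def next_state(state: str, people: list, familiar_ids: set) -> str:
    if not people:
        return "Read"  # default: appear to read a book at torso level
    target = max(people, key=curiosity_score)  # person-of-interest
    score = curiosity_score(target)
    if state in ("Engage", "Glance") and target["id"] in familiar_ids:
        return "Acknowledge"  # familiar person-of-interest
    if score >= ENGAGE_THRESHOLD:
        return "Engage"       # gaze with both eyes and head
    if score >= GLANCE_THRESHOLD and state in ("Read", "Engage"):
        return "Glance"
    return "Read"

# Example: a single unfamiliar person moving a lot triggers Engage.
people = [{"id": 7, "motion": 0.9}]
print(next_state("Read", people, familiar_ids=set()))  # Engage
```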
Running underneath these higher level behavior states are lower level motion behaviors like breathing, small head movements, eye blinking, and saccades (the quick eye movements that occur when people, or robots, look between two different focal points). The term for this hierarchical behavioral state layering is a subsumption architecture, which goes all the way back to Rodney Brooks' work on robots like Genghis in the 1980s and Cog and Kismet in the '90s, and it provides a way for more complex behaviors to emerge from a set of simple, decentralized low-level behaviors.
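As a rough illustration of that layering, here is a hypothetical sketch in which the low-level behaviors run continuously and the active high-level state overrides just the channels it needs; the behavior functions and channel names are invented for this example.

```python
import math

# Hypothetical sketch of subsumption-style layering. Lower layers
# (breathing, blinking) always run; a higher layer (the active gaze
# state) subsumes the output channels it cares about.

def breathing(t):
    # slow sinusoidal idle motion of the head
    return {"head_tilt": 0.02 * math.sin(2 * math.pi * t / 4.0)}

def blinking(t):
    # brief eyelid closure every few seconds
    return {"eyelids": 1.0 if (t % 5.0) < 0.15 else 0.0}

def engage_layer(t, target_pan):
    # While engaged, head and eyes track the person-of-interest,
    # overriding whatever the idle layers wanted for those channels.
    return {"head_pan": target_pan, "eye_pan": target_pan}

def control_tick(t, layers):
    command = {}
    for layer in layers:          # ordered lowest to highest priority
        command.update(layer(t))  # higher layers overwrite lower ones
    return command

# Example: idle layers plus an Engage layer tracking a person at pan=0.4
layers = [breathing, blinking, lambda t: engage_layer(t, 0.4)]
print(control_tick(1.0, layers))
```

A real controller would blend channels smoothly rather than hard-overwrite them, but the priority-ordered override is the essence of the subsumption idea.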
Brooks, an emeritus professor at MIT and, most recently, cofounder and CTO of Robust.ai, tweeted about the Disney project, saying: "People underestimate how long it takes to get from academic paper to real world robotics. 25 years on, Disney is using my subsumption architecture for humanoid eye control, better and smoother now than our 1995 implementations on Cog and Kismet."