蘋果棋的人工智慧

2021-02-28 神奇海螺萬歲

我做了三個難度

初級

初級難度就是隨便下，上一篇，有一個函數 getLegal 用來獲取所有可以落子的坐標

初級難度，cpu就是自己隨機選擇一個

const arr = engine.getLegal();
const [ col, row ] = arr[parseInt(Math.random() * arr.length)];
const data = { col, row, color: _this.getColor() };
const e = { type: AIPlayer.EVENT.MOVE, data };
_this.dispatchEvent(e);
中級中級我們按照有限度做了策略
Roxanne 策略 詳見 《Analysis of Monte Carlo Techniques in Othello》
提出者：Canosa, R. Roxanne canosa homepage. https://www.cs.rit.edu/~rlc/
我們把8*8的所有位置分了一下優先級
我們畫個圖清楚的表現一下
那麼根據Roxanne策略寫一個類，就是讓cpu按照優先級落子，這就是中級難度
class Roxanne {
  constructor() {
    this._table = [ // 優先級
      [ [0, 0], [0, 7], [7, 0], [7, 7] ],
      [ [2, 2], [2, 5], [5, 2], [5, 5] ],
      [ [3, 2], [3, 5], [4, 2], [4, 5], [2, 3], [2, 4], [5, 3], [5, 4] ],
      [ [2, 0], [2, 7], [5, 0], [5, 7], [0, 2], [0, 5], [7, 2], [7, 5] ],
      [ [3, 0], [3, 7], [4, 0], [4, 7], [0, 3], [0, 4], [7, 3], [7, 4] ],
      [ [2, 1], [2, 6], [5, 1], [5, 6], [1, 2], [1, 5], [6, 2], [6, 5] ],
      [ [3, 1], [3, 6], [4, 1], [4, 6], [1, 3], [1, 4], [6, 3], [6, 4] ],
      [ [1, 1], [1, 6], [6, 1], [6, 6] ],
      [ [1, 0], [1, 7], [6, 0], [6, 7], [0, 1], [0, 6], [7, 1], [7, 6]]
    ];
  }
  select(arr) {
    if (arr && arr.length > 0) {
      for(const moves of this._table) { // 按照優先級遍歷所有策略
        moves.sort(() => 0.5 - Math.random()); // 隨機排列該等級策略
        for(const move of moves) {
          if(arr.findIndex(n => n[0] === move[0] && n[1] === move[1]) > -1) { // 判斷可下的位置是否在策略裡
            return move;
          }
        }
      }
    } else {
      return false;
    }
  }
}
高級高級使用的是蒙特卡洛算法
蒙特卡羅方法於20世紀40年代美國在第二次世界大戰中研製原子彈的「曼哈頓計劃」計劃的成員S.M.烏拉姆和J.馮·諾伊曼首先提出。數學家馮·諾伊曼用馳名世界的賭城—摩納哥的Monte Carlo—來命名這種方法，為它蒙上了一層神秘色彩。在這之前，蒙特卡羅方法就已經存在。1777年，法國數學家布豐（Georges Louis Leclere de Buffon，1707—1788）提出用投針實驗的方法求圓周率π。這被認為是蒙特卡羅方法的起源。
所謂的蒙特卡洛算法，其實就是試驗模擬結果，然後算高分
模擬人腦思考對手我們是要創建一個模擬玩家，去走棋(落子)。
為了提升效率，不考慮比較差的走法，我們就按照Roxanne的優先級落子
/**
 * 模擬玩家
 * @param {int} color 顏色
 */
class SimulatePlayer {
  /**
   * 
   * @param {enum} color 顏色
   * @param {enum} type 類型
   */
  constructor(color, type) {
    this.color = color;
    this.type = type;
    this.roxnane = new Roxanne();
  }
  /**
   * 走棋
   * @param {Array} arr 可以走的位置
   * @returns 走棋步驟[col, row]
   */
  getMove(arr) {
    return this.roxnane.select(arr);
  }
}
這就是在模擬人腦，比如你在下棋時候，也會自己想設想一下走完這一步對方怎麼走，想出5~8步再落子，機器的運算能力高，我們就可以算到最終勝負
/**
 * 模擬器
 */
class Simulator {
  /**
   * 構造
   * @param {Player} black 黑棋模擬選手
   * @param {Player} white 白棋模擬選手
   * @param {Engine} engine 引擎
   * @param {int} current 現在棋手
   */
  constructor(black, white, engine, current) {
    this.black = black;
    this.white = white;
    this.current = current;
    this.engine = engine;
  }
  run() {
    let n = 0;
    while(!this.engine.isGameOver()) {
      this.shift();
      const arr = this.engine.getLegal(this.current.color);
      if(arr && arr.length > 0) {
        const action = this.current.getMove(arr);
        this.engine.move(action[1], action[0]);  
      }
      if(n >100)
        break;
      else
        n++;
    }
    return this.engine.getResult();
  }
  /**
   * 設置current玩家
   */
  shift() {
    const color = this.engine.getPlayer();
    if(isNaN(color)) {
      this.current = this.black;
    } else {
      this.current = color === Engine.CHESS.BLACK ? this.black : this.white;
    }
    return this.current;
  }
}
蒙特卡洛算法
首先創建一個蒙特卡洛樹
其實就代表了一步棋，它擁有children可以無限差分，就是把下面的走法放到子步驟裡；它也有一個parent方便尋找上一級計算得分
/**
 * 夢特卡洛樹
 */
class TreeNode {
  constructor(parent, color) {
    this.parent = parent;
    this.w = 0;
    this.n = 0;
    this.row = NaN;
    this.col = NaN;
    this.color = color;
    this.children = [];
  }
  /**
   * 設置走棋步驟
   * @param {array} move [col, row]
   */
  setMove(move) {
    this.row = move[1];
    this.col = move[0];
  }
  /**
   * 獲得走棋步驟
   */
  getMove() {
    return [this.col, this.row];
  }
  /**
   * 添加節點，去重
   * @param {TreeNode} node 蒙特卡洛樹節點
   */
  add(node) {
    if(this.children.findIndex(p => p.row === node.row && p.col === node.col) === -1) {
      this.children.push(node);
    }
  }
}
然後就是開始執行蒙特卡洛算法了，其實也就是模擬走棋然後選出最佳方案
這裡要注意的是，js沒有真正的多線程，所以我使用的Promise配合setInterval創建一個異步任務保證這個計算運行的同時不阻塞程序
/**
 * 夢特卡洛算法
 * @param {int} color 顏色
 * @param {int} expire 限時
 */
class MonteCarlo {
  constructor(color, expire = 2000) {
    this.color = color;
    this.expire = expire;
    this.tick = 0;
    this.black = new SimulatePlayer(Engine.CHESS.BLACK);
    this.white = new SimulatePlayer(Engine.CHESS.WHITE);
  }
  /**
   * 運行
   * @param {Engine} engine 引擎
   */
  run(engine) {
    this.tick = new Date().getTime();
    const root = new TreeNode(null, this.color);
    return new Promise((resolve, reject) => {
      const sid = setInterval(() => {
        const simulateEngine = engine.clone();
        const choice = this.select(root, simulateEngine);
        this.expand(choice, simulateEngine);
        const {winner} = this.simulate(choice, simulateEngine);
        let backScore = [0.5, 1, 0][winner];
        if (choice.color === Engine.CHESS.BLACK) {
          backScore = 1 - backScore;
        }
        this.backProp(choice, backScore);
        if(new Date().getTime() - this.tick > this.expire - 1) {
          clearInterval(sid);
          let bestN = -1;
          let bestMove = null;
          for(const k in root.children) {
            if(root.children[k].n > bestN) {
              bestN = root.children[k].n;
              bestMove = root.children[k].getMove();
            }
          }      
          resolve(bestMove);
        }
      }, 1);
  
    });
  }
  /**
   * 蒙特卡洛樹搜索，節點選擇
   * @param {TreeNode} node 節點
   * @param {Engine} engine 引擎
   * @returns 搜索樹向下遞歸選擇子節點
   */
  select(node, engine) {
    if (node.children.length === 0) {
      return node;
    } else {
      let bestScore = -1;
      let bestMove = null;
      for (const k in node.children) {
        if (node.children[k].n === 0) {
          bestMove = k;
          break;
        } else {
          let N = node.n;
          let n = node.children[k].n;
          let w = node.children[k].w;
          const score = w / n + Math.sqrt(2 * Math.log(N) / n);
          if (score > bestScore) {
            bestScore = score;
            bestMove = k;
          }
        }
      }
      return this.select(node.children[bestMove], engine);
    }
  }
  /**
   * 蒙特卡洛樹搜索，節點擴展
   * @param {TreeNode} node 節點
   * @param {Engine} engine 引擎
   * @returns 搜索樹向下遞歸選擇子節點
   */
  expand(node, engine) {
    for (const move of engine.getLegal(node.color)) {
      const child = new TreeNode(node, node.color);
      child.setMove(move);
      node.add(child);
    }
  }
  /**
   * 蒙特卡洛樹搜索，採用Roxanne策略代替隨機策略搜索，模擬擴展搜索樹
   * @param {TreeNode} node 節點
   * @param {Engine} engine 引擎
   */
  simulate(node, engine) {
    const current = node.color === Engine.CHESS.WHITE ? Engine.CHESS.BLACK : Engine.CHESS.WHITE;
    const simulator = new Simulator(this.black, this.white, engine, current);
    return simulator.run();
  }
  /**
   * 蒙特卡洛樹搜索，反向傳播，回溯更新模擬路徑中的節點獎勵
   * @param {TreeNode} node 節點
   * @param {int} score 計分
   */
  backProp(node, score) {
    node.n += 1;
    node.w += score;
    if (node.parent) {
      this.backProp(node.parent, 1 - score);
    } 
  }
  /**
   * 蒙特卡洛樹搜索
   * @param {Engine} engine 引擎
   * @returns 採取最佳拓展落子策略
   */
  getMove(engine) {
    this.tick = new Date().getTime();
    const action = this.mcts(engine.clone());
    return action;
  }
}
這樣，cpu的AI就完成了
擴展其實蒙特卡洛樹算法，並不是最佳。這裡就是電腦跟人腦的一個大區別，電腦只能看出眼前的優先級，並不能跟人一樣輕鬆使用「爬邊」、「斯通納陷阱」之類先抑後揚的戰術
術語C位、星位（C-squares and X-squares）：C位就是位於(0, 1)、(0, 6)、(1, 0)、(1, 7)、(6, 0)、(6, 7)、(7, 1)和(7, 6)的位置，星位就是位於(1, 1)、(1, 6)、(6, 1)和(6, 6)的位置。這些位置務必要小心佔用。
中心（Center）：局面的中心就是內部子的集合。
控制中心（Control of the center）：一種策略，它試圖在局面中心擁有儘可能多的棋子，沿邊界擁有儘可能少的棋子，以獲得最大可能的行動力。
角（Corner）：角就是位於(0,0)、(0,7)、(7,0)和(7,7)的位置。這些位置不可能被對方夾吃。
爬邊（Edge creeping）：一種以弱邊(不平衡邊)為代價，在一條或兩條邊上獲得最大數量棋步的策略。爬邊者試圖通過將全部邊界都留給對手來快速耗盡他的棋步，但是如果爬邊不能湊效，壞邊產生的效應將使他的局面迅速變弱。
邊界（Frontier）：邊緣子的集合，也就是說那些與空位相鄰的棋子。
獲得餘裕手（Gain a tempo）：在棋盤的某個區域內比對手多下一步棋，以迫使他在其他地方先開始下棋（從而延長他的邊界）。
效應（Influence）：當棋手的棋子迫使他同時在多個方向上翻轉棋子時，我們就說這些棋子產生了效應。
內部子（Internal discs）、邊緣子（external discs）：內部子就是不與空位相鄰的棋子。沒有內部子在戰略上是很糟的。
自由（Liberty）：非災難性的棋步。「缺少自由」：在不久的將來不得不送角。
多子策略（Maximum disc strategy）：許多初學者所用的錯誤策略，他們每步棋都試圖翻轉最大數量的棋子。
行動力（Mobility）：棋手合法的可能棋步數量。更進一步說，當棋手擁有大量的可能棋步時，他就擁有好的行動力。
奇偶性（Parity）：一種在對手佔據的每個區域內都留下偶數個空位的策略。
安靜步（Quiet move）：不翻轉邊緣子的一步棋，通常這是步好棋。
穩定子（Stable discs）：絕對不會被翻轉的棋子。角就是一個穩定子的實例。
斯通納陷阱/四通陷阱（Stoner Trap）：一種針對弱邊局面強迫進行角交換的攻擊。
不平衡邊（Unbalanced edge）：由非6個同色棋子組成的邊結構。
AlphaGo因為這個遊戲展示是純前臺的，如果帶後臺，我們可以更方便的窮盡所有的走棋步驟，存入資料庫方便復用，而不是只按照優先級落子
後記人工智慧介紹完畢，後面會介紹一下3D棋盤的前端canvas技術，也就是基於threejs框架的webgl
遊戲引擎請看 蘋果棋的遊戲引擎
在線遊戲請看 蘋果棋

蘋果棋的人工智慧

相關焦點

蘋果2億美元收購人工智慧公司或與iPhone有關

人工智慧大動作不斷,海信在下一步什麼棋?

AlphaGo、康德和人工智慧

各大高校新增人工智慧專業，關於人工智慧，你知道多少？

【WWDC18 專訪】蘋果獎學金的背後,是哪些神奇的人和 App?

AlphaGo對圍棋研究謝幕留下輔助學棋的軟體

馬斯克將通過連線訪談參加2020世界人工智慧大會

挖角谷歌AI元老John Giannandrea,蘋果想在人工智慧領域幹點啥?

谷歌頂級人工智慧專家Ian Goodfellow加入蘋果公司擔任總監級別職務

馬雲馬斯克探討人工智慧;華為9月在慕尼黑推出Mate30系列手機;蘋果...

新奧杯彭立堯驚天逆轉譜經典番棋故事

Square Off智能象棋將哈利波特巫師棋帶進現實

有趣的民間遊戲:墨點棋、扇子棋、五虎棋、五角星棋、扎槍蘿蔔

星陣網頁智能對弈免費開放人人可享頂尖職業免費指導棋

蘋果出人意料地改用自研電腦晶片,到底在下一盤多大的棋?

有趣的民間遊戲:乘棋、丁字棋、梯形棋、分割包圍、對角棋

樊麾回應輸給電腦:最初不信它能贏自己棋太臭

【芯視野】蘋果的下一盤大棋?

漫畫:什麼是人工智慧?

棋後侯逸凡分享自己的學習和人生經歷

蘋果棋的人工智慧

相關焦點

蘋果2億美元收購人工智慧公司 或與iPhone有關

人工智慧大動作不斷,海信在下一步什麼棋?

AlphaGo、康德和人工智慧

各大高校新增人工智慧專業，關於人工智慧，你知道多少？

【WWDC18 專訪】蘋果獎學金的背後,是哪些神奇的人和 App?

AlphaGo對圍棋研究謝幕 留下輔助學棋的軟體

馬斯克將通過連線訪談參加2020世界人工智慧大會

挖角谷歌AI元老John Giannandrea,蘋果想在人工智慧領域幹點啥?

谷歌頂級人工智慧專家Ian Goodfellow加入蘋果公司 擔任總監級別職務

馬雲馬斯克探討人工智慧;華為9月在慕尼黑推出Mate30系列手機;蘋果...

新奧杯彭立堯驚天逆轉 譜經典番棋故事

Square Off智能象棋將哈利波特巫師棋帶進現實

有趣的民間遊戲:墨點棋、扇子棋、五虎棋、五角星棋、扎槍蘿蔔

星陣網頁智能對弈免費開放 人人可享頂尖職業免費指導棋

蘋果出人意料地改用自研電腦晶片,到底在下一盤多大的棋?

有趣的民間遊戲:乘棋、丁字棋、梯形棋、分割包圍、對角棋

樊麾回應輸給電腦:最初不信它能贏 自己棋太臭

【芯視野】蘋果的下一盤大棋?

漫畫:什麼是人工智慧?

棋後侯逸凡分享自己的學習和人生經歷

蘋果2億美元收購人工智慧公司或與iPhone有關

AlphaGo對圍棋研究謝幕留下輔助學棋的軟體

谷歌頂級人工智慧專家Ian Goodfellow加入蘋果公司擔任總監級別職務

新奧杯彭立堯驚天逆轉譜經典番棋故事

星陣網頁智能對弈免費開放人人可享頂尖職業免費指導棋

樊麾回應輸給電腦:最初不信它能贏自己棋太臭