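Let's walk through a Sentinel deployment using a one-master, two-replica replication setup as a worked example. The environment is as follows: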
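Each of the three nodes (192.168.100.2, 192.168.100.3 and 192.168.100.4) runs both Redis and Sentinel (redis+sentinel). 192.168.100.2 is the initial master.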
For setting up Redis replication itself, please refer to a separate article.
Once deployed, the result looks like this.
On the master (192.168.100.2):

```
127.0.0.1:6379> role
1) "master"
2) (integer) …
3) 1) 1) "192.168.100.4"
      2) "6379"
      3) "368018"
   2) 1) "192.168.100.3"
      2) "6379"
      3) "368032"
```

On a replica:

```
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:192.168.100.2
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:368298
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:4668490d0bd8e3d2967fe33eb49efde8af6a537b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:368298
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:10485760
repl_backlog_first_byte_offset:43
repl_backlog_histlen:368256
127.0.0.1:6379> role
1) "slave"
2) "192.168.100.2"
3) (integer) 6379
4) "connected"
5) (integer) 368312
```

Sentinel configuration (excerpt):

```
pidfile "/var/run/redis-sentinel.pid"
logfile "/usr/local/redis/log/sentinel.log"
dir "/usr/local/redis/data"
```

systemd unit for Sentinel:

```
cat /usr/lib/systemd/system/sentinel.service
[Unit]
Description=Redis Sentinel provides high availability for Redis.
Documentation=https://redis.io/topics/sentinel
After=syslog.target
After=network.target
# Sentinel is best started after the Redis service (the unit controlled by "systemctl start redis")
After=redis.service
[Service]
Type=notify
User=redis
Group=dba
Restart=always
```

Reload systemd and start Sentinel:

```
systemctl daemon-reload
systemctl start sentinel
```

Sentinel log after startup:

```
oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
17397:X 04 Jul 2020 21:02:31.744 Configuration loaded
17397:X 04 Jul 2020 21:02:31.744 * supervised by systemd, will signal readiness
17397:X 04 Jul 2020 21:02:31.746 * Running mode=sentinel, port=26379.
17397:X 04 Jul 2020 21:02:31.752 +monitor master userservice 192.168.100.2 6379 quorum 2
17397:X 04 Jul 2020 21:02:31.758 * +slave slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:02:31.761 * +slave slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:02:33.786 * +sentinel sentinel 620c49f4f658095dae056e670822b634a257cc23 192.168.100.4 26379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:02:33.810 * +sentinel sentinel d7ec4699a9744e8b806efba64a12adf74fdf367f 192.168.100.3 26379 @ userservice 192.168.100.2 6379
```
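The +slave lines in this startup log are replicas that Sentinel found by reading the master's INFO replication output, where each replica appears as a slaveN:ip=...,port=...,state=...,offset=...,lag=... entry. The function below is a minimal sketch of that discovery step. It is not Sentinel's own code. The sample master output is hypothetical: ip, port and offset come from the master's role output above, and the state and lag values are assumed.

```python
from typing import Dict, List


def parse_info(text: str) -> Dict[str, str]:
    """Turn INFO output ("key:value" lines, "# Section" headers) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers like "# Replication"
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key] = value
    return fields


def discover_replicas(master_info: str) -> List[Dict[str, str]]:
    """Return one dict per "slaveN:ip=...,port=...,..." entry of a master's INFO."""
    fields = parse_info(master_info)
    replicas = []
    for i in range(int(fields.get("connected_slaves", "0"))):
        raw = fields.get(f"slave{i}")
        if raw is None:
            continue
        # "ip=...,port=...,state=...,offset=...,lag=..." -> {"ip": ..., "port": ..., ...}
        replicas.append(dict(item.split("=", 1) for item in raw.split(",")))
    return replicas


# Hypothetical master INFO text; state and lag are assumed values.
SAMPLE_MASTER_INFO = """# Replication
role:master
connected_slaves:2
slave0:ip=192.168.100.4,port=6379,state=online,offset=368018,lag=0
slave1:ip=192.168.100.3,port=6379,state=online,offset=368032,lag=0
"""

if __name__ == "__main__":
    for replica in discover_replicas(SAMPLE_MASTER_INFO):
        print(replica["ip"], replica["port"], replica["offset"])
```

Sentinel keeps refreshing this view by re-issuing INFO periodically, so replicas added later are picked up the same way.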
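The log of each Sentinel node shows how it discovered the rest of the deployment. Starting from the master address we configured, Sentinel found all of the replicas through the INFO command. It found the other Sentinel nodes by publishing to and subscribing to the Redis channel `__sentinel__:hello`.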
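Each hello message is a single comma-separated string. Per the Redis Sentinel documentation, it carries eight fields: the Sentinel's IP, port, run ID and current epoch, then the master's name, IP, port and configuration epoch. The following is a minimal sketch, not Sentinel's implementation, of how a Sentinel could turn such messages into a list of peers. The payloads are copied from the hello messages captured later in this article. The helper names are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Hello(NamedTuple):
    """Fields of a __sentinel__:hello payload, in the order they are published."""
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError(f"expected 8 comma-separated fields, got {len(parts)}")
    ip, port, runid, epoch, name, m_ip, m_port, m_epoch = parts
    return Hello(ip, int(port), runid, int(epoch), name, m_ip, int(m_port), int(m_epoch))


def discover_peers(payloads: Iterable[str], my_runid: str) -> Dict[str, Tuple[str, int]]:
    """Map each other Sentinel's run ID to its (ip, port), ignoring our own hellos."""
    peers: Dict[str, Tuple[str, int]] = {}
    for payload in payloads:
        hello = parse_hello(payload)
        if hello.sentinel_runid != my_runid:
            peers[hello.sentinel_runid] = (hello.sentinel_ip, hello.sentinel_port)
    return peers


PAYLOADS = [
    "192.168.100.4,26379,620c49f4f658095dae056e670822b634a257cc23,0,userservice,192.168.100.2,6379,0",
    "192.168.100.2,26379,85519e3d32442ac8c87329f381079007a08f9b9f,0,userservice,192.168.100.2,6379,0",
    "192.168.100.3,26379,d7ec4699a9744e8b806efba64a12adf74fdf367f,0,userservice,192.168.100.2,6379,0",
]

if __name__ == "__main__":
    # Seen from the Sentinel on 192.168.100.2, the other two Sentinels are its peers.
    print(discover_peers(PAYLOADS, "85519e3d32442ac8c87329f381079007a08f9b9f"))
```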
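After Sentinel starts, it automatically adds dynamic parameters to each Sentinel's configuration file. These include its own ID, the replicas it discovered and the other known Sentinels.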
Subscribing (PSUBSCRIBE) to the `__sentinel__:hello` channel on a Redis node shows the hello messages published by the three Sentinels:

```
1) "psubscribe"
2) "__sentinel__:hello"
1) "pmessage"
2) "__sentinel__:hello"
3) "__sentinel__:hello"
4) "192.168.100.4,26379,620c49f4f658095dae056e670822b634a257cc23,0,userservice,192.168.100.2,6379,0"
1) "pmessage"
2) "__sentinel__:hello"
3) "__sentinel__:hello"
4) "192.168.100.2,26379,85519e3d32442ac8c87329f381079007a08f9b9f,0,userservice,192.168.100.2,6379,0"
1) "pmessage"
2) "__sentinel__:hello"
3) "__sentinel__:hello"
4) "192.168.100.3,26379,d7ec4699a9744e8b806efba64a12adf74fdf367f,0,userservice,192.168.100.2,6379,0"
1) "pmessage"
2) "__sentinel__:hello"
3) "__sentinel__:hello"
4) "192.168.100.4,26379,620c49f4f658095dae056e670822b634a257cc23,0,userservice,192.168.100.2,6379,0"
1) "pmessage"
2) "__sentinel__:hello"
3) "__sentinel__:hello"
4) "192.168.100.2,26379,85519e3d32442ac8c87329f381079007a08f9b9f,0,userservice,192.168.100.2,6379,0"
1) "pmessage"
2) "__sentinel__:hello"
3) "__sentinel__:hello"
4) "192.168.100.3,26379,d7ec4699a9744e8b806efba64a12adf74fdf367f,0,userservice,192.168.100.2,6379,0"
```

SENTINEL MASTER userservice returns the state of the monitored master as seen by a Sentinel:

```
 1) "name"
 2) "userservice"
 3) "ip"
 4) "192.168.100.2"
 5) "port"
 6) "6379"
 7) "runid"
 8) "ce24ee8b27e42b9762ec595deb7542abf9ac4adc"
 9) "flags"
10) "master"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "688"
19) "last-ping-reply"
20) "688"
21) "down-after-milliseconds"
22) "5000"
23) "info-refresh"
24) "9582"
25) "role-reported"
26) "master"
27) "role-reported-time"
28) "892700"
29) "config-epoch"
30) "0"
31) "num-slaves"
32) "2"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
37) "failover-timeout"
38) "30000"
39) "parallel-syncs"
40) "1"
```

SENTINEL GET-MASTER-ADDR-BY-NAME userservice returns the master address that Sentinel currently reports:

```
1) "192.168.100.2"
2) "6379"
```

Now simulate a failure by stopping Redis on the master (192.168.100.2):

```
systemctl stop redis
```
Then check the log output on each of the three nodes.
ec2-redis-01 Sentinel log:
```
17397:X 04 Jul 2020 21:33:40.687 +odown master userservice 192.168.100.2 6379 +new-epoch 1
17397:X 04 Jul 2020 21:33:40.740 +vote-for-leader 85519e3d32442ac8c87329f381079007a08f9b9f 1
17397:X 04 Jul 2020 21:33:40.745 620c49f4f658095dae056e670822b634a257cc23 voted for 85519e3d32442ac8c87329f381079007a08f9b9f 1
17397:X 04 Jul 2020 21:33:40.813 +failover-state-select-slave master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:40.866 +promoted-slave slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:41.467 -odown master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:42.478 * +slave-reconf-inprog slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:42.478 * +slave-reconf-done slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:42.544 +switch-master userservice 192.168.100.2 6379 192.168.100.3 6379
17397:X 04 Jul 2020 21:33:42.545 * +slave slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.3 6379
17397:X 04 Jul 2020 21:33:42.545 * +slave slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
17397:X 04 Jul 2020 21:33:47.548 User requested shutdown...

1447:M 04 Jul 2020 21:33:35.576 * Calling fsync() on the AOF file.
1447:M 04 Jul 2020 21:33:35.576 * Saving the final RDB snapshot before exiting.
1447:M 04 Jul 2020 21:33:35.607 * DB saved on disk
1447:M 04 Jul 2020 21:33:35.607 * Removing the pid file.
1447:M 04 Jul 2020 21:33:35.607 +sdown master userservice 192.168.100.2 6379

16627:X 04 Jul 2020 21:33:45.469 quorum 2/2
16627:X 04 Jul 2020 21:33:45.469 +try-failover master userservice 192.168.100.2 6379
16627:X 04 Jul 2020 21:33:45.472 85519e3d32442ac8c87329f381079007a08f9b9f voted for 85519e3d32442ac8c87329f381079007a08f9b9f 1
16627:X 04 Jul 2020 21:33:45.485 +config-update-from sentinel 85519e3d32442ac8c87329f381079007a08f9b9f 192.168.100.2 26379 @ userservice 192.168.100.2 6379
16627:X 04 Jul 2020 21:33:46.264 +sdown slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
```
ec2-redis-02 Redis log:
```
1448:S 04 Jul 2020 21:33:40.336 Error condition on socket for SYNC: Connection refused
1448:S 04 Jul 2020 21:33:42.015 * Connecting to MASTER 192.168.100.2:6379
1448:S 04 Jul 2020 21:33:42.015 * MASTER <-> REPLICA sync started
1448:S 04 Jul 2020 21:33:42.015 Error condition on socket for SYNC: Connection refused
1448:S 04 Jul 2020 21:33:44.030 * Connecting to MASTER 192.168.100.2:6379
1448:S 04 Jul 2020 21:33:44.030 * MASTER <-> REPLICA sync started
1448:S 04 Jul 2020 21:33:44.030 Error condition on socket for SYNC: Connection refused
1448:M 04 Jul 2020 21:33:45.677 'id=43 addr=192.168.100.2:27104 fd=10 name=sentinel-85519e3d-cmd age=1869 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qb
1448:M 04 Jul 2020 21:33:45.678 +sdown master userservice 192.168.100.2 6379

16556:X 04 Jul 2020 21:33:45.473 +vote-for-leader 85519e3d32442ac8c87329f381079007a08f9b9f 1
16556:X 04 Jul 2020 21:33:46.263 +switch-master userservice 192.168.100.2 6379 192.168.100.3 6379
16556:X 04 Jul 2020 21:33:46.264 * +slave slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.3 6379
16556:X 04 Jul 2020 21:33:46.264 * +slave slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
16556:X 04 Jul 2020 21:33:51.296 Connection with master lost.

1444:S 04 Jul 2020 21:33:40.336 * Caching the disconnected master state.
1444:S 04 Jul 2020 21:33:40.900 * Connecting to MASTER 192.168.100.2:6379
1444:S 04 Jul 2020 21:33:40.900 * MASTER <-> REPLICA sync started
1444:S 04 Jul 2020 21:33:40.905 Error condition on socket for SYNC: Connection refused
1444:S 04 Jul 2020 21:33:42.919 * Connecting to MASTER 192.168.100.2:6379
1444:S 04 Jul 2020 21:33:42.919 * MASTER <-> REPLICA sync started
1444:S 04 Jul 2020 21:33:42.920 Error condition on socket for SYNC: Connection refused
1444:S 04 Jul 2020 21:33:44.937 * Connecting to MASTER 192.168.100.2:6379
1444:S 04 Jul 2020 21:33:44.938 * MASTER <-> REPLICA sync started
1444:S 04 Jul 2020 21:33:44.938 Error condition on socket for SYNC: Connection refused
1444:S 04 Jul 2020 21:33:46.263 * REPLICAOF 192.168.100.3:6379 enabled (user request from '…')
1444:S 04 Jul 2020 21:33:46.264 Master replication ID changed to 727e32660ce348fc07a7805b06954ef9752821b7
1444:S 04 Jul 2020 21:33:46.961 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
```
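First, when a Sentinel detects that the master is down, it marks the master as subjectively down (sdown). Once enough Sentinels agree to reach the quorum (2 in this setup), the master is marked objectively down (odown) and a failover is attempted (+try-failover). The logs of the three Sentinel nodes show this: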
```
17397:X 04 Jul 2020 21:33:40.687 +odown master userservice 192.168.100.2 6379
+odown master userservice 192.168.100.2 6379
+try-failover master userservice 192.168.100.2 6379
```
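The two states can be expressed as two small predicates. This is a simplified sketch. It assumes "down" simply means no valid reply within down-after-milliseconds (5000 in the SENTINEL MASTER output), and it reduces the peers' opinions, which real Sentinels gather with SENTINEL is-master-down-by-addr, to a plain count. Real Sentinel applies additional checks that are omitted here.

```python
def is_sdown(now_ms: int, last_valid_reply_ms: int, down_after_ms: int) -> bool:
    """Subjectively down: no valid reply for longer than down-after-milliseconds."""
    return now_ms - last_valid_reply_ms > down_after_ms


def is_odown(self_sdown: bool, peers_reporting_down: int, quorum: int) -> bool:
    """Objectively down: this Sentinel (if it sees sdown) plus agreeing peers reach the quorum."""
    if not self_sdown:
        return False
    return 1 + peers_reporting_down >= quorum


if __name__ == "__main__":
    DOWN_AFTER_MS = 5000  # down-after-milliseconds from the SENTINEL MASTER output
    QUORUM = 2            # quorum from the same output
    sdown = is_sdown(now_ms=10_001, last_valid_reply_ms=5_000, down_after_ms=DOWN_AFTER_MS)
    print(sdown, is_odown(sdown, peers_reporting_down=1, quorum=QUORUM))  # True True
```

With three Sentinels and a quorum of 2, one other Sentinel agreeing is enough to move from sdown to odown.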
Sentinel ec2-redis-02:

```
16627:X 04 Jul 2020 21:33:45.469 +vote-for-leader 85519e3d32442ac8c87329f381079007a08f9b9f 1
```

Sentinel ec2-redis-02's vote:

```
16627:X 04 Jul 2020 21:33:45.472 +vote-for-leader 85519e3d32442ac8c87329f381079007a08f9b9f 1
```
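As shown above, the Sentinel with run ID 85519e3d32442ac8c87329f381079007a08f9b9f received 2 votes. That is a majority of the three Sentinels and no fewer than the quorum of 2, so it became the leader Sentinel. The log of Sentinel ec2-redis-01 likewise shows that it won a majority of the votes and became the leader.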
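The winning condition can be sketched as a vote count. Per the Redis Sentinel documentation, a Sentinel needs votes from a majority of the Sentinels, and at least the quorum, before it is authorized to fail over. This is a simplified model of a single epoch's votes and not Sentinel's implementation. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def elect_leader(votes: Dict[str, str], num_sentinels: int, quorum: int) -> Optional[str]:
    """Return the run ID that won this epoch's vote, or None if nobody did.

    `votes` maps each voting Sentinel's run ID to the run ID it voted for.
    The winner needs a majority of all Sentinels and at least `quorum` votes.
    """
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    majority = num_sentinels // 2 + 1
    return candidate if count >= max(majority, quorum) else None


LEADER = "85519e3d32442ac8c87329f381079007a08f9b9f"
OTHER = "620c49f4f658095dae056e670822b634a257cc23"

if __name__ == "__main__":
    # The two votes in ec2-redis-01's log: the leader's own vote and 620c49f4...'s vote.
    print(elect_leader({LEADER: LEADER, OTHER: LEADER}, num_sentinels=3, quorum=2))
```

Because the winner must hold a majority, at most one Sentinel can win a given epoch. If no candidate reaches it, the Sentinels retry later with a new epoch.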
The leader selects the replica to promote:

```
17397:X 04 Jul 2020 21:33:40.813 +selected-slave slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
```
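The +selected-slave event records the leader's choice of replica. The Redis Sentinel documentation describes the ordering. Unhealthy or disconnected replicas are skipped, and replicas with priority 0 are never promoted. Among the rest, the lowest priority wins, then the largest replication offset, then the lexicographically smaller run ID. The sketch below follows that ordering and folds all health checks into a single eligible flag, which is a simplification. The example data is hypothetical: the offsets are borrowed from the earlier role output and the run IDs are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    addr: str
    run_id: str
    priority: int          # replica priority (slave_priority in INFO); 0 = never promote
    offset: int            # replication offset processed by the replica
    eligible: bool = True  # simplification standing in for Sentinel's health checks


def select_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Pick the replica to promote: lowest priority, then largest offset, then smallest run ID."""
    candidates = [r for r in replicas if r.eligible and r.priority != 0]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.priority, -r.offset, r.run_id))


if __name__ == "__main__":
    replicas = [
        Replica("192.168.100.4:6379", "run-id-b", priority=100, offset=368018),
        Replica("192.168.100.3:6379", "run-id-a", priority=100, offset=368032),
    ]
    print(select_replica(replicas).addr)  # 192.168.100.3:6379
```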
The leader Sentinel then carries out the failover:
```
17397:X 04 Jul 2020 21:33:40.866 * +failover-state-send-slaveof-noone slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:40.950 * +failover-state-wait-promotion slave 192.168.100.3:6379 192.168.100.3 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:41.467 +failover-state-reconf-slaves master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:41.536 * +slave-reconf-sent slave 192.168.100.4:6379 192.168.100.4 6379 @ userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:41.839 +failover-end master userservice 192.168.100.2 6379
17397:X 04 Jul 2020 21:33:42.544 +sdown slave 192.168.100.2:6379 192.168.100.2 6379 @ userservice 192.168.100.3 6379
```
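The failover states in this log map onto a simple plan. In send-slaveof-noone the chosen replica is promoted. In reconf-slaves the remaining replicas are pointed at it, at most parallel-syncs (1 here) at a time. The sketch below only produces that ordered plan and is a simplified model. Real Sentinel wraps each REPLICAOF in a MULTI/EXEC transaction together with other commands such as CONFIG REWRITE, and it waits for each replica's reconfiguration to progress rather than working in fixed batches. The function name and the (target, command) representation are hypothetical.

```python
from typing import List, Tuple

Command = Tuple[str, str]  # (target "ip:port", command text)


def failover_plan(promoted: str, others: List[str], parallel_syncs: int) -> List[List[Command]]:
    """Return the failover as ordered steps of commands.

    Step 1 promotes the selected replica; each later step repoints at most
    `parallel_syncs` of the remaining replicas at the new master.
    """
    if parallel_syncs < 1:
        raise ValueError("parallel_syncs must be >= 1")
    host, port = promoted.rsplit(":", 1)
    steps: List[List[Command]] = [[(promoted, "REPLICAOF NO ONE")]]
    for i in range(0, len(others), parallel_syncs):
        batch = others[i:i + parallel_syncs]
        steps.append([(replica, f"REPLICAOF {host} {port}") for replica in batch])
    return steps


if __name__ == "__main__":
    # This deployment: 192.168.100.3 promoted, 192.168.100.4 repointed, parallel-syncs 1.
    for step in failover_plan("192.168.100.3:6379", ["192.168.100.4:6379"], parallel_syncs=1):
        print(step)
```

A larger parallel-syncs finishes reconfiguration sooner. The trade-off is that more replicas are busy resynchronizing, and possibly unavailable for reads, at the same time.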
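The logs of Sentinels ec2-redis-02 and ec2-redis-03 show that the leader Sentinel makes the other Sentinels treat its information as authoritative and forces them to update their configuration. The leader announces the new master in its hello messages with a higher configuration epoch, and the other Sentinels adopt it and rewrite their configuration files, as shown below.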
```
16627:X 04 Jul 2020 21:33:46.263 CONFIG REWRITE executed with success.
```
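The rule that makes the leader's view win can be sketched in a few lines. For the same master name, a configuration with a greater config epoch replaces the local one, and anything else is ignored. This is a minimal sketch with hypothetical names. It assumes the post-failover announcement carries config epoch 1, matching the "+new-epoch 1" in the logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterConfig:
    name: str
    ip: str
    port: int
    config_epoch: int


def merge_announced(local: MasterConfig, announced: MasterConfig) -> MasterConfig:
    """Keep whichever configuration of the same master has the higher config epoch."""
    if announced.name == local.name and announced.config_epoch > local.config_epoch:
        return announced  # adopt the leader's view (Sentinel would then rewrite its config file)
    return local


if __name__ == "__main__":
    before = MasterConfig("userservice", "192.168.100.2", 6379, 0)
    after = MasterConfig("userservice", "192.168.100.3", 6379, 1)  # assumed announcement
    print(merge_announced(before, after).ip)  # 192.168.100.3
```

Since only a strictly greater epoch is accepted, a Sentinel that rejoins late with the old address cannot overwrite the newer configuration.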
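Note that the replicas of the old master do not need to copy the whole dataset again when they reconnect to the new master. They only need to fetch the data they are missing, which is a partial resynchronization. This works because, since Redis 4.0 (PSYNC2), a promoted replica keeps the old master's replication ID as its secondary ID (master_replid2), so it still accepts PSYNC requests made with that ID.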
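The check a master applies to a PSYNC request can be sketched as follows, modeled on the rules Redis uses since PSYNC2. First, the requested replication ID must match the master's current ID, or its previous ID as long as the requested offset does not exceed second_repl_offset. Second, the offset must still be covered by the replication backlog. The field names mirror INFO replication. The example assumes 727e3266... is the new master's ID and 4668490d... the old master's, as the logs suggest. Every offset is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MasterReplState:
    replid: str                     # master_replid
    replid2: str                    # master_replid2 (previous master's ID after a promotion)
    second_repl_offset: int         # last offset up to which replid2 is accepted
    backlog_first_byte_offset: int  # repl_backlog_first_byte_offset
    backlog_histlen: int            # repl_backlog_histlen


def accepts_partial_resync(m: MasterReplState, req_replid: str, req_offset: int) -> bool:
    """Return True if PSYNC <req_replid> <req_offset> can be served from the backlog."""
    same_history = req_replid == m.replid or (
        req_replid == m.replid2 and req_offset <= m.second_repl_offset
    )
    if not same_history:
        return False  # unknown history: a full resynchronization is needed
    backlog_end = m.backlog_first_byte_offset + m.backlog_histlen
    return m.backlog_first_byte_offset <= req_offset <= backlog_end


if __name__ == "__main__":
    new_master = MasterReplState(
        replid="727e32660ce348fc07a7805b06954ef9752821b7",
        replid2="4668490d0bd8e3d2967fe33eb49efde8af6a537b",
        second_repl_offset=1603594,  # hypothetical
        backlog_first_byte_offset=43,  # hypothetical
        backlog_histlen=1603551,  # hypothetical
    )
    # The replica's request as logged: old replication ID, offset 1603594.
    print(accepts_partial_resync(new_master, "4668490d0bd8e3d2967fe33eb49efde8af6a537b", 1603594))  # True
```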
The log of the replica 192.168.100.4 shows the partial resynchronization:

```
1444:S 04 Jul 2020 21:33:46.957 * Connecting to MASTER 192.168.100.3:6379
1444:S 04 Jul 2020 21:33:46.957 * MASTER <-> REPLICA sync started
1444:S 04 Jul 2020 21:33:46.957 * Non blocking connect for SYNC fired the event.
1444:S 04 Jul 2020 21:33:46.958 * Master replied to PING, replication can continue...
1444:S 04 Jul 2020 21:33:46.960 * Trying a partial resynchronization (request 4668490d0bd8e3d2967fe33eb49efde8af6a537b:1603594).
1444:S 04 Jul 2020 21:33:46.961 * Successful partial resynchronization with master.
1444:S 04 Jul 2020 21:33:46.961 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
```

Next, restart Redis on the original master (192.168.100.2):

```
systemctl start redis
```
Sentinel first demotes this node to a replica of the new master:
```
17397:X 04 Jul 2020 22:53:18.228 '839c9173e6e76ea20ebf46f36c963bc9f81ac84c' '727e32660ce348fc07a7805b06954ef9752821b7' '4668490d0bd8e3d2967fe33eb49efde8af6a537b' Next failover delay: I will not start a failover before Sun Jul 5 11:35:13 2020
```
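A Sentinel logs "Next failover delay" when it declines to start another failover of a master it tried to fail over recently. Per the Redis Sentinel documentation, it waits twice failover-timeout (30000 ms in this deployment) after the previous attempt. The sketch takes the previous attempt's start time as an input. The value used here is hypothetical, so the sketch does not reproduce the exact timestamp in the log.

```python
from datetime import datetime, timedelta


def next_failover_allowed(last_attempt: datetime, failover_timeout_ms: int) -> datetime:
    """Earliest time this Sentinel may retry a failover of the same master."""
    return last_attempt + timedelta(milliseconds=2 * failover_timeout_ms)


def may_start_failover(now: datetime, last_attempt: datetime, failover_timeout_ms: int) -> bool:
    return now >= next_failover_allowed(last_attempt, failover_timeout_ms)


if __name__ == "__main__":
    last = datetime(2020, 7, 4, 21, 33, 40)  # hypothetical time of the previous attempt
    print(next_failover_allowed(last, 30000))  # 2020-07-04 21:34:40
```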
That concludes this brief introduction to deploying Sentinel in practice and to how a failover proceeds.