MySQL8.0新增了窗口函數,大大的方便了做數據排名統計的人,很多朋友還不清楚怎麼用MySQL8.0做數據統計排名,下面就來為大家分享一篇心得文章。
MySQL8.0之前,做數據排名統計等相當痛苦,因為沒有像Oracle、SQL SERVER 、PostgreSQL等其他資料庫那樣的窗口函數。但隨著MySQL8.0中新增了窗口函數之後,針對這類統計就再也不是事了,本文就以常用的排序實例介紹MySQL的窗口函數。
1、準備工作
創建表及測試數據
mysql> use testdb;Database changed/*創建表 */mysql> create table tb_score(id int primary key auto_increment,stu_no varchar(10),course varchar(50),score decimal(4,1),key idx_stuNo_course(stu_no,course));Query OK, 0 rows affected (0.03 sec)mysql> show tables;+------------------+| Tables_in_testdb |+------------------+| tb_score |+------------------+/* 新增一批測試數據 */mysql> insert into tb_score(stu_no,course,score)values('2020001','mysql',90),('2020001','C++',85),('2020003','English',100),('2020002','mysql',50),('2020002','C++',70),('2020002','English',99);Query OK, 6 rows affected (0.00 sec)Records: 6 Duplicates: 0 Warnings: 0mysql> insert into tb_score(stu_no,course,score)values('2020003','mysql',78),('2020003','C++',81),('2020003','English',80),('2020004','mysql',80),('2020004','C++',60),('2020004','English',100);Query OK, 6 rows affected (0.01 sec)Records: 6 Duplicates: 0 Warnings: 0mysql> insert into tb_score(stu_no,course,score)values('2020005','mysql',98),('2020005','C++',96),('2020005','English',70),('2020006','mysql',60),('2020006','C++',90),('2020006','English',70);Query OK, 6 rows affected (0.01 sec)Records: 6 Duplicates: 0 Warnings: 0mysql> insert into tb_score(stu_no,course,score)values('2020007','mysql',50),('2020007','C++',66),('2020007','English',76),('2020008','mysql',90),('2020008','C++',69),('2020008','English',86);Query OK, 6 rows affected (0.01 sec)Records: 6 Duplicates: 0 Warnings: 0mysql> insert into tb_score(stu_no,course,score)values('2020009','mysql',70),('2020009','C++',66),('2020009','English',86),('2020010','mysql',75),('2020010','C++',76),('2020010','English',81);Query OK, 6 rows affected (0.01 sec)Records: 6 Duplicates: 0 Warnings: 0mysql> insert into tb_score(stu_no,course,score)values('2020011','mysql',90),('2020012','C++',85),('2020011','English',84),('2020012','English',75),('2020013','C++',96),('2020013','English',88);Query OK, 6 rows affected (0.01 sec)Records: 6 Duplicates: 0 Warnings: 0
2、統計每門課程分數的排名
根據每門課程的分數從高到低進行排名,此時,會出現分數相同時怎麼處理的問題,下面就根據不同的窗口函數來處理不同場景的需求
ROW_NUMBER
由結果可以看出,分數相同時按照學號順序進行排名
mysql> select stu_no,course,score, row_number()over(partition by course order by score desc ) rn-> from tb_score;+---------+---------+-------+----+| stu_no | course | score | rn |+---------+---------+-------+----+| 2020005 | C++ | 96.0 | 1 || 2020013 | C++ | 96.0 | 2 || 2020006 | C++ | 90.0 | 3 || 2020001 | C++ | 85.0 | 4 || 2020012 | C++ | 85.0 | 5 || 2020003 | C++ | 81.0 | 6 || 2020010 | C++ | 76.0 | 7 || 2020002 | C++ | 70.0 | 8 || 2020008 | C++ | 69.0 | 9 || 2020007 | C++ | 66.0 | 10 || 2020009 | C++ | 66.0 | 11 || 2020004 | C++ | 60.0 | 12 || 2020003 | English | 100.0 | 1 || 2020004 | English | 100.0 | 2 || 2020002 | English | 99.0 | 3 || 2020013 | English | 88.0 | 4 || 2020008 | English | 86.0 | 5 || 2020009 | English | 86.0 | 6 || 2020011 | English | 84.0 | 7 || 2020010 | English | 81.0 | 8 || 2020003 | English | 80.0 | 9 || 2020007 | English | 76.0 | 10 || 2020012 | English | 75.0 | 11 || 2020005 | English | 70.0 | 12 || 2020006 | English | 70.0 | 13 || 2020005 | mysql | 98.0 | 1 || 2020001 | mysql | 90.0 | 2 || 2020008 | mysql | 90.0 | 3 || 2020011 | mysql | 90.0 | 4 || 2020004 | mysql | 80.0 | 5 || 2020003 | mysql | 78.0 | 6 || 2020010 | mysql | 75.0 | 7 || 2020009 | mysql | 70.0 | 8 || 2020006 | mysql | 60.0 | 9 || 2020002 | mysql | 50.0 | 10 || 2020007 | mysql | 50.0 | 11 |+---------+---------+-------+----+36 rows in set (0.00 sec)
DENSE_RANK
為了讓分數相同時排名也相同,則可以使用DENSE_RANK函數,結果如下:
mysql> select stu_no,course,score, DENSE_RANK()over(partition by course order by score desc ) rn -> fromtb_score;+---------+---------+-------+----+| stu_no | course | score | rn |+---------+---------+-------+----+| 2020005 | C++ | 96.0 | 1 || 2020013 | C++ | 96.0 | 1 || 2020006 | C++ | 90.0 | 2 || 2020001 | C++ | 85.0 | 3 || 2020012 | C++ | 85.0 | 3 || 2020003 | C++ | 81.0 | 4 || 2020010 | C++ | 76.0 | 5 || 2020002 | C++ | 70.0 | 6 || 2020008 | C++ | 69.0 | 7 || 2020007 | C++ | 66.0 | 8 || 2020009 | C++ | 66.0 | 8 || 2020004 | C++ | 60.0 | 9 || 2020003 | English | 100.0 | 1 || 2020004 | English | 100.0 | 1 || 2020002 | English | 99.0 | 2 || 2020013 | English | 88.0 | 3 || 2020008 | English | 86.0 | 4 || 2020009 | English | 86.0 | 4 || 2020011 | English | 84.0 | 5 || 2020010 | English | 81.0 | 6 || 2020003 | English | 80.0 | 7 || 2020007 | English | 76.0 | 8 || 2020012 | English | 75.0 | 9 || 2020005 | English | 70.0 | 10 || 2020006 | English | 70.0 | 10 || 2020005 | mysql | 98.0 | 1 || 2020001 | mysql | 90.0 | 2 || 2020008 | mysql | 90.0 | 2 || 2020011 | mysql | 90.0 | 2 || 2020004 | mysql | 80.0 | 3 || 2020003 | mysql | 78.0 | 4 || 2020010 | mysql | 75.0 | 5 || 2020009 | mysql | 70.0 | 6 || 2020006 | mysql | 60.0 | 7 || 2020002 | mysql | 50.0 | 8 || 2020007 | mysql | 50.0 | 8 |+---------+---------+-------+----+36 rows in set (0.00 sec)
RANK
DENSE_RANK的結果是分數相同時排名相同了,但是下一個名次是緊接著上一個名次的,如果 2 個並列的第 1 之後,下一個我想是第 3 名,則可以使用RANK函數實現
mysql> select stu_no,course,score, rank()over(partition by course order by score desc ) rn -> fromtb_score;+---------+---------+-------+----+| stu_no | course | score | rn |+---------+---------+-------+----+| 2020005 | C++ | 96.0 | 1 || 2020013 | C++ | 96.0 | 1 || 2020006 | C++ | 90.0 | 3 || 2020001 | C++ | 85.0 | 4 || 2020012 | C++ | 85.0 | 4 || 2020003 | C++ | 81.0 | 6 || 2020010 | C++ | 76.0 | 7 || 2020002 | C++ | 70.0 | 8 || 2020008 | C++ | 69.0 | 9 || 2020007 | C++ | 66.0 | 10 || 2020009 | C++ | 66.0 | 10 || 2020004 | C++ | 60.0 | 12 || 2020003 | English | 100.0 | 1 || 2020004 | English | 100.0 | 1 || 2020002 | English | 99.0 | 3 || 2020013 | English | 88.0 | 4 || 2020008 | English | 86.0 | 5 || 2020009 | English | 86.0 | 5 || 2020011 | English | 84.0 | 7 || 2020010 | English | 81.0 | 8 || 2020003 | English | 80.0 | 9 || 2020007 | English | 76.0 | 10 || 2020012 | English | 75.0 | 11 || 2020005 | English | 70.0 | 12 || 2020006 | English | 70.0 | 12 || 2020005 | mysql | 98.0 | 1 || 2020001 | mysql | 90.0 | 2 || 2020008 | mysql | 90.0 | 2 || 2020011 | mysql | 90.0 | 2 || 2020004 | mysql | 80.0 | 5 || 2020003 | mysql | 78.0 | 6 || 2020010 | mysql | 75.0 | 7 || 2020009 | mysql | 70.0 | 8 || 2020006 | mysql | 60.0 | 9 || 2020002 | mysql | 50.0 | 10 || 2020007 | mysql | 50.0 | 10 |+---------+---------+-------+----+36 rows in set (0.01 sec)
這樣就實現了各種排序需求。
NTILE
NTILE函數的作用是對每個分組排名後,再將對應分組分成N個小組,例如
mysql> select stu_no,course,score, rank()over(partition by course order by score desc )rn,NTILE(2)over(partition by course order by score desc ) rn_group fromtb_score;+---------+---------+-------+----+----------+| stu_no | course | score | rn | rn_group |+---------+---------+-------+----+----------+| 2020005 | C++ | 96.0 | 1 | 1 || 2020013 | C++ | 96.0 | 1 | 1 || 2020006 | C++ | 90.0 | 3 | 1 || 2020001 | C++ | 85.0 | 4 | 1 || 2020012 | C++ | 85.0 | 4 | 1 || 2020003 | C++ | 81.0 | 6 | 1 || 2020010 | C++ | 76.0 | 7 | 2 || 2020002 | C++ | 70.0 | 8 | 2 || 2020008 | C++ | 69.0 | 9 | 2 || 2020007 | C++ | 66.0 | 10 | 2 || 2020009 | C++ | 66.0 | 10 | 2 || 2020004 | C++ | 60.0 | 12 | 2 || 2020003 | English | 100.0 | 1 | 1 || 2020004 | English | 100.0 | 1 | 1 || 2020002 | English | 99.0 | 3 | 1 || 2020013 | English | 88.0 | 4 | 1 || 2020008 | English | 86.0 | 5 | 1 || 2020009 | English | 86.0 | 5 | 1 || 2020011 | English | 84.0 | 7 | 1 || 2020010 | English | 81.0 | 8 | 2 || 2020003 | English | 80.0 | 9 | 2 || 2020007 | English | 76.0 | 10 | 2 || 2020012 | English | 75.0 | 11 | 2 || 2020005 | English | 70.0 | 12 | 2 || 2020006 | English | 70.0 | 12 | 2 || 2020005 | mysql | 98.0 | 1 | 1 || 2020001 | mysql | 90.0 | 2 | 1 || 2020008 | mysql | 90.0 | 2 | 1 || 2020011 | mysql | 90.0 | 2 | 1 || 2020004 | mysql | 80.0 | 5 | 1 || 2020003 | mysql | 78.0 | 6 | 1 || 2020010 | mysql | 75.0 | 7 | 2 || 2020009 | mysql | 70.0 | 8 | 2 || 2020006 | mysql | 60.0 | 9 | 2 || 2020002 | mysql | 50.0 | 10 | 2 || 2020007 | mysql | 50.0 | 10 | 2 |+---------+---------+-------+----+----------+36 rows in set (0.01 sec)
3、窗口函數小結
類別 | 函數 | 說明 |
排序 | ROW_NUMBER | 為表中的每一行分配一個序號,可以指定分組(也可以不指定)及排序欄位 |
DENSE_RANK | 根據排序欄位為每個分組中的每一行分配一個序號。 排名值相同時,序號相同,序號中沒有間隙(1,1,2, 3 這種) | |
RANK | 根據排序欄位為每個分組中的每一行分配一個序號。 排名值相同時,序號相同,但序號中存在間隙(1,1,3, 4 這種) | |
NTILE | 根據排序欄位為每個分組中根據指定欄位的排序再分成對應的組 | |
分布 | PERCENT_RANK | 計算各分組或結果集中行的百分數等級 |
CUME_DIST | 計算某個值在一組有序的數據中累計的分布 | |
前後 | LEAD | 返回分組中當前行之後的第N行的值。如果不存在對應行,則返回NULL。比如N= 1 時,第一名對應的值是第二名的,最後一名結果是NULL |
LAG | 返回分組中當前行之前的第N行的值。如果不存在對應行,則返回NULL。比如N= 1 時,第一名對應的值是是NUL,最後一名結果是倒數第 2 的值 | |
首尾中 | FIRST_VALUE | 返回每個分組中第一名對應的欄位(或表達式)的值,例如本文中可以是第一名的分數、學號等任意欄位的值 |
LAST_VALUE | 返回每個分組中最後一名對應的欄位(或表達式)的值,例如本文中可以是最後一名的分數、學號等任意欄位的值 | |
NTH_VALUE | 返回每個分組中排名第N的對應欄位(或表達式)的值,但小於N的行對應的值是NULL |
MySQL中主要的窗口函數先總結這麼多,建議還是得動手實踐一番。另外,MySQL5. 7 及之前版本的排序方式的實現很多人已總結,也建議實操一番。
本文轉載自微信公眾號【資料庫乾貨鋪】