Hive Commands

2020-08-28 remember2668

1 Creating, Dropping, Altering, and Querying Databases

1.1 Creating a Database

(1) Create a database. By default, a database is stored on HDFS under /user/hive/warehouse/*.db.

hive (default)> create database db_hive;

(2) To avoid an error when the database already exists, add an if not exists check.

hive (default)> create database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database db_hive already exists
hive (default)> create database if not exists db_hive;

Check the storage path in the HDFS web UI (hadoop111:50070); the cluster here is a fully distributed Hadoop cluster.

(3) Create a database and specify where it is stored on HDFS.

hive (default)> create database db_hive2 location '<hdfs_path>';

Check whether the db_hive2.db database was created successfully.
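One way to confirm the result without leaving the Hive CLI is to describe the database, which prints its HDFS location. A minimal sketch, assuming db_hive2 was created as in step (3):

```sql
-- Print the database name, comment, HDFS location, and owner.
desc database db_hive2;

-- List all databases to confirm that db_hive2 is registered in the metastore.
show databases;
```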

1.2 Querying Databases

1.2.1 Listing Databases

1. List the databases

2. Filter the listed databases
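The two listing commands can be sketched as follows. The pattern db_hive* is only an illustration; in show databases like, * matches any run of characters and | separates alternatives.

```sql
-- 1. List every database known to the metastore.
show databases;

-- 2. List only the databases whose names match a pattern.
show databases like 'db_hive*';
```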

1.2.2 Viewing Database Details

1. Show database information

2. Show detailed database information with extended
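A minimal sketch of both forms, assuming the db_hive database created in section 1.1:

```sql
-- 1. Basic information: name, comment, HDFS location, owner.
desc database db_hive;

-- 2. The same information plus any DBPROPERTIES key-value pairs.
desc database extended db_hive;
```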

1.2.3 Switching the Current Database
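Switching is done with use. A short sketch, again assuming the db_hive database from section 1.1:

```sql
-- Make db_hive the current database; unqualified table names now resolve against it.
use db_hive;

-- Confirm which database is current (available in Hive 0.13 and later).
select current_database();
```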

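1.3 Altering a Database

You can use the ALTER DATABASE command to set key-value pairs in a database's DBPROPERTIES to describe the database. The database name cannot be changed. Older Hive releases also do not allow changing the directory where the database is stored; later releases add ALTER DATABASE ... SET LOCATION, which changes only where new tables are created and does not move existing data.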
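As a sketch of the command described above: the property key createtime and its value are hypothetical, chosen only for illustration.

```sql
-- Attach a key-value pair to the database (key and value are hypothetical).
alter database db_hive set dbproperties('createtime'='20200828');

-- The extended description lists the property among the database parameters.
desc database extended db_hive;
```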

1.4 Dropping a Database

1. Drop an empty database

hive (db_hive)> drop database db_hive2;

2. If the database to be dropped might not exist, it is best to use if exists to check first

hive> drop database db_hive;
FAILED: SemanticException [Error 10072]: Database does not exist: db_hive
hive> drop database if exists db_hive2;

3. If the database is not empty, use cascade to force the drop

hive> drop database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database db_hive is not empty. One or more tables exist.)
hive> drop database db_hive cascade;

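2 Creating Internal Tables

2.1 Definition of Internal Tables

Tables are created as managed tables by default; these are sometimes also called internal tables, because Hive controls (more or less) the lifecycle of their data. By default, Hive stores the data for these tables in a subdirectory of the directory defined by the configuration property hive.metastore.warehouse.dir (for example, /user/hive/warehouse). When a managed table is dropped, Hive also deletes the table's data. Managed tables are not well suited to sharing data with other tools.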

2.2 Creating the Data File

1. Create a student.txt file in /home/data

[root@hadoop111 ~]# cd /home/data

Enter the data with vim, or upload it to the virtual machine as a .txt file.

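2.3 Creating an Internal Table

1. Create the student table in the banzhang database

2. View the created table

3. Load data into the internal table student

4. Query the result

5. Query the table type (formatted output)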
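The five steps above can be sketched as follows. The database banzhang, the table student, and the file /home/data/student.txt come from the text; the column layout (id, name) and the tab delimiter are assumptions, so adjust them to match the actual file.

```sql
-- 1. Create the table in the banzhang database (columns and delimiter are assumed).
create database if not exists banzhang;
use banzhang;
create table if not exists student(id int, name string)
row format delimited fields terminated by '\t';

-- 2. View the created table.
show tables;

-- 3. Load the local file into the managed table.
load data local inpath '/home/data/student.txt' into table student;

-- 4. Query the result.
select * from student;

-- 5. Inspect the table type; a managed table is reported as MANAGED_TABLE.
desc formatted student;
```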

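3 External Tables

3.1 Definition of External Tables

Because the table is external, Hive does not consider itself the full owner of the data. Dropping the table does not delete the data, although the metadata describing the table is deleted.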

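3.2 Creating External Tables

1. Create the department table

2. Create the employee table

3. View the created tables

4. Load data into the external tables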

hive (default)> load data local inpath '<local_path>' into table default.dept;
hive (default)> load data local inpath '<local_path>' into table default.emp;

5. Query the result
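The table definitions for steps 1 to 3 are not shown in the text, so the following is only a sketch. The table names dept and emp come from the load commands; the dept columns mirror the dept_partition table used in section 4.2, while the emp columns and the tab delimiter are hypothetical.

```sql
-- 1. Department table; the external keyword keeps the data when the table is dropped.
create external table if not exists default.dept(
  deptno int, dname string, loc string)
row format delimited fields terminated by '\t';

-- 2. Employee table (column layout is hypothetical).
create external table if not exists default.emp(
  empno int, ename string, job string, mgr int,
  hiredate string, sal double, comm double, deptno int)
row format delimited fields terminated by '\t';

-- 3. View the created tables.
show tables;

-- 5. After loading, query both tables.
select * from default.dept;
select * from default.emp;
```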

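3.3 Converting Between Internal and External Tables

1. Query the table type

2. Convert the internal table student into an external table

3. View the table type

4. Convert the external table student back into an internal table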

hive (banzhang)> alter table student set tblproperties('EXTERNAL'='FALSE');

5. Query the table type
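The conversion to an external table in steps 1 to 3 can be sketched as below, assuming the student table from section 2.3. The property key EXTERNAL is written in upper case here; writing the value in upper case as well is the safe choice.

```sql
-- 1. Check the current type; a managed table is reported as MANAGED_TABLE.
desc formatted student;

-- 2. Mark the table as external.
alter table student set tblproperties('EXTERNAL'='TRUE');

-- 3. Check again; the type should now be reported as EXTERNAL_TABLE.
desc formatted student;
```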

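4 Partitioned Tables

4.1 Definition of Partitioned Tables

Each partition of a partitioned table corresponds to a separate folder on the HDFS file system, and that folder holds all of the partition's data files. Partitioning in Hive simply means splitting data into directories: a large data set is divided into smaller ones according to business needs. A query can then select only the partitions it needs through expressions in the WHERE clause, which makes such queries much more efficient.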

4.2 Creating a Partitioned Table

1. Create the table

hive (default)> create table dept_partition(deptno int, dname string, loc string)
partitioned by (month string)
row format delimited fields terminated by '<delimiter>';

2. Load data into the partitioned table

hive (default)> load data local inpath '<local_path>' into table default.dept_partition partition(month='<month_1>');
hive (default)> load data local inpath '<local_path>' into table default.dept_partition partition(month='<month_2>');
hive (default)> load data local inpath '<local_path>' into table default.dept_partition partition(month='<month_3>');

3. Query the partitioned table

(1) Single-partition query

hive (default)> select * from dept_partition where month='<month_1>';

(2) Multi-partition query

hive (default)> select * from dept_partition where month='<month_1>'
union
select * from dept_partition where month='<month_2>'
union
select * from dept_partition where month='<month_3>';

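4.3 Adding Partitions

1. Add a single partition

hive (default)> alter table dept_partition add partition(month='<month_1>');

2. Add several partitions at once

hive (default)> alter table dept_partition add partition(month='<month_1>') partition(month='<month_2>');

4.4 Dropping Partitions

1. Drop a single partition

hive (default)> alter table dept_partition drop partition(month='<month_1>');

2. Drop several partitions at once

hive (default)> alter table dept_partition drop partition(month='<month_1>'), partition(month='<month_2>');

Note: when dropping several partitions, the partition clauses are separated by commas; when adding several partitions, they are separated only by spaces.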

4.5 Listing the Partitions of a Partitioned Table

Syntax: show partitions table_name;
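Applied to the dept_partition table from section 4.2, the syntax looks like this:

```sql
-- Print one line per partition, in the form month=<value>.
show partitions dept_partition;
```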

4.6 Viewing the Structure of a Partitioned Table

hive> desc formatted dept_partition;

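5 Bucketed Tables

5.1 Definition of Bucketed Tables

Partitioning applies to the data's storage path; bucketing applies to the data files. Partitioning offers a convenient way to isolate data and optimize queries. However, not every data set can be partitioned sensibly, particularly given the difficulty of choosing a suitable partition size. Bucketing is another technique for breaking a data set into more manageable parts.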

5.2 Loading a Data File Directly

1. Create the bucketed table

hive (default)> create table stu_buck(id int, name string)
clustered by(id) into 4 buckets
row format delimited fields terminated by '<delimiter>';

2. View the table structure

hive (default)> desc formatted stu_buck;
Num Buckets: 4

3. Load data into the bucketed table (the data is not split into 4 buckets)

hive (default)> load data local inpath '/opt/module/data/stu.txt' into table stu_buck;

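5.3 Loading Data Through a Query

1. First create an ordinary stu table

hive (default)> create table stu(id int, name string)
row format delimited fields terminated by '<delimiter>';

2. Load data into the ordinary stu table

hive (default)> load data local inpath '<local_path>' into table stu;

3. Clear the data in the stu_buck table

hive (default)> truncate table stu_buck;
hive (default)> select * from stu_buck;

4. Load the data from stu into the bucketed table through a subquery

hive (default)> insert into table stu_buck select id, name from stu;

5. The table still turns out to have only one bucket

6. A property needs to be set

7. Query the bucketed data (because /opt/module/data/stu.txt was already loaded into stu_buck directly and the same data was then loaded again through the subquery, the result contains duplicate rows.)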
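Step 6 does not say which property is meant. In Hive releases before 2.0 the usual candidate is hive.enforce.bucketing, so the sketch below assumes that setting; Hive 2.0 and later removed the property and always enforce bucketing on insert. Setting mapreduce.job.reduces to -1 lets Hive pick the reducer count itself.

```sql
-- Assumed property: make inserts honor the table's bucket definition (Hive < 2.0 only).
set hive.enforce.bucketing=true;

-- Let Hive choose the number of reducers instead of forcing a fixed count.
set mapreduce.job.reduces=-1;

-- Reload the bucketed table from the ordinary table so rows are hashed on id into 4 buckets.
truncate table stu_buck;
insert into table stu_buck select id, name from stu;

-- 7. Query the bucketed data.
select * from stu_buck;
```

Truncating before the insert avoids the duplicate rows mentioned in step 7.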

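5.4 Bucket Sampling Queries

Note: tablesample is the sampling clause. Syntax: TABLESAMPLE(BUCKET x OUT OF y).

y must be a multiple or a factor of the table's total number of buckets. Hive determines the sampling ratio from y.

For example, if the table has 4 buckets and y=2, data from 4/2 = 2 buckets is sampled. x indicates which bucket to start from; if more than one bucket is sampled, each subsequent bucket number is the previous one plus y. For example, with 4 buckets in total, tablesample(bucket 1 out of 2) samples 4/2 = 2 buckets of data: bucket 1 (x) and bucket 3 (x+y). Note: x must be less than or equal to y; otherwise the query fails with:

FAILED: SemanticException [Error 10061]: Numerator should not be bigger than denominator in sample clause for table stu_buck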
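A short sketch of sampling queries against the stu_buck table from section 5.2, which is clustered by id into 4 buckets. The on id clause names the bucketing column.

```sql
-- y = 4 equals the bucket count, so exactly one bucket (the first) is read.
select * from stu_buck tablesample(bucket 1 out of 4 on id);

-- y = 2 is a factor of 4, so 4/2 = 2 buckets are read: bucket 1 and bucket 3.
select * from stu_buck tablesample(bucket 1 out of 2 on id);
```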
