Python's collections module turns out to be this handy!

2021-03-02, Python開發與大數據人工智慧 (Python Development, Big Data and AI)

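The collections module implements specialized container types that serve as alternatives to Python's general-purpose built-in containers dict, list, set, and tuple. This article walks through the main ones with examples.

Counter: a dict subclass that counts hashable objects.
defaultdict: a dict subclass that calls a factory function to supply a default value for missing keys.
OrderedDict: a dict subclass that remembers the order in which entries were added.
namedtuple: a factory function that creates tuple subclasses with named fields.
deque: a list-like container with fast appends and pops at both ends.
ChainMap: a dict-like class that groups several mappings into a single view.

Counter is a dict subclass used mainly to count how often each object occurs.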

>>> import collections
>>> # Count how often each character occurs
... collections.Counter('hello world')
Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})
>>> # Count words
... collections.Counter('hello world hello lucy'.split())
Counter({'hello': 2, 'world': 1, 'lucy': 1})

>>> c = collections.Counter('hello world hello lucy'.split())
>>> c
Counter({'hello': 2, 'world': 1, 'lucy': 1})
>>> # Look up the count for one object; c.get('hello') also works
... c['hello']
2
>>> # List the elements, each repeated as many times as its count
... list(c.elements())
['hello', 'hello', 'world', 'lucy']
>>> c1 = collections.Counter('hello world'.split())
>>> c2 = collections.Counter('hello lucy'.split())
>>> c1
Counter({'hello': 1, 'world': 1})
>>> c2
Counter({'hello': 1, 'lucy': 1})
>>> # Add counts: + returns a new Counter, c1.update(c2) changes c1 in place
... c1+c2
Counter({'hello': 2, 'world': 1, 'lucy': 1})
>>> # Subtract counts: - returns a new Counter with positive counts only,
... # while c1.subtract(c2) changes c1 in place and keeps zero and negative counts
... c1-c2
Counter({'world': 1})
>>> # Clear
... c.clear()
>>> c
Counter()
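The `-` operator and `subtract()` are close but not interchangeable, and the same goes for `+` and `update()`. The sketch below reuses the article's `c1` and `c2` to show the difference, and adds `most_common()`, a standard Counter method the examples above do not cover. The variable names `diff`, `inplace`, and `top` are only illustrative.

```python
from collections import Counter

c1 = Counter('hello world'.split())
c2 = Counter('hello lucy'.split())

# The - operator returns a new Counter and keeps only positive counts.
diff = c1 - c2

# subtract() modifies the Counter in place and keeps zero and negative counts.
inplace = Counter(c1)   # work on a copy so c1 stays unchanged
inplace.subtract(c2)

print(diff)             # Counter({'world': 1})
print(dict(inplace))    # {'hello': 0, 'world': 1, 'lucy': -1}

# most_common(n) lists the n most frequent items, highest count first.
words = Counter('hello world hello lucy'.split())
top = words.most_common(1)
print(top)              # [('hello', 2)]
```

If you only need the non-negative difference, the operator form is usually the simpler choice; `subtract()` is useful when negative balances carry meaning.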
class collections.defaultdict([default_factory[, ...]])

Returns a new dict-like object. defaultdict is a subclass of the built-in dict class.

>>> d = collections.defaultdict()
>>> d
defaultdict(None, {})
>>> e = collections.defaultdict(str)
>>> e
defaultdict(<class 'str'>, {})

A typical use of defaultdict is to pass one of the built-in types (str, int, list, dict, and so on) as the default factory. Called with no arguments, each of these types returns an empty value.

>>> e = collections.defaultdict(str)
>>> e
defaultdict(<class 'str'>, {})
>>> e['hello']
''
>>> e
defaultdict(<class 'str'>, {'hello': ''})
>>> # A plain dict raises an error when a missing key is accessed
... e1 = {}
>>> e1['hello']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'hello'
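One detail worth seeing in code: the factory runs only when a missing key is accessed with square brackets. `get()` and `in` leave the dictionary untouched. The sketch below also shows that any zero-argument callable can act as the factory. The `status` mapping and the key `'server-1'` are made up for illustration.

```python
from collections import defaultdict

e = defaultdict(str)

# get() and membership tests do not call the factory or insert anything.
missing = e.get('hello')
present_before = 'hello' in e

# Indexing a missing key calls the factory and stores the result.
value = e['hello']
present_after = 'hello' in e

print(missing, present_before)   # None False
print(repr(value), present_after)  # '' True

# Any zero-argument callable works as a factory (hypothetical example).
status = defaultdict(lambda: 'unknown')
lookup = status['server-1']
print(lookup)                    # unknown
```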
Using int as the default_factory

>>> fruit = collections.defaultdict(int)
>>> fruit['apple'] = 2
>>> fruit
defaultdict(<class 'int'>, {'apple': 2})
>>> fruit['banana']  # a missing key returns 0
0
>>> fruit
defaultdict(<class 'int'>, {'apple': 2, 'banana': 0})
Using list as the default_factory

>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = collections.defaultdict(list)
>>> for k,v in s:
...     d[k].append(v)
...
>>> d
defaultdict(<class 'list'>, {'yellow': [1, 3], 'blue': [2, 4], 'red': [1]})
>>> d.items()
dict_items([('yellow', [1, 3]), ('blue', [2, 4]), ('red', [1])])
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

Using dict as the default_factory

>>> nums = collections.defaultdict(dict)
>>> nums[1] = {'one':1}
>>> nums
defaultdict(<class 'dict'>, {1: {'one': 1}})
>>> nums[2]
{}
>>> nums
defaultdict(<class 'dict'>, {1: {'one': 1}, 2: {}})

Using set as the default_factory (the order of elements inside each set may differ on your machine)

>>> types = collections.defaultdict(set)
>>> types['phone'].add('Huawei')
>>> types['phone'].add('Xiaomi')
>>> types['monitor'].add('AOC')
>>> types
defaultdict(<class 'set'>, {'phone': {'Huawei', 'Xiaomi'}, 'monitor': {'AOC'}})
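Before Python 3.7, the key order of a built-in dict was not guaranteed to follow insertion order. Since Python 3.7, a plain dict preserves insertion order as part of the language. The collections.OrderedDict class has always preserved insertion order, and it remains useful for its reordering methods and its order-sensitive equality.

>>> o = collections.OrderedDict()
>>> o['k1'] = 'v1'
>>> o['k3'] = 'v3'
>>> o['k2'] = 'v2'
>>> o
OrderedDict([('k1', 'v1'), ('k3', 'v3'), ('k2', 'v2')])

Assigning a new value to an existing key keeps the key in its original position and overwrites the value.

>>> o['k1'] = 666
>>> o
OrderedDict([('k1', 666), ('k3', 'v3'), ('k2', 'v2')])
>>> dict(o)
{'k1': 666, 'k3': 'v3', 'k2': 'v2'}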
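Since a plain dict also keeps insertion order on current Python versions, it helps to see what OrderedDict still adds. The following sketch uses three standard OrderedDict features: `move_to_end()`, `popitem(last=False)`, and order-sensitive equality. The variable names are illustrative only.

```python
from collections import OrderedDict

o = OrderedDict([('k1', 'v1'), ('k3', 'v3'), ('k2', 'v2')])

# move_to_end() relocates an existing key to the right end.
o.move_to_end('k1')
keys_after_move = list(o)
print(keys_after_move)        # ['k3', 'k2', 'k1']

# popitem(last=False) removes and returns the oldest (leftmost) entry.
first = o.popitem(last=False)
print(first)                  # ('k3', 'v3')

# Two OrderedDicts compare equal only if their order matches too.
a = OrderedDict([('x', 1), ('y', 2)])
b = OrderedDict([('y', 2), ('x', 1)])
ordered_equal = (a == b)
plain_equal = (dict(a) == dict(b))
print(ordered_equal, plain_equal)   # False True
```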
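There are three ways to define a named tuple. In each case the first argument is the type name of the generated class (Person1, Person2, Person3 below), and the field names can be given as a list, a comma-separated string, or a space-separated string.

>>> P1 = collections.namedtuple('Person1',['name','age','height'])
>>> P2 = collections.namedtuple('Person2','name,age,height')
>>> P3 = collections.namedtuple('Person3','name age height')

>>> lucy = P1('lucy',23,180)
>>> lucy
Person1(name='lucy', age=23, height=180)
>>> jack = P2('jack',20,190)
>>> jack
Person2(name='jack', age=20, height=190)
>>> lucy.name  # access a field as instance.attribute
'lucy'
>>> lucy.age
23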
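Beyond attribute access, a named tuple still behaves like an ordinary tuple and comes with a few helper methods. The sketch below shows unpacking, indexing, `_fields`, `_asdict()`, and `_replace()`, all part of the standard namedtuple API. The class name `Person` and the variable `older` are chosen for illustration.

```python
from collections import namedtuple

Person = namedtuple('Person', 'name age height')
lucy = Person('lucy', 23, 180)

# It is still a tuple: unpacking and indexing both work.
name, age, height = lucy
second = lucy[1]

# _fields lists the field names; _asdict() converts to a mapping.
fields = Person._fields
as_dict = dict(lucy._asdict())

# Instances are immutable; _replace() returns a modified copy.
older = lucy._replace(age=24)

try:
    lucy.age = 30
except AttributeError:
    immutable = True
else:
    immutable = False

print(fields)      # ('name', 'age', 'height')
print(as_dict)     # {'name': 'lucy', 'age': 23, 'height': 180}
print(older)       # Person(name='lucy', age=24, height=180)
print(immutable)   # True
```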
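collections.deque([iterable[, maxlen]]) returns a new double-ended queue object, initialized from left to right (using append()) with the data from iterable. If iterable is not given, the new deque is empty.

Deques support thread-safe appends and pops at either end, each with roughly O(1) cost. List objects support similar operations, but lists are optimized for fast fixed-length operations: pop(0) and insert(0, v) cost O(n) on a list because every remaining element has to shift.

If maxlen is not given or is None, the deque can grow to any length. Otherwise it is bounded to that maximum length. Once a bounded deque is full, adding new items discards the same number of items from the opposite end.

>>> d = collections.deque(maxlen=10)
>>> d
deque([], maxlen=10)
>>> d.extend('python')
>>> [i.upper() for i in d]
['P', 'Y', 'T', 'H', 'O', 'N']
>>> d.append('e')
>>> d.appendleft('f')
>>> d.appendleft('g')
>>> d.appendleft('h')
>>> d
deque(['h', 'g', 'f', 'p', 'y', 't', 'h', 'o', 'n', 'e'], maxlen=10)
>>> d.appendleft('i')
>>> d
deque(['i', 'h', 'g', 'f', 'p', 'y', 't', 'h', 'o', 'n'], maxlen=10)
>>> d.append('m')
>>> d
deque(['h', 'g', 'f', 'p', 'y', 't', 'h', 'o', 'n', 'm'], maxlen=10)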
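Three common deque patterns follow from the behavior described above: a bounded "last N items" buffer, a first-in-first-out queue, and rotation. The sketch below is only an illustration; the names `recent`, `queue`, and `ring` are not from the original article.

```python
from collections import deque

# A bounded deque keeps only the most recent maxlen items.
recent = deque(maxlen=3)
for n in range(1, 6):
    recent.append(n)
print(recent)             # deque([3, 4, 5], maxlen=3)

# FIFO queue: append on the right, popleft() on the left, both O(1).
queue = deque(['a', 'b', 'c'])
queue.append('d')
first = queue.popleft()
print(first, list(queue)) # a ['b', 'c', 'd']

# rotate(n) shifts items n steps to the right (negative n shifts left).
ring = deque([1, 2, 3, 4, 5])
ring.rotate(2)
after_right = list(ring)
ring.rotate(-1)
after_left = list(ring)
print(after_right)        # [4, 5, 1, 2, 3]
print(after_left)         # [5, 1, 2, 3, 4]
```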
Suppose we have several dicts or mappings and want to treat them as a single mapping. One option is to merge them with update(), but that builds a separate data structure, so later changes to the original dicts are not reflected in the merged result. For lookups that stay in sync with the originals, use ChainMap. It can combine two or more dicts, and a lookup searches them in order from first to last.

Basic usage:

>>> d1 = {'apple':1,'banana':2}
>>> d2 = {'orange':2,'apple':3,'pike':1}
>>> combined1 = collections.ChainMap(d1,d2)
>>> combined2 = collections.ChainMap(d2,d1)
>>> combined1
ChainMap({'apple': 1, 'banana': 2}, {'orange': 2, 'apple': 3, 'pike': 1})
>>> combined2
ChainMap({'orange': 2, 'apple': 3, 'pike': 1}, {'apple': 1, 'banana': 2})
>>> for k,v in combined1.items():
...     print(k,v)
...
orange 2
apple 1
pike 1
banana 2
>>> for k,v in combined2.items():
...     print(k,v)
...
apple 3
banana 2
orange 2
pike 1
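The difference between a merged copy and a live view is easy to check directly. In this sketch, `merged` is built with update() and `view` is a ChainMap over the same two dicts from the article; both variable names, and the extra key `'grape'`, are illustrative.

```python
from collections import ChainMap

d1 = {'apple': 1, 'banana': 2}
d2 = {'orange': 2, 'apple': 3, 'pike': 1}

# A merged copy: d1 is applied last so it wins on conflicts,
# matching the lookup order of ChainMap(d1, d2).
merged = dict(d2)
merged.update(d1)

# A live view over the original dicts.
view = ChainMap(d1, d2)

# Change the originals after both were built.
d1['apple'] = 100
d2['grape'] = 7

merged_apple = merged['apple']
view_apple = view['apple']
print(merged_apple, view_apple)            # 1 100
print('grape' in merged, view['grape'])    # False 7
```

The copy is a snapshot, while the ChainMap holds references to `d1` and `d2` and therefore sees every later change.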
One thing to note: writes to a ChainMap always go to the first dict only. If the key does not exist in the first dict, it is added there, even when a later dict already contains it.

>>> d1 = {'apple':1,'banana':2}
>>> d2 = {'orange':2,'apple':3,'pike':1}
>>> c = collections.ChainMap(d1,d2)
>>> c
ChainMap({'apple': 1, 'banana': 2}, {'orange': 2, 'apple': 3, 'pike': 1})
>>> c['apple']
1
>>> c['apple'] = 2
>>> c
ChainMap({'apple': 2, 'banana': 2}, {'orange': 2, 'apple': 3, 'pike': 1})
>>> c['pike']
1
>>> c['pike'] = 3
>>> c
ChainMap({'apple': 2, 'banana': 2, 'pike': 3}, {'orange': 2, 'apple': 3, 'pike': 1})
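Under the hood, a ChainMap stores the dicts it was given in a list, exposed as the maps attribute. Insertions, updates, and deletions act only on the first dict, while lookups search each dict in turn. new_child() returns a new ChainMap with an extra dict placed in front of that list (an empty {} by default), and parents returns a new ChainMap with the first dict left out.

>>> a = collections.ChainMap()
>>> a['x'] = 1
>>> a
ChainMap({'x': 1})
>>> b = a.new_child()
>>> b
ChainMap({}, {'x': 1})
>>> b['x'] = 2
>>> b
ChainMap({'x': 2}, {'x': 1})
>>> b['y'] = 3
>>> b
ChainMap({'x': 2, 'y': 3}, {'x': 1})
>>> a
ChainMap({'x': 1})
>>> c = a.new_child()
>>> c
ChainMap({}, {'x': 1})
>>> c['x'] = 1
>>> c['y'] = 1
>>> c
ChainMap({'x': 1, 'y': 1}, {'x': 1})
>>> d = c.parents
>>> d
ChainMap({'x': 1})
>>> d is a
False
>>> d == a
True

The maps attribute gives direct access to the underlying list, and parents can be applied repeatedly. Once only one dict is left, parents returns a ChainMap holding a single empty dict.

>>> a = {'x':1,'z':3}
>>> b = {'y':2,'z':4}
>>> c = collections.ChainMap(a,b)
>>> c
ChainMap({'x': 1, 'z': 3}, {'y': 2, 'z': 4})
>>> c.maps
[{'x': 1, 'z': 3}, {'y': 2, 'z': 4}]
>>> c.parents
ChainMap({'y': 2, 'z': 4})
>>> c.parents.maps
[{'y': 2, 'z': 4}]
>>> c.parents.parents
ChainMap({})
>>> c.parents.parents.parents
ChainMap({})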
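Because writes and deletions touch only the first dict, new_child() works well for layering overrides on top of defaults. The sketch below assumes a made-up `defaults` dict of settings; it shows an override shadowing a default, and shows that deleting a key that lives only in a later dict raises KeyError.

```python
from collections import ChainMap

# Hypothetical settings, used only for illustration.
defaults = {'color': 'red', 'user': 'guest'}

# new_child() accepts the mapping to put in front.
scope = ChainMap(defaults).new_child({'user': 'lucy'})

user = scope['user']      # found in the front dict
color = scope['color']    # falls through to defaults
print(user, color)        # lucy red

# Deleting removes the key from the first dict only,
# so the default value becomes visible again.
del scope['user']
user_after = scope['user']
print(user_after)         # guest

# A key that exists only in a later dict cannot be deleted via the ChainMap.
try:
    del scope['color']
except KeyError:
    delete_failed = True
else:
    delete_failed = False
print(delete_failed)      # True
```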
