The collections module in Python provides alternative container datatypes: specialized data structures that are often more expressive, and for some operations more efficient, than the built-in lists, dictionaries, and tuples.
Table of Contents
- Introduction
- namedtuple
- deque
- ChainMap
- Counter
- OrderedDict
- defaultdict
- UserDict
- UserList
- UserString
- Examples
- Using namedtuple
- Using deque
- Using ChainMap
- Using Counter
- Using OrderedDict
- Using defaultdict
- Using UserDict, UserList, UserString
- Real-World Use Case
- Conclusion
- References
Introduction
The collections module implements specialized container datatypes that provide alternatives to Python's general-purpose built-in containers dict, list, and tuple. These include namedtuple, deque, ChainMap, Counter, OrderedDict, and defaultdict, plus the wrapper classes UserDict, UserList, and UserString, which are designed to be subclassed for customization.
namedtuple
The namedtuple function is a factory that creates a tuple subclass whose fields can be accessed by name as well as by index.
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(11, 22)
print(p.x, p.y)
Output:
11 22
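Instances of the generated class still behave like ordinary tuples and carry a few helper methods. The following is a minimal sketch that reuses the Point class from above; the defaults argument and the variable names are illustrative additions, not part of the original example.

```python
from collections import namedtuple

# Same Point as above, with a default for y (an illustrative choice).
Point = namedtuple('Point', ['x', 'y'], defaults=[0])

p = Point(11, 22)
x, y = p                     # unpacks like a regular tuple
moved = p._replace(x=100)    # returns a new instance; p itself is unchanged
as_dict = p._asdict()        # mapping of field names to values
partial = Point(5)           # y falls back to its default of 0

print(x, y)       # 11 22
print(moved)      # Point(x=100, y=22)
print(as_dict)
print(partial)    # Point(x=5, y=0)
```

Because named tuples are immutable, _replace is the usual way to derive a modified copy.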
deque
A deque is a double-ended queue that supports adding and removing elements from either end in approximately constant time.
from collections import deque
d = deque(['a', 'b', 'c'])
d.append('d')
d.appendleft('z')
print(d)
d.pop()
d.popleft()
print(d)
Output:
deque(['z', 'a', 'b', 'c', 'd'])
deque(['a', 'b', 'c'])
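Two further deque features are worth a quick look: a bounded length and rotation. This is a small sketch; the maxlen of 3 and the variable names are arbitrary choices for illustration.

```python
from collections import deque

# A bounded deque keeps only the most recent items.
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)    # once full, the oldest item is dropped from the left

ring = deque(['a', 'b', 'c', 'd'])
ring.rotate(1)          # shift right: the last item moves to the front

print(recent)   # deque([2, 3, 4], maxlen=3)
print(ring)     # deque(['d', 'a', 'b', 'c'])
```

A bounded deque is a convenient way to keep a sliding window of the latest values without trimming by hand.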
ChainMap
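A ChainMap groups multiple dictionaries into a single view. Lookups search the underlying mappings in order, so the first mapping that contains a key supplies its value.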
from collections import ChainMap
a = {'x': 1, 'y': 2}
b = {'y': 3, 'z': 4}
c = ChainMap(a, b)
print(c)
print(c['y'])
Output:
ChainMap({'x': 1, 'y': 2}, {'y': 3, 'z': 4})
2
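Lookups search every mapping, but writes only touch the first one. The sketch below illustrates this with hypothetical settings dictionaries (the names defaults, overrides, and settings are assumptions made for the example).

```python
from collections import ChainMap

defaults = {'color': 'blue', 'size': 'M'}   # hypothetical settings
overrides = {'size': 'L'}

settings = ChainMap(overrides, defaults)
settings['color'] = 'red'    # stored in overrides, the first mapping

print(overrides)   # {'size': 'L', 'color': 'red'}
print(defaults)    # {'color': 'blue', 'size': 'M'}

# new_child() places a fresh mapping in front without touching the others.
scoped = settings.new_child({'size': 'S'})
print(scoped['size'], settings['size'])   # S L
```

This layering makes ChainMap a reasonable fit for configuration with fallbacks or for nested scopes.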
Counter
A Counter is a dictionary subclass for counting hashable objects.
from collections import Counter
cnt = Counter(['a', 'b', 'c', 'a', 'b', 'b'])
print(cnt)
print(cnt.most_common(2))
Output:
Counter({'b': 3, 'a': 2, 'c': 1})
[('b', 3), ('a', 2)]
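A Counter also treats missing keys as zero and supports arithmetic between counters. Here is a brief sketch using a hypothetical fruit inventory; the data and names are invented for illustration.

```python
from collections import Counter

stock = Counter(apples=3, pears=1)     # hypothetical inventory
stock.update(['apples', 'plums'])      # adds to the existing counts
sold = Counter(apples=2, pears=1)

missing = stock['kiwis']     # missing keys count as 0, with no KeyError
remaining = stock - sold     # subtraction drops counts of zero or below

print(missing)      # 0
print(remaining)    # Counter({'apples': 2, 'plums': 1})
```

Reading a missing key does not insert it, so the counter stays free of zero-valued placeholders.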
OrderedDict
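An OrderedDict is a dictionary subclass that remembers the order in which entries were added. Regular dictionaries have also preserved insertion order since Python 3.7, so OrderedDict is mainly useful for its extra reordering methods and its order-sensitive equality.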
from collections import OrderedDict
od = OrderedDict()
od['one'] = 1
od['two'] = 2
od['three'] = 3
print(od)
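Output (Python 3.12 and later; earlier versions print OrderedDict([('one', 1), ('two', 2), ('three', 3)])):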
OrderedDict({'one': 1, 'two': 2, 'three': 3})
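What sets OrderedDict apart from a plain dict today is its reordering methods and its order-sensitive equality. The sketch below shows both, using the same keys as the example above.

```python
from collections import OrderedDict

od = OrderedDict(one=1, two=2, three=3)
od.move_to_end('one')             # 'one' becomes the last entry
first = od.popitem(last=False)    # remove and return the first entry

print(first)       # ('two', 2)
print(list(od))    # ['three', 'one']

# Equality between two OrderedDicts takes order into account.
same = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
print(same)        # False
```

These two methods are the usual building blocks for small least-recently-used caches.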
defaultdict
A defaultdict is a dictionary subclass that calls a factory function to supply a value the first time a missing key is accessed.
from collections import defaultdict
dd = defaultdict(int)
dd['a'] += 1
print(dd)
Output:
defaultdict(<class 'int'>, {'a': 1})
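One subtlety is that only indexing triggers the factory; get() and membership tests leave the dictionary unchanged. A minimal sketch (the variable names are illustrative):

```python
from collections import defaultdict

dd = defaultdict(int)

got = dd.get('x')      # .get() does not call the factory
present = 'x' in dd    # a membership test does not either
value = dd['x']        # indexing does: 'x' is created with int(), i.e. 0

print(got, present, value)   # None False 0
print(dict(dd))              # {'x': 0}
```

This means that merely reading dd[key] can grow the dictionary, which is worth keeping in mind when iterating over it or checking its length.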
UserDict
A UserDict is a wrapper around a dictionary object for easier subclassing. The underlying dictionary is available as the data attribute.
from collections import UserDict
class MyDict(UserDict):
    # Store every value doubled.
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 2)
d = MyDict()
d['a'] = 3
print(d)
Output:
{'a': 6}
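The "easier subclassing" claim is clearest when the same override is applied to a plain dict subclass. The comparison below is a sketch based on CPython's behavior, where dict's constructor and update() bypass an overridden __setitem__; the class names are hypothetical.

```python
from collections import UserDict

class DoublingUserDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 2)

class DoublingDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 2)

via_user = DoublingUserDict({'a': 3})   # constructor routes through __setitem__
via_user.update(b=4)                    # so does update()

via_dict = DoublingDict({'a': 3})       # built-in constructor skips the override
via_dict.update(b=4)                    # and so does the built-in update()

print(via_user)   # {'a': 6, 'b': 8}
print(via_dict)   # {'a': 3, 'b': 4}
```

With UserDict, a single override covers every way of inserting a value, whereas a dict subclass would need to override each method separately.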
UserList
A UserList is a wrapper around a list object for easier subclassing. The underlying list is available as the data attribute.
from collections import UserList
class MyList(UserList):
    # Append every item doubled.
    def append(self, item):
        super().append(item * 2)
l = MyList()
l.append(3)
print(l)
Output:
[6]
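Overriding append alone does not cover the other ways of adding items. The sketch below, which reuses the MyList class from above, assumes CPython's UserList implementation, where the constructor and extend() write to the underlying list directly.

```python
from collections import UserList

class MyList(UserList):
    def append(self, item):
        super().append(item * 2)

items = MyList([1, 2])   # initial items are stored as given
items.append(3)          # only append() doubles
items.extend([4])        # extend() does not go through append()

print(items.data)        # [1, 2, 6, 4]
```

If every insertion path should apply the same rule, methods such as extend, insert, and __setitem__ need matching overrides.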
UserString
A UserString is a wrapper around a string object for easier subclassing. The underlying string is available as the data attribute.
from collections import UserString
class MyString(UserString):
    # Add an in-place append, which str itself does not offer.
    def append(self, s):
        self.data += s
s = MyString("Hello")
s.append(" World")
print(s)
Output:
Hello World
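String methods called on a UserString subclass return instances of that subclass, although the object is not itself a str. A short sketch reusing MyString from above (the variable names are illustrative):

```python
from collections import UserString

class MyString(UserString):
    def append(self, s):
        self.data += s

s = MyString("Hello")
shout = s.upper()              # string methods return the subclass type
is_str = isinstance(s, str)    # a UserString wraps a str; it is not one

print(type(shout).__name__, shout)   # MyString HELLO
print(is_str)                        # False
```

Code that strictly requires a str can be given s.data or str(s) instead.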
Examples
Using namedtuple
from collections import namedtuple
Car = namedtuple('Car', 'make model year')
my_car = Car('Toyota', 'Camry', 2020)
print(f'My car is a {my_car.year} {my_car.make} {my_car.model}.')
Output:
My car is a 2020 Toyota Camry.
Using deque
from collections import deque
dq = deque()
dq.append('a')
dq.append('b')
dq.appendleft('z')
print(dq)
dq.pop()
dq.popleft()
print(dq)
Output:
deque(['z', 'a', 'b'])
deque(['a'])
Using ChainMap
from collections import ChainMap
dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
chain = ChainMap(dict1, dict2)
print(chain['a']) # 1
print(chain['b']) # 2 (from dict1)
print(chain['c']) # 4
Output:
1
2
4
Using Counter
from collections import Counter
cnt = Counter('abracadabra')
print(cnt)
print(cnt.most_common(3))
Output:
Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
[('a', 5), ('b', 2), ('r', 2)]
Using OrderedDict
from collections import OrderedDict
od = OrderedDict()
od['one'] = 1
od['two'] = 2
od['three'] = 3
for key, value in od.items():
    print(key, value)
Output:
one 1
two 2
three 3
Using defaultdict
from collections import defaultdict
dd = defaultdict(list)
dd['a'].append(1)
dd['b'].append(2)
print(dd)
Output:
defaultdict(<class 'list'>, {'a': [1], 'b': [2]})
Using UserDict, UserList, UserString
from collections import UserDict, UserList, UserString
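class MyDict(UserDict):
    # Store every value doubled.
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 2)

class MyList(UserList):
    # Append every item doubled.
    def append(self, item):
        super().append(item * 2)

class MyString(UserString):
    # Add an in-place append, which str itself does not offer.
    def append(self, s):
        self.data += s
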
d = MyDict()
d['a'] = 3
print(d)
l = MyList()
l.append(3)
print(l)
s = MyString("Hello")
s.append(" World")
print(s)
Output:
{'a': 6}
[6]
Hello World
Real-World Use Case
Grouping Items with defaultdict
Suppose you have a list of (name, department) pairs and want to group the employee names by department.
from collections import defaultdict
employees = [
    ('John', 'HR'),
    ('Alice', 'Engineering'),
    ('Bob', 'HR'),
    ('Charlie', 'Engineering'),
    ('Daisy', 'Marketing'),
]

# Each new department starts with an empty list automatically.
dept_dict = defaultdict(list)
for name, dept in employees:
    dept_dict[dept].append(name)

for dept, names in dept_dict.items():
    print(f"{dept}: {', '.join(names)}")
Output:
HR: John, Bob
Engineering: Alice, Charlie
Marketing: Daisy
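If only the number of employees per department is needed, a Counter can do the same pass without building the lists. This is a sketch over the same sample data; the headcount name is an illustrative choice.

```python
from collections import Counter

employees = [
    ('John', 'HR'),
    ('Alice', 'Engineering'),
    ('Bob', 'HR'),
    ('Charlie', 'Engineering'),
    ('Daisy', 'Marketing'),
]

# Count how many times each department appears.
headcount = Counter(dept for _, dept in employees)

for dept, count in headcount.items():
    print(f"{dept}: {count}")
```

Choosing between the two comes down to whether the members themselves or only their totals are needed.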
Conclusion
The collections module provides specialized data structures for common tasks such as counting, grouping, queueing, and layering mappings. Using them in place of hand-written logic on the built-in types tends to make code shorter, more readable, and in some cases faster.