Python collections Module

The collections module in Python's standard library provides specialized container datatypes that complement the built-in list, dict, and tuple. For the tasks they target, such as counting, queueing, or grouping, they are often more efficient or more expressive than the built-ins alone.

Table of Contents

  1. Introduction
  2. namedtuple
  3. deque
  4. ChainMap
  5. Counter
  6. OrderedDict
  7. defaultdict
  8. UserDict
  9. UserList
  10. UserString
  11. Examples
      • Using namedtuple
      • Using deque
      • Using ChainMap
      • Using Counter
      • Using OrderedDict
      • Using defaultdict
      • Using UserDict, UserList, UserString
  12. Real-World Use Case
  13. Conclusion
  14. References

Introduction

The collections module implements specialized container datatypes providing alternatives to Python’s general-purpose built-in containers like dict, list, and tuple. These include namedtuple, deque, ChainMap, Counter, OrderedDict, defaultdict, and three wrapper classes (UserDict, UserList, and UserString) that are designed to be subclassed for customization.

namedtuple

The namedtuple factory function returns a new tuple subclass whose fields can be accessed by name as well as by position.

from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(11, 22)
print(p.x, p.y)

Output:

11 22
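Beyond attribute access, a namedtuple still behaves like an ordinary immutable tuple. The sketch below is illustrative rather than exhaustive: it reuses the Point type from above but adds a default for y, which is an assumption made for this example and is not part of the original definition.

```python
from collections import namedtuple

# defaults apply to the rightmost fields, so only y gets a default here
Point = namedtuple('Point', ['x', 'y'], defaults=[0])

p = Point(11)              # y falls back to its default of 0
moved = p._replace(x=5)    # returns a new instance; p itself is unchanged
as_dict = p._asdict()      # mapping of field names to values
x, y = p                   # unpacks like a regular tuple

print(p, moved, as_dict)

try:
    p.x = 99               # fields are read-only
except AttributeError as exc:
    print('cannot assign:', exc)
```

Because instances are immutable, _replace is the usual way to derive a modified copy.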

deque

A deque (double-ended queue) supports adding and removing elements from either end in approximately O(1) time, whereas inserting or popping at the front of a list is O(n).

from collections import deque

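d = deque(['a', 'b', 'c'])
d.append('d')        # add to the right end
d.appendleft('z')    # add to the left end
print(d)
d.pop()              # remove 'd' from the right end
d.popleft()          # remove 'z' from the left end
print(d)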

Output:

deque(['z', 'a', 'b', 'c', 'd'])
deque(['a', 'b', 'c'])
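Two other deque features are worth a quick look: a bounded length and rotation. The following is a minimal sketch; the variable names recent and ring are made up for illustration.

```python
from collections import deque

# A bounded deque keeps only the most recent maxlen items.
recent = deque(maxlen=3)
for n in range(1, 6):
    recent.append(n)   # once full, the oldest item is discarded from the left
print(recent)          # deque([3, 4, 5], maxlen=3)

# rotate(n) shifts items n steps to the right (negative n shifts left).
ring = deque(['a', 'b', 'c', 'd'])
ring.rotate(1)
print(ring)            # deque(['d', 'a', 'b', 'c'])
```

A bounded deque like this is a common choice for "last N items" histories, since no manual trimming is needed.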

ChainMap

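A ChainMap groups multiple dictionaries into a single view. Lookups search the mappings in order and return the first match, so the earliest mapping wins when keys overlap.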

from collections import ChainMap

a = {'x': 1, 'y': 2}
b = {'y': 3, 'z': 4}
c = ChainMap(a, b)
print(c)
print(c['y'])

Output:

ChainMap({'x': 1, 'y': 2}, {'y': 3, 'z': 4})
2
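Reads search every mapping, but writes behave differently, which can be surprising. The sketch below reuses a and b from the example above to show where updates land and how new_child adds a layer; the inserted values are arbitrary.

```python
from collections import ChainMap

a = {'x': 1, 'y': 2}
b = {'y': 3, 'z': 4}
c = ChainMap(a, b)

c['z'] = 40     # writes always go to the first mapping, even if the key exists in b
print(a)        # {'x': 1, 'y': 2, 'z': 40}
print(b)        # {'y': 3, 'z': 4}

b['w'] = 5      # ChainMap stores references, so later changes to b are visible
print(c['w'])   # 5

# new_child puts a fresh mapping in front without touching a or b
child = c.new_child({'x': 100})
print(child['x'], child.parents['x'])   # 100 1
```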

Counter

A Counter is a dictionary subclass for counting hashable objects.

from collections import Counter

cnt = Counter(['a', 'b', 'c', 'a', 'b', 'b'])
print(cnt)
print(cnt.most_common(2))

Output:

Counter({'b': 3, 'a': 2, 'c': 1})
[('b', 3), ('a', 2)]
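A Counter also differs from a plain dict in how it treats missing keys and in supporting arithmetic. As a rough illustration (the second counter, more, is invented for this example):

```python
from collections import Counter

cnt = Counter(['a', 'b', 'c', 'a', 'b', 'b'])

print(cnt['zzz'])        # 0: missing keys count as zero instead of raising KeyError
print('zzz' in cnt)      # False: the lookup did not insert the key

cnt.update(['c', 'c'])   # update adds to existing counts rather than replacing them

more = Counter(a=1, d=4)
total = cnt + more       # adds counts key by key
diff = cnt - more        # subtracts, keeping only positive results
print(total)
print(diff)
```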

OrderedDict

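An OrderedDict is a dictionary subclass that remembers the order entries were added. Since Python 3.7 the built-in dict also preserves insertion order, so OrderedDict is mainly useful for its extra features: reordering methods such as move_to_end, popitem from either end, and equality comparisons that take order into account.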

from collections import OrderedDict

od = OrderedDict()
od['one'] = 1
od['two'] = 2
od['three'] = 3
print(od)

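Output (Python 3.12 and later; earlier versions print the items as a list of pairs, OrderedDict([('one', 1), ('two', 2), ('three', 3)])):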

OrderedDict({'one': 1, 'two': 2, 'three': 3})
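The features that set OrderedDict apart from a regular dict are its reordering methods and its order-sensitive equality. A small sketch, with example keys chosen only for illustration:

```python
from collections import OrderedDict

od = OrderedDict(one=1, two=2, three=3)
od.move_to_end('one')            # order is now two, three, one
first = od.popitem(last=False)   # pops from the front: ('two', 2)
print(list(od), first)

# Two OrderedDicts compare equal only if their order matches too.
a = OrderedDict(x=1, y=2)
b = OrderedDict(y=2, x=1)
print(a == b)                # False
print(dict(a) == dict(b))    # True: plain dicts ignore order when comparing
```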

defaultdict

A defaultdict is a dictionary subclass that calls a factory function to supply a value whenever a missing key is accessed. With int as the factory, missing keys start at int(), which is 0.

from collections import defaultdict

dd = defaultdict(int)
dd['a'] += 1
print(dd)

Output:

defaultdict(<class 'int'>, {'a': 1})
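One subtlety is that the factory only runs for square-bracket access, and the resulting value is stored in the dictionary. The sketch below illustrates this; the key names and the 'N/A' placeholder are arbitrary choices for the example.

```python
from collections import defaultdict

dd = defaultdict(int)
dd['a'] += 1
_ = dd['missing']          # merely reading with [] calls int() and stores 0
print(dict(dd))            # {'a': 1, 'missing': 0}

print(dd.get('other'))     # None: get() does not call the factory
print('other' in dd)       # False: nothing was inserted

# Any zero-argument callable works as a factory, including a lambda.
na = defaultdict(lambda: 'N/A')
print(na['unknown'])       # N/A
```

If silently inserted keys are a concern, using get() or a membership test avoids growing the dictionary on reads.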

UserDict

A UserDict is a wrapper around a dictionary object for easier subclassing.

from collections import UserDict

class MyDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value*2)

d = MyDict()
d['a'] = 3
print(d)

Output:

{'a': 6}
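To see why UserDict can be easier to subclass than dict itself, it may help to compare the two side by side. The sketch below describes CPython behavior, where dict's constructor and update() do not call an overridden __setitem__; the class names are hypothetical.

```python
from collections import UserDict

class DoublingDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 2)

class DoublingUserDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 2)

plain = DoublingDict(a=3)    # constructor bypasses the override
plain.update(b=3)            # so does update()
plain['c'] = 3               # only direct assignment is doubled
print(plain)

wrapped = DoublingUserDict(a=3)   # UserDict routes every write through __setitem__
wrapped.update(b=3)
wrapped['c'] = 3
print(wrapped)
print(type(wrapped.data))         # the underlying storage is a regular dict
```

With the dict subclass only 'c' ends up doubled, while the UserDict subclass doubles all three values.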

UserList

A UserList is a wrapper around a list object for easier subclassing.

from collections import UserList

class MyList(UserList):
    def append(self, item):
        super().append(item*2)

l = MyList()
l.append(3)
print(l)

Output:

[6]
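Unlike UserDict, UserList does not funnel every mutation through a single method, so overriding append alone leaves other entry points untouched. A brief sketch reusing the MyList class from above:

```python
from collections import UserList

class MyList(UserList):
    def append(self, item):
        super().append(item * 2)

l = MyList([1])    # the constructor copies initial items as they are
l.append(3)        # goes through the override, so 6 is stored
l.extend([4])      # extend() does not call append(), so 4 is stored unchanged
print(l.data)      # [1, 6, 4]
```

To enforce a rule on every insertion, a subclass would likely also need to override methods such as extend, insert, and __setitem__.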

UserString

A UserString is a wrapper around a string object for easier subclassing.

from collections import UserString

class MyString(UserString):
    def append(self, s):
        self.data += s

s = MyString("Hello")
s.append(" World")
print(s)

Output:

Hello World
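Two properties of UserString are easy to miss: string methods return instances of the subclass, and the wrapper is not itself a str. The sketch below reuses MyString from above; shout is just an illustrative name.

```python
from collections import UserString

class MyString(UserString):
    def append(self, s):
        # str is immutable, so this rebinds self.data to a new string
        self.data += s

s = MyString("Hello")
s.append(" World")

shout = s.upper()               # string methods return the subclass type
print(type(shout).__name__)     # MyString
print(shout)                    # HELLO WORLD

print(isinstance(s, str))       # False: UserString wraps a str, it does not inherit from it
print(isinstance(s.data, str))  # True: the real string lives in .data
```

Code that requires a genuine str can be given str(s) or s.data instead.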

Examples

Using namedtuple

from collections import namedtuple

Car = namedtuple('Car', 'make model year')
my_car = Car('Toyota', 'Camry', 2020)
print(f'My car is a {my_car.year} {my_car.make} {my_car.model}.')

Output:

My car is a 2020 Toyota Camry.

Using deque

from collections import deque

dq = deque()
dq.append('a')
dq.append('b')
dq.appendleft('z')
print(dq)
dq.pop()
dq.popleft()
print(dq)

Output:

deque(['z', 'a', 'b'])
deque(['a'])

Using ChainMap

from collections import ChainMap

dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
chain = ChainMap(dict1, dict2)
print(chain['a'])  # 1
print(chain['b'])  # 2 (from dict1)
print(chain['c'])  # 4

Output:

1
2
4

Using Counter

from collections import Counter

cnt = Counter('abracadabra')
print(cnt)
print(cnt.most_common(3))

Output:

Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
[('a', 5), ('b', 2), ('r', 2)]

Using OrderedDict

from collections import OrderedDict

od = OrderedDict()
od['one'] = 1
od['two'] = 2
od['three'] = 3
for key, value in od.items():
    print(key, value)

Output:

one 1
two 2
three 3

Using defaultdict

from collections import defaultdict

dd = defaultdict(list)
dd['a'].append(1)
dd['b'].append(2)
print(dd)

Output:

defaultdict(<class 'list'>, {'a': [1], 'b': [2]})

Using UserDict, UserList, UserString

from collections import UserDict, UserList, UserString

class MyDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value*2)

class MyList(UserList):
    def append(self, item):
        super().append(item*2)

class MyString(UserString):
    def append(self, s):
        self.data += s

d = MyDict()
d['a'] = 3
print(d)

l = MyList()
l.append(3)
print(l)

s = MyString("Hello")
s.append(" World")
print(s)

Output:

{'a': 6}
[6]
Hello World

Real-World Use Case

Grouping Items with defaultdict

Suppose you have a list of employees and their departments. You want to group employees by department.

from collections import defaultdict

employees = [
    ('John', 'HR'),
    ('Alice', 'Engineering'),
    ('Bob', 'HR'),
    ('Charlie', 'Engineering'),
    ('Daisy', 'Marketing')
]

dept_dict = defaultdict(list)
for name, dept in employees:
    dept_dict[dept].append(name)

for dept, names in dept_dict.items():
    print(f"{dept}: {', '.join(names)}")

Output:

HR: John, Bob
Engineering: Alice, Charlie
Marketing: Daisy
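If only the size of each department is needed rather than the names, a Counter over the same data is one possible alternative. This is a sketch that reuses the employees list from above; headcount is a name chosen for illustration.

```python
from collections import Counter

employees = [
    ('John', 'HR'),
    ('Alice', 'Engineering'),
    ('Bob', 'HR'),
    ('Charlie', 'Engineering'),
    ('Daisy', 'Marketing')
]

# Count how many employees belong to each department.
headcount = Counter(dept for _, dept in employees)

# most_common() lists departments from largest to smallest.
for dept, n in headcount.most_common():
    print(f"{dept}: {n}")
```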

Conclusion

The collections module in Python provides specialized data structures for common tasks such as counting, grouping, queueing, and layering mappings. Choosing the right one usually leads to shorter and more readable code than reimplementing the same logic with plain lists and dictionaries.

References
