Multiprocessing
Multiprocessing & Parallelism
For CPU-bound tasks (complex calculations, heavy data transformations), multiprocessing is generally required in standard CPython to use all available CPU cores.
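1. Multiprocessing vs. Multithreading
- Multithreading: Only one thread executes Python bytecode at a time (due to the GIL in standard CPython), so threads give no speedup for CPU-bound Python code. Best for I/O-bound tasks.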
- Multiprocessing: Runs on multiple CPU cores. Each process has its own memory space and GIL. Best for CPU-bound tasks.
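The difference can be observed with a small timing sketch. It assumes a standard CPython build with the GIL; the names `sum_of_squares` and `run_with` are made up for this illustration, and the actual timings depend on the machine, so none are claimed here.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def sum_of_squares(n):
    # Pure-Python, CPU-bound loop: the running thread holds the GIL.
    return sum(i * i for i in range(n))


def run_with(executor_cls, inputs, workers=4):
    """Run sum_of_squares over inputs; return (results, elapsed_seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        results = list(executor.map(sum_of_squares, inputs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    inputs = [1_000_000] * 4
    thread_results, thread_time = run_with(ThreadPoolExecutor, inputs)
    process_results, process_time = run_with(ProcessPoolExecutor, inputs)
    # Both pools compute the same values; only the wall-clock time differs.
    assert thread_results == process_results
    print(f"threads:   {thread_time:.2f}s")
    print(f"processes: {process_time:.2f}s")
```

On a multi-core machine the process pool would typically finish sooner for this workload, while for very small inputs the process start-up cost can make it slower than threads.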
2. ProcessPoolExecutor
The concurrent.futures module provides a high-level API for running functions in parallel.
from concurrent.futures import ProcessPoolExecutor


def cpu_intensive_task(n):
    # Simulate a heavy calculation
    return sum(i * i for i in range(n))


def main():
    numbers = [10_000_000] * 4
    # With no max_workers argument, the pool sizes itself from the CPU count
    with ProcessPoolExecutor() as executor:
        # map() distributes the inputs across worker processes and
        # returns the results in input order
        results = list(executor.map(cpu_intensive_task, numbers))
    print(results)


if __name__ == "__main__":
    main()

3. Shared Memory & Inter-process Communication (IPC)
Processes don't share memory by default. To share data, use:
- Queues: multiprocessing.Queue for passing objects.
- Shared Memory: multiprocessing.Value or Array for simple types.
- Managers: multiprocessing.Manager for sharing dicts or lists.
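The three mechanisms above can be sketched side by side. This is a minimal illustration, not a recommended architecture; the helper names (`produce`, `add_to_counter`, `record`, `demo`) and the sample data are hypothetical.

```python
import multiprocessing as mp


def produce(queue, items):
    # Queue: each object is pickled and sent to the receiving process.
    for item in items:
        queue.put(item * 2)


def add_to_counter(counter, times):
    # Value: one typed slot in shared memory; get_lock() guards the
    # read-modify-write so increments from different processes are not lost.
    for _ in range(times):
        with counter.get_lock():
            counter.value += 1


def record(shared, key, value):
    # Manager proxy: the real dict lives in the manager's server process.
    shared[key] = value


def run_all(processes):
    for p in processes:
        p.start()
    for p in processes:
        p.join()


def demo():
    # 1. Queue: read the items before joining the producer.
    queue = mp.Queue()
    producer = mp.Process(target=produce, args=(queue, [1, 2, 3]))
    producer.start()
    doubled = [queue.get() for _ in range(3)]
    producer.join()

    # 2. Value: "i" is the typecode for a signed C int.
    counter = mp.Value("i", 0)
    run_all([mp.Process(target=add_to_counter, args=(counter, 100))
             for _ in range(2)])

    # 3. Manager: copy the proxy into a plain dict before the manager exits.
    with mp.Manager() as manager:
        shared = manager.dict()
        run_all([mp.Process(target=record, args=(shared, f"job{i}", i * i))
                 for i in range(3)])
        snapshot = dict(shared)

    return doubled, counter.value, snapshot


if __name__ == "__main__":
    print(demo())
```

As a rough guide, Value and Array avoid pickling but only hold simple C types, while Manager objects are more flexible but route every access through another process, which makes them slower.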
4. Best Practices
- Pickling: Every object passed between processes must be "picklable."
- Main Block: Always wrap your entry point in if __name__ == "__main__": so that child processes, which re-import the main module under the spawn start method (the default on Windows and macOS), do not re-run it and spawn processes recursively.
- Overhead: Creating processes is expensive. Don't use it for very small tasks.
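A short sketch of the pickling and overhead points, under the assumption that work items are small and numerous. The helpers `square`, `is_picklable`, and `square_all` are hypothetical names for this example; `chunksize` is an existing parameter of `ProcessPoolExecutor.map`.

```python
import pickle
from concurrent.futures import ProcessPoolExecutor


def square(x):
    # Module-level functions are pickled by reference (module and name),
    # so worker processes can look them up.
    return x * x


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def square_all(values, chunksize=1):
    # chunksize > 1 sends inputs to the workers in batches, which spreads
    # the per-task communication cost over several items.
    with ProcessPoolExecutor(max_workers=2) as executor:
        return list(executor.map(square, values, chunksize=chunksize))


if __name__ == "__main__":
    print(is_picklable(square))            # named function
    print(is_picklable(lambda x: x * x))   # lambdas cannot be pickled
    print(square_all(range(10), chunksize=5))
```

Checking picklability up front is optional; it mainly helps when a task fails inside the pool and the error message is hard to trace back to the offending argument.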