Multiprocessing

🚀 Multiprocessing & Parallelism

In CPython, CPU-bound work (complex calculations, heavy data transformations) needs multiprocessing to use more than one CPU core, because the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time.


๐Ÿ—๏ธ 1. Multiprocessing vs. Multithreading

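  • Multithreading: Threads share one interpreter, and the GIL allows only one of them to execute Python bytecode at a time, so CPU-bound code gains little from extra cores. Best for I/O-bound tasks, where threads spend most of their time waiting.
  • Multiprocessing: Runs on multiple CPU cores. Each process has its own interpreter, memory space, and GIL. Best for CPU-bound tasks.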
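
The difference is easiest to see by running the same CPU-bound function under both executor types. The following is a minimal sketch, assuming a standard (GIL-enabled) CPython build; the helper names sum_of_squares and timed_run and the workload size are hypothetical, and actual timings depend on the machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def sum_of_squares(n):
    # Pure-Python loop: CPU-bound, so a thread running it holds the GIL.
    return sum(i * i for i in range(n))


def timed_run(executor_cls, inputs, workers=4):
    # Run the same workload under the given executor class and time it.
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        results = list(executor.map(sum_of_squares, inputs))
    return results, time.perf_counter() - start


def main():
    inputs = [2_000_000] * 4  # hypothetical workload size
    thread_results, thread_secs = timed_run(ThreadPoolExecutor, inputs)
    process_results, process_secs = timed_run(ProcessPoolExecutor, inputs)

    # Both executors compute the same values; only the elapsed time differs.
    assert thread_results == process_results
    print(f"threads:   {thread_secs:.2f}s")
    print(f"processes: {process_secs:.2f}s")


if __name__ == "__main__":
    main()
```

On a multi-core machine the process pool would typically finish faster for this kind of workload, while the thread pool takes roughly as long as running the tasks one after another.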

๐Ÿ› ๏ธ 2. ProcessPoolExecutor

The concurrent.futures module provides a high-level API for running functions in parallel.

from concurrent.futures import ProcessPoolExecutor

def cpu_intensive_task(n):
    # Simulate a heavy calculation: sum of the squares below n
    return sum(i * i for i in range(n))

def main():
    numbers = [10_000_000] * 4

    # By default the pool sizes itself from the machine's CPU count
    with ProcessPoolExecutor() as executor:
        # map() spreads the inputs across workers and returns results in input order
        results = list(executor.map(cpu_intensive_task, numbers))
        print(results)

if __name__ == "__main__":
    main()

💾 3. Shared Memory & Inter-process Communication (IPC)

Processes don’t share memory by default. To share data, use:

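  • Queues: multiprocessing.Queue for passing picklable objects between processes.
  • Shared Memory: multiprocessing.Value or Array for C-typed scalars and fixed-size arrays.
  • Managers: multiprocessing.Manager for sharing dicts or lists through proxy objects. Each access goes through the manager process, so this is slower than shared memory.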
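
The three mechanisms above can be combined in one small program. This is a sketch rather than a recommended design: the worker and run_workers functions and the registry name are hypothetical, and real code would usually pick only the mechanism it needs.

```python
import multiprocessing as mp
import os


def worker(task_id, results, counter, registry):
    # Queue: send a picklable result back to the parent.
    results.put((task_id, task_id * task_id))
    # Value: += is a read-modify-write, so hold the value's lock around it.
    with counter.get_lock():
        counter.value += 1
    # Manager dict: the write is forwarded to the manager process via a proxy.
    registry[task_id] = os.getpid()


def run_workers(n_workers=4):
    results = mp.Queue()
    counter = mp.Value("i", 0)  # shared C int, initial value 0
    with mp.Manager() as manager:
        registry = manager.dict()
        procs = [
            mp.Process(target=worker, args=(i, results, counter, registry))
            for i in range(n_workers)
        ]
        for p in procs:
            p.start()
        # Drain the queue before join() so no child blocks on unsent data.
        collected = sorted(results.get() for _ in procs)
        for p in procs:
            p.join()
        # Copy the proxy into a plain dict before the manager shuts down.
        return collected, counter.value, dict(registry)


def main():
    collected, finished, pids = run_workers()
    print(collected)
    print(finished)
    print(pids)


if __name__ == "__main__":
    main()
```

The queue is read before join() on purpose: a child that still has items buffered for a queue may not exit until they are consumed.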

🚦 4. Best Practices

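  1. Pickling: Every object passed between processes must be “picklable.” This includes the task function, its arguments, and its return value; lambdas and nested functions are not picklable.
  2. Main Block: Always wrap your entry point in if __name__ == "__main__":. Under the spawn start method (the default on Windows and macOS), each child process re-imports the main module, so unguarded top-level code would run again in every child.
  3. Overhead: Creating processes and pickling data between them is expensive. Don’t use multiprocessing for very small tasks; batch them into larger units of work instead.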
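
The pickling and overhead points can be checked with a short script. This is a minimal sketch: is_picklable is a hypothetical helper written for illustration, and the chunksize value is an arbitrary choice, not a tuned recommendation.

```python
import pickle
from concurrent.futures import ProcessPoolExecutor


def square(x):
    # Module-level functions are pickled by name, so workers can import them.
    return x * x


def is_picklable(obj):
    # Attempt the same kind of serialization used to send objects to workers.
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def main():
    print(is_picklable(square))
    print(is_picklable(lambda x: x * x))

    # Each task here is tiny, so send inputs to the workers in batches
    # (chunksize) to amortize the per-task communication cost.
    with ProcessPoolExecutor() as executor:
        squares = list(executor.map(square, range(10_000), chunksize=1_000))
    print(squares[:5])


if __name__ == "__main__":
    main()
```

Even with batching, work this small is usually faster in a single process; the chunksize argument only reduces the overhead, it does not remove it.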