The multiprocessing module provides a simple interface for creating and managing separate processes. Unlike threads, which share a single memory space, each process has its own memory space and its own Python interpreter. That isolation prevents accidental data sharing, and because each process in a standard CPython build holds its own Global Interpreter Lock (GIL), CPU-bound work can run in parallel on multiple cores. This article covers creating processes, managing inter-process communication (IPC), and using synchronization primitives to coordinate between processes.

Creating Processes

To create a new process, you can use the Process class from the multiprocessing module. Here’s a simple example of how to create and start a process:

import multiprocessing
import time

def worker(name):
    print(f'Worker {name} starting')
    time.sleep(2)
    print(f'Worker {name} finished')

if __name__ == '__main__':
    processes = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

Explanation

  1. Defining the Worker Function: The worker function simulates a task by sleeping for 2 seconds.
  2. Creating Processes: We create five processes, each running the worker function with a different argument.
  3. Starting Processes: Each process is started with the start() method.
  4. Joining Processes: The join() method ensures that the main program waits for all processes to complete before exiting.
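Once join() returns, each Process object exposes an exitcode attribute: 0 for a normal exit, a positive value when the target raised an uncaught exception, and a negative value when the process was killed by a signal. Here’s a minimal sketch of checking it. The flaky_worker and describe_exit names are hypothetical helpers made up for this illustration, not part of the multiprocessing module.

```python
import multiprocessing


def flaky_worker(should_fail):
    """Raise an error on request; otherwise return normally."""
    if should_fail:
        raise ValueError("simulated failure")


def describe_exit(exitcode):
    """Turn a Process.exitcode value into a short status string."""
    if exitcode is None:
        return "still running"
    if exitcode == 0:
        return "ok"
    if exitcode < 0:
        return f"killed by signal {-exitcode}"
    return f"failed with exit code {exitcode}"


if __name__ == "__main__":
    processes = [
        multiprocessing.Process(target=flaky_worker, args=(fail,))
        for fail in (False, True)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
        # exitcode is only meaningful once the process has finished.
        print(p.name, describe_exit(p.exitcode))
```

The failing child prints its own traceback to standard error; the main process only sees the nonzero exit code.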

Inter-Process Communication (IPC)

When processes need to communicate, the multiprocessing module provides several IPC mechanisms, including pipes and queues. Here’s an example using a Queue for communication:

import multiprocessing

def worker(queue):
    queue.put('Hello from worker!')

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    message = queue.get()
    p.join()
    print(f'Received: {message}')

Explanation

  1. Creating a Queue: We create a Queue instance for communication between the main process and the worker.
  2. Sending Messages: The worker puts a message into the queue.
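  3. Receiving Messages: The main process retrieves the message using get(), which blocks until an item is available. We call get() before join() on purpose: a process that has put items on a queue may not exit until they have been flushed, so joining first can deadlock when the payload is large.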
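Pipes are the other IPC mechanism mentioned above. A Pipe connects exactly two endpoints and, by default, carries messages in both directions. Here’s a small sketch of a round trip; the echo_upper function is a hypothetical worker invented for this example.

```python
import multiprocessing


def echo_upper(conn):
    """Receive one message, send back its upper-case form, then close."""
    message = conn.recv()
    conn.send(message.upper())
    conn.close()


if __name__ == "__main__":
    # Pipe() returns two connected endpoints; duplex is the default.
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=echo_upper, args=(child_conn,))
    p.start()
    parent_conn.send("hello from parent")
    print(f"Received: {parent_conn.recv()}")
    p.join()
```

A Queue is usually the simpler choice when several producers or consumers are involved; a Pipe fits a conversation between exactly two processes.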

Synchronization Primitives

To manage access to shared resources, the multiprocessing module provides several synchronization primitives, including Lock, Event, and Semaphore. Here’s an example using a Lock:

import multiprocessing
import time

def worker(lock, num):
    with lock:
        print(f'Worker {num} is accessing shared resource')
        time.sleep(1)

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    processes = [multiprocessing.Process(target=worker, args=(lock, i)) for i in range(5)]
    
    for p in processes:
        p.start()
    
    for p in processes:
        p.join()

Explanation

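  1. Creating a Lock: A Lock is created in the main process and passed to every worker as an argument, so all workers contend for the same lock.
  2. Using the Lock: The worker acquires the lock using a context manager (with statement), which also releases it when the block exits. Only one worker at a time can hold the lock, so access to the shared resource (here, standard output) is serialized.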
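The same pattern protects data as well as output. Here’s a sketch that uses a Lock to guard a shared integer created with multiprocessing.Value. The add_many function, the four processes, and the 1000 repeats are arbitrary choices for illustration. Passing lock=False makes Value return a raw shared integer with no built-in lock, so the explicit Lock is the only protection.

```python
import multiprocessing


def add_many(counter, lock, repeats):
    """Increment a shared integer `repeats` times, one locked step at a time."""
    for _ in range(repeats):
        with lock:
            # Read-modify-write is not atomic, so it runs under the lock.
            counter.value += 1


if __name__ == "__main__":
    lock = multiprocessing.Lock()
    counter = multiprocessing.Value("i", 0, lock=False)  # raw shared int
    processes = [
        multiprocessing.Process(target=add_many, args=(counter, lock, 1000))
        for _ in range(4)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(f"Final count: {counter.value}")
```

With the lock in place, every increment is applied and the final count equals the number of processes times the repeats. Without it, concurrent updates can overwrite each other and the total may come up short.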

Best Practices

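  1. Avoid Global Variables: Each process has its own memory space, so a change to a global variable in one process is not visible in any other. Share state through IPC mechanisms such as Queue or Pipe instead.
  2. Use if __name__ == '__main__': This guard is required wherever child processes are started with the spawn method, which is the default on Windows and macOS. Each child re-imports the main module, so unguarded process-creation code would run again in every child.
  3. Limit the Number of Processes: Every process adds startup and memory overhead, and running more processes than CPU cores rarely speeds up CPU-bound work. Use a pool of workers with multiprocessing.Pool for better resource management.
  4. Handle Exceptions: An uncaught exception in a Process target does not propagate to the main process; the child prints a traceback and exits with a nonzero exit code. Catch exceptions inside the worker and report them through a queue, or check exitcode after join().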
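A worker pool addresses the last two points together. Here’s a minimal sketch using multiprocessing.Pool; the safe_sqrt function and the pool size of 4 are assumptions made for this example. With a Pool, unlike a bare Process, an exception raised in a worker is sent back and re-raised in the parent when the result is retrieved.

```python
import multiprocessing


def safe_sqrt(x):
    """Return the square root of x; reject negative input."""
    if x < 0:
        raise ValueError(f"negative input: {x}")
    return x ** 0.5


if __name__ == "__main__":
    # The pool size of 4 is an arbitrary choice for this sketch.
    with multiprocessing.Pool(processes=4) as pool:
        # map() splits the inputs across the workers and keeps their order.
        print(pool.map(safe_sqrt, [1, 4, 9, 16]))

        # An exception raised in a pool worker is re-raised in the parent
        # when the result is fetched.
        result = pool.apply_async(safe_sqrt, (-1,))
        try:
            result.get(timeout=10)
        except ValueError as exc:
            print(f"Worker failed: {exc}")
```

The pool reuses a fixed set of worker processes, so the cost of starting a process is paid once per worker rather than once per task.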

Conclusion

The multiprocessing module is a powerful tool for achieving parallelism in Python. By running work in processes instead of threads, you can use multiple CPU cores for CPU-bound tasks. Understanding how to create processes, manage IPC, and synchronize access to shared resources will help you write robust concurrent applications.
