
Mastering Python's `multiprocessing` Module: A Guide to Concurrency in Python
The multiprocessing module provides a simple interface for creating and managing separate processes. Unlike threads, which share the same memory space, each process has its own memory space and its own interpreter, so in CPython CPU-bound work can run in parallel on multiple cores instead of being serialized by the global interpreter lock (GIL). This article covers creating processes, managing inter-process communication (IPC), and using synchronization primitives to coordinate between processes.
Creating Processes
To create a new process, you can use the Process class from the multiprocessing module. Here’s a simple example of how to create and start a process:
import multiprocessing
import time

def worker(name):
    print(f'Worker {name} starting')
    time.sleep(2)
    print(f'Worker {name} finished')

if __name__ == '__main__':
    processes = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()

Explanation
- Defining the Worker Function: The `worker` function simulates a task by sleeping for 2 seconds.
- Creating Processes: We create five processes, each running the `worker` function with a different argument.
- Starting Processes: Each process is started with the `start()` method.
- Joining Processes: The `join()` method ensures that the main program waits for all processes to complete before exiting.
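Note that `join()` only waits for a process; it does not report whether the process succeeded. As a minimal sketch (the `task` and `describe` helpers are hypothetical names introduced for illustration), the parent can inspect each process's `exitcode` after joining, where 0 means a clean exit and a nonzero value signals a failure:

```python
import multiprocessing
import sys

def task(should_fail):
    """Hypothetical task: exits with status 1 when asked to fail."""
    if should_fail:
        sys.exit(1)

def describe(exitcode):
    """Turn a process exit code into a short human-readable status."""
    return 'ok' if exitcode == 0 else f'failed (exit code {exitcode})'

if __name__ == '__main__':
    processes = [
        multiprocessing.Process(
            target=task,
            name=f'task-{i}',                 # a readable name helps when reporting
            kwargs={'should_fail': i == 1},   # only the second process fails
        )
        for i in range(3)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
        # exitcode is only meaningful once the process has finished
        print(f'{p.name}: {describe(p.exitcode)}')
```

A negative `exitcode` of `-N` indicates that the child was terminated by signal `N`.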
Inter-Process Communication (IPC)
When processes need to communicate, the multiprocessing module provides several IPC mechanisms, including pipes and queues. Here’s an example using a Queue for communication:
import multiprocessing

def worker(queue):
    queue.put('Hello from worker!')

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    message = queue.get()
    p.join()
    print(f'Received: {message}')

Explanation
- Creating a Queue: We create a `Queue` instance for communication between the main process and the worker.
- Sending Messages: The worker puts a message into the queue.
- Receiving Messages: The main process retrieves the message using `get()`.
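Pipes, the other IPC mechanism mentioned above, follow a similar pattern. The sketch below is one possible equivalent of the queue example, assuming a single worker that sends a single message; `multiprocessing.Pipe()` returns two connection objects, one for each end:

```python
import multiprocessing

def worker(conn):
    """Send one message through this end of the pipe, then close it."""
    conn.send('Hello from worker via pipe!')
    conn.close()

if __name__ == '__main__':
    # Pipe() returns a pair of connected endpoints (duplex by default)
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=worker, args=(child_conn,))
    p.start()
    message = parent_conn.recv()  # blocks until the worker sends something
    p.join()
    print(f'Received: {message}')
```

A pipe connects exactly two endpoints, so it suits one-to-one communication; a `Queue` is usually the simpler choice when several processes produce or consume messages.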
Synchronization Primitives
To manage access to shared resources, the multiprocessing module provides several synchronization primitives, including Lock, Event, and Semaphore. Here’s an example using a Lock:
import multiprocessing
import time

def worker(lock, num):
    with lock:
        print(f'Worker {num} is accessing shared resource')
        time.sleep(1)

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    processes = [multiprocessing.Process(target=worker, args=(lock, i)) for i in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

Explanation
- Creating a Lock: A `Lock` is created to control access to the shared resource.
- Using the Lock: The worker acquires the lock using a context manager (`with` statement), ensuring that only one worker accesses the resource at a time.
Best Practices
- Avoid Global Variables: Since each process has its own memory space, a change to a global variable in one process is not visible in the others. Use IPC mechanisms for shared state instead.
- Use `if __name__ == '__main__'`: This is crucial to prevent recursive process creation on platforms that start child processes with the spawn method, such as Windows.
- Limit the Number of Processes: Spawning too many processes can lead to overhead. Use a pool of workers with `multiprocessing.Pool` for better resource management.
- Handle Exceptions: Ensure that exceptions in worker processes are handled properly, as an exception raised in a `Process` target is not re-raised in the main process.
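The last two practices can be sketched together. The following is a minimal illustration, assuming two hypothetical worker functions (`square` and `checked_sqrt`) and a pool size of 4. Unlike a bare `Process`, `Pool.map` re-raises a worker's exception in the main process when the results are collected, which makes failures easier to handle in one place:

```python
import multiprocessing

def square(x):
    """Hypothetical CPU-bound task."""
    return x * x

def checked_sqrt(x):
    """Hypothetical task that rejects negative input."""
    if x < 0:
        raise ValueError(f'negative input: {x}')
    return x ** 0.5

if __name__ == '__main__':
    # The pool reuses a fixed set of worker processes for all submitted items
    with multiprocessing.Pool(processes=4) as pool:
        print(pool.map(square, range(6)))  # [0, 1, 4, 9, 16, 25]

        try:
            pool.map(checked_sqrt, [4, -1, 9])
        except ValueError as exc:
            # The worker's exception surfaces here, in the main process
            print(f'Worker failed: {exc}')
```

Omitting the `processes` argument lets the pool pick a size based on the number of available CPUs, which is often a reasonable default for CPU-bound work.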
Conclusion
The multiprocessing module is a powerful tool for achieving concurrency in Python. By leveraging processes instead of threads, you can efficiently utilize CPU resources for CPU-bound tasks. Understanding how to create processes, manage IPC, and synchronize access to shared resources will help you write robust concurrent applications.
