Python > Modules and Packages > Standard Library > Concurrency and Parallelism (`threading`, `multiprocessing` modules)
Basic Multiprocessing Example
This snippet showcases the multiprocessing module for parallel execution. It spawns two processes that each execute the square_numbers function, calculating the squares of numbers from 1 to 5. Unlike threading, multiprocessing bypasses the GIL, enabling true parallel processing on multi-core systems.
Code
The multiprocessing module creates separate processes, each with its own Python interpreter. The square_numbers function calculates the square of each number. We create two processes and start them. process.join() ensures the main program waits for the child processes to complete. This approach leverages multiple CPU cores for true parallel execution, which is beneficial for CPU-bound tasks.
import multiprocessing
import time

def square_numbers(process_id):
    for i in range(1, 6):
        time.sleep(0.1)  # Simulate some work
        print(f"Process {process_id}: {i} squared = {i*i}")

if __name__ == "__main__":
    # Create two processes
    process1 = multiprocessing.Process(target=square_numbers, args=(1,))
    process2 = multiprocessing.Process(target=square_numbers, args=(2,))

    # Start the processes
    process1.start()
    process2.start()

    # Wait for the processes to finish
    process1.join()
    process2.join()

    print("All processes finished.")
Concepts Behind the Snippet
- multiprocessing module: enables true parallelism in Python by creating a separate process for each task.
- multiprocessing.Process: creates a new process; you specify the target function and any arguments to pass to it.
- process.start(): starts the process's execution.
- process.join(): blocks the calling process (here, the main process) until the process whose join() method is called completes its execution.
Real-Life Use Case
Consider a scenario where you need to perform a computationally intensive task, such as image processing or scientific simulations. You can divide the task into smaller subtasks and assign each subtask to a separate process. This allows you to utilize all available CPU cores and significantly reduce the overall execution time. Machine Learning model training often uses multiprocessing to speed up data preprocessing.
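As a minimal sketch of the divide-and-conquer pattern described above, the snippet below splits a CPU-bound computation (a sum of squares, standing in for something heavier like image processing) across a pool of worker processes. The chunking scheme and the sum_of_squares helper are illustrative choices, not part of the original snippet.

```python
import multiprocessing

def sum_of_squares(chunk):
    """CPU-bound work: sum of squares over one chunk of numbers."""
    return sum(n * n for n in chunk)

if __name__ == "__main__":
    numbers = list(range(1, 1001))
    # Split the work into four interleaved chunks, one per worker.
    chunks = [numbers[i::4] for i in range(4)]
    with multiprocessing.Pool(processes=4) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    total = sum(partial_sums)
    print(f"Sum of squares 1..1000 = {total}")  # 333833500
```

Each chunk is processed on its own core, and the cheap final reduction (summing four partial results) happens in the parent process.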
Best Practices
- Because multiprocessing avoids direct memory sharing, be careful with external shared resources such as files or databases; coordinate access to them to prevent corruption.
- To pass data between processes, use the multiprocessing.Queue class: it is a thread-safe and process-safe queue.
- Avoid multiprocessing for very short or simple tasks, as the process-creation overhead might outweigh the benefits of parallelism.
- For many similar tasks, consider the multiprocessing.Pool class, which provides a convenient way to distribute work among a pool of worker processes.
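The Queue-based coordination mentioned above can be sketched as follows; the worker function and the None sentinel protocol are illustrative assumptions, not a fixed API.

```python
import multiprocessing

def worker(worker_id, queue):
    """Compute a few squares and send each result back through the queue."""
    for i in range(1, 4):
        queue.put((worker_id, i, i * i))
    queue.put(None)  # Sentinel: this worker is done.

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=worker, args=(wid, queue))
               for wid in (1, 2)]
    for p in workers:
        p.start()

    # Drain results until every worker has sent its sentinel.
    finished = 0
    while finished < len(workers):
        item = queue.get()
        if item is None:
            finished += 1
        else:
            wid, n, sq = item
            print(f"Worker {wid}: {n} squared = {sq}")

    for p in workers:
        p.join()
```

Because the queue is both the data channel and the completion signal, the parent never has to guess when output is done.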
Interview Tip
Understand the difference between threading and multiprocessing, and when to use each. Explain how multiprocessing overcomes the GIL limitation. Discuss common IPC mechanisms and the challenges of managing communication between processes.
When to Use Multiprocessing
Multiprocessing is ideal for CPU-bound tasks that can be divided into independent subtasks. Examples include numerical computations, image processing, video encoding, and parallel data analysis. If your code is primarily waiting for I/O, then threading or asyncio might be more appropriate.
Memory Footprint
Processes have a larger memory footprint compared to threads because each process has its own memory space. This can be a concern if you are creating a large number of processes or dealing with large datasets.
Alternatives
- threading: suitable for I/O-bound tasks, where the GIL is less of a bottleneck.
- asyncio: for asynchronous programming and event-driven concurrency; especially useful for handling many concurrent I/O operations efficiently.
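To illustrate why threading suits I/O-bound work, here is a small sketch in which time.sleep stands in for a network round trip (the fetch helper and its 0.2 s delay are assumptions for the demo). While a thread sleeps on I/O, the GIL is released, so the waits overlap.

```python
import threading
import time

def fetch(url_id, results):
    """Simulate an I/O-bound request; the GIL is released while sleeping."""
    time.sleep(0.2)  # Stand-in for a network round trip.
    results[url_id] = f"response-{url_id}"

if __name__ == "__main__":
    results = {}
    start = time.perf_counter()
    threads = [threading.Thread(target=fetch, args=(i, results))
               for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    # The five 0.2 s "requests" overlap, so this takes ~0.2 s, not ~1 s.
    print(f"Fetched {len(results)} responses in {elapsed:.2f}s")
```

The same workload under multiprocessing would pay process-startup and IPC costs for no gain, since the bottleneck is waiting, not computing.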
Pros
- True parallelism on multi-core systems: each process has its own interpreter, so the GIL does not serialize execution.
- Well suited to CPU-bound workloads such as numerical computation and image processing.
Cons
- Larger memory footprint than threads, since each process has its own memory space.
- Process creation and inter-process communication add overhead, which can outweigh the gains for short or simple tasks.
- Data must be passed explicitly via IPC (queues, pipes, shared memory) rather than shared directly.
FAQ
- How does multiprocessing overcome the GIL limitation?
Each process created by multiprocessing has its own Python interpreter and its own memory space. This means that the GIL in one process does not affect other processes, and each process can execute Python bytecode in parallel on a different CPU core.
- How can I share data between processes?
You can share data between processes using IPC mechanisms such as queues (multiprocessing.Queue), pipes (multiprocessing.Pipe), or shared memory (multiprocessing.Value, multiprocessing.Array). Queues are the most common and convenient way to pass data between processes.
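For the shared-memory option mentioned above, here is a minimal sketch using multiprocessing.Value; the add_to_counter helper is illustrative. The read-modify-write of the counter must be guarded with the value's lock, otherwise concurrent increments can be lost.

```python
import multiprocessing

def add_to_counter(counter, amount):
    """Increment a shared counter; get_lock() guards the read-modify-write."""
    for _ in range(amount):
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = multiprocessing.Value("i", 0)  # Shared C int, initially 0.
    procs = [multiprocessing.Process(target=add_to_counter,
                                     args=(counter, 1000))
             for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 2000: the lock prevents lost updates.
```

Shared memory avoids the serialization cost of a queue, but it reintroduces the locking discipline that multiprocessing otherwise lets you sidestep, so prefer queues unless profiling says otherwise.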
- What is a multiprocessing.Pool, and when should I use it?
A multiprocessing.Pool is a collection of worker processes that tasks can be distributed among. Use a Pool when you have a large number of tasks to perform and want to manage the processes efficiently; the Pool class handles the creation, management, and distribution of tasks among the worker processes.
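As a small sketch of the Pool workflow, the snippet below submits tasks with apply_async and collects the results; the cube function and pool size are illustrative choices.

```python
import multiprocessing

def cube(n):
    """A trivial stand-in for a CPU-bound task."""
    return n ** 3

if __name__ == "__main__":
    with multiprocessing.Pool(processes=3) as pool:
        # Submit tasks asynchronously; each get() blocks until that
        # result is ready, so collecting in order preserves order.
        async_results = [pool.apply_async(cube, (n,)) for n in range(1, 6)]
        cubes = [r.get() for r in async_results]
    print(cubes)  # [1, 8, 27, 64, 125]
```

Using the Pool as a context manager ensures the worker processes are cleaned up when the block exits, even if a task raises.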