What are synchronization primitives?
Synchronization primitives are essential tools in concurrent programming that allow multiple threads or processes to access shared resources safely. They prevent race conditions and ensure data consistency. In Python, the threading and multiprocessing modules provide several synchronization primitives such as locks, semaphores, conditions, and barriers.
Core Concept: Synchronization Primitives
Synchronization primitives are mechanisms designed to coordinate the execution of multiple threads or processes that are accessing shared resources. Without them, concurrent access can lead to data corruption or unexpected behavior. These primitives manage access in a way that enforces mutual exclusion (only one thread/process can access a resource at a time) or coordination between threads/processes (e.g., signaling when a resource becomes available).
Lock: Mutual Exclusion
A Lock is the most basic synchronization primitive. It allows only one thread to acquire the lock at a time, preventing other threads from entering the critical section until the lock is released. The acquire() method blocks until the lock is free, and the release() method releases the lock, allowing another waiting thread to acquire it. Wrapping the critical section in a try...finally block ensures the lock is always released, even if an exception occurs, which prevents other threads from blocking forever on a lock that is never released.
import threading

lock = threading.Lock()
shared_resource = 0

def increment():
    global shared_resource
    for _ in range(100000):
        lock.acquire()
        try:
            shared_resource += 1
        finally:
            lock.release()

thread1 = threading.Thread(target=increment)
thread2 = threading.Thread(target=increment)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print(f'Final shared resource value: {shared_resource}')  # Expected: 200000
Semaphore: Controlling Access to Limited Resources
A Semaphore manages access to a limited number of resources. It maintains an internal counter: each acquire() call decrements the counter, and if the counter is zero the calling thread blocks until another thread calls release(), which increments the counter. In the example below, the semaphore is initialized to 2, so at most two threads can access the resource simultaneously. This is useful when you have a limited pool of connections or licenses.
import threading
import time

semaphore = threading.Semaphore(2)  # Only allow 2 threads to access the resource simultaneously

def access_resource(thread_id):
    semaphore.acquire()
    try:
        print(f'Thread {thread_id}: Accessing the resource')
        time.sleep(1)  # Simulate resource usage
        print(f'Thread {thread_id}: Releasing the resource')
    finally:
        semaphore.release()

threads = []
for i in range(5):
    t = threading.Thread(target=access_resource, args=(i,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()
Condition: Signaling between Threads
A Condition allows threads to wait for a specific condition to become true. Threads must acquire the condition's underlying lock before waiting or notifying. The wait() method releases the lock and blocks until another thread calls notify() or notify_all(); notify() wakes up one waiting thread, while notify_all() wakes up all waiting threads. In the example below, the consumer waits until the producer adds an item to the buffer, demonstrating inter-thread communication.
import threading
import time

condition = threading.Condition()
buffer = []

def consumer():
    with condition:
        print('Consumer: Waiting for item...')
        condition.wait()  # Release the lock and wait to be notified
        print('Consumer: Consuming item:', buffer.pop())

def producer():
    with condition:
        print('Producer: Producing item...')
        buffer.append('Item')
        condition.notify()  # Notify one waiting thread

consumer_thread = threading.Thread(target=consumer)
producer_thread = threading.Thread(target=producer)
consumer_thread.start()
time.sleep(1)  # Ensure the consumer is waiting before the producer notifies
producer_thread.start()
consumer_thread.join()
producer_thread.join()
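The version above relies on the time.sleep(1) call so that the consumer is already waiting when the producer notifies. A more defensive variant, sketched below, uses wait_for() to re-check the buffer after every wakeup, so the consumer cannot be stranded by a notification that arrives early (the names robust_consumer and robust_producer are just for illustration).

import threading

condition = threading.Condition()
buffer = []

def robust_consumer():
    with condition:
        # wait_for() re-evaluates the predicate after each wakeup and only
        # returns once the buffer actually contains an item.
        condition.wait_for(lambda: len(buffer) > 0)
        print('Consumer: Consuming item:', buffer.pop())

def robust_producer():
    with condition:
        buffer.append('Item')
        condition.notify()

consumer_thread = threading.Thread(target=robust_consumer)
producer_thread = threading.Thread(target=robust_producer)
consumer_thread.start()
producer_thread.start()  # No sleep needed: the predicate guards the wait
consumer_thread.join()
producer_thread.join()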
Barrier: Waiting for Multiple Threads
A Barrier allows multiple threads to wait until all of them have reached a certain point in their execution. The wait() method blocks until the specified number of threads have called it; once all threads have reached the barrier, they are released simultaneously. This is useful when you need to synchronize the execution of multiple threads, such as in parallel computations where results from different threads must be combined at a certain stage.
import threading
import time

barrier = threading.Barrier(3)  # Wait for 3 threads to reach the barrier

def worker(thread_id):
    print(f'Thread {thread_id}: Working...')
    time.sleep(thread_id)  # Simulate different amounts of work
    print(f'Thread {thread_id}: Reaching the barrier...')
    barrier.wait()
    print(f'Thread {thread_id}: Passed the barrier!')

threads = []
for i in range(3):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()
Real-Life Use Case: Thread-Safe Data Structures
Synchronization primitives are crucial when creating thread-safe data structures. By using locks, you can ensure that only one thread can access and modify the data structure at a time, preventing data corruption. This example demonstrates a thread-safe list that uses a lock to protect its internal state from concurrent access.
import threading

class ThreadSafeList:
    def __init__(self):
        self._list = []
        self._lock = threading.Lock()

    def append(self, item):
        with self._lock:
            self._list.append(item)

    def get(self, index):
        with self._lock:
            return self._list[index]

# Example usage
my_list = ThreadSafeList()

def add_items():
    for i in range(1000):
        my_list.append(i)

thread1 = threading.Thread(target=add_items)
thread2 = threading.Thread(target=add_items)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print(f'List length: {len(my_list._list)}')  # Expected: 2000
Best Practices
Use try...finally blocks to ensure that locks are always released, even if exceptions occur.
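Equivalently, a lock can be used as a context manager: a with statement acquires the lock on entry and guarantees release on exit, which is usually shorter than the explicit try...finally shown earlier. A minimal sketch:

import threading

lock = threading.Lock()
counter = 0

def increment():
    global counter
    for _ in range(100000):
        with lock:  # Acquired here, released automatically even on exceptions
            counter += 1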
Interview Tip
When discussing synchronization primitives in an interview, be prepared to explain the different types of primitives, their use cases, and the problems they solve. Be able to describe scenarios where race conditions could occur and how specific primitives prevent them. Also be ready to discuss the downsides of synchronization, such as the potential for deadlocks and performance overhead.
When to Use Them
Use synchronization primitives whenever multiple threads or processes need to access shared resources concurrently. This is especially important when modifying shared data, as race conditions can lead to unpredictable and incorrect results. Consider using them when dealing with shared memory, file access, network connections, or any other resource that could be accessed by multiple threads simultaneously.
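For example, a log file shared by several threads is one such resource. The sketch below (the file name app.log and the log_message helper are made up for illustration) serializes writes with a lock so that lines from different threads do not interleave.

import threading

log_lock = threading.Lock()

def log_message(message):
    # Only one thread writes to the file at a time
    with log_lock:
        with open('app.log', 'a') as f:
            f.write(message + '\n')

threads = [threading.Thread(target=log_message, args=(f'event {i}',)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()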
Memory Footprint
The memory footprint of synchronization primitives is generally small; a lock or semaphore needs only a small amount of memory to store its state. However, heavy use of synchronization can indirectly increase memory consumption, for example when blocked threads force you to create additional threads or processes to keep up with the workload. Proper design and careful use keep this overhead minimal.
Alternatives
While synchronization primitives are fundamental, there are alternative approaches to concurrency that can sometimes avoid the need for explicit locking. These include message passing through thread-safe queues (queue.Queue), asynchronous programming with asyncio, and designs based on immutable or thread-local data.
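One common option is to hand data between threads through queue.Queue, whose put() and get() methods are already synchronized internally, so no explicit lock is needed. A minimal producer/consumer sketch (the None sentinel is just a convention used here to signal completion):

import queue
import threading

work_queue = queue.Queue()  # Internally synchronized; no explicit lock required

def producer():
    for i in range(5):
        work_queue.put(i)
    work_queue.put(None)  # Sentinel value telling the consumer to stop

def consumer():
    while True:
        item = work_queue.get()
        if item is None:
            break
        print(f'Consumed {item}')

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start()
c.start()
p.join()
c.join()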
FAQ
What is a race condition?
A race condition occurs when multiple threads or processes access and modify shared data concurrently, and the final outcome depends on the unpredictable order in which the threads execute. This can lead to data corruption or incorrect results.
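A minimal demonstration, assuming two threads increment a shared counter without any lock: because += is a read-modify-write sequence, updates from the two threads can interleave and some increments are lost.

import threading

counter = 0

def unsafe_increment():
    global counter
    for _ in range(100000):
        counter += 1  # Not atomic: read, add, and store can interleave

t1 = threading.Thread(target=unsafe_increment)
t2 = threading.Thread(target=unsafe_increment)
t1.start()
t2.start()
t1.join()
t2.join()
# Can print less than 200000; the result varies between runs and interpreter versions
print(f'Counter: {counter}')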
How can I prevent deadlocks?
Deadlocks can be prevented by ensuring that threads acquire locks in a consistent order, avoiding circular dependencies, and using timeouts when acquiring locks. The resource hierarchy approach, in which every thread acquires locks in the same fixed order, is one well-known method.
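As a sketch of consistent lock ordering, assume two locks lock_a and lock_b (hypothetical names): if every thread acquires lock_a before lock_b, a circular wait cannot form. Acquiring with a timeout is another way to avoid blocking forever.

import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def ordered_worker():
    # Every thread takes lock_a first, then lock_b, so no cycle can form.
    with lock_a:
        with lock_b:
            pass  # ... work with both resources ...

def timeout_worker():
    # Give up instead of waiting forever if the lock cannot be acquired.
    if lock_a.acquire(timeout=1):
        try:
            pass  # ... work ...
        finally:
            lock_a.release()
    else:
        print('Could not acquire lock_a within 1 second')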
What is the difference between a lock and a semaphore?
A lock allows only one thread to access a resource at a time, while a semaphore allows a specified number of threads to access a resource concurrently. A lock is like a key to a single restroom, while a semaphore is like a permit allowing a certain number of people into an amusement park ride.
When should I use a condition variable?
Use a condition variable when you need to signal between threads that a specific condition has become true. For example, you can use a condition variable to signal to a consumer thread that data is available in a buffer.