Simple Worker Pool Implementation in Go
This code demonstrates a basic worker pool pattern in Go. It shows how to distribute tasks across a fixed number of worker goroutines for concurrent processing, using channels for task distribution and result collection.
Core Concepts Behind the Snippet
This snippet implements the worker pool pattern, a common concurrency design. It revolves around these core ideas:
1. Task queue: A channel (`jobs` in this case) that holds the tasks to be processed.
2. Workers: A fixed number of goroutines that continuously receive tasks from the queue and execute them.
3. Result channel: A channel (`results`) to collect the output from the workers.
4. Synchronization: Channels provide safe communication between the main goroutine and the worker goroutines, and a `sync.WaitGroup` ensures the program waits for all workers to complete before exiting.
Code Implementation
The code defines a `Job` struct representing a unit of work (calculating the square of a number) and a `Result` struct to hold the output. The `worker` function runs as a goroutine that repeatedly receives jobs from the `jobs` channel, processes them (calculating the square), and sends the results to the `results` channel. A `sync.WaitGroup` is used to signal when all workers have finished.

In the `main` function:
- A fixed number of workers are launched as goroutines.
- Jobs are sent to the `jobs` channel.
- The `jobs` channel is closed to signal to the workers that there are no more jobs.
- A goroutine is launched to wait for all workers to complete using `wg.Wait()` and then close the `results` channel.
- The main goroutine receives results from the `results` channel and prints them.

Explanation of key parts:
- `jobs := make(chan Job, numJobs)`: Creates a buffered channel for jobs. The buffer size `numJobs` prevents the main goroutine from blocking when sending jobs.
- `results := make(chan Result, numJobs)`: Creates a buffered channel for results.
- `close(jobs)`: Closing the `jobs` channel signals to the worker goroutines that no more jobs will be sent. This is crucial: it lets each worker's `range` loop terminate once the remaining jobs are drained, which in turn lets the workers signal completion to the `sync.WaitGroup`.
- `wg.Wait()`: Run in a separate goroutine, this ensures the `results` channel is closed only after every worker has finished, so the main goroutine's loop over `results` terminates cleanly.
package main

import (
	"fmt"
	"sync"
)

// Job represents a unit of work.
type Job struct {
	ID     int
	Number int
}

// Result represents the result of a job.
type Result struct {
	JobID  int
	Square int
}

func worker(jobs <-chan Job, results chan<- Result, wg *sync.WaitGroup) {
	defer wg.Done()
	for job := range jobs {
		square := job.Number * job.Number
		results <- Result{JobID: job.ID, Square: square}
		fmt.Printf("Worker processed job ID %d\n", job.ID)
	}
}

func main() {
	numJobs := 100
	numWorkers := 3

	jobs := make(chan Job, numJobs)
	results := make(chan Result, numJobs)

	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go worker(jobs, results, &wg)
	}

	// Send jobs.
	for i := 1; i <= numJobs; i++ {
		jobs <- Job{ID: i, Number: i}
	}
	close(jobs)

	// Close the results channel after all workers are done.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Collect results.
	for result := range results {
		fmt.Printf("Job ID: %d, Square: %d\n", result.JobID, result.Square)
	}
}
Real-Life Use Case
Worker pools are ideal for tasks that can be broken down into smaller, independent units of work and processed concurrently. Examples include:
- Image processing: Resizing or applying filters to multiple images.
- Data processing: Parsing and analyzing large datasets.
- Web crawling: Fetching and processing multiple web pages.
- API request handling: Making multiple API calls concurrently.
Best Practices
When using worker pools, consider these best practices:
- Choose the right number of workers: The optimal number depends on the nature of the tasks and the available hardware resources. Experiment to find the best balance.
- Handle errors gracefully: Workers should handle errors that occur during task processing and report them appropriately.
- Avoid blocking: Ensure that workers do not block indefinitely, as this can lead to deadlocks. Use timeouts or context cancellation when necessary.
- Use buffered channels: Buffered channels can improve performance by allowing the sender to continue without waiting for the receiver to be ready, up to the buffer's capacity.
- Consider using a library: Libraries like `github.com/gammazero/workerpool` provide more advanced features and abstractions for managing worker pools.
Interview Tip
During an interview, be prepared to discuss the trade-offs of using worker pools, such as the overhead of creating and managing goroutines versus the benefits of increased concurrency. Also, be ready to explain how channels and `sync.WaitGroup` are used for synchronization.
When to Use Them
Use worker pools when you have a large number of independent, potentially time-consuming tasks that need to be executed concurrently, and you want to limit the number of goroutines running at any given time to prevent resource exhaustion.
Memory Footprint
Worker pools introduce some memory overhead due to the goroutines and channels. The amount of memory used depends on the number of workers, the size of the jobs and results, and the buffer size of the channels. It's important to consider the memory footprint when designing worker pools, especially for high-throughput applications.
Alternatives
Alternatives to worker pools include:
- Naive goroutines: Launching a goroutine for each task can be simpler for small numbers of tasks, but it's less scalable and can lead to resource exhaustion if the number of tasks is large.
- Errgroup: The `errgroup` package (`golang.org/x/sync/errgroup`) provides a way to run a collection of goroutines and propagate errors. It's useful when you need to run multiple tasks concurrently and stop all tasks if any of them fail.
- Dataflow frameworks: For complex data processing pipelines, consider using dataflow frameworks like Apache Beam or Apache Kafka Streams.
Pros
The pros of using worker pools include:
- Increased concurrency and throughput.
- Resource control (limiting the number of goroutines).
- Improved responsiveness.
Cons
The cons of using worker pools include:
- Increased complexity.
- Potential for deadlocks if not implemented carefully.
- Overhead of creating and managing goroutines and channels.
FAQ
What happens if I don't close the `jobs` channel?
If you don't close the `jobs` channel, the worker goroutines will block indefinitely waiting for more jobs. This prevents them from exiting and signaling completion to the `sync.WaitGroup`, so `wg.Wait()` never returns and the program deadlocks.
Why use a buffered channel for `jobs` and `results`?
Buffered channels provide a degree of decoupling between the sender and receiver. This can improve performance because the sender doesn't have to wait for the receiver to be ready for every message; it can keep sending as long as there is space in the buffer.
How do I handle errors in the worker goroutines?
You can add error handling logic within the worker function. One approach is to add an `error` field to the `Result` struct and send any errors that occur during processing to the `results` channel. The main goroutine can then check for errors when processing the results.