Java tutorials > Core Java Fundamentals > Data Structures and Collections > What are the implementations of `Set` (HashSet, TreeSet, LinkedHashSet)?

What are the implementations of `Set` (HashSet, TreeSet, LinkedHashSet)?

This tutorial explains the different implementations of the `Set` interface in Java: `HashSet`, `TreeSet`, and `LinkedHashSet`. It covers their unique characteristics, performance implications, and appropriate use cases. Understanding these implementations is crucial for efficient data management and algorithm design.

Introduction to the Set Interface

The `Set` interface in Java's Collections Framework represents a collection that contains no duplicate elements. More formally, sets contain no pair of elements `e1` and `e2` such that `e1.equals(e2)`, and at most one null element. The `Set` interface models the mathematical set abstraction. The Java Collections Framework provides several implementations of the `Set` interface, each with different performance characteristics and use cases.

HashSet: Unordered and Fast

`HashSet` is a class that implements the `Set` interface, backed by a hash table (actually a `HashMap` instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This implementation offers constant time performance for the basic operations (`add`, `remove`, `contains`, and `size`), assuming the hash function disperses the elements properly among the buckets. Iterating over a `HashSet` takes time proportional to the sum of the number of entries in the `HashSet` instance (the size) plus the number of “buckets” in the backing `HashMap` instance (the capacity).

import java.util.HashSet;
import java.util.Set;

public class HashSetExample {
    public static void main(String[] args) {
        Set<String> hashSet = new HashSet<>();
        hashSet.add("Apple");
        hashSet.add("Banana");
        hashSet.add("Orange");
        hashSet.add("Apple"); // Duplicate element, will be ignored

        System.out.println("HashSet: " + hashSet); // Output: HashSet: [Orange, Banana, Apple] (Order not guaranteed)

        boolean containsBanana = hashSet.contains("Banana");
        System.out.println("Contains Banana: " + containsBanana); // Output: Contains Banana: true

        hashSet.remove("Banana");
        System.out.println("HashSet after removing Banana: " + hashSet); // Output: HashSet after removing Banana: [Orange, Apple]
    }
}

TreeSet: Sorted Set

`TreeSet` is an implementation of the `Set` interface that uses a `TreeMap` for storage. The elements are stored in a sorted order, either according to their natural ordering (if the elements implement the `Comparable` interface) or according to a `Comparator` provided at the time of `TreeSet` creation. This makes `TreeSet` suitable for applications where elements need to be accessed in a sorted order. The `add`, `remove`, and `contains` operations offer `O(log n)` time complexity.

import java.util.TreeSet;
import java.util.Set;

public class TreeSetExample {
    public static void main(String[] args) {
        Set<String> treeSet = new TreeSet<>();
        treeSet.add("Apple");
        treeSet.add("Banana");
        treeSet.add("Orange");
        treeSet.add("Apple"); // Duplicate element, will be ignored

        System.out.println("TreeSet: " + treeSet); // Output: TreeSet: [Apple, Banana, Orange] (Elements are sorted)

        boolean containsBanana = treeSet.contains("Banana");
        System.out.println("Contains Banana: " + containsBanana); // Output: Contains Banana: true

        treeSet.remove("Banana");
        System.out.println("TreeSet after removing Banana: " + treeSet); // Output: TreeSet after removing Banana: [Apple, Orange]
    }
}

LinkedHashSet: Insertion Order

`LinkedHashSet` is a `HashSet` that also maintains a doubly-linked list running through all of its entries. This linked list defines the insertion order, which is the order in which elements were inserted into the set. The `LinkedHashSet` implementation differs from `HashSet` in that it maintains a predictable iteration order, which is the order in which elements were inserted into the set. This is particularly useful when you want to iterate through the elements in the order they were added. The `add`, `remove`, and `contains` methods have constant-time performance on average. Iterating over a `LinkedHashSet` requires time proportional to the size of the set, regardless of the capacity of the underlying hash table.

import java.util.LinkedHashSet;
import java.util.Set;

public class LinkedHashSetExample {
    public static void main(String[] args) {
        Set<String> linkedHashSet = new LinkedHashSet<>();
        linkedHashSet.add("Apple");
        linkedHashSet.add("Banana");
        linkedHashSet.add("Orange");
        linkedHashSet.add("Apple"); // Duplicate element, will be ignored

        System.out.println("LinkedHashSet: " + linkedHashSet); // Output: LinkedHashSet: [Apple, Banana, Orange] (Insertion order maintained)

        boolean containsBanana = linkedHashSet.contains("Banana");
        System.out.println("Contains Banana: " + containsBanana); // Output: Contains Banana: true

        linkedHashSet.remove("Banana");
        System.out.println("LinkedHashSet after removing Banana: " + linkedHashSet); // Output: LinkedHashSet after removing Banana: [Apple, Orange]
    }
}

Concepts Behind the Snippets

All three implementations ensure uniqueness of elements as enforced by the `Set` interface. `HashSet` uses hashing for fast lookups, but provides no ordering guarantee. `TreeSet` uses a tree structure for sorted storage. `LinkedHashSet` uses a hash table with a linked list to maintain insertion order. The choice depends on the specific needs of the application, particularly the importance of order and speed of operations.

Real-Life Use Case Section

Consider an e-commerce application. `HashSet` could be used to store unique product IDs in a user's shopping cart. `TreeSet` could be used to display a list of products sorted alphabetically by name. `LinkedHashSet` could be used to maintain the order in which products were added to a wishlist, useful for displaying 'recently added' items.

Best Practices

  • Choose the appropriate `Set` implementation based on your requirements for ordering and performance.
  • Use `HashSet` when order doesn't matter and speed is paramount.
  • Use `TreeSet` when elements need to be stored in a sorted order.
  • Use `LinkedHashSet` when you need to maintain the insertion order.
  • Consider the memory implications of each implementation, especially for large datasets.

Interview Tip

Be prepared to discuss the differences between `HashSet`, `TreeSet`, and `LinkedHashSet` in terms of their ordering, performance, and underlying data structures. Understand the trade-offs involved in choosing one implementation over another. Explain how these choices impact overall application performance. Be ready to provide use case examples for each.

When to use them

  • HashSet: Use when you need to store unique elements and don't care about the order. It provides the fastest performance for `add`, `remove`, and `contains` operations.
  • TreeSet: Use when you need to store unique elements and maintain them in a sorted order. It's slower than `HashSet` but provides the benefit of sorted iteration.
  • LinkedHashSet: Use when you need to store unique elements and maintain the order in which they were inserted. It provides a good balance between performance and order preservation.

Memory footprint

  • HashSet: Generally has a smaller memory footprint than `TreeSet` and `LinkedHashSet` due to its simple hash table implementation.
  • TreeSet: Has a larger memory footprint than `HashSet` due to the need to store the elements in a sorted tree structure.
  • LinkedHashSet: Has a slightly larger memory footprint than `HashSet` due to the additional overhead of maintaining the linked list for insertion order.

Alternatives

If you need a sorted set and the performance of `TreeSet` is not sufficient, consider using a sorted list (e.g., `ArrayList`) and maintaining its sorted order manually, although this requires more manual coding. For very large datasets, consider using external libraries that provide specialized sorted set implementations with better performance.

Pros and Cons - HashSet

Pros:

  • Fast `add`, `remove`, and `contains` operations (O(1) on average).
  • Simple implementation, generally lower memory overhead compared to `TreeSet` and `LinkedHashSet`.
Cons:
  • No guaranteed order of elements.

Pros and Cons - TreeSet

Pros:

  • Elements are stored in a sorted order.
  • Provides `O(log n)` time complexity for `add`, `remove`, and `contains` operations.
Cons:
  • Slower than `HashSet` for basic operations.
  • Requires elements to be comparable or a comparator to be provided.
  • Higher memory overhead than `HashSet`.

Pros and Cons - LinkedHashSet

Pros:

  • Maintains the insertion order of elements.
  • Provides constant-time performance for `add`, `remove`, and `contains` operations on average.
Cons:
  • Slightly higher memory overhead than `HashSet` due to the linked list.

FAQ

  • What happens if I add a duplicate element to a Set?

    The `Set` interface does not allow duplicate elements. If you attempt to add a duplicate element, the `add()` method will return `false`, and the set will remain unchanged. The duplicate element will not be added.
  • Which Set implementation should I use if I need to iterate through the elements in a specific order?

    If you need to iterate through the elements in the order they were inserted, use `LinkedHashSet`. If you need to iterate through the elements in sorted order, use `TreeSet`. `HashSet` does not guarantee any specific iteration order.
  • Are the Set implementations thread-safe?

    No, none of the standard `Set` implementations (`HashSet`, `TreeSet`, `LinkedHashSet`) are inherently thread-safe. If you need to use a set in a multithreaded environment, you should synchronize access to the set using external synchronization mechanisms (e.g., `Collections.synchronizedSet()`) or use a thread-safe concurrent set implementation (e.g., `ConcurrentSkipListSet` from the `java.util.concurrent` package).