Grouping and Counting Words in a String Using Stream Pipelines
This code snippet demonstrates how to use the Java 8 Streams API to split a string into words, group the words by value, and count the occurrences of each word. It showcases map, filter, collect, and Collectors.groupingBy.
Code Snippet
The code initializes a string and splits it into words, using whitespace as the delimiter. The map operation cleans each word by removing non-alphabetic characters and converting it to lowercase. The filter operation removes any empty strings left over from the cleaning step. Finally, the collect operation uses Collectors.groupingBy to group the words by value and count the occurrences of each one. The result is a Map<String, Long> whose keys are the words and whose values are their counts.
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class StreamGroupingExample {
    public static void main(String[] args) {
        String text = "This is a sample string with some words and this is a string.";

        Map<String, Long> wordCounts = Arrays.stream(text.split("\\s+")) // Split into words
                .map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase()) // Clean and lowercase words
                .filter(word -> !word.isEmpty()) // Remove empty strings after cleaning
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting())); // Group and count

        System.out.println(wordCounts);
        // Expected Output (order may vary):
        // {this=2, a=2, sample=1, string=2, with=1, some=1, words=1, is=2, and=1}
    }
}
Concepts Behind the Snippet
This snippet illustrates these concepts:
* flatMap: Not used in this snippet, but essential for flattening a stream of collections (or arrays) into a single stream of elements, for example when the input is several lines of text rather than one string.
* Collectors.groupingBy: A powerful collector that groups elements of a stream based on a classifier function.
* Function.identity(): A function that simply returns its input argument. Used here to group words based on their own values.
* Collectors.counting(): A collector that counts the number of elements in each group.
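Because the main snippet does not call flatMap itself, a minimal sketch of where it would fit may help. It assumes the input arrives as a list of lines instead of a single string; the class name FlatMapWordCount and the helper countWords are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class FlatMapWordCount {

    // Hypothetical helper: counts words across several lines of text.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // Each line yields its own stream of words; flatMap merges them into one stream
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase())
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("This is a line.", "This is another line.");
        System.out.println(countWords(lines));
    }
}
```

With map instead of flatMap, the pipeline would produce a stream of streams, and the grouping step would see one element per line rather than one per word.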
Real-Life Use Case
This pattern is commonly used in text processing and natural language processing applications to analyze the frequency of words in a document, identify popular topics, or build search indexes.
Best Practices
* Choose the result data structure (for example Map or Multiset) based on the specific requirements of your application.
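One way to pick the result structure is the three-argument overload of Collectors.groupingBy, which accepts a map factory. The sketch below assumes alphabetically sorted keys are wanted and therefore supplies a TreeMap; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class SortedWordCount {

    // Hypothetical helper: same pipeline as the main snippet, but collecting into a TreeMap.
    static Map<String, Long> countWordsSorted(String text) {
        return Arrays.stream(text.split("\\s+"))
                .map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase())
                .filter(word -> !word.isEmpty())
                // The second argument is the map factory; TreeMap keeps keys in natural order
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        String text = "This is a sample string with some words and this is a string.";
        System.out.println(countWordsSorted(text));
    }
}
```

Swapping TreeMap::new for LinkedHashMap::new would instead preserve the order in which words are first encountered.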
Interview Tip
Be prepared to discuss the different types of collectors available in the Collectors class. Also, be able to explain how to use custom collectors to perform more complex aggregations.
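As a talking point for custom collectors, here is a small sketch built with Collector.of. It is only an illustration: the collector longestWord and the grouping by first letter are assumptions, not something the original snippet does, and the same result could also be reached with Collectors.maxBy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class LongestWordCollector {

    // Hypothetical custom collector: keeps the longest word seen (first one wins on ties).
    static Collector<String, String[], String> longestWord() {
        return Collector.of(
                () -> new String[] {""},                 // supplier: one-slot mutable holder
                (holder, word) -> {                      // accumulator: keep the longer word
                    if (word.length() > holder[0].length()) {
                        holder[0] = word;
                    }
                },
                (left, right) ->                         // combiner: used for parallel streams
                        left[0].length() >= right[0].length() ? left : right,
                holder -> holder[0]);                    // finisher: unwrap the result
    }

    // Groups non-empty words by first letter and keeps the longest word in each group.
    static Map<Character, String> longestByInitial(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0), longestWord()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("string", "sample", "some", "this", "the");
        System.out.println(longestByInitial(words));
    }
}
```

The four arguments map directly to the parts of a collector worth naming in an interview: supplier, accumulator, combiner, and finisher.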
When to Use Them
Use this pattern when you need to group and aggregate data based on some criteria. It's particularly useful for counting occurrences, calculating averages, or performing other statistical analyses.
Memory Footprint
The memory footprint depends on the size of the input data and the number of distinct groups. Large datasets with many distinct groups may require significant memory.
Alternatives
* Using a traditional loop with a HashMap to count word frequencies is possible, but less concise.
* Third-party libraries such as Guava offer a Multiset, which simplifies counting occurrences.
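For comparison, the loop-based HashMap alternative might look like the sketch below. It applies the same cleaning rules as the stream version; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class LoopWordCount {

    // Hypothetical helper: counts words with an explicit loop instead of a stream pipeline.
    static Map<String, Long> countWords(String text) {
        Map<String, Long> counts = new HashMap<>();
        for (String raw : text.split("\\s+")) {
            String word = raw.replaceAll("[^a-zA-Z]", "").toLowerCase();
            if (!word.isEmpty()) {
                // merge inserts 1 for a new word, otherwise adds 1 to the existing count
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "This is a sample string with some words and this is a string.";
        System.out.println(countWords(text));
    }
}
```

The loop is slightly longer, but it can be easier to step through in a debugger.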
Pros
The Collectors.groupingBy collector is optimized for grouping and aggregation.
Cons
FAQ
What if I want to group by multiple criteria?
You can use nested groupingBy collectors or create a custom object that represents the combined criteria.
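A minimal sketch of the nested approach, assuming the two criteria are a word's first letter and its length (both chosen here purely for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NestedGroupingExample {

    // Hypothetical helper: groups non-empty words by first letter, then by length, and counts them.
    static Map<Character, Map<Integer, Long>> countByInitialAndLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        w -> w.charAt(0),                         // outer classifier
                        Collectors.groupingBy(
                                String::length,                   // inner classifier
                                Collectors.counting())));         // aggregation per inner group
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("this", "that", "the", "some", "string");
        System.out.println(countByInitialAndLength(words));
    }
}
```

The inner groupingBy is simply the downstream collector of the outer one, so the nesting can continue to further levels if needed.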
How can I sort the results by the count?
You can convert the Map to a stream of entries, sort the entries by value (count), and then collect the sorted entries into a new Map.
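One possible shape for that sort is sketched below. It assumes descending order by count with ties broken alphabetically, and it collects into a LinkedHashMap so the sorted order survives; the helper name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class SortByCountExample {

    // Hypothetical helper: returns a new map ordered by count (descending), then by key.
    static Map<String, Long> sortByCountDescending(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (first, second) -> first,   // merge function; keys are already unique
                        LinkedHashMap::new));       // preserves the sorted insertion order
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("sample", 1L);
        counts.put("this", 2L);
        counts.put("a", 2L);
        System.out.println(sortByCountDescending(counts));
    }
}
```

Collecting into a plain HashMap would discard the ordering, which is why the four-argument toMap overload with an explicit map supplier is used here.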
Can I use a different collector for aggregation other than counting()?
Yes, you can use other collectors like summingInt(), averagingDouble(), or even custom collectors to perform more complex aggregations within each group.
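As a rough illustration, the sketch below swaps counting() for summingInt() and averagingDouble(). Grouping by first letter and aggregating word lengths are assumptions made for the example, and the method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class OtherDownstreamCollectors {

    // Total number of characters among the non-empty words that share a first letter.
    static Map<Character, Integer> totalLengthByInitial(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        w -> w.charAt(0), Collectors.summingInt(String::length)));
    }

    // Average word length among the non-empty words that share a first letter.
    static Map<Character, Double> averageLengthByInitial(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        w -> w.charAt(0), Collectors.averagingDouble(String::length)));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("this", "the", "some", "string");
        System.out.println(totalLengthByInitial(words));
        System.out.println(averageLengthByInitial(words));
    }
}
```

Only the downstream collector changes between the two methods; the classifier and the rest of the pipeline stay the same.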