Introduction
Counting duplicate words in a string is a common task in text analysis, data processing, and natural language processing. It is useful for cleaning up user input, analyzing text for patterns, or optimizing search algorithms. Java 8 Streams let you express the counting logic as a short pipeline. This guide walks through a Java program that counts the number of duplicate words in a string using Java 8 Streams.
Problem Statement
The task is to create a Java program that:
- Accepts a string as input.
- Uses Java 8 Streams to count how many times each word appears.
- Outputs the number of distinct words that appear more than once in the string.
Example 1:
- Input: "This is a test. This test is simple."
- Output: Number of Duplicate Words: 3
Example 2:
- Input: "Java is fun and Java is powerful."
- Output: Number of Duplicate Words: 2
Solution Steps
- Input String: Start with a string that can either be hardcoded or provided by the user.
- Normalize and Split the String: Convert the string to lowercase (for case-insensitivity) and use the split() method to break the string into individual words.
- Count Word Occurrences: Convert the array of words into a stream, and use Collectors.groupingBy to count the occurrences of each word.
- Filter and Count Duplicates: Filter the map to retain only words that appear more than once and count them.
- Display the Result: Print the number of duplicate words.
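The steps above can be sketched with each intermediate result made visible. This is a minimal illustration, not the program from this guide: the class name DuplicateWordSteps and both helper methods are hypothetical, and a TreeMap is swapped in for the default map only so that the printed order is stable.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
final class DuplicateWordSteps {

    // Normalize, split, and count: lowercase the input, split on non-word
    // characters, then group identical words and count each group.
    static Map<String, Long> wordCounts(String input) {
        String[] words = input.toLowerCase().split("\\W+");
        return Arrays.stream(words)
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        TreeMap::new,            // assumption: sorted map for readable output
                        Collectors.counting()));
    }

    // Filter and count: keep only the counts greater than one and count them.
    static long duplicateCount(Map<String, Long> counts) {
        return counts.values().stream()
                .filter(n -> n > 1)
                .count();
    }

    public static void main(String[] args) {
        Map<String, Long> counts = wordCounts("Java is fun and Java is powerful.");
        System.out.println(counts);                 // {and=1, fun=1, is=2, java=2, powerful=1}
        System.out.println(duplicateCount(counts)); // 2
    }
}
```

Printing the map makes it easy to see which words contribute to the final count: here only "is" and "java" appear more than once.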
Java Program
Java 8 Program to Count the Number of Duplicate Words in a String
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/**
 * Java 8 Program to Count the Number of Duplicate Words in a String
 * Author: https://www.rameshfadatare.com/
 */
public class DuplicateWordCounter {

    public static void main(String[] args) {
        // Step 1: Take input string
        String input = "This is a test. This test is simple.";

        // Step 2: Count the number of duplicate words using streams
        long duplicateWordCount = countDuplicateWords(input);

        // Step 3: Display the result
        System.out.println("Number of Duplicate Words: " + duplicateWordCount);
    }

    // Method to count the number of duplicate words in a string
    public static long countDuplicateWords(String input) {
        Map<String, Long> wordCountMap = Arrays.stream(input.toLowerCase().split("\\W+"))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return wordCountMap.entrySet().stream()
                .filter(entry -> entry.getValue() > 1)
                .count();
    }
}
Explanation of the Program
- Input Handling: The program uses the string "This is a test. This test is simple." as an example input. This can be modified to accept input from the user if required.
- Normalization: The input string is converted to lowercase using toLowerCase() to ensure case-insensitive word counting.
- Splitting the String: The split("\\W+") method splits the string into words. The \\W+ regular expression matches any run of non-word characters, that is, anything other than ASCII letters, digits, and the underscore. Spaces and punctuation therefore act as separators, but so do apostrophes and non-ASCII letters, so "don't" is split into "don" and "t". If the string starts with a separator, the resulting array begins with one empty string; it occurs only once, so it does not affect the duplicate count.
- Counting Words: The Collectors.groupingBy(Function.identity(), Collectors.counting()) collector groups identical words, counts the occurrences of each word, and stores the results in a map.
- Filtering and Counting Duplicates: The program filters the entries of the map to retain only those words that appear more than once and counts how many such words exist.
- Output: The program prints the number of distinct words that appear more than once.
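The Input Handling note mentions accepting input from the user. One way this might look is sketched below, assuming a single line is read from standard input with Scanner. The class name InteractiveDuplicateWordCounter and the helper countFromInput are hypothetical names introduced for this example; the counting logic is the same as in the program above.

```java
import java.io.InputStream;
import java.util.Arrays;
import java.util.Scanner;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
final class InteractiveDuplicateWordCounter {

    // Same counting logic as the main program: lowercase, split, group, filter.
    static long countDuplicateWords(String input) {
        return Arrays.stream(input.toLowerCase().split("\\W+"))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .values().stream()
                .filter(count -> count > 1)
                .count();
    }

    // Hypothetical helper: reads one line from the given stream and counts
    // its duplicate words. If no line is available, the input is treated as empty.
    static long countFromInput(InputStream in) {
        Scanner scanner = new Scanner(in);
        String line = scanner.hasNextLine() ? scanner.nextLine() : "";
        return countDuplicateWords(line);
    }

    public static void main(String[] args) {
        System.out.print("Enter a sentence: ");
        System.out.println("Number of Duplicate Words: " + countFromInput(System.in));
    }
}
```

Taking an InputStream rather than reading System.in directly is a design choice made here so the helper can be exercised with any stream, for example a ByteArrayInputStream in a test.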
Output Example
Example 1:
Input: This is a test. This test is simple.
Output: Number of Duplicate Words: 3
Example 2:
Input: Java is fun and Java is powerful.
Output: Number of Duplicate Words: 2
Advanced Considerations
- Case Sensitivity: The program is case-insensitive because of the toLowerCase() normalization. If case sensitivity is required, you can remove this step.
- Handling Punctuation: The program uses \\W+ in the split() method to treat punctuation as a separator so that only words are counted. This can be modified to include or exclude specific characters as needed.
- Performance Considerations: The program makes one pass over the input to split and count the words, then one pass over the distinct words to filter them. Running time therefore grows roughly linearly with the length of the input, and the map holds one entry per distinct word. This is efficient for typical string lengths, and the use of streams and collectors keeps the counting logic clear and concise.
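The first two considerations can be combined in one variant. The following is a sketch under stated assumptions, not part of the original program: the class name ConfigurableDuplicateWordCounter, the caseSensitive flag, and the separator pattern are all choices made for this example. The pattern treats Unicode letters, decimal digits, and apostrophes as word characters, and Locale.ROOT is passed to toLowerCase() so that lowercasing does not depend on the default locale.

```java
import java.util.Locale;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
final class ConfigurableDuplicateWordCounter {

    // Assumed separator: any run of characters that are not Unicode letters,
    // decimal digits, or apostrophes. Adjust the character class as needed.
    private static final Pattern SEPARATORS = Pattern.compile("[^\\p{L}\\p{Nd}']+");

    static long countDuplicateWords(String input, boolean caseSensitive) {
        // Skip the lowercase step when case-sensitive counting is requested.
        String normalized = caseSensitive ? input : input.toLowerCase(Locale.ROOT);
        return SEPARATORS.splitAsStream(normalized)
                .filter(word -> !word.isEmpty())   // drop the empty token from a leading separator
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .values().stream()
                .filter(count -> count > 1)
                .count();
    }

    public static void main(String[] args) {
        // \u00e9 is the letter e with an acute accent, written as an escape
        // so the source file does not depend on its encoding.
        String text = "Don't stop. don't STOP. Caf\u00e9 caf\u00e9.";
        System.out.println(countDuplicateWords(text, false)); // 3
        System.out.println(countDuplicateWords(text, true));  // 0
    }
}
```

With case-insensitive counting, "don't", "stop", and "café" each appear twice. With case-sensitive counting, every token differs in capitalization from its counterpart, so no word is a duplicate.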
Conclusion
This Java 8 program counts the number of duplicate words in a string using streams. The Stream API keeps the solution concise: one pipeline groups and counts the words, and a second filters for those that appear more than once. Whether you're analyzing text data, cleaning up user input, or working on language processing, this method provides an effective approach to identifying and counting duplicate words in Java.