Introduction
Removing duplicate words from a string is a common task in text processing, particularly when you’re dealing with user input, data cleaning, or preparing text for analysis. Duplicates can often clutter data, leading to inaccuracies in analysis or display. Java 8 provides a powerful and concise way to remove duplicate words using Streams. In this guide, we’ll explore how to create a Java program that removes duplicate words from a string using Java 8 Streams.
Problem Statement
The task is to create a Java program that:
- Accepts a string as input.
- Uses Java 8 Streams to remove duplicate words from the string.
- Outputs the string with duplicates removed.
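Example 1:
- Input: "This is is a test test string."
- Output: "This is a test string."
Example 2:
- Input: "Java Java is powerful powerful."
- Output: "Java is powerful powerful." (the final "powerful." carries a period, so a plain whitespace split treats it as a different word from "powerful")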
Solution Steps
- Input String: Start with a string that can either be hardcoded or provided by the user.
- Normalize and Split the String: Optionally convert the string to lowercase, then use the split() method to break the string into individual words.
- Stream Processing: Convert the array of words into a stream, remove duplicates using distinct(), and then join the words back into a single string.
- Display the Result: Print the string with duplicates removed.
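The steps above can be traced one at a time to see what each stage produces. The sketch below is only an illustration: the class name DuplicateWordSteps and its three helper methods are hypothetical names chosen for this walkthrough, and the optional lowercase step is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class that exposes each step separately for inspection.
class DuplicateWordSteps {

    // Split the input on runs of whitespace.
    static String[] splitWords(String input) {
        return input.split("\\s+");
    }

    // Keep the first occurrence of each word, preserving order.
    static List<String> distinctWords(String[] words) {
        return Arrays.stream(words)
                .distinct()
                .collect(Collectors.toList());
    }

    // Join the remaining words with single spaces.
    static String joinWords(List<String> words) {
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String input = "This is is a test test string.";

        String[] words = splitWords(input);
        System.out.println(Arrays.toString(words));
        // [This, is, is, a, test, test, string.]

        List<String> unique = distinctWords(words);
        System.out.println(unique);
        // [This, is, a, test, string.]

        System.out.println(joinWords(unique));
        // This is a test string.
    }
}
```

Splitting the pipeline this way is mainly useful for debugging. A single chained stream expression does the same work without the intermediate variables.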
Java Program
Java 8: Remove Duplicate Words from a String
import java.util.Arrays;
import java.util.stream.Collectors;

/**
 * Java 8: Remove Duplicate Words from a String
 * Author: https://www.rameshfadatare.com/
 */
public class RemoveDuplicateWords {

    public static void main(String[] args) {
        // Step 1: Take input string
        String input = "This is is a test test string.";

        // Step 2: Remove duplicate words using streams
        String result = removeDuplicateWords(input);

        // Step 3: Display the result
        System.out.println(result);
    }

    // Method to remove duplicate words from a string
    public static String removeDuplicateWords(String input) {
        return Arrays.stream(input.split("\\s+"))
                .distinct()
                .collect(Collectors.joining(" "));
    }
}
Explanation of the Program
- Input Handling: The program uses the string "This is is a test test string." as an example input. This can be modified to accept input from the user if required.
- Splitting the String: The split("\\s+") method splits the string into words, where \\s+ is a regular expression (written as a Java string literal) that matches one or more whitespace characters.
- Removing Duplicates: The distinct() method keeps the first occurrence of each word in the stream and drops every later repeat, whether or not the repeats are adjacent.
- Joining the Words: The Collectors.joining(" ") method joins the distinct words back into a single string, with each word separated by a space.
- Output: The program prints the string with duplicate words removed.
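Accepting input from the user, as mentioned under Input Handling, could look something like the sketch below. It is one possible approach, not part of the original program: the class name RemoveDuplicateWordsFromInput and the readAndClean method are hypothetical, and the added trim() call is an assumption made here so that leading whitespace does not produce an empty first token.

```java
import java.io.InputStream;
import java.util.Arrays;
import java.util.Scanner;
import java.util.stream.Collectors;

// Hypothetical variant that reads the sentence from an input stream.
class RemoveDuplicateWordsFromInput {

    // Same stream pipeline as the main program, with a trim() added
    // (an assumption) to ignore leading and trailing whitespace.
    static String removeDuplicateWords(String input) {
        return Arrays.stream(input.trim().split("\\s+"))
                .distinct()
                .collect(Collectors.joining(" "));
    }

    // Reads one line from the stream and returns it with duplicates removed.
    // Returns an empty string when no line is available.
    static String readAndClean(InputStream in) {
        Scanner scanner = new Scanner(in);
        if (!scanner.hasNextLine()) {
            return "";
        }
        return removeDuplicateWords(scanner.nextLine());
    }

    public static void main(String[] args) {
        System.out.print("Enter a sentence: ");
        System.out.println(readAndClean(System.in));
    }
}
```

Taking an InputStream parameter rather than reading System.in directly keeps the method easy to exercise in a test, for example with a ByteArrayInputStream.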
Output Example
Example 1:
Input: This is is a test test string.
Output: This is a test string.
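Example 2:
Input: Java Java is powerful powerful.
Output: Java is powerful powerful.
The last two tokens, "powerful" and "powerful.", differ by the trailing period, so distinct() keeps both.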
Advanced Considerations
- Case Sensitivity: The program is case-sensitive by default, meaning "Java" and "java" would be considered different words. If you want the removal process to be case-insensitive, you can convert the string to lowercase before splitting it by calling input.toLowerCase(). Note that this also lowercases the words in the output.
- Handling Punctuation: The program removes duplicate words but does not strip punctuation attached to words, so "test" and "test." are treated as different words. If you want them treated as the same word, you need to preprocess the string or normalize each word before comparing.
- Performance Considerations: distinct() has to remember the words it has already seen, so memory use grows with the number of unique words and the work grows roughly linearly with the number of words. For typical string lengths this is not a concern, and the stream pipeline keeps the code short and readable.
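The case and punctuation considerations can be handled together by comparing words on a normalized key while keeping the original spelling of the first occurrence. The sketch below is one way to do that under stated assumptions, not the article's method: the class NormalizedDuplicateRemover and the key method are hypothetical names, and the normalization rule (lowercase plus stripping leading and trailing ASCII punctuation) is a choice made for this example. It swaps distinct() for a filter backed by a HashSet, which is only safe on a sequential stream.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical variant that ignores case and surrounding punctuation
// when deciding whether two words are duplicates.
class NormalizedDuplicateRemover {

    // Comparison key: lowercase, with leading and trailing punctuation removed.
    static String key(String word) {
        return word.toLowerCase(Locale.ROOT)
                .replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
    }

    static String removeDuplicateWords(String input) {
        // Tracks the keys seen so far; Set.add returns false for repeats.
        // This stateful filter assumes the stream stays sequential.
        Set<String> seen = new HashSet<>();
        return Arrays.stream(input.trim().split("\\s+"))
                .filter(word -> seen.add(key(word)))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicateWords("Java java is powerful powerful."));
        // Java is powerful
    }
}
```

Because the first occurrence is the one that survives, the trailing period in this example is lost along with the dropped "powerful." token. If sentence-final punctuation matters, it would need to be detached before de-duplication and re-attached afterwards.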
Conclusion
This Java 8 program efficiently removes duplicate words from a string using streams. By leveraging the power of the Stream API, the solution is both concise and powerful, making it suitable for various text processing tasks. Whether you’re cleaning up user input, preparing data for analysis, or improving text readability, this method provides an effective approach to removing duplicates in Java.