Java 8 – Remove Duplicate Words from a String

Introduction

Removing duplicate words from a string is a common text-processing task, for example when cleaning user input or preparing text for analysis, where repeated words can skew counts or clutter the display. With Java 8 Streams, this takes only a few lines: split the string into words, drop the repeats with distinct(), and join what remains. In this guide, we’ll walk through a Java program that does exactly that.

Problem Statement

The task is to create a Java program that:

  • Accepts a string as input.
  • Uses Java 8 Streams to remove duplicate words from the string.
  • Outputs the string with duplicates removed.

Example 1:

  • Input: "This is is a test test string."
  • Output: "This is a test string."

Example 2:

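  • Input: "Java Java is powerful powerful."
  • Output: "Java is powerful powerful." (the final "powerful." carries a period, so it does not match the earlier "powerful"; see Handling Punctuation under Advanced Considerations)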

Solution Steps

  1. Input String: Start with a string that can either be hardcoded or provided by the user.
  2. Normalize and Split the String: Optionally convert the string to lowercase for case-insensitive matching (the program below skips this step), then use the split() method to break the string into individual words.
  3. Stream Processing: Convert the array of words into a stream, remove duplicates using distinct(), and then join the words back into a single string.
  4. Display the Result: Print the string with duplicates removed.

Java Program

import java.util.Arrays;
import java.util.stream.Collectors;

/**
 * Java 8: Remove Duplicate Words from a String
 * Author: https://www.rameshfadatare.com/
 */
public class RemoveDuplicateWords {

    public static void main(String[] args) {
        // Step 1: Take input string
        String input = "This is is a test test string.";

        // Step 2: Remove duplicate words using streams
        String result = removeDuplicateWords(input);

        // Step 3: Display the result
        System.out.println(result);
    }

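    // Removes repeated words, keeping the first occurrence of each in its original position.
    // Words are compared exactly, so case and attached punctuation matter.
    public static String removeDuplicateWords(String input) {
        return Arrays.stream(input.trim().split("\\s+")) // trim() avoids an empty leading token
                .distinct()
                .collect(Collectors.joining(" "));
    }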
}

Explanation of the Program

  • Input Handling: The program uses the string "This is is a test test string." as an example input. This can be modified to accept input from the user if required.

  • Splitting the String: The split("\\s+") method splits the string into words, where \\s+ is a regular expression that matches one or more consecutive whitespace characters.

  • Removing Duplicates: The distinct() method keeps the first occurrence of each word (compared with equals()) and drops every later repeat, whether or not the repeats are adjacent. The remaining words keep their original order.

  • Joining the Words: The Collectors.joining(" ") method joins the distinct words back into a single string, with each word separated by a space.

  • Output: The program prints the string with duplicate words removed.
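
To make the behavior of distinct() concrete, here is a small sketch that runs the same pipeline on a few inputs. The class name DistinctBehaviorDemo and the helper dedupe are illustrative names chosen for this example, not part of the program above.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Illustrative demo only; class and method names are assumptions for this sketch.
class DistinctBehaviorDemo {

    // Same pipeline as removeDuplicateWords: split on whitespace, drop repeats, rejoin.
    static String dedupe(String input) {
        return Arrays.stream(input.split("\\s+"))
                .distinct() // keeps the first occurrence of each word, in encounter order
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Repeats are removed even when they are not adjacent.
        System.out.println(dedupe("to be or not to be")); // to be or not

        // Words are compared with String.equals, so case matters.
        System.out.println(dedupe("Java java JAVA"));      // Java java JAVA

        // "test" and "test." are different tokens because of the period.
        System.out.println(dedupe("a test test."));        // a test test.
    }
}
```

The last two cases are the ones most likely to surprise: words that differ only in case or attached punctuation are not treated as duplicates.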

Output Example

Example 1:

Input: This is is a test test string.
Output: This is a test string.

Example 2:

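Input: Java Java is powerful powerful.
Output: Java is powerful powerful.

In Example 2, "powerful" and "powerful." are different tokens because of the trailing period, so both are kept.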

Advanced Considerations

  1. Case Sensitivity: The program is case-sensitive by default, meaning "Java" and "java" are considered different words. For case-insensitive removal, the simplest option is to call input.toLowerCase() before splitting. Note that this also lowercases every word in the output; preserving the original casing requires comparing on a lowercase key while keeping the original word.

  2. Handling Punctuation: The program splits on whitespace only, so punctuation stays attached to its word and "test" and "test." are treated as two different words. If such words should count as duplicates, preprocess the string or strip punctuation from each word before calling distinct().

  3. Performance Considerations: distinct() has to remember every word it has already seen, so the pipeline runs in roughly linear time in the number of words and uses extra memory proportional to the number of distinct words. For typical string lengths this overhead is negligible, and the stream version stays short and readable.
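
The first two considerations can be sketched as follows. This is one possible approach under stated assumptions, not the only way to do it: the class DuplicateWordVariants and both method names are hypothetical, the case-insensitive variant keeps the first spelling it sees, and the punctuation variant strips only leading and trailing ASCII punctuation from each word (so the final period is dropped from the output).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class for this sketch; not part of the original program.
class DuplicateWordVariants {

    // Case-insensitive removal that preserves the original casing:
    // words are keyed by their lowercase form, and the first spelling wins.
    static String removeIgnoringCase(String input) {
        Map<String, String> firstSpelling = Arrays.stream(input.trim().split("\\s+"))
                .collect(Collectors.toMap(
                        word -> word.toLowerCase(Locale.ROOT), // comparison key
                        word -> word,                           // value kept in the output
                        (first, later) -> first,                // on a repeat, keep the first
                        LinkedHashMap::new));                   // preserves insertion order
        return String.join(" ", firstSpelling.values());
    }

    // Punctuation-insensitive removal: strip leading and trailing punctuation
    // from each word, then let distinct() compare the bare words.
    // Assumption: \p{Punct} covers ASCII punctuation only.
    static String removeIgnoringPunctuation(String input) {
        return Arrays.stream(input.trim().split("\\s+"))
                .map(word -> word.replaceAll("^\\p{Punct}+|\\p{Punct}+$", ""))
                .filter(word -> !word.isEmpty()) // drop tokens that were only punctuation
                .distinct()
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(removeIgnoringCase("Java java is IS powerful"));
        // Java is powerful
        System.out.println(removeIgnoringPunctuation("Java Java is powerful powerful."));
        // Java is powerful
    }
}
```

Stripping punctuation per word leaves internal characters such as the apostrophe in "don't" untouched, which is usually preferable to deleting every punctuation character from the whole string.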

Conclusion

This Java 8 program removes duplicate words from a string in a single stream pipeline: split on whitespace, apply distinct(), and join the result. Because words are compared exactly, decide up front how case and punctuation should be handled for your data. With that settled, the same approach works whether you’re cleaning up user input, preparing data for analysis, or improving text readability.
