Java 8 – Find Duplicate Words in a String

Introduction

Finding duplicate words in a string is a common task in text processing. In Java 8, the Stream API provides a concise way to handle it: split the string into individual words, count the occurrences of each word, and keep only the words that occur more than once.

Solution Steps

  1. Define the Input String: The string to be processed, which may contain duplicate words.
  2. Split the String: Break the string into individual words by splitting on runs of non-word characters such as spaces and punctuation.
  3. Count Word Occurrences: Use Collectors.groupingBy() together with Collectors.counting() to count how often each word appears.
  4. Filter and Display Duplicates: Retain and display words that appear more than once.

Java Program

import java.util.Arrays;
import java.util.Map;
import java.util.List;
import java.util.stream.Collectors;

public class FindDuplicateWords {
    public static void main(String[] args) {
        // Step 1: Define the input string
        String input = "Java is a programming language, and Java is also a coffee.";

        // Step 2: Split the string into words
        List<String> words = Arrays.asList(input.toLowerCase().split("\\W+"));

        // Step 3: Convert the list of words to a stream and count word occurrences
        Map<String, Long> wordCount = words.stream()
                                           .collect(Collectors.groupingBy(word -> word, Collectors.counting()));

        // Step 4: Filter and display duplicate words
        wordCount.entrySet().stream()
                 .filter(entry -> entry.getValue() > 1)  // Retain only words with more than 1 occurrence
                 .forEach(entry -> System.out.println("Word: '" + entry.getKey() + "' appears " + entry.getValue() + " times."));
    }
}

Explanation

Step 1: Define the Input String

You define the input string, which contains words that might be repeated:

String input = "Java is a programming language, and Java is also a coffee.";

In this sentence, "Java", "is", and "a" each appear twice, and every other word appears once.

Step 2: Split the String

The string is split into individual words on runs of non-word characters (the regular expression \W+), which discards spaces and punctuation:

List<String> words = Arrays.asList(input.toLowerCase().split("\\W+"));

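Here, the string is converted to lowercase first so that the comparison is case-insensitive ("Java" and "java" count as the same word), and the resulting array is wrapped in a list. By default, \W treats only ASCII letters, digits, and the underscore as word characters, so digits stay inside words while accented or non-Latin letters act as separators.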
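
One edge case of split("\\W+") is worth knowing: if the input begins with punctuation, the first array element is an empty string, which would then be counted as a word. The following is a minimal sketch of one way to guard against that. The class name WordTokenizer and the method tokenize are hypothetical names chosen for illustration, not part of the original program.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class WordTokenizer {
    // Hypothetical helper: lowercases the input, splits on runs of non-word
    // characters, and drops the empty token that split() produces when the
    // input starts with a separator (or is empty).
    static List<String> tokenize(String input) {
        return Arrays.stream(input.toLowerCase().split("\\W+"))
                     .filter(word -> !word.isEmpty())
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "\"Java\" is a language, and Java is also a coffee.";

        // Plain split keeps a leading "" because the input starts with a quote:
        // [, java, is, a, language, and, java, is, also, a, coffee]
        System.out.println(Arrays.asList(input.toLowerCase().split("\\W+")));

        // With the filter the empty token is gone:
        // [java, is, a, language, and, java, is, also, a, coffee]
        System.out.println(tokenize(input));
    }
}
```

Trailing punctuation needs no special handling, because split() already discards trailing empty strings.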

Step 3: Count Word Occurrences

The words are then counted using a stream, with the help of Collectors.groupingBy() and Collectors.counting():

Map<String, Long> wordCount = words.stream()
                                   .collect(Collectors.groupingBy(word -> word, Collectors.counting()));

The result is a map from each distinct word to the number of times it appears. For the sample input, "java", "is", and "a" map to 2, and every other word maps to 1.
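
The two-argument groupingBy() makes no promise about the ordering of the map it returns. If the words should come out in the order they first appear, one option is the three-argument overload with a LinkedHashMap supplier. This is a sketch of that variation, not the original program's method. The class name OrderedWordCount and the method countWords are hypothetical, and Function.identity() is used as an equivalent of word -> word.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class OrderedWordCount {
    // Hypothetical helper: counts words while remembering the order in which
    // each distinct word was first seen.
    static Map<String, Long> countWords(String input) {
        return Arrays.stream(input.toLowerCase().split("\\W+"))
                     .filter(word -> !word.isEmpty())       // skip a possible leading empty token
                     .collect(Collectors.groupingBy(
                             Function.identity(),           // group by the word itself
                             LinkedHashMap::new,            // keep first-occurrence order
                             Collectors.counting()));       // count each group
    }

    public static void main(String[] args) {
        String input = "Java is a programming language, and Java is also a coffee.";
        // {java=2, is=2, a=2, programming=1, language=1, and=1, also=1, coffee=1}
        System.out.println(countWords(input));
    }
}
```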

Step 4: Filter and Display Duplicate Words

The map's entries are streamed, filtered down to those with a count greater than one, and printed:

wordCount.entrySet().stream()
         .filter(entry -> entry.getValue() > 1)
         .forEach(entry -> System.out.println("Word: '" + entry.getKey() + "' appears " + entry.getValue() + " times."));

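Here, the program prints only the repeated words. For the sample input these are "java", "is", and "a", each appearing 2 times. The map returned by groupingBy() is not guaranteed to preserve insertion order, so the lines may print in any order.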
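
Printing inside forEach() is fine for a demo, but calling code often needs the duplicates as a value. As a sketch under that assumption, the same pipeline can return a sorted list instead. The class name DuplicateWordFinder and the method findDuplicates are hypothetical names, and the alphabetical sort is an added choice to make the result deterministic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class DuplicateWordFinder {
    // Hypothetical helper: returns the words that occur more than once,
    // lowercased and sorted alphabetically.
    static List<String> findDuplicates(String input) {
        Map<String, Long> counts = Arrays.stream(input.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                .filter(entry -> entry.getValue() > 1)   // keep repeated words only
                .map(Map.Entry::getKey)                  // drop the counts
                .sorted()                                // deterministic order
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "Java is a programming language, and Java is also a coffee.";
        System.out.println(findDuplicates(input));       // prints [a, is, java]
    }
}
```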


Conclusion

In Java 8, the Stream API makes finding duplicate words in a string straightforward. By splitting the string, counting the occurrences of each word, and keeping only the words that occur more than once, you can extract the repeated words in a few lines. The approach shown here is case-insensitive and treats whitespace and punctuation as word separators.
