Java String codePointAt() Method

The String.codePointAt() method in Java returns the Unicode code point of the character at a specified index.

Table of Contents

  1. Introduction
  2. codePointAt() Method Syntax
  3. Examples
    • Basic Usage
    • Handling Edge Cases
    • Working with Surrogate Pairs
    • Real-World Use Case
  4. Conclusion

Introduction

The String.codePointAt() method is a member of the String class in Java. It returns, as an int, the Unicode code point of the character that starts at a specified index. This is particularly useful when a string may contain characters outside the Basic Multilingual Plane (code points above U+FFFF), which a single char value cannot hold.

codePointAt() Method Syntax

The syntax for the codePointAt method is as follows:

public int codePointAt(int index)
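  • index: The index into the string's char values (UTF-16 code units), from 0 to length() - 1.
  • Returns: The code point value of the character at that index, as an int.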

Examples

Basic Usage

The codePointAt method can be used to get the Unicode code point of the character at a specified index. In the example below, index 7 of "Hello, World!" holds 'W', whose code point is 87 (U+0057).

Example

public class CodePointAtExample {
    public static void main(String[] args) {
        String str = "Hello, World!";
        int codePoint = str.codePointAt(7);
        System.out.println("Code point at index 7: " + codePoint);
    }
}

Output:

Code point at index 7: 87
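
A raw decimal value such as 87 is often easier to read in the conventional U+XXXX notation. The following is a minimal sketch of one way to do that; the class name and the describe helper are hypothetical names introduced for illustration, not part of the String API.

```java
class CodePointFormatSketch {
    // Hypothetical helper: renders the code point at the given index
    // as "U+XXXX (decimal) 'text'".
    static String describe(String str, int index) {
        int codePoint = str.codePointAt(index);
        // Character.toChars converts a code point back to its UTF-16 char value(s).
        String asText = new String(Character.toChars(codePoint));
        return String.format("U+%04X (%d) '%s'", codePoint, codePoint, asText);
    }

    public static void main(String[] args) {
        System.out.println(describe("Hello, World!", 7)); // U+0057 (87) 'W'
    }
}
```

Going through Character.toChars rather than casting to char keeps the helper usable for code points above U+FFFF.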

Handling Edge Cases

Example: Index Out of Bounds

If the specified index is negative or not less than the length of the string, the codePointAt method throws an IndexOutOfBoundsException (in practice, its subclass StringIndexOutOfBoundsException).

public class CodePointAtOutOfBoundsExample {
    public static void main(String[] args) {
        String str = "Hello";
        try {
            int codePoint = str.codePointAt(10);
            System.out.println("Code point at index 10: " + codePoint);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}

Output (from Java 8; the wording of the message differs in newer JDK versions):

Error: String index out of range: 10
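
When an out-of-range index is an expected situation rather than a bug, checking the index up front can be cleaner than catching the exception. The sketch below shows one possible approach; the class name and the codePointAtOrEmpty helper are hypothetical and exist only for this illustration.

```java
import java.util.OptionalInt;

class SafeCodePointSketch {
    // Hypothetical helper: returns an empty OptionalInt instead of throwing
    // when the index is outside 0 .. str.length() - 1.
    static OptionalInt codePointAtOrEmpty(String str, int index) {
        if (index < 0 || index >= str.length()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(str.codePointAt(index));
    }

    public static void main(String[] args) {
        System.out.println(codePointAtOrEmpty("Hello", 1));  // OptionalInt[101]
        System.out.println(codePointAtOrEmpty("Hello", 10)); // OptionalInt.empty
    }
}
```

Whether to validate first or to catch the exception is a design choice; validating avoids using exceptions for ordinary control flow.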

Working with Surrogate Pairs

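Java represents strings as UTF-16, so characters outside the Basic Multilingual Plane (code points above U+FFFF) are stored as a pair of char values: a high surrogate followed by a low surrogate. When the index points at the high surrogate of such a pair, codePointAt combines both char values and returns the full code point. When it points at a low surrogate or an unpaired surrogate, it returns that surrogate's own value.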
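
To make the pairing concrete, the sketch below applies the standard UTF-16 decoding formula by hand and compares the result with codePointAt and Character.toCodePoint. The class name, the decode helper, and the sample character U+1F600 are chosen for illustration only.

```java
class SurrogateDecodingSketch {
    // Combines a high and a low surrogate into one code point using the
    // UTF-16 decoding formula. Assumes the arguments form a valid pair.
    static int decode(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        String str = "\uD83D\uDE00"; // U+1F600, stored as two char values
        int manual = decode(str.charAt(0), str.charAt(1));

        System.out.println(manual == str.codePointAt(0));                        // true
        System.out.println(manual == Character.toCodePoint('\uD83D', '\uDE00')); // true
        System.out.println(Integer.toHexString(manual));                         // 1f600
    }
}
```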

Example

public class CodePointAtSurrogatePairExample {
    public static void main(String[] args) {
        String str = "A\uD835\uDD0A";
        int codePoint = str.codePointAt(1);
        System.out.println("Code point at index 1: " + codePoint);
    }
}

Output:

Code point at index 1: 120074

In this example, the char at index 1 is the high surrogate of a surrogate pair. The codePointAt method reads it together with the low surrogate at index 2 and returns the full Unicode code point, 120074 (U+1D50A).
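
The result depends on where the index lands relative to the pair. As a small sketch (the class name is hypothetical), the code below calls codePointAt on every index of the same string and prints the values in hexadecimal:

```java
class SurrogateIndexSketch {
    public static void main(String[] args) {
        String str = "A\uD835\uDD0A"; // 'A' followed by one supplementary character

        System.out.printf("index 0: U+%04X%n", str.codePointAt(0)); // U+0041
        System.out.printf("index 1: U+%04X%n", str.codePointAt(1)); // U+1D50A
        // Index 2 is the low surrogate, so only that char value comes back.
        System.out.printf("index 2: U+%04X%n", str.codePointAt(2)); // U+DD0A

        // Character.charCount reports how many char values a code point occupies.
        System.out.println(Character.charCount(str.codePointAt(1))); // 2
    }
}
```

Because index 2 yields a lone surrogate rather than a real character, code that walks a string by code point should advance by Character.charCount instead of always adding 1.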

Real-World Use Case

Example: Counting Unicode Characters

One common use case for codePointAt is counting the number of Unicode characters in a string, considering surrogate pairs.

public class CountUnicodeCharactersExample {
    public static void main(String[] args) {
        String str = "A\uD835\uDD0A B\uD835\uDD0B";
        int count = 0;

        // Advance one code point at a time: charCount is 2 for a
        // surrogate pair and 1 for any other char value.
        for (int i = 0; i < str.length(); ) {
            int codePoint = str.codePointAt(i);
            i += Character.charCount(codePoint);
            count++;
        }

        System.out.println("Number of Unicode characters: " + count);
    }
}

Output:

Number of Unicode characters: 5

In this example, each surrogate pair is counted as a single Unicode character, so the result is 5 (A, U+1D50A, a space, B, U+1D50B) even though str.length() returns 7.
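
If only the count is needed, the String class already offers codePointCount, and codePoints() (Java 8 and later) gives a stream of code points. The sketch below compares both with length(); the class name, the countCodePoints helper, and the sample string are assumptions made for illustration.

```java
class CodePointCountSketch {
    // Hypothetical helper: counts code points over the whole string.
    static int countCodePoints(String str) {
        return str.codePointCount(0, str.length());
    }

    public static void main(String[] args) {
        String str = "Hi \uD83D\uDE00!"; // contains one surrogate pair (U+1F600)

        System.out.println(str.length());             // 6 char values
        System.out.println(countCodePoints(str));     // 5 code points
        System.out.println(str.codePoints().count()); // 5 code points
    }
}
```

A manual loop with codePointAt remains the better fit when each code point also has to be inspected or transformed along the way.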

Conclusion

The String.codePointAt() method in Java retrieves the Unicode code point of the character at a specified index. It combines a surrogate pair into a single code point and returns the value as an int, which is useful for text-processing tasks such as counting, validating, or iterating over characters beyond U+FFFF. By understanding the codePointAt method, you can handle the full Unicode range correctly in your Java programs.
