The String.codePointAt() method in Java returns the Unicode code point of the character at a specified index.
Table of Contents
- Introduction
- codePointAt() Method Syntax
- Examples
- Basic Usage
- Handling Edge Cases
- Working with Surrogate Pairs
- Real-World Use Case
- Conclusion
Introduction
The String.codePointAt() method is a member of the String class in Java. It returns the Unicode code point of the character that starts at a specified index, where the index counts char values (UTF-16 code units). This is particularly useful when working with characters outside the Basic Multilingual Plane, which a single char cannot hold, and when you need a character's numeric value.
codePointAt() Method Syntax
The syntax for the codePointAt method is as follows:
public int codePointAt(int index)
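- index: The index, counted in char values (UTF-16 code units), of the position to read.
- Returns: The code point at the specified index, as an int.
- Throws: IndexOutOfBoundsException if index is negative or not less than the length of the string.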
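Because the method returns an int rather than a char, the result is often easier to read in the conventional U+XXXX hexadecimal notation. The following is a minimal sketch of one way to do that; the class name CodePointFormatDemo and the helper describe are made up for illustration and are not part of the JDK.

```java
class CodePointFormatDemo {
    // Hypothetical helper: reads the code point at the given index and
    // formats it as U+XXXX (at least four uppercase hex digits).
    static String describe(String s, int index) {
        int codePoint = s.codePointAt(index);
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(describe("Hello, World!", 7)); // U+0057
        System.out.println(describe("Hello, World!", 0)); // U+0048
    }
}
```

The int return type is what allows the method to report values above 0xFFFF, which do not fit in a 16-bit char.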
Examples
Basic Usage
The codePointAt method can be used to get the Unicode code point of the character at a specified index.
Example
public class CodePointAtExample {
    public static void main(String[] args) {
        String str = "Hello, World!";
        // Index 7 holds 'W', whose code point is U+0057 (87 in decimal)
        int codePoint = str.codePointAt(7);
        System.out.println("Code point at index 7: " + codePoint);
    }
}
Output:
Code point at index 7: 87
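A numeric code point can be turned back into text with Character.toChars, which returns one char for a Basic Multilingual Plane character and two for a supplementary one. The sketch below assumes a hypothetical helper named toText (not a JDK method) to show the round trip.

```java
class CodePointRoundTripDemo {
    // Hypothetical helper: converts a code point back into a String.
    // Character.toChars yields one or two char values as needed.
    static String toText(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String str = "Hello, World!";
        int codePoint = str.codePointAt(7);

        System.out.println(toText(codePoint));          // W
        // For a character that fits in one char, codePointAt and charAt
        // report the same numeric value.
        System.out.println(codePoint == str.charAt(7)); // true
    }
}
```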
Handling Edge Cases
Example: Index Out of Bounds
If the specified index is negative or not less than the length of the string, the codePointAt method throws an IndexOutOfBoundsException (in practice, its subclass StringIndexOutOfBoundsException).
public class CodePointAtOutOfBoundsExample {
    public static void main(String[] args) {
        String str = "Hello";
        try {
            // "Hello" has length 5, so the valid indexes are 0 through 4
            int codePoint = str.codePointAt(10);
            System.out.println("Code point at index 10: " + codePoint);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
Output (the exact message text depends on the JDK version):
Error: String index out of range: 10
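When an out-of-range index is an expected situation rather than a programming error, checking the bounds first can be clearer than catching the exception. The following sketch assumes a hypothetical helper, codePointAtOrEmpty, that wraps the check and returns an OptionalInt; it is one possible design, not a JDK method.

```java
import java.util.OptionalInt;

class SafeCodePointDemo {
    // Hypothetical helper: returns the code point at the index, or an empty
    // OptionalInt when the index is outside 0..length-1.
    static OptionalInt codePointAtOrEmpty(String s, int index) {
        if (index < 0 || index >= s.length()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(s.codePointAt(index));
    }

    public static void main(String[] args) {
        System.out.println(codePointAtOrEmpty("Hello", 0).getAsInt());  // 72
        System.out.println(codePointAtOrEmpty("Hello", 10).isPresent()); // false
    }
}
```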
Working with Surrogate Pairs
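Java uses UTF-16 to represent characters, which means characters outside the Basic Multilingual Plane (code points above U+FFFF) are represented by a pair of char values called a surrogate pair. When the char at the given index is a high surrogate and the next char is a low surrogate, the codePointAt method returns the combined supplementary code point; otherwise it returns the char value at that index.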
Example
public class CodePointAtSurrogatePairExample {
    public static void main(String[] args) {
        // 'A' followed by one supplementary character stored as two char values
        String str = "A\uD835\uDD0A";
        int codePoint = str.codePointAt(1);
        System.out.println("Code point at index 1: " + codePoint);
    }
}
Output:
Code point at index 1: 120074
In this example, the char at index 1 is the high surrogate of a surrogate pair. Because a low surrogate follows it, the codePointAt method combines the two char values and returns the full Unicode code point, U+1D50A.
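The result depends on which half of the pair the index points at. The sketch below probes both halves of the same string and compares the result with Character.toCodePoint, which performs the combination explicitly. The class name and the startsPair helper are hypothetical names chosen for this illustration.

```java
class SurrogateIndexDemo {
    // Hypothetical helper: true when a supplementary code point (a complete
    // surrogate pair) starts at the given char index.
    static boolean startsPair(String s, int index) {
        return Character.isSupplementaryCodePoint(s.codePointAt(index));
    }

    public static void main(String[] args) {
        String str = "A\uD835\uDD0A";

        // Index 1 is the high surrogate: the pair is combined.
        System.out.println(Integer.toHexString(str.codePointAt(1))); // 1d50a

        // Index 2 is the low surrogate: only that char value is returned.
        System.out.println(Integer.toHexString(str.codePointAt(2))); // dd0a

        // Combining the two halves by hand gives the same code point.
        int manual = Character.toCodePoint(str.charAt(1), str.charAt(2));
        System.out.println(manual == str.codePointAt(1)); // true

        System.out.println(startsPair(str, 1)); // true
        System.out.println(startsPair(str, 2)); // false
    }
}
```

Pointing an index at a low surrogate does not throw; it silently yields half of a character, so code that walks a string by code point should advance by Character.charCount of each value it reads.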
Real-World Use Case
Example: Counting Unicode Characters
A common task when working with code points is counting the number of Unicode characters in a string, so that each surrogate pair counts once rather than twice.
public class CountUnicodeCharactersExample {
    public static void main(String[] args) {
        String str = "A\uD835\uDD0A B\uD835\uDD0B";
        int count = 0;
        for (int i = 0; i < str.length(); i++) {
            count++;
            // A high surrogate followed by a low surrogate is one character
            if (Character.isHighSurrogate(str.charAt(i))
                    && i + 1 < str.length()
                    && Character.isLowSurrogate(str.charAt(i + 1))) {
                i++; // Skip the low surrogate
            }
        }
        System.out.println("Number of Unicode characters: " + count);
    }
}
Output:
Number of Unicode characters: 5
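In this example, the loop uses Character.isHighSurrogate to detect the start of a surrogate pair and skips the low surrogate that follows, so each pair is counted as a single Unicode character. The built-in str.codePointCount(0, str.length()) returns the same count.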
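The same count can be obtained with codePointAt itself by reading one code point at a time and advancing by Character.charCount, which is 2 for a supplementary code point and 1 otherwise. This is a minimal sketch; the class name and the countCodePoints helper are hypothetical names used only for illustration.

```java
class CodePointCountDemo {
    // Hypothetical helper: counts code points by stepping through the string
    // one code point (one or two char values) at a time.
    static int countCodePoints(String s) {
        int count = 0;
        int i = 0;
        while (i < s.length()) {
            int codePoint = s.codePointAt(i);
            i += Character.charCount(codePoint); // advance by 1 or 2 chars
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String str = "A\uD835\uDD0A B\uD835\uDD0B";
        System.out.println(countCodePoints(str));                // 5
        System.out.println(str.codePointCount(0, str.length())); // 5
        System.out.println(str.length());                        // 7
    }
}
```

The loop shape is useful when each code point also needs to be inspected; when only the total is needed, codePointCount is the shorter option.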
Conclusion
The String.codePointAt() method in Java retrieves the Unicode code point of the character at a specified index. It combines a surrogate pair into a single code point and returns the result as an int, which is useful for text-processing tasks such as counting, validating, or classifying characters. By understanding the codePointAt method, you can handle the full range of Unicode characters correctly in your Java programs.