Encoding

UCS-2

A fixed-width encoding: every character occupies exactly 2 bytes, so only code points from U+0000 to U+FFFF can be represented.
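A minimal sketch of the fixed-width idea. Python has no built-in UCS-2 codec, so `ucs2_encode` is a hypothetical helper written for this note, and big-endian byte order is an assumption.

```python
def ucs2_encode(text: str) -> bytes:
    """Encode text as big-endian UCS-2 (hypothetical helper, 2 bytes per character)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UCS-2 has no way to express code points beyond 16 bits.
            raise ValueError(f"U+{cp:04X} does not fit in 2 bytes")
        out += cp.to_bytes(2, "big")
    return bytes(out)


print(ucs2_encode("A\u8cc7").hex())  # 00418cc7
```

Every character costs 2 bytes, including plain ASCII such as "A".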

UTF-8

A variable-length Unicode encoding. The leading bits of the first byte indicate how many bytes the character occupies.

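Byte Length  Byte 1    Byte 2    Byte 3    Byte 4    Byte 5    Byte 6
1 (ASCII)    0xxxxxxx
2            110xxxxx  10xxxxxx
3            1110xxxx  10xxxxxx  10xxxxxx
4            11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
5            111110xx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
6            1111110x  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx

The 5- and 6-byte forms come from the original UTF-8 design. RFC 3629 limits UTF-8 to 4 bytes, which covers code points up to U+10FFFF.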
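The first-byte rule in the table can be sketched as counting the leading 1 bits. `utf8_sequence_length` is a hypothetical name, and returning 0 for bytes that cannot start a sequence is a choice made for this sketch.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return the byte length implied by the first byte, following the table.

    Returns 0 for a continuation byte (10xxxxxx) or for 0xFE/0xFF,
    neither of which can start a sequence.
    """
    if first_byte & 0x80 == 0:
        return 1  # 0xxxxxxx: single-byte ASCII
    length = 0
    mask = 0x80
    # Count the run of 1 bits at the top of the byte.
    while first_byte & mask:
        length += 1
        mask >>= 1
    return length if 2 <= length <= 6 else 0


print(utf8_sequence_length(0xE8))  # 3
```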

Example:

E8B387 = 11101000 10110011 10000111

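Drop the prefix bits (1110, 10, 10) and join the remaining payload bits: 1000 + 110011 + 000111

Unicode = 1000 1100 1100 0111 = 8CC7 (U+8CC7)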
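The same bit manipulation, as a rough sketch. `decode_utf8_sequence` is a hypothetical helper that decodes exactly one sequence and checks structure only, not overlong forms.

```python
def decode_utf8_sequence(data: bytes) -> int:
    """Decode a single UTF-8 sequence into its code point (illustrative only)."""
    first = data[0]
    if first < 0x80:
        return first  # single-byte ASCII
    # Count leading 1 bits of the first byte to get the sequence length.
    n = 0
    while first & (0x80 >> n):
        n += 1
    if not 2 <= n <= 6 or len(data) != n:
        raise ValueError("malformed sequence")
    # Keep only the payload bits that follow the length prefix and its 0 bit.
    cp = first & (0xFF >> (n + 1))
    for b in data[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError("continuation byte must look like 10xxxxxx")
        cp = (cp << 6) | (b & 0x3F)  # append 6 payload bits
    return cp


print(hex(decode_utf8_sequence(bytes.fromhex("E8B387"))))  # 0x8cc7
```

Python's built-in decoder agrees: `bytes.fromhex("E8B387").decode("utf-8")` yields the single character `chr(0x8CC7)`.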

https://leetcode.com/problems/utf-8-validation/
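One possible approach to that kind of validation, as a sketch rather than a reference solution. It assumes the input is a sequence of integers carrying one byte each in their low 8 bits, and it accepts only 1 to 4 byte sequences. `is_valid_utf8` is a hypothetical name, and only the bit patterns are checked (no overlong or surrogate checks).

```python
from typing import Iterable


def is_valid_utf8(data: Iterable[int]) -> bool:
    """Check that the bytes follow the first-byte / continuation-byte patterns."""
    remaining = 0  # continuation bytes still expected
    for value in data:
        byte = value & 0xFF  # only the low 8 bits count
        if remaining:
            if byte >> 6 != 0b10:
                return False  # expected 10xxxxxx
            remaining -= 1
        elif byte >> 7 == 0:
            continue  # 0xxxxxxx
        elif byte >> 5 == 0b110:
            remaining = 1
        elif byte >> 4 == 0b1110:
            remaining = 2
        elif byte >> 3 == 0b11110:
            remaining = 3
        else:
            return False  # stray continuation byte or a 5+ byte lead
    return remaining == 0  # input must not end mid-sequence


print(is_valid_utf8([0xE8, 0xB3, 0x87]))  # True
print(is_valid_utf8([0xE8, 0xB3]))        # False
```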
