
UTF8 Encoder
Easily convert any text to UTF-8 encoded hexadecimal with Qodex's UTF-8 Encoder. Whether you're preparing input for hashing algorithms, debugging byte streams, or sending multilingual data over networks, this tool ensures safe and accurate encoding. You can also decode encoded text using our UTF-8 Decoder for round-trip validation.
UTF8 Encoder - Documentation
What is UTF-8 Encoding?
UTF-8 encoding is the process of converting readable characters into byte sequences that computers can understand and store. UTF-8 stands for "Unicode Transformation Format - 8 bit", and it's the most widely used encoding system on the web.
With UTF-8 encoding, every letter, number, emoji, or symbol maps to a specific sequence of one to four bytes, usually written in hexadecimal. For example, the letter A becomes 41 and the check mark ✔ becomes E2 9C 94.
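As a minimal sketch of that mapping in Python (the helper name `to_utf8_hex` is made up for illustration and is not part of the Qodex tool), the built-in `str.encode` does the conversion and `bytes.hex` formats the result. The separator argument to `bytes.hex` assumes Python 3.8 or later.

```python
def to_utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated uppercase hex."""
    return text.encode("utf-8").hex(" ").upper()


print(to_utf8_hex("A"))  # 41
print(to_utf8_hex("✔"))  # E2 9C 94
```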
UTF-8 Encoding Reference Table
Use this table to look up common characters and their UTF-8 hex byte representations:
Character | Description | Code Point | UTF-8 Hex Bytes | Byte Count |
|---|---|---|---|---|
A | Latin capital A | U+0041 | 41 | 1 |
Z | Latin capital Z | U+005A | 5A | 1 |
0 | Digit zero | U+0030 | 30 | 1 |
~ | Tilde | U+007E | 7E | 1 |
© | Copyright sign | U+00A9 | C2 A9 | 2 |
é | Latin e with acute | U+00E9 | C3 A9 | 2 |
ü | Latin u with diaeresis | U+00FC | C3 BC | 2 |
£ | Pound sign | U+00A3 | C2 A3 | 2 |
€ | Euro sign | U+20AC | E2 82 AC | 3 |
✔ | Heavy check mark | U+2714 | E2 9C 94 | 3 |
中 | CJK "middle" | U+4E2D | E4 B8 AD | 3 |
界 | CJK "world/boundary" | U+754C | E7 95 8C | 3 |
🚀 | Rocket emoji | U+1F680 | F0 9F 9A 80 | 4 |
𝄞 | Musical G clef | U+1D11E | F0 9D 84 9E | 4 |
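For characters that are not in the table, a row can be computed rather than looked up. The following is a small Python sketch under that assumption; `table_row` is a hypothetical helper, and the description column is left out because it needs a name lookup.

```python
def table_row(ch: str) -> str:
    """Format one character like a row of the reference table (minus the description)."""
    encoded = ch.encode("utf-8")
    code_point = f"U+{ord(ch):04X}"       # at least four hex digits, e.g. U+0041
    hex_bytes = encoded.hex(" ").upper()  # e.g. "E2 82 AC"
    return f"{ch} | {code_point} | {hex_bytes} | {len(encoded)} |"


for ch in "Añ€𝄞":
    print(table_row(ch))
# A | U+0041 | 41 | 1 |
# ñ | U+00F1 | C3 B1 | 2 |
# € | U+20AC | E2 82 AC | 3 |
# 𝄞 | U+1D11E | F0 9D 84 9E | 4 |
```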
UTF-8 vs. ASCII and UTF-16
Feature | ASCII | UTF-8 | UTF-16 |
|---|---|---|---|
Character range | 128 characters (English only) | All Unicode (1.1M+ code points) | All Unicode |
Bytes per char | Always 1 | 1 to 4 (variable) | 2 or 4 |
ASCII compatible | Yes (it IS ASCII) | Yes (backward compatible) | No |
Best for | English-only legacy systems | Web, APIs, most modern apps | Java/Windows internals, CJK-heavy text |
Web usage | Declining | 98%+ of websites | Rare on the web |
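The "Bytes per char" row can be checked directly. This Python sketch (with a hypothetical helper, `encoded_sizes`) encodes one character in each scheme and reports the byte count; it uses the `utf-16-le` codec because Python's plain `utf-16` codec prepends a 2-byte BOM that would skew the comparison.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the encoded size in bytes of `ch` under UTF-8, UTF-16, and ASCII."""
    sizes = {
        "utf-8": len(ch.encode("utf-8")),
        # "utf-16-le" is used so that no BOM is counted
        "utf-16": len(ch.encode("utf-16-le")),
    }
    try:
        sizes["ascii"] = len(ch.encode("ascii"))
    except UnicodeEncodeError:
        sizes["ascii"] = None  # the character cannot be represented in ASCII
    return sizes


for ch in ["A", "é", "中", "🚀"]:
    print(ch, encoded_sizes(ch))
```

Note that 中 takes 3 bytes in UTF-8 but only 2 in UTF-16, which is why the table lists UTF-16 as a fit for CJK-heavy text.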
How UTF-8 Encoding Works (Behind the Scenes)
UTF-8 uses different byte patterns depending on the Unicode code point:
Unicode Range | Bytes | Encoding Format | Example |
|---|---|---|---|
U+0000 to U+007F | 1 | 0xxxxxxx | A = 41 |
U+0080 to U+07FF | 2 | 110xxxxx 10xxxxxx | é = C3 A9 |
U+0800 to U+FFFF | 3 | 1110xxxx 10xxxxxx 10xxxxxx | € = E2 82 AC |
U+10000 to U+10FFFF | 4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | 🚀 = F0 9F 9A 80 |
Encoding Flow:
Read each character from the input string
Find the Unicode code point (e.g., 'A' = U+0041)
Convert to binary and fit into the correct UTF-8 structure based on byte count
Output as hex: space-separated values (e.g., 41 for 'A')
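The flow above can be sketched by hand in Python to show where each bit goes. This is an illustration of the bit patterns in the table, not how this tool or Python is implemented internally; the names `encode_code_point` and `encode_text` are invented for the example. In practice `str.encode("utf-8")` does the same job.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point using the UTF-8 bit patterns."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp <= 0x7F:        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def encode_text(text: str) -> bytes:
    """Encode a whole string, one code point at a time."""
    return b"".join(encode_code_point(ord(ch)) for ch in text)


print(encode_text("A").hex(" "))   # 41
print(encode_text("é").hex(" "))   # c3 a9
print(encode_text("€").hex(" "))   # e2 82 ac
print(encode_text("🚀").hex(" "))  # f0 9f 9a 80
```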
Practical Examples
Example 1: Simple ASCII (1 byte)
Input: A | Code Point: U+0041 | UTF-8 Hex: 41
Example 2: Accented Latin (2 bytes)
Input: é | Code Point: U+00E9 | UTF-8 Hex: C3 A9
Example 3: Emoji (4 bytes)
Input: 🚀 | Code Point: U+1F680 | UTF-8 Hex: F0 9F 9A 80
Example 4: Japanese Character (3 bytes)
Input: 界 | Code Point: U+754C | UTF-8 Hex: E7 95 8C
UTF-8 Encoding in PHP, Python, and JavaScript
Here is how to handle UTF-8 encoding in three widely used web development languages:
PHP
    // Convert a string to UTF-8 from another encoding.
    // "\xE9" is the ISO-8859-1 byte for "é", so $latin1 holds "Café" in Latin-1.
    $latin1 = "Caf\xE9";
    $utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

    // Get hex representation of UTF-8 bytes
    // (the literals below assume this source file is saved as UTF-8)
    $hex = bin2hex("Café"); // Output: 436166c3a9

    // Check string length in characters vs bytes
    echo mb_strlen("Café", 'UTF-8'); // 4 characters
    echo strlen("Café");             // 5 bytes

    // Always use multibyte functions for UTF-8 strings
    echo mb_strtoupper("café", 'UTF-8'); // CAFÉ

    // Pro tip: Set internal encoding globally
    mb_internal_encoding('UTF-8');
Python
    # Encode a string to UTF-8 bytes
    text = "Café"
    utf8_bytes = text.encode("utf-8")
    print(utf8_bytes)  # b'Caf\xc3\xa9'

    # Get hex representation
    hex_string = utf8_bytes.hex()
    print(hex_string)  # 436166c3a9

    # Encode emoji
    rocket = "\U0001F680"
    print(rocket.encode("utf-8").hex())  # f09f9a80

    # Read a file with explicit UTF-8 encoding
    with open("data.txt", "r", encoding="utf-8") as f:
        content = f.read()
JavaScript
    // Using TextEncoder (modern browsers and Node.js)
    const encoder = new TextEncoder();
    const bytes = encoder.encode("Café");
    console.log(bytes); // Uint8Array [67, 97, 102, 195, 169]

    // Convert to hex string
    const hex = Array.from(bytes)
      .map(b => b.toString(16).padStart(2, '0'))
      .join(' ');
    console.log(hex); // "43 61 66 c3 a9"

    // URL-safe encoding (percent-encoded UTF-8)
    console.log(encodeURIComponent("Café")); // Output: Caf%C3%A9

    // Encode emoji
    const rocketBytes = new TextEncoder().encode("\uD83D\uDE80");
    console.log(Array.from(rocketBytes).map(b => b.toString(16).padStart(2, '0')).join(' ')); // f0 9f 9a 80
Common UTF-8 Encoding Errors and How to Fix Them
Error | Symptom | Cause | Fix |
|---|---|---|---|
Mojibake | "Café" shows as "CafÃ©" | UTF-8 bytes read as Latin-1 | Set charset to UTF-8 in HTTP headers and HTML meta tag |
Replacement characters | Text shows as "Caf�" or "Caf?" | Invalid byte sequences | Re-encode source data as valid UTF-8 |
Double encoding | "Café" shows as "CafÃ©" even with the charset set to UTF-8 | UTF-8 text misread as Latin-1 and encoded to UTF-8 again | Encode only once; check for existing encoding before converting |
Truncated characters | Emoji or CJK chars missing/broken | String cut mid-sequence (e.g., SUBSTR on bytes) | Use character-aware functions (mb_substr in PHP, not substr) |
BOM issues | Extra characters at file start | UTF-8 BOM (EF BB BF) prepended to file | Save files as "UTF-8 without BOM" in your editor |
Database garbling | Characters corrupted on storage/retrieval | DB or connection not set to utf8mb4 | Use the utf8mb4 charset for the database and the connection |
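Several rows of this table can be reproduced in a few lines, which helps when working out which failure you are looking at. The Python sketch below is illustrative only: it uses "Café" as the sample, treats Latin-1 as the wrong encoding, and the repair step works only when the text was misdecoded exactly once that way.

```python
original = "Café"
utf8 = original.encode("utf-8")            # b'Caf\xc3\xa9'

# Mojibake: the UTF-8 bytes are decoded as Latin-1.
mojibake = utf8.decode("latin-1")
print(mojibake)                            # CafÃ©

# Repair by reversing the wrong step (assumes exactly one Latin-1 misdecode).
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired)                            # Café

# Double encoding: the already-garbled text is encoded to UTF-8 again.
double_encoded = mojibake.encode("utf-8")
print(double_encoded.hex(" "))             # 43 61 66 c3 83 c2 a9

# Truncation: cutting a 4-byte emoji after 2 bytes leaves an invalid sequence,
# which decodes to U+FFFD (the replacement character) with errors="replace".
rocket = "🚀".encode("utf-8")
truncated = rocket[:2].decode("utf-8", errors="replace")
print("\ufffd" in truncated)               # True

# BOM: the "utf-8-sig" codec strips a leading EF BB BF, plain "utf-8" keeps it.
with_bom = b"\xef\xbb\xbf" + utf8
print(with_bom.decode("utf-8-sig") == original)       # True
print(with_bom.decode("utf-8").startswith("\ufeff"))  # True
```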
Ensuring Proper UTF-8 in HTML and HTTP Headers
To make sure your web content displays correctly across every browser and language:
HTML5: Add <meta charset="utf-8"> inside the <head> section
HTTP Headers: Set Content-Type: text/html; charset=utf-8 on your server
Database: Use the utf8mb4 charset in MySQL (not just utf8, which only supports characters of up to 3 bytes)
Files: Save source files as UTF-8 without BOM in your editor
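On the server side, the header and the body have to agree: the body is encoded as UTF-8 and the header says so. A minimal, framework-free Python sketch follows; `build_response` is a hypothetical helper, not an API of any web framework. It also shows that Content-Length counts bytes, not characters.

```python
def build_response(html: str):
    """Return (headers, body) for an HTML page served as UTF-8."""
    body = html.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # Content-Length is measured in bytes, so use len(body), not len(html)
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_response("<p>Café</p>")
print(headers["Content-Type"])    # text/html; charset=utf-8
print(headers["Content-Length"])  # 12 (the string has 11 characters)
```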
When and Where to Use UTF-8 Encoding
APIs and Web Requests: Safely transmit multilingual or emoji-rich data
Data Exporting: Store byte-accurate versions of input
Encoding Debugging: Check whether text corruption is due to encoding errors
Cryptography and Hashing: Convert strings into bytes for hashing (e.g., SHA-256)
Database Insertion: Some databases expect UTF-8 encoded strings as hex
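The hashing case is worth a concrete look, because hash functions operate on bytes and the chosen encoding changes the digest. A short Python sketch using the standard `hashlib` module (the `sha256_hex` wrapper is just for illustration):

```python
import hashlib


def sha256_hex(text: str, encoding: str = "utf-8") -> str:
    """Hash the encoded bytes of `text` and return the hex digest."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()


# Same text, different encodings, different bytes, different digests.
print(sha256_hex("Café", "utf-8") == sha256_hex("Café", "latin-1"))  # False

# ASCII-only text has identical bytes in both encodings, so the digests match.
print(sha256_hex("abc", "utf-8") == sha256_hex("abc", "latin-1"))    # True
```

Agreeing on UTF-8 before hashing avoids mismatched digests between systems.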
Combine with These Tools
UTF8 Decoder -- Convert the encoded hex back into readable text
Base64 Encoder -- Base64-encode the UTF-8 bytes for safe transfer
URL Encoder -- Percent-encode text so it is safe to use in URLs
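For comparison, here is roughly what those follow-up steps look like in Python's standard library, starting from the same UTF-8 bytes. This is a sketch of the general techniques, not of how the Qodex tools work internally.

```python
import base64
from urllib.parse import quote, unquote

text = "Café"
utf8 = text.encode("utf-8")

# Base64 over the UTF-8 bytes
b64 = base64.b64encode(utf8).decode("ascii")
print(b64)                                    # Q2Fmw6k=
print(base64.b64decode(b64).decode("utf-8"))  # Café

# Percent-encoding (quote uses UTF-8 by default)
url_safe = quote(text)
print(url_safe)                               # Caf%C3%A9
print(unquote(url_safe))                      # Café
```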
Pro Tips
ASCII characters (A-Z, 0-9, punctuation) are just one byte; accented letters, CJK characters, and emojis take 2-4 bytes.
Use this tool to verify byte-level integrity when debugging network or API communication.
If a character doesn't show up properly in other systems, encode it here and check the byte breakdown.
Copy encoded output directly into HTTP headers, cookies, or tokens when required.
Always test with multi-byte characters (accented letters, CJK, emojis) to catch encoding issues early.
Frequently Asked Questions
What input formats are supported?
Why do some characters produce longer output?
Is the tool secure?
Can I encode binary data?
How many bytes does a UTF-8 character use?
What is a UTF-8 BOM?
What is the difference between UTF-8 encoding and URL encoding?
What encoding format does it use internally?