What are UTF-8 string literals in C# 11?
UTF-8 string literals, introduced in C# 11, provide a concise and efficient way to embed UTF-8 encoded text in your code. A regular `string` is stored as UTF-16 (the in-memory encoding .NET uses for `string`); a literal with the `u8` suffix is instead emitted by the compiler as UTF-8 bytes and exposed as a `ReadOnlySpan<byte>`. This is particularly useful when interacting with systems or APIs that expect UTF-8 encoded data.
Basic Usage
This snippet demonstrates the simplest form of a UTF-8 string literal. The `u8` suffix appended to the string literal tells the compiler to encode the string as UTF-8. The literal's type is `ReadOnlySpan<byte>`, not `byte[]`, so `utf8Bytes` is declared as a span over the UTF-8 encoded representation of "Hello, World!". If you need an actual array, call `.ToArray()` on the literal. The `System.Text.Encoding.UTF8.GetString()` method is then used to decode the UTF-8 bytes back into a standard C# string for display.
ReadOnlySpan<byte> utf8Bytes = "Hello, World!"u8;
Console.WriteLine(System.Text.Encoding.UTF8.GetString(utf8Bytes));
Concepts Behind the Snippet
Prior to C# 11, you would typically create a UTF-8 byte array from a string using `System.Text.Encoding.UTF8.GetBytes()`. UTF-8 string literals offer a more direct and convenient syntax. The `u8` suffix tells the compiler to generate the UTF-8 byte representation at compile time, which avoids the runtime encoding step and, as long as you stay with the span, the array allocation. It's important to understand that the literal gives you a `ReadOnlySpan<byte>`, not a `string` and not a `byte[]`. Because `ReadOnlySpan<byte>` is a `ref struct`, it cannot be stored in a class field; call `.ToArray()` when you need a `byte[]` that outlives the current method.
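The following is a minimal sketch of what the literal gives you, assuming a C# 11 or later compiler and a project that uses top-level statements. The variable names are illustrative only.

```csharp
using System;
using System.Text;

// The literal's type is ReadOnlySpan<byte>; the bytes are encoded at compile time.
ReadOnlySpan<byte> greeting = "Hello, World!"u8;

// Copy into a heap-allocated array only when an API requires byte[].
byte[] greetingArray = greeting.ToArray();

// Decode back to a regular (UTF-16) string for display.
string decoded = Encoding.UTF8.GetString(greeting);

Console.WriteLine(greeting.Length); // 13 (one byte per ASCII character)
Console.WriteLine(decoded);         // Hello, World!
```

Staying with the span avoids a copy; `ToArray()` allocates a new array on every call, so cache the result if you need it repeatedly.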
Real-Life Use Case
A common scenario is interacting with web APIs or message brokers such as Kafka that send and receive UTF-8 encoded data, for example JSON payloads. Another is writing fixed text such as headers, delimiters, or protocol keywords to files or network streams where UTF-8 is the required encoding. For example, if you are building a microservice that sends messages to another service that expects UTF-8, a UTF-8 string literal lets you embed the constant parts of each message without encoding them at runtime.
// Sending a UTF-8 encoded JSON payload
using System.Text.Json;
var data = new { Message = "Hello from C#" };
byte[] jsonData = JsonSerializer.SerializeToUtf8Bytes(data);
// Alternatively, for a fixed payload, embed the JSON text directly as a UTF-8 literal:
// byte[] jsonData = "{\"Message\":\"Hello from C#\"}"u8.ToArray();
// Send jsonData to your API endpoint
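For the stream scenario, a UTF-8 literal can be passed straight to `Stream.Write`, which has a `ReadOnlySpan<byte>` overload. The sketch below uses a `MemoryStream` as a stand-in for a file or network stream; that choice, and the payload itself, are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// MemoryStream stands in for a file or network stream in this sketch.
using var stream = new MemoryStream();

// Stream.Write accepts ReadOnlySpan<byte>, so no intermediate byte[] is needed.
stream.Write("{\"Message\":\"Hello from C#\"}"u8);
stream.Write("\n"u8);

// Decode what was written, only to show the result.
string written = Encoding.UTF8.GetString(stream.ToArray());
Console.Write(written);
```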
Best Practices
Interview Tip
When asked about new features in C#, mention UTF-8 string literals as a convenient way to work with UTF-8 encoded data directly. Explain how they can improve performance by encoding at compile time and reduce boilerplate code. Be prepared to discuss scenarios where they would be particularly useful, such as interacting with web APIs or handling data streams.
When to use them
Use UTF-8 string literals primarily when:
Memory Footprint
UTF-8 can use less memory than UTF-16 for text that is mostly ASCII: each ASCII character takes 1 byte in UTF-8 but 2 bytes in UTF-16. Non-ASCII characters take 2 to 4 bytes in UTF-8, so some text ends up larger; most CJK characters, for example, need 3 bytes in UTF-8 but only 2 in UTF-16. If your application holds many string constants consisting mainly of ASCII characters, using UTF-8 string literals might lead to a small reduction in memory consumption.
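The size difference is easy to check with `Encoding.GetByteCount`. This is a small sketch with sample strings chosen for illustration; the CJK text is written with `\u` escapes so the example does not depend on the source file's encoding.

```csharp
using System;
using System.Text;

string ascii = "Hello, World!";
string cjk = "\u65E5\u672C\u8A9E"; // three CJK characters

int asciiUtf8 = Encoding.UTF8.GetByteCount(ascii);     // 13
int asciiUtf16 = Encoding.Unicode.GetByteCount(ascii); // 26
int cjkUtf8 = Encoding.UTF8.GetByteCount(cjk);         // 9
int cjkUtf16 = Encoding.Unicode.GetByteCount(cjk);     // 6

// A u8 literal has the same size as the runtime UTF-8 encoding.
int literalLength = "Hello, World!"u8.Length;          // 13

Console.WriteLine($"ASCII: UTF-8 {asciiUtf8} bytes, UTF-16 {asciiUtf16} bytes");
Console.WriteLine($"CJK:   UTF-8 {cjkUtf8} bytes, UTF-16 {cjkUtf16} bytes");
```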
Alternatives
The primary alternative to UTF-8 string literals is using `System.Text.Encoding.UTF8.GetBytes()` to convert a regular C# string to a UTF-8 byte array at runtime. Another alternative, although less common and harder to read, is writing the UTF-8 byte values by hand in a `byte[]` initializer. UTF-8 string literals are generally preferred for constant text because they are simpler and avoid the runtime encoding step; for text that is only known at runtime, `GetBytes()` remains the tool to use.
// Alternative: Using Encoding.UTF8.GetBytes()
string myString = "Hello, World!";
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes(myString);
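To confirm that the two approaches produce identical bytes, a quick comparison can be sketched as follows (assuming C# 11 or later; the variable names are illustrative):

```csharp
using System;
using System.Text;

// Encoded by the compiler.
ReadOnlySpan<byte> fromLiteral = "Hello, World!"u8;

// Encoded at runtime; allocates a new array on each call.
ReadOnlySpan<byte> fromEncoder = Encoding.UTF8.GetBytes("Hello, World!");

// SequenceEqual compares the two spans byte by byte.
bool same = fromLiteral.SequenceEqual(fromEncoder);
Console.WriteLine(same); // True
```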
Pros
Cons
FAQ
How do I convert the bytes of a UTF-8 string literal back to a regular C# string?
Use the `System.Text.Encoding.UTF8.GetString()` method, which accepts both `ReadOnlySpan<byte>` and `byte[]`. For example:
ReadOnlySpan<byte> utf8Bytes = "Hello, World!"u8;
string regularString = System.Text.Encoding.UTF8.GetString(utf8Bytes);
Can I use escape sequences in UTF-8 string literals?
Yes, you can use escape sequences like `\n` (newline), `\t` (tab), and `\uXXXX` (Unicode character) within UTF-8 string literals, just like in regular string literals. The compiler interprets these escape sequences and encodes the resulting characters in UTF-8. The `u8` suffix also works on verbatim and raw string literals, but not on interpolated strings.
Are UTF-8 string literals automatically null-terminated?
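Not as part of the data. The resulting `ReadOnlySpan<byte>` covers only the UTF-8 encoded bytes of the string, and its `Length` does not count a terminator. The compiler does place a zero byte immediately after the literal's data in memory to help with interop calls that expect null-terminated strings, but that byte lies outside the span and is lost as soon as you copy the data, for example with `.ToArray()`. If you need a null terminator inside the data itself for compatibility with certain C APIs or other systems, add it explicitly by ending the literal with `\0`.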
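The two answers above, on escape sequences and null termination, can be checked with a short sketch. The literals are arbitrary examples, and the code assumes C# 11 or later.

```csharp
using System;

// Escape sequences are interpreted first, then the result is encoded as UTF-8.
ReadOnlySpan<byte> tab = "\t"u8;            // 1 byte: 0x09
ReadOnlySpan<byte> eAcute = "\u00E9"u8;     // 2 bytes: 0xC3 0xA9

// The span's length counts only the encoded characters.
ReadOnlySpan<byte> plain = "abc"u8;         // Length 3

// An explicit \0 makes the terminator part of the data.
ReadOnlySpan<byte> terminated = "abc\0"u8;  // Length 4, last byte is 0

Console.WriteLine(tab.Length);        // 1
Console.WriteLine(eAcute.Length);     // 2
Console.WriteLine(plain.Length);      // 3
Console.WriteLine(terminated.Length); // 4
```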