Understanding Bytes and Runes in Go
This example demonstrates the difference between bytes and runes in Go, and how they are used to represent characters. It showcases how runes handle Unicode characters, including those outside the ASCII range, while bytes are suitable for basic ASCII or binary data.
Introduction to Bytes and Runes
In Go, both `byte` and `rune` are fundamental data types used to represent characters. However, they differ in their representation and intended use. `byte` is an alias for `uint8`, representing an 8-bit unsigned integer. `rune`, on the other hand, is an alias for `int32`, representing a Unicode code point. This distinction is crucial when dealing with text, especially when handling characters outside the basic ASCII range (0-127), such as those found in many international languages.
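The alias relationship described above can be checked directly. The following is a minimal sketch; the helper name `typeName` is made up for this illustration and is not part of any library.

```go
package main

import "fmt"

// typeName is a hypothetical helper: it reports the type of v as printed by %T.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	var b byte = 'A' // 'A' is code point 65 and fits in 8 bits
	var r rune = '世' // '世' is code point U+4E16 and does not fit in 8 bits

	// Because byte and rune are aliases, %T prints the underlying names.
	fmt.Println(typeName(b), b) // uint8 65
	fmt.Println(typeName(r), r) // int32 19990
}
```

Since the names are aliases rather than distinct types, a `byte` can be passed wherever a `uint8` is expected without conversion, and likewise for `rune` and `int32`.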
Code Example: Iterating Through a String
This code snippet iterates through the string "Hello, 世界!" in two ways. First, it iterates by byte, printing the hexadecimal representation of each byte. Notice that the non-ASCII characters '世' and '界' are each represented by three bytes. Second, it iterates by rune, using the `range` keyword. This decodes the UTF-8 correctly, printing the Unicode code point and the character itself. The index reflects the byte position of the beginning of each rune.
package main

import "fmt"

func main() {
	str := "Hello, 世界!"

	// Indexing a string yields one byte (uint8) per position.
	fmt.Println("Iterating by byte:")
	for i := 0; i < len(str); i++ {
		fmt.Printf("Index: %d, Byte: %x\n", i, str[i])
	}

	// range decodes UTF-8: each step yields a rune and its starting byte index.
	fmt.Println("\nIterating by rune:")
	for index, runeValue := range str {
		fmt.Printf("Index: %d, Rune: %U, Character: %c\n", index, runeValue, runeValue)
	}
}
Understanding the Output
When iterating by byte, you'll see that ASCII characters are represented by a single byte each. However, non-ASCII characters like '世' and '界' require multiple bytes (three each here). This is because they fall outside the ASCII range and are encoded using UTF-8, a variable-width encoding. Iterating by rune ensures that each code point, regardless of how many bytes it occupies, is treated as a single unit. This is essential for correct text processing.
Real-Life Use Case: Text Processing
Runes are particularly useful when you need to process text where characters can be represented by multiple bytes, such as in multilingual applications. For example, if you're counting the number of characters in a string, you should iterate by rune to ensure accurate results. Bytes are suitable when you want to treat the string as a sequence of raw bytes, for instance in binary data processing, or when working with ASCII-only text.
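As a rough illustration of the counting case, the sketch below compares the byte length of a string with its rune count. The function name `countBytesAndRunes` is invented for this example; `utf8.RuneCountInString` comes from the standard `unicode/utf8` package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countBytesAndRunes is a hypothetical helper that returns the length of s
// in bytes and in runes (Unicode code points).
func countBytesAndRunes(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := countBytesAndRunes("Hello, 世界!")
	// 8 ASCII characters at 1 byte each plus 2 characters at 3 bytes each.
	fmt.Printf("bytes: %d, runes: %d\n", b, r) // bytes: 14, runes: 10
}
```

`utf8.RuneCountInString` counts without allocating, which is usually preferable to `len([]rune(s))` when only the count is needed.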
Best Practices
- Use `rune` when you need to work with individual Unicode characters.
- Use `byte` when you're dealing with raw byte streams or ASCII-only text.
- Use `range` to iterate by rune.
Interview Tip
Be prepared to explain the difference between `byte` and `rune` in Go. Understand their underlying types (`uint8` and `int32`, respectively) and when each should be used. Demonstrate your understanding with examples of handling Unicode characters.
When to use them
Use `byte` for handling binary data, ASCII characters, or when storage space is a primary concern and you're certain you won't encounter non-ASCII characters. Use `rune` when working with text that might contain Unicode characters and when you need to process text character by character.
Memory footprint
A `byte` occupies 1 byte of memory (8 bits), while a `rune` occupies 4 bytes of memory (32 bits). Therefore, storing text as a `[]byte` can be more memory-efficient than a `[]rune` if you're only dealing with ASCII characters. However, using `rune` ensures proper handling of all Unicode characters, at the cost of increased memory consumption.
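To make the size difference concrete, here is a small sketch that compares the element storage of the same text held as a `[]byte` and as a `[]rune`. The function name `payloadSizes` is an assumption made for this example, and the figures ignore slice headers and any rounding done by the allocator.

```go
package main

import (
	"fmt"
	"unsafe"
)

// payloadSizes is a hypothetical helper: it returns the element storage, in
// bytes, needed to hold s as a []byte and as a []rune.
func payloadSizes(s string) (asBytes, asRunes int) {
	bs := []byte(s) // one element per UTF-8 byte
	rs := []rune(s) // one element per code point
	asBytes = len(bs) * int(unsafe.Sizeof(byte(0)))
	asRunes = len(rs) * int(unsafe.Sizeof(rune(0)))
	return asBytes, asRunes
}

func main() {
	b, r := payloadSizes("Hello, 世界!")
	fmt.Println(b, r) // 14 40

	b, r = payloadSizes("Hello")
	fmt.Println(b, r) // 5 20
}
```

For ASCII-only text the rune form is four times larger, while for text dominated by three-byte characters the gap narrows.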
Alternatives
Instead of iterating by bytes, you can convert the string to a rune slice with `runes := []rune(str)`. The conversion allocates more memory, but it lets you take the length and index by rune instead of by byte.
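A short sketch of the rune-slice approach follows. The helper `runeAt` is hypothetical and exists only to show rune-based indexing; it converts the whole string on every call, so it is not meant for hot loops.

```go
package main

import "fmt"

// runeAt is a hypothetical helper that returns the i-th rune of s, counting
// runes rather than bytes. It allocates a new slice and panics if i is out
// of range.
func runeAt(s string, i int) rune {
	return []rune(s)[i]
}

func main() {
	str := "Hello, 世界!"
	runes := []rune(str)

	fmt.Println(len(str), len(runes)) // 14 10

	// Index 7 of the string is the first byte of '世'; index 7 of the rune
	// slice is the whole character.
	fmt.Printf("%x %c\n", str[7], runes[7]) // e4 世

	// Slicing by rune index and converting back yields valid text.
	fmt.Println(string(runes[7:9])) // 世界

	fmt.Printf("%c\n", runeAt(str, 8)) // 界
}
```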
Pros
- `byte`: Memory efficient for ASCII-only data and raw byte streams.
- `rune`: Correctly handles all Unicode characters, enabling robust text processing.
Cons
- `byte`: Cannot properly represent Unicode characters outside the ASCII range.
- `rune`: Consumes more memory compared to `byte`.
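The byte-level limitation shows up as soon as text is cut at an arbitrary byte offset. The sketch below contrasts two hypothetical helpers, `truncateBytes` and `truncateRunes` (names invented for this example); both assume a non-negative `n`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateBytes keeps the first n bytes of s, ignoring rune boundaries.
func truncateBytes(s string, n int) string {
	if n > len(s) {
		return s
	}
	return s[:n]
}

// truncateRunes keeps the first n runes of s, so it never cuts through the
// middle of a multi-byte character.
func truncateRunes(s string, n int) string {
	count := 0
	for i := range s { // i is the byte offset where each rune starts
		if count == n {
			return s[:i]
		}
		count++
	}
	return s
}

func main() {
	str := "Hello, 世界!"

	cut := truncateBytes(str, 8) // ends after the first byte of '世'
	fmt.Println(utf8.ValidString(cut)) // false

	safe := truncateRunes(str, 8)
	fmt.Println(safe, utf8.ValidString(safe)) // Hello, 世 true
}
```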
FAQ
- What is the underlying type of `byte` in Go?
  The underlying type of `byte` is `uint8`, which represents an 8-bit unsigned integer.
- What is the underlying type of `rune` in Go?
  The underlying type of `rune` is `int32`, which represents a 32-bit integer and is used to store Unicode code points.
- Why should I use `range` when iterating over a string in Go?
  Using `range` when iterating over a string ensures that you correctly handle Unicode characters, as it iterates by rune rather than by byte.
- When should I use `byte` instead of `rune`?
  Use `byte` when you are working with raw byte streams, ASCII-only text, or when memory efficiency is a primary concern and you are certain that you won't encounter non-ASCII characters.