Understanding Bytes and Runes in Go

This example demonstrates the difference between bytes and runes in Go, and how they are used to represent characters. It showcases how runes handle Unicode characters, including those outside the ASCII range, while bytes are suitable for basic ASCII or binary data.

Introduction to Bytes and Runes

In Go, byte and rune are the two built-in types used to represent character data, but they differ in size and intended use. byte is an alias for uint8 and holds a single 8-bit value. rune is an alias for int32 and holds a Unicode code point. A Go string is a sequence of bytes, conventionally UTF-8 encoded text, which is why the two types give different views of the same string.

This distinction is crucial when dealing with text, especially when handling characters outside the basic ASCII range (0-127), such as those found in many international languages.
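As a quick check of the aliasing described above, the following minimal sketch prints the types Go reports for a byte and a rune value. The helper name typeNames is hypothetical and exists only for this illustration.

```go
package main

import "fmt"

// typeNames (hypothetical helper) returns the type names that fmt reports
// for a byte value and a rune value.
func typeNames() (string, string) {
	var b byte = 'A'  // 'A' is U+0041 and fits in 8 bits
	var r rune = '世' // '世' is U+4E16 and does not fit in 8 bits
	return fmt.Sprintf("%T", b), fmt.Sprintf("%T", r)
}

func main() {
	byteType, runeType := typeNames()
	fmt.Println(byteType, runeType) // prints "uint8 int32"
}
```

Because both names are aliases, the formatted type is the aliased type (uint8, int32) rather than byte or rune.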

Code Example: Iterating Through a String

This code snippet iterates through the string "Hello, 世界!" in two ways. First, it iterates by byte, printing the hexadecimal value of each byte. Notice that the character '世' occupies three bytes (e4 b8 96) in UTF-8.

Second, it iterates by rune, using the range keyword. This correctly handles Unicode characters, printing the Unicode code point and the character itself. The index reflects the byte position of the beginning of each rune.

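package main

import "fmt"

func main() {
	str := "Hello, 世界!"

	// Indexing a string yields individual bytes, so each multi-byte
	// character appears as several separate values.
	fmt.Println("Iterating by byte:")
	for i := 0; i < len(str); i++ {
		fmt.Printf("Index: %d, Byte: %x\n", i, str[i])
	}

	// range decodes UTF-8: index is the byte offset where the rune
	// starts, and runeValue is the decoded code point.
	fmt.Println("\nIterating by rune:")
	for index, runeValue := range str {
		fmt.Printf("Index: %d, Rune: %U, Character: %c\n", index, runeValue, runeValue)
	}
}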

Understanding the Output

When iterating by byte, you'll see that each ASCII character is represented by a single byte. Non-ASCII characters such as '世' and '界' take three bytes each, because Go source and strings use UTF-8, a variable-width encoding in which a code point occupies one to four bytes.

Iterating by rune decodes each UTF-8 sequence into a single code point, however many bytes it occupies. In the rune loop the index jumps from 7 to 10 to 13, because '世' and '界' each span three bytes. This is essential for correct text processing.
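To make the byte-versus-rune view concrete, here is a small sketch that looks at byte offset 7 of the example string both ways. It uses utf8.DecodeRuneInString from the standard unicode/utf8 package; the wrapper runeAt is a hypothetical name introduced for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt (hypothetical helper) decodes the rune that starts at byte
// offset i of s and reports how many bytes it occupies.
func runeAt(s string, i int) (rune, int) {
	return utf8.DecodeRuneInString(s[i:])
}

func main() {
	str := "Hello, 世界!"

	// Indexing returns one byte: only the first byte of '世'.
	fmt.Printf("%x\n", str[7]) // prints "e4"

	// Decoding from the same offset returns the whole code point.
	r, size := runeAt(str, 7)
	fmt.Printf("%c %d\n", r, size) // prints "世 3"
}
```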

Real-Life Use Case: Text Processing

Runes are particularly useful when you need to process text where characters can be represented by multiple bytes, such as in multilingual applications. For example, if you're counting the number of characters in a string, count runes rather than bytes: len(str) returns the number of bytes, which overstates the length of any string containing non-ASCII characters. Bytes are suitable when you want to treat the string as a sequence of raw bytes, for instance in binary data processing, or when working with ASCII-only text.
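The counting case can be sketched as follows. utf8.RuneCountInString is part of the standard library; the wrapper countRunes is a hypothetical name used here only to keep the example testable.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countRunes (hypothetical helper) returns the number of code points in s.
func countRunes(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	str := "Hello, 世界!"
	fmt.Println(len(str))        // prints 14 (bytes)
	fmt.Println(countRunes(str)) // prints 10 (runes)
}
```

Note that this counts code points. Text that uses combining marks can show fewer visible characters than the rune count suggests.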

Best Practices

  • Use rune when you need to work with individual Unicode characters.
  • Use byte when you're dealing with raw byte streams or ASCII-only text.
  • When iterating over strings containing potentially non-ASCII characters, always use range to iterate by rune.

Interview Tip

Be prepared to explain the difference between byte and rune in Go. Understand their underlying types (uint8 and int32, respectively) and when each should be used. Demonstrate your understanding with examples of handling Unicode characters.

When to Use Them

Use byte for handling binary data, ASCII characters, or when storage space is a primary concern and you're certain you won't encounter non-ASCII characters. Use rune when working with text that might contain Unicode characters and when you need to process text character by character.

Memory Footprint

A byte occupies 1 byte of memory (8 bits), while a rune occupies 4 bytes (32 bits). A string is always stored as UTF-8 bytes, so the difference matters when you hold text in a slice: a []byte uses one byte per element, whereas a []rune uses four bytes per code point, even for ASCII characters. Using byte is therefore more memory-efficient for ASCII-only data, while rune gives direct access to every Unicode code point at the cost of increased memory consumption.
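A rough way to see the trade-off is to compare the element storage of a string with that of its []rune conversion. The sketch below assumes 4 bytes per rune and ignores slice and string header overhead; elementBytes is a hypothetical helper.

```go
package main

import "fmt"

// elementBytes (hypothetical helper) returns the bytes used by the string's
// contents and by the elements of the equivalent []rune, ignoring headers.
func elementBytes(s string) (asString, asRunes int) {
	const runeSize = 4 // a rune is an int32
	return len(s), runeSize * len([]rune(s))
}

func main() {
	s, r := elementBytes("Hello, 世界!")
	fmt.Println(s, r) // prints "14 40"
}
```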

Alternatives

Instead of iterating by byte, you can convert the string to a rune slice, for example runes := []rune(str). The conversion allocates a new slice and uses more memory, but it lets you take the length and index by rune position rather than by byte offset.
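A minimal sketch of that conversion, used here to take a substring by rune position. The function runeSubstring is a hypothetical helper, and it assumes start and end are valid rune indices.

```go
package main

import "fmt"

// runeSubstring (hypothetical helper) returns the runes of s in the
// half-open range [start, end), counted in runes rather than bytes.
func runeSubstring(s string, start, end int) string {
	runes := []rune(s) // allocates a new slice, one element per code point
	return string(runes[start:end])
}

func main() {
	str := "Hello, 世界!"
	fmt.Println(len([]rune(str)))         // prints 10
	fmt.Println(runeSubstring(str, 7, 9)) // prints "世界"
}
```

Slicing the string directly with the same indices would cut by byte offset and split '世' in the middle of its encoding.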

Pros

  • byte: Memory efficient for ASCII-only data and raw byte streams.
  • rune: Correctly handles all Unicode characters, enabling robust text processing.

Cons

  • byte: A single byte cannot hold a character outside the ASCII range; in UTF-8 such characters span two to four bytes.
  • rune: Consumes more memory compared to byte.

FAQ

  • What is the underlying type of `byte` in Go?

    The underlying type of byte is uint8, which represents an 8-bit unsigned integer.
  • What is the underlying type of `rune` in Go?

    The underlying type of rune is int32, which represents a 32-bit integer and is used to store Unicode code points.
  • Why should I use `range` when iterating over a string in Go?

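    Using range when iterating over a string decodes the UTF-8 data one rune at a time, so multi-byte characters are handled correctly. Each iteration yields the byte offset where the rune starts and the rune itself. Invalid UTF-8 yields the replacement character U+FFFD.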
  • When should I use `byte` instead of `rune`?

    Use byte when you are working with raw byte streams, ASCII-only text, or when memory efficiency is a primary concern and you are certain that you won't encounter non-ASCII characters.