Null Character

The null character, also known as the null terminator, is a character with the value zero. Besides its original role as a no-operation (NOP) control character, it is now best known as the control character that marks the end of a string in C-like data formats. In essence, the null terminator encodes the end of a string within the string's own contents. The common alternative is to store the length of the string explicitly; many languages other than C do this, which is safer and less error-prone.

How the string "osdev" would get encoded in a C-like data format

In ASCII, Unicode, and other character sets, it is encoded with all n bits cleared (in other words, as the value zero represented in n bits). From this point forward, "null character" or "null terminator" refers to the 8-bit ASCII NUL character, represented as 0x00 in hexadecimal or, in C, as the escape sequence '\0' (not to be confused with "\0", which in C is a null-terminated string containing one explicit null character).
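The difference between the character constant and the string literal can be checked with sizeof. The following is a minimal sketch in C; the helper function names are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */

/* Bytes occupied by the literal "osdev": five letters plus the terminator. */
size_t osdev_literal_size(void)
{
    return sizeof("osdev");
}

/* Bytes occupied by "\0": the explicit null plus the implicit terminator. */
size_t nul_string_size(void)
{
    return sizeof("\0");
}

/* Returns 1 if the last byte of the "osdev" array is the null character. */
int osdev_ends_with_nul(void)
{
    const char s[] = "osdev";
    return s[sizeof(s) - 1] == '\0';
}
```

Under these assumptions, "osdev" occupies six bytes in memory even though strlen() would report a length of five.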

Using a null value to mark an end is not limited to strings: an array of n strings can also hold a NULL pointer as its element at index n to mark its end, as argv[] does (the standard way of passing arguments to main() in C).
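As a rough sketch of the same idea applied to pointers, the function below counts the entries of a NULL-terminated array of strings, the layout argv[] uses. The name count_strings is hypothetical and not part of any standard library.

```c
#include <stddef.h>

/* Hypothetical helper: counts the entries of a NULL-terminated
 * array of strings by scanning for the terminating NULL pointer. */
size_t count_strings(char *const *vec)
{
    size_t n = 0;

    while (vec[n] != NULL) {
        n++;
    }

    return n;
}
```

Called as count_strings(argv) inside main(), this should return the same value as argc, since argv[argc] is a NULL pointer.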

How to use and detect a null terminator

In C, a null terminator is automatically appended to a string literal written between double quotation marks ("Hello, OSDev!" is an array of 14 characters, because the '!' is followed by a '\0', or null byte). The standard library relies on this to determine the length of strings and operate on them accordingly. Behind the header files, the code that detects the null terminator is simple, but important to understand.

size_t strlen(const char *str)
{
    size_t n = 0;
    
    while (str[n] != '\0') {
        n++;
    }
    
    return n;
}

This C code is a simple implementation of the standard strlen() function. It increments n while the n-th character of the string str is not a null terminator, then returns n. If the string contains no null byte, the loop keeps running until it finds a null byte somewhere else in memory. If memory protection is enabled, it may end up reading from an invalid address and trigger an exception. Keep in mind that detecting a null terminator is only useful in the context of strings, as memory in general can contain arbitrary zeros that do not mark the end of its contents.
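One common way to limit the damage of a missing terminator is to give the scan an upper bound, in the spirit of POSIX strnlen(). The following is a minimal sketch; the name bounded_strlen and its max parameter are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical bounded variant of strlen(): never reads more than
 * max bytes, so a missing terminator cannot make it run past the
 * end of a buffer whose size is known. */
size_t bounded_strlen(const char *str, size_t max)
{
    size_t n = 0;

    while (n < max && str[n] != '\0') {
        n++;
    }

    return n;
}
```

If the function returns max, the caller knows that no null terminator was found within the buffer and can treat the string as malformed.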

In assembly

So far we have covered the null terminator in C, but not in assembly. In most assembly dialects, string variables (or, more accurately, labels) can be defined as null-terminated when they are created. Some assemblers accept a directive such as .asciiz (spelled .asciz in GNU as) followed by a string between double quotation marks to create (usually in the data section) a string that you can then treat as a normal C-like string. NASM has no such directive, so the terminating zero is written explicitly. Example for x86-64, using NASM:

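bits 64

global main
extern printf

section .data

msg:    db  "Hello, OSDev!"         ; NON-NULL-TERMINATED STRING
msgnul: db  ", and goodbye!", 0     ; NULL-TERMINATED STRING

section .text

main:
    sub     rsp, 8                  ; keep the stack 16-byte aligned for the call
    lea     rdi, [rel msg]          ; first argument: address of msg
    xor     eax, eax                ; variadic call: no vector register arguments
    call    printf                  ; non-PIE link assumed (e.g. gcc -no-pie)
    add     rsp, 8                  ; restore the stack pointer
    xor     eax, eax                ; return 0 from main
    ret

Because printf reads characters starting at the address "msg" until it reaches a null byte, and "msgnul" is placed immediately after it in memory, the output of the program will be: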

"Hello, OSDev!, and goodbye!"

OS design considerations

Null terminators are inherently error-prone, given that a missing null character will cause string-handling code to keep reading memory until it hits a null byte somewhere. Additionally, they make some string operations less efficient than they should be: since the string length is not explicitly recorded, a strlen()-like function must be called every time the length is needed, which means iterating over the entire string.

The proper alternative is to record the string length explicitly. This is how many programming languages other than C handle strings, especially more modern ones such as Zig or Rust. Pascal, for example, traditionally records the string length in a single byte before the start of the string data; other languages store it as part of a structure, along with the string data pointer. These strategies can be implemented in C too.
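As an illustration of the structure-based approach, here is a minimal sketch of a pointer-plus-length string in C. The type lstring and its functions are hypothetical names chosen for this example, and the code assumes strlen() and memcmp() are available (in a freestanding kernel you would supply your own).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string: a pointer plus an explicit
 * length. The data does not need to be null-terminated. */
struct lstring {
    const char *data;
    size_t len;
};

/* Wraps an existing C string, measuring its length exactly once. */
struct lstring lstring_from_cstr(const char *cstr)
{
    struct lstring s = { cstr, strlen(cstr) };
    return s;
}

/* Compares two strings; the lengths are checked first, without
 * touching the data. */
int lstring_equal(struct lstring a, struct lstring b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}

/* Returns a view of part of a string without copying it; start and
 * count are clamped so the view never leaves the original data. */
struct lstring lstring_slice(struct lstring s, size_t start, size_t count)
{
    struct lstring out;

    if (start > s.len) {
        start = s.len;
    }
    if (count > s.len - start) {
        count = s.len - start;
    }

    out.data = s.data + start;
    out.len = count;
    return out;
}
```

With this layout, obtaining the length is a field read rather than a scan, and a substring can be taken without copying or writing a terminator into the original buffer.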
