GCC Inline Assembly

GCC’s inline assembly allows us to embed assembly code in C code. We can write most of a program in C, which is much more readable than assembly, and use assembly only where it is needed. This article explains how to use GCC inline assembly.

Basic Asm

GCC supports two inline assembly syntaxes. The first is basic asm, which contains only assembler instructions, without operands. Its syntax is as follows.

asm [qualifiers](AssemblerInstructions);

The asm keyword is a GNU extension. When compiling C code with -ansi, or with a -std option that selects a C dialect without GNU extensions, use __asm__ instead. In C++, asm is a standard keyword.

Basic asm has two qualifiers.

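  • volatile: Has no effect, because all basic asm statements are implicitly volatile.
  • inline: For inlining purposes, the size of the asm statement is taken as the smallest size possible. Please refer to Size of an asm.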

AssemblerInstructions can be any assembly code, including instructions and directives. GCC does not parse it. It outputs the AssemblerInstructions string unchanged. AssemblerInstructions may contain multiple lines of assembly code, with \n\t separating the lines.

GCC recommends using extended asm instead of basic asm. However, basic asm has one advantage: it can appear anywhere, including outside functions.
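The snippet below sketches basic asm outside a function. It defines a small function entirely in file-scope assembly and calls it from C. It assumes x86-64 Linux: the System V ABI returns an int in %eax, and .type and .size are ELF directives. It uses __asm__ so that it also compiles under strict -std modes. The name basic_asm_answer is hypothetical. Note the single % in %eax, because basic asm text is passed to the assembler unchanged.

```c
/*
 * Basic asm at file scope, outside any function.
 * Assumptions: x86-64, System V ABI (int returned in %eax), ELF target
 * such as Linux. basic_asm_answer is a hypothetical name.
 * Only a single % is needed before eax: basic asm text is not processed.
 */
__asm__(".pushsection .text\n\t"
        ".globl basic_asm_answer\n\t"
        ".type basic_asm_answer, @function\n"
        "basic_asm_answer:\n\t"
        "movl $42, %eax\n\t"
        "ret\n\t"
        ".size basic_asm_answer, .-basic_asm_answer\n\t"
        ".popsection");

/* C declaration so that C code can call the assembly function. */
int basic_asm_answer(void);
```

Calling basic_asm_answer() from C should return 42. The .pushsection and .popsection directives restore whatever section the compiler was emitting, so the surrounding C code is unaffected.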

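/* File scope: tell the assembler to emit code for 16-bit real mode */
asm(".code16gcc");

void func()
{
    /* Basic asm: % is not special, so registers take a single % */
    asm("xorw %ax, %ax\n\t"
        "movw %ax, %ds");
}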

Extended Asm

The second type of inline assembly supported by GCC is extended asm. In addition to assembler instructions, it contains operands. Its syntax is as follows. The second form, which adds GotoLabels, is used with the goto qualifier.

asm [qualifiers](AssemblerTemplate
                 : OutputOperands
                 [: InputOperands
                 [: Clobbers]]);

asm [qualifiers](AssemblerTemplate
                 : OutputOperands
                 : InputOperands
                 : Clobbers
                 : GotoLabels);


Extended asm has three qualifiers.

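  • volatile: The typical use of extended asm is to take input values and produce output values. However, an asm statement may also have side effects. GCC's optimizers may discard an asm statement whose outputs are unused. They may also move it out of a loop if they believe it always produces the same result. Use volatile to disable these optimizations. An asm statement without output operands is implicitly volatile, and so is an asm goto statement.
  • inline: Same as for basic asm. Please refer to Size of an asm.
  • goto: Tells GCC that the asm statement may jump to one of the labels listed in GotoLabels.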
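Reading the time-stamp counter is a common case where volatile matters. The asm has no inputs, so without volatile GCC may assume that two executions return the same value and keep only one of them. The sketch assumes x86-64 and GCC or Clang, and read_tsc is a hypothetical name. "=a" and "=d" are x86-specific constraints (see the Constraints section) that bind the outputs to %eax and %edx, which is where rdtsc leaves its result.

```c
#include <stdint.h>

/*
 * Reads the x86 time-stamp counter (x86-64 and GCC/Clang assumed;
 * read_tsc is a hypothetical name).
 * volatile is required: without it, GCC could treat two calls as
 * producing the same value and merge them, or drop a call whose
 * result looks unused.
 * "=a" and "=d" are x86-specific constraints for %eax and %edx.
 */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```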

Unlike basic asm, GCC processes the AssemblerTemplate and replaces operand references such as %0 with the actual operands. Therefore, to name a register in AssemblerTemplate, write %% instead of %, and GCC will output a single %. In addition, extended asm can only appear inside functions. As with basic asm, AssemblerTemplate may contain multiple lines of assembly code, with \n\t separating the lines.

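void func()
{
    /* Extended asm: %% is output as a single % */
    asm("xorw %%ax, %%ax\n\t"
        "movw %%ax, %%ds"
        : /* no outputs */
        : /* no inputs */
        : "ax");  /* tell GCC that %ax is modified */
}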
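The sketch below uses both kinds of % in one template. %0 and %1 are replaced by the operands GCC chooses, while %%eax becomes the literal register %eax. It assumes x86-64 with GCC or Clang, and add_one_via_eax is a hypothetical name. Because the template writes %eax directly, eax is listed in the clobbers (see Clobbers). cc is listed as well, because addl changes the flags.

```c
#include <stdint.h>

/*
 * Mixes operand references (%0, %1) with a literal register (%%eax).
 * Assumes x86-64 and GCC/Clang; add_one_via_eax is a hypothetical name.
 */
static uint32_t add_one_via_eax(uint32_t x)
{
    uint32_t result;
    __asm__("movl %1, %%eax\n\t"   /* %1 is x; %%eax is emitted as %eax */
            "addl $1, %%eax\n\t"
            "movl %%eax, %0"       /* %0 is result */
            : "=r"(result)
            : "r"(x)
            : "eax", "cc");        /* eax and the flags are modified */
    return result;
}
```

add_one_via_eax(41) should return 42.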

Constraints

Before discussing output and input operands, we need to introduce constraints, because every operand uses them. Constraints are strings that describe an operand, for example whether it may be a register, a memory location, or an immediate value. An operand can have multiple constraints. The following are several commonly used constraints; for the others, please refer to Simple Constraints. In addition to these simple constraints, each architecture supports its own specific constraints. Please refer to Constraints for Particular Machines.

Constraint    Description
m             A memory operand is allowed.
r             A general-purpose register operand is allowed.
i             An immediate integer operand is allowed.
g             Any general-purpose register, memory, or immediate integer operand is allowed.
0, 1, …, 9    An operand that matches the specified operand number is allowed.
Simple Constraints.

A constraint string can contain multiple constraints. For example, rm means that the operand can be a general-purpose register or a memory operand. When the constraint string contains multiple constraints, GCC selects the most efficient one for the current context.
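The sketch below shows the m, i, and g constraints. It assumes x86-64 and GCC or Clang, and all function names are hypothetical. In AT&T syntax, GCC prints an i operand as an immediate such as $10 and an m operand as an addressing expression such as (%rdi). For g, the destination stays in a register, so every choice GCC may make for the source still forms a valid addl.

```c
#include <stdint.h>

/* "m": the input is a memory operand (printed as an address). */
static uint32_t load_with_m(const uint32_t *p)
{
    uint32_t r;
    __asm__("movl %1, %0" : "=r"(r) : "m"(*p));
    return r;
}

/* "i": the input must be an immediate; GCC prints it as $10. */
static uint32_t add_ten_with_i(uint32_t x)
{
    __asm__("addl %1, %0" : "+r"(x) : "i"(10) : "cc");
    return x;
}

/* "g": y may be a register, memory, or immediate operand. The
   destination is a register, so addl is valid in every case. */
static uint32_t add_with_g(uint32_t x, uint32_t y)
{
    __asm__("addl %1, %0" : "+r"(x) : "g"(y) : "cc");
    return x;
}
```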

GCC also provides constraint modifiers, which describe how an operand is used. Two of them are listed below. For the others, please refer to Constraint Modifier Characters.

Constraint Modifier    Description
=                      This operand is written to by this instruction; its previous value is discarded.
+                      This operand is both read and written by the instruction.
Constraint Modifiers.
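The difference between = and + can be sketched with two small functions. They assume x86-64 and GCC or Clang, and the names are hypothetical. make_seven never reads its output, so = is enough. double_in_place reads x before overwriting it, so it needs +.

```c
#include <stdint.h>

/* "=": write-only. The old value of r is never read. */
static uint32_t make_seven(void)
{
    uint32_t r;
    __asm__("movl $7, %0" : "=r"(r));
    return r;
}

/* "+": read and written. addl reads x, then stores x + x back. */
static uint32_t double_in_place(uint32_t x)
{
    __asm__("addl %0, %0" : "+r"(x) : : "cc");
    return x;
}
```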

Operands

An asm statement has 0 or more operands, separated by commas. The format of each operand is as follows.

[[asmSymbolicName]] constraint (cVariableName)
[[asmSymbolicName]] constraint (cExpression)

Each operand must have a constraint and a corresponding C variable name or C expression. GCC numbers the operands of an asm statement starting from 0, output operands first and then input operands, so we can refer to an operand in the template by its number. In the following code, %0 refers to bar and %1 refers to foo.

uint32_t foo = 1234;
uint32_t bar;
asm("movl %1, %0"
    : "=r" (bar)
    : "r" (foo));
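Because outputs are numbered before inputs, an asm with one output and two inputs uses %0 for the output and %1 and %2 for the inputs. The sketch below assumes x86-64. leaq computes a + b without touching the flags, so no cc clobber is needed. add_with_lea is a hypothetical name.

```c
#include <stdint.h>

/* %0 = sum (output), %1 = a, %2 = b (inputs). x86-64 assumed. */
static uint64_t add_with_lea(uint64_t a, uint64_t b)
{
    uint64_t sum;
    __asm__("leaq (%1,%2), %0" : "=r"(sum) : "r"(a), "r"(b));
    return sum;
}
```

add_with_lea(40, 2) should return 42.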

An asm statement can have at most 30 operands in total, counting output, input, and goto label operands together. An operand that uses the + constraint modifier counts as 2 operands, because it is both an input and an output.

In addition to numbers, GCC allows us to refer to operands by name. The name is written in square brackets before the constraint and referenced in the template as %[name], as follows.

uint32_t foo = 1234;
uint32_t bar;
asm("movl %[aFoo], %[aBar]"
    : [aBar] "=r" (bar)
    : [aFoo] "r" (foo));
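A named operand can also be referenced more than once in the template. The sketch below assumes x86-64 and uses the hypothetical function name square. It uses [v] as both the source and the destination of imull.

```c
#include <stdint.h>

/* [v] is both the source and the destination of imull (x86-64 assumed). */
static uint32_t square(uint32_t v)
{
    __asm__("imull %[v], %[v]" : [v] "+r"(v) : : "cc");
    return v;
}
```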

Output Operands

An asm statement has 0 or more output operands, separated by commas. They specify the C variables that the assembly code modifies. The constraint of an output operand must begin with a constraint modifier, either = (write-only) or + (read-write).

[[asmSymbolicName]] constraint (cVariableName)

In the following code, foo must be writable and readable, so its constraint modifier is +.

uint32_t foo = 1234;
asm("xorl %0, %0" : "+r" (foo));

The following code is equivalent to the code above. In the output operands, foo is write-only. In the input operands, the matching constraint 0 makes foo use the same location as %0. Therefore, %0 and %1 refer to the same register.

uint32_t foo = 1234;
asm("xorl %1, %0" : "=r" (foo) : "0" (foo));
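A matching constraint also lets the input and the output be different C variables that share one register. In the sketch below, x86-64 is assumed and increment_copy is a hypothetical name. in is loaded into the register chosen for %0, the asm adds 1 to it, and the result is stored in out.

```c
#include <stdint.h>

/* "0" ties the input to output operand 0, so in and out share a register. */
static uint32_t increment_copy(uint32_t in)
{
    uint32_t out;
    __asm__("addl $1, %0" : "=r"(out) : "0"(in) : "cc");
    return out;
}
```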

Input Operands

An asm statement has 0 or more input operands, separated by commas. They specify the C variables or C expressions whose values the assembly code reads. The constraint of an input operand does not need a constraint modifier, and it must not begin with = or +.

[[asmSymbolicName]] constraint (cExpression)

The following code shows how to use input operands.

uint32_t c = 1;
uint32_t d;
uint32_t *e = &c;
/* Only one operand may be in memory: x86 mov cannot copy memory to memory */
asm("mov %1, %0"
    : "=r" (d)
    : "rm" (*e));
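Because an input operand can be any C expression, GCC evaluates the expression and places the result according to the constraint before the asm runs. The sketch below assumes x86-64, and three_x_plus_one is a hypothetical name.

```c
#include <stdint.h>

/* GCC computes x * 3 + 1 into a register, which the asm sees as %1. */
static uint32_t three_x_plus_one(uint32_t x)
{
    uint32_t r;
    __asm__("movl %1, %0" : "=r"(r) : "r"(x * 3 + 1));
    return r;
}
```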

Clobbers

Besides the locations listed in its output operands, an asm statement may modify other locations as a side effect. These are called clobbers. We must list every such location in the clobbers to inform GCC that it will be modified.

Clobbers must not overlap with input or output operands. Clobbers can contain the following values.

  • Register names, such as rax or r1 depending on the architecture: Indicate that the assembly code modifies these registers. Do not list the stack pointer register; GCC requires the stack pointer to have the same value after the asm statement as before it.
  • cc: Indicates that the assembly code modifies the flags register.
  • memory: Tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands. To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the memory clobber effectively forms a read/write memory barrier for the compiler.

In the following code, the bsfl instruction modifies the flags register, so cc is listed in the clobbers.

uint32_t mask = 1234;
uint32_t index;
asm("bsfl %1, %0"
    : "=r" (index)
    : "r" (mask)
    : "cc");

In the following code, the assembly code modifies eax directly, so eax must be listed in the clobbers.

asm("xorl %%eax, %%eax"
    : /* no output */
    : /* no input */
    : "eax");
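The memory clobber matters when the asm writes to memory that no operand describes, for example through a pointer passed in a register. In the sketch below, x86-64 is assumed and store_through_pointer is a hypothetical name. The memory clobber tells GCC that value may have changed, so GCC reads value again instead of assuming it is still 1.

```c
#include <stdint.h>

static uint32_t store_through_pointer(void)
{
    uint32_t value = 1;
    uint32_t *p = &value;
    /* The store to *p is invisible to GCC; "memory" makes it reload value.
       There are no outputs, so this asm is implicitly volatile. */
    __asm__("movl $99, (%0)" : : "r"(p) : "memory");
    return value;
}
```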

Goto Labels

asm’s goto qualifier allows the assembly code to jump to one or more C labels. The GotoLabels section of the asm statement lists every C label that the assembly code may jump to.

To reference a label in the assembly code, use %l (lowercase L) followed by the label's number. GCC numbers the output and input operands starting from 0 and continues the same numbering for the labels. In the following code, p1 and p2 are %0 and %1, so the label carry is %l2. When counting, remember that an operand using the + modifier counts as 2 operands.

/* Returns 1 if bit p2 of p1 is set, 0 otherwise. */
int test_bit(int p1, int p2)
{
    asm goto("btl %1, %0\n\t"
             "jc %l2"
             : /* No outputs. */
             : "r" (p1), "r" (p2)
             : "cc"
             : carry);
    return 0;

carry:
    return 1;
}
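Besides %l plus a number, GCC also accepts the C label name in brackets, for example %l[set], which avoids counting operands. The sketch below assumes x86-64 and GCC. bit_is_set and the label name set are hypothetical. btl copies bit n of x into the carry flag, and jc jumps to set when that bit is 1.

```c
#include <stdint.h>

/* Returns 1 if bit n (0..31) of x is set, else 0. The label is referenced
   by name with %l[set] instead of by number. x86-64 and GCC assumed. */
static int bit_is_set(uint32_t x, uint32_t n)
{
    __asm__ goto("btl %1, %0\n\t"
                 "jc %l[set]"
                 : /* no outputs */
                 : "r"(x), "r"(n)
                 : "cc"
                 : set);
    return 0;
set:
    return 1;
}
```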

Conclusion

This article introduced basic asm, extended asm, and the common constraints. Each architecture also has many specific constraints, so before using GCC inline assembly, check which additional constraints your target architecture provides.

