GCC’s inline assembly allows us to embed assembly code in C code. C code is much more readable than writing assembly code directly. This article will introduce how to use GCC inline assembly.
Basic Asm
GCC supports two inline assembly syntaxes. The first is basic asm, which contains only assembler instructions and no operands. Its syntax is as follows.
asm [qualifiers](AssemblerInstructions);
The asm keyword is a GNU extension. When compiling C code with the -ansi or -std option and selecting a C dialect without GNU extensions, use __asm__ instead. For C++, asm is a standard keyword.
Basic asm has two qualifiers.
- volatile: No effect. All basic asm are implicitly volatile.
- inline: Please refer to Size of an asm.
AssemblerInstructions can be any assembly code, including instructions and directives. GCC does not parse the content; it outputs the AssemblerInstructions string directly. AssemblerInstructions can contain multiple lines of assembly code, with \n\t separating each line.
GCC recommends using extended asm instead of basic asm. But one advantage of basic asm is that it can appear anywhere, not necessarily inside a function.
asm(".code16gcc");
void func() {
    asm("xorw %ax, %ax\n\t"
        "movw %ax, %ds");
}
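As a small complement to the fragment above, the following sketch shows basic asm inside a complete function, using the __asm__ spelling and the \n\t separator. The function name run_two_nops is hypothetical, and the sketch assumes the target assembler accepts a nop mnemonic.

```c
/* Basic asm has no operands, so GCC passes the string to the assembler unchanged. */
int run_two_nops(void)
{
    /* __asm__ also works when GNU extensions are disabled (-ansi, -std=c99, ...). */
    __asm__("nop\n\t"   /* first instruction, terminated by \n\t */
            "nop");     /* second instruction */
    return 1;           /* reached only after both instructions have executed */
}
```

Adjacent string literals are concatenated by the compiler, which is why each instruction can sit on its own source line.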
Extended Asm
The second type of inline assembly supported by GCC is extended asm. It contains assembler instructions and operands, and its syntax is as follows.
asm [qualifiers](AssemblerTemplate : OutputOperands [: InputOperands [: Clobbers]]);
asm [qualifiers](AssemblerTemplate : OutputOperands : InputOperands : Clobbers : GotoLabels);
As with basic asm, the asm keyword is a GNU extension. When compiling C code with the -ansi or -std option and selecting a C dialect without GNU extensions, use __asm__ instead.
Extended asm has three qualifiers.
- volatile: Extended asm typically accepts input values and produces output values, so GCC may delete the statement when its outputs are unused or move it out of a loop. If your asm statement has side effects beyond its outputs, use volatile to disable these optimizations.
- inline: Please refer to Size of an asm.
- goto: Tell GCC that this asm statement may perform a jump to a label listed in GotoLabels.
Unlike basic asm, GCC parses AssemblerTemplate. Therefore, when accessing registers in AssemblerTemplate, use %% instead of % so that GCC outputs a single %. In addition, extended asm can only appear inside functions. As with basic asm, we can include multiple lines of assembly code and use \n\t to separate them.
void func() {
    asm("xorw %%ax, %%ax\n\t"
        "movw %%ax, %%ds"
        : );
}
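To make the difference between %% and % concrete, here is a minimal sketch that mixes a literal register name with a numbered operand. The function name load_forty_two is hypothetical, the assembly assumes an x86 target with AT&T syntax, and a plain C fallback is included only so that the sketch still builds on other architectures. It also lists eax as a clobber, which is covered in the Clobbers section.

```c
#include <stdint.h>

uint32_t load_forty_two(void)
{
    uint32_t out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl $42, %%eax\n\t"  /* %%eax is emitted as the literal %eax */
            "movl %%eax, %0"       /* %0 is replaced by the register chosen for out */
            : "=r" (out)           /* output operand */
            : /* no inputs */
            : "eax");              /* eax is modified, so GCC must not use it for out */
#else
    out = 42;                      /* fallback for non-x86 targets (assumption) */
#endif
    return out;
}
```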
Constraints
Before talking about output and input operands, we need to talk about constraints, because operands use them. Constraints are strings that describe operands. An operand can have multiple constraints. The following are several commonly used constraints. For other constraints, please refer to Simple Constraints. In addition to these simple constraints, each architecture supports some specific constraints. For those, please refer to Constraints for Particular Machines.
Constraint | Description |
---|---|
m | A memory operand is allowed. |
r | A general-purpose register operand is allowed. |
i | An immediate integer operand is allowed. |
g | Any general-purpose register, memory or immediate integer operand is allowed. |
0 , 1 , …, 9 | An operand that matches the specified operand number is allowed. |
A constraint string can contain multiple constraints. For example, rm means that the operand can be a general-purpose register or a memory operand. When the constraint string contains multiple constraints, GCC selects the most efficient one based on the current context.
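The constraints in the table can be tried out with a short sketch. Both function names below are hypothetical, the assembly assumes an x86 target with AT&T syntax, and the C fallbacks exist only so the sketch builds elsewhere. The = modifier used on the outputs marks them as write-only.

```c
#include <stdint.h>

/* "rm" lets GCC pass x either in a register or directly from memory. */
uint32_t copy_value(uint32_t x)
{
    uint32_t out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %1, %0" : "=r" (out) : "rm" (x));
#else
    out = x;        /* fallback for non-x86 targets (assumption) */
#endif
    return out;
}

/* "i" requires a compile-time constant; on x86 it is emitted as $1234. */
uint32_t load_immediate(void)
{
    uint32_t out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %1, %0" : "=r" (out) : "i" (1234));
#else
    out = 1234;     /* fallback for non-x86 targets (assumption) */
#endif
    return out;
}
```

The output stays in a register in both functions because an x86 mov cannot take two memory operands.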
In addition, GCC provides constraint modifiers that modify how an operand is treated. Two constraint modifiers are listed below. For other constraint modifiers, please refer to Constraint Modifier Characters.
Constraint Modifier | Description |
---|---|
= | This operand is written to by this instruction. |
+ | This operand is both read and written by the instruction. |
Operands
An asm statement has 0 or more operands, separated by commas. The format of each operand is as follows.
[[asmSymbolicName]] constraint (cVariableName)
[[asmSymbolicName]] constraint (cExpression)
Each operand must have a constraint and a corresponding C variable name or C expression. In an asm statement, GCC numbers each operand, starting from 0. Therefore, we can refer to an operand in the asm statement by its number. In the following code, %0 refers to bar and %1 refers to foo.
uint32_t foo = 1234; uint32_t bar; asm("movl %1, %0" : "=r" (bar) : "r" (foo));
Each asm statement can have at most 30 operands. An operand that uses the + constraint modifier counts as 2 operands.
In addition to using numbers to refer to operands, GCC also allows us to use names to refer to operands, as follows.
uint32_t foo = 1234; uint32_t bar; asm("movl %[aFoo], %[aBar]" : [aBar] "=r" (bar) : [aFoo] "r" (foo));
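Symbolic names become more useful as the number of operands grows. The following sketch wraps a named-operand asm statement in a complete function. The name swap_u32 and the symbolic names aX and aY are hypothetical, and the xchgl instruction assumes an x86 target.

```c
#include <stdint.h>

void swap_u32(uint32_t *a, uint32_t *b)
{
    uint32_t x = *a;
    uint32_t y = *b;
#if defined(__i386__) || defined(__x86_64__)
    /* Both operands are read and written, so both use the + modifier. */
    __asm__("xchgl %[aX], %[aY]"
            : [aX] "+r" (x), [aY] "+r" (y));
#else
    uint32_t tmp = x;   /* fallback for non-x86 targets (assumption) */
    x = y;
    y = tmp;
#endif
    *a = x;
    *b = y;
}
```

Reordering the operand list does not require touching the template, which is the main reason to prefer names over numbers.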
Output Operands
An asm statement has 0 or more output operands, separated by commas. They name the C variables that the assembly code modifies. An output operand must have a constraint modifier, which must be either = (write-only) or + (read-write).
[[asmSymbolicName]] constraint (cVariableName)
In the following code, foo is both read and written, so its constraint modifier is +.
uint32_t foo = 1234; asm("xorl %0, %0" : "+r" (foo));
The following code is equivalent to the code above. In the output operands, we declare foo as write-only. In the input operands, the matching constraint 0 makes foo use the same location as %0. Therefore, %0 and %1 use the same register.
uint32_t foo = 1234; asm("xorl %1, %0" : "=r" (foo) : "0" (foo));
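A matching constraint is also handy when an instruction overwrites one of its sources, as two-operand x86 arithmetic does. The sketch below is one way to express an addition that way. The name add_u32 is hypothetical, the addl instruction assumes an x86 target, and the cc clobber (explained in the Clobbers section) is listed because addl changes the flags.

```c
#include <stdint.h>

uint32_t add_u32(uint32_t a, uint32_t b)
{
    uint32_t sum;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("addl %2, %0"     /* %0 starts out holding a, then receives a + b */
            : "=r" (sum)      /* %0: write-only output */
            : "0" (a),        /* %1: placed in the same register as %0 */
              "r" (b)         /* %2: any other general-purpose register */
            : "cc");
#else
    sum = a + b;              /* fallback for non-x86 targets (assumption) */
#endif
    return sum;
}
```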
Input Operands
An asm statement has 0 or more input operands, separated by commas. They name the C variables or C expressions that the assembly code reads. An input operand does not need a constraint modifier. If it has one, it cannot be = or +.
[[asmSymbolicName]] constraint (cExpression)
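The following code shows how to use input operands. The output uses the r constraint because an x86 mov instruction cannot take two memory operands.
uint32_t c = 1; uint32_t d; uint32_t *e = &c; asm("movl %1, %0" : "=r" (d) : "rm" (*e));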
Clobbers
Besides writing to the locations listed in the output operands, an asm statement may have side effects that modify other locations. We must list these other locations in the clobbers to inform GCC that they will be modified.
Clobbers cannot overlap input and output operands. Clobbers can contain the following values.
- Register names: Indicates that the assembly code will modify other registers, such as rax or r1. The stack pointer register cannot be listed.
- cc : Indicates that the assembly code will modify the flags register.
- memory : Tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands. To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the memory clobber effectively forms a read/write memory barrier for the compiler.
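The memory clobber is often used on its own, with an empty template, as a compiler-only barrier. The sketch below illustrates that idiom. The names shared_flag and store_then_reload are hypothetical, and the barrier constrains only the compiler, not the CPU, so it does not replace a hardware fence.

```c
static int shared_flag;

int store_then_reload(int value)
{
    shared_flag = value;

    /* Empty template: no instructions are emitted. The memory clobber tells
     * GCC that memory may have been read or written here, so the store above
     * must be completed first and the load below must be performed again. */
    __asm__ volatile("" : : : "memory");

    return shared_flag;
}
```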
In the following code, the bsfl instruction modifies the flags register, so cc is listed in the clobbers.
uint32_t mask = 1234; uint32_t index; asm("bsfl %1, %0" : "=r" (index) : "r" (mask) : "cc");
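The same statement can be wrapped in a function to make it testable. In this sketch the name lowest_set_bit is hypothetical, the bsfl instruction assumes an x86 target, and the caller is assumed to pass a nonzero mask because the result of bsf is undefined for zero.

```c
#include <stdint.h>

/* Returns the index of the least significant set bit. mask must be nonzero. */
uint32_t lowest_set_bit(uint32_t mask)
{
    uint32_t index;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("bsfl %1, %0"
            : "=r" (index)
            : "r" (mask)
            : "cc");              /* bsfl updates the zero flag */
#else
    index = 0;                    /* fallback for non-x86 targets (assumption) */
    while ((mask & 1u) == 0) {
        mask >>= 1;
        index++;
    }
#endif
    return index;
}
```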
In the following code, we directly modify eax, so eax must be listed in the clobbers.
asm("xorl %%eax, %%eax" : /* no output */ : /* no input */ : "eax");
Goto Labels
The goto qualifier allows assembly code to jump to one or more C labels. The GotoLabels section of the asm statement lists the C labels that the assembly code may jump to.
When referencing a label in assembly code, use %l (lowercase L) followed by the label's number. GCC numbers the output and input operands starting from 0 and continues the numbering with the labels. When counting, remember that an operand using + counts as 2 operands. In the following code, the two input operands take numbers 0 and 1, so the label carry is referenced as %l2.
asm goto("btl %1, %0\n\t"
         "jc %l2"
         : /* No outputs. */
         : "r" (p1), "r" (p2)
         : "cc"
         : carry);
return 0;
carry:
return 1;
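The fragment above leaves p1 and p2 undefined. The following sketch places it in a complete function. The name bit_is_set and its parameters are hypothetical, the btl and jc instructions assume an x86 target, and the compiler is assumed to support asm goto.

```c
#include <stdint.h>

/* Returns 1 if the given bit of value is set, otherwise 0. */
int bit_is_set(uint32_t value, uint32_t bit)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ goto("btl %1, %0\n\t"   /* copy bit %1 of %0 into the carry flag */
                 "jc %l2"           /* operands are 0 and 1, so the label is 2 */
                 : /* No outputs. */
                 : "r" (value), "r" (bit)
                 : "cc"
                 : carry);
#else
    if ((value >> (bit & 31u)) & 1u)  /* fallback for non-x86 targets (assumption) */
        goto carry;
#endif
    return 0;
carry:
    return 1;
}
```

With a register operand, btl uses the bit index modulo 32, which the fallback mirrors with the & 31u mask.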
Conclusion
This article introduces the basic constraints. Each architecture also has many specific constraints. Before you start using GCC inline assembly, check which additional constraints are available for your target architecture.
Reference
- How to Use Inline Assembly Language in C Code, Using the GNU Compiler Collection.
- GCC’s assembler syntax.