When building an executable from source files, GCC first compiles each source file into an object file. The linker then combines these object files into an executable file, guided by a linker script. This article introduces the basic syntax of linker scripts.
Table of Contents
- Linkers
- Linker Script
- VMA & LMA
- Defining Symbols
- Location Counter
- SECTIONS – Mapping Input Sections to Output Sections
- PROVIDE – Defining Symbols If Not Defined Yet
- KEEP – Keeping Symbols
- ENTRY – Setting Entry Point
- INPUT – Specifying Input Files
- OUTPUT – Specifying Output Files
- OUTPUT_FORMAT – Setting the Output BFD Format
- OUTPUT_ARCH – Setting the Output Machine Architecture
- Built-in Functions
- Conclusion
- Reference
Linkers
Linkers and Object Files
A linker combines several input files into one output file. The input files can be relocatable files or shared object files, and the output file is typically an executable file.
Relocatable files, shared object files, and executable files are all object files, that is, files stored in an object file format. Several object file formats exist; the one used on Linux is ELF. To understand linkers, you must first understand an object file format, because a linker reads data from several object files and writes it into a single object file.
An ELF object file consists of multiple sections. A linker therefore reads the sections of its input files and, after some processing, writes them to the output file. A linker script tells the linker how to place these sections in the output file. We can pass a linker script to ld with ld -T linker_script_file.
Default Linker Script Used by GCC
On Linux, running ld --verbose prints the default linker script, shown below. When we use GCC to build an executable, GCC first compiles each source file into a relocatable file with the extension .o. These relocatable files are then linked together with libraries such as libc.so into an executable file. If no linker script is specified, the linker uses this default script.
```
/* Script for -z combreloc: combine and sort reloc sections */
/* Copyright (C) 2014-2016 Free Software Foundation, Inc.
   Copying and distribution of this script, with or without modification,
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved. */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)
SEARCH_DIR("=/usr/x86_64-redhat-linux/lib64");
SEARCH_DIR("=/usr/lib64");
SEARCH_DIR("=/usr/local/lib64");
SEARCH_DIR("=/lib64");
SEARCH_DIR("=/usr/x86_64-redhat-linux/lib");
SEARCH_DIR("=/usr/local/lib");
SEARCH_DIR("=/lib");
SEARCH_DIR("=/usr/lib");
SECTIONS
{
  /* Read-only sections, merged into text segment: */
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000));
  . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
  .interp         : { *(.interp) }
  .note.gnu.build-id : { *(.note.gnu.build-id) }
  .hash           : { *(.hash) }
  .gnu.hash       : { *(.gnu.hash) }
  .dynsym         : { *(.dynsym) }
  .dynstr         : { *(.dynstr) }
  .gnu.version    : { *(.gnu.version) }
  .gnu.version_d  : { *(.gnu.version_d) }
  .gnu.version_r  : { *(.gnu.version_r) }
  .rela.dyn       :
  {
    *(.rela.init)
    *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*)
    *(.rela.fini)
    *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*)
    *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*)
    *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*)
    *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*)
    *(.rela.ctors)
    *(.rela.dtors)
    *(.rela.got)
    *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*)
    *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*)
    *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*)
    *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*)
    *(.rela.ifunc)
  }
  .rela.plt       :
  {
    *(.rela.plt)
    PROVIDE_HIDDEN (__rela_iplt_start = .);
    *(.rela.iplt)
    PROVIDE_HIDDEN (__rela_iplt_end = .);
  }
  .init           :
  {
    KEEP (*(SORT_NONE(.init)))
  }
  .plt            : { *(.plt) *(.iplt) }
  .plt.got        : { *(.plt.got) }
  .plt.bnd        : { *(.plt.bnd) }
  .text           :
  {
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf32.em. */
    *(.gnu.warning)
  }
  .fini           :
  {
    KEEP (*(SORT_NONE(.fini)))
  }
  PROVIDE (__etext = .);
  PROVIDE (_etext = .);
  PROVIDE (etext = .);
  .rodata         : { *(.rodata .rodata.* .gnu.linkonce.r.*) }
  .rodata1        : { *(.rodata1) }
  .eh_frame_hdr   : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) }
  .eh_frame       : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) }
  .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table .gcc_except_table.*) }
  .gnu_extab      : ONLY_IF_RO { *(.gnu_extab*) }
  /* These sections are generated by the Sun/Oracle C++ compiler. */
  .exception_ranges : ONLY_IF_RO { *(.exception_ranges .exception_ranges*) }
  /* Adjust the address for the data segment. We want to adjust up to
     the same address within the page on the next page up. */
  . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
  /* Exception handling */
  .eh_frame       : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) }
  .gnu_extab      : ONLY_IF_RW { *(.gnu_extab) }
  .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) }
  .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) }
  /* Thread Local Storage sections */
  .tdata          : { *(.tdata .tdata.* .gnu.linkonce.td.*) }
  .tbss           : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) }
  .preinit_array  :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  }
  .init_array     :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
    KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors))
    PROVIDE_HIDDEN (__init_array_end = .);
  }
  .fini_array     :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*)))
    KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors))
    PROVIDE_HIDDEN (__fini_array_end = .);
  }
  .ctors          :
  {
    /* gcc uses crtbegin.o to find the start of
       the constructors, so we make sure it is
       first. Because this is a wildcard, it
       doesn't matter if the user does not
       actually link against crtbegin.o; the
       linker won't look for a file to match a
       wildcard. The wildcard also means that it
       doesn't matter which directory crtbegin.o
       is in. */
    KEEP (*crtbegin.o(.ctors))
    KEEP (*crtbegin?.o(.ctors))
    /* We don't want to include the .ctor section from
       the crtend.o file until after the sorted ctors.
       The .ctor section from the crtend file contains the
       end of ctors marker and it must be last */
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors))
    KEEP (*(SORT(.ctors.*)))
    KEEP (*(.ctors))
  }
  .dtors          :
  {
    KEEP (*crtbegin.o(.dtors))
    KEEP (*crtbegin?.o(.dtors))
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors))
    KEEP (*(SORT(.dtors.*)))
    KEEP (*(.dtors))
  }
  .jcr            : { KEEP (*(.jcr)) }
  .data.rel.ro    : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) }
  .dynamic        : { *(.dynamic) }
  .got            : { *(.got) *(.igot) }
  . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .);
  .got.plt        : { *(.got.plt) *(.igot.plt) }
  .data           :
  {
    *(.data .data.* .gnu.linkonce.d.*)
    SORT(CONSTRUCTORS)
  }
  .data1          : { *(.data1) }
  _edata = .; PROVIDE (edata = .);
  . = .;
  __bss_start = .;
  .bss            :
  {
    *(.dynbss)
    *(.bss .bss.* .gnu.linkonce.b.*)
    *(COMMON)
    /* Align here to ensure that the .bss section occupies space up to
       _end. Align after .bss to ensure correct alignment even if the
       .bss section disappears because there are no input sections.
       FIXME: Why do we need it? When there is no .bss section, we don't
       pad the .data section. */
    . = ALIGN(. != 0 ? 64 / 8 : 1);
  }
  .lbss           :
  {
    *(.dynlbss)
    *(.lbss .lbss.* .gnu.linkonce.lb.*)
    *(LARGE_COMMON)
  }
  . = ALIGN(64 / 8);
  . = SEGMENT_START("ldata-segment", .);
  .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) :
  {
    *(.lrodata .lrodata.* .gnu.linkonce.lr.*)
  }
  .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) :
  {
    *(.ldata .ldata.* .gnu.linkonce.l.*)
    . = ALIGN(. != 0 ? 64 / 8 : 1);
  }
  . = ALIGN(64 / 8);
  _end = .; PROVIDE (end = .);
  . = DATA_SEGMENT_END (.);
  /* Stabs debugging sections. */
  .stab          0 : { *(.stab) }
  .stabstr       0 : { *(.stabstr) }
  .stab.excl     0 : { *(.stab.excl) }
  .stab.exclstr  0 : { *(.stab.exclstr) }
  .stab.index    0 : { *(.stab.index) }
  .stab.indexstr 0 : { *(.stab.indexstr) }
  .comment       0 : { *(.comment) }
  /* DWARF debug sections.
     Symbols in the DWARF debugging sections are relative to the beginning
     of the section so we begin them at 0. */
  /* DWARF 1 */
  .debug          0 : { *(.debug) }
  .line           0 : { *(.line) }
  /* GNU DWARF 1 extensions */
  .debug_srcinfo  0 : { *(.debug_srcinfo) }
  .debug_sfnames  0 : { *(.debug_sfnames) }
  /* DWARF 1.1 and DWARF 2 */
  .debug_aranges  0 : { *(.debug_aranges) }
  .debug_pubnames 0 : { *(.debug_pubnames) }
  /* DWARF 2 */
  .debug_info     0 : { *(.debug_info .gnu.linkonce.wi.*) }
  .debug_abbrev   0 : { *(.debug_abbrev) }
  .debug_line     0 : { *(.debug_line .debug_line.* .debug_line_end ) }
  .debug_frame    0 : { *(.debug_frame) }
  .debug_str      0 : { *(.debug_str) }
  .debug_loc      0 : { *(.debug_loc) }
  .debug_macinfo  0 : { *(.debug_macinfo) }
  /* SGI/MIPS DWARF 2 extensions */
  .debug_weaknames 0 : { *(.debug_weaknames) }
  .debug_funcnames 0 : { *(.debug_funcnames) }
  .debug_typenames 0 : { *(.debug_typenames) }
  .debug_varnames  0 : { *(.debug_varnames) }
  /* DWARF 3 */
  .debug_pubtypes 0 : { *(.debug_pubtypes) }
  .debug_ranges   0 : { *(.debug_ranges) }
  /* DWARF Extension. */
  .debug_macro    0 : { *(.debug_macro) }
  .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) }
  /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) }
}
```
Linker Script
This section introduces some common linker script syntax; for the remaining details, refer to the Linker Scripts chapter of the GNU ld manual. With this background, you should be able to understand the default linker script listed above.
VMA & LMA
Every loadable or allocatable output section has two addresses: its VMA (virtual memory address) and its LMA (load memory address). The VMA is the address the section has when the program runs; the LMA is the address at which the section is loaded.
In most cases the two addresses are the same. They differ, for example, when a data section is loaded into ROM and then copied into RAM when the program starts: the ROM address is the LMA, and the RAM address is the VMA.
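The ROM-to-RAM case can be sketched in C. On a ROM-based target, startup code copies the initialized data image from its LMA to its VMA before main runs, and zeroes .bss, which has no load image at all. This is a host-side sketch under stated assumptions: ordinary arrays stand in for ROM and RAM, and on a real board the pointers would come from linker-script symbols such as a hypothetical _sidata defined as LOADADDR(.data), with hypothetical _sdata and _edata marking the start and end of .data at its VMA.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-side model of ROM-based startup code. On a real target the
   pointers would be linker-script symbols (hypothetical names such as
   _sidata = LOADADDR(.data), _sdata, _edata); here they are plain arrays. */

/* Copy the initialized data image from its LMA (ROM) to its VMA (RAM). */
void copy_data_to_vma(const uint8_t *lma, uint8_t *vma_start, uint8_t *vma_end)
{
    assert(vma_start <= vma_end);
    memcpy(vma_start, lma, (size_t)(vma_end - vma_start));
}

/* .bss has no load image, so it is only zeroed at its VMA. */
void zero_bss(uint8_t *bss_start, uint8_t *bss_end)
{
    assert(bss_start <= bss_end);
    memset(bss_start, 0, (size_t)(bss_end - bss_start));
}
```

Only the data section needs an LMA distinct from its VMA here; code that runs directly from ROM can keep the two equal.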
Defining Symbols
In a linker script, we can define symbols and assign them values, which act as addresses. In the following example, we define a symbol called start whose address is 0x7C00.
```
start = 0x7C00;
```
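From C, a symbol defined in a linker script has an address but no storage behind it, so C code conventionally declares it as an extern array and uses only its address. Linking against start above would require a custom script, so this sketch, which assumes a GNU/Linux toolchain using the default script shown earlier, uses __bss_start and _end instead; that script defines both with plain assignments. in_bss_region, bss_buffer and zeroed_buffer are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Defined by the default GNU ld script (__bss_start = .; ... _end = .;).
   Only their addresses are meaningful; no object lives behind them. */
extern char __bss_start[];
extern char _end[];

/* Zero-initialized, so GCC normally places it in .bss. */
static char zeroed_buffer[64];

/* True if p lies between __bss_start and _end. */
bool in_bss_region(const void *p)
{
    assert(p != NULL);
    uintptr_t addr = (uintptr_t)p;
    return addr >= (uintptr_t)__bss_start && addr < (uintptr_t)_end;
}

const void *bss_buffer(void)
{
    return zeroed_buffer;
}
```

With a custom script containing start = 0x7C00;, the same pattern would be extern char start[]; with (uintptr_t)start evaluating to 0x7C00.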
Location Counter
The location counter is the special symbol '.'. It holds the current output address, which is where the next output section is placed unless that section is given an explicit address. At the start of the SECTIONS command the location counter is 0, and each time an output section is placed, it advances by the size of that section (plus any alignment padding).
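The following toy model mimics how the location counter moves while output sections without explicit addresses are placed one after another. It is a sketch, not the linker's algorithm: it ignores alignment and every other detail ld applies, and place_sequentially is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the location counter '.': each output section without an
   explicit address starts at the current value of '.', and '.' then
   advances by the section's size. Alignment padding is ignored. */
uint64_t place_sequentially(uint64_t dot, const uint64_t sizes[],
                            uint64_t addrs[], size_t n)
{
    assert(n == 0 || (sizes != NULL && addrs != NULL));
    for (size_t i = 0; i < n; i++) {
        addrs[i] = dot;   /* section i is placed at the current '.' */
        dot += sizes[i];  /* '.' moves past the section */
    }
    return dot;           /* value of '.' after the last section */
}
```

Starting from . = 0x7C00 with a 0x30-byte .text and a 0x10-byte .data, the model places .text at 0x7C00 and .data at 0x7C30, leaving '.' at 0x7C40.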
SECTIONS – Mapping Input Sections to Output Sections
The SECTIONS command is the most important command in a linker script. It tells the linker how to map input sections to output sections and where to place the output sections in memory. Its syntax is as follows.
```
SECTIONS
{
  SectionName1 [VMA] [(type)] : [AT(LMA)]
  {
    output-section-command
    ...
  } [>region] [:phdr :phdr ...] [=fillexp]

  SectionName2 [VMA] [(type)] : [AT(LMA)]
  {
    output-section-command
    ...
  } [>region] [:phdr :phdr ...] [=fillexp]

  ...
}
```
A SECTIONS command can define many output sections. For each output section, you specify its name and the input sections it contains. In the following example, the output section .text contains the .text sections of all input files; the * is a wildcard that matches every input file. We can also select sections from a specific file: the output section .data contains only the .data section of the file data.o.
```
SECTIONS
{
  .text : { *(.text) }
  .data : { data.o(.data) }
}
```
The example above does not give the output sections addresses, so each one is placed at the current value of the location counter, aligned to the section's alignment requirement, and the output sections appear in the output file in the order in which they are defined in the linker script. The script is therefore almost equivalent to the following one; the difference is that an explicit '.' as the address uses the location counter as is, without that alignment.
```
SECTIONS
{
  .text . : { *(.text) }
  .data . : { data.o(.data) }
}
```
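Input section descriptions such as *(.text) and data.o(.data) are wildcard patterns over file names and section names. The sketch below approximates this matching with POSIX fnmatch; struct input_section, section_matches and count_matches are hypothetical, and GNU ld's own matching has extra rules (for example for archive members) that are not modeled here.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* One input section, identified by its file and its section name. */
struct input_section {
    const char *file;
    const char *name;
};

/* Rough model of an input section description like *(.text) or
   data.o(.data): both parts are shell-style wildcards. */
bool section_matches(const char *file_pattern, const char *section_pattern,
                     const struct input_section *in)
{
    assert(file_pattern != NULL && section_pattern != NULL && in != NULL);
    return fnmatch(file_pattern, in->file, 0) == 0 &&
           fnmatch(section_pattern, in->name, 0) == 0;
}

/* Counts how many input sections a description would collect. */
size_t count_matches(const char *file_pattern, const char *section_pattern,
                     const struct input_section inputs[], size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (section_matches(file_pattern, section_pattern, &inputs[i]))
            count++;
    return count;
}
```

With inputs main.o and data.o, each having .text and .data, *(.text) collects both .text sections, while data.o(.data) collects only the .data section of data.o.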
We can also change the VMA of an output section. In the following example, we set the VMA of the output section .text by assigning a value to the location counter.
```
SECTIONS
{
  . = 0x7C00;
  .text : { *(.text) }
}
```
Alternatively, as in the example below, we can set the VMA of the output section .text directly.
```
SECTIONS
{
  .text 0x7C00 : { *(.text) }
}
```
With the SECTIONS syntax above, you should be able to follow more than half of the default linker script used by GCC. For the other commands that can appear inside SECTIONS, refer to the Linker Scripts chapter of the GNU ld manual.
If we do not set the LMA (for example with AT(LMA)), the linker derives it itself; in the common case it simply uses the VMA as the LMA. Since the two are usually the same, we normally only need to set the VMA.
PROVIDE – Defining Symbols If Not Defined Yet
PROVIDE defines a symbol only if the symbol is referenced but not defined by any input file. If an input file already defines the symbol, the PROVIDE is ignored. The default script uses it for etext and its variants:
```
SECTIONS
{
  PROVIDE (__etext = .);
  PROVIDE (_etext = .);
  PROVIDE (etext = .);
}
```
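The sketch below, assuming GCC with GNU ld and its default script, shows both halves of the PROVIDE rule. The program defines etext itself, so the script's PROVIDE (etext = .) does not apply; edata is only declared, so PROVIDE (edata = .) supplies it. etext_contents and edata_address are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Defined here, so the default script's PROVIDE (etext = .) is ignored
   and every reference to etext resolves to this array. */
char etext[] = "defined in C";

/* Only declared here; the default script's PROVIDE (edata = .) defines it
   because it is referenced but not defined by any input file. */
extern char edata[];

const char *etext_contents(void)
{
    return etext;
}

uintptr_t edata_address(void)
{
    uintptr_t addr = (uintptr_t)edata;
    assert(addr != 0); /* sanity check: the linker supplied a real address */
    return addr;
}
```

Without the definition of etext, the same declaration style (extern char etext[];) would pick up the linker's value, the end of the text segment.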
KEEP – Keeping Symbols
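KEEP tells the linker to keep the input sections matched by its pattern, and therefore the symbols they contain, even if nothing references them. This matters when unused sections are garbage-collected with ld --gc-sections, which would otherwise discard them.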
```
SECTIONS
{
  .jcr : { KEEP (*(.jcr)) }
}
```
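Tables such as the constructor list are typical KEEP candidates, because nothing refers to their entries by name. The sketch below, assuming GCC and GNU ld on Linux, places a function pointer directly in .init_array; keep_demo_init, keep_demo_entry and keep_demo_has_run are hypothetical names. No code refers to keep_demo_entry, yet the C runtime calls the function at startup; the KEEP around .init_array in the default script is what should protect the entry when linking with -Wl,--gc-sections.

```c
#include <assert.h>

/* Set by keep_demo_init() before main() runs. */
static volatile int keep_demo_ran = 0;

static void keep_demo_init(void)
{
    assert(keep_demo_ran == 0); /* expected to run exactly once */
    keep_demo_ran = 1;
}

/* A pointer placed directly in .init_array. Nothing references it, so
   'used' keeps GCC from dropping it, and KEEP in the default script keeps
   the linker from garbage-collecting its input section. */
static void (*keep_demo_entry)(void)
    __attribute__((used, section(".init_array"))) = keep_demo_init;

/* Returns 1 if the C runtime called keep_demo_init() at startup. */
int keep_demo_has_run(void)
{
    return keep_demo_ran;
}
```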
ENTRY – Setting Entry Point
The ENTRY command sets the entry point of the executable, that is, the first instruction executed in the program. Its argument is a symbol name, as shown below. The entry point can also be set on the command line with ld -e symbol, which takes precedence over ENTRY.
```
ENTRY(symbol)
```
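The entry point ends up in the e_entry field of the ELF header. The sketch below, which assumes a 64-bit Linux system where /proc/self/exe refers to the running executable, reads that field using the structures from <elf.h>; elf64_entry_point is a hypothetical helper. For a position-independent executable the value is an offset from the load address rather than an absolute address.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns e_entry of a 64-bit ELF file (the address chosen by ENTRY or
   ld -e), or 0 if the file cannot be read or is not 64-bit ELF. */
uint64_t elf64_entry_point(const char *path)
{
    Elf64_Ehdr hdr;
    assert(path != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t n = fread(&hdr, sizeof hdr, 1, f);
    fclose(f);

    if (n != 1 || memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return 0;
    return hdr.e_entry;
}
```

readelf -h prints the same field as "Entry point address".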
INPUT – Specifying Input Files
The INPUT command tells the linker to include the named files in the link, as if they had been given on the command line. In practice, input .o files are usually passed on the ld command line instead.
```
INPUT(file, file, ...)
INPUT(file file ...)
```
OUTPUT – Specifying Output Files
The OUTPUT command names the output file, just like ld -o file on the command line. If both are used, the command-line option takes precedence. Since we usually use ld -o file, OUTPUT is mainly useful for giving a script a default output file name other than the usual a.out.
```
OUTPUT(filename)
```
OUTPUT_FORMAT – Setting the Output BFD Format
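OUTPUT_FORMAT specifies the BFD format of the output file, like ld --oformat bfdname on the command line. With three arguments, the first is the default format, and the second and third are used when ld is given -EB (big-endian) or -EL (little-endian).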
```
OUTPUT_FORMAT(bfdname)
OUTPUT_FORMAT(default, big, little)

/* Example */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64")
```
OUTPUT_ARCH – Setting the Output Machine Architecture
OUTPUT_ARCH specifies the machine architecture of the output file.
```
OUTPUT_ARCH(bfdarch)

/* Example */
OUTPUT_ARCH(i386:x86-64)
```
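OUTPUT_FORMAT and OUTPUT_ARCH show up in the ELF header of the result: a BFD name such as elf64-x86-64 corresponds to a 64-bit (ELFCLASS64) file, and the architecture i386:x86-64 to the machine value EM_X86_64. The sketch below, assuming Linux and <elf.h>, reads both values from the running program; elf_class, elf_machine and read_elf_prefix are hypothetical helpers.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* e_ident (16 bytes), e_type (2) and e_machine (2) form the first 20 bytes
   of both 32-bit and 64-bit ELF headers. */
#define ELF_PREFIX_SIZE 20

static int read_elf_prefix(const char *path, unsigned char buf[ELF_PREFIX_SIZE])
{
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, ELF_PREFIX_SIZE, f);
    fclose(f);
    return (n == ELF_PREFIX_SIZE && memcmp(buf, ELFMAG, SELFMAG) == 0) ? 0 : -1;
}

/* ELFCLASS64 for elf64-* formats, ELFCLASS32 for elf32-*; -1 on error. */
int elf_class(const char *path)
{
    unsigned char buf[ELF_PREFIX_SIZE];
    return read_elf_prefix(path, buf) == 0 ? buf[EI_CLASS] : -1;
}

/* e_machine (EM_X86_64 for i386:x86-64); -1 on error. Read in host byte
   order, which is correct for files built for the host. */
int elf_machine(const char *path)
{
    unsigned char buf[ELF_PREFIX_SIZE];
    Elf64_Half machine;
    if (read_elf_prefix(path, buf) != 0)
        return -1;
    memcpy(&machine, buf + offsetof(Elf64_Ehdr, e_machine), sizeof machine);
    return machine;
}
```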
Built-in Functions
ALIGN – Aligning Location Counter
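ALIGN(align) returns the value of the location counter rounded up to the next multiple of align, which must be a power of two. It does not change the location counter itself; the result takes effect only when it is used as an address or assigned to '.'. In the following example, .data is placed at the first 8-byte boundary after .text.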
```
SECTIONS
{
  .text . : { *(.text) }
  .data ALIGN(0x8) : { *(.data) }
}
```
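The arithmetic behind ALIGN can be written out in C. This sketch uses the round-up formula (. + align - 1) & ~(align - 1), which the GNU ld manual gives as the equivalent of ALIGN(align); align_up is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds value up to the next multiple of align, as ALIGN does with '.'.
   The value passed in is not modified; a new value is returned. */
uint64_t align_up(uint64_t value, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return (value + align - 1) & ~(align - 1);
}
```

If .text ended at 0x7C13, for example, ALIGN(0x8) would place .data at 0x7C18; an address that is already aligned is returned unchanged.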
ADDR – Obtaining the Address of a Section
ADDR(section) returns the VMA of an output section. In the following example, .data is placed 0x200 bytes after the start of .text.
```
SECTIONS
{
  .text . : { *(.text) }
  .data ADDR(.text) + 0x200 : { *(.data) }
}
```
SIZEOF – Obtaining the Size of a Section
SIZEOF(section) returns the size of an output section in bytes. In the following example, .data starts immediately after the end of .text.
```
SECTIONS
{
  .text . : { *(.text) }
  .data ADDR(.text) + SIZEOF(.text) : { *(.data) }
}
```
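ADDR and SIZEOF are evaluated by the linker, but a script can pass the same information to a program through symbols. The default script shown earlier brackets .init_array with PROVIDE_HIDDEN (__init_array_start = .) and PROVIDE_HIDDEN (__init_array_end = .), so the sketch below, assuming GNU ld and its default script, computes the run-time equivalent of SIZEOF(.init_array) from them. size_demo_ctor, init_array_size and init_array_entries are hypothetical names; the constructor exists only so the section is not empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*init_fn)(void);

/* Provided (hidden) by the default script at the start and end of the
   .init_array output section. */
extern init_fn __init_array_start[];
extern init_fn __init_array_end[];

/* Guarantees at least one .init_array entry from this file. */
static void size_demo_ctor(void) __attribute__((constructor));
static void size_demo_ctor(void) { }

/* Run-time counterpart of SIZEOF(.init_array), in bytes. */
size_t init_array_size(void)
{
    uintptr_t start = (uintptr_t)__init_array_start;
    uintptr_t end = (uintptr_t)__init_array_end;
    assert(end >= start);
    return (size_t)(end - start);
}

/* Number of function pointers stored in .init_array. */
size_t init_array_entries(void)
{
    return init_array_size() / sizeof(init_fn);
}
```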
Conclusion
When we build a program, GCC does more than compile the source code: it also links the object files into an executable. We usually know what linking does without knowing how it is done. Understanding linker scripts shows more concretely what the linker does with each section.