Linker Script

When building an executable, GCC first compiles each source file into an object file. The linker then combines the object files into an executable file under the direction of a linker script. This article introduces the basic syntax of linker scripts.

Linkers

Linkers and Object Files

A linker combines several input files into one output file. The input files can be relocatable files or shared object files, and the output file is an executable file, as shown below.

Object code libraries, from Linkers & Loaders.

Relocatable files, shared object files, and executable files are all object files. An object file is a file in a special format, called an object file format. Several object file formats exist today; the one used on Linux is ELF. To understand linkers, you must first understand an object file format, because a linker reads data from several object files and writes it into one object file. If you want to know more about ELF, you can refer to articles dedicated to that format.

After understanding ELF, we know that an object file consists of multiple sections. The linker reads sections from the input files, processes them, and writes them to the output file. A linker script tells the linker how to place these sections in the output file. We can pass a linker script to the linker with ld -T linker_script_file.
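
As a rough illustration, a minimal linker script might look like the sketch below. The file names minimal.ld, main.o, and app are hypothetical, and the start address 0x400000 is only an assumption borrowed from the default x86-64 script. The commands used here are explained later in this article.

```ld
/* minimal.ld (hypothetical file name)
   Assumed usage: ld -T minimal.ld -o app main.o */
ENTRY(_start)                       /* assumes main.o defines _start */
SECTIONS
{
    . = 0x400000;                   /* assumed start address */
    .text   : { *(.text) }          /* code from all input files */
    .rodata : { *(.rodata) }        /* read-only data */
    .data   : { *(.data) }          /* initialized data */
    .bss    : { *(.bss) *(COMMON) } /* zero-initialized data */
}
```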

Default Linker Script Used by GCC

On the Linux command line, we can run ld --verbose to print the linker's default linker script, shown below. When we use GCC to build an executable file, GCC compiles each source file into a relocatable file, whose file extension is .o. These relocatable files are then linked with libraries such as libc.so into an executable file. When we do not specify a linker script, the linker uses this default linker script.

/* Script for -z combreloc: combine and sort reloc sections */
/* Copyright (C) 2014-2016 Free Software Foundation, Inc.
   Copying and distribution of this script, with or without modification,
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved.  */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)
SEARCH_DIR("=/usr/x86_64-redhat-linux/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/x86_64-redhat-linux/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib");
SECTIONS
{
  /* Read-only sections, merged into text segment: */
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
  .interp         : { *(.interp) }
  .note.gnu.build-id : { *(.note.gnu.build-id) }
  .hash           : { *(.hash) }
  .gnu.hash       : { *(.gnu.hash) }
  .dynsym         : { *(.dynsym) }
  .dynstr         : { *(.dynstr) }
  .gnu.version    : { *(.gnu.version) }
  .gnu.version_d  : { *(.gnu.version_d) }
  .gnu.version_r  : { *(.gnu.version_r) }
  .rela.dyn       :
  {
      *(.rela.init)
      *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*)
      *(.rela.fini)
      *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*)
      *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*)
      *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*)
      *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*)
      *(.rela.ctors)
      *(.rela.dtors)
      *(.rela.got)
      *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*)
      *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*)
      *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*)
      *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*)
      *(.rela.ifunc)
  }
  .rela.plt       :
  {
      *(.rela.plt)
      PROVIDE_HIDDEN (__rela_iplt_start = .);
      *(.rela.iplt)
      PROVIDE_HIDDEN (__rela_iplt_end = .);
  }
  .init           :
  {
    KEEP (*(SORT_NONE(.init)))
  }
  .plt            : { *(.plt) *(.iplt) }
  .plt.got        : { *(.plt.got) }
  .plt.bnd        : { *(.plt.bnd) }
  .text           :
  {
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf32.em.  */
    *(.gnu.warning)
  }
  .fini           :
  {
    KEEP (*(SORT_NONE(.fini)))
  }
  PROVIDE (__etext = .);
  PROVIDE (_etext = .);
  PROVIDE (etext = .);
  .rodata         : { *(.rodata .rodata.* .gnu.linkonce.r.*) }
  .rodata1        : { *(.rodata1) }
  .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) }
  .eh_frame       : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) }
  .gcc_except_table   : ONLY_IF_RO { *(.gcc_except_table .gcc_except_table.*) }
  .gnu_extab   : ONLY_IF_RO { *(.gnu_extab*) }
  /* These sections are generated by the Sun/Oracle C++ compiler.  */
  .exception_ranges   : ONLY_IF_RO { *(.exception_ranges .exception_ranges*) }
  /* Adjust the address for the data segment.  We want to adjust up to
     the same address within the page on the next page up.  */
  . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
  /* Exception handling  */
  .eh_frame       : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) }
  .gnu_extab      : ONLY_IF_RW { *(.gnu_extab) }
  .gcc_except_table   : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) }
  .exception_ranges   : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) }
  /* Thread Local Storage sections  */
  .tdata	  : { *(.tdata .tdata.* .gnu.linkonce.td.*) }
  .tbss		  : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) }
  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  }
  .init_array     :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
    KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors))
    PROVIDE_HIDDEN (__init_array_end = .);
  }
  .fini_array     :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*)))
    KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors))
    PROVIDE_HIDDEN (__fini_array_end = .);
  }
  .ctors          :
  {
    /* gcc uses crtbegin.o to find the start of
       the constructors, so we make sure it is
       first.  Because this is a wildcard, it
       doesn't matter if the user does not
       actually link against crtbegin.o; the
       linker won't look for a file to match a
       wildcard.  The wildcard also means that it
       doesn't matter which directory crtbegin.o
       is in.  */
    KEEP (*crtbegin.o(.ctors))
    KEEP (*crtbegin?.o(.ctors))
    /* We don't want to include the .ctor section from
       the crtend.o file until after the sorted ctors.
       The .ctor section from the crtend file contains the
       end of ctors marker and it must be last */
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors))
    KEEP (*(SORT(.ctors.*)))
    KEEP (*(.ctors))
  }
  .dtors          :
  {
    KEEP (*crtbegin.o(.dtors))
    KEEP (*crtbegin?.o(.dtors))
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors))
    KEEP (*(SORT(.dtors.*)))
    KEEP (*(.dtors))
  }
  .jcr            : { KEEP (*(.jcr)) }
  .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) }
  .dynamic        : { *(.dynamic) }
  .got            : { *(.got) *(.igot) }
  . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .);
  .got.plt        : { *(.got.plt)  *(.igot.plt) }
  .data           :
  {
    *(.data .data.* .gnu.linkonce.d.*)
    SORT(CONSTRUCTORS)
  }
  .data1          : { *(.data1) }
  _edata = .; PROVIDE (edata = .);
  . = .;
  __bss_start = .;
  .bss            :
  {
    *(.dynbss)
    *(.bss .bss.* .gnu.linkonce.b.*)
    *(COMMON)
    /* Align here to ensure that the .bss section occupies space up to
       _end.  Align after .bss to ensure correct alignment even if the
       .bss section disappears because there are no input sections.
       FIXME: Why do we need it? When there is no .bss section, we don't
       pad the .data section.  */
    . = ALIGN(. != 0 ? 64 / 8 : 1);
  }
  .lbss   :
  {
    *(.dynlbss)
    *(.lbss .lbss.* .gnu.linkonce.lb.*)
    *(LARGE_COMMON)
  }
  . = ALIGN(64 / 8);
  . = SEGMENT_START("ldata-segment", .);
  .lrodata   ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) :
  {
    *(.lrodata .lrodata.* .gnu.linkonce.lr.*)
  }
  .ldata   ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) :
  {
    *(.ldata .ldata.* .gnu.linkonce.l.*)
    . = ALIGN(. != 0 ? 64 / 8 : 1);
  }
  . = ALIGN(64 / 8);
  _end = .; PROVIDE (end = .);
  . = DATA_SEGMENT_END (.);
  /* Stabs debugging sections.  */
  .stab          0 : { *(.stab) }
  .stabstr       0 : { *(.stabstr) }
  .stab.excl     0 : { *(.stab.excl) }
  .stab.exclstr  0 : { *(.stab.exclstr) }
  .stab.index    0 : { *(.stab.index) }
  .stab.indexstr 0 : { *(.stab.indexstr) }
  .comment       0 : { *(.comment) }
  /* DWARF debug sections.
     Symbols in the DWARF debugging sections are relative to the beginning
     of the section so we begin them at 0.  */
  /* DWARF 1 */
  .debug          0 : { *(.debug) }
  .line           0 : { *(.line) }
  /* GNU DWARF 1 extensions */
  .debug_srcinfo  0 : { *(.debug_srcinfo) }
  .debug_sfnames  0 : { *(.debug_sfnames) }
  /* DWARF 1.1 and DWARF 2 */
  .debug_aranges  0 : { *(.debug_aranges) }
  .debug_pubnames 0 : { *(.debug_pubnames) }
  /* DWARF 2 */
  .debug_info     0 : { *(.debug_info .gnu.linkonce.wi.*) }
  .debug_abbrev   0 : { *(.debug_abbrev) }
  .debug_line     0 : { *(.debug_line .debug_line.* .debug_line_end ) }
  .debug_frame    0 : { *(.debug_frame) }
  .debug_str      0 : { *(.debug_str) }
  .debug_loc      0 : { *(.debug_loc) }
  .debug_macinfo  0 : { *(.debug_macinfo) }
  /* SGI/MIPS DWARF 2 extensions */
  .debug_weaknames 0 : { *(.debug_weaknames) }
  .debug_funcnames 0 : { *(.debug_funcnames) }
  .debug_typenames 0 : { *(.debug_typenames) }
  .debug_varnames  0 : { *(.debug_varnames) }
  /* DWARF 3 */
  .debug_pubtypes 0 : { *(.debug_pubtypes) }
  .debug_ranges   0 : { *(.debug_ranges) }
  /* DWARF Extension.  */
  .debug_macro    0 : { *(.debug_macro) }
  .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) }
  /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) }
}

Linker Script

We will now introduce some linker script syntax. Please refer to Linker Scripts for other details. After this section, you should be able to understand most of the default linker script listed above.

VMA & LMA

Every loadable or allocatable output section has two addresses: the VMA and the LMA. The VMA (virtual memory address) is the address the section has when the program runs. The LMA (load memory address) is the address at which the section is loaded.

In most cases, these two addresses are the same. They differ, for example, when a data section is loaded into ROM, so its LMA is a ROM address, and is then copied into RAM when the program starts, so its VMA is a RAM address.
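
The ROM-to-RAM case can be sketched as follows, using the AT keyword that appears in the SECTIONS syntax later in this article. The addresses 0x1000 and 0x2000 and the symbol names are assumptions chosen for illustration. Startup code (not shown) would be responsible for copying the data from its LMA to its VMA.

```ld
SECTIONS
{
    /* Code runs where it is loaded: VMA == LMA == 0x1000. */
    .text 0x1000 : { *(.text) _etext = .; }

    /* .data runs at VMA 0x2000 (RAM) but is loaded right after
       .text (ROM), so its LMA differs from its VMA. */
    .data 0x2000 : AT (ADDR (.text) + SIZEOF (.text))
    {
        _data = .;      /* VMA of the first byte of .data */
        *(.data)
        _edata = .;     /* VMA just past the last byte of .data */
    }
}
```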

Defining Symbols

In a linker script, we can define symbols and specify the addresses of symbols. In the following example, we define a symbol called start, and its address is set to 0x7C00.

start = 0x7C00;

Location Counter

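The location counter is a special symbol written as '.'. It holds the current output address, that is, the address at which the next output section will be placed. At the start of the SECTIONS command, the value of the location counter is 0. When the linker outputs a section, the location counter is incremented by the size of that section. We can also assign a new value to the location counter to move it.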
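
The following sketch shows how the location counter moves during a link. The addresses and the symbol names _text_end and _data_end are hypothetical, and it uses the SECTIONS command described in the next section.

```ld
SECTIONS
{
    . = 0x10000;             /* move the location counter to 0x10000 */
    .text : { *(.text) }     /* .text is placed at the location counter */
    _text_end = .;           /* . now points just past .text */

    . += 0x1000;             /* skip 0x1000 bytes, leaving a gap */
    .data : { *(.data) }     /* .data starts after the gap */
    _data_end = .;           /* . now points just past .data */
}
```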

SECTIONS – Mapping Input Sections to Output Sections

The SECTIONS command is the most important command. It tells the linker how to map input sections to output sections, and how to place output sections in memory. The syntax is as follows.

SECTIONS
{
    SectionName1 [VMA] [(type)] : [AT(LMA)]
    {
        output-section-command
        ...
    } [> region] [:phdr :phdr ...] [=fillexp]

    SectionName2 [VMA] [(type)] : [AT(LMA)]
    {
        output-section-command
        ...
    } [> region] [:phdr :phdr ...] [=fillexp]

    ...
}

In the SECTIONS command, we can define many output sections. When defining an output section, we must specify its name and which input sections it contains. In the following example, we define an output section called .text, which contains the .text sections of all input files. The * here matches all input files. We can also select the sections of a specific file: the second line defines an output section called .data that contains only the .data section of the file data.o.

SECTIONS
{
    .text : { *(.text) }
    .data : { data.o(.data) }
}

In the above example, we did not specify the address of the output section, so the linker places it at the current value of the location counter, aligned to the strictest alignment required by its input sections. Apart from that alignment adjustment, the above example is equivalent to the following one, which sets each address explicitly to the location counter. The linker places the output sections in the output file in the order in which they are defined in the linker script.

SECTIONS
{
    .text . : { *(.text) }
    .data . : { data.o(.data) }
}

We can also change the VMA of the output section. In the following example, we set the VMA of the output section .text by changing the value of the location counter.

SECTIONS
{
    . = 0x7C00;
    .text : { *(.text) }
}

Or, in the example below, we directly set the VMA of the output section .text.

SECTIONS
{
    .text 0x7C00 : { *(.text) }
}

If you understand the syntax of the SECTIONS command described above and then look at the default linker script, you should be able to understand more than half of it. For other details of the SECTIONS command, please refer to Linker Scripts.

In addition, if we do not set the LMA, the linker derives it from the VMA. Since the VMA and LMA are the same in most cases, we usually only need to set the VMA.
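
Input section descriptions can also use wildcard patterns and file exclusions, which the default script relies on heavily. The sketch below is only illustrative; the choice of sections is an assumption, not a recommended layout.

```ld
SECTIONS
{
    .text :
    {
        /* Several section names may share one file pattern.
           .text.* matches names such as .text.hot or .text.startup. */
        *(.text .text.*)
    }
    .ctors :
    {
        /* .ctors from every input file except those matching *crtend.o */
        *(EXCLUDE_FILE (*crtend.o) .ctors)
    }
    /* Input sections assigned to /DISCARD/ are left out of the output. */
    /DISCARD/ : { *(.comment) }
}
```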

PROVIDE – Defining Symbols If Not Defined Yet

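PROVIDE defines a symbol only if the symbol is referenced but not defined by any input file. If an input file already defines the symbol, the definition in the linker script is ignored and the one from the input file is used.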

SECTIONS
{
    PROVIDE (__etext = .);
    PROVIDE (_etext = .);
    PROVIDE (etext = .);
}
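
To see why PROVIDE is useful, the sketch below contrasts it with a plain assignment. If an input file defined _etext itself, the plain assignment would cause a multiple-definition error, whereas a program-defined etext would silently win over the PROVIDE line. PROVIDE_HIDDEN, which the default script also uses, behaves like PROVIDE but makes the symbol hidden on ELF targets. The name __text_end is hypothetical.

```ld
SECTIONS
{
    .text :
    {
        *(.text)
        _etext = .;                       /* always defined by the script */
        PROVIDE (etext = .);              /* only if referenced and not defined elsewhere */
        PROVIDE_HIDDEN (__text_end = .);  /* same, but not exported (hypothetical name) */
    }
}
```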

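KEEP – Keeping Sections

The KEEP command tells the linker to keep the matched input sections even if nothing references them. This matters when link-time garbage collection is enabled with ld --gc-sections, which would otherwise discard unreferenced sections.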

SECTIONS
{
    .jcr : { KEEP (*(.jcr)) }
}
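
A typical use, sketched here under assumptions, is a table that nothing refers to by name, so garbage collection with ld --gc-sections would otherwise drop it. The section name .vectors is hypothetical.

```ld
SECTIONS
{
    .text :
    {
        /* .vectors: hypothetical section holding an interrupt table
           that no code references by symbol. */
        KEEP (*(.vectors))
        *(.text .text.*)    /* unreferenced sections here may be dropped */
    }
}
```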

ENTRY – Setting Entry Point

The ENTRY command sets the entry point of the executable file, that is, the first instruction to execute. Its parameter is a symbol name. We can also set the entry point with ld -e symbol, which takes precedence over ENTRY. The syntax is as follows.

ENTRY(symbol)

INPUT – Specifying Input Files

The INPUT command names the input files to link, but we usually list the input .o files on the ld command line instead.

INPUT(file, file, ...)
INPUT(file file ...)

OUTPUT – Specifying Output Files

The OUTPUT command specifies the output file name, but we generally use ld -o file for that. If both are used, ld -o file takes precedence, so the OUTPUT command effectively sets a default output file name.

OUTPUT(filename)
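
Used together, INPUT and OUTPUT let a script carry the file names that would otherwise appear on the command line. This is only a sketch: build.ld, main.o, util.o, and app are hypothetical names, and because the script has no SECTIONS command, the linker places each input section into an output section of the same name.

```ld
/* build.ld (hypothetical file name)
   Assumed usage: ld -T build.ld */
INPUT(main.o util.o)    /* hypothetical object files to link */
OUTPUT(app)             /* output name, used unless ld -o is given */
```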

OUTPUT_FORMAT – Setting the Output BFD Format

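OUTPUT_FORMAT specifies the BFD format of the output file. We can also use ld --oformat bfdname. With three arguments, the first is the default format, the second is used when the -EB option is given, and the third is used when the -EL option is given.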

OUTPUT_FORMAT(bfdname)
OUTPUT_FORMAT(default, big, little)
/* Example */
OUTPUT_FORMAT(elf64-x86-64, elf64-x86-64, elf64-x86-64)

OUTPUT_ARCH – Setting the Output Machine Architecture

OUTPUT_ARCH can be used to specify the output machine architecture.

OUTPUT_ARCH(bfdarch)
/* Example */
OUTPUT_ARCH(i386:x86-64)

Built-in Functions

ALIGN – Aligning Location Counter

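The ALIGN function returns the location counter rounded up to the specified alignment, which must be a power of two. It does not change the location counter by itself; to move the location counter, assign the result back to it, as in . = ALIGN(0x8). In the following example, the result is used as the VMA of .data.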

SECTIONS
{
    .text . : { *(.text) }
    .data ALIGN(0x8) : { *(.data) }
}
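
The arithmetic can be made concrete with a few assignments. In this sketch the starting address 0x1003 and the symbol names _a and _b are assumptions; the two-argument form ALIGN(exp, align) aligns an arbitrary expression instead of the location counter.

```ld
SECTIONS
{
    . = 0x1003;
    _a = ALIGN(8);                /* 0x1008; . is still 0x1003 */
    _b = ALIGN(0x1003, 0x1000);   /* 0x2000; aligns the given expression */
    . = ALIGN(0x1000);            /* assigning the result moves . to 0x2000 */
    .data : { *(.data) }
}
```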

ADDR – Obtaining the Address of a Section

The ADDR function returns the VMA of the named output section.

SECTIONS
{
    .text . : { *(.text) }
    .data ADDR(.text) + 0x200 : { *(.data) }
}

SIZEOF – Obtaining the Size of a Section

The SIZEOF function returns the size in bytes of the named output section.

SECTIONS
{
    .text . : { *(.text) }
    .data ADDR(.text) + SIZEOF(.text) : { *(.data) }
}
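
ADDR and SIZEOF are also handy for exporting section boundaries as symbols that program code can reference. The symbol names and the start address below are hypothetical.

```ld
SECTIONS
{
    . = 0x10000;
    .text : { *(.text) }
    .data : { *(.data) }

    __text_start = ADDR(.text);                   /* first byte of .text */
    __text_size  = SIZEOF(.text);                 /* size of .text in bytes */
    __text_end   = ADDR(.text) + SIZEOF(.text);   /* just past .text */
}
```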

Conclusion

Today, GCC not only compiles the source code but also drives the linker to link the object files into an executable file. As a result, we may know what linking does without understanding its internal details. By understanding linker scripts, we can see more concretely what the linker is doing.

Reference

  • ld, Linux manual page.
  • LD.
  • Using ld.
  • John R. Levine, Linkers and Loaders.