“Which code model should I use?” is a question that comes up often but is rarely addressed when writing code for the x64 architecture. It is an interesting problem nonetheless, and a working knowledge of code models helps when reading the x64 machine code that compilers generate. For those who care about performance down to individual instructions, the choice of code model also affects optimization.

Information on this topic is scarce, online or anywhere else. The most important resource is the official x64 ABI, which is available for download (hereinafter “the ABI”). Some information can also be found on the CDMY0CDMY pages of CDMY1CDMY. This article aims to give accessible recommendations on the topic, discuss related issues, and demonstrate some concepts with working code.

Important note: this article is not a tutorial for beginners. Before reading it, you should have a good command of C and assembly language, as well as basic familiarity with the x64 architecture.



Also see our previous post on a similar topic: How x86_x64 addresses memory



Code Models: Motivation


In the x64 architecture, both code and data are referenced through instruction-relative (in x64 jargon, RIP-relative) addressing modes. In these instructions, the offset from RIP is limited to 32 bits. There are cases, however, where a 32-bit offset is not enough for an instruction to reach some piece of code or data, for example in programs larger than two gigabytes.
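The 32-bit limit can be made concrete with a small check. This is a minimal sketch, not anything a compiler or linker actually runs; the helper names `fits_rel32` and `encode_rel32` are made up for illustration. It tests whether the distance from the next instruction to a target fits in a signed 32-bit displacement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can `target` be reached from the instruction whose
 * successor starts at `next_rip`, using a signed 32-bit displacement? */
bool fits_rel32(uint64_t next_rip, uint64_t target)
{
    /* Unsigned subtraction wraps around; reinterpreting the result as signed
     * gives the distance (this is how GCC and Clang convert on x64). */
    int64_t delta = (int64_t)(target - next_rip);
    return delta >= INT32_MIN && delta <= INT32_MAX;
}

/* Hypothetical helper: the displacement the instruction would carry. */
int32_t encode_rel32(uint64_t next_rip, uint64_t target)
{
    assert(fits_rel32(next_rip, target));
    return (int32_t)(int64_t)(target - next_rip);
}
```

For instance, `fits_rel32(0x400000, 0x400000 + 0x80000000ull)` is false: a target exactly 2 GB ahead is one byte out of reach.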

One way to solve this problem is to abandon RIP-relative addressing entirely in favor of full 64-bit absolute addresses for all data and code references. This is expensive, though: to cover the (rather rare) case of extremely large programs and libraries, even the simplest operations throughout the code would need more instructions than usual.

Code models are the compromise. [1] A code model is a formal agreement between the programmer and the compiler, in which the programmer states the expected size of the program (or programs) that the object file being compiled will end up in. [2] Code models let the programmer tell the compiler: “Don't worry, this object file will only go into small programs, so you can use the fast RIP-relative addressing modes.” Alternatively, the programmer can say: “This object file will be linked into large programs, so please use the slow but safe absolute addressing modes with full 64-bit addresses.”

What this article will talk about


We will discuss the two scenarios described above, the small code model and the large code model: the first tells the compiler that a 32-bit relative offset is enough for all code and data references in the object file; the second insists that the compiler use absolute 64-bit addressing modes. There is also an intermediate variant, the so-called medium code model.

Each of these code models comes in independent PIC and non-PIC variants, and we will cover all six.

Original C example


To demonstrate the concepts discussed in this article, I will use the C program below and compile it with various code models. As you can see, the CDMY2CDMY function accesses four different global arrays and one global function. The arrays differ in two parameters: size and visibility. Size matters for explaining the medium code model and plays no role in the small and large models. Visibility matters for the PIC code models and is either static (visible only in the source file) or global (visible to all objects linked into the program).

int global_arr[100] = {2, 3};
static int static_arr[100] = {9, 7};
int global_arr_big[50000] = {5, 6};
static int static_arr_big[50000] = {10, 20};

int global_func(int param)
{
    return param * 10;
}

int main(int argc, const char* argv[])
{
    int t = global_func(argc);
    t += global_arr[7];
    t += static_arr[7];
    t += global_arr_big[7];
    t += static_arr_big[7];
    return t;
}

CDMY3CDMY takes the code model as the value of the CDMY4CDMY option. PIC compilation is enabled with the CDMY5CDMY flag.

An example of compiling to an object file with the large code model and PIC:

> gcc -g -O0 -c codemodel1.c -fpic -mcmodel=large -o codemodel1_large_pic.o 

Small Code Model


A quotation from man gcc on the small code model:

-mcmodel=small
Generate code for the small code model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.


In other words, the compiler can safely assume that all code and data are reachable through a 32-bit RIP-relative offset from any instruction in the code. Let's look at the disassembly of our C program compiled with the non-PIC small code model:

> objdump -dS codemodel1_small.o
[...]
int main(int argc, const char* argv[])
{
  15: 55                      push   %rbp
  16: 48 89 e5                mov    %rsp,%rbp
  19: 48 83 ec 20             sub    $0x20,%rsp
  1d: 89 7d ec                mov    %edi,-0x14(%rbp)
  20: 48 89 75 e0             mov    %rsi,-0x20(%rbp)
    int t = global_func(argc);
  24: 8b 45 ec                mov    -0x14(%rbp),%eax
  27: 89 c7                   mov    %eax,%edi
  29: b8 00 00 00 00          mov    $0x0,%eax
  2e: e8 00 00 00 00          callq  33 <main+0x1e>
  33: 89 45 fc                mov    %eax,-0x4(%rbp)
    t += global_arr[7];
  36: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  3c: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr[7];
  3f: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  45: 01 45 fc                add    %eax,-0x4(%rbp)
    t += global_arr_big[7];
  48: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  4e: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr_big[7];
  51: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  57: 01 45 fc                add    %eax,-0x4(%rbp)
    return t;
  5a: 8b 45 fc                mov    -0x4(%rbp),%eax
}
  5d: c9                      leaveq
  5e: c3                      retq

As you can see, all arrays are accessed the same way, using a RIP-relative offset. The offset in the code is 0, however, because the compiler does not know where the data section will be placed, so it creates a relocation for each such access:

> readelf -r codemodel1_small.o

Relocation section '.rela.text' at offset 0x62bd8 contains 5 entries:
  Offset       Info         Type              Sym. Value       Sym. Name + Addend
00000000002f 001500000002 R_X86_64_PC32     0000000000000000 global_func - 4
000000000038 001100000002 R_X86_64_PC32     0000000000000000 global_arr + 18
000000000041 000300000002 R_X86_64_PC32     0000000000000000 .data + 1b8
00000000004a 001200000002 R_X86_64_PC32     0000000000000340 global_arr_big + 18
000000000053 000300000002 R_X86_64_PC32     0000000000000000 .data + 31098

As an example, let us fully decode the access to CDMY6CDMY. The disassembled fragment of interest:

    t += global_arr[7];
  36: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  3c: 01 45 fc                add    %eax,-0x4(%rbp)

RIP-relative addressing is relative to the next instruction, so the offset patched into the CDMY7CDMY instruction must be relative to 0x3c. We are interested in the second relocation, CDMY8CDMY. It points to the operand of CDMY9CDMY at address CDMY10CDMY and means the following: take the symbol value, add the addend, and subtract the offset at which the relocation is applied. If you do the math, you will see that the result is the relative offset between the next instruction and CDMY11CDMY, plus CDMY12CDMY. Since CDMY13CDMY means "the seventh int in the array" (on x64 each CDMY14CDMY is 4 bytes), this is exactly the relative offset we need. Thus, using RIP-relative addressing, the instruction correctly references CDMY15CDMY.
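The arithmetic can be checked with a short sketch. The formula "symbol plus addend minus place" and the numbers 0x38, 0x3c and 0x18 come from the listings above; the function names and the run-time addresses used in the usage note are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* R_X86_64_PC32 as described above: value = S + A - P, where S is the symbol
 * address, A the addend and P the address of the 4 bytes being patched. */
int32_t reloc_pc32(uint64_t S, int64_t A, uint64_t P)
{
    int64_t value = (int64_t)(S + (uint64_t)A - P);
    /* The small code model promises that this always fits. */
    assert(value >= INT32_MIN && value <= INT32_MAX);
    return (int32_t)value;
}

/* What the CPU computes for `mov disp(%rip),%eax`: the address of the next
 * instruction plus the sign-extended displacement. */
uint64_t rip_relative_target(uint64_t next_rip, int32_t disp)
{
    return next_rip + (uint64_t)(int64_t)disp;
}
```

Suppose, hypothetically, that the section holding main is loaded at 0x401000 and global_arr ends up at 0x404040. Then P is 0x401038, the next instruction is at 0x40103c, and `reloc_pc32(0x404040, 0x18, 0x401038)` yields 0x3020. Adding that to 0x40103c gives 0x40405c, which is global_arr plus 7 * 4 bytes.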

It is also interesting to note that although the instructions accessing CDMY16CDMY are similar, its relocation uses a different symbol: it points to the CDMY17CDMY section rather than to a specific symbol. This is because the linker places the static array at a known location in the section, and the array cannot be shared with other shared libraries. As a result, the linker can fully resolve this relocation. On the other hand, since CDMY18CDMY can be used (or overridden) by another shared library, the reference to CDMY19CDMY will have to be resolved by the dynamic loader. [3]

Finally, let's look at the reference to CDMY20CDMY:

    int t = global_func(argc);
  24: 8b 45 ec                mov    -0x14(%rbp),%eax
  27: 89 c7                   mov    %eax,%edi
  29: b8 00 00 00 00          mov    $0x0,%eax
  2e: e8 00 00 00 00          callq  33 <main+0x1e>
  33: 89 45 fc                mov    %eax,-0x4(%rbp)

Since the operand of CDMY21CDMY is also RIP-relative, the relocation CDMY22CDMY works the same way here, placing the actual relative offset to global_func into the operand.

To conclude: because the small code model lets the compiler assume that all code and data of the final program are reachable with a 32-bit offset, it can generate simple and efficient code to access all kinds of objects.

Large Code Model


A quotation from CDMY23CDMY CDMY24CDMY on the large code model:

-mcmodel=large
Generate code for the large model. This model makes no assumptions about addresses and sizes of sections.

Disassembly of CDMY25CDMY compiled with the non-PIC large model:

int main(int argc, const char* argv[])
{
  15: 55                      push   %rbp
  16: 48 89 e5                mov    %rsp,%rbp
  19: 48 83 ec 20             sub    $0x20,%rsp
  1d: 89 7d ec                mov    %edi,-0x14(%rbp)
  20: 48 89 75 e0             mov    %rsi,-0x20(%rbp)
    int t = global_func(argc);
  24: 8b 45 ec                mov    -0x14(%rbp),%eax
  27: 89 c7                   mov    %eax,%edi
  29: b8 00 00 00 00          mov    $0x0,%eax
  2e: 48 ba 00 00 00 00 00    movabs $0x0,%rdx
  35: 00 00 00
  38: ff d2                   callq  *%rdx
  3a: 89 45 fc                mov    %eax,-0x4(%rbp)
    t += global_arr[7];
  3d: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  44: 00 00 00
  47: 8b 40 1c                mov    0x1c(%rax),%eax
  4a: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr[7];
  4d: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  54: 00 00 00
  57: 8b 40 1c                mov    0x1c(%rax),%eax
  5a: 01 45 fc                add    %eax,-0x4(%rbp)
    t += global_arr_big[7];
  5d: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  64: 00 00 00
  67: 8b 40 1c                mov    0x1c(%rax),%eax
  6a: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr_big[7];
  6d: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  74: 00 00 00
  77: 8b 40 1c                mov    0x1c(%rax),%eax
  7a: 01 45 fc                add    %eax,-0x4(%rbp)
    return t;
  7d: 8b 45 fc                mov    -0x4(%rbp),%eax
}
  80: c9                      leaveq
  81: c3                      retq

Once again, it’s useful to look at the relocations:

Relocation section '.rela.text' at offset 0x62c18 contains 5 entries:
  Offset       Info         Type              Sym. Value       Sym. Name + Addend
000000000030 001500000001 R_X86_64_64       0000000000000000 global_func + 0
00000000003f 001100000001 R_X86_64_64       0000000000000000 global_arr + 0
00000000004f 000300000001 R_X86_64_64       0000000000000000 .data + 1a0
00000000005f 001200000001 R_X86_64_64       0000000000000340 global_arr_big + 0
00000000006f 000300000001 R_X86_64_64       0000000000000000 .data + 31080

Since it makes no assumptions about the size of the code and data sections, the large code model is quite uniform and accesses all data in the same way. Let's look at CDMY26CDMY again:

    t += global_arr[7];
  3d: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  44: 00 00 00
  47: 8b 40 1c                mov    0x1c(%rax),%eax
  4a: 01 45 fc                add    %eax,-0x4(%rbp)

Two instructions are needed to fetch the desired value from the array. The first places an absolute 64-bit address in CDMY27CDMY, which, as we will soon see, is the address of CDMY28CDMY; the second loads a word from CDMY29CDMY into CDMY30CDMY.

So let's focus on the instruction at CDMY31CDMY: CDMY32CDMY, the absolute 64-bit version of CDMY33CDMY on x64. It can load a full 64-bit constant directly into a register, and since that constant is zero in our disassembly, we have to turn to the relocation table for the answer. There we find the absolute relocation CDMY34CDMY for the operand at address CDMY35CDMY, which means: place the symbol value plus the addend at that offset. In other words, CDMY36CDMY will contain the absolute address of CDMY37CDMY.

What about the function call?
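To make "place the symbol value plus the addend at that offset" tangible, here is a rough sketch of how a linker-like tool could patch the ten instruction bytes shown above. The function `apply_reloc_64` and the sample address are assumptions for illustration; a real linker does considerably more bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Absolute 64-bit relocation: write S + A as a little-endian 64-bit value
 * into `section` at byte position `offset`. */
void apply_reloc_64(uint8_t *section, size_t size, size_t offset,
                    uint64_t S, int64_t A)
{
    uint64_t value = S + (uint64_t)A;
    assert(offset + 8 <= size);          /* the patch must stay in bounds */
    for (int i = 0; i < 8; i++)
        section[offset + i] = (uint8_t)(value >> (8 * i));
}
```

In the listing, the instruction starts at 0x3d and the relocation is applied at 0x3f, so the immediate sits 2 bytes into the instruction, right after the bytes 48 b8. Calling `apply_reloc_64(insn, 10, 2, 0x0000000300404040, 0)` on those ten bytes (with a made-up symbol address above 4 GB) leaves 48 b8 in place and fills the rest with 40 40 40 00 03 00 00 00.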

    int t = global_func(argc);
  24: 8b 45 ec                mov    -0x14(%rbp),%eax
  27: 89 c7                   mov    %eax,%edi
  29: b8 00 00 00 00          mov    $0x0,%eax
  2e: 48 ba 00 00 00 00 00    movabs $0x0,%rdx
  35: 00 00 00
  38: ff d2                   callq  *%rdx
  3a: 89 45 fc                mov    %eax,-0x4(%rbp)

The CDMY38CDMY instruction we already know is followed by a CDMY39CDMY instruction that calls the function at the address in CDMY40CDMY. A look at the corresponding relocation shows how closely this resembles the data access.

As you can see, the large code model makes no assumptions about the size of the code and data sections or about where symbols end up; it simply refers to symbols through absolute 64-bit moves, a kind of “safe path”. Note, however, that compared to the small code model, the large model needs an extra instruction for every symbol access. That is the price of safety.

So we have met two opposite models: the small code model assumes that everything fits in the lower two gigabytes of memory, while the large model assumes nothing and allows any symbol to be anywhere in the full 64-bit address space. The compromise between the two is the medium code model.

Medium Code Model


As before, let's start with a quotation from CDMY41CDMY CDMY42CDMY:

-mcmodel=medium
Generate code for the medium model: the program is linked in the lower 2 GB of the address space. Small symbols are also placed there. Symbols with sizes larger than -mlarge-data-threshold are put into large data or BSS sections and can be located above 2GB. Programs can be statically or dynamically linked.

Like the small code model, the medium model assumes that all code is linked into the lower two gigabytes. Data, however, is split into “small data”, assumed to be in the lower two gigabytes, and “large data”, whose placement is unrestricted. Data is classified as large when it exceeds a threshold, which is 64 kilobytes by default.
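The classification rule is simple enough to write down. The sketch below is only a model of the rule, not GCC's implementation; the names are made up, and the default threshold of 65535 bytes is my reading of the GCC documentation for -mlarge-data-threshold, so treat it as an assumption and check the manual for your compiler version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed default of -mlarge-data-threshold, in bytes. */
#define LARGE_DATA_THRESHOLD 65535u

/* An object counts as "large data" when its size exceeds the threshold. */
bool is_large_data(size_t object_size, size_t threshold)
{
    return object_size > threshold;
}

/* Self-check using the array shapes from the example program. */
void check_example_arrays(void)
{
    assert(!is_large_data(sizeof(int[100]), LARGE_DATA_THRESHOLD));
    assert(is_large_data(sizeof(int[50000]), LARGE_DATA_THRESHOLD));
}
```

With 4-byte ints, the 100-element arrays take 400 bytes and stay small, while the 50000-element arrays take 200000 bytes and are treated as large.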

It is also worth noting that with the medium code model, special sections are created for large data, by analogy with the CDMY43CDMY and CDMY44CDMY sections: CDMY45CDMY and CDMY46CDMY. This is not essential to the topic of this article, so I will not go into it further. More details can be found in the ABI.

Now it becomes clear why the CDMY47CDMY arrays are in the example: the medium model needs some “large data” to treat differently, and at 200 kilobytes each they qualify. Here is the disassembly:

int main(int argc, const char* argv[])
{
  15: 55                      push   %rbp
  16: 48 89 e5                mov    %rsp,%rbp
  19: 48 83 ec 20             sub    $0x20,%rsp
  1d: 89 7d ec                mov    %edi,-0x14(%rbp)
  20: 48 89 75 e0             mov    %rsi,-0x20(%rbp)
    int t = global_func(argc);
  24: 8b 45 ec                mov    -0x14(%rbp),%eax
  27: 89 c7                   mov    %eax,%edi
  29: b8 00 00 00 00          mov    $0x0,%eax
  2e: e8 00 00 00 00          callq  33 <main+0x1e>
  33: 89 45 fc                mov    %eax,-0x4(%rbp)
    t += global_arr[7];
  36: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  3c: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr[7];
  3f: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  45: 01 45 fc                add    %eax,-0x4(%rbp)
    t += global_arr_big[7];
  48: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  4f: 00 00 00
  52: 8b 40 1c                mov    0x1c(%rax),%eax
  55: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr_big[7];
  58: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  5f: 00 00 00
  62: 8b 40 1c                mov    0x1c(%rax),%eax
  65: 01 45 fc                add    %eax,-0x4(%rbp)
    return t;
  68: 8b 45 fc                mov    -0x4(%rbp),%eax
}
  6b: c9                      leaveq
  6c: c3                      retq

Notice how the arrays are accessed: the CDMY48CDMY arrays are accessed the way the large code model does it, while the other arrays are accessed the way the small model does it. The function is also called the small-model way, and the relocations are so similar to the previous examples that I will not even show them.

The medium code model is a clever compromise between the large and small models. Program code is unlikely to get too large [4], so what may push a program past the two-gigabyte limit is large chunks of data statically linked into it, perhaps as part of some big lookup table. Since the medium code model singles out such large chunks and handles them specially, calls to functions and accesses to small symbols are just as efficient as in the small code model. Only accesses to large symbols require the full 64-bit method of the large model.

Small PIC Code Model


Now let's look at the PIC variants of the code models, starting as before with the small model. [5] Here is the code compiled with the small PIC model:

int main(int argc, const char* argv[])
{
  15: 55                      push   %rbp
  16: 48 89 e5                mov    %rsp,%rbp
  19: 48 83 ec 20             sub    $0x20,%rsp
  1d: 89 7d ec                mov    %edi,-0x14(%rbp)
  20: 48 89 75 e0             mov    %rsi,-0x20(%rbp)
    int t = global_func(argc);
  24: 8b 45 ec                mov    -0x14(%rbp),%eax
  27: 89 c7                   mov    %eax,%edi
  29: b8 00 00 00 00          mov    $0x0,%eax
  2e: e8 00 00 00 00          callq  33 <main+0x1e>
  33: 89 45 fc                mov    %eax,-0x4(%rbp)
    t += global_arr[7];
  36: 48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
  3d: 8b 40 1c                mov    0x1c(%rax),%eax
  40: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr[7];
  43: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  49: 01 45 fc                add    %eax,-0x4(%rbp)
    t += global_arr_big[7];
  4c: 48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
  53: 8b 40 1c                mov    0x1c(%rax),%eax
  56: 01 45 fc                add    %eax,-0x4(%rbp)
    t += static_arr_big[7];
  59: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  5f: 01 45 fc                add    %eax,-0x4(%rbp)
    return t;
  62: 8b 45 fc                mov    -0x4(%rbp),%eax
}
  65: c9                      leaveq
  66: c3                      retq

Relocations:

Relocation section '.rela.text' at offset 0x62ce8 contains 5 entries:
  Offset       Info         Type              Sym. Value       Sym. Name + Addend
00000000002f 001600000004 R_X86_64_PLT32    0000000000000000 global_func - 4
000000000039 001100000009 R_X86_64_GOTPCREL 0000000000000000 global_arr - 4
000000000045 000300000002 R_X86_64_PC32     0000000000000000 .data + 1b8
00000000004f 001200000009 R_X86_64_GOTPCREL 0000000000000340 global_arr_big - 4
00000000005b 000300000002 R_X86_64_PC32     0000000000000000 .data + 31098

Since the distinction between large and small data plays no role in the small code model, we will focus on what matters when generating PIC: the difference between local (static) and global symbols.

As you can see, the code generated for the static arrays is no different from the non-PIC case. This is one of the advantages of the x64 architecture: thanks to RIP-relative data access, we get PIC for free, at least until access to external symbols is required. All instructions and relocations remain the same, so there is no need to go over them again.

The global arrays are more interesting. Recall that in PIC, global data must go through the GOT, because it may eventually be found in, or shared with, other shared libraries [6]. Here is the code that accesses CDMY49CDMY:

    t += global_arr[7];
  36: 48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
  3d: 8b 40 1c                mov    0x1c(%rax),%eax
  40: 01 45 fc                add    %eax,-0x4(%rbp)

The relocation we are interested in is CDMY50CDMY: the location of the symbol's GOT entry plus the addend, minus the offset at which the relocation is applied. In other words, the relative offset between RIP (of the next instruction) and the GOT slot reserved for CDMY51CDMY is patched into the instruction. So what is placed into CDMY52CDMY by the instruction at CDMY54CDMY is the actual address of CDMY53CDMY. The next instruction dereferences the address of CDMY55CDMY plus the offset to its seventh element and places the result in CDMY56CDMY.
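The two-step access, first fetching an address from a GOT slot and then loading through it, can be imitated in plain C. This is only an analogy under stated assumptions: `fake_got` and `fake_loader_bind` are invented stand-ins for the real GOT and the dynamic loader, which the program itself never implements.

```c
#include <assert.h>
#include <stddef.h>

int global_arr[100] = {2, 3};      /* same definition as in the example */

/* Hypothetical one-entry stand-in for the GOT. */
static int *fake_got[1];

/* Hypothetical stand-in for the dynamic loader filling in the slot. */
void fake_loader_bind(void)
{
    fake_got[0] = global_arr;
}

int load_global_arr_7(void)
{
    int *base = fake_got[0];                /* mov <slot>(%rip),%rax */
    assert(base != NULL);                   /* slot must be bound first */
    return *(int *)((char *)base + 0x1c);   /* mov 0x1c(%rax),%eax */
}
```

The point of the indirection is that only the slot's location has to be known at link time; whatever address the loader stores there is what the second load uses.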

Now let's take a look at the function call:

    int t = global_func(argc);
  24: 8b 45 ec                mov    -0x14(%rbp),%eax
  27: 89 c7                   mov    %eax,%edi
  29: b8 00 00 00 00          mov    $0x0,%eax
  2e: e8 00 00 00 00          callq  33 <main+0x1e>
  33: 89 45 fc                mov    %eax,-0x4(%rbp)

It has a relocation for the operand of CDMY57CDMY at address CDMY58CDMY, CDMY59CDMY: the address of the PLT entry for the symbol plus the addend, minus the offset at which the relocation is applied. In other words, CDMY60CDMY should correctly call the PLT trampoline for CDMY61CDMY.

Note the implicit assumption the compiler makes here: that the GOT and PLT can be reached with RIP-relative addressing. This will be important when comparing this model to the other PIC code models.

Large PIC Code Model


Disassembly:

int main(int argc, const char* argv[])
{
  15: 55                      push   %rbp
  16: 48 89 e5                mov    %rsp,%rbp
  19: 53                      push   %rbx
  1a: 48 83 ec 28             sub    $0x28,%rsp
  1e: 48 8d 1d f9 ff ff ff    lea    -0x7(%rip),%rbx
  25: 49 bb 00 00 00 00 00    movabs $0x0,%r11
  2c: 00 00 00
  2f: 4c 01 db                add    %r11,%rbx
  32: 89 7d dc                mov    %edi,-0x24(%rbp)
  35: 48 89 75 d0             mov    %rsi,-0x30(%rbp)
    int t = global_func(argc);
  39: 8b 45 dc                mov    -0x24(%rbp),%eax
  3c: 89 c7                   mov    %eax,%edi
  3e: b8 00 00 00 00          mov    $0x0,%eax
  43: 48 ba 00 00 00 00 00    movabs $0x0,%rdx
  4a: 00 00 00
  4d: 48 01 da                add    %rbx,%rdx
  50: ff d2                   callq  *%rdx
  52: 89 45 ec                mov    %eax,-0x14(%rbp)
    t += global_arr[7];
  55: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  5c: 00 00 00
  5f: 48 8b 04 03             mov    (%rbx,%rax,1),%rax
  63: 8b 40 1c                mov    0x1c(%rax),%eax
  66: 01 45 ec                add    %eax,-0x14(%rbp)
    t += static_arr[7];
  69: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  70: 00 00 00
  73: 8b 44 03 1c             mov    0x1c(%rbx,%rax,1),%eax
  77: 01 45 ec                add    %eax,-0x14(%rbp)
    t += global_arr_big[7];
  7a: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  81: 00 00 00
  84: 48 8b 04 03             mov    (%rbx,%rax,1),%rax
  88: 8b 40 1c                mov    0x1c(%rax),%eax
  8b: 01 45 ec                add    %eax,-0x14(%rbp)
    t += static_arr_big[7];
  8e: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  95: 00 00 00
  98: 8b 44 03 1c             mov    0x1c(%rbx,%rax,1),%eax
  9c: 01 45 ec                add    %eax,-0x14(%rbp)
    return t;
  9f: 8b 45 ec                mov    -0x14(%rbp),%eax
}
  a2: 48 83 c4 28             add    $0x28,%rsp
  a6: 5b                      pop    %rbx
  a7: c9                      leaveq
  a8: c3                      retq

Relocations:

CDMY62CDMY
This time the distinction between large and small data is again irrelevant, so we will focus on CDMY63CDMY and CDMY64CDMY. But first, notice the prologue in this code, which we have not seen before:

  1e: 48 8d 1d f9 ff ff ff    lea    -0x7(%rip),%rbx
  25: 49 bb 00 00 00 00 00    movabs $0x0,%r11
  2c: 00 00 00
  2f: 4c 01 db                add    %r11,%rbx

Here is the relevant quotation from the ABI:

In the small code model all addresses (including GOT entries) are accessible via the IP-relative addressing provided by the AMD64 architecture. Hence there is no need for an explicit GOT pointer and therefore no function prologue for setting it up is necessary. In the medium and large code models a register has to be allocated to hold the address of the GOT in position-independent objects, because the AMD64 ISA does not support an immediate displacement larger than 32 bits.

Let's see how the prologue above computes the GOT address. First, the instruction at CDMY65CDMY loads its own address into CDMY66CDMY. Then, with the help of the relocation CDMY67CDMY, an absolute 64-bit move into CDMY68CDMY is performed. This relocation means: take the GOT address, subtract the offset at which the relocation is applied, and add the addend. Finally, the instruction at CDMY69CDMY adds the two together. The result is the absolute address of the GOT in CDMY70CDMY. [7]
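The three prologue instructions can be replayed as plain integer arithmetic. The offsets 0x25, 0x7, 0x27 and the addend 0x9 are the ones that appear in the listing and in note [7]; the function names and the parameter `text_base` (the run-time address at which this section happens to be loaded) are assumptions introduced for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* The relocation as described above: GOT address, plus addend, minus the
 * place being patched. */
uint64_t reloc_gotpc64(uint64_t got, int64_t A, uint64_t P)
{
    return got + (uint64_t)A - P;
}

/* Replays the prologue for a hypothetical load address `text_base`. */
uint64_t large_pic_prologue(uint64_t text_base, uint64_t got)
{
    /* lea -0x7(%rip),%rbx: the next instruction is at +0x25. */
    uint64_t rbx = (text_base + 0x25) - 0x7;
    /* movabs $imm64,%r11: the immediate is patched at +0x27, addend 0x9. */
    uint64_t r11 = reloc_gotpc64(got, 0x9, text_base + 0x27);
    /* add %r11,%rbx */
    rbx += r11;
    assert(rbx == got);
    return rbx;
}
```

The load address cancels out of the sum, which is exactly what makes the sequence position independent: the result is the GOT address no matter where the code lands, and no matter how far away the GOT is.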

Why go to the trouble of computing the GOT address? First, as the quotation notes, in the large code model we cannot assume that a 32-bit RIP-relative offset is enough to reach the GOT, so we need a full 64-bit address. Second, we still want PIC, so we cannot simply load an absolute address into the register. Instead, the address itself must be computed relative to RIP. That is what the prologue is for: it performs a 64-bit RIP-relative computation.

In any case, now that we have the GOT address in CDMY71CDMY, let's see how CDMY72CDMY is accessed:

    t += static_arr[7];
  69: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  70: 00 00 00
  73: 8b 44 03 1c             mov    0x1c(%rbx,%rax,1),%eax
  77: 01 45 ec                add    %eax,-0x14(%rbp)

The relocation for the first instruction is CDMY73CDMY: symbol plus addend minus GOT. In our case this is the relative offset between the address of CDMY74CDMY and the address of the GOT. The next instruction adds the result to CDMY75CDMY (the absolute GOT address) and dereferences it with an offset of CDMY76CDMY. To make this computation easier to visualize, here is a pseudo-C version:

// char* static_arr
// char* GOT
rax = static_arr + 0 - GOT;  // rax now contains an offset
eax = *(rbx + rax + 0x1c);   // rbx == GOT, so eax now contains
                             // *(GOT + static_arr - GOT + 0x1c) or
                             // *(static_arr + 0x1c)

Note something interesting here: the GOT address is used as an anchor for reaching CDMY77CDMY. Normally the GOT would not contain the address of this symbol, and since CDMY78CDMY is not an external symbol, there is no reason to keep it in the GOT. Here, however, the GOT serves as an anchor for the relative address of a symbol in the data section. This address, which is also position independent, can be reached with a full 64-bit offset. The linker is able to resolve this relocation, so the code section does not need to be modified at load time.

But what about CDMY79CDMY?

    t += global_arr[7];
  55: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  5c: 00 00 00
  5f: 48 8b 04 03             mov    (%rbx,%rax,1),%rax
  63: 8b 40 1c                mov    0x1c(%rax),%eax
  66: 01 45 ec                add    %eax,-0x14(%rbp)

This code is slightly longer, and the relocation is different. Here the GOT is used in the more traditional way: the relocation CDMY80CDMY for CDMY81CDMY simply says to place into CDMY83CDMY the offset within the GOT at which the address of CDMY82CDMY resides. The instruction at CDMY84CDMY takes the address of CDMY85CDMY from the GOT and places it in CDMY86CDMY. The next instruction dereferences CDMY87CDMY and places the value in CDMY88CDMY.

Now let's look at the code reference to CDMY89CDMY. Recall that in the large code model we cannot make assumptions about the size of the code section, so we must assume that even reaching the PLT requires an absolute 64-bit address:

    int t = global_func(argc);
  39: 8b 45 dc                mov    -0x24(%rbp),%eax
  3c: 89 c7                   mov    %eax,%edi
  3e: b8 00 00 00 00          mov    $0x0,%eax
  43: 48 ba 00 00 00 00 00    movabs $0x0,%rdx
  4a: 00 00 00
  4d: 48 01 da                add    %rbx,%rdx
  50: ff d2                   callq  *%rdx
  52: 89 45 ec                mov    %eax,-0x14(%rbp)

The relocation we are interested in is CDMY90CDMY: the address of the PLT entry for CDMY91CDMY minus the GOT address. The result is placed in CDMY92CDMY, to which CDMY93CDMY (the absolute GOT address) is then added. As a result, we get the address of the PLT entry for CDMY94CDMY in CDMY95CDMY.

Note that once again the GOT is used as an anchor, this time to allow a position-independent reference to the offset of the PLT entry.
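As a sanity check, the call sequence can be modeled with integers. This sketch assumes an addend of zero, since the relocation table for this build is not reproduced here, and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* PLT entry address, plus addend, minus GOT address. */
uint64_t reloc_pltoff64(uint64_t plt_entry, int64_t addend, uint64_t got)
{
    return plt_entry + (uint64_t)addend - got;
}

/* Replays movabs / add / callq for hypothetical run-time addresses. */
uint64_t large_pic_call_target(uint64_t got, uint64_t plt_entry)
{
    uint64_t rbx = got;                               /* set by the prologue */
    uint64_t rdx = reloc_pltoff64(plt_entry, 0, got); /* movabs $imm64,%rdx  */
    rdx += rbx;                                       /* add %rbx,%rdx       */
    assert(rdx == plt_entry);                         /* addend assumed 0    */
    return rdx;                                       /* callq *%rdx         */
}
```

The immediate stored in the instruction is only a distance between two locations inside the same module, so it stays valid wherever the module is loaded, and because it is 64 bits wide the PLT may be arbitrarily far from the GOT.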

Medium PIC Code Model


Finally, we will analyze the code generated for the medium PIC model:

int main(int argc, const char* argv[])
{
  15: 55                      push   %rbp
  16: 48 89 e5                mov    %rsp,%rbp
  19: 53                      push   %rbx
  1a: 48 83 ec 28             sub    $0x28,%rsp
  1e: 48 8d 1d 00 00 00 00    lea    0x0(%rip),%rbx
  25: 89 7d dc                mov    %edi,-0x24(%rbp)
  28: 48 89 75 d0             mov    %rsi,-0x30(%rbp)
    int t = global_func(argc);
  2c: 8b 45 dc                mov    -0x24(%rbp),%eax
  2f: 89 c7                   mov    %eax,%edi
  31: b8 00 00 00 00          mov    $0x0,%eax
  36: e8 00 00 00 00          callq  3b <main+0x26>
  3b: 89 45 ec                mov    %eax,-0x14(%rbp)
    t += global_arr[7];
  3e: 48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
  45: 8b 40 1c                mov    0x1c(%rax),%eax
  48: 01 45 ec                add    %eax,-0x14(%rbp)
    t += static_arr[7];
  4b: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
  51: 01 45 ec                add    %eax,-0x14(%rbp)
    t += global_arr_big[7];
  54: 48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
  5b: 8b 40 1c                mov    0x1c(%rax),%eax
  5e: 01 45 ec                add    %eax,-0x14(%rbp)
    t += static_arr_big[7];
  61: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  68: 00 00 00
  6b: 8b 44 03 1c             mov    0x1c(%rbx,%rax,1),%eax
  6f: 01 45 ec                add    %eax,-0x14(%rbp)
    return t;
  72: 8b 45 ec                mov    -0x14(%rbp),%eax
}
  75: 48 83 c4 28             add    $0x28,%rsp
  79: 5b                      pop    %rbx
  7a: c9                      leaveq
  7b: c3                      retq

Relocations:

Relocation section '.rela.text' at offset 0x62d60 contains 6 entries:
  Offset       Info         Type              Sym. Value       Sym. Name + Addend
000000000021 00160000001a R_X86_64_GOTPC32  0000000000000000 _GLOBAL_OFFSET_TABLE_ - 4
000000000037 001700000004 R_X86_64_PLT32    0000000000000000 global_func - 4
000000000041 001200000009 R_X86_64_GOTPCREL 0000000000000000 global_arr - 4
00000000004d 000300000002 R_X86_64_PC32     0000000000000000 .data + 1b8
000000000057 001300000009 R_X86_64_GOTPCREL 0000000000000000 global_arr_big - 4
000000000063 000a00000019 R_X86_64_GOTOFF64 0000000000030d40 static_arr_big + 0

First, let's get the function call out of the way. As in the small model, the medium model assumes that code references stay within the reach of a 32-bit RIP-relative offset, so the code for calling CDMY96CDMY is exactly the same as in the small PIC model. The same goes for the small data arrays CDMY97CDMY and CDMY98CDMY. We will therefore focus on the large data arrays, but first a word about the prologue, which differs from the prologue of the large model:

1e: 48 8d 1d 00 00 00 00 lea 0x0(%rip),%rbx 

That is the whole prologue: a single instruction (compared to three in the large model) suffices to load the GOT address into CDMY100CDMY using the CDMY99CDMY relocation. Why the difference? Since in the medium model the GOT is not part of the “large data sections”, we assume it is reachable within a 32-bit offset. In the large model we could not make that assumption and had to use a full 64-bit offset.
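The single-instruction prologue can be replayed the same way as any other RIP-relative reference. The offsets 0x21 and 0x25 and the addend -4 are taken from the listing and the relocation table above; the function names and `text_base` (a hypothetical load address of this section) are assumptions for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit GOT-relative-to-place value: GOT + A - P, which must fit in 32 bits. */
int32_t reloc_gotpc32(uint64_t got, int64_t A, uint64_t P)
{
    int64_t value = (int64_t)(got + (uint64_t)A - P);
    /* This is the medium model's assumption about the GOT being nearby. */
    assert(value >= INT32_MIN && value <= INT32_MAX);
    return (int32_t)value;
}

/* Replays `lea disp(%rip),%rbx` located at +0x1e (7 bytes long). */
uint64_t medium_pic_prologue(uint64_t text_base, uint64_t got)
{
    int32_t disp = reloc_gotpc32(got, -4, text_base + 0x21);
    return (text_base + 0x25) + (uint64_t)(int64_t)disp;
}
```

With a hypothetical load address of 0x401000 and the GOT at 0x403fe8, the patched displacement is 0x2fc3 and the lea produces 0x403fe8. The range assertion is the part that the large model cannot afford to rely on.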

Interestingly, the code for accessing CDMY101CDMY is the same as in the small PIC model. The reason is the same one that makes the medium-model prologue shorter than the large-model one: we assume the GOT is reachable with 32-bit RIP-relative addressing. CDMY102CDMY itself cannot be reached that way, but the access goes through the GOT anyway, since the address of CDMY103CDMY is stored there as a full 64-bit address.

The situation is different, however, for CDMY104CDMY:

    t += static_arr_big[7];
  61: 48 b8 00 00 00 00 00    movabs $0x0,%rax
  68: 00 00 00
  6b: 8b 44 03 1c             mov    0x1c(%rbx,%rax,1),%eax
  6f: 01 45 ec                add    %eax,-0x14(%rbp)

This case is similar to the large PIC code model, because here we do obtain an absolute address for the symbol, which is not in the GOT itself. Since this is a large symbol that cannot be assumed to reside in the lower two gigabytes, we need the 64-bit PIC offset, just as in the large model.

Notes:


[1] Do not confuse code models with 64-bit data models or Intel memory models; these are all different topics.

[2] An important point: the compiler emits the instructions themselves, and the addressing modes are fixed at that stage. The compiler cannot know which programs or shared libraries the object file will end up in; some may be small, others large. The linker does know the size of the final program, but by then it is too late: the linker can only patch the offsets in instructions via relocations, not change the instructions themselves. The code model “agreement” therefore has to be “signed” by the programmer at compile time.

[3] If anything remains unclear, see the next article.

[4] Sizes are gradually growing, however. The last time I checked, a Debug+Asserts build of Clang was almost one gigabyte, largely thanks to auto-generated code.

[5] If you do not yet know how PIC works (both in general and on x64 in particular), now is the time to read the following related articles: one and two.

[6] Thus the linker cannot resolve the references on its own and has to leave the GOT handling to the dynamic loader.

[7] 0x25 - 0x7 + GOT - 0x27 + 0x9 = GOT


