Commit 4b07e31

Start expanding the memory management part (#76)
* Update 01_Overview.md
* Update MM 01_Overview.md typo fix
* Expoand direct map and recursion in paging chapter
* Typo fixes (feeling like the Captain Julio Sham)
* Changes requested in review
* changes requested in review
1 parent e1b3f11 commit 4b07e31

40 files changed

Lines changed: 507 additions & 464 deletions

01_Build_Process/01_Overview.md

Lines changed: 12 additions & 12 deletions
@@ -34,7 +34,7 @@ Both C and C++ have several freestanding headers. The common ones are `stdint.h`
 
 ## Cross Compilation
 
-Often this is not necessary for hobby os projects, as we are running our code on the same cpu architecture that we're compiling on. However it's still recommended to use one as we can configure the cross compiler to the specs we want, rather than relying on the one provided by the host os.
+Often this is not necessary for hobby os projects, as we are running our code on the same cpu architecture that we're compiling on. However it's still recommended to use one as we can configure the cross compiler to the specs we want, rather than relying on the one provided by the host os.
 
 A cross compiler is always required when building the os for a different cpu architecture. Building code for a `risc-v` cpu, while running on an `x86` cpu would require a cross compiler for example.

@@ -81,7 +81,7 @@ If using clang be sure to remember to pass `--target=xyz` with each command. Thi
 
 ### Building C Source Files
 Now that we have a toolchain setup we can test it all works by compiling a C file.
-Create a C source file, it's contents don't matter here as we wont be running it, just telling it compiles.
+Create a C source file, its contents don't matter here as we wont be running it, just telling it compiles.
 
 Run the following to compile the file into an object file, and then to link that into the final executable.

@@ -131,21 +131,21 @@ The GCC Linker (`ld`) and the compatible clang linker (`lld.ld`) can accept link
 These describe the layout of the final executable to the linker: what things go where, with what alignment and permissions.
 This is incredibly important for a kernel, as it's the file that will be loaded by the bootloader, which may impose certain restrictions or provide certain features.
 
-These are their own topic, and have a full chapter dedicated to them later in this chapter. We likely haven't used these when building userspace programs, as our compiler/os installation provides a default one. However since we're building a freestanding program (the kernel) now we need to be explicit about these things.
+These are their own topic, and have a full chapter dedicated to them later in this chapter. We likely haven't used these when building userspace programs, as our compiler/os installation provides a default one. However since we're building a freestanding program (the kernel) now we need to be explicit about these things.
 
 A linker script can be simply added appending the `-T script_name_here.ld` to the linker command.
 
 Outside of linker scripts, the linking process goes as following:
 
 ```sh
-$(LD) $(OBJS) -o output_filename_here.elf
+$(LD) $(OBJS) -o output_filename_here.elf
     -nostdlib -static -pie --no-dynamic-linker
 ```
 
 For an explanation of the above linker flags used:
 
 - `-nostdlib`: this is crucial for building a freestanding program, as it stops the linker automatically including the default libraries for the host platform. Otherwise the program will contain a bunch of code that wants to make syscalls to the host OS.
-- `-static`: A safeguard for linking against other libraries. The linker will error if we try to dynamically link with anything (i.e static linking only). Because again there is no runtime, there is no dynamic linker.
+- `-static`: A safeguard for linking against other libraries. The linker will error if we try to dynamically link with anything (i.e static linking only). Because again there is no runtime, there is no dynamic linker.
 - `-pie` and `--no-dynamic-linker`: Not strictly necessary, but forces the linker to output a relocatable program with a very narrow set of relocations. This is useful as it allows some bootloaders to perform relocations on the kernel.
 
 One other linker option to keep in mind is `-M`, which displays the link map that was generated. This is a description of how and where the linker allocated everything in the final file. It can be seen as a manual symbol table.
@@ -156,13 +156,13 @@ Now compiling and building one file isn't so bad, but the same process for multi
 
 _Make_ is a common tool used for building many pieces of software due to how easy and common `make` is. Specifically GNU make. GNU make is also chosen as it comes installed by default in many linux distros, and is almost always available if it's not already installed.
 
-There are other make-like tools out there (xmake, nmake) but these are less popular, and therefore less standardized. For the lowest common denominator we'll stick with the original GNU make, which is discussed later on in it's chapter.
+There are other make-like tools out there (xmake, nmake) but these are less popular, and therefore less standardized. For the lowest common denominator we'll stick with the original GNU make, which is discussed later on in its chapter.
 
 ## Quick Addendum: Easily Generating a Bootable Iso
 
 There are more details to this, however most bootloaders will provide a tool that lets us create a bootable iso, with the kernel, the bootloader itself and any other files we might want. For grub this is `grub-mkrescue` and limine provides `limine-install` for version 2.x or `limine-deploy` for version 3.x.
 
-While the process of generating an iso is straightforward enough when using something like xorisso, the process of installing a bootloader into that iso is usually bootloader dependent. This is covered more in detail in it's own chapter.
+While the process of generating an iso is straightforward enough when using something like xorisso, the process of installing a bootloader into that iso is usually bootloader dependent. This is covered more in detail in its own chapter.
 
 If just here for a quick reference, grub uses `grub-mkrescue` and a `grub.cfg` file, limine reqiures us to build the iso by yourselves with a `limine.cfg` on it, and then run `limine-deploy`.

@@ -202,7 +202,7 @@ There are a few other qemu flags we might want to be aware of:
 
 We'll never know when we need to debug your kernel, especially when running in a virtualized environment. Having debug symbols included in the kernel will increase the file size, but can be useful. If we want to remove them from an already compiled kernel the `strip` program can be used to strip excess info from a file.
 
-Including debug info in the kernel is the same as any other program, simply compile with the `-g` flag.
+Including debug info in the kernel is the same as any other program, simply compile with the `-g` flag.
 
 There are different versions of DWARF (the debugging format used by elf files), and by default the compiler will use the most recent one for our target platform. However this can be overridden and the compiler can be forced to use a different debug format (if needed). Sometimes there can be issues if the debugger is from a different vendor to our compiler, or is much older.

@@ -235,7 +235,7 @@ Next we'll want to find the `.symtab` section header, who's contents are an arra
 
 Now to get the name of a section, we'll need to find the matching symbol entry, which will give us the offset of the associated string in the string table. With that we can now access mostly human-readable names for our kernel.
 
-Languages built around the C model will usually perform some kind of name mangling to enable features like function overloading, namespaces and so on. This is a whole topic on it's own. Name mangling can be through of as a translation that takes place, to allow things like function overloading and templates to work in the C naming model.
+Languages built around the C model will usually perform some kind of name mangling to enable features like function overloading, namespaces and so on. This is a whole topic on its own. Name mangling can be through of as a translation that takes place, to allow things like function overloading and templates to work in the C naming model.
 
 ### Locating The Symbol Table

@@ -263,7 +263,7 @@ const char* name = strtab_data[example_shdr->sh_name];
 
 Now all that's left is a function that parses the symbol table. It's important to note that some symbols only occupy a single address, like a label or a variable, while others will occupy a range of addresses. Fortunately symbols have a size field.
 
-An example function is included below, showing how a symbol can be looked up by it's address. The name of this symbol is then printed, using a fictional `print` function.
+An example function is included below, showing how a symbol can be looked up by its address. The name of this symbol is then printed, using a fictional `print` function.
 
 ```c
 Elf64_Shdr* sym_tab;
@@ -280,12 +280,12 @@ void print_symbol(uint64_t addr)
 
         if (addr < syms[i].st_value || addr > sym_top)
             continue;
-
+
         //addr is inside of syms[i], let's print the symbol name
         print(strtab_data[syms[i].st_name]);
         return;
     }
 }
 ```
 
-A quick note about getting the symbol table data address: On multiboot `sym_tab->sh_offset` will be the physical address of the data, while stivale2 will return the original value, which is an offset from the beginning of the file. This means for stivale 2 we would add `file_tag->kernel_base` to this address to get it's location in memory.
+A quick note about getting the symbol table data address: On multiboot `sym_tab->sh_offset` will be the physical address of the data, while stivale2 will return the original value, which is an offset from the beginning of the file. This means for stivale 2 we would add `file_tag->kernel_base` to this address to get its location in memory.
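
Review note on the `print_symbol` hunk above: if `strtab_data` is a `const char*`, then `strtab_data[syms[i].st_name]` yields a single character and not the name itself, so a lookup helper probably wants a pointer into the string table instead. A minimal sketch of that idea follows. It assumes a cut-down symbol struct (`demo_sym` is hypothetical and does not match the real `Elf64_Sym` layout), and `find_symbol_name` is an illustrative helper, not something from the book:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down stand-in for Elf64_Sym: only the fields the lookup needs. */
typedef struct {
    uint32_t st_name;   /* byte offset of the name inside the string table */
    uint64_t st_value;  /* start address of the symbol */
    uint64_t st_size;   /* size in bytes, 0 for labels */
} demo_sym;

/* Returns the name of the symbol containing addr, or NULL if no symbol does. */
const char* find_symbol_name(const demo_sym* syms, size_t count,
                             const char* strtab, uint64_t addr)
{
    for (size_t i = 0; i < count; i++)
    {
        uint64_t start = syms[i].st_value;
        /* Treat zero-sized symbols (labels) as occupying exactly one address. */
        uint64_t size = syms[i].st_size ? syms[i].st_size : 1;

        if (addr < start || addr - start >= size)
            continue;

        /* Pointer into the string table, not the character stored there. */
        return strtab + syms[i].st_name;
    }
    return NULL;
}
```

The caller can then hand the returned pointer to whatever print routine the kernel has, after checking it for `NULL`.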

01_Build_Process/02_Boot_Protocols.md

Lines changed: 15 additions & 15 deletions
@@ -20,7 +20,7 @@ Both protocols have their earlier versions (_multiboot 1 & stivale 1_), but thes
 
 It's a fair question. In the world of testing on qemu/bochs/vmware/vbox, its easy to write a bootloader directly against UEFI or BIOS. Things get more complicated on real hardware though.
 
-Unlike CPUs, where the manufacturers follow the spec exactly and everything works as described, manufacturers of PCs generally follow *most* of the specs, with every machine having its minor caveats. Some assumptions can't be assumed everywhere, and some machines sometimes outright break spec. This leads to a few edge cases on some machines, and more or less on some others. It's a big mess.
+Unlike CPUs, where the manufacturers follow the spec exactly and everything works as described, manufacturers of PCs generally follow *most* of the specs, with every machine having its minor caveats. Some assumptions can't be assumed everywhere, and some machines sometimes outright break spec. This leads to a few edge cases on some machines, and more or less on some others. It's a big mess.
 
 This is where a bootloader comes in: a layer of abstraction between the kernel and the mess of PC hardware. It provides a boot protocol (often many we can choose from), and then ensures that everything in the hardware world is setup to allow that protocol to function. This is until the kernel has enough drivers set up to take full control of the hardware itself.

@@ -40,7 +40,7 @@ One of the major differences between the two protocols is how info is passed bet
 - _Multiboot 2_ uses a fixed sized header that includes a `size` field, which contains the _number of bytes of the header + all of the following requests_. Each request contains an `identifier` field and then some request specific fields. This has slightly more overhead, but is more flexible. The requests are terminated with a special `null request` (see the specs on this).
 
 - _Multiboot 1_ returns info to the kernel via a single large structure, with a bitmap indicating which sections of the structure are considered valid.
-- _Multiboot 2_ returns a pointer to a series of tags. Each tag has an `identifier` field, used to determine it's contents, and a size field that can be used to calculate the address of the next tag. This list is also terminated with a special `null` tag.
+- _Multiboot 2_ returns a pointer to a series of tags. Each tag has an `identifier` field, used to determine its contents, and a size field that can be used to calculate the address of the next tag. This list is also terminated with a special `null` tag.
 
 One important note about multiboot 2: the memory map is essentially the map given by the bios/uefi. The areas used by bootloader memory (like the current gdt/idt), kernel and info structure given to the kernel are all allocated in *free* regions of memory. The specification does not say that these regions must then be marked as *used* before giving the memory map to the kernel. This is actually how grub handles this, so should definitely do a sanity check on the memory map.
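
Review note on the memory map paragraph above: the suggested sanity check could start from a simple overlap test between each region reported as free and the ranges the kernel already knows are in use (its own image, the boot info structure). This is only a sketch under assumptions: `demo_region` is a hypothetical simplified entry and not the multiboot 2 memory map layout, and ranges are assumed not to wrap around the end of the address space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified memory range: covers [base, base + length). */
typedef struct {
    uint64_t base;
    uint64_t length;
} demo_region;

/* True if the two half-open ranges share at least one byte. */
bool regions_overlap(demo_region a, demo_region b)
{
    return a.base < b.base + b.length && b.base < a.base + a.length;
}
```

A physical memory allocator could run this against every free entry before accepting it, and split or shrink any entry that overlaps the kernel image or the info structure.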

@@ -62,10 +62,10 @@ Since we have enabled paging, we'll also need to populate `cr3` with a valid pag
 
 We will be operating in compatibility mode, a subset of long mode that pretends to be a protected mode cpu. This is to allow legacy programs to run in long mode. However we can enter full 64-bit long mode by reloading the CS register with a far jump or far return. See the [GDT notes](../GDT.md) for details on doing that.
 
-It's worth noting that this boot shim will need it's own linker sections for code and data, since until we have entered long mode the higher half sections used by the rest of the kernel won't be available, as we have no memory at those addresses yet.
+It's worth noting that this boot shim will need its own linker sections for code and data, since until we have entered long mode the higher half sections used by the rest of the kernel won't be available, as we have no memory at those addresses yet.
 
 ### Creating a Multiboot 2 Header
-Multiboot 2 has a header available at the bottom of it's specification that we're going to use here.
+Multiboot 2 has a header available at the bottom of its specification that we're going to use here.
 
 We'll need to modify our linker script a little since we boot up in protected mode, with no virtual memory:

@@ -77,7 +77,7 @@ SECTIONS
     KERNEL_START = .;
     KERNEL_VIRT_BASE = 0xFFFFFFFF8000000;
 
-    .mb2_hdr :
+    .mb2_hdr :
     {
         /* Be sure that the multiboot2 header is at the beginning */
         KEEP(*(.mb2_hdr))
@@ -111,7 +111,7 @@ SECTIONS
 }
 ```
 
-This is very similar to a default linker script, but we make use of the `AT()` directive to set the LMA (load memory address) of each section. What this does is allow us to have the kernel loaded at a lower memory address so we can boot (in this case we set `. = 1M`, so 1MiB), but still have most of our kernel linked as higher half. The higher half kernel will just be loaded at a physical memory address that is `0xFFFF'FFFF'8000'0000` lower than it's virtual address.
+This is very similar to a default linker script, but we make use of the `AT()` directive to set the LMA (load memory address) of each section. What this does is allow us to have the kernel loaded at a lower memory address so we can boot (in this case we set `. = 1M`, so 1MiB), but still have most of our kernel linked as higher half. The higher half kernel will just be loaded at a physical memory address that is `0xFFFF'FFFF'8000'0000` lower than its virtual address.
 
 However the first two sections are both loaded and linked at lower memory addresses. The first is our multiboot header, this is just static data, it doesn't really matter where it's loaded, as long as it's in the final file somewhere. The second section contains our protected mode boot shim: a small bit of code that sets up paging, and boots into long mode.
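
Review note on the `AT()` paragraph above: since the load address is the link address minus a fixed offset, converting between the two is a single subtraction or addition. The helpers below are a sketch with hypothetical names. They use the 16-digit value from the prose (`0xFFFF'FFFF'8000'0000`); note that the linker script hunk further up writes `0xFFFFFFFF8000000`, which has only 15 hex digits, so one of the two is likely a typo.

```c
#include <stdint.h>

/* Offset between link (virtual) and load (physical) addresses, taken from the prose. */
#define KERNEL_VIRT_BASE 0xFFFFFFFF80000000ULL

/* Where a higher half kernel address actually sits in physical memory. */
static inline uint64_t kernel_virt_to_phys(uint64_t virt)
{
    return virt - KERNEL_VIRT_BASE;
}

/* The higher half address a physical kernel address is linked at. */
static inline uint64_t kernel_phys_to_virt(uint64_t phys)
{
    return phys + KERNEL_VIRT_BASE;
}
```

These helpers only make sense for addresses inside the kernel image, since that is the only range the linker script places at this offset.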

@@ -145,7 +145,7 @@ mb2_framebuffer_end:
 mb2_hdr_end:
 ```
 
-A full boot shim is left as an exercise to the reader, we may want to do extra things before moving into long mode. Or may not, but a skeleton of what's required is provided below.
+A full boot shim is left as an exercise to the reader, we may want to do extra things before moving into long mode. Or may not, but a skeleton of what's required is provided below.
 
 ```x86asm
 .section .data
@@ -155,7 +155,7 @@ boot_stack_base:
 # backup the address of mb2 info struct, since ebx may be clobbered
 .section .mb_text
     mov %ebx, %edi
-
+
     # setup a stack, and reset flags
     mov $(boot_stack_base + 0x1000), %esp
     pushl $0x2
@@ -197,7 +197,7 @@ After performing the long-return (`lret`) we'll be running `target_function` in
 
 Some of the things were glossed there, like paging and setting up a gdt, are explained in their own chapters.
 
-We'll also want to pass the multiboot info structure to the kernel's main function.
+We'll also want to pass the multiboot info structure to the kernel's main function.
 
 The interface between a higher level language like C and assembly (or another high level language) is called the ABI (application binary interface). This is discussed more in the chapter about C, but for now to pass a single `uint64_t` (or a pointer of any kind, which the info structure is) simply move it to `rdi`, and it'll be available as the first argument in C.

@@ -240,7 +240,7 @@ Stivale 2 also provides some more advanced features:
 The limine bootloader not only supports x86, but tentatively supports aarch64 as well (uefi is required). There is also a stivale2-compatible bootloader called Sabaton, providing broader support for ARM platforms.
 
 ### Creating a Stivale2 Header
-The limine bootloader provides a `stivale2.h` file which contains a number of nice definitions for us, otherwise everything else here can be placed inside of a c/c++ file.
+The limine bootloader provides a `stivale2.h` file which contains a number of nice definitions for us, otherwise everything else here can be placed inside of a c/c++ file.
 
 *Authors Note: I like to place my limine header tags in a separate file, for organisation purposes, but as long as they appear in the final binary, they can be anywhere. You can also implement this in assembly if you really want.*

@@ -264,7 +264,7 @@ Next we'll need to create space for our stack (stivale2 requires us to provide o
 static uint8_t init_stack[0x2000];
 
 __attribute__((section(".stivale2hdr)))
-static stivale2_header stivale2_hdr =
+static stivale2_header stivale2_hdr =
 {
     .entry_point = 0,
     .stack = (uintptr_t)init_stack + 0x2000,
@@ -285,9 +285,9 @@ Next we set some fields in the stivale2 header:
 In the example above we actually set the first tag to a framebuffer request, so lets see what that would look like:
 
 ```c
-static stivale2_header_tag_framebuffer framebuffer_tag =
+static stivale2_header_tag_framebuffer framebuffer_tag =
 {
-    .tag =
+    .tag =
     {
         .identifier = STIVALE2_HEADER_TAG_FRAMEBUFFER,
         .next = 0,
@@ -325,10 +325,10 @@ void* multiboot2_find_tag(uint32_t type)
     {
         if (tag->type == 0 && size == 8)
             return NULL; //we've reached the terminating tag
-
+
         if (tag->type == type)
             return tag;
-
+
         uintptr_t next_addr = (uintptr_t)tag + tag->size;
         next_addr = (next_addr / 8 + 1) * 8;
         tag = (multiboot_tag*)next_addr;
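
Review note on the `multiboot2_find_tag` hunk above: `(next_addr / 8 + 1) * 8` always moves forward, so when `next_addr` is already a multiple of 8 it skips 8 bytes too many and lands in the middle of the next tag. Rounding up with `(addr + 7) & ~7` leaves aligned addresses unchanged. The sketch below shows the walk with that rounding. `demo_tag`, `align_up_8` and `find_tag` are hypothetical names, the struct only models the common `type`/`size` header, and the size guard is an extra assumption to avoid looping on malformed data:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the common tag header: type, then size in bytes. */
typedef struct {
    uint32_t type;
    uint32_t size;
} demo_tag;

/* Round addr up to the next multiple of 8; aligned values are returned unchanged. */
static uintptr_t align_up_8(uintptr_t addr)
{
    return (addr + 7) & ~(uintptr_t)7;
}

/* Walks the tag list from first; returns the first tag of the given type, or NULL. */
const demo_tag* find_tag(const demo_tag* first, uint32_t type)
{
    const demo_tag* tag = first;
    for (;;)
    {
        if (tag->type == 0 && tag->size == 8)
            return NULL; /* reached the terminating tag */

        if (tag->type == type)
            return tag;

        if (tag->size < sizeof(demo_tag))
            return NULL; /* malformed list: stop instead of looping forever */

        tag = (const demo_tag*)align_up_8((uintptr_t)tag + tag->size);
    }
}
```

The terminator check here reads `tag->size`; the hunk above compares a bare `size`, which may or may not refer to the same field depending on code outside the diff context.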

01_Build_Process/03_Gnu_Makefiles.md

Lines changed: 1 addition & 1 deletion
@@ -181,7 +181,7 @@ Whew, there's a lot going on there! Let's look at why the various parts exist:
 - When the user runs `make` in the shell, the root makefile is run. This file is mostly configuration, specifying the toolchain and the options it'll use.
 
 - This makefile then recursively calls make on each of the sub-projects.
-- For example, the kernel makefile will be run, and it will have all of the make variables specified in the root makefile in it's environment.
+- For example, the kernel makefile will be run, and it will have all of the make variables specified in the root makefile in its environment.
 - This means if we decide to change the toolchain, or want to add debug symbols to *all* projects, we can do it in a single change.
 - Libraries and userland apps work in a similar way, but there is an extra layer. What I've called the glue makefile. It's very simple, it just passes through the make commands from above to each sub project.
 - This means we don't need to update the root makefile every time a new userland app is updated, or a new library.
