* Update 01_Overview.md
* Update MM 01_Overview.md
typo fix
* Expoand direct map and recursion in paging chapter
* Typo fixes (feeling like the Captain Julio Sham)
* Changes requested in review
* changes requested in review
01_Build_Process/01_Overview.md (12 additions & 12 deletions)
@@ -34,7 +34,7 @@ Both C and C++ have several freestanding headers. The common ones are `stdint.h`

 ## Cross Compilation

-Often this is not necessary for hobby os projects, as we are running our code on the same cpu architecture that we're compiling on. However it's still recommended to use one as we can configure the cross compiler to the specs we want, rather than relying on the one provided by the host os.
+Often this is not necessary for hobby os projects, as we are running our code on the same cpu architecture that we're compiling on. However it's still recommended to use one as we can configure the cross compiler to the specs we want, rather than relying on the one provided by the host os.

 A cross compiler is always required when building the os for a different cpu architecture. Building code for a `risc-v` cpu, while running on an `x86` cpu would require a cross compiler for example.

@@ -81,7 +81,7 @@ If using clang be sure to remember to pass `--target=xyz` with each command. Thi

 ### Building C Source Files
 Now that we have a toolchain setup we can test it all works by compiling a C file.
-Create a C source file, it's contents don't matter here as we wont be running it, just telling it compiles.
+Create a C source file, its contents don't matter here as we wont be running it, just telling it compiles.

 Run the following to compile the file into an object file, and then to link that into the final executable.

@@ -131,21 +131,21 @@ The GCC Linker (`ld`) and the compatible clang linker (`lld.ld`) can accept link
 These describe the layout of the final executable to the linker: what things go where, with what alignment and permissions.
 This is incredibly important for a kernel, as it's the file that will be loaded by the bootloader, which may impose certain restrictions or provide certain features.

-These are their own topic, and have a full chapter dedicated to them later in this chapter. We likely haven't used these when building userspace programs, as our compiler/os installation provides a default one. However since we're building a freestanding program (the kernel) now we need to be explicit about these things.
+These are their own topic, and have a full chapter dedicated to them later in this chapter. We likely haven't used these when building userspace programs, as our compiler/os installation provides a default one. However since we're building a freestanding program (the kernel) now we need to be explicit about these things.

 A linker script can be simply added appending the `-T script_name_here.ld` to the linker command.

 Outside of linker scripts, the linking process goes as following:

 ```sh
-$(LD) $(OBJS) -o output_filename_here.elf
+$(LD) $(OBJS) -o output_filename_here.elf
 -nostdlib -static -pie --no-dynamic-linker
 ```

 For an explanation of the above linker flags used:

 - `-nostdlib`: this is crucial for building a freestanding program, as it stops the linker automatically including the default libraries for the host platform. Otherwise the program will contain a bunch of code that wants to make syscalls to the host OS.
-- `-static`: A safeguard for linking against other libraries. The linker will error if we try to dynamically link with anything (i.e static linking only). Because again there is no runtime, there is no dynamic linker.
+- `-static`: A safeguard for linking against other libraries. The linker will error if we try to dynamically link with anything (i.e static linking only). Because again there is no runtime, there is no dynamic linker.
 - `-pie` and `--no-dynamic-linker`: Not strictly necessary, but forces the linker to output a relocatable program with a very narrow set of relocations. This is useful as it allows some bootloaders to perform relocations on the kernel.

 One other linker option to keep in mind is `-M`, which displays the link map that was generated. This is a description of how and where the linker allocated everything in the final file. It can be seen as a manual symbol table.
@@ -156,13 +156,13 @@ Now compiling and building one file isn't so bad, but the same process for multi

 _Make_ is a common tool used for building many pieces of software due to how easy and common `make` is. Specifically GNU make. GNU make is also chosen as it comes installed by default in many linux distros, and is almost always available if it's not already installed.

-There are other make-like tools out there (xmake, nmake) but these are less popular, and therefore less standardized. For the lowest common denominator we'll stick with the original GNU make, which is discussed later on in it's chapter.
+There are other make-like tools out there (xmake, nmake) but these are less popular, and therefore less standardized. For the lowest common denominator we'll stick with the original GNU make, which is discussed later on in its chapter.

 ## Quick Addendum: Easily Generating a Bootable Iso

 There are more details to this, however most bootloaders will provide a tool that lets us create a bootable iso, with the kernel, the bootloader itself and any other files we might want. For grub this is `grub-mkrescue` and limine provides `limine-install` for version 2.x or `limine-deploy` for version 3.x.

-While the process of generating an iso is straightforward enough when using something like xorriso, the process of installing a bootloader into that iso is usually bootloader dependent. This is covered more in detail in it's own chapter.
+While the process of generating an iso is straightforward enough when using something like xorriso, the process of installing a bootloader into that iso is usually bootloader dependent. This is covered more in detail in its own chapter.

 If just here for a quick reference, grub uses `grub-mkrescue` and a `grub.cfg` file, limine requires us to build the iso ourselves with a `limine.cfg` on it, and then run `limine-deploy`.

@@ -202,7 +202,7 @@ There are a few other qemu flags we might want to be aware of:

 We'll never know when we need to debug your kernel, especially when running in a virtualized environment. Having debug symbols included in the kernel will increase the file size, but can be useful. If we want to remove them from an already compiled kernel the `strip` program can be used to strip excess info from a file.

-Including debug info in the kernel is the same as any other program, simply compile with the `-g` flag.
+Including debug info in the kernel is the same as any other program, simply compile with the `-g` flag.

 There are different versions of DWARF (the debugging format used by elf files), and by default the compiler will use the most recent one for our target platform. However this can be overridden and the compiler can be forced to use a different debug format (if needed). Sometimes there can be issues if the debugger is from a different vendor to our compiler, or is much older.

@@ -235,7 +235,7 @@ Next we'll want to find the `.symtab` section header, who's contents are an arra

 Now to get the name of a section, we'll need to find the matching symbol entry, which will give us the offset of the associated string in the string table. With that we can now access mostly human-readable names for our kernel.

-Languages built around the C model will usually perform some kind of name mangling to enable features like function overloading, namespaces and so on. This is a whole topic on it's own. Name mangling can be thought of as a translation that takes place, to allow things like function overloading and templates to work in the C naming model.
+Languages built around the C model will usually perform some kind of name mangling to enable features like function overloading, namespaces and so on. This is a whole topic on its own. Name mangling can be thought of as a translation that takes place, to allow things like function overloading and templates to work in the C naming model.

 ### Locating The Symbol Table

@@ -263,7 +263,7 @@ const char* name = strtab_data[example_shdr->sh_name];

 Now all that's left is a function that parses the symbol table. It's important to note that some symbols only occupy a single address, like a label or a variable, while others will occupy a range of addresses. Fortunately symbols have a size field.

-An example function is included below, showing how a symbol can be looked up by it's address. The name of this symbol is then printed, using a fictional `print` function.
+An example function is included below, showing how a symbol can be looked up by its address. The name of this symbol is then printed, using a fictional `print` function.
         //addr is inside of syms[i], let's print the symbol name
         print(strtab_data[syms[i].st_name]);
         return;
     }
 }
 ```

-A quick note about getting the symbol table data address: On multiboot `sym_tab->sh_offset` will be the physical address of the data, while stivale2 will return the original value, which is an offset from the beginning of the file. This means for stivale 2 we would add `file_tag->kernel_base` to this address to get it's location in memory.
+A quick note about getting the symbol table data address: On multiboot `sym_tab->sh_offset` will be the physical address of the data, while stivale2 will return the original value, which is an offset from the beginning of the file. This means for stivale 2 we would add `file_tag->kernel_base` to this address to get its location in memory.
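
Note on the symbol lookup touched by this hunk: only the tail of the example function is visible in the diff, so here is a minimal, self-contained sketch of the same idea. The struct mirrors the 64-bit ELF symbol entry layout. The names `elf64_sym` and `symbol_name_for` are hypothetical and not taken from the chapter, and the sketch assumes `strtab_data` is a `const char*` pointing at the start of the string table. It returns the name rather than printing it.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of a 64-bit ELF symbol table entry (Elf64_Sym in the ELF spec). */
typedef struct {
    uint32_t st_name;   /* offset of the symbol's name in the string table */
    uint8_t  st_info;
    uint8_t  st_other;
    uint16_t st_shndx;
    uint64_t st_value;  /* address of the symbol */
    uint64_t st_size;   /* size in bytes, 0 for things like labels */
} elf64_sym;

/* Hypothetical helper: returns the name of the symbol that contains addr,
 * or NULL if no symbol covers it. */
const char* symbol_name_for(const elf64_sym* syms, size_t count,
                            const char* strtab_data, uint64_t addr)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t start = syms[i].st_value;
        /* treat zero-sized symbols as occupying a single address */
        uint64_t size = syms[i].st_size ? syms[i].st_size : 1;
        if (addr >= start && addr - start < size)
            return &strtab_data[syms[i].st_name];
    }
    return NULL;
}
```

Returning a pointer into the string table (note the `&`) keeps the helper independent of whatever `print` function the kernel ends up with.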
01_Build_Process/02_Boot_Protocols.md (15 additions & 15 deletions)
@@ -20,7 +20,7 @@ Both protocols have their earlier versions (_multiboot 1 & stivale 1_), but thes

 It's a fair question. In the world of testing on qemu/bochs/vmware/vbox, it's easy to write a bootloader directly against UEFI or BIOS. Things get more complicated on real hardware though.

-Unlike CPUs, where the manufacturers follow the spec exactly and everything works as described, manufacturers of PCs generally follow *most* of the specs, with every machine having its minor caveats. Some assumptions can't be assumed everywhere, and some machines sometimes outright break spec. This leads to a few edge cases on some machines, and more or less on some others. It's a big mess.
+Unlike CPUs, where the manufacturers follow the spec exactly and everything works as described, manufacturers of PCs generally follow *most* of the specs, with every machine having its minor caveats. Some assumptions can't be assumed everywhere, and some machines sometimes outright break spec. This leads to a few edge cases on some machines, and more or less on some others. It's a big mess.

 This is where a bootloader comes in: a layer of abstraction between the kernel and the mess of PC hardware. It provides a boot protocol (often many we can choose from), and then ensures that everything in the hardware world is setup to allow that protocol to function. This is until the kernel has enough drivers set up to take full control of the hardware itself.

@@ -40,7 +40,7 @@ One of the major differences between the two protocols is how info is passed bet
 - _Multiboot 2_ uses a fixed sized header that includes a `size` field, which contains the _number of bytes of the header + all of the following requests_. Each request contains an `identifier` field and then some request specific fields. This has slightly more overhead, but is more flexible. The requests are terminated with a special `null request` (see the specs on this).

 - _Multiboot 1_ returns info to the kernel via a single large structure, with a bitmap indicating which sections of the structure are considered valid.
-- _Multiboot 2_ returns a pointer to a series of tags. Each tag has an `identifier` field, used to determine it's contents, and a size field that can be used to calculate the address of the next tag. This list is also terminated with a special `null` tag.
+- _Multiboot 2_ returns a pointer to a series of tags. Each tag has an `identifier` field, used to determine its contents, and a size field that can be used to calculate the address of the next tag. This list is also terminated with a special `null` tag.

 One important note about multiboot 2: the memory map is essentially the map given by the bios/uefi. The areas used by bootloader memory (like the current gdt/idt), kernel and info structure given to the kernel are all allocated in *free* regions of memory. The specification does not say that these regions must then be marked as *used* before giving the memory map to the kernel. This is actually how grub handles this, so should definitely do a sanity check on the memory map.

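
Note on the multiboot 2 tag list described in this hunk: walking the tags could look roughly like the sketch below. It relies on the layout from the multiboot 2 specification, where the info structure begins with an 8-byte prefix (`total_size`, `reserved`), each tag starts with a 32-bit `type` (what the text calls the identifier) and a 32-bit `size`, and tags are padded so the next one begins on an 8-byte boundary. The names `mb2_tag` and `mb2_find_tag` are hypothetical, chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Every multiboot 2 info tag starts with these two fields. */
struct mb2_tag {
    uint32_t type;   /* identifies the contents, 0 marks the end of the list */
    uint32_t size;   /* size of this tag in bytes, excluding padding */
};

/* Hypothetical helper: find the first tag of the wanted type, or NULL.
 * info is the pointer handed over by the bootloader. */
const struct mb2_tag* mb2_find_tag(const void* info, uint32_t wanted)
{
    /* skip the fixed 8-byte prefix (total_size, reserved) */
    const uint8_t* cursor = (const uint8_t*)info + 8;

    for (;;) {
        const struct mb2_tag* tag = (const struct mb2_tag*)cursor;
        if (tag->type == wanted)
            return tag;
        if (tag->type == 0)      /* reached the terminating tag */
            return NULL;
        /* round the size up so the next tag starts 8-byte aligned */
        cursor += (tag->size + 7u) & ~7u;
    }
}
```

A kernel would typically cast the returned pointer to the tag-specific struct (memory map, framebuffer and so on) once the type matches.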
@@ -62,10 +62,10 @@ Since we have enabled paging, we'll also need to populate `cr3` with a valid pag

 We will be operating in compatibility mode, a subset of long mode that pretends to be a protected mode cpu. This is to allow legacy programs to run in long mode. However we can enter full 64-bit long mode by reloading the CS register with a far jump or far return. See the [GDT notes](../GDT.md) for details on doing that.

-It's worth noting that this boot shim will need it's own linker sections for code and data, since until we have entered long mode the higher half sections used by the rest of the kernel won't be available, as we have no memory at those addresses yet.
+It's worth noting that this boot shim will need its own linker sections for code and data, since until we have entered long mode the higher half sections used by the rest of the kernel won't be available, as we have no memory at those addresses yet.

 ### Creating a Multiboot 2 Header
-Multiboot 2 has a header available at the bottom of it's specification that we're going to use here.
+Multiboot 2 has a header available at the bottom of its specification that we're going to use here.

 We'll need to modify our linker script a little since we boot up in protected mode, with no virtual memory:

@@ -77,7 +77,7 @@ SECTIONS
     KERNEL_START = .;
     KERNEL_VIRT_BASE = 0xFFFFFFFF80000000;

-    .mb2_hdr :
+    .mb2_hdr :
     {
         /* Be sure that the multiboot2 header is at the beginning */
         KEEP(*(.mb2_hdr))
@@ -111,7 +111,7 @@ SECTIONS
 }
 ```

-This is very similar to a default linker script, but we make use of the `AT()` directive to set the LMA (load memory address) of each section. What this does is allow us to have the kernel loaded at a lower memory address so we can boot (in this case we set `. = 1M`, so 1MiB), but still have most of our kernel linked as higher half. The higher half kernel will just be loaded at a physical memory address that is `0xFFFF'FFFF'8000'0000` lower than it's virtual address.
+This is very similar to a default linker script, but we make use of the `AT()` directive to set the LMA (load memory address) of each section. What this does is allow us to have the kernel loaded at a lower memory address so we can boot (in this case we set `. = 1M`, so 1MiB), but still have most of our kernel linked as higher half. The higher half kernel will just be loaded at a physical memory address that is `0xFFFF'FFFF'8000'0000` lower than its virtual address.

 However the first two sections are both loaded and linked at lower memory addresses. The first is our multiboot header, this is just static data, it doesn't really matter where it's loaded, as long as it's in the final file somewhere. The second section contains our protected mode boot shim: a small bit of code that sets up paging, and boots into long mode.

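
Note on the `AT()` paragraph in this hunk: since the higher half sections are loaded at a physical address exactly `0xFFFF'FFFF'8000'0000` below their virtual address, converting between the two is a fixed offset. A minimal sketch, assuming the constant matches `KERNEL_VIRT_BASE` in the linker script (the helper names are hypothetical):

```c
#include <stdint.h>

/* Assumed to match KERNEL_VIRT_BASE in the linker script. */
#define KERNEL_VIRT_BASE 0xFFFFFFFF80000000ull

/* Where a higher half kernel address actually lives in physical memory. */
static inline uint64_t kernel_virt_to_phys(uint64_t virt)
{
    return virt - KERNEL_VIRT_BASE;
}

/* The higher half address a loaded kernel byte is linked at. */
static inline uint64_t kernel_phys_to_virt(uint64_t phys)
{
    return phys + KERNEL_VIRT_BASE;
}
```

With the kernel loaded at 1MiB, `kernel_phys_to_virt(0x100000)` gives `0xFFFFFFFF80100000`. These helpers only apply to addresses inside the kernel image, not to arbitrary physical memory.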
@@ -145,7 +145,7 @@ mb2_framebuffer_end:
 mb2_hdr_end:
 ```

-A full boot shim is left as an exercise to the reader, we may want to do extra things before moving into long mode. Or may not, but a skeleton of what's required is provided below.
+A full boot shim is left as an exercise to the reader, we may want to do extra things before moving into long mode. Or may not, but a skeleton of what's required is provided below.

 ```x86asm
 .section .data
@@ -155,7 +155,7 @@ boot_stack_base:
 # backup the address of mb2 info struct, since ebx may be clobbered
 .section .mb_text
 mov %ebx, %edi
-
+
 # setup a stack, and reset flags
 mov $(boot_stack_base + 0x1000), %esp
 pushl $0x2
@@ -197,7 +197,7 @@ After performing the long-return (`lret`) we'll be running `target_function` in

 Some of the things were glossed there, like paging and setting up a gdt, are explained in their own chapters.

-We'll also want to pass the multiboot info structure to the kernel's main function.
+We'll also want to pass the multiboot info structure to the kernel's main function.

 The interface between a higher level language like C and assembly (or another high level language) is called the ABI (application binary interface). This is discussed more in the chapter about C, but for now to pass a single `uint64_t` (or a pointer of any kind, which the info structure is) simply move it to `rdi`, and it'll be available as the first argument in C.

@@ -240,7 +240,7 @@ Stivale 2 also provides some more advanced features:
 The limine bootloader not only supports x86, but tentatively supports aarch64 as well (uefi is required). There is also a stivale2-compatible bootloader called Sabaton, providing broader support for ARM platforms.

 ### Creating a Stivale2 Header
-The limine bootloader provides a `stivale2.h` file which contains a number of nice definitions for us, otherwise everything else here can be placed inside of a c/c++ file.
+The limine bootloader provides a `stivale2.h` file which contains a number of nice definitions for us, otherwise everything else here can be placed inside of a c/c++ file.

 *Authors Note: I like to place my limine header tags in a separate file, for organisation purposes, but as long as they appear in the final binary, they can be anywhere. You can also implement this in assembly if you really want.*

@@ -264,7 +264,7 @@ Next we'll need to create space for our stack (stivale2 requires us to provide o
 static uint8_t init_stack[0x2000];

 __attribute__((section(".stivale2hdr")))
-static stivale2_header stivale2_hdr =
+static stivale2_header stivale2_hdr =
 {
     .entry_point = 0,
     .stack = (uintptr_t)init_stack + 0x2000,
@@ -285,9 +285,9 @@ Next we set some fields in the stivale2 header:
 In the example above we actually set the first tag to a framebuffer request, so lets see what that would look like:
01_Build_Process/03_Gnu_Makefiles.md (1 addition & 1 deletion)
@@ -181,7 +181,7 @@ Whew, there's a lot going on there! Let's look at why the various parts exist:
 - When the user runs `make` in the shell, the root makefile is run. This file is mostly configuration, specifying the toolchain and the options it'll use.

 - This makefile then recursively calls make on each of the sub-projects.
-    - For example, the kernel makefile will be run, and it will have all of the make variables specified in the root makefile in it's environment.
+    - For example, the kernel makefile will be run, and it will have all of the make variables specified in the root makefile in its environment.
     - This means if we decide to change the toolchain, or want to add debug symbols to *all* projects, we can do it in a single change.
 - Libraries and userland apps work in a similar way, but there is an extra layer. What I've called the glue makefile. It's very simple, it just passes through the make commands from above to each sub project.
     - This means we don't need to update the root makefile every time a new userland app is updated, or a new library.