
Commit 1dd6105

Merged refactor branch: big changes
* Tidying, moved minichapters into folders
* Added chapter 0, license placeholder file, file renaming.
* renamed license file, added contents
* added assumed knowledge, started about the authors section
* added architecture intro, update higher half part
* added info about structure of book
* Fix introduction chapter
* Add about me...
* Start welcome section
* Minor fix and cleanup
* Add plotting pixel information
* Add draw image
* update framebuffer section
* Fixing typos and minor updates
* Expand timers section
* Minor changes on ACPITables and introduction
* Add PIC disable section in apic and Fix syntax on timer section
* Expand APIC
* expand lapic
* Expand ioapic
* Expand ACPI section
* Minor changes
* Update Introduction, and minor fixes
* sentence flow, added x2apic and ipi notes
* Initial work on timers
* Minor changes to introduction
* Expanded PIT section, began work on HPET
* Add pandoc specific stuff
* Fix typo
* Minor updates in debugging section
* Update 04_Virtual_Memory_Manager.md
* Update 03_Driver_Implementation.md fix syntax
* Fix Syntax
* Use math rendering for formulas
* forgot one formula
* Fix rendering problems of some of the new formulas
* timers: expanded section on hpet
* more work on timers
* finished timer section
* sentence flow and grammar
* Fixing some typos and adding some formulas
* Update 02_Physical_Memory.md
* Update 02_Physical_Memory.md
* Update Framebuffer.md
* Fix some formulas
* Fix few formulas
* bullet point list fixes
* Replace remote stored image files with in-project files
* requested changed, framebuffer updates, inline constants are now inside code blocks.
* 03_Memory_Management/03_Paging.md
* spelling: arent/dont

---------

Co-authored-by: Dean <dean243@hotmail.com>
1 parent 1dc55bd commit 1dd6105

49 files changed

Lines changed: 1235 additions & 726 deletions


.pandoc/makerelative.lua

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
-- Turn absolute image paths such as '/images/x.png' into paths relative to '/'
function Image (img)
  img.src = pandoc.path.make_relative(img.src, '/')
  return img
end

.pandoc/pandoc.yaml

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
title: 'Osdev Notes'
author:
- Ivan G.
- Dean T.
header-includes:
- \usepackage{fvextra}
- \usepackage{appendix}
- \DefineVerbatimEnvironment{Highlighting}{Verbatim}{breaklines,commandchars=\\\{\}}
book: true
---

00_Introduction/01_README.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# Welcome

Whether you're reading this online, or in a book, welcome to our collection of notes about operating systems development! We've written these while writing (and re-writing) our own operating systems, with the intent of guiding a reader through the various stages of building an operating system from scratch. We've tried to focus more on the concepts and theory behind the various components, with code provided only to help solidify some concepts.

We hope you enjoy it, and find something interesting here!

## Structure Of The Book

Each numbered chapter adds a new layer to the kernel, expanding its capabilities. While it's not strictly necessary to read them in order, it is encouraged, as some later chapters may reference earlier ones.

There are also two special chapters at the end of the book: one containing a series of unrelated but useful topics (appendices), and one containing descriptions of various hardware devices you might want to support.

The appendices chapter is intended to be used as a reference, and can be read at any time. The drivers chapter can also be read at any time, but implementing support for these devices should come after the memory management chapter (once a VMM has been implemented).

### Topics covered

As we've already mentioned, our main purpose here is to guide the reader through the general process of building a kernel (and surrounding operating system). We're using `x86_64` as our reference architecture, but most of the concepts should transfer to other architectures, with the exception of the very early stages of booting.

Below is a short list of the topics covered so far:

* *Build Process* - The first part is all about getting an osdev environment up and running, explaining what tools are needed, and the steps to build and run a kernel.
* *Architecture/Drivers* - This part contains most of the architecture-specific parts, as well as most of the data structures and underlying mechanisms of the hardware we'll need. It also includes some early drivers that are very useful during further development (like the keyboard and timer).
* *Memory Management* - This chapter offers an overview of the memory management layers of a kernel. We cover all the layers: the physical memory manager, the virtual memory manager and the heap. We'll look at how these fit into the memory management stack, and how they work together.
* *Scheduling* - A modern operating system should support running multiple programs at once. In this part we're going to look at how processes and threads are implemented, implement a simple scheduler and have a look at some of the typical concurrency issues that arise.
* *Userspace* - Many modern architectures support different levels of privilege, meaning that programs running at lower levels can't access resources/data reserved for higher levels.
* *IPC* - Short for inter-process communication, IPC is a mechanism for programs to communicate with each other in a safe and controlled way. We're going to take a look at a few ways to implement this.
* *Virtual File System* - This will cover how a kernel presents different file systems to the rest of the OS. We'll also take a look at implementing a 'tempfs' that is loaded from a tape archive (tar), similar to initrd.
* *The ELF format* - Once we have a file system we can load files from it, so why not a program? This chapter looks at writing a simple program loader for ELF64 binaries, and why you would want to use this format.
* *Going beyond* - The final part (for now): we have implemented all the core components of a kernel, and we are free to go from here. This final chapter contains some ideas for new components that you might want to add, or at least begin thinking about.
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
## Assumed Knowledge

This book is written for beginners in operating system development, but some prior programming experience is recommended. It is not intended to teach you C, or how to use a compiler or linker.

Code can have bugs, and freestanding code can be hard (or impossible) to debug in some cases. Some hardware does not include serial ports, and real CPUs can have hardware bugs or architectural quirks you're unaware of that interfere with developing for them.

As such, below is a list of the recommended prior experience before continuing with this book:

- Intermediate understanding of the C programming language. Mastery is not required, but you should be very familiar with the ins and outs of the language, especially pointers and pointer arithmetic.
- You should be comfortable compiling and debugging code in userspace. GDB is recommended, as several emulators provide a GDB server you can use to step through your kernel.
- Knowledge of, and experience using, common data structures like intrusive linked lists.
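
If intrusive linked lists are unfamiliar: instead of the list allocating nodes that point at your objects, the link is embedded inside the object itself, and a `container_of`-style macro recovers the object from a pointer to its link. Adding an object to a list then needs no allocation at all, which is handy in a kernel before a heap exists. A minimal sketch; the `task` type and the helper names are hypothetical, chosen only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* The link that gets embedded inside each object placed on a list. */
struct list_node {
    struct list_node *next;
};

/* Recover a pointer to the containing object from a pointer to its embedded link. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical object type: the list link lives inside the task itself. */
struct task {
    int id;
    struct list_node node;
};

/* Push a link onto the front of a singly linked list. No memory is allocated. */
static void list_push(struct list_node **head, struct list_node *n) {
    assert(head != NULL && n != NULL);
    n->next = *head;
    *head = n;
}

/* Walk the links and use container_of to reach each task's fields. */
static int sum_task_ids(struct list_node *head) {
    int sum = 0;
    for (struct list_node *it = head; it != NULL; it = it->next) {
        struct task *t = container_of(it, struct task, node);
        sum += t->id;
    }
    return sum;
}
```

If this pattern (and the pointer arithmetic inside `container_of`) reads comfortably, you're in good shape for the later chapters.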

If you feel confident in your knowledge of the above, please read on! If not, don't be discouraged. There are plenty of resources available for learning, and you can always come back later.
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
## About The Authors

*Nerd and free-time programmer!! I just love software development and everything about computers, the lower level the better. I work as a software developer, and it's also my main hobby (not the only one). I've wanted to write my own kernel since I discovered programming, even though it took years to understand how to do it. My first attempt at writing a kernel was DreamOS (a 32-bit kernel). My main programming language is C, although I've also used Java, Python, Go and Assembly.* - Ivan.

*I'm a hobbyist programmer, and have been working on my operating system kernel, called northport, since 2021. I've experimented with a few other projects in that time, namely a micro-kernel and a window manager. Before getting into osdev my programming interests were game engines and system utilities. The first programming project I finished was a task manager clone in C#. These days C++ is my language of choice: I like the freedom the language offers, even if it's the freedom to cause a triple fault.* - Dean.

01_Build_Process/02_Overview.md

Lines changed: 3 additions & 2 deletions
@@ -64,7 +64,7 @@ If you're using clang be sure to remember to pass `--target=xyz` with each comma

 ### Building C Source Files
 Now that we have a toolchain set up, we can test that it all works by compiling a C file.
-Create a C source file, it's contents dont matter here as we wont be running it, just telling it compiles.
+Create a C source file. Its contents don't matter here as we won't be running it, just checking that it compiles.

 Run the following to compile the file into an object file, and then to link that into the final executable.

@@ -87,6 +87,7 @@ Telling the compiler to not use these features can be done by passing some extra
 - `-mno-mmx`: Disables using the FPU registers for 64-bit integer calculations.
 - `-mno-3dnow`: Disables 3dnow! extensions, similar to MMX.
 - `-mno-sse -mno-sse2`: Disables SSE and SSE2, which use the 128-bit xmm registers, and require setup before use.
+- `-mcmodel=kernel`: The compiler uses 'code models' to help optimize code generation depending on where in memory the code might run. The default `small` model and the `medium` model expect code in the lower 2GiB, while `large` allows code anywhere in the 64-bit address space. You could use `large` for your kernel, but if you are loading your kernel in the top-most 2GiB you can use `kernel`, the higher half counterpart of `small`, which allows similar optimizations.

 There are also a few other compiler flags that are useful, but not necessary:
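
Why the top-most 2GiB in particular? Roughly speaking, the `kernel` code model lets the compiler encode symbol addresses as 32-bit values that the CPU sign-extends back to 64 bits, and only the lowest 2GiB and the highest 2GiB of the address space survive that round trip unchanged. A small sketch of that property; `fits_sign_extended_32` is a hypothetical helper written for illustration, not part of any compiler or library API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is addr equal to the sign extension of its own low 32 bits?
 * That holds exactly when the top 33 bits are all 0 (the lowest 2GiB)
 * or all 1 (the top-most 2GiB, i.e. the "-2GB" region). */
static bool fits_sign_extended_32(uint64_t addr) {
    uint64_t top33 = addr >> 31;
    return top33 == 0 || top33 == (UINT64_MAX >> 31);
}
```

For example, the usual link address `0xFFFFFFFF80000000` passes the check, while `0xFFFF800000000000` (the bottom of the canonical higher half) does not, so a kernel linked there would need a different code model such as `large`.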

@@ -100,7 +101,7 @@ This section should be seen as an extension to the section above on compiling C
 When compiling C++ for a freestanding environment, there are a few extra flags that are required:

 - `-fno-rtti`: Tells the compiler not to generate **R**un**t**ime **t**ype **i**nformation. RTTI requires runtime support from the compiler libraries and the OS, neither of which we have in a freestanding environment.
-- `-fno-exceptions`: Requires the compiler libraries to work, again which we dont have. Means you can't use C++ exceptions in your code. Some standard functions (like the `delete` operator) still require you to declare them `noexcept` so the correct symbols are generated.
+- `-fno-exceptions`: Exceptions require runtime support from the compiler libraries, which again we don't have. With this flag you can't use C++ exceptions in your code. Some standard functions (like the `delete` operator) still require you to declare them `noexcept` so the correct symbols are generated.

 And a few flags that are not required, but can be nice to have:

01_Build_Process/03_Boot_Protocols.md

Lines changed: 11 additions & 6 deletions
@@ -43,7 +43,7 @@ One important note about multiboot 2: the memory map is essentially the map give
 ### Creating a Boot Shim
 The major caveat of multiboot when first getting started is that it drops you into 32-bit protected mode, meaning that you must set up long mode yourself. This also means you'll need to create a set of page tables to map the kernel into the higher half, since in pmode it'll be running with paging disabled, and therefore no translation.

-Most implementations will use an assembly stub, linked at a lower address so it can be placed in physical memory properly. While the main kernel code is linked against the standard address used for higher half kernels: 0xffff'ffff'8000'0000. This address is sometimes referred to as the -2GB region(yes that's a minus), as a catch-all term for the upper-most 2GB of any address space. Since the exact address will be different depending on the number of bits used for the address space (32-bit vs 64-bit for example), referring to it as an underflow value is more portable.
+Most implementations will use an assembly stub, linked at a lower address so it can be placed in physical memory properly, while the main kernel code is linked against the standard address used for higher half kernels: `0xFFFF'FFFF'8000'0000`. This address is sometimes referred to as the -2GB region (yes, that's a minus), as a catch-all term for the upper-most 2GB of any address space. Since the exact address will be different depending on the number of bits used for the address space (32-bit vs 64-bit for example), referring to it as an underflow value is more portable.

 If you're curious as to why it's referred to as *minus* 2GB, it's a catch-all term for the upper-most 2GB of the address space, regardless of how big the address space may be (the upper 2GB of 32-bit and 64-bit address spaces are different addresses!).

@@ -69,16 +69,20 @@ We'll need to modify our linker script a little since we boot up in protected mo
 ```
 SECTIONS
 {
-    KERNEL_VIRT_BASE = 0xffffffff8000000;
     . = 1M;

+    KERNEL_START = .;
+    KERNEL_VIRT_BASE = 0xFFFFFFFF80000000;
+
     .mb2_hdr :
     {
+        /* Be sure that the multiboot2 header is at the beginning */
         KEEP(*(.mb2_hdr))
     }

     .mb2_text :
     {
+        /* Space for the assembly stub to get us into long mode */
         *(.mb2_text)
     }

@@ -100,10 +104,11 @@ SECTIONS
         *(.data)
         *(.bss)
     }
+    KERNEL_END = .;
 }
 ```

-This is very similar to a default linker script, but we make use of the `AT()` directive to set the LMA (load memory address) of each section. What this does is allow us to have the kernel loaded at a lower memory address so we can boot (in this case we set `. = 1M`, so 1MiB), but still have most of our kernel linked as higher half. The higher half kernel will just be loaded at a physical memory address that is `0xffff'ffff'8000'0000` lower than it's virtual address.
+This is very similar to a default linker script, but we make use of the `AT()` directive to set the LMA (load memory address) of each section. This allows us to have the kernel loaded at a lower memory address so we can boot (in this case we set `. = 1M`, so 1MiB), but still have most of our kernel linked as higher half. The higher half kernel will just be loaded at a physical memory address that is `0xFFFF'FFFF'8000'0000` lower than its virtual address.

 However the first two sections are both loaded and linked at lower memory addresses. The first is our multiboot header: this is just static data, so it doesn't really matter where it's loaded, as long as it sits within the first 32KiB of the file (which is why it is placed first). The second section contains our protected mode boot shim: a small bit of code that sets up paging, and boots into long mode.

@@ -114,18 +119,18 @@ The next thing is to create our multiboot2 header and boot shim. Multiboot2 head

 # multiboot2 header: magic number, mode, length, checksum
 mb2_hdr_begin:
-    .long 0xE85250d6
+    .long 0xE85250D6
     .long 0
     .long (mb2_hdr_end - mb2_hdr_begin)
-    .long -(0xE85250d6 + (mb2_hdr_end - mb2_hdr_begin))
+    .long -(0xE85250D6 + (mb2_hdr_end - mb2_hdr_begin))

 # framebuffer tag: type = 5
 mb2_framebuffer_req:
     .short 5
     .short 1
     .long (mb2_framebuffer_end - mb2_framebuffer_req)
     # preferred width, height, bpp.
-    # leave as zero to indicate "dont care"
+    # leave as zero to indicate "don't care"
     .long 0
     .long 0
     .long 0
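
The checksum line in the header above works because the multiboot2 specification requires the magic, architecture, header length and checksum fields to sum to zero as unsigned 32-bit values, so the checksum is simply the negation of the other three. The same arithmetic in C, as a sketch; the helper names are hypothetical and not taken from any multiboot library:

```c
#include <assert.h>
#include <stdint.h>

#define MB2_HEADER_MAGIC 0xE85250D6u /* multiboot2 header magic */
#define MB2_ARCH_I386    0u          /* architecture 0: 32-bit protected mode */

/* Hypothetical helper: choose the checksum so all four fields sum to 0 mod 2^32. */
static uint32_t mb2_checksum(uint32_t architecture, uint32_t header_length) {
    return (uint32_t)0 - (MB2_HEADER_MAGIC + architecture + header_length);
}

/* Hypothetical helper: the validity check a bootloader is expected to make. */
static int mb2_header_valid(uint32_t magic, uint32_t architecture,
                            uint32_t header_length, uint32_t checksum) {
    return (uint32_t)(magic + architecture + header_length + checksum) == 0;
}
```

For a 24-byte header this gives `0x17ADAF12`; the assembler computes the same value at build time from `-(0xE85250D6 + (mb2_hdr_end - mb2_hdr_begin))`.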

01_Build_Process/05_Linker_Scripts.md

Lines changed: 2 additions & 2 deletions
@@ -164,7 +164,7 @@ PHDRS
 SECTIONS
 {
     /* start linking at the -2GB address. */
-    . = 0xffffffff80000000;
+    . = 0xFFFFFFFF80000000;

     /* text output section, to go in 'text' phdr */
     .text :
@@ -205,6 +205,6 @@ SECTIONS
     } :data

     /* we can use the '.' to determine how large the kernel is. */
-    KERNEL_SIZE = . - 0xffffffff80000000;
+    KERNEL_SIZE = . - 0xFFFFFFFF80000000;
 }
 ```

02_Architecture/01_README.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
# Architecture

Before going beyond a basic "hello world" and implementing the first real parts of our kernel, there are some key concepts about how the CPU operates that we have to understand. What is an interrupt, and how do we handle it? What does it mean to mask them? What is the GDT and what is its purpose?

It's worth noting that we're going to focus exclusively on x86_64 here. Some concepts are specific to this platform (the GDT, for example), while others are transferable across most platforms (like a higher half kernel). Some concepts, like interrupts and interrupt handlers, are partially transferable to other platforms.

## Address Spaces

If you've never programmed at a low level before, you've likely only dealt with a single address space: the virtual address space your program lives in. However, there are actually many other address spaces to be aware of!

This brings up the idea that an address is only useful in a particular address space. Most of the time we will be using virtual addresses, which is fine because our program lives in a virtual address space, but at times we will use *physical addresses* which, as you might have guessed, deal with the physical address space.

These are not the same: as we'll see later on, we can convert virtual addresses to physical addresses (usually the CPU will do this for us), but they are separate things.

There are also other address spaces you may encounter in osdev, like:

- Port I/O: Some older devices on x86 are wired up to 'ports' on the CPU, with each port being given an address. These addresses are not virtual or physical memory addresses, so we can't access them like pointers. Instead, special CPU instructions are used to move data in and out of this address space.
- PCI Config Space: PCI has an entirely separate address space for configuring devices. This address space has a few different ways to access it.

Most of the time you won't have to worry about which address space to deal with: hardware will only deal with physical addresses, and your code will mostly deal with virtual addresses. As mentioned earlier, we'll look at how we use both of these later on, so don't worry!

### Higher and Lower Halves

The concept of a higher half (and lower half) could be applied to any address space, but the terms are typically used to refer to the virtual address space. Since the virtual address space has a *non-canonical hole*, there are two distinct halves to it.

The non-canonical hole is the range of addresses in the middle of the virtual address space that the MMU (memory management unit) considers to be invalid. We'll look more at the MMU and why this exists in later chapters, but for now just know that the higher half refers to addresses above the hole, and the lower half is everything below it.

Of course, like any convention, you are free to ignore this and forge your own way of dividing the address space between user programs and the kernel, but this is the recommended approach: the higher half is for the kernel, the lower half is for userspace.
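
The hole can be made concrete with a little bit arithmetic. A minimal sketch, assuming 4-level paging, where virtual addresses are 48 bits wide and an address is canonical only if bits 63 through 47 are all copies of bit 47 (with 5-level paging the same rule starts at bit 56 instead). The helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4-level paging, so 48 significant virtual address bits. */
#define VADDR_BITS 48

/* Canonical: bits 63..47 are either all 0 or all 1. */
static bool is_canonical(uint64_t addr) {
    uint64_t top = addr >> (VADDR_BITS - 1);
    return top == 0 || top == (UINT64_MAX >> (VADDR_BITS - 1));
}

/* Lower half: 0x0000000000000000 - 0x00007FFFFFFFFFFF (userspace by convention). */
static bool is_lower_half(uint64_t addr) {
    return is_canonical(addr) && (addr >> 63) == 0;
}

/* Higher half: 0xFFFF800000000000 - 0xFFFFFFFFFFFFFFFF (kernel by convention). */
static bool is_higher_half(uint64_t addr) {
    return is_canonical(addr) && (addr >> 63) == 1;
}
```

Everything from `0x0000800000000000` up to `0xFFFF7FFFFFFFFFFF` falls in the hole, and the CPU raises a fault on any attempt to use an address in that range.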

## The GDT

The global descriptor table has a lot of legacy on the x86 architecture and has been used for a lot of things in the past. At its core you can think of it as a big array of descriptors, with each descriptor being a magic number that tells the CPU how to operate. Outside of long mode these descriptors can be used for memory segmentation, but segmentation is disabled in long mode. In long mode the only important fields are the DPL (descriptor privilege level), the type (code, data or something else) and, for code descriptors, the long mode bit.

It's easy to be overwhelmed by the number of fields in the GDT, but most modern x86_64 kernels only use a handful of static descriptors: 64-bit kernel code, 64-bit kernel data, 64-bit user code and 64-bit user data. Later on we'll add a TSS descriptor too, which is required when we try to handle an interrupt while the CPU is running user code.

The currently active descriptors tell the CPU what mode it is in: if a user code descriptor is loaded, it's running user-mode code. The privilege level of the active code descriptor becomes the current privilege level (CPL), and it's the CPL that the CPU checks against the user/supervisor bit in the page tables when we access memory (as we'll see later). Data descriptors need a privilege level compatible with the code that loads them.

If you're unsure where to start, you'll need a 64-bit kernel code descriptor and a 64-bit kernel data descriptor at the bare minimum.
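
As a sketch of what those descriptors look like once packed, the code below builds a minimal long mode GDT: the mandatory null entry first, then kernel code/data and user code/data in the order listed above. Base and limit are ignored for these segments in long mode, so they are filled with the conventional flat values. The macro and function names are hypothetical, and loading the table with `lgdt` and reloading the segment registers is not shown:

```c
#include <assert.h>
#include <stdint.h>

/* Access byte bits. */
#define GDT_PRESENT      0x80u               /* P: descriptor is valid */
#define GDT_DPL(n)       (((n) & 0x3u) << 5) /* descriptor privilege level 0-3 */
#define GDT_CODE_OR_DATA 0x10u               /* S: code/data, not a system descriptor */
#define GDT_EXECUTABLE   0x08u               /* E: code segment */
#define GDT_READ_WRITE   0x02u               /* readable code / writable data */

/* Flags nibble. */
#define GDT_GRANULARITY  0x8u /* G: limit counted in 4KiB units */
#define GDT_SIZE_32      0x4u /* D/B: must be clear on 64-bit code descriptors */
#define GDT_LONG_MODE    0x2u /* L: 64-bit code segment */

/* Hypothetical helper: pack the fields into the 8-byte descriptor layout. */
static uint64_t gdt_entry(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags) {
    assert(limit <= 0xFFFFFu); /* the limit field is only 20 bits wide */
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);            /* bits 0-15:  limit 15:0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;     /* bits 16-39: base 23:0   */
    d |= (uint64_t)access << 40;                 /* bits 40-47: access byte */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48; /* bits 48-51: limit 19:16 */
    d |= (uint64_t)(flags & 0xFu) << 52;         /* bits 52-55: flags       */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56; /* bits 56-63: base 31:24  */
    return d;
}

/* Null, kernel code, kernel data, user code, user data. */
static void gdt_build(uint64_t gdt[5]) {
    const uint8_t code = GDT_PRESENT | GDT_CODE_OR_DATA | GDT_EXECUTABLE | GDT_READ_WRITE;
    const uint8_t data = GDT_PRESENT | GDT_CODE_OR_DATA | GDT_READ_WRITE;
    gdt[0] = 0; /* the first entry must be the null descriptor */
    gdt[1] = gdt_entry(0, 0xFFFFF, code | GDT_DPL(0), GDT_GRANULARITY | GDT_LONG_MODE);
    gdt[2] = gdt_entry(0, 0xFFFFF, data | GDT_DPL(0), GDT_GRANULARITY | GDT_SIZE_32);
    gdt[3] = gdt_entry(0, 0xFFFFF, code | GDT_DPL(3), GDT_GRANULARITY | GDT_LONG_MODE);
    gdt[4] = gdt_entry(0, 0xFFFFF, data | GDT_DPL(3), GDT_GRANULARITY | GDT_SIZE_32);
}
```

With these inputs the kernel code descriptor comes out as `0x00AF9A000000FFFF` and the user code descriptor as `0x00AFFA000000FFFF`: the only difference between them is the DPL.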

## How The CPU Executes Code

Normally the CPU starts a program and runs it until the program needs to wait for something. At some point in the future, the program may continue, and eventually exit. This is the typical life cycle of a userspace program.

On bare metal we have more things to deal with: how do we run more than a single program at once? How do we keep track of time to update a clock? What if the user presses a key or moves the mouse, how do we detect that efficiently? Maybe something we can't predict happens, like a program trying to access memory it's not supposed to, or a new packet arriving over the network.

These things can happen at any time, and as the operating system kernel we would like to react to them and take some action. This is where interrupts come in.

### Interrupts

When an unexpected event happens, the CPU will immediately stop the code it's currently running and start running a special function called an *interrupt handler*. The interrupt handler is something the kernel tells the CPU about, and the function can then work out what event happened and take some action. The interrupt handler then tells the CPU when it's done, and the CPU goes back to executing the previously running code.

The interrupted code is usually never aware that an interrupt even occurred, and should continue on as normal.
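
On x86_64, the way the kernel tells the CPU about its interrupt handlers is the interrupt descriptor table (IDT): an array of up to 256 gate descriptors, indexed by interrupt vector, each 16 bytes long and holding the handler's address split across three fields. A minimal sketch of building one gate, assuming the kernel code descriptor sits at GDT offset `0x08` (the second entry); `idt_make_gate` is a hypothetical helper, and loading the table with `lidt` is not shown:

```c
#include <assert.h>
#include <stdint.h>

/* One x86_64 IDT gate descriptor (16 bytes). */
struct idt_gate {
    uint16_t offset_low;  /* handler address bits 0-15 */
    uint16_t selector;    /* code segment selector to run the handler with */
    uint8_t  ist;         /* bits 0-2: interrupt stack table index, 0 = no stack switch */
    uint8_t  type_attr;   /* present bit, DPL and gate type */
    uint16_t offset_mid;  /* handler address bits 16-31 */
    uint32_t offset_high; /* handler address bits 32-63 */
    uint32_t reserved;    /* must be zero */
};
_Static_assert(sizeof(struct idt_gate) == 16, "long mode IDT gates are 16 bytes");

#define IDT_PRESENT        0x80u
#define IDT_INTERRUPT_GATE 0x0Eu /* the CPU clears IF when entering the handler */

/* Hypothetical helper: split a handler address across a gate's offset fields. */
static struct idt_gate idt_make_gate(uint64_t handler, uint16_t selector,
                                     uint8_t ist, uint8_t dpl) {
    struct idt_gate g = {0};
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.ist         = (uint8_t)(ist & 0x7u);
    g.type_attr   = (uint8_t)(IDT_PRESENT | ((dpl & 0x3u) << 5) | IDT_INTERRUPT_GATE);
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFFu);
    g.offset_high = (uint32_t)(handler >> 32);
    return g;
}
```

Using an interrupt gate (type `0xE`) rather than a trap gate (type `0xF`) means the CPU clears the interrupt flag on entry, so the handler won't be interrupted by further maskable interrupts until it re-enables them or returns.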
