Commit 624ced4

Update nasm chapter (#79)
* Fix Examples
* remove line
* Add example for loopy cycle in the nasm appendix
* Update updates
1 parent 733599f commit 624ced4

2 files changed

Lines changed: 76 additions & 47 deletions

99_Appendices/D_Nasm.md

Lines changed: 75 additions & 47 deletions
Original file line number | Diff line number | Diff line change
@@ -5,11 +5,11 @@
55
There are some cases where writing some assembly code is preferred or needed to do certain operations (e.g. interrupt handling).
66

77
Nasm has a macro processor that supports conditional assembly, multi-level file inclusion, etc.
8-
A macro starts with the '%' symbol.
8+
A macro starts with the '%' symbol.
99

10-
There are two types of macros: _single line_ (defined with `%define`) and _multi-line_ (wrapped between `%macro` and `%endmacro`). This section explains multi-line macros.
10+
There are two types of macros: _single line_ (defined with `%define`) and _multi-line_ (wrapped between `%macro` and `%endmacro`). This section explains multi-line macros.
1111

12-
A multi-line macro is defined as follows:
12+
A multi-line macro is defined as follows:
1313

1414
```nasm
1515
%macro my_first_macro 1
@@ -19,7 +19,7 @@ A multi-line macro is defined as follows:
1919
%endmacro
2020
```
2121

22-
The code generated by a macro can be accessed from C if needed. In this case we need to add a global label to it; for example, the macro above becomes:
22+
The code generated by a macro can be accessed from C if needed. In this case we need to add a global label to it; for example, the macro above becomes:
2323

2424
```nasm
2525
%macro my_first_macro 1
@@ -31,22 +31,22 @@ my_first_macro_label_%1:
3131
%endmacro
3232
```
3333

34-
In the code above we can see a few new things:
34+
In the code above we can see a few new things:
3535

3636
* First we said that the label `my_first_macro_label_%1` has to be set as global; this is pretty straightforward to understand.
37-
* The `%1` in the label definition lets us create a different label for each value of the first parameter passed to the macro.
37+
* The `%1` in the label definition lets us create a different label for each value of the first parameter passed to the macro.
3838

39-
So if we now add a new line with the following code:
39+
So if we now add a new line with the following code:
4040

4141
```nasm
4242
my_first_macro 42
4343
```
4444

45-
It creates the global label `my_first_macro_label_42`, and since it is global it will also be visible from our C code (provided the files are linked together).
45+
It creates the global label `my_first_macro_label_42`, and since it is global it will also be visible from our C code (provided the files are linked together).
4646

47-
Defining a macro in nasm is similar to using a C `#define` statement: these special "instructions" are evaluated by the nasm preprocessor and expanded at assembly time.
47+
Defining a macro in nasm is similar to using a C `#define` statement: these special "instructions" are evaluated by the nasm preprocessor and expanded at assembly time.
4848

49-
So for example *my_first_macro 42* is transformed into the following statements:
49+
So for example *my_first_macro 42* is transformed into the following statements:
5050

5151
```nasm
5252
my_first_macro_label_42:
@@ -57,16 +57,16 @@ my_first_macro_label_42:
5757

5858
## Declaring Variables
5959

60-
In Nasm, if we want to declare an initialized "variable", we can use the following directives:
60+
In Nasm, if we want to declare an initialized "variable", we can use the following directives:
6161

62-
| Directive | Description |
62+
| Directive | Description |
6363
|-----------|-----------------------------------|
6464
| DB | Allocate a byte |
6565
| DW | Allocate 2 bytes (a word) |
6666
| DD | Allocate 4 bytes (a double word) |
6767
| DQ | Allocate 8 bytes (a quad word) |
6868

69-
These directives are intended to be used for initialized variables. The syntax is:
69+
These directives are intended to be used for initialized variables. The syntax is:
7070

7171
```nasm
7272
single_byte_var:
@@ -79,24 +79,24 @@ quad_var:
7979
dq 133.463 ; Example with a real number
8080
```
8181

82-
But what if we want to declare a string? Well in this case we can use a different syntax for db:
82+
But what if we want to declare a string? Well in this case we can use a different syntax for db:
8383

8484
```nasm
8585
string_var:
8686
db "Hello", 10
8787
```
8888
What does it mean? We are simply declaring a variable (`string_var`) that starts at 'H' and fills the consecutive bytes with the next letters. But what about the last number? It is just an extra byte that represents the newline character (ASCII code 10). So what we are really storing is the string _"Hello\\n"_. Note that no terminating null byte is added automatically.
8989

90-
Now what we have seen so far is valid for a variable that can be initialized with a value, but what if we don't know the value yet, but we want just to "label" it with a variable name? Well is pretty simple, we have equivalent directives for reserving memory:
90+
Now what we have seen so far is valid for a variable that can be initialized with a value, but what if we don't know the value yet, but we want just to "label" it with a variable name? Well is pretty simple, we have equivalent directives for reserving memory:
9191

92-
| Directive | Description |
92+
| Directive | Description |
9393
|-------------|---------------------------------|
9494
| RESB | Reserve a byte |
9595
| RESW | Reserve 2 bytes (a word) |
9696
| RESD | Reserve 4 bytes (a double word) |
9797
| RESQ | Reserve 8 bytes (a quad word) |
9898

99-
The syntax is similar as the previous examples:
99+
The syntax is similar as the previous examples:
100100

101101
```nasm
102102
single_byte_var:
@@ -109,27 +109,27 @@ quad_var:
109109
resq 4
110110
```
111111

112-
One moment! What are those number after the directives? Well it's pretty simple, they indicate how many bytes/word/dword/qword we want to allocate. In the example above:
112+
One moment! What are those number after the directives? Well it's pretty simple, they indicate how many bytes/word/dword/qword we want to allocate. In the example above:
113113
* `resb 1` Is reserving one byte
114114
* `resw 2` Is reserving 2 words, and each word is 2 bytes each, in total 4 bytes
115115
* `resd 3` Is reserving 3 dwords, again a dword is 4 bytes, in total we have 12 bytes reserved
116-
* `resq 4` Is reserving... well you should know it now...
116+
* `resq 4` Is reserving... well you should know it now...
117117

118118
## Calling C from Nasm
119119

120-
In the asm code, if in 64bit mode, a call to *cld* is required before calling an external C function.
120+
In the asm code, if in 64bit mode, a call to *cld* is required before calling an external C function.
121121

122-
So for example if we want to call the following function from C:
122+
So for example if we want to call the following function from C:
123123

124124
```C
125125
void my_c_function(unsigned int my_value){
126126
printf("My shiny function called from nasm worth: %u\n", my_value);
127127
}
128128
```
129129
130-
The first thing is to let the assembler know that we want to reference an external function using `extern`; then, just before calling the function, we add the `cld` instruction.
130+
The first thing is to let the assembler know that we want to reference an external function using `extern`; then, just before calling the function, we add the `cld` instruction.
131131
132-
Here is an example:
132+
Here is an example:
133133
134134
```nasm
135135
[extern my_c_function]
@@ -143,58 +143,56 @@ call my_c_function
143143

144144
As mentioned in the multiboot chapter, argument passing from asm to C in 64-bit mode is a little different from 32-bit mode: the first parameter of a C function is taken from `rdi` (followed by `rsi`, `rdx`, `rcx`, `r8`, `r9`, then the stack), so `mov rdi, 42` sets the value of the *my_value* parameter to 42.
145145

146-
The output of the printf will be then:
146+
The output of the printf will be then:
147147

148148
```
149149
My shiny function called from nasm worth: 42
150150
```
151151

152152
## About Sizes
153153

154-
Variable sizes are always important while coding, but while coding in asm they are even more important to understand how they works in assembly, and since there is no real type you can't rely on the variable type.
154+
Variable sizes are always important while coding, but while coding in asm they are even more important to understand how they works in assembly, and since there is no real type you can't rely on the variable type.
155155

156-
The important things to know when dealing with assembly code:
156+
The important things to know when dealing with assembly code:
157157

158-
* when moving from memory to register, using the wrong register size will cause wrong value being loaded into the registry. Example:
158+
* when moving from memory to register, using the wrong register size will cause wrong value being loaded into the registry. Example:
159159

160160
```nasm
161161
mov rax, [memory_location_label]
162162
```
163-
is different from:
163+
is different from:
164164

165165
```nasm
166166
mov eax, [memory_location_label]
167167
```
168168

169-
And it could potentially lead to two different values in the register. That because the size of rax is 8 bytes, while eax is only 4 bytes, so if we do a move from memory to register in the first case, the processor is going to read 8 memory locations, while in the second case only 4, and of course there can be differences (unless we are lucky enough and the extra 4 bytes are all 0s).
169+
And it could potentially lead to two different values in the register. That because the size of rax is 8 bytes, while eax is only 4 bytes, so if we do a move from memory to register in the first case, the processor is going to read 8 memory locations, while in the second case only 4, and of course there can be differences (unless we are lucky enough and the extra 4 bytes are all 0s).
170170

171171
This is easy to miss if we mostly do register to memory, value to register, or value to memory moves, where the size is "implicit".
172172

173-
_Authors Note_: Probably it can be a trivial issue, but it took me couple of hours to figure it out!
174-
175173
## If Statement
176174

177-
Below an example showing a possible solution to a complex if statement. Let's assume that we have the following `if` statement in C and we want to translate in assembly:
175+
Below an example showing a possible solution to a complex if statement. Let's assume that we have the following `if` statement in C and we want to translate in assembly:
178176

179177
```C
180178
if ( var1==SOME_VALUE && var2 == SOME_VALUE2){
181179
//do something
182180
}
183181
```
184182

185-
In asm we can do something like the following:
183+
In asm we can do something like the following:
186184

187185
```asm
188186
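cmp dword [var1], SOME_VALUE ; size is required; dword assumes a 32-bit variable
189-
jne else_label
187+
jne .else_label
190188
cmp dword [var2], SOME_VALUE2 ; size is required; dword assumes a 32-bit variable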
191189
jne .else_label
192190
;here code if both conditions are true
193191
.else_label:
194192
;the else part
195193
```
196194

197-
And in a similar way we can have a if statement with a logic OR:
195+
And in a similar way we can have a if statement with a logic OR:
198196

199197
```C
200198
if (var1 == SOME_VALUE || var2 == SOME_VALUE){
@@ -209,33 +207,38 @@ cmp [var1], SOME_VALUE
209207
je .true_branch
210208
cmp dword [var2], SOME_VALUE ; size is required; dword assumes a 32-bit variable
211209
je .true_branch
210+
jmp .else_label
212211
.true_branch:
213212
jne .else_label
214213
```
215214

216-
## Switch Statement
215+
## Switch Statement
217216

218217
The usual switch statement in C:
219218
```C
220219
switch(variable){
221-
case X:
220+
case SOME_VALUE:
222221
//do something
223222
break;
224-
case Y:
223+
case SOME_VALUE2:
224+
//do something
225+
break;
226+
case SOME_VALUE3:
225227
//do something
226228
break;
227229
}
228230
```
229231

230-
can be rendered as:
231-
232+
can be rendered as:
233+
232234
```asm
233235
cmp dword [var1], SOME_VALUE ; size is required; dword assumes a 32-bit variable
234236
je .value1_case
235237
cmp dword [var1], SOME_VALUE2
236238
je .value2_case
237239
cmp dword [var1], SOME_VALUE3
238240
je .value3_case
241+
jmp .item_not_needed
239242
.value1_case:
240243
;do stuff for value1
241244
jmp .item_not_needed
@@ -245,14 +248,39 @@ je .value3_case
245248
.value3_case:
246249
;do stuff for value3
247250
.item_not_needed:
248-
;rst of the code
251+
;rest of the code
249252
```
250253

254+
## Loop
255+
256+
Another typical scenario is a loop. For example, imagine we have the following while loop in C:
257+
258+
```c
259+
unsigned int counter = 0;
260+
while (counter < SOME_VALUE) {
261+
//do something
262+
counter++;
263+
}
264+
```
265+
266+
Again, in assembly we can use the `jmp` family of instructions:
267+
268+
```asm
269+
mov ecx, 0 ; Loop counter
270+
.loop_cycle:
271+
; do something
272+
inc ecx
273+
cmp ecx, SOME_VALUE
274+
jne .loop_cycle
275+
```
276+
277+
The `inc` instruction increases the value contained in the `ecx` register by one.
278+
251279
## Data Structures
252280

253281
Every language supports accessing data as a raw array of bytes; C provides an abstraction over this in the form of structs. NASM also provides an abstraction over raw bytes that is similar to how C does it.
254282

255-
This guide will just introduce quickly how to define a basic struct, for more information and use cases is better to check the netwide assembler official documentation (see the useful links section)
283+
This section will just introduce quickly how to define a basic struct, for more information and use cases is better to check the netwide assembler official documentation (see the useful links appendix)
256284

257285
Let's for example assume we have the following C struct:
258286

@@ -263,8 +291,8 @@ struct task {
263291
};
264292
```
265293

266-
Nasm renders a struct by declaring a list of offset labels, which we can use to access a field starting from the struct's memory location (*Authors note: yeah it is a trick...*)
267-
To create a struct in nasm we use the `struc` and `endstruc` keywords, and the fields are defined between them.
294+
Nasm renders a struct by declaring a list of _offset labels_, which we can use to access a field starting from the struct's memory location (*Authors note: yeah it is a trick...*)
295+
To create a struct in nasm we use the `struc` and `endstruc` keywords, and the fields are defined between them.
268296
The example above can be rendered in the following way:
269297

270298
```asm
@@ -274,7 +302,7 @@ struc task
274302
endstruc
275303
```
276304

277-
What this code is doing is creating three symbols: id as 0 representing the offset from the beginning of a task structure and name as 4 (still the offset) and the task symbol that is 0 too. This notation has a drawback, it defines the labels as global constants, so you can't have another struct or label declared with same name, to solve this problem you can use the following notation:
305+
What this code is doing is creating three symbols: `id` as 0 representing the offset from the beginning of a task structure and `name` as 4 (still the offset) and the `task` symbol that is 0 too. This notation has a drawback, it defines the labels as global constants, so you can't have another struct or label declared with same name, to solve this problem you can use the following notation:
278306

279307
```asm
280308
struc task
@@ -291,8 +319,8 @@ Now if we have a memory location or register that contains our structure, for ex
291319
mov ebx, dword [(rax + task.id)] ; id is a 4-byte field, so the destination must be a 32-bit register
292320
```
293321

294-
This is how to access a struct, besically we add the label representing an offset to its base address.
295-
What if we want to create an instance of it? In this case we can use the macros `istruc` and `iend`, with `at` to set the fields. For example, if we want to create an instance of task with the value 1 for the id field and "hello123" for the name field, we can use the following syntax:
322+
This is how to access a struct: basically, we add the label representing an offset to its base address.
323+
What if we want to create an instance of it? In this case we can use the macros `istruc` and `iend`, with `at` to set the fields. For example, if we want to create an instance of task with the value 1 for the id field and "hello123" for the name field, we can use the following syntax:
296324

297325
```asm
298326
istruc task

99_Appendices/J_Updates.md

Lines changed: 1 addition & 0 deletions
@@ -36,3 +36,4 @@ Third Book Release
3636
* Fix tss structure in Userspace/Handling_Interrupt chapter
3737
* Add watchpoint information on gdb chapter
3838
* Add explanation on how to test when entering userspace
39+
* Fix some examples in the Nasm appendix, and add new section with loop cycle example.
