Hello! This is the start of my reverse engineering journey. I’m starting from knowing almost nothing and hopefully will learn a lot along the way. Maybe this journey will help people learn things following my steps. I will try to write everything here. All of my dumb failures and epic wins. So here we go!
First things first I must say I have bare minimum understanding of assembly and it’s heavily needed in reverse engineering. So for my first step im going to copy paste a hello world code and rewrite it and try to learn everything on it. But before that we need to install ‘nasm’. nasm seems to be the compiler for assembly for lack of better word. I might be wrong but that’s the whole point of this log, to be wrong and learn from it.
I’m using arch linux so :
yay -S nasm
Now to copy paste the code in HelloWorld.asm:
I can already understand a few things from the comments but it’s not nearly enough for me. If this is the basic assembly I’m going to do my research about every single command there. but let’s not worry about that now and just compile this piece of code.
To start compiling we have to start with:
nasm -f elf HelloWorld.asm
elfis a file format
The command made another file called
Helloworld.o. ‘o’ extension means object file and we still dont have an executable. We have simply “assembled” our source file.
Now i tried this command to make my .o file an executable:
ld -o HelloWorld Helloworld.o
It returned an error:
So we have an error on our hand and a warning. error is saying we are inputing i386 architecture but output is i386:x96-64.
I looked it up and realized what’s the problem for our first error. Apparently when we ran the command
nasm -f elf Helloworld.asm we made a 32 bit object file and if we wanted to make 64bit we should have entered
nasm -f elf64 Helloworld.asm. This is a problem because apparently by default our ld command wants to make 64bit executable but we gave it 32 bit. so let’s rm our object file and try again with 64bit shall we.
nasm -f elf64 Helloworld.asm
ld -o HelloWorld Helloworld.o
and voila we were correct error is gone! I’m going to ignore the warning for now because it seems to correct itself. Bad practice yes but im already tired from debugging. XD
btw ld is the linker. linker I assume needs to link different parts of our code. so it needs to find entry point and needs to link other files and functions to it.
Now lets keep going. if
HelloWorld is an executable we should be able to run it.
Learning what our assembly program does and rewriting it
Let’s divide it into two parts: section and text.
“A section is the smallest unit of an object file that can be relocated. “
Object file seems to repeat itself a lot. I assume it’s something important. Object file means:
“An object file contains the same instructions in machine-readable, binary form”
Great so now that is out of the way. there seems to be many type of sections:
Sections containing the following material usually appear in relocatable ELF files:
The data section,
The bss section, and
The text section.
The text section is used for keeping the actual code. This section must begin with the declaration global _start, which tells the kernel where the program execution begins.
Ok, now we know what
section .text is, what does global start do. we have another start in the code. So what is their difference?
startis supposed to be entry point and
globalis a way for linker to know where our entry point is. We are using default entry point. there is a way to make custom entry point but i dont care tbh.
Ok, this is the tricky part. These are the names of registers.
There are ten 32-bit and six 16-bit processor registers in IA-32 architecture. The registers are grouped into three categories −
Control registers, and
The general registers are further divided into the following groups −
Pointer registers, and
And the registers in code above are all part of data registers. They are used for arithmetic, logical and other operations (such as printing things to screen). How we use registers depends on what action we are doing. In this case we are printing a string. Printing a string requires some kernel operation. this operation is called
sys_call. All of these operations are represented with a number. forexample
sys_write is 4,
sys_read is 3,
sys_exit is 1. We can find all about the
sys_calls in a c header file called
In order to access them we can use these comands:
eax to supply our
ebx for our file descriptor “stdout” (we get back to that later),
ecx for our string,
edx for the length of our string. There is a reason that we use these in this specific way but for now I try not to think about it and just memorize things. If I forget i just look them up. For example we can use
man to see argument structure of
and you go by the order of they are written.
fd refers to
buf refers to
count refers to
edx. I think this is how it goes but I might be wrong. Future me might be judging me heavily but for now that’s how I use the manual.
A file descriptor is a number that uniquely identifies an open file in a computer’s operating system. It describes a data resource, and how that resource may be accessed.
Kernel needs to know this number to perform operation for write. Forexample to know information about the type of file.
int doesn’t mean integer surprise surprise. It actually means interrupt.
An interrupt is a condition that halts the microprocessor temporarily to work on a different task and then return to its previous task. Whenever an interrupt occurs the processor completes the execution of the current instruction and starts the execution of an Interrupt Service Routine (ISR) or Interrupt Handler.
So we tell cpu to start doing our thing which is writing our string using
Now Time to Rewrite
I’m not going to just rewrite this. I’m going to improve on it based on what we learned. I’m going to print “enter your name” then read and input and then print it then gracefully exit the program.
so first I added the necessary variables.
section .data msg1 db "Please enter your name!", 0xa len1 equ $ - msg1 msg2 db "Your name is:", 0xa len2 equ $ - msg2
and then printed them on screen like how we learned.
now time to read from user input with sys_read.
First we need to add
.bss section. what it does is that it
reserves bytes for our input.
resb means reserve byte.
section .bss name resb 5
Then we just finish the code using
section .text global_start ;must be declared for linker (ld) _start: mov edx,len1 mov ecx,msg1 mov ebx,1 mov eax,4 int 0x80 mov eax, 3 ;sys_read mov ebx, 2 mov ecx, name mov edx, 5 int 0x80 mov edx, len2 mov ecx, msg2 mov eax, 4 mov ebx, 1 int 0x80 mov edx, 5 mov ecx, name mov eax, 4 mov ebx, 2 int 0x80 mov eax,1 ;system call number (sys_exit) int 0x80 ;call kernel section .data msg1 db "Please enter your name!", 0xa len1 equ $ - msg1 msg2 db "Your name is:", 0x20, 0x0 len2 equ $ - msg2 section .bss name resb 5
and we got it!
Please enter your name! pladimir Your name is: pladimir