Assembly and Cracking From the Ground Up
An Introduction by Greythorne the Technomancer
Assembly Beginnings
Now that you have read the program development intro, you are probably starting to wonder how to get a program to talk to the user. After all, what good is a program when it can't be interactive? Okay, a program patcher... but wait, it takes input from a file, doesn't it?

If you are already familiar with how DOS calls are made, skip this part. Since the first thing everyone wants to learn is how to make a program display some silly hello world message, we will do that. First I need to explain registers (the hyper-fast variables built into your x86-based PC compatible, Pentium or 486 or such).

In older high-level programming languages there is some way of loading a string and displaying it to the user. BASIC is the easiest example that everyone has heard of. (Not Visual Basic, that is a monstrosity put together by our friends at microsquash; not QBasic, not BASICA, but plain old BASIC, which every machine emulated to some strange degree through the years, leaving everyone who wanted to learn a programming language stuck with something that wasn't very useful.)

In BASIC the commands were something like this:

    DATA "Hello Planet Hollywood"
    READ D$
    PRINT D$

or more simply:

    LET D$ = "Hello Planet Hollywood"
    PRINT D$

and the computer would spit out something (preferably our message, if we typed it right) to the screen.

Assembly is not so different, but we are stuck with the first model. In assembly we say something of the sort:

    MYSTRING DB 'Hello Planet Hollywood','$'

and then we print it. We use DOS's print-a-string command: put the function number 09h in AH, point DX at the string, and call DOS through interrupt 21h. Concisely, here it is:

    MYSTRING DB 'Hello Planet Hollywood','$'

    MOV AH, 09h               ; DOS function 9: print a string
    MOV DX, OFFSET MYSTRING   ; DX points at the string
    INT 21h                   ; call DOS

That wasn't so bad, was it? Notice the '$' at the end of the string. It is the terminator that tells DOS where the string stops.

There is another simple way strings are handled as well, called string-zero: the string ends with a 0 byte instead of a '$'. DOS expects this format for things like file names. In all honesty, with assembly you can make a print string routine that ends with anything you want, though it would be kinda useless because the $ and 0 terminated forms are handed to you free of charge already. You may later on want to develop your own for the purposes of encryption, since the 0 and the $ markers tend to stand out in a hex dump. Even that is not always necessary: you can encrypt the terminator character along with the string, or use string commands that print only a certain number of characters, with no terminator required.

Back to our example: I didn't mention that the data needs to be in a separate place, or the program will try to execute the data as machine code commands. That is okay. If you download the example .COM and .EXE skeleton files I have put online, you will notice that there is a place for data and a place for your actual executing code. The program jumps past the data when it starts, so it never mistakes the data for assembly instructions. That also isn't much different from BASIC: people tended to put the DATA at the end of the program after it was complete anyway, so nothing has changed. Considering that all decent programming languages have to be written in assembly anyway, it really shouldn't surprise you that they use the same type of string data format.

HOW THE REGISTERS WORK

Above, I showed you that you could use DX as a string variable. In BASIC you have any variables you want... Here is the rundown of your general purpose registers:

    AX - Accumulator
    BX - Base
    CX - Count
    DX - Data

The above general purpose registers are exactly that, general purpose. You can interchange them in your own code, but DOS int 21 calls and the like expect certain ones to be used exactly as written. The "Hello" example I showed you above is a good example: DOS function 9 wants the function number in AH and the address of the string in DX.
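To see the pieces in one place, here is a minimal sketch of the hello example as a complete .COM program. It assumes TASM/MASM syntax and is not a copy of the skeleton files mentioned above, which may be laid out differently. The labels START and MAIN, the carriage return and line feed after the message, and the exit call (DOS function 4Ch) are additions for illustration.

```asm
; hello.asm - minimal .COM sketch (TASM/MASM syntax assumed)
.MODEL TINY
.CODE
        ORG  100h                  ; .COM programs are loaded at offset 100h

START:  JMP  SHORT MAIN            ; hop over the data so it is never executed

MYSTRING DB  'Hello Planet Hollywood', 0Dh, 0Ah, '$'

MAIN:   MOV  AH, 09h               ; DOS function 09h: print '$'-terminated string
        MOV  DX, OFFSET MYSTRING   ; DS:DX -> string (DS = CS in a .COM file)
        INT  21h                   ; call DOS

        MOV  AX, 4C00h             ; DOS function 4Ch: exit with return code 0
        INT  21h

        END  START
```

With Borland's tools this would typically be built with `tasm hello.asm` followed by `tlink /t hello.obj`; other assemblers need their own equivalent of the tiny/.COM option. Note how AH and DX are dictated by DOS, while the choice of labels is entirely yours.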
The ACCUMULATOR (AX) is the most high-profile one, though. It has many standard uses. It tends to 'accumulate' everything. When you exit a program or a subroutine of some kind, the results tend to wind up there; error codes and results of arithmetic are the most usual. When calling a subroutine or another program, it is also used to hold the code for a command, such as the AH=9 example from above.

Each of these is a 16-bit register, which means it holds 16 bits (2 bytes) of information, and the two bytes can be accessed directly in each of them (called low and high). AX is therefore made up of AH and AL, and likewise BX of BH and BL, CX of CH and CL, and DX of DH and DL.

(The above are 16-bit examples; the 32-bit registers are simply named EAX, EBX, ECX, and EDX.)

NOW we have the ones that become more specialized in usage. These next two are usually for copying memory arrays or strings:

    SI - Source Index (Where to move things from)
    DI - Destination Index (Where to move things to)

Then there's this one:

    BP - Base Pointer

Very often SI, DI, and BP are used to keep track of where you are in the code. It really doesn't matter which you use until you interface your code with something that expects one rather than the other, which tends to happen a lot. Examine virus code sometime: you will see quite a few that use SI and quite a few that use BP.

There is a special one that tends to be like putting the hand of god into the program. It is the Instruction Pointer:

    IP - Instruction Pointer (Where the next instruction to execute is)

Why is this important? Here is a neat trick: say you are in a loop in a debugger like SoftICE and want to get out. Change IP to the address of the first instruction after the loop, and the program carries on from there as if the loop had finished.

Viruses (again using the little nasties as an example) tend to make use of IP as well. The EXE file format includes a header at the beginning of the program that informs DOS which segment of memory is to be used when you start (CS, the code segment) and which instruction to start at (our friend IP).
What this means is that a virus sitting at the end of the program normally wouldn't be executed, but altering the header of a normal EXE file so that CS:IP points at the end of the program allows the virus to execute first. Then CS and IP are set back to what they should be, to point at the start of the program. Rather creative really, whoever first thought that up. For fun, take a look at the symbiote I wrote. It does that very same thing; it is the way you add code to a program. Files ending with .COM are a little different, but a little simpler.

IF YOU HAVEN'T DONE THIS BEFORE: Go ahead and modify either the .COM file skeleton or the .EXE file example to do the hello world example from above. Most DOS calls use the same basic method, so once you get the hang of it, you will not have much trouble calling others.

In the next lesson or so I will be dealing with user interactivity, from getting input to reading the command line. If you want to get ahead, though it isn't necessary until the next lesson, try finding out how to input data from the user. Or, even more enterprising, look into the fact that offset 80h holds the length of the command line, and that the text itself starts at 81h (usually with a leading space, so the first real character sits at 82h). That might be fun. I will cover all of that of course, but reading ahead never hurt anyone ;)
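For anyone who does want to read ahead, here is a minimal sketch of a .COM program that echoes whatever was typed after its name. It assumes TASM/MASM syntax and relies on DS pointing at the program's prefix area at startup, which is the case for .COM files. The labels and the character-by-character printing through DOS function 02h are choices made for this illustration, not necessarily how the next lesson will do it.

```asm
; showargs.asm - echo the command line (TASM/MASM syntax assumed)
.MODEL TINY
.CODE
        ORG  100h

START:  CLD                    ; make LODSB step forward through memory
        MOV  SI, 80h           ; offset 80h: length of the command line
        LODSB                  ; AL = length, SI now points at 81h
        MOV  CL, AL
        XOR  CH, CH            ; CX = length, used as the loop counter
        JCXZ DONE              ; nothing was typed after the program name

NEXT:   LODSB                  ; AL = next character, SI = SI + 1
        MOV  DL, AL
        MOV  AH, 02h           ; DOS function 02h: print the character in DL
        INT  21h
        LOOP NEXT              ; CX = CX - 1, repeat until CX is 0

DONE:   MOV  AX, 4C00h         ; DOS function 4Ch: exit with return code 0
        INT  21h

        END  START
```

The output normally starts with a space, because the byte at 81h is the separator between the program name and its arguments. Notice that the registers play their traditional parts here: SI as the source index, CX as the counter, and AH carrying the DOS function number.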
+gthorne'97