home *** CD-ROM | disk | FTP | other *** search
Text File | 2000-05-25 | 80.9 KB | 2,272 lines |
- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::. Feb/March 98
- :::\_____\::::::::::. Issue 3
- ::::::::::::::::::::::.........................................................
-
- A S S E M B L Y P R O G R A M M I N G J O U R N A L
- http://asmjournal.freeservers.com
- asmjournal@mailcity.com
-
-
-
-
- T A B L E O F C O N T E N T S
- ----------------------------------------------------------------------
- Introduction...................................................mammon_
-
- "An Introduction to SPARC assembly"............................+Spath.
-
- "Extending NASM"...............................................mammon_
-
- Column: Win32 Assembly Programming
- "NASM specific Win32 coding".......................Tamas Kaproncai
- "More about Text".........................................Iczelion
- "Keyboard Input"..........................................Iczelion
-
- Column: The C standard library in Assembly
- "C string functions: introduction, _strlen".................Xbios2
- "C string functions: _strcpy"...............................Xbios2
-
- Column: The Unix World
- "X-Windows in Assembly Language: Part II"..................mammon_
-
- Column: Virtual Machines
- "An Intro to the Java Virtual Machine"............Cynical Pinnacle
-
- Column: Assembly Language Snippets
- "NumFactors"..........................................Troy Benoist
-
- Column: Issue Solution
- "6-byte Solution"..........................................mammon_
- ----------------------------------------------------------------------
- ++++++++++++++++++++++++Issue Challenge+++++++++++++++++++++
- Write a routine for converting ASCII hex to binary in 6 bytes
- ----------------------------------------------------------------------
-
-
- ____________________________________________________________________________
- ___ .___ __) (__ _____ ______ ```
- ._____| \____\ ___/__._) /._) _ (_. \\
- | | _\ |_ \ | \/ | |CE ,
- .=|_____|___)\___|(_______|______| |===============[ Introduction ]===.
- '================================| :=================================='
- : . by mammon_
-
-
- The first thing that you will notice about this issue --well, that it is late--
- will probably be the section headers designed by iCE. I had to add a top/upper
- left border to them [the horizontal and slanted lines] in order to make them
- standout when scrolling though a 100K file such as this one, but other than
- they are all his: comments, etc welcome.
-
- I don't have much to say about this issue: I went overboard with the NASM stuff
- this month as I have been doing a lot of 'research' work in that area recently;
- my articles have been supplemented with Tamas Kaproncai's Win32 NASM pointers.
-
- Iczelion and XBios2 have both produced --as usual-- 2 quality articles this
- month, Iczelion's based on his win32 asm tutorial 'the MASM way', and XBios2
- once again continuing to replace C with assembler.
-
- +Spath. has produced an excellent article on SPARC assembly language; I was
- hoping to debut the 'other CPU' scene with a MIPS article I had planned but it
- looks like +Spath has beat me to it.
-
- On a similar note, I mentioned on the Message Board wanting to start a Virtual
- Machines column. Cynical Pinnacle has started the column off this month with an
- article on programming the Java VM in its native 'assembly language'; in
- subsequent issues I and perhaps others will be adding articles here as well.
-
- A final note, I have not come up with a challenge for the next issue; anyone
- with good ideas is welcome to post one to the Message Board or to the APJ
- email address.
-
- Enjoy the mag!
-
- _m
-
-
- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::.
- :::\_____\:::::::::::...........................................FEATURE.ARTICLE
- An introduction to SPARC assembly
- by +Spath.
-
-
- The goal of this article is to introduce SPARC v8 architecture and SPARC
- assembly ; I hope it can also constitute a good introduction to RISC
- philosophy.
-
-
- What is SPARC ?
- -----------------
- The principles of RISC (Reduced Instructions Set Computer) are born in the
- early 80's in two universities (Berkeley and Stanford) ; its philosophy is
- the quest for simplicity and CPU speed. SPARC (Scalable Processor ARChitecture)
- is a 32 bits RISC architecture created by Sun in 1987. It's an open
- architecture, so that any manufaturer can make SPARC processors (like Philips,
- VLSI, T.I., Fujitsu... already did). Its key features are :
-
- - a load/store architecture : this means that only registers can be used
- in data manipulation operations, and not memory locations. Memory is
- organised in a linear address space of 2^32 bits which use "big-endian"
- organisation (the MSB is stored first) ; a word is 32 bits wide (a 16 bits
- data is a halfword).
-
- - a large number of registers : from 2 to 32 sets of 24 general purpose
- registers are available ; these 24 registers are local registers %l[0-7],
- in registers %i[0-7] and out registers %o[0-7], all working in an
- overlapping windows mechanism that will be explained later. The SPARC
- architecture also provides 8 global registers %g[0-7], 32 registers for
- floating-point operations (%f[0-31]) and some specific registers (%pc, %sp,
- %psr, %y,...).
-
- - a small set of simple instructions : to avoid translation from machine code
- to microcode, SPARC instructions are directly implemented in hardware, and
- therefore are very basic (mainly load/store, logical, arithmetic, branching).
- All instructions are 4 bytes long, and most of them use 3 registers (source1,
- source2, destination in that order). Assemblers also provide a set of
- synthetic instructions, which are more "coder friendly", but does not really
- exist for the processor (and therefore must be carefully used). These
- synthetic instructions have most of the time less operands, so that the
- corresponding real instructions often use %g0, a read-only register stuck
- at 0 ; here are some aliases :
-
- synthetic instruction | real opcode
- nop <=> sethi 0, %g0
- ret <=> jmpl %i7+8, %g0
- mov reg_or_imm, reg <=> or %g0, reg_or_imm, reg
- cmp reg, reg_or_imm <=> subcc reg, reg_or_imm, %g0
-
- Enough with theory, let's see some code.
-
-
- SPARC assembly basics
- -----------------------
- Let's start with an in-season "hello world" style program : '!' is used
- for single line comments, /* .. */ is used for multiple lines comments).
-
- !8<------------------------------------------------------------------------
- /* FILE : hello.s */
-
- .section ".rodata" ! read-only initialised datas
- .MyText: ! define our string label
- .asciz "Happy new year %i \n"! define a null-terminated string
- .Year:
- .word 1999 ! define a word constant
-
- .section ".text" ! read-only object code (instructions)
- .global main ! Make function name globally visible
-
- main:
- save %sp, -112, %sp ! allocate space for stack
- sethi %hi(.MyText), %o1 ! load higher part of string offset
- or %o1, %lo(.MyText), %o0 ! add lower part of offset
- set (.Year), %l1 ! get year address
- ld [%l1], %o1 ! load year into %o1
- call printf ! print the string
- nop ! do nothing (BDS)
-
- ret ! Return to caller
- restore ! Restore register windows (BDS)
- Endmain: ! Tell the linker how big the
- .size main,(.-main) ! procedure is ("." is current address).
- !8<-------------------------------------------------------------------------
-
- Every procedure must save some memory space for itself ; this stack space
- will be used to store the out and local registers and all the datas needed by
- the procedure (the minimal space is 64 bytes for %o and %l registers). The
- stack grows from higher to lower addresses, so that allocating a stack space
- is implemented by substracting a value from the current stack pointer ; the
- previous stack pointer is called the frame pointer (%fp).
-
- Registers %o0 - %o5 are used to pass the first six parameters to a procedure,
- because the current stack pointer (%sp) is stored in %o6 and the calling
- program counter (%pc, used to calculate the return address) is stored in %o7.
- If a procedure has more than six parameters, the remaining parameters are
- passed using the stack space (eg for a caller's stack space of 92 bytes, the
- child procedure can get the seventh parameter at [%fp+92]).
-
- As I said before, all instructions are 32 bits long, so that you must use
- two steps (with sethi and or) to load a 32 bits data. Note that %hi refers
- to the most significant 22 bits and %lo refers to the least significant 10
- bits of a register.
-
- Like most RISC machines, a SPARC processor uses a branch delay slot (BDS) to
- optimize pipeline efficiency : this means that by default, the instruction
- following a branching is executed regardless of whether or not the branch is
- taken. So the coder must move (when possible) an instruction from before the
- branch to after the branch. Another possibility is to use the 'nop' instruction
- or to add the ',a' suffix to the branch instruction, which annul the next
- operation.
-
-
- Calling and branching
- -----------------------
- Let's take another example to better illustrate the calling process : this
- is a recursive implementation of the Fibonacci numbers, which are defined as :
-
- fib(N) = fib(N-1) + fib(N-2)
- fib(0) = fib(1) = 1
-
- !8<--------------------------------------------------------------------------
- /* FILE: fib.s */
-
- .section ".rodata" ! read-only initialised datas (constants)
- .align 8 ! datas must be double-words aligned
- .MyText: ! define our string label
- .asciz "Fib(%i) = %i \n" ! define a null-terminated string
-
- ! ------- FIB : handles F(0) and F(1) --------
- .section ".text" ! read-only object code (instructions)
- .align 4 ! code must be word-aligned (4 bytes)
- .global fib ! Make function name globally visible
- fib:
- save %sp, -112, %sp ! save stack space
- mov %i0, %o0 ! 1st parameter may be needed for calling
- cmp %o0, 1 ! asked for F(0) of F(1) ?
- ble F1orF0 ! yes : take the branch
- mov 1, %i0 ! return value = 1 (BDS)
-
- call fibcall !
- nop ! do nothing (BDS)
- mov %o0, %i0 ! return value = fibcall return value
- F1orF0:
- ret ! Return to caller
- restore ! Restore register windows (BDS)
- Endfib:
- .size fib,(.-fib)
-
- ! ----- FIBCALL : calls F(N-1) and F(N-2) -----
- .global fibcall ! Make function name globally visible
- fibcall:
- save %sp,-112,%sp ! save stack space
- mov %i0,%l0 ! save N in %l0
- call fib ! call F(N-1)
- sub %l0,1,%o0 ! compute N-1 (BDS)
-
- mov %o0,%i0 ! save result in %i0
- call fib ! call F(N-2)
- sub %l0,2,%o0 ! compute N-2 (BDS)
-
- ret ! Return to caller
- restore %i0,%o0,%o0 ! return F(N-1) + F(N-2) (BDS)
- Endfibcall:
- .size fibcall,(.-fibcall)
-
- !-------- MAIN --------------------------------
- .global main ! Make function name globally visible
- main:
- save %sp,-112,%sp ! save stack space
- call fibcall ! calculate fib number 7
- mov 7,%o0 ! (BDS)
-
- mov %o0,%o2 ! result is second parameter
- sethi %hi(.MyText),%o0 ! load higher part of string offset
- or %o0,%lo(.MyText),%o0 ! add lower part of offset
- call printf
- mov 7,%o1 ! number is the first parameter (BDS)
-
- ret ! Return to caller
- restore ! Restore register windows (BDS)
- Endmain:
- .size main,(.-main)
- !8<--------------------------------------------------------------------------
-
- All procedures share the global registers (%g[0-7]), the remaining registers
- %l[0-7], %i[0-7], %o[0-7] constitute the register window. When a procedure
- starts execution, it allocates 16 registers (input and local), the output
- registers are overlapped with the subroutine's input registers. Here's what
- happens if procedure A calls procedure B which calls procedure C :
-
- proc A in | local | out |
- | |
- proc B | in | local | out |
- | |
- proc C | in | local | out
-
-
- As you see, for each procedure, the parameters passed to and received from a
- subroutine are stored in %o registers. The same way, the parameters taken from
- and passed to the calling procedure are stored in %i registers. The current
- window pointer (CWP) identifies the current register window : it is stored in
- the least significant 5 bits of %psr, and is modified by the 'save' and
- 'restore' commands.
-
- The condition code register (another part of %psr) contains four flags : Z
- (zero), N (negative), C (carry), and V (overflow) ; contrary to x86 assembly,
- these bits are not updated by standard arithmetic operations, but by special
- instructions (with 'cc' suffix, like 'cmp' which is in fact a 'subcc').
- For instance, you have these equivalences :
-
- SPARC x86
- subcc r1, r2, r1 <=> sub r1, r2 ; keep result and flags
- subcc r1, r2, %g0 <=> cmp r1, r2 ; discard result, keep flags
- sub r1, 0, r2 <=> mov r2, r1 ; keep result, discard flags
-
-
- As last example, here's a simple anti-cracking method : a checksum on
- our own code :
-
- !8<-----------------------------------------------------------------------
- /* FILE: cksum.s */
-
- .section ".data" ! read-only initialised datas (constants)
- .align 8 ! datas must be double-words aligned
- .CRCError:
- .asciz "Wrong CRC !! \n" !
-
- .section ".text" ! read-only object code (instructions)
- .align 4 ! code must be word-aligned (4 bytes)
- cksum:
- save %sp,-64,%sp ! save minimal stack space
- mov %i0, %l0 ! %l0 is the base address
- sub %i1, %i0, %l1 ! %l1 is the decreasing index
- mov %g0, %l2 ! %l2 is the running sum
- loop:
- ld [%l0+%l1], %o0 ! fetch the next element
- add %l2, %o0, %l2 ! add it to the running sum
- subcc %l1, 4, %l1 ! one fewer element
-
- bge,a loop ! if %o0 >= 0 get next element
- ! (delay slot result is annulled)
- mov %l2, %o0 ! store the result in sum (BDS)
-
- ret ! Return to caller
- restore ! Restore register windows (BDS)
- endcksum: ! Tell the linker how big the
- .size cksum,(.-cksum) ! procedure is.
-
- !-------------------- MAIN -------------------------
- .global main
- main:
- save %sp,-64,%sp ! allocate space for stack
- set main, %o0 ! start address for cksum
- sethi %hi(EndOfCRCZone),%o1 ! high part of end address
- call cksum ! calculate cksum
- or %o1,%lo(EndOfCRCZone),%o1 ! low part of end address (BDS)
-
- EndOfCRCZone:
- set 0x10954, %o1 ! load precalculated checksum
- cmp %o0, %o1 ! is checksum correct ?
- be End ! yes : exit
- nop ! do nothing (BDS)
-
- Error: ! no : display message
- sethi %hi(.CRCError ),%o0 ! load higher part of string offset
- call printf ! print the string
- or %o0,%lo(.CRCError),%o0 ! add lower part of string offset (BDS)
-
- End:
- ret ! Return to caller
- restore ! Restore register windows (BDS)
-
- EndMain: ! Tell the linker how big the
- .size main,(.-main) ! procedure is.
- !8<-----------------------------------------------------------------------
-
-
- Tools and references
- ----------------------
- Here are the tools I use when I play with SPARC assembly ; some are SunOS
- specific tools, some are multi-platforms ones. The code you read here has
- been tested on various Sun workstations (using SPARC and UltraSPARC
- processors). With very little modifications, it also worked on ISEM, a SPARC
- emulator for Linux (see below).
-
- - assembling : I use gcc for that job, which itself uses as and ld (assembler
- and linker) to create ELF executables. CC and cc also work well ; these 3
- compilers can also be used to generate ASM source code from C source code
- with the "-S" option, which is IMHO a great method to learn assembly on
- a new platform.
-
- - debugging : I use adb, which is very basic but also very powerful, but gdb
- and dbx may also work.
-
- - reversing : all the previous tools are useful ; I also use a disassembler
- (SunOS dis), but some exist for other platforms (see Bruce Ediger's
- homepage).
-
-
- If you plan to give a try to SPARC assembly, here are some links :
-
- http://www.cs.unm.edu/~maccabe/classes/341/labman/labman.html
- ISEM (Instructional Sparc EMulator) homepage.
-
- http://www.csn.net/~bediger/
- Bruce Ediger Homepage
-
- http://www.cs.earlham.edu/~mutioke/cs63/
- a good introduction to SPARC with plenty of links.
-
- http://www.sics.se/~psm/sparcstack.html
- a very good overview of SPARC stack and registers.
-
-
- Final Words
- ------------
- If you use a SPARC based machine, give a try to assembly, it's quite fun.
- If not, remember that the best you know your processor, the best you
- can code ASM.
-
-
- ::/ \::::::.
- :/___\:::::::.
- /| \::::::::.
- :| _/\:::::::::.
- :| _|\ \::::::::::.
- :::\_____\:::::::::::...........................................FEATURE.ARTICLE
- Extending NASM
- by mammon_
-
-
- Programmers transitioning to NASM from a commercial assembler such as MASM or
- TASM immediately notice the lack of any high-level language structures -- the
- assembly syntax accepted by NASM is only slightly more sophisticated than what
- you would find in a debugger. While this has its good side --smaller code size,
- nothing hidden from the programmer-- it does make coding a bit more tedious.
-
- For this reason NASM comes with a preprocessor that is both simple and powerful;
- by writing NASM macros, the high-level functionality of other assemblers can be
- emulated rather easily. As thw following macros will demonstrate, most of the
- high-level asm features in commercial assemblers really do not do anything very
- elaborate; they simply are more convenient for the programmer.
-
- The macros that I will detail below provide some basic C and ASM constructs for
- use in NASM. I have made the complete file available at
- http://www.eccentrica.org/Mammon/macros.asm
- The macro file can be included in a .asm file with the NASM directive
- %INCLUDE "macros.asm"
- Comments on the usage of each macro are included in the file.
-
- Macro Basics
- ------------
- The fundamenal structure of a NASM macro is
- %macro {macroname} {# parameters}
- %endmacro
- The actual code resides on the line between the %macro and %endmacro tags; this
- code will be inserted into your program wherever NASM finds {macroname}. Thus
- you could create a macro to push the contents of each register such as:
- %macro SAVE_REGS 0
- push eax
- push ebx
- push ecx
- push edx
- %endmacro
- Once you have defined this macro, you can use it in your code like:
- SAVE_REGS
- call ReadFile
- ...which the preprocessor will expand to
- push eax
- push ebx
- push ecx
- push edx
- call ReadFile
- before assembling. It should be noted that all preprocessing takes place in a
- single stage immediately before compiling starts; to preview what the pre-
- processor will send to the assembler, you can invoke nasm with the -e option.
-
- The %macro tag requires that you declare the number of paramters that will be
- passed to the macro. This can be a single number or a range, with a few quirks:
- %macro LilMac 0 ; takes 0 arguments
- %macro LilMac 5 ; takes 5 arguments
- %macro LilMac 0-3 ; takes 0-3 arguments
- %macro LilMac 1-* ; takes 1 to unlimited arguments
- %macro LilMac 1-2+ ; takes 1-2 arguments
- %macro LilMac 1-3 0, "OK" ; takes 1-3 arguments, 2-3 default to 0 & "OK"
- The last three examples bear some explanation. The "-*" operator in the %macro
- tag specifies that the macro can handle any number of parameters; in other
- words, there is no maximum number, and the minimum is whatever number is to the
- left of the "-*" operator. The "+" operator means that any additional arguments
- will be appended to the last argument instead of causing an error, so that:
- LilMac 0, OK, This argument is one too many
- will result in argument 1 being 0 and argument 2 being "OK, This argument is
- one too many." Note that this is a good way to pass commas as part of an argu-
- ment (normally they are only separators). Providing defualt arguments after the
- number of arguments allows a macro to be called with fewer arguments than it
- expects.
- %macro SAVE_VARS 1-4 ecx, ebx, eax
- will fill a missing 4th argument with eax, 3rd with ebx, and 2nd with ecx. Note
- that you have to provide defaults starting with the last argument and working
- backwards.
-
- The parameters to the macro are available as %1 for the first argument, %2 for
- the second, and so on, with %0 containing a count of all the arguments. There
- is an equivalent to the DOS "SHIFT" command called %rotate which will rotate
- the parameters to either the left or to the right depending on whether a
- positive or negative value was supplied:
- Before: %1 %2 %3 %4 Before: %1 %2 %3 %4 Before: %1 %2 %3 %4
- %rotate 1 %rotate -1 %rotate 2
- After: %4 %1 %2 %3 After: %2 %3 %4 %1 After: %3 %4 %1 %2
- So that rotating by 1 will put the value at %1 into %4, and rotating by -1 will
- put the value of %1 into %2.
-
-
- High-Level Calls
- ----------------
- Perhaps the buggest complaint about NASM is its primitive call syntax. In MASM
- and TASM, the parameters to a call may be appended to the call itself:
- call MessageBox, hOwner, lpszText, lpszTitle, fuStyle
- where in NASM the parameters must be pushed onto the stack prior to the call:
- push fuStyle
- push lpszTitle
- push lpszText
- push hOwner
- call MessageBox
- Using NASM's "-*" macro feature along with the %rep directive make a high-level
- call easy to replicate:
- %macro call 2-*
- %define _func %1
- %rep &0-1
- %rotate 1
- push %1
- %endrep
- call _func
- %endmacro
- The %define directive simply defines the variable _func [underscores should
- prefix variable names in macros so you do not mistakenly use the same name
- later in the program] as %1, the name of the function to call. The %rep and
- %endrep directives enclose the instructions to be repeated, and %rep takes as a
- parameter the number of repetitions [in this case set to the number of macro
- parameters minus 1]. Thus, the above macro cycles through the arguments to call
- and pushes them last-argument first [C syntax] before making the call.
-
- Overloading an existing instruction such as call will cause warnings at compile
- time [remember, the preprocessor thinks you are doing a recursive macro invoke]
- so usually you will want to name the macro "c_call" or something similar. The
- following macros provide facilities for C, Pascal, fastcall, and stdcall call
- syntaxes.
- ;==============================================================-High-Level Call
- ; ccall FuncName, param1, param2, param 3... ;Pascal: 1st-1st, no clean
- ; pcall FuncName, param1, param2, param 3... ;C: Last-1st, stack cleanup
- ; stdcall FuncName, param1, param2, param 3... ;StdCall: last-1st, no clean
- ; fastcall FuncName, param1, param2, param 3... ;FastCall: registers/stack
- %macro pcall 2-*
- %define _j %1
- %rep %0-1
- %rotate -1
- push %1
- %endrep
- call _j
- %endmacro
-
- %macro ccall 2-*
- %define _j %1
- %assign __params %0-1
- %rep %0-1
- %rotate -1
- push %1
- %endrep
- call _j
- %assign __params __params * 4
- add esp, __params
- %endmacro
-
- %macro stdcall 2-*
- %define _j %1
- %rep %0-1
- %rotate -1
- push %1
- %endrep
- call _j
- %endmacro
-
- %macro fastcall 2-*
- %define _j %1
- %assign __pnum 1
- %rep %0-4
- %rotate -1
- %if __pnum = 1
- mov eax, %1
- %elif __pnum = 2
- mov edx, %1
- %elif __pnum = 3
- mov ebx, %1
- %else
- push %1
- %endif
- %assign __pnum __pnum+1
- %endrep
- call _j
- %endmacro
- ;==========================================================================-END
-
-
- Switch-Case Blocks
- ------------------
- One of the most awkward C constructs to code in assembly is the SWITCH-CASE
- block. It is also rather difficult to re-create as a macro due to variable
- number and length of CASE statements.
-
- NASM's preprocessor has a context stack which allows you to create a set of
- local variables and addresses which is specific to a particular invocation of a
- macro. Thus it becomes possible to refer to labels which will be created in a
- future macro by giving them context-dependent names:
- %macro MacPart1 0
- %push mac ;create a context called "mac"
- jmp %$loc ;jump to context-specific label "loc"
- %endmacro
-
- %macro MacPart2 0
- %ifctx mac ;if we are in context 'mac'
- %$loc: ;define label 'loc'
- xor eax, eax ;code at this label...
- ret
- %endif ;end the if block
- %pop ;destroy the 'mac' context
- %endmacro
- As you can see, the context is created and named with a %push directive, and
- destroyed with a $pop directive. NASM has a number of preprocessor conditional
- IF/ELSE statements; in the above example, the %ifctx [if current context equals]
- directive is used to determine if a 'mac' context has been created [Note that
- the 'base' NASM conditionals include %if, %elif, %else, and %endif; these carry
- over to the %ifctx directive, such that there is available %ifctx, %ifnctx,
- %elifctx, %elifnctx, %else, and %endif; all %if directives must be closed with
- an %endif directive]. Finally, %$ is used to prefix the name of a context-
- specific variable or label. Non-context-specific local labels use the %% prefix:
- %macro LOOP_XOR
- %%loop:
- pop eax
- xor eax, ebx
- test eax, eax
- jnz %%loop
- %endmacro
-
- The SWITCH-CASE macro that follows uses the syntax:
- SWITCH Variable
- CASE Int
- BREAK
- CASE Int
- BREAK
- DEFAULT
- ENDSWITCH
- Which could be implemented as follows:
- card db 0 ;card_variable
- Jack EQU 11
- Queen EQU 12
- King EQU 13
- ...
- SWITCH card
- CASE Jack
- add edx, Jack
- BREAK
- CASE Queen
- add edx, Queen
- BREAK
- CASE King
- add edx, King
- BREAK
- DEFAULT
- add d, [card]
- ENDSWITCH
- Note that SWITCH moves the variable into eax and CASE moves the value into ebx.
- ;===========================================================-SWITCH-CASE Blocks
- %macro SWITCH 1
- %push switch
- %assign __curr 1
- mov eax, %1
- jmp %$loc(__curr)
- %endmacro
-
- %macro CASE 1
- %ifctx switch
- %$loc(__curr):
- %assign __curr __curr+1
- mov ebx, %1
- cmp eax, ebx
- jne %$loc(__curr)
- %endif
- %endmacro
-
- %macro DEFAULT 0
- %ifctx switch
- %$loc(__curr):
- %endif
- %endmacro
-
- %macro BREAK 0
- jmp %$endswitch
- %endmacro
-
- %macro ENDSWITCH 0
- %ifctx switch
- %$endswitch:
- %pop
- %endif
- %endmacro
- ;==========================================================================-END
-
-
- If-Then Blocks
- --------------
- While the preprocessor provides support for if-then directives, it is a slight
- bit of work to cause that to generate the equivalent assembly language 'if'
- code [ the preprocessor 'if' is resolved before compile time, not at run time].
- Using macros, you can create if-then blocks with the following structure:
- IF Value, Cond, Value
- ;if code here
- ELSIF Value, Cond, Value
- ;else-if code here
- ELSE
- ;else code here
- ENDIF
- An example being:
- IF [Passwd], e, [GoodVal] ;e == equals or je
- jmp Registered
- ELSE
- jmp FormatHardDrive
- ENDIF
- The trickiest part about this macro sequence is the 'Cond' parameter. NASM
- allows condition codes [the 'cc' in 'jcc' that you findin opcode refs] to be
- passed to macros; these condition codes are simply the 'jcc' with the 'j' cut
- off -- 'jnz' becomes 'nz', 'jne' becomes 'ne', 'je' becomes 'e', and so on.
- The reason for this is that the condition code is appended to a 'j' later in
- the macro:
- %macro Jumper %1 %2 %3 ;JUMPER Reg1, cc, Reg2
- cmp %1, %3
- j%+2 Gotcha
- jmp error
- %endmacro
- The above code appends %2 to the 'j' with the directive j%+2. Note that if you
- use j%- instead of j%+, NASM will insert the *inverse* condition code, so that
- jz becomes jnz, etc. For example, calling the macro
- %macro Jumper2 %1
- j%-1 JmpHandler
- %endmacro
- with the invocation 'Jumper2 nz' would assemble the code 'jz JmpHandler'.
-
- The condition codes can be a bit tricky to work with; it is advisable to add a
- sequence such as the following to the macro file:
- %define EQUAL e
- %define NOTEQUAL ne
- %define G-THAN g
- %define L-THAN l
- %define G-THAN-EQ ge
- %define L-THAN-EQ le
- %define ZERO z
- %DEFINE NOTZERO nz
- so that you could call the IF macro as follows:
- IF PassWd, EQUAL, GoodVal
- ;if code here
- ...etc etc. Note also that the IF-THEN-ELSE macros put the passed values into
- eax and ebx for compatison, so these registers will need to be preserved.
-
- ;===========================================================-IF-THEN-ELSE Loops
- %macro IF 3
- %push if
- %assign __curr 1
- mov eax, %1
- mov ebx, %3
- cmp eax, ebx
- j%+2 %%if_code
- jmp %$loc(__curr)
- %%if_code:
- %endmacro
-
- %macro ELSIF 3
- %ifctx if
- jmp %$end_if
- %$loc(__curr):
- %assign __curr __curr+1
- mov eax, %1
- mov ebx, %3
- cmp eax, ebx
- j%+2 %%elsif_code
- jmp %$loc(__curr)
- %%elsif_code:
- %else
- %error "'ELSIF' can only be used following 'IF'"
- %endif
- %endmacro
-
- %macro ELSE 0
- %ifctx if
- jmp %$end_if
- %$loc(__curr):
- %assign __curr __curr+1
- %else
- %error "'ELSE' can only be used following an 'IF'"
- %endif
- %endmacro
-
- %macro ENDIF 0
- %$loc(__curr):
- %$end_if:
- %pop
- %endmacro
- ;==========================================================================-END
-
- For/While Loops
- ---------------
- The DO...FOR and DO...WHILE do nothing differnet from the previous macros, but
- are simply a different application of the same principles. The syntax for
- calling these macros is:
- DO
- ;code to do here
- FOR min, Cond, max, step
-
- DO
- ;code to do here
- WHILE variable, Cond, value
- It is perhaps easiest to illustrate this by comparing the macros with C code.
- for( x = 0; x <= 100; x++) { SomeFunc() }
- Equates to:
- DO
- call SomeFunc
- FOR 0, l, 100, 1
- Likewise,
- for( x = 0; x != 100; x--) { SomeFunc() }
- Equates to:
- DO
- call SomeFunc
- FOR 0, e, 100, -1
- The WHILE macro is similar:
- while( CurrByte != BadAddr) {SomeFunc() }
- Equates to:
- DO
- call SomeFunc
- WHILE CurrByte, ne, BadAddr
- Once again, eax and ebx are used in the FOR and WHILE macros.
-
- ;====================================================-DO-FOR and DO-WHILE Loops
- %macro DO 0
- %push do
- jmp %$init_loop
- %$start_loop:
- push eax
- %endmacro
-
- %macro FOR 4
- %ifctx do
- pop eax
- add eax, %4
- cmp eax, %3
- j%-2 %%end_loop
- jmp %$start_loop
- %$init_loop:
- mov eax, %1
- jmp %$start_loop
- %%end_loop:
- %pop
- %endif
- %endmacro
-
- %macro WHILE 3
- %ifctx do
- pop eax
- mov ebx, %3
- cmp eax, ebx
- j%+2 %%end_loop
- jmp %$start_loop
- %$init_loop:
- mov eax, %1
- jmp %$start_loop
- %%end_loop:
- %pop
- %endif
- %endmacro
- ;==========================================================================-END
-
-
- Data Declarations
- -----------------
- Declaring data is relatively simple in assembly, but sometimes it helps to make
- code more clear if you create macros that assign meaningful data types to
- variables, even if those macros simply resolve to a DB or a DD. The following
- macros demonstrate this concept. They are invoked as follows:
- CHAR Name, String ;e.g. CHAR UserName, "Joe User"
- INT Name, Byte ;e.g. INT Timeout, 30
- WORD Name, Word ;e.g. WORD Logins
- DWORD Name, Dword ;e.g. DWORD Password
- Note that when invoked with a name but not a value, these macros create empty
- [DB 0] variables.
- ;============================================================-Data Declarations
- %macro CHAR 1-2 0
- %1: DB %2,0
- %endmacro
-
- %macro INT 1-2 0
- %1: DB %2
- %endmacro
-
- %macro WORD 1-2 0
- %1: DW %2
- %endmacro
-
- %macro DWORD 1-2 0
- %1: DD %2
- %endmacro
- ;==========================================================================-END
-
- Procedure Declarations
- ----------------------
- Procedure declarations are another matter of convenience. It is often useful in
- your code to clearly delineate the start and end of a procedure; each of the
- PROC macros below does that, as well as creating a stack fram for the procedure.
- The ENTRYPROC macro creates a procedure named 'main' and declares main as a
- global symbol; the standard PROC declares the provided name as global. These
- macros can be used as follows:
- PROC ProcName Parameter1, Parameter2, Parameter3
- ;procedure code here
- ENDP
-
- ENTRYPROC
- ;entry-procedure code here
- ENDP
- Note that the Parameters to PROC are set up to EQU to offsets from ebp, e.g.
- ebp-4, ebp-8, etc. I have also included support for local variables, which
- will EQU to positive offsets from ebp' these may be used as follows:
- PROC ProcName Parameter1, Parameter2, Parameter3...
- LOCALDD Dword_Variable
- LOCALDW Word_Variable
- LOCALDB Byte_Variable
- ;procedure code here
- ENDP
-
- ;=======================================================-Procedure Declarations
- %macro PROC 1-9
- GLOBAL %1
- %1:
- %assign _i 4
- %rep %0-1
- %2 equ [ebp-_i]
- %assign _i _i+4
- %rotate 1
- %endrep
- push ebp
- mov ebp, esp
- %push local
- %assign __ll 0
- %endmacro
-
- %macro ENDP 0
- %ifctx local
- %pop
- %endif
- pop ebp
- %endmacro
-
- %macro ENTRYPROC 0
- PROC main
- %endmacro
-
- %macro LOCALVAR 1
- sub esp, 4
- %1 equ [ebp + __ll]
- %endmacro
-
- %macro LOCALDB 1
- %assign __ll __ll+1
- LOCALVAR %1
- %endmacro
-
- %macro LOCALDW 1
- %assign __ll __ll+2
- LOCALVAR %1
- %endmacro
-
- %macro LOCALDD 1
- %assign __ll __ll+4
- LOCALVAR %1
- %endmacro
- ;==========================================================================-END
-
- Further Extension
- -----------------
- Continued experimentation will of course prove fruitful. It is recommended that
- you read/print out chapter 4 of the NASM manual for reference. In addition, it
- is very helpful to test your macros by cpmpiling the source with "nasm -e",
- which will output the preprocessed source code to stdout and will not compile
- the program.
-
-
- ____________________________________________________________________________
- ______ _____. ____ ```
- ._____/\______.________._________. ._\___ |__\_ /. \\
- | | | _ | | (_ | __/ |CE ,
- .=|_____/\______| |----)____| |______|______|======[ Win 32 ASM ]===.
- '===============| :==================================================='
- NASM specific Win32 coding
- by Tamas Kaproncai
-
-
- Contents
- ========
- 0. Preface
- 1. Compiling
- 2. Include files
- 3. Library files
- 4. Importing API functions
- 5. Calling API functions
- 6. WinMain
- 7. Window procedure
- 8. Sections
- 9. Self modification
-
-
- 0. Preface
- ==========
- I will introduce the win32 coding and I will focus on the NASM specific part.
-
- Downloadable working examples:
- ftp://ftp.szif.hu/pub/demos/tool/w32nasm.zip
- http://rs1.szif.hu/~tomcat/win32
-
- There is another tutorial on this topic, called:
- "The Win32 NASM Coding Toolkit v0.02 by Gij"
- that uses the LCC linker and the resource compiler which comes with LCC.
-
-
- 1. Compiling
- ============
- I'm working with the following free programs in connection with NASM:
- - linker: ALINK v1.5 by Anthony A.J. Williams.
- - resource compiler: GoRC v0.50b by Jeremy Gordon.
-
- The process of compiling a win32 program involves a number of steps which can
- be divided into three main processes: preparing the include files, preparing
- the library files, and writing the actual program.
-
- The compiling flow chart
- ------------------------
- .h -> ? -> .inc
- \
- .asm -> NASM -> .obj
- \
- .rc -> GORC -> .res -> ALINK -> .exe
- /
- .dll -> IMPLIB -> .lib (? means handwork)
-
-
- 2. Include files
- ================
- The include files (*.inc) must be generated from existing header files (*.h)
- that come with win32-compatible C or Pascal compilers. Files needed:
- WIN32N.INC (Thanks for the inital MASM version to S.L.Hutchesson).
- The compiler will be NASM version 0.97
- http://www.cryogen.com/nasm
- Usage: nasmw -fobj -w+orphan-labels -pwin32n.inc %1.asm
-
-
- 3. Library files
- ================
- Files Needed:
- WIN32.LIB
- The linker will be ALINK
- http://www.geocities.com/SiliconValley/Network/4311/#alink
- Usage: alink -oPE %1 win32.lib %1.res %2 %3
-
- More lib files can be created with IMPLIB.
- Example: IMPLIB DDRAW.DLL
-
-
- 4. Importing API functions
- ==========================
- EXTERN MessageBoxA
- IMPORT MessageBoxA use32.dll
-
-
- 5. Calling API functions
- ========================
- PUSH UINT MB_OK
- PUSH LPCTSTR title1
- PUSH LPCTSTR string1
- PUSH HWND NULL
- CALL [MessageBoxA]
-
-
- 6. WinMain
- ==========
- You don't need to use the name, WinMain:
- You must start the program with the label, ..start:
-
- At the begening there is nothing special in the stack, so you should call
- GetModuleHandleA for hInstance and GetCommandLineA for the command line.
- (Command line consists the full path, the file name and the parameters).
-
- You can exit the program with: RETN
- or you should call the ExitProcess function:
- PUSH UINT 0 ; the error code
- CALL [ExitProcess]
-
-
- 7. Window procedure
- ===================
- There are four parameters on the top of the stack:
- PUSH EBP
- MOV EBP,ESP
- %DEFINE hwnd EBP+8 ;handle of window
- %DEFINE message EBP+12 ;message
- %DEFINE wParam EBP+16 ;first message parameter
- %DEFINE lParam EBP+20 ;second message parameter
-
- You can handle the messages depends on WPARAM [wParam]
- and the rest you can pass to DefWindowProcA:
- PUSH LPARAM [lParam]
- PUSH WPARAM [wParam]
- PUSH UINT [message]
- PUSH HWND [hwnd]
- CALL [DefWindowProcA]
- POP EBP
- RETN 16
-
-
- 8. Sections
- ===========
- You need a code section:
- SECTION CODE USE32 CLASS=CODE
- and a data section:
- SECTION DATA USE32 CLASS=DATA
-
- You don't need bss section, instead of you should append
- every RESB, RESW, RESD, RESQ to the end of the source code.
- This zero data not will be included to the exe file.
-
-
- 9. Self modification
- ====================
- You can include your code and data together in one section:
- SECTION CODE USE32 CLASS=CODE
-
- In that case you need another object file, with only one line source:
- SECTION CODE USE32 CLASS=DATA
-
- ALINK will combine the properties of these two sections.
-
- EXTERN MessageBoxA
- EXTERN ExitProcess
-
- SECTION CODE USE32 CLASS=CODE
- ..start:
-
- PUSH UINT MB_OK
- PUSH LPCTSTR title1
- PUSH LPCTSTR string1
- PUSH HWND NULL
- CALL MessageBoxA
-
- PUSH UINT NULL
- CALL ExitProcess
-
- SECTION DATA USE32 CLASS=DATA
- string1: db 'Hello world!',13,10,0
- title1: db 'Hello',0
-
-
-
- ____________________________________________________________________________
- ______ _____. ____ ```
- ._____/\______.________._________. ._\___ |__\_ /. \\
- | | | _ | | (_ | __/ |CE ,
- .=|_____/\______| |----)____| |______|______|======[ Win 32 ASM ]===.
- '===============| :==================================================='
- More about Text
- by Iczelion
-
-
- We will experiment more with text attributes, ie. font and color.
-
- Preliminary:
- ------------
-
- Windows color system is based on RGB values, R=red, G=Green, B=Blue. If you
- want to specify a color in Windows, you must state your desired color in
- terms of these three major colors. Each color value has a range from 0 to
- 255 (a byte value). For example, if you want pure red color, you should use
- 255,0,0. Or if you want pure white color, you must use 255,255,255. You can
- see from the examples that getting the color you need is very difficult
- with this system since you have to have a good grasp of how to mix and
- match colors.
-
- For text color and background, you use SetTextColor and SetBkColor, both of
- them require a handle to device context and a 32-bit RGB value. The 32-bit
- RGB value's structure is defined as:
-
- RGB_value struct
- unused db 0
- blue db ?
- green db ?
- red db ?
- RGB_value ends
-
- Note that the first byte is not used and should be zero. The order of the
- remaining three bytes is reversed,ie. blue, green, red. However, we will
- not use this structure since it's cumbersome to initialize and use. We will
- create a macro instead. The macro will receive three parameters: red, green
- and blue values. It'll produce the desired 32-bit RGB value and store it in
- eax. The macro is as follows:
-
- RGB macro red,green,blue
- xor eax,eax
- mov ah,blue
- shl eax,8
- mov ah,green
- mov al,red
- endm
-
- You can put this macro in the include file for future use.
-
- You can "create" a font by calling CreateFont or CreateFontIndirect. The
- difference between the two functions is that CreateFontIndirect receives
- only one parameter: a pointer to a logical font structure, LOGFONT.
- CreateFontIndirect is the more flexible of the two especially if your
- programs need to change fonts frequently. However, in our example, we will
- "create" only one font for demonstration, we can get away with CreateFont.
- After the call to CreateFont, it will return a handle to a font which you
- must select into the device context. After that, every text API function
- will use the font we have selected into the device context.
-
-
- Content:
- --------
-
- Below is our source code:
- ;======================================================================TEXT.ASM
- include windows.inc
- includelib user32.lib
- includelib kernel32.lib
- includelib gdi32.lib
-
- RGB macro red,green,blue
- xor eax,eax
- mov ah,blue
- shl eax,8
- mov ah,green
- mov al,red
- endm
-
- .data
- ClassName db "SimpleWinClass",0
- AppName db "Our First Window",0
- TestString db "Win32 assembly is great and easy!",0
- FontName db "script",0
-
- .data?
- hInstance HINSTANCE ?
- CommandLine LPSTR ?
-
- .code
- start:
- invoke GetModuleHandle, NULL
- mov hInstance,eax
- invoke GetCommandLine
- invoke WinMain, hInstance,NULL,CommandLine, SW_SHOWDEFAULT
- invoke ExitProcess,eax
-
- WinMain proc
- hInst:HINSTANCE,hPrevInst:HINSTANCE,CmdLine:LPSTR,CmdShow:SDWORD
- LOCAL wc:WNDCLASSEX
- LOCAL msg:MSG
- LOCAL hwnd:HWND
- mov wc.cbSize,SIZEOF WNDCLASSEX
- mov wc.style, CS_HREDRAW or CS_VREDRAW
- mov wc.lpfnWndProc, OFFSET WndProc
- mov wc.cbClsExtra,NULL
- mov wc.cbWndExtra,NULL
- push hInstance
- pop wc.hInstance
- mov wc.hbrBackground,COLOR_WINDOW+1
- mov wc.lpszMenuName,NULL
- mov wc.lpszClassName,OFFSET ClassName
- invoke LoadIcon,NULL,IDI_APPLICATION
- mov wc.hIcon,eax
- mov wc.hIconSm,0
- invoke LoadCursor,NULL,IDC_ARROW
- mov wc.hCursor,eax
- invoke RegisterClassEx, addr wc
- invoke CreateWindowEx,NULL,ADDR ClassName,ADDR AppName,\
- WS_OVERLAPPEDWINDOW,CW_USEDEFAULT,\
- CW_USEDEFAULT,CW_USEDEFAULT,CW_USEDEFAULT,NULL,NULL,\
- hInst,NULL
- mov hwnd,eax
- invoke ShowWindow, hwnd,SW_SHOWNORMAL
- invoke UpdateWindow, hwnd
- .WHILE TRUE
- invoke GetMessage, ADDR msg,NULL,0,0
- .BREAK .IF (!eax)
- invoke TranslateMessage, ADDR msg
- invoke DispatchMessage, ADDR msg
- .ENDW
- mov eax,msg.wParam
- ret
- WinMain endp
-
- WndProc proc hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM
- LOCAL hdc:HDC
- LOCAL ps:PAINTSTRUCT
- LOCAL hfont:HFONT
-
- mov eax,uMsg
- .IF eax==WM_DESTROY
- invoke PostQuitMessage,NULL
- .ELSEIF eax==WM_PAINT
- invoke BeginPaint,hWnd, ADDR ps
- mov hdc,eax
- invoke CreateFont,24,16,0,0,400,0,0,0,OEM_CHARSET,\
-
- OUT_DEFAULT_PRECIS,CLIP_DEFAULT_PRECIS,\
- DEFAULT_QUALITY,DEFAULT_PITCH or
- FF_SCRIPT,\
- ADDR FontName
- invoke SelectObject, hdc, eax
- mov hfont,eax
- RGB 200,200,50
- invoke SetTextColor,hdc,eax
- RGB 0,0,255
- invoke SetBkColor,hdc,eax
- invoke TextOut,hdc,0,0,ADDR TestString,SIZEOF TestString
- invoke SelectObject,hdc, hfont
- invoke EndPaint,hWnd, ADDR ps
- .ELSE
- invoke DefWindowProc,hWnd,uMsg,wParam,lParam
- ret
- .ENDIF
- xor eax,eax
- ret
- WndProc endp
-
- end start
- ;===========================================================================EOF
-
- Let's begin our analysis : )
-
- invoke CreateFont,24,16,0,0,400,0,0,0,OEM_CHARSET,\
- OUT_DEFAULT_PRECIS,CLIP_DEFAULT_PRECIS,\
- DEFAULT_QUALITY,DEFAULT_PITCH or FF_SCRIPT,\
- ADDR FontName
-
- CreateFont creates a logical font that is the closest match to the given
- parameters and the font data available. This function has more parameters
- than any other function in Windows. It returns a handle to logical font to
- be used by SelectObject function. We will examine its parameters in detail.
-
- HFONT CreateFont(int nHeight, int nWidth, int nEscapement, int
- nOrientation, int nWeight, BYTE cItalic, BYTE cUnderline, BYTE cStrikeOut,
- BYTE cCharSet, BYTE cOutputPrecision, BYTE cClipPrecision, BYTE cQuality,
- BYTE cPitchAndFamily, LPSTR lpFacename);
-
- nHeight --> The desired height of the characters . 0 means use default size.
- nWidth --> The desired width of the characters. Normally this value should be
- 0 which allows Windows to match the width to the height. However, in our
- example, the default width makes the characters hard to read, so I use the
- width of 16 instead.
- nEscapement --> Specifies the orientation of the next character output
- relative to the previous one in tenths of a degree. Normally, set to 0. Set
- to 900 to have all the characters go upward from the first character, 1800
- to write backwards, or 2700 to write each character from the top down.
- nOrientation --> Specifies how much the character should be rotated when
- output in tenths of a degree. Set to 900 to have all the characters lying
- on their backs, 1800 for upside-down writing, etc.
- nWeight --> Sets the line thickness of each character. Windows defines the
- following sizes:
-
- FW_DONTCARE equ 0
- FW_THIN equ 100
- FW_EXTRALIGHT equ 200
- FW_ULTRALIGHT equ 200
- FW_LIGHT equ 300
- FW_NORMAL equ 400
- FW_REGULAR equ 400
- FW_MEDIUM equ 500
- FW_SEMIBOLD equ 600
- FW_DEMIBOLD equ 600
- FW_BOLD equ 700
- FW_EXTRABOLD equ 800
- FW_ULTRABOLD equ 800
- FW_HEAVY equ 900
- FW_BLACK equ 900
-
- cItalic --> 0 for normal, any other value for italic characters.
- cUnderline --> 0 for normal, any other value for underlined characters.
- cStrikeOut --> 0 for normal, any other value for characters with a line
- through the center.
- cCharSet --> The character set of the font. Normally should be OEM_CHARSET
- which allows Windows to select font which is operating system-dependent.
- cOutputPrecision --> Specifies how much the selected font must be closely
- matched to the characteristics we want. Normally should be
- OUT_DEFAULT_PRECIS which defines default font mapping behavior.
- cClipPrecision --> Specifies the clipping precision. The clipping precision
- defines how to clip characters that are partially outside the clipping
- region. You should be able to get by with CLIP_DEFAULT_PRECIS which defines
- the default clipping behavior.
- cQuality -->Specifies the output quality. The output quality defines how
- carefully GDI must attempt to match the logical-font attributes to those of
- an actual physical font. There are three choices: DEFAULT_QUALITY,
- PROOF_QUALITY and DRAFT_QUALITY.
- cPitchAndFamily --> Specifies pitch and family of the font. You must combine
- the pitch value and the family value with "or" operator.
- lpFacename A pointer to a null-terminated string that specifies the
- typeface of the font.
-
- The description above is by no means comprehensive. You should refer to
- your Win32 API reference for more details.
-
- invoke SelectObject, hdc, eax
- mov hfont,eax
-
- After we get the handle to the logical font, we must use it to select the
- font into the device context by calling SelectObject. SelectObject puts the
- new GDI objects such as pens, brushs, and fonts into the device context to
- be used by GDI functions. It returns the handle to the replaced object
- which we should save for future SelectObject call. After SelectObject call,
- any text output function will use the font we just selected into the device
- context.
-
- RGB 200,200,50
- invoke SetTextColor,hdc,eax
- RGB 0,0,255
- invoke SetBkColor,hdc,eax
-
- Use RGB macro to create a 32-bit RGB value to be used by SetColorText and
- SetBkColor.
-
- invoke TextOut,hdc,0,0,ADDR TestString,SIZEOF TestString
-
- Call TextOut function to draw the text on the client area. The text will be
- in the font and color we specified previously. The syntax of TextOut is as
- follows:
-
- BOOL TextOut(
-
- HDC hdc, // handle of device context
- int nXStart, // x-coordinate of starting position
- int nYStart, // y-coordinate of starting position
- LPCTSTR lpString, // address of string
- int cbString // number of characters in string
- );
-
- invoke SelectObject,hdc, hfont
-
- When we are through with the font, we should restore the old font back into
- the device context. You should always restore the object that you replaced
- in the device context.
-
-
- ____________________________________________________________________________
- ______ _____. ____ ```
- ._____/\______.________._________. ._\___ |__\_ /. \\
- | | | _ | | (_ | __/ |CE ,
- .=|_____/\______| |----)____| |______|______|======[ Win 32 ASM ]===.
- '===============| :==================================================='
- Keyboard Input
- by Iczelion
-
-
- We will learn how a Windows program receives keyboard input.
-
- Preliminiary:
- ------------
-
- Since there's only one keyboard in each PC, all running Windows programs
- must share it between them. Windows is responsible for sending the key
- strokes to the window which has the input focus.
-
- Although there may be several windows on the screen, only one of them has
- the input focus. The window which has input focus is the only one which can
- receive key strokes. You can differentiate the window which has input focus
- from other windows by looking at the title bar which is highlighted.
- Actually, there are two main types of keyboard message. You can view a
- keyboard as a group of keys. For example, if you press the "a" key, Windows
- sends a WM_KEYDOWN message to the window which has input focus, notifying
- that a key is pressed. When you release the key, Windows sends a WM_KEYUP
- message. In this case, you treat a key as a button. Another way to look at
- the keyboard is that it's a character input device. When you press "a" key,
- Windows sends a WM_CHAR message to the window which has input focus,
- telling it that the user sends "a" character to it. In fact, Windows sends
- WM_KEYDOWN, WM_CHAR, and WM_KEYUP messages to the window which has input
- focus. The window procedure may decide to process all three messages or
- only the messages it's interested in. Most of the time, you can ignore
- WM_KEYDOWN and WM_KEYUP since TranslateMessage function call in the message
- loop translate WM_KEYDOWN and WM_KEYUP messages to a WM_CHAR message. We
- will focus on WM_CHAR in this tutorial.
-
-
- Content:
- -------
- ;=======================================================================KEY.ASM
- include windows.inc
- includelib user32.lib
- includelib kernel32.lib
- includelib gdi32.lib
-
- .data
- ClassName db "SimpleWinClass",0
- AppName db "Our First Window",0
- char WPARAM 20h ; the character the program receives from keyboard
-
- .data?
- hInstance HINSTANCE ?
- CommandLine LPSTR ?
-
- .code
- start:
- invoke GetModuleHandle, NULL
- mov hInstance,eax
- invoke GetCommandLine
- invoke WinMain, hInstance,NULL,CommandLine, SW_SHOWDEFAULT
- invoke ExitProcess,eax
-
- WinMain proc
- hInst:HINSTANCE,hPrevInst:HINSTANCE,CmdLine:LPSTR,CmdShow:SDWORD
- LOCAL wc:WNDCLASSEX
- LOCAL msg:MSG
- LOCAL hwnd:HWND
- mov wc.cbSize,SIZEOF WNDCLASSEX
- mov wc.style, CS_HREDRAW or CS_VREDRAW
- mov wc.lpfnWndProc, OFFSET WndProc
- mov wc.cbClsExtra,NULL
- mov wc.cbWndExtra,NULL
- push hInstance
- pop wc.hInstance
- mov wc.hbrBackground,COLOR_WINDOW+1
- mov wc.lpszMenuName,NULL
- mov wc.lpszClassName,OFFSET ClassName
- invoke LoadIcon,NULL,IDI_APPLICATION
- mov wc.hIcon,eax
- mov wc.hIconSm,0
- invoke LoadCursor,NULL,IDC_ARROW
- mov wc.hCursor,eax
- invoke RegisterClassEx, addr wc
- invoke CreateWindowEx,NULL,ADDR ClassName,ADDR AppName,\
- WS_OVERLAPPEDWINDOW,CW_USEDEFAULT,\
- CW_USEDEFAULT,CW_USEDEFAULT,CW_USEDEFAULT,NULL,NULL,\
- hInst,NULL
- mov hwnd,eax
- invoke ShowWindow, hwnd,SW_SHOWNORMAL
- invoke UpdateWindow, hwnd
- .WHILE TRUE
- invoke GetMessage, ADDR msg,NULL,0,0
- .BREAK .IF (!eax)
- invoke TranslateMessage, ADDR msg
- invoke DispatchMessage, ADDR msg
- .ENDW
- mov eax,msg.wParam
- ret
- WinMain endp
-
- WndProc proc hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM
- LOCAL hdc:HDC
- LOCAL ps:PAINTSTRUCT
-
- mov eax,uMsg
- .IF eax==WM_DESTROY
- invoke PostQuitMessage,NULL
- .ELSEIF eax==WM_CHAR
- push wParam
- pop char
- invoke InvalidateRect, hWnd,NULL,TRUE
- .ELSEIF eax==WM_PAINT
- invoke BeginPaint,hWnd, ADDR ps
- mov hdc,eax
- invoke TextOut,hdc,0,0,ADDR char,1
- invoke EndPaint,hWnd, ADDR ps
- .ELSE
- invoke DefWindowProc,hWnd,uMsg,wParam,lParam
- ret
- .ENDIF
- xor eax,eax
- ret
- WndProc endp
- end start
- ;===========================================================================EOF
-
- Let's analyze it:
-
- char WPARAM 20h ; the character the program
- receives from keyboard
-
- This is the variable that stores the character received from the keyboard.
- Since the character is sent in WPARAM of the window procedure, we define
- the variable as type WPARAM for simplicity. The initial value is 20h or the
- space since when our window refreshes its client area the first time, there
- is no character input. So we want to display space instead.
-
- .ELSEIF eax==WM_CHAR
- push wParam
- pop char
- invoke InvalidateRect, hWnd,NULL,TRUE
-
- This is added in the window procedure to handle the WM_CHAR message. It
- just puts the character into the variable named "char" and then calls
- InvalidateRect. InvalidateRect makes a specified rectangle in the client
- area invalid which forces Windows to send WM_PAINT message to the window
- procedure. Its syntax is as follows:
-
- BOOL InvalidateRect(
- HWND hWnd, // handle of window with changed update region
- CONST RECT * lpRect, // address of rectangle coordinates
- BOOL bErase // erase-background flag
- );
-
- lpRect is a pointer to the rectagle in the client area that we want to
- declare invalid. If this parameter is null, the entire client area will be
- marked as invalid.
- bErase is a flag telling Windows if it needs to erase the background. If
- this flag is TRUE, then Windows will erase the backgroud of the invalid
- rectangle when BeginPaint is called.
-
- So the strategy we used here is that: we store all necessary information
- about how to paint the client area and generate WM_PAINT message to paint
- the client area. Of course, the codes in WM_PAINT section must know
- beforehand what's expected of them. This seems a roundabout way of doing
- things but it's the way of Windows.
-
- Actually we can paint the client area during processing WM_CHAR message by
- calling GetDC and ReleaseDC pair. There is no problem there. But the fun
- begins when our window needs to repaint its client area. Since the codes
- that paint the character are in WM_CHAR section, the window procedure will
- not be able to repaint our character in the client area. So the bottom line
- is: put all necessary data and codes that do painting in WM_PAINT. You can
- send WM_PAINT message from anywhere in your code anytime you want to
- repaint the client area.
-
- invoke TextOut,hdc,0,0,ADDR char,1
-
- When InvalidateRect is called, it sends a WM_PAINT message back to the
- window procedure. So the codes in WM_PAINT section is called. It calls
- BeginPaint as usual to get the handle to device context and then call
- TextOut which draws our character in the client area at x=0, y=0. When you
- run the program and press any key, you will see that character echo in the
- upper left corner of the client window. And when the window is minimized
- and maximized again, the character is still there since all the codes and
- data essential to repaint are all gathered in WM_PAINT section.
-
-
- ____________________________________________________________________________
- ::::::::::.___ . ```
- ::::::::::| _/__. |__ ____ . __. ____ ____ __. \\
- :::::: |____ | __/_ _\_ (.___| .___) |__\_ (._) /___) | ,
- ::::::::::/ / | \ | - | \ | - | - | \/| - |
- .=:::::::::/______|_____|_____| (___|_____|______|____|_____|===============.
- '=::::::::::==================| . ____ | (____====[ The C Standard lib ]==='
- :::::::::: | |------| - |
- :::::::::: | |______|______|CE
- . :
- C string functions: introduction, _strlen
- by Xbios2
-
-
- I. INTRODUCTION
- ---------------
- Beware: this is going to be long...
-
- String handling in assembly is - anyway - a difficult subject. There are few
- string-oriented x86 opcodes, and most of them are slow. There is not a standard
- library providing even basic functions. There is no string specific syntax in
- assembly, like C's printf('hello world') or, even worse, BASIC's a$=b$+'hello'.
- In a few words, if easy string-related programming is your goal, maybe you
- should consider PERL, or another text-manipulation language.
-
- Yet, string functions are really needed, since almost any program in assembly
- uses text for I/O. (An alternative to this would be using animated paper-clips
- to communicate with the user :)).
-
- Furthermore, coding those functions in assembly allows for smaller and faster
- functions. Actually many of the string functions in C _were_ written in
- assembly (e.g. strlen, strcat, strcpy, etc). Those can be divided in two
- categories:
-
- -'Traditional' functions, using the x86 string instructions
- -'Modern' functions, which run faster by being Pentium-optimized
-
- Borland C++ 4.02 and KERNEL32.DLL only have traditional functions. Borland's
- C++ Builder v1.0 (once given free as a demo) includes both types. MSVCRT.DLL
- (version 5) contains 'modern' versions.
-
- The three main aspects considered in these articles (and generally when compar-
- ing different versions of the same function) are speed, size and common sense.
-
- 'Common sense' indicates how easy it is to understand the way a function
- operates by reading the source code, how 'elegant' the code is. In a library
- module distributed as a binary (in a 'static' reuse of code), common sense is
- not important. It becomes important when the source code is distributed too,
- because it allows 'dynamic' reuse. 'Elegant' code can be easily optimized for
- specific needs or expanded to become a more general function.
-
- 'Size' is, obviously, the size of the resulting code. Besides creating smaller
- files, small size has two interesting 'side-effects'. It (usually) creates more
- elegant code and faster code (it decreases k, but it usually increases l (for
- an explanation of k and l see 'speed'). For very small functions like strlen it
- has the added advantage of allowing the code to be inlined without wasting too
- much space, thus decreasing k even more.
-
- 'Speed' indicates the number of cycles needed to execute the function. For
- simple string functions the number of cycles needed can be expressed as
-
- c=k+l*n
-
- where c is the total number of cycles, k is the number of cycles needed to
- 'prepare' the function, l is the number of cycles needed to process each chara-
- cter and n is the number of characters in the string. It is obvious that small
- values of c mean faster execution. In order to compare two versions of a
- function that run at speeds of
-
- c1=k1+l1*n and c2=k2+l2*n
-
- the ratio of c1/c2 is calculated:
-
- c1 k1+l1*n
- r=----=---------
- c2 k2+l2*n
-
- if r=1 then both versions run at the same speed.
- if r>1 then version 2 is faster. if r<1 then version 1 is faster.
- Simple maths prove that:
-
- 1. When n becomes infinite, r becomes equal to l1/l2. Especially if l1=l2,
- then r=1
-
- 2. If k1<k2 but l1>l2, c1<c2 (version 1 is faster) if n<(k2-k1)/(l1-l2).
-
- Point 1 means that for long strings speed is (almost) independent of the value
- of k. Especially if l1=l2 both versions will run at (almost) the same speed.
- Point 2 means that for small strings k strongly affects the value of c.
-
- For those of you that are fed up with maths, here is a simple example that
- demonstrates what I've been trying to say all this time :)
-
- If version 1 runs at c1=10+3*n and version 2 at c2=30+1*n then:
-
- -For strings up to 9 chars version 1 is faster
- -For strings of 10 chars both versions run at the same speed
- -For strings of 11 or more version 2 is faster
- -For strings of 50 chars, version 2 is 2x faster than version 1
- -For strings of 770 chars version 2 is 2.9x faster than version 1
-
- The problem is that none of the above versions can be classified as better than
- the other. Think of the parser of a compiler. It receives as input lines from a
- text file, which are strings longer than 10 characters, but also has to deal
- with tokens, which are short strings (in an assembler, three-char tokens are
- very common).
-
- Keep in mind that, while l depends only on the method used to implement the
- function, k also depends on the 'push arg/call/prepare stack/resore stack/
- ret/get arg' times. So if n is low, overall speed can be increased by inlining
- the code, thus subtracting from k the time needed to call the function.
-
- Well, I think you've had enough. Let's see all this stuff in practice.
-
-
- II. THE _STRLEN FUNCTION
- ------------------------
- Attention: especially for _strlen, ALL versions I have either written or found
- in libraries will be explained. This means you'll get source code for 8
- functions...
-
- size_t strlen(const char *s);
-
- Calculates the length of a string. strlen returns the number of characters in
- s, not counting the null-terminating character.
-
- _strlen is the simplest of the string functions. The 'traditional' way to
- implement it is through 'repne scasb'. BC 4.02 implements it as:
-
- ; ------------ version 1 ------------
- ; Borland C++ 4.02
- ; 25 bytes
- ; c=27+4*n
-
- _strlen proc near
- push ebp
- mov ebp, esp
- push edi
- mov edi, [ebp+8]
- mov ecx, -1
- xor al, al
- cld
- repne scasb
- not ecx
- lea eax, [ecx-1]
- pop edi
- pop ebp
- retn
- _strlen endp
- ; -----------------------------------
-
- A shorter, and a bit faster version of this would be:
-
- ; ------------ version 2 ------------
- ; Improved 'repne scasb'
- ; 18 bytes
- ; c=21+4*n
-
- _strlen proc near
- xor eax, eax
- push edi
- mov edi, [esp+8]
- or ecx, -1
- repne scasb
- sub eax, 2
- pop edi
- sub eax, ecx
- retn
- _strlen endp
- ; -----------------------------------
-
- The win32 API also includes a strlen function, called lstrlenA. It is based on
- 'repne scasb' as well, but you are _strongly_ advised to avoid it. It runs at
- c=56+4*n cycles.
-
- The most 'common sense' function (IMHO) is also the smallest:
-
- ; ------------ version 3 ------------
- ; Elegant and very small
- ; 15 bytes
- ; c=27+4*n
- ; k is so big because we have a retn immediately after a jump
- ; if a nop is added between those two, k drops to 13
-
- _strlen proc near
- or eax, -1
- mov ecx, [esp+4]
- loop1: inc eax
- cmp byte ptr [ecx+eax], 0
- jnz short loop1
- ; nop
- retn
- _strlen endp
- ; -----------------------------------
-
- Which gets a little less elegant but faster if tweaked a little:
- (The trick is that the carry flag is set by the 'cmp' instruction if the byte
- read is 0, else it is cleared. The 'inc' instruction doesn't affect the carry
- flag).
-
- ; ------------ version 4 ------------
- ; Very small and faster than repne scasb
- ; 15 bytes
- ; c=12+3*n
-
- _strlen proc near
- mov ecx, [esp+4]
- xor eax, eax
- loop1: cmp byte ptr [ecx+eax], 1
- inc eax
- jnc short loop1
- dec eax
- retn
- _strlen endp
- ; -----------------------------------
-
- But it gets even better as inlined code, as a macro:
-
- ; ------------ version 4.5 ------------
- ; Very small, extremely elegant macro
- ; 10 bytes
- ; c=8+3*n
-
- strlen macro srcreg, cntreg
- xor cntreg, cntreg
- cmp byte ptr [srcreg+cntreg], 1
- inc cntreg
- jnc $-5
- dec eax
- endm
- ; -----------------------------------
-
- This macro returns in cntreg the length of the string at srcreg.
- It uses no other registers, srcreg is unchanged, it is only 10 bytes long and
- it runs at a speed of only 8+3*n cycles. It also returns its value in any
- register, without altering the other registers.
-
- Suppose we need in ecx the length of the string at esi. The following code:
-
- push esi
- call _strlen
- pop ecx ; restore stack
- mov ecx, eax
-
- takes 9 bytes, only one less than the macro version. Plus, of course, the at
- least 15 bytes of code in _strlen.
-
- Another 'elegant' version, which is also small and much faster is the following:
-
- ; ------------ version 5 ------------
- ; Elegant, small and fast
- ; 16 bytes
- ; c=12+2*n
-
- _strlen proc near
- mov ecx, [esp+4]
- xor eax, eax
- loop1: mov dl, [ecx+eax]
- inc eax
- or dl, dl
- jnz short loop1
- dec eax
- retn
- _strlen endp
- ; -----------------------------------
-
- I believe that version 5 is the best version that could have elegance, speed
- and small size together. It can also be converted to a macro and inlined to
- drop to a speed of c=8+2*n (it will use one register more, but this register
- would anyway be lost if a call to the function was made).
-
- It also has what I believe is the smallest value of k. However, it doesn't have
- the smallest value of l. To reduce the cycles needed, data can be read not byte
- after byte but dword after dword. Here is a routine given by Agner Fog in his
- document on Pentium optimization (which you are _strongly_ advised to read):
-
- ; ------------ version 6 ------------
- ; [by Agner Fog] Very fast
- ; 61 bytes
- ; c=18+1*n (not exactly, as data is read in 4 byte blocks)
-
- _strlen proc
- mov eax, [esp+4] ; get pointer
- mov edx, 7
- add edx, eax ; pointer+7 used in the end
- push ebx
- mov ebx, [eax] ; read first 4 bytes
- add eax, 4 ; increment pointer
- l1: lea ecx, [ebx-01010101h] ; subtract 1 from each byte
- xor ebx, -1 ; invert all bytes
- and ecx, ebx ; and these two
- mov ebx, [eax] ; read next 4 bytes
- add eax, 4 ; increment pointer
- and ecx, 80808080h ; test all sign bits
- jz l1 ; no zero bytes, continue loop
- test ecx, 00008080h ; test first two bytes
- jnz short l2
- shr ecx, 16 ; not in the first 2 bytes
- add eax, 2
- l2: shl cl, 1 ; use carry flag to avoid a branch
- pop ebx
- sbb eax, edx ; compute length
- ret
- _strlen endp
- ; -----------------------------------
-
- The only problem with this routine is that it expects the string to be aligned
- on a 4 byte boundary. If the string is misaligned, the speed drops to
- c=24+1.75*n. In the extreme case that the string is misalinged AND it ends on a
- page boundary, the function will cause an access violation error.
-
- The fastest version (I have found) is the one in the Borland C++ builder
- library:
-
- ; ------------ version 7 ------------
- ; [C++ Builder, slightly modified] Fastest
- ; 88 bytes
- ; c=20+0.75*n (not exactly, see notes)
-
- _strlen proc near
- mov eax, [esp+4]
- test al, 3
- jnz short unalgn
-
- loop1: mov edx, [eax]
- add eax, 4
- mov ecx, edx
- sub edx, 1010101h
- and edx, 80808080h
- jz short loop1
- not ecx
- and edx, ecx
- jz short loop1
- test dl, dl
- jnz short minus4
- test dh, dh
- jnz short minus3
- test edx, 0FF0000h
- jnz short minus2
- jmp short minus1
-
- unalgn: add eax, 3
- xor cl, cl
- cmp byte ptr [eax-3], cl
- jz short minus3
- cmp byte ptr [eax-2], cl
- jz short minus2
- cmp byte ptr [eax-1], cl
- jz short minus1
- and al, 0FCh
- jmp short loop1
-
- minus4: dec eax
- minus3: dec eax
- minus2: dec eax
- minus1: mov ecx, [esp+4]
- dec eax
- sub eax, ecx
- retn
- _strlen endp
- ; -----------------------------------
-
- Actually, the original version is 90 bytes long. I have only changed the
- 'unalgn:' block, to reduce k if the string is unaligned.
-
- This function works well even on unaligned strings, as it first check the unali-
- gned bytes, and the proceeds in the main loop with aligned data (for unaligned
- strings it runs at c=31+0.75*n cycles). Since all dwords read are aligned,
- unaligned strings that end on page boundaries don't cause problems.
-
- This function is not always the fastest. If the string contains characters in
- the range 128 to 255 (i.e. signed bytes) the speed drops. If all the characters
- are signed (actually if at least one byte in every dword read), the speed
- becomes c=1.25*n. Of course most of the time (especially for english text) this
- is not the case, but if you have to process strings in another language that
- has characters in the range 128 to 255, it is a bit slower.
-
- Another fast version of strlen can be found in MSVCRT.DLL (the one I checked is
- version 5.00.7303). It runs at c=20+1*n, and handles unaligned strings almost
- like the Builder version. Misaligned strings give a value of k ranging from a
- minimum of 25 to a maximum of 52.
-
- What the MSVCRT function lacks completely is common sense and small size. In
- fact it is 144 bytes long and it is divided in two different pieces of the dll's
- code, causing most jumps to be in the long form.
-
- The main loop MSVCRT uses is good, but the rest of the function isn't. Based on
- that function, I came up with the following one:
-
- ; ------------ version 8 ------------
- ; My fast version
- ; 92 bytes
- ; c=17+1*n
-
- _strlen proc
- mov eax, [esp+4]
- xor ecx, ecx
- loop2: test al, 3
- jz loop1
- cmp byte ptr [eax], cl
- jz short ret0
- cmp byte ptr [eax+1], cl
- jz short ret1
- cmp byte ptr [eax+2], cl
- jnz short adjust
- inc eax
- ret1: inc eax
- ret0: sub eax, [esp+4]
- ret
-
- adjust: add eax, 3
- and eax, 0FFFFFFFCh
-
- loop1: mov edx, [eax]
- mov ecx, 81010100h
- sub ecx, edx
- add eax, 4
- xor ecx, edx
- and ecx, 81010100h
- jz loop1
- sub eax, [esp+4]
- shr ecx, 9
- jc minus4
- shr ecx, 8
- jc minus3
- shr ecx, 8
- jc minus2
- minus1: dec eax
- ret
- minus4: sub eax, 4
- ret
- minus3: sub eax, 3
- ret
- minus2: sub eax, 2
- ret
- _strlen endp
- ; -----------------------------------
-
- This one has the advantage of having k=17 for aligned strings and k=24 to 25
- for misaligned ones.
-
- The only question left to be answered is: 'Which version should we prefer?'.
-
- If your program does not include string handling in it's time-critical parts,
- I higly recommend either versions 5 or 4.5 (the inlined macro). As said before,
- the size overhead of the inlined version is very small (if any), and it has
- another advantage: it keeps the source code more readable, as it only involves
- the needed registers (input and output) in one single line.
-
- If string handling IS time-critical, I recommend version 8 (of course, it's
- mine... :)). Even then, the average size of the handled strings is to be consi-
- dered, as well as the percentage of unaligned strings. For unaligned strings of
- 16 or less characters, the fastest version would be an inlined version 5,
- running at c=8+2*n.
-
- The choice is yours....
-
-
- ____________________________________________________________________________
- ::::::::::.___ . ```
- ::::::::::| _/__. |__ ____ . __. ____ ____ __. \\
- :::::: |____ | __/_ _\_ (.___| .___) |__\_ (._) /___) | ,
- ::::::::::/ / | \ | - | \ | - | - | \/| - |
- .=:::::::::/______|_____|_____| (___|_____|______|____|_____|===============.
- '=::::::::::==================| . ____ | (____====[ The C Standard lib ]==='
- :::::::::: | |------| - |
- :::::::::: | |______|______|CE
- . :
- C string functions: _strcpy
- by Xbios2
-
-
- I. INTRODUCTION
- ---------------
- C syntax: char *strcpy(char *dest, const char *src);
-
- _strcpy copies string src to dest, stopping after the terminating null character
- has been moved, and returns dest.
-
- The 'traditional' way to do this is with the 'rep movs' instruction. BC 4.02
- and kernel32 use it. The problem is that it is rather slow (BC _strlen takes
- 53+5.5*n cycles, lstrlenA takes 74+5.5*n cycles, and optimizing their code
- leads to 46+5.5*n cycles wher n the number of chars, see part I of these
- articles). This is because even though the 'rep movs' instruction is fast it
- needs to know the number of bytes to copy in advance. So, _strcpy includes a
- _strlen function before the actual copying, which is implemented through 'repne
- scasb', a slow instruction.
-
- In this article we will examine two 'modern' _strcpy functions, found in
- MSVCRT.DLL and Borland C++ Builder library. Those functions are (supposed to be)
- optimized for Pentium processors. If you're not familiar with optimization for
- Pentium processors, I suggest you read the document on Pentium optimization by
- Agner Fog (http://announce.com/agner/assem).
-
-
- II. STRCPY IN MSVCRT
- --------------------
- ; c=39+1.75*n / 146 bytes
-
- strcpy proc
- push edi
- mov edi, [esp+8] ; dest
- mov ecx, [esp+0Ch] ; src
- test ecx, 3
- jz short loop1
-
- algn: mov dl, [ecx]
- inc ecx
- test dl, dl
- jz short one
- mov [edi], dl
- inc edi
- test ecx, 3
- jnz short algn
-
- loop1: mov edx, -81010101h
- mov eax, [ecx]
- add edx, eax
- xor eax, -1
- xor eax, edx
- mov edx, [ecx]
- add ecx, 4
- test eax, 81010100h
- jz short nozero
- test dl, dl
- jz short one
- test dh, dh
- jz short two
- test edx, 0FF0000h
- jz short three
- test edx, 0FF000000h
- jz short four
-
- nozero: mov [edi], edx
- add edi, 4
- jmp short loop1
- ;... in the DLL, there is code here, not used by strcpy
-
- one: mov [edi], dl
- mov eax, [esp+8]
- pop edi
- retn
-
- two: mov [edi], dx
- mov eax, [esp+8]
- pop edi
- retn
-
- three: mov [edi], dx
- mov eax, [esp+8]
- mov byte ptr [edi+2], 0
- pop edi
- retn
-
- four: mov [edi], edx
- mov eax, [esp+8]
- pop edi
- retn
- strcpy endp
-
- This procedure does the following:
- 1. Read arguments (src, dest) from stack
- 2. Check if src is aligned on a 4 byte boundary
- If not, copy byte after byte until src gets aligned
- 3. Loop
- Read one dword from src
- Test if there is a zero byte in the dword
- If no zero, copy dword to dest, loop back
- 4. Copy the remaining bytes
- 5. Return with dest in eax
-
- Actually the code above compiles to 130 bytes. The extra 16 bytes are added
- because between the loop and the 'one:' label there is the strcat function. So
- 4 conditional jumps take the 6-byte form, not the 2-byte one.
-
- This function takes 39+1.75*n. This means that the loop takes 7 cycles to
- execute (since each time the loop runs, it copies 4 bytes). Here is the explan-
- ation of the loop (U and V refer to the pipe the commands run in).
-
- loop1: mov edx, -81010101h ; U 1st
- mov eax, [ecx] ; V
- add edx, eax ; U 2nd
- xor eax, -1 ; V
- xor eax, edx ; U 3rd
- mov edx, [ecx] ; V
- add ecx, 4 ; U 4th
- test eax, 81010100h ; V
- jz short nozero ; U 5th
- ...
- nozero: mov [edi], edx ; U 6th
- add edi, 4 ; V
- jmp short loop1 ; U 7th
-
- The problem here is that both jumps run in the U pipe so they will not pair.
- Generally it's better to have an even number of instructions in each block of
- code. Just by moving one instruction this code will run in 6 cycles (i.e.
- 39+1.5*n cycles):
-
- loop1: mov edx, -81010101h ; U 1st
- mov eax, [ecx] ; V
- add edx, eax ; U 2nd
- xor eax, -1 ; V
- xor eax, edx ; U 3rd
- mov edx, [ecx] ; V
- test eax, 81010100h ; U
- jz short nozero ; V 4th
- ...
- nozero: mov [edi], edx ; U 5th
- add ecx, 4 ; V <<< moved instruction
- add edi, 4 ; U
- jmp short loop1 ; V 6th
-
- Everything pairs perfectly, and so 12 instructions only take 6 cycles. Pay
- attention to one thing: if 'add ecx, 4' and 'add edi, 4' are swapped, we get
- back to 7 cycles per loop, even though the pairing is the same. This is because
- the 'mov eax, [ecx]' instruction uses ecx to access memory, but ecx was changed
- in the previous clock cycle (add ecx, 4 / jmp short loop1). This causes an AGI
- stall (Address Generation Interlock), which wastes one cycle.
-
- As you 've noticed, _strcpy makes sure that the data read from src is aligned,
- because reading aligned dwords is faster. If src is aligned, the test only takes
- one cycle more, so it shouldn't trouble us. Yet, aligning src is not always a
- good idea. Suppose that you have an unaligned string and want to copy it in a
- buffer that is aligned. So what happens is that by aligning src we misalign
- dest. The problem is that misaligned writes are more expensive in cycles than
- misaligned reads. So _strcpy should either align dest or leave everything
- untouched. (not aligning src introduces an extremely small possibility of an
- access violation error, read section V below for details).
-
-
- III. STRCPY IN C++ B
-
-