Last Revision: July 16th 1999
Email: muproject@hotmail.com
Latest versions available from: http://members.tripod.com/~MUProject/Latest.zip
Reasons for FAQ
[Miz]
It will also serve as an introduction to the much more detailed unpacking
project which will be created shortly.
It came about after several requests for information on +Sandman's Newbie
messageboard. The topic is so broad the posts were often fragmented. As a result,
a suggestion was made to centralize many of the questions people had, and to create
a project that will offer real world examples.
Hopefully it will help to de-mystify this area of reversing, as well as stimulate
some interest in associated areas such as API hooking, decompilation, virii,
anti-anti-Sice measures, encryption, etc.
This 'open' FAQ will be updated as frequently as possible and readers are actively
encouraged to submit queries and/or contribute answers/ideas/knowledge.
How can we ask questions and/or contribute to this project?
Email: muproject@hotmail.com
How do we know when this FAQ/Project has been updated?
[Miz]
I have completed 'PartX' of the mini-projects; where do I post the tutorial?
[Miz]
This is to stop the mini-projects becoming a race and spoiling things for others.
Please read the 'Intro.txt' for more information.
Thanks.
Why would we want to unpack an executable?
[Miz] There are several reasons, but IMO the main one is so that you remain informed about exactly what a specific program is doing. Deliberately hiding a program's code, and hiding which DLL's and API functions it uses, should set off warning lights for anyone concerned with even basic privacy and security. In a Win9x environment any program has access to anything on your machine. Anything. I have no problem with the compression programs themselves; they provide a useful service and are excellent study material in their own right. My problem is with programs that can potentially abuse the side-effects of executable compression to hinder a target's examination and (if necessary) reversing. 'Baloney, you just want to patch the pants off it!' - purely a side effect, honest. Anyway, if a program's author has nothing to hide then he surely would not mind us checking for ourselves ;) By unpacking an executable to the point where we can remove all traces of the original packer and repair all the relevant PE data, we end up being able to dead-list code, use API monitors, make patches, etc., etc. We unpack so we can see.
I've seen unpack utilities for BrandX, why can't I simply use them?
[Miz]
On top of that, it is (IMO) a very interesting area for study, and touches on many
other interesting areas. I've seen some hostile discussions on the importance of
understanding/ignoring the PE format. Maybe studying such things as manual
unpacking, virii, API hooking, etc., may help people understand its relevance more,
and give some more insight into what the system is up to in the background.
What sort of utilities are available to pack/encrypt a program?
[Miz]
Programs such as Petite, Neolite, Shrinker, ASPack, etc., are widely used schemes,
having been around for some time and consequently being more refined than others.
Check out those programs. Many have evaluation periods. Reading their documentation
gives a good background to the topic. Also check out the more basic ones from tools
sites. Sometimes you find gems, like ones that come with source code or a 'how it
was done' doc.
I'm experienced at reversing, but this area is relatively new to me. Where's a good place to start?
[Miz]
Read, Read, then Re-read; particularly documentation like MattP's. Then, get hold of a small, simple executable - one you are familiar with. Pack it with an uncomplicated packer (ASPack is a good choice) and make it your aim to reverse it to being as close a copy of the original as you possibly can. Make it an ongoing project while you continue to search and read.
If you want to dive in straight away then check out Torn@do's packed crackme and
its accompanying unpacking doc (included in the 'Library/OldStuff' directory).
It skirts around the huge 'import' problem (that will be covered in much more detail
as part of the main project), but it may be of some help. Please bear in mind the
'doc' was never intended to be a tutorial as such. Much more information will be
provided here.
Alternatively, (and the way I learnt), after reading as much info as your brain can
handle, write your own PE 'modifiers'. My first attempt was a modifier that reversed
the order of all the bytes in the code section and swapped them back at runtime.
Pointless, I know ;) But it didn't involve delving into the PE format that much, and
it's surprising how much such a trivial task can teach you. From there you can move to
more involved tasks like packing the code section, and from there to packing the import
section, etc., etc.
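To make that first 'modifier' concrete: the transform at its heart is just an in-place byte reversal, and because reversing twice gets you back where you started, the same routine serves for both the on-disk modification and the runtime restore. A minimal C sketch, assuming the code section's raw bytes are already in a buffer; finding them in the file, and the stub that calls this at runtime, are not shown (`reverse_bytes` is a made-up name):

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bytes of a buffer in place. The operation is its own inverse:
   the modifier applies it to the code section on disk, and a small runtime
   stub (not shown) applies it again in memory before jumping to the
   original entry point. */
void reverse_bytes(uint8_t *buf, size_t len)
{
    if (len < 2)
        return;
    size_t i = 0, j = len - 1;
    while (i < j) {
        uint8_t tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
        i++;
        j--;
    }
}
```

Trivial, yes, but writing the stub half of it is what forces you to learn where the entry point lives and how to make a section writable.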
Sounds scary? Not with the resources available. I will keep referring to the excellent work that Stone did long ago in this area. I'm sure others have contributed equally as much, but for me Stone's work was a revelation. There's a simple pe-encryptor with source and docs on his site that can work as a great reference.
Blimey! All I want to do is patch a single byte in the uncompressed exe, do I really have to do all this?
[Miz]
Leave the executable packed but divert the final 'jump' (to the original entry point)
to some of your own code that can then apply the patch to the now unpacked code/data.
For you to be able to do this properly however, you will need some knowledge of
code caving or section construction; you'll need to write relocatable code that
references the patch address in an OS-friendly way (i.e. uses the ACTUAL base address,
not the PREFERRED base address), etc.
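Here's roughly what the logic of such a patch stub looks like, sketched in C rather than the hand-written assembly you'd actually put in a cave. It assumes the stub already knows the ACTUAL base (in a real Win32 stub, GetModuleHandle(NULL) returns the base address of the process's own executable) and the RVA to patch; `apply_patch` and its compare-before-write check are our own illustration, not any packer's API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Patch 'len' bytes at 'rva', relative to the ACTUAL load address of the
   image (never the preferred ImageBase from the header). 'expected' holds
   the bytes we believe should be there; if they don't match (different
   version, code not unpacked yet...) nothing is written.
   Returns 1 if the patch was applied, 0 otherwise. */
int apply_patch(uint8_t *actual_base, uint32_t rva,
                const uint8_t *expected, const uint8_t *replacement,
                size_t len)
{
    uint8_t *target = actual_base + rva;
    if (memcmp(target, expected, len) != 0)
        return 0;
    memcpy(target, replacement, len);
    return 1;
}
```

Checking the original bytes first is a cheap safeguard: if the target has been updated, or the unpacker hasn't finished yet, the stub leaves the code alone rather than corrupting it.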
This project, FAQ, and the references in Further Reading will give you
the knowledge to do these things, and hopefully much more besides.
Is packing a program like zipping a proggy? [Jeff]
[Miz]
Is packing that simple and quick? [Jeff]
[Miz]
What is the purpose behind packing up a proggy? [Jeff]
[Miz]
Allegedly, just to make it smaller - 40-60% of original size is about normal.
Who would use such a packed proggy and why? [Jeff]
[Miz]
Are the changes to the proggy always the same? [Jeff]
[Miz]
Again the OS dictates some limitations, but other than that a packer/virus/encryptor
can be as creative as its creator ;) It is because of this that this FAQ and the resulting full project will try to avoid direct descriptions of particular versions of packers. Instead we hope it will provide enough background knowledge for people to be able to cope with new 'generations' of packers as they emerge.
HOW do we KNOW when we dl a proggy that it is packed? [Jeff]
[Miz]
NOTE: Exe checkers will only know about packers they have been programmed to know about.
They normally just check for signatures (byte strings) in the same way that virus checkers do.
If you rely on an exe checker and it says a program is not packed then be sure to check it manually. It may be a newer revision of an existing packer which is not recognised yet.
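A signature check is, at heart, a byte-string search. A naive C sketch, assuming the whole file is in memory; any signature bytes you feed it are your own, and real checkers may restrict the search to the area around the entry point and allow wildcard bytes, which this sketch does not (`find_signature` is a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive signature scan: the same idea exe checkers and virus scanners use.
   Returns the offset of the first occurrence of 'sig' in 'buf', or -1 if
   the signature does not occur. */
long find_signature(const uint8_t *buf, size_t buf_len,
                    const uint8_t *sig, size_t sig_len)
{
    if (sig_len == 0 || sig_len > buf_len)
        return -1;
    for (size_t i = 0; i + sig_len <= buf_len; i++) {
        if (memcmp(buf + i, sig, sig_len) == 0)
            return (long)i;
    }
    return -1;
}
```

It also shows why a checker is blind to a new packer version: change a few bytes of the stub and the fixed string no longer matches.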
What, if any, tell tale signs tell us this proggy may be packed? [Jeff]
[Miz]
Does something happen in SoftIce or Win32Dasm to give us a clue? [Jeff]
[Miz]
What tools would we need in our unpacking arsenal? [Jeff]
[Miz]
Check any decent tools site and you will find dozens of utils for studying and
modifying pe-files. Take a look at as many as you can, sometimes they even include
source. Anything you can get your hands on may be of help.
Also, (it should go without saying), try writing your own. Check out Stone's site for some great starting points. The pe-encryptor he presents is a great foundation to this topic. Start with simple things and gradually work your way up.
Packers/encryptors/virii, etc., seem to be mentioned together quite often. Are they so similar?
[Miz]
Like a virus, unpackers often have to be resourceful in how they initialise. They may
use similar steps to find API addresses, memory etc. Unpackers often decrypt part of their unpacking code, like exe-encryptors, to make examination more difficult. They may also employ similar anti-debugging measures. Packers/encryptors/virii quite often have to use fully relocatable code, and as such, use many of the same tricks (like the call [NextInstruction]/pop pair used to get the current eip).
Links to some virii descriptions are provided in Further Reading. Search for some more to see the similarities. Take a look at the Cabanas one, hmmm ;)
What is a PE Header? [Jeff]
[Miz]
The important thing to grasp early on is that an executable is not just a binary
image of your code/data and nothing else. There are many pieces of additional system
information attached so that the OS knows exactly what resources your executable
will need.
It may be easier to think of an executable as a directory. In that directory are some files and some sub-folders containing more files. The structure of the whole directory tree will change from executable to executable. For example if your executable doesn't have any resources then the .RSRC section will be missing. If it contains debug information then a .DBG section may exist etc.
A simple scheme may be:
If you're more familiar with older COM files you'll be amazed at the amount of information that is stored. It does indeed make unpacking more involved than before; however, it also makes certain areas (like API monitoring etc.) very easy. More on that later.
If this structure is pretty new to you then I recommend you get hold of a decent PE viewer and get a feel for the structure of a PE file before delving deeper. A lot of tools exist, but my own favorite is PEBrowse Professional. There's also MattP's PEDUMP util, complete with source to study; see Further Reading below.
Look again at the above layout.
[From MattP's Windows Secrets]
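If you'd like to see how a PE viewer finds its way in, the first step is always the same: the old DOS 'MZ' header stores, at offset 0x3C (its e_lfanew field), the file offset of the 'PE\0\0' signature, which is followed by the file header and the optional header. A minimal C sketch that checks both signatures, assuming the file has been read into a buffer (`find_pe_header` is a made-up name):

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian values without relying on host byte order or alignment. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the file offset of the "PE\0\0" signature, or -1 if the buffer
   does not look like a PE file. */
long find_pe_header(const uint8_t *file, size_t len)
{
    if (len < 0x40 || rd16(file) != 0x5A4D)            /* "MZ" */
        return -1;
    uint32_t e_lfanew = rd32(file + 0x3C);             /* offset of PE header */
    if (e_lfanew > len - 4 || rd32(file + e_lfanew) != 0x00004550) /* "PE\0\0" */
        return -1;
    return (long)e_lfanew;
}
```

Everything else a PE viewer shows you (sections, directories, imports) is found by following offsets onward from that point.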
Can u tell at a glance, without tools, that the PE Header is screwy? [Jeff]
[Miz]
Anyway, 'screwy' PE Headers are normally a result of incorrect unpacking - something
we will hopefully be avoiding ;)
What do the numbers we see in PE Header mean to us? (eg Entry point, Image Base, Virtual size, Virtual Offset, etc. etc.)
[From MattP's Windows Secrets]
Image Base
RVA (Relative Virtual Address) AKA Relative Offset
(virtual address 0x401464) - (base address 0x400000) = RVA 0x1464
Virtual Offset AKA Virtual Address
VirtualSize
Entry Point AKA AddressOfEntryPoint
[Miz]
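The arithmetic behind the RVA example above, as a tiny C sketch. It assumes a 32-bit (PE32) image, and the base passed in should be the one the image is actually loaded at; the header's ImageBase is only correct if the image really sits there. AddressOfEntryPoint is itself an RVA, so the entry point's address is base + AddressOfEntryPoint:

```c
#include <stdint.h>

/* RVA <-> virtual address, relative to the base the image is loaded at. */
uint32_t rva_to_va(uint32_t image_base, uint32_t rva) { return image_base + rva; }
uint32_t va_to_rva(uint32_t image_base, uint32_t va)  { return va - image_base; }
```

With ImageBase 0x400000, va_to_rva turns 0x401464 into 0x1464, and an AddressOfEntryPoint of 0x1000 means execution starts at 0x401000.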
Is ProcDump the only way to fix a PE Header?
[Miz]
As mentioned earlier, the '.TEXT' section is the general name for the section containing code. Early versions of packers and encryptors only modified this section as it was by far the easiest to do, and 'protected' the main code. More modern packers give the ability to pack/encrypt most section types including imports and resources. Doing this stops people using APISpy programs and Resource grabbers as well as complicating the whole unpack operation. Sometimes packers pack all but the first icon group so that your program still retains its own icons when in Explorer etc.
Remember, by the time the first instruction of the executable is executed most of the usual headers have been processed by the OS. So if you choose to pack these as well (for example the import section) then the system will have no knowledge of it and so the unpacker has to build it up itself.
This is where most people get confused; the sub-topic of rebuilding imports, relocs, etc., will be covered in detail in another section.
What happens in the proggy when it's packed? [Jeff]
[Miz]
Say you just wanted to pack the '.TEXT' (usually code) section.
A *very* simplified flow could be:
What happens when our packed proggy is run?
[Miz]
What is meant by 'In-situ'?
So for a simple target, as described above, how would I go about unpacking it?
[Miz]
What you would like is an unpacked image, as close as possible to the original, and
one that works with the OS correctly.
Here's an example you may be more familiar with:
Now imagine a '.TEXT' section is packed. You know that the depacker encounters it,
unpacks it and continues. *In principle* it's simply a case of letting the
unpacker unpack it, then stopping the unpacker unpacking it next time.
Why *in principle*?
Because there are many caveats, which we will see later,
but it's the concept that is important to grasp.
Think now about what wrapper-style packers (like Petite, Neolite, Shrinker,
ASPack etc.) can and can not do.
'If this is the case, then (again, in principle) surely 'dumping' the executable's
image to disk at this point would be all that was needed, yes?'
Again, yes and no ;)
'Ah, that's just because the entry point is pointing to the wrong place, soon fix
that.....'.
No (patience grasshopper...) - remember, the entry point is just a *tiny* bit of the
information needed by the OS in order to correctly process an exe. We really need
to be sure we are fixing *all* the relevant information required by the OS. This
is the (assumed) difficult part of unpacking. For us to be confident in our ability
to restore executables we really need to be familiar with the actions of
unpackers, very familiar with the structure and functionality of the PE Header, as well as the actions of the Windows loader.
Ok, so what type of 'Sections' are there? What exactly are they, what do they mean, and where do they come from?
[Miz]
Sections themselves can be thought of as chunks of distinct code, data, system
resources, user resources, etc.
'But when I program I have no knowledge of these!'
It depends on what type of assembler/compiler/linker you use. The data is organized
in this way because it is an OS requirement, not a programming one. It makes no
real difference to most programs WHERE the stuff is stored, but it is very important
that everything is in the right place and correct for the OS.
'So if I wrote a simple program like a messagebox that says 'UmBongo!' then how
would that look when compiled/linked into an executable?'
Your code would be placed in its own section. The string (data) 'UmBongo!'
would be placed in its own section. You would have an import section that described
what relevant DLL's and API calls you used (User32 and MessageBox), any icons that
were created would go in a resource section, etc.
Here are some brief descriptions of commonly encountered section types:
Think of initialised data as things like strings (text), or a block of data to
decrypt; ie chunks of data with some predetermined value.
Think of uninitialised data as being variables (not predefined ones), empty
arrays, etc; ie blocks of data that will be filled by the executable with some
data at runtime, but at startup have no preset values.
'Why are these treated separately?'
Initialised data obviously needs room in the exe to be stored, whereas
uninitialised data does not. The Windows loader will MAKE the room for this data
when a file is loaded, but it requires no storage in the executable itself, other
than the section description. Hence the two types of data section.
The '.IDATA' section provides all the information the OS needs about what DLL's
and API calls were explicitly linked with the executable.
'explicitly linked?'
'So just by looking at this section, and without running the program itself,
I can figure out whether it uses winsock, mapi, even if it uses a messagebox?!'
Yes. That's how programs like QuickView/DLLShow etc. work, and more interestingly
how API spy programs can monitor API calls from executables. It is because of this that many packers choose to pack this section. Obviously by packing it you are 'hiding' it from such programs and (more importantly) us!
'So why is this 'painful' in terms of unpacking?'
Well, judging by the number of emails and posts, this is the thing that confuses
people the most. Understanding the concept of dumping, etc, comes quickly, but many
do not see the importance of correctly fixing this section. It is *vital*, in order for an
executable to work correctly all the time, in all environments, that it has a correct
import section.
'So why can't I just dump it, just before control passes to the original exe?'
Remember we said that for the original program to function correctly then all
its code and data must be correct? Remember also that we said that an executable's
image contains much more information than just that? Remember that we said the
OS relies on this information to correctly provide the resources for an executable?
Also remember that by the time control passes to the original exe then all of its
OS initialisation has already taken place?
'Well, if a packer has packed the import table then how on earth can the OS know what's
going on?'
The answer is it doesn't need to, *IF* the unpacker has done the work that
the OS would normally do for itself.
'What does the OS do with this section then?'
Take a look at an ascii-dump of any '.IDATA' section. You'll see strings; names of
DLL's and API functions. These are no use to an executable in that format. Somewhere,
somehow, these must be processed into a format more usable for the exe.
I'm going to divert for a bit, but you'll see why.......
Imagine your messagebox program again. When you do a call to 'MessageBox' the
compiler/linker does something that may at first seem very strange. If you look
at it in Softice you will see something like Call [USER32!MessageBox].
What is strange about that? Look more closely. This explanation from MattP's
book should clarify things. If not, keep re-reading. It's an important thing
to grasp.
[Extract From MattP's Windows Secrets]
'So for an unpacker to correctly initialise this table it must process the
names stored in the '.IDATA' section. How does it do this?'
API functions exist for this; they were mentioned previously, and are used by
programs that load and handle DLL's themselves. The more relevant calls are
GetModuleHandle, LoadLibrary, and GetProcAddress. Full docs on these functions
are supplied in separate .txt files with this package. Read them and see how
they would be applied to create the table.
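To see how those calls turn a list of names into the table the code calls through, here's a toy model in C. A real unpacker would get a module handle with LoadLibrary (or GetModuleHandle) for each DLL and then call GetProcAddress for each name; to keep the sketch runnable anywhere, a hypothetical `lookup` function backed by made-up stand-in functions takes GetProcAddress's place, and the table is a plain array rather than the real per-DLL, null-terminated IAT:

```c
#include <stddef.h>
#include <string.h>

typedef int (*api_fn)(const char *);

/* Made-up stand-ins for real API functions (signatures are NOT the real ones). */
static int fake_message_box(const char *text) { return (int)strlen(text); }
static int fake_exit_process(const char *unused) { (void)unused; return -1; }

/* Hypothetical stand-in for GetProcAddress: name -> address, or NULL. */
static api_fn lookup(const char *name)
{
    static const struct { const char *name; api_fn fn; } exports[] = {
        { "MessageBoxA", fake_message_box },
        { "ExitProcess", fake_exit_process },
    };
    for (size_t i = 0; i < sizeof exports / sizeof exports[0]; i++)
        if (strcmp(exports[i].name, name) == 0)
            return exports[i].fn;
    return NULL;
}

/* Fill one table slot per imported name, as the loader (or an unpacker
   doing the loader's job) would. Returns 0 if any name fails to resolve. */
int build_iat(const char *const *names, size_t count, api_fn *iat)
{
    for (size_t i = 0; i < count; i++) {
        iat[i] = lookup(names[i]);
        if (iat[i] == NULL)
            return 0;
    }
    return 1;
}
```

Once the table is filled, code calls `iat[0](...)` rather than the function directly: the same indirection you see as Call [USER32!MessageBox] in SoftIce.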
'A lot of information to absorb, so where does that leave us in terms of unpacking
files and correctly restoring their PE Headers?'
As you can hopefully see by now, if we are intending to bypass (and remove!) the
unpacker completely then we really have to make sure that we have valid information
in the PE header so the OS can do all this for us again.
'How do we tackle this?'
There are many ways, and the way you choose will depend on how the unpacker handles
these things. Here's a simple example for a simple unpacker's scheme:
What we can do here is let it unpack the '.idata' section, but *importantly* stop
it from generating the IAT and wiping the contents of the original data. If we can
do this then we have a valid import table again, ready for the OS to process.
All we would need to do then is to alter the relevant 'Directory' settings in the
PE Header to point to the correct offset and size of this new (original!) import
section. Remember, we are doing all this so, eventually, we no longer have to
rely on ANY of the unpacker's code, and so can even remove the unpacker completely.
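In header terms, 'altering the relevant Directory settings' means changing one entry of the DataDirectory array at the end of the optional header. A minimal C sketch mirroring WinNT.h's IMAGE_DATA_DIRECTORY; the array size and the import index match WinNT.h, while the RVA and size you pass in (and writing the modified header back to the file) depend on your particular target (`set_import_directory` is a made-up name):

```c
#include <stdint.h>

#define IMAGE_NUMBEROF_DIRECTORY_ENTRIES 16   /* values as in WinNT.h */
#define IMAGE_DIRECTORY_ENTRY_IMPORT     1

/* Mirrors WinNT.h's IMAGE_DATA_DIRECTORY: an RVA plus a size in bytes. */
typedef struct {
    uint32_t VirtualAddress;
    uint32_t Size;
} IMAGE_DATA_DIRECTORY;

/* Point the Import directory at the restored (original) import data, so the
   OS loader processes it itself instead of relying on the unpacker. */
void set_import_directory(IMAGE_DATA_DIRECTORY dirs[IMAGE_NUMBEROF_DIRECTORY_ENTRIES],
                          uint32_t import_rva, uint32_t import_size)
{
    dirs[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress = import_rva;
    dirs[IMAGE_DIRECTORY_ENTRY_IMPORT].Size = import_size;
}
```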
This is a simple explanation of a simple scheme. It's the thing I 'glossed over'
in the original post on Torn@do's ASPack packed crackme. I'm sure you can
appreciate why now ;) The version of ASPack used made this easy to do, but not
particularly easy to explain (I didn't want to confuse people too much!).
Other packers, and I'm sure future versions of ASPack, will not make things this
simple so make sure you understand the information here - read as many other
resources as you can (See Further Reading). The concepts will always be
similar, but expect things to get tricky soon ;)
Now, where were we.....oh yes, section descriptions....
Remember that 'ImageBase' was our PREFERRED loading address, but that the OS could,
in theory, put it anywhere it wants? (It is very rare for the OS to do this, but
it's the fact that it can that is important).
If a .RELOC section exists, then in unpackers, just like in the Windows loader, you
will see the code checking the preferred and actual imagebases. If they are the
same then the info in the .RELOC section will be ignored, otherwise it will need
to make some 'fixups' to anything that assumed it would be located at the preferred
image base. The areas to 'fix' are stored in this (.RELOC) section.
'How does this affect us?'
Well, if we have (in effect) dumped the code/data and fixed the import issue, but
NOT fixed the reloc section then we have code, etc, that will ONLY work if it is
located at the imagebase when the dump occurred. Like the import section, we need
to fix this so that we have a valid one for the OS to process should it decide later
that it wants the imagebase to be something else. Here's something that explains a
real world example:
[Extract From MattP's Windows Secrets]
It's important to note that the JMP and CALL instructions generated by
a compiler use offsets relative to the instructions, rather than actual offsets
in the 32-bit flat segment. If the image needs to be loaded somewhere other
than the location the linker assumed was a base address, these instructions
don't need to change, since they use relative addressing. As a result, there
are not as many relocations as you might think. Relocations are usually
needed only for instructions that use a 32-bit offset to some data.
For example, let's say you had the following global variable declarations:
int i;
int *ptr = &i;
If the linker assumed an image base of 0x10000, the address of the variable
'i' will end up containing something like 0x12004. At the memory used to hold
the pointer ptr, the linker will have written out 0x12004, since that's
the address of the variable 'i'. If the loader (for whatever reason) decided
to load the file at a base address of 0x70000, the address of 'i' would then
be 0x72004. However, the pre-initialized value of the ptr variable would then
be incorrect because i is now 0x60000 bytes higher in memory.
This is where the relocation information comes into play. The .reloc
section is a list of places in the image where the difference between the
linker-assumed load address and the actual load address needs to be taken
into account.
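Here's how that fixup pass might look in C. The .reloc data is a series of blocks, each starting with an IMAGE_BASE_RELOCATION header (the RVA of a 4K page plus SizeOfBlock, header included) followed by 16-bit entries whose top 4 bits are a type and low 12 bits an offset within that page. This sketch handles only the two types you'd normally expect in a 32-bit Win32 image, ABSOLUTE (padding) and HIGHLOW (add the delta to a 32-bit value), and assumes the mapped image is in a buffer; `apply_relocations` is our own name:

```c
#include <stddef.h>
#include <stdint.h>

#define IMAGE_REL_BASED_ABSOLUTE 0   /* padding entry, skip */
#define IMAGE_REL_BASED_HIGHLOW  3   /* add the full 32-bit delta */

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void wr32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Apply base relocations to a mapped image. Returns the number of fixups
   applied, or -1 on an entry type this sketch doesn't handle. */
long apply_relocations(uint8_t *image, size_t image_size,
                       const uint8_t *reloc, size_t reloc_size,
                       uint32_t preferred_base, uint32_t actual_base)
{
    uint32_t delta = actual_base - preferred_base;  /* unsigned wrap handles moves down */
    size_t pos = 0;
    long applied = 0;

    while (pos + 8 <= reloc_size) {
        uint32_t page_rva   = rd32(reloc + pos);      /* .VirtualAddress */
        uint32_t block_size = rd32(reloc + pos + 4);  /* .SizeOfBlock */
        if (block_size < 8 || pos + block_size > reloc_size)
            break;
        for (size_t e = pos + 8; e + 2 <= pos + block_size; e += 2) {
            uint16_t entry = (uint16_t)(reloc[e] | (reloc[e + 1] << 8));
            unsigned type = entry >> 12;
            uint32_t rva = page_rva + (entry & 0x0FFF);
            if (type == IMAGE_REL_BASED_ABSOLUTE)
                continue;
            if (type != IMAGE_REL_BASED_HIGHLOW || (size_t)rva + 4 > image_size)
                return -1;
            wr32(image + rva, rd32(image + rva) + delta);
            applied++;
        }
        pos += block_size;
    }
    return applied;
}
```

Feeding it MattP's numbers (preferred base 0x10000, actual base 0x70000, ptr holding 0x12004) turns ptr into 0x72004, just as the extract describes. When the two bases are equal the delta is zero, which is why the loader can skip the whole pass.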
Like the .IDATA section, the .EDATA contains a list of functions, only this time
they are the names of functions WITHIN the executable that are EXPORTED to other
modules. .EDATA sections are more frequent with DLL's (because they obviously
EXPORT the functions that the executable IMPORTs), but they can also (rarely)
occur within executables themselves. You will probably have noticed the
'Import Functions / Export Functions' controls within W32dasm. Load a few
executables into it and see. Then take a look at some dll's.....
If you want to read more on this (and things like export forwarding ;)) then
take a look at MattP's stuff. He goes into some detail there. For our purposes
(ie unpacking), it's just another section we'll have to deal with.
Everything a resource grabber grabs, and now you know how ;) Again, just like any other
section.
Note: If a program chose to pack this section then things like correct icons would
not be displayed in Desktop, Explorer etc. For that reason many packers (if packing the .RSRC section) strip out the first icon group and store it unpacked elsewhere. That way the executable 'looks' correct when viewed by other programs via icons.
It's been mentioned here that sections are unpacked over themselves. How can this work? Surely, if some sections are packed and others are not then some data would get overwritten because the unpacked data must be bigger!
[Miz]
Another slight digression now, but something you may have wondered about before......
Imagine you wanted to patch some code in an executable and have found the relevant
location using SIce (or whatever). It may give an address like 0x401234, and from
what you have learnt from the PE header, you know that the preferred ImageBase was
(say) 0x400000. If the disk image and the memory image were the same then the
offset in the executables disk image to patch would be
0x401234-0x400000 = 0x1234, right?
Wrong.
'So, what happens for these to be different?'
Well, the way the executable's MEMORY image is laid out is specified by some
information within the PE Header. The OS uses this information to organize and
create the memory for the executable. It 'maps' the sections based on information
stored in the PE header.
Remember before, when we looked at the differences between initialised and
uninitialised data? Remember we said that for uninitialised data the PE Header
only specified how much 'room' it would need but required no actual storage in
the disk image? Let's look at how this was done. It's a simple example that
can be extended when looking at other section types.
Each section description within the section table has the following fields:
[..Extract from WinNT.h, remember to look here for the structure definitions....]
The answer lies in VirtualSize, VirtualAddress, SizeOfRawData and PointerToRawData.
Now go and take a look at a few section descriptions using a PE viewer before continuing
to read the rest of this answer. Get a feel for the type of numbers in them and how
they differ. Look at some unpacked files' sections as well as packed ones. You should
immediately see some big differences.
Done that? Good. Then the following will make more sense....
VirtualAddress vs PointerToRawData:
So, can you see how we could use this information to calculate the file offset for a
patch address (given its address in the memory image) using the PE Header?
Here's an example:
(0x1234 - 0x1000 = 0x234, so our data is 0x234 bytes 'into' this particular section,
and the DISK image of this section starts at file offset 0x200, so the disk image offset
would be 0x200+0x234 = 0x434!).
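The same calculation in C: find the section whose memory range holds the RVA, then rebase it from VirtualAddress to PointerToRawData. A sketch, assuming the relevant section-table fields have already been read into a small struct of our own (`SectionInfo` and `rva_to_file_offset` are made-up names), and that you subtracted ImageBase first to turn a SoftIce address into an RVA:

```c
#include <stddef.h>
#include <stdint.h>

/* Just the section-table fields needed for the conversion. */
typedef struct {
    uint32_t VirtualSize;
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
} SectionInfo;

/* Convert an RVA to a file offset. Returns -1 if no section contains the
   RVA, or if it falls in the part of a section that exists only in memory
   (e.g. uninitialised data), since such bytes have no file offset. */
long rva_to_file_offset(const SectionInfo *secs, size_t count, uint32_t rva)
{
    for (size_t i = 0; i < count; i++) {
        const SectionInfo *s = &secs[i];
        uint32_t span = s->VirtualSize ? s->VirtualSize : s->SizeOfRawData;
        if (rva >= s->VirtualAddress && rva - s->VirtualAddress < span) {
            uint32_t delta = rva - s->VirtualAddress;   /* bytes 'into' the section */
            if (delta >= s->SizeOfRawData)
                return -1;
            return (long)(s->PointerToRawData + delta);
        }
    }
    return -1;
}
```

With a section at VirtualAddress 0x1000 and PointerToRawData 0x200, address 0x401234 and ImageBase 0x400000 give RVA 0x1234 and file offset 0x434, matching the worked example.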
'This is all very interesting, but I don't see the relevance!'
It is very relevant if you are going to manually dump files and sections as we
will see later. But for now the important ones to understand are the differences
in the 'size' members - VirtualSize and SizeOfRawData.
As you have some knowledge now about the differences between RVAs and raw offsets,
VirtualSize and SizeOfRawData should be quite simple. VirtualSize is the size of the
block of memory the OS will allocate for this particular section and the SizeOfRawData
is the size that section takes up in the disk image.
'Why would these be different?'
For a number of reasons. We have already seen one big one in how an uninitialised
data section is mapped. In this case the linker sets SizeOfRawData to zero, and
sets VirtualSize to the size actually needed by the program.
'But how does it apply to packing/unpacking?'
Well, imagine the following scenario:
Read it again if the answer is not immediately obvious.
Got it? Cool.
'Is it really that simple?'
Well, these changes obviously have knock-on effects for the other sections.
They can not be changed in isolation. If you change the sizes or addresses of a
section then all the others would need to have THEIR addresses/offsets moved up or
down to compensate. Remember that we said some packers packed in-situ and others used blocks of memory allocated by the unpacker? Which method a packer chooses is
normally determined by the compression algorithm used. But you should now be
able to see how an in-situ packer can work without overwriting other sections.
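To put numbers on it: the loader reserves VirtualSize bytes (rounded up to SectionAlignment) of memory for a section but fills only SizeOfRawData of them from the file, zeroing the rest. A small C sketch of that arithmetic, assuming VirtualSize is non-zero and SectionAlignment is a power of two (`in_memory_slack` is a made-up name, and the figures in the note below are hypothetical):

```c
#include <stdint.h>

/* Round v up to a multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t v, uint32_t align)
{
    return (v + align - 1) & ~(align - 1);
}

/* Bytes of zero-filled memory the loader provides beyond the section's file
   data: the section occupies VirtualSize rounded up to SectionAlignment in
   memory, of which at most SizeOfRawData bytes come from the file. For an
   in-situ packer this slack is the room the unpacked data grows into. */
uint32_t in_memory_slack(uint32_t virtual_size, uint32_t size_of_raw_data,
                         uint32_t section_alignment)
{
    uint32_t mem = align_up(virtual_size, section_alignment);
    return size_of_raw_data >= mem ? 0 : mem - size_of_raw_data;
}
```

For example, a packed section with SizeOfRawData 0x1800, VirtualSize 0x5000 and SectionAlignment 0x1000 gets 0x3800 bytes of zeroed slack, room for the unpacker to expand the data in place without touching the next section.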
How did u KNOW to change a C0000040 to a E0000020 (<-in your initial post?) What is the significance of those numbers?
[Miz]
If you refer to the original post and the WinNT.h header the significance should be clear.
We are basically changing the section characteristics back to what they really represent.
For example, packers may change the section characteristics to make sections writable (because
they write the unpacked code back to the section's memory, as allocated by the OS) and to fool
SIce and disassemblers, etc.
I would have thought this would screw up some OSes, particularly NT with its stricter policy
on memory management, but it apparently works......most packers seem to have adopted it.
Anyway, by changing it back we may even be fixing bugs in the packers ;)
[From MattP's Windows Secrets]
Some of the more important flags.....
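To decode values like the two in the question, test each IMAGE_SCN_* bit. A C sketch covering a few common flags; the bit values are the WinNT.h ones, the names are shortened by dropping the IMAGE_SCN_ prefix, and `describe_characteristics` is a made-up name:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A few IMAGE_SCN_* flags from WinNT.h (prefix dropped). */
static const struct { uint32_t bit; const char *name; } scn_flags[] = {
    { 0x00000020u, "CNT_CODE" },
    { 0x00000040u, "CNT_INITIALIZED_DATA" },
    { 0x00000080u, "CNT_UNINITIALIZED_DATA" },
    { 0x02000000u, "MEM_DISCARDABLE" },
    { 0x10000000u, "MEM_SHARED" },
    { 0x20000000u, "MEM_EXECUTE" },
    { 0x40000000u, "MEM_READ" },
    { 0x80000000u, "MEM_WRITE" },
};

/* Write the names of the known flags set in 'characteristics' into 'out'
   as a '|'-separated list; bits not in the table are ignored. */
void describe_characteristics(uint32_t characteristics, char *out, size_t out_len)
{
    if (out_len == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof scn_flags / sizeof scn_flags[0]; i++) {
        if (!(characteristics & scn_flags[i].bit))
            continue;
        if (out[0] != '\0')
            strncat(out, "|", out_len - strlen(out) - 1);
        strncat(out, scn_flags[i].name, out_len - strlen(out) - 1);
    }
}
```

It reports C0000040 as CNT_INITIALIZED_DATA|MEM_READ|MEM_WRITE and E0000020 as CNT_CODE|MEM_EXECUTE|MEM_READ|MEM_WRITE, i.e. the change turns 'readable/writable initialised data' back into executable code (still writable, as the packer needs).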
'I have applied the the things described here and in 'Further Reading' and have
successfully reversed a 'BrandX' packed file! I am indestructible!
I am 'master reverser', hear me roar!'
[Miz]
Packers/encryptors etc. have been adopted as 'protections' by many developers too
lazy to do their own. As such, when shown to have weaknesses, they will improve.
Remember, they are now commercial ventures, and it makes no commercial sense for them
to remain trivial to bypass. Expect new versions, new ideas and techniques.
More fun for us. At last, something interesting we can use to pass those long
lunchtimes and late night sessions that isn't all over in 3 seconds ;)
Your Questions/Answers/Comments/Corrections/Contributions
Further Reading
If you only ever read one book then read this, (ok, and Brave New World...).
There's an online version...search...just until your bookstore gets it back in
stock of course.......
(Note: obviously the online version is not there(!), but an updated source disk is, with
a new improved PEDUMP etc.)
As +Fravia points out, searching is the first tool. Seek and ye shall find.