asmDiff is an binary assembly search, diff and disassembly tool. It supports Windows PE (exe/dll) and Linux ELF binary format compiled for x86 and x68_64 architectures. It is particular useful when searching for asm functions, instructions or memory pointers in a patched, updated or otherwise modified binary. Try out the live demo and read the paper below.
Note: this project is not finished yet, so there may be TODOs and bugs. We stay closed source as long as we do not know how to proceed with it. Feel free to contact us on questions, suggestions, or license issues.
For quick starters: upload two exe/dll files and do: "asmdiff search 0x00402000 -o old.exe -n new.exe"
The "diff" mode is not ready yet. But the far more powerful search and batch search mode are implemented.
==duschkumpane==
|=-------------------------------=[ ASM DIFF ]=-------------------------------=|
|=----------------------------------------------------------------------------=|
|=------------------=[ by defragger - rene@0xaa55.org ]=----------------=|
|=------------------=[ and willsteel - michael@willigens.de ]=----------------=|
|=----------------------------------------------------------------------------=|
|=-----------------------------=[ Aug 04, 2011 ]=-----------------------------=|
<---------------------------( 0x00 Table of Contents )------------------------->
- 0x10 Introduction
- 0x20 PE file extraction
- 0x30 Detecting logical blocks
- 0x40 Finding function bounds
- 0x41 Recursive method
- 0x42 Linear method
- 0x43 Combined method
- 0x50 Function matching
- 0x51 ASM metrics
- 0x52 Metric distance
- 0x53 Metric strength
- 0x54 Metric uniqueness
- 0x55 Metric rank
- 0x56 Metric score
- 0x60 Data-Section
- 0x61 Reverse lookup
- 0x70 Diffing ASM binary code
- 0x80 Use Cases
- 0x81 Binary Search
- 0x82 DLL Search
- 0x83 Binary Diff
- 0x84 Security Issues
- 0x90 Next steps
- 0x91 Branch instruction blocks
- 0x92 Call hierarchy graph
- 0xA0
<------------------------------------------------------------------------------>
|=---------------------------=[ 0x10 Introduction ]=-------------------------=|
The process of reverse engineering is an important part of hardening soft- and
hardware systems. Because this method is rather expensive in a matter of
development time, it can be very helpful to find differences or similarities
between two assembly binaries of a different version. But this is not a trivial
task. Some of the most obvious problems are: address shift, obfuscation,
register usage, compile flags, etc..
The approach of ASMdiff is to ignore the impact of those effects. This is done
by using assembler code metrics that are robust against these effects.
ASMdiff was started as part of a reverse engeenering project. The scientific
nature of this approach made us work on this article.
|=-------------------------=[ 0x20 PE file extraction ]=---------------------=|
First part in building a reverse engineering tool is understanding the format
of target binaries. The most important file-format for reversers is Microsoft's
Portable Executable (also called PE). The second most interesting format is
Unix/Linux ELF. For PE files any nessecary information can be found in MS's
header file <winnt.h>. To reverse a given binary assembler file it is
inevitable to read the various binary sections of the file. Those sections
contain the executed code (.text sections) and initialized or uninitialized
data (.data sections). There are platform specific differences but the basic
idea behind it the same.
The first part in the PE file is the IMAGE_DOS_HEADER (winnt.h). The most
important part here is the e_lfanew member of this struct. Its the offset to the
IMAGE_NT_HEADER which contains the IMAGE_OPTIONAL_HEADER. The various sections
are lying directly behind the IMAGE_NT_HEADERS:
--
| void* first_section = IMAGE_DOS_HEADER->e_lfanew+sizeof(IMAGE_NT_HEADERS)
|__
That is the offset from the beginning of the opened file. The next section is
sizeof(IMAGE_SECTION_HEADER) bytes behind the previous one. Section count is
defined by IMAGE_NT_HEADER->FileHeader->NumberOfSections. The CPUs entry-point
to a PE binary is:
--
| IMAGE_OPTIONAL_HEADER->AddressOfEntryPoint+IMAGE_OPTIONAL_HEADER->ImageBase
|__
This section is also called the ".text" section of a PE file. Normally, the
address of this section is:
IMAGE_OPTIONAL_HEADER->ImageBase+section->VirtualAddress
The offset to the data of that section is:
--
| section->PointerToRawData
|__
Finally the length of a section can obtained by:
--
| section->Misc->VirtualSize.
|__
There are other sections that can be of interest as well, such as ".tls" or
".rdata" sections that contain static content.
|=----------------------=[ 0x30 Detecting logical blocks ]=-------------------=|
In order to show differences between arbitrary data one must be able to detect
certain logical blocks from the nature of that data. In case of text files this
is easy: newlines, paragraphs, pages. In the absence of logical blocks, one can
only do things such as measuring the edit distance. When diffing machine level
code it is obvisous to take the function calls and their boundaries as blocks.
Other blocks can be defined as well but using the function bounds is a really
good start.
We are currently evaluating the usage of branch instructions as logical blocks.
This is more complex but can help when functions are getting inlined. Also the
call hierarchy which forms a graph like structure can be taken into account
when comparing assembly blocks. A generic approach would be to use pattern
recognition techniques to detect the logical blocks of the data automatically.
This can be helpful when the nature of the analized data is completely unkown.
|=----------------------=[ 0x40 Finding function bounds ]=--------------------=|
To find assembler function boundaries we first need to find the function
entry-points. Those entry-points are typically referenced by a "CALL"
instruction. Most of them are followed by the relative offset to the real
entry-point address. There are three common ways of function calls:
1. The regular relative offset call instruction:
CALL 0x0052F358
Note: The destination address is the relative offset from the next instruction.
Meaning the length of the CALL instruction itself has to be taken into
consideration. On 32Bit x86 assembler this is 5 bytes. This is where the mystic
"len+=5" from various cadecaving tutorials came from.
2. The called address is loaded as a static memory pointer into a register.
After that a call is made on that register. Those calls may look like this:
MOV EAX, [0x00B6EF2C]
CALL EAX
3. The offset address is calculated during runtime in non-static fashion. This
is most commonly used on callback methods, where a function pointer is attached
to some kind of memory structure.
MOV EAX, [ECX+0xD1]
CALL EAX
Its is hard to detect entry-points of dynamic function callbacks without
running the programm itself. So we start by collecting the entry-points of the
direct referenced functions. Following pseudo code illustrates this step:
--
| while (instruction = assemble_next())
|
| if (instruction.op == 0xE8) /* x86 CALL */
| add_entry_point(current_offset + current_cmd_len + instruction.arg1)
|
| if (instruction.asm == 'push' || instruction.asm == 'mov')
| if (address_within_textsection(instruction.argX))
| add_entry_point(instruction.argX)
|__
There are other ways of asm function calls, but we make a break here to face
the next problem of finding the function boundaries. At this time only the
entry-points are known, but we need to know where a function starts, where it
ends and where the next function starts. To make things more complicated, there
are some functions where the entry-point is not the starting offset of a
function. This happens when a function makes a back-jump in a region somewhere
before the entry-point.
|=-------------------------=[ 0x41 Recursive method ]=------------------------=|
To find a functions boundaries we need to follow the path that the CPU will
take through this function. As there are many conditional jumps we need to be
able to follow each resulting sub-path through the function. That leads to the
first basic approach by following each sub-path in recursive fashion until
certain abort conditions are met. Following conditions suit our needs:
1. reaching an asm RET, RETN
2. crossing the previous or next asm function
3. reaching an invalid instruction
4. crossing another sub-path that has already been analyzed
The advantage of this method is its accuracy. On the other hand, the major
drawback is its performance. In regular assembly binaries there tons of
assembler function with ton of sub-path that we need to follow. Sure the
overall number can be reduced by making some assumprions. But even then it
takes several minutes to analyze a 3Mbyte text section on a recent CPU. This
leads us to the next paragraph.
|=---------------------------=[ 0x42 Linear method ]=-------------------------=|
Most of the asm functions start with their entry-point address and stop right
before the first instruction of the next method. Sure, the used compiler could
behave different, but most implementations behave this way because they are
optimized for speed and it costs some CPU cycles to do a back jump. We can
benefit of this fact by gathering as much information as possible in a single
run through the text section. Following pseudo code illustrates this process:
--
| function;
| next_function;
| last_function;
| offset;
| isntruction;
|
| while (instruction = assemble_next())
| function->max_address = Math.max(offset, function->max_address)
|
| boolean FIN = false;
|
| if (instruction == JUMP)
| new_offset = get_jump_offset(instruction)
| if (new_offset > function->max_address < next_function->min_address)
| function->max_address = new_offset
| if (new_offset < function->min_address > last_function->max_address)
| function->min_address = new_offset
|
| // set terminator on negative jumps or reaching of next function
| if (newoffset > offset || new_offest >= mext_function->min_address)
| FIN = true;
|
| if (isntruction == "RET"/"RETN"/"INVALID")
| FIN = true;
|
| if (FIN && off <= function->max_address)
| function = next_function ...
| skip_assemble_to(function->entry_point)
|__
This way we are only pushing the max_address of each function until the abort
criteria is reached. As there are:
1. RET / RETX reched
2. Invalid instructions
3. Unconditional back-jumps
The min_address is also pushed upwards, but resulting code that has not been
analyzed yet is not processed. So we can detect a single back-jump but not a
back-jump where the reached code also includes another back-jump.
|=--------------------------=[ 0x43 Combined method ]=------------------------=|
The last thing that can happen now are gaps between function blocks where
certain double back-jumps were taken. These can be easily scanned for and
corrected by the recursive method introduced in prior in chapter [0x41].
|=-------------------------=[ 0x50 Function matching ]=-----------------------=|
Now that we know where assembler functions begin and where they end we can
analyze them. Our goal is to get a fingerprint of each function. Those
fingerprints shall be robust agains compiler specific flags, optimizations,
vendor etc.. Now, all functions can be compared against each other in order to
find possible matches. So we also have to find a method to compare the function
fingerprints in means of equalness.
|=----------------------------=[ 0x51 ASM metrics ]=--------------------------=|
Robust and compareable fingerprints can be extracted using a form of
information geometry. Each ASM function is measured multiple times, using a set
of metics (N). This way we creat a N-dimensional vector space where each ASM
functions defines a point within this space. To reduce the number of
duplicates, we use serveral distinct metrics. Our current implementation has
~20 metrics. In order to work, a good metric has to match certiain criteria:
1. It counts some low-level logic of the assembled code. Good metrics count
the usage of instructions that can hardly be replaced by other instructions.
For example the square-root function "sqrt". Also counting the resources a
function uses is a good idea: reference count, calls or stack_size. The idea
behind that is that almost any compiler is forced to build executables that
give a good performance when running the binary code. So we can assume they
will use a similar number of "sqrt" calls in whatever configuration.
2. The metrics have to be distinct from each other in a way that they do not
count the same thing partially or twice. A Counter-Example are two metrics
where one counts the number of ASM bytes, and the other the number of
instructions. It is easy to see that there exists some kind of correlation
between those metrics.
3. It shall be robust against common compiler flags, optimizations or
obfuscation effects. This is important so we are able to find the "same"
function in a new binary even if it has changed a lot due to the usage of a
different compiler configuration.
4. If (3) is not met in some case, the metric shall add a similar penalty to
all analyzed functions so that the loss of accuracy can are compensated by
other metrics. For example, if we count the general-purpose register usage,
and the gcc flag "-fomit-fame-pointer" is used in the new binary. We add the
same penalty to each method that is "-1". Note: Because EBP is now free to use
the penalty may also be "0".
|=-------------------------=[ 0x51 Metric distance ]=------------------------=|
To match functions agains each other we can simply look for the closest match
in our n-dimensional vektor space. This can be done in geometric fashion:
--
| metric_vevtor_a;
| metric_fector_b;
| delta = sqrt((b1-a1) + (b2-a2) + ... + (bn-an))
|__
The metric distance also gives a good tuning point for the application when we
are not able to find good matches. We simply multiply a priority vector in
front of each metric, so that the influence of "bad" metrics can be reduced or
disabled at all. Each element in such a vector has to be in the interval of
[0.0 and 1.0].
|=-------------------------=[ 0x51 Metric strength ]=------------------------=|