Chapter 18. Total Control with Native Code Patching

	“Every man has a scheme that will not work.”
	--Howe's Law

Why and When to Patch Native Code

We have looked at various techniques for replacing, patching, and reverse engineering Java classes. All of these techniques work at the source code or bytecode level, which has confined us to the high-level Java world. The Java Virtual Machine (JVM) interacts with the operating system (OS) via native libraries, which means that low-level operations are not coded in Java and therefore cannot be manipulated by the techniques presented so far. For instance, System.currentTimeMillis() is a native method, and the defineClass() methods of ClassLoader delegate the actual class definition to a native method called defineClass0. Although patching a Java class is typically easier and cleaner, in some cases you have no option but to patch the native code. This chapter presents several low-level techniques of native code patching that, together with the earlier techniques, give you total control over the JVM.

I would like to bring up two important points before we get our hands dirty with native patching. The first has to do with the legality of the work we are about to perform. As discussed earlier in this book, it is your responsibility to check that reverse engineering and patching are not prohibited by the license agreement of the product with which you are working. Besides being illegal, stealing intellectual property from other people is unethical, so I highly encourage you to use the presented techniques only for a good cause. The second point is that working with native code requires a solid knowledge of the C language, some basic understanding of machine instructions, and familiarity with binary file formats. Binary files have different formats on different platforms, and even two different compilers can produce different executable files for the same platform. For instance, object files compiled by a Microsoft C compiler differ from files created by a Borland C compiler. Patching binary code requires inserting machine instructions into the existing machine code and manipulating the binary file. This is like venturing into uncharted waters, so be prepared to deal with challenges and do not expect everything to work from the start. The absence of a common, well-defined format and the complexity of dealing with raw machine instructions result in a lack of good tools of the kind that have helped us so much previously. For instance, no decompiler can produce C code from a binary executable.

The following is a list of prerequisites for this chapter:

An understanding of C language
An ability to write and compile native libraries for the target platform
A basic knowledge of machine instructions and assembly language
Some familiarity with the Java Native Interface (JNI)

Native Code Usage in the Java Virtual Machine

Most of the code executing inside the JVM, including the core classes, is written in Java. This makes perfect sense because Java is clean, safe, and platform independent. However, at some point the JVM needs to interact with the hardware; to do that it relies on the OS. The low-level operations, such as reading a block of bytes from a hard disk or creating a network socket, are delegated to the native libraries that make OS-specific calls. Figure 14.1 in Chapter 14, “Controlling Class Loading,” showed a primitive diagram of class and native code loading by the JVM. Most of the time the native libraries simply delegate the call to the operating system in a platform-dependent manner. The native libraries for Java are typically written in C or C++ and are accessed via the JNI.

JNI Overview

To be cross platform, Java has to use a layer of abstraction between itself and the operating system. This level of abstraction is implemented in a set of native libraries that are accessed through the JNI. JNI is a specification describing how to define native methods in Java and how to provide the implementation of those methods in C libraries. In other words, JNI provides a contract between Java classes and native libraries.

The Java side of the contract is simple: To declare a native method, you simply add a keyword (native) to the method declaration and end the declaration with a semicolon. Let's assume that a Java program needs to find out memory parameters such as the total amount of physical and virtual memory and the amount of available physical and virtual memory on the local machine. The java.lang.Runtime class can provide only information about the memory parameters for the JVM, not the total memory properties, so we have to resort to making a native call to the OS. To achieve that, we write a Java class called OSMemoryInfo having a set of native methods. This is the declaration of the method returning the total physical memory:

public native static long getPhysicalTotal();

After the method is declared, the class can be compiled and used by other Java classes. An attempt to execute the method results in java.lang.UnsatisfiedLinkError because no implementation is provided for getPhysicalTotal() yet. To execute the native methods, the Java class that declares them must load a native library that provides the method implementations. The native libraries are OS dependent, which means a different version of the library must be written for every platform on which the application is required to run. The library is loaded only by name because the extension is platform dependent. On Windows, the library file names end with .dll; on Unix they end with .so. Listing 18.1 shows how to load a library called OSMemoryInfo.

Example 18.1. Loading a Native Library from a Java Class

public class OSMemoryInfo {
    static {
        try {
            System.loadLibrary("OSMemoryInfo");
        } catch (UnsatisfiedLinkError x) {  // loadLibrary() reports failure with an Error, not an Exception
            System.err.println("Error while loading native library");
            x.printStackTrace(System.err);
            System.exit(1);
        }
    }
    ...
}

The library is loaded by a static initializer that is executed when the class is first loaded into a JVM. This step completes the contract on the Java side and brings us to the native code side.
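As a minimal sketch of the failure mode described above, the following class declares a native method but never loads a library for it, so the call fails with java.lang.UnsatisfiedLinkError. It also uses System.mapLibraryName() to show how a logical library name maps to a platform-specific file name. The class NativeLinkDemo and its helper methods are hypothetical and are not part of the CovertJava sources.

```java
final class NativeLinkDemo {
    // Hypothetical native method: no library implementing it is ever loaded.
    static native long getPhysicalTotal();

    /** Returns true if calling the unimplemented native method fails to link. */
    static boolean failsToLink() {
        try {
            getPhysicalTotal();
            return false;
        } catch (UnsatisfiedLinkError e) {
            return true;
        }
    }

    /** Maps a logical library name to the file name used on the current platform. */
    static String fileNameFor(String libraryName) {
        return System.mapLibraryName(libraryName);
    }

    public static void main(String[] args) {
        System.out.println("Library file name: " + fileNameFor("OSMemoryInfo"));
        System.out.println("Call failed to link: " + failsToLink());
    }
}
```

Note that the error is an Error subclass rather than an Exception, so a handler must catch UnsatisfiedLinkError (or Throwable) to see it.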

To execute the OSMemoryInfo class, the JVM has to be provided with a library containing implementations of all the native methods. The location of the library is determined by a platform-specific search path. On Windows, the search path includes the current directory and the directories specified by the PATH environment variable. On Unix, the search path is determined by an environment variable whose name depends on the Unix flavor. For instance, on Solaris its name is LD_LIBRARY_PATH and on HP-UX it is SHLIB_PATH. The name of the native library file is also OS specific. On Windows, our native library would be named OSMemoryInfo.dll, whereas on Unix it would typically be libOSMemoryInfo.so. The library is required to export functions that match the names and the declaration syntax of the native methods defined in the Java class. JNI specifies the type mapping between C types and Java types and provides extensive mechanisms for accessing Java objects, throwing exceptions, and manipulating the data types. For instance, a C function that implements the Java method getPhysicalTotal(), shown earlier, should be declared as follows:

JNIEXPORT jlong JNICALL 
   Java_covertjava_nativecode_OSMemoryInfo_getPhysicalTotal(JNIEnv *, jclass);
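The naming rule behind such a declaration can be sketched in Java. The sketch below follows the JNI convention for non-overloaded methods: the prefix Java_, the escaped fully qualified class name, an underscore, and the escaped method name. The class JniNames is a hypothetical helper written for illustration; overloaded methods, which also need an argument signature suffix, are not handled.

```java
final class JniNames {
    /** Builds the C function name JNI expects for a non-overloaded native method. */
    static String functionName(String className, String methodName) {
        return "Java_" + escape(className) + "_" + escape(methodName);
    }

    /** Applies the JNI escapes: package separators become '_', and '_' becomes "_1". */
    private static String escape(String name) {
        StringBuilder out = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                out.append('_');
            } else if (c == '_') {
                out.append("_1");
            } else if (c == ';') {
                out.append("_2");
            } else if (c == '[') {
                out.append("_3");
            } else if (c < 128 && Character.isLetterOrDigit(c)) {
                out.append(c);
            } else {
                // Any other character is written as _0 followed by four hex digits.
                out.append(String.format("_0%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(functionName("covertjava.nativecode.OSMemoryInfo", "getPhysicalTotal"));
    }
}
```

In practice the javah utility generates these names, so a helper like this is useful mainly for checking the export names found in an existing library.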

JNI Implementation Example

Learning by example is the most effective way to learn, so let's work with the OSMemoryInfo class presented in the previous section. Recall that the class was designed to use JNI to obtain memory information from the operating system. It has four native methods, returning the total and available amounts of physical and virtual memory. All four methods follow the same syntax as the getPhysicalTotal() declaration shown earlier, and the entire class source can be found in CovertJava/src/covertjava/nativecode/OSMemoryInfo.java.

The easiest way to find the right syntax for the C functions that correspond to Java native methods is to use the javah utility. javah generates a C header file based on the provided Java class file. For every native method found in the Java class, javah creates a function signature in the output C header file. Running javah on the covertjava.nativecode.OSMemoryInfo class produces a file, covertjava_nativecode_OSMemoryInfo.h, that can also be found in the CovertJava/src/covertjava/nativecode directory. Take a moment to examine the function declarations and how Java data types are mapped to C types.

The next step is to code the bodies of the four functions declared in covertjava_nativecode_OSMemoryInfo.h. To keep the example concise, we will look at only the Windows implementation because the Unix implementation differs only in the function call that is made to the OS. All four functions use the same Win32 API function, GlobalMemoryStatusEx, which returns a slew of information about the OS memory. The function bodies are coded in OSMemoryInfo.c, which can be found in the CovertJava/src/covertjava/nativecode directory. Listing 18.2 shows the implementation of Java_covertjava_nativecode_OSMemoryInfo_getPhysicalTotal().

Example 18.2. Native Implementation of getPhysicalTotal()

JNIEXPORT jlong JNICALL 
  Java_covertjava_nativecode_OSMemoryInfo_getPhysicalTotal
    (JNIEnv *env, jclass cls)
{
    MEMORYSTATUSEX memStat;
    memStat.dwLength = sizeof (memStat);
    if (GlobalMemoryStatusEx(&memStat) == 0 && (*env) != 0) {
        jclass exceptionCls = (*env)->FindClass(env, "java/lang/Exception");
        char msg[100];
        sprintf(msg,
                "Failed to get memory information from the OS, error code %li",
                (long)GetLastError());
        if (exceptionCls != 0)  /* Raise Java exception */
            (*env)->ThrowNew(env, exceptionCls, msg);
        return -1;
    }
    return (jlong) memStat.ullTotalPhys;  /* no (int) cast: it would truncate values above 2GB */
}

There's nothing complicated here—just a call to a Win32 function, a check for an error, and a return of the result. In the spirit of Java ideology, the C function we have created throws a java.lang.Exception if the Win32 API call fails.

With the function bodies coded, we can build a Windows Dynamic Link Library (DLL). I am going to use the MSVC compiler, which is included free of charge in the Windows SDK and .NET SDK downloads. The makefile that builds the DLL can be found in the CovertJava/build directory, and the batch file CovertJava/bin/build_native.bat can be used to run nmake.exe. You are free to use the compiler and the build method of your choice, but I recommend using the Microsoft compiler for reasons explained later. If you want to rebuild the native libraries, be sure to update all the paths inside build_native.bat.

We can now run the OSMemoryInfo class's main() method, which outputs the values received from the native methods. Executing the CovertJava/bin/OsMemoryInfo.bat file that invokes the main() method produced the following output on my machine:

C:\Projects\CovertJava\bin>OsMemoryInfo.bat
Total     Physical Memory: 535121920
Available Physical Memory: 199958528
Total     Virtual  Memory: 2147352576
Available Virtual  Memory: 1960931328

We now have a working JNI implementation that we can experiment with.

Generic Approaches to Patching Native Methods

Knowing the basic principle of how Java code interacts with native code and the architecture of JNI, we can now look at the methods of overriding the native functions. Just as with bytecode patching, the goal is to intercept a native method invocation and provide our own implementation of it. The patch should be transparent to the caller, requiring no changes in the Java client code. Let's examine three approaches, each with its own pros and cons.

Patching a Java Method Declaration

The easiest solution is to patch the Java class that declares the method, removing the native keyword and replacing the declaration with a Java implementation. The implementation can delegate to a helper class that provides the actual method logic. Even though it's simple, this method is the most effective and should be your first choice. Because all the changes are done at the Java level, you don't need to delve into C programming and binary file manipulations. A complication to this approach is a situation in which you actually want to make a native call but need to change some of its logic. For example, assume that a native method creates an OS user in the Administrators group and a new requirement calls for the user to be created in the Users group instead. Here you cannot avoid calling a native method that interacts with the OS. Even in this case, however, you can patch the original Java method to be non-native and then have it call a different native method. That native method is implemented in a custom native library with an alternative name and creates the user at the OS level. The only time when declaration patching cannot be used is when a license agreement prohibits reverse engineering of Java classes but does not restrict the modification of native libraries.
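Applied to the OSMemoryInfo example, a declaration patch might look like the following sketch. The helper class OSMemoryInfoHelper and the value it returns are hypothetical placeholders chosen for illustration; the point is only that the method keeps its name and signature, so callers need no changes.

```java
/** Patched stand-in for the original class: the method is no longer declared native. */
final class OSMemoryInfo {
    // Original declaration: public native static long getPhysicalTotal();
    public static long getPhysicalTotal() {
        return OSMemoryInfoHelper.getPhysicalTotal();
    }

    public static void main(String[] args) {
        System.out.println("Total Physical Memory: " + getPhysicalTotal());
    }
}

/** Hypothetical helper that holds the replacement logic. */
final class OSMemoryInfoHelper {
    static long getPhysicalTotal() {
        return 10L; // placeholder value, for illustration only
    }
}
```

Keeping the replacement logic in a separate helper limits the change in the patched class to a one-line delegation.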

Substituting Native Libraries

The second approach is to replace the original native library with a substitute that exports the same functions that are exported by the original library. The substitute functions delegate to the original functions unless an alternative implementation is required. The substitute library acts like a smart proxy to the original library, capable of preprocessing, post-processing, and completely overriding the method calls. This approach works well if the library has few functions, or if patching is needed for most of the methods exported by the library. Because all the work can be done in C, this is a relatively simple approach requiring no changes to either the Java classes or the binary machine code. If the number of exported functions is high, coding the substitute library can become tiresome. Just as with patching the Java method declaration, a potential problem can occur with trying to keep some of the logic from the original native method. It is pretty much an all-or-nothing approach—you either delegate to the original method or you don't.

Patching Native Code

Do you remember one of the questions we contemplated in Chapter 15, “Replacing and Patching Core Java Classes”? It was, “What do we do when we have tried every road but failed?” I don't expect this to be quoted on the Internet, but in a way this is what this book is all about. The previous two approaches provide clean and relatively simple solutions to native code patching, but they do not live up to the promise of “total control.” To get total control, we must be able to hack the native libraries and patch the code similarly to how we have done it with the bytecode. The third approach does exactly that: It relies on exploring the binary format of the library, finding the machine code to be changed, and patching it with the new logic. It is not an easy path, which is why I recommend using the first two approaches before attempting this one. Patching native code is platform specific, requiring a thorough understanding of the executable file format and knowledge of assembly language and processor addressing. But the payoff is great, too. The technique we will study here can be used on any executable, not just JNI libraries. It also gives you an insight into the executable file formats and how the operating system loads and runs programs. The following sections explore patching of native code on the Windows and Unix platforms.

Patching Native Code on the Windows Platform

Understanding this section requires a basic knowledge of assembly language and some familiarity with the Portable Executable format. Hacking and patching is a rather popular subject among gamers and college students, which results in an abundance of utilities that greatly simplify the task on the Windows platform. Instead of having to manually edit the binary code and insert new machine instructions, we can rely on the utilities and libraries to do the low-level patching.

Portable Executable Format

Windows Portable Executable (PE) format is loosely based on Unix's Common Object File Format (COFF). It describes the binary structure of an executable file that can run on any Win32-compatible OS. Executable files include EXE, DLL, SCR, VxD, and other types. Structurally, a PE file is much like a JAR or Zip archive that contains other files or sections. A PE file has a DOS header; a PE header; and a section table followed by a number of sections representing various resources such as text, data, and UI resources. Table 18.1 shows the structure of a PE file.

Table 18.1. PE File Structure

ELEMENT	DESCRIPTION
DOS MZ header	Provided for backward compatibility to ensure that the file is recognized as a valid executable when run under MS-DOS.
DOS stub	A small built-in program that usually just outputs a line saying that the file must be run on Win32.
PE header	Contains various information about the PE portion of the file, such as the number of sections and the entry point addresses.
Section table	An array of structures describing each section. The structures contain information such as the section attribute, file offset, and virtual offset.
`.text` section	Contains the program binary code.
`.data` section	Contains the initialized data.
`.idata` section	Contains the import table.
`.edata` section	Contains the export table.
Debug symbols	Various debugging information such as line numbers.
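To make the layout in Table 18.1 more concrete, here is a minimal Java sketch that walks from the DOS MZ header to the PE header and reads the number of sections. It relies on three facts about the format: the file starts with the bytes "MZ", the 4-byte field at offset 0x3C holds the file offset of the PE header, and the PE header starts with the signature "PE\0\0" followed by the COFF file header. The class PeHeaderReader is a hypothetical name, and the sketch does no validation beyond these checks.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

final class PeHeaderReader {
    private static final int DOS_MAGIC = 0x5A4D;        // "MZ" read as a little-endian short
    private static final int PE_SIGNATURE = 0x00004550; // "PE\0\0" read as a little-endian int
    private static final int E_LFANEW = 0x3C;           // DOS header field with the PE header offset

    /** Returns the number of sections declared in the PE header of the image. */
    static int sectionCount(byte[] image) {
        ByteBuffer buf = ByteBuffer.wrap(image).order(ByteOrder.LITTLE_ENDIAN);
        if (image.length < 0x40 || (buf.getShort(0) & 0xFFFF) != DOS_MAGIC) {
            throw new IllegalArgumentException("Missing DOS MZ header");
        }
        int peOffset = buf.getInt(E_LFANEW);
        if (peOffset < 0 || peOffset > image.length - 8
                || buf.getInt(peOffset) != PE_SIGNATURE) {
            throw new IllegalArgumentException("Missing PE signature");
        }
        // After the 4-byte signature come Machine (2 bytes) and NumberOfSections (2 bytes).
        return buf.getShort(peOffset + 6) & 0xFFFF;
    }

    public static void main(String[] args) throws Exception {
        byte[] image = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Sections: " + sectionCount(image));
    }
}
```

Running the class with the path to a DLL such as OSMemoryInfo.dll as its only argument prints the section count recorded in that file's header.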

A great way to explore the internal structure of a portable executable is to open it in the PE Explorer utility. It is a well-written shareware program that displays the headers, sections, and contents of the known PE sections in a GUI window. PE Explorer also includes a disassembler that can be used to study the machine code inside the file. PE Explorer can be downloaded for free evaluation from http://www.heaventools.com. For instance, loading the OSMemoryInfo.dll file we created earlier into PE Explorer enables us to see the sections and exports of that DLL. Viewing exports reveals that the DLL exposes four functions with mangled names. We can see that Java_covertjava_nativecode_OSMemoryInfo_getPhysicalTotal is exported as Java_covertjava_nativecode_OSMemoryInfo_getPhysicalTotal@8. The C compiler automatically appended an @ followed by the number of bytes the parameters take on the stack to all the functions, following the __stdcall convention.

Because we are interested in patching the function logic, we need to be able to view its corresponding machine code. C language source code is compiled directly into binary machine code. Unlike Java bytecode, which needs to be interpreted by the JVM or further compiled by the JIT compiler, machine code is executed directly by the processor. The direct implication of this is that the compiled executable can run only on the processor architecture for which it is built. The indirect implication is that there is no easy way to decompile the machine code back into the source code. The two are very different; there is no standard as to how to represent C language constructs with machine instructions; and every compiler makes different optimizations that further complicate the decompiling. Therefore, the only way to reverse engineer binary executables is to work at the assembly language level. Assembly language is a human-readable representation of the machine instructions. It is very primitive, but its code corresponds directly to the way in which the processor will execute it. We are not going to write any code in assembly language, but if you want to learn more about it, pick up a book from Amazon.com or just read the online documentation. For Intel architectures, I recommend Assembly Language for Intel-Based Computers by Kip R. Irvine (Prentice Hall, ISBN: 0130910139).

Let's try to locate the code of the Java_covertjava_nativecode_OSMemoryInfo_getPhysicalTotal function inside the binary file using PE Explorer. If you haven't done it yet, download, install, and run PE Explorer; then load OSMemoryInfo.dll into it. Take a look at the exports to see the names of the functions exposed by the DLL. Then run the Disassembler from the Tools menu with the default settings. You will see a blue screen showing panels with various information. The main panel shows the disassembled code for the entry point into the DLL. Because we are interested in the getPhysicalTotal() code, we will use the search feature to locate it quickly. Select Find from the Search menu and in the Find dialog box, type getPhysicalTotal in the text field. The Name List panel should highlight an item called Java_covertjava_nativecode_OSMemoryInfo_getPhysicalTotal, and its disassembled code should be displayed in the main panel, as shown in Figure 18.1.

Figure 18.1. PE Explorer showing the disassembled code of getPhysicalTotal().

With a basic understanding of assembly language, you should be able to discern that the function starts by saving the stack pointer and allocating space on the stack for the local variables. It then calls GlobalMemoryStatusEx from the KERNEL32.dll module and checks whether the return value is 0. If the result is 0, it checks whether the env parameter to getPhysicalTotal() is 0; if it's not, it formats an error message and calls a subroutine to throw an exception. Otherwise, it uses the value from a local structure populated by GlobalMemoryStatusEx as the return value. It then restores the stack pointer and returns. What we see is a virtually one-to-one match to the C code of the function body because getPhysicalTotal uses only primitive operations such as comparison and function calls. We are now ready to patch that code with a new logic.

Patching a Native Function Using the Function Replacer Utility

As I stated earlier, the process of patching a native function involves locating the binary code of the function and replacing a portion of it with new code or a diversion to the new code. The diversion can be a simple JMP assembly instruction to the address where the new instructions begin or a piece of code that loads a dynamic library and calls a procedure from it. The patch must be applied carefully to avoid unsettling the state of the registers and the call stack. Another delicate issue is the fate of the code that was overridden with the diversion code. If you don't need to execute the original code, the patch code can be written over the original instructions. However, if the patch adds logic on top of the original logic by doing pre- or post-processing, the original code must be relocated to a different space before being replaced with the diversion. As you can see, binary patching is a rather complex and fragile process requiring a thorough analysis of the state of the caller and the code being called. That is why I recommend that you patch the Java method declaration or substitute the entire library as the first choice.
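The simplest form of diversion, a JMP to the new code, can be sketched as follows. The sketch assumes 32-bit x86, where a near JMP is encoded as the opcode 0xE9 followed by a 4-byte little-endian displacement measured from the end of the JMP instruction itself. The class JmpPatch and the addresses used in main() are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class JmpPatch {
    static final int JMP_LENGTH = 5; // 1-byte opcode + 4-byte relative displacement

    /** Encodes a near JMP located at address 'from' that transfers control to 'to'. */
    static byte[] encodeJmp(int from, int to) {
        int displacement = to - (from + JMP_LENGTH); // relative to the next instruction
        return ByteBuffer.allocate(JMP_LENGTH)
                .order(ByteOrder.LITTLE_ENDIAN)
                .put((byte) 0xE9)
                .putInt(displacement)
                .array();
    }

    public static void main(String[] args) {
        // Hypothetical addresses of the patched function and of the new code.
        byte[] jmp = encodeJmp(0x10001000, 0x10002000);
        StringBuilder hex = new StringBuilder();
        for (byte b : jmp) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

Writing these bytes over the start of a function destroys whatever instructions were there, which is why the original code has to be relocated first whenever it still needs to run.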

No reliable tools can safely do the binary patching. The only decent utility that I was able to find and use with marginal success (it didn't work under JDK 1.4) is a Function Replacer written by a member of the Execution coding group with the flamboyant name of Death. It can be downloaded from the Execution group's Web site, which is currently hosted at http://execution.cjb.net. The idea behind the utility fits our requirements perfectly. Function Replacer replaces an exported function from one Win32 DLL with an exported function from another DLL. The replacement function has to have the same number of parameters and the same calling style to preserve the state of the stack. We'll use this utility to patch the getPhysicalTotal() method of OSMemoryInfo.dll with a stub from another DLL that is hardcoded to always return a value of 10. Listing 18.3 shows the source code for the patch.

Example 18.3. getPhysicalTotal() Patch Source Code

JNIEXPORT jlong JNICALL Java_covertjava_nativecode_OSMemoryInfo_getPhysicalTotal
    (JNIEnv *env, jclass cls)
{
    return (jlong) (int) 10;
}

The DLL containing the patch is called OSMemoryInfoPatch.dll and is prebuilt for this book. It can be rebuilt using the CovertJava/bin/build_native.bat script, provided you have installed a C compiler and updated the build script for it. Make a backup copy of the OSMemoryInfo.dll and run Function Replacer. In the Function Replacer UI, specify OSMemoryInfo.dll as the To-Be-Patched DLL and select Java_covertjava_nativecode_OSMemoryInfo_getPhysicalTotal@8 (the second item in the list box) as the function to replace. Specify OSMemoryInfoPatch.dll as the Replacer DLL and select Java_covertjava_nativecode_OSMemoryInfo_getPhysicalTotal@8 as the function to replace with. Click the Replace Function button and be sure that the utility does not report any errors. Now try to run the Java application and see whether the patch has worked. Make sure that the current JDK is 1.2 or 1.3 and run CovertJava/bin/OSMemoryInfo.bat. On my machine I got the following output:

C:\Projects\CovertJava\bin>OsMemoryInfo.bat
Total     Physical Memory: 10
Available Physical Memory: 318607360
Total     Virtual  Memory: 2147352576
Available Virtual  Memory: 1992871936

Instead of printing 535121920, which is the real value of the total physical memory on my machine, the Java native method now returns 10. The patch has worked, so let's investigate the magic behind it. Function Replacer works by writing bootstrap code over the original code of the method and inserting a call to the replacement procedure. The bootstrap code, written at the start of the original function code, loads the patch DLL using a LoadLibrary() API call and locates the replacement function using GetProcAddress(). This is a standard way of dynamically loading a DLL on the Win32 platform. After the replacement function is located, the control is transferred to it via a JMP instruction. The assembly code of the bootstrap is shown in Listing 18.4.

Example 18.4. Patched Assembly Code of getPhysicalTotal()

push    esi
call    osmemory.10001006
pop     esi
sub     esi,401005
lea     eax,dword ptr ds:[esi+40102c]
push    eax
call    dword ptr ds:[<&kernel32.LoadLibraryA>]
push    ebx
lea     ebx,dword ptr ds:[esi+401042]
push    ebx
push    eax
call    dword ptr ds:[<&kernel32.GetProcAddress>]
pop     ebx
pop     esi
; OSMemoryInfoPatch._java_covertjava_nativecode_osmemoryinfo_GetPhysicalTotal@8
jmp     eax
; Define strings for library and patch function name
db     ...

Because the control is transferred via a JMP instruction, the replacement procedure returns directly to the caller instead of going back to the bootstrap code. Analyzing the code helps us understand the limitations of the Function Replacer design. The size of the bootstrap code depends on the lengths of the patch DLL name and the function name, so the approach does not work for native functions that are shorter than the bootstrap. Because the bootstrap code overwrites the original code, the original function can no longer be called.

Another problem with Function Replacer is that it crashes the JVM when the patch is running under JDK 1.4.2. Even though the assembly code is valid and the patched DLL can be loaded by C programs without any problems, it seems to interfere with the internal state of the JVM. Function Replacer makes patching easy, but the utility is unreliable. We will therefore look at an alternative approach of using a powerful library to implement and install the patch manually.

Manual Patching Using Microsoft Detours Library

Detours is a Microsoft library for working with PE files at the binary level and for intercepting functions at runtime. It is a solid and well-written framework that can be used in C programs. Following are the main features of the Detours library:

Function interception at execution time: Functions are intercepted in memory at runtime, not on disk. This is a cleaner approach that also can help to overcome certain license agreement restrictions.
Original function invocation: Detours preserves the code of the patched function. Unlike Function Replacer, the Detours library saves the machine instructions from the original function code to an entity called a trampoline before overwriting them with the detour code. This allows for pre- and post-processing logic around the original function.
Small footprint of the detour: The detour is implemented as a JMP to the patching logic, which on 32-bit x86 takes only 5 bytes (a 1-byte opcode plus a 4-byte relative offset) and therefore works for very short functions as well.
Import table editing for DLL insertion: Detours provides functions for editing the import table of a PE executable. This is useful for inserting a DLL that implements and installs a patch as a detour for a target function. Import modifications are saved to a file on the disk.
Clean high-level C API: The library is well designed and fairly easy to use. It still requires an understanding of Win32 architecture, but it makes assembly coding unnecessary. The patch and the detour are coded as C functions, and the interception is installed with just a few lines of code.
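The trampoline idea can be modeled on plain byte arrays. The following Java sketch is a simplified simulation, not the Detours implementation: it copies the first instructions of a function into a trampoline, appends a JMP back to the remainder of the function, and then overwrites the function start with a JMP to the detour. It assumes 32-bit x86 and that the copied prologue ends on an instruction boundary and contains only position-independent instructions. All names and addresses are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class TrampolineSketch {
    static final int JMP_LENGTH = 5; // opcode 0xE9 + 4-byte relative displacement

    /** Encodes a near JMP located at 'from' that transfers control to 'to'. */
    static byte[] jmp(int from, int to) {
        return ByteBuffer.allocate(JMP_LENGTH).order(ByteOrder.LITTLE_ENDIAN)
                .put((byte) 0xE9).putInt(to - (from + JMP_LENGTH)).array();
    }

    /** Copies the displaced prologue and appends a JMP back into the original function. */
    static byte[] buildTrampoline(byte[] function, int functionAddr,
                                  int prologueLen, int trampolineAddr) {
        if (prologueLen < JMP_LENGTH || prologueLen > function.length) {
            throw new IllegalArgumentException("Prologue must cover the detour JMP");
        }
        byte[] trampoline = new byte[prologueLen + JMP_LENGTH];
        System.arraycopy(function, 0, trampoline, 0, prologueLen);
        byte[] back = jmp(trampolineAddr + prologueLen, functionAddr + prologueLen);
        System.arraycopy(back, 0, trampoline, prologueLen, JMP_LENGTH);
        return trampoline;
    }

    /** Overwrites the start of the function with a JMP to the detour. */
    static void installDetour(byte[] function, int functionAddr, int detourAddr) {
        System.arraycopy(jmp(functionAddr, detourAddr), 0, function, 0, JMP_LENGTH);
    }

    public static void main(String[] args) {
        // push ebp; mov ebp,esp; sub esp,0x40; ret (a 6-byte prologue plus a return)
        byte[] fn = {0x55, (byte) 0x8B, (byte) 0xEC, (byte) 0x83, (byte) 0xEC, 0x40, (byte) 0xC3};
        byte[] trampoline = buildTrampoline(fn, 0x1000, 6, 0x5000);
        installDetour(fn, 0x1000, 0x6000);
        System.out.println("Trampoline length: " + trampoline.length);
    }
}
```

The trampoline must be built before the detour is installed, because installing the detour destroys the bytes that the trampoline needs to preserve.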

The Detours library can be downloaded free from http://research.microsoft.com/sn/detours. It comes with good documentation and many examples, and because this book is Java centric, we are not going to spend time writing C code. Listing 18.5 shows a few key excerpts from an example that patches a Win32 Sleep function and measures the total time a program spends sleeping.

Example 18.5. Key Steps in Using the Detours Library

/* Declare a Sleep() trampoline using Detours macro */
DETOUR_TRAMPOLINE(VOID WINAPI UntimedSleep(DWORD dwMilliseconds), Sleep);

/* DLL entry point that installs and removes a detour for Sleep */
BOOL WINAPI DllMain(HINSTANCE hinst, DWORD dwReason, LPVOID reserved)
{
    if (dwReason == DLL_PROCESS_ATTACH) {
        printf("slept.dll: Starting.\n");
        Verify((PBYTE)Sleep);
        printf("\n");
        fflush(stdout);
        DetourFunctionWithTrampoline((PBYTE)UntimedSleep, (PBYTE)TimedSleep);
    }
    else if (dwReason == DLL_PROCESS_DETACH) {
        DetourRemove((PBYTE)UntimedSleep, (PBYTE)TimedSleep);
        printf("slept.dll: Removed trampoline, slept %d ticks.\n", dwSlept);
        fflush(stdout);
    }
    return TRUE;
}

/* This is a patch for Sleep() that measures the total time spent sleeping */
VOID WINAPI TimedSleep(DWORD dwMilliseconds)
{
    DWORD dwBeg = GetTickCount();
    UntimedSleep(dwMilliseconds);
    DWORD dwEnd = GetTickCount();

    InterlockedExchangeAdd(&dwSlept, dwEnd - dwBeg);
}

The code in Listing 18.5 installs a detour (patch) called TimedSleep() for the Sleep() function. The original Sleep() function can still be invoked via the trampoline called UntimedSleep(). To use Detours for a JNI function, a replacement function having the same signature as the target function needs to be written and placed inside a DLL. The DllMain() function of that DLL should install a detour using DetourFunctionWithTrampoline(); then the DLL needs to be inserted as the first import to the DLL or EXE that contains the JNI function being patched.

Patching Native Code on Unix Platforms

Patching binaries in the Unix world is a much harder task than on Windows. Because Unix is a diverse platform with multiple hardware architectures and software standards, low-level undertakings such as disassembling an executable file and editing the machine code require different implementations for different architectures. For instance, the common Unix processor architectures include SPARC used by Sun Solaris, PA-RISC or Itanium used by HP-UX, RS/6000 or PPC used by IBM AIX, and Intel used by Linux. Each processor has a different instruction set, so the binary files are not portable across the architectures. This means no common disassembler can convert the machine code into assembly on all platforms. Free and commercial disassemblers are available for each platform, but the quality and the ease of use vary greatly. One of the best utilities is IDA Pro (http://datarescue.com), which supports a plethora of processor types. It can run only on Windows, but it claims to be capable of disassembling the binaries for most of the common hardware architectures.

The situation with the software standards is not much better. Many standards exist for executable file formats, with the Common Object File Format (COFF) and Executable and Linking Format (ELF) being the two most prominent choices today. COFF was traditionally used on Unix systems. It has certain limitations and lacks flexibility, which is why the more modern ELF has been gradually replacing it. Both COFF and ELF are similar to Microsoft's PE format. Table 18.2 shows a high-level structure of the ELF format from the linking view.

Table 18.2. ELF File Structure

ELEMENT	DESCRIPTION
ELF header	Contains various information about the file such as the number of sections and the entry point addresses.
Program header table (optional)	Provides the location and description of segments.
Section 1	Data specific to section 1. It can be machine instructions, data, a symbol table, and so on.
Section N	Data specific to section N.
Section header table	An array of structures describing the attributes of each section such as the name, the type, the section starting address, and how the information should be interpreted.
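To make the layout in Table 18.2 more concrete, the following is a minimal sketch of reading a few ELF header fields from a file image already loaded into memory. The struct elf_summary type and the parse_elf_header() and read_uint() functions are hypothetical names invented for this illustration; they are not part of libelf or any system header. The byte offsets follow the standard ELF32 and ELF64 header layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary type for this sketch; not part of libelf or <elf.h>. */
struct elf_summary {
    int      is_64bit;      /* 1 for a 64-bit file, 0 for a 32-bit file */
    int      is_big_endian; /* 1 for MSB-first data, 0 for LSB-first data */
    uint16_t machine;       /* e_machine: target architecture code */
    uint64_t entry;         /* e_entry: entry point address */
    uint64_t shoff;         /* e_shoff: file offset of the section header table */
    uint16_t shnum;         /* e_shnum: number of section headers */
};

/* Read an n-byte unsigned integer using the file's own byte order. */
static uint64_t read_uint(const unsigned char *p, size_t n, int big_endian)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) {
        size_t idx = big_endian ? i : n - 1 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}

/* Returns 0 on success, -1 if buf does not start with a usable ELF header. */
int parse_elf_header(const unsigned char *buf, size_t len, struct elf_summary *out)
{
    assert(buf != NULL && out != NULL);

    /* e_ident: magic bytes, then class (byte 4) and data encoding (byte 5) */
    if (len < 16 || buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;
    if ((buf[4] != 1 && buf[4] != 2) || (buf[5] != 1 && buf[5] != 2))
        return -1;

    int is64 = (buf[4] == 2);
    int be = (buf[5] == 2);
    size_t word = is64 ? 8 : 4;          /* size of address and offset fields */
    if (len < (is64 ? 64u : 52u))        /* full header size for each class */
        return -1;

    out->is_64bit = is64;
    out->is_big_endian = be;
    out->machine = (uint16_t)read_uint(buf + 18, 2, be);
    out->entry = read_uint(buf + 24, word, be);
    /* e_entry and e_phoff precede e_shoff */
    out->shoff = read_uint(buf + 24 + 2 * word, word, be);
    /* e_flags(4), e_ehsize(2), e_phentsize(2), e_phnum(2), e_shentsize(2) precede e_shnum */
    out->shnum = (uint16_t)read_uint(buf + 24 + 3 * word + 12, 2, be);
    return 0;
}
```

Reading each field byte by byte in the file's declared byte order lets the same code inspect a big-endian SPARC library on a little-endian Intel machine, which matters when the analysis host differs from the target.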

Patching binaries requires reading and writing executable files. Working with ELF files can be simplified by using the libelf library. libelf provides a set of high-level C functions that manipulate executable files, shared libraries, object files, and other files that follow the ELF format. libelf is available for Solaris, HP-UX, AIX, and Linux; it can most likely be found for other Unix flavors, as well. Because libelf is a general-purpose library, it does not provide the functions for patching that we have found in Microsoft's Detours library. libelf offers a convenient way of locating the code to be patched and updating the executable file with the changes, but the actual task of inserting assembly instructions and possibly implementing a trampoline has to be done manually.
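As a rough illustration of the manual part, the sketch below maps a function's virtual address to its position in the file and overwrites bytes in an in-memory copy of that file. The struct section_span type and the vaddr_to_offset() and patch_image() functions are hypothetical helpers written for this example; they are not libelf functions. With libelf, the three values in the struct would typically be taken from the sh_addr, sh_offset, and sh_size fields of the section header that describes the code section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one section, filled in from its section header. */
struct section_span {
    uint64_t addr;   /* virtual address of the first byte of the section */
    uint64_t offset; /* file offset of the first byte of the section */
    uint64_t size;   /* section size in bytes */
};

/* Translate a virtual address inside the section to a file offset.
   Returns 0 on success, -1 if the address lies outside the section. */
int vaddr_to_offset(const struct section_span *sec, uint64_t vaddr, uint64_t *off)
{
    assert(sec != NULL && off != NULL);
    if (vaddr < sec->addr || vaddr - sec->addr >= sec->size)
        return -1;
    *off = sec->offset + (vaddr - sec->addr);
    return 0;
}

/* Overwrite n bytes of an in-memory file image at the given offset.
   Returns 0 on success, -1 if the write would run past the image. */
int patch_image(unsigned char *image, size_t image_len, uint64_t off,
                const unsigned char *patch, size_t n)
{
    assert(image != NULL && patch != NULL);
    if (off > image_len || n > image_len - off)
        return -1;
    memcpy(image + (size_t)off, patch, n);
    return 0;
}
```

Keeping the address translation separate from the write makes it easier to verify the computed offset against a disassembler listing before any byte of the library is changed.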

The approach to patching Unix shared libraries that contain native code is identical to the work we have done on Windows. The native code for the target function has to be located and disassembled. Then it can be overwritten with the new code or a JMP instruction to the new code. The new logic can also be implemented in a shared library that is dynamically loaded by the patch. As long as the function signature and the calling convention are the same, the passing of the parameters and the return occurs correctly. To design the specific assembly code, refer to the target processor documentation.

Quick Quiz

1:	What role does JNI play in Java architecture?
2:	What steps need to be executed to implement and execute a native method?
3:	For each of the three approaches to patching native methods, list their pros and cons.
4:	Which section of the PE file needs to be accessed to get the machine code?
5:	Why does the Function Replacer utility not work for native functions with just a few machine code instructions?
6:	When implementing a detour in assembly code, can the control to the patch be transferred via a CALL instead of JMP? Explain why.
7:	What advantages does the Detours library offer over the Function Replacer?
8:	What are the dominant formats for executable files on Unix?
9:	How would you patch a native function in Unix?

In Brief

Native code patching provides the ultimate control over the JVM because it allows altering the behavior on the lowest level. It relies on exploring the binary format of the library, finding the machine code to be changed, and patching it with the new logic.
JNI is a specification describing how to define native methods in Java and how to provide the implementation of those methods in native libraries.
Java native methods require development of a dynamic (shared) library in the C language that is loaded by the JVM at runtime.
The easiest approach to native patching is patching the Java class that declares the method, removing the native keyword, and providing a new Java method implementation.
Substituting a native library with a delegating proxy offers a second alternative to native code patching. The substitute library is implemented in the C language with no changes made to the Java classes. The original library is renamed to a different name, and the new library is given the name of the original library.
On the Windows platform a utility such as Function Replacer can be used to patch an exported function from one DLL with an exported function from another DLL. Function Replacer is easy to use, but it has limitations and reliability problems.
Microsoft Detours is a library for working with PE files at the binary level and for intercepting functions at runtime. It is a solid and well-written framework that can be used in C programs for manual patching.
Unix executable files typically adhere to the COFF or ELF format. The general approach to patching Unix libraries is similar to the Windows approach.
libelf is a commonly used library for the manipulation of executable files in the ELF format on Unix.
