9
CROSS-REFERENCES


Two common questions asked while reverse engineering a binary are “Where is this function called from?” and “Which functions access this data?” These and other similar questions seek to identify and catalog the references to and from various resources in a program. The following two examples serve to show the usefulness of such questions.

Example 1

While you are reviewing the large number of ASCII strings in a particular binary, you see a string that seems particularly suspicious: “Pay within 72 hours or the recovery key will be destroyed and your data will remain encrypted forever.” On its own, this string is just circumstantial evidence. It in no way confirms that the binary has the capability or intent to execute a crypto ransomware attack. The answer to the question “Where is this string referenced in the binary?” would help you to quickly track down the program location(s) that make use of the string. This information, in turn, should help you locate any related crypto ransomware code that uses the string or demonstrate that the string, in this context, is benign.

Example 2

You have located a function containing a stack-allocated buffer that can be overflowed, possibly leading to exploitation of the program, and you want to determine if this is actually possible. If you want to develop and demonstrate an exploit, the function is useless to you unless you can get it to execute. This leads to the question “Which functions call this vulnerable function?” as well as additional questions regarding the nature of the data that those functions may pass to the vulnerable function. This line of reasoning must continue as you work your way back up potential call chains to find one that you can influence to demonstrate that the overflow is exploitable.

Referencing Basics

Ghidra can help you analyze both of these cases (and many others) through its extensive mechanisms for displaying and accessing reference information. In this chapter, we discuss the types of references that Ghidra makes available, the tools for accessing reference information, and ways to interpret that information. In Chapter 10, we will use Ghidra’s graphing capabilities to examine visual representations of reference relationships.

All references obey the same general traffic rules. Associated with each reference is the notion of a direction. All references are made from one address to another address. If you are familiar with graph theory, you can think of addresses as nodes (or vertices) in a directed graph, and references as the edges that identify directed connections between the nodes. Figure 9-1 provides a quick refresher on basic graph terminology. In this simple graph, three nodes—A, B, and C—are connected by two directed edges.

Directed edges are represented by arrows to indicate the allowable direction of travel along the edge. In Figure 9-1, travel from A to B is possible, but travel from B to A is not, similar to a one-way street. If the arrows were bidirectional, travel in either direction would be acceptable.

Ghidra has two basic categories of references: forward references and back references (each with subcategories as well). The back references are the less complex of the two types and are likely to be used most frequently in reverse engineering. Back references, also referred to as cross-references, provide a means to navigate between locations in the listing such as code and data.


Figure 9-1: Directed graph with three nodes and two edges

Cross-References (Back References)

Back references within Ghidra are often referred to simply as XREFs, which is a mnemonic for the term cross-reference. Within this text, we use the term XREF only when referring to the specific sequence of characters (XREF) in a Ghidra listing, menu item, or dialog. In all other cases, we stick to the more general term cross-reference when referring to back references. Let’s start by looking at specific examples of XREFs in Ghidra before moving on to a more comprehensive example.

Example 1: Basic XREFs

Let’s begin by examining some of the XREFs that we encountered in demo_stackframe (see Chapter 6) and use the following listing to understand the associated format and meaning:

     *******************************************************************
     *                         FUNCTION                                *
     *******************************************************************
     undefined demo_stackframe(undefined param_1, undefined4. . .
        undefined   AL:1            <RETURN>
        undefined   Stack[0x4]:4    param_1
        undefined4  Stack[0x8]:4    param_2   XREF[1]:  0804847f(R)
        undefined4  Stack[0xc]:4    param_3   XREF[1]:  08048479(R)
        undefined4  Stack[-0x10]:4  local_10  XREF[1]:  0804847c(W)  
        undefined4  Stack[-0x14]:4  local_14  XREF[2]:  08048482(W),
                                                        08048493(R)  
        undefined4  Stack[-0x18]:4  local_18  XREF[2]:  08048485(W),
                                                        08048496(R)  
        undefined1  Stack[-0x58]:1  local_58  XREF[1]:  0804848c(W)  
     demo_stackframe                          XREF[4]:  Entry Point(*),  
                                                        main:080484be(c),
                                                        080485e4, 08048690(*)  

Ghidra not only indicates that there is a cross-reference with the indicator XREF but also shows the number of cross-references with an index value following XREF. This part of the cross-reference (for example, XREF[2]:) is called the XREF header. Examining the headers in the listing, we can see that most of the cross-references have only one referring address, but a few have more.

Following the header is the address associated with the cross-reference, which is a navigable object. Following the address, there is a type indicator in parentheses. For data cross-references (which is the case in this example), the valid types are R (indicating that the variable is read at the corresponding XREF address), W (indicating that the variable is being written to), and * (indicating that an address of a location is being taken as a pointer). In summary, data cross-references are identified in the listing where the data is declared, and associated XREF entries provide links to the locations where the data is referenced.

FORMATTING XREFS

As with most items you encounter in the Listing window, you can control the attributes associated with the cross-reference display. Selecting Edit ▸ Tool Options opens the editable options for the CodeBrowser. Since an XREF is part of the Listing window, the XREFs Field can be found within the Listing Fields folder. When it is selected, it will open the dialog shown in Figure 9-2 (here with default options). If you were to change Maximum Number of XREFs to Display to 2, the header for all cross-references exceeding this number would be displayed as XREF[more]. The option to display nonlocal namespaces allows you to quickly identify all of the cross-references that are not within the current function’s body. All of the options are explained in Ghidra Help.


Figure 9-2: XREFs Field edit window showing defaults

The listing also contains a code cross-reference, main:080484be(c), in the XREF list for demo_stackframe. Code cross-references are a very important concept, as they facilitate Ghidra’s generation of function graphs and function call graphs, which are the focus of Chapter 10. A code cross-reference is used to indicate that an instruction transfers or may transfer control to another instruction. The manner in which instructions transfer control is referred to as a flow. Flows may be any of three basic types: sequential, jump, or call. Jump and call flows can be further divided according to whether the target address is a near or far address.

A sequential flow is the simplest flow type, as it represents linear flow from one instruction to the next. This is the default execution flow for all nonbranching instructions such as ADD. There are no special display indicators for sequential flows other than the order in which instructions are listed in the disassembly: if instruction A has a sequential flow to instruction B, then instruction B will immediately follow instruction A in the disassembly listing.

Example 2: Jump and Call XREFs

Let’s take a quick look at a new example containing code cross-references that demonstrate jumps and calls. As with data cross-references, code cross-references also have an associated XREF entry in the Listing window. The following listing shows information associated with the function main:

     ********************************************************************
     *                         FUNCTION                                 *
     ********************************************************************
     undefined4 __stdcall main(void)
        undefined4  EAX:4           <RETURN>
        undefined4  Stack[-0x8]:4   ptr      XREF[3]:  00401014(W),
                                                         0040101b(R),
                                                         00401026(R)
     main                                    XREF[1]:  entry:0040121e(c)

You can clearly identify the three XREFs associated with the stack variable ptr as well as the XREF associated with the function itself. Let’s decode the meaning of the XREF, entry:0040121e(c). The address (or in this case, identifier) before the colon indicates the referring (or source) entity. In this case, control is transferred from entry. To the right of the colon is the specific address within entry that is the source of the cross-reference. The suffix (c) indicates that this is a CALL to main. Stated simply, the cross-reference says, “main is called from address 0040121e within entry.”
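To make the decoding concrete, here is a minimal C sketch that splits the textual form of an entry such as entry:0040121e(c) into its three parts. The struct xref_entry type and parse_xref function are hypothetical illustrations of the display format described above, not part of Ghidra’s API. Entries without a type suffix, and labels such as Entry Point(*), are simply rejected by this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical representation of one XREF entry, e.g. "entry:0040121e(c)". */
struct xref_entry {
    char source_func[64];   /* function before the colon; empty if none shown */
    unsigned long address;  /* referring (source) address */
    char type;              /* suffix: c, j, R, W, or * */
};

/* Parse "[func:]address(type)". Returns 1 on success, 0 otherwise. */
int parse_xref(const char *text, struct xref_entry *out)
{
    const char *colon = strchr(text, ':');
    const char *rest = text;

    out->source_func[0] = '\0';
    if (colon != NULL) {
        size_t n = (size_t)(colon - text);
        if (n >= sizeof out->source_func)
            return 0;                      /* name too long for the buffer */
        memcpy(out->source_func, text, n);
        out->source_func[n] = '\0';
        rest = colon + 1;                  /* address starts after the colon */
    }
    /* Hex address followed by a one-character type in parentheses. */
    if (sscanf(rest, "%lx(%c)", &out->address, &out->type) != 2)
        return 0;
    return 1;
}
```

Parsing entry:0040121e(c) yields source_func "entry", address 0x40121e, and type 'c', which is the “main is called from 0040121e within entry” reading given above. An entry with no function prefix, such as 0804847f(R), leaves source_func empty.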

If we double-click the cross-reference address to follow the link, we are taken to the specified address within entry where we can examine the call. While the XREF is a unidirectional link, we can quickly return to main by double-clicking the function name (main) or using the backward navigation arrow in the CodeBrowser toolbar:

0040121e  CALL   main

In the following listing, the (j) suffix on the XREF indicates that this labeled location is the target of a JUMP:

004011fe  JZ     LAB_00401207
00401200  PUSH   EAX
00401201  CALL   __amsg_exit
00401206  POP    ECX
        LAB_00401207                           XREF[1]: 004011fe(j)
00401207  MOV    EAX,[DAT_0040acf0]

Similar to the previous example, we can double-click the XREF address to navigate to the statement that transferred control. We can return by double-clicking the label LAB_00401207 in the JZ instruction at 004011fe.

References Example

Let’s walk through an example from source code to disassembly to demonstrate many types of cross-references. The following program, simple_flows.c, contains various operations that exercise Ghidra’s cross-referencing features, as noted in the comment text:

int read_it;            // integer variable read in main
int write_it;           // integer variable written 3 times in main
int ref_it;             // integer variable whose address is taken in main
void callflow() {}      // function called twice from main

int main() {
    int *ptr = &ref_it; // results in a "pointer" style data reference (*)
    *ptr = read_it;     // results in a "read" style data reference (R)
    write_it = *ptr;    // results in a "write" style data reference (W)
    callflow();         // results in a "call" style code reference (c)
    if (read_it == 3) { // results in "jump" style code reference (j)
        write_it = 2;   // results in a "write" style data reference (W)
    }
    else {              // results in a "jump" style code reference (j)
        write_it = 1;   // results in a "write" style data reference (W)
    }
    callflow();         // results in a "call" style code reference (c)
}

Code Cross-References

Listing 9-1 shows the disassembly of the preceding program.

     undefined4 __stdcall main(void)
        undefined4 EAX:4 <RETURN>
        undefined4 Stack[-0x8]:4 ptr          XREF[3]:  00401014(W),
                                                        0040101b(R),
                                                        00401026(R)
     main                                     XREF[1]:  entry:0040121e(c)
00401010  PUSH   EBP
00401011  MOV    EBP,ESP
00401013  PUSH   ECX
00401014  MOV    dword ptr [EBP + ptr],ref_it
0040101b  MOV    EAX,dword ptr [EBP + ptr]
0040101e  MOV    ECX,dword ptr [read_it]
00401024  MOV    dword ptr [EAX]=>ref_it,ECX
00401026  MOV    EDX,dword ptr [EBP + ptr]
00401029  MOV    EAX=>ref_it,dword ptr [EDX]
0040102b  MOV    [write_it],EAX
00401030  CALL   callflow
00401035  CMP    dword ptr [read_it],3
0040103c  JNZ    LAB_0040104a
0040103e  MOV    dword ptr [write_it],2
00401048  JMP    LAB_00401054
        LAB_0040104a                          XREF[1]:  0040103c(j)
0040104a  MOV    dword ptr [write_it],1
        LAB_00401054                          XREF[1]:  00401048(j)
00401054  CALL   callflow
00401059  XOR    EAX,EAX
0040105b  MOV    ESP,EBP
0040105d  POP    EBP
0040105e  RET

Listing 9-1: Disassembly of main in simple_flows.exe

Every instruction other than JMP and RET has an associated sequential flow to its immediate successor. Instructions used to invoke functions, such as the x86 CALL instruction at 00401030, are assigned a call flow, indicating transfer of control to the target function. Call flows are noted by XREFs at the target function (the destination address of the flow). The disassembly of the callflow function referenced in Listing 9-1 is shown in Listing 9-2.

     undefined __stdcall callflow(void)
        undefined AL:1 <RETURN>
     callflow                                 XREF[4]:  0040010c(*),
                                                        004001e4(*),
                                                        main:00401030(c),
                                                        main:00401054(c)
00401000  PUSH   EBP
00401001  MOV    EBP,ESP
00401003  POP    EBP
00401004  RET

Listing 9-2: Disassembly of the callflow function

EXTRA XREFS?

Every now and again, you see something in a listing that seems anomalous. Listing 9-2 has two pointer XREFs, 0040010c(*) and 004001e4(*), that are not easily explained. We immediately understood the two XREFs that we could trace back to the calls to callflow in main. What are the other two XREFs? It turns out that these are an interesting artifact of this particular code. This program was compiled for Windows, which results in a PE file, and the two anomalous XREFs take us to the PE header in the Headers section of the listing. The two reference addresses (including the associated bytes) are shown here:

0040010c  00 10 00 00 ibo32     callflow               BaseOfCode
               .  .  .
004001e4  00 10 00 00 ibo32     callflow               VirtualAddress

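Why is this function referenced in the PE header? A quick Google search can help us understand what is happening: callflow just happens to be the very first thing in the text section. The two PE fields, BaseOfCode and the VirtualAddress field of the first section header (the text section’s), both hold 0x1000 (the little-endian bytes 00 10 00 00), the offset of the start of the text section relative to the image base. That offset resolves to the address of callflow, hence the unanticipated XREFs associated with the callflow function.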
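The arithmetic behind these ibo32 (image base offset) fields can be checked with a small C sketch. It assumes the default 32-bit image base of 0x00400000, which is consistent with the header addresses shown above; read_le32 and ibo32_to_va are hypothetical helper names.

```c
#include <stdint.h>

/* Assumed image base: the PE header in this listing starts at 0x00400000. */
#define IMAGE_BASE 0x00400000u

/* Decode a little-endian 32-bit field, as stored in a PE header. */
uint32_t read_le32(const unsigned char b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* An ibo32 value is an offset from the image base; adding the base
 * gives the virtual address that the field refers to. */
uint32_t ibo32_to_va(const unsigned char b[4])
{
    return IMAGE_BASE + read_le32(b);
}
```

Applied to the bytes 00 10 00 00, read_le32 yields 0x1000 and ibo32_to_va yields 0x00401000, which is the address of callflow in Listing 9-2.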

In this example, we see that callflow is called twice from main: once from address 00401030 and again from address 00401054. Cross-references resulting from function calls are distinguished by the suffix (c). The source location displayed in the cross-references indicates both the address from which the call is being made and the function that contains the call.
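Following call cross-references backward is how you work your way up a call chain, as in the vulnerable-function scenario at the start of this chapter. The C sketch below stores the (c) references from Listings 9-1 and 9-2 as hypothetical call_xref records and walks from callflow up to a function with no recorded callers. It takes only the first caller at each step; a real analysis would explore every caller.

```c
#include <stddef.h>
#include <string.h>

/* A call-style (c) cross-reference: caller invokes callee at site. */
struct call_xref {
    const char *caller;
    const char *callee;
    unsigned long site;
};

/* Call references from Listings 9-1 and 9-2. */
static const struct call_xref CALLS[] = {
    {"entry", "main",     0x0040121eUL},
    {"main",  "callflow", 0x00401030UL},
    {"main",  "callflow", 0x00401054UL},
};
#define NUM_CALLS (sizeof CALLS / sizeof CALLS[0])

/* Number of (c) XREFs that would be displayed at callee. */
size_t count_call_sites(const char *callee)
{
    size_t n = 0;
    for (size_t i = 0; i < NUM_CALLS; i++)
        if (strcmp(CALLS[i].callee, callee) == 0)
            n++;
    return n;
}

/* Walk upward from func, taking the first caller found at each step.
 * max bounds the walk in case the call graph contains recursion. */
size_t first_call_chain(const char *func, const char **chain, size_t max)
{
    size_t len = 0;
    while (func != NULL && len < max) {
        chain[len++] = func;
        const char *next = NULL;
        for (size_t i = 0; i < NUM_CALLS; i++) {
            if (strcmp(CALLS[i].callee, func) == 0) {
                next = CALLS[i].caller;
                break;
            }
        }
        func = next;   /* NULL when no caller is recorded */
    }
    return len;
}
```

Here count_call_sites("callflow") returns 2, and first_call_chain("callflow", ...) produces the chain callflow, main, entry.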

A jump flow is assigned to each unconditional and conditional branch instruction. Conditional branches are also assigned sequential flows to account for control flow when the branch is not taken; unconditional branches have no associated sequential flow because the branch is always taken. Jump flows are associated with jump-style cross-references displayed at the jump targets in Listing 9-1: LAB_0040104a (the target of the JNZ) and LAB_00401054 (the target of the JMP). As with call-style cross-references, jump cross-references display the address of the referring location (the source of the jump). Jump cross-references are distinguished by the (j) suffix.
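The flow rules above can be summarized in a short C sketch that models the second half of main from Listing 9-1 with a hypothetical flow_kind tag. It records a (c) cross-reference at the target of each call and a (j) cross-reference at the target of each jump, and it reports which instruction kinds also keep a sequential flow. The type and function names are illustrative only.

```c
#include <stddef.h>

enum flow_kind { FLOW_SEQ, FLOW_JUMP, FLOW_COND_JUMP, FLOW_CALL, FLOW_RET };

struct insn {
    unsigned long addr;
    enum flow_kind kind;
    unsigned long target;   /* destination of a jump or call; 0 if none */
};

struct code_xref {
    unsigned long from;     /* source of the flow */
    unsigned long to;       /* target, where the XREF is displayed */
    char type;              /* 'c' for call, 'j' for jump */
};

/* Second half of main from Listing 9-1; callflow is at 00401000. */
static const struct insn MAIN_TAIL[] = {
    {0x0040102bUL, FLOW_SEQ, 0},
    {0x00401030UL, FLOW_CALL, 0x00401000UL},       /* CALL callflow */
    {0x00401035UL, FLOW_SEQ, 0},
    {0x0040103cUL, FLOW_COND_JUMP, 0x0040104aUL},  /* JNZ LAB_0040104a */
    {0x0040103eUL, FLOW_SEQ, 0},
    {0x00401048UL, FLOW_JUMP, 0x00401054UL},       /* JMP LAB_00401054 */
    {0x0040104aUL, FLOW_SEQ, 0},
    {0x00401054UL, FLOW_CALL, 0x00401000UL},       /* CALL callflow */
    {0x00401059UL, FLOW_SEQ, 0},
    {0x0040105bUL, FLOW_SEQ, 0},
    {0x0040105dUL, FLOW_SEQ, 0},
    {0x0040105eUL, FLOW_RET, 0},
};

/* Everything except an unconditional jump or a return falls through. */
int has_sequential_flow(enum flow_kind kind)
{
    return kind != FLOW_JUMP && kind != FLOW_RET;
}

/* Collect one code cross-reference per call or jump flow. */
size_t collect_code_xrefs(const struct insn *code, size_t n,
                          struct code_xref *out, size_t max)
{
    size_t count = 0;
    for (size_t i = 0; i < n && count < max; i++) {
        char type;
        if (code[i].kind == FLOW_CALL)
            type = 'c';
        else if (code[i].kind == FLOW_JUMP || code[i].kind == FLOW_COND_JUMP)
            type = 'j';
        else
            continue;               /* sequential flows get no XREF */
        out[count].from = code[i].addr;
        out[count].to = code[i].target;
        out[count].type = type;
        count++;
    }
    return count;
}
```

Run over MAIN_TAIL, collect_code_xrefs produces four entries: (c) references to 00401000 from 00401030 and 00401054, and (j) references to 0040104a and 00401054 from 0040103c and 00401048, matching the XREFs in Listings 9-1 and 9-2.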

BASIC BLOCKS

In program analysis, a basic block is a maximal sequence of instructions that executes, without branching, from beginning to end. Each basic block therefore has a single entry point (the first instruction in the block) and a single exit point (the last instruction in the block). The first instruction in a basic block is often the target of a branching instruction, while the last instruction is often a branch instruction. The first instruction may be the target of multiple code cross-references. Other than the first instruction, no other instruction within a basic block can be the target of a code cross-reference. The last instruction of a basic block may be the source of multiple code cross-references, such as a conditional jump, or it may flow into an instruction that is the target of multiple code cross-references (which, by definition, must begin a new basic block).
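A common way to find basic-block boundaries is the leader algorithm: the first instruction is a leader, every jump target is a leader, and every instruction that follows a jump or return is a leader. The C sketch below applies it to the instructions of Listing 9-1. It treats CALL as falling through to the next instruction, which is a modeling choice of this sketch; other tools may split blocks at calls. The insn table is transcribed from the listing, and the type and function names are hypothetical.

```c
#include <stddef.h>

enum flow_kind { FLOW_SEQ, FLOW_JUMP, FLOW_COND_JUMP, FLOW_CALL, FLOW_RET };

struct insn {
    unsigned long addr;
    enum flow_kind kind;
    unsigned long target;   /* branch or call target; 0 if none */
};

/* main from Listing 9-1 (both calls target callflow at 00401000). */
static const struct insn MAIN_CODE[] = {
    {0x00401010UL, FLOW_SEQ, 0}, {0x00401011UL, FLOW_SEQ, 0},
    {0x00401013UL, FLOW_SEQ, 0}, {0x00401014UL, FLOW_SEQ, 0},
    {0x0040101bUL, FLOW_SEQ, 0}, {0x0040101eUL, FLOW_SEQ, 0},
    {0x00401024UL, FLOW_SEQ, 0}, {0x00401026UL, FLOW_SEQ, 0},
    {0x00401029UL, FLOW_SEQ, 0}, {0x0040102bUL, FLOW_SEQ, 0},
    {0x00401030UL, FLOW_CALL, 0x00401000UL},
    {0x00401035UL, FLOW_SEQ, 0},
    {0x0040103cUL, FLOW_COND_JUMP, 0x0040104aUL},  /* JNZ */
    {0x0040103eUL, FLOW_SEQ, 0},
    {0x00401048UL, FLOW_JUMP, 0x00401054UL},       /* JMP */
    {0x0040104aUL, FLOW_SEQ, 0},
    {0x00401054UL, FLOW_CALL, 0x00401000UL},
    {0x00401059UL, FLOW_SEQ, 0}, {0x0040105bUL, FLOW_SEQ, 0},
    {0x0040105dUL, FLOW_SEQ, 0}, {0x0040105eUL, FLOW_RET, 0},
};

/* Is addr the target of any jump (conditional or unconditional)? */
static int is_jump_target(const struct insn *code, size_t n, unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if ((code[i].kind == FLOW_JUMP || code[i].kind == FLOW_COND_JUMP) &&
            code[i].target == addr)
            return 1;
    return 0;
}

/* Write the address of each basic-block leader to leaders[]; return count. */
size_t find_leaders(const struct insn *code, size_t n,
                    unsigned long *leaders, size_t max)
{
    size_t count = 0;
    for (size_t i = 0; i < n && count < max; i++) {
        int leader = (i == 0) || is_jump_target(code, n, code[i].addr);
        if (i > 0) {
            enum flow_kind prev = code[i - 1].kind;
            if (prev == FLOW_JUMP || prev == FLOW_COND_JUMP || prev == FLOW_RET)
                leader = 1;   /* instruction after a block-ending branch */
        }
        if (leader)
            leaders[count++] = code[i].addr;
    }
    return count;
}
```

For MAIN_CODE the leaders are 00401010, 0040103e, 0040104a, and 00401054, so main splits into four basic blocks. Note that 0040104a is a leader for two reasons: it is the JNZ target and it follows the unconditional JMP.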

Data Cross-References

Data cross-references are used to track how data is accessed within a binary. The three most commonly encountered types of data cross-references indicate when a location is being read, when a location is being written, and when the address of a location is being taken. The global variables from the previous sample program are shown in Listing 9-3, as they provide several examples of data cross-references.

        read_it                               XREF[2]:  main:0040101e(R),
                                                        main:00401035(R)
0040b720 undefined4    ??
        write_it                              XREF[3]:  main:0040102b(W),
                                                        main:0040103e(W),
                                                        main:0040104a(W)
0040b724    ??         ??
0040b725    ??         ??
0040b726    ??         ??
0040b727    ??         ??
        ref_it                                XREF[3]:  main:00401014(*),
                                                        main:00401024(W),
                                                        main:00401029(R)
0040b728 undefined4    ??

Listing 9-3: Global variables referenced in simple_flows.c

A read cross-reference indicates that the contents of a memory location are being read. Read cross-references can originate only from an instruction address but may refer to any program location. The global variable read_it is read twice in Listing 9-1. The associated cross-references shown in Listing 9-3 indicate exactly which locations in main are referencing read_it and are recognizable as read cross-references from the (R) suffix. The first read performed on read_it in Listing 9-1, at 0040101e, is a 32-bit read into the ECX register, which leads Ghidra to format read_it as an undefined4 (a 4-byte value of unspecified type). Ghidra often attempts to infer the size of a data item based on how the item is manipulated by code throughout a binary.

The global variable write_it is referenced three times in Listing 9-1. Associated write cross-references are generated and displayed as comments for the write_it variable, indicating the program locations that modify the contents of the variable. Write cross-references utilize the (W) suffix. In this case, Ghidra did not format write_it as a 4-byte variable even though there seems to be enough information to do so. As with read cross-references, write cross-references can originate only from a program instruction but may reference any program location. Generally, a write cross-reference that targets a program instruction byte is indicative of self-modifying code and is frequently encountered in malware de-obfuscation routines.

The third type of data cross-reference, a pointer cross-reference, indicates that the address of a location is being used (rather than the content of the location). The address of global variable ref_it is taken in Listing 9-1, resulting in the pointer cross-reference at ref_it in Listing 9-3, as indicated by the suffix (*). Pointer cross-references are commonly the result of address derivations either in code or in data. As you saw in Chapter 8, array access operations are typically implemented by adding an offset to the starting address of the array, and the first address in most global arrays can often be recognized by the presence of a pointer cross-reference. For this reason, most string literals (strings being arrays of characters in C/C++) are the targets of pointer cross-references.
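The base-plus-offset pattern can be illustrated with a short C sketch. The array scores and the helper element_address are hypothetical; the point is that every element access begins from the array’s starting address, which is why that first address is where a pointer cross-reference is typically found.

```c
#include <stddef.h>

/* Hypothetical global array used only for illustration. */
int scores[8];

/* Compute &scores[index] the way compiled code typically does: start from
 * the array's base address (the pointer-style reference) and add
 * index * element size. */
int *element_address(size_t index)
{
    char *base = (char *)scores;          /* address of the first element */
    return (int *)(base + index * sizeof scores[0]);
}
```

Because the offset is computed at runtime, only the base address appears as a fixed operand in the code, so only the first element of the array ends up with a cross-reference.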

Unlike read and write cross-references, which can originate only from instruction locations, pointer cross-references can originate from either instruction locations or data locations. An example of pointers that can originate from a program’s data section is any table of addresses (such as a vftable, which results in the generation of a pointer cross-reference from each entry in the table to the corresponding virtual function). Let’s see this in context using the SubClass example from Chapter 8. The disassembly for the vftable for SubClass is shown here:

        SubClass::vftable                     XREF[1]:  SubClass_Constructor:00401062(*)
00408148  void *  SubClass::vfunc1   vfunc1
0040814c  void *  BaseClass::vfunc2  vfunc2
00408150  void *  SubClass::vfunc3   vfunc3
00408154  void *  BaseClass::vfunc4  vfunc4
00408158  void *  SubClass::vfunc5   vfunc5

Here you see that the data item at location 0040814c is a pointer to BaseClass::vfunc2. Navigating to BaseClass::vfunc2 presents us with the following listing:

     **************************************************************
     *                          FUNCTION                          *
     **************************************************************
     undefined __stdcall vfunc2(void)
        undefined AL:1 <RETURN>
        undefined4 Stack[-0x8]:4 local_8      XREF[1]:  00401024(W)
     BaseClass::vfunc2                        XREF[2]:  00408138(*),
                                                        0040814c(*)
00401020  PUSH   EBP
00401021  MOV    EBP,ESP
00401023  PUSH   ECX
00401024  MOV    dword ptr [EBP + local_8],ECX
00401027  MOV    ESP,EBP
00401029  POP    EBP
0040102a  RET

Unlike most functions, this function has no code cross-references. Instead, we see two pointer cross-references indicating that the address of the function is derived in two locations. The second XREF refers back to the SubClass vftable entry discussed earlier. Following the first XREF would lead us to the vftable for BaseClass, which also contains a pointer to this virtual function.

This example demonstrates that C++ virtual functions are rarely called directly and are usually not the target of a call cross-reference. Because of the way vftables are created, all C++ virtual functions will be referred to by at least one vftable entry and will always be the target of at least one pointer cross-reference. (Remember that overriding a virtual function is not mandatory.)
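A vftable is just an array of function addresses stored as data, which a C sketch can mimic with arrays of function pointers. The names below are hypothetical, and the two-entry tables simplify the five-entry SubClass table shown earlier. An inherited function such as vfunc2 ends up in both tables, giving it two pointer references that originate from data and no call references.

```c
#include <stddef.h>

typedef void (*vfunc_t)(void *self);

/* Hypothetical stand-ins for the virtual functions in the example. */
static void BaseClass_vfunc1(void *self) { (void)self; }
static void BaseClass_vfunc2(void *self) { (void)self; }
static void SubClass_vfunc1(void *self)  { (void)self; }

/* Simplified vftables: SubClass overrides vfunc1 but inherits vfunc2,
 * so both tables store the address of BaseClass_vfunc2. */
static const vfunc_t BaseClass_vftable[] = { BaseClass_vfunc1, BaseClass_vfunc2 };
static const vfunc_t SubClass_vftable[]  = { SubClass_vfunc1,  BaseClass_vfunc2 };

/* Count vftable slots holding f: each one is a pointer reference that
 * originates from data, not from an instruction. */
size_t count_vftable_refs(vfunc_t f)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof BaseClass_vftable / sizeof BaseClass_vftable[0]; i++)
        n += (BaseClass_vftable[i] == f);
    for (size_t i = 0; i < sizeof SubClass_vftable / sizeof SubClass_vftable[0]; i++)
        n += (SubClass_vftable[i] == f);
    return n;
}
```

count_vftable_refs(BaseClass_vfunc2) returns 2, mirroring the two pointer XREFs on BaseClass::vfunc2, while the overriding SubClass_vfunc1 appears in only one table.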

When a binary contains sufficient information, Ghidra is able to locate vftables for you. Any vftables that Ghidra finds are listed as an entry under the vftable’s corresponding class entry within the Classes folder of the Symbol Tree. Clicking a vftable in the Symbol Tree window navigates you to the vftable location in the program’s data section.

Reference Management Windows

By now, you’ve probably noticed that XREF annotations are quite common in the Listing window. This is no accident, as the links formed by cross-references are the glue that holds a program together. Cross-references tell the story of intra- and inter-functional dependencies, and most successful reverse engineering efforts demand a comprehensive understanding of their behavior. The sections that follow move beyond the basic display and navigational usefulness of cross-references to introduce several options for managing cross-references within Ghidra.

XRefs Window

You can use XREF headers to learn more about a particular cross-reference, as shown in the following listing:

        undefined4 Stack[-0x10]:4 local_10    XREF[1]:  0804847c(W)  
        undefined4 Stack[-0x14]:4 local_14    XREF[2]:  08048482(W),
                                                        08048493(R)  

Double-clicking the XREF[2] header will bring up the associated XRefs window shown in Figure 9-3 with a more detailed listing of the cross-references. By default, the window shows the location, label (if applicable), referring disassembly, and reference type.


Figure 9-3: XRefs window

References To

Another window that can be helpful in understanding the program flow is the References To window. Right-clicking any address in the Listing window and choosing References ▸ Show References to Address brings up the window shown in Figure 9-4.


Figure 9-4: References To window

In this example, we have selected the starting address of the helper function. Within this window, you can navigate to the associated location by clicking any entry in the window.

Symbol References

Another reference view that was introduced in “The Symbol Table and Symbol References Windows” on page 82 is the combination of the Symbol Table and Symbol Reference windows. By default, when you choose Window ▸ Symbol References, you get two related windows. One displays every symbol in the entire symbol table. The other displays the associated references to the symbols. Selecting any entry in the Symbol Table window (function, vftable, and so on) causes the associated symbol references to be displayed in the Symbol References window.

Reference lists can be used to rapidly identify every location from which a particular function is called. For example, many people consider the C strcpy function to be dangerous as it copies a source array of characters, up to and including the associated null termination character, to a destination array, with no checks whatsoever that the destination array is large enough to hold all of the characters from the source. You could locate any one call to strcpy in your listing and use the aforementioned method to open the References To window, but if you don’t want to take the time to find strcpy used somewhere in the binary, you can open the Symbol References window and quickly locate strcpy and all associated references.

Advanced Reference Manipulation

At the start of this chapter, we equated the term back reference with cross-reference and briefly mentioned that Ghidra also has forward references, of which there are two types. Inferred forward references are generally added to the listing automatically and correspond one-for-one to back references, although inferred forward references are traversed in the opposite direction. In other words, we traverse back references from a target address back to a source address, and we traverse inferred forward references from a source address forward to a target address.
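Because inferred forward references and back references describe the same edges, a single list of (source, target) pairs can answer both questions; only the direction of lookup changes. The C sketch below uses the global data references from Listing 9-3 (stack and code references are omitted for brevity). The names struct ref, refs_to, and refs_from are hypothetical, not Ghidra’s API.

```c
#include <stddef.h>

/* A reference edge: from (source address) -> to (target address). */
struct ref {
    unsigned long from;
    unsigned long to;
    char type;        /* R, W, or * */
};

/* Global data references from Listing 9-3: read_it at 0040b720,
 * write_it at 0040b724, ref_it at 0040b728. */
static const struct ref REFS[] = {
    {0x0040101eUL, 0x0040b720UL, 'R'}, {0x00401035UL, 0x0040b720UL, 'R'},
    {0x0040102bUL, 0x0040b724UL, 'W'}, {0x0040103eUL, 0x0040b724UL, 'W'},
    {0x0040104aUL, 0x0040b724UL, 'W'}, {0x00401014UL, 0x0040b728UL, '*'},
    {0x00401024UL, 0x0040b728UL, 'W'}, {0x00401029UL, 0x0040b728UL, 'R'},
};
#define NUM_REFS (sizeof REFS / sizeof REFS[0])

/* Back references: edges that arrive at `to` (what an XREF list shows). */
size_t refs_to(unsigned long to, const struct ref **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < NUM_REFS && n < max; i++)
        if (REFS[i].to == to)
            out[n++] = &REFS[i];
    return n;
}

/* Forward references: the same edges, traversed from their source. */
size_t refs_from(unsigned long from, const struct ref **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < NUM_REFS && n < max; i++)
        if (REFS[i].from == from)
            out[n++] = &REFS[i];
    return n;
}
```

refs_to(0x0040b724, ...) finds the three (W) references to write_it, while refs_from(0x00401024, ...) follows the single edge from that instruction forward to ref_it.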

The second type is an explicit forward reference. There are several types of explicit forward references, and their management is much more complex than other cross-references. The types of explicit forward references include memory references, external references, stack references, and register references. In addition to viewing references, Ghidra allows you to add and edit a variety of reference types.

You may need to add your own cross-references when Ghidra’s static analysis cannot determine jump or call targets that are computed at runtime, but you know the target from other analysis. In the following code, which we last saw in Chapter 8, a virtual function is called.

0001072e  PUSH   EBP
0001072f  MOV    EBP,ESP
00010731  SUB    ESP,8
00010734  MOV    EAX,dword ptr [EBP + param_1]
00010737  MOV    EAX,dword ptr [EAX]
00010739  ADD    EAX,8
0001073c  MOV    EAX,dword ptr [EAX]
0001073e  SUB    ESP,12
00010741  PUSH   dword ptr [EBP + param_1]
00010744  CALL   EAX
00010746  ADD    ESP,16
00010749  NOP
0001074a  LEAVE
0001074b  RET

The value held in EAX depends on the value of the pointer passed in param_1. As a result, Ghidra does not have enough information to create a cross-reference linking 00010744 (the address of the CALL instruction) to the target of the call. Manually adding a cross-reference (to SubClass::vfunc3 for example) would, among other things, link the target functions into a call graph, thereby improving Ghidra’s analysis of the program. Right-clicking the call and selecting References ▸ Add Reference from opens the dialog shown in Figure 9-5. This dialog is also available through the References ▸ Add/Edit option.
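Why the target cannot be known from this function alone can be seen in a C sketch that mimics the same dispatch with a hand-built vftable. All names are hypothetical, and the mapping of ADD EAX,8 to slot index 2 assumes 4-byte pointers, as in this 32-bit listing. The same call_third_slot code reaches a different function depending on which table the object points to.

```c
#include <stddef.h>

struct object;
typedef int (*vfunc_t)(struct object *self);

/* Like a C++ object, the first field points at the class's vftable. */
struct object {
    const vfunc_t *vftable;
};

/* Hypothetical virtual functions; return values identify which one ran. */
static int Base_vfunc1(struct object *self) { (void)self; return 1; }
static int Base_vfunc2(struct object *self) { (void)self; return 2; }
static int Base_vfunc3(struct object *self) { (void)self; return 3; }
static int Sub_vfunc3(struct object *self)  { (void)self; return 33; }

static const vfunc_t BASE_VFTABLE[] = { Base_vfunc1, Base_vfunc2, Base_vfunc3 };
static const vfunc_t SUB_VFTABLE[]  = { Base_vfunc1, Base_vfunc2, Sub_vfunc3 };

/* Mirrors the listing step by step. */
int call_third_slot(struct object *param_1)
{
    const vfunc_t *vft = param_1->vftable;  /* MOV EAX,[EBP+param_1]; MOV EAX,[EAX] */
    vfunc_t target = vft[2];                /* ADD EAX,8; MOV EAX,[EAX] (4-byte slots) */
    return target(param_1);                 /* PUSH [EBP+param_1]; CALL EAX */
}
```

With an object using BASE_VFTABLE the call returns 3, and with SUB_VFTABLE it returns 33. The CALL site is identical in both cases; only the data differs, which is why a manually added reference is needed to record the target.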


Figure 9-5: The Add Reference dialog

Specify the address of the target function as the To Address setting and make sure that the correct setting for Ref-Type is selected. When you close the dialog with the Add button, Ghidra creates the reference, and a new (c) cross-reference appears at the target address. More information on forward references, including the remaining reference types as well as reference manipulation, can be found in Ghidra Help.

Summary

References are powerful tools to help you understand how artifacts within a binary are related. We discussed cross-references in detail and introduced some other capabilities associated with references that will be visited again in later chapters. In the next chapter, we look at visual representations of references and how the resulting graphs can help us better understand the control flows within functions and the relationships between functions in our binaries.
