The IDA Application Programming Interface

IDA’s API is defined by the contents of the header files in <SDKDIR>/include. There is no single-source index of available functions (though Steve Micallef has collected a rather nice subset in his plug-in writing guide). Many prospective SDK programmers find this fact initially difficult to come to terms with. The reality is that there is never an easy-to-find answer to the question, “How do I do x using the SDK?” The two principal options for answering such questions are to post the questions to an IDA user’s forum or attempt to answer them yourself by searching through the API documentation. What documentation, you say? Why, the header files, of course. Granted, these are not the most searchable of documents, but they do contain the complete set of API features. In this case, grep (or a suitable replacement, preferably built into your programming editor) is your friend. The catch is knowing what to search for, which is not always obvious.

There are a few ways to try to narrow your searches through the API. The first way is to leverage your knowledge of the IDC scripting language and attempt to locate similar functionality within the SDK using keywords and possibly function names derived from IDC. However—and this is an extremely frustrating point—while the SDK may contain functions that perform tasks identical to those of IDC functions, the names of those functions are seldom identical. This results in programmers learning two sets of API calls, one for use with IDC and one for use with the SDK. In order to address this situation, Appendix B presents a complete list of IDC functions and the corresponding SDK 6.1 actions that are carried out to execute those functions.

The second technique for narrowing down SDK-related searches is to become familiar with the content and, more important, the purpose of the various SDK header files. In general, related functions and associated data structures are grouped into header files based on functional groups. For example, SDK functions that allow interaction with a user are grouped into kernwin.hpp. When a grep-style search fails to locate a capability that you require, some knowledge of which header file relates to that capability will narrow your search and hopefully limit the number of files that you need to dig deeper into.

Header Files Overview

While the SDK’s readme.txt files provide a high-level overview of the most commonly used header files, this section highlights some other useful information for working with these files. First, the majority of the header files use the .hpp suffix, while a few use the .h suffix. This can easily lead to trivial errors when naming header files to be included in your files. Second, ida.hpp is the main header file for the SDK and should be included in all SDK-related projects. Third, the SDK utilizes preprocessor directives designed to preclude access to functions that Hex-Rays considers dangerous (such as strcpy and sprintf). For a complete list of these functions, refer to pro.h. To restore access to them, you must define the USE_DANGEROUS_FUNCTIONS macro prior to including ida.hpp in your own files. An example is shown here:

#define USE_DANGEROUS_FUNCTIONS
#include <ida.hpp>

Failure to define USE_DANGEROUS_FUNCTIONS will result in a build error to the effect that dont_use_snprintf is an undefined symbol (in the case of an attempt to use the snprintf function). In order to compensate for restricting access to these so-called dangerous functions, the SDK defines safer equivalents for each, generally in the form of a qstrXXXX function such as qstrncpy and qsnprintf. These safer versions are also declared in pro.h.
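The motivation for the size-aware replacements is easier to see in a small sketch. The helper below, bounded_copy, is a hypothetical stand-in written using only the standard library; it is not the SDK's qstrncpy, whose exact behavior should be confirmed in pro.h. It merely illustrates the general idea of a copy that is told the size of its destination and always terminates the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, NOT the SDK's qstrncpy: copies at most dstsize-1
// characters from src and always null-terminates dst.
inline char *bounded_copy(char *dst, const char *src, std::size_t dstsize) {
  assert(dst != nullptr && src != nullptr);
  if (dstsize == 0) return dst;          // no room, not even for the terminator
  std::size_t n = std::strlen(src);
  if (n >= dstsize) n = dstsize - 1;     // truncate rather than overflow
  std::memcpy(dst, src, n);
  dst[n] = '\0';
  return dst;
}
```

Unlike strcpy, a call such as bounded_copy(buf, input, sizeof(buf)) cannot write past the end of buf, at the cost of silently truncating long input.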

Along similar lines, the SDK restricts access to many standard file input/output variables and functions such as stdin, stdout, fopen, fwrite, and fprintf. This restriction is due in part to limitations of the Borland compiler. Here again the SDK defines replacement functions in the form of qXXX counterparts such as qfopen and qfprintf. If you require access to the standard file functions, then you must define the USE_STANDARD_FILE_FUNCTIONS macro prior to including fpro.h (which is included from kernwin.hpp, which is, in turn, included from several other files).

In most cases, each SDK header file contains a brief description of the file’s purpose and fairly extensive comments describing the data structures and functions that are declared in the file. Together these comments constitute IDA’s API documentation. Brief descriptions of some of the more commonly used SDK header files follow.

area.hpp

This file defines the area_t struct, which represents a contiguous block of addresses within a database. This struct serves as the base class for several other classes that build on the concept of an address range. It is seldom necessary to include this file directly, as it is typically included in files defining subclasses of area_t.

auto.hpp

This file declares functions used to work with IDA’s autoanalyzer. The autoanalyzer performs queued analysis tasks when IDA is not busy processing user-input events.

bytes.hpp

This file declares functions for working with individual database bytes. Functions declared in this file are used to read and write individual database bytes as well as manipulate the characteristics of those bytes. Miscellaneous functions also provide access to flags associated with instruction operands, while other functions allow manipulation of regular and repeatable comments.

dbg.hpp

This file declares functions offering programmatic control of IDA’s debugger.

entry.hpp

This header declares functions for working with a file’s entry points. For shared libraries, each exported function or data value is considered an entry point.

expr.hpp

This file declares functions and data structures for working with IDC constructs. It is possible to modify existing IDC functions, add new IDC functions, or execute IDC statements from within modules.

fpro.h

This file contains the alternative file I/O functions, such as qfopen, discussed previously.

frame.hpp

This header contains functions used to manipulate stack frames.

funcs.hpp

This header contains functions and data structures for working with disassembled functions as well as functions for working with FLIRT signatures.

gdl.hpp

This file declares support routines for generating graphs using either DOT or GDL.

ida.hpp

This is the main header file required for working with the SDK. This file contains the definition of the idainfo structure as well as the declaration of the global variable inf, which contains a number of fields containing information about the current database as well as fields initialized from configuration file settings.

idp.hpp

This file contains declarations of structures that form the foundation of processor modules. The global variable ph, which describes the current processor module, and the global variable ash, which describes the current assembler, are defined in this file.

kernwin.hpp

This file declares functions for interacting with the user and the user interface. The SDK equivalents of IDC’s AskXXX functions are declared here, as are functions used to set the display position and configure hotkey associations.

lines.hpp

This file declares functions for generating formatted, colorized disassembly lines.

loader.hpp

This file contains the declarations for the loader_t and plugin_t structures required for the creation of loader modules and plug-in modules, respectively, as well as functions useful during the file-loading phase and functions for activating plug-ins.

name.hpp

This file declares functions for manipulating named locations (as opposed to names within structures or stack frames, which are covered in struct.hpp and funcs.hpp, respectively).

netnode.hpp

Netnodes are the lowest-level storage structure accessible via the API. The details of netnodes are typically hidden by the IDA user interface. This file contains the definition of the netnode class and functions for low-level manipulation of netnodes.

pro.h

This file includes the top-level typedefs and macros required in any SDK module. You do not need to explicitly include this file in your projects, as it is included from ida.hpp. Among other things, the IDA_SDK_VERSION macro is defined in this file. IDA_SDK_VERSION provides a means to determine with which version of the SDK a module is being built, and it can be tested to provide conditional compilation when using different versions of the SDK. Note that IDA_SDK_VERSION was introduced with SDK version 5.2. Prior to SDK 5.2, there is no official way to determine which SDK is being used. An unofficial header file that defines IDA_SDK_VERSION for older versions of the SDK (sdk_versions.h) is available on this book’s website.
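The conditional compilation described above might look like the following sketch. The fallback #define exists only so that the fragment compiles outside the SDK; in a real module the macro comes from pro.h and should not be redefined. The numeric encoding (520 for SDK 5.2, 610 for SDK 6.1) and the helper names are assumptions made for illustration.

```cpp
#include <cassert>

// Stand-in so the sketch compiles without the SDK; in a real module this
// macro is supplied by pro.h (via ida.hpp).
#ifndef IDA_SDK_VERSION
#define IDA_SDK_VERSION 610   // assumed encoding for SDK 6.1
#endif

// Hypothetical helper reporting which code path was compiled in.
inline const char *sdk_code_path() {
#if IDA_SDK_VERSION >= 600
  return "6.x or later";
#elif IDA_SDK_VERSION >= 520
  return "5.2 through 5.x";
#else
  return "older than 5.2";
#endif
}

// Hypothetical check a module might run once when it is loaded.
inline void check_sdk_version() { assert(IDA_SDK_VERSION >= 520); }
```

The same #if structure can guard calls to functions that exist only in some SDK releases, which keeps a single source tree buildable against several versions.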

search.hpp

This file declares functions for performing different types of searches on a database.

segment.hpp

This file contains the declaration of the segment_t class, a subclass of area_t, which is used to describe individual sections (.text, .data, etc.) within a binary. Functions for working with segments are also declared here.

struct.hpp

This file contains the declaration of the struc_t class and functions for manipulating structures within a database.

typeinf.hpp

This file declares functions for working with IDA type libraries. Among other things, functions declared here offer access to function signatures, including function return types and parameter sequences.

ua.hpp

This file declares the op_t and insn_t classes used extensively in processor modules. Also declared here are functions used for disassembling individual instructions and for generating the text for various portions of each disassembled line.

xref.hpp

This file declares the datatypes and functions required for adding, deleting, and iterating code and data cross-references.

The preceding list describes approximately half of the header files that ship with the SDK. You are encouraged to familiarize yourself not only with the files in this list but also with all of the other header files as well, as you dig deeper into the SDK. Functions that make up the published API are marked as ida_export. Only functions designated as ida_export are exported in the link libraries that ship with the SDK. Don’t be misled by the use of idaapi, as it merely signifies that a function is to use the stdcall calling convention on Windows platforms only. You may occasionally run across interesting-looking functions that are not designated as ida_export; you cannot use these functions in your modules.

Netnodes

Much of IDA’s API is built around C++ classes that model various aspects of a disassembled binary. The netnode class, on the other hand, seems wrapped in mystery because it appears to have no direct relationship to constructs within binary files (sections, functions, instructions, etc.).

Netnodes are the lowest-level and most-general-purpose data storage mechanism accessible within an IDA database. As a module programmer, you will seldom be required to work directly with netnodes. Many of the higher-level data structures hide the fact that they ultimately rely on netnodes for persistent storage within a database. Some of the ways that netnodes are used within a database are detailed in the file nalt.hpp, in which we learn, for example, that information about the shared libraries and functions that a binary imports is stored in a netnode named import_node (yes, netnodes may have names). Netnodes are also the persistent storage mechanisms that facilitate IDC’s global arrays.

Netnodes are described in extensive detail in the file netnode.hpp. But from a high-level perspective, netnodes are storage structures used internally by IDA for a variety of purposes. However, their precise structure is kept hidden, even from SDK programmers. To provide an interface to these storage structures, the SDK defines a netnode class, which functions as an opaque wrapper around this internal storage structure. The netnode class contains a single data member called netnodenumber, which is an integer identifier used to access the internal representation of a netnode. Every netnode is uniquely identified by its netnodenumber. On 32-bit systems the netnodenumber is a 32-bit quantity, allowing for 2^32 unique netnodes. On 64-bit systems, a netnodenumber is a 64-bit integer, which allows for 2^64 unique netnodes. In most cases, the netnodenumber represents a virtual address within the database, which creates a natural mapping between each address within a database and any netnode that might be required to store information associated with that address. Comment text is an example of arbitrary information that may be associated with an address and thus stored within a netnode associated with that address.

The recommended way to manipulate netnodes is by invoking member functions of the netnode class using an instantiated netnode object. Reading through netnode.hpp, you will notice that a number of nonmember functions exist that seem to support netnode manipulation. Use of these functions is discouraged in favor of member functions. You will note, however, that most of the member functions in the netnode class are thin wrappers around one of the nonmember functions.

Internally, netnodes can be used to store several different types of information. Each netnode may be associated with a name of up to 512 characters and a primary value of up to 1,024 bytes. Member functions of the netnode class are provided to retrieve (name) or modify (rename) a netnode’s name. Additional member functions allow you to treat a netnode’s primary value as an integer (set_long, long_value), a string (set, valstr), or an arbitrary binary blob (set, valobj). The function used inherently determines how the primary value is treated.

Here is where things get a little complicated. In addition to a name and a primary value, every netnode is also capable of storing 256 sparse arrays in which the array elements can be arbitrarily sized with values up to a maximum of 1,024 bytes each. These arrays fall into three overlapping categories. The first category of arrays is indexed using 32-bit index values and can potentially hold in excess of 4 billion items. The second category of arrays is indexed using 8-bit index values and can thus hold up to 256 items. The last category of arrays is actually hash tables that use strings for keys. Regardless of which of the three categories is used, each element of the array will accept values up to 1,024 bytes in size. In short, a netnode can hold a tremendous amount of data—now we just need to learn how to make it all happen.

If you are wondering where all of this information gets stored, you are not alone. All netnode content is stored within btree nodes in an IDA database. Btree nodes in turn are stored in an ID0 file, which in turn is archived into an IDB file when you close your database. Any netnode content that you create will not be visible in any of IDA’s display windows; the data is yours to manipulate as you please. This is why netnodes are an ideal place for persistent storage for any plug-ins and scripts that you may wish to use to store results from one invocation to the next.
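Because the real layout is hidden, it can help to keep a simplified mental model in mind. The following sketch is a toy, in-memory model only; the names toy_netnode, put_indexed, get_indexed, and put_hashed are hypothetical and do not exist in the SDK, and the btree-backed implementation inside IDA is certainly different. It captures just the shape described above: a name, a primary value, sparse arrays selected by a one-byte tag, string-keyed tables, and a 1,024-byte limit per value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<unsigned char> blob_t;
const std::size_t TOY_MAX_VALUE = 1024;   // per-value limit from the text

// Toy model only; IDA's real storage format is not exposed by the SDK.
struct toy_netnode {
  std::string name;    // optional name
  blob_t value;        // primary value
  // sparse arrays: (tag, integer index) -> element
  std::map<std::pair<unsigned char, std::uint32_t>, blob_t> arrays;
  // hash tables: (tag, key string) -> element
  std::map<std::pair<unsigned char, std::string>, blob_t> hashes;

  bool put_indexed(unsigned char tag, std::uint32_t idx, const blob_t &v) {
    if (v.size() > TOY_MAX_VALUE) return false;   // too large to store
    arrays[std::make_pair(tag, idx)] = v;
    return true;
  }
  const blob_t *get_indexed(unsigned char tag, std::uint32_t idx) const {
    auto it = arrays.find(std::make_pair(tag, idx));
    return it == arrays.end() ? nullptr : &it->second;
  }
  bool put_hashed(unsigned char tag, const std::string &key, const blob_t &v) {
    if (v.size() > TOY_MAX_VALUE) return false;
    hashes[std::make_pair(tag, key)] = v;
    return true;
  }
};
```

Only populated elements occupy space in this model, which is the sense in which the arrays are sparse: storing a value at index 1000 does not allocate indexes 0 through 999.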

Creating Netnodes

A potentially confusing point about netnodes is that declaring a netnode variable within one of your modules does not necessarily create an internal representation of that netnode within the database. A netnode is not created internally until one of the following events takes place:

  • The netnode is assigned a name.

  • The netnode is assigned a primary value.

  • A value is stored into one of the netnode’s internal arrays.

There are three constructors available for declaring netnodes within your modules. The prototypes for each, extracted from netnode.hpp, and examples of their use are shown in Example 16-1.

Example 16-1. Declaring netnodes

#ifdef __EA64__
typedef ulonglong nodeidx_t;
#else
typedef ulong nodeidx_t;
#endif
class netnode {
  netnode();
  netnode(nodeidx_t num);
  netnode(const char *name, size_t namlen=0, bool do_create=false);
  bool create(const char *name, size_t namlen=0);
  bool create();
  //... remainder of netnode class follows
};
netnode n0;                       //uses netnode()
netnode n1(0x00401110);           //uses netnode(nodeidx_t num)
netnode n2("$ node 2");           //uses netnode(const char *name, ...)
netnode n3("$ node 3", 0, true);  //uses netnode(const char *name, ...) with do_create=true

In this example, only one netnode (n3) is guaranteed to exist within the database after the code has executed. Netnodes n1 and n2 may exist if they had been previously created and populated with data. Whether it previously existed or not, n1 is capable of receiving new data at this point. If n2 did not exist, meaning that no netnode named $ node 2 could be found in the database, then n2 must be explicitly created (using either form of the create function) before data can be stored into it. If we want to guarantee that we can store data into n2, we need to add the following safety check:

if (BADNODE == (nodeidx_t)n2) {
   n2.create("$ node 2");
}

The preceding example demonstrates the use of the nodeidx_t operator, which allows a netnode to be cast to a nodeidx_t. The nodeidx_t operator simply returns the netnodenumber data member of the associated netnode and allows netnode variables to be easily converted into integers.

An important point to understand about netnodes is that a netnode must have a valid netnodenumber before you can store data into the netnode. A netnodenumber may be explicitly assigned, as with n1 via the netnode(nodeidx_t num) constructor in the previous example. Alternatively, a netnodenumber may be internally generated when a netnode is created using the create flag in a constructor (as with n3 via the constructor that accepts a do_create argument) or via the create function (as with n2). Internally assigned netnodenumbers begin with 0xFF000000 and increment with each newly created netnode.

We have thus far neglected netnode n0 in our example. As things currently stand, n0 has neither a number nor a name. We could create n0 by name using the create function in a manner similar to n2. Or we could use the alternate form of create to create an unnamed netnode with a valid, internally generated netnodenumber, as shown here:

n0.create();  //assign an internally generated netnodenumber to n0

At this point it is possible to store data into n0, though we have no way to retrieve that data in the future unless we record the assigned netnodenumber somewhere or assign n0 a name. This demonstrates the fact that netnodes are easy to access when they are associated with a virtual address (similar to n1 in our example). For all other netnodes, assigning a name makes it possible to perform a named lookup for all future references to the netnode (as with n2 and n3 in our example).

Note that for our named netnodes, we have chosen to use names prefixed with “$ ”, which is in keeping with the practice, recommended in netnode.hpp, for avoiding conflicts with names IDA uses internally.
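The creation rules in this section can be summarized with a toy model. Everything in the sketch below (toy_db, toy_node, and their members) is hypothetical and stands in for behavior that the SDK implements internally; it mirrors only what has been described here: lookup by name, an invalid number (BADNODE) when no match is found, optional creation from the constructor, and internally assigned numbers that start at 0xFF000000.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

typedef std::uint32_t nodeidx_t;           // stand-in for the 32-bit SDK type
const nodeidx_t BADNODE = nodeidx_t(-1);

// Toy database: remembers names and hands out internal numbers.
struct toy_db {
  std::map<std::string, nodeidx_t> names;  // name -> netnodenumber
  nodeidx_t next_id = 0xFF000000;          // first internally assigned number
};

// Toy netnode handle; not the SDK class.
class toy_node {
  toy_db &db;
  nodeidx_t num;
public:
  explicit toy_node(toy_db &d) : db(d), num(BADNODE) {}
  toy_node(toy_db &d, nodeidx_t n) : db(d), num(n) {}
  toy_node(toy_db &d, const char *name, bool do_create = false)
      : db(d), num(BADNODE) {
    auto it = db.names.find(name);
    if (it != db.names.end()) num = it->second;   // found an existing node
    else if (do_create) create(name);             // otherwise create on request
  }
  bool create(const char *name) {
    if (db.names.count(name)) return false;       // name already in use
    num = db.next_id++;
    db.names[name] = num;
    return true;
  }
  bool create() { num = db.next_id++; return true; }  // unnamed node
  operator nodeidx_t() const { return num; }          // expose the number
};
```

In this model an unnamed node created with create() can be found again only if its number is saved somewhere, which is the same limitation noted for n0 above.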

Data Storage in Netnodes

Now that you understand how to create a netnode that you can store data into, let’s return to the discussion of the internal array storage capability of netnodes. To store a value into an array within a netnode, we need to specify five pieces of information: an index value, an index size (8 or 32 bits), a value to store, the number of bytes the value contains, and an array (one of 256 available for each category of array) in which to store the value. The index size parameter is specified implicitly by the function that we use to store or retrieve the data. The remaining values are passed into that function as parameters. The parameter that selects which of the 256 possible arrays a value is stored in is usually called a tag, and it is often specified (though it need not be) using a character. The netnode documentation distinguishes among a few special types of values termed altvals, supvals, and hashvals. By default, each of these values is typically associated with a specific array tag: 'A' for altvals, 'S' for supvals, and 'H' for hashvals. A fourth type of value, called a charval, is not associated with any specific array tag.

It is important to understand that these value types are associated more with a specific way of storing data into a netnode than with a specific array within a netnode. It is possible to store any type of value in any array simply by specifying an alternate array tag when storing data. In all cases, it is up to you to remember what type of data you stored into a particular array location so that you can use retrieval methods appropriate to the type of the stored data.

Altvals provide a simple interface for storing and retrieving integer data in netnodes. Altvals may be stored into any array within a netnode but default to the 'A' array. Regardless of which array you wish to store integers into, using the altval-related functions greatly simplifies matters. The code in Example 16-2 demonstrates data storage and retrieval using altvals.

Example 16-2. Accessing netnode altvals

netnode n("$ idabook", 0, true);  //create the netnode if it doesn't exist
sval_t index = 1000;  //sval_t is a 32 bit type, this example uses 32-bit indexes
ulong value = 0x12345678;
n.altset(index, value);   //store value into the 'A' array at index
value = n.altval(index);  //retrieve value from the 'A' array at index
n.altset(index, value, (char)3);  //store into array 3
value = n.altval(index, (char)3); //read from array 3

In this example, you see a pattern that will be repeated for other types of netnode values, namely, the use of an XXXset function (in this case, altset) to store a value into a netnode and an XXXval function (in this case, altval) to retrieve a value from a netnode. If we want to store integers into arrays using 8-bit index values, we need to use slightly different functions, as shown in the next example.

netnode n("$ idabook", 0, true);
uchar index = 80;      //this example uses 8-bit index values
ulong value = 0x87654321;
n.altset_idx8(index, value, 'A');  //store, no default tags with xxx_idx8 functions
value = n.altval_idx8(index, 'A'); //retrieve value from the 'A' array at index
n.altset_idx8(index, value, (char)3);  //store into array 3
value = n.altval_idx8(index, (char)3); //read from array 3

Here you see that the general rule of thumb for the use of 8-bit index values is to use a function with an _idx8 suffix. Also note that none of the _idx8 functions provide default values for the array tag parameter.

Supvals represent the most versatile means of storing and retrieving data in netnodes. Supvals represent data of arbitrary size, from 1 byte to a maximum of 1,024 bytes. When using 32-bit index values, the default array for storing and retrieving supvals is the 'S' array. Again, however, supvals can be stored into any of the 256 available arrays by specifying an appropriate array tag value. Strings are a common form of arbitrary length data and as such are afforded special handling in supval manipulation functions. The code in Example 16-3 provides examples of storing supvals into a netnode.

Example 16-3. Storing netnode supvals

netnode n("$ idabook", 0, true);  //create the netnode if it doesn't exist

const char *string_data = "example supval string data";
uchar binary_data[] = {0xfe, 0xdc, 0x4e, 0xc7, 0x90, 0x00, 0x13, 0x8a,
                       0x33, 0x19, 0x21, 0xe5, 0xaa, 0x3d, 0xa1, 0x95};

//store binary_data into the 'S' array at index 1000, we must supply a
//pointer to data and the size of the data
n.supset(1000, binary_data, sizeof(binary_data));

//store string_data into the 'S' array at index 1001.  If no size is supplied,
//or size is zero, the data size is computed as: strlen(data) + 1
n.supset(1001, string_data);
//store into an array other than 'S' (200 in this case) at index 500
n.supset(500, binary_data, sizeof(binary_data), (char)200);

The supset function requires an array index, a pointer to some data, the length of the data (in bytes), and an array tag that defaults to 'S' if omitted. If the length parameter is omitted, it defaults to zero. When the length is specified as zero, supset assumes that the data being stored is a string, computes the length of the data as strlen(data) + 1, and stores a null termination character along with the string data.

Retrieving data from a supval takes a little care, as you may not know the amount of data contained within the supval before you attempt to retrieve it. When you retrieve data from a supval, bytes are copied out of the netnode into a user-supplied output buffer. How do you ensure that your output buffer is of sufficient size to receive the supval data? The first method is to retrieve all supval data into a buffer that is at least 1,024 bytes. The second method is to preset the size of your output buffers by querying the size of the supval. Two functions are available for retrieving supvals. The supval function is used to retrieve arbitrary data, while the supstr function is specialized for retrieving string data. Each of these functions expects a pointer to your output buffer along with the size of the buffer. The return value for supval is the number of bytes copied into the output buffer, while the return value for supstr is the length of the string copied to the output buffer not including the null terminator, even though the null terminator is copied to the buffer. Each of these functions recognizes the special case in which a NULL pointer is supplied in place of an output buffer pointer. In such cases, supval and supstr return the number of bytes of storage (including any null terminator) required to hold the supval data. Example 16-4 demonstrates retrieval of supval data using the supval and supstr functions.

Example 16-4. Retrieving netnode supvals

//determine size of element 1000 in 'S' array.  The NULL pointer indicates
//that we are not supplying an output buffer
int len = n.supval(1000, NULL, 0);

char *outbuf = new char[len];  //allocate a buffer of sufficient size
n.supval(1000, outbuf, len);   //extract data from the supval

//determine size of element 1001 in 'S' array.  The NULL pointer indicates
//that we are not supplying an output buffer.
len = n.supstr(1001, NULL, 0);

char *outstr = new char[len];  //allocate a buffer of sufficient size
n.supstr(1001, outstr, len);   //extract the string from the supval

//retrieve a supval from array 200, index 500
char buf[1024];
len = n.supval(500, buf, sizeof(buf), (char)200);
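The query-then-allocate pattern shown in Example 16-4 can also be written so that the buffer releases itself. In the sketch below, toy_supval is a hypothetical stand-in that imitates only the calling convention described above (a NULL buffer returns the required size); it is not the SDK function, and its truncating behavior for undersized buffers is an assumption of the toy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a supval-style getter. With a NULL buffer it
// returns the number of bytes required; otherwise it copies at most
// bufsize bytes and returns the number of bytes copied.
inline long toy_supval(const std::vector<unsigned char> &stored,
                       void *buf, std::size_t bufsize) {
  if (buf == nullptr) return static_cast<long>(stored.size());
  std::size_t n = stored.size() < bufsize ? stored.size() : bufsize;
  if (n != 0) std::memcpy(buf, stored.data(), n);
  return static_cast<long>(n);
}

// Two-step retrieval: ask for the size, then fetch into a std::vector,
// which frees its storage automatically (no delete[] to forget).
inline std::vector<unsigned char> fetch_all(
    const std::vector<unsigned char> &stored) {
  long len = toy_supval(stored, nullptr, 0);
  std::vector<unsigned char> out(len > 0 ? static_cast<std::size_t>(len) : 0);
  if (!out.empty()) toy_supval(stored, out.data(), out.size());
  return out;
}
```

With the real API the same shape would apply: one call with a NULL buffer to learn the size, a container sized accordingly, and a second call to fill it.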

Using supvals, it is possible to access any data stored in any array within a netnode. For example, supval functions can be used to store and retrieve altval data by limiting the supset and supval operations to the size of an altval. Reading through netnode.hpp, you will see that this is in fact the case by observing the inlined implementation of the altset function, as shown here:

bool altset(sval_t alt, nodeidx_t value, char tag=atag) {
   return supset(alt, &value, sizeof(value), tag);
}

Hashvals offer yet another interface to netnodes. Rather than being associated with integer indexes, hashvals are associated with key strings. Overloaded versions of the hashset function make it easy to associate integer data or array data with a hash key, while the hashval, hashstr, and hashval_long functions allow retrieval of hashvals when provided with the appropriate hash key. Tag values associated with the hashXXX functions actually choose one of 256 hash tables, with the default table being 'H'. Alternate tables are selected by specifying a tag other than 'H'.

The last interface to netnodes that we will mention is the charval interface. The charval and charset functions offer a simple means to store single-byte data into a netnode array. There is no default array associated with charval storage and retrieval, so you must specify an array tag for every charval operation. Charvals are stored into the same arrays as altvals and supvals, and the charval functions are simply wrappers around 1-byte supvals.

Another capability provided by the netnode class is the ability to iterate over the contents of a netnode array (or hash table). Iteration is performed using XXX1st, XXXnxt, XXXlast, and XXXprev functions that are available for altvals, supvals, hashvals, and charvals. The code in Example 16-5 illustrates iteration across the default altvals array ('A').

Iteration over supvals, charvals, and hashvals is performed in a very similar manner; however, you will find that the syntax varies depending on the type of values being accessed. For example, iteration over hashvals returns hashkeys rather than array indexes, which must then be used to retrieve hashvals.

Example 16-5. Enumerating netnode altvals

netnode n("$ idabook", 0, true);
//Iterate altvals first to last
for (nodeidx_t idx = n.alt1st(); idx != BADNODE; idx = n.altnxt(idx)) {
   ulong val = n.altval(idx);
   msg("Found altval['A'][%d] = %d\n", idx, val);
}

//Iterate altvals last to first
for (nodeidx_t idx = n.altlast(); idx != BADNODE; idx = n.altprev(idx)) {
   ulong val = n.altval(idx);
   msg("Found altval['A'][%d] = %d\n", idx, val);
}
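The loops in Example 16-5 visit only the indexes that actually hold data. The toy container below imitates that first/next/last/prev style over a std::map so the behavior can be tried outside IDA; toy_altvals and its member names are hypothetical and are not SDK functions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

typedef std::uint32_t nodeidx_t;
const nodeidx_t BADNODE = nodeidx_t(-1);

// Toy sparse array: only populated indexes are stored.
struct toy_altvals {
  std::map<nodeidx_t, std::uint32_t> items;

  nodeidx_t first() const {
    return items.empty() ? BADNODE : items.begin()->first;
  }
  nodeidx_t next(nodeidx_t cur) const {
    auto it = items.upper_bound(cur);   // first index greater than cur
    return it == items.end() ? BADNODE : it->first;
  }
  nodeidx_t last() const {
    return items.empty() ? BADNODE : items.rbegin()->first;
  }
  nodeidx_t prev(nodeidx_t cur) const {
    auto it = items.lower_bound(cur);   // first index not less than cur
    return it == items.begin() ? BADNODE : (--it)->first;
  }
};

// Collect the populated indexes in ascending order.
inline std::vector<nodeidx_t> visit_forward(const toy_altvals &a) {
  std::vector<nodeidx_t> seen;
  for (nodeidx_t i = a.first(); i != BADNODE; i = a.next(i))
    seen.push_back(i);
  return seen;
}
```

Three stored elements produce exactly three loop iterations here, regardless of how far apart their indexes are.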

Deleting Netnodes and Netnode Data

The netnode class also provides functions for deleting individual array elements, the entire contents of an array, or the entire contents of a netnode. Removing an entire netnode is fairly straightforward.

netnode n("$ idabook", 0, true);
n.kill();                        //entire contents of n are deleted

When deleting individual array elements, or entire array contents, you must take care to choose the proper deletion function because the names of the functions are very similar and choosing the wrong form may result in significant loss of data. Commented examples demonstrating deletion of altvals follow:

netnode n("$ idabook", 0, true);
n.altdel(100);          //delete item 100 from the default altval array ('A')
n.altdel(100, (char)3); //delete item 100 from altval array 3
n.altdel();             //delete the entire contents of the default altval array
n.altdel_all('A');      //alternative to delete default altval array contents
n.altdel_all((char)3);  //delete the entire contents of altval array 3

Note the similarity between the syntax to delete the entire contents of the default altval array (n.altdel()) and the syntax to delete a single element from the default altval array (n.altdel(100)). If for some reason you fail to specify an index when you want to delete a single element, you may end up deleting an entire array. Similar functions exist to delete supval, charval, and hashval data.
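One defensive option, offered here as a suggestion rather than anything the SDK requires, is to route deletions through small wrappers whose names state the intent. The sketch uses a toy array with the same pair of overloads; toy_array, delete_one, and delete_all are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy array mimicking the overload pair: del(idx) removes one element,
// del() removes everything.
struct toy_array {
  std::map<std::uint32_t, std::uint32_t> items;
  bool del(std::uint32_t idx) { return items.erase(idx) == 1; }
  bool del() { items.clear(); return true; }
};

// Hypothetical wrappers: forgetting the index in delete_one is a compile
// error rather than a silent bulk delete.
inline bool delete_one(toy_array &a, std::uint32_t idx) { return a.del(idx); }
inline bool delete_all(toy_array &a) { return a.del(); }
```

A call such as delete_one(a) will not compile, so the mistake is caught before any data is lost.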

Useful SDK Datatypes

IDA’s API defines a number of C++ classes designed to model components typically found in executable files. The SDK contains classes to describe functions, program sections, data structures, individual assembly language instructions, and individual operands within each instruction. Additional classes are defined to implement the tools that IDA uses to manage the disassembly process. Classes falling into this latter category define general database characteristics, loader module characteristics, processor module characteristics, and plug-in module characteristics, and they define the assembly syntax to be used for each disassembled instruction.

Some of the more common general-purpose classes are described here. We defer discussion of classes that are more specific to plug-ins, loaders, and processor modules until the appropriate chapters covering those topics. Our goal here is to introduce classes, their purposes, and some important data members of each class. Useful functions for manipulating each class are described in Commonly Used SDK Functions.

area_t (area.hpp)

This struct describes a range of addresses and is the base class for several other classes. The struct contains two data members, startEA (inclusive) and endEA (exclusive), that define the boundaries of the address range. Member functions are defined that compute the size of the address range and that can perform comparisons between two areas.
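The inclusive start and exclusive end convention is worth seeing concretely, since off-by-one mistakes are easy to make with it. The struct below is a hypothetical stand-in, not the SDK's area_t; the member function names and the 32-bit ea_t typedef are assumptions chosen for the sketch.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t ea_t;   // stand-in for the SDK's address type

// Toy half-open address range [startEA, endEA).
struct toy_area {
  ea_t startEA;   // first address in the range (inclusive)
  ea_t endEA;     // first address past the range (exclusive)

  ea_t size() const {
    assert(startEA <= endEA);
    return endEA - startEA;
  }
  bool contains(ea_t ea) const { return startEA <= ea && ea < endEA; }
  bool overlaps(const toy_area &r) const {
    return r.startEA < endEA && startEA < r.endEA;
  }
};
```

With this convention, two ranges that share a boundary, such as 0x401000 to 0x402000 and 0x402000 to 0x403000, are adjacent rather than overlapping, and the size is a plain subtraction.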

func_t (funcs.hpp)

This class inherits from area_t. Additional data fields are added to the class to record binary attributes of the function, such as whether the function uses a frame pointer or not, and attributes describing the function’s local variables and arguments. For optimization purposes, some compilers may split functions into several noncontiguous regions within a binary. IDA terms these regions chunks or tails. The func_t class is also used to describe tail chunks.

segment_t (segment.hpp)

The segment_t class is another subclass of area_t. Additional data fields describe the name of the segment, the permissions in effect in the segment (readable, writeable, executable), the type of the segment (code, data, etc.), and the number of bits used in a segment address (16, 32, or 64).

idc_value_t (expr.hpp)

This class describes the contents of an IDC value, which may contain at any time a string, an integer, or a floating-point value. The type is utilized extensively when interacting with IDC functions from within a compiled module.

idainfo (ida.hpp)

This struct is populated with characteristics describing the open database. A single global variable named inf, of type idainfo, is declared in ida.hpp. Fields within this struct describe the name of the processor module that is in use, the input file type (such as f_PE or f_MACHO via the filetype_t enum), the program entry point (beginEA), the minimum address within the binary (minEA), the maximum address in the binary (maxEA), the endianness of the current processor (mf), and a number of configuration settings parsed from ida.cfg.

struc_t (struct.hpp)

This class describes the layout of structured data within a disassembly. It is used to describe structures within the Structures window as well as to describe the composition of function stack frames. A struc_t contains flags describing attributes of the structure (such as whether it is a structure or union or whether the structure is collapsed or expanded in the IDA display window), and it also contains an array of structure members.

member_t (struct.hpp)

This class describes a single member of a structured datatype. Included data fields describe the byte offset at which the member begins and ends within its parent structure.

op_t (ua.hpp)

This class describes a single operand within a disassembled instruction. The class contains a zero-based field to store the number of the operand (n), an operand type field (type), and a number of other fields whose meaning varies depending on the operand type. The type field is set to one of the optype_t constants defined in ua.hpp and describes the operand type or addressing mode used for the operand.

insn_t (ua.hpp)

This class contains information describing a single disassembled instruction. Fields within the class describe the instruction’s address within the disassembly (ea), the instruction’s type (itype), the instruction’s length in bytes (size), and an array of six possible operand values (Operands) of type op_t (IDA limits each instruction to a maximum of six operands). The itype field is set by the processor module. For standard IDA processor modules, the itype field is set to one of the enumerated constants defined in allins.hpp. When a third-party processor module is used, the list of potential itype values must be obtained from the module developer. Note that the itype field generally bears no relationship whatsoever to the binary opcode for the instruction.

The preceding list is by no means a definitive guide to all of the datatypes used within the SDK. This list is intended merely as an introduction to some of the more commonly used classes and some of the more commonly accessed fields within those classes.

Commonly Used SDK Functions

While the SDK is programmed using C++ and defines a number of C++ classes, in many cases the SDK favors traditional C-style nonmember functions for manipulation of objects within a database. For most API datatypes, it is more common to find nonmember functions that require a pointer to an object than it is to find a member function to manipulate the object in the manner you desire.

In the summaries that follow, we cover API functions that provide functionality similar to many of the IDC functions introduced in Chapter 15. It is unfortunate that functions that perform identical tasks are named one thing in IDC and something different within the API.

Basic Database Access

The following functions, declared in bytes.hpp, provide access to individual bytes, words, and dwords within a database.

uchar get_byte(ea_t addr) Reads current byte value from virtual address addr.
ushort get_word(ea_t addr) Reads current word value from virtual address addr.
ulong get_long(ea_t addr) Reads current double word value from virtual address addr.
get_many_bytes(ea_t addr, void *buffer, ssize_t len) Copies len bytes from the addr into the supplied buffer.
patch_byte(ea_t addr, ulong val) Sets a byte value at virtual address addr.
patch_word(ea_t addr, ulonglong val) Sets a word value at virtual address addr.
patch_long(ea_t addr, ulonglong val) Sets a double word value at virtual address addr.
patch_many_bytes(ea_t addr, const void *buffer, size_t len) Patches the database beginning at addr with len bytes from the user-supplied buffer.
ulong get_original_byte(ea_t addr) Reads the original byte value (prior to patching) from virtual address addr.
ulonglong get_original_word(ea_t addr) Reads the original word value from virtual address addr.
ulonglong get_original_long(ea_t addr) Reads the original double word value from virtual address addr.
bool isLoaded(ea_t addr) Returns true if addr contains valid data, false otherwise.

Additional functions exist for accessing alternative data sizes. Note that the get_original_XXX functions get the very first original value, which is not necessarily the value at an address prior to a patch. Consider the case when a byte value is patched twice; over time this byte has held three different values. After the second patch, both the current value and the original value are accessible, but there is no way to obtain the second value (which was set with the first patch).
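
The behavior described above can be modeled in a few lines. The following is a toy sketch, not the SDK's implementation: the class patch_model_sketch and its members load, patch, current, and original are hypothetical names that merely mimic the semantics attributed to patch_byte, get_byte, and get_original_byte in the text.

```cpp
#include <cstdint>
#include <map>

// Toy model of a database's byte storage. It only mirrors the behavior
// described in the text: the original value is saved on the FIRST patch
// and is never overwritten by later patches.
class patch_model_sketch {
   typedef std::map<std::uint64_t, std::uint8_t> byte_map;
   byte_map current_;    // value visible right now
   byte_map original_;   // value that was present before the first patch
public:
   // Simulates a byte arriving from the input file.
   void load(std::uint64_t addr, std::uint8_t val) { current_[addr] = val; }

   // Simulates a patch: remember the original only if none is saved yet.
   void patch(std::uint64_t addr, std::uint8_t val) {
      if (original_.find(addr) == original_.end()) {
         original_[addr] = current_[addr];
      }
      current_[addr] = val;
   }

   // Simulates reading the current value (0 if nothing was ever stored).
   std::uint8_t current(std::uint64_t addr) const {
      byte_map::const_iterator it = current_.find(addr);
      return it == current_.end() ? 0 : it->second;
   }

   // Simulates reading the original value; unpatched bytes report their
   // current value.
   std::uint8_t original(std::uint64_t addr) const {
      byte_map::const_iterator it = original_.find(addr);
      return it == original_.end() ? current(addr) : it->second;
   }
};
```

If a byte is loaded as 0x11 and then patched to 0x22 and again to 0x33, this model reports 0x33 as the current value and 0x11 as the original value; the intermediate 0x22 is not stored anywhere, which is exactly the limitation noted above.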

User Interface Functions

Interaction with the IDA user interface is handled by a single dispatcher function named callui. Requests for various user interface services are made by passing a user interface request (one of the enumerated ui_notification_t constants) to callui along with any additional parameters required by the request. Parameters required for each request type are specified in kernwin.hpp. Fortunately, a number of convenience functions that hide many of the details of using callui directly are also defined in kernwin.hpp. Several common convenience functions are described here:

msg(char *format, ...) Prints a formatted message to the message window. This function is analogous to C’s printf function and accepts a printf-style format string.
warning(char *format, ...) Displays a formatted message in a dialog.
char *askstr(int hist, char *default, char *format, ...) Displays an input dialog asking the user to enter a string value. The hist parameter dictates how the drop-down history list in the dialog should be populated and should be set to one of the HIST_xxx constants defined in kernwin.hpp. The format string and any additional parameters are used to form a prompt string.
char *askfile_c(int dosave, char *default, char *prompt, ...) Displays a file save (dosave = 1) or file open (dosave = 0) dialog, initially displaying the directory and file mask specified by default (such as C:\windows\*.exe). Returns the name of the selected file or NULL if the dialog was canceled.
askyn_c(int default, char *prompt, ...) Prompts the user with a yes or no question, highlighting a default answer (1 = yes, 0 = no, -1 = cancel). Returns an integer representing the selected answer.
AskUsingForm_c(const char *form, ...) The form parameter is an ASCII string specification of a dialog and its associated input elements. This function may be used to build customized user interface elements when none of the SDK’s other convenience functions meet your needs. The format of the form string is detailed in kernwin.hpp.
get_screen_ea() Returns the virtual address of the current cursor location.
jumpto(ea_t addr) Jumps the disassembly window to the specified address.

Many more user interface capabilities are available using the API than are available with IDC scripting, including the ability to create customized single- and multicolumn list selection dialogs. Users interested in these capabilities should consult kernwin.hpp and the choose and choose2 functions in particular.

Manipulating Database Names

The following functions are available for working with named locations within a database:

get_name(ea_t from, ea_t addr, char *namebuf, size_t maxsize) Returns the name associated with addr. Returns the empty string if the location has no name. This function provides access to local names when from is any address in the function that contains addr. The name is copied into the provided output buffer.
set_name(ea_t addr, char *name, int flags) Assigns the given name to the given address. The name is created with attributes specified in the flags bitmask. Possible flag values are described in name.hpp.
get_name_ea(ea_t funcaddr, char *localname) Searches for the given local name within the function containing funcaddr. Returns the address of the name or BADADDR (-1) if no such name exists in the given function.

Function Manipulation

The API functions for accessing information about disassembled functions are declared in funcs.hpp. Functions for accessing stack frame information are declared in frame.hpp. Some of the more commonly used functions are described here:

func_t *get_func(ea_t addr) Returns a pointer to a func_t object that describes the function containing the indicated address.
size_t get_func_qty() Returns the number of functions present in the database.
func_t *getn_func(size_t n) Returns a pointer to a func_t object that represents the nth function in the database where n is between zero (inclusive) and get_func_qty() (exclusive).
func_t *get_next_func(ea_t addr) Returns a pointer to a func_t object that describes the next function following the specified address.
get_func_name(ea_t addr, char *name, size_t namesize) Copies the name of the function containing the indicated address into the supplied name buffer.
struc_t *get_frame(ea_t addr) Returns a pointer to a struc_t object that describes the stack frame for the function that contains the indicated address.

Structure Manipulation

The struc_t class is used to access function stack frames as well as structured datatypes defined within type libraries. Some of the basic functions for interacting with structures and their associated members are described here. Many of these functions make use of a type ID (tid_t) datatype. The API includes functions for mapping a struc_t to an associated tid_t and vice versa. Note that both the struc_t and member_t classes contain a tid_t data member, so obtaining type ID information is simple if you already have a pointer to a valid struc_t or member_t object.

tid_t get_struc_id(char *name) Looks up the type ID of a structure given its name.
struc_t *get_struc(tid_t id) Obtains a pointer to a struc_t representing the structure specified by the given type ID.
asize_t get_struc_size(struc_t *s) Returns the size of the given structure in bytes.
member_t *get_member(struc_t *s, asize_t offset) Returns a pointer to a member_t object that describes the structure member that resides at the specified offset into the given structure.
member_t *get_member_by_name(struc_t *s, char *name) Returns a pointer to a member_t object that describes the structure member identified by the given name.
tid_t add_struc(uval_t index, char *name, bool is_union=false) Appends a new structure with the given name into the standard structures list. The structure is also added to the Structures window at the given index. If index is BADADDR, the structure is added as the last structure in the Structures window.
add_struc_member(struc_t *s, char *name, ea_t offset, flags_t flags, typeinfo_t *info, asize_t size) Adds a new member with the given name to the given structure. The member is either added at the indicated offset within the structure or appended to the end of the structure if offset is BADADDR. The flags parameter describes the datatype of the new member. Valid flags are defined using the FF_XXX constants described in bytes.hpp. The info parameter provides additional information for complex datatypes; it may be set to NULL for primitive datatypes. The typeinfo_t datatype is defined in nalt.hpp. The size parameter specifies the number of bytes occupied by the new member.

Segment Manipulation

The segment_t class stores information related to the different segments within a database (such as .text and .data) as listed in the View ▸ Open Subviews ▸ Segments window. Recall that what IDA terms segments are often referred to as sections by various executable file formats such as PE and ELF. The following functions provide basic access to segment_t objects. Additional functions dealing with the segment_t class are declared in segment.hpp.

segment_t *getseg(ea_t addr) Returns a pointer to the segment_t object that contains the given address.
segment_t *get_segm_by_name(char *name) Returns a pointer to the segment_t object with the given name.
add_segm(ea_t para, ea_t start, ea_t end, char *name, char *sclass) Creates a new segment in the current database. The segment’s boundaries are specified with the start (inclusive) and end (exclusive) address parameters, while the segment’s name is specified by the name parameter. The segment’s class loosely describes the type of segment being created. Predefined classes include CODE and DATA. A complete list of predefined classes may be found in segment.hpp. The para parameter describes the base address of the section when segmented addresses (seg:offset) are being used, in which case start and end are interpreted as offsets rather than as virtual addresses. When segmented addresses are not being used, or all segments are based at 0, this parameter should be set to 0.
add_segm_ex(segment_t *s, char *name, char *sclass, int flags) Alternate method for creating new segments. The fields of s should be set to reflect the address range of the segment. The segment is named and typed according to the name and sclass parameters. The flags parameter should be set to one of the ADDSEG_XXX values defined in segment.hpp.
int get_segm_qty() Returns the number of sections present within the database.
segment_t *getnseg(int n) Returns a pointer to a segment_t object populated with information about the nth program section in the database.
int set_segm_name(segment_t *s, char *name, ...) Changes the name of the given segment. The name is formed by treating name as a format string and incorporating any additional parameters as required by the format string.
get_segm_name(ea_t addr, char *name, size_t namesize) Copies the name of the segment containing the given address into the user-supplied name buffer. Note the name may be filtered to replace characters that IDA considers invalid (characters not specified as NameChars in ida.cfg) with a dummy character (typically an underscore as specified by SubstChar in ida.cfg).
get_segm_name(segment_t *s, char *name, size_t namesize) Copies the potentially filtered name of the given segment into the user-supplied name buffer.
get_true_segm_name(segment_t *s, char *name, size_t namesize) Copies the exact name of the given segment into the user-supplied name buffer without filtering any characters.
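
The difference between a filtered name and a true name can be pictured with a short stand-alone sketch. The function filter_name_sketch below is hypothetical and is not part of the SDK; the set of valid characters passed to it is an arbitrary example and does not reproduce the NameChars setting of any real ida.cfg.

```cpp
#include <string>

// Replace every character of name that does not appear in valid_chars
// with the substitute character subst. This only illustrates the kind of
// filtering described in the text; it is not IDA's own routine.
std::string filter_name_sketch(const std::string &name,
                               const std::string &valid_chars,
                               char subst) {
   std::string result = name;
   for (std::string::size_type i = 0; i < result.size(); i++) {
      if (valid_chars.find(result[i]) == std::string::npos) {
         result[i] = subst;   // character is not in the allowed set
      }
   }
   return result;
}
```

With an allowed set consisting only of lowercase letters and the underscore, a name such as .text would be reported as _text by a filtering function, while a function that skips filtering would still report .text.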

One of the add_segm functions must be used to actually create a segment. Simply declaring and initializing a segment_t object does not actually create a segment within the database. This is true with all of the wrapper classes such as func_t and struc_t. These classes merely provide a convenient means to access attributes of an underlying database entity. The appropriate functions to create, modify, or delete actual database objects must be utilized in order to make persistent changes to the database.

Code Cross-References

A number of functions and enumerated constants are defined in xref.hpp for use with code cross-references. Some of these are described here:

get_first_cref_from(ea_t from) Returns the first location to which the given address transfers control. Returns BADADDR (-1) if the given address refers to no other addresses.
get_next_cref_from(ea_t from, ea_t current) Returns the next location to which the given address (from) transfers control, given that current has already been returned by a previous call to get_first_cref_from or get_next_cref_from. Returns BADADDR if no more cross-references exist.
get_first_cref_to(ea_t to) Returns the first location that transfers control to the given address. Returns BADADDR (-1) if there are no references to the given address.
get_next_cref_to(ea_t to, ea_t current) Returns the next location that transfers control to the given address (to), given that current has already been returned by a previous call to get_first_cref_to or get_next_cref_to. Returns BADADDR if no more cross-references to the given location exist.
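
The first/next calling pattern shared by these functions can be modeled without the SDK. In the sketch below, cref_table_sketch, ea_sketch_t, and BADADDR_SKETCH are hypothetical stand-ins invented for illustration; the toy table simply hands back targets in the order they were added, and it assumes each target is added at most once per source address.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

typedef std::uint64_t ea_sketch_t;   // stand-in for an address type
const ea_sketch_t BADADDR_SKETCH = static_cast<ea_sketch_t>(-1);

// Toy cross-reference table: source address -> list of target addresses.
class cref_table_sketch {
   typedef std::map<ea_sketch_t, std::vector<ea_sketch_t> > table_t;
   table_t table_;
public:
   void add(ea_sketch_t from, ea_sketch_t to) { table_[from].push_back(to); }

   // First target referenced by from, or the sentinel if there is none.
   ea_sketch_t first_from(ea_sketch_t from) const {
      table_t::const_iterator it = table_.find(from);
      if (it == table_.end() || it->second.empty()) return BADADDR_SKETCH;
      return it->second[0];
   }

   // Target that follows current in the list for from, or the sentinel
   // when current was the last one (or is unknown).
   ea_sketch_t next_from(ea_sketch_t from, ea_sketch_t current) const {
      table_t::const_iterator it = table_.find(from);
      if (it == table_.end()) return BADADDR_SKETCH;
      const std::vector<ea_sketch_t> &v = it->second;
      for (std::size_t i = 0; i + 1 < v.size(); i++) {
         if (v[i] == current) return v[i + 1];
      }
      return BADADDR_SKETCH;
   }
};
```

A caller walks the list with a loop of the form for (a = first_from(x); a != sentinel; a = next_from(x, a)), passing the most recently returned value back in as current on every step.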

Data Cross-References

The functions for accessing data cross-reference information (also declared in xref.hpp) are very similar to the functions used to access code cross-reference information. These functions are described here:

get_first_dref_from(ea_t from) Returns the first location that the given address refers to as a data value. Returns BADADDR (-1) if the given address refers to no other addresses.
get_next_dref_from(ea_t from, ea_t current) Returns the next location that the given address (from) refers to as a data value, given that current has already been returned by a previous call to get_first_dref_from or get_next_dref_from. Returns BADADDR if no more cross-references exist.
get_first_dref_to(ea_t to) Returns the first location that refers to the given address as data. Returns BADADDR (-1) if there are no references to the given address.
get_next_dref_to(ea_t to, ea_t current) Returns the next location that refers to the given address (to) as data, given that current has already been returned by a previous call to get_first_dref_to or get_next_dref_to. Returns BADADDR if no more cross-references to the given location exist.

The SDK contains no equivalent to IDC’s XrefType function. A variable named lastXR is declared in xref.hpp; however, it is not exported. If you need to determine the exact type of a cross-reference, you must iterate cross-references using an xrefblk_t structure. The xrefblk_t is described in “Enumerating Cross-References.”

Iteration Techniques Using the IDA API

Using the IDA API, there are often several different ways to iterate over various database objects. In the following examples we demonstrate some common iteration techniques:

Enumerating Functions

The first technique for iterating through the functions within a database mimics the manner in which we performed the same task using IDC:

for (func_t *f = get_next_func(0); f != NULL; f = get_next_func(f->startEA)) {
   char fname[1024];
   get_func_name(f->startEA, fname, sizeof(fname));
   msg("%08x: %s\n", f->startEA, fname);
}

Alternatively, we can simply iterate through functions by index numbers, as shown in the next example:

for (int idx = 0; idx < get_func_qty(); idx++) {
   char fname[1024];
   func_t *f = getn_func(idx);
   get_func_name(f->startEA, fname, sizeof(fname));
   msg("%08x: %s\n", f->startEA, fname);
}

Finally, we can work at a somewhat lower level and make use of a data structure called an areacb_t, also known as an area control block, defined in area.hpp. Area control blocks are used to maintain lists of related area_t objects. A global areacb_t named funcs is exported (in funcs.hpp) as part of the IDA API. Using the areacb_t class, the previous example can be rewritten as follows:

int a = funcs.get_next_area(0);
while (a != -1) {
   char fname[1024];
   func_t *f = (func_t*)funcs.getn_area(a);  // getn_area returns an area_t
   get_func_name(f->startEA, fname, sizeof(fname));
   msg("%08x: %s\n", f->startEA, fname);
   a = funcs.get_next_area(f->startEA);
}

In this example, the get_next_area member function is used repeatedly to obtain the index values for each area in the funcs control block. A pointer to each related func_t area is obtained by supplying each index value to the getn_area member function. Several global areacb_t variables are declared within the SDK, including the segs global, which is an area control block containing segment_t pointers for each section in the binary.

Enumerating Structure Members

Within the SDK, stack frames are modeled using the capabilities of the struc_t class. Example 16-6 utilizes structure member iteration as a means of printing the contents of a stack frame.

Example 16-6. Enumerating stack frame members

func_t *func = get_func(get_screen_ea());  //get function at cursor location
msg("Local variable size is %d\n", func->frsize);
msg("Saved regs size is %d\n", func->frregs);
struc_t *frame = get_frame(func);          //get pointer to stack frame
if (frame) {
   size_t ret_addr = func->frsize + func->frregs;  //offset to return address
   for (size_t m = 0; m < frame->memqty; m++) {    //loop through members
      char fname[1024];
      get_member_name(frame->members[m].id, fname, sizeof(fname));
      if (frame->members[m].soff < func->frsize) {
         msg("Local variable ");
      }
      else if (frame->members[m].soff > ret_addr) {
         msg("Parameter ");
      }
      msg("%s is at frame offset %x\n", fname, frame->members[m].soff);
      if (frame->members[m].soff == ret_addr) {
         msg("%s is the saved return address\n", fname);
      }
   }
}

This example summarizes a function’s stack frame using information from the function’s func_t object and the associated struc_t representing the function’s stack frame. The frsize and frregs fields specify the size of the local variable portion of the stack frame and the number of bytes dedicated to saved registers, respectively. The saved return address can be found within the frame following the local variables and the saved registers. Within the frame itself, the memqty field specifies the number of defined members contained in the frame structure, which also corresponds to the size of the members array. A loop is used to retrieve the name of each member and determine whether the member is a local variable or an argument based on its starting offset (soff) within the frame structure.
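
The offset comparisons used in Example 16-6 can be isolated into a small stand-alone function. The sketch below is hypothetical (the enum frame_region_sketch and the function classify_member_sketch do not exist in the SDK) and uses plain integers in place of the func_t and member_t fields. It also names the saved-register region explicitly, which Example 16-6 leaves unlabeled.

```cpp
#include <cstddef>

// Regions of a stack frame, ordered from lowest to highest frame offset.
enum frame_region_sketch {
   REGION_LOCAL,        // local variables: offsets below frsize
   REGION_SAVED_REGS,   // saved registers: from frsize up to the return address
   REGION_RETURN_ADDR,  // the saved return address itself
   REGION_PARAMETER     // anything above the saved return address
};

// Classify a frame member by its starting offset (soff), given the size of
// the local variable area (frsize) and the saved register area (frregs).
frame_region_sketch classify_member_sketch(std::size_t soff,
                                           std::size_t frsize,
                                           std::size_t frregs) {
   std::size_t ret_addr = frsize + frregs;   // offset of the return address
   if (soff < frsize) return REGION_LOCAL;
   if (soff < ret_addr) return REGION_SAVED_REGS;
   if (soff == ret_addr) return REGION_RETURN_ADDR;
   return REGION_PARAMETER;
}
```

For a frame with 0x10 bytes of locals and 4 bytes of saved registers, a member at offset 0 is a local variable, a member at 0x10 is a saved register, the member at 0x14 is the saved return address, and a member at 0x18 is a parameter.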

Enumerating Cross-References

In Chapter 15 we saw that it is possible to enumerate cross-references from IDC scripts. The same capabilities exist within the SDK, though in a somewhat different form. As an example, let’s revisit the idea of listing all calls of a particular function (see Example 15-4 in Enumerating Exported Functions). The following function almost works.

void list_callers(char *bad_func) {
   char name_buf[MAXNAMELEN];
   ea_t func = get_name_ea(BADADDR, bad_func);
   if (func == BADADDR) {
      warning("Sorry, %s not found in database", bad_func);
   }
   else {
      for (ea_t addr = get_first_cref_to(func); addr != BADADDR;
           addr = get_next_cref_to(func, addr)) {
         char *name = get_func_name(addr, name_buf, sizeof(name_buf));
         if (name) {
            msg("%s is called from 0x%x in %s\n", bad_func, addr, name);
         }
         else {
            msg("%s is called from 0x%x\n", bad_func, addr);
         }
      }
   }
}

The reason this function almost works is that there is no way to determine the type of cross-reference returned for each iteration of the loop (recall that there is no SDK equivalent for IDC’s XrefType). In this case we should verify that each cross-reference to the given function is in fact a call type (fl_CN or fl_CF) cross-reference.

When you need to determine the type of a cross-reference within the SDK, you must use an alternative form of cross-reference iteration facilitated by the xrefblk_t structure, which is described in xref.hpp. The basic layout of an xrefblk_t is shown in the following listing. (For full details, please see xref.hpp.)

struct xrefblk_t {
   ea_t from;     // the referencing address - filled by first_to(),next_to()
   ea_t to;       // the referenced address - filled by first_from(), next_from()
   uchar iscode;  // 1-is code reference; 0-is data reference
   uchar type;    // type of the last returned reference
   uchar user;    // 1-is user defined xref, 0-defined by ida

   //fill the "to" field with the first address to which "from" refers.
   bool first_from(ea_t from, int flags);

   //fill the "to" field with the next address to which "from" refers.
   //This function assumes a previous call to first_from.
   bool next_from(void);

   //fill the "from" field with the first address that refers to "to".
   bool first_to(ea_t to, int flags);

   //fill the "from" field with the next address that refers to "to".
   //This function assumes a previous call to first_to.
   bool next_to(void);
};

The member functions of xrefblk_t are used to initialize the structure (first_from and first_to) and perform the iteration (next_from and next_to), while the data members are used to access information about the last cross-reference that was retrieved. The flags value required by the first_from and first_to functions dictates which type of cross-references should be returned. Legal values for the flags parameter include the following (from xref.hpp):

#define XREF_ALL        0x00            // return all references
#define XREF_FAR        0x01            // don't return ordinary flow xrefs
#define XREF_DATA       0x02            // return data references only

Note that no flag value restricts the returned references to code only. If you are interested in code cross-references, you must either compare the xrefblk_t type field to specific cross-reference types (such as fl_JN) or test the iscode field to determine if the last returned cross-reference was a code cross-reference.

The following modified version of the list_callers function demonstrates the use of an xrefblk_t iteration structure.

void list_callers(char *bad_func) {
   char name_buf[MAXNAMELEN];
   ea_t func = get_name_ea(BADADDR, bad_func);
   if (func == BADADDR) {
      warning("Sorry, %s not found in database", bad_func);
   }
   else {
      xrefblk_t xr;
      for (bool ok = xr.first_to(func, XREF_ALL); ok; ok = xr.next_to()) {
         if (xr.type != fl_CN && xr.type != fl_CF) continue;
         char *name = get_func_name(xr.from, name_buf, sizeof(name_buf));
         if (name) {
            msg("%s is called from 0x%x in %s\n", bad_func, xr.from, name);
         }
         else {
            msg("%s is called from 0x%x\n", bad_func, xr.from);
         }
      }
   }
}

Through the use of an xrefblk_t, we now have the opportunity to examine the type of each cross-reference returned by the iterator and decide whether it is interesting to us or not. In this example we simply ignore any cross-reference that is not related to a function call. We did not use the iscode member of xrefblk_t because iscode is true for jump and ordinary flow cross-references in addition to call cross-references. Thus, iscode alone does not guarantee that the current cross-reference is related to a function call.
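
The distinction between "is a code cross-reference" and "is a call" can be shown with a stand-alone model. Everything in the sketch below is hypothetical: xref_kind_sketch, xref_sketch, and the helper functions are invented for illustration, and the enumerator values are arbitrary rather than the values of fl_CN, fl_CF, or fl_JN in xref.hpp.

```cpp
#include <cstddef>
#include <vector>

// Stand-ins for a few cross-reference kinds (values are arbitrary).
enum xref_kind_sketch {
   KIND_CALL_NEAR,   // plays the role of a near call reference
   KIND_CALL_FAR,    // plays the role of a far call reference
   KIND_JUMP_NEAR,   // plays the role of a near jump reference
   KIND_FLOW,        // ordinary flow to the next instruction
   KIND_DATA_READ    // a data reference
};

struct xref_sketch {
   unsigned long from;      // referencing address
   xref_kind_sketch kind;   // kind of reference
};

// Analogous to testing iscode: true for calls, jumps, and ordinary flow.
bool is_code_sketch(const xref_sketch &x) { return x.kind != KIND_DATA_READ; }

// Analogous to comparing type against the two call kinds.
bool is_call_sketch(const xref_sketch &x) {
   return x.kind == KIND_CALL_NEAR || x.kind == KIND_CALL_FAR;
}

// Collect only the addresses that reference the target through a call.
std::vector<unsigned long> call_sites_sketch(const std::vector<xref_sketch> &xrefs) {
   std::vector<unsigned long> result;
   for (std::size_t i = 0; i < xrefs.size(); i++) {
      if (is_call_sketch(xrefs[i])) result.push_back(xrefs[i].from);
   }
   return result;
}
```

In this model a jump reference passes the is_code_sketch test but fails is_call_sketch, which mirrors the reason the type field, and not iscode, is checked when listing callers.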



[116] Binary large object, or blob, is a term often used to refer to arbitrary binary data of varying size.
