IDA’s API is defined by the contents of the header files in <SDKDIR>/include. There is no single-source index of available functions (though Steve Micallef has collected a rather nice subset in his plug-in writing guide). Many prospective SDK programmers find this fact initially difficult to come to terms with. The reality is that there is never an easy-to-find answer to the question, “How do I do x using the SDK?” The two principal options for answering such questions are to post the questions to an IDA user’s forum or attempt to answer them yourself by searching through the API documentation. What documentation, you say? Why, the header files, of course. Granted, these are not the most searchable of documents, but they do contain the complete set of API features. In this case, grep
(or a suitable replacement, preferably built into your programming editor) is your friend. The catch is knowing what to search for, which is not always obvious.
There are a few ways to try to narrow your searches through the API. The first way is to leverage your knowledge of the IDC scripting language and attempt to locate similar functionality within the SDK using keywords and possibly function names derived from IDC. However—and this is an extremely frustrating point—while the SDK may contain functions that perform tasks identical to those of IDC functions, the names of those functions are seldom identical. This results in programmers learning two sets of API calls, one for use with IDC and one for use with the SDK. In order to address this situation, Appendix B presents a complete list of IDC functions and the corresponding SDK 6.1 actions that are carried out to execute those functions.
The second technique for narrowing down SDK-related searches is to become familiar with the content and, more important, the purpose of the various SDK header files. In general, related functions and associated data structures are grouped into header files based on functional groups. For example, SDK functions that allow interaction with a user are grouped into kernwin.hpp. When a grep-style search fails to locate a capability that you require, some knowledge of which header file relates to that capability will narrow your search and hopefully limit the number of files that you need to dig deeper into.
While the SDK’s readme.txt files provide a high-level overview of the most commonly used header files, this section highlights some other useful information for working with these files. First, the majority of the header files use the .hpp suffix, while a few use the .h suffix. This can easily lead to trivial errors when naming header files to be included in your files. Second, ida.hpp is the main header file for the SDK and should be included in all SDK-related projects. Third, the SDK utilizes preprocessor directives designed to preclude access to functions that Hex-Rays considers dangerous (such as strcpy and sprintf). For a complete list of these functions, refer to pro.h. To restore access to them, you must define the USE_DANGEROUS_FUNCTIONS macro prior to including ida.hpp in your own files. An example is shown here:
#define USE_DANGEROUS_FUNCTIONS
#include <ida.hpp>
Failure to define USE_DANGEROUS_FUNCTIONS will result in a build error to the effect that dont_use_snprintf is an undefined symbol (in the case of an attempt to use the snprintf function). In order to compensate for restricting access to these so-called dangerous functions, the SDK defines safer equivalents for each, generally in the form of a qstrXXXX function such as qstrncpy and qsnprintf. These safer versions are also declared in pro.h.
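To make the idea of a "safer equivalent" concrete, here is a minimal sketch that does not depend on the SDK. The function bounded_copy is a hypothetical stand-in written for this illustration, not the SDK's qstrncpy implementation; it only shows the general contract of a copy that is told the destination size and always null-terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in in the spirit of a bounded string copy.
// Copies at most dstsize-1 characters and always null-terminates.
// Returns dst, or nullptr when dstsize is 0 (nothing can be stored).
char *bounded_copy(char *dst, const char *src, std::size_t dstsize) {
    if (dstsize == 0)
        return nullptr;
    std::size_t n = std::strlen(src);
    if (n >= dstsize)
        n = dstsize - 1;          // truncate instead of overflowing dst
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return dst;
}
```

Unlike strcpy, a call of this shape cannot write past the end of the destination, which is the property the SDK is trying to enforce by hiding the unbounded functions.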
Along similar lines, the SDK restricts access to many standard file input/output variables and functions such as stdin, stdout, fopen, fwrite, and fprintf. This restriction is due in part to limitations of the Borland compiler. Here again the SDK defines replacement functions in the form of qXXX counterparts such as qfopen and qfprintf. If you require access to the standard file functions, then you must define the USE_STANDARD_FILE_FUNCTIONS macro prior to including fpro.h (which is included from kernwin.hpp, which is, in turn, included from several other files).
In most cases, each SDK header file contains a brief description of the file’s purpose and fairly extensive comments describing the data structures and functions that are declared in the file. Together these comments constitute IDA’s API documentation. Brief descriptions of some of the more commonly used SDK header files follow.
area.hpp: This file defines the area_t struct, which represents a contiguous block of addresses within a database. This struct serves as the base class for several other classes that build on the concept of an address range. It is seldom necessary to include this file directly, as it is typically included in files defining subclasses of area_t.
auto.hpp: This file declares functions used to work with IDA’s autoanalyzer. The autoanalyzer performs queued analysis tasks when IDA is not busy processing user-input events.
bytes.hpp: This file declares functions for working with individual database bytes. Functions declared in this file are used to read and write individual database bytes as well as manipulate the characteristics of those bytes. Miscellaneous functions also provide access to flags associated with instruction operands, while other functions allow manipulation of regular and repeatable comments.
dbg.hpp: This file declares functions offering programmatic control of IDA’s debugger.
entry.hpp: This header declares functions for working with a file’s entry points. For shared libraries, each exported function or data value is considered an entry point.
expr.hpp: This file declares functions and data structures for working with IDC constructs. It is possible to modify existing IDC functions, add new IDC functions, or execute IDC statements from within modules.
fpro.h: This file contains the alternative file I/O functions, such as qfopen, discussed previously.
frame.hpp: This header contains functions used to manipulate stack frames.
funcs.hpp: This header contains functions and data structures for working with disassembled functions as well as functions for working with FLIRT signatures.
gdl.hpp: This file declares support routines for generating graphs using either DOT or GDL.
ida.hpp: This is the main header file required for working with the SDK. This file contains the definition of the idainfo structure as well as the declaration of the global variable inf, which contains a number of fields containing information about the current database as well as fields initialized from configuration file settings.
idp.hpp: This file contains declarations of structures that form the foundation of processor modules. The global variable ph, which describes the current processor module, and the global variable ash, which describes the current assembler, are defined in this file.
kernwin.hpp: This file declares functions for interacting with the user and the user interface. The SDK equivalents of IDC’s AskXXX functions are declared here, as are functions used to set the display position and configure hotkey associations.
lines.hpp: This file declares functions for generating formatted, colorized disassembly lines.
loader.hpp: This file contains the declarations for the loader_t and plugin_t structures required for the creation of loader modules and plug-in modules, respectively, as well as functions useful during the file-loading phase and functions for activating plug-ins.
name.hpp: This file declares functions for manipulating named locations (as opposed to names within structures or stack frames, which are covered in struct.hpp and funcs.hpp, respectively).
netnode.hpp: Netnodes are the lowest-level storage structure accessible via the API. The details of netnodes are typically hidden by the IDA user interface. This file contains the definition of the netnode class and functions for low-level manipulation of netnodes.
pro.h: This file includes the top-level typedefs and macros required in any SDK module. You do not need to explicitly include this file in your projects, as it is included from ida.hpp. Among other things, the IDA_SDK_VERSION macro is defined in this file. IDA_SDK_VERSION provides a means to determine with which version of the SDK a module is being built, and it can be tested to provide conditional compilation when using different versions of the SDK. Note that IDA_SDK_VERSION was introduced with SDK version 5.2. Prior to SDK 5.2, there is no official way to determine which SDK is being used. An unofficial header file that defines IDA_SDK_VERSION for older versions of the SDK (sdk_versions.h) is available on this book’s website.
search.hpp: This file declares functions for performing different types of searches on a database.
segment.hpp: This file contains the declaration of the segment_t class, a subclass of area_t, which is used to describe individual sections (.text, .data, etc.) within a binary. Functions for working with segments are also declared here.
struct.hpp: This file contains the declaration of the struc_t class and functions for manipulating structures within a database.
typeinf.hpp: This file declares functions for working with IDA type libraries. Among other things, functions declared here offer access to function signatures, including function return types and parameter sequences.
ua.hpp: This file declares the op_t and insn_t classes used extensively in processor modules. Also declared here are functions used for disassembling individual instructions and for generating the text for various portions of each disassembled line.
xref.hpp: This file declares the datatypes and functions required for adding, deleting, and iterating code and data cross-references.
The preceding list describes approximately half of the header files that ship with the SDK. You are encouraged to familiarize yourself not only with the files in this list but also with all of the other header files as you dig deeper into the SDK. Functions that make up the published API are marked as ida_export. Only functions designated as ida_export are exported in the link libraries that ship with the SDK. Don’t be misled by the use of idaapi, as it merely signifies that a function is to use the stdcall calling convention on Windows platforms only. You may occasionally run across interesting-looking functions that are not designated as ida_export; you cannot use these functions in your modules.
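The IDA_SDK_VERSION macro mentioned in the pro.h description lends itself to conditional compilation. The following is a sketch only: the macro is defined by hand so that the fragment compiles without the SDK, the value 610 is assumed to correspond to SDK 6.1, and sdk_code_path is a hypothetical helper invented for this illustration.

```cpp
#include <cassert>
#include <string>

// In a real module this macro comes from the SDK headers (pro.h).
// It is defined here only so the sketch builds without the SDK;
// 610 is assumed to denote SDK 6.1.
#define IDA_SDK_VERSION 610

// Hypothetical helper: selects a code path at compile time.
std::string sdk_code_path() {
#if !defined(IDA_SDK_VERSION)
    return "pre-5.2";            // macro absent: SDK older than 5.2
#elif IDA_SDK_VERSION >= 600
    return "6.0-or-later";
#else
    return "5.2-to-5.x";
#endif
}
```

In a real module the same #if structure would typically wrap calls whose names or signatures changed between SDK releases.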
Much of IDA’s API is built around C++ classes that model various aspects of a disassembled binary. The netnode class, on the other hand, seems wrapped in mystery because it appears to have no direct relationship to constructs within binary files (sections, functions, instructions, etc.).
Netnodes are the lowest-level and most-general-purpose data storage mechanism accessible within an IDA database. As a module programmer, you will seldom be required to work directly with netnodes. Many of the higher-level data structures hide the fact that they ultimately rely on netnodes for persistent storage within a database. Some of the ways that netnodes are used within a database are detailed in the file nalt.hpp, in which we learn, for example, that information about the shared libraries and functions that a binary imports is stored in a netnode named import_node (yes, netnodes may have names). Netnodes are also the persistent storage mechanisms that facilitate IDC’s global arrays.
Netnodes are described in extensive detail in the file netnode.hpp. But from a high-level perspective, netnodes are storage structures used internally by IDA for a variety of purposes. However, their precise structure is kept hidden, even to SDK programmers. To provide an interface to these storage structures, the SDK defines a netnode class, which functions as an opaque wrapper around this internal storage structure. The netnode class contains a single data member called netnodenumber, which is an integer identifier used to access the internal representation of a netnode. Every netnode is uniquely identified by its netnodenumber. On 32-bit systems the netnodenumber is a 32-bit quantity, allowing for 2^32 unique netnodes. On 64-bit systems, a netnodenumber is a 64-bit integer, which allows for 2^64 unique netnodes. In most cases, the netnodenumber represents a virtual address within the database, which creates a natural mapping between each address within a database and any netnode that might be required to store information associated with that address. Comment text is an example of arbitrary information that may be associated with an address and thus stored within a netnode associated with that address.
The recommended way to manipulate netnodes is by invoking member functions of the netnode class using an instantiated netnode object. Reading through netnode.hpp, you will notice that a number of nonmember functions exist that seem to support netnode manipulation. Use of these functions is discouraged in favor of member functions. You will note, however, that most of the member functions in the netnode class are thin wrappers around one of the nonmember functions.
Internally, netnodes can be used to store several different types of information. Each netnode may be associated with a name of up to 512 characters and a primary value of up to 1,024 bytes. Member functions of the netnode class are provided to retrieve (name) or modify (rename) a netnode’s name. Additional member functions allow you to treat a netnode’s primary value as an integer (set_long, long_value), a string (set, valstr), or an arbitrary binary blob (set, valobj). The function used inherently determines how the primary value is treated.
Here is where things get a little complicated. In addition to a name and a primary value, every netnode is also capable of storing 256 sparse arrays in which the array elements can be arbitrarily sized with values up to a maximum of 1,024 bytes each. These arrays fall into three overlapping categories. The first category of arrays is indexed using 32-bit index values and can potentially hold in excess of 4 billion items. The second category of arrays is indexed using 8-bit index values and can thus hold up to 256 items. The last category of arrays is actually hash tables that use strings for keys. Regardless of which of the three categories is used, each element of the array will accept values up to 1,024 bytes in size. In short, a netnode can hold a tremendous amount of data. Now we just need to learn how to make it all happen.
If you are wondering where all of this information gets stored, you are not alone. All netnode content is stored within btree nodes in an IDA database. Btree nodes in turn are stored in an ID0 file, which in turn is archived into an IDB file when you close your database. Any netnode content that you create will not be visible in any of IDA’s display windows; the data is yours to manipulate as you please. This is why netnodes are an ideal place for persistent storage for any plug-ins and scripts that you may wish to use to store results from one invocation to the next.
A potentially confusing point about netnodes is that declaring a netnode variable within one of your modules does not necessarily create an internal representation of that netnode within the database. A netnode is not created internally until one of the following events takes place:
The netnode is assigned a name.
The netnode is assigned a primary value.
A value is stored into one of the netnode’s internal arrays.
There are three constructors available for declaring netnodes within your modules. The prototypes for each, extracted from netnode.hpp, and examples of their use are shown in Example 16-1.
Example 16-1. Declaring netnodes
#ifdef __EA64__
typedef ulonglong nodeidx_t;
#else
typedef ulong nodeidx_t;
#endif

class netnode {
  netnode();
  netnode(nodeidx_t num);
  netnode(const char *name, size_t namlen=0, bool do_create=false);
  bool create(const char *name, size_t namlen=0);
  bool create();
  //... remainder of netnode class follows
};

netnode n0;                       //uses netnode()
netnode n1(0x00401110);           //uses netnode(nodeidx_t num)
netnode n2("$ node 2");           //uses netnode(const char *name, ...)
netnode n3("$ node 3", 0, true);  //uses netnode(const char *name, ...) with do_create=true
In this example, only one netnode (n3) is guaranteed to exist within the database after the code has executed. Netnodes n1 and n2 may exist if they had been previously created and populated with data. Whether it previously existed or not, n1 is capable of receiving new data at this point. If n2 did not exist, meaning that no netnode named $ node 2 could be found in the database, then n2 must be explicitly created (using either form of the create function) before data can be stored into it. If we want to guarantee that we can store data into n2, we need to add the following safety check:
if (BADNODE == (nodeidx_t)n2) {
  n2.create("$ node 2");
}
The preceding example demonstrates the use of the nodeidx_t operator, which allows a netnode to be cast to a nodeidx_t. The nodeidx_t operator simply returns the netnodenumber data member of the associated netnode and allows netnode variables to be easily converted into integers.
An important point to understand about netnodes is that a netnode must have a valid netnodenumber before you can store data into the netnode. A netnodenumber may be explicitly assigned, as with n1 via the netnode(nodeidx_t num) constructor in the previous example. Alternatively, a netnodenumber may be internally generated when a netnode is created using the do_create flag in a constructor (as with n3 via the netnode(const char *name, ...) constructor) or via the create function (as with n2). Internally assigned netnodenumbers begin with 0xFF000000 and increment with each newly created netnode.
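The creation rules described so far can be modeled with a small toy that does not use the SDK at all. Everything below (ToyDb, ToyNode, the 32-bit nodeidx_t typedef, and the choice to give an uncreated node the number BADNODE) is an assumption made for this sketch; it mimics the behavior described in the text rather than the real netnode implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

typedef std::uint32_t nodeidx_t;            // 32-bit variant, for the sketch
const nodeidx_t BADNODE = nodeidx_t(-1);

// Toy stand-in for the database: remembers which names map to which numbers.
struct ToyDb {
    std::map<std::string, nodeidx_t> names;
    nodeidx_t next_auto = 0xFF000000;       // first internally assigned number
};

class ToyNode {
public:
    explicit ToyNode(ToyDb &db) : db_(db), num_(BADNODE) {}
    ToyNode(ToyDb &db, nodeidx_t num) : db_(db), num_(num) {}
    ToyNode(ToyDb &db, const char *name, bool do_create = false)
        : db_(db), num_(BADNODE) {
        std::map<std::string, nodeidx_t>::const_iterator it = db_.names.find(name);
        if (it != db_.names.end())
            num_ = it->second;              // named lookup succeeded
        else if (do_create)
            create(name);                   // create on demand
    }
    // Create a named node; toy behavior: fails if the name is already taken.
    bool create(const char *name) {
        if (db_.names.count(name))
            return false;
        num_ = db_.next_auto++;
        db_.names[name] = num_;
        return true;
    }
    // Create an unnamed node with an internally generated number.
    bool create() { num_ = db_.next_auto++; return true; }
    // Conversion operator: exposes the node number as an integer.
    operator nodeidx_t() const { return num_; }
private:
    ToyDb &db_;
    nodeidx_t num_;
};
```

With this model, a node constructed from an address is immediately usable, a node constructed from an unknown name reports BADNODE until create is called, and automatically numbered nodes count upward from 0xFF000000.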
We have thus far neglected netnode n0 in our example. As things currently stand, n0 has neither a number nor a name. We could create n0 by name using the create function in a manner similar to n2. Or we could use the alternate form of create to create an unnamed netnode with a valid, internally generated netnodenumber, as shown here:
n0.create(); //assign an internally generated netnodenumber to n0
At this point it is possible to store data into n0, though we have no way to retrieve that data in the future unless we record the assigned netnodenumber somewhere or assign n0 a name. This demonstrates the fact that netnodes are easy to access when they are associated with a virtual address (similar to n1 in our example). For all other netnodes, assigning a name makes it possible to perform a named lookup for all future references to the netnode (as with n2 and n3 in our example).
Note that for our named netnodes, we have chosen to use names prefixed with “$ ”, which is in keeping with the practice, recommended in netnode.hpp, for avoiding conflicts with names IDA uses internally.
Now that you understand how to create a netnode that you can store data into, let’s return to the discussion of the internal array storage capability of netnodes. To store a value into an array within a netnode, we need to specify five pieces of information: an index value, an index size (8 or 32 bits), a value to store, the number of bytes the value contains, and an array (one of 256 available for each category of array) in which to store the value. The index size parameter is specified implicitly by the function that we use to store or retrieve the data. The remaining values are passed into that function as parameters. The parameter that selects which of the 256 possible arrays a value is stored in is usually called a tag, and it is often specified (though it need not be) using a character. The netnode documentation distinguishes among a few special types of values termed altvals, supvals, and hashvals. By default, each of these values is typically associated with a specific array tag: 'A' for altvals, 'S' for supvals, and 'H' for hashvals. A fourth type of value, called a charval, is not associated with any specific array tag.
It is important to understand that these value types are associated more with a specific way of storing data into a netnode than with a specific array within a netnode. It is possible to store any type of value in any array simply by specifying an alternate array tag when storing data. In all cases, it is up to you to remember what type of data you stored into a particular array location so that you can use retrieval methods appropriate to the type of the stored data.
Altvals provide a simple interface for storing and retrieving integer data in netnodes. Altvals may be stored into any array within a netnode but default to the 'A' array. Regardless of which array you wish to store integers into, using the altval-related functions greatly simplifies matters. The code in Example 16-2 demonstrates data storage and retrieval using altvals.
Example 16-2. Accessing netnode altvals
netnode n("$ idabook", 0, true);   //create the netnode if it doesn't exist
sval_t index = 1000;   //sval_t is a 32 bit type, this example uses 32-bit indexes
ulong value = 0x12345678;
n.altset(index, value);            //store value into the 'A' array at index
value = n.altval(index);           //retrieve value from the 'A' array at index
n.altset(index, value, (char)3);   //store into array 3
value = n.altval(index, (char)3);  //read from array 3
In this example, you see a pattern that will be repeated for other types of netnode values, namely, the use of an XXXset function (in this case, altset) to store a value into a netnode and an XXXval function (in this case, altval) to retrieve a value from a netnode. If we want to store integers into arrays using 8-bit index values, we need to use slightly different functions, as shown in the next example.
netnode n("$ idabook", 0, true);
uchar index = 80;       //this example uses 8-bit index values
ulong value = 0x87654321;
n.altset_idx8(index, value, 'A');       //store, no default tags with xxx_idx8 functions
value = n.altval_idx8(index, 'A');      //retrieve value from the 'A' array at index
n.altset_idx8(index, value, (char)3);   //store into array 3
value = n.altval_idx8(index, (char)3);  //read from array 3
Here you see that the general rule of thumb for the use of 8-bit index values is to use a function with an _idx8 suffix. Also note that none of the _idx8 functions provide default values for the array tag parameter.
Supvals represent the most versatile means of storing and retrieving data in netnodes. Supvals represent data of arbitrary size, from 1 byte to a maximum of 1,024 bytes. When using 32-bit index values, the default array for storing and retrieving supvals is the 'S' array. Again, however, supvals can be stored into any of the 256 available arrays by specifying an appropriate array tag value. Strings are a common form of arbitrary length data and as such are afforded special handling in supval manipulation functions. The code in Example 16-3 provides examples of storing supvals into a netnode.
Example 16-3. Storing netnode supvals
netnode n("$ idabook", 0, true);   //create the netnode if it doesn't exist
const char *string_data = "example supval string data";
uchar binary_data[] = {0xfe, 0xdc, 0x4e, 0xc7, 0x90, 0x00, 0x13, 0x8a,
                       0x33, 0x19, 0x21, 0xe5, 0xaa, 0x3d, 0xa1, 0x95};
//store binary_data into the 'S' array at index 1000, we must supply a
//pointer to data and the size of the data
n.supset(1000, binary_data, sizeof(binary_data));
//store string_data into the 'S' array at index 1001. If no size is supplied,
//or size is zero, the data size is computed as: strlen(data) + 1
n.supset(1001, string_data);
//store into an array other than 'S' (200 in this case) at index 500
n.supset(500, binary_data, sizeof(binary_data), (char)200);
The supset function requires an array index, a pointer to some data, the length of the data (in bytes), and an array tag that defaults to 'S' if omitted. If the length parameter is omitted, it defaults to zero. When the length is specified as zero, supset assumes that the data being stored is a string, computes the length of the data as strlen(data) + 1, and stores a null termination character along with the string data.
Retrieving data from a supval takes a little care, as you may not know the amount of data contained within the supval before you attempt to retrieve it. When you retrieve data from a supval, bytes are copied out of the netnode into a user-supplied output buffer. How do you ensure that your output buffer is of sufficient size to receive the supval data? The first method is to retrieve all supval data into a buffer that is at least 1,024 bytes. The second method is to preset the size of your output buffers by querying the size of the supval. Two functions are available for retrieving supvals. The supval function is used to retrieve arbitrary data, while the supstr function is specialized for retrieving string data. Each of these functions expects a pointer to your output buffer along with the size of the buffer. The return value for supval is the number of bytes copied into the output buffer, while the return value for supstr is the length of the string copied to the output buffer not including the null terminator, even though the null terminator is copied to the buffer. Each of these functions recognizes the special case in which a NULL pointer is supplied in place of an output buffer pointer. In such cases, supval and supstr return the number of bytes of storage (including any null terminator) required to hold the supval data. Example 16-4 demonstrates retrieval of supval data using the supval and supstr functions.
Example 16-4. Retrieving netnode supvals
//determine size of element 1000 in 'S' array. The NULL pointer indicates
//that we are not supplying an output buffer
int len = n.supval(1000, NULL, 0);
char *outbuf = new char[len];   //allocate a buffer of sufficient size
n.supval(1000, outbuf, len);    //extract data from the supval
//determine size of element 1001 in 'S' array. The NULL pointer indicates
//that we are not supplying an output buffer.
len = n.supstr(1001, NULL, 0);
char *outstr = new char[len];   //allocate a buffer of sufficient size
n.supstr(1001, outstr, len);    //extract string data from the supval
//retrieve a supval from array 200, index 500
char buf[1024];
len = n.supval(500, buf, sizeof(buf), (char)200);
//release the heap buffers once the data is no longer needed
delete[] outbuf;
delete[] outstr;
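The query-then-fetch pattern used above can be tried out without the SDK. In this sketch toy_supval is a hypothetical stand-in that imitates only the behavior described in the text (a NULL buffer means "tell me the size"); it is not the real supval, and std::vector is used in place of new[] so that the buffer is released automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy stand-in for a supval read. With a null buffer it reports the number
// of bytes needed; otherwise it copies at most bufsize bytes and reports
// how many were copied.
std::ptrdiff_t toy_supval(const std::vector<unsigned char> &stored,
                          void *buf, std::size_t bufsize) {
    if (buf == nullptr)
        return static_cast<std::ptrdiff_t>(stored.size());
    std::size_t n = stored.size() < bufsize ? stored.size() : bufsize;
    if (n > 0)
        std::memcpy(buf, stored.data(), n);
    return static_cast<std::ptrdiff_t>(n);
}

// Two-step read: ask for the size first, then fetch into a buffer of
// exactly that size.
std::vector<unsigned char> read_all(const std::vector<unsigned char> &stored) {
    std::ptrdiff_t len = toy_supval(stored, nullptr, 0);
    std::vector<unsigned char> out(static_cast<std::size_t>(len));
    toy_supval(stored, out.data(), out.size());
    return out;
}
```

The alternative mentioned in the text, a fixed 1,024-byte buffer, avoids the first call at the cost of always reserving the maximum element size.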
Using supvals, it is possible to access any data stored in any array within a netnode. For example, supval functions can be used to store and retrieve altval data by limiting the supset and supval operations to the size of an altval. Reading through netnode.hpp, you will see that this is in fact the case by observing the inlined implementation of the altset function, as shown here:
bool altset(sval_t alt, nodeidx_t value, char tag=atag) {
  return supset(alt, &value, sizeof(value), tag);
}
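The layering of altvals on top of supvals can be reproduced in a self-contained toy. ToyArrays below is a hypothetical model built on std::map, keyed by (tag, index); the 1,024-byte limit and the strlen + 1 rule for a zero length follow the description in the text, while everything else (names, return types, 32-bit values) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <utility>
#include <vector>

class ToyArrays {
public:
    static const std::size_t MAX_VALUE = 1024;   // per-element limit from the text

    // Store raw bytes at (tag, idx). A length of 0 means "treat data as a
    // C string and store strlen + 1 bytes", as described for supset.
    bool supset(std::uint32_t idx, const void *data, std::size_t nbytes = 0,
                char tag = 'S') {
        if (nbytes == 0)
            nbytes = std::strlen(static_cast<const char *>(data)) + 1;
        if (nbytes > MAX_VALUE)
            return false;
        const unsigned char *p = static_cast<const unsigned char *>(data);
        cells_[std::make_pair(tag, idx)].assign(p, p + nbytes);
        return true;
    }
    // Copy up to bufsize bytes out; returns bytes copied, or -1 if absent.
    long supval(std::uint32_t idx, void *buf, std::size_t bufsize,
                char tag = 'S') const {
        auto it = cells_.find(std::make_pair(tag, idx));
        if (it == cells_.end())
            return -1;
        std::size_t n = it->second.size() < bufsize ? it->second.size() : bufsize;
        if (n > 0)
            std::memcpy(buf, it->second.data(), n);
        return static_cast<long>(n);
    }
    // Integer convenience layer: an altval is just a fixed-size supval.
    bool altset(std::uint32_t idx, std::uint32_t value, char tag = 'A') {
        return supset(idx, &value, sizeof(value), tag);
    }
    std::uint32_t altval(std::uint32_t idx, char tag = 'A') const {
        std::uint32_t v = 0;
        supval(idx, &v, sizeof(v), tag);   // v stays 0 if the cell is absent
        return v;
    }
private:
    std::map<std::pair<char, std::uint32_t>, std::vector<unsigned char> > cells_;
};
```

Because the tag is part of the key, the same index can hold unrelated values in different arrays, and reading a cell with supval after writing it with altset returns the same four bytes.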
Hashvals offer yet another interface to netnodes. Rather than being associated with integer indexes, hashvals are associated with key strings. Overloaded versions of the hashset function make it easy to associate integer data or array data with a hash key, while the hashval, hashstr, and hashval_long functions allow retrieval of hashvals when provided with the appropriate hash key. Tag values associated with the hashXXX functions actually choose one of 256 hash tables, with the default table being 'H'. Alternate tables are selected by specifying a tag other than 'H'.
The last interface to netnodes that we will mention is the charval interface. The charval and charset functions offer a simple means to store single-byte data into a netnode array. There is no default array associated with charval storage and retrieval, so you must specify an array tag for every charval operation. Charvals are stored into the same arrays as altvals and supvals, and the charval functions are simply wrappers around 1-byte supvals.
Another capability provided by the netnode class is the ability to iterate over the contents of a netnode array (or hash table). Iteration is performed using XXX1st, XXXnxt, XXXlast, and XXXprev functions that are available for altvals, supvals, hashvals, and charvals. Example 16-5 illustrates iteration across the default altvals array ('A').
Iteration over supvals, charvals, and hashvals is performed in a very similar manner; however, you will find that the syntax varies depending on the type of values being accessed. For example, iteration over hashvals returns hashkeys rather than array indexes, which must then be used to retrieve hashvals.
Example 16-5. Enumerating netnode altvals
netnode n("$ idabook", 0, true);
//Iterate altvals first to last
for (nodeidx_t idx = n.alt1st(); idx != BADNODE; idx = n.altnxt(idx)) {
  ulong val = n.altval(idx);
  msg("Found altval['A'][%d] = %d\n", idx, val);
}
//Iterate altvals last to first
for (nodeidx_t idx = n.altlast(); idx != BADNODE; idx = n.altprev(idx)) {
  ulong val = n.altval(idx);
  msg("Found altval['A'][%d] = %d\n", idx, val);
}
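The first/next/last/prev style of iteration with a BADNODE sentinel is easy to experiment with outside of IDA. The ToySparse class below is a hypothetical model backed by std::map; its method names and the 32-bit nodeidx_t are assumptions of this sketch, chosen only to mirror the shape of the loops in Example 16-5.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

typedef std::uint32_t nodeidx_t;
const nodeidx_t BADNODE = nodeidx_t(-1);

// Toy sparse array: only the indexes that were stored exist.
class ToySparse {
public:
    void set(nodeidx_t idx, std::uint32_t v) { cells_[idx] = v; }
    nodeidx_t first() const {
        return cells_.empty() ? BADNODE : cells_.begin()->first;
    }
    nodeidx_t last() const {
        return cells_.empty() ? BADNODE : cells_.rbegin()->first;
    }
    // Smallest stored index strictly greater than cur, or BADNODE.
    nodeidx_t next(nodeidx_t cur) const {
        auto it = cells_.upper_bound(cur);
        return it == cells_.end() ? BADNODE : it->first;
    }
    // Largest stored index strictly less than cur, or BADNODE.
    nodeidx_t prev(nodeidx_t cur) const {
        auto it = cells_.lower_bound(cur);
        if (it == cells_.begin())
            return BADNODE;
        --it;
        return it->first;
    }
private:
    std::map<nodeidx_t, std::uint32_t> cells_;
};

// Collect stored indexes in ascending order using the sentinel-driven loop.
std::vector<nodeidx_t> forward_indexes(const ToySparse &a) {
    std::vector<nodeidx_t> out;
    for (nodeidx_t i = a.first(); i != BADNODE; i = a.next(i))
        out.push_back(i);
    return out;
}
```

The loop visits only populated indexes, which is what makes this style practical for arrays whose index space spans billions of possible entries.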
The netnode class also provides functions for deleting individual array elements, the entire contents of an array, or the entire contents of a netnode. Removing an entire netnode is fairly straightforward.
netnode n("$ idabook", 0, true);
n.kill();   //entire contents of n are deleted
When deleting individual array elements, or entire array contents, you must take care to choose the proper deletion function because the names of the functions are very similar and choosing the wrong form may result in significant loss of data. Commented examples demonstrating deletion of altvals follow:
netnode n("$ idabook", 0, true);
n.altdel(100);           //delete item 100 from the default altval array ('A')
n.altdel(100, (char)3);  //delete item 100 from altval array 3
n.altdel();              //delete the entire contents of the default altval array
n.altdel_all('A');       //alternative to delete default altval array contents
n.altdel_all((char)3);   //delete the entire contents of altval array 3
Note the similarity between the syntax to delete the entire contents of the default altval array, n.altdel(), and the syntax to delete a single element from the default altval array, n.altdel(100). If for some reason you fail to specify an index when you want to delete a single element, you may end up deleting an entire array. Similar functions exist to delete supval, charval, and hashval data.
IDA’s API defines a number of C++ classes designed to model components typically found in executable files. The SDK contains classes to describe functions, program sections, data structures, individual assembly language instructions, and individual operands within each instruction. Additional classes are defined to implement the tools that IDA uses to manage the disassembly process. Classes falling into this latter category define general database characteristics, loader module characteristics, processor module characteristics, and plug-in module characteristics, and they define the assembly syntax to be used for each disassembled instruction.
Some of the more common general-purpose classes are described here. We defer discussion of classes that are more specific to plug-ins, loaders, and processor modules until the appropriate chapters covering those topics. Our goal here is to introduce classes, their purposes, and some important data members of each class. Useful functions for manipulating each class are described in Commonly Used SDK Functions.
area_t
(area.hpp)This struct describes a range of addresses and is the base class for several other classes. The struct contains two data members, startEA
(inclusive) and endEA
(exclusive), that define the boundaries of the address range. Member functions are defined that compute the size of the address range and that can perform comparisons between two areas.
func_t
(funcs.hpp)This class inherits from area_t
. Additional data fields are added to the class to record binary attributes of the function, such as whether the function uses a frame pointer or not, and attributes describing the function’s local variables and arguments. For optimization purposes, some compilers may split functions into several noncontiguous regions within a binary. IDA terms these regions chunks or tails. The func_t
class is also used to describe tail chunks.
segment_t
(segment.hpp)The segment_t
class is another subclass of area_t
. Additional data fields describe the name of the segment, the permissions in effect in the segment (readable, writeable, executable), the type of the segment (code, data, etc.), and the number of bits used in a segment address (16, 32, or 64).
idc_value_t (expr.hpp): This class describes the contents of an IDC value, which may contain at any time a string, an integer, or a floating-point value. The type is utilized extensively when interacting with IDC functions from within a compiled module.
idainfo (ida.hpp): This struct is populated with characteristics describing the open database. A single global variable named inf, of type idainfo, is declared in ida.hpp. Fields within this struct describe the name of the processor module that is in use, the input file type (such as f_PE or f_MACHO via the filetype_t enum), the program entry point (beginEA), the minimum address within the binary (minEA), the maximum address in the binary (maxEA), the endianness of the current processor (mf), and a number of configuration settings parsed from ida.cfg.
struc_t (struct.hpp): This class describes the layout of structured data within a disassembly. It is used to describe structures within the Structures window as well as to describe the composition of function stack frames. A struc_t contains flags describing attributes of the structure (such as whether it is a structure or union or whether the structure is collapsed or expanded in the IDA display window), and it also contains an array of structure members.
member_t (struct.hpp): This class describes a single member of a structured datatype. Included data fields describe the byte offset at which the member begins and ends within its parent structure.
op_t (ua.hpp): This class describes a single operand within a disassembled instruction. The class contains a zero-based field to store the number of the operand (n), an operand type field (type), and a number of other fields whose meaning varies depending on the operand type. The type field is set to one of the optype_t constants defined in ua.hpp and describes the operand type or addressing mode used for the operand.
insn_t (ua.hpp): This class contains information describing a single disassembled instruction. Fields within the class describe the instruction’s address within the disassembly (ea), the instruction’s type (itype), the instruction’s length in bytes (size), and an array of six possible operand values (Operands) of type op_t (IDA limits each instruction to a maximum of six operands). The itype field is set by the processor module. For standard IDA processor modules, the itype field is set to one of the enumerated constants defined in allins.hpp. When a third-party processor module is used, the list of potential itype values must be obtained from the module developer. Note that the itype field generally bears no relationship whatsoever to the binary opcode for the instruction.
The preceding list is by no means a definitive guide to all of the datatypes used within the SDK. This list is intended merely as an introduction to some of the more commonly used classes and some of the more commonly accessed fields within those classes.
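To make the inclusive/exclusive convention of area_t concrete, the following is a minimal standalone sketch. It does not use the SDK: range_model_t, addr_t, and the overlaps helper are hypothetical stand-ins invented for illustration, and only the startEA and endEA field names are borrowed from area_t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ea_t; the real type is defined by the SDK.
typedef std::uint64_t addr_t;

// Hypothetical model of a half-open address range, NOT the real area_t.
struct range_model_t {
    addr_t startEA;  // first address inside the range (inclusive)
    addr_t endEA;    // first address past the range (exclusive)

    // Number of addresses covered by the range.
    addr_t size() const {
        assert(startEA <= endEA);  // a range must not end before it starts
        return endEA - startEA;
    }

    // True when addr lies inside [startEA, endEA).
    bool contains(addr_t addr) const {
        return startEA <= addr && addr < endEA;
    }

    // True when the two ranges share at least one address.
    bool overlaps(const range_model_t &other) const {
        return startEA < other.endEA && other.startEA < endEA;
    }
};
```

Because endEA is exclusive, two ranges in which one ends exactly where the other begins do not overlap, and the size of a range is simply endEA minus startEA with no off-by-one adjustment.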
While the SDK is programmed using C++ and defines a number of C++ classes, in many cases the SDK favors traditional C-style nonmember functions for manipulation of objects within a database. For most API datatypes, it is more common to find nonmember functions that require a pointer to an object than it is to find a member function to manipulate the object in the manner you desire.
In the summaries that follow, we cover API functions that provide functionality similar to many of the IDC functions introduced in Chapter 15. It is unfortunate that functions that perform identical tasks are named one thing in IDC and something different within the API.
The following functions, declared in bytes.hpp, provide access to individual bytes, words, and dwords within a database.
uchar get_byte(ea_t addr) | Reads the current byte value from virtual address addr.
ushort get_word(ea_t addr) | Reads the current word value from virtual address addr.
ulong get_long(ea_t addr) | Reads the current double word value from virtual address addr.
get_many_bytes(ea_t addr, void *buffer, ssize_t len) | Copies len bytes from addr into the supplied buffer.
patch_byte(ea_t addr, ulong val) | Sets a byte value at virtual address addr.
patch_word(ea_t addr, ulonglong val) | Sets a word value at virtual address addr.
patch_long(ea_t addr, ulonglong val) | Sets a double word value at virtual address addr.
patch_many_bytes(ea_t addr, const void *buffer, size_t len) | Patches the database beginning at addr with len bytes from the user-supplied buffer.
ulong get_original_byte(ea_t addr) | Reads the original byte value (prior to patching) from virtual address addr.
ulonglong get_original_word(ea_t addr) | Reads the original word value from virtual address addr.
ulonglong get_original_long(ea_t addr) | Reads the original double word value from virtual address addr.
bool isLoaded(ea_t addr) | Returns true if addr contains valid data, false otherwise.
Additional functions exist for accessing alternative data sizes. Note that the get_original_XXX functions get the very first original value, which is not necessarily the value at an address prior to a patch. Consider the case when a byte value is patched twice; over time this byte has held three different values. After the second patch, both the current value and the original value are accessible, but there is no way to obtain the second value (which was set with the first patch).
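The double-patch scenario can be modeled in a few lines of standalone C++. This is only a sketch of the behavior described above, not a description of how IDA stores bytes internally; patch_model_t and its member names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model of patched bytes; NOT the SDK's storage format.
class patch_model_t {
    struct cell_t {
        std::uint8_t original;  // value loaded from the input file
        std::uint8_t current;   // value after the most recent patch
    };
    std::map<std::uint64_t, cell_t> cells;

public:
    // Simulates loading a byte from the input file.
    void load(std::uint64_t addr, std::uint8_t value) {
        cells[addr] = cell_t{value, value};
    }

    // Plays the role of patch_byte: only the current value changes.
    void patch(std::uint64_t addr, std::uint8_t value) {
        assert(cells.count(addr) != 0);  // only loaded bytes can be patched
        cells[addr].current = value;
    }

    // Plays the role of get_byte.
    std::uint8_t current(std::uint64_t addr) const {
        return cells.at(addr).current;
    }

    // Plays the role of get_original_byte.
    std::uint8_t original(std::uint64_t addr) const {
        return cells.at(addr).original;
    }
};
```

After load(0x1000, 0x74), patch(0x1000, 0x75), and patch(0x1000, 0xEB), the model reports 0xEB as the current value and 0x74 as the original value. The intermediate value 0x75 is stored nowhere, which mirrors the limitation noted above.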
Interaction with the IDA user interface is handled by a single dispatcher function named callui. Requests for various user interface services are made by passing a user interface request (one of the enumerated ui_notification_t constants) to callui along with any additional parameters required by the request. Parameters required for each request type are specified in kernwin.hpp. Fortunately, a number of convenience functions that hide many of the details of using callui directly are also defined in kernwin.hpp. Several common convenience functions are described here:
msg(char *format, ...) | Prints a formatted message to the message window. This function is analogous to C’s printf function and accepts a printf-style format string.
warning(char *format, ...) | Displays a formatted message in a dialog.
char *askstr(int hist, char *default, char *format, ...) | Displays an input dialog asking the user to enter a string value. The hist parameter dictates how the drop-down history list in the dialog should be populated and should be set to one of the HIST_xxx constants defined in kernwin.hpp. The format string and any additional parameters are used to form a prompt string.
char *askfile_c(int dosave, char *default, char *prompt, ...) | Displays a file save (dosave = 1) or file open (dosave = 0) dialog, initially displaying the directory and file mask specified by default (such as C:\windows\*.exe). Returns the name of the selected file or NULL if the dialog was canceled.
askyn_c(int default, char *prompt, ...) | Prompts the user with a yes or no question, highlighting a default answer (1 = yes, 0 = no, -1 = cancel). Returns an integer representing the selected answer.
AskUsingForm_c(const char *form, ...) | The form parameter is an ASCII string specification of a dialog and its associated input elements. This function may be used to build customized user interface elements when none of the SDK’s other convenience functions meet your needs. The format of the form string is detailed in kernwin.hpp.
get_screen_ea() | Returns the virtual address of the current cursor location.
jumpto(ea_t addr) | Jumps the disassembly window to the specified address.
Many more user interface capabilities are available using the API than are available with IDC scripting, including the ability to create customized single- and multicolumn list selection dialogs. Users interested in these capabilities should consult kernwin.hpp and the choose and choose2 functions in particular.
The following functions are available for working with named locations within a database:
get_name(ea_t from, ea_t addr, char *namebuf, size_t maxsize) | Returns the name associated with addr. Returns the empty string if the location has no name. This function provides access to local names when from is any address in the function that contains addr. The name is copied into the provided output buffer.
set_name(ea_t addr, char *name, int flags) | Assigns the given name to the given address. The name is created with attributes specified in the flags bitmask. Possible flag values are described in name.hpp.
get_name_ea(ea_t funcaddr, char *localname) | Searches for the given local name within the function containing funcaddr. Returns the address of the name or BADADDR (-1) if no such name exists in the given function.
The API functions for accessing information about disassembled functions are declared in funcs.hpp. Functions for accessing stack frame information are declared in frame.hpp. Some of the more commonly used functions are described here:
func_t *get_func(ea_t addr) | Returns a pointer to a func_t object that describes the function containing the indicated address.
size_t get_func_qty() | Returns the number of functions present in the database.
func_t *getn_func(size_t n) | Returns a pointer to a func_t object that represents the nth function in the database, where n is between zero (inclusive) and get_func_qty() (exclusive).
func_t *get_next_func(ea_t addr) | Returns a pointer to a func_t object that describes the next function following the specified address.
get_func_name(ea_t addr, char *name, size_t namesize) | Copies the name of the function containing the indicated address into the supplied name buffer.
struc_t *get_frame(ea_t addr) | Returns a pointer to a struc_t object that describes the stack frame for the function that contains the indicated address.
The struc_t class is used to access function stack frames as well as structured datatypes defined within type libraries. Some of the basic functions for interacting with structures and their associated members are described here. Many of these functions make use of a type ID (tid_t) datatype. The API includes functions for mapping a struc_t to an associated tid_t and vice versa. Note that both the struc_t and member_t classes contain a tid_t data member, so obtaining type ID information is simple if you already have a pointer to a valid struc_t or member_t object.
tid_t get_struc_id(char *name) | Looks up the type ID of a structure given its name.
struc_t *get_struc(tid_t id) | Obtains a pointer to a struc_t representing the structure specified by the given type ID.
asize_t get_struc_size(struc_t *s) | Returns the size of the given structure in bytes.
member_t *get_member(struc_t *s, asize_t offset) | Returns a pointer to a member_t object that describes the structure member that resides at the specified offset into the given structure.
member_t *get_member_by_name(struc_t *s, char *name) | Returns a pointer to a member_t object that describes the structure member identified by the given name.
tid_t add_struc(uval_t index, char *name, bool is_union=false) | Appends a new structure with the given name into the standard structures list. The structure is also added to the Structures window at the given index. If index is BADADDR, the structure is added as the last structure in the Structures window.
add_struc_member(struc_t *s, char *name, ea_t offset, flags_t flags, typeinfo_t *info, asize_t size) | Adds a new member with the given name to the given structure. The member is either added at the indicated offset within the structure or appended to the end of the structure if offset is BADADDR. The flags parameter describes the datatype of the new member. Valid flags are defined using the FF_XXX constants described in bytes.hpp. The info parameter provides additional information for complex datatypes; it may be set to NULL for primitive datatypes. The typeinfo_t datatype is defined in nalt.hpp. The size parameter specifies the number of bytes occupied by the new member.
The segment_t class stores information related to the different segments within a database (such as .text and .data) as listed in the View ▸ Open Subviews ▸ Segments window. Recall that what IDA terms segments are often referred to as sections by various executable file formats such as PE and ELF. The following functions provide basic access to segment_t objects. Additional functions dealing with the segment_t class are declared in segment.hpp.
segment_t *getseg(ea_t addr) | Returns a pointer to the segment_t object that contains the given address.
segment_t *get_segm_by_name(char *name) | Returns a pointer to the segment_t object with the given name.
add_segm(ea_t para, ea_t start, ea_t end, char *name, char *sclass) | Creates a new segment in the current database. The segment’s boundaries are specified with the start (inclusive) and end (exclusive) address parameters, while the segment’s name is specified by the name parameter. The segment’s class loosely describes the type of segment being created. Predefined classes include CODE and DATA. A complete list of predefined classes may be found in segment.hpp. The para parameter describes the base address of the segment when segmented addresses (seg:offset) are being used, in which case start and end are interpreted as offsets rather than as virtual addresses. When segmented addresses are not being used, or all segments are based at 0, this parameter should be set to 0.
add_segm_ex(segment_t *s, char *name, char *sclass, int flags) | Alternate method for creating new segments. The fields of s should be set to reflect the address range of the segment. The segment is named and typed according to the name and sclass parameters. The flags parameter should be set to one of the ADDSEG_XXX values defined in segment.hpp.
int get_segm_qty() | Returns the number of segments present within the database.
segment_t *getnseg(int n) | Returns a pointer to a segment_t object populated with information about the nth segment in the database.
int set_segm_name(segment_t *s, char *name, ...) | Changes the name of the given segment. The name is formed by treating name as a format string and incorporating any additional parameters as required by the format string.
get_segm_name(ea_t addr, char *name, size_t namesize) | Copies the name of the segment containing the given address into the user-supplied name buffer. Note that the name may be filtered to replace characters that IDA considers invalid (characters not specified as NameChars in ida.cfg) with a dummy character (typically an underscore, as specified by SubstChar in ida.cfg).
get_segm_name(segment_t *s, char *name, size_t namesize) | Copies the potentially filtered name of the given segment into the user-supplied name buffer.
get_true_segm_name(segment_t *s, char *name, size_t namesize) | Copies the exact name of the given segment into the user-supplied name buffer without filtering any characters.
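The difference between a filtered name and a true name can be pictured with a small standalone sketch. The filter_name function, its parameters, and the allowed character set used with it are all assumptions made for illustration; the actual filtering is performed inside IDA according to the NameChars and SubstChar settings in ida.cfg.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of name filtering, NOT an SDK function.
// Every character of raw that does not appear in allowed is replaced
// with subst; characters that do appear in allowed are kept unchanged.
std::string filter_name(const std::string &raw,
                        const std::string &allowed,
                        char subst) {
    assert(!allowed.empty());  // an empty set would replace every character
    std::string out = raw;
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (allowed.find(out[i]) == std::string::npos) {
            out[i] = subst;
        }
    }
    return out;
}
```

With an allowed set consisting only of lowercase letters and the underscore, a raw name such as .text comes back as _text, while a name made up entirely of allowed characters comes back unchanged. In this analogy the raw string corresponds to what get_true_segm_name copies and the filtered string to what get_segm_name copies.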
One of the add_segm functions must be used to actually create a segment. Simply declaring and initializing a segment_t object does not actually create a segment within the database. This is true with all of the wrapper classes such as func_t and struc_t. These classes merely provide a convenient means to access attributes of an underlying database entity. The appropriate functions to create, modify, or delete actual database objects must be utilized in order to make persistent changes to the database.
A number of functions and enumerated constants are defined in xref.hpp for use with code cross-references. Some of these are described here:
get_first_cref_from(ea_t from) | Returns the first location to which the given address transfers control. Returns BADADDR (-1) if the given address refers to no other addresses.
get_next_cref_from(ea_t from, ea_t current) | Returns the next location to which the given address (from) transfers control, given that current has already been returned by a previous call to get_first_cref_from or get_next_cref_from. Returns BADADDR if no more cross-references exist.
get_first_cref_to(ea_t to) | Returns the first location that transfers control to the given address. Returns BADADDR (-1) if there are no references to the given address.
get_next_cref_to(ea_t to, ea_t current) | Returns the next location that transfers control to the given address (to), given that current has already been returned by a previous call to get_first_cref_to or get_next_cref_to. Returns BADADDR if no more cross-references to the given location exist.
The functions for accessing data cross-reference information (also declared in xref.hpp) are very similar to the functions used to access code cross-reference information. These functions are described here:
get_first_dref_from(ea_t from) | Returns the first location that the given address references as data. Returns BADADDR (-1) if the given address refers to no other addresses.
get_next_dref_from(ea_t from, ea_t current) | Returns the next location that the given address (from) references as data, given that current has already been returned by a previous call to get_first_dref_from or get_next_dref_from. Returns BADADDR if no more cross-references exist.
get_first_dref_to(ea_t to) | Returns the first location that refers to the given address as data. Returns BADADDR (-1) if there are no references to the given address.
get_next_dref_to(ea_t to, ea_t current) | Returns the next location that refers to the given address (to) as data, given that current has already been returned by a previous call to get_first_dref_to or get_next_dref_to. Returns BADADDR if no more cross-references to the given location exist.
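All eight cross-reference functions share the same first/next shape: ask for the first result, then repeatedly pass the most recent result back in until the BADADDR sentinel is returned. The standalone sketch below models that loop shape only. The xref_list_model_t type, its first and next members, NO_ADDR, and collect_all are hypothetical names, and the ascending order produced by this model is an assumption of the model rather than a documented property of the SDK.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

typedef std::uint64_t addr_t;                    // stand-in for ea_t
const addr_t NO_ADDR = static_cast<addr_t>(-1);  // plays the role of BADADDR

// Hypothetical store of the addresses that refer to a single target.
struct xref_list_model_t {
    std::set<addr_t> sources;

    // Plays the role of a get_first_XXX_to function.
    addr_t first() const {
        return sources.empty() ? NO_ADDR : *sources.begin();
    }

    // Plays the role of a get_next_XXX_to function: current must be a
    // value previously returned by first() or next().
    addr_t next(addr_t current) const {
        assert(current != NO_ADDR);  // iteration has already finished
        std::set<addr_t>::const_iterator it = sources.upper_bound(current);
        return it == sources.end() ? NO_ADDR : *it;
    }
};

// The canonical loop: start with first(), stop at the sentinel.
std::vector<addr_t> collect_all(const xref_list_model_t &refs) {
    std::vector<addr_t> out;
    for (addr_t a = refs.first(); a != NO_ADDR; a = refs.next(a)) {
        out.push_back(a);
    }
    return out;
}
```

Because the sentinel is checked before the loop body runs, the same loop handles a target with no references at all without any special casing.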
The SDK contains no equivalent to IDC’s XrefType function. A variable named lastXR is declared in xref.hpp; however, it is not exported. If you need to determine the exact type of a cross-reference, you must iterate cross-references using an xrefblk_t structure. The xrefblk_t is described in “Enumerating Cross-References.”
Using the IDA API, there are often several different ways to iterate over various database objects. In the following examples we demonstrate some common iteration techniques:
The first technique for iterating through the functions within a database mimics the manner in which we performed the same task using IDC:
    for (func_t *f = get_next_func(0); f != NULL; f = get_next_func(f->startEA)) {
        char fname[1024];
        get_func_name(f->startEA, fname, sizeof(fname));
        msg("%08x: %s\n", f->startEA, fname);
    }
Alternatively, we can simply iterate through functions by index numbers, as shown in the next example:
    for (size_t idx = 0; idx < get_func_qty(); idx++) {
        char fname[1024];
        func_t *f = getn_func(idx);
        get_func_name(f->startEA, fname, sizeof(fname));
        msg("%08x: %s\n", f->startEA, fname);
    }
Finally, we can work at a somewhat lower level and make use of a data structure called an areacb_t, also known as an area control block, defined in area.hpp. Area control blocks are used to maintain lists of related area_t objects. A global areacb_t named funcs is exported (in funcs.hpp) as part of the IDA API. Using the areacb_t class, the previous example can be rewritten as follows:
    int a = funcs.get_next_area(0);
    while (a != -1) {
        char fname[1024];
        func_t *f = (func_t*)funcs.getn_area(a);  // getn_area returns an area_t*
        get_func_name(f->startEA, fname, sizeof(fname));
        msg("%08x: %s\n", f->startEA, fname);
        a = funcs.get_next_area(f->startEA);
    }
In this example, the get_next_area member function is used repeatedly to obtain the index values for each area in the funcs control block. A pointer to each related func_t area is obtained by supplying each index value to the getn_area member function. Several global areacb_t variables are declared within the SDK, including the segs global, which is an area control block containing segment_t pointers for each section in the binary.
Within the SDK, stack frames are modeled using the capabilities of the struc_t class. Example 16-6 uses structure member iteration to print the contents of a stack frame.
Example 16-6. Enumerating stack frame members
    func_t *func = get_func(get_screen_ea());  //get function at cursor location
    struc_t *frame = func ? get_frame(func) : NULL;  //NULL if no function or no frame
    if (frame) {
        msg("Local variable size is %d\n", func->frsize);
        msg("Saved regs size is %d\n", func->frregs);
        size_t ret_addr = func->frsize + func->frregs;  //offset to return address
        for (size_t m = 0; m < frame->memqty; m++) {    //loop through members
            char fname[1024];
            get_member_name(frame->members[m].id, fname, sizeof(fname));
            if (frame->members[m].soff < func->frsize) {
                msg("Local variable ");
            }
            else if (frame->members[m].soff > ret_addr) {
                msg("Parameter ");
            }
            msg("%s is at frame offset %x\n", fname, frame->members[m].soff);
            if (frame->members[m].soff == ret_addr) {
                msg("%s is the saved return address\n", fname);
            }
        }
    }
This example summarizes a function’s stack frame using information from the function’s func_t object and the associated struc_t representing the function’s stack frame. The frsize and frregs fields specify the size of the local variable portion of the stack frame and the number of bytes dedicated to saved registers, respectively. The saved return address can be found within the frame following the local variables and the saved registers. Within the frame itself, the memqty field specifies the number of defined members contained in the frame structure, which also corresponds to the size of the members array. A loop is used to retrieve the name of each member and determine whether the member is a local variable or an argument based on its starting offset (soff) within the frame structure.
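The offset arithmetic used to classify frame members can be isolated in a standalone function. This is a sketch rather than SDK code: frame_slot_t and classify_frame_offset are hypothetical names, the parameters simply mirror the frsize, frregs, and soff fields discussed above, and the explicit saved-registers category is an addition that the preceding example does not print.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical categories for a frame member.
enum frame_slot_t {
    SLOT_LOCAL,        // inside the local variable area
    SLOT_SAVED_REGS,   // inside the saved register area
    SLOT_RETURN_ADDR,  // the saved return address
    SLOT_PARAMETER     // anything above the saved return address
};

// Classifies a member by its starting offset within the frame.
frame_slot_t classify_frame_offset(std::size_t soff,
                                   std::size_t frsize,
                                   std::size_t frregs) {
    // The return address follows the locals and the saved registers.
    const std::size_t ret_addr = frsize + frregs;
    assert(ret_addr >= frsize);  // guard against arithmetic overflow
    if (soff < frsize) {
        return SLOT_LOCAL;
    }
    if (soff < ret_addr) {
        return SLOT_SAVED_REGS;
    }
    if (soff == ret_addr) {
        return SLOT_RETURN_ADDR;
    }
    return SLOT_PARAMETER;
}
```

For a function with 0x10 bytes of locals and 4 bytes of saved registers, offset 0x0 is a local, offset 0x10 falls in the saved registers, offset 0x14 is the saved return address, and offset 0x18 is a parameter.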
In Chapter 15 we saw that it is possible to enumerate cross-references from IDC scripts. The same capabilities exist within the SDK, though in a somewhat different form. As an example, let’s revisit the idea of listing all calls of a particular function (see Example 15-4 in Enumerating Exported Functions). The following function almost works.
    void list_callers(char *bad_func) {
        char name_buf[MAXNAMELEN];
        ea_t func = get_name_ea(BADADDR, bad_func);
        if (func == BADADDR) {
            warning("Sorry, %s not found in database", bad_func);
        }
        else {
            for (ea_t addr = get_first_cref_to(func); addr != BADADDR;
                 addr = get_next_cref_to(func, addr)) {
                char *name = get_func_name(addr, name_buf, sizeof(name_buf));
                if (name) {
                    msg("%s is called from 0x%x in %s\n", bad_func, addr, name);
                }
                else {
                    msg("%s is called from 0x%x\n", bad_func, addr);
                }
            }
        }
    }
The reason this function almost works is that there is no way to determine the type of cross-reference returned for each iteration of the loop (recall that there is no SDK equivalent for IDC’s XrefType). In this case we should verify that each cross-reference to the given function is in fact a call type (fl_CN or fl_CF) cross-reference.
When you need to determine the type of a cross-reference within the SDK, you must use an alternative form of cross-reference iteration facilitated by the xrefblk_t structure, which is described in xref.hpp. The basic layout of an xrefblk_t is shown in the following listing. (For full details, please see xref.hpp.)
    struct xrefblk_t {
        ea_t from;     // the referencing address - filled by first_to(),next_to()
        ea_t to;       // the referenced address - filled by first_from(), next_from()
        uchar iscode;  // 1-is code reference; 0-is data reference
        uchar type;    // type of the last returned reference
        uchar user;    // 1-is user defined xref, 0-defined by ida

        //fill the "to" field with the first address to which "from" refers.
        bool first_from(ea_t from, int flags);

        //fill the "to" field with the next address to which "from" refers.
        //This function assumes a previous call to first_from.
        bool next_from(void);

        //fill the "from" field with the first address that refers to "to".
        bool first_to(ea_t to, int flags);

        //fill the "from" field with the next address that refers to "to".
        //This function assumes a previous call to first_to.
        bool next_to(void);
    };
The member functions of xrefblk_t are used to initialize the structure and perform the iteration, while the data members are used to access information about the last cross-reference that was retrieved. The flags value required by the first_from and first_to functions dictates which type of cross-references should be returned. Legal values for the flags parameter include the following (from xref.hpp):
    #define XREF_ALL   0x00  // return all references
    #define XREF_FAR   0x01  // don't return ordinary flow xrefs
    #define XREF_DATA  0x02  // return data references only
Note that no flag value restricts the returned references to code only. If you are interested in code cross-references, you must either compare the xrefblk_t type field to specific cross-reference types (such as fl_JN) or test the iscode field to determine if the last returned cross-reference was a code cross-reference.
The following modified version of the list_callers function demonstrates the use of an xrefblk_t iteration structure.
    void list_callers(char *bad_func) {
        char name_buf[MAXNAMELEN];
        ea_t func = get_name_ea(BADADDR, bad_func);
        if (func == BADADDR) {
            warning("Sorry, %s not found in database", bad_func);
        }
        else {
            xrefblk_t xr;
            for (bool ok = xr.first_to(func, XREF_ALL); ok; ok = xr.next_to()) {
                if (xr.type != fl_CN && xr.type != fl_CF) continue;
                char *name = get_func_name(xr.from, name_buf, sizeof(name_buf));
                if (name) {
                    msg("%s is called from 0x%x in %s\n", bad_func, xr.from, name);
                }
                else {
                    msg("%s is called from 0x%x\n", bad_func, xr.from);
                }
            }
        }
    }
Through the use of an xrefblk_t, we now have the opportunity to examine the type of each cross-reference returned by the iterator and decide whether it is interesting to us or not. In this example we simply ignore any cross-reference that is not related to a function call. We did not use the iscode member of xrefblk_t because iscode is true for jump and ordinary flow cross-references in addition to call cross-references. Thus, iscode alone does not guarantee that the current cross-reference is related to a function call.
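The gap between "is a code reference" and "is a call" can be seen in a small standalone model. Nothing here comes from the SDK: ref_kind_t, xref_model_t, ref_counts_t, and count_refs are hypothetical names, and the enumerators merely stand in for cross-reference types such as fl_CF, fl_CN, and fl_JN, whose real values are defined in xref.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the SDK's cross-reference types.
enum ref_kind_t {
    KIND_CALL_FAR,
    KIND_CALL_NEAR,
    KIND_JUMP_NEAR,
    KIND_ORDINARY_FLOW,
    KIND_DATA_READ
};

// Hypothetical record describing one cross-reference.
struct xref_model_t {
    unsigned long from;  // referencing address
    ref_kind_t kind;     // kind of reference

    // True for every code reference: calls, jumps, and ordinary flow.
    bool iscode() const { return kind != KIND_DATA_READ; }

    // True only for the two call kinds.
    bool is_call() const {
        return kind == KIND_CALL_FAR || kind == KIND_CALL_NEAR;
    }
};

struct ref_counts_t {
    std::size_t code;   // references for which iscode() is true
    std::size_t calls;  // references for which is_call() is true
};

// Counts how many references pass each of the two tests.
ref_counts_t count_refs(const std::vector<xref_model_t> &refs) {
    ref_counts_t counts = {0, 0};
    for (std::size_t i = 0; i < refs.size(); ++i) {
        if (refs[i].iscode()) ++counts.code;
        if (refs[i].is_call()) ++counts.calls;
    }
    assert(counts.calls <= counts.code);  // every call is also a code reference
    return counts;
}
```

Given one near call, one near jump, one ordinary flow reference, and one data read, the code test accepts three references while the call test accepts only one, which is why a caller listing needs to test the type rather than the code flag.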
[116] Binary large object, or blob, is a term often used to refer to arbitrary binary data of varying size.