Appendix A. Techniques for Out-of-Memory Testing[236]

This appendix describes some techniques for testing and fixing out-of-memory (OOM) robustness issues and memory leaks. OOM robustness means that when some memory allocation operation fails, the error is handled gracefully. The program should not crash. It should not produce incorrect results. It should not leak memory.

As a case example, I will be using Redland,[237] a set of open source libraries for processing Resource Description Framework (RDF) data. RDF is a set of related World Wide Web Consortium (W3C) specifications.[238] They were originally designed as a metadata model. Currently RDF is used for modeling all kinds of data, using various syntax notations. Usually the data is about web resources, and RDF is the core of what is often called the 'semantic web'.

In a project I was working on, I wanted to convert existing legacy data sources to RDF and run SPARQL queries on them. SPARQL is an SQL-like standardized query language for RDF data. Instead of writing everything from scratch, I decided to port existing open source components to Symbian OS and ended up porting Redland. It provided me with RDF parsers, serializers, storage and a query engine that supports SPARQL. Redland is written in C without any complex dependencies. Porting the core functionality was relatively straightforward using the P.I.P.S. libraries for Symbian OS. Redland's Apache 2.0 and LGPL 2.1 dual license was also generous enough for my needs.

A.1 Why Test for Out-of-Memory Errors?

The importance of detecting and fixing errors caused by out-of-memory situations is heightened on the Symbian platform. This is because of some of its architectural design choices and its use in resource-constrained devices. The emphasis on graceful handling of low-memory situations is also reflected in the UNI-03 Symbian Signed test criterion.

Redland, like many open source projects, has been developed on Unix-like operating systems where memory is plentiful and supplemented by virtual memory. Programs are combined in shell scripting style where individual programs run only for a short while and resources of the process are then automatically freed by the operating system.

In contrast to desktop systems, Symbian devices have only a little RAM and no virtual memory, so OOM errors are more likely to happen. Additionally, the scripting architectural style is rarely used in Symbian OS, and program lifecycles differ from those in Unix. For example, in my project, I use Redland in long-running background server processes. Leaking memory in such a process is a sure path to memory allocation failures and all kinds of errors.

Many native Symbian OS programs behave well in OOM situations. OOM errors are usually manifested by KErrNoMemory leaves (exceptions) that are handled in an appropriate trap harness. Resource cleanup is handled by CleanupStack. Symbian OS C++ programmers are familiar with these concepts.

Standard C does not have exceptions. OOM checking and resource cleanup are all up to the programmer. When I started working on Redland, I sampled the code to see how it felt. I noticed it had OOM checks in some places but not consistently. Often the return value from malloc() and other allocation functions was not checked at all. The basic handling principle for fatal errors was simply to give up and abort() the program. Not very graceful! Redland obviously needed changes before it could be used in a program targeting large-scale deployment. To support and focus the fixing work, additional tests were required: the test suite supplied with Redland only tested functionality in close-to-ideal scenarios and did not check for resource leaks.

A.2 Out-of-Memory Loop

There is a well-established testing technique on Symbian OS called the OOM loop. Its basic idea is a kind of fault injection. Allocation failure injection is activated using the __UHEAP_SETFAIL() heap failure macro. The program being tested is then expected to either function properly or fail gracefully.[239]

I started by implementing some integration test cases that exercised the Redland libraries in a fashion similar to how I was planning to use them in the actual program. I then attempted to run the test functions in an OOM loop with lots of iterations, for example, using the EDeterministic heap failure mode to fail every kth allocation for k = 1..2000:

for ( TInt k = 1; k <= 2000; ++k )
  {
  // Set memory allocation to fail after k allocs
  __UHEAP_SETFAIL( RHeap::EDeterministic, k );
  // Heap marker for detecting memory leaks
  __UHEAP_MARK;
  // Execute the test function
  TRAPD( err, DoTestL() );
  // Check for memory leaks
  __UHEAP_MARKEND;
  if ( !err )
    {
    // The test function did not leave
    // If it ran successfully,
    // we have found an upper bound for k for this test function
    break;
    }
  }
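
The loop structure itself does not depend on Symbian OS. As a rough illustration, here is a minimal sketch in portable C++ rather than Symbian code. All names (FailEveryKth, TestAlloc(), DoTest(), FindUpperBound()) are hypothetical; std::bad_alloc stands in for a KErrNoMemory leave and try/catch for the TRAPD trap harness. Leak checking is left out of this sketch.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for __UHEAP_SETFAIL( RHeap::EDeterministic, k ):
// every kth call to TestAlloc() fails
struct FailEveryKth
  {
  int iFailCount = 0;   // 0 disables failure injection
  int iAllocCount = 0;  // allocations attempted so far
  };

static FailEveryKth gFail;

// Allocation function used by the code under test.
// Throws std::bad_alloc where Symbian code would leave with KErrNoMemory.
void *TestAlloc( std::size_t aSize )
  {
  if ( gFail.iFailCount > 0 && ++gFail.iAllocCount % gFail.iFailCount == 0 )
    {
    throw std::bad_alloc();
    }
  void *p = std::malloc( aSize );
  if ( !p )
    {
    throw std::bad_alloc();
    }
  return p;
  }

// Hypothetical test function: needs three allocations to succeed
void DoTest()
  {
  void *a = TestAlloc( 16 );
  void *b = nullptr;
  void *c = nullptr;
  try
    {
    b = TestAlloc( 32 );
    c = TestAlloc( 64 );
    }
  catch ( const std::bad_alloc & )
    {
    std::free( b ); // freeing a null pointer is a no-op
    std::free( a );
    throw;
    }
  std::free( c );
  std::free( b );
  std::free( a );
  }

// Runs the OOM loop and returns the first k for which DoTest() succeeded,
// or -1 if no such k was found up to aMaxK
int FindUpperBound( int aMaxK )
  {
  assert( aMaxK >= 1 );
  int found = -1;
  for ( int k = 1; k <= aMaxK && found < 0; ++k )
    {
    gFail.iFailCount = k;
    gFail.iAllocCount = 0;
    try
      {
      DoTest();
      found = k; // the test function ran to completion
      }
    catch ( const std::bad_alloc & )
      {
      // Graceful failure: try the next k
      }
    }
  gFail.iFailCount = 0; // switch failure injection off again
  return found;
  }
```

With a test function that needs three allocations, the loop fails at k = 1, 2 and 3 and stops at k = 4, the first value at which no allocation is hit.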

I didn't like the way the __UHEAP_MARKEND macro worked: if it detected a memory leak, it would simply kill the program with an ALLOC panic. When hunting down leaks, I wanted the option to experiment with the code while it was still running, not just to do post-mortem analysis with tools such as HookLogger. Such tools are useful but did not always help me enough to figure out what was going on in a complex, dynamic system. Therefore I changed the leak detection mechanism to use functions available in RHeap to query the allocated cell count before and after the test function, and the address of the first leaked cell, if any:

TInt allocsize;
TInt alloccount;
TInt alloccount2;
TUint32 leaked;
RHeap &heap = User::Heap();
for ( TInt k = 1; k <= 2000; ++k )
  {
  // Inject a failure on every kth alloc
  __UHEAP_SETFAIL( RHeap::EDeterministic, k );
  // Store the current alloc count of user heap
  alloccount = heap.AllocSize( allocsize );
  // Set heap marker for detecting memory leaks
  heap.__DbgMarkStart();
  // Execute the test function
  TRAPD( err, DoTestL() );
  // Get current alloc count of user heap
  alloccount2 = heap.AllocSize( allocsize );
  // Expect 0 leaked cells
  leaked = heap.__DbgMarkEnd( 0 );
  // Reset heap alloc failures
  // Total reset sets the nesting level of all allocated cells to zero
  // so that previously leaked cells are no longer checked
  __UHEAP_TOTAL_RESET;
  // Issue a breakpoint if detected a memory leak
  if ( leaked )
    {
    RDebug::Printf( "leaked %d cells, first one at %p",
      alloccount2 - alloccount,
      reinterpret_cast< void * >( leaked ) );
    __BREAKPOINT();
    }
  if ( !err )
    {
    // The test function did not leave
    // If it ran successfully,
    // we have found an upper bound for k for this test function
    break;
    }
  }
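
The before-and-after comparison can also be sketched in portable C++. This is only an illustration under stated assumptions: TrackedAlloc(), TrackedFree(), LeakReport and CheckForLeaks() are hypothetical names, and a std::set of live pointers takes the place of the cell bookkeeping that RHeap does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <set>

// Hypothetical bookkeeping: every cell that is allocated and not yet freed
static std::set< void * > gLiveCells;

void *TrackedAlloc( std::size_t aSize )
  {
  void *p = std::malloc( aSize );
  if ( p )
    {
    gLiveCells.insert( p );
    }
  return p;
  }

void TrackedFree( void *aPtr )
  {
  if ( aPtr )
    {
    // Only cells handed out by TrackedAlloc() may be freed here
    assert( gLiveCells.count( aPtr ) == 1 );
    gLiveCells.erase( aPtr );
    std::free( aPtr );
    }
  }

struct LeakReport
  {
  std::size_t iLeakedCells; // cells allocated by the test and never freed
  void *iFirstLeaked;       // one of the leaked cells, or nullptr
  };

// Runs aTest and compares the set of live cells before and after
template < typename TTest >
LeakReport CheckForLeaks( TTest aTest )
  {
  const std::set< void * > before = gLiveCells;
  aTest();
  LeakReport report = { 0, nullptr };
  for ( void *cell : gLiveCells )
    {
    if ( before.count( cell ) == 0 )
      {
      if ( !report.iFirstLeaked )
        {
        report.iFirstLeaked = cell;
        }
      ++report.iLeakedCells;
      }
    }
  return report;
  }
```

Because the report is an ordinary return value, the caller can log it, break into a debugger or carry on with the next iteration, which is the flexibility that a panic does not offer.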

The abort() issue mentioned in Section A.1 became a problem immediately: the test program terminated when I wanted to keep it running. To solve this problem, I created a porting layer DLL to replace the standard C library implementation of abort() with my own version that throws an exception:

extern "C" {
EXPORT_C void abort()
  {
  User::Leave( KErrAbort );
  }
} // extern "C"

I had to link against this DLL before the standard C library so that the linker would choose my version and not the standard one. Similar function replacement would have been possible without creating a new DLL, for example by using C preprocessor macro substitution or by compiling the replacement code directly into the project. However, a DLL was required later, when I added failure detection and debugging support features directly to the allocators (this is explained in Section A.3), so I decided to keep all standard library replacement functions in the same place.

Replacing program termination with throwing an exception allowed caller C++ code to catch the exception and deal with it properly. It also introduced a number of memory leaks: abort() normally just terminates the process. The C library implementation and the operating system may free some resources of the process (such as closing open files or freeing allocated memory) but it is not strictly mandatory. In any case, destructors and functions registered with atexit() are not called. Becoming more graceful is not easy and straightforward.
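
To see why the replacement introduces leaks, consider C-style code that allocates a block and then hits a fatal error. It was written on the assumption that abort() ends the process, so it has no cleanup code on that path. The sketch below, in portable C++ with hypothetical names (CountedAlloc(), CountedFree(), ReplacementAbort(), ParseOriginal(), ParseFixed()), shows the leak and one possible fix, which is to release the block and return an error code.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>

static int gLiveBlocks = 0; // blocks allocated and not yet freed

void *CountedAlloc( std::size_t aSize )
  {
  void *p = std::malloc( aSize );
  if ( p )
    {
    ++gLiveBlocks;
    }
  return p;
  }

void CountedFree( void *aPtr )
  {
  if ( aPtr )
    {
    assert( gLiveBlocks > 0 );
    --gLiveBlocks;
    std::free( aPtr );
    }
  }

// Hypothetical replacement for abort(): throws instead of terminating
[[noreturn]] void ReplacementAbort()
  {
  throw std::runtime_error( "abort" );
  }

// Library code written with a terminating abort() in mind
int ParseOriginal( bool aFatalError )
  {
  void *buffer = CountedAlloc( 128 );
  if ( aFatalError )
    {
    ReplacementAbort(); // buffer is never freed on this path
    }
  CountedFree( buffer );
  return 0;
  }

// One possible fix: release resources and report an error code
int ParseFixed( bool aFatalError )
  {
  void *buffer = CountedAlloc( 128 );
  if ( aFatalError )
    {
    CountedFree( buffer );
    return -1;
    }
  CountedFree( buffer );
  return 0;
  }
```

After the exception from ParseOriginal() is caught, one block is still counted as live; ParseFixed() leaves none behind.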

Now I was able to enter the following test-driven, bug-fixing loop:

  1. Write new test code or extend old tests. Run all tests. Repeat until some of the tests fail.

  2. Fix any problems discovered. Repeat until all tests pass again.

  3. Go back to Step 1.

This way I discovered and fixed literally hundreds of bugs in the libraries. Most of them were relatively simple failures to check the return code of some potentially failing function, simple memory leaks and so on. Some bugs were a little more complicated, for example, requiring design-level clarifications to rules for passing object ownership.

A.3 Improved Heap Failure Tool

The OOM loop approach described in Section A.2 also had its issues:

  1. It was hard to determine where to set heap failure limits, i.e. the maximum value of k.

  2. Not all allocation failures resulted in observable bugs but they still made the system under test run in a slightly inconsistent state.

  3. Some complicated bugs were hard to debug because the allocation failure and the observed error were highly decoupled, i.e. the error surfaced long after the failed allocation.

  4. Some integration test cases would detect errors in dependent libraries (e.g. the SQLite database or libxml2 parser). I was not interested in fixing them for the time being.

To counter these issues, I decided not to use the Symbian OS heap failure tool but to write my own. One way to implement a heap failure tool could be to write a subclass of RAllocator and pass it as a parameter to RThread::Create(). However, this wouldn't address issue 4, as all the libraries used in the same thread share the same allocator. Something else was needed.

Fortunately for me, the Redland libraries use only a small set of memory management functions: malloc(), calloc(), realloc() and free(). No other allocating functions, such as strdup(), were used. This made it easy to implement my own versions of these functions in the porting layer DLL that already contained the abort() replacement:

extern "C" {
EXPORT_C void *malloc( unsigned int size )
  {
  return AllocWrapper( User::Alloc( size ) );
  }
EXPORT_C void *calloc( unsigned int n, unsigned int count )
  {
  return AllocWrapper( User::AllocZ( n * count ) );
  }
EXPORT_C void *realloc( void *p, unsigned int size )
  {
  return AllocWrapper( User::ReAlloc( p, size ) );
  }
EXPORT_C void free( void *p )
  {
  User::Free( p );
  }
} // extern "C"

The AllocWrapper() function is the workhorse of my heap failure tool; it takes in a pointer to a block of allocated memory and may inject a failure by freeing the block and returning a NULL pointer instead. I describe the function in more detail later.
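
A detail worth checking when adapting such wrappers is the contract of realloc(): the C standard says that when it fails, the original block stays valid, so a caller will typically go on to use or free it. A wrapper that frees the block after a successful reallocation leaves the caller holding a dangling pointer. The following variation, a sketch in portable C++ and not the code used in the port, decides whether to fail before calling the real allocator. The names ShouldFailNow(), TestMalloc(), TestCalloc() and TestRealloc() are hypothetical, and the countdown hook is a simplified stand-in for AllocWrapper().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical injection hook: the call that brings the countdown
// to zero fails; a countdown of 0 means no failure is pending
static int gFailCountdown = 0;

bool ShouldFailNow()
  {
  assert( gFailCountdown >= 0 );
  return gFailCountdown > 0 && --gFailCountdown == 0;
  }

// Decide on the failure first, allocate only if it is not injected
void *TestMalloc( std::size_t aSize )
  {
  return ShouldFailNow() ? nullptr : std::malloc( aSize );
  }

void *TestCalloc( std::size_t aCount, std::size_t aSize )
  {
  // Reject requests whose total size would overflow. std::calloc() checks
  // this itself; a wrapper that multiplies the arguments must do it too.
  if ( aSize != 0 && aCount > SIZE_MAX / aSize )
    {
    return nullptr;
    }
  return ShouldFailNow() ? nullptr : std::calloc( aCount, aSize );
  }

void *TestRealloc( void *aPtr, std::size_t aSize )
  {
  // On an injected failure the original block is left untouched,
  // as the C standard specifies for a failed realloc()
  return ShouldFailNow() ? nullptr : std::realloc( aPtr, aSize );
  }
```

The trade-off is that a real allocation is never attempted on an injected failure, so the heap state differs slightly from the allocate-then-free approach.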

The heap failure tool needs some state information so it knows when to inject a failure and when not to. I decided to store this state information in the DLL thread-local storage (TLS). I added an OOM counter that would keep track of all allocation failures, injected and real. I also added some DLL API functions to set the heap failure parameters and to query and reset the OOM counter. For debugging support, I wanted to start single-stepping in a debugger starting from the point of failure injection, so I added a state variable for that too:

struct TAllocState
  {
  // OOM counter
  TInt iOomCount;
  // Heap failure mode
  RAllocator::TAllocFail iFailureMode;
  // Fail every iFailCount in deterministic failure mode
  // In random failure mode, fail approximately once every iFailCount
  // allocation
  TInt iFailCount;
  // Alloc counter
  TInt iAllocCount;
  // State information for pseudorandom generator
  TInt64 iRandomSeed;
  // Issue a debugger breakpoint on failed allocation
  TBool iBreakOnFailure;
  };
// Access alloc state in DLL thread-local storage
inline TAllocState *AllocState()
  {
  return reinterpret_cast< TAllocState * >( Dll::Tls() );
  }
// Initialize the failure tool
// Must be called before calling any of the other functions
EXPORT_C TInt FailureToolInit()
  {
  TAllocState *state;
  // Check for existing alloc state
  state = AllocState();
  if ( state )
    {
    return KErrAlreadyExists;
    }
  // Create new state
  state = reinterpret_cast< TAllocState * >( User::AllocZ(
                                sizeof( TAllocState ) ) );
  if ( !state )
    {
    return KErrNoMemory;
    }
  // Set non-zero default values
  state->iFailureMode = RAllocator::ENone;
  // Store state in thread-local storage
  Dll::SetTls( state );
  return KErrNone;
  }
// Clean up the failure tool
EXPORT_C void FailureToolFinish()
  {
  TAllocState *state = AllocState();
  User::Free( state ); // ok to free NULL
  Dll::SetTls( 0 );
  }
// Set failure mode
EXPORT_C void FailureToolSetAllocFail( RAllocator::TAllocFail aMode,
                                                       TInt aCount )
  {
  TAllocState *state = AllocState();
  state->iFailureMode = aMode;
  state->iFailCount = aCount;
  state->iAllocCount = 0;
  state->iRandomSeed = 0;
  }
// Enable/disable breakpoints on alloc failures
EXPORT_C void FailureToolSetBreakOnFailure( TBool aBreakOnFailure )
  {
  TAllocState *state = AllocState();
  state->iBreakOnFailure = aBreakOnFailure;
  }
// Query and reset OOM counter
EXPORT_C TInt FailureToolOomOccured()
  {
  TAllocState *state = AllocState();
  TInt count = state->iOomCount;
  state->iOomCount = 0;
  return count;
  }
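
In standard C++11 and later, the thread_local keyword offers a comparable place to keep per-thread state, without explicit initialization and cleanup calls. The sketch below is an analogy and not Symbian code; AllocState, State(), SetAllocFail() and OomOccurred() are hypothetical counterparts of the structure and functions above.

```cpp
#include <cassert>

// Hypothetical portable counterpart of TAllocState
struct AllocState
  {
  int iOomCount = 0;   // failures seen since the last query
  int iFailCount = 0;  // fail every iFailCount-th allocation, 0 = never
  int iAllocCount = 0; // allocations seen since the mode was set
  };

// One instance per thread, created on first use in that thread
AllocState &State()
  {
  thread_local AllocState state;
  return state;
  }

// Set the failure interval and restart the allocation counter
void SetAllocFail( int aFailCount )
  {
  assert( aFailCount >= 0 );
  AllocState &state = State();
  state.iFailCount = aFailCount;
  state.iAllocCount = 0;
  }

// Query and reset the OOM counter in one step
int OomOccurred()
  {
  AllocState &state = State();
  const int count = state.iOomCount;
  state.iOomCount = 0;
  return count;
  }
```

Combining the query and the reset in one function means a caller cannot forget to clear the counter between two measurements.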

Now we can finally have a look at the AllocWrapper() failure tool implementation. I didn't feel the need to implement all heap failure modes supported by the native heap failure tool. I was happy with just the deterministic (EDeterministic) and pseudorandom (ERandom) modes.

// Test whether to fail an alloc in the current state
bool AllocShouldFail( TAllocState *aState )
  {
  switch ( aState->iFailureMode )
    {
    // Fail pseudorandomly with 1/failcount probability
    case RAllocator::ERandom:
    return Math::Rand( aState->iRandomSeed ) % aState->iFailCount == 0;
    // Fail deterministically on every failcount-th alloc
    case RAllocator::EDeterministic:
    return ++aState->iAllocCount % aState->iFailCount == 0;
    // Do not fail
    case RAllocator::ENone: // fall-through
    // Not supported modes, do not fail
    case RAllocator::ETrueRandom:
    case RAllocator::EFailNext:
    case RAllocator::EReset:
    default:
    return false;
    }
  }
// Inject OOM failures
TAny *AllocWrapper( TAny *aMemory )
  {
  TAllocState *state = AllocState();
  // Inject a failure?
  if ( AllocShouldFail( state ) )
    {
    User::Free( aMemory );
    aMemory = 0;
    }
  // OOM occurred, injected or real?
  if ( !aMemory )
    {
    // Increment OOM counter
    ++state->iOomCount;
    // Issue emulator breakpoint?
    if ( state->iBreakOnFailure )
      {
      __BREAKPOINT();
      }
    }
  return aMemory;
  }
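
The failure decision can be exercised in isolation. The following portable C++ sketch mirrors the two supported modes under some assumptions: FailMode, FailPolicy and ShouldFail() are hypothetical names, std::minstd_rand replaces Math::Rand(), and a guard for a zero or negative fail count is added because the modulo would otherwise divide by zero.

```cpp
#include <cassert>
#include <random>

enum class FailMode { None, Deterministic, Random };

// Hypothetical failure policy, one per thread or test run
struct FailPolicy
  {
  FailMode iMode = FailMode::None;
  int iFailCount = 0;            // failure interval or inverse probability
  int iAllocCount = 0;           // allocations seen so far
  std::minstd_rand iRandom{ 1 }; // fixed seed keeps runs repeatable
  };

// Test whether to fail an allocation in the current state
bool ShouldFail( FailPolicy &aPolicy )
  {
  if ( aPolicy.iFailCount <= 0 )
    {
    return false; // nothing to inject, and guards the modulo below
    }
  const unsigned failCount = static_cast< unsigned >( aPolicy.iFailCount );
  switch ( aPolicy.iMode )
    {
    // Fail on every failcount-th allocation
    case FailMode::Deterministic:
      return ++aPolicy.iAllocCount % aPolicy.iFailCount == 0;
    // Fail pseudorandomly with roughly 1/failcount probability
    case FailMode::Random:
      return aPolicy.iRandom() % failCount == 0;
    // Do not fail
    case FailMode::None:
    default:
      return false;
    }
  }
```

With a fail count of 3, the deterministic mode fails the 3rd, 6th and 9th allocation. Because the generator starts from a fixed seed, a pseudorandom run can be repeated exactly, which helps when reproducing a failure.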

Finally I integrated this improved heap failure tool into the OOM loop from Section A.2:

User::LeaveIfError( FailureToolInit() );
//...
TInt allocsize;
TInt alloccount;
TInt alloccount2;
TUint32 leaked;
TInt oom;
RHeap &heap = User::Heap();
// Debugging support
TBool die = EFalse; // set to true in debugger to terminate the OOM
                    // loop
TBool oombreak = EFalse; // set to true in debugger to enable
                         // breakpoint on OOM
for ( TInt k = 1; !die; ++k )
  {
  FailureToolSetBreakOnFailure( oombreak );
  // Reset OOM counter
  FailureToolOomOccured();
  // Set memory allocation to fail after k allocs
  FailureToolSetAllocFail ( RHeap::EDeterministic, k );
  // Store the current alloc count of user heap
  alloccount = heap.AllocSize( allocsize );
  // Set heap marker for detecting memory leaks
  heap.__DbgMarkStart();
  // Execute the test function
  TRAPD( err, DoTestL() );
  // Get current alloc count of user heap
  alloccount2 = heap.AllocSize( allocsize );
  // Expect 0 leaked cells
  leaked = heap.__DbgMarkEnd( 0 );
  // Reset heap alloc failures
  // Total reset sets the nesting level of all allocated cells to zero
  // so that previously leaked cells are no longer checked
  __UHEAP_TOTAL_RESET;
  // Get OOM count + reset the counter
  oom = FailureToolOomOccured();
  // Issue a breakpoint if detected a memory leak
  if ( leaked )
    {
    RDebug::Printf( "[%d] leaked %d cells, first one at %p",
      k,
      alloccount2 - alloccount,
      reinterpret_cast< void * >( leaked ) );
    __BREAKPOINT();
    }
  if ( !err && !oom )
    {
    // The test function did not leave and there was no OOM
    // If it ran successfully,
    // we have found an upper bound for k for this test function
    RDebug::Printf( "[%d] finished", k );
    break;
    }
  }
// ...
FailureToolFinish();

This setup fully addresses the issues I listed at the beginning of this section:

  1. A suitable upper bound for k was reached in deterministic failure mode when the test function ran without leaving and produced correct expected results, and no OOMs were registered.

  2. I could query the porting layer state for OOM failure count using FailureToolOomOccured() to see whether there had been any undetected OOM errors while running the test code.

  3. On an injected allocation failure, I could set the heap failure tool to issue a debugger breakpoint. This way I quickly discovered the root causes for errors occurring much later in the test case.

  4. Dependent libraries were not affected since they were not linked against the allocator functions in my heap failure tool.
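
The value of the extra termination condition shows up with code that swallows an allocation failure. In the portable C++ sketch below, all names (Injector, InjectedAlloc(), DoTestSwallowingOom(), FindCleanRun()) are hypothetical. The test function reports success even when its second allocation fails, so an error-code check alone would end the loop too early.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical failure injector with an OOM counter
struct Injector
  {
  int iFailCount = 0;  // fail every iFailCount-th allocation, 0 = never
  int iAllocCount = 0; // allocations attempted so far
  int iOomCount = 0;   // failures, injected or real, since the last reset
  };

static Injector gInjector;

void *InjectedAlloc( std::size_t aSize )
  {
  void *p = nullptr;
  const bool fail = gInjector.iFailCount > 0 &&
                    ++gInjector.iAllocCount % gInjector.iFailCount == 0;
  if ( !fail )
    {
    p = std::malloc( aSize );
    }
  if ( !p )
    {
    ++gInjector.iOomCount;
    }
  return p;
  }

// Hypothetical code under test: a failed cache allocation is silently
// ignored, so the caller still sees a success code
int DoTestSwallowingOom()
  {
  void *data = InjectedAlloc( 64 );
  if ( !data )
    {
    return -1; // visible failure
    }
  void *cache = InjectedAlloc( 256 );
  std::free( cache ); // freeing a null pointer is a no-op
  std::free( data );
  return 0;
  }

// Returns the first k for which the test succeeded with no OOM at all,
// or -1 if there was none up to aMaxK
int FindCleanRun( int aMaxK )
  {
  assert( aMaxK >= 1 );
  int found = -1;
  for ( int k = 1; k <= aMaxK && found < 0; ++k )
    {
    gInjector = Injector();
    gInjector.iFailCount = k;
    const int err = DoTestSwallowingOom();
    if ( err == 0 && gInjector.iOomCount == 0 )
      {
      found = k;
      }
    }
  gInjector = Injector(); // switch failure injection off again
  return found;
  }
```

Here k = 2 makes the second allocation fail while the function still returns 0; only the OOM counter reveals it, and the loop continues to k = 3.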

The OOM counter was also useful in a non-testing setup. I could use it to invalidate the results of any operation to make sure the system was not running in an inconsistent state.
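
One way to apply this outside testing is to wrap each operation so that its result is discarded whenever the counter moved. This is a sketch in portable C++ under stated assumptions: gOomCount stands for a counter maintained by the allocator wrappers, and QueryAndResetOomCount() and RunChecked() are hypothetical names.

```cpp
#include <cassert>

// Hypothetical counter, incremented by the allocator wrappers
// whenever an allocation fails
static int gOomCount = 0;

// Query and reset the OOM counter in one step
int QueryAndResetOomCount()
  {
  assert( gOomCount >= 0 );
  const int count = gOomCount;
  gOomCount = 0;
  return count;
  }

// Runs aOperation and stores its result in aResult only if no allocation
// failed while it ran. Returns false if the result was discarded.
template < typename TResult, typename TOperation >
bool RunChecked( TOperation aOperation, TResult &aResult )
  {
  QueryAndResetOomCount(); // forget failures from earlier operations
  TResult result = aOperation();
  if ( QueryAndResetOomCount() != 0 )
    {
    return false; // the result may come from an inconsistent state
    }
  aResult = result;
  return true;
  }
```

A caller would treat a false return like any other out-of-memory error, for example by retrying later or reporting the failure.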

A.4 Summary

In this appendix, I have described some techniques for testing for out-of-memory issues. I started with the well-established OOM loop technique with a heap failure tool and evolved it into a more sophisticated testing system that suited my needs when porting the Redland RDF libraries to Symbian OS.

The improved heap failure tool concept is not directly usable in every project – for example, the set of allocator functions that need to be instrumented in the failure tool may vary from project to project and adaptations are needed.

I exercised Redland with about 200 test functions which were run with failure count values ranging from 1 to 2000 or 10000, depending on test function complexity. On average, the number of test case and failure injection pattern combinations was in the order of hundreds of thousands. With these tests I detected and fixed literally hundreds of OOM-related errors in Redland libraries. Of course, I have submitted all bug fixes back to the Redland open source project to benefit the whole community.



[236] This appendix is edited and extended from blogs.forum.nokia.com/blog/lauri-aaltos-forum-nokia-blog/2008/11/12/fixing-out-of-memory-issues-in-redland-rdf-libraries.

[237] www.librdf.org.

[238] www.w3.org/RDF.

[239] John Pagonis writes extensively about the OOM loop construct in his Symbian Developer Network technical paper at developer.symbian.com/wiki/pages/viewpage.action?pageId=432.
