Controlling Memory-Mapped Regions

A memory-mapped region often requires its attributes to be queried or changed in some fashion. This section looks at four system calls designed for this purpose:

mprotect(2) Change the access protection of the indicated memory pages.
madvise(2) Advise the UNIX kernel how you intend to use your memory region.
mincore(2) Determine which pages of mapped memory are currently resident in physical memory.
msync(2) Where modifications exist, indicate which regions of memory should be written back to the mapped files.

Changing the Access Protection

A memory-mapped region, entirely or in part, may have its access protections changed by the mprotect(2) system call. Its function synopsis is as follows:

#include <sys/types.h>
#include <sys/mman.h>

int mprotect(const void *addr, size_t len, int prot);

The function mprotect(2) allows the application to change the region starting at address addr for a length of len bytes, so as to use the protection specified by the argument prot. The prot flags permitted are

PROT_NONE Region grants no access (this flag excludes use of the other flags).
PROT_READ Region grants read access.
PROT_WRITE Region grants write access.
PROT_EXEC Program instructions may be executed in the memory-mapped region.

The function mprotect(2) returns the value 0 when successful. Otherwise, -1 is returned, and the error code is found in errno.

Warning

Not all UNIX implementations permit the caller to change memory region protection on a page-by-page basis. For maximum portability, the entire memory region should be specified.
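
As a minimal sketch of the call described above, the following maps a single anonymous page, copies some text into it, and then makes the whole mapping read-only. The helper name make_readonly_text() is hypothetical and is not part of messages.c. The sketch also assumes that MAP_ANONYMOUS (or MAP_ANON) is available, which is common but is not guaranteed on every platform.

```c
#define _DEFAULT_SOURCE         /* assumption: glibc; exposes MAP_ANONYMOUS */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* older BSD spelling */
#endif

/*
 * Hypothetical helper: map one anonymous page, store text in it, then
 * make the entire mapping read-only. Returns the mapping (its length is
 * passed back through maplen), or NULL on error.
 */
char *
make_readonly_text(const char *text, size_t *maplen) {
    size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
    char *p;

    assert(text != NULL && maplen != NULL);

    if ( strlen(text) >= pagesz )
        return NULL;            /* keep the sketch to a single page */

    p = mmap(NULL, pagesz, PROT_READ|PROT_WRITE,
             MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if ( p == MAP_FAILED )
        return NULL;

    strcpy(p, text);

    /* Protect the entire region rather than selected pages */
    if ( mprotect(p, pagesz, PROT_READ) == -1 ) {
        perror("mprotect(2)");
        munmap(p, pagesz);
        return NULL;
    }

    *maplen = pagesz;
    return p;
}
```

The caller may read the returned text freely. To modify it again, the protection must first be restored with a second mprotect(2) call that specifies PROT_READ|PROT_WRITE.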


The messages.c program was modified to call mprotect(2) in the file mprotect.c. The changes made to the program are shown in the context diff(1) form in Listing 26.3.

Code Listing 26.3. mprotect.c—Changes to messages.c to Make Message Text Read-Only
$ diff -c messages.c mprotect.c
*** messages.c  Sun Jul  9 18:11:00 2000
--- mprotect.c  Sun Jul  9 18:59:19 2000
***************
*** 1,4 ****
! /* messages.c */

  #include <stdio.h>
  #include <unistd.h>
--- 1,4 ----
! /* mprotect.c */

  #include <stdio.h>
  #include <unistd.h>
***************
*** 103,108 ****
--- 103,114 ----
       * Now parse the messages :
       */
      parse_messages();
+
+     /*
+      * Make the message text read only now :
+      */
+     if ( mprotect(msgs,msgs_len+1,PROT_READ) )
+         fprintf(stderr,"%s: mprotect(PROT_READ)\n",strerror(errno));
  }

  /*
$

The mprotect(2) call follows the parse_messages() function call in Listing 26.3. Once parsing is complete, the message text no longer needs to change, so making it read-only prevents buggy code from altering it. If an attempt is made to change the message text, a signal is raised instead: SIGBUS on some platforms and SIGSEGV on others.
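
The effect of the read-only protection can be observed with a small probe. The sketch below is not part of mprotect.c: the function write_fault_signal() and its handler are hypothetical names, and the sketch assumes MAP_ANONYMOUS (or MAP_ANON) is available. It installs handlers for both SIGSEGV and SIGBUS, because the signal delivered for a protection violation differs between platforms, and it uses sigsetjmp(3) to recover from the faulting store.

```c
#define _DEFAULT_SOURCE         /* assumption: glibc; exposes MAP_ANONYMOUS */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* older BSD spelling */
#endif

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_signo;

static void
on_fault(int signo) {
    fault_signo = signo;
    siglongjmp(fault_env, 1);   /* abandon the faulting store */
}

/*
 * Hypothetical probe: store to a read-only page and return the number
 * of the signal raised. Returns 0 if the store unexpectedly succeeded,
 * or -1 if the page could not be set up.
 */
int
write_fault_signal(void) {
    size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
    struct sigaction sa, old_segv, old_bus;
    volatile char *page;
    int result = 0;

    page = mmap(NULL, pagesz, PROT_READ|PROT_WRITE,
                MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if ( page == MAP_FAILED )
        return -1;
    page[0] = 'A';

    if ( mprotect((void *) page, pagesz, PROT_READ) == -1 ) {
        munmap((void *) page, pagesz);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    fault_signo = 0;
    if ( sigsetjmp(fault_env, 1) == 0 )
        page[0] = 'X';          /* expected to fault */
    else
        result = fault_signo;   /* arrived here from the handler */

    /* Restore the previous signal dispositions */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);

    if ( result != 0 )
        assert(page[0] == 'A'); /* the protected text was not altered */

    munmap((void *) page, pagesz);
    return result;
}
```

A real application would rarely recover this way. The usual outcome of such a fault is that the process terminates, which is what makes the read-only protection useful for catching stray writes during debugging.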

Advising the Kernel About Memory Use

To achieve maximum performance, you may find it desirable for your application to inform the UNIX kernel about the status of a memory region or about its usage patterns. The system call madvise(2) permits this to be accomplished:

#include <sys/types.h>
#include <sys/mman.h>

int madvise(void *addr, size_t len, int behavior);

The madvise(2) function returns 0 when successful. The value -1 is returned when the call fails, leaving the error code in the variable errno.

The madvise(2) system call allows you to hint to the kernel about the memory region starting at addr for a length of len bytes. The behavior is specified by one of the following values:

MADV_NORMAL Normal behavior; no special treatment is required.
MADV_RANDOM Expect memory pages to be referenced at random. Sequential prefetching is to be discouraged.
MADV_SEQUENTIAL Expect memory pages to be referenced sequentially. This encourages prefetching and decreases the priority of previously fetched pages.
MADV_WILLNEED Indicates a range of memory pages that should temporarily have a higher priority, since they will be needed.
MADV_DONTNEED Indicates a range of memory pages that are no longer required (their priority is reduced). It is likely that future references to these pages will incur a page fault.
MADV_FREE Indicates that the modifications in the memory pages indicated do not need to be saved. Furthermore, this permits the kernel to release the physical memory pages used. The next time the page is referenced, it may be zeroed, or it may still contain the original data.

In addition to these, some platforms support the following behavior:

MADV_SPACEAVAIL Ensures that the necessary resources are reserved.

At the time of writing, Linux and UnixWare 7 did not support the madvise(2) function at all. Table 26.2 provides a cross-reference grid of supported behaviors.

Table 26.2. A Cross-Reference Guide to madvise(2) Behavior Support on Different Platforms
(X = supported, - = not supported)

madvise(2) Behavior  FreeBSD  SGI IRIX 6.5  HPUX 11  UnixWare 7  Solaris 8  IBM AIX 4.3  Linux
MADV_NORMAL          X        -             X        -           X          X            -
MADV_RANDOM          X        -             X        -           X          X            -
MADV_SEQUENTIAL      X        -             X        -           X          X            -
MADV_WILLNEED        X        -             X        -           X          X            -
MADV_DONTNEED        X        X             -        -           X          X            -
MADV_FREE            X        -             -        -           X          -            -
MADV_SPACEAVAIL      -        -             X        -           -          X            -

Listing 26.4 shows a context diff(1) listing, illustrating the changes between mprotect.c and madvise.c. In madvise.c, calls to madvise(2) have been added.

Code Listing 26.4. madvise.c—Changes Made to mprotect.c to Indicate Access Behavior Patterns to the Kernel
*** mprotect.c    Sun Jul  9 18:59:19 2000
--- madvise.c     Sun Jul  9 19:40:33 2000
***************
*** 1,4 ****
! /* mprotect.c */

  #include <stdio.h>
  #include <unistd.h>
--- 1,4 ----
! /* madvise.c */

  #include <stdio.h>
  #include <unistd.h>
***************
*** 100,105 ****
--- 100,111 ----
      close(fd);      /* no longer require file to be open */

      /*
+      * Advise kernel of sequential behavior :
+      */
+     if ( madvise(msgs,msgs_len+1,MADV_SEQUENTIAL) )
+         fprintf(stderr,"%s: madvise(MADV_SEQUENTIAL)\n",strerror(errno));
+
+     /*
       * Now parse the messages :
       */
      parse_messages();
***************
*** 109,114 ****
--- 115,126 ----
       */
      if ( mprotect(msgs,msgs_len+1,PROT_READ) )
          fprintf(stderr,"%s: mprotect(PROT_READ)\n",strerror(errno));
+
+     /*
+      * Advise kernel of random behavior :
+      */
+     if ( madvise(msgs,msgs_len+1,MADV_RANDOM) )
+         fprintf(stderr,"%s: madvise(MADV_RANDOM)\n",strerror(errno));
  }

  /*

The first madvise(2) call occurs before the error message file is parsed, to indicate sequential access with MADV_SEQUENTIAL. Recall that the parsing of the messages is sequential from the start to the end of the mapped message file.

Once the messages have been parsed, however, the access pattern changes to that of a random nature, since any error message may be called upon demand. Hence, the second call to madvise(2) selects behavior MADV_RANDOM.
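
The same two-phase pattern can be sketched in a standalone form. The following is an illustration only and assumes a platform that provides madvise(2) and MAP_ANONYMOUS (or MAP_ANON). The function advise_scan() is a hypothetical name: it advises sequential access before a linear scan of an anonymous mapping, then switches the advice to random access. Because the call is only a hint, a failure is reported and the work carries on.

```c
#define _DEFAULT_SOURCE         /* assumption: glibc; exposes madvise(2) */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* older BSD spelling */
#endif

/*
 * Hypothetical helper: map npages anonymous pages filled with the byte
 * value 1, scan them sequentially, and return the sum of all bytes.
 * Returns -1 if the mapping could not be created.
 */
long
advise_scan(size_t npages) {
    size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
    size_t len = npages * pagesz;
    unsigned char *region;
    long sum = 0;
    size_t i;

    assert(npages > 0);

    region = mmap(NULL, len, PROT_READ|PROT_WRITE,
                  MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if ( region == MAP_FAILED )
        return -1;
    memset(region, 1, len);

    /* Phase 1: a start-to-end scan, so advise sequential access */
    if ( madvise(region, len, MADV_SEQUENTIAL) == -1 )
        perror("madvise(MADV_SEQUENTIAL)");

    for ( i = 0; i < len; ++i )
        sum += region[i];

    /* Phase 2: later lookups may touch any page, so advise random */
    if ( madvise(region, len, MADV_RANDOM) == -1 )
        perror("madvise(MADV_RANDOM)");

    munmap(region, len);
    return sum;
}
```

Whether the advice changes anything measurable depends on the kernel and on how the region is backed. It matters most for large file-backed mappings, where prefetching decisions affect disk I/O.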

Querying Pages in Memory

It is possible to query the kernel to determine which memory pages are currently in memory. This is accomplished by the mincore(2) system call, and its synopsis is as follows:

#include <sys/types.h>
#include <sys/mman.h>

int mincore(const void *addr, size_t len, char *vec);

The mincore(2) function accepts a starting address addr and a length of len bytes. The residency of every page within this range is reported in the character array vec, one byte per page. Each byte receives 1 if the page is in memory or 0 if it is not. The array must therefore hold at least as many bytes as there are pages in the range: len divided by the page size returned by getpagesize(3), rounded up.

When successful, the value 0 is returned by mincore(2). Otherwise, -1 is returned, and the error is found in the variable errno.

The following shows a call to mincore(2):

char vec[32];                  /* Reports for up to 32 pages */
if ( mincore(addr,len,&vec[0]) == -1 )
    perror("mincore(2)");      /* Report error */
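
When the region length is not known in advance, a fixed-size array is not enough. The sketch below sizes vec from the page size instead. The function names count_resident() and resident_after_touch() are hypothetical, and the sketch assumes that addr is page-aligned and that MAP_ANONYMOUS (or MAP_ANON) is available. The vec argument is passed through a void pointer because its declared type differs between platforms (char * on some, unsigned char * on others), and only the low bit of each byte is tested.

```c
#define _DEFAULT_SOURCE         /* assumption: glibc; exposes mincore(2) */
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* older BSD spelling */
#endif

/*
 * Hypothetical helper: count the resident pages in the range starting
 * at addr (assumed page-aligned) for len bytes. Returns -1 on error.
 */
long
count_resident(void *addr, size_t len) {
    size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
    size_t npages = (len + pagesz - 1) / pagesz;    /* round up */
    unsigned char *vec;
    long resident = 0;
    size_t i;

    assert(addr != NULL && len > 0);

    if ( (vec = malloc(npages)) == NULL )
        return -1;

    if ( mincore(addr, len, (void *) vec) == -1 ) {
        free(vec);
        return -1;
    }

    for ( i = 0; i < npages; ++i )
        if ( vec[i] & 1 )       /* low bit set: page is in memory */
            ++resident;

    free(vec);
    return resident;
}

/*
 * Usage example: map npages anonymous pages, write to each one so that
 * it is brought into memory, then report how many are resident.
 */
long
resident_after_touch(size_t npages) {
    size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
    size_t len = npages * pagesz;
    char *region;
    long resident;
    size_t i;

    region = mmap(NULL, len, PROT_READ|PROT_WRITE,
                  MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if ( region == MAP_FAILED )
        return -1;

    for ( i = 0; i < npages; ++i )
        region[i * pagesz] = 1;

    resident = count_resident(region, len);
    munmap(region, len);
    return resident;
}
```

The report is only a snapshot. The kernel may page memory in or out immediately after mincore(2) returns, so the result is best treated as a hint.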

Table 26.3 shows that support for mincore(2) is not available on many platforms. Also, note that the argument addr is type caddr_t on non-BSD platforms.

Table 26.3. A Cross-Reference Chart for mincore(2) Support on Different Platforms
(X = supported, - = not supported)

mincore(2) Support  FreeBSD  SGI IRIX 6.5  HPUX 11  UnixWare 7  Solaris 8  IBM AIX 4.3  Linux
mincore(2)          X        -             -        X           X          X            -
const void *addr    X        -             -        -           -          -            -
caddr_t addr        -        -             -        X           X          X            -

Synchronizing Changes

When changes are made to writable mapped regions of memory, there are various timing choices for recording changes into the file. The msync(2) system call provides a degree of control over this choice. Its function synopsis is as follows:

#include <sys/types.h>
#include <sys/mman.h>

int msync(void *addr, size_t len, int flags);

The msync(2) call affects the region starting at addr for a length of len bytes. When len is 0, all of the pages of the region are affected. Argument flags determines what synchronization choice is to take effect:

MS_ASYNC Request all changes to be written out, but return immediately. (Not implemented for FreeBSD release 3.4.)
MS_SYNC Perform synchronous writes of all outstanding changes.
MS_INVALIDATE Immediately invalidate all cached modifications to pages. Future references to these pages require the pages to be fetched from the file.

The MS_SYNC flag is similar to calling fsync(2) on an open file descriptor. It forces all changes out to the disk media and returns once this has been accomplished. The MS_INVALIDATE flag allows the application to discard all changes that have been made. This saves the kernel from synchronizing the memory region with the file.

The function msync(2) returns 0 when successful. Otherwise, -1 is returned with the error code deposited in errno. The following shows an example of an msync(2) call that causes all changes to be written to the file before the call returns:

if ( msync(addr,0,MS_SYNC) == -1 )
    perror("msync(2)");
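
A fuller round trip can be sketched as follows. This is an illustration under stated assumptions: the function sync_roundtrip() is a hypothetical name, the scratch file is assumed to be creatable under /tmp, and the full mapping length is passed to msync(2) instead of 0, since the treatment of a zero length is not the same on every platform. The function writes text through a MAP_SHARED mapping, flushes it with MS_SYNC, and then reads the file back through the descriptor.

```c
#define _DEFAULT_SOURCE         /* assumption: glibc; exposes mkstemp, pread */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: store text in a shared file mapping, flush it
 * with msync(MS_SYNC), and compare against the file's contents.
 * Returns 0 if the file holds the text, or -1 otherwise.
 */
int
sync_roundtrip(const char *text) {
    char path[] = "/tmp/msync-XXXXXX";  /* assumed scratch location */
    size_t len;
    char *map, *buf;
    int fd, rc = -1;

    assert(text != NULL && text[0] != '\0');
    len = strlen(text);

    if ( (fd = mkstemp(path)) == -1 )
        return -1;
    unlink(path);               /* file persists until fd is closed */

    /* The file must be as long as the region to be mapped */
    if ( ftruncate(fd, (off_t) len) == -1 ) {
        close(fd);
        return -1;
    }

    map = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if ( map == MAP_FAILED ) {
        close(fd);
        return -1;
    }

    memcpy(map, text, len);     /* modify the mapped region */

    if ( msync(map, len, MS_SYNC) == -1 ) {
        perror("msync(2)");
    } else if ( (buf = malloc(len)) != NULL ) {
        /* Read the file back through the descriptor */
        if ( pread(fd, buf, len, 0) == (ssize_t) len
          && memcmp(buf, text, len) == 0 )
            rc = 0;
        free(buf);
    }

    munmap(map, len);
    close(fd);
    return rc;
}
```

On systems with a unified buffer cache, the read may see the new contents even without the msync(2) call. The call matters when the changes must reach the disk media by a known point, so the comparison here only confirms the file contents and does not prove that the data is on disk.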

Table 26.4 shows the support available for msync(2) on the different platforms.

Table 26.4. A Cross-Reference Chart of msync(2) Support on Different Platforms
(X = supported, - = not supported)

msync(2) Support  FreeBSD  SGI IRIX 6.5  HPUX 11  UnixWare 7  Solaris 8  IBM AIX 4.3  Linux
MS_ASYNC          -        X             X        X           X          X            X
MS_SYNC           X        X             X        X           X          X            X
MS_INVALIDATE     X        X             X        X           X          X            X
void *addr        X        X             X        -           X          X            -
const void *addr  -        -             -        -           -          -            X
caddr_t addr      -        -             -        X           -          -            -
