11.1. Files in Solaris

Generically defined, a file is an entity that stores data as an array of bytes, beginning at byte zero and extending to the end of the file. The contents of the file (the data) can take any number of forms: a simple text file, a binary executable file, a directory file, etc. Solaris supports many types of files, several of which are defined at the kernel level, meaning that some component of the kernel has intimate knowledge of the file's format by virtue of the file type. An example is a directory file on a UFS file system—directory files have a specific format that is known to the UFS kernel routines designed for directory I/O.

The number of file types in the kernel has increased over the last several years with the addition of new kernel abstractions in the form of pseudofiles. Pseudofiles provide a means by which the kernel can abstract a binary object, such as a data structure in memory, as a file. Users and programmers view the object as a file, in that the traditional file I/O operations are (for the most part) supported on it. It's a pseudofile because it is not an on-disk file; it's not a real file in the traditional sense.

Under the covers, the operations performed on the object are managed by the file system on which the file resides. A specific file type often belongs to an underlying file system that manages the storage and retrieval of the file and defines the kernel functions for I/O and control operations on the file. (See Chapter 14, “The UNIX File System”, for details about file systems.) Table 11-1 lists the various types of files implemented in Solaris.

Table 11-1. Solaris File Types
File Type | File System | Character Designation | Description
Regular | UFS | - | A traditional on-disk file. Can be a text file, binary shared object, or executable file.
Directory | UFS | d | A file that stores the names of other files and directories. Other file systems can implement directories within their own file hierarchy.
Symbolic Link | UFS | l | A file that represents a link to another file, potentially in another directory or on another file system.
Character Special | specfs | c | A device special file for devices capable of character mode I/O. Device files represent I/O devices on the system and provide a means of indexing into the device driver and uniquely identifying a specific device.
Block Special | specfs | b | As above, a device special file for devices capable of block-mode I/O, such as disk and tape devices.
Named Pipe (FIFO) | fifofs | p | A file that provides a bidirectional communication path between processes running on the same system.
Door | doorfs | D | Part of the door interprocess communication facility. Doors provide a means of doing very fast interprocess procedure calling and message and data passing.
Socket | sockfs | s | A communication endpoint for network I/O, typically used for TCP or UDP connections between processes on different systems. UNIX domain sockets are also supported for interprocess communication between processes on the same system. The “s” character designation appears only for AF_UNIX sockets.

The character designation column in Table 11-1 refers to the character produced in the left-hand column of ls -l output. When a long file listing is executed, a single character designates the type of each file in the listing.

Within a process, a file is identified by a file descriptor: an integer value returned to the process by the kernel when a file is opened. An exception is made if the standard I/O interfaces are used. In that case, the file is represented in the process as a pointer to a FILE structure, and the file descriptor is embedded in the FILE structure. The file descriptor references an array of per-process file entry (uf_entry) structures, which form the list of open files within the process. These per-process file entries link to a file structure, which is a kernel structure that maintains specific status information about the file on behalf of the process that has the file opened. If a specific file is opened by multiple processes, the kernel maintains a file structure for each process; that is, the same file may have multiple file structures referencing it. The primary reason for this behavior is to maintain a per-process read/write file pointer for the file, since different processes may be reading different segments of the same file.
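To illustrate the distinction, the short program below opens the same file both ways; fileno(3S) exposes the file descriptor embedded in the FILE structure. The file name used is arbitrary, chosen only for illustration.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
        int fd;
        FILE *fp;

        /* open(2) returns the file descriptor directly */
        fd = open("/etc/passwd", O_RDONLY);

        /* fopen(3S) returns a FILE pointer; the descriptor is inside it */
        fp = fopen("/etc/passwd", "r");

        if (fd == -1 || fp == NULL)
                return (1);

        (void) printf("open(2) fd: %d, descriptor inside FILE: %d\n",
            fd, fileno(fp));

        (void) close(fd);
        (void) fclose(fp);
        return (0);
}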

The kernel implements a virtual file abstraction in the form of a vnode, where every opened file in Solaris is represented by a vnode in the kernel. A given file has but one vnode that represents it in the kernel, regardless of the number of processes that have the file opened. The vnode implementation is discussed in detail in “The vnode”. In this discussion, we allude to the vnode and other file-specific structures as needed for clarity.

Beyond the vnode virtual file abstraction, a file-type-specific structure describes the file. The structure is implemented as part of the file system on which the file resides. For example, files on the default UNIX File System (UFS) are described by an inode that is linked to the v_data pointer of the vnode.

Figure 11.1 illustrates the relationships of the various file-related components, providing a path from the file descriptor to the actual file. The figure shows how a file is viewed at various levels. Within a process, a file is referenced as a file descriptor. The file descriptor indexes the per-process u_flist array of uf_entry structures, which link to the kernel file structure. The file is abstracted in the kernel as a virtual file through the vnode, which links to the file-specific structures (based on the file type) through the v_data pointer in the vnode.

Figure 11.1. File-Related Structures


The process-level uf_entry structures are allocated dynamically in groups of 24 as files are opened, up to the per-process open file limit. The uf_entry structure contains a pointer to the file structure (uf_ofile) and a uf_pofile flag field used by the kernel to maintain file state information. The possible flags are FRESERVED, to indicate that the slot has been allocated, FCLOSING, to indicate that a file-close is in progress, and FCLOSEXEC, a user-settable close-on-exec flag, which instructs the kernel to close the file descriptor if an exec(2) call is executed. uf_entry also maintains a reference count in the uf_refcnt member. This count provides a means of tracking multiple references to the file in multithreaded processes.
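In simplified form, the per-process open file list can be pictured as the sketch below. It is based only on the members named above (uf_ofile, uf_pofile, uf_refcnt) and is not a verbatim copy of the kernel header; the actual declarations differ in detail across Solaris releases.

/*
 * Simplified sketch of a per-process open file list entry.
 * Not the actual kernel declaration; see the system headers
 * for the definitive definition.
 */
struct uf_entry {
        struct file     *uf_ofile;      /* pointer to the kernel file structure */
        short           uf_pofile;      /* flags: FRESERVED, FCLOSING, FCLOSEXEC */
        short           uf_refcnt;      /* references to the file within the process */
};

/*
 * A file descriptor is simply an index into the process's
 * array (u_flist) of uf_entry structures.
 */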

The kernel establishes a default hard and soft limit for the number of files a process can have opened at any time. rlim_fd_max is the hard limit, and rlim_fd_cur is the current limit (or soft limit). A process can have up to rlim_fd_cur file descriptors and can increase the number up to rlim_fd_max. You can set these parameters systemwide by placing entries in the /etc/system file:

set rlim_fd_max=8192
set rlim_fd_cur=1024

You can alter the per-process limits either directly from the command line with the limit(1) or ulimit(1) shell commands or programmatically with setrlimit(2). The actual number of open files that a process can maintain is driven largely by the file APIs used. For 32-bit systems, if the stdio(3S) interfaces are used, the limit is 256 open files. This limit results from the data type used in the FILE structure for the actual file descriptor: an unsigned 8-bit data type, which has a range of values of 0-255. Thus, the maximum number of file descriptors is limited to 256 for 32-bit stdio(3S)-based programs. For 64-bit systems (and 64-bit processes), the stdio(3S) limit is 64K (65536) open files.
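For example, a process can raise its own soft limit as far as the hard limit with getrlimit(2) and setrlimit(2). The fragment below is a minimal sketch; it assumes the hard limit is already large enough for the application's needs.

#include <sys/resource.h>
#include <stdio.h>

int
main(void)
{
        struct rlimit rl;

        /* fetch the current soft (rlim_cur) and hard (rlim_max) limits */
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
                return (1);

        /* raise the soft limit to the hard limit */
        rl.rlim_cur = rl.rlim_max;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
                return (1);

        (void) printf("open file limit is now %ld\n", (long)rl.rlim_cur);
        return (0);
}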

The select(3C) interface, which provides a mechanism for file polling, imposes another API limit. select(3C) limits the number of open files to 1024 on 32-bit systems, with the exception of 32-bit Solaris 7. In 32-bit Solaris 7, select(3C) can poll up to 64K (65536) file descriptors. If you use file descriptors greater than 1024 with select(3C) on 32-bit Solaris 7, then you must define FD_SETSIZE in the program code. On 64-bit Solaris 7, a 64-bit process has a default file descriptor set size (FD_SETSIZE) of 64K (65536). Table 11-2 summarizes file descriptor limitations.

Table 11-2. File Descriptor Limits
Interface (API) | Limit | Notes
stdio(3S) | 256 | All 32-bit systems.
stdio(3S) | 64K (65536) | 64-bit programs only (Solaris 7 and later).
select(3C) | 1K (1024) | All 32-bit systems. Default value for 32-bit Solaris 7.
select(3C) | 64K (65536) | Attainable value on 32-bit Solaris 7. Requires adding #define FD_SETSIZE 65536 to the program code before inclusion of the system header files.
select(3C) | 64K (65536) | Default for 64-bit Solaris 7 (and beyond).
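
The fragment below illustrates the FD_SETSIZE override described in Table 11-2 for a 32-bit Solaris 7 program. The define must appear before the system header files are included so that the larger fd_set declaration is compiled in; the wait_for_input() routine is simply an illustrative wrapper around select(3C).

/*
 * 32-bit Solaris 7 only: override the default FD_SETSIZE of 1024.
 * This must precede the inclusion of the system header files.
 */
#define FD_SETSIZE      65536

#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

int
wait_for_input(int fd)
{
        fd_set readfds;

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        /* block until fd is readable; fd may now be as large as 65535 */
        return (select(fd + 1, &readfds, NULL, NULL, NULL));
}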

Those limitations aside, there remain only the practical limits that govern the number of files that can be opened on a per-process and systemwide basis. A practical limit from a per-process perspective really comes down to two things: how the application software is designed, and what constitutes a manageable number of file descriptors within a single process, such that the maintenance, performance, portability, and availability requirements of the software can be met. The file descriptors and uf_entry structures do not require a significant amount of memory space, even in large numbers, so per-process address space limitations are typically not an issue when it comes to the number of open files.

11.1.1. Kernel File Structures

The Solaris kernel does not implement a system file table in the traditional sense. That is, the systemwide list of file structures is not maintained in an array or as a linked list. A kernel object cache segment is allocated to hold file structures, and they are simply allocated and linked to the process and vnode as files are created and opened.

We can see in Figure 11.1 that each process uses file descriptors to reference a file. The file descriptors ultimately link to the kernel file structure, defined as a file_t data type, shown below.

Example. Header File <sys/file.h>
typedef struct file {
        kmutex_t         f_tlock;         /* short-term lock */
        ushort_t         f_flag;
        ushort_t         f_pad;           /* Explicit pad to 4-byte boundary */
        struct vnode     *f_vnode;        /* pointer to vnode structure */
        offset_t         f_offset;        /* read/write character pointer */
        struct cred      *f_cred;         /* credentials of user who opened it */
        caddr_t          f_audit_data;    /* file audit data */
        int              f_count;         /* reference count */
} file_t;

The fields maintained in the file structure are, for the most part, self-explanatory. The f_tlock kernel mutex lock protects the various structure members. These include the f_count reference count, which lists how many threads have the file opened, and the f_flag file flags, described in “File Open Modes and File Descriptor Flags”.

Solaris allocates file structures for opened files as needed, growing the open file count dynamically to meet the requirements of the system load. Therefore, the maximum number of files that can be opened systemwide at any time is limited by available kernel address space, and nothing more. The actual size to which the kernel can grow depends on the hardware architecture of the system and the Solaris version the system is running. The key point is that a fixed kernel limit on a maximum number of file structures does not exist.

The system initializes space for file structures during startup by calling file_cache(), a routine in the kernel memory allocator code that creates a kernel object cache. The initial allocation simply sets up the file_cache pointer with space for one file structure. However, the kernel will have allocated several file structures by the time the system has completed the boot process and is available for users, as all of the system processes that get started have some opened files. As files are opened/created, the system either reuses a freed cache object for the file entry or creates a new one if needed. You can use /etc/crash as root to examine the file structures.

# crash
dumpfile = /dev/mem, namelist = /dev/ksyms, outfile = stdout
> file
ADDRESS     RCNT    TYPE/ADDR            OFFSET   FLAGS
3000009e008   1    FIFO/300009027e0          0   read write
3000009e040   1    UFS /3000117dc68        535   write appen
3000009e078   1    SPEC/300008ed698       3216   write appen
3000009e0b0   1    UFS /300010d8c98          0   write
3000009e0e8   1    UFS /30001047ca0          4   read write
3000009e120   2    DOOR/30000929348          0   read write
3000009e158   1    SPEC/30000fb45d0          0   read
3000009e1c8   1    UFS /300014c6c98        106   read write
3000009e200   1    SPEC/30000c376a0          0   write
3000009e238   2    DOOR/30000929298          0   read write
3000009e270   3    SPEC/300008ecf18          0   read
3000009e2a8   1    UFS /30000f5e0f0          0   read
3000009e2e0   1    SPEC/30000fb46c0          0   read write
3000009e318   1    UFS /300001f9dd0          0   read
3000009e350   1    FIFO/30000902c80          0   read write

The ADDRESS column is the kernel virtual memory address of the file structure. RCNT is the reference count field (f_count). TYPE is the type of file, and ADDR is the kernel virtual address of the vnode. OFFSET is the current file pointer, and FLAGS are the flags bits currently set for the file.

You can use sar(1M) for a quick look at how many files are opened systemwide.

# sar -v 3 3

SunOS devhome 5.7 Generic sun4u    08/01/99

11:38:09  proc-sz    ov  inod-sz      ov  file-sz    ov   lock-sz
11:38:12  100/5930    0 37181/37181    0  603/603     0    0/0
11:38:15  100/5930    0 37181/37181    0  603/603     0    0/0
11:38:18  101/5930    0 37181/37181    0  607/607     0    0/0

This example shows 603 opened files. The format of the sar output is a holdover from the early days of static tables, which is why it is displayed as 603/603. Originally, the value on the left represented the current number of occupied table slots, and the value on the right represented the maximum number of slots. Since file structure allocation is completely dynamic in nature, both values will always be the same.

For a specific process, you can use the pfiles(1) command to create a list of all the files opened.

$ pfiles 585
585:    /space1/framemaker,v5.5.3/bin/sunxm.s5.sparc/maker -xrm *iconX:0 -xrm
  Current rlimit: 64 file descriptors
   0: S_IFCHR mode:0666 dev:32,24 ino:143523 uid:0 gid:3 rdev:13,2
      O_RDONLY|O_LARGEFILE
   1: S_IFCHR mode:0666 dev:32,24 ino:143523 uid:0 gid:3 rdev:13,2
      O_WRONLY|O_APPEND|O_LARGEFILE
   2: S_IFCHR mode:0666 dev:32,24 ino:143523 uid:0 gid:3 rdev:13,2
      O_WRONLY|O_APPEND|O_LARGEFILE
   3: S_IFIFO mode:0666 dev:176,0 ino:4132162568 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   4: S_IFDOOR mode:0444 dev:176,0 ino:4127633624 uid:0 gid:0 size:0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[202]
   5: S_IFREG mode:0644 dev:32,9 ino:3643 uid:19821 gid:10 size:297984
      O_RDONLY FD_CLOEXEC
   6: S_IFREG mode:0644 dev:32,8 ino:566 uid:19821 gid:10 size:29696
      O_RDWR FD_CLOEXEC
   7: S_IFREG mode:0644 dev:32,8 ino:612 uid:19821 gid:10 size:0
      O_RDWR FD_CLOEXEC
   8: S_IFREG mode:0644 dev:32,8 ino:666 uid:19821 gid:10 size:0
      O_RDWR FD_CLOEXEC
   9: S_IFCHR mode:0000 dev:32,24 ino:360 uid:0 gid:0 rdev:41,104
      O_RDWR FD_CLOEXEC
  10: S_IFREG mode:0644 dev:32,9 ino:38607 uid:19821 gid:10 size:65083
      O_RDONLY
  11: S_IFREG mode:0644 dev:32,8 ino:667 uid:19821 gid:10 size:4096
      O_RDWR

In the preceding example, the pfiles command is executed on PID 585. The PID and process name are dumped, followed by a listing of the process's opened files. For each file, we see a listing of the file descriptor (the number to the left of the colon), the file type, file mode bits, the device from which the file originated, the inode number, file UID and GID, and the file size.
