13.5. Path-Name Management

All but a few of the vnode methods operate on vnode pointers, rather than on path names or file descriptors. Before calling file system vnode methods, the vnode framework first converts path names and file descriptors into vnode references. File descriptors can be translated directly into vnodes for the files they reference, whereas path names must be converted into vnodes by looking up each path-name component and obtaining a reference to the underlying file. The file-system-independent lookuppn() function converts path names to vnodes. An additional wrapper, lookupname(), converts path names passed in from user-mode system calls.

13.5.1. The lookupname() and lookuppn() Methods

Given a path name, lookuppn() attempts to return a pointer to the vnode the path represents. If the file is already opened, a new reference to the file is established; if not, the file is first opened. lookuppn() decomposes the path name into its components, which are separated by “/”, and calls the file-system-specific vop_lookup() method for each component.

If the path name begins with a “/”, path-name traversal starts at the user's root directory. Otherwise, it starts at the vnode pointed to by the user's current directory. lookuppn() traverses the path one component at a time, using the vop_lookup() vnode method. vop_lookup() takes a directory vnode and a component as arguments and returns a vnode representing that component.
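The traversal loop can be sketched in user-space C. This is a minimal sketch under stated assumptions, not the kernel's lookuppn(): the tvnode type, its fixed child array, and the helpers toy_lookup() (standing in for vop_lookup()) and toy_lookuppn() are hypothetical, and mount points, "..", and symbolic links are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy vnode: a directory holds up to 8 named children. */
struct tvnode {
        const char      *name;
        struct tvnode   *child[8];
        int             nchild;
};

/* Hypothetical stand-in for vop_lookup(): search one directory. */
static struct tvnode *
toy_lookup(struct tvnode *dvp, const char *comp, size_t len)
{
        assert(dvp != NULL);
        for (int i = 0; i < dvp->nchild; i++) {
                const char *n = dvp->child[i]->name;

                if (strlen(n) == len && strncmp(n, comp, len) == 0)
                        return (dvp->child[i]);
        }
        return (NULL);          /* the kernel would report ENOENT */
}

/*
 * Hypothetical toy_lookuppn(): an absolute path starts at rootdir, a
 * relative one at cwd; each component is resolved with toy_lookup().
 */
static struct tvnode *
toy_lookuppn(const char *path, struct tvnode *rootdir, struct tvnode *cwd)
{
        struct tvnode *vp = (*path == '/') ? rootdir : cwd;

        while (vp != NULL) {
                while (*path == '/')    /* skip consecutive slashes */
                        path++;
                if (*path == '\0')
                        break;          /* path name exhausted */
                size_t len = strcspn(path, "/");  /* next component */
                vp = toy_lookup(vp, path, len);
                path += len;
        }
        return (vp);
}
```

Given a root vnode whose child etc contains passwd, both toy_lookuppn("/etc/passwd", root, cwd) and toy_lookuppn("passwd", root, etc) resolve to the same vnode, and a missing component yields NULL.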

If a directory vnode has v_vfsmountedhere set, then it is a mount point. If lookuppn() encounters a mount point while going down the file system tree, then it follows the vnode's v_vfsmountedhere pointer to the mounted file system and calls the vfs_root() method to obtain the root vnode for the file system. Path-name traversal then continues from this point.

If lookuppn() encounters a root vnode (VROOT flag in v_flag set) when following “..”, then lookuppn() follows the vfs_vnodecovered pointer in the vnode's associated vfs to obtain the covered vnode.
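The two mount-point crossings can be sketched with toy structures. The tvnode and tvfs types, the parent field (standing in for a ".." lookup), and vfs_rootvp (standing in for the result of vfs_root()) are hypothetical; only v_vfsmountedhere, VROOT, and vfs_vnodecovered come from the text, and the flag value chosen for VROOT here is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define VROOT   0x01            /* flag value chosen for this sketch */

struct tvfs;

/* Hypothetical toy vnode with only the fields this sketch needs. */
struct tvnode {
        int             v_flag;
        struct tvfs     *v_vfsp;                /* file system holding vnode */
        struct tvfs     *v_vfsmountedhere;      /* set if a mount point */
        struct tvnode   *parent;                /* stands in for ".." lookup */
};

/* Hypothetical toy vfs. */
struct tvfs {
        struct tvnode   *vfs_rootvp;            /* what vfs_root() returns */
        struct tvnode   *vfs_vnodecovered;      /* vnode mounted over */
};

/* Going down: hop onto the root of whatever is mounted on vp. */
static struct tvnode *
cross_mount_down(struct tvnode *vp)
{
        assert(vp != NULL);
        while (vp->v_vfsmountedhere != NULL)
                vp = vp->v_vfsmountedhere->vfs_rootvp;
        return (vp);
}

/* Going up with "..": from a mounted root, switch to the covered vnode. */
static struct tvnode *
lookup_dotdot(struct tvnode *vp)
{
        assert(vp != NULL);
        while ((vp->v_flag & VROOT) && vp->v_vfsp->vfs_vnodecovered != NULL)
                vp = vp->v_vfsp->vfs_vnodecovered;
        /* The system root has no parent: ".." of "/" is "/". */
        return (vp->parent != NULL ? vp->parent : vp);
}
```

With a file system mounted on /mnt, cross_mount_down() maps the /mnt vnode to the mounted root, and lookup_dotdot() on that root returns the vnode for / rather than staying inside the mounted file system.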

If lookuppn() encounters a symbolic link, it calls the vop_readlink() vnode method to obtain the contents of the link. If the link's contents begin with a “/”, path-name traversal restarts from the root directory; otherwise, traversal continues from the directory that contains the link. The caller of lookuppn() specifies whether the last component of the path name should be followed if it is a symbolic link.
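Restarting or continuing after a symbolic link amounts to splicing the link's contents in front of the untranslated remainder of the path, similar in spirit to combining two path names with pn_insert(). The splice_symlink() helper below is a hypothetical user-space sketch, not the kernel routine, and it omits details such as limits on nested links.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: target is the symbolic link's contents and rest is
 * the untranslated remainder after the link component (e.g. "/libc.so" or
 * "").  Writes target followed by rest into out.  Returns 1 if traversal
 * must restart at the root directory, 0 if it continues from the directory
 * containing the link, or -1 if the result does not fit in out.
 */
static int
splice_symlink(const char *target, const char *rest, char *out,
    size_t outlen)
{
        assert(target != NULL && rest != NULL && out != NULL);
        size_t tlen = strlen(target);
        size_t rlen = strlen(rest);

        if (tlen + rlen + 1 > outlen)
                return (-1);
        memcpy(out, target, tlen);
        memcpy(out + tlen, rest, rlen + 1);     /* includes rest's NUL */
        return (target[0] == '/' ? 1 : 0);
}
```

For a path a/lib/libc.so where lib is a link to /usr/lib, the remainder "/libc.so" becomes "/usr/lib/libc.so" and traversal restarts at the root; a relative target such as ../lib continues from a instead.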

This procedure continues until the path name is exhausted or an error occurs. When lookuppn() completes, it returns a vnode representing the desired file.

13.5.2. The vop_lookup() Method

The vop_lookup() method searches a directory for an entry matching the supplied path-name component. It accepts a directory vnode and a path-name component string as arguments and returns a pointer to the vnode representing the file; if the file cannot be located, it returns ENOENT. Many regular file systems first check the directory name lookup cache and return the entry if one is found there; only on a cache miss do they perform a real lookup in the directory.
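The cache-first lookup order can be sketched as follows. Everything here is hypothetical: a toy directory of name/inode pairs, a tiny four-slot name cache with round-robin replacement, and an nscans counter that shows when the "disk" directory was actually read.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define NDIRENT 4               /* hypothetical directory capacity */
#define NCACHE  4               /* hypothetical cache size */

/* Hypothetical toy directory: parallel arrays of names and inode numbers. */
struct tdir {
        const char      *names[NDIRENT];
        int             inos[NDIRENT];
        int             nscans;         /* times the "disk" was read */
};

/* Hypothetical name-cache slot keyed by <directory, name>. */
struct tcache_ent {
        const struct tdir       *dp;
        const char              *name;
        int                     ino;
};

static struct tcache_ent        tcache[NCACHE];
static int                      tcache_next;    /* round-robin victim */

static int
toy_vop_lookup(struct tdir *dp, const char *name, int *inop)
{
        assert(dp != NULL && name != NULL && inop != NULL);

        /* 1. Check the name cache first. */
        for (int i = 0; i < NCACHE; i++) {
                if (tcache[i].dp == dp && strcmp(tcache[i].name, name) == 0) {
                        *inop = tcache[i].ino;
                        return (0);
                }
        }
        /* 2. Cache miss: perform the real lookup by scanning the directory. */
        dp->nscans++;
        for (int i = 0; i < NDIRENT; i++) {
                if (dp->names[i] != NULL && strcmp(dp->names[i], name) == 0) {
                        struct tcache_ent *e = &tcache[tcache_next];

                        tcache_next = (tcache_next + 1) % NCACHE;
                        e->dp = dp;
                        e->name = dp->names[i];
                        e->ino = dp->inos[i];
                        *inop = dp->inos[i];
                        return (0);
                }
        }
        return (ENOENT);
}
```

The first lookup of a name scans the directory and fills the cache; a repeated lookup is answered from the cache without incrementing nscans.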

13.5.3. The vop_readdir() Method

The vop_readdir() method reads chunks of the directory into a uio structure. Each chunk can contain as many entries as fit within the size supplied by the uio structure. The uio_resid member holds the size of the getdents request in bytes; dividing it by the size of the directory entries that vop_readdir() produces gives the number of entries to return.
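The entry-count arithmetic can be illustrated with a hypothetical toy_uio that carries only uio_resid. This sketch assumes every record has the same size, which real directory records do not; it shows only how the residual count bounds the number of entries returned and is decremented as entries are copied out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy uio: only the residual byte count matters here. */
struct toy_uio {
        size_t  uio_resid;
};

/*
 * Return how many of nentries fixed-size records of recsize bytes fit in
 * the request, and charge them against uio_resid as if they were copied
 * out.  Real directory records vary in length; this sketch assumes not.
 */
static size_t
toy_readdir_count(struct toy_uio *uio, size_t nentries, size_t recsize)
{
        assert(recsize > 0);
        size_t fit = uio->uio_resid / recsize;
        size_t n = (fit < nentries) ? fit : nentries;

        uio->uio_resid -= n * recsize;
        return (n);
}
```

A 1048-byte request with 24-byte records returns 43 entries and leaves 16 bytes unused; if the directory holds only 5 entries, all 5 are returned.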

Directories are read from disk with the buffered kernel file functions fbread() and fbwrite(). These functions, described in Table 13-11, are provided as part of the generic file system infrastructure.

Table 13-11. Functions for Cached Access to Files from Within the Kernel
Function      Description
fbread()      Returns a pointer to a locked kernel virtual address for the given <vp, off> for len bytes. The read may not cross a MAXBSIZE (8192-byte) boundary.
fbzero()      Similar to fbread(), but calls segmap_pagecreate(), not segmap_fault(), so that SOFTLOCK can create the pages without using VOP_GETPAGE(). fbzero() then zeroes up to the length rounded to a page boundary.
fbwrite()     Direct write.
fbwritei()    Writes directly and invalidates pages.
fbdwrite()    Delayed write.
fbrelse()     Releases fbp.
fbrelsei()    Releases fbp and invalidates pages.
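The fbread() restriction means a caller must check whether <off, len> stays within one MAXBSIZE-aligned block and, if not, split the request. The helpers below are a hypothetical user-space illustration of that arithmetic, using the 8192-byte MAXBSIZE given in the table; they are not part of the fb*() interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAXBSIZE        8192    /* value given in Table 13-11 */

/* True if [off, off + len) lies inside a single MAXBSIZE-aligned block. */
static bool
fits_in_one_block(uint64_t off, uint64_t len)
{
        assert(len > 0);
        return (off / MAXBSIZE == (off + len - 1) / MAXBSIZE);
}

/* Length of the first piece of [off, off + len) that stays in one block. */
static uint64_t
first_piece(uint64_t off, uint64_t len)
{
        uint64_t room = MAXBSIZE - (off % MAXBSIZE);

        return (len < room ? len : room);
}
```

A 500-byte read at offset 8000 crosses the 8192 boundary, so it would be issued as a 192-byte piece followed by the remaining 308 bytes.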

13.5.4. Path-Name Traversal Functions

Several path-name manipulation functions assist with decomposition of path names. The path-name functions use a path-name structure, shown below, to pass around path-name components.

Example. Header File <sys/pathname.h>
/*
 * Path-name structure.
 * System calls that operate on path names gather the path name
 * from the system call into this structure and reduce it by
 * peeling off translated components. If a symbolic link is
 * encountered, the new path name to be translated is also
 * assembled in this structure.
 *
 * By convention pn_buf is not changed once it's been set to point
 * to the underlying storage; routines which manipulate the path name
 * do so by changing pn_path and pn_pathlen. pn_pathlen is redundant
 * since the path name is null-terminated but is provided to make
 * some computations faster.
 */
typedef struct pathname {
        char    *pn_buf;                /* underlying storage */
        char    *pn_path;               /* remaining pathname */
        size_t  pn_pathlen;             /* remaining length */
        size_t  pn_bufsize;             /* total size of pn_buf */
} pathname_t;

The path-name functions are shown in Table 13-12.

Table 13-12. Path-Name Traversal Functions from <sys/pathname.h>
Function            Description
pn_alloc()          Allocates a new path-name buffer.
pn_get()            Copies a path-name string from user or kernel space into a struct pathname.
pn_set()            Sets a path name to the supplied string.
pn_insert()         Combines two path names.
pn_getsymlink()     Follows a symbolic link for a path name.
pn_getcomponent()   Extracts the next delimited path-name component.
pn_setlast()        Sets pn_path to the last component of the path name.
pn_skipslash()      Skips over consecutive slashes in the path name.
pn_fixslash()       Eliminates any trailing slashes in the path name.
pn_free()           Frees a struct pathname.
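The pn_buf/pn_path convention can be illustrated with a user-space sketch loosely modeled on pn_set(), pn_skipslash(), and pn_getcomponent(). The toy_pn_* names, their signatures, and their return values are hypothetical; only the four structure fields come from <sys/pathname.h>. Note that pn_buf never moves while pn_path and pn_pathlen advance.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same four fields as struct pathname in <sys/pathname.h>. */
struct toy_pathname {
        char    *pn_buf;        /* underlying storage */
        char    *pn_path;       /* remaining pathname */
        size_t  pn_pathlen;     /* remaining length */
        size_t  pn_bufsize;     /* total size of pn_buf */
};

/* Hypothetical, loosely like pn_set(): copy str into buf and point at it. */
static int
toy_pn_set(struct toy_pathname *pnp, char *buf, size_t bufsize,
    const char *str)
{
        assert(pnp != NULL && buf != NULL && str != NULL);
        size_t len = strlen(str);

        if (len + 1 > bufsize)
                return (-1);
        memcpy(buf, str, len + 1);
        pnp->pn_buf = buf;
        pnp->pn_bufsize = bufsize;
        pnp->pn_path = buf;
        pnp->pn_pathlen = len;
        return (0);
}

/* Hypothetical, loosely like pn_skipslash(): skip consecutive slashes. */
static void
toy_pn_skipslash(struct toy_pathname *pnp)
{
        while (pnp->pn_pathlen > 0 && *pnp->pn_path == '/') {
                pnp->pn_path++;
                pnp->pn_pathlen--;
        }
}

/*
 * Hypothetical, loosely like pn_getcomponent(): copy the next component
 * into comp and advance pn_path past it.  pn_buf is never changed.
 * Returns the component length, or -1 if comp is too small.
 */
static long
toy_pn_getcomponent(struct toy_pathname *pnp, char *comp, size_t complen)
{
        size_t n = 0;

        while (n < pnp->pn_pathlen && pnp->pn_path[n] != '/')
                n++;
        if (n + 1 > complen)
                return (-1);
        memcpy(comp, pnp->pn_path, n);
        comp[n] = '\0';
        pnp->pn_path += n;
        pnp->pn_pathlen -= n;
        return ((long)n);
}
```

Alternating toy_pn_skipslash() and toy_pn_getcomponent() on "//usr/lib" yields "usr" and then "lib", after which pn_pathlen is 0 and pn_buf still points at the start of the original storage.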

13.5.5. The Directory Name Lookup Cache (DNLC)

The directory name lookup cache is based on BSD 4.2 code. It was ported to Solaris 2.0, threaded, and has since undergone several significant revisions. Most of the enhancements to the DNLC have concerned performance and threading, but a few visible changes are noteworthy. Table 13-13 summarizes the important changes to the DNLC.

Table 13-13. Solaris DNLC Changes
Year   OS Rev      Comment
1984   BSD 4.2     14-character name maximum
1990   SunOS 2.0   31-character name maximum
1994   SunOS 5.4   Performance (new locking/search algorithm)
1998   SunOS 5.7   Variable name length

13.5.5.1. DNLC Operation

Each time we open a file, we call the open() system call with a path name. That path name must be translated to a vnode by reading the directory and finding the entry whose name matches the requested name. No field in the vnode stores the name of the file, so to avoid rereading the directory every time we translate a path name, we cache name-to-vnode mappings in the directory name lookup cache. The cache is managed as an LRU cache, so the most recently used directory entries are kept in the cache. The early-style DNLC in Solaris uses a fixed name length in its cache entries; hence, if a file is opened with a name longer than that length, the name is not entered into the DNLC. The old-style (pre-SunOS 5.4) DNLC is shown in Figure 13.6.

Figure 13.6. Solaris 2.3 Name Cache


The number of entries in the DNLC is controlled by the ncsize parameter, which is initialized to 4 * (max_nprocs + maxusers) + 320 at system boot.
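The ncsize formula is simple enough to check by hand; the helper below just evaluates it. The function name dnlc_ncsize() is hypothetical, and the inputs in the usage note are example values, not defaults of any particular system.

```c
#include <assert.h>

/* ncsize = 4 * (max_nprocs + maxusers) + 320, as computed at boot. */
static int
dnlc_ncsize(int max_nprocs, int maxusers)
{
        assert(max_nprocs >= 0 && maxusers >= 0);
        return (4 * (max_nprocs + maxusers) + 320);
}
```

With example values max_nprocs = 1000 and maxusers = 64, the cache holds 4 * 1064 + 320 = 4576 entries.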

Most of the DNLC work is done by two functions: dnlc_enter() and dnlc_lookup(). When a file system wants to look up the name of a file, it first calls dnlc_lookup(), which queries the DNLC for an entry that matches the specified file name and directory vnode. If no entry is found, dnlc_lookup() fails and the file system reads the directory from disk. When the file system finds the name there, it enters it into the DNLC with dnlc_enter(). The DNLC stores entries on a hashed list (nc_hash[]) keyed by file name and directory vnode pointer. Once the correct nc_hash chain is identified, the chain is searched linearly until the matching entry is found.
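The lookup and enter path can be sketched in user-space C. The hash function, the table and pool sizes, the stand-in vnode type, and the toy_ names are hypothetical; the fixed 31-byte name field follows the old DNLC, so longer names are never cached. This sketch simply stops caching when its pool is full and leaves replacement out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NC_NAMLEN       31      /* old DNLC: longest name cached */
#define NC_HASHSZ       8       /* hypothetical number of nc_hash chains */
#define NC_POOL         16      /* hypothetical ncsize for this toy */

struct vnode {                  /* hypothetical stand-in; only addresses matter */
        int     v_count;
};

struct tncache {
        struct tncache  *hash_next;     /* hash chain */
        struct vnode    *vp;            /* vnode the name refers to */
        struct vnode    *dp;            /* vnode of parent directory */
        int             namlen;
        char            name[NC_NAMLEN];        /* not NUL-terminated */
};

static struct tncache   *nc_hash[NC_HASHSZ];
static struct tncache   nc_pool[NC_POOL];
static int              nc_used;

/* Hypothetical hash over the name bytes and the directory pointer. */
static unsigned
nc_hashfn(const char *name, int namlen, const struct vnode *dp)
{
        unsigned h = (unsigned)((uintptr_t)dp >> 4);

        for (int i = 0; i < namlen; i++)
                h = h * 31 + (unsigned char)name[i];
        return (h % NC_HASHSZ);
}

static struct vnode *
toy_dnlc_lookup(struct vnode *dp, const char *name)
{
        int namlen = (int)strlen(name);

        if (namlen > NC_NAMLEN)
                return (NULL);          /* too long: never cached */
        /* Select the chain, then search it linearly. */
        for (struct tncache *ncp = nc_hash[nc_hashfn(name, namlen, dp)];
            ncp != NULL; ncp = ncp->hash_next) {
                if (ncp->dp == dp && ncp->namlen == namlen &&
                    memcmp(ncp->name, name, (size_t)namlen) == 0)
                        return (ncp->vp);
        }
        return (NULL);
}

static void
toy_dnlc_enter(struct vnode *dp, const char *name, struct vnode *vp)
{
        int namlen = (int)strlen(name);

        if (namlen > NC_NAMLEN || nc_used == NC_POOL ||
            toy_dnlc_lookup(dp, name) != NULL)
                return;                 /* too long, pool full, or present */
        assert(vp != NULL);
        struct tncache *ncp = &nc_pool[nc_used++];
        unsigned h = nc_hashfn(name, namlen, dp);

        ncp->vp = vp;
        ncp->dp = dp;
        ncp->namlen = namlen;
        memcpy(ncp->name, name, (size_t)namlen);
        ncp->hash_next = nc_hash[h];    /* push onto the head of the chain */
        nc_hash[h] = ncp;
}
```

After toy_dnlc_enter(&dir, "passwd", &f), toy_dnlc_lookup(&dir, "passwd") returns &f; the same name under a different directory vnode is a separate key and misses.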

The original BSD DNLC had 8 nc_hash entries, a number increased to 64 in SunOS 4.x. Solaris 2.0 sized the nc_hash list at boot, aiming for an average chain length of no more than 4 entries: it divided the total DNLC size, ncsize, by that average length to set the number of nc_hash entries. Solaris 2.3 reduced the target average chain length to 2 in an attempt to increase DNLC performance; however, other problems, related to the LRU list locking and described below, adversely affected performance.
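The Solaris 2.0 sizing rule divides ncsize by the target average chain length. The hypothetical helper below rounds up so that the average never exceeds the target; that rounding choice is an assumption of this sketch, since the text says only that ncsize is divided by the average length.

```c
#include <assert.h>

/*
 * Number of nc_hash chains for a cache of ncsize entries with a target
 * average chain length; rounded up (an assumption of this sketch).
 */
static int
nc_hash_chains(int ncsize, int avg_chain_len)
{
        assert(ncsize > 0 && avg_chain_len > 0);
        return ((ncsize + avg_chain_len - 1) / avg_chain_len);
}
```

For example, a 4576-entry cache gets 1144 chains at an average length of 4 and 2288 chains at an average length of 2, so halving the target length doubles the hash table.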

Each entry in the DNLC is also linked onto an LRU list, in order of last use. When a new entry is added to the DNLC, the algorithm reuses the oldest entry on the LRU list for the new file name and directory vnode. Each time a lookup finds an entry, the DNLC also removes the entry from the LRU list and places it at the end of the list so that it won't be reused immediately. In this way the DNLC uses the LRU list to keep the most recently used references in the cache. Although the hash chains had been made short, the LRU list still caused contention because a single lock had to be held around the entire list.
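The LRU bookkeeping can be sketched with a doubly linked list whose head is the oldest entry and whose tail is the newest. The lent and lru types and the helper names are hypothetical, and locking is left out; the comments mark the two list updates that, in the old DNLC, happened under the one lock covering the whole list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LRU list: lru_head is the oldest entry, lru_tail the newest. */
struct lent {
        struct lent     *lru_next;
        struct lent     *lru_prev;
        int             key;    /* stands in for <name, directory vnode> */
};

struct lru {
        struct lent     *lru_head;
        struct lent     *lru_tail;
};

static void
lru_unlink(struct lru *l, struct lent *e)
{
        if (e->lru_prev != NULL)
                e->lru_prev->lru_next = e->lru_next;
        else
                l->lru_head = e->lru_next;
        if (e->lru_next != NULL)
                e->lru_next->lru_prev = e->lru_prev;
        else
                l->lru_tail = e->lru_prev;
        e->lru_next = e->lru_prev = NULL;
}

static void
lru_append(struct lru *l, struct lent *e)
{
        e->lru_prev = l->lru_tail;
        e->lru_next = NULL;
        if (l->lru_tail != NULL)
                l->lru_tail->lru_next = e;
        else
                l->lru_head = e;
        l->lru_tail = e;
}

/* Lookup hit: move the entry to the end so it is not reused soon. */
static void
lru_touch(struct lru *l, struct lent *e)
{
        /* The old DNLC did this under the single list-wide lock. */
        lru_unlink(l, e);
        lru_append(l, e);
}

/* Enter: recycle the oldest entry for the new name. */
static struct lent *
lru_recycle(struct lru *l, int newkey)
{
        /* Also done under the same list-wide lock in the old DNLC. */
        struct lent *e = l->lru_head;

        assert(e != NULL);
        lru_unlink(l, e);
        e->key = newkey;
        lru_append(l, e);
        return (e);
}
```

Because every hit and every enter rewrites the shared head/tail pointers, all CPUs doing name lookups contend for the same lock, however short the hash chains are.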

The old DNLC structure is shown below. Note that the name field is statically sized at 31 characters.

Example. Header File <sys/dnlc.h>
#define NC_NAMLEN       31      /* maximum name segment length we bother with */

struct ncache {
        struct ncache *hash_next;       /* hash chain, MUST BE FIRST */
        struct ncache *hash_prev;
        struct ncache *lru_next;        /* LRU chain */
        struct ncache *lru_prev;
        struct vnode *vp;               /* vnode the name refers to */
        struct vnode *dp;               /* vnode of parent of name */
        char namlen;                    /* length of name */
        char name[NC_NAMLEN];           /* segment name */
        struct cred *cred;              /* credentials */
        int hash;                       /* hash signature */
};

13.5.5.2. The New Solaris DNLC Algorithm

In Solaris 2.4, replacing the SVR4 DNLC algorithm yielded a significant improvement in scalability. The Solaris 2.4 DNLC algorithm removed LRU list lock contention by eliminating the LRU list completely. In addition, the replacement policy now takes into account the number of references to a vnode and whether the vnode has any pages in the page cache. This design allows the DNLC to cache the most relevant vnodes, rather than just the most recently looked-up ones.

Example. Header File <sys/dnlc.h>
struct ncache {
        struct ncache *hash_next;       /* hash chain, MUST BE FIRST */
        struct ncache *hash_prev;
        struct ncache *next_free;       /* freelist chain */
        struct vnode *vp;               /* vnode the name refers to */
        struct vnode *dp;               /* vnode of parent of name */
        struct cred *cred;              /* credentials */
        char *name;                     /* segment name */
        int namlen;                     /* length of name */
        int hash;                       /* hash signature */
};

To find an entry to reuse, the algorithm uses a rotor pointing to a hash chain; the rotor moves to the next chain on each invocation of dnlc_enter() that needs a new entry. The algorithm starts at the end of the chain and takes the first entry whose vnode has a reference count of 1 or has no pages in the page cache. In addition, during lookup, entries are moved to the front of their chain, so that each chain is kept in LRU order. Figure 13.7 illustrates the Solaris 2.4 DNLC.
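The rotor-based victim search and move-to-front lookup can be sketched as follows. This sketch assumes fixed-capacity array chains (index 0 is the front) instead of linked lists, takes the chain index directly instead of hashing, and simply returns NULL when the chain under the rotor has no eligible entry; the text does not say what the kernel does in that case. The vnode stand-in and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NCHAINS         4       /* hypothetical number of hash chains */
#define CHAINLEN        4       /* hypothetical fixed chain capacity */

/* Hypothetical vnode: just the two properties the replacement test uses. */
struct vnode {
        int     v_count;        /* reference count */
        int     v_npages;       /* pages in the page cache */
};

struct nc {
        struct vnode    *vp;
        int             key;    /* stands in for <name, directory vnode> */
};

/* chain[c][0] is the front (most recently looked up) of chain c. */
static struct nc        chain[NCHAINS][CHAINLEN];
static int              chainlen[NCHAINS];
static int              rotor;

/* Lookup hit: move the entry to the front so the chain stays in LRU order. */
static struct nc *
rotor_lookup(int c, int key)
{
        assert(c >= 0 && c < NCHAINS);
        for (int i = 0; i < chainlen[c]; i++) {
                if (chain[c][i].key == key) {
                        struct nc hit = chain[c][i];

                        memmove(&chain[c][1], &chain[c][0],
                            (size_t)i * sizeof (struct nc));
                        chain[c][0] = hit;
                        return (&chain[c][0]);
                }
        }
        return (NULL);
}

/*
 * Victim search for an enter: use the chain under the rotor, advance the
 * rotor, and scan from the end of the chain for the first entry whose
 * vnode has v_count == 1 or no cached pages.  Returns NULL if none.
 */
static struct nc *
rotor_victim(void)
{
        int c = rotor;

        rotor = (rotor + 1) % NCHAINS;
        for (int i = chainlen[c] - 1; i >= 0; i--) {
                struct vnode *vp = chain[c][i].vp;

                if (vp->v_count == 1 || vp->v_npages == 0)
                        return (&chain[c][i]);
        }
        return (NULL);
}
```

No global list is touched: a lookup rearranges only its own chain, and each enter examines only the chain under the rotor, which is what removes the single-lock bottleneck.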

Figure 13.7. Solaris 2.4 DNLC


The Solaris 7 DNLC was enhanced to use the kernel memory allocator to allocate a variable-length string for the name; this change removed the 31-character limit. In the Solaris 7 ncache structure, the name field accordingly changed from a fixed-size array to a pointer (char *name), accompanied by an int namlen field.
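A variable-length name needs only a separate allocation sized to the name, with namlen recording the length. In this user-space sketch malloc() stands in for the kernel memory allocator, and the nc7 type and helper names are hypothetical; only the name/namlen field pair mirrors the Solaris 7 structure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry with the Solaris 7-style name/namlen pair. */
struct nc7 {
        char    *name;          /* separately allocated, namlen bytes */
        int     namlen;         /* length of name */
};

/* malloc() stands in for the kernel memory allocator. */
static int
nc7_setname(struct nc7 *ncp, const char *name)
{
        size_t len = strlen(name);
        char *p = malloc(len > 0 ? len : 1);

        if (p == NULL)
                return (-1);
        memcpy(p, name, len);           /* no NUL: namlen carries the length */
        ncp->name = p;
        ncp->namlen = (int)len;
        return (0);
}

static void
nc7_freename(struct nc7 *ncp)
{
        assert(ncp != NULL);
        free(ncp->name);
        ncp->name = NULL;
        ncp->namlen = 0;
}
```

A name well past 31 characters is now stored in full, at the cost of one extra allocation and free per cache entry.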

13.5.5.3. DNLC Support Functions

Table 13-14 lists the DNLC support functions.

Table 13-14. Solaris 7 DNLC Functions from <sys/dnlc.h>
Function          Description
dnlc_lookup()     Locates an ncache entry that matches the supplied name and directory vnode pointer. Returns a pointer to the vnode for that entry or returns NULL.
dnlc_update()     Enters a new ncache entry into the DNLC for the given name and directory vnode pointer. If an entry already exists for the name and directory pointer but refers to a different vnode, the entry is overwritten; if it already refers to the same vnode, the function returns with no action.
dnlc_enter()      Enters a new ncache entry into the DNLC for the given name and directory vnode pointer. If an entry already exists for the name and directory pointer, the function returns with no action.
dnlc_remove()     Removes the entry matching the supplied name and directory vnode pointer.
dnlc_purge()      Purges all entries from the DNLC; called by the vfs framework when umountall() is called.
dnlc_purge_vp()   Purges all entries matching the supplied vnode.
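The difference between dnlc_enter() and dnlc_update() is easiest to see side by side. The table-based toy below is hypothetical and models only the semantics described in Table 13-14, not the real DNLC data structures or signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NENT    8               /* hypothetical table size */

struct vnode {                  /* hypothetical stand-in */
        int     v_id;
};

struct ent {
        struct vnode    *dp;    /* directory vnode */
        struct vnode    *vp;    /* vnode the name refers to */
        const char      *name;  /* caller keeps the string alive */
};

static struct ent       tab[NENT];
static int              ntab;

static struct ent *
toy_find(struct vnode *dp, const char *name)
{
        for (int i = 0; i < ntab; i++)
                if (tab[i].dp == dp && strcmp(tab[i].name, name) == 0)
                        return (&tab[i]);
        return (NULL);
}

static struct vnode *
toy_lookup(struct vnode *dp, const char *name)
{
        struct ent *e = toy_find(dp, name);

        return (e != NULL ? e->vp : NULL);
}

static void
toy_add(struct vnode *dp, const char *name, struct vnode *vp)
{
        assert(ntab < NENT);
        tab[ntab++] = (struct ent){ dp, vp, name };
}

/* dnlc_enter() semantics: an existing <dp, name> entry is left alone. */
static void
toy_enter(struct vnode *dp, const char *name, struct vnode *vp)
{
        if (toy_find(dp, name) == NULL)
                toy_add(dp, name, vp);
}

/*
 * dnlc_update() semantics: an existing <dp, name> entry that refers to a
 * different vnode is overwritten; an identical one is left alone.
 */
static void
toy_update(struct vnode *dp, const char *name, struct vnode *vp)
{
        struct ent *e = toy_find(dp, name);

        if (e == NULL)
                toy_add(dp, name, vp);
        else if (e->vp != vp)
                e->vp = vp;
}
```

Calling toy_enter() twice for the same name keeps the first vnode, whereas toy_update() replaces it; neither creates a duplicate entry.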

13.5.6. File System Modules

A file system is implemented as an instance of the vfs and vnode objects in a self-contained, loadable kernel module. The operating system provides the infrastructure for mounting and interfacing with the file system, and each file system implementation can abstract the file system object methods in different ways. The modules are loaded from the file system directory in /kernel/fs during the first mount operation. File systems provide module initialization functions; a typical file system initialization section declares a module constructor and destructor, as described in “Kernel Module Loading and Linking”.

13.5.7. Mounting and Unmounting

When a file system is first mounted, the file system framework attempts to autoload the file system from the /kernel/fs directory. The autoload procedure calls the initialization routines in the file system; at that point, the file system can register itself in the file system switch table. The file system is required to fill in the vfssw structure during the initialization function. Once this phase is completed, the file system is available for mount requests and the mount method of the file system is called.
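The registration-then-mount sequence can be modeled with a toy switch table. Everything below is hypothetical and user-space: toy_vfssw only loosely mirrors the name, init, and vfsops fields of struct vfssw, the slot index doubles as the fstype number, and module autoloading is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy vfs operations: only a mount entry point. */
struct toy_vfsops {
        int     (*vfs_mount)(const char *special, const char *dir);
};

/* Hypothetical switch entry, loosely mirroring struct vfssw. */
struct toy_vfssw {
        const char              *vsw_name;
        int                     (*vsw_init)(struct toy_vfssw *, int);
        const struct toy_vfsops *vsw_vfsops;    /* set by vsw_init */
};

#define NVFSSW  4
static struct toy_vfssw vfssw[NVFSSW];
static int              nvfssw;

/* Module init path: claim a slot and let the file system fill it in. */
static int
toy_register(const char *name, int (*init)(struct toy_vfssw *, int))
{
        if (nvfssw == NVFSSW)
                return (-1);
        struct toy_vfssw *vswp = &vfssw[nvfssw];

        vswp->vsw_name = name;
        vswp->vsw_init = init;
        if (init(vswp, nvfssw) != 0 || vswp->vsw_vfsops == NULL)
                return (-1);
        return (nvfssw++);      /* slot index doubles as the fstype */
}

/* Mount path: find the entry by name and call its mount method. */
static int
toy_domount(const char *fsname, const char *special, const char *dir)
{
        for (int i = 0; i < nvfssw; i++)
                if (strcmp(vfssw[i].vsw_name, fsname) == 0)
                        return (vfssw[i].vsw_vfsops->vfs_mount(special, dir));
        return (-1);            /* not registered (autoload not modeled) */
}

/* A hypothetical file system, shaped like myfsinit() in this section. */
static int myfs_fstype = -1;
static int myfs_mounts;

static int
myfs_mount(const char *special, const char *dir)
{
        (void) special;
        (void) dir;
        myfs_mounts++;
        return (0);
}

static const struct toy_vfsops myfs_vfsops = { myfs_mount };

static int
myfs_init(struct toy_vfssw *vswp, int fstype)
{
        vswp->vsw_vfsops = &myfs_vfsops;
        myfs_fstype = fstype;
        return (0);
}
```

Only after toy_register("myfs", myfs_init) has filled in the switch entry can toy_domount("myfs", ...) reach the file system's mount method.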

When the mount method is called for the file system, a vfs object for the instance of the mounted file system is created; then, the mount method must fill in the vfs structures. Typically, the root vnode of the file system is either created or opened at this time. The following example shows a simple file system and its initialization functions.

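#include <sys/types.h>
#include <sys/param.h>
#include <sys/ksynch.h>
#include <sys/vfs.h>
#include <sys/modctl.h>

/*
 * Declarations for names used before they are defined: modlfs refers
 * to vfw, and vfw refers to myfsinit().  The myfs_* and vnfs_* helpers
 * are implemented elsewhere in the module.
 */
static int myfsinit(struct vfssw *vswp, int fstype);
static struct vfssw vfw;
extern struct vfsops myfs_vfsops;
extern int myfs_init(void);
extern void myfs_init_otherstuff(void);
extern void vnfs_vnlist_destroy(void);

static kmutex_t vnfslock;
static krwlock_t vnfsnodes_lock;
static int myfstype;                    /* fstype index assigned at init */

extern struct mod_ops mod_fsops;
static struct modlfs modlfs = {
        &mod_fsops,
        "vnode file pseudo file system",
        &vfw
};

static struct modlinkage modlinkage = {
        MODREV_1,
        &modlfs,
        NULL
};

int
_init(void)
{
        int     error;

        mutex_init(&vnfslock, NULL, MUTEX_DEFAULT, NULL);
        rw_init(&vnfsnodes_lock, NULL, RW_DEFAULT, NULL);
        error = mod_install(&modlinkage);
        if (error) {
                /* Not installed: undo the lock setup and skip further init. */
                mutex_destroy(&vnfslock);
                rw_destroy(&vnfsnodes_lock);
                return (error);
        }
        myfs_init_otherstuff();
        return (0);
}

int
_fini(void)
{
        int     error;

        /* Tear down module state only once the module is really removed. */
        error = mod_remove(&modlinkage);
        if (error)
                return (error);
        vnfs_vnlist_destroy();
        mutex_destroy(&vnfslock);
        rw_destroy(&vnfsnodes_lock);
        return (0);
}

int
_info(struct modinfo *modinfop)
{
        return (mod_info(&modlinkage, modinfop));
}

static struct vfssw vfw = {
        "myfs",
        myfsinit,
        &myfs_vfsops,
        0
};

static int
myfsinit(struct vfssw *vswp, int fstype)
{
        vswp->vsw_vfsops = &myfs_vfsops;
        myfstype = fstype;
        (void) myfs_init();
        return (0);
}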
