CHAPTER 5
SMF: The Service Management Facility

I get knocked down, but I get up again, 'cause you're never gonna keep me down.

—From the song Tubthumping by Chumbawamba

Much of modern computer hardware is self-monitoring and self-correcting. It tests itself and reports real and impending errors so that preemptive maintenance can be performed, often in the form of "hot swap" components that can be replaced without interrupting system activity. What would a similar approach to system software look like? It would need a framework for identifying and classifying services and their dependencies, for monitoring and reporting their status, and for some form of autorecovery. UNIX has historically lacked such a framework, relying instead on ad hoc solutions to determine which services are running, which services are not running that should be and why, and which potential services are available.

Think about how you have typically configured service programs in early versions of UNIX and Linux. You created a shell script in one of the /etc/rc.d directories, prefixed the script name with an S or a K to identify it as a start or kill script, and gave the script a number that determines when it is run. Thus, the /etc/rc.d/rc5.d/S80sendmail script starts the sendmail service when the system enters run level 5. It starts up after the sshd service specified in /etc/rc.d/rc5.d/S55sshd. So, does sendmail depend on the sshd service being ready before it starts? You can't tell from the scripts! An error in the name or location of your service script can prevent it from running, and locating such errors has also been difficult. Administrators usually resort to searching the system log files and process tables using grep, which often yields incomplete information about the nature of the problem. The traditional approach has several shortcomings:

  • Service processes cannot be easily identified and listed separately from user processes.
  • It's difficult to determine or define when services should run.
  • There is no standard framework for defining service dependencies; as a result, services that have no interdependencies can't be started in parallel to save boot time.
  • There are no standard rules for restarting services.
  • There are too many differing configuration files in too many different locations, each with its own format and syntax rules.

To address these issues, OpenSolaris includes the Service Management Facility (SMF), which defines a framework and administrative tools for configuring and monitoring system services.

What's a Service?

You already know what a service is; it's a persistently running application, usually started at system boot time and generally not associated with an interactive user's login session. Programs such as the Apache web server, the MySQL database, NFS file servers, the sendmail email daemon, firewalls, DNS servers, and the sshd login daemon are all typical examples of services started when you boot your system. Services listen for and respond to requests for some action such as opening and sending a file with NFS, queuing and printing files, delivering and forwarding email, or responding to database queries.

SMF provides a framework to assign OpenSolaris services a standard state model, naming standards, dependency assignments, and restarter methods, all under control of a service daemon (svc.startd), which is notified of service outages and recovers them according to your specifications. You can install OpenSolaris and use it to develop and run applications without using its special features such as containers and DTrace, but because SMF replaces the familiar /etc/rc* files and methods for managing services, this is one new feature of OpenSolaris that you shouldn't ignore.

Note Your custom rc service scripts and those installed by certain ISV application software will still work; they are executed within their assigned run level after SMF-managed services are started. You just won't be able to manage these services with SMF until you prepare and register a service manifest that calls your script.

To understand OpenSolaris services, you need to learn how to refer to them by their true names. Services are referenced using a Fault Management Resource Identifier (FMRI), which is a character string that looks a lot like a URL. For example, the sshd service daemon's full FMRI on your local system is svc://localhost/network/ssh:default.

FMRIs have the following components:

  • A scheme, which indicates the type of service, either svc for an SMF-managed service or lrc for a legacy rc script–managed service.
  • A location, which indicates the host name where the service is running. Usually this will be localhost, but later versions of SMF will allow other locations for dependency purposes.
  • A functional category, which indicates the type of service. Some types of service are as follows:
    • Application: User service program or daemon
    • System: Platform-independent system service
    • Device: I/O and other hardware devices (generally used for dependencies)
    • Network: Network services, including those converted from inetd
    • Milestone: Run levels, such as those in SVR4
    • Platform: Services specific to the local hardware
    • Site: Services specific to the organization's site
  • A description, which names the service.
  • An instance, which is used to indicate services that may have multiple copies running, such as NFS service daemons.

So, the FMRI for the sshd service daemon, svc://localhost/network/ssh:default, indicates that ssh is an SMF-managed network service running with one default instance on the local system.
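To make the FMRI components concrete, here is a minimal shell sketch that splits a full FMRI into the parts listed above. The parse_fmri function is a hypothetical helper of our own, not an SMF tool, and it assumes the full scheme://location/category/name:instance form.

```shell
# Split a full FMRI (scheme://location/category/name:instance) into its parts.
# parse_fmri is a hypothetical helper for illustration; it is not part of SMF.
parse_fmri() (
    fmri=$1
    scheme=${fmri%%://*}      # text before "://"
    rest=${fmri#*://}         # location/category/name:instance
    location=${rest%%/*}      # first path element
    path=${rest#*/}           # category/name:instance
    instance=${path##*:}      # text after the last ":"
    service=${path%:*}        # category/name
    category=${service%%/*}   # first element of the service name
    name=${service#*/}        # remainder of the service name
    printf 'scheme=%s location=%s category=%s name=%s instance=%s\n' \
        "$scheme" "$location" "$category" "$name" "$instance"
)
```

Calling parse_fmri svc://localhost/network/ssh:default prints scheme=svc location=localhost category=network name=ssh instance=default. The sketch does not handle abbreviated FMRIs.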

When you need to refer to a service using any of the SMF programs, you often don't need to give its full FMRI. As with OpenSolaris's path conventions for file names, you can use the FMRI's absolute or relative name depending on where you are or what program you are using. So, when referring to the FMRI of the sshd service, you could use any one of the following:

svc://localhost/network/ssh:default
svc:/network/ssh:default
network/ssh:default
ssh

Now that you've seen how to refer to services by their names, you can start using the SMF tools shown in Table 5-1 to monitor and manage services.

Table 5-1.    Service Management Tools *

Program Name    Description
svcs            Reports service status information, dependencies, instances, and error diagnostics
svcadm          Administers individual service instances: enables, disables, and restarts them
svccfg          Configures service parameters and data files
svcprop         Reports service properties and privileges

* svcs and svcprop are in /usr/bin, and svcadm and svccfg are in /usr/sbin; set your path appropriately.

Every service has a state that indicates its current functional activity. Services move from one state to another because of system events (such as run-level changes), error conditions, or administrator actions. A service might not be able to move to a desired state because of unfulfilled dependencies or other conditions. Table 5-2 shows the possible states for OpenSolaris services.

Table 5-2.    Possible Service States

State            Description
uninitialized    This is the starting state for all services before svc.startd moves the service to a new state.
disabled         The service has been disabled by the administrator.
offline          The service is enabled but not yet online, usually because it's waiting for a dependency to be satisfied.
online           The service has been enabled and has successfully started; all its dependencies have been satisfied.
degraded         The service is enabled and running but with a level of degraded performance that is specified in the service's configuration.
maintenance      The service cannot be started by svc.startd because of an error or unsatisfied dependency and must be manually administered to clear the fault conditions.

You can now explore the services on your OpenSolaris system using the SMF tools and observe their states. We'll show some examples next to get you started. First, list the services on your system using the svcs command; use the -a flag to list all the registered services. Figure 5-1 shows typical output from this command (some output lines have been deleted to shorten the list for printing).

Figure 5-1.    Sample output (abbreviated) from the svcs -a command

Notice the variety of service types and their states. The pppd point-to-point network protocol daemon, for example, which is started by the /etc/init.d/pppd script, is listed as an lrc, or legacy rc, service. Remember that this is all you can learn from SMF about such services—the fact that they are running and the time that they were started—because only svc services are managed by SMF. Also note that some services are in the disabled state, while some are running, that is, in the online state.
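If the listing is long, a quick tally by state can be easier to read than the raw output. The following is a minimal sketch, assuming the svcs -a listing has one header line followed by three whitespace-separated columns (state, start time, FMRI); count_states is a hypothetical helper name.

```shell
# Summarize a `svcs -a` listing by state.
# Reads the listing on standard input, skips the header line, and prints
# "count state" pairs sorted by state name.
# Assumes three whitespace-separated columns: STATE, STIME, FMRI.
count_states() {
    awk 'NR > 1 && NF >= 3 { n[$1]++ } END { for (s in n) print n[s], s }' |
        sort -k2
}
```

You would use it as svcs -a | count_states to see how many services are online, disabled, and so on.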

A Bit About Milestones

You'll notice in the output listed in Figure 5-1 that there are several milestone services listed. Milestones group services for administrative and end user availability. These groups correspond with the traditional UNIX/Linux run levels shown in Table 5-3.

Table 5-3.    OpenSolaris Boot Milestones (Run Levels)

SVR4 Run Level    SMF Milestone
-                 none; no services are enabled, and only the kernel is running
s, S              single-user; traditional single-user mode for administrative purposes
2                 multi-user
3                 multi-user-server
5                 all

If you need to put your system into single-user mode, for example, you can still use the /usr/sbin/init s command for this. SMF recognizes that run levels are groupings of services, so it provides specific FMRIs for each run level. Thus, the "SMF way" of going to single-user mode is as follows:

# svcadm milestone single-user

and the command to return to run level 3 would be as follows:

# svcadm milestone multi-user-server

The following command removes any milestone restriction and starts all enabled services, whatever milestone they belong to:

# svcadm milestone all
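If you still think in run levels, a small lookup can translate them into milestone FMRIs. This is a sketch only: milestone_for_runlevel is a hypothetical helper, it covers just the single-user, multi-user, and multi-user-server rows of Table 5-3, and it leaves out none and all because those are keywords rather than milestone FMRIs.

```shell
# Map an SVR4 run level to the matching SMF milestone FMRI (see Table 5-3).
# milestone_for_runlevel is a hypothetical helper; it only prints the FMRI
# and does not change the system's milestone.
milestone_for_runlevel() {
    case $1 in
        s|S) echo svc:/milestone/single-user:default ;;
        2)   echo svc:/milestone/multi-user:default ;;
        3)   echo svc:/milestone/multi-user-server:default ;;
        *)   echo "no milestone mapped for run level '$1'" >&2
             return 1 ;;
    esac
}
```

For example, svcadm milestone "$(milestone_for_runlevel 3)" has the same effect as the multi-user-server command shown earlier.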

More About Services

Let's examine the ssh service in more detail. The old ways of stopping this service would be to kill its process or to run its rc script with the stop parameter, something like this:

# /etc/rc.d/init.d/sshd stop

If you kill the OpenSolaris sshd process, as shown in Figure 5-2, and then check to see whether it's been stopped, you see that it's still there but running with a new process ID! How did that happen? It was restarted by the SMF service daemon, svc.startd.

Figure 5-2.    Automatic restart of sshd by SMF

So, how do you stop the sshd service? You use the svcadm command-line program, as shown in Figure 5-3, or use the System > Administration > Services menu and uncheck the SSH Server box, as shown in Figure 5-4.

Figure 5-3.    Disabling the ssh service using the svcadm command

Note that in Figure 5-3 we first used both the ps command and the svcs ssh command to show that the ssh service was running. We then disabled the service with svcadm and verified that it was indeed disabled and that its process was gone. Also note that we did not need to give the full FMRI for the service since there was only one local instance; recall that this is like using absolute or relative path names for files.

Figure 5-4.    Using the Services GUI to select and disable the ssh service

You use the svcadm command for typical service administration tasks by using the flags shown in Table 5-4.

Table 5-4.    Service Management Action Flags

Action Flag    Description
enable         Sets the service as enabled and starts it if all of its dependencies are satisfied
disable        Sets the service as disabled; stops it and doesn't restart it
restart        Stops and restarts the service, assuming its dependencies are satisfied
refresh        Reloads the service's configuration files and restarts the service
clear          Removes the "maintenance" state after a repair; if the service was previously set as enabled, restarts it

When you disable a service, it stays disabled even after a system reboot unless you indicate that the service is being disabled only for the current boot session. For example, the following command will disable the ssh service, and it will not restart at the next reboot:

# svcadm disable ssh

If you intended to disable ssh for only the current boot session, you would use the -t (temporary) flag so that normally enabled services will start again at the next system reboot:

# svcadm disable -t ssh

The power of SMF is really revealed in your ability to define and manage interservice dependencies in the service's manifest file; if a service is not working, you need to know whether something is amiss with the service program itself or with some file or process that the service needs in order to function. SMF's svcs program lets you display a service's dependency relationships along with critical state information. Figure 5-5 shows a series of example svcs commands.

Figure 5-5.    State and dependency details for the ssh service

The first command, svcs ssh, simply displays the current state and start time of the default instance of the ssh service. More detail is shown in the "long" listing using svcs -l ssh; this listing provides a wealth of information about the service. Table 5-5 briefly explains this output. Later in this chapter you'll see where all this configuration detail is defined.

Table 5-5.    Example State and Dependency Detail for the ssh Service

Field          Description
fmri           The registered FMRI of the service
name           The name given to the service by the writer of the service definition
enabled        Indicator of whether the service has its enabled state set (true/false)
state          The current state of the service
next_state     The state the service is moving to, if it is in transition from one state to another
state_time     The time the service entered its current state
logfile        The location of the log file used by the service
restarter      The name of the service used to restart; this can be the default system restarter or a custom procedure
contract_id    The registration number of the service
dependency     Listing of services and files needed to be online and available in order for the service to start

It's worth reemphasizing the value of such service details. Each service can have its own log file and restarter process, making it much easier to diagnose service startup errors. Additionally, all of the services needed to support a service are easy to determine. It's almost always the case that services fail because some dependency is not met. Let's see how that works by creating an artificial missing dependency example.

You may already know that sshd needs the /etc/ssh/sshd_config file to configure itself before starting up. Suppose this file is missing. What can SMF tell you when you discover that sshd is not running? Figure 5-6 shows this scenario. The administrator notices that ssh is offline and tries unsuccessfully to enable it. The -x flag of the svcs program provides an explanation.

The svcs -x ssh command reveals that the reason the service is offline is the missing configuration file. Additionally, it refers you to the man page for the service daemon and its log file, along with a URL that provides an online interpretation of the error condition (Figure 5-7). On other UNIX and Linux systems, depending on your system and logging configuration, information about the missing file might not even be logged by sshd in /var/adm/messages. SMF identifies the exact problem for you.

Figure 5-6.    Detail on why the ssh service is not running

Figure 5-7.    The SMF URL that suggests a reason for the ssh service's failure to start

The URL lists details about the error, its impact on the system, and suggestions for administrator action. In fact, any system error will generate and log a message ID that you can enter into the Solaris/OpenSolaris search tool at http://www.sun.com/msg/ to get an explanation of the error condition. Admittedly, some of the explanations and suggested actions at this site can be somewhat generic, but even that is far more helpful than silent service failures or indecipherable error codes.

Occasionally, simply fixing a dependency is not enough to restart a service; it will remain in an offline or maintenance state until all the error conditions are eliminated and all dependencies are met. After you have diagnosed the problems and taken appropriate administrative actions, you can clear the maintenance state and restart the service. Figure 5-8 shows you such a scenario.

Figure 5-8.    Clearing the maintenance state of a service

Say the administrator notices that the keyserv service for storing private encryption keys is not running and is in the maintenance state. Checking the man page, she discovers that the keyserv daemon won't start if the system has no domain name, so she assigns one. She then attempts to restart the keyserv service, but it stubbornly remains in a maintenance state. But she soon remembers that this state must be explicitly cleared using the svcadm clear keyserv command, after which the service enters the online state.
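When several services are in the maintenance state at once, it can help to list the clear commands before running any of them. The sketch below is a dry run under the same assumption about the svcs -a column layout (state, start time, FMRI after one header line); clear_commands is a hypothetical helper that prints commands and executes nothing.

```shell
# Print a `svcadm clear` command for every service that a `svcs -a` listing
# reports in the maintenance state. Dry run only: nothing is executed.
# Assumes a header line followed by STATE, STIME, and FMRI columns.
clear_commands() {
    awk 'NR > 1 && $1 == "maintenance" { print "svcadm clear " $3 }'
}
```

Run it as svcs -a | clear_commands, review the output, and clear a service only after you have fixed whatever put it into maintenance.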

Tip If a service has multiple dependencies that are not yet enabled, you can enable them all recursively at one time using the -r flag of the svcadm command: svcadm enable -r ssh.

If you take another look at Figure 5-5, you'll note that the ssh service has a contract_id. A contract defines a relationship (dependency) between a service process and another resource managed by the kernel, such as processors, memory, devices, or other service processes. SMF uses contracts to organize notification events; if a device or service fails, the kernel will notify the owner of the contract for that resource. SMF services are contracted to the svc.startd daemon so that if they fail or exit, then the appropriate restarter action can be taken. The default restarter will try to restart a service if any of the service's contract members fail.

You can examine and monitor service contract relationship activities using the ctstat and ctwatch commands; they provide a means to get detailed information on failing services.

Creating Your Own Services

You've seen that existing OpenSolaris services have their own FMRIs, service names, log files, restarters, and dependencies. Where are these characteristics defined? And, more importantly, how can you define your own services?

Each OpenSolaris service is configured using a manifest file that defines the service's name, start and stop methods, restart conditions, and dependencies. Manifests are XML files that reside in the /var/svc/manifest directory tree; each service functional category has its own subdirectory for its manifest files. For example, the manifest file for the ssh service that we have been examining is /var/svc/manifest/network/ssh.xml.

Tip Before you decide to create your own service manifest, remember that you are part of the OpenSolaris developer community and that there are other users who may have already created one that you can use. You can find sample manifest files for many types of services at http://blastwave.org/smf/manifests.php and at http://opensolaris.org/os/community/smf/manifests.

Service manifests can be easily created by copying and modifying existing manifests or by using generic manifest templates such as the one at http://www.sun.com/bigadmin/content/selfheal/smf-hds/template.xml, shown in Figure 5-9 (note the "REPLACE_ME" locations in this template; that's where you define the service name, timeout values, and other characteristics of your service).

Figure 5-9.    A sample generic service manifest template

These are the key manifest components you need to define:

  • Service name: The service name includes the functional category and a character string that names the service, such as application/oracle or network/nfsV4.
  • Start/stop methods: These are shell scripts, typically residing in the /lib/svc/method directory, which call the service programs. These scripts are very much like the familiar rc scripts.
  • Dependencies: Identifying dependencies is often the most difficult part of creating service manifests, and it's a good idea to examine existing OpenSolaris manifests to see how other services define them. You need to know what your service needs in order to function, such as network services, file systems, crypto services, or local device availability.
  • Dependents: Are you creating a service that is needed by some other service? For example, if your service starts a firewall program, additional network services can be listed that depend on your service (without modifying those services' manifests).
  • Milestone (run level): Your service may need to start within a particular milestone because other services depend on it; that milestone will not complete until your service and all the other services for that milestone have started.

Other manifest components include the number of service instances, service model, fault response, and reference documentation. Let's continue examining the ssh manifest, /var/svc/manifest/network/ssh.xml, to see how each of these components has been defined; because the file is rather long, we'll show only the relevant sections of the manifest.

The ssh service name tag also includes a version number for change documentation purposes:

<service
        name='network/ssh'
        type='service'
        version='1'>

The ssh service is dependent on other services such as the local file system, network, and crypto services. It's also dependent on the presence of the sshd_config file and is started within the multi-user-server milestone; in turn, that milestone is defined to be dependent on the ssh service and will not complete until the ssh service is online.

<dependency name='fs-local'
                grouping='require_all'
                restart_on='none'
                type='service'>
                <service_fmri
                        value='svc:/system/filesystem/local' />
        </dependency>
...
<dependency name='net-physical'
                grouping='require_all'
                restart_on='none'
                type='service'>
                <service_fmri value='svc:/network/physical' />
        </dependency>
<dependency name='cryptosvc'
                grouping='require_all'
                restart_on='none'
                type='service'>
                <service_fmri value='svc:/system/cryptosvc' />
        </dependency>
...
<dependency name='config_data'
                grouping='require_all'
                restart_on='restart'
                type='path'>
                <service_fmri
                    value='file://localhost/etc/ssh/sshd...' />
        </dependency>
...
<dependent
                name='ssh_multi-user-server'
                grouping='optional_all'
                restart_on='none'>
                <service_fmri
                    value='svc:/milestone/multi-...' />
        </dependent>

The service's start and refresh methods reference shell scripts in the /lib/svc/method directory that accept the parameters start or refresh as input and execute the sshd daemon. The stop method executes a kill on the service's process, as you would expect. All of these actions, however, are performed under the control of the SMF daemon to provide and manage the service's states and transitions.

You can also specify online documentation references for the service; this assists administrators when error reports are logged:

<template>
             <common_name>
                     <loctext xml:lang='C'>
                     SSH server
                     </loctext>
             </common_name>
             <documentation>
                     <manpage title='sshd' section='1M' manpath='/usr/share/man' />
             </documentation>
         </template>

After you have copied or created your service's manifest file, move it to the appropriate functional category directory. You can verify that your file is valid using the svccfg command since it has a built-in XML validator. The following command will validate your file and register it with the SMF service daemon:

# svccfg import yourmanifest.xml

You will then be able to see that your service is available (using the svcs command), and if you've specified your dependencies correctly, you can use the svcadm command to enable your service and the svcs command to examine its state.

Service manifests can be complicated, but you can create some that are quite basic, such as this simple example for starting the MySQL database (after downloading and installing mysql using Package Manager). Create a file, mysql.xml, containing the following:

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="manifest" name="MySQL">
<service name="application/database/mysql" type="service" version="1">
<single_instance/>
<dependency name="filesystem" grouping="require_all"
restart_on="none" type="service">
   <service_fmri value="svc:/system/filesystem/local"/>
</dependency>
<exec_method type="method" name="start" exec="/etc/sfw/mysql/mysql.server start"
timeout_seconds="120"/>
<exec_method type="method" name="stop" exec="/etc/sfw/mysql/mysql.server stop"
timeout_seconds="120"/>
<instance name="default" enabled="false"/>
<stability value="Unstable"/>
<template>
<common_name>
    <loctext xml:lang="C">MySQL RDBMS</loctext>
</common_name>
<documentation>
    <manpage title="mysql" section="1" manpath="/usr/sfw/share/man"/>
</documentation>
</template>
</service>
</service_bundle>

Note the bundle name, MySQL; the service name, application/database/mysql; its dependency on the local file system service svc:/system/filesystem/local; the start and stop methods that call the /etc/sfw/mysql/mysql.server executable; and the documentation pointer to the mysql man page.

Copy the file into the /var/svc/manifest/application/database directory, activate it by running svccfg import mysql.xml, and then enable the service using svcadm enable mysql.
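If you write several basic manifests of this shape, you could generate the skeleton from a shell function instead of editing XML by hand. The following is a sketch only: make_manifest is a hypothetical helper, it hard-codes the same local file system dependency and 120-second timeouts as the mysql.xml example, and it omits the optional template section.

```shell
# Print a skeleton manifest modeled on the mysql.xml example.
# make_manifest is a hypothetical helper, not an SMF tool.
#   $1 = service name, for example application/database/mysql
#   $2 = method script that accepts "start" and "stop" arguments
make_manifest() {
    svc_name=$1
    method=$2
    # The bundle name is taken from the last element of the service name.
    cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="manifest" name="${svc_name##*/}">
<service name="$svc_name" type="service" version="1">
<single_instance/>
<dependency name="filesystem" grouping="require_all"
    restart_on="none" type="service">
    <service_fmri value="svc:/system/filesystem/local"/>
</dependency>
<exec_method type="method" name="start" exec="$method start"
    timeout_seconds="120"/>
<exec_method type="method" name="stop" exec="$method stop"
    timeout_seconds="120"/>
<instance name="default" enabled="false"/>
<stability value="Unstable"/>
</service>
</service_bundle>
EOF
}
```

For example, make_manifest application/database/mysql /etc/sfw/mysql/mysql.server > mysql.xml writes a file you can then adjust and import with svccfg as described above. Real services usually need more dependencies than this skeleton declares, so treat the output as a starting point.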

Editing manifest files can be tedious, and it's easy to introduce XML syntax errors as well as SMF errors. Fortunately, there are tools to assist you in creating and managing these files. One such tool is the Java-based SMF Manifest Creator that was a prize winner in the OpenSolaris Community Innovation Awards contest; download it at http://opensolaris.org/os/project/awards/awards_land/Entries/. Another tool that we've mentioned in earlier chapters is Webmin, a community-developed system management tool for Linux and UNIX systems, including OpenSolaris (see http://webmin.com/). Webmin is also in the OpenSolaris software repository's Administration and Configuration collection, so you can download and install it using Package Manager. After it's installed, you access it with your browser at http://localhost:10000, as shown in Figure 5-10.

Figure 5-10.    The Webmin login page

Webmin includes interfaces to most OpenSolaris system management and configuration tasks (Figure 5-11) including the creation and activation of SMF services, which creates the service manifest files for you (Figure 5-12).

Figure 5-11.    Webmin administrative task menu

Figure 5-12.    The Webmin SMF page

Summary

OpenSolaris's Service Management Facility is designed to provide better control over system services and daemons than traditional UNIX/Linux initialization/termination scripts. It's the one "different" OpenSolaris feature you shouldn't ignore. Even though your legacy rc script methods still work, you will benefit from converting these scripts to SMF-managed services.
