LSF Functions

The LSF software provides a full range of resource management functions. You can establish policies for where and when batch jobs can be run. Monitoring processes are put in place to control the amount of resources consumed by batch jobs. After batch jobs are run, information about job resource usage, number of jobs run, and so on is recorded for later workload analysis and departmental chargeback.

Establishing Policies

The LSF software provides a number of controls that can be used to establish resource management policies. These controls include: setting job priorities, establishing scheduling policies, setting limits on job slots, and setting user permissions.

Job Priorities

Each LSF batch queue has a priority number. The LSF software tries to start jobs from the highest priority queue first. Within each queue, jobs are dispatched in a first come, first served order, unless a fairshare scheduling policy has been established.

Jobs can be dispatched out of turn if pre-execution conditions are not met, specific hosts or resources are busy or unavailable, or a user has reached the user job slot limit. To prevent overloading any host, the LSF software waits for a configured number of dispatching intervals before sending another job to the same host.
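
As an illustrative sketch, queue priorities are defined in the lsb.queues configuration file. The queue name and priority value below are hypothetical, and exact keywords and file locations can vary with the LSF version:

    Begin Queue
    QUEUE_NAME   = night
    PRIORITY     = 40        # queues with larger values are examined first
    DESCRIPTION  = overnight production batch work
    End Queue

The bqueues command displays the configured queues and their priorities.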

Scheduling Policies

The LSF software provides several resource controls to prioritize the order in which batch jobs are run. Batch jobs can be scheduled on a first come, first served basis, with fair sharing among all batch jobs, or with preemptive scheduling, in which a high priority job can bump a lower priority job that is currently running. Other policies include deadline constraints and backfill of resources reserved for large jobs that are not yet ready to run.

Note

The fairshare scheduling used by the LSF software should not be confused with the scheduling class implemented by the Solaris Resource Manager (SRM) software. The LSF software is implemented on top of the Solaris timesharing scheduling class and only manages jobs submitted to LSF batch queues. The SRM software provides fairshare scheduling for all jobs running on a Solaris system.


Fairshare in Queues

Fairshare scheduling is an alternative to the default first come, first served scheduling. Fairshare scheduling divides the processing power of the LSF cluster among users and groups to provide fair access to resources for all jobs in a queue. The LSF software allows fairshare policies to be defined at the queue level so that different queues can have different sharing policies.

The main purpose of LSF fairshare scheduling is to prevent a single user from using up all the available job slots, thus locking out other users.
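
As a sketch of how this is configured (the user names are hypothetical, and the syntax can differ between LSF releases), a fairshare queue in lsb.queues might assign shares as follows:

    Begin Queue
    QUEUE_NAME  = share
    FAIRSHARE   = USER_SHARES[[userA, 10] [userB, 5] [default, 1]]
    DESCRIPTION = queue scheduled by fairshare rather than FCFS
    End Queue

Users with more shares receive a proportionally larger fraction of the queue's job slots over time.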

Fairshare in Host Partitions

Host partitions provide fairshare policy at the host level. Unlike queue level fairshare, a host partition provides fairshare of resources on a group of hosts, and it applies to all queues that use hosts in the host partition. Fairshare scheduling at the queue level and at the host partition level are mutually exclusive.
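
A host partition is defined in the lsb.hosts file. The following is a minimal sketch, with hypothetical host and user names:

    Begin HostPartition
    HPART_NAME  = engineering
    HOSTS       = hostA hostB hostC
    USER_SHARES = [userA, 4] [others, 1]
    End HostPartition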

Hierarchical Fairshare

Hierarchical fairshare allows resources to be allocated to users in a hierarchical manner. Groups of users can collectively be allocated a share, and that share can be further subdivided and given to subgroups, resulting in a share tree. Note that LSF shares are not related to SRM shares.
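
One way a share tree can be expressed, in releases that build it from user groups, is sketched below in lsb.users; the group and user names are hypothetical and the column syntax is version dependent. Here the eng group's share is subdivided between the design subgroup and an individual user:

    Begin UserGroup
    GROUP_NAME   GROUP_MEMBER        USER_SHARES
    design       (userA userB)       ([userA, 2] [userB, 1])
    eng          (design userC)      ([design, 7] [userC, 3])
    End UserGroup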

Preemptive Scheduling

Preemptive scheduling allows the LSF administrator to configure job queues so that a high priority job can preempt a low priority running job by suspending the low priority job. This is useful to ensure that long running low priority jobs do not hold resources while high priority jobs are waiting for a job slot.
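
A sketch of a preemptive queue in lsb.queues follows; the queue names are hypothetical and keyword details vary by release:

    Begin Queue
    QUEUE_NAME = high
    PRIORITY   = 70
    PREEMPTION = PREEMPTIVE[low]   # may suspend running jobs in queue "low"
    End Queue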

Exclusive Scheduling

Exclusive scheduling makes it possible to run exclusive jobs on a host. A job only runs exclusively if it is submitted to an exclusive queue. An exclusive job runs by itself on a host. The LSF software does not send any other jobs to the host until the exclusive job completes.
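
For example, an exclusive queue is marked with EXCLUSIVE = Y in lsb.queues, and the job itself must request exclusive execution with the -x option of bsub (the queue and job names here are hypothetical):

    % bsub -q exclusive_q -x big_simulation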

Processor Reservation and Backfilling

Processor reservation and backfilling ensure that large parallel jobs are able to run without under-utilizing resources.

There might be delays in the execution of parallel jobs when they compete with sequential jobs for resources. As job slots become available, they are consumed in small numbers by sequential jobs, so the larger number of job slots required by a parallel application may never become available at any one instant. Processor reservation allows job slots to be reserved for a parallel job until enough are available to start the job. When a job slot is reserved for a job, it is unavailable to other jobs.

Backfilling is the execution of a job that is short enough to fit into the time slot during which the processors are reserved, allowing more efficient use of available resources. Short jobs are said to backfill processors reserved for large jobs.
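
As a configuration sketch (values hypothetical, keywords version dependent), both behaviors are enabled per queue in lsb.queues. Backfill candidates must declare a run limit, for example with bsub -W, so the scheduler knows they will finish before the reserved slots are needed:

    Begin Queue
    QUEUE_NAME   = parallel
    SLOT_RESERVE = MAX_RESERVE_TIME[20]   # hold reserved slots for up to 20 dispatch cycles
    BACKFILL     = Y                      # allow short jobs onto reserved slots
    End Queue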

Job Slot Limits

A job slot is the basic unit of processor allocation in the LSF software. A sequential job uses one slot, whereas a parallel job with N components uses N job slots, which can span multiple hosts. A job slot can be used by at most one job at a time. A job slot limit restricts the number of job slots that can be used at any one time. Each LSF host, queue, and user can have a slot limit.

Job slot limits are used by queues when deciding whether a particular job belonging to a particular user should be started on a specific host. Depending on whether or not preemptive scheduling policy has been configured for individual queues, each queue can have a different method of counting jobs toward a slot limit.
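
The sketch below shows queue-level slot limits in lsb.queues (the numbers are hypothetical):

    Begin Queue
    QUEUE_NAME = batch
    QJOB_LIMIT = 64    # total slots this queue may use cluster-wide
    UJOB_LIMIT = 8     # slots any single user may use in this queue
    HJOB_LIMIT = 2     # slots this queue may use on any one host
    End Queue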

Resource Controls

Determination of where to run particular jobs is often based on the resources required by the job and the availability of those resources. The following section explains what those resources are.

Resources Based on Location

After an LSF cluster is created, an inventory of all resources in the cluster is taken so that the scheduler can determine the best place to run each job. Resources can be classified by location into two categories:

  • Host-based resources

  • Shared resources

Host-Based Resources

Host-based resources are resources that are not shared among hosts, but are tied to individual hosts. An application must run on a particular host to access that host's resources. Examples are CPU, memory, and swap space. Using up these resources on one host does not affect the operation of another host.

Shared Resources

A shared resource is a resource that is not tied to a specific host, but is associated with the entire cluster, or a specific subset of hosts within the cluster. Examples of shared resources include:

  • floating licenses for software packages

  • disk space on a file server which is mounted by several machines

  • the physical network connecting the hosts

An application may use a shared resource by running on any host from which that resource is accessible. For example, in a cluster in which each host has a local disk but can also access a disk on a file server, the disk on the file server is a shared resource and the local disk is a host-based resource. There is one value for the entire cluster that measures the utilization of a shared resource, whereas each host-based resource is measured separately.
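
As a sketch, a shared resource such as a floating license is declared in lsf.shared (and mapped to hosts in the cluster configuration); the resource name below is hypothetical:

    Begin Resource
    RESOURCENAME  TYPE     INTERVAL  INCREASING  DESCRIPTION
    verilog_lic   Numeric  60        N           (floating simulator licenses)
    End Resource

A job can then reserve one license at submission time with a resource requirement string such as bsub -R "rusage[verilog_lic=1]".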

Resources Based on Type

Resources can also be categorized as dynamic and static.

Dynamic Resources

The availability of dynamic, non-shared resources is measured by load indices. Certain load indices are built into the LSF software and are updated at fixed time intervals. Other indices can be specified and are updated when an external load collection program sends them. The following table summarizes the internal load indices collected by the LSF software.

Table 10-1. Internal Load Indices Collected by the LSF Software

Index    Measures                    Units                         Direction    Averaged Over    Update Interval
status   host status                 string                        N/A          N/A              15 seconds
r15s     run queue length            processes                     increasing   15 seconds       15 seconds
r1m      run queue length            processes                     increasing   1 minute         15 seconds
r15m     run queue length            processes                     increasing   15 minutes       15 seconds
ut       CPU utilization             percent                       increasing   1 minute         15 seconds
pg       paging activity             pages in + pages out/second   increasing   1 minute         15 seconds
ls       logins                      users                         increasing   N/A              30 seconds
it       idle time                   minutes                       decreasing   N/A              30 seconds
swp      available swap space        megabytes                     decreasing   N/A              15 seconds
mem      available memory            megabytes                     decreasing   N/A              15 seconds
tmp      available temporary space   megabytes                     decreasing   N/A              120 seconds
io       disk I/O                    kilobytes per second          increasing   1 minute         15 seconds

The status index is a string indicating the current status of the host. The possible values are:

  • ok

    The host can be selected for execution

  • busy

    A load index exceeds a defined threshold

  • lockU

    The host is locked by a user

  • lockW

    The host's availability time window is closed

  • unavail

    The host is not responding

  • unlicensed

    The host does not have a valid LSF license

The r15s, r1m, and r15m load indices give the 15-second, 1-minute, and 15-minute average CPU run queue lengths, that is, the average number of processes ready to use the CPU during the given interval.

On multiprocessor systems, more than one process can execute at a time. The LSF software scales the run queue value on multiprocessor systems to make the CPU load of uniprocessors and multiprocessors comparable. The scaled value is called the effective run queue length.

The LSF software also adjusts the CPU run queue based on the relative speeds of the processors. The normalized run queue length is adjusted for both the number of processors and the CPU speed. The host with the lowest normalized run queue length will run a CPU intensive job the fastest.
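
As an illustrative calculation (the exact scaling formula is internal to the LSF software), assume the effective length divides the raw run queue length by the number of processors, and the normalized length further divides by the CPU factor:

    hostA: r1m = 6, ncpus = 4, cpuf = 2.0
        effective  = 6 / 4     = 1.5
        normalized = 1.5 / 2.0 = 0.75

    hostB: r1m = 1, ncpus = 1, cpuf = 1.0
        normalized = 1 / 1.0   = 1.0

Under this assumption, hostA would be preferred for a CPU-intensive job despite its higher raw run queue length.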

The ut index measures CPU utilization, which is the percentage of time spent running system and user code. A host with no process running has a ut value of 0 percent, while a host on which the CPU is completely busy has a ut of 100 percent.

The pg index gives the virtual memory paging rate in pages per second. This index is closely tied to the amount of available RAM and the total process size. If there is not enough RAM to satisfy all processes, the paging rate will be high.

The paging rate is reported in units of pages rather than kilobytes, because the relationship between interactive response and paging rate is largely independent of the page size.

The ls index gives the number of users logged in. Each user is counted once, no matter how many times they have logged into the host.

The it index is the interactive idle time of the host, in minutes. Idle time is measured from the last input or output on a directly attached terminal or a network pseudo terminal supporting a login session.

The tmp index is the space, in megabytes, available on the file system that contains the /tmp directory.

The swp index gives the currently available swap space in megabytes. This represents the largest process that can be started on the host.

The mem index is an estimate of the real memory currently available to user processes. This represents the approximate size of the largest process that could be started on a host without causing the host to start paging. This is an approximation because the virtual memory behavior is hard to predict.

The io index measures the I/O throughput to disks attached directly to the host in kilobytes per second. It does not include I/O to disks that are mounted from other hosts.
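
The lsload command displays the current values of these indices. Hypothetical output for two hosts might look like the following (the host names and values are invented for illustration):

    % lsload
    HOST_NAME  status  r15s  r1m  r15m   ut    pg  ls  it   tmp    swp    mem
    hostA      ok       0.3  0.2   0.1  15%   0.5   3   0    56M   280M   120M
    hostB      busy     2.6  2.2   1.9  95%   8.2   1   0    12M    90M     8M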

Static Resources

Static resources represent host information that does not change over time such as the maximum RAM available to processes in a machine. Most static resources are determined when the LSF software starts up. The following table lists the static resources reported by the LSF software.

Table 10-2. Static Resources Reported by the LSF Software

Index    Measures                                 Units              Determined by
type     host type                                string             configuration
model    host model                               string             configuration
hname    host name                                string             configuration
cpuf     CPU factor                               relative           configuration
server   host can run remote jobs                 Boolean            configuration
rexpri   execution priority                       nice(2) argument   LSF
ncpus    number of processors                     processors         LSF
ndisks   number of local disks                    disks              LSF
maxmem   maximum RAM available to users           megabytes          LSF
maxswp   maximum available swap space             megabytes          LSF
maxtmp   maximum space in temporary file system   megabytes          LSF

The type and model resources are strings specifying the host type and model.

The CPU factor is the speed of the host's CPU relative to other hosts in the cluster. If one processor is twice the speed of another, its CPU factor is twice as large. For multiprocessor hosts, the CPU factor is the speed of a single processor. The LSF software automatically scales the host CPU load to account for additional processors.

The server resource is a Boolean, where its value is 1 if the host is configured to execute tasks from other hosts, and 0 if the host is not an execution host.
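
The lshosts command reports these static resources. A hypothetical line of output (host name, model, and values invented for illustration):

    % lshosts
    HOST_NAME  type    model   cpuf  ncpus  maxmem  maxswp  server  RESOURCES
    hostA      SUNSOL  Ultra2  20.0      2    512M   1024M  Yes     (solaris)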

Monitoring Resource Usage

Since the amount of available resources on an execution host is constantly changing, some method of monitoring the current state of that host must be deployed to avoid overloading it. Also, some check of the resources consumed by a particular batch job must be performed to ensure that the batch job is not consuming more resources than specified in the resource policy for that job. The LSF software uses the Load Information Manager (LIM) as its resource monitoring tool. The LIM process running on each execution host is responsible for collecting load information. The load indices collected by the LIM include:

  • Host status

  • Length of run queue

  • CPU utilization

  • Paging activity

  • Available swap space

  • Available memory

  • I/O activity

The load information is gathered at predefined intervals ranging from 15 seconds to two minutes, as shown in Table 10-1.

To modify or add load indices, you can write an Extended Load Information Manager (ELIM). The ELIM can be any executable program, either an interpreted script or compiled code. Only one ELIM per host is allowed, but each ELIM can monitor and report multiple measures.
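
The sketch below is a minimal ELIM written as a shell script. It assumes an external index named scratch has been defined in lsf.shared; the index name, file system, and reporting interval are all hypothetical:

    #!/bin/sh
    # Minimal ELIM sketch: report free space (in megabytes) on /scratch.
    # An ELIM writes lines of the form:
    #   <number_of_indices> <name1> <value1> [<name2> <value2> ...]
    while true
    do
        SPACE=`df -k /scratch | awk 'NR==2 {print int($4/1024)}'`
        echo "1 scratch $SPACE"
        sleep 30
    done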

LSF Scheduler Components

The following diagram shows the components used by the LSF software to schedule jobs. The mbatchd daemon, which manages the run queues, runs on the master host. A job is submitted to these run queues along with the resources the job requires.

The mbatchd daemon periodically scans through the jobs ready to run and compares their resource requirements with the host resources contained in the Load Information Manager (LIM). When the appropriate resources become available for the job, the load conditions on the available hosts are checked to find the least loaded host.

Once the most appropriate host is found, mbatchd sends the job to the sbatchd daemon running on that system. When the job is started, sbatchd keeps track of the resource consumption of the job and reports back to mbatchd.

Figure 10-1. LSF Scheduler Components
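
For example, a submission can carry an explicit resource requirement string; mbatchd matches the select[] clause against the LIM data and uses the order[] clause to rank eligible hosts. The queue, job, and output file names below are hypothetical:

    % bsub -q normal -R "select[mem>64] order[r1m]" -o out.%J myjob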


Exceptions and Alarms

When managing critical jobs, it is important to ensure that the jobs run properly. When problems are detected during the processing of the job, it becomes necessary to take some form of corrective action. The LSF software allows the user to associate each job with one or more exception handlers, which tell the system to watch for a particular type of error and take a specified action if it occurs. An exception condition represents a problem processing a job. The LSF software can watch for several types of exception conditions during a job's life cycle.

An alarm specifies how a notification should be sent in the event of an exception.

Events

An event is a change or occurrence in the system (such as the creation of a specific file, a tape drive becoming available, or a prior job completing successfully) that can be used to trigger jobs. The LSF software responds to the following types of events:

  • time events

    Points of time (defined by calendars and time expressions) that can be used to trigger the scheduling of jobs.

  • job events

    The starting and completion of other jobs.

  • job group events

    Changes in the status condition of job groups.

  • file events

    Changes in a file's status.

  • user events

    Site specific occurrences, such as a tape mount, defined by the LSF cluster administrator.

  • exception events

    Conditions raised in response to errors in the scheduling or execution of jobs.

Job Starters

A job starter is a specified command (or set of commands) that executes immediately prior to a submitted batch job or interactive job. This can be useful when submitting or running jobs that require specific setup steps to be performed before execution, or jobs that must run in a specific environment. Any situation in which a wrapper would be written around the job you want executed is a candidate for a job starter.

There are two types of job starters in the LSF software: command level and queue level. A command-level job starter is user defined and precedes interactive jobs. A queue-level job starter is defined by the LSF administrator and precedes batch jobs submitted to a specific queue.

Job starters can also be used to run a job on a specific processor set, or to start a job inside an application-specific shell.
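
As a sketch, a queue-level job starter is set with the JOB_STARTER parameter in lsb.queues, while a command-level job starter for interactive jobs is taken from the LSF_JOB_STARTER environment variable. The queue name and script path are hypothetical:

    # lsb.queues: wrap every batch job in this queue with a setup script
    Begin Queue
    QUEUE_NAME  = chem
    JOB_STARTER = /usr/local/bin/setup_chem_env
    End Queue

    # command level, for interactive jobs (C shell syntax):
    % setenv LSF_JOB_STARTER "/bin/csh -c"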
