Hardware Sensors

Sensors are physical probes that check the health and status of hardware. Manufacturers have put more and more sensors in hardware, providing low-level hardware information to the operating system. OpenBSD supports a wide variety of hardware sensors, and uses the sensorsd daemon to query them and act upon error states.

Resolving many hardware errors requires shutting down the machine, but advance warning that a component is failing changes a hardware failure from an unexpected middle-of-the-day catastrophe to an after-hours annoyance. Some hardware, such as hot-swappable hard drives, can be replaced without interrupting service once you know the hardware has failed.

Device Drivers

Each physical sensor has a device driver. The device driver extracts information from the hardware and publishes it in a sysctl (discussed in Chapter 18). sensorsd reads the sysctl values and can act when they change or cross critical values. For example, here are the sensor-related sysctl values from my laptop:

$ sysctl hw.sensors
hw.sensors.acpitz0.temp0=67.00 degC (zone temperature)
hw.sensors.acpiac0.indicator0=On (power supply)
hw.sensors.acpibat0.volt0=11.10 VDC (voltage)
hw.sensors.acpibat0.volt1=12.35 VDC (current voltage)
hw.sensors.acpibat0.power0=0.00 W (rate)
hw.sensors.acpibat0.watthour0=2.61 Wh (last full capacity)
hw.sensors.acpibat0.watthour1=0.30 Wh (warning capacity)
hw.sensors.acpibat0.watthour2=0.06 Wh (low capacity)
hw.sensors.acpibat0.watthour3=9.57 Wh (remaining capacity), OK
hw.sensors.acpibat0.raw0=2 (battery full), OK
hw.sensors.cpu0.temp0=81.00 degC

This comparatively simple and generic hardware has two temperature sensors and all kinds of power sensors. You can get hundreds of lines of sensor output, depending on your hardware.
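Because each line of this output has the same name=value shape, it is easy to filter with standard tools. The following is a minimal sketch, not part of OpenBSD; the function name temps_above is hypothetical, and it assumes temperature lines look like the ones shown above (a number followed by degC).

```sh
#!/bin/sh
# temps_above LIMIT
# Read `sysctl hw.sensors` output on stdin and print "device.sensor value"
# for every temperature sensor whose reading is above LIMIT (degrees C).
# Hypothetical helper for illustration only.
temps_above() {
    awk -v limit="$1" -F= '
        # Match names such as hw.sensors.cpu0.temp0
        $1 ~ /^hw\.sensors\.[^.]+\.temp[0-9]+$/ {
            split($2, v, " ")            # v[1] is the number, v[2] the unit
            if (v[1] + 0 > limit + 0) {
                name = $1
                sub(/^hw\.sensors\./, "", name)
                print name, v[1]
            }
        }'
}
```

On a live system you might run it as sysctl hw.sensors | temps_above 80.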

Many RAID controllers have their own sensors, and will report when an array has failed. Here, we see three virtual disks provided by an AMI RAID controller:

hw.sensors.ami0.drive0=online (sd0), OK
hw.sensors.ami0.drive1=degraded (sd1), WARNING
hw.sensors.ami0.drive2=failed (sd2), CRITICAL

If you didn’t have sensors, you would need to look at the blinking lights on the drive enclosure. Or you could listen for the really annoying “beep, beep, beep,” which is so easy to hear over the roar of 5,000 server fans, the air conditioners, and someone else’s hardware that has been beeping every time you’ve come in for the last six months.
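The status word at the end of each sensor line makes a quick manual check possible even before sensorsd is configured. Here is a small sketch under the assumption that problem sensors end in WARNING or CRITICAL, as in the RAID output above; the function name list_unhealthy is made up for this example.

```sh
#!/bin/sh
# list_unhealthy
# Read `sysctl hw.sensors` output on stdin and print only the lines
# whose trailing status is WARNING or CRITICAL.
# Hypothetical helper for illustration only.
list_unhealthy() {
    grep -E ', (WARNING|CRITICAL)$'
}
```

For example, sysctl hw.sensors | list_unhealthy would print the degraded and failed drives from the listing above and nothing else.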

Note

Some sensors require the Intelligent Platform Management Interface (IPMI). This is a kernel feature that’s disabled by default in OpenBSD, because it makes some machines behave really badly. Chapter 18 discusses enabling IPMI.

The device drivers attach to sensors automatically, and the values get into the kernel automatically. To do anything with these results in an automated manner, however, you need either sensorsd(8) or an external SNMP-based management system fed by snmpd(8). We’ll look at using sensorsd(8) here. Using snmpd(8) is discussed in Chapter 16.

Sensor Configuration

The sensors daemon sensorsd(8) watches sensor monitoring data. It logs changes and can execute commands if needed. Because all hardware is different and all environments are different, by default, sensorsd notices changes only in sensor readings. To take action, you must configure sensorsd in /etc/sensorsd.conf.

Sensor Types

OpenBSD supports many types of sensors, as listed in Table 15-2.

Table 15-2: Supported Sensor Types

Name          Function
------------  -----------------------------------------------------
temp          Temperature (C)
fan           Fan speed (RPM)
volt          DC voltage
acvolt        AC voltage
resistance    Ohms resistance
power         Wattage
current       Amperage
watthour      Power capacity
amphour       Power capacity
indicator     Device-dependent yes/no
raw           Device-dependent value
percentage    Device-dependent percentage
illuminance   Lighting
drive         Hard drives
timedelta     Time difference between operating system and hardware
humidity      Percent humidity
frequency     Microhertz
angle         Microdegrees

You’ll need to check your hardware manual in order to learn how to use some of these sensors effectively.

Some sensors appear to overlap. For example, why does OpenBSD have all those separate values for power, when you could probably do some math and get a common power gauge? The reason is that these are the values that the actual sensors report, and the developers would prefer to give you the actual measurements. OpenBSD does perform some data rationalization, but only for simple data; all temperature sensors are normalized to degrees Celsius, for example.

Now let’s see what you can do with these sensors.

Settings in sensorsd.conf

The file sensorsd.conf has example entries, but because environments differ so widely, they’re all commented out. It uses a termcap-style configuration syntax, much like /etc/remote (see Chapter 5) or /etc/login.conf (see Chapter 6), with colons separating the terms in an entry. Each entry starts with the sensor to be measured, followed by attribute names and settings.

For example, here’s an entry for a temperature sensor in the default sensorsd.conf:

hw.sensors.lm0.temp0:high=50C

For the sensor lm0.temp0, the attribute high is set to 50C.
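To make the colon-separated layout concrete, here is a rough sketch that splits an entry into its sensor name and attributes. It is only an illustration of the syntax, not how sensorsd itself parses the file; the function name parse_entry is hypothetical, and it assumes no attribute value contains a colon.

```sh
#!/bin/sh
# parse_entry ENTRY
# Split one termcap-style entry on colons and print the sensor name
# followed by each attribute on its own line.
# Hypothetical helper; assumes attribute values contain no colons.
parse_entry() {
    printf '%s\n' "$1" | awk -F: '{
        print "sensor " $1
        for (i = 2; i <= NF; i++)
            if ($i != "") print "attr " $i
    }'
}
```

Calling parse_entry 'hw.sensors.lm0.temp0:high=50C' prints a sensor line for hw.sensors.lm0.temp0 and one attr line for high=50C.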

sensorsd supports four attributes:

  • high . An upper limit

  • low . A lower limit

  • command . A command to run when a limit is crossed or a state changes

  • istatus . Ignore this status

The values reported for a sensor type depend on what makes sense for that type. High and low limits make sense for temperature and voltage, but some sensors report specific states instead. The RAID controller shown earlier reports drives as online, degraded, or failed. A hard-drive sensor that reports a scalar value isn’t useful, as you want to know if a RAID container is healthy or if drives have failed. There’s no middle ground.

You can have both high and low values for a single sensor. For example, whereas temperature might not have a low value in most data centers, voltage certainly will. I work in all sorts of weird places, and not all of them have clean power.

hw.sensors.acpibat0.volt0:low=11.0V:high=13.0V

With a line like this, if the electricity supply to my laptop drops below 11 volts or goes above 13 volts, I will know.
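The comparison behind a pair of limits is simple enough to sketch. The function below is an illustration only, not sensorsd code: classify is a hypothetical name, and the three words it prints are chosen to describe a reading that is under the low limit, over the high limit, or between the two.

```sh
#!/bin/sh
# classify VALUE LOW HIGH
# Print "below", "above", or "within" depending on where VALUE falls
# relative to the LOW and HIGH limits. awk handles the decimal math.
# Hypothetical helper for illustration only.
classify() {
    awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN {
        if (v + 0 < lo + 0)      print "below"
        else if (v + 0 > hi + 0) print "above"
        else                     print "within"
    }'
}
```

With the limits from the entry above, classify 11.10 11.0 13.0 reports a reading inside the range, while classify 10.5 11.0 13.0 reports one below it.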

Some systems might have dozens of sensors of a given type, which could make configuration tricky. If my motherboard has 15 temperature sensors, I don’t want to configure each separately. Fortunately, you can configure sensors en masse by type, and since I don’t care which temperature sensor goes above 80 degrees Celsius (if any of them do, I want an alarm), that works.

temp:high=80C

When this rule is applied, sensorsd first looks for a configuration item for a specific sensor. If it doesn’t find that specific rule, it looks for a general rule. You can have one rule for most of your temperature sensors, and then override it for specific sensors, like this:

hw.sensors.lm0.temp5:high=90C
temp:high=80C

These rules say that most of my temperature sensors alarm at 80 degrees, but one specific sensor doesn’t alarm until 90 degrees.
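The lookup order described above, specific sensor first and type-wide rule second, can be mimicked in a few lines of shell. This is only a sketch of the precedence, not sensorsd’s implementation; effective_rule is a hypothetical name, and it assumes one entry per line with # starting a comment.

```sh
#!/bin/sh
# effective_rule SENSOR CONF
# Print the entry in CONF that applies to SENSOR, preferring an entry
# for that exact sensor over an entry for its sensor type.
# Hypothetical helper for illustration only.
effective_rule() {
    _sensor=$1
    _conf=$2
    # hw.sensors.lm0.temp5 -> temp5 -> temp
    _type=$(printf '%s\n' "$_sensor" | sed -e 's/^.*\.//' -e 's/[0-9]*$//')
    awk -F: -v s="$_sensor" -v t="$_type" '
        /^#/ || NF == 0 { next }        # skip comments and blank lines
        $1 == s { exact = $0 }          # rule for this specific sensor
        $1 == t { bytype = $0 }         # rule for the whole sensor type
        END {
            if (exact != "")       print exact
            else if (bytype != "") print bytype
        }' "$_conf"
}
```

Given a file holding a temp5-specific entry and a general temp entry, effective_rule hw.sensors.lm0.temp5 prints the specific entry, and any other temperature sensor falls back to the general one.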

I care about temperature, but I don’t care if my fancy keyboard sees that there’s no light and wants to trigger its back lighting. You can ignore a sensor, or a type of sensor, with the istatus keyword.

illuminance:istatus

You might categorically ignore certain types of alarms based on your environment and gear. Make up your own mind.

Sensors Triggering Action

Having an entry in /var/log/daemon for when a hard drive fails is nice, but it would be better if the system would send email, page you, or trigger your monitoring system. It should do something—anything—that doesn’t require you to log in and look at a log file. Fortunately, sensorsd can run arbitrary commands upon detecting a problem or crossing a threshold, using the command attribute.

Thanks to the wide variety of sensors and their possible error states and conditions, sensorsd doesn’t offer a fine-grained “run this command for an error, but run that other command for recovery” setting. Instead, sensorsd runs a single command upon crossing any threshold or upon any state change, including when it starts up and the state of an individual sensor goes from “unknown” to whatever it starts at.

Consider this sensorsd.conf entry:

temp:high=80C:command=/sbin/reboot

At first glance, this reads “If the temperature is high, reboot the machine.” You might think that will kill whatever runaway process is saturating your heat-generating CPU (completely setting aside the fact that hardware other than CPUs also generates heat), but sensorsd runs the command whenever the temperature state changes. The state changes at boot time, when the first temperature reading is taken, which means that your system will boot, and then immediately reboot. Your script needs intelligence.

To make scripting easier, sensorsd has a set of variables it can pass to a script:

  • %l . Is the value within the limit set in sensorsd.conf? This can be one of below, above, within, invalid, or uninitialized.

  • %n . Sensor number.

  • %s . Sensor status.

  • %x . Which device the sensor sits on.

  • %t . Sensor type.

  • %2 . Sensor’s current value.

  • %3 . Sensor’s low limit.

  • %4 . Sensor’s high limit.

You might run a temperature command like this:

temp:high=80C:command=/usr/local/script/temp %l %2 %n

Your script /usr/local/script/temp would take three arguments: the limit status, the temperature, and the sensor number. Your script would check these values and see if a reboot is warranted.
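What such a script might look like is sketched below, with some assumptions: the first argument is the limit status (one of below, above, within, invalid, or uninitialized), and everything after it is treated as opaque descriptive text, because the exact formatting of the value is not covered here. The function name decide_action is hypothetical, and the sketch logs through logger(1) rather than rebooting, leaving the real response up to you.

```sh
#!/bin/sh
# Sketch of a hypothetical /usr/local/script/temp.
# Assumed call: temp <limit-status> <current-value> <sensor-number>

# decide_action STATUS
# Print "alert", "ignore", or "log" for the given limit status.
decide_action() {
    case "$1" in
        above)        echo alert ;;   # high limit exceeded
        within|below) echo ignore ;;  # nothing to worry about for a high-temperature rule
        *)            echo log ;;     # invalid, uninitialized, or unexpected
    esac
}

# Act only when called with arguments, so the function can also be sourced.
if [ $# -ge 1 ]; then
    status=$1
    shift
    details=$*    # remaining arguments, kept as opaque text
    case "$(decide_action "$status")" in
        alert) logger -p daemon.crit "temperature alarm: $details" ;;
        log)   logger -p daemon.notice "temperature sensor status $status: $details" ;;
    esac
fi
```

Because the script reacts only to an above status, a normal in-range first reading after startup is ignored, which avoids the boot-then-reboot loop described earlier.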

With sensorsd, proper timekeeping, and log file management, your OpenBSD system can largely look after itself.

In the next chapter, we’ll look at how OpenBSD can take care of other hosts.



