Chapter 9. Processes and System Calls: Breaking boundaries

It’s time to think outside the box.

You’ve already seen that you can build complex applications by connecting small tools together on the command line. But what if you want to use other programs from inside your own code? In this chapter, you’ll learn how to use system services to create and control processes. That will give your programs access to email, the Web, and any other tool you’ve got installed. By the end of the chapter, you’ll have the power to go beyond C.

System calls are your hotline to the OS

C programs rely on the operating system for pretty much everything. They make system calls if they want to talk to the hardware. System calls are just functions that live inside the operating system’s kernel. Most of the code in the C Standard Library depends on them. Whenever you call printf() to display something on the command line, somewhere behind the scenes a system call will be made to the operating system to send the string of text to the screen.

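Let’s look at an example. We’ll begin with a function called (appropriately) system(). Strictly speaking, system() is part of the C Standard Library rather than the kernel, but it leans on system calls to do its work.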

system() takes a single string parameter and executes it as if you had typed it on the command line:

image with no caption

The system() function is an easy way of running other programs from your code, particularly if you’re creating a quick prototype and you’d sooner call external programs than write lots and lots of C code.

Then someone busted into the system...

There’s a downside to the system() function. It’s quick and easy to use, but it’s also kinda sloppy. Before getting into the problems with system(), let’s see what it takes to break the program.

The code worked by stitching together a string containing a command, like this:

image with no caption
image with no caption

But what if someone entered a comment like this?

image with no caption

By injecting some command-line code into the text, you can make the program run whatever code you like:

image with no caption

Is this a big problem? If a user can run guard_log, she can just as easily run some other program. But what if your code has been called from a web server? Or if it’s processing data from a file?

Security’s not the only problem

This example injects a piece of code to list the contents of the root directory, but it could have deleted files or launched a virus. But you shouldn’t just worry about security.

  • What if the comments contain apostrophes?

    That might break the quotes in the command.

  • What if the PATH variable causes the system() function to call the wrong program?

  • What if the program we’re calling needs to have a specific set of environment variables set up first?

The system() function is easy to use, but most of the time, you’re going to need something more structured—some way of calling a specific program, with a set of command-line arguments and maybe even some environment variables.

Geek Bits

What’s the kernel?

On most machines, system calls are functions that live inside the kernel of the operating system. But what is the kernel? You never actually see the kernel on the screen, but it’s always there, controlling your computer. The kernel is the most important program on your computer, and it’s in charge of three things:

Processes

No program can run on the system without the kernel loading it into memory. The kernel creates processes and makes sure they get the resources they need. The kernel also watches for processes that become too greedy or crash.

Memory

Your machine has a limited supply of memory, so the kernel has to carefully ration the amount of memory each process can take. The kernel can increase the virtual memory size by quietly loading and unloading sections of memory to disk.

Hardware

The kernel uses device drivers to talk to the equipment that’s plugged into the computer. Your program can use the keyboard and the screen and the graphics processor without knowing too much about them, because the kernel talks to them on your behalf.

System calls are the functions that your program uses to talk to the kernel.

The exec() functions give you more control

When you call the system() function, the operating system has to interpret the command string and decide which programs to run and how to run them. And that’s where the problem is: the operating system needs to interpret the string, and you’ve already seen how easy it is to get that wrong. So, the solution is to remove the ambiguity and tell the operating system precisely which program you want to run. That’s what the exec() functions are for.

exec() functions replace the current process

A process is just a program running in memory. If you type taskmgr on Windows or ps -ef on most other machines, you’ll see the processes running on your system. The operating system tracks each process with a number called the process identifier (PID).

The exec() functions replace the current process by running some other program. You can say which command-line arguments or environment variables to use, and when the new program starts it will have exactly the same PID as the old one. It’s like a relay race, where your program hands over its process to the new program.

A process is a program running in memory.

There are many exec() functions

Over time, programmers have created several different versions of exec(). Each version has a slightly different name and its own set of parameters. Even though there are lots of versions, there are really just two groups of exec() functions: the list functions and the array functions.

The exec() functions are in unistd.h.

The list functions: execl(), execlp(), execle()

The list functions accept command-line arguments as a list of parameters, like this:

  • The program.

    This might be the full pathname of the program (for execl() and execle()) or just a command name to search for (for execlp()), but either way the first parameter tells the exec() function what program it will run.

  • The command-line arguments.

    You need to list one by one the command-line arguments you want to use. Remember: the first command-line argument is always the name of the program. That means the first two parameters passed to a list version of exec() should always be the same string.

  • NULL.

    That’s right. After the last command-line argument, you need a NULL. This tells the function that there are no more arguments.

  • Environment variables (maybe).

    If you call an exec() function whose name ends with ...e(), you can also pass an array of environment variables. This is just an array of strings like "POWER=4", "SPEED=17", "PORT=OPEN", with a NULL after the last string to mark the end of the array.

image with no caption

Watch it!

Spaces in command-line arguments can confuse MinGW.

If you pass two arguments “I like” and “turtles,” MinGW programs might send three arguments: “I,” “like,” and “turtles.”

The array functions: execv(), execvp(), execve()

If you already have your command-line arguments stored in an array, you might find these two versions easier to use:

image with no caption

The only difference between execv() and execvp() is that execvp() will search for the program using the PATH variable.

Passing environment variables

Every process has a set of environment variables. These are the values you see when you type set or env on the command line, and they usually tell the process useful information, such as the location of the home directory or where to find the commands. C programs can read environment variables with the getenv() function from stdlib.h. You can see getenv() being used in the diner_info program on the right.

If you want to run a program using command-line arguments and environment variables, you can do it like this:

image with no caption
image with no caption

The execle() function will set the command-line arguments and environment variables and then replace the current process with diner_info.

But what if there’s a problem?

If there’s a problem calling the program, the existing process will keep running. That’s useful, because it means that if you can’t start that second process, you’ll be able to recover from the error and give the user more information on what went wrong. And luckily, the C Standard Library provides some built-in code to help you with that.

Watch it!

If you’re passing an environment on Cygwin, be sure to include a PATH variable.

On Cygwin, the PATH variable is needed when programs are loaded. So, if you’re passing environment variables on Cygwin, be sure to include PATH=/usr/bin.

Most system calls go wrong in the same way

Because system calls depend on something outside your program, they might go wrong in some way that you can’t control. To deal with this problem, most system calls go wrong in the same way.

Take the execle() call, for example. It’s really easy to see when an exec() call goes wrong. If an exec() call is successful, the current program stops running. So, if the program runs anything after the call to exec(), there must have been a problem:

image with no caption
image with no caption

But just telling if a system call worked is not enough. You normally want to know why a system call failed. That’s why most system calls follow the golden rules of failure.

The Golden Rules of Failure

  • Tidy up as much as you can.

  • Set the errno variable to an error value.

  • Return -1.

The errno variable is a global variable that’s defined in errno.h, along with a whole bunch of standard error values, like:

image with no caption

Now you could check the value of errno against each of these values, or you could look up a standard piece of error text using a function in string.h called strerror():

image with no caption

So, if the system can’t find the program you are running and it sets the errno variable to ENOENT, the above code will display this message:

No such file or directory

Read the news with RSS

RSS feeds are a common way for websites to publish their latest news stories. Each RSS feed is just an XML file containing a summary of stories and links. Of course, it’s possible to write a C program that will read RSS files straight off the Web, but it involves a few programming ideas that you haven’t seen yet. But that’s not a problem if you can find another program that will handle the RSS processing for you.

Do this!

Download RSS Gossip from https://github.com/dogriffiths/rssgossip/zipball/master. Also, if you don’t have Python installed, you can get it here: http://www.python.org/.

RSS Gossip is a small Python script that can search RSS feeds for stories containing a piece of text. To run the script, you will need Python installed. Once you have Python and rssgossip.py, you can search for stories like this:

image with no caption

Brain Power

Look at the code of the newshound program again and think about how it works. Why do you think it failed to run the rssgossip.py script for any of the other newsfeeds?

exec() is the end of the line for your program

The exec() functions replace the current process by running a new program. But what happens to the original program? It terminates, and it terminates immediately. That’s why the program only ran the rssgossip.py script for the first newsfeed. After it had called execle() the first time, the newshound program terminated.

But if you want to start another process and keep your original process running, how do you do it?

fork() will clone your process

You’re going to get around this problem by using a system call named fork().

fork() makes a complete copy of the current process. The brand-new copy will be running the same program, on the same line number. It will have exactly the same variables that contain exactly the same values. The only difference is that the copy process will have a different process identifier from the original.

The original process is called the parent process, and the newly created copy is called the child process.

But how can cloning the current process fix the problems with exec()? Let’s see.

Watch it!

Unlike Linux and the Mac, Windows doesn’t support fork() natively.

To use fork() on a Windows machine, you should first install Cygwin.

Running a child process with fork() + exec()

The trick is to only call an exec() function on a child process. That way, your original parent process will be able to continue running. Let’s look at the process step by step.

1. Make a copy

Begin by making a copy of your current process by calling the fork() system call.

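The processes need some way of telling which of them is the parent process and which is the child, so the fork() function returns 0 to the child process, and it will return a nonzero value (the child’s process identifier) to the parent process. If the copy can’t be made, fork() returns -1 to the original process and no child is created.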

image with no caption

2. If you’re the child process, call exec()

At this point, you have two identical processes running, both of them using identical code. But the child process (the one that received a 0 from the fork() call) now needs to replace itself by calling exec():

image with no caption

Now you have two separate processes: the child process is running the rssgossip.py script, and the original parent process is free to continue doing something else.

Your C Toolbox

You’ve got Chapter 9 under your belt, and now you’ve added processes and system calls to your toolbox. For a complete list of tooltips in the book, see Appendix B.
