Chapter 9

Secure coding

After completing this chapter, you will be able to:

  • Understand the fundamental rules of writing secure code.

  • Write defensive code that can withstand attack.

  • Understand common insecure coding issues and how to fix them.

  • Implement some effective security testing techniques.

Insecure code

Too much code written today is insecure. It’s not because developers are lazy; it’s often because developers simply don’t know what constitutes secure code. But this is where it gets interesting. Most of the time, an insecure system works correctly and passes all functional tests. So why don’t testers find security issues? There is a reason, and it lies in this statement:

“A secure system is a system that does what it is supposed to do, and nothing else.”

Simply put, when the “nothing else” becomes “something else” is when security issues are found, because it’s typically when a malicious actor takes advantage of a weakness.

Here’s an example: a database system using SQL commands will probably pass all your functional tests, even if it has a SQL injection vulnerability. But an attacker can take advantage of this weakness and cause the system to do “something else,” like view another table or delete data in a table or even the table itself. It’s this “something else” that is a security concern!

Your goal as a developer is to avoid having “something else” conditions in your code. When developing systems, there are two high-level goals you need to keep in mind:

  • Reduce the number of vulnerabilities in your code.

  • Reduce the severity of the vulnerabilities you miss.

The first basically states, “Try to get everything right,” and the second boils down to, “But assume you won’t.” It’s a healthy tension to consider when writing code. Everything in this chapter revolves around these two guiding principles.

Rule #1: All input is evil

We have said this for over 20 years, and it is as valid today as it was at the turn of the century:

All input is evil.

This statement transcends programming language, operating system, development process, deployment environment, and everything in between. It applies to Windows services written in C++, Linux drivers written in C, containerized applications written in Go on AWS, C# code running in Azure Functions, and a humble web app written in PHP running on nginx.

Important

The idea is simple: incoming data is at the root of numerous security vulnerabilities, so it is important that all data coming into your code be validated for correctness.

About 20 years ago, in the earliest days of trustworthy computing and the Microsoft Security Development Lifecycle (SDL), Michael was charged with educating new software engineering staff about secure design and development. New employees had to attend a four-hour class entitled “Basics of Secure Software Design, Development, and Test.” The associated PowerPoint deck had about 100 slides of dense material. Slide 52, shown in Figure 9-1, was this somewhat tongue-in-cheek statement right before 50 slides of insecure code examples and remedies.

A PowerPoint slide that reads “There are only two types of security issues: (1) Input trust issues (2) Everything else!”

FIGURE 9-1 Slide 52 from a PowerPoint deck from the early 2000s at Microsoft.

Some of the most common and serious code-level security issues are simply the result of failing to verify that data coming from untrusted sources is correct. For example, the infamous Apache log4j (CVE-2021-44228) vulnerability of December 2021 was an invalid input issue. Examples of input trust issues include the following:

  • Memory corruption (such as buffer overruns)

  • Integer overflow and underflow

  • Cross-site scripting (XSS)

  • Directory traversal

  • Open redirects

  • Server-side request forgery (SSRF)

  • Cross-site request forgery (CSRF)

  • SQL injection

  • XML injection

  • Canonicalization

  • OS command injection

We cover some of these vulnerabilities in this chapter, but not all of them, because each one can be remedied simply by verifying untrusted data as it enters your code. Our main focus here is on the core concept that underlies each of these issues: untrusted data.

Important

The core lesson is, if your code accepts input from a potentially untrusted source, it must validate that input for correctness.

A partial threat model diagram showing data coming from a vaccination center, into an Azure Storage account and an Azure Function. The Azure Function is reading the data from the Storage account as it is uploaded by the vaccination center.

FIGURE 9-2 Data entering the environment from a vaccination center. The data crosses a trust boundary and must be validated for correctness.

Understanding who an attacker might be

Keeping in mind that “all input is evil,” it’s important to understand who an attacker might be and what an attacker can control. Let’s look at each of these now.

With regard to who an attacker might be, let’s use a simple example. Suppose you have two REST APIs:

  • The first API is exposed to the internet, and it has no authentication.

  • The second API is listening only on a small IP address range and requires modern authentication using your Azure Active Directory tenant as the identity provider.

Who is the potential attacker for the first API? Anyone on the planet! Potentially, billions of people—bored teenagers, nation-states, bots, and everything in between. In contrast, the potential attacker for the second API is anyone coming from the small IP address range who has an account in your tenant’s Azure Active Directory. This might only be 100 or so users, depending on how many accounts you have.

Clearly, the first API is more “at risk.” Data coming into that API could be from anyone—which means there is a high probability that the data will be malformed at best and malicious at worst. So, any code associated with this API must be rock-solid. The second API has a much lower probability of being a victim of malicious input because the population of potential attackers is so much smaller. This doesn’t mean you can write any junk code in the second API; it just means you have a finite amount of time to work with, so you need to spend more time making sure the code for the first API is solid. (Remember, though, that insiders can pose a threat, too, as discussed in Chapter 3, “Security patterns.”)

The level of exposure will help drive how you prioritize your time when it comes to code review and testing. We often refer to this exposure level as an application’s attack surface. You should always try to reduce an application’s attack surface. One way to do this is to reduce network exposure. In Azure, reducing network exposure means moving from public access to doing one of the following:

  • Allowing subnet or VNet access only—for example, by using private endpoints

  • Restricting access to a small set of IP addresses or a CIDR range, often achieved by using PaaS firewalls and network security groups (NSGs) to restrict incoming requests to those addresses

You can also reduce the attack surface by increasing the level of authentication and authorization. In the case of the first API, you could move from no authentication to authentication with no authorization or authentication with appropriate authorization.

When looking at data flows in a threat model that cross a trust boundary, it’s important to understand how the lower-trust side of the data flow is authenticated and how actions are authorized, as well as the network accessibility of that data flow. Figure 9-3 sums up the concept of attack surface.

A chart showing increasing attack surface. Low attack surface attributes, such as local access only and strong authentication and authorization, are at the bottom left, and increasing attack surface attributes such as public access and no authentication are at the top right.

FIGURE 9-3 How network accessibility and authentication/authorization contribute to attack surface.

The effects of attack surface can be summed up in one sentence: increased attack surface is bad; reducing attack surface is good. This is because an increased attack surface means more potential attackers and more potential vulnerabilities.

Understanding what an attacker controls

Imagine the following low-level C code:

char dst[4];

This simple code allocates 4 bytes (because a char is one byte) of memory for a variable named dst (for destination).

Now look at this code:

char dst[4];
dst[4] = 'X';

The second line seems to write 'X' at the fourth element (offset) in the array. But in fact, it writes 'X' to the fifth element, because arrays in C (and any self-respecting programming language) start at 0, not 1. The element numbered 4 is in fact the fifth element (0, 1, 2, 3, 4), which is one past the end of the four-byte array.

This is a classic memory corruption, or buffer overrun, issue. But is it a security vulnerability? The answer is a resounding no, because the attacker controls nothing. It is absolutely a code-quality bug that should be fixed, however.

Now look at this variant:

if (recv(sock, (char*)&data, sizeof(data), 0) >= 0) {
    dst[data.index] = data.value;
}

This code reads some data from a TCP socket and uses that data to write a value held in data.value into the array at offset data.index. The socket has a high attack surface: it's accessible on the internet with no authentication. I think we can agree that this data cannot be trusted. But as this code stands, it can write any data at any point in memory. For example, the following are all possible:

dst[12234] = 'a';
dst[-9872] = 'z';

In this scenario, the attacker has unfettered access to both the array offset and the data to write to that array offset. This is incredibly dangerous and is a serious security vulnerability. This must be fixed.

So, we just described two memory-corruption scenarios. One was a code-quality issue because the attacker controlled nothing, but the other was a bona-fide security vulnerability because the attacker controlled a great deal.

While our example is in C, this kind of issue applies to other languages, too. Imagine server code in Java that reads an untrusted index coming from a REST API and then uses that to index an array, like the following code snippet:

if (index < arr.length)
    arr[index] = n;

On the surface, the code looks fine—until you remember that Java does not have unsigned variables, so the index is signed. A value of -1 gets past the array length check and attempts to write n to a negative array offset, and the application crashes with a java.lang.ArrayIndexOutOfBoundsException. The same type of error would happen in C# if index were an int rather than a uint.
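
As a minimal sketch (reusing the same illustrative variable names), the fix in C# is to reject negative values explicitly before indexing, or to use an unsigned type so a negative index cannot exist at all:

if (index >= 0 && index < arr.Length) {
    // index is now proven to be within [0, arr.Length)
    arr[index] = n;
} else {
    // Fail secure: reject and log the request rather than failing later
}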

When you review code and you need to determine the likelihood of a code-quality bug being a security bug, be sure you understand what the attacker controls. The more they control, the higher the likelihood that the issue could be a serious security issue. Also be cognizant of idiosyncrasies like signed ints in Java.

Verify explicitly

President Ronald Reagan once said of the Soviet Union, “Trust, but verify.” But in a world of zero trust we say, “Verify explicitly”—and that’s exactly what we’re going to discuss next.

There are three basic steps to constraining incoming and outgoing data:

  1. Determine correctness.

  2. Reject known bad.

  3. Encode data.

Let’s look at each in detail.

Determine correctness

Determining correctness means verifying data is well-formed. This is the opposite of determining incorrectness. Determining that data is incorrect is a good backup strategy, but it’s not a replacement for validating that data is correct, because no one can know all the potential ways to malform incoming data.

Note

Testing for correctness and failing if the correct data is not found is an example of the fail secure principle, discussed in Chapter 2, “Secure design.”

“Don’t look for bad things.” Doing so assumes you know all the bad things, and no one does. Instead, validate for correct and well-formed data. That way, your code always fails secure, sometimes called failing closed. Stop and think for a moment: if a user provides valid input but it’s not seen as valid by your code, at worst you have a support ticket to deal with.

Important

If you know the data for a particular field should be an integer, why would you allow anything else? If a string should be no longer than 16 characters, why allow anything longer?

There are many strategies to constrain data to its correct form. Most programming languages offer library functions or methods that perform basic constraint checking. You can use regular expressions, too, but they should be reserved for more complex data formats or when alternate methods don’t work correctly.

Important

All security-related data validation must be performed by the server. It’s fine to validate at the client to prevent a round-trip to the server, but client-side logic does not provide any security benefit whatsoever.

Let’s return to the vaccination booking application as an example. Suppose the file uploaded by the vaccination center is a simple text comma-separated value (CSV) file with the following fields:

centerId, date and time, vaccination type, open spots, comment

It’s simple. The formats for each data item are as follows:

  • centerID This is the vaccination center ID. It is a six-digit number. This field cannot be blank.

  • date and time For this, you use the ISO 8601 date and time format. This field cannot be blank.

  • vaccination type This is a string that represents the name of a valid vaccine manufacturer. This field cannot be blank.

  • open spots This is a number from 0 to 999. This field can be blank; when it is, it equates to zero.

  • comment This is a text field in which a representative from the vaccination center can enter a comment. This comment can be up to 200 characters long.

The CSV file will have one line for each vaccination center, date and time, and vaccine type. The file is uploaded to an Azure Storage account, and from there it is read by an Azure Function. With this in mind, you can build some code that checks that the data is correctly formed. There is more than one way of doing it, so we’ll show a few examples.

When checking validity, you want to be as performant as possible. Regular expressions (regex) are a common way to validate data, but simple data formats do not require a regex, and other options are faster and good enough.

Before you validate each item in each line of the CSV file, you need to make sure each line contains the correct number of fields. This is simple. You just split the line based on the delimiter. If the file is JSON or XML, you could also make sure it’s valid JSON or XML. You could also validate against a schema.

The following C# code checks that a line of the CSV file is five fields and only five fields. Anything other than five is incorrect data.

C#
if (line.Split(",").Length == NUM_FIELDS) {
           // Correct number of fields
}

JavaScript

if (line.split(",").length == NUM_FIELDS) {
           // Correct number of fields
}

C++ does not have a built-in split() method. If you use the Boost C++ library (https://www.boost.org/), you could use boost::split. Failing that, you must write your own function:

C++
#include <vector>
#include <string>

using namespace std;

auto split(const string &str, const string &delim) {
          vector<string> vs;
          size_t pos = 0;

          for (size_t i = 0; (i = str.find(delim, pos)) != string::npos; pos = i + delim.size())
                    vs.emplace_back(str.data() + pos, str.data() + i);

          vs.emplace_back(str.data() + pos, str.data() + str.size());
          return vs;
}

vector<string> v = split(line, ",");
if (v.size() == NUM_FIELDS) {
          // Correct # items
}

Go
import (
          "strings"
)
res := len(strings.Split(line, ","))
if res == NUM_FIELDS {
          // Correct # items
}

Python
if len(line.split(",")) == NUM_FIELDS:
    # Correct # items


PowerShell
if ($line.Split(",").Count -eq $NUM_FIELDS) {
    # Correct # items
}

Java
if (line.split(",").length == NUM_FIELDS) {
    // Correct # items
}

Validating the vaccination center ID

This is a six-digit number that cannot be blank. In C#, you could use TryParse(). For example:

if (UInt32.TryParse(id, out uint val)) {
    // Is an unsigned int
    if (val >= 0 && val <= 999999) {
        // correct
    }
}

UInt32.TryParse() is fast but imprecise, because a UInt32 can represent 0–4294967295; for detecting malicious input, though, this might be totally fine. You could use this code and, if it succeeds, verify that the returned integer is between 0 and 999999. You could also use a regex for more precision, as shown in the following code:

C#
var re = new Regex(@"^\d{6}$");
if (re.IsMatch(id)) {
          // Is exactly six digits
}

This regex looks for input that is precisely six digits long; anything else is incorrect.

In JavaScript, you can't use isNaN() (is not a number) or parseInt() because they are not precise enough. isNaN() will flag -3.1415926536 as a number, but that's nothing like the data form we need. parseInt() will return 42 if the incoming string is "42xyzzy123". The best solution is to use a regex.

JavaScript
const re = /^\d{6}$/;
if (id.match(re) != null) {
          // Is exactly six digits
}

Java does not have unsigned integers, so extra validation is needed to make sure the result is within the defined range of 0–999999.

Java
try {
    int num = Integer.parseInt(id);
    if (num >= 0 && num <= 999999) {
        // valid
    } else {
        // Invalid
    }
} catch (java.lang.NumberFormatException ex) {
    // Invalid
}

Validating the date and time

Interestingly, using the platform's built-in date parsing is more efficient and more correct than using a regex. The date format is ISO 8601, so it has the following form:

2022-02-23T03:08:00-06:00

which is this syntax:

YYYY-MM-ddThh:mm:ss[+/-]hh:mm

The last part is the time offset from UTC or Zulu time. UTC is marked as +00:00 or having the string end in a Z for Zulu. Note that ISO 8601 date and time formats can have multiple arrangements, but this is the one we’re choosing. Internet dates and times are defined in RFC 3339, which is a profile of ISO 8601, available at https://datatracker.ietf.org/doc/html/rfc3339.

The following C# performs this parsing for us correctly:

C#
if (DateTime.TryParse(dt, out DateTime dt2)) {
    // Valid datetime
}

The following regex will match an ISO 8601 date and time, including an optional time zone offset:

^(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\.[0-9]+)?(Z|[+-](?:2[0-3]|[01][0-9]):[0-5][0-9])?$

Unfortunately, regexes get complex quickly and can be error prone and difficult to debug! There are websites that let you experiment with regexes and help you parse apart the syntax so you can find errors.

Checking an ISO 8601 datetime in PowerShell is easy. If the format is incorrect, the following code will raise an exception:

PowerShell
[datetime]::Parse($date)

Java has a class that can parse dates, including ISO 8601. There are many supported date formats; you just need to set the format in DateTimeFormatter.

Java
try {
    DateTimeFormatter formatter = DateTimeFormatter.ISO_OFFSET_DATE_TIME;
    var odt = LocalDate.parse(dt, formatter);
} catch (java.time.format.DateTimeParseException ex) {
    // Not valid
}

Validating the vaccination type

This is simple. It’s just a string compare against a list of vaccine types. The following C# example uses Language Integrated Query (LINQ) to search for the pharmaceutical company. It’s not the only way to search a list, but it’s short, concise, and obvious.

C#
string[] validPharma = { "PharmaA", "PharmaB", "PharmaC" };
bool valid = validPharma.Any(x => x.Equals(pharma, StringComparison.OrdinalIgnoreCase));

JavaScript
var validPharma = ["PharmaA","PharmaB","PharmaC"];
var valid = validPharma.find((str) => str === pharma);
if (valid != undefined) {
          // pharma is valid
}

Go's strings.Contains() function works on strings, not slices, so you need to write your own Contains() for a slice of strings. Go 1.18 adds generics, but at the time of writing, 1.18 was still in prerelease testing. Once generics are released, this code could be changed to support the most common data types.

Go
func Contains(s []string, e string) bool {
          for _, v := range s {
                    if v == e {
                              return true
                    }
          }
          return false
}

arrPharma := []string{"PharmaA", "PharmaB", "PharmaC"}
if Contains(arrPharma, pharma) == true {
          // pharma is valid
}

In these code examples, the list of drug providers is hard-coded, which is not a great idea. Instead, the list should be stored in a configuration file somewhere for the code to read at load time. But when you do this, you need to be aware of the authorization policy (RBAC, ACLs, permissions, etc.) on the file used to store the list of providers. From a threat-modeling perspective, we care about the tampering threat.
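
The following is a minimal sketch of that idea in C#, assuming a hypothetical app setting named AllowedVaccineManufacturers (in Azure App Service or Azure Functions, an app setting surfaces as an environment variable at runtime); the list is read once at startup and reused:

using System;
using System.Linq;

static class VaccineTypes
{
    // Hypothetical setting name; load the allow-list once rather than hard-coding it
    private static readonly string[] Allowed =
        (Environment.GetEnvironmentVariable("AllowedVaccineManufacturers") ?? string.Empty)
        .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

    public static bool IsValid(string pharma) =>
        Allowed.Contains(pharma, StringComparer.OrdinalIgnoreCase);
}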

Validating open spots

This is a number from 0–999. It is highly unlikely there will ever be 999 spots open, but it gives you plenty of headroom. You can use code like this in C#, where min is 0 and max is 999.

C#
public bool IntegerRangeCheck(string input, out uint result, uint min, uint max) {
  return uint.TryParse(input, out result) && (result >= min && result <= max);
}

In Go, you’ll use a regex. Note the use of backticks for the regex strings. This is so the string is treated literally, and you can use characters safely without having to resort to escaping special characters. This makes the string easier to read.

Go
re, _ := regexp.Compile(`^\d{1,3}$`)
if (len(re.FindString(dt)) > 0) {
          // 000–999
}

Python
try:
    val = int(line)
    if 0 <= val <= 999:
        pass  # valid
except ValueError:
    pass  # invalid

For PowerShell and Java, you could take the same approach as the earlier code that checks for 0–999999, but restrict the range to 0–999.

Validating the comment field, including international characters

This can be tricky, because a comment field is an area for text, but that text might include characters outside the range a–z, A–Z, and 0–9. For example, a comment might say, "This will be administered by Dr. Françoise d'Aubigné." Note the cedilla under the c and the acute accent over the e. Neither of these characters is in the range a–z or A–Z, so you need to be mindful of this.

Because you don't trust the data, you need to make sure the data is well-formed. First, you need to define which subset of characters can go in the comment field. As noted, a–z and A–Z do not represent all possible characters in other languages—for example, French, Spanish, Greek, and obviously many more. Thankfully, many modern regex engines support international characters using a \p escape. The following is a sample testbed written in Rust:

use regex::Regex;
use lazy_static::lazy_static;

fn is_valid_comment(comment: &str) -> bool {
    if comment.len() > 200 {
         return false;
    }
    lazy_static! {
        static ref RE: Regex = Regex::new(r"^[\p{Letter}\p{Number}\p{Other_Punctuation}\s]+$").unwrap();
    }
    RE.is_match(comment)
}
fn main() {
    let comments =
        vec!["باتكلا اذه نم ريثكلا تملعت.",           // Arabic
             "我从这本书中学到了很多东西。",                // Chinese
             "Jeg lærte meget af denne bog.",          // Danish
             "He nui aku ako mai i tēnei pukapuka.",           // Māori
             "私はこの本から多くのことを学びました。",               // Japanese
             "I learned a lot from this book.",                // English
             "למדתי הרבה מהספר הזה.",                          // Hebrew
             "ከዚህ መጽሐፍ ብዙ ተምሬያለሁ ።",                          // Amharic
             "This won't <script> work!"];

   for comment in comments.iter() {
       if is_valid_comment(comment) {
           println!("Valid: {}", comment);
       } else {
           println!("Invalid: {}", comment);
       }
   }
}

This code iterates through various languages and succeeds on each except the last because it includes characters that are not in the list of allowed characters.

Note

Remember, we’re looking for “goodness” as our main defense against potentially malicious input, not “badness.”

Breaking the regex down, you have the following:

  • ^ The start of the string.

  • [ The start of a set of characters.

  • \p{Letter} Any Unicode letter, uppercase or lowercase—not just a–z or A–Z. Note that \p{L} is also valid.

  • \p{Number} Any Unicode number, not just 0–9. Note that \p{N} is also valid.

  • \p{Other_Punctuation} Various punctuation, including ¿ and ¡ in Spanish. Note that \p{Po} is also valid.

  • \s A whitespace character, such as a space.

  • ] The end of a set of characters.

  • + One or more.

  • $ The end of the string.

In this example, we use \p{Other_Punctuation} rather than the broader \p{Punctuation} to keep the set of allowed punctuation as small as possible. Remember that characters such as < and > are used in HTML and can be employed in a cross-site scripting (XSS) attack, so the allow-list deliberately excludes them. Also, the + in the regex could be replaced with other length constraints.

There's one important caveat: some languages don't natively include a regex library that can handle Unicode, and some languages don't support the long names for the character classes. For example, .NET supports \p{L} but not \p{Letter}.

For JavaScript, including Node.js, you need to use the u qualifier as part of the regex. The following short test code shows this in action. The first regex works, but the second returns null because it does not have a u qualifier.

let str = "Έμαθα πολλά από αυτό το βιβλίο.";
console.log(str.match(/\p{L}/gu));
console.log(str.match(/\p{L}/g));

Tip

For a good overview of Unicode strings in regular expressions, see https://azsec.tech/5tq.

Before we move on to the next section, we have to confess something: there is a bug in the code! We left it until now, mainly to show how easy it is to make simple mistakes. The problem is that the comment field is freeform with some restrictions, and commas are allowed in it. A comma in a comment breaks the earlier code that checks for exactly five comma-separated fields.

A screenshot of the Windows Character Map tool with the Greek question mark symbol highlighted. The Greek question mark looks like a semicolon.

FIGURE 9-4 The Greek question mark in the Windows Character Map.

To fix this, you could always strip out commas from the comment field. But this is dangerous because it can lead to the "commas save lives" problem—that is, there is a big difference between "Let's eat, Grandma" and "Let's eat Grandma." Alternatively, you could escape all commas in the comment field with the HTML-escaped version, &#44;. So, a sentence like "Let's eat, Grandma" becomes "Let's eat&#44; Grandma." It's ugly, but it's safe. Honestly, all this shows the fragility of CSV files. It might be better to use a JSON representation of the data, as sketched below.
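
As a sketch of that last idea in C#, the same record could be serialized with System.Text.Json; the record and property names below are illustrative, and the JSON encoder handles commas and quotes in the comment, so no ad hoc escaping is needed:

using System;
using System.Text.Json;

var slot = new VaccineSlot("123456",
                           DateTimeOffset.Parse("2022-02-23T03:08:00-06:00"),
                           "PharmaA", 42, "Let's eat, Grandma");

// The serializer escapes any problematic characters in the comment field
Console.WriteLine(JsonSerializer.Serialize(slot));

// Hypothetical record mirroring the five CSV fields
public record VaccineSlot(
    string CenterId,
    DateTimeOffset DateTime,
    string VaccinationType,
    int OpenSpots,
    string Comment);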

What to do with errors

What should you do if you detect an error (malicious or accidental) in some incoming data? First, you don’t want to tell the potential attacker too much, so keep the response short, terse, and devoid of anything they can use to change their plan of attack. For example, a simple 400 error might suffice. Also, be consistent. If you send a 400 error because an API argument is incorrect, then send a 400 error for other API errors, too.

Also, be sure to log the reason for the error so an admin can act. You could use a text file or a storage account for the data or custom logs for use in Azure Monitor. For more information about this, refer to Chapter 6, “Monitoring and auditing.” You can also read more about it here: https://azsec.tech/js8.
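
As a minimal sketch of this pattern using ASP.NET Core types (the method name and log message are illustrative), the detailed reason goes to the log for an admin, while the caller only ever sees the same terse 400:

using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Logging;

static class InputRejection
{
    public static IActionResult Reject(ILogger log, string reason)
    {
        // Full detail goes to the monitoring pipeline only
        log.LogWarning("Rejected vaccination upload: {Reason}", reason);

        // The caller learns nothing beyond "bad request"
        return new BadRequestResult();
    }
}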

Reject known bad data

Restricting input to only known-good data is always the right strategy when verifying incoming data. However, some developers go one step further and add another defensive layer that checks for known illegal data. For example, let's say your code expects a filename in an argument, as in the following Python code:

import re
def is_filename_valid(filename):
    valid = re.match(r"(?i)^.+\.(jpg|png|gif|bmp|tiff)$", filename)
    return valid is not None

This is a simple regex that also happens to be too loose and vulnerable. This will allow filenames like the following:

  • diving-the-uss-vandenberg.jpg

  • picsofdogs.bmp

  • CatsBehavingBadly.png

This seems fine. However, the regex will also allow the following:

  • ..\..\docs\taxreturn2022.jpg

  • /home/blake/docs/mysecretinvention.doc

  • c:\secretpics\nicoles-taxreturn2022.jpg

Oops!

One thing you could do is add another regex that looks for bad things to reject anything you might have missed in the first regex. For example:

def is_filename_valid(filename):
    valid = re.match(r"(?i)^.+\.(jpg|png|gif|bmp|tiff)$", filename)
    bad = re.search(r"[:\\/]", filename)
    return valid is not None and bad is None

This regex looks for bad things—most notably colon, slash, and backslash characters.

The real fix, though, is to create a more restrictive regex, like this one:

import re
def is_filename_valid(filename):
    valid = re.match(r"(?i)^[a-z0-9]{1,24}\.(jpg|png|gif|bmp|tiff)$", filename)
    return valid is not None

This regex is correct. However, there’s no harm in taking a “the regex might not be right” view and adding a list of known bad characters in a filename—again, colon, slash, and backslash characters.

Important

Adding a check to look for bad things is fine, but not as a replacement for a check that looks for good things.

Regular expressions are a great way to validate data, but they have a dark side: adding them can make your code vulnerable! This is because some classes of regex are subject to a regex denial of service (ReDoS). For more information about ReDoS, read this write-up on the issue published by OWASP here: https://azsec.tech/z5y. Much of the earliest work on this topic was done by Bryan Sullivan at Microsoft.
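
As a minimal sketch of one mitigation in .NET (the pattern and input are illustrative), you can give a regex a match timeout so that a ReDoS-prone pattern cannot keep the engine busy indefinitely; .NET 7 also offers RegexOptions.NonBacktracking, which avoids catastrophic backtracking entirely:

using System;
using System.Text.RegularExpressions;

// (\w+\s?)* is a classic backtracking-prone pattern shape
var re = new Regex(@"^(\w+\s?)*$", RegexOptions.None, TimeSpan.FromMilliseconds(100));

string untrustedInput = Console.ReadLine() ?? string.Empty;
try
{
    bool ok = re.IsMatch(untrustedInput);
}
catch (RegexMatchTimeoutException)
{
    // Treat a timeout as invalid input (fail secure)
}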

Also, a regex can take up quite a bit of memory. It might look like a small string, but under the hood the regex code builds a state machine that can get complex and occupy large amounts of memory. This increased memory consumption is especially true for certain regex constructs. For example, the + (one-or-more) qualifier will create a smaller state machine than a {1,200} (between 1 and 200 characters inclusive) qualifier.

The following Rust code will fail to run:

static ref RE: Regex = Regex::new(r"^[\p{Letter}\p{Number}\p{Other_Punctuation}\s]{1,200}$").unwrap();

The error is

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value:
CompiledTooBig(10485760)', src\main.rs:102:103

But the regex runs correctly if you switch out {1,200} with +. Why? As noted, the state machine for {1,200} is considerably more complex than a simple + and consumes more memory. By default, Rust will allow only a maximum 1 MB state machine, but this state machine weighs in at 22 MB!

In Rust, you can change the default memory limit by using code like this:

use regex::RegexBuilder;
let re = RegexBuilder::new(r"^[\p{Letter}\p{Number}\p{Other_Punctuation}\s]{1,200}$")
         .size_limit(22 * 1024 * 1024)            // 22MB
         .build()
         .unwrap();

You can experiment with this code at the Rust Playground, located here: https://azsec.tech/y6i. Run the code, remove the .size_limit() method, and rerun.

Important

Different regex implementations have different memory implications, but it’s worth understanding what the memory impact might be for your operating system, programming language, and library used.

The next issue is performance. To improve performance, don't define a regular expression inside a loop or in a function that is heavily used. As mentioned, constructing the state machine takes time. That's why the sample Rust code uses a lazy static, so the regex is compiled on the first call and then simply referenced from then on. You could also create your regexes once, when a class is constructed, and reuse them, as the following sketch shows.
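
A sketch of the construct-once approach in C# (the class name and pattern are illustrative): the regex is built a single time when the class is first used and reused for every call:

using System.Text.RegularExpressions;

static class CenterIdValidator
{
    // Built once, reused for every validation call
    private static readonly Regex SixDigits = new Regex(@"^\d{6}$", RegexOptions.Compiled);

    public static bool IsValid(string id) => SixDigits.IsMatch(id);
}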

Tip

For .NET, you should read the following page for information about reusing regexes: https://azsec.tech/jfs.

One important final note about regular expressions: never use untrusted input as part of a regular expression string. In other words, the attacker should never control any part of the regex string.
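
If a requirement ever forces user-supplied text into a pattern, at a minimum treat it as a literal; a sketch in C# (the variable name is illustrative):

using System;
using System.Text.RegularExpressions;

string untrustedSearchTerm = Console.ReadLine() ?? string.Empty;

// Regex.Escape() neutralizes metacharacters such as ( ) + * and { so the
// attacker-supplied text cannot change the structure of the pattern
string pattern = "^" + Regex.Escape(untrustedSearchTerm) + "$";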

Note

At the time of this writing, .NET 7 RC 1 has been released, and it includes a Regex Source Generator. This dramatically improves the performance of regexes by precompiling the regex rather than building it at runtime. You can read more here: https://azsec.tech/leb.

Encode data

This guidance mostly applies to data that is echoed back to a user's browser. Even after validating all the incoming data, it is beneficial and safer to encode the data before it is sent back to a browser. This is especially true of any data that comes from an untrusted source. The goal is to render any embedded HTML and scripting inert, primarily to mitigate XSS issues.

If you have any data that comes from a potentially untrusted source and is then echoed back in a browser, this data must be encoded before it is echoed back. The most well-known way to achieve this is to simply HTML-encode all output. Some frameworks—for example, ASP.NET MVC and Razor—do this automatically. For others, you might need to call a function that takes the data, encodes it, and then renders it in the browser.

The following C# code snippet demonstrates this:

using System.Text.Encodings.Web;
using System.Text.Unicode;
string bad = @"<script>alert(1);</script>";
var encoder = HtmlEncoder.Create(allowedRanges: new[] { UnicodeRanges.BasicLatin });
Console.WriteLine(encoder.Encode(bad));

Recall the comment field example from earlier in the chapter. Processing text entered in this field involves the following steps:

  1. A user enters a comment.

  2. The server accepts the input and runs it through a regex to make sure the comment is well-formed. If it is not correct, the comment is rejected. If the format is correct, the comment is written to a Cosmos DB database.

  3. When another user scrolls through comments, each comment is read from Cosmos DB.

  4. Each comment is passed through an HTML-encoding function.

  5. The result of HTML encoding is rendered in the user’s browser.

Tip

You can read more about encoding HTML by reading the OWASP XSS Prevention Cheat Sheet at https://azsec.tech/e33.

Common vulnerabilities

There are hundreds if not thousands of distinct vulnerability types, and there is no way we can cover every single one here. Because of this, we’re going to focus on one of the industry’s most well-known “go to” lists of vulnerabilities: the OWASP Top 10. This is discussed in a little more detail in Chapter 10, “Cryptography in Azure,” and is available here: https://owasp.org/Top10. There is also an OWASP API Security Top 10 at https://azsec.tech/4tq.

If you want to learn as much as possible about each vulnerability class, we recommend taking the following steps:

  1. Read about each of the 10 issues discussed in the following section.

  2. For each issue on the OWASP website, read the issue and then look at the cross-reference of CWEs at the bottom of each page.

  3. Click the link of any of the CWEs and learn about the issues in more depth.

Now let’s look at each of the 10 issues through an Azure lens.

A01: Broken access control

Further information https://azsec.tech/kgp

As we stress in Chapter 5, access control is critical. In Azure, this usually means using role-based access control (RBAC), although some services support attribute-based access control (ABAC). Some services in Azure, such as storage accounts, also support shared access signature (SAS) tokens, which are used to allow access to a process or user that has the token. A SAS token can include access requirements such as Read, Write, and Delete.

In Azure, implementing access control also means limiting the set of users with elevated roles, such as Owner and Contributor; this extends to service-specific roles such as Storage Account Contributor. Accounts with these roles can often read and configure services but not change access policies.

Important

It is critical that all access policies be as restrictive as possible, allowing only principals the access they need to perform their tasks and nothing more.

A02: Cryptographic failures

Further information https://azsec.tech/q8v

In part, this covers encryption of data at rest and in motion, or “on the wire.” For data at rest, this means lack of cryptographic defenses or poor implementation, such as not encrypting a sensitive column in SQL Server. For data in motion, this usually means not using a secure protocol such as TLS 1.2.

Many of these can be enforced using Azure Policy, as discussed in Chapter 7, “Governance.” For example, the sample found at https://azsec.tech/t0x enforces HTTPS (HTTP over TLS) for Azure Storage accounts. One level lower, this can include using insecure cipher suites in TLS (such as TLS_RSA_WITH_3DES_EDE_CBC_SHA) or insecure cryptographic algorithms (such as RC4, DES, 3DES, MD5).

Other examples of cryptographic failure include the following:

  • Poor key generation, such as using a deterministic random number generator

  • Poor key derivation, such as using a password directly as a cryptographic key

  • Using “custom” cryptographic algorithms rather than industry-standard reviewed algorithms

  • Encrypting data but not providing tamper-detection, too

  • Reuse of initialization vectors (IVs)

Tip

If you use the Microsoft Crypto SDK, described in Chapter 10, many of the aforementioned points are addressed by the library.

A03: Injection

Further information https://azsec.tech/u90

Examples of injection include SQL injection (SQLi) and cross-site scripting. One way to resolve this collection of issues is by verifying input and encoding output. Another is to use technologies that have specific defenses to mitigate injection. For example, SQL has parameterized queries to neuter variables used as part of the query. The following C# code uses the SqlParameter class to build a dynamic SQL query safely:

SqlDataAdapter myCommand = new SqlDataAdapter(
    "SELECT au_lname, au_fname FROM Authors WHERE au_id = @au_id", conn);
SqlParameter parm = myCommand.SelectCommand.Parameters.Add("@au_id",
    SqlDbType.VarChar, 11);
parm.Value = idAuthor;

Tip

SQLi is not a vulnerability in the back-end database. It’s an issue in the way SQL statements are constructed by the client. This page has a good overview of SQLi: https://azsec.tech/30v.

Also, many application frameworks and constructs automatically create SQL statements for you. For example, in C#, LINQ to SQL is designed to create parameterized SQL queries. You can learn more here: https://azsec.tech/5mh.

Azure Logic Apps are subject to SQLi, as explained here: https://azsec.tech/hzj. The same goes for Cosmos DB when using the SQL client, discussed here: https://azsec.tech/uwi. Microsoft Defender for SQL offers SQL Advanced Threat Protection to detect attempted SQLi attacks in near real time. You can learn more here: https://azsec.tech/p0g.

As mentioned, XSS is another example of injection. The core problem with XSS is accepting untrusted input—which could include HTML or scripts—and then using it as HTML output. The fix is simple: validate incoming data, and HTML encode outgoing data using methods like .NET’s HttpUtility.HtmlEncode(). This combination of input and output logic is often called filter input, escape output (FIEO).

A04: Insecure design

Further information https://azsec.tech/4i6

This is mitigated with threat modeling. The whole point of threat modeling is to understand a solution’s design and what defenses are used to protect that system. If security issues are found during the threat-modeling process, these issues can be resolved, which leads to a more secure design.

A05: Security misconfiguration

Further information https://azsec.tech/ng0

This is incredibly common. People deploy a secure configuration, but over time, small changes creep in, and services slowly drift away from the original secure defaults. This is why you use governance services in Azure such as Azure Policy: to restrict this drift and to prevent new services from deploying with insecure defaults.

One of the most common issues we have seen is virtual machines (VMs) with public IP addresses. VMs should be behind a load balancer or a bastion service such as Azure Bastion. But we’ve been in more than one conversation with customers whose VMs have an RDP or SSH port open directly to the internet. This is a bad idea.

Tip

Azure Resource Graph allows you to analyze settings on all services in use. For example, the query at https://azsec.tech/dwo displays all VMs with public IP addresses.

Misconfiguration often extends to how you configure various services. A classic and common error is failing to add security-related headers to HTTP servers. Any HTTP server should set the following headers, whether you’re using an Apache server in a Linux VM or an App Service. These headers instruct the browser how to enforce specific security rules. The current list of common, security-related HTTP headers is as follows:

  • Content-Security-Policy

  • Referrer-Policy

  • Strict-Transport-Security

  • X-Content-Type-Options

  • X-Frame-Options

  • X-XSS-Protection

  • Access-Control-Allow-Origin (cross-origin resource sharing [CORS])

This list is dynamic. Over the years, new headers have been added or updated. The Security Headers site (https://securityheaders.com/) has an excellent analysis of the various headers and what they should be set to, as does the Mozilla Developer Network (https://azsec.tech/ggb).

For Azure App services (for example, the service that runs https://azsecuritypodcast.net), you can update web.config. The following code is the web.config file used by the Azure Security Podcast:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
     <system.webServer>
        <httpProtocol>
            <customHeaders>
                <clear />
                <add name="Content-Security-Policy" value="default-src 'self' *.rss.com; script-
src 'self' https://ajax.aspnetcdn.com "/>
                <add name="Referrer-Policy" value="strict-origin-when-cross-origin" />
                <add name="Strict-Transport-Security" value="max-age=63072000" />
                <add name="X-Content-Type-Options" value="nosniff" />
                <add name="X-Frame-Options" value="ALLOW-FROM https://player.rss.com" />
                <add name="X-XSS-Protection" value="1; mode=block" />
                <add name="Access-Control-Allow-Origin" value="https://player.rss.com https://
apollo.rss.com" />
                            <add name="Cache-Control" value="max-age=31536000" />
            </customHeaders>
        </httpProtocol>
    </system.webServer>
</configuration>

The Content-Security-Policy (CSP) header is complex and powerful, and having a good understanding of how it works is critical. Notice that the preceding sample contains references to *.rss.com. This is a media player implemented using HTML that allows a user to play an episode in the browser. If the various headers did not allow *.rss.com, the media player would not render, because the HTML is fetched from another site. Also, notice that script-src references ajax.aspnetcdn.com, which allows the browser to pull in jQuery from outside the core website. Again, without this, jQuery would not load when using CSP.

Finally—and this is important—these headers will often cause a site to render incorrectly, so you might need to change your site's code or tweak the headers. One good example is inline script or styles, which a strict Content-Security-Policy blocks; you should move inline code and styles to separate files and reference them at the start of your HTML files.

You can use a tool like the one at https://report-uri.com/home/generate to help you build a policy string. For an Azure Static Web App, you can add a new file in the root folder named staticwebapp.config.json like so:

{
      "globalHeaders": {
        "content-security-policy": "default-src 'self'",
        "Referrer-Policy":"no-referrer",
        "Strict-Transport-Security": "max-age=63072000",
        "X-Frame-Options": "SAMEORIGIN",
        "X-Permitted-Cross-Domain-Policies": "none",
        "X-Content-Type-Options": "nosniff"
    }
}

Other web servers and environments have their own header configuration files. For example, in Apache (running in a VM), you can edit httpd.conf to add this at the end of the file:

<IfModule mod_headers.c>
  Header always set X-Content-Type-Options "nosniff"
  Header always set Content-Security-Policy "default-src 'self'"
  Header always set Referrer-Policy "no-referrer"
  Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
  Header always set X-Frame-Options "DENY"
  Header always set X-XSS-Protection "1; mode=block"
</IfModule>

Note

The industry is moving away from using X-Frame-Options to using Content-Security-Policy frame-ancestors instead because CSP is more encompassing.

There is one other new header: Permissions-Policy. It replaces Feature-Policy. You can learn more about it here: https://www.permissionspolicy.com/. To check if the browsers your applications support can handle the header, you can use the Can I Use site, located here: https://caniuse.com/?search=Permissions-Policy. The Azure Security Podcast site does not set this header on purpose. When it becomes more popular, we will enable it in the site.

Notice there is nothing to remove the Server: header, which announces the web server type and version number. Removing or replacing this header is of no real security value; it is just security theater, because a somewhat-knowledgeable attacker can easily “fingerprint” the server to determine its type and version.

On a related note, the Access-Control-Allow-Origin header, which is used to support cross-origin resource sharing, does not improve security. In fact, it reduces security somewhat by allowing a controlled weakening of the browser same-origin security policy. The reason CORS exists is to have a preferred way to send HTTP requests across websites without resorting to some disastrous work-arounds used in the past. CORS allows for some more secure OAuth2 flows. Some services in Azure support CORS—for example, Azure Functions, as shown in Figure 9-5.

An Azure Portal screenshot showing how to set the cross-origin resource sharing option on a Function App.

FIGURE 9-5 An example CORS in an Azure Function.

Even though the screenshot shown in Figure 9-5 mentions using * as a wildcard, you should not allow all domains to access your Function App. Allow only required domains to interact with your Function app. Also, as shown in Figure 9-6, Microsoft Defender for Cloud warns against setting CORS to *.

Important

Don’t set CORS to *.

A Microsoft Defender for Cloud warning indicating that it has detected an overly permissive CORS setting of *, which means “all sites.”

FIGURE 9-6 The CORS recommendation in Defender for Cloud; note the manual remediation suggests removing *.

Contrary to popular belief, the HTTP Strict-Transport-Security (HSTS) header does not force all HTTP connections to HTTPS. HSTS requires one validated HTTPS connection to the server (that is, with no certificate errors) before the browser will honor the HSTS header. The reason is that an attacker can manipulate a plain-HTTP response, including its headers, making an HSTS setting delivered over HTTP untrustworthy. However, browsers also maintain an HSTS preload database of known sites that require HTTPS. To add your site to the list, go to the HSTS Preload Submission site at https://hstspreload.org/.

Cookies, which are also set via headers, should set the following flags:

  • HttpOnly

  • Secure

  • Expires

  • SameSite

HttpOnly helps prevent XSS attacks from stealing cookie content. While you should certainly set cookies to use this option, be aware that it has value only if the cookie is used for authentication purposes, and it's not a great defense against some forms of XSS. However, using cookies for session state and client configuration access is better than using HTML5 web storage. You can read more on this here: https://azsec.tech/bpw. You should also mark cookies with the Secure flag so they are passed between the browser and server only when using HTTPS (HTTP over TLS).

Cookies should be set to expire using the Expires flag. The duration is up to you and depends on your business requirements because you need to balance security with usability. We have seen many developers set this flag to between one and six months.

SameSite provides some protection against cross-site request forgery (CSRF) attacks by restricting when cookies are sent. If you set the SameSite=Strict flag on a cookie, then a browser sends the cookie only for first-party context requests. These are requests originating from the site that set the cookie. If the request originated from a different URL than that of the current location, none of the cookies tagged with the Strict attribute is sent. You can read more about the flag at https://azsec.tech/bux to determine which SameSite attribute works best for your sites. Here is an example cookie with these flags set:

Set-Cookie: sessionid=c92620e1; HttpOnly; Secure; SameSite=Strict; Expires=Fri, 1 Jul 2022 00:00:00 GMT

Important

HTTP security headers are a useful defense against some classes of attacks, but they should be considered an extra defensive layer only.

A06: Vulnerable and outdated components

Further information https://azsec.tech/it1

Many applications are built with other components, such as libraries. These components might not be controlled by you, so if they contain vulnerabilities, you will need to update your solution to use fixed versions of those components.

It is incredibly important to maintain a software bill of materials (SBOM) that tracks all the components that comprise your applications. This includes the following:

  • Component or package name

  • Version number

  • Vendor or source code repository

  • Programming language

  • Location of security information about that component

Note

A 2021 Executive Order in the United States called out the need to use an SBOM to improve cybersecurity. You can read the relevant document here: https://azsec.tech/kpi in §10.j.

GitHub has a tool called Dependabot to help you identify third-party packages you use that are out of date and have security issues. You can learn more here: https://azsec.tech/hud. This tool is just one part of GitHub’s security supply chain initiative and is discussed in further detail at https://azsec.tech/m0q.

A07: Identification and authentication failures

Further information https://azsec.tech/f3u

In the world of cloud-based solutions, identity is critical. Still, network boundaries are important—especially when it comes to the need for network segmentation in a zero-trust environment. A core part of identity is strong authentication. This is also a core element of your threat models. All processes must be authenticated, as well as devices and users.

Interestingly, the OWASP documentation calls out CWE-297, “Improper Validation of Certificate with Host Mismatch.” We see this error often, but it extends beyond host name validation. If your code initiates a connection over TLS, it’s important that the code performs all the relevant certificate checks. These include the following:

  • The certificate’s signature is valid. This verifies that it is not tampered with.

  • The name in the certificate matches the name of the system or user you want to communicate with.

  • The certificate chains up to a trusted root CA certificate. This is a critical step to prove the certificate is trusted.

  • The extended key usage is correct—for example, server authentication.

  • The date range is valid. That is, today is between the NotBefore and NotAfter fields.

  • Whether the certificate is revoked. There are two common ways to check if a certificate is revoked. The first is to use a certificate revocation list (CRL). The location of the CRL is held in the certificate using a CRL distribution point (CDP), which is just a URL. The other way to check if a certificate is revoked is to use the Online Certificate Status Protocol (OCSP).

You will probably never write code to perform these steps. Instead, you should leave the validation to the libraries you use. For example, in .NET, the SslStream() class calls a delegate that can perform actions based on errors validating a certificate. Unfortunately, we see code like this quite often:

// This is invoked by the SslStream(..,..RemoteCertificateValidationDelegate, ..)
public static bool ValidateServerCertificate(
       object sender,
       X509Certificate certificate,
       X509Chain chain,
       SslPolicyErrors sslPolicyErrors) {
     // Ignore all certificate errors
    return true;
}

This code ignores certificate errors and is a serious security vulnerability.
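
By contrast, a minimal safe version of the same delegate simply refuses any connection for which the platform reported a validation error:

public static bool ValidateServerCertificate(
       object sender,
       X509Certificate certificate,
       X509Chain chain,
       SslPolicyErrors sslPolicyErrors) {
    // Any error (name mismatch, untrusted chain, expired certificate, and so on)
    // fails the connection
    return sslPolicyErrors == SslPolicyErrors.None;
}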

On the topic of certificates, be careful with self-signed certificates. If you must use self-signed certificates, check the certificate’s signature and thumbprint against a list of predetermined valid certificates. The list must have a strong access policy so it cannot be changed by untrusted entities.

Even then, be careful. Some people use self-signed certificates in test and use “real” CA-issued certificates in production. The problem with this is the test environment does not perform the same certificate tests as would be performed in production. The best way around this is to set up a certificate authority yourself—for example, using Active Directory Certificate Services (AD CS). You can install the root CA certificate from this CA on your development and test machines and test end-to-end certificate-based authentication including revocation. And because you control the certificates, you can create certificates that are invalid to see how your code reacts to, say, an expired server certificate or one where key usage is for client authentication when you expect server authentication.

Tip

Microsoft has a Learning Path on the topic of certificates here: https://azsec.tech/xkq. This site does suggest using self-signed certificates for testing, which we don’t agree with, however.

A08: Software and data integrity failures

Further information https://azsec.tech/0kv

The issue here is code or data being compromised and changed. If you download code that is not digitally signed or you do not verify an out-of-band hash, there’s no way to know if the code or data is valid. In fact, looking at our vaccination booking example, the data really should be digitally signed by the vaccination center, so we know for certain it’s from the correct one.

You can create a digital signature quite easily as long as you have a certificate that has key usage for digital signatures and the associated private key. In Windows, these certificates are often referred to as Authenticode certificates.

Not only does PowerShell support signing (using cmdlets such as Set-AuthenticodeSignature to sign PowerShell files), it can also be configured, through its execution policy, to allow only digitally signed PowerShell files to run. You can see the result in the following code. The signature block is a relatively constant size, so it looks large next to a one-line script, but it would be about the same size for a much larger script.

"Hello, World!"

# SIG # Begin signature block
# MIIFkQYJKoZIhvcNAQcCoIIFgjCCBX4CAQExCzAJBgUrDgMCGgUAMGkGCisGAQQB
-- SNIP --
# 1vbbaAkllAAg9aDVb6EXPY+AIKr5UYJli8WhCfVbWo+qV1EeWUXKlJEEDOTTaH3K
# mvZISeSM+yS+yN+hz6hNPxdy3S6QuWhjrFleL5hj1p2+OO8RAA==
# SIG # End signature block

Key Vault can perform signing operations within the vault, and the keys can be held in hardware if you use an RSA-HSM or EC-HSM key. The following simple C# test code demonstrates how to sign arbitrary data. This sample creates a signature by signing the SHA-512 hash using a P-521 elliptic curve.

using Azure.Identity;
using Azure.Security.KeyVault.Keys.Cryptography;
using System.Security.Cryptography;
using System.Text;

// You should pull the hash alg from a config file
string HASH_ALG = "SHA512";
SignatureAlgorithm SIG_ALG = SignatureAlgorithm.ES512; // P-521 Curve
var keyVaultUrl = "https://mykv.vault.azure.net/keys/key-name/version";

var creds = new DefaultAzureCredential(true);
var keyCryptoClient = new CryptographyClient(new Uri(keyVaultUrl), creds);

var msg = Encoding.UTF8.GetBytes("A message we want to sign");
var digest = HashAlgorithm.Create(HASH_ALG)?.ComputeHash(msg);

SignResult sig;
if (digest is not null) {
    sig = keyCryptoClient.Sign(SIG_ALG, digest);
    Console.WriteLine(Convert.ToBase64String(sig.Signature));
}

Images Note

You need to match the signature algorithm with the hash function correctly. For example, the previous code uses a P-521 curve with an SHA-512 hash.

This code emits a Base64-encoded signature block, which could travel with the text file or separately. For example, you could do something like this:

A message we want to sign
--SIG--
AALaNsWvS9atK8xl4BqsqEdps0+IykG7bxwKXXZhyJ6zR/
rAKtYTTgpMvqqO1VyX6wEwFrwGTRb7Sl9kJeiPALmHAQ6BB6jeT/YnrRQHzBKqpoaMy9zRCWC/oDUW+ufAUnIOVsBM5WttcY
Zuj0JcEAjraGwB6eQWoc9cK23O1iM/DKNT
--ENDSIG--

You could then do the reverse operation: signature verification using CryptographyClient.Verify(). There is a twist, though: you probably won't pull the public key from Key Vault. Instead, you'll use the public key in a certificate so you can perform all the certificate checks mentioned earlier. This not only verifies the certificate, it also ties the data to the name in the certificate. For example, if the certificate's subject contains Joe's email address, then you know the data came from Joe. You could also use something like XMLDSig or JSON Web Signature (JWS) to sign and verify data, so you don't need to do the low-level work.
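
That said, for a quick test you can let Key Vault do the verification for you. The following is a minimal sketch that signs and verifies in one round trip (the vault URL and key name are placeholders; in production the signature arrives with the message, and you would verify with the certificate's public key as just described):

using Azure.Identity;
using Azure.Security.KeyVault.Keys.Cryptography;
using System.Security.Cryptography;
using System.Text;

var keyVaultUrl = "https://mykv.vault.azure.net/keys/key-name/version";
var keyCryptoClient = new CryptographyClient(new Uri(keyVaultUrl), new DefaultAzureCredential(true));

var msg = Encoding.UTF8.GetBytes("A message we want to sign");
var digest = SHA512.HashData(msg);

// Sign, then verify the same digest; a tampered message or signature makes IsValid false.
SignResult sig = keyCryptoClient.Sign(SignatureAlgorithm.ES512, digest);
VerifyResult check = keyCryptoClient.Verify(SignatureAlgorithm.ES512, digest, sig.Signature);
Console.WriteLine(check.IsValid ? "Signature is valid" : "Signature is NOT valid");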

Remember, this section is all about integrity and authenticity. There is no encryption. It is common to do both, however. Our vaccination file could append the signature to the CSV file. The data is not encrypted, but we still have integrity and authentication due to the digital signatures, as long as your code performs all the relevant certificate checks before verifying the signature.

If a service needs to verify data, it can also use a message authentication code (MAC), which uses a symmetric key rather than a public/private key pair. This requires every participating service to have access to the MAC key, which is why systems that use a MAC are usually one-to-one rather than one-to-many: it keeps the MAC key distribution as small as possible. For example, TLS uses one MAC key for client-to-server communication and a different MAC key for server-to-client communication.
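
As a concrete illustration, here is a minimal HMAC-SHA256 sketch (the key is generated on the spot purely for illustration; in a real system both parties must already share the key over a protected channel):

using System.Security.Cryptography;
using System.Text;

// Shared secret; in a real system this is provisioned to both parties, never generated ad hoc.
byte[] macKey = RandomNumberGenerator.GetBytes(32);

byte[] data = Encoding.UTF8.GetBytes("A message we want to authenticate");
byte[] mac = HMACSHA256.HashData(macKey, data);

// The receiver recomputes the MAC over the received data and compares in constant time.
bool authentic = CryptographicOperations.FixedTimeEquals(mac, HMACSHA256.HashData(macKey, data));
Console.WriteLine(authentic ? "Data is authentic" : "Data has been tampered with");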

What about using a hash? You can use a hash so long as:

  • The hash is not included with the data to be verified.

  • The hash is available over an authenticated and protected channel—for example, using TLS.

The hash cannot travel with the data because the data could be tampered with and the hash recalculated.
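
A sketch of that pattern follows (the URLs are hypothetical; the important part is that the expected hash comes from a separate, TLS-protected request rather than traveling with the data):

using System.Net.Http;
using System.Security.Cryptography;

using var http = new HttpClient();

// The data and its expected hash are fetched separately, both over TLS.
byte[] data = await http.GetByteArrayAsync("https://downloads.contoso.example/report.csv");
string expected = (await http.GetStringAsync("https://downloads.contoso.example/report.csv.sha256")).Trim();

string actual = Convert.ToHexString(SHA256.HashData(data));
bool ok = string.Equals(actual, expected, StringComparison.OrdinalIgnoreCase);
Console.WriteLine(ok ? "Hash matches" : "Hash mismatch; do not trust the data");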

A09: Security logging and monitoring failures

Further information https://azsec.tech/4oa

This topic is covered in detail in Chapter 6.

A10: Server-side request forgery (SSRF)

Further information https://azsec.tech/3pn

Once again, this is an input trust issue. The core lesson here is to never allow an attacker to control a URL. If you must have a user control the URL, then have a list of valid URLs to validate against and only allow those URLs.

SSRF-based attacks are common against cloud-based solutions. It is believed the 2019 Capital One breach, which yielded more than 100 million customer records, was an SSRF attack against the company's AWS infrastructure. You can read about the Capital One attack here: https://ejj.io/blog/capital-one.

The problem is incredibly simple:

  1. A vulnerable server accepts a URL from an untrusted source (an attacker).

  2. The server does not verify the URL is valid and correctly formed.

  3. The server accesses the data at that URL.

  4. The server returns the object at that URL back to the attacker.

The problem is that if the web server has access to internal resources and the attacker-supplied URL points to a resource on the internal network, the attacker can start reading resources on the internal network! An example of an internal resource that is common to cloud platforms is the Instance Metadata Service (https://azsec.tech/meta), which, on Azure, is accessible on a nonroutable IP address, 169.254.169.254.
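
A minimal allow-list check might look like the following sketch (the allowed host names are hypothetical); the key point is that the URL is validated before your code ever fetches it:

string[] allowedHosts = { "api.contoso.example", "images.contoso.example" };

bool IsAllowedUrl(string candidate)
{
    // Reject anything that is not a well-formed, absolute HTTPS URL.
    if (!Uri.TryCreate(candidate, UriKind.Absolute, out var uri)) return false;
    if (uri.Scheme != Uri.UriSchemeHttps) return false;

    // Only hosts on the allow list are acceptable; everything else is rejected.
    foreach (var host in allowedHosts)
        if (string.Equals(uri.Host, host, StringComparison.OrdinalIgnoreCase)) return true;

    return false;
}

Console.WriteLine(IsAllowedUrl("https://api.contoso.example/items/42"));       // True
Console.WriteLine(IsAllowedUrl("http://169.254.169.254/metadata/instance"));   // False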

Images Important

You must always verify any IP address or DNS name that could be provided by an attacker to make sure it is a valid address or name.

Another attack, cross-site request forgery (CSRF), sounds like SSRF, but the two are not the same. In fact, CSRF was on the OWASP Top 10 back in 2017, but it's not now, mainly because most web frameworks, such as ASP.NET MVC, offer built-in defenses. You can read more here: https://azsec.tech/0py. The remedy for CSRF is to include a random, session-bound token with each request, but this is always best left to the web framework.
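
For example, in ASP.NET Core MVC a sketch of letting the framework do the work might look like this (the controller and model are made up):

using Microsoft.AspNetCore.Mvc;

public class BookingController : Controller
{
    // The framework validates the antiforgery token it issued with the form;
    // requests without a valid token are rejected before this action runs.
    [HttpPost]
    [ValidateAntiForgeryToken]
    public IActionResult Create(BookingModel model)
    {
        // ... handle the booking ...
        return Ok();
    }
}

public class BookingModel
{
    public string? CenterId { get; set; }
}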

Comments about using C++

Some of the code samples earlier in this chapter were written in C++, and we want to explain ourselves. While we don’t see much custom-written cloud-related code written in C++, we do see code in VMs running Linux and Windows, containerized code, embedded systems, and IoT solutions using C++.

We have three items of advice when it comes to writing C++ code:

  • Don’t write C++ as glorified C.

  • Use all security-related compiler and linker defenses available to you.

  • Use analysis tools.

Let’s look at each.

Don’t write glorified C

C is a low-level language designed as a replacement for assembly language. The lineage of many modern programming languages originates in C.

C is small, is efficient, and has unfettered access to process memory. It is this last point that also makes C a language that requires care. The documentation for the original commercial C compiler from Whitesmiths, Ltd., in 1982 said, “C is too expressive a language to be used without discipline.” (A copy of this manual is available at https://azsec.tech/7c2; the quote is on page 22.)

C++ offers numerous quality, security, and robustness benefits over C. But you must use these features to obtain these benefits. You might have noticed that the sample C++ code we wrote, while simple, uses C++ idioms, has no direct pointer use, and passes large function arguments by reference. Honestly, it might have been easier in some examples to use low-level C, but using Modern C++ (https://azsec.tech/ko3) and the C++ Standard Library (STL; https://azsec.tech/odj) makes the code safer by far.

Use compiler and linker defenses

The two most popular C and C++ compilers are Microsoft Visual C++ and GNU gcc. Both provide extensive options to emit safer code that includes various defenses against memory corruption vulnerabilities.

For Visual C++, the following is a list of commonly used compiler and linker options and links to resources for more information. Some are enabled by default for C++ projects.

  • /SDL This enables additional security checks, including enhanced stack-corruption detection (https://azsec.tech/azk).

  • /DYNAMICBASE This uses address space layout randomization (ASLR) to randomly rebase the application at load time (https://azsec.tech/1s3).

  • /NXCOMPAT This indicates that the application supports No eXecute (NX), also known as Data Execution Prevention (DEP) (https://azsec.tech/3nt).

  • /SAFESEH This emits code that has a set of safe exception handlers (https://azsec.tech/18y).

  • /GUARD This enables Control Flow Guard (https://azsec.tech/c5y) and Exception Handler Continuation Metadata (https://azsec.tech/o1p).

  • /CETCOMPAT This marks the binary as compatible with hardware-enforced stack protection (https://azsec.tech/s1s and technical write-up at https://azsec.tech/i17).

  • /fsanitize=address This enables AddressSanitizer, which instruments the code at compile time to find memory safety issues and fail fast at runtime. It can be treated as a replacement for /RTC (runtime checks) and overlaps with /analyze (static analysis) (https://azsec.tech/vom).

For gcc, the resources in the following tip describe a commonly used set of security options that should be employed.

Images Tip

The GitHub Gist at https://azsec.tech/y1r offers an up-to-date list of security-related gcc flags and options. Also, the following article about FORTIFY_SOURCE from Red Hat is a worthwhile read: https://azsec.tech/g4o.

All these options detect potential memory safety issues at compile time or at runtime. You should use as many of them as possible. Many of them fail fast in the face of an issue, enabling you to debug and fix the issue quickly.

Use analysis tools

Both gcc and VC++ include baseline code analysis functionality. While you probably don’t need to run the analysis on each build, you should perform it at least once per sprint so you can catch any issues quickly and fix them. Also, although these tools have a strong focus on security, they find other issues, too.

Images Note

As noted, both VC++ and gcc support AddressSanitizer (/fsanitize=address), which complements the static code analysis built into each compiler.

Visual C++ also has an /analyze option that performs various code quality and security checks on your code. There is an overlap between what /fsanitize=address can find and what /analyze can find. In our opinion, for native C and C++ code, you should use both until you determine that one performs better on your codebase. You can find information about /analyze here: https://azsec.tech/r7c.

In addition, Microsoft Research created the source-code annotation language (SAL) used in the Windows SDK and Microsoft Runtime headers. You should seriously consider using it in your custom C++ headers, too. You can read more about SAL here: https://azsec.tech/rxy.

Finally, the Visual C++ team wrote an excellent article about NULL-dereference detection using the static analysis tools built into the compiler. You can read about it here: https://azsec.tech/bxv. For gcc, a compiler option was added in gcc v10 to provide lightweight static analysis. You can learn more here: https://azsec.tech/87z.

Images Tip

Use these free and easy-to-use compiler tools often and early to give yourself enough time to fix any issues before you ship.

On the GitHub side, CodeQL is a tool that is free for research use and free when used with open source software. CodeQL (previously named Semmle) is a rich static analysis solution that offers advanced capabilities such as data flow and control flow analysis across various programming languages, including C, C#, C++, Go, Java, JavaScript, Python, Ruby, and TypeScript.

Images Tip

You can experiment with CodeQL over on the https://lgtm.com/ website. The inside joke is “LGTM” means “Looks Good To Me,” but in fact, it was a cheap domain name!

CodeQL builds a database of the code. Then, a developer can use SQL-like queries to search the code for specific conditions that indicate code quality bugs, including vulnerabilities. A nice feature is that the CodeQL query language is the same regardless of the underlying programming language. Also, because anyone can create CodeQL queries, CodeQL democratizes query creation.

Images Note

In addition to CodeQL, GitHub also offers GitHub Advanced Security, which can automate many checks against your repos.

If you are reviewing commercial analysis tools, consider using tools that support Static Analysis Results Interchange Format (SARIF). You can read more about it here: https://sarifweb.azurewebsites.net/. SARIF allows the output of multiple analysis tools to feed into various reporting and all-up analysis tools. CodeQL supports SARIF, as discussed here: https://azsec.tech/7f2.

Images Tip

There are several other tools besides these. OWASP offers a list of analysis tools at https://azsec.tech/aww.

Security code review

Chapter 1, “Software development lifecycle processes,” discusses security code review in the context of agile methods. This section covers it more generally.

Code review is critical to help secure your code. Over the years, we have devised a high-level approach to rapidly review code that leverages everything we have covered so far in this chapter. Remember, you have only a finite amount of time to review code, so we focus here on the code that is most at risk. Our code-review approach includes these steps:

  1. Refer to the threat model to identify all the entry points into the code, especially those that cross trust boundaries. You should be able to match these entry points with some code construct such as a REST API, a WebSocket, a UDP socket, and others.

  2. Order the entry points by attack surface, from high to low. This will be the order in which you review the code, from the top to the bottom.

  3. From the entry point, determine the data that’s coming in. This might be a JSON payload, arguments in an HTTP query string, or a buffer from a socket. This is the data you need to evaluate carefully. In analysis parlance, it is called the source.

  4. Trace the data from step 3 through the code. At every line of code where the data is used, determine whether the construct is safe. A point where data is used is called a sink. Essentially, you want to track data from its source to all its sinks, making sure the data is correct and used appropriately. The correct pattern is this: data enters at a source, is validated as early as possible, and is then used safely by the various sinks.

Let’s take a simple example with the following parameters:

  • A data source is a JSON payload that is part of a POST REST API call.

  • One element of the JSON file is a number—let’s say it’s an unsigned 32-bit integer called 'count'—that indicates how many items to deal with.

  • The code uses 'count' + 1 to dynamically allocate some objects to store the items. This is the sink. The code adds the +1 because it may need room for one extra object.

  • There is no checking to constrain the value of 'count'.

The last two points are unsafe for many reasons. For example:

  • What if the count is not a number, but it’s the letter Q instead?

  • What if the number is 1,000,000? Do you really want the attacker to force your application to allocate 1 million objects?

  • What if the number is 4,294,967,295, the largest value an unsigned 32-bit integer can hold? The code won't allocate 4 billion objects. It'll allocate zero! Why? Because 4,294,967,295 + 1 wraps around to 0 in 32-bit arithmetic. It's an integer overflow problem. So, your code allocates space for zero objects, attempts to copy data into that memory, and crashes. (A defensive sketch follows this list.)
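
Here is a minimal defensive sketch of that sink (the payload, field name, and upper bound are illustrative); it rejects anything that is not a number in the expected range before any arithmetic or allocation happens:

using System.Text.Json;

const uint MaxItems = 1000;   // an application-specific upper bound on 'count'

string payload = "{\"count\": 42}";
using var doc = JsonDocument.Parse(payload);

// Reject non-numeric values and anything outside the expected range before allocating.
if (!doc.RootElement.TryGetProperty("count", out var countElement) ||
    countElement.ValueKind != JsonValueKind.Number ||
    !countElement.TryGetUInt32(out uint count) ||
    count > MaxItems)
{
    Console.WriteLine("Invalid 'count' value; rejecting the request.");
    return;
}

// checked() turns a silent wraparound of count + 1 into an OverflowException.
uint toAllocate = checked(count + 1);
var items = new object[toAllocate];
Console.WriteLine($"Allocated {items.Length} slots.");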

Having code handle incorrect data gracefully is critically important to both the security and robustness of your code. Fortunately, there’s a way to test your code against malformed data. That process is called fuzz testing, and it’s discussed next.

Keeping developers honest with fuzz testing

At the start of the chapter, we pointed out how important it is to validate input. This is probably the most important defensive security skill a developer needs to understand. We also defined what the incoming data must look like to be deemed valid by the application.

Let’s return to our vaccination booking application. To recap, the data coming from a vaccination center is a CSV file that comprises the following fields:

centerId, date and time, vaccination type, open spots, comment

The formats for each data item are as follows:

  • centerId This is the vaccination center ID. It is a six-digit number. This field cannot be blank.

  • date and time For this, you use the ISO 8601 date and time format. This field cannot be blank.

  • vaccination type This is a string that represents the name of a valid vaccine manufacturer. This field cannot be blank.

  • open spots This is a number from 0 to 999. This field can be blank; when it is, it equates to zero.

  • comment This is a text field in which a representative from the vaccination center can enter a comment. This comment can be up to 200 characters long.

The CSV file will have one line for each vaccination center, date/time, and vaccine type.
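
To make the target concrete, here is a rough sketch of the kind of per-row validation the function should perform (the list of vaccine names is hypothetical, and a production version would use a proper CSV parser and report which field failed):

using System.Globalization;
using System.Text.RegularExpressions;

bool IsValidRow(string row)
{
    // A real parser would handle quoted fields; Split is enough for this sketch.
    var fields = row.Split(',');
    if (fields.Length != 5) return false;

    // centerId: a six-digit number, not blank.
    if (!Regex.IsMatch(fields[0].Trim(), @"^\d{6}$")) return false;

    // date and time: ISO 8601, not blank.
    if (!DateTimeOffset.TryParse(fields[1].Trim(), CultureInfo.InvariantCulture,
                                 DateTimeStyles.None, out _)) return false;

    // vaccination type: must be a known manufacturer, not blank.
    string[] knownVaccines = { "PharmaA", "PharmaB" };
    if (Array.IndexOf(knownVaccines, fields[2].Trim()) < 0) return false;

    // open spots: blank means zero; otherwise a number from 0 to 999.
    var spots = fields[3].Trim();
    if (spots.Length > 0 && (!int.TryParse(spots, out int n) || n < 0 || n > 999)) return false;

    // comment: up to 200 characters.
    return fields[4].Trim().Length <= 200;
}

Console.WriteLine(IsValidRow("123456, 2022-03-16T15:50-06:00, PharmaA, 4, Last batches"));  // True
Console.WriteLine(IsValidRow("123456, not-a-date, PharmaA, 4, Last batches"));              // False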

If we know what correct data is supposed to “look like,” we should build tests that create data that is purposefully malformed and have the application—in this case, an Azure Function—consume the malformed data to see how the function reacts. If the programmers created robust code, the function should handle all incoming data whether it is well-formed or not.

This kind of testing is just one form of nonfunctional security testing and is often called fuzz testing.

Images Important

You should add fuzzing to your application tests. Fuzz tests should be part of your normal set of regression or reliability tests. You can start small and add more coverage over time. But you should start today. Come up with a plan to add fuzz testing to your current development practices as soon as you possibly can.

Let’s assume the function returns a 200 HTTP status on success and a 401 on any kind of failure. Now you need to build a test harness that creates malformed data and sends it to the Azure Function. You have two options:

  • Use a preexisting fuzzing tool.

  • Create your own custom test cases.

There are many open source and commercial fuzzing tools. It's worth spending some time reviewing a few of them to see whether they offer value to you and your organization. If nothing else, these tools will help you understand how fuzzing works.

In our example, vaccination centers upload their CSV file to an Azure Storage account. From there, it is read by an Azure Function using a Blob Storage Account trigger. (Find out more here: https://azsec.tech/zu8.) It is this file we will fuzz.
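
For orientation, a minimal sketch of such a blob-triggered function might look like this (the function and container names are made up, and the in-process programming model is assumed):

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class AppointmentsIngest
{
    // Fires whenever a CSV file lands in the 'appointments' container.
    [FunctionName("AppointmentsIngest")]
    public static void Run(
        [BlobTrigger("appointments/{name}")] string csvContents,
        string name,
        ILogger log)
    {
        log.LogInformation("Processing {Name}, {Length} characters", name, csvContents.Length);
        // Validate every row before acting on it; reject the file on the first malformed row.
    }
}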

Images Note

If the Azure Function were called directly by the vaccination center by calling an API, we’d fuzz the API endpoint directly rather than upload a fuzzed file.

There are numerous ways to fuzz data. Here are a few:

  • Generate totally random data.

  • Mutate existing data.

  • Intelligently manipulate data knowing its format.

Let’s look at each of these methods through the lens of our sample CSV file. Fuzzing is an area of much research in academia and industry, but we’ll keep things simple here.

Images Important

Don’t fuzz using production data.

Generating totally random data

This is simple! Just create a random set of bytes. In our example, we’d probably create a series of lines of random data. The following PowerShell code does this:

Set-StrictMode -Version latest

function Get-RandomLine {
    param (
        [Parameter(Mandatory)] [int] $len
    )
    $charSet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789{]+-,[*=@:)}$^%;(_!&#?>/|.'.ToCharArray()

    $rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()
    $bytes = New-Object byte[]($len)
    $rng.GetBytes($bytes)

    $data = New-Object char[]($len)

    for ($i = 0 ; $i -lt $len ; $i++) {
        $data[$i] = $charSet[$bytes[$i] % $charSet.Length]
    }

    return -join $data
}

$fileName = "appointments.csv"
"" | out-file $filename

$numLines = Get-Random -Minimum 2 -Maximum 20
1..$numLines |% {
    $len = Get-Random -Minimum 10 -Maximum 256
    Get-RandomLine $len | out-file -Append $fileName
}

You could modify the $charSet variable to include other characters, of course, but this is a reasonable start. It creates a text file of random lines. This file could be uploaded to the storage account; the Azure Function then picks it up and parses it. The function code should utterly reject this file because it is syntactically incorrect, if for no other reason than the code cannot split the lines into the five discrete fields (unless, by luck, a random line includes four commas, which can happen, but it’s unlikely).

Now let’s make the fuzzing a little smarter. Rather than creating random data, let’s create five fields of random data. This should get us past the code that checks the number of fields in each row. We can tweak the PowerShell code to this:

$fileName = "appointments.csv"

"" | out-file $filename

$numLines = Get-Random -Minimum 2 -Maximum 20
$numFields = 5

1..$numLines |% {
    $line = ""
    1..$numFields |% {
        $len = Get-Random -Minimum 0 -Maximum 64
        $field = Get-RandomLine $len
        $line += ($field + ",")
    }

    $line.TrimEnd(",") | out-file -Append $fileName
}

This produces a CSV file made up of a random number of lines, and each line comprises comma-separated random characters. This might get past the check for the number of fields, but some lines might fail because the comma is in the list of random characters, and we get more than four commas.

This will work in some cases, but not many. There’s a fine line between data that’s “corrupted a little bit” and data that’s a complete mess. Most systems will detect incoming data that is completely random. Again, the CSV data is probably syntactically incorrect. We can do better!

Mutating existing data

This is a favorite among security researchers because it’s effective. You take a corpus of valid data, tweak the data, and then send the data to the application and see how it reacts. For example, if this were a REST endpoint, you could tweak the JSON payload before it hits the wire, where it would then be consumed by the server.

For our example, we need to collect or generate a corpus of thousands of valid lines of data that go in the CSV file. But, to keep this simple, we’re going to work with one line. Remember, the format is as follows:

centerId, date and time, vaccination type, open spots, comment

We will use this:

98722, 2022-03-16T15:50-06:00, PharmaA, 4, These are the last batches of PharmaA

Recall that we said you want to find the fine line between tweaking the data slightly and making it totally random. Here, we set a threshold so that we make only small changes. Examples of these small changes might include the following:

  • Inserting random characters

  • Deleting random characters

  • Setting or resetting the high bit of a character

  • Byte swapping

  • Truncating the data

  • Adding data at the end or start of the data

  • Inserting special characters (for example, file system special characters, like \, /, :, and so on) or character sequences (for example, HTML or SQL)

  • Inserting interesting numbers, such as 2^n - 1, 2^n + 1, 0, MAX_INT, MAX_UINT

Images Tip

On the topic of interesting special characters, there’s a “big-list-of-naughty-strings” at https://azsec.tech/oql worth looking at and using.

The following snippet of C# code is part of a fuzzer class (the class declaration and its fields are not shown). The Fuzz method takes a byte array and fuzzes it only if a threshold is met; for example, we might want to fuzz only 5 percent of the incoming data. You need to find the right threshold for your data. The code also performs multiple passes over the incoming data, so a string can receive more than one mutation. Finally, you can set a random number seed to reproduce an error.

public SimpleFuzz(int threshold = 5, int? seed = null) {
    _rnd = seed is null ? new Random() : new Random((int)seed);
    _threshold = threshold;
}
public byte[] Fuzz(byte[] input) {
    if (input.Length == 0) return Array.Empty<byte>();
    if (_rnd.Next(0, 100) > _threshold) return Array.Empty<byte>();

    var mutationCount = _rnd.Next(1, 5);
    for (int i = 0; i < mutationCount; i++) {

        if (input.Length == 0) break;

        var whichMutation = _rnd.Next(0, 7);

        int lo = _rnd.Next(0, input.Length);
        int range = _rnd.Next(1, 1+ input.Length / 10);
        if (lo + range >= input.Length) range = input.Length - lo;

        switch (whichMutation)
        {
            case 0: // set all upper bits to 1
                for (int j = lo; j < lo + range; j++)
                    input[j] |= 0x80;
                break;

            case 1: // set all upper bits to 0
                for (int j = lo; j < lo + range; j++)
                    input[j] &= 0x7F;
                break;

            case 2: // set one char to a random value
                input[lo] = (byte)_rnd.Next(0, 256);
                break;

            case 3: // insert interesting numbers
                byte[] interesting = new byte[] { 0, 1, 7, 16, 15, 63, 64, 127, 128, 255 };
                input[lo] = interesting[_rnd.Next(0,interesting.Length)];
                break;

            case 4: // swap bytes
                for (int j = lo; j < lo + range; j++) {
                    if (j + 1 < input.Length)
                        (input[j + 1], input[j]) = (input[j], input[j + 1]);
                }
                break;

            case 5: // remove sections of the data
                input = _rnd.Next(100) > 50 ? input[..lo] : input[(lo + range)..];
                break;

            case 6: // add interesting pathname/filename characters
                var fname = new string[] { "\\", "/", ":", ".." };

                int which = _rnd.Next(fname.Length);
                for (int j = 0; j < fname[which].Length; j++)
                    if (lo+j < input.Length)
                        input[lo+j] = (byte)fname[which][j];

                break;

            default:
                break;
        }
    }
    return input.ToArray();
}

Figure 9-7 shows part of the Locals window in Visual Studio, and you can see that the incoming string, named data, has been partially corrupted in the res variable. Part of the first field is removed, part of the second field looks like the upper bit is set, and some parts of the last field have been flipped.

A screenshot of the Locals window in Visual Studio 2022 showing some variables. Some clean data is held in the data variable; this is then output in the res variable after being fuzzed. The res variable shows signs of corruption such as missing data and invalid data.

FIGURE 9-7 An example of a fuzzed row of the sample CSV file.

As you can see, the fuzzing is much subtler than the previous methods of using random data. The goal of more subtle fuzzing is to attempt to get greater code coverage, if possible.

Images Tip

This fuzzing code sample is available at the book’s GitHub repository, at https://github.com/AzureDevSecurityBook.

Intelligently manipulating data knowing its format

The last type of fuzzing is when the fuzzing logic understands the data format—for example, if you have a custom-written graphic image parser that handles Portable Network Graphics (PNG) image files. Understanding the data format is important for some data types, because when the logic understands the format and structure of the file, it can make surgical changes to a file. A PNG file will have elements like X- and Y-dimensions, color depth, and more, that can be smartly fuzzed—for example, by the following:

  • Setting X or Y to negative numbers

  • Setting X or Y to alphabet characters

  • Setting X or Y to massive values

  • Setting the color depth to crazy values

Perhaps most importantly, PNG files also have checksums. If your code injects random data into the file, the file checksums will be incorrect, and the code you’re trying to fuzz will reject the file quickly, before it gets further into the code. This means that after the file is fuzzed, the checksums will need to be recalculated.

Fuzzing APIs

We want to wrap up this section with one more important topic: fuzzing APIs. Today, exposed APIs are a common target of attack. The issues can be code-level vulnerabilities as well as design flaws, such as a lack of authentication or authorization. The website APIsecurity.io (https://apisecurity.io/) is an excellent resource for learning about real-world API vulnerabilities.

Images Note

Akamai’s research shows that 25 percent of all traffic is API traffic. You can read the startling details here: https://azsec.tech/jd8.

You should perform security testing of your APIs—for example, those exposed by Azure Functions, Azure App Service, or code written in Node.js running in a container. Fuzzing is one type of test; you can take the aforementioned practices and ideas and apply them to APIs, too. For example, if an API uses the HTTP GET method to read a resource, you can corrupt parameters on the query string. Or if you use a POST to create a resource, you can corrupt the JSON or XML payload in the HTTP body.

Suppose our vaccination booking application uses an HTTP POST API call with a JSON payload rather than uploading a file to Azure. A request might look something like this:

{
    "centerId" : "98722",
    "date_time" : "2022-03-16T15:50-06:00",
    "vaccination_type" : "PharmaA",
    "open_spots" : "4",
    "comment" : "These are the last batches of PharmaA"
}

Of course, the payload will probably have more than one item, but we want to keep things simple. Your fuzzing logic can apply the same tactics to this payload by manipulating the individual fields of the JSON document.

Be careful when fuzzing files like JSON and XML. You probably don’t want to fuzz the structure of the file—for example, replacing the opening { with a random value—because corrupting the structure of the JSON file won’t get beyond the JSON parser, let alone get to the code you really want to test!

The following code is an update to the preceding code that can handle a JSON file:

public byte[] Fuzz(JsonDocument doc) {
    byte[] fuzzResult = Array.Empty<byte>();
    using var stream = new MemoryStream();
    using (var writer = new Utf8JsonWriter(stream)) {
        writer.WriteStartObject();
        foreach (var elem in doc.RootElement.EnumerateObject()) {
            writer.WritePropertyName(elem.Name);
            byte[] fuzzed = Fuzz(System.Text.Encoding.UTF8.GetBytes(elem.Value.ToString()));
            writer.WriteStringValue(fuzzed.Length > 0
                            ? System.Text.Encoding.UTF8.GetString(fuzzed)
                            : elem.Value.GetRawText().Trim('"'));
        }
        writer.WriteEndObject();
        writer.Flush();

        fuzzResult = stream.ToArray();   // ToArray copies only the bytes written
    }
    return fuzzResult;
}
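
A simple driver for this method might look like the following sketch (SimpleFuzz is the class shown earlier; the input file name and the always-mutate threshold are made up for the demonstration):

using System.Text.Json;

// A threshold of 100 means every field is mutated in this demonstration run.
var fuzzer = new SimpleFuzz(threshold: 100);
string payload = File.ReadAllText("appointment.json");

for (int run = 0; run < 15; run++)
{
    using var doc = JsonDocument.Parse(payload);
    byte[] fuzzed = fuzzer.Fuzz(doc);
    Console.WriteLine(System.Text.Encoding.UTF8.GetString(fuzzed));
}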

Figure 9-8 shows a run of about 15 generations of the preceding code. As you can see, every line has one or more subtle changes. Each line represents the same starting text, but the mutations are different each run. Ask yourself, do you think your API code could handle all these lines of JSON without crashing?

A screenshot of a partial dump of multiple fuzzing runs on a small JSON file. Each line shows the results of data corruption due to fuzzing.

FIGURE 9-8 The output of 15 runs of the JSON fuzzer.

One final note: you need to detect at the code level if the application fails to handle the incoming data. Historically, you would use a debugger and log the debug spew, but in Azure, you should use Application Insights to monitor the API or code as it responds to incoming nasty data.

Images Tip

If you need general API guidance, see https://azsec.tech/l4m.

Images Important

If applicable, you must test your IoT REST APIs, too. APIs are not just a “cloud-thing!”

Summary

We can’t stress enough how important it is that your code validates all incoming data. As we say in zero trust, “verify explicitly.” Most security vulnerabilities are input trust issues.

Be sure you review the OWASP Top 10 and stay on top of application security vulnerabilities in the industry. Review and learn from the OWASP Testing Guide at https://azsec.tech/fcr. Also, use static and dynamic analysis available in your development tools where possible.

Fuzz testing is a form of dynamic analysis. It is a useful technique for testing code that consumes files and code that exposes APIs. Even “light” fuzzing is a great test to make sure your code is resilient in the face of malformed input.

As discussed in more detail in Chapter 1, someone in your organization must stay up to date on new vulnerabilities, exploits, and defenses to make sure your code is built with the latest threats in mind. This is an important role!
