
12. Security


Building systems that are secure is a legal and ethical obligation for software developers, but the world of computer and information security can at first seem like an intimidating hurdle to a newbie, with the community often appearing cliquey and focused on hypotheticals. The reality is that the most secure system is one that is disconnected from the network and switched off, but this system is also unusable. Building a secure but usable system is a challenge that can only be met through compromise.

Like accessibility, security can at first seem to require specialized skills to address fully. However, for most applications, the specialist knowledge of a security expert is unnecessary, as you can use existing libraries and patterns for building a secure web application. In larger enterprises, a dedicated security team is common, but can be seen as a barrier to delivery. If you are able to talk confidently about the application and build simple, effective techniques in from the start, these barriers can be dropped and security built into your delivery process, yielding usable applications that are effectively secure.

Most of us aren’t building Mission Impossible-style computer systems, and although there are pieces of critical infrastructure that require greater knowledge than a generalist will possess, most web developers will never come across them. Even for those developers building SCADA systems for nuclear reactors, these concepts are a good start. By learning a few key principles and identifying the limitations in your knowledge, you can build systems that satisfy your security requirements.

Trust and Secrets

During the build of your application, it’s important to be able to spot when you’re working with something that might be a security-sensitive component. As a generalist, if you find yourself working directly with credit card processing, passwords, or encrypting/decrypting systems, you’re in the danger zone. Building these kinds of systems from scratch should not be approached lightly, and is often the wrong thing to do. There are many libraries and frameworks for these common tasks, so when you find yourself dealing with functionality such as this, using a reputable library (reputation can be determined by a number of factors, such as activity of a GitHub repository, or the number of StackOverflow questions about the framework, but this is far from a guarantee) will make your job a lot easier.

Security through obscurity means that you’re relying on the fact that a potential attacker does not know some bit of information needed to access a secured part of your system. For example, having a hidden URL for an admin section is considered security through obscurity. Relying on security through obscurity is very dangerous, but there is a difference between obscurity and secrets. Something that is secret should be very hard to guess, whereas something that’s obscure can be feasibly guessed or known. Most systems rely on some knowledge being hidden from a potential attacker—an API key, a password, or an SSH key—but this kind of knowledge can be considered a “secret” rather than just obscurity.

Secrets should be kept separate from your source code. In many organizations, access to the source code can be quite widespread—perhaps everyone has access to source control and can check out a read-only copy of your source code, or you invite a contractor in for a period of time to help out with some tricky functionality. Keeping secrets alongside the code could allow them to spread more widely than intended. This also means that it should not be the code itself that needs to be kept secret; if the information needed to circumvent security can be found in your code, then this is security by obscurity.

Secrets should be handled like nuclear waste—with care, and in a controlled manner. Access to secrets should be locked down on a need-to-know basis, and you should also change these secrets on a regular basis, particularly when someone leaves the organization (this is sometimes referred to as key rotation). There are many ways to manage secrets. Some deployment tools allow you to submit secret configuration values (so once the configuration has been set, it cannot be read by the user), which is sometimes appropriate, but other methods include having a separate config file containing the secrets, which is kept outside the main repository (such as only on the live server or in an encrypted shared drive) and can then be deployed from a trusted machine. Another option is to have a service that is responsible for managing secrets; other applications connect to it and, once authenticated, load their secrets from that service.
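
As a minimal sketch in Python, a secret can be kept out of the codebase by reading it from the environment at start-up. The variable name and failure behavior here are illustrative, not prescriptive:

import os

# Read a database password from the environment rather than
# hard-coding it in the source tree.
def get_db_password() -> str:
    password = os.environ.get("APP_DB_PASSWORD")
    if password is None:
        # Failing fast makes a missing secret obvious at startup,
        # rather than surfacing as a confusing error later.
        raise RuntimeError("APP_DB_PASSWORD is not set")
    return password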

Some secrets can be considered to be less secret than others. For example, a default password for a developer environment that can only be accessed over localhost may be safe to have in the codebase for speed of setting up new environments, as long as the database only contains dummy or test data.

Responding to Incidents

A security breach can be one of the highest-impact events to happen to an organization, but it’s impossible to know what the impact of a breach is without a full investigation.

In the UK, and in many other countries, organizations have a legal responsibility to notify any affected users of a data breach, notwithstanding any internal policies and ethical obligations to fully identify a breach and its impact. As with many things in life, the best approach here is to hope for the best, but plan for the worst. It’s important to make sure you have tools in place that will allow you to reconstruct as best you can any actions an attacker took, in order to understand the impact. For most systems, this means comprehensive audit logging.

It can at first be tempting to consider this type of logging to be similar to typical application logging, but it’s useful to separate audit logs into their own files, as these activities can clog up an application log when debugging typical errors. There are tools that assist with audit logging by providing tamper-proofing (so the logs cannot be edited after they have been written), which gives a high degree of confidence in your logs. It’s usually a good idea to store audit logs off the infrastructure they record, to make them harder to delete or tamper with.

The exact contents of audit logs will vary based on your application, but you will generally want to capture actions such as accessing or editing sensitive data. This can include the date/time, the user, their IP address, the action that was performed, and detailed information on what was changed. Logging when access was denied can also be useful, as this shows any unsuccessful attempts to infiltrate a system and gives you an idea of how far an attacker got. Full access logs from your web server can be used to identify any other suspicious activities an attacker performed.
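
A minimal sketch of what a separate, structured audit log might look like in Python follows; the field names and log destination are assumptions, and real systems often ship these records off-host:

import json
import logging
from datetime import datetime, timezone

# A dedicated logger keeps audit records out of the application log.
audit = logging.getLogger("audit")
audit.addHandler(logging.FileHandler("audit.log"))
audit.setLevel(logging.INFO)

def audit_event(user: str, ip: str, action: str, detail: dict) -> None:
    audit.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "ip": ip,
        "action": action,
        "detail": detail,
    }))

# e.g. audit_event("chris", "203.0.113.7", "customer.view", {"customer_id": 42})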

Some breaches happen as a result of weak or stolen passwords from a privileged user. Often this occurs using social engineering, which can be hard to defend against, but there are technological countermeasures that can be put in place, such as enforcing long passwords and implementing two-factor authentication.

Many breaches happen through exploiting a bug in the application, and looking at the application log alongside the audit log could therefore be enlightening. Any odd or unexpected exceptions or errors from the database (for example, a malformed SQL query) can help pinpoint a vulnerability.

Once the investigation is complete, the most important thing to do is correct the flaw. This could involve suspending a compromised user account, or pushing a code fix to correct a bug. With the impact of the attack understood, the organization should be able to respond appropriately. Although it can be tempting to downplay an attack, honesty is one of the best mechanisms to maintain trust in an organization; a slow drip of details that reveal a greater impact than was originally announced is far more damaging. Similarly, vagueness in any announcement can increase uncertainty. Honesty and specifics are the best policy here.

The Golden Rule

When building a web application, there is a golden rule to follow: validate on input, sanitize on output.

When receiving input from a user, or any external system, it is crucially important to make sure that the input is roughly what you expected it to be—this is validation. Generally, you should not try to clean up data a user has submitted except in very limited circumstances (such as stripping leading/trailing whitespace). This also makes for good UX. Most frameworks offer libraries that help you do this, and if you do get something that fails validation, you should abort immediately and show it back to the user as an error to correct, rather than trying to cleverly sanitize it.

Regular Expressions

Regular expressions (regexes) are a popular mechanism for writing validation rules, but you must take care! For example, if you specify that an ID must be lower-case alphanumeric (plus underscores), then you could be tempted to write a regex in the form /[a-z0-9_]+/. This has the unintended side effect of matching anywhere that sequence occurs in a string, not the whole string. If someone then sends through a string in the form "valid_slug But This Bit Is Invalid", the regex would match it, as it will match valid_slug, but that does not mean the whole string matches the regex.

Anchors can be used to ensure that the regex matches the whole string. ^ and $ are special characters that indicate the start and end of a string respectively, so you could write a regex in the form /^[a-z0-9_]+$/ to match the previous example. However, note that in multi-line mode, ^ and $ mean “start” and “end” of a line, so the regex could match a string like "valid_slug\nBut This Bit Is Invalid", as one line matches. Most regex engines will treat the entire input as one line, even if it contains newlines, unless put into a multi-line mode, so the anchors work the way you would expect. Sadly, this is all implementation specific, so it’s worth reading the documentation for your language before making assumptions. Some languages (notably Ruby) operate in multi-line mode by default, and there the anchors \A and \z should be used instead of ^ and $ to indicate start-of-string and end-of-string, although they are not supported by all languages.
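
In Python, for instance, the difference looks like this (Python’s string anchors are \A and \Z, and re.fullmatch requires the whole string to match):

import re

candidate = "valid_slug But This Bit Is Invalid"

# re.search finds the pattern anywhere in the string, so an unanchored
# pattern accepts this invalid input.
print(bool(re.search(r"[a-z0-9_]+", candidate)))        # True (wrong)

# Anchoring rejects it, as does re.fullmatch.
print(bool(re.search(r"\A[a-z0-9_]+\Z", candidate)))    # False
print(bool(re.fullmatch(r"[a-z0-9_]+", candidate)))     # False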

You should also do your validation in one place. This could be in the controller, or in a specific model class set up to handle user data. If you limit the area of your code responsible for handling user input to that one place, it is less likely that you will make a mistake (forgetting to validate one thing from the input will stick out in the code), and it means that other areas of the code are insulated from direct user input and can either assume that the data they are handling is clean or, if full defensive programming is in use, reduce the impact of a missed validation at another point in the stack.

It is important to apply this validation in situations where any external system can submit data, rather than just the user. Simply validating in JavaScript on the client and submitting it to a second service is not sufficient—that service must also validate what it receives, because a malicious user can skip the JavaScript step and craft a request to the server directly. This is a commonly overlooked rule, as only the primary interaction path is considered, missing any other points where data could potentially be injected.

Sanitization, on the other hand, refers to ensuring that any data that is either shown to a user or submitted to another system does not corrupt the structure of the message.

Phreaking

The need for sanitization is a result of mixing user data with “control” data in syntaxes such as JSON or HTML. If the control data and contents could be completely separated (for example, in different files), then sanitization would be unnecessary, and this would eliminate a whole class of errors (although it would make the data harder to deal with in other contexts). Mixing control and user data into one channel is what led to the earliest “hacks,” where phone hackers (phreakers) discovered certain frequencies that, sent down a telephone line via a long-distance connection, would allow them to impersonate a telephone exchange, so the long-distance exchange would trust commands they gave as if they were another exchange. This was fixed by filtering the frequencies in those early telephone exchanges, and modern telephone exchanges do not mix users’ voice data with the control data used between telephone exchanges.

When displaying content to a user that comes from an external source (for example, a message from another user), it is possible for that message to include HTML special characters, which could result in the page being corrupted with bad markup. In the simplest case, this could break the design; in the worst, it could allow JavaScript to steal your user’s cookies. Fortunately, most templating libraries will by default convert any HTML in a variable into its safe equivalent (for example, <strong> will be converted to &lt;strong&gt;, which displays as text rather than being interpreted as markup)—this process is called escaping. It can be tempting to simply remove any “special” characters from the output, but these special characters could be legitimately input by a user, and are sometimes easy to miss. This kind of conversion should always be done by a reputable library or framework; attempting to build your own is a high-risk strategy.
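
Python’s standard library shows the idea in miniature; most templating engines apply the equivalent automatically:

import html

comment = '<strong>Hi!</strong><script>alert("stolen")</script>'

# html.escape converts HTML special characters into entities, so the
# browser renders the text rather than interpreting it as markup.
print(html.escape(comment))
# &lt;strong&gt;Hi!&lt;/strong&gt;&lt;script&gt;alert(&quot;stolen&quot;)&lt;/script&gt;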

It’s worth calling out cases where some HTML is valid—for example, you may want to allow your users to use limited markup in a forum or comments system. In these cases, special HTML sanitizer libraries can be used, which will permit some tags but remove or reject others. It is often helpful to run these when the user submits data too, as a validation step, to give them an appropriate error message and reject any invalid tags. As with other sanitizers, rolling your own is not recommended; instead, you should trust a reliable library. Additionally, there are two approaches a sanitizer can take: an allow list (sometimes called a whitelist) and a blocklist (sometimes called a blacklist). Allow-listing permits only certain HTML tags to be used, whereas blocklisting enforces a list of banned tags. The allow-list mechanism is preferable, as evolutions to the HTML spec could introduce new tags that are undesirable and would require a blocklist update to ban them. It is also easy to miss tags when taking a blocklist approach.

As with validation, sanitization does not only apply when dealing with displaying information to the user. Any data that leaves the system should be sanitized. A common anti-pattern occurs when a request to an external system (such as a database, or a JSON request to another server) is made by concatenating strings together yourself. These kinds of requests should always be prepared by an appropriate library, such as a JSON serializer, or a SQL library, rather than directly passing strings in. These underlying libraries will handle sanitization for you and bypass this common class of errors.
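
For example, hand-concatenated JSON can be corrupted by its own payload, whereas a serializer escapes it—a small Python sketch:

import json

username = 'alice", "admin": true'  # attacker-controlled input

# Concatenating strings lets the input corrupt the message structure:
broken = '{"username": "' + username + '"}'

# A serializer escapes the input so it stays plain data:
safe = json.dumps({"username": username})
print(safe)  # {"username": "alice\", \"admin\": true"}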

The main exception to this is when calling a command line program, where the command is often passed through a shell. Where possible, you should execute a program directly, rather than through a shell (if you’re using a single string, rather than a list of arguments, to call a program, it’s likely going through a shell). However, even in this case, it could be possible for invalid data to pass through, so careful validation is needed in addition to sanitization.
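
In Python, the difference is whether you hand subprocess a single string or a list of arguments; a sketch (the filename is an invented malicious input):

import subprocess

filename = "report.txt; rm -rf /"  # attacker-controlled input

# Passing a single string with shell=True routes through a shell, so the
# embedded "; rm -rf /" would run as a second command:
#   subprocess.run("cat " + filename, shell=True)   # dangerous

# Passing a list of arguments executes the program directly; the whole
# string is handed to cat as one (harmless, nonexistent) filename.
subprocess.run(["cat", filename])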

There are cases when sanitization is not needed. If your data has been validated and can be trusted (for example, if it is guaranteed to be an integer, or is from a trusted set of strings), then sanitization may be unnecessary, but it’s easier to include it and minimize the risk of a mistake than to exclude it and accidentally introduce a vulnerability.

As with validation, sanitization should always be performed at the edge of a system (for example, in the adapter, which makes a request to an external system, or in the view templates). Sticking to this rule simplifies the design, as there’s only one place to check that sanitization has occurred, but it also avoids the risk of double sanitization (especially in HTML), which can show corrupt data to the user.

Threats

Applying secure programming practices will often get you far, but you will always have to make some security compromises as you build your system. This could be for practical implementation reasons, or for user experience reasons, but it’s impossible to build a system that is perfectly secure. In many cases, this is actually okay! The key is to properly understand exactly what the important parts of your application are in order to appropriately protect them.

Information security is a bit like occupational health and safety. Done right, it keeps people safe with a minimum amount of fuss, but done wrong, it just gets in everyone’s way, or worse, allows for serious accidents to occur. In health and safety, risk assessments are used as a way for organizations to develop appropriate protections. In information security, threat modelling is used.

In a risk assessment, you identify anything that can cause harm (a risk); calculate the chance of that risk happening and the impact if it does; and then identify an appropriate way of mitigating it based on that knowledge (a control). Threat modelling is similar, in that it starts with identifying theoretical security holes and threats in your application, then figuring out the chance of each happening and the impact if it does, and then strategizing appropriate mitigations—ways of preventing those things from happening.

Sometimes, an appropriate mitigation might be to do nothing. If there’s a theoretical risk that would have very little impact, and the chance of it happening is very small, or requires lots of unlikely things to happen, then it might be reasonable to identify the fact it’s possible, and then not actually do anything about it, as the cost of fixing the hole may be unrealistically high. For example, if you are using two-factor authentication to log in to an admin system that is behind a firewall, then a risk exists where an administrator is blackmailed and forced to authenticate, but the risk of this is low and the cost of mitigation high (maybe requiring two people to take any action), so it’s an accepted risk.

There are several frameworks that can be used for threat modelling, but they all follow a similar approach. The first is to break your system down into constituent parts, then identify all the threats towards a particular part of a system. Each threat is then evaluated to determine how important it is to deal with, and mitigations identified.

Microsoft’s threat model is popular, and starts by identifying the exact security objectives of your application. For example, does it protect users’ identities? Or would an attack have direct financial consequences? Availability is also considered a security feature, because if a security flaw can take your web site offline, even if no information is lost, this can have financial impact if you provide service-level agreements to customers, or if that system powers another that is needed for you to do business.

Threat modelling also requires a good understanding of your systems’ architecture. Designing a system architecture is discussed further in the Systems chapter, but the architecture should show the boundaries of your system, and be able to be annotated where trust between components exists. Understanding how data flows through your systems (using data flow diagrams) is also useful in understanding which parts of your system may be vulnerable. For working with other components of a distributed system, interface contracts can also provide this knowledge.

With this understanding, you can now try to identify the threats that could affect each part of the system. When threat modelling against a running system, you should approach this assuming you have no existing mitigations in place, to check that any assumptions you have made previously are valid. Although taking an unstructured approach to this can result in some threats being identified, there are more effective approaches you can use and combine here. The first is to think about attackers the way you think about your users, and consider the different types you may face: script kiddies (who try a bunch of common exploits automatically), disgruntled ex-employees (who have inside knowledge of your system design), normal users (who may stumble across something by accident and decide to explore), or determined attackers (including organized criminals or state actors who may have significant resources).

You can also consider different types of attacks. Microsoft introduced the mnemonic STRIDE to remember some common types:
  • Spoofing identity: where a user can somehow impersonate another user or log in as another user to take on their characteristics.

  • Tampering with data: where a user can manipulate data that they should not have access to (often caused by validation failures).

  • Repudiation: where an action cannot be reliably traced back to a user—for example, in an e-commerce app, a user may claim they placed an order and paid more than they actually did.

  • Information disclosure: when information that should be private is revealed to someone who should not have access to that data.

  • Denial of service: when the application is flooded with requests, or run to capacity, stopping other users from accessing it. Oftentimes, limiting how much of a system a particular user can access, or limiting long-running searches, and the like, can mitigate these.

  • Elevation of privilege: if different users have different layers of privilege (for example, administrator and non-administrator roles), then the application should ensure that a logged-in user cannot get access to any more functionality than they should have access to, by doing authorization checks, for example.

Once these aspects have been considered and a comprehensive list of threats identified, then you will need to rate them to determine if they need addressing, or if your existing controls are suitable. A simple way of doing this is for each threat to be categorized according to the chance of it happening—low, medium or high—and the level of impact it would have—again, low, medium or high. This is the approach outlined by the NIST guideline 800-30, on IT risk management. You can then assign each threat an overall risk and determine the priority of addressing each one. Figure 12-1 shows a common way of doing this using a matrix, and then using the overall score to determine overall risk, shown by how red that part of the matrix is.
Figure 12-1

Probability vs. Impact matrix

Another common mnemonic used here is DREAD, where five different attributes are considered and a score assigned to each. Often a scale of zero to ten is used, although any scale will work, as long as the same one is used for all attributes. The five attributes are:
  • Damage: What will the negative impact of an attack be? Zero is often “none,” and ten may be something like the business going bankrupt.

  • Reproducibility: How consistently can the attack be executed, or does it require a number of constraints outside the control of the attacker to be right before it will work? Zero indicates that no one, not even the developers, can reproduce this attack against a running system (so it is theoretical), and ten that the attack will work every time.

  • Exploitability: How hard is it to actually do the attack? Zero means that no one possesses the knowledge or skills to be able to execute the attack, and ten means that a layperson would be able to stumble upon the attack during normal system use.

  • Affected users: How many users will this attack affect? Zero is none, and ten is all.

  • Discoverability: How easy is it for an attacker to find out about this threat? Zero indicates it’s impossible to determine in a running system, and that only reading the source will reveal the threat; and ten that it is obvious to even a casual attacker or by the use of automated tools.

The “discoverability” aspect is often the cause of some controversy, as it is felt that relying on something being hidden is not suitable, and as a result some threat modelling practitioners will set discoverability to ten (or the highest level), or discount it completely. The overall score for a threat is the sum of these attributes (or sometimes an average), which allows the highest-priority issues to be identified first.
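
As a worked illustration of the arithmetic, a hypothetical scoring sheet in Python (the threats and scores are invented, and whether to sum or average is a team convention):

# Rank invented threats by their total DREAD score, highest first.
threats = {
    "SQL injection in search": {
        "damage": 8, "reproducibility": 9, "exploitability": 7,
        "affected_users": 10, "discoverability": 10,
    },
    "Admin blackmailed into authenticating": {
        "damage": 9, "reproducibility": 2, "exploitability": 1,
        "affected_users": 10, "discoverability": 1,
    },
}

for name, scores in sorted(
    threats.items(), key=lambda item: -sum(item[1].values())
):
    print(f"{sum(scores.values()):3d}  {name}")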

Once you have identified each threat, you can then design mitigation for them. Some of these mitigations are a part of good code cleanliness, such as validation, and will often not require any extra work, but others may require further effort.

In many safety-critical systems, there’s a saying: it’s not the first mistake that kills you, but the second. Of course, ideally a system should never kill you, but making two mistakes in a row is rarer than making one, so ensuring that a mistake that removes one layer of safety is in itself not lethal is good enough. The same is true in security—relying on just a single layer can be dangerous, and you should always behave as if an individual layer is capable of being exploited.

This is known as defense-in-depth, where multiple layers of security are used. For example, instead of your API servers relying on only being accessed from a private network, perhaps enforced by firewall rules, you should also consider using API keys or another mechanism. This gives you an additional layer of protection in case a firewall ends up incorrectly configured, or a vulnerability in another server gives an attacker access to your private network. Threat modelling is a great way to identify where additional layers will be of most use. When you model threats, it can also be useful to identify scenarios where there’s been a partial compromise of a system you rely upon, and then identify where additional layers could be of most use.

Security Checklists

Checklists are a powerful tool to help verify consistency and remind you, and new developers to the team, of all the risks involved in software, especially if they’re not obvious.

In Kanban, checklists form the entry/exit criteria for a column, and in Scrum, the definition of done. For code reviews, a checklist (often automated) is useful for the consistency of a discussion. Security checklists are the same, as they force you to think, even if only briefly, and confirm that you have addressed any relevant security issues. Security professionals have produced “Top N” lists for common types of security vulnerabilities, and simply familiarizing yourself with these lists and iterating over them will address the most common types of errors.

The two most popular lists are the OWASP (Open Web Application Security Project) Top 10, which is focused mostly on web development, and the CWE (Common Weakness Enumeration) Top 25, which is more general for all types of software. Like threat modelling, each risk identified should have an appropriate control placed against it to ensure that the risk is either mitigated or irrelevant. For example, the CWE Top 25 includes CWE-120: Buffer Copy Without Checking Size of Input (“Classic Buffer Overflow”). In many modern web languages, this isn’t relevant, as you never have to manage buffers by hand, so the mitigation/control is that it’s handled for you by a trusted lower library.

In the sections below, the OWASP Top 10 risks are discussed, with a brief overview of what each flaw means to you, and how to protect against them.

Injection

Injection attacks are the single most important type of security flaw that can be introduced into an application. They sit at the very top of the OWASP Top 10 and occupy the top two spots of the CWE Top 25 for a reason. Injection attacks occur when untrusted data gets mixed with trusted data—for example, including search terms in a SQL query in a way that can be interpreted as part of the structure of the SQL statement. By following the “golden rule” discussed previously in this chapter, and using appropriate libraries and techniques to construct these requests, you can avoid this class of error.

Injection attacks can cause many types of damage—for example, they can be used to execute a SQL query that will return results for another user, or even cause data loss as the result of a malformed UPDATE or DELETE query.
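
A minimal Python sketch using the standard library’s sqlite3 module shows the difference between concatenating a query and using placeholders:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

term = "' OR '1'='1"  # a classic injection attempt

# Dangerous: the input becomes part of the SQL structure.
#   conn.execute("SELECT * FROM users WHERE name = '" + term + "'")

# Safe: a placeholder keeps the input as data, whatever it contains.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (term,)).fetchall()
print(rows)  # [] -- the injection attempt matched nothing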

Broken Authentication and Session Management

Badly implemented login and session management features can cause a lot of damage. A badly implemented login feature might simply check a username and password on login and then set a cookie “Logged_In_User: chris.” The rest of the application blindly trusts that this cookie was set by the application, but an attacker could set it by hand without knowing the password, and gain access to someone else’s account. Similarly, cookies that indicate a level of access should be avoided.

You should instead use an authentication library or framework that can reduce the risk of writing your own. These normally follow a recommended approach of having a session hash or token, which references a database (that only the application can control) containing that important information. This means that a stolen cookie could be used to impersonate a user, but stealing a cookie is generally considered to be difficult enough to successfully mitigate this risk. However, some other common vulnerabilities, if successfully attacked, could be used to steal a cookie, so it is not foolproof.

An alternate approach is to issue a cookie carrying a cryptographic signature made with a key only the server knows, so the server can verify that it was the one that set that cookie. These tokens should either expire or be updated regularly in order to minimize the impact of a token being stolen. Tokens are sometimes stored in session cookies, which only live as long as the browser is open; longer-lived cookies persist until the user explicitly logs out. A common way of avoiding a user having to log in too frequently is to require a session cookie for any sensitive functionality (like making a payment) but have a longer-lasting cookie for general browsing, to allow things like personalization to occur.
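
One way to implement such a signature is an HMAC with a server-side key; a minimal Python sketch, where the key and cookie format are illustrative:

import hmac
import hashlib

SECRET_KEY = b"server-side secret, never sent to clients"  # illustrative

def sign(value: str) -> str:
    mac = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"

def verify(cookie: str) -> str | None:
    value, _, mac = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return value if hmac.compare_digest(mac, expected) else None

cookie = sign("user=chris")
print(verify(cookie))                    # user=chris
print(verify("user=admin." + "0" * 64))  # None -- forged signature rejected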

Regardless, most languages have many libraries that implement this functionality, and for most sites, there is not much of a reason to implement it yourself rather than use an existing solution. Implementing your own should be considered an advanced activity.

Another way you can accidentally introduce this vulnerability is to include a session token or unique identifier in a URL as a query parameter. This could mean that copying or sharing a URL could accidentally allow a user to let their friend log in as them, or for corporate proxies to accidentally log the URL. Session tokens are generally considered sensitive information, so should only be served on an encrypted web site.

Password reset mechanisms should be carefully designed too, in order to stop people from using them as a way to bypass a login process, with the login process itself having defenses against brute-force attacks (for example, limiting the rate at which users can make requests, or locking accounts after a certain number of bad requests).

Cross-Site Scripting (XSS)

Cross-site scripting is a form of injection into the structure of the page that can cause arbitrary scripts to run in your user’s browser. Like injection, the golden rule applies here in order to address this concern.

Web browsers have a security model known as the same-origin policy. The core of this policy is that JavaScript code only has access to web pages that are in the same domain (e.g., www.example.com) as the one that loaded the JavaScript. For example, you cannot read or set cookies for a different domain, which can stop credentials from being stolen, nor can you load an iframe for another site and then execute code on that page. With XSS, this means that an attacker can make code run on your domain, and therefore get access to credentials they shouldn’t, and perform activities the same-origin policy would normally block.

The three attacks above can be combined to cause even more damage. For example, injecting malformed content into a database can result in an XSS attack on users visiting the web site. XSS can be used to steal authentication tokens, or cause fake actions to occur, if it is allowed to occur on an administration screen where a high-level user is logged in.

Insecure Direct Object References

Insecure direct object references are a type of error that occurs when an “object”—for example, an HTML page or an image—does not properly check whether or not that user is allowed to access it.

This can often occur with things like user-uploaded content, or generated PDFs. For example, if you are building an invoice system that generates PDFs, then it may seem sensible to upload those PDFs to a place where the user can download them and include a link to the PDF in the page. There is a basic mistake to make here: if your URL is, for example, https://invoice-downloads.example.com/invoice0042.pdf, then the user might be able to guess that other invoices exist at https://invoice-downloads.example.com/invoice0041.pdf and get access to another user’s information simply by changing the URL.

The importance of protecting these kinds of objects depends on the type of application you have. For low-risk applications, simply having random URLs might work well enough, and some object stores allow generating URLs signed with a time-limited hash, which offers a greater level of protection. Often the simplest thing to do is serve these assets through your application and use the same type of authentication and authorization you use for your regular application code.
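
A sketch of that last approach in Python, with a hypothetical in-memory store standing in for real object storage:

from dataclasses import dataclass

@dataclass
class Invoice:
    id: int
    owner_id: int
    pdf: bytes

# Hypothetical store; in reality this would be a database and object store.
INVOICES = {42: Invoice(id=42, owner_id=7, pdf=b"%PDF-1.4 ...")}

def download_invoice(invoice_id: int, user_id: int) -> bytes:
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner_id != user_id:
        # Deny missing and foreign invoices identically, so URLs can't
        # be probed to discover which invoice numbers exist.
        raise PermissionError("invoice not found")
    return invoice.pdf

print(download_invoice(42, user_id=7))  # the owner: succeeds
# download_invoice(41, user_id=7)       # guessing another ID: PermissionError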

The pages that make up your web site should also be considered objects. For example, in 2011, the bank CitiGroup allowed people to see other people’s accounts by changing the account number in the URL once they had logged in.1

If something has a URL, it is accessible, so when implementing your security, you should consider what authentication and authorization that URL (and any query parameters it might take) will need.

Security Misconfiguration

Security misconfiguration can occur when a tool or library you are using has been incorrectly configured, leaving you open to vulnerability. For example, an off-the-shelf product may ship with a default password that must be changed; forgetting to do so introduces a vulnerability. Issues such as incorrect firewall configurations are classified as this kind of vulnerability too.

Another common issue relates to debug modes in frameworks. With these turned on, when an error occurs you will get a stack trace or some other debug-type information. This debug detail can give away information that is helpful to attackers, such as the source code, or usernames and passwords for databases.

Sensitive Data Exposure

Sensitive data exposure vulnerabilities relate to how you manage any data that can be considered sensitive. It is important to define which data is considered sensitive, and then determine a way to protect that data. Data can be considered to be “in-transit” (moving between systems), or “at rest” (stored on disk).

For example, if you are using session cookies in order to manage login, then those cookies should be considered sensitive, because if someone got ahold of them, they could impersonate that user. At rest, the session token is stored as a cookie on the user’s machine, and in your database. Appropriate access controls on the database may be sufficient to protect the token (if the database is leaked, encrypting the session token is no help, as the user’s information has already been exposed), and it may be sufficient to assume that the user’s browser and cookie store is already sufficiently protected. The session token is also sent in transit as part of a cookie on the HTTP request, where it is more vulnerable. Coffee shop wi-fi is often “open” wi-fi, which means that anyone on the same network can see what other users are sending. If your cookies are sent over plain HTTP, then that leaves them exposed in transit. Using HTTPS encryption and marking your cookie as secure will avoid this use case and will protect your cookie in transit.
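
Marking a cookie as secure (and HttpOnly, which also hides it from JavaScript, limiting the damage of an XSS flaw) is usually a one-line setting in a web framework; Python’s standard library shows the resulting header attributes:

from http import cookies

# "Secure" stops the browser sending the cookie over plain HTTP;
# "HttpOnly" makes it invisible to client-side JavaScript.
cookie = cookies.SimpleCookie()
cookie["session"] = "opaque-session-token"
cookie["session"]["secure"] = True
cookie["session"]["httponly"] = True
print(cookie.output())
# Prints a Set-Cookie header carrying the Secure and HttpOnly flags.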

In another example, a user database is often the store for lots of sensitive data. Passwords are one example of this, and the safe storage of passwords is discussed later in the chapter. The law often defines “personally identifiable” information to be sensitive, and care must be taken with this data especially. This includes things like National Insurance/Social Security numbers and dates of birth. The simplest way to handle this information is to simply never store it, but if you do, you should make sure it is handled appropriately.

Again, data may be in transit in many directions—between your web server and your database, or to a backup server—and these should be encrypted. Most database servers also support encryption at rest, and it is important to make sure that this applies to any backups too. It is especially important with backups to make sure that you don’t store the decryption keys next to the database!

Missing Function Level Access Control

Function level access control refers to checking permissions at a fine-grained level within an application. Simply requiring users to be logged in is not sufficient to protect every feature, and although it is considered good practice to hide features a user cannot access in the UI, restricting someone from doing an action in your front end is not enough; you also need to do the check on the server side.

Missing function level access control occurs when someone who is logged in performs an action they shouldn’t be able to do because the server does not appropriately check that they have the right to do so. That is, the user is authenticated (we know who they are), but not authorized (we do not know what they can do). For example, a feature for creating a new product in a catalogue may only appear for users who are logged in as an admin, but if the AJAX endpoint didn’t also do that check on the server side, then any user could make an AJAX request to that endpoint directly, skipping any front-end checks.

There are many libraries to help manage authentication and authorization, and these kinds of errors often creep in when you simply forget to do a check. Test-driven development around authentication, and code review, can help protect you against these kinds of errors. If your libraries and frameworks support it, it is a sensible practice to default your endpoints to reject use until a specific check is in place.
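
One common shape for such server-side checks is a decorator (or middleware), so the check is declared next to the endpoint and is hard to forget. A Python sketch, with an invented user structure:

from functools import wraps

def require_role(role):
    """Deny by default: the handler runs only if the check passes."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(user, *args, **kwargs):
            if role not in user.get("roles", ()):
                raise PermissionError(f"requires role: {role}")
            return handler(user, *args, **kwargs)
        return wrapper
    return decorator

@require_role("admin")
def create_product(user, name):
    return f"created {name}"

print(create_product({"roles": ["admin"]}, "widget"))  # allowed
# create_product({"roles": []}, "widget")              # PermissionError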

Cross-Site Request Forgery (CSRF)

Cross-site request forgery occurs as a result of the same-origin policy that web browsers use for security. In a CSRF attack, the user visits a malicious web site that makes a request to your web site, for example, by tricking them into clicking on a form, or loading it via an <img> tag. Although the malicious web site can’t get access to your user’s cookies, when it triggers a request, the browser does send cookies along with it, so if your user is logged in to your web site, then it appears they’ve made a valid request, and that action is triggered.

The simplest solution to this is to check that the Origin or Referer header exists and is set to a domain or page that that type of request is expected to come from.

A more comprehensive solution is also available. In a good RESTful design of your API, any potentially damaging/malicious action should be hidden behind a POST request, rather than a GET request, so only POST requests (form submissions) need to be checked. A hidden field can be included in the form, which is sent to the user including a unique token. The server can then check that the token it gets back matches the one that was sent to the user (so it is known that the expected form generated the request), and this token can be either stored in the session or in a cookie that is also sent to the user.
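
A minimal sketch of that token round trip in Python, where the session dict stands in for server-side session storage:

import secrets

session = {}

def render_form() -> str:
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return f'<input type="hidden" name="csrf_token" value="{token}">'

def handle_post(submitted_token: str) -> None:
    expected = session.get("csrf_token", "")
    # compare_digest avoids timing side channels when comparing tokens.
    if not secrets.compare_digest(submitted_token, expected):
        raise PermissionError("CSRF token mismatch")

print(render_form())
handle_post(session["csrf_token"])  # legitimate form submission passes
# handle_post("forged")             # raises PermissionError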

Using Known Vulnerable Components

This kind of vulnerability occurs when a dependency you are using is not kept up to date. Managing dependencies, making sure they’re up to date, and being aware of any security issues remains a challenge due to the fragmented nature of development: there isn’t a universal standard for reporting these, nor a central database or set of tooling for becoming aware of a vulnerability in your dependencies. Tools do exist for particular domains, and including them in your build chain is valuable. The Indirect Attacks section below goes into more detail about how to protect against this kind of attack.

Unvalidated Redirects and Forwards

Sometimes your web site may need to redirect the user to another page on your site—for example, a success page after a form submission. Some of these redirects may use user input to build part of the URL, and these are vulnerable to a type of injection attack. Unvalidated redirects are dangerous because the user can click on a link that appears to be valid (as it’s a URL on your site), but ultimately can end up elsewhere due to injection. Again, applying the golden rule will resolve these kinds of issues.

A similar issue arises when parts of your application take a whole URL as a parameter to redirect back to. Login pages are a common place where these vulnerabilities can be introduced. For example, if I visited https://www.example.com/my-account as a user that is not logged in, I may get redirected to https://www.example.com/login?returnto=https://www.example.com/my-account. This redirect is fine, as the application generated it itself, but after logging in, the login page may simply redirect to the returnto parameter without validating it. If a malicious user tricked another user into following a link to https://www.example.com/login?returnto=https://www.mybadsite.com/, perhaps in a phishing attack, that user may be lulled into a false sense of confidence because the link is to your site—but they end up elsewhere. This is another example where the golden rule can be applied—the returnto parameter should be checked to make sure the URL it is redirecting to exists on a domain you control.
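
Such a check might look like this in Python; the allowed host names are illustrative:

from urllib.parse import urlparse

ALLOWED_HOSTS = {"www.example.com", "example.com"}  # illustrative

def safe_redirect_target(returnto: str) -> str:
    host = urlparse(returnto).hostname
    if host in ALLOWED_HOSTS:
        return returnto
    return "/"  # fall back to a safe default rather than erroring

print(safe_redirect_target("https://www.example.com/my-account"))  # allowed
print(safe_redirect_target("https://www.mybadsite.com/"))          # -> "/"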

Passwords

When it comes to identifying users, there are two main approaches you can take: asking them for some information that only they know, or asking them to prove that they have something that only they can have. Passwords are one way of doing this, as they are information that, in theory, only the user knows. However, passwords have become one of the worst security patterns in widespread use.

One issue with passwords is that the only way to check whether or not someone knows a particular secret is if you yourself also know the secret. In order to do this, we need to store the password so we can check that the password the user gives us is correct. This is risky, as it means that anyone who has access to the database (either legitimately, or through a hack) now has the passwords for every user and can impersonate them. In order to deal with this, we can hash the passwords that we store. Hashing is a process similar to computing a checksum, where you take your data and run it through some mathematical functions to generate a hash (normally a very large number expressed in hexadecimal). Because hashes involve throwing away information, it is impossible (for an unbroken hash function) to go back from the hash to the original password. Your application now needs to take the hash of the password the user gives you and compare that to your hash, without having to store the password.

Simply hashing is not good enough to protect against database leaks, though. Although it’s impossible to reverse a hash, an attacker could take advantage of the fact that most passwords are dictionary words, and instead simply compute the hash of every word in the dictionary and then check if any hashes match. Furthermore, as the same word will always hash to the same value, if several people share the same password, they will have the same hash, which can give information away. We can work around this by salting the hash. Salting works by combining a randomly generated string (the salt) with the password before hashing it. The salt is stored alongside the hash and is unique for every password in the system, rendering this kind of dictionary attack more difficult, as an attacker would have to generate a separate set of hashes for each salt, rather than checking all the hashes in a database at once. Peppering is a similar concept, except the additional value is a secret that is the same for every user and is not stored in the database (usually it is hard-coded into the application, or applied as a run-time configuration option). Using salt and pepper together is common.

The final thing to consider is which hash function to choose. Hash functions have been in use for a long time to check that file transfers are correct (if you send the file and the hash, then if the hash of the file doesn’t match the expected one, then some corruption has occurred). For these use cases, speed is important, as you could be hashing many megabytes of data. For passwords, though, the opposite is true. As passwords are short, even if passwords are salted, they can still be cracked by trying many words until a matching hash is found. If your hashing algorithm is slow, this can slow down this process significantly, so choosing a slow hash is more secure. Of course, there is a performance trade-off, so your hash function shouldn’t take seconds to complete, but perhaps 100ms. The type of hashing functions that satisfy these criteria are called key-stretching algorithms, and include algorithms such as bcrypt and scrypt.
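
Using the third-party bcrypt library in Python, for example, hashing and checking look something like this (the work factor of 12 is a common but illustrative choice):

import bcrypt  # third-party: pip install bcrypt

password = b"correct horse battery staple"

# gensalt() generates a fresh random salt; the work factor (rounds)
# controls how slow hashing is, and is stored inside the hash itself.
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))

# checkpw re-hashes the attempt with the stored salt and compares.
print(bcrypt.checkpw(password, hashed))  # True
print(bcrypt.checkpw(b"guess", hashed))  # False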

There are other ways of protecting your database against dictionary attacks, but they are flawed. One common way is to require your users to complicate their password by including numbers, capital letters, or special characters, but this comes at a significantly increased user overhead of having to remember the variety. Conversely, sometimes web sites put maximum length restrictions on passwords in order to ensure users pick something easy to remember. You will want to include a maximum length, because password hashing algorithms take longer on long passwords, so an attacker can try using passwords that are several megabytes in size which could take minutes to hash and cause your site to crash—but make the maximum length very long. You should also ensure that your user can use any character in a password—even emojis should be valid in a password!

The most effective restriction to put on a password is minimum length. Long passwords are much more secure than short ones, without necessarily putting an undue burden on your users. An alternative name for passwords is passphrases—using this word encourages users to use multiple words or even a whole sentence as a password.

Even if you appropriately secure your database against leaks, there is a much bigger risk with passwords. It is very common for users to use the same password across multiple sites in order to reduce the burden of remembering them. This means that if another web site, unrelated to yours, gets hacked, and your user used the same username and password, then a hacker can now log in to your web site, even though your code remains secure. This is the fundamental issue with passwords, but there is no perfect alternative; just a number of alternatives and solutions that each have their own tradeoffs.

There are many ways to avoid this problem though. One is to avoid using passwords entirely on your site, and instead ask users to log in using an account they have on another site. This can be very effective for an organization’s internal tools, where you can link into a central logon system for that organization, but it essentially moves the problem around. It often offers a better user experience too, as users only need to sign in to one site and many others can piggyback off that, rather than having to enter a username and password multiple times. For public-facing web sites, social media web sites can provide this functionality, but not all of your users will necessarily have an account on those sites, so a traditional username/password fallback (as well as supporting multiple sites) is also needed. Some sites use a novel alternative to this, by e-mailing you a link containing a token that allows you to log in. This has issues though, as the e-mail account becomes a single point of failure: if you lose access to it or the e-mail account is hacked, then your user loses the ability to log in to that web site. The process also isn’t always smooth, as users have to navigate to their e-mail to log in.

The other alternative to passwords is to instead use something that the user has in order to validate their identity. This can be a virtual or physical token. TLS client certificates are useful here too, as a virtual option, and are common in large enterprises. Users may be more familiar with devices that display a code that changes over time. These devices are synchronized with the server when they are first issued and then pseudo-randomly generate a new number using an initial “seed” (computers cannot generate a truly random number, but instead generate a “pseudo-random” number by applying mathematical functions to a known seed, or start state; these pseudo-random numbers should be impossible to predict without knowing the start state). The number is then entered as a second factor. These tokens are common in online banking and in remote access or VPN connections for larger enterprises. However, having a physical token for each site is very unwieldy. Instead, a common algorithm known as TOTP (time-based one-time passwords) has become widely used, which allows multiple sites to use different seeds but the same process for generating a valid password for “now.” Multiple sites can then be added to a single app, which generates the current valid number for each site. There are also physical tokens, such as the Yubikey, that work in a similar way but do not require the user to enter the number themselves, as they communicate directly with the site. One of the biggest constraints in this system is that the time on all the devices must be correct (or within a few seconds); otherwise, the generated and expected numbers will not match.
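
The third-party pyotp library illustrates the protocol in a few lines; this is a sketch, with enrollment and clock handling simplified:

import pyotp  # third-party: pip install pyotp

# At enrollment, the server generates a seed and shares it with the
# user's authenticator app (usually via a QR code).
seed = pyotp.random_base32()
totp = pyotp.TOTP(seed)

code = totp.now()         # what the user's app displays right now
print(totp.verify(code))  # True -- the server derives the same code
# The code changes every 30 seconds, so both clocks must roughly agree.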

However, the issue with using something a user has is that if it is lost, the user is locked out of their account, and if it is stolen, then the thief can use it to log in. Instead of solely relying on this method, it is increasingly common to combine this approach with passwords. This approach is known as “two-factor authentication ,” where the user has to give their password as well as prove they have the physical token. These two factors combined cancel out the negatives of the other factor, but have the downside that forgetting the password, or losing the second factor, can result in your being locked out of your account. The login process is also slightly longer and more inconvenient, resulting in a usability trade-off. Two-factor authentication is more commonly used to protect especially sensitive systems, such as bank accounts.

If using passwords is unavoidable, you will also have to deal with the inevitability of users forgetting them. Some mechanism is therefore needed to reset a user’s password. For small teams, this could be as simple as having a developer’s override and the user approaching you directly, but this does not scale, especially when members of the public use your application, so a common solution is implementing a way to let users reset their own password. Care must be taken when designing a solution for this, though. There have been some famous hacks of individuals resulting from a vulnerable password reset mechanism. A system is only as strong as its weakest link, so even if your normal password scheme is very strong, a weak password reset mechanism will leave your users vulnerable to being hacked by that approach.

A common example of a password reset mechanism is to ask users to register “secret questions” from a common list—such as their mother’s maiden name, or anniversary dates—during sign-up, the idea being that these are relatively easy to remember. When a user forgets a password, they then need to answer these questions to prove who they are, but the problem is that the answer to these questions can often be guessed or figured out based on someone’s public record. These essentially become very weak versions of a password. Another approach is to use an alternative means of getting in touch with the user, such as sending them an e-mail, calling them, or posting them a letter to verify their identity before allowing them to reset their password. This can be slower, and have a cost implication, especially for sending a letter, but is more secure than using the questions alone. It is not infallible, though. Like the “login-via-a-link-to-your-mail” alternative to passwords above, if someone has control of your e-mail address, then they can also take control of any account that is linked to it. In targeted cases, it can be possible to “steal” a mobile phone number too. This also assumes, of course, that the user has kept their contact details up to date.

Password resets get even more complicated when using TOTP second-factor-authentication-like solutions. If resetting a password allows you to bypass the second factor, then that will ultimately weaken the whole system. A common solution is to provide users with a set of “backup codes” that can be used as an alternative to a TOTP code in case the main device is lost. This requires the user to keep these safe, though.

When building password login systems, the final thing to take into consideration is “brute-forcing.” Brute-forcing is a mechanism attackers use to try to hack a site: an attacker will constantly try many different passwords for a user until one eventually works. You should build in protections against brute forcing. One way to do this is to prevent a user from attempting to log in after a number of attempts, either for a set period of time, or until manually reset by an administrator, but this can be inconvenient for a user too. Another mechanism might be to temporarily ban an IP address after a number of unsuccessful logins (for example, only let an IP try to log in five times every 60 seconds). Another type of brute-forcing attempt involves trying to figure out if a user has an account on a service, or which e-mail address they’re using. When building a password reset mechanism, it can be helpful to the user to know if an e-mail address or username they’ve entered was invalid or not. However, this can give an attacker useful information that can help target an attack, so usability and security must be carefully balanced, depending on how valuable an individual user’s account is.
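
A minimal in-memory sketch of the five-attempts-per-minute idea; a real deployment would need state shared across servers (for example, in Redis):

import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_ATTEMPTS = 5
attempts: dict[str, deque] = defaultdict(deque)

def allow_login_attempt(ip: str) -> bool:
    now = time.monotonic()
    recent = attempts[ip]
    # Drop attempts that have aged out of the window.
    while recent and now - recent[0] > WINDOW_SECONDS:
        recent.popleft()
    if len(recent) >= MAX_ATTEMPTS:
        return False
    recent.append(now)
    return True

for i in range(7):
    print(i + 1, allow_login_attempt("203.0.113.7"))  # 6th and 7th: False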

While passwords are far from perfect, they are ultimately unavoidable. There will be times when you will have to manage passwords for parts of your development systems during the development process. For this, using a password database is a modern best practice. As the name suggests, a password database manages passwords, allowing you to keep a unique password for each site; these passwords can be long and complex, as they do not need to be memorized. The databases are encrypted, and there are two main types of apps for this: ones that manage a local file (which you keep or synchronize yourself), and ones that manage synchronization between devices for you. The first is ultimately more secure, but at the expense of convenience. When working on a development team, you will often need to store some shared passwords, perhaps for root accounts. Day-to-day use of shared passwords should be avoided, as it’s hard to keep track of who does what action, and if someone leaves the team, you will need to change the password to ensure it is secure. Wherever possible, you should have an individual account per user, but often some sort of “root” password is needed. A password database for your team becomes a useful way of sharing these secrets.

Indirect Attacks

Although direct attacks on your infrastructure may seem like the most obvious ones to protect against, they aren’t the only ones. Many attacks on the software you write will come from automated tools and scanners that try common techniques to detect SQL injection, XSS vulnerabilities, and the like, or that detect known vulnerabilities in popular software and libraries.

Following the best practices above will protect you against common types of security errors, but it is incredibly rare (and also a bad idea) to write your software from scratch without using any libraries or frameworks. Every web site out there relies on some code written by someone else, whether that’s a useful library, framework, or programming language, or tools like Apache HTTPD Server, Varnish, or even OS-level components like network drivers and the kernel itself. For every dependency you introduce, you need to consider its security impact. This gets even harder in modern JavaScript development, where it has become common for many transitive and nested dependencies to be introduced, and it becomes unrealistic to effectively audit them all.
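
You can see the scale of the problem on any JavaScript project by listing the full dependency tree; even a small app typically pulls in far more packages than the handful named in its package.json:

# Print the full dependency tree, including transitive dependencies.
npm ls

# A rough count of how many packages you are actually trusting.
npm ls --parseable | wc -l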

When introducing a new dependency or library, take the time to consider whether you might be accidentally introducing a vulnerability. Gut feeling can be useful here: does the project look maintained, with an active community around it (or, if you’re purchasing it, does the vendor seem security-conscious)? The idiom “many eyes make all bugs shallow” is popular, but a large, widely used open-source project is also a higher-profile target, so a vulnerability in it carries a higher risk. On the flip side, a smaller project may not have had as much security scrutiny applied to it, so could have undetected vulnerabilities. The trade-offs also depend on context: a front-end animation library is much less likely to expose you than a library responsible for validating form input.

You should also think about how you will stay aware of security vulnerabilities. High-profile vulnerabilities often make it into the specialist tech news sites and communities, so you may hear about them by osmosis, but smaller ones will pass you by. Regularly checking for out-of-date dependencies (e.g., using npm outdated) and updating them to the latest patch releases can be effective, and there are specialist services that will check whether your dependencies rely on known vulnerable versions. Many OS vendors publish a security feed or provide a tool you can use to see if any packages with known issues are installed. Automating these checks as part of your deployment pipeline will make your life a lot easier in the future and, if you’re working within a large enterprise, can quickly help you make friends with a central infosec team.
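
In an npm-based pipeline, this automation can be as small as a couple of commands in the build script; npm audit checks the installed dependency tree against a database of known vulnerabilities:

# Report stale packages without failing the build
# (npm outdated exits nonzero when anything is out of date).
npm outdated || true

# Fail the build if any dependency has a known vulnerability
# of high severity or above.
npm audit --audit-level=high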

Although the advice above will protect you from most types of attack, there also exists a rarer, but scarier, risk: the targeted attack. An attack can be aimed directly at your application and infrastructure in production, but a well-written and well-configured application stands a good chance against it, even if lower-impact vulnerabilities are still found. The highest level of access to your application and data, however, comes from the ability to alter the code running on your servers directly. Tools like your Git repository, your CI server, and your deployment tooling are very attractive targets, because anyone who takes control of those has complete control over your infrastructure.

There have been a number of high-profile attacks caused by outdated or inappropriately secured Jenkins servers, as well as a number of lower-profile but very damaging attacks undertaken by disgruntled former employees. Similar to the “fire drills” used to discover your application’s failure modes for operational purposes, you should run exercises to figure out all the ways someone malicious could push code into production. Common weaknesses include a Jenkins server on a “hidden” URL without proper authentication, a shared SSH account for deployment whose password isn’t changed when people leave the team, or collaborators left on a GitHub repository after they have left the organization.

Securing the way code is put into production is as important as securing the code itself, so take care when managing access to code repositories and control of your deployment tools. Managing backups becomes a security issue too: if you are accidentally backing up database credentials, or allow existing backups to be overwritten when new ones are created, you can give an attacker the ability to cause unrecoverable damage through vectors you haven’t even considered.

The “disgruntled employee” can be one of the worst threats to an organization: it is relatively common for employees to be given a large amount of access on trust, and sometimes those employees also have the ability to cover their trail. Audit logs are important so that you can trace the actions a user has taken, and they should be protected so they cannot be deleted. Another risk comes with Git itself: Git allows you to rewrite history or fake the names of committers, as these are just metadata. If you are using Git with a site like GitHub, you can enable GPG signing of commits to verify that the recorded author is correct.
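
Enabling signed commits is a one-off configuration once you have a GPG key (the key ID below is a placeholder); hosts like GitHub can then mark commits as verified, and branch protection rules can require it:

# Tell Git which key to sign with, and sign every commit by default.
git config --global user.signingkey <your-key-id>
git config --global commit.gpgsign true

# Commits are now signed automatically, equivalent to passing -S each time.
git commit -m "Add audit logging"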

Even if you secure your build pipeline, minimize the damage a single disgruntled employee can do, and apply best practices in building your software to minimize the risk of a drive-by scan or even a targeted attack finding a vulnerability, your dependencies can still introduce one. Fortunately, in addition to being rare, this is easy to mitigate, so it makes a lot of sense to do so. If you have a dependency with a less-protected build chain, it may be feasible for a particularly determined attacker to push a malicious build of that dependency, and for your build server to download it and bundle it into your code. For many commercial or organization-run package repositories (like the Ubuntu or Red Hat repositories), only trusted users can push builds, and the build chains are very secure. For community-run repositories, like NPM or PyPI, the provenance of a package is less clear: if a developer’s credentials are compromised, an attacker can push a new version immediately.

Often, semantic versioning (“semver”) is used to allow your build tooling to automatically pick up patch releases of a dependency without input from you, but this leaves you open to exactly this kind of issue (it can also accidentally introduce bugs; semver is a nice idea, but still fallible). Most package managers support version locking (for example, PHP’s Composer with composer.lock, JavaScript’s NPM with shrinkwrap files, Python’s Pipenv with Pipfile.lock, and Ruby’s Bundler with Gemfile.lock), and most community package repositories do not allow you to republish something under a version number that has been used before. For those that do, you may want to put a local cache of packages between your build server and the upstream repository. This is normally a good idea anyway, as you can guarantee that a known “good” version will not suddenly become bad, and it will make your builds resilient against downtime of the remote package server.
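
Concretely, with NPM the mitigation is to commit the lockfile and have the build server install exactly what it records, rather than re-resolving semver ranges at build time:

# On a developer machine: record the exact resolved versions.
npm install
git add package-lock.json

# On the build server: install exactly what the lockfile records;
# npm ci fails if package.json and the lockfile disagree.
npm ci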

Summary

The Web is becoming an increasingly hostile place, with more and more attacks—some targeted and others more scattergun—becoming a real challenge for many organizations. Building public-facing web sites and apps presents a potential weak point for attackers to break in. You have an obligation as a full stack developer to build appropriately secure applications, and building security in at a foundational level is the best way to achieve that.

Security covers not only bugs that can inadvertently give an attacker access to information or systems that they should not have, but also design-phase decisions about where trust lives in a system and the careful management of secrets such as passwords and user data. This trust extends to third parties: if you use dependencies from public repositories, or include JavaScript on your site directly from another domain, are you sure that domain cannot be compromised?

You should also plan for the worst case, so that in the event a security incident does occur, you have processes in place to handle it and minimize the impact. Analyzing your systems and designs for security issues is also an effective way to build secure systems, and there are structured techniques that help you do this. When dealing with theoretical risks, a likelihood/impact analysis can determine which risks are worth fixing.

A system is only as secure as its weakest point. Security through obscurity is often maligned, but the real lesson is that security cannot rest on obscurity alone; as one part of a layered approach, obscurity can still be effective. Equally, you should not rely on any single layer of security: for example, databases should be protected by firewalls as well as passwords. These layers include your infrastructure and development environments, as these can often be a way in for attackers if not appropriately secured.

When it comes to secure coding, the golden rule is to validate on input, to stop corrupt data entering the system, and to sanitize on output, to avoid emitting damaging data that can corrupt your UI or expose users to attacks such as XSS. Checking new features against security checklists is also a good way to ensure you’ve thought through the possible issues.

Building a system that uses passwords also requires special consideration, as passwords are a weak point in many security systems, especially around password reset mechanisms. There are alternatives, but a common approach is to combine a password with a second factor (combining something someone has with something someone knows) to give greater confidence that someone really is who they claim to be.

There is no silver bullet for security; it’s a patchwork of small things that make a whole, so beware of tools or organizations that promise to solve your security problems for you. Security has to be a fundamental part of your product. The main exception to this, where you may want to bring in specialized outside help, is penetration testing: finding vulnerabilities through external analysis is an important and learned skill, and a good penetration tester will work with you to uncover them.
