Chapter 18. Secure Application Architecture

The first step in securing any web application is the architecture phase.

When building a product, a cross-functional team of software engineers and product managers usually collaborates to find a technical model that will serve a very specific business goal in an efficient manner.

In software engineering, the role of an architect is to design modules at a high level and evaluate the best ways for modules to communicate with each other. This extends to determining the best ways to store data, which third-party dependencies to rely on, which programming paradigm should be predominant throughout the codebase, and so on.

Like building architecture, software architecture is a delicate process that carries a large amount of risk, because rearchitecting and refactoring are expensive processes once an application has already been built.

Security architecture carries a similar risk profile to software or building architecture. Often, vulnerabilities can be prevented easily in the architecture phase with careful planning and evaluation. With too little planning, however, application code must be rearchitected and refactored, often at a large cost to the business.

The National Institute of Standards and Technology (NIST) has claimed, based on a study of popular web applications, that: “The cost of removing an application security vulnerability during the design phase ranges from 30-60 times less than if removed during production.” This should put to rest any doubts regarding the importance of the architecture phase.

Analyzing Feature Requirements

The first step in ensuring a product or feature is architected securely is collecting all of the business requirements that said product or feature is expected to fulfill.

Tip

Any organization that has separate teams for security and R&D should ensure that communication pathways between the two are built into the development process.

Features cannot be properly analyzed in a silo, and such analysis should include stakeholders from engineering as well as product development.

Business requirements can be evaluated for risk before their integration into a web application is even considered.

Consider the following business case:

MegaBank management has decided that, due to their stellar brand image after cleaning up multiple security holes in their codebase, they can capitalize on their newfound popularity by launching their own merchandising brand.

MegaBank’s new merchandising brand, MegaMerch, will offer a collection of high-quality cotton T-shirts, comfortable cotton/elastic sweatpants, and men’s and women’s swimwear bearing the MegaMerch (MM) logo.

In order to distribute merchandise under the new MegaMerch brand, MegaBank would like to set up an eCommerce application that meets the following requirements:

  1. Users can create accounts and sign in

  2. User accounts contain full name, address, and date of birth

  3. Users can access the front-page of the store which shows items

  4. Users can search for specific items

  5. Users can save credit cards and bank accounts for use later

A high-level analysis of these requirements tells us a few important tidbits of information:

  1. We are storing credentials

  2. We are storing personally identifiable information

  3. Users have elevated privileges compared to guests

  4. Users can search through existing items

  5. We are storing financial data

These points, while not out of the ordinary, allow us to derive an initial analysis of the potential risks this application could encounter if not architected correctly.

A few of the risk areas derived from this analysis are as follows:

  • Authentication & Authorization: how do we handle sessions, logins, cookies?

  • Personal Data: is it handled differently than other data? Do laws affect how we should handle said data?

  • Search Engine: how is the search engine implemented? Does it draw from the primary database as its single source of truth, or does it use a separate cached database?

Each of these risks raises many questions regarding implementation details, which in turn provide surface area for a security engineer to help steer development of the application in a more secure direction.

Authentication & Authorization

Because we are storing credentials and offering a different user experience to guests than to registered users, we know we have both an authentication and an authorization system. This means we must allow users to log in, and we must be able to differentiate tiers of users when determining what actions those users are allowed to take.

Furthermore, because we are storing credentials and support a login flow, we know credentials are going to be sent over the network. These credentials must also be stored in a database; otherwise, the authentication flow would break down.

This means we have to consider the following risks:

  • How do we handle data in transit?

  • How do we handle the storage of credentials?

  • How do we handle various authorization levels of users?

Secure Sockets Layer (SSL) & Transport Layer Security (TLS)

Figure 18-1. Let’s Encrypt, one of only a few nonprofit certificate authorities (CAs) that provide certificates for TLS encryption. The certificates it issues are valid for 90 days and are trusted by all major web browsers.

One of the most important architectural decisions to tackle as a result of the risks we have determined is how we should handle data in transit.

Data in transit is an important first evaluation during architecture review, because it affects the flow of all data throughout the web application.

An initial data-in-transit requirement should be that all data sent over the network is encrypted en route. This reduces the risk of a man-in-the-middle attack, which could otherwise be performed to steal credentials from our users and make purchases on their behalf (since we are storing their financial data).

SSL & TLS are the two major cryptographic protocols in use today for securing in-transit data from malicious eyes in the middle of any network.

SSL was designed by Netscape in the mid-1990s, and several versions of the protocol have been released since then.

TLS was defined in an RFC (RFC 2246) in 1999, and offered upgraded security in response to several architectural issues in SSL. TLS cannot interoperate with older versions of SSL due to the number of architectural differences between the two.

TLS offers the most robust security, while SSL has higher adoption but multiple known vulnerabilities that reduce its integrity as a cryptographic protocol.

All major web browsers today will show a lock icon in the URL address bar when a website’s communication is properly secured via SSL or TLS. The HTTP standard offers “HTTPS” (“HTTP Secure”), a URI scheme that requires TLS/SSL to be present before any data is sent over the network. Browsers that support HTTPS will warn the end user if the TLS/SSL connection is compromised when an HTTPS request is made.

For MegaMerch, we would want to ensure that all data is encrypted via TLS prior to being sent over the network.

How TLS is implemented is generally server-specific, but every major web server software package offers an easy way to begin encrypting web traffic.
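
As an illustration, here is a minimal sketch of serving traffic over TLS with Node.js (written in TypeScript, an assumption since this chapter names no particular stack); the certificate and key paths are placeholders for files issued by a CA such as Let’s Encrypt:

    import https from "https";
    import { readFileSync } from "fs";

    // Certificate and private key issued by a CA (paths are placeholders).
    const options = {
      key: readFileSync("/etc/ssl/private/megamerch.key"),
      cert: readFileSync("/etc/ssl/certs/megamerch.crt"),
    };

    // All application traffic is served over TLS on port 443; nothing in
    // this sketch listens on plain HTTP.
    https
      .createServer(options, (req, res) => {
        res.writeHead(200, { "Content-Type": "text/plain" });
        res.end("Hello over TLS\n");
      })
      .listen(443);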

Secure Credentials

Password security requirements exist for a number of reasons, but unfortunately most developers don’t understand what makes a password hacker-safe.

Making a secure password has less to do with length and the number of special characters, and everything to do with the patterns that can be found in that password. In cryptography, this is known as “entropy”: the amount of randomness and uncertainty. You want passwords with a lot of entropy.

Believe it or not, most passwords used on the web are not unique. When a hacker attempts to brute force logins to a web application, the easiest route is to find a list of the most common passwords and use it to perform a dictionary attack.

An advanced dictionary attack will also include combinations of common passwords, common password structures, and common variations of passwords.

Beyond that, classical brute forcing involves iterating through all possible combinations.

As you can see, it is not so much the length of the password that protects you, but rather the lack of observable patterns and the avoidance of common words and phrases.

Unfortunately, it is difficult to convey this to users. Instead, we should design our requirements so that it is difficult for a user to create a password that contains well-known patterns.

For example, we can reject any password that appears in a top-1,000 password list and tell the user it is too common.

We should also prevent our users from using their birthdate, first name, last name, or any part of their address.

At MegaMerch, we can require first name, last name, and birthdate at signup and prevent these from appearing within the user’s password.
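
A minimal sketch of how such screening might look at signup; the top-1,000 list shown here is truncated, and the field names are assumptions for illustration:

    // Truncated stand-in for a real top-1,000 common-password list.
    const commonPasswords = new Set(["123456", "password", "qwerty"]);

    interface SignupInfo {
      firstName: string;
      lastName: string;
      birthDate: string; // e.g., "1984-07-21"
    }

    function isPasswordAcceptable(password: string, user: SignupInfo): boolean {
      const lowered = password.toLowerCase();

      // Reject anything on the common-password list outright.
      if (commonPasswords.has(lowered)) return false;

      // Reject passwords containing the user's own identifying data:
      // first name, last name, or any fragment of the birthdate.
      // (Deliberately strict; a production check might be more nuanced.)
      const banned = [
        user.firstName.toLowerCase(),
        user.lastName.toLowerCase(),
        ...user.birthDate.split("-"),
      ];
      return !banned.some((piece) => lowered.includes(piece));
    }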

Hashing Credentials

When storing sensitive credentials, we should never store them in plaintext. Instead, we should hash each password the first time we see it, prior to storing it.

Hashing a password is not a difficult process, and the security benefits are massive.

Hashing algorithms differ from most encryption algorithms in a number of ways. First, hashing algorithms are not reversible. This is a key point when dealing with passwords: we don’t want even our own staff to be able to steal user passwords, because users might reuse those passwords elsewhere (a bad practice, but a common one), and we don’t want that type of liability in the case of a rogue employee.

Second, modern hashing algorithms are extremely efficient. Today’s hashing algorithms can represent multiple-megabyte strings of characters in just 128–256 bits of data. This means that when we do a password check, we re-hash the user’s password at login and compare it to the hashed password in the database. Even if the user has a huge password, we can perform the lookup at high speed.

Another key point of using a hash is that modern hashing algorithms have almost no collisions in practical application (either zero collisions, or a probability statistically approaching zero). This means you can mathematically evaluate the probability that two passwords will have identical hashes, and that probability is extraordinarily low. As a result, you do not need to worry about hackers “guessing” a password unless they guess the exact password of another user.

If a database is breached and data is stolen, properly hashed passwords protect your users. The hacker will only have access to the hash, and as such it will be very unlikely that even a single password in your database will be reverse engineered.

Let’s consider three cases where a hacker gets access to MegaMerch’s databases:

Case #1: Passwords stored in plaintext

Result: All passwords compromised

Case #2: Passwords hashed with the MD5 algorithm

Result: The hacker can crack some of the passwords using rainbow tables (precomputed tables of hash→password mappings; weaker hashing algorithms are susceptible to these)

Case #3: Passwords hashed with BCrypt

Result: It is unlikely any passwords will be cracked

As you can see, all passwords should be hashed. Furthermore, the algorithm used for hashing should be evaluated based on its mathematical integrity and its scalability with modern hardware. Hashing algorithms should be slow on modern hardware, thereby reducing the number of guesses per second a hacker can make.

Slow hashing algorithms are essential because a hacker cracking passwords will automate the password → hash process. Once the hacker finds a hash identical to a stored one (ignoring potential collisions), the password has effectively been breached. With an extremely slow hashing algorithm like BCrypt, it can take years or more to crack a single password on modern hardware.

Modern web applications should consider the following hashing algorithms for securing the integrity of their users’ credentials:

BCrypt

BCrypt is a hashing function that derives its name from two developments: the “B” comes from Blowfish, a symmetric-key block cipher designed in 1993 by Bruce Schneier as a general-purpose, open source encryption algorithm. “Crypt,” on the other hand, is the name of the default hashing function that shipped with Unix operating systems.

The Crypt hashing function was written with early Unix hardware in mind, which meant that the hardware of that era could not hash enough passwords per second to reverse engineer a Crypt-hashed password. At the time of its development, Crypt could hash fewer than 10 passwords per second. With modern hardware, the Crypt function can hash tens of thousands of passwords per second, which makes breaking a Crypt-hashed password an easy operation for any current-era hacker.

BCrypt iterates on both Blowfish and Crypt by offering a hashing algorithm that actually becomes slower as hardware becomes faster. BCrypt-hashed passwords scale into the future, because the more powerful the hardware attempting to hash with BCrypt, the more operations are required. As a result, it is nearly impossible for a hacker today to write a script that performs enough hashes to match a complex password via brute force.
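
A minimal sketch of BCrypt in practice, assuming the widely used bcrypt package for Node.js; the work factor of 12 is an assumed starting point that should be tuned:

    import bcrypt from "bcrypt";

    const WORK_FACTOR = 12; // cost parameter; raise it as hardware improves

    // At signup: store only the hash. The salt is generated automatically
    // and embedded in the hash string itself.
    async function storePasswordHash(plaintext: string): Promise<string> {
      return bcrypt.hash(plaintext, WORK_FACTOR);
    }

    // At login: re-hash the submitted password and compare it against the
    // hash stored in the database. The plaintext is never persisted.
    async function checkPassword(plaintext: string, storedHash: string): Promise<boolean> {
      return bcrypt.compare(plaintext, storedHash);
    }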

PBKDF2

As an alternative to BCrypt, the PBKDF2 hashing algorithm can also be used to secure passwords.

PBKDF2 is a hashing algorithm based on a concept known as “key stretching.” Key stretching algorithms apply an underlying hash function many times over, so that each hash takes a configurable amount of work to compute. As a result, PBKDF2 makes brute forcing a computationally expensive process.

PBKDF2 was not originally designed for hashing passwords, but should be sufficient for hashing passwords when BCrypt-like algorithms are not available.

PBKDF2 takes as configuration a minimum number of iterations used to generate a hash. This minimum should always be set to the highest number of iterations your hardware can tolerate: you never know what type of hardware a hacker might have access to, so maximizing the iteration count on your own hardware maximizes the work an attacker must perform per guess, even on faster hardware.
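
Node.js ships PBKDF2 in its built-in crypto module; a sketch follows, with the iteration count as an assumed starting point to be tuned against your own hardware:

    import { pbkdf2Sync, randomBytes, timingSafeEqual } from "crypto";

    const ITERATIONS = 600_000;  // tune to the most your hardware tolerates
    const KEY_LENGTH = 32;       // 256-bit derived key
    const DIGEST = "sha256";

    // A fresh random salt per password defeats precomputed rainbow tables.
    function hashPassword(password: string): { salt: string; hash: string } {
      const salt = randomBytes(16);
      const hash = pbkdf2Sync(password, salt, ITERATIONS, KEY_LENGTH, DIGEST);
      return { salt: salt.toString("hex"), hash: hash.toString("hex") };
    }

    // Re-derive with the stored salt and compare in constant time.
    function verifyPassword(password: string, salt: string, hash: string): boolean {
      const candidate = pbkdf2Sync(password, Buffer.from(salt, "hex"), ITERATIONS, KEY_LENGTH, DIGEST);
      return timingSafeEqual(candidate, Buffer.from(hash, "hex"));
    }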

In our evaluation of MegaMerch, we have decided to hash our passwords using BCrypt and will only compare password hashes.

2FA

Figure 18-2. Google Authenticator, one of the most commonly used 2FA applications for Android and iOS. Google Authenticator is compatible with a large number of websites and implements the open TOTP standard, which you can integrate into your own application.

In addition to requiring secure, hashed passwords that are encrypted in transit, we should also consider offering 2FA to users who want to ensure their account’s integrity is not compromised.

2FA is a fantastic security feature that operates very effectively based on a very simple principle.

Most 2FA systems require a user to enter a password into their browser in addition to a one-time code generated by a mobile application or delivered via SMS text message.

More advanced 2FA protocols make use of a physical hardware token, usually a USB device that generates a unique one-time-use token when plugged into a user’s computer. Generally speaking, physical tokens are more applicable to a business’s employees than to its users: distributing and managing physical tokens for an eCommerce platform would be a painful experience for everyone involved.

Phone app/SMS-based 2FA might not be as secure as a dedicated 2FA USB token, but it is still an order of magnitude safer than using the application without 2FA.

In the absence of any vulnerabilities in the 2FA app or messaging protocol, 2FA eliminates remote logins to your web application that were not initiated by the owner of the account.

The only way to compromise a 2FA account is to gain access to both the account password and the physical device containing the 2FA codes (usually a phone).
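
To make the principle concrete, here is a compact sketch of the time-based one-time password (TOTP) algorithm from RFC 6238, which apps like Google Authenticator implement; it assumes a shared secret was exchanged with the user at enrollment:

    import { createHmac } from "crypto";

    // HOTP (RFC 4226): HMAC the moving counter with the shared secret,
    // then dynamically truncate the result to a 6-digit code.
    function hotp(secret: Buffer, counter: bigint): string {
      const buf = Buffer.alloc(8);
      buf.writeBigUInt64BE(counter);
      const mac = createHmac("sha1", secret).update(buf).digest();
      const offset = mac[mac.length - 1] & 0x0f;
      const code =
        ((mac[offset] & 0x7f) << 24) |
        (mac[offset + 1] << 16) |
        (mac[offset + 2] << 8) |
        mac[offset + 3];
      return (code % 1_000_000).toString().padStart(6, "0");
    }

    // TOTP (RFC 6238): the counter is simply the current 30-second time step.
    function totp(secret: Buffer, stepSeconds = 30): string {
      const step = BigInt(Math.floor(Date.now() / 1000 / stepSeconds));
      return hotp(secret, step);
    }

    // Server-side verification; a production implementation would usually
    // also accept adjacent time steps to absorb clock drift.
    function verifyTotp(secret: Buffer, submitted: string): boolean {
      return totp(secret) === submitted;
    }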

During our architecture review with MegaMerch, we strongly suggest offering 2FA to users who wish to improve the security of their MegaMerch accounts.

PII + Financial Data

First, when we store personally identifiable information (PII) on a user we need to ensure that such storage is legal in the countries we are operating in—and that we are following any applicable laws for PII storage in those countries.

Beyond that, we want to ensure that in the case of a database breach or server compromise—the PII is not exposed in a format that makes it easily abusable.

Similar rules to PII apply to financial data, such as credit card numbers (also included under PII laws in some countries).

A smaller company might find that, rather than storing PII and financial details on its own, it is a more effective strategy to outsource the storage of such data to a compliant business that specializes in such storage.
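
A hypothetical sketch of that approach: the vault client and its methods are invented for illustration, though real payment providers expose similar tokenization APIs.

    // Hypothetical third-party vault client; the interface is a placeholder,
    // not a real SDK.
    interface VaultClient {
      tokenizeCard(cardNumber: string, expiry: string): Promise<string>;
    }

    // Placeholder data layer; substitute your own.
    declare const db: { saveCardToken(userId: string, token: string): Promise<void> };

    // We persist only the opaque token returned by the vault; the raw card
    // number never touches our own database, so a breach of our systems
    // exposes no usable card data.
    async function saveCard(
      vault: VaultClient,
      userId: string,
      cardNumber: string,
      expiry: string
    ): Promise<void> {
      const token = await vault.tokenizeCard(cardNumber, expiry);
      await db.saveCardToken(userId, token);
    }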

Searching

Any web application implementing its own custom search engine should consider the implications of such a task.

Search engines typically require data to be stored in a way that makes particular queries very efficient. The way that data is ideally stored in a search engine is much different than the way data is ideally stored in a general purpose database.

As a result, most web applications implementing a search engine will need a separate database from which the search engine draws its data.

As you can see, this can cause a number of complications, which is why proper security architecture is required up front rather than later.

Syncing any two databases is a big undertaking: if the permissions model in the primary database is updated, the search engine’s database must be updated to reflect that change.

Additionally, bugs introduced into the codebase may cause certain objects to be deleted from the primary database but not from the search database. Alternatively, metadata in the search database regarding a particular object may still be searchable after the object has been removed from the primary database.
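
For example, here is a hypothetical sketch of keeping deletions consistent between the two stores; the db handle is a placeholder for your data layer, while the search client mirrors the official @elastic/elasticsearch package:

    import { Client } from "@elastic/elasticsearch";

    const search = new Client({ node: "https://localhost:9200" });

    // Hypothetical primary-database handle; substitute your own data layer.
    declare const db: { deleteItem(id: string): Promise<void> };

    // Delete from the primary database AND the search index in one code
    // path, so an item can never remain searchable after it is removed.
    async function deleteItem(itemId: string): Promise<void> {
      await db.deleteItem(itemId);
      await search.delete({ index: "items", id: itemId });
    }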

All of these are concerns that should be considered up front, prior to implementing any search engine, be it Elasticsearch or an in-house solution.

Summary

The example detailed above illustrates that there are many concerns to consider when building any new application.

Whenever a new application is being developed by a product organization, the design and architecture of the application should also be analyzed carefully by a skilled security engineer or architect.

Deep security flaws, such as an improper authentication scheme or a half-baked integration with a search engine, could expose your application to risk that is not easily resolved.

Once paying customers begin relying on your application in their workflows, especially after contracts are written and signed, resolving architecture-level security bugs becomes a daunting task.

At the beginning of this chapter I included the estimate from NIST that a security flaw found in the architecture phase of an application could cost 30-60 times less to fix than when it is found in production.

This cost difference stems from a combination of factors, including the following:

  1. Customers may be relying on insecure functionality, forcing you to build secure equivalent functionality and provide the customers with a migration plan so that downtime is not encountered.

  2. Deep architecture-level security flaws may require rewriting a significant number of modules in addition to the insecure module. For example, a complex 3D video game with a flawed multiplayer module may require rewriting not only the networking module, but also the game modules written on top of the multiplayer networking module. This is especially true if an underlying technology has to be swapped out to improve security (for example, moving from UDP to TCP networking).

  3. The security flaw may have been exploited, hence costing the business actual money in addition to engineering time.

  4. The security flaw may be published, bringing bad PR to the affected web application and costing the business lost engagement and customers who choose to leave.

Ultimately, the ideal phase to catch and resolve security concerns is always the architecture phase. Eliminating security issues in this phase will save you money in the long run, and eliminate potential headaches caused by external discovery or publication later on.
