I carry my unwritten poems in cipher on my face!
—George Eliot
This chapter introduces you to algorithms related to cryptography. We will start by presenting the background, then we will discuss symmetric encryption algorithms. We will then explain the Message-Digest 5 (MD5) algorithm and the Secure Hash Algorithm (SHA) and present the limitations and weaknesses of symmetric algorithms. Next, we will discuss asymmetric encryption algorithms and how they are used to create digital certificates. Finally, we will present a practical example that summarizes all of these techniques.
By the end of this chapter, you will have a basic understanding of various issues related to cryptography.
The following topics are discussed in this chapter:
Let’s start by looking at the basic concepts.
Techniques to protect secrets have been around for centuries. The earliest attempts to secure and hide data from adversaries date back to ancient inscriptions discovered on monuments in Egypt, which used a special alphabet known only to a few trusted people. This early form of security is called obscurity and is still used in different forms today. For this method to work, it is critical to protect the secret, which, in the above example, is the meaning of the special alphabet. Later, finding foolproof ways of protecting important messages became vital in both World War One and World War Two. In the late 20th century, with the introduction of electronics and computers, sophisticated algorithms were developed to secure data, giving rise to the modern field of cryptography. This chapter discusses the algorithmic aspects of cryptography. One of the uses of these algorithms is to allow secure data exchange between two processes or users. Cryptographic algorithms use mathematical functions to achieve the stated security goals.
First, we will look at the importance of “the weakest link” in the infrastructure.
Sometimes, when architecting the security of digital infrastructure, we put too much emphasis on the security of individual entities and don’t pay the necessary attention to end-to-end security. This can result in us overlooking some loopholes and vulnerabilities in the system, which can later be exploited by hackers to access sensitive data. The important point to remember is that a digital infrastructure, as a whole, is only as strong as its weakest link. For a hacker, this weakest link can provide backdoor access to sensitive data in the digital infrastructure. Beyond a certain point, there is not much benefit in fortifying the front door without closing all the back doors.
As the algorithms and techniques for keeping digital infrastructure secure become more and more sophisticated, attackers keep upgrading their techniques as well. It is always important to remember that one of the easiest ways for attackers to hack digital infrastructure is by exploiting such weak links to access sensitive information.
In 2014, a cyber attack on a Canadian federal research institute—the National Research Council (NRC)—is estimated to have cost hundreds of millions of dollars. The attackers were able to steal decades of research data and intellectual property material. They used a loophole in the Apache software that was used on the web servers to gain access to the sensitive data.
In this chapter, we will highlight the vulnerabilities of various encryption algorithms.
Let’s first look at the basic terminology used.
Let’s look at the basic terminology related to cryptography:
Let us first understand the security needs of a system.
It is important to first understand the exact security needs of a system. Understanding this will help us use the correct cryptographic technique and discover the potential loopholes in a system.
One way of developing a better understanding of the security needs of a system is by answering the following four questions:
Let us take the example of a Virtual Private Cloud (VPC) in the AWS cloud. A VPC allows us to create a logically isolated network to which resources such as virtual machines are added. In order to understand the security requirements of a VPC, it is important to first identify the entities by answering those four questions:
Most of the answers to these questions will come by performing the following three steps:
Let’s look at these steps one by one.
An entity can be defined as an individual, a process, or a resource that is part of an information system. We first need to identify how users, resources, and processes are present at runtime. Then, we will quantify the security needs of these identified entities, either individually or as a group.
Once we better understand these requirements, we can establish the security goals of our digital system.
The goal of designing a security system is to protect information from being stolen, compromised, or attacked. Cryptographic algorithms are typically used to meet one or more security goals:
It is important to understand the classified nature of data. Data is categorized by regulatory authorities such as governments, agencies, or organizations based on how serious the consequence will be if it is compromised. The categorization of the data helps us choose the correct cryptographic algorithm. There is more than one way to categorize data, based on the sensitivity of the information it contains. Let’s look at the typical ways of classifying data:
Top-secret data is protected through multiple layers of security and requires special permission to access it.
In general, more sophisticated security designs are much slower than simpler ones. It is important to strike the right balance between the security and the performance of the system.
Designing ciphers is about coming up with an algorithm that can scramble sensitive data so that a malicious process or an unauthorized user cannot access it. Although, over time, ciphers have become more and more sophisticated, the underlying principles that ciphers are based on remain unchanged.
Let’s start by looking at some relatively simple ciphers that will help us understand the underlying principles that are used in the design of cryptographic algorithms.
Substitution ciphers have been in use for hundreds of years in various forms. As the name indicates, substitution ciphers are based on a simple concept—substituting characters in plain text with other characters in a predetermined, organized way.
Let’s look at the exact steps involved in this:
The following are examples of substitution-based ciphers:
Let us look into them in more detail.
Caesar ciphers are based on substitution mapping. Substitution mapping changes the actual string in a deterministic way by applying a simple formula that is kept secret.
The substitution mapping is created by replacing each character with the third character to the right of it. This mapping is described in the following diagram:
Figure 14.1: The substitution mapping of Caesar ciphers
Let’s see how we can implement a Caesar cipher using Python:
rotation = 3
P = 'CALM'; C = ''
for letter in P:
    # Shift each uppercase letter, wrapping around from Z back to A
    C = C + chr((ord(letter) - ord('A') + rotation) % 26 + ord('A'))
We can see that we applied a Caesar cipher to the plaintext CALM.
Let’s print the cipher text after encrypting it with the Caesar cipher:
print(C)
FDOP
Caesar ciphers are said to have been used by Julius Caesar to communicate with his advisers.
A Caesar cipher is a simple cipher and is easy to implement. The downside is that it is not difficult to crack, as a hacker could simply iterate through all the possible shifts of the alphabet (all 26 of them) and see if any coherent message appears. Given the current processing abilities of computers, this is a trivially small number of combinations to try. It should not be used to protect highly sensitive data.
ROT13 is a special case of the Caesar cipher where the substitution mapping is created by replacing each character with the 13th character to the right of it. The following diagram illustrates this:
Figure 14.2: Workings of ROT13
This means that if ROT13() is the function that implements ROT13, then the following applies:
rotation = 13
P = 'CALM'; C = ''
for letter in P:
    # Shift each uppercase letter, wrapping around from Z back to A
    C = C + chr((ord(letter) - ord('A') + rotation) % 26 + ord('A'))
Now, let’s print the encoded value of C:
print(C)
PNYZ
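Because the English alphabet has 26 letters and ROT13 shifts by exactly half of them, applying ROT13 twice returns the original text, that is, ROT13(ROT13(P)) = P. The following minimal sketch checks this property. The helper name rot13() is our own and assumes uppercase A-Z input; Python's built-in codecs module, which ships a rot13 text transform, is used only as a cross-check.

```python
import codecs

def rot13(text):
    # Shift each uppercase letter 13 places, wrapping around from Z back to A
    return "".join(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")) for ch in text)

encoded = rot13("CALM")
print(encoded)                           # PNYZ
print(rot13(encoded))                    # CALM: ROT13 undoes itself
print(codecs.encode("CALM", "rot13"))    # the standard library agrees: PNYZ
```

Since encoding and decoding are the same operation, ROT13 needs no separate decryption function, which is also why it offers no real confidentiality.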
ROT13 is actually not used to accomplish data confidentiality. It is used more to mask text, for example, to hide potentially offensive text. It can also be used to avoid giving away the answer to a puzzle, and in other similar use-cases.
Substitution ciphers are simple to implement and understand. Unfortunately, they are also easy to crack. Simple cryptanalysis of rotation-based substitution ciphers such as the Caesar cipher shows that if we use the English alphabet, then all we need to determine to crack the cipher is how much we are rotating by. We can try each possible shift one by one until we are able to decrypt the text. This means that it will take at most 25 attempts to reconstruct the plain text.
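The brute-force attack just described can be sketched in a few lines. caesar_shift() is a hypothetical helper that assumes uppercase A-Z text; it undoes each of the 25 non-trivial shifts and leaves it to the attacker (or a dictionary check) to spot the readable candidate.

```python
def caesar_shift(text, shift):
    # Shift each uppercase letter by `shift` places, wrapping around within A-Z
    return "".join(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")) for ch in text)

cipher_text = caesar_shift("CALM", 3)  # "FDOP", as in the earlier example

# Undo every possible non-trivial shift (1 to 25) and collect the candidates
candidates = {shift: caesar_shift(cipher_text, -shift) for shift in range(1, 26)}
for shift, candidate in candidates.items():
    print(shift, candidate)
# Scanning the list for readable text reveals that shift 3 yields "CALM"
```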
Now, let’s look at another type of simple cipher—transposition ciphers.
In transposition ciphers, the characters of the plain text are encrypted using transposition. Transposition is a method of encryption where we scramble the position of the characters using deterministic logic. A transposition cipher writes characters into rows in a matrix and then reads the columns as output. Let’s look at an example.
Let’s take the Ottawa Rocks plain text (P).
First, let’s encode P. For that, we will drop the space and write the plain text horizontally into a 3 x 4 matrix:
O | t | t | a
w | a | R | o
c | k | s |
The read process will read the string vertically, column by column, which will generate the cipher text OwctaktRsao. The key would be {1,2,3,4}, which is the order in which the columns are read. Encrypting with a different key, say, {2,4,3,1}, would result in a different cipher text, in this case, takaotRsOwc.
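The write-by-rows, read-by-columns process can be sketched in Python as follows. transposition_encrypt() is a hypothetical helper; it assumes that spaces are dropped and that a short final row is left unpadded, which is what reproduces both cipher texts in the example above.

```python
def transposition_encrypt(plain_text, n_cols, key):
    # Drop spaces, then write the text row by row into a grid with n_cols columns
    text = plain_text.replace(" ", "")
    rows = [text[i:i + n_cols] for i in range(0, len(text), n_cols)]
    # Read the grid column by column in the order given by the 1-based key;
    # a short final row simply contributes nothing to its missing columns
    cipher = []
    for col in key:
        for row in rows:
            if col - 1 < len(row):
                cipher.append(row[col - 1])
    return "".join(cipher)

print(transposition_encrypt("Ottawa Rocks", 4, [1, 2, 3, 4]))  # OwctaktRsao
print(transposition_encrypt("Ottawa Rocks", 4, [2, 4, 3, 1]))  # takaotRsOwc
```

Decryption reverses the process: knowing the key and the grid shape, the receiver refills the columns in key order and reads the rows.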
The Germans used a cipher named ADFGVX in the First World War, which combined transposition and substitution. It was cracked by the French cryptanalyst Georges Painvin in 1918, while the war was still being fought.
So, these are some of the types of ciphers. In general, ciphers use a key to code plain text. Now, let’s look at some of the cryptographic techniques that are currently used. Cryptography protects a message using encryption and decryption processes, as discussed in the next section.
Different types of cryptographic techniques use different types of algorithms and are used under different sets of circumstances. As different situations and use-cases have different requirements of security based on the business requirements and the data classification, the selection of the right technique is important for a well-designed architecture.
Broadly, cryptographic techniques can be divided into the following three types:
Let’s look at them one by one.
The cryptographic hash function is a mathematical algorithm that can be used to create a unique fingerprint of a message. It creates an output, called a hash, from plain text. The size of the output is usually fixed but can vary for some specialized algorithms.
Mathematically, this looks as follows:
$C_1 = \text{hashFunction}(P_1)$
This is explained as follows:
This is shown in the following diagram. The variable-length data is converted into a fixed-length hash through a one-way hash function:
Figure 14.3: One-way hash functions
A hash function is a mathematical algorithm that transforms an arbitrary amount of data into a fixed-size string of bytes. It plays a vital role in ensuring the integrity and authenticity of data. Below are the key characteristics that define a cryptographic hash function:
If we have a situation where each unique message does not have a unique hash, we call it a collision. In other words, a collision is when the hash algorithm produces the same hash value for two different input values. For security applications, a collision is a potential vulnerability and its probability should be very low. That is, if we have two texts, P1 and P2, in the case of collision, it means hashFunction(P1) = hashFunction(P2).
Regardless of the hashing algorithm used, collisions are rare. Otherwise, hashing wouldn’t be useful. However, for some applications, collisions cannot be tolerated. In those cases, we need to use a hashing algorithm that is more complex but much less likely to generate hash values that collide.
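Collisions always exist, because a hash function maps an unbounded set of inputs onto a fixed number of outputs; what matters is how hard they are to find. The following sketch makes this concrete by deliberately truncating SHA-256 to 16 bits, an assumption chosen purely for illustration. With only 65,536 possible outputs, a collision typically turns up after a few hundred attempts and is guaranteed within 65,537 by the pigeonhole principle. With the full 256-bit output, the same search is computationally infeasible. The helper names are our own.

```python
import hashlib

def truncated_hash(data, n_bytes=2):
    # Keep only the first n_bytes of the SHA-256 digest (2 bytes = 16 bits)
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes=2):
    # Remember which message produced each truncated hash until one repeats
    seen = {}
    i = 0
    while True:
        message = f"message-{i}".encode()
        h = truncated_hash(message, n_bytes)
        if h in seen:
            return seen[h], message, h
        seen[h] = message
        i += 1

p1, p2, h = find_collision()
print(p1, p2, h.hex())  # two different messages, same 16-bit hash
print(hashlib.sha256(p1).hexdigest() == hashlib.sha256(p2).hexdigest())  # full hashes differ
```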
Cryptographic hash functions can be implemented by using various algorithms. Let’s take a deeper look at two of them:
MD5 was developed by Ronald Rivest in 1991 to replace MD4. It generates a 128-bit hash. Generating a 128-bit hash means that the resulting hash value is made up of 128 binary digits (bits).
This translates to a fixed length of 16 bytes or 32 hexadecimal characters. The fixed length ensures that no matter the size of the original data, the hash will always be 128 bits long. The purpose of this fixed-length output is to create a “fingerprint” or “digest” of the original data. MD5 is a relatively simple algorithm that is vulnerable to collisions. In applications where a collision cannot be tolerated, MD5 should not be used. It can still be used where no adversary is trying to forge a collision, for example, to check that files downloaded from the internet were not corrupted in transit.
Let’s look at an example. In order to generate an MD5 hash in Python, we will start by using the hashlib module, which is part of the Python Standard Library and provides a range of different cryptographic hashing algorithms:
import hashlib
Next, we define a utility function called generate_md5_hash(), which takes input_string as a parameter. This string will be hashed by the function:
def generate_md5_hash(input_string):
    # Create a new md5 hash object
    md5_hash = hashlib.md5()
    # Encode the input string to bytes and hash it
    md5_hash.update(input_string.encode())
    # Return the hexadecimal representation of the hash
    return md5_hash.hexdigest()
Note that hashlib.md5() creates a new hash object that uses the MD5 algorithm, and md5_hash.update(input_string.encode()) updates the hash object with the bytes of the input string. The string is encoded to bytes using the default UTF-8 encoding. After all the data has been fed to the hash object, we can call the hexdigest() method to return the hexadecimal representation of the digest. This is the MD5 hash of the input string.
Next, we define a verify_md5_hash() function, use generate_md5_hash() to get the MD5 hash of the string "Hello, World!", print the result to the console, and then verify it:
def verify_md5_hash(input_string, correct_hash):
    # Generate md5 hash for the input_string
    computed_hash = generate_md5_hash(input_string)
    # Compare the computed hash with the provided hash
    return computed_hash == correct_hash

# Test
input_string = "Hello, World!"
hash_value = generate_md5_hash(input_string)
print(f"Generated hash: {hash_value}")
correct_hash = hash_value
print(verify_md5_hash(input_string, correct_hash))  # This should print True
Generated hash: 65a8e27d8879283831b664bd8b7f0ad4
True
In the verify_md5_hash() function, we take an input string and a known correct MD5 hash. We generate the MD5 hash of the input string using our generate_md5_hash() function and then compare it to the known correct hash.
Looking back at history, weaknesses in MD5 were discovered in the 1990s. Despite these issues, MD5 is still popular for integrity checks where security is not a concern. Note that the MD5 message digest does not uniquely associate the hash with its owner, as the MD5 digest is not a signed hash. MD5 can be used to show that a file has not been accidentally changed since the hash was computed. It cannot be used to prove the authenticity of a file. Now, let’s look at another hashing algorithm: SHA.
SHA was published by the National Institute of Standards and Technology (NIST). It’s widely used to verify the integrity of data. Among its variations, SHA-512 is a popular hash function, and Python’s hashlib library includes it. Let’s see how we can use Python to create a hash using the SHA algorithm. For that, let us first import the hashlib library:
import hashlib
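Then we will define the salt and the message. Salting is the practice of adding random characters to a password before hashing. It enhances security by ensuring that identical passwords produce different hashes, which makes precomputed tables of common password hashes (rainbow tables) useless: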
salt = "qIo0foX5"
password = "myPassword"
Next, we will combine the salt with the password to apply the salting procedure:
salted_password = salt + password
Then, we will use the sha512 function to create a hash of the salted password:
sha512_hash = hashlib.sha512()
sha512_hash.update(salted_password.encode())
myHash = sha512_hash.hexdigest()
Let us print myHash:
myHash
2e367911b87b12f73b135b1a4af9fac193a8064d3c0a52e34b3a52a5422beed2b6276eabf95abe728f91ba61ef93175e5bac9a643b54967363ffab0b35133563
Note that when we use SHA-512, the hash generated is 512 bits (64 bytes) long, which is why it is displayed as 128 hexadecimal characters. This specific size isn’t arbitrary, but rather a key component of the algorithm’s security features. A larger hash size corresponds to an increased number of potential outputs, thereby reducing the chances of “collisions”, instances where two different inputs produce the same hash output. Collisions compromise the reliability of a hashing algorithm, and SHA-512’s 512-bit output significantly reduces this risk.
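The salt in the example above is hard-coded for readability. A minimal sketch of how salting is typically applied follows, assuming a fresh random salt per password (generated with Python's secrets module) that is stored alongside the hash; hash_password() and verify_password() are hypothetical helper names. It also confirms the 512-bit digest size. For real password storage, a deliberately slow function such as hashlib.pbkdf2_hmac() or hashlib.scrypt() is generally preferred over a single SHA-512 pass.

```python
import hashlib
import hmac
import secrets

def hash_password(password, salt=None):
    # Hypothetical helper: use a fresh random salt unless one is supplied
    if salt is None:
        salt = secrets.token_hex(8)  # 8 random bytes as 16 hex characters
    digest = hashlib.sha512((salt + password).encode()).hexdigest()
    return salt, digest

def verify_password(password, salt, stored_digest):
    # Recompute the salted hash and compare it in constant time
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

# The same password hashed twice gets two different salts, hence two different hashes
salt_a, hash_a = hash_password("myPassword")
salt_b, hash_b = hash_password("myPassword")
print(hash_a == hash_b)                                   # False
print(verify_password("myPassword", salt_a, hash_a))      # True
print(verify_password("wrongPassword", salt_a, hash_a))   # False
print(len(bytes.fromhex(hash_a)) * 8)                     # 512 (bits)
```

The salt is not secret; it only has to be unique, so it is stored in plain form next to the digest and passed back in at verification time.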
Hash functions are used to check the integrity of a file after making a copy of it. To achieve this, when a file is copied from a source to a destination (for example, when downloaded from a web server), a corresponding hash is also copied with it. This original hash, h_original, acts as a fingerprint of the original file. After copying the file, we generate the hash again from the copied version of the file, that is, h_copied. If h_original = h_copied, that is, the generated hash matches the original hash, this verifies that the file has not changed and none of the data was lost during the download process. We can use any cryptographic hash function, such as MD5 or SHA, to generate a hash for this purpose.
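The integrity check described above can be sketched as follows. file_hash() is a hypothetical helper that reads the file in chunks, so large downloads do not have to fit in memory. Temporary files stand in for the original and the downloaded copy, and SHA-256 is used, although any algorithm name accepted by hashlib.new() would work the same way.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def file_hash(path, algorithm="sha256", chunk_size=65536):
    # Feed the file to the hash object in fixed-size chunks
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.bin"
    copied = Path(tmp) / "copied.bin"
    original.write_bytes(b"Ottawa is really cold. " * 10000)

    # h_original travels with the file; h_copied is computed after the copy
    h_original = file_hash(original)
    shutil.copyfile(original, copied)
    h_copied = file_hash(copied)
    intact = h_original == h_copied

    # Simulate corruption during the transfer by flipping a single bit
    data = bytearray(copied.read_bytes())
    data[0] ^= 0x01
    copied.write_bytes(bytes(data))
    corruption_detected = h_original != file_hash(copied)

print(intact, corruption_detected)  # True True
```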
Both MD5 and SHA are hashing algorithms. MD5 is simple and fast, but it does not provide good security. SHA-2 variants such as SHA-256 and SHA-512 are more complex than MD5 and provide a greater level of security.
Now, let’s look at symmetric encryption.
In cryptography, a key is a combination of numbers that is used to encode plain text using an algorithm of our choice. In symmetric encryption, we use the same key for encryption and decryption. If the key used for symmetric encryption is K, then the following equation holds:
$E_K(P) = C$
Here, P is the plain text and C is the cipher text.
For decryption, we use the same key, K, to convert it back to P:
$D_K(C) = P$
This process is shown in the following diagram:
Figure 14.4: Symmetric encryption
Before listing the advantages of symmetric encryption, let’s return briefly to hash functions and see how to compute one with Python.
In this section, we’ll explore how to work with hash functions using Python’s built-in hashlib library. hashlib comes pre-installed with Python and provides a wide array of hashing algorithms. First, let us import the hashlib library:
import hashlib
We’ll use the SHA-256 algorithm to create our hash. Other algorithms like MD5, SHA-1, etc., can also be used:
sha256_hash = hashlib.sha256()
Let’s create a hash for the message "Ottawa is really cold":
message = "Ottawa is really cold".encode()
sha256_hash.update(message)
The hexadecimal representation of the hash can be printed with:
print(sha256_hash.hexdigest())
b6ee63a201c4505f1f50ff92b7fe9d9e881b57292c00a3244008b76d0e026161
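Note that the snippet above computes a hash, which is one-way: no key turns the digest back into the message. To see the symmetric relationship E_K(P) = C and D_K(C) = P in code, the following minimal sketch uses a one-time pad, where the key K is a random byte string as long as the message and both encryption and decryption are the same XOR operation with the same key. The function names are our own. A one-time pad is secure only if each key is truly random and never reused, so practical systems use a block cipher such as AES instead (the Fernet recipe from the cryptography package, used later in this chapter, is built on AES).

```python
import secrets

def xor_bytes(data, key):
    # XOR each byte of the data with the corresponding byte of the key
    return bytes(d ^ k for d, k in zip(data, key))

def otp_encrypt(plain_text, key):
    # E_K(P) = C; a one-time pad key must be exactly as long as the message
    if len(key) != len(plain_text):
        raise ValueError("key must be as long as the message")
    return xor_bytes(plain_text, key)

def otp_decrypt(cipher_text, key):
    # D_K(C) = P: the same key and the same XOR undo the encryption
    return xor_bytes(cipher_text, key)

P = "Ottawa is really cold".encode()
K = secrets.token_bytes(len(P))   # shared secret key, to be used only once
C = otp_encrypt(P, K)
print(C.hex())                    # random-looking bytes, different on every run
print(otp_decrypt(C, K).decode()) # Ottawa is really cold
```

Both parties need the same K, which is exactly the key-exchange problem that symmetric encryption has to solve.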
Let’s look at some of the advantages of symmetric encryption.
The following are the advantages of symmetric encryption:
When two users or processes plan to use symmetric encryption to communicate, they need to exchange keys using a secure channel. This gives rise to the following two problems:
Now, let’s look at asymmetric encryption.
In the 1970s, asymmetric encryption was devised to address some of the weaknesses of symmetric encryption that we discussed in the previous section.
The first step in asymmetric encryption is to generate two different keys that look totally different but are algorithmically related. One of them is chosen as the private key, $K_{pr}$, and the other one is chosen as the public key, $K_{pu}$. The choice of which one of the two keys is public or private is arbitrary. Mathematically, we can represent this as follows:
$E_{K_{pr}}(P) = C$
Here, P is the plain text and C is the cipher text.
We can decrypt it as follows:
$D_{K_{pu}}(C) = P$
Public keys are supposed to be freely distributed and private keys are kept secret by the owner of the key pair. For instance, in AWS, key pairs are used to secure connections to virtual instances and manage encrypted resources. The public key is used by others to encrypt data or verify signatures, while the private key, securely stored by the owner, is used to decrypt data or sign digital content. By adhering to the principle of keeping the private key secret and the public key accessible, AWS users can ensure secure communication and data integrity within their cloud environment. This separation between public and private keys is a cornerstone in the security and trust mechanisms within AWS and other cloud services.
The fundamental principle is that if you encrypt with one of the keys, the only way to decrypt it is by using the other key. For example, if we encrypt the data using the public key, we will need to decrypt it using the other key—that is, the private key.
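To make this concrete, the following sketch uses textbook RSA with deliberately tiny, well-known primes (p = 61, q = 53). This is an illustrative assumption: real keys are thousands of bits long, and real systems add padding schemes that this sketch omits. It shows that whatever one key encrypts, only the other key decrypts, in either direction.

```python
# Toy RSA key generation with tiny primes (insecure, for illustration only)
p, q = 61, 53
n = p * q                  # modulus shared by both keys: 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)

K_pu = (e, n)
K_pr = (d, n)

def apply_key(key, m):
    # Encryption and decryption are both modular exponentiation with a key
    exponent, modulus = key
    return pow(m, exponent, modulus)

P = 65                        # plain text encoded as an integer smaller than n
C = apply_key(K_pu, P)        # encrypt with the public key
print(C, apply_key(K_pr, C))  # decrypting with the private key recovers 65

C2 = apply_key(K_pr, P)         # encrypt with the private key, as in E_Kpr(P) = C
print(C2, apply_key(K_pu, C2))  # decrypting with the public key recovers 65
```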
Now, let’s look at one of the fundamental protocols of asymmetric encryption—the Secure Sockets Layer (SSL)/Transport Layer Security (TLS) handshake—which is responsible for establishing a connection between two nodes using asymmetric encryption.
SSL was originally developed to add security to HTTP. Over time, SSL was replaced with a more efficient and more secure protocol, called TLS. TLS handshakes are the basis of how HTTPS creates a secure communication session. A TLS handshake occurs between the two participating entities, the client and the server. This process is shown in the following diagram:
Figure 14.5: Secure session between the client and the server
A TLS handshake establishes a secure connection between the participating nodes. The following are the steps that are involved in this process:
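1. The client sends a client hello message to the server. Among other things, the message contains a random byte string, identified by byte_client.
2. The server sends a server hello message back to the client. Among other things, the message contains a random byte string, identified by byte_server, and the server’s digital certificate, identified by cert_server, containing the public key of the server.
3. The client verifies cert_server.
4. The client generates another random byte string, identified by byte_client2, and encrypts it with the public key of the server provided through cert_server.
5. The client sends a finished message to the server, which is encrypted with a secret key.
6. The server sends a finished message to the client, which is encrypted with a secret key.
Figure 14.6: Secure session between the client and the server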
Now, let’s discuss how we can use asymmetric encryption to create Public Key Infrastructure (PKI), which is created to meet one or more security goals for an organization.
Asymmetric encryption is used to implement PKI. PKI is one of the most popular and reliable ways to manage encryption keys for an organization. All the participants trust a central trusted authority called a Certification Authority (CA). CAs verify the identity of individuals and organizations and then issue them digital certificates (a digital certificate contains a copy of a person or organization’s public key and its identity), verifying that the public key associated with that individual or organization actually belongs to that individual or organization.
The way it works is that the CA asks a user to prove their identity. The basic validation is called domain validation, which could involve simply verifying ownership of a domain name. The extended validation, if needed, involves a more rigorous process that involves physical proof of identity, depending on the type of digital certificate that a user is trying to obtain. If the CA is satisfied that the user is indeed who they claim to be, the user then provides the CA with their public encryption key over a secure channel.
The CA uses this information to create a digital certificate that contains information about the user’s identity and their public key. This certificate is digitally signed by the CA. The certificate is a public entity as the user can then show their certificate to anyone who wants to verify their identity, without having to send it through a secure channel, as the certificate doesn’t contain any sensitive information itself. The person receiving the certificate does not have to verify the user’s identity directly. That person can simply verify that the certificate is valid by verifying the CA’s digital signature, which validates that the public key contained in the certificate does, in fact, belong to the individual or organization named on the certificate.
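The sign-and-verify step can be sketched with the textbook RSA idea: the CA hashes the certificate contents and applies its private key to the hash, and anyone holding the CA’s public key can check the result. Everything here is a simplifying assumption for illustration: the certificate is a plain string rather than a real X.509 structure, the CA key is built from the well-known Mersenne primes 2**61 - 1 and 2**89 - 1 (far too small and too public for real use), no padding scheme is applied, and the helper names are hypothetical.

```python
import hashlib

# Toy CA key pair built from two known primes (insecure, illustration only)
p, q = 2**61 - 1, 2**89 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))   # CA private exponent (Python 3.8+)

def digest_as_int(certificate):
    # Hash the certificate contents and reduce the digest modulo n
    return int.from_bytes(hashlib.sha256(certificate.encode()).digest(), "big") % n

def ca_sign(certificate):
    # Only the CA, which holds d, can produce this value
    return pow(digest_as_int(certificate), d, n)

def verify(certificate, signature):
    # Anyone with the CA public key (e, n) can check the signature
    return pow(signature, e, n) == digest_as_int(certificate)

cert = "subject=Alice; public_key=(17, 3233)"
signature = ca_sign(cert)
print(verify(cert, signature))                              # True
print(verify(cert.replace("Alice", "Mallory"), signature))  # False: tampering detected
```

The verifier never contacts Alice or re-checks her identity; trusting the CA’s public key is enough to trust the binding between her name and her public key.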
The private key of the CA of an organization is the weakest link in the PKI chain of trust. If an impersonator got hold of Microsoft’s private key, for example, they could install malicious software on millions of computers around the world by impersonating a Windows update.
There is no doubt that in recent years there has been a lot of excitement around blockchain and cryptocurrency. Blockchain is said to be one of the most secure technologies ever invented. The excitement about blockchain started with Bitcoin and digital currencies. Digital currencies were first developed in the 1980s, but with Bitcoin, they became mainstream. The rise of Bitcoin was due to the widespread availability of distributed systems. It has two important characteristics that made it a game-changer:
Although blockchain was developed for Bitcoin, it has found broader use and applications. Blockchain is based on a distributed consensus algorithm, using Distributed Ledger Technology (DLT). It has the following characteristics:
Note that the term “P2P” stands for “Peer-to-Peer,” which means that each node, or “peer,” in the network communicates directly with the others without needing to go through a central server or authority.
Under the hood, blockchain transactions use cryptographic hashes from each of the previous blocks in the chain. Hash functions are used to create a one-way fingerprint of an arbitrary chunk of data. A Merkle tree or hash tree is used to verify data stored, handled, and transferred between different participating nodes. It uses SHA-2 for hashing. A diagram of a particular transaction is shown below:
Figure 14.7: The Merkle tree of blockchain
Figure 14.7 summarizes the workings of blockchain. It shows how transactions get converted into blocks, which are, in turn, converted into chains. On the left-hand side, four transactions, A, B, C, and D, are shown. Next, the Merkle root is created by applying a hash function. The Merkle root can be considered a data structure that forms part of the block header. Because the Merkle root depends on every transaction, changing any previously recorded transaction would change the root, which is what makes recorded transactions effectively immutable.
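The way a Merkle root summarizes transactions A, B, C, and D can be sketched as follows. This is a simplified assumption rather than Bitcoin’s exact format: the transactions are plain strings and each node is hashed once with SHA-256, whereas Bitcoin applies SHA-256 twice to a specific binary serialization. When a level has an odd number of hashes, the last one is duplicated, a convention Bitcoin also uses.

```python
import hashlib

def sha256(data):
    return hashlib.sha256(data).digest()

def merkle_root(transactions):
    # Leaves: the hash of each transaction
    level = [sha256(tx.encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        # Each parent is the hash of its two concatenated children
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

transactions = ["Transaction A", "Transaction B", "Transaction C", "Transaction D"]
root = merkle_root(transactions)
print(root.hex())

# Changing any single transaction changes the root
tampered = ["Transaction A (altered)"] + transactions[1:]
print(merkle_root(tampered) == root)  # False
```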
Note that the hash value of the previous block header also becomes part of each block, so every block indirectly incorporates the transaction records of all the blocks before it. This creates a chain-like structure and is the reason for the name blockchain.
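The chaining can be sketched as follows. The block layout, the helper names, and the use of a single hash over all of a block’s transactions in place of a full Merkle tree are simplifying assumptions for illustration. The sketch shows that rewriting an earlier block, even while keeping its own root consistent, breaks the link stored in the block after it.

```python
import hashlib
import json

def tx_root(transactions):
    # Simplification: one hash over all transactions stands in for a Merkle root
    return hashlib.sha256("|".join(transactions).encode()).hexdigest()

def make_block(transactions, prev_hash):
    return {"prev_hash": prev_hash, "tx_root": tx_root(transactions),
            "transactions": transactions}

def header_hash(block):
    # The header holds the previous block's hash and this block's transaction root
    header = {"prev_hash": block["prev_hash"], "tx_root": block["tx_root"]}
    return hashlib.sha256(json.dumps(header, sort_keys=True).encode()).hexdigest()

def validate_chain(chain):
    for i, block in enumerate(chain):
        # The stored root must still match the block's transactions
        if block["tx_root"] != tx_root(block["transactions"]):
            return False
        # Every later block must point at the previous block's header hash
        if i > 0 and block["prev_hash"] != header_hash(chain[i - 1]):
            return False
    return True

genesis = make_block(["Transaction A", "Transaction B"], prev_hash="0" * 64)
block_1 = make_block(["Transaction C", "Transaction D"], prev_hash=header_hash(genesis))
chain = [genesis, block_1]
valid_before = validate_chain(chain)

# An attacker rewrites the first block and even recomputes its root
chain[0] = make_block(["Transaction A (altered)", "Transaction B"], prev_hash="0" * 64)
valid_after = validate_chain(chain)
print(valid_before, valid_after)  # True False: block_1 still points at the old header
```

To hide the change, the attacker would have to rewrite every later block as well, which is what the distributed consensus of a real blockchain is designed to prevent.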
Each blockchain user is authenticated and authorized using cryptography, eliminating the need for third-party authentication and authorization. Blockchain technology also eliminates the involvement of third parties for transaction validation and relies on cryptographic proof instead. Transactions are secured using digital signatures: each user has a unique private key that establishes their digital identity in the system, and the receiver of a transaction is identified by a public key.
In Chapter 6, Unsupervised Machine Learning Algorithms, we looked at the Cross-Industry Standard Process for Data Mining (CRISP-DM) life cycle, which specifies the different phases of training and deploying a machine learning model. Once a model is trained and evaluated, the final phase is deployment. If it is a critical machine learning model, then we want to make sure that all of its security goals are met.
Let’s analyze the common challenges faced in deploying a model such as this and how we can address those challenges using the concepts discussed in this chapter. We will discuss strategies to protect our trained model against the following three challenges:
Let’s look at them one by one.
One of the possible attacks that we would want to protect our model against is a man-in-the-middle (MITM) attack. A MITM attack occurs when an intruder tries to eavesdrop on a supposedly private communication.
Let’s try to understand MITM attacks sequentially using an example scenario.
Let’s assume that Bob and Alice want to exchange messages using PKI:
This MITM attack is shown in the following diagram:
Figure 14.8: MITM attack
Now, let’s look at how we can prevent MITM attacks.
Let’s explore how we can prevent MITM attacks by introducing a CA to the organization. Let’s say the name of this CA is myTrustCA. The digital certificate of myTrustCA has its public key, named PumyTrustCA, embedded in it. myTrustCA is responsible for signing the certificates for all of the people in the organization, including Alice and Bob. This means that both Bob and Alice have their certificates signed by myTrustCA. When signing their certificates, myTrustCA verifies that they are indeed who they claim to be.
Now, with this new arrangement in place, let’s revisit the sequential interaction between Bob and Alice:
When deploying a trained machine learning model, instead of Alice, there is a deployment server. Bob only deploys the model after establishing a secure channel, using the previously mentioned steps.
Attacker X pretends to be an authorized user, Bob, and gains access to sensitive data, which is the trained model, in this case. We need to protect the model against any unauthorized changes.
One way of protecting our trained model against masquerading is by encrypting the model with an authorized user’s private key, which in effect digitally signs it. Once encrypted, anyone can read and utilize the model by decrypting it with the public key of the authorized user, which is found in their digital certificate. No one without the authorized user’s private key can make changes to the model that still decrypt correctly with that public key.
Once the model is deployed, the real-time unlabeled data that is provided as input to the model can also be tampered with. The trained model is used for inference and provides a label for this data. To protect data against tampering, we need to protect the data at rest and in communication. To protect the data at rest, symmetric encryption can be used to encode it.
To transfer the data, SSL-/TLS-based secure channels can be established to provide a secure tunnel. This secure tunnel can be used to transfer the symmetric key and the data can be decrypted on the server before it is provided to the trained model.
This is one of the more efficient and foolproof ways to protect data against tampering.
Symmetric encryption can also be used to encrypt a model when it has been trained, before deploying it to a server. This will prevent any unauthorized access to the model before it is deployed.
Let’s see how we can encrypt a trained model at the source, using symmetric encryption with the help of the following steps, and then decrypt it at the destination before using it:
import pickle
from joblib import dump, load
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from cryptography.fernet import Fernet
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = LogisticRegression(max_iter=1000) # Increase max_iter for convergence
model.fit(X_train, y_train)
filename_source = "unencrypted_model.pkl"
filename_destination = "decrypted_model.pkl"
filename_sec = "encrypted_model.pkl"
filename_source is the file that will store the trained unencrypted model at the source, filename_destination is the file that will store the decrypted model at the destination, and filename_sec is the encrypted trained model.
Next, we use joblib’s dump() function, which serializes the model with pickle, to store the trained model in a file:
from joblib import dump
dump(model, filename_source)
Next, we define a function named write_key() that will generate a symmetric key and store it in a file named key.key:
def write_key():
    # Generate a new Fernet (symmetric) key and save it to disk
    key = Fernet.generate_key()
    with open("key.key", "wb") as key_file:
        key_file.write(key)
Then, we define a function named load_key() that can read the stored key from the key.key file:
def load_key():
    # Read the symmetric key back from disk, closing the file afterward
    with open("key.key", "rb") as key_file:
        return key_file.read()
Next, we define an encrypt() function that can encrypt the trained model and store it in a file named filename_sec:
def encrypt(filename, key):
    f = Fernet(key)
    # Read the unencrypted model file
    with open(filename, "rb") as file:
        file_data = file.read()
    encrypted_data = f.encrypt(file_data)
    # Write the encrypted bytes to filename_sec
    with open(filename_sec, "wb") as file:
        file.write(encrypted_data)
Now, we generate the key, load it, and use it to encrypt the trained model into filename_sec:
write_key()
key = load_key()
encrypt(filename_source, key)
Now the model is encrypted. It will be transferred to the destination where it will be used for prediction:
At the destination, we define a function named decrypt() that we can use to decrypt the model from filename_sec to filename_destination using the key stored in the key.key file:
def decrypt(filename, key):
    f = Fernet(key)
    # Read the encrypted model file
    with open(filename, "rb") as file:
        encrypted_data = file.read()
    decrypted_data = f.decrypt(encrypted_data)
    # Write the decrypted bytes to filename_destination
    with open(filename_destination, "wb") as file:
        file.write(decrypted_data)
Finally, we decrypt the model into filename_destination, load it, and use it to score the test data:
decrypt(filename_sec, key)
loaded_model = load(filename_destination)
result = loaded_model.score(X_test, y_test)
print(result)
0.9473684210526315
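Note that we have used symmetric encryption to encode the model. The same technique can be used to encrypt data as well, if needed. The exact accuracy score will vary from run to run, because train_test_split() shuffles the data randomly unless a random_state is given.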
In this chapter, we learned about cryptographic algorithms. We started by identifying the security goals of a problem. We then discussed various cryptographic techniques and also looked at the details of the PKI. Finally, we looked at the different ways of protecting a trained machine learning model against common attacks. Now, you should be able to understand the fundamentals of security algorithms used to protect modern IT infrastructures.
In the next chapter, we will look at designing large-scale algorithms. We will study the challenges and trade-offs involved in designing and selecting large-scale algorithms. We will also look at the use of GPUs and clusters to solve complex problems.