Amazon Redshift provides out-of-the-box features that enable you to build data warehouses meeting the requirements of the most security-sensitive organizations. In AWS, security is the highest priority and is a shared responsibility (https://aws.amazon.com/compliance/shared-responsibility-model/) between AWS and you. Because Amazon Redshift is a managed service, the underlying data center and network architecture already meet the requirements of security-sensitive organizations, and you can configure the data and cluster management controls to meet your organization's requirements. Data can be kept secure in transit and at rest using industry-standard encryption techniques. Access to Amazon Redshift resources is controlled at four different levels: cluster management (creating and configuring the cluster), cluster connectivity, database access to objects, and temporary/single sign-on credentials.
Specifically, the following topics are covered in this chapter:
Here are the technical requirements in order to complete the recipes in this chapter:
a. An IAM policy attached to the IAM user that will give them access to Amazon Redshift, Amazon S3, AWS Secrets Manager, Amazon CloudWatch, Amazon CloudWatch Logs, Amazon EC2, Amazon Simple Notification Service (SNS), AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), AWS Glue, and Amazon Virtual Private Cloud (Amazon VPC)
b. An IAM policy attached to the IAM role that will allow the Amazon Redshift cluster to access Amazon S3.
Amazon VPC allows you to launch Amazon Redshift clusters in a logically isolated virtual network in which you define the IP address range and subnets and configure the infrastructure security. When you provision an Amazon Redshift cluster, it is locked down by default, so nobody has access to it. To grant inbound access to an Amazon Redshift cluster, you associate a security group with the cluster. Configuring your Amazon Redshift cluster to follow the principle of least access is a best practice.
To complete this recipe, you will need the following setup:
In this recipe, you will launch an Amazon Redshift cluster inside a custom VPC and subnet using the following steps:
Choosing the Amazon S3 service endpoint allows the traffic between Amazon Redshift and Amazon S3 to stay within the VPC, rather than the default of traversing the public internet:
In the list of inbound rules, configuring CIDR/IP ranges instead of individual IP addresses provides flexibility for allowing connections from anywhere within your organization.
Note
You can learn more about setting up a VPC by using this working with VPC guide: https://docs.aws.amazon.com/vpc/latest/userguide/working-with-vpcs.html#add-ipv4-cidr.
Amazon Redshift provides the option to encrypt the cluster at rest using the AES algorithm with a 256-bit key. Key management can be performed by AWS KMS or your own hardware security module (HSM). When an Amazon Redshift cluster is encrypted at rest, it provides block-level encryption, and the metadata and snapshots are encrypted as well. This enables you to meet security requirements such as PCI, SOX, HIPAA, and GDPR compliance, depending on your needs.
Amazon Redshift uses envelope encryption using a robust four-tier hierarchy of encryption keys: the master key, cluster encryption key (CEK), database encryption key (DEK), and data encryption keys:
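As a quick sanity check, you can verify whether encryption at rest is enabled on a cluster from the AWS CLI. This is a minimal sketch; the cluster identifier is a placeholder you would substitute with your own:

```shell
# Check whether a cluster is encrypted at rest and which KMS key it uses
# ([Your-Redshift_Cluster] is a placeholder)
aws redshift describe-clusters \
  --cluster-identifier [Your-Redshift_Cluster] \
  --query 'Clusters[0].[Encrypted,KmsKeyId]'
```

For an encrypted cluster, the first value returned is `true` and the second is the ARN of the KMS key in use.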
To complete this recipe, you will need the following setup:
In this recipe, we will see options to encrypt a new and an existing Amazon Redshift cluster.
Let's see the option to turn on encryption while creating an Amazon Redshift cluster:
Let's see the option to turn on encryption for an existing Amazon Redshift cluster.
Using the Amazon Redshift console, navigate to the existing Amazon Redshift cluster, choose Actions, and select Rotate encryption:
With Amazon Redshift, you can also encrypt your data in transit. Enabling SSL allows SQL clients to encrypt data in transit using certificates. In addition, the AWS CLI, SDK, and API clients communicate using HTTPS endpoints. For communication with other AWS services such as Amazon S3, DynamoDB, and so on, Amazon Redshift uses hardware-accelerated SSL.
To complete this recipe, you will need the following:
In this recipe, we will enable the SSL connection in Amazon Redshift and the SQL Workbench client to establish an SSL connection:
select * from stl_connection_log
order by recordtime desc
limit 2;
Here is the output of the preceding code:
We have now successfully connected to Amazon Redshift using a TLS-encrypted connection.
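To go a step further and reject any non-SSL connection, you can enforce SSL at the cluster level. A sketch of this, assuming a custom (non-default) parameter group whose name is a placeholder here; `require_ssl` is a static parameter, so the change takes effect after a cluster reboot:

```shell
# Require SSL for all client connections to clusters using this parameter group
# ([Your-Parameter_Group] is a placeholder)
aws redshift modify-cluster-parameter-group \
  --parameter-group-name [Your-Parameter_Group] \
  --parameters ParameterName=require_ssl,ParameterValue=true,ApplyType=static
```

Once applied, clients attempting to connect without SSL will be refused.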
Amazon Redshift supports fine-grained data security with column-level controls. Column-level security can be applied to local tables, views, and materialized views. Applying column-level security allows you to restrict access to personally identifiable information (PII) or payment card information (PCI) to selected people. For instance, you can grant the finance or human resources team access to sensitive information but restrict access to the sales and marketing team.
To complete this recipe, you will need the following:
In this recipe, we will use a customer table. Using column-level access control, a sales user will be restricted from accessing the phone number column:
CREATE TABLE public.customer
(
C_CUSTKEY BIGINT NOT NULL,
C_NAME VARCHAR(25),
C_NATIONKEY BIGINT,
C_PHONE VARCHAR(15),
C_ACCTBAL DECIMAL(18,4),
C_MKTSEGMENT VARCHAR(10),
C_COMMENT VARCHAR(117)
);
Insert into public.customer values
(1, 'customer-0001', 1, '123-123-1234', 111.11, 'MACHINERY', 'FIRST ORDER'),
(2, 'customer-0002', 2, '122-122-1234', 222.11, 'HOUSEHOLD', 'SECOND ORDER');
CREATE user sales with password 'Sales1234';
GRANT SELECT (C_CUSTKEY, C_NAME, C_NATIONKEY, C_ACCTBAL, C_MKTSEGMENT, C_COMMENT) ON public.customer TO sales;
SET SESSION AUTHORIZATION 'sales';
SELECT CURRENT_USER;
SELECT * FROM public.customer;
--output
ERROR: 42501: permission denied for relation customer
SET SESSION AUTHORIZATION 'sales';
SELECT CURRENT_USER;
SELECT C_CUSTKEY, C_NAME, C_NATIONKEY, C_ACCTBAL, C_MKTSEGMENT, C_COMMENT FROM public.customer;
Here is the output of the preceding code:
Using the GRANT and REVOKE statements, you can enable or disable column-level access control for Amazon Redshift users or groups on tables, views, or materialized views. You can learn about the GRANT and REVOKE syntax for fine-grained access control at https://docs.aws.amazon.com/redshift/latest/dg/r_GRANT.html and https://docs.aws.amazon.com/redshift/latest/dg/r_REVOKE.html.
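For instance, if access requirements later change, the column-level grant above can be narrowed further with a column-level REVOKE. A minimal sketch, continuing with the sales user from this recipe:

```sql
-- Remove access to the account-balance column from the sales user's
-- previously granted column list
REVOKE SELECT (C_ACCTBAL) ON public.customer FROM sales;

-- Or remove the sales user's access to the table entirely
REVOKE SELECT ON public.customer FROM sales;
```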
Amazon S3 allows your data (for example, your source data files) to be encrypted using server-side encryption with Amazon S3-managed keys (SSE-S3) or AWS KMS-managed keys (SSE-KMS). In addition, you can perform client-side encryption using a client-side symmetric master key. Amazon Redshift supports loading this encrypted data into a local table. Similarly, you can unload Amazon Redshift data to Amazon S3 as encrypted files using a customer-managed symmetric master key.
To complete this recipe, you will need the following setup:
aws s3 cp s3://packt-redshift-cookbook/customer/ s3://[Your-Amazon_S3_Bucket]/Chapter6/customer/ --recursive
In this recipe, we will copy encrypted data from Amazon S3 into Amazon Redshift and also unload data to Amazon S3 as encrypted files:
Note
AWS KMS allows you to manage encryption keys. You can create, store, rotate, and control access to them.
CREATE TABLE public.customer
(
C_CUSTKEY BIGINT NOT NULL,
C_NAME VARCHAR(25),
C_ADDRESS VARCHAR(40),
C_NATIONKEY BIGINT,
C_PHONE VARCHAR(15),
C_ACCTBAL DECIMAL(18,4),
C_MKTSEGMENT VARCHAR(10),
C_COMMENT VARCHAR(117)
)
diststyle ALL;
COPY customer FROM 's3://[Your-Amazon_S3_Bucket]/Chapter6/customer/' iam_role 'arn:aws:iam::[Your-AWS_Account_Id]:role/[Your-Redshift_Role]' CSV gzip COMPUPDATE PRESET;
Note
Observe that in the COPY command, Amazon Redshift automatically identifies that the file is encrypted and communicates with AWS KMS to retrieve the correct master key. This key is used to decrypt the data key, which the COPY command then uses while loading.
unload ('select * from customer') TO 's3://[Your-Amazon_S3_Bucket]/Chapter6/customer_encrypted/' iam_role 'arn:aws:iam::[Your-AWS_Account_Id]:role/[Your-Redshift_Role]' master_symmetric_key 'EXAMPLEMASTERKEYtkbjk/OpCwtYSx/M4/t7DMCDIK722' encrypted;
Note
Similar to the UNLOAD command, the COPY command can load data that was encrypted using a master symmetric key. Please see https://docs.aws.amazon.com/redshift/latest/dg/c_loading-encrypted-files.html.
The preceding command unloads the customer table to a set of encrypted files using the specified master symmetric key.
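To round-trip the data, the files unloaded above can be reloaded with a COPY command that supplies the same master symmetric key along with the ENCRYPTED option. A sketch, assuming the same placeholder bucket, role, and example key used in the UNLOAD:

```sql
-- Load the client-side-encrypted files back into the customer table;
-- the master_symmetric_key must match the key used during UNLOAD
COPY customer
FROM 's3://[Your-Amazon_S3_Bucket]/Chapter6/customer_encrypted/'
iam_role 'arn:aws:iam::[Your-AWS_Account_Id]:role/[Your-Redshift_Role]'
master_symmetric_key 'EXAMPLEMASTERKEYtkbjk/OpCwtYSx/M4/t7DMCDIK722'
encrypted;
```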
A superuser has access to everything on Amazon Redshift, independent of all permission checks, and is used for administrative tasks. For example, you can create other users, execute diagnostic queries on system tables, and take action as needed. Superuser access should be granted sparingly; do not use it for day-to-day work.
The masteruser is a special type of superuser that you set up when launching the cluster.
To complete this recipe, you will need the following setup:
In this recipe, we will illustrate how to create a superuser and use it to list all the active SQL statements, and demonstrate how to terminate a particular statement:
create user myadmin createuser password '[masteruser_password]';
If you have forgotten the masteruser credentials, navigate to the Amazon Redshift AWS Console, click on your cluster ID (Amazon Redshift → Clusters → YOUR_CLUSTER), click on the Actions dropdown, and choose Change masteruser password to reset it to a new value.
SELECT pid,
TRIM(user_name),
starttime,
duration,
SUBSTRING(query,1,50) AS stmt
FROM stv_recents
WHERE status = 'Running';
Here is the expected sample output:
pid    btrim   starttime                   duration  stmt
18764  user_a  2021-03-28 18:39:49.355918  3000      select part_id, seller_id
18790  user_b  2021-03-28 18:39:49.355918  60        Insert into parts(
The query from user_a has been running for over 3,000 seconds and is likely consuming significant resources (which can be confirmed using the AWS Console), so we assume you would like to terminate this query.
set query_group to 'superuser';
cancel 18764;
Setting the optional query_group to 'superuser' routes the statement to the special reserved superuser queue so that it executes immediately. Please also refer to https://docs.aws.amazon.com/redshift/latest/dg/cm-c-wlm-queue-assignment-rules.html.
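If CANCEL alone does not stop the query (for example, because it is inside a long-running transaction), a superuser can terminate the offending session itself. A sketch using the pid from the sample output above:

```sql
-- Terminate the session owning pid 18764; any open transaction
-- in that session is rolled back
SELECT PG_TERMINATE_BACKEND(18764);
```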
Users and groups are the building blocks for access management of the objects in the Amazon Redshift cluster. Users are authenticated into the Amazon Redshift cluster, and privileges for objects can be managed at the group level to handle access in a scalable manner. Users can be members of one or multiple groups and inherit the access privileges granted to those groups. Users can also be individually granted privileges.
To complete this recipe, you will need the following setup:
In this recipe, we will illustrate how to create users and groups for the schema set up in Chapter 2, Data Management. There are two groups – finance_grp and audit_grp – that will be created and users will be added to those groups:
create user financeuser1 with password '[financeuser_password]' createdb connection limit 30;
create user audituser1 with password '[audituser_password]' syslog unrestricted;
The audituser1 user is given syslog unrestricted access, which allows visibility into system tables to list queries and transactions performed by other users; this visibility is restricted by default.
create schema if not exists finance;
create group finance_grp with user financeuser1;
create group audit_grp with user audituser1;
GRANT USAGE on SCHEMA finance TO GROUP finance_grp, GROUP audit_grp;
GRANT ALL ON schema finance to GROUP finance_grp;
ALTER DEFAULT PRIVILEGES IN SCHEMA finance GRANT ALL
ON tables
TO group finance_grp;
GRANT SELECT ON ALL TABLES IN SCHEMA finance TO GROUP audit_grp;
ALTER DEFAULT PRIVILEGES IN SCHEMA finance GRANT SELECT
ON tables
TO group audit_grp;
SELECT pg_group.groname,
       pg_group.grosysid,
       pg_user.*
FROM pg_group, pg_user
WHERE pg_user.usesysid = ANY(pg_group.grolist)
ORDER BY 1, 2;
Here is the expected sample output:
groname,grosysid,usename,usesysid,usecreatedb,usesuper,usecatupd,passwd,valuntil,useconfig
finance_grp  106  financeuser1  127  false  ********
audit_grp    107  audituser1    127  false  ********
Hence, in the preceding setup, the users in finance_grp are able to perform all the DDL/DML (SELECT/INSERT/UPDATE/DELETE) operations, while the audit_grp users are able to perform only SELECT operations, with access control isolated and managed through the individual groups. You can learn more about the GRANT access options at https://docs.aws.amazon.com/redshift/latest/dg/r_GRANT.html.
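Group membership can be adjusted at any time without touching the object grants, which is what makes this model scale. A sketch of common maintenance operations; audituser2 is a hypothetical additional user, not part of this recipe:

```sql
-- Add a (hypothetical) second auditor; they inherit audit_grp's SELECT grants
ALTER GROUP audit_grp ADD USER audituser2;

-- Remove a user from the finance group; they lose the group's privileges
ALTER GROUP finance_grp DROP USER financeuser1;

-- Withdraw all of a group's privileges on the schema in one statement
REVOKE ALL ON SCHEMA finance FROM GROUP finance_grp;
```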
Amazon Redshift allows easy integration with multiple Identity Providers (IdPs) such as Microsoft Azure Active Directory, Active Directory Federation Services (ADFS), Okta, Ping Identity, AWS SSO, and any SAML v2.0-compliant identity provider. You can manage the authentication and authorization of users and objects using the IdP without the need to maintain local database users. This provides a seamless extension of your corporate policies to Amazon Redshift and a convenient way to govern them centrally. For example, users simply use their corporate credentials to sign in to Amazon Redshift. In addition, Amazon Redshift also supports multi-factor authentication through the federation to provide additional security when authenticating.
To complete this recipe, you will need the following setup:
In this recipe, we will integrate the Okta IdP with Amazon Redshift:
Add user bob west to the group dwgroup:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"redshift:CreateClusterUser",
"redshift:JoinGroup",
"redshift:GetClusterCredentials",
"redshift:DescribeClusters"
],
"Resource": [
"arn:aws:redshift:[Your-AWS_Region]:[Your-AWS_Account_Id]:cluster:*",
"arn:aws:redshift:[Your-AWS_Region]:[Your-AWS_Account_Id]:dbuser:[cluster]/*",
"arn:aws:redshift:[Your-AWS_Region]:[Your-AWS_Account_Id]:dbgroup:[cluster]/*"
]
}]
}
Note
In the preceding policy statement, the permissions allow connection to any Amazon Redshift cluster, dbuser, and dbgroups. Ideally, you can create different IAM policies to make them restrictive to the specific cluster/groups and users that you want to allow access to.
Click Save/Next.
jdbc:redshift:iam://[your-redshift-cluster-connection-string]?plugin_name=com.amazon.redshift.plugin.OktaCredentialsProvider&idp_host=[okta-hostname]&preferred_role=[role-arn]&user=[okta-user]&password=[okta-user-password]&app_id=[okta-redshift-app-id]
For [okta-redshift-app-id] and [okta-hostname], refer to the URL for the application in your web browser:
https://[okta-hostname]-admin.okta.com/admin/app/amazon_aws_redshift/instance/[okta-redshift-app-id]
The following diagram shows how Amazon Redshift is able to authenticate the user through the IdP:
Here is the workflow for the federation with the IdP once integrated with Amazon Redshift:
Amazon Redshift allows you to programmatically generate temporary database user credentials that automated scripts can use to connect to the cluster. Using the get-cluster-credentials command in the AWS Command Line Interface (AWS CLI) or the GetClusterCredentials API operation, you can generate temporary credentials that can then be used in the JDBC and ODBC options.
To complete this recipe, you will need the following setup:
In this recipe, we will generate temporary credentials to connect to the Amazon Redshift cluster:
aws help
aws redshift get-cluster-credentials --cluster-identifier [Your-Redshift_Cluster] --db-user temp_creds_user --db-name [Your-Redshift_DB] --duration-seconds 3600
The result of the preceding command will produce an output like the following:
{
"DbUser": "IAM:temp_creds_user",
"Expiration": "2020-12-08T21:12:53Z",
"DbPassword": "EXAMPLEjArE3hcnQj8zt4XQj9Xtma8oxYEM8OyxpDHwXVPyJYBDm/gqX2Eeaq6P3DgTzgPg=="
}
Note
The credentials generated using the preceding command are temporary and will expire in 3,600 seconds.
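get-cluster-credentials can also create the database user on the fly and join it to existing groups for the session, which is convenient for automated jobs. A sketch reusing the placeholders from this recipe; etl_user is a hypothetical user name, and finance_grp is the group created in the earlier recipe:

```shell
# Auto-create the user if it does not exist and scope it to a group
# for a short-lived (15-minute) session
aws redshift get-cluster-credentials \
  --cluster-identifier [Your-Redshift_Cluster] \
  --db-user etl_user \
  --db-name [Your-Redshift_DB] \
  --auto-create \
  --db-groups finance_grp \
  --duration-seconds 900
```

The returned DbUser/DbPassword pair is then supplied to the JDBC or ODBC driver exactly as in the previous step.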
Amazon Redshift allows you to log connection and user activities using audit logs. Audit logs are published to Amazon S3 asynchronously and provide a mechanism to monitor requests to the cluster, which can be used to implement security requirements as well as for troubleshooting. For example, let's say on a particular day in the past, someone truncated a particular table; the audit logs can be queried to uncover who it was.
To complete this recipe, you will need the following setup:
In this recipe, we will illustrate how to turn on the audit logging in Amazon S3 (which is turned off by default) and easily query it:
The previous option turns on connection logging, which will start capturing connection information such as the client host IP, username, and so on, as detailed in https://docs.aws.amazon.com/redshift/latest/mgmt/db-auditing.html#db-auditing-logs. Logs are delivered asynchronously, organized into hourly S3 prefix locations.
The log files are organized as <AWS Account #>/redshift/<Region>/<Year>/<Month>/<Day>/<Hour>.
create external schema audit_logs
from data catalog
database 'audit_logs_db'
iam_role 'arn:aws:iam::[Your-AWS_Account_Id]:role/[Your-Redshift_Role]'
create external database if not exists;
SELECT col0 AS event,
col1 AS recordtime,
col2 AS remotehost,
col3 AS remoteport,
col4 AS pid,
col5 AS dbname,
col6 AS username,
col7 AS authmethod,
col8 AS duration,
col9 AS sslversion,
col10 AS sslcipher,
col11 AS mtu,
col12 AS sslcompression,
col13 AS sslexpansion,
col14 AS iamauthguid,
col15 AS application_name,
col16 AS driver_version,
col17 AS os_version,
col18 AS plugin_name
FROM audit_logs.auditawslogs
WHERE partition_5 = 25
AND partition_4 = 12
AND partition_3 = 2020 LIMIT 10;
Here is the output of the preceding code:
event,recordtime,remotehost,remoteport,pid,dbname,
username,authmethod,duration,sslversion,sslcipher,
mtu,sslcompression,sslexpansion,iamauthguid,
application_name,driver_version,os_version,plugin_name
authenticated          Fri, 25 Dec 2020 09:02:04:228  [local]  49050  dev  rdsdb  Ident  0       0
initiating session     Fri, 25 Dec 2020 09:02:04:228  [local]  49050  dev  rdsdb  Ident  0       0
disconnecting session  Fri, 25 Dec 2020 09:02:04:346  [local]  49050  dev  rdsdb  Ident  118856  0
authenticated          Fri, 25 Dec 2020 09:02:40:156  [local]  49238  dev  rdsdb  Ident  0       0
As observed in the preceding output, all the session activity is logged as part of the audit trail and can be easily queried using a SQL query.
Audit logs are also available in system log tables, STL_USERLOG and STL_CONNECTION_LOG, but retention is limited in the system tables.
For longer retention and convenient sharing of the audit information, Amazon Redshift audit logging can be enabled to asynchronously deliver the logs to Amazon S3. The user activity log can be enabled, in addition to the connection logs, by setting the enable_user_activity_logging parameter to true in the database parameter group.
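Audit logging can also be turned on from the AWS CLI instead of the console. A sketch with placeholder names; note that the target bucket must have a policy allowing the Redshift logging service to write to it:

```shell
# Enable delivery of connection/user logs to an S3 bucket
# ([Your-Redshift_Cluster] and [Your-Amazon_S3_Bucket] are placeholders)
aws redshift enable-logging \
  --cluster-identifier [Your-Redshift_Cluster] \
  --bucket-name [Your-Amazon_S3_Bucket] \
  --s3-key-prefix audit-logs/
```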
Monitoring cluster performance metrics allows you to ensure the cluster is operating in a healthy fashion. Amazon Redshift continuously publishes metrics such as CPU utilization, disk utilization, query workloads, and so on. These metrics can be automatically monitored for anomalies to trigger notification events. Amazon Redshift also publishes the cluster performance metrics to Amazon CloudWatch, which allows you to monitor all your AWS services in a centralized location.
To complete this recipe, you will need the following setup:
In this recipe, we will illustrate how to watch the cluster and query monitoring metrics and also set up a health check alarm:
Query monitoring also provides the ability to get insights into the overall workload in the cluster using the Database performance tab and also break down the time query spends into queue versus execution using the Workload concurrency tab.
a. Alarm name: Any meaningful name for the health alarm
b. Notification: Enabled
c. Notify SNS topic: Select AmazonRedshiftHealthNotification
Click on Create alarm to complete the setup for the health check alarm.
The health check is a binary value, where 1 indicates a healthy cluster node and 0 indicates an unhealthy one. The alarm monitors for any value less than 1 across 10 consecutive 5-minute periods before notifying through the SNS topic. Similarly, other performance metrics can be configured to send notifications when their thresholds are breached.
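The same alarm can be scripted with the AWS CLI, which is useful for recreating it across environments. A sketch mirroring the console setup above; cluster, Region, and account values are placeholders:

```shell
# Alarm on the HealthStatus metric: fire when the minimum value is below 1
# for 10 consecutive 5-minute periods, notifying the SNS topic
aws cloudwatch put-metric-alarm \
  --alarm-name redshift-health-check \
  --namespace AWS/Redshift \
  --metric-name HealthStatus \
  --dimensions Name=ClusterIdentifier,Value=[Your-Redshift_Cluster] \
  --statistic Minimum \
  --period 300 \
  --evaluation-periods 10 \
  --threshold 1 \
  --comparison-operator LessThanThreshold \
  --alarm-actions arn:aws:sns:[Your-AWS_Region]:[Your-AWS_Account_Id]:AmazonRedshiftHealthNotification
```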