CHAPTER 14

Related Technologies

This chapter covers the following topics from Domain 14 of the CSA Guidance:

•   Big Data

•   Internet of Things

•   Mobile

•   Serverless Computing

Once a new technology rolls over you, if you’re not part of the steamroller, you’re part of the road.

—Stewart Brand

In my opinion, this quote from Mr. Brand perfectly summarizes your career in information technology. There are always new developments that you need to understand to some degree. This is not to say that you need to be an expert in every new technology, but you at the very least need to understand the benefits that new technology brings.

This chapter looks at a few key technologies that, while not cloud-specific by any means, are frequently found in cloud environments. These technologies often rely on the massive amounts of available resources that can quickly (and even automatically) scale up and down to meet demand.

In preparation for your CCSK exam, remember that the mission of the CSA and its Guidance document is to help organizations determine who is responsible for choosing the best practices that should be adopted and implemented (that is, provider side or customer side) and why these controls are important. This chapter focuses on the security concerns associated with these technologies, rather than on how controls are configured in a particular vendor’s implementation.

If you are interested in learning more about one or more of these technologies, check out the Cloud Security Alliance web site for whitepapers and research regarding each of these areas.

Big Data

The term “big data” refers to extremely large data sets from which you can derive valuable information. Big data can handle volumes of data that traditional data-processing tools are simply unable to manage. You can’t go to a store and buy a big data solution, and big data isn’t a single technology. It refers to a set of distributed collection, storage, and data-processing frameworks. According to Gartner, “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization.”

The CSA refers to the qualities from the Gartner quote as the “Three Vs.” Let’s define those now:

•   High Volume A large amount of data in terms of the number of records or attributes

•   High Velocity Fast generation and processing of data (such as real-time or data stream)

•   High Variety Structured, semistructured, or unstructured data

The Three Vs of big data make it a natural fit for cloud deployments because of the elasticity and massive storage capabilities available in Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) deployment models. Additionally, big data technologies can be integrated into cloud-computing applications.

Big data systems are typically associated with three common components:

•   Distributed data collection This component refers to the system’s ability to ingest large volumes of data, often as streamed data. Ingested data could range from simple web clickstream analytics to scientific and sensor data. Not all big data relies on distributed or streaming data collection, but it is a core big data technology. See the “Distributed Data Collection Backgrounder” for further information on the data types mentioned.

•   Distributed storage This refers to the system’s ability to store the large data sets in distributed file systems (such as Google File System, Hadoop Distributed File System, and so on) or databases (such as NoSQL). NoSQL (Not only SQL) is a nonrelational distributed and scalable database system that works well in big data scenarios and is often required because of the limitations of nondistributed storage technologies.

•   Distributed processing Tools and techniques are capable of distributing processing jobs (such as MapReduce, Spark, and so on) for the effective analysis of data sets that are so massive and rapidly changing that single-origin processing can’t effectively handle them. See the “Distributed Data Collection Backgrounder” for further information.


EXAM TIP    Remember the three components listed here: data gets collected, stored, and processed.

A few of the terms used in the preceding bulleted list deserve a bit more explanation. They are covered in the following backgrounders.


NOTE    As always, information in the backgrounders is for your understanding, not for the CCSK exam.

Distributed Data Collection Backgrounder

Unlike typical distributed data that is often sent in a bulk fashion (such as structured database records from a previous week), streaming data is continuously generated by many data sources, which typically send data records simultaneously and in small sizes (kilobytes). Streaming data can include log files generated by customers using mobile or web applications, information from social networks, and telemetry from connected devices or instrumentation in data centers. Streaming data processing is beneficial in most scenarios where new and dynamic data is generated on a continuous basis.
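As a rough illustration of what "continuous, small records" means in practice, streamed data is often consumed in micro-batches rather than as one bulk load. The following sketch simulates this pattern (the record shape and function names are invented for illustration):

```python
import json
import time
from itertools import islice

def event_stream(n):
    """Simulate a continuous source of small (sub-kilobyte) records,
    such as mobile app log events or device telemetry."""
    for i in range(n):
        yield json.dumps({"device": f"sensor-{i % 3}", "reading": i, "ts": time.time()})

def micro_batches(stream, size):
    """Group a continuous stream into small batches for processing."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

batches = list(micro_batches(event_stream(10), 4))
print(len(batches))      # 3 micro-batches from 10 records
print(len(batches[-1]))  # 2 (the final, partial batch)
```

Real streaming platforms (Kafka, Kinesis, Flume, and so on) handle the same batching, plus durability and delivery guarantees, across many producers and consumers.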

Web clickstream analytics provide data that is generated when tracking how users interact with your web sites. There are generally two types of web clickstream analytics:

•   Traffic analytics Operates at the server level and delivers performance data, such as the number of pages accessed by a user, page load times, and other interaction data.

•   E-commerce–based analytics Uses web clickstream data to determine the effectiveness of e-commerce functionality. It analyzes the web pages on which shoppers linger, what the shopper puts in or takes out of a shopping cart, which items the shopper ultimately purchases, coupon codes used, and payment methods.

These two use cases demonstrate the potential of vast amounts of data that can be generated in day-to-day operations of a company and the need for tools that can interpret this data into actionable information the company can use to improve revenues.
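A toy sketch of the e-commerce side shows how raw click events become actionable metrics (the event names and fields here are invented for illustration; real pipelines process millions of such events continuously):

```python
from collections import Counter

# Hypothetical raw clickstream events; real systems ingest these as a stream.
events = [
    {"user": "u1", "page": "/cart", "action": "add_item", "item": "book"},
    {"user": "u1", "page": "/cart", "action": "remove_item", "item": "book"},
    {"user": "u2", "page": "/product/42", "action": "view"},
    {"user": "u2", "page": "/cart", "action": "add_item", "item": "pen"},
    {"user": "u2", "page": "/checkout", "action": "purchase", "item": "pen"},
]

# Aggregate raw events into the kinds of metrics analysts actually use.
page_views = Counter(e["page"] for e in events)
cart_adds = sum(1 for e in events if e["action"] == "add_item")
cart_removes = sum(1 for e in events if e["action"] == "remove_item")
purchases = [e["item"] for e in events if e["action"] == "purchase"]

print(page_views.most_common(1))          # busiest page and its hit count
print(cart_adds - cart_removes, purchases)
```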

Hadoop Backgrounder

Hadoop is fairly synonymous with big data. In fact, it is estimated that roughly half of Fortune 500 companies use Hadoop for big data, so it merits its own backgrounder. Believe it or not, what we now know as big data started off with Google trying to create a system they could use to index the Internet (called Google File System). They released the inner workings of their invention to the world as a whitepaper in 2003. In 2005, Doug Cutting and Mike Cafarella leveraged this knowledge to create the open source big data framework called Hadoop. Hadoop is now maintained by the Apache Software Foundation.

The following quote from the Hadoop project itself best explains what Hadoop is:

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Notice that Hadoop allows for distributed processing of data in large data sets. To achieve this, Hadoop runs both storage and processing on a number of separate x86 systems in a cluster. Why did they make it so that data and processing are both running on individual x86 systems? Cost and performance. These x86 systems (such as your laptop or work PC) are cheap in comparison to customized hardware. This decentralized approach means you don’t need super-costly, powerful, high-performance computers to analyze huge amounts of data.

This storage and processing capability is broken down into two major components:

•   Hadoop Distributed File System (HDFS) This is the storage part of Hadoop. When data is stored in an HDFS system, the data is broken down into smaller blocks that are spread out across multiple systems in a cluster. HDFS itself sits on top of the native file system of the operating system you run (likely Linux, but Windows is supported as well). HDFS allows for multiple data types (structured, unstructured, streaming data) to be used in a cluster. I'm not going to get into the details of the various components (Sqoop for databases and Flume for streaming data) that enable this ingestion to occur, because that's getting way too deep for this brief explanation of Hadoop (and especially HDFS).

•   MapReduce This is the processing part of Hadoop. MapReduce is a distributed computation algorithm—its name is actually a combination of mapping and reducing. The map part filters and sorts data, while the reduce part performs summary operations. Consider, for example, trying to determine the number of pennies in a jar. You could either count these out by yourself, or you could work with a team of four by dividing up the jar into four sets (map function) and having each person count their own share and write down their findings (reduce). The four people working together would be considered a cluster. This is the divide-and-conquer approach used by MapReduce. Now let's say that you have a 4TB database you want to perform analytics on with a cluster of four Hadoop nodes. The 4TB file would be split into four 1TB files that would be processed on the four individual nodes, which would deliver the results you asked for.
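The divide-and-conquer idea behind MapReduce can be sketched in a few lines of Python. This is a toy in-process model: real MapReduce distributes the map and reduce phases across cluster nodes, with a shuffle step between them.

```python
from collections import defaultdict

data = "the cat sat on the mat the end".split()
chunks = [data[i::4] for i in range(4)]  # "split the jar" across 4 workers

def map_phase(chunk):
    # Each worker emits (key, 1) pairs for the records it was handed.
    return [(word, 1) for word in chunk]

def reduce_phase(mapped):
    # Shuffle pairs by key, then sum the counts for each key.
    grouped = defaultdict(int)
    for word, count in mapped:
        grouped[word] += count
    return dict(grouped)

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(mapped)
print(counts["the"])  # 3
```

The word-count problem shown here is the canonical MapReduce example: the map output is trivially parallelizable, and the reduce step only needs to see all pairs sharing a key.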

For a real-life big data scenario, consider a very large retailer that needs to read sales data from all cash registers on a real-time basis from 11,000 stores conducting 500 sales an hour, so it can determine and forecast what needs to be ordered from partners and shipped out on a daily basis. That's a pretty complex computation that needs to be done. This may very well be the streaming data we discussed earlier. It would be ingested into the Hadoop system and distributed among the nodes in the cluster, and then the required orders and deliveries could be processed and sent to the appropriate systems via REST APIs.

These two components were the basis of the original Hadoop framework. Additional components have been added over time to improve functionality, including these:

•   Spark Spark is another data processing function that may replace MapReduce. It allows for more in-memory processing options than MapReduce does.

•   YARN Yet Another Resource Negotiator, as you can guess, performs a resource management function (cluster resource management specifically) in the Hadoop system.

•   Hadoop Common This is the set of common utilities and libraries that support the other Hadoop modules, including the code that enables applications to read data stored in the Hadoop file system.

This completes your 101-level crash course of big data analytics using Hadoop as an example. It's a field that is big today and is only going to become bigger. One last thing: There are commercial big data offerings out there that you can check out. As in most other new areas, mergers and acquisitions are common. For example, two of the larger big data solution providers, Cloudera and Hortonworks, completed their merger in 2019.

Security and Privacy Considerations

You know that big data implementations use multiple modules across multiple nodes to process high volumes of data with a high velocity and high variety of sources. This makes security and privacy challenging when you're using a patchwork of different tools and platforms.

This is a great opportunity to discuss how security basics can be applied to technologies with which you may be unfamiliar, such as big data. At its most basic level, you need to authenticate, authorize, and audit (AAA) least-privilege access to all components and modules in the Hadoop environment. This, of course, includes everything from the physical layer all the way up to the modules themselves. For application-level components, your vendor should have their best practices documented (for example, Cloudera’s security document is roughly 500 pages long) and should quickly address any vulnerabilities with patches. Only after these AAA basics are addressed should you consider encryption requirements, both in-transit and at-rest as required.

Data Collection

When data is collected, it will likely go through some form of intermediary storage before it is stored in the big data analytics system. Data in this intermediary location (virtual machine, instance, container, and so on) will also need to be secured, as discussed in the previous section. Intermediary storage could even include swap space on a processing node. Your provider should have documentation available for customers to address their own security requirements.


EXAM TIP    For your CCSK exam, remember that all components and workloads required of any technology must have secure AAA in place. This remains true when underlying cloud services are consumed to deliver big data analytics for your organization. An example of a cloud-based big data system could consist of processing nodes running in instances that collect data in volume storage.

Key Management

If encryption at rest is required as part of a big data implementation (everything is risk-based, after all), implementation may be complicated by the distributed nature of nodes. As far as the protection of data at rest, encryption capabilities in a cloud environment will likely be defined by a provider’s ability to expose appropriate controls to secure data, and this includes key management. Key management systems need to be able to support distribution of keys to multiple storage and analysis tools.

Security Capabilities

CSP controls can be used to address your security requirements as far as the services that may be consumed (such as object storage) as part of your big data implementation. If you need your data to be encrypted, see if your cloud provider can do that for you. If you need very granular access control, see if the provider’s service includes it. The details of the security configuration of these services and controls should be included in your security architecture.

Identity and Access Management

As mentioned, authorization and authentication are the most important controls. You must ensure that they are done correctly. In your cloud environment, this means starting with ensuring that every entity that has access to the management plane is restricted based on least-privilege principles. Moving from there, you need to address access to the services that are used as part of your big data architecture. Finally, all application components of the big data system itself need to have appropriate access controls established.

Considering the number of areas where identity and access management (IAM) must be implemented (cloud platform, services, and big data tool level), entitlement matrices can be complicated.

PaaS

Cloud providers may offer big data services as a PaaS. Numerous benefits can be associated with consuming a big data platform instead of building your own. Cloud providers may implement advanced technologies, such as machine learning, as part of their offerings.

You need to have an adequate understanding of potential data exposure, compliance, and privacy implications. Is there a compliance exposure if the PaaS vendor employees can technically access enterprise data? How does the vendor address this insider threat? These are the types of questions that must be addressed before you embrace a big data PaaS service.

Just like everything else covered in this book, risk-based decisions must be made and appropriate security controls implemented to satisfy your organizational requirements.

Internet of Things (IoT)

The Internet of Things encompasses connected devices throughout the physical world, ranging from power and water systems to fitness trackers, home assistants, medical devices, and other industrial and retail technologies. Beyond these products, enterprises are adopting IoT for applications such as the following:

•   Supply chain management

•   Physical logistics management

•   Marketing, retail, and customer relationship management

•   Connected healthcare and lifestyle applications for employees and consumers

Depending on the deployment (for example, mass consumer use), I’m sure you can appreciate the amount of streaming data these devices can generate. While the cloud is not a requirement to support all of this incoming data and the subsequent processing required, it is often used to support these IoT devices.

The following cloud-specific IoT security elements are identified in the CSA Guidance:

•   Secure data collection and sanitization This could include, for example, stripping collected data of sensitive and/or malicious content.

•   Device registration, authentication, and authorization One common issue encountered today is the use of stored credentials to make direct API calls to the backend cloud provider. There are known cases of attackers decompiling applications or device software and then using those credentials for malicious purposes.

•   API security for connections from devices back to the cloud infrastructure In addition to the stored credentials issue just mentioned, the APIs themselves could be decoded and used for attacks on the cloud infrastructure.

•   Encrypted communications Many current devices use weak, outdated, or nonexistent encryption, which places data and the devices at risk.

•   Ability to patch and update devices so they don’t become a point of compromise Currently, it is common for devices to be shipped as-is, and they never receive security updates for operating systems or applications. This has already caused multiple significant and highly publicized security incidents, such as massive botnet attacks based on compromised IoT devices.


NOTE    Check out articles on Mirai and Torii malware for information on how compromised IoT devices can make for very large botnets used in massive DDoS attacks.
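One common alternative to baking static credentials into firmware is per-device authentication with a device-unique secret that is provisioned at registration and can be rotated or revoked server-side. The sketch below illustrates the idea with an HMAC-signed request (a simplification; real deployments typically use a managed IoT identity service, per-device certificates, or mutual TLS, and all names here are invented):

```python
import hashlib
import hmac
import os

# Hypothetical per-device secrets, provisioned at registration time and
# rotatable server-side -- unlike a hard-coded credential baked into firmware.
device_secrets = {"thermostat-7": os.urandom(32)}

def sign_request(device_id, payload, secret):
    """Device side: authenticate a payload with this device's own secret."""
    return hmac.new(secret, device_id.encode() + payload, hashlib.sha256).hexdigest()

def verify_request(device_id, payload, signature):
    """Cloud side: look up this device's secret and verify the signature."""
    secret = device_secrets.get(device_id)
    if secret is None:
        return False  # unregistered device: reject
    expected = hmac.new(secret, device_id.encode() + payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

sig = sign_request("thermostat-7", b'{"temp": 21.5}', device_secrets["thermostat-7"])
print(verify_request("thermostat-7", b'{"temp": 21.5}', sig))   # True
print(verify_request("thermostat-7", b'{"temp": 99.9}', sig))   # False: tampered
```

Because each device has its own secret, one compromised device does not expose credentials that grant access to the entire cloud backend.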

Mobile Computing

Mobile computing is, of course, nothing new. Companies don’t require cloud services to support mobile applications, but still, many mobile applications are dependent on cloud services for backend processing. Mobile applications leverage the cloud not only because of its processing power capabilities for highly dynamic workloads but also because of its geographic distribution.

The CSA Guidance identifies the following security issues for mobile computing in a cloud environment:

•   Device registration, authentication, and authorization are issues for mobile applications, as they are for IoT devices, especially when stored credentials are used to connect directly to provider infrastructure and resources via an API. If an attacker can decompile the application and obtain these stored credentials, they will be able to manipulate or attack the cloud infrastructure.

•   Any application APIs that are run within the cloud environment are also listed as a potential source of compromise. If an attacker can run a local proxy that intercepts these API calls, they may be able to examine the (likely unencrypted) traffic and explore it for security weaknesses. Certificate pinning/validation inside the application may help mitigate this risk.


NOTE    The Open Web Application Security Project (OWASP) defines pinning as “the process of associating a host with their expected X509 certificate or public key. Once a certificate or public key is known or seen for a host, the certificate or public key is associated or ‘pinned’ to the host.”
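The OWASP definition can be illustrated with a minimal pin check (a sketch only: production mobile apps pin through platform TLS APIs, and the pinned value would be the known SHA-256 fingerprint of the server's real certificate or public key, not the placeholder bytes used here):

```python
import hashlib

# Hypothetical pinned value: the fingerprint the app ships with. In a real
# app this would be the SHA-256 hash of the server's actual certificate/key.
KNOWN_CERT_DER = b"...DER-encoded certificate bytes..."
PINNED_FINGERPRINT = hashlib.sha256(KNOWN_CERT_DER).hexdigest()

def pin_matches(presented_cert_der):
    """Reject the connection unless the presented certificate hashes
    to the fingerprint pinned inside the application."""
    return hashlib.sha256(presented_cert_der).hexdigest() == PINNED_FINGERPRINT

print(pin_matches(KNOWN_CERT_DER))                 # True: expected certificate
print(pin_matches(b"attacker proxy certificate"))  # False: reject MITM proxy
```

The point is that a proxy presenting its own (even validly signed) certificate fails the pin check, so the app refuses to send API traffic through it.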

Serverless Computing

Serverless computing can be considered an environment in which the customer is not responsible for managing the server. In this model, the provider takes care of the servers upon which customers run workloads. The CSA defines serverless computing as “the extensive use of certain PaaS capabilities to such a degree that all or some of an application stack runs in a cloud provider’s environment without any customer-managed operating systems, or even containers.”

When most people think of serverless computing, they (incorrectly) think of running a script on the provider’s platform (see the “Serverless vs. Function as a Service [FaaS]” sidebar). But serverless is much more than that very limited use case. The following serverless computing examples are provided by the CSA:

•   Object storage

•   Cloud load balancers

•   Cloud databases

•   Machine learning

•   Message queues

•   Notification services

•   API gateways

•   Web servers

I have no doubt that if your organization is using the cloud today, you are using serverless computing in some capacity. If your organization is planning on using the cloud, you will likely be using serverless offerings. There's nothing inherently wrong with this, because these services can be highly orchestrated (aka event-driven) and have deep integration with IAM services supplied by the provider. Just be aware that the more you leverage services supplied by the provider, the more dependent (locked in) your organization becomes, because you would have to re-create those services if you ever moved to a different environment.

From a security perspective, the CSA Guidance calls out the following issues that you should be aware of before taking your CCSK exam:

•   Serverless places a much higher security burden on the cloud provider. Choosing your provider and understanding security SLAs and capabilities is absolutely critical.

•   Using serverless, the cloud user will not have access to commonly used monitoring and logging levels, such as server or network logs. Applications will need to integrate more logging, and cloud providers should provide necessary logging to meet core security and compliance requirements.

•   Although the provider’s services may be certified or attested for various compliance requirements, not necessarily every service will match every potential regulation. Providers need to keep compliance mappings up-to-date, and customers need to ensure that they use only the services within their compliance scope.

•   There will be high levels of access to the cloud provider’s management plane because that is the only way to integrate and use the serverless capabilities.

•   Serverless can dramatically reduce attack surfaces and pathways, and integrating serverless components may be an excellent way to break links in an attack chain, even if the entire application stack is not serverless.

•   Any vulnerability assessment or other security testing must comply with the provider’s terms of service. Cloud users may no longer have the ability to test applications directly, or they may test with a reduced scope, since the provider’s infrastructure is now hosting everything and can’t distinguish between legitimate tests and attacks.

•   Incident response may also be complicated and will definitely require changes in process and tooling to manage a serverless-based incident.

As you read through this list from the Guidance, did each point make sense to you? Did it feel like déjà vu? It should, because we have discussed every entry previously, just not directly in reference to serverless. Trust but verify your provider by performing due diligence, and remember that in addition to preventative controls, you need detection. Since the services are built and managed by the provider with all serverless offerings, you may need to build logging into applications that are run in a serverless environment.
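Since the provider owns the servers and their logs, application-level logging is often your primary detective control in a serverless environment. Below is a sketch of a serverless-style handler that emits structured, centrally shippable log records (the handler name and event shape are invented for illustration; real platforms define their own entry-point signatures):

```python
import json
import logging
import sys
import time

# Emit one JSON object per log line so a log pipeline can parse and alert on it.
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
log = logging.getLogger("orders")

def handler(event):
    """Hypothetical serverless entry point: log who did what, and the outcome."""
    record = {
        "ts": time.time(),
        "caller": event.get("caller", "unknown"),
        "action": event.get("action"),
        "allowed": event.get("action") in {"read", "list"},
    }
    log.info(json.dumps(record))  # the application supplies its own audit trail
    return record

result = handler({"caller": "svc-reports", "action": "delete"})
print(result["allowed"])  # False: denied actions are logged, not silently dropped
```

Writing one structured record per invocation gives you the audit trail that server and network logs would otherwise have provided.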

Chapter Review

This chapter covered some newer technologies that you will undoubtedly encounter as your organization adopts cloud services. Just remember the highlights of the various technologies and the security issues surrounding them and you’ll be prepared for your CCSK exam. Always remember you’re taking an exam on cloud security, not these individual related technologies.

You should be comfortable with the following CSA recommendations for each technology covered in this section in preparation for your CCSK exam.

Big Data Recommendations:

•   Authorization and authentication for all services and application components need to be locked down on a least-privilege basis.

•   Access to the management plane and big data components will be required. Entitlement matrices are required and may be complicated by addressing these various components.

•   Follow vendor recommendations for securing big data components. The CSA has whitepapers available regarding the securing of big data (not required reading for the CCSK exam).

•   Big data services from a provider should be leveraged wherever possible. When using provider services as part of a big data solution, you should understand the advantages and security risks of adopting such services.

•   If encryption of data at rest is required, be sure to address encryption in all locations. Remember that in addition to the primary storage, you must address intermediary and backup storage locations.

•   Do not forget to address both security and privacy requirements.

•   Ensure that the cloud provider doesn’t expose data to employees or administrators by reviewing the provider’s technical and process controls.

•   Providers should clearly publish any compliance standards that their big data solutions meet. Customers need to ensure that they understand their compliance requirements.

•   If security, privacy, or compliance is an issue, customers should consider using some form of data masking or obfuscation.

Internet of Things Recommendations:

•   IoT devices must be able to be patched and updated.

•   Static credentials should never be used on devices. This may lead to compromise of the cloud infrastructure or components.

•   Best practices for device registration and authentication to the cloud should always be followed. Federated identity systems can be used for such purposes.

•   Communications should always be encrypted.

•   Data collected from devices should be sanitized (input validation best practice).

•   Always assume API requests are hostile and build security from that.

•   Changes and advances in the IoT space will continue. Keep up-to-date with recent developments by following the CSA Internet of Things working group.

Mobile Computing Recommendations:

•   When designing mobile applications, follow CSP recommendations regarding authentication and authorization.

•   As with IoT, federated identity can be used to connect mobile applications to cloud-hosted applications and services.

•   Never transfer any keys or credentials in an unencrypted fashion.

•   When testing APIs, assume all connections are hostile and that attackers will have authenticated unencrypted access.

•   Mobile applications should use certificate pinning and validation to mitigate the risk of attackers using proxies to analyze API traffic that may be used to compromise security.

•   Perform input validation on data and monitor all incoming data from a security perspective. Trust no one!

•   Attackers will have access to your application. Ensure that any data stored on the mobile device is secured and properly encrypted. No data that may lead to a compromise of the cloud side (such as credentials) should be stored in the device.

•   Keep up-to-date with the latest industry recommendations regarding mobile security by following the CSA Mobile Security working group.

Serverless Computing Recommendations:

•   Remember that “serverless” simply means the customer doesn’t have to worry about configuring the base server operating system. Customers still need to securely configure any exposed controls offered by the provider.

•   Serverless platforms must meet compliance requirements. Cloud providers should be able to clearly state to customers what certifications have been obtained for every platform.

•   Customers should use only platforms that meet compliance requirements.

•   Serverless computing can be leveraged to enhance the overall security architecture. By injecting a provider service into your architectures (such as a message queuing service), attackers would need to compromise both the customer and provider services, which will likely be a significant hurdle for them, especially if a service removes any direct network connectivity between components or the cloud and the customer data center.

•   Security monitoring will change as a result of serverless, because the provider assumes more responsibility for security and may not expose log data to customers. This may require that more logging be built into applications created for serverless environments.

•   Security assessments and penetration testing of applications leveraging provider platforms will change. Use only assessors and testers who are knowledgeable about the provider’s environment.

•   Incident response will likely change even more dramatically in PaaS platforms than in IaaS. Communication with your provider regarding incident response roles is critical.

•   Always remember that even though the provider is managing the platform and underlying servers (and operating systems), there are likely controls that need to be configured and assessed on a regular basis.

Questions

1.   What is certificate pinning?

A.   Installing a certificate on a mobile device

B.   Storing a certificate in an open certificate registry that can be used for validation

C.   Associating a host with a certificate

D.   All of the above

2.   Where should encryption of data be performed in a big data system?

A.   Primary storage

B.   Intermediary storage

C.   In memory

D.   A and B

3.   What is Spark used for in big data?

A.   Spark is a big data storage file system.

B.   Spark is a machine learning module.

C.   Spark is a big data processing module.

D.   Spark is for storing big data.

4.   Which of the following has led to IoT device security issues in the past?

A.   Embedding of credentials in the device

B.   Lack of encryption

C.   Lack of update mechanisms for IoT devices

D.   All of the above

5.   Why may entitlement matrices be complicated when using them for big data systems?

A.   Multiple components are associated with big data implementations.

B.   Several components do not allow for granular entitlements.

C.   Cloud environment components are being leveraged as part of a big data implementation.

D.   A and C are correct.

6.   What are the common components associated with a big data system?

A.   Distributed collection

B.   Distributed storage

C.   Distributed processing

D.   All of the above

7.   What are the Three Vs of big data according to the CSA?

A.   High velocity, high volume, high variance

B.   High velocity, high volume, high variety

C.   High validation, high volume, high variety

D.   High value, high variance, high velocity

8.   Which of the following is not considered a serverless platform according to the CSA?

A.   Load balancer

B.   DNS server

C.   Notification service

D.   Object storage

9.   When should input validation be performed?

A.   When using the cloud as the backend for mobile applications

B.   When using the cloud as the backend for IoT devices

C.   When using cloud services to support a big data system

D.   All of the above

10.   According to the CSA, what is an/are attribute(s) of the cloud that makes it ideal to support mobile applications?

A.   Cost of running required infrastructure

B.   Distributed geographical nature of cloud

C.   Inherent security associated with cloud services

D.   B and C

Answers

1.   D. Certificate pinning is associating a certificate with a host. This can be useful to prevent attackers from using a proxy to view unencrypted network activity that may be used to identify security weaknesses. None of the other answers are correct.

2.   D. Encryption (if required) of big data must be performed at all storage locations, including primary and intermediary locations.

3.   C. Spark is a processing module for Hadoop that is considered the next generation of MapReduce. Although Hadoop was discussed only as part of a big data backgrounder, it is specifically called out in the core text of this book and CSA Guidance as a big data processing module.

4.   D. All of the answers listed have led to security issues in the past for IoT devices.

5.   D. CSA states that entitlement matrices can be complicated by both the number of components in a big data system as well as the cloud resources that may be leveraged as part of a big data implementation.

6.   D. A big data system consists of distributed collection, distributed storage, and distributed processing.

7.   B. The Three Vs are high volume, high velocity, and high variety. This means a big data system has to process a high volume of data that is coming in at a high rate of speed and that can be in multiple formats (structured, unstructured, and streamed).

8.   B. The DNS server is not a serverless option according to CSA. Hold on, because there’s a learning lesson to be had here. Providers may very well offer a DNS service to customers. That’s not what is written here, though. Take your time when reading questions on your exam to make sure you aren’t tricked by wording. You can absolutely build your own DNS server in an IaaS environment, or you can consume a DNS service if the provider offers one. The other possible answers are listed as serverless platforms.

9.   D. It is a security best practice always to perform input validation on any incoming network traffic. This includes all the technologies listed.

10.   B. The only listed attribute in the CSA Guidance regarding mobile application suitability for the cloud is the geographical nature of cloud. Yes, a cloud environment may be more secure, but this is, of course, a shared responsibility. You are never guaranteed that running in the cloud will be cheaper than running systems in your own data center.
