6.1 Threat Classifications and Infrastructure

Earlier, we introduced some broad attack classes, along with important concepts such as attack vectors and attack surfaces. In this chapter, we cover a bit more about infrastructure and how systems work, along with some general threat classifications that build on those earlier concepts. In other words, we will dig a little deeper and establish a few more concepts before we start looking at both the forest and the trees.

It would be impossible to cover all threats and their countermeasures, but we can explain various threat types and offer some examples. Threats fall into four main categories: (1) interruption of business or service by making systems unavailable, such as through attacks against a server's operating system using malware or ransomware, or by launching a distributed denial-of-service attack; (2) masquerading, or impersonating a legitimate or authentic user, for example by stealing login credentials; (3) interception of communications, for example using a man-in-the-middle attack; and (4) modification, where data might be changed in transit or on a storage device (Figure 6.1).

The architecture of some attacks shown with a diagram. 1. Attack: Interruption. The attribute it affects is availability. 2. Attack: Masquerade. The attribute it affects is authenticity. 3. Attack: Interception. The attribute it affects is privacy. 4. Attack: Modification. The attribute it affects is integrity.

FIGURE 6-1 Attack Architecture

While a specific attack will have a specific response protocol, the general classifications of threats have general responses. For service interruption attacks, we would use high-availability systems and architecture, such as failover and hot-standby systems with filtering routers, among other countermeasures. Countering interception attacks usually involves cryptography, virtual private networks (VPNs), and the like. For masquerade and modification types of attacks, we would want to use cryptography, multifactor authentication, scanners and intrusion detection systems (IDSs), and techniques to control access to information or other resources using role-based access controls (RBACs). RBACs use devices, software, rules, and access control lists to permit or deny users and to determine, once a user has been authenticated, what that user can do on the controlled device, such as read, write, and execute (see Figure 6.2). Of course, these are general measures. The general threat classifications apply to almost all infrastructure; however, the specific attack will depend on which device, application, or communication channel is targeted. To perform risk analysis, we need to inventory our infrastructure to make sure we cover all the exposures, attack surfaces, and vulnerabilities. Let’s take a closer look at infrastructure to consider what we might need to inventory for our analyses.

An illustration explaining access control lists. The system needs to know who a user is and what he or she wants. A user may need to access a network, code, or a data file. An access control list maintains a table of information that lists who the users are and what kind of permissions they have. For instance, an administrator may have read, write, and execute permissions. User A may have only read access. User B may have read and write access.

FIGURE 6-2 Access Control Lists
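To make the RBAC and access control list ideas concrete, the following is a minimal sketch in Python; the users, roles, and permissions are made up for illustration and mirror Figure 6.2, and a real implementation would integrate with directory services and authentication rather than hard-coded tables.

    # A minimal, hypothetical sketch of role-based access control (RBAC).
    # Role names, users, and permissions are illustrative only.
    ROLE_PERMISSIONS = {
        "administrator": {"read", "write", "execute"},
        "user_a": {"read"},
        "user_b": {"read", "write"},
    }

    USER_ROLES = {
        "alice": "administrator",
        "bob": "user_a",
        "carol": "user_b",
    }

    def is_allowed(user: str, action: str) -> bool:
        """Return True if the authenticated user's role grants the action."""
        role = USER_ROLES.get(user)
        if role is None:                 # unknown user: deny by default
            return False
        return action in ROLE_PERMISSIONS.get(role, set())

    print(is_allowed("bob", "write"))      # False: the user_a role is read-only
    print(is_allowed("alice", "execute"))  # True

Note the default-deny behavior for unknown users, which reflects the permit-or-deny decision described above.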

6.1.1 Internet of Things (IoT)

Shortly, we will cover some of the technologies that make up computing infrastructure as it relates to information and cybersecurity, both on premises and in the cloud. First, it is important to realize that all the computing devices we connect over networks blur the on-premises and cloud definitions. Thus, before we dive into this, let’s discuss in more detail some of the broader configuration concepts we have introduced before we dig into the key components. Recall that the Internet began as a government-funded project, the Advanced Research Projects Agency Network (ARPANET), in the late 1960s and 1970s and grew into the Internet we recognize during the 1980s. This invention was a packet-switching network built upon the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite plus routing protocols, and its first implementation merely sent electronic messages among researchers. By the early 1990s, Tim Berners-Lee had created the HTTP protocol and the first browser, which enabled the web.

Beyond the clients and servers that we typically think about in terms of computing infrastructure, the capabilities of smart devices, high-speed fiber optic networks, 5G devices, and the Internet of today were unimaginable even just a few decades ago. Computing infrastructure also includes GPS, passive and active radio frequency identification (RFID), Bluetooth, Wi-Fi, streaming media devices, gaming and augmented reality applications, virtual reality systems, instant messaging, videoconferencing, social media in their various forms (such as Facebook®, Twitter®, Snapchat®, LinkedIn®, Pinterest®, and Instagram®), low-energy beacons and sensors, software agents, traffic and security cameras, Alexa®, autonomous vehicles, and so on. From this, we can see that technological infrastructure covers much more than just servers and clients, and that it includes all the applications and services that utilize the Internet or other networks to communicate or perform their functions. Collectively, we refer to this as the Internet of Things, or IoT.

Security requirements differ in both form and substance, depending on the systems, networks, infrastructure, or other IoT that are under consideration, but broadly, there are six areas to consider: (1) cybersecurity combined with both the capabilities and limitations of the various devices (e.g., memory capacity of mobile phones versus gaming systems); (2) reliability and availability of IoT devices, platforms, and applications (e.g., that they are designed to be available on demand through techniques such as high-availability technologies); (3) standardization and/or interoperability among the devices (consider Android versus iOS, for example); (4) privacy, for example, laptop and mobile phone camera access and usage, or personal information collection and redistribution by IoT providers; (5) compliance with policies (e.g., policy on bring your own devices), procedures (e.g., asset tagging), regulations (e.g., what data can be stored and in what form); and (6) laws (e.g., intellectual property considerations).

6.1.2 Cloud Computing

An alternative to using your own data centers, colocations, or standby sites is to utilize outsourced computing facilities and operations. In the early 1990s, application service providers (ASPs) pioneered this, but due to low trust in transferring mission-critical systems and data to a third party, frequent quality-of-service failures, and other problems, they quickly lost favor and, for the most part, faded from view for a while. Now a variation on that theme has arisen, called cloud computing, or simply, the cloud. The cloud was enabled by components known as hypervisors (or virtual machine monitors), which manage virtual infrastructure and virtual instances for a cloud, such as creating and shutting down virtual compute and storage instances. An example of a hypervisor is Hyper-V, used by the Microsoft Azure cloud (and others).

A virtual machine is a self-contained system within a system, or container. Two common virtual machine platforms are VMware® and VirtualBox®, but there are many others. These virtual systems allow environments with different operating systems to be installed on a particular host computing system; for example, a Linux system can be created as a self-contained OS on top of a Windows OS without having to create special partitions and a boot manager leading to a separate OS login, as was previously required. By creating virtual servers and virtual clients, any client can present a user interface and self-contained applications to a user, regardless of the underlying native OS. Moreover, these virtual OS machines can be cloned, so if one is corrupted or its configuration is changed, the administrator may simply delete it and overlay a new one.

One advantage cloud computing has over the old ASP approach is that companies can continue to develop and “operate” their own applications, while others maintain the infrastructure. Also, many cloud providers, including Amazon Web Services (AWS), have a “shared security model” where the division of security labor is split between the two entities—your company and the cloud provider. Another consideration is that many fixed costs in your own data center become variable costs that are sometimes hard to predict and sometimes lead to the temptation to “cut corners.”

Cloud computing is an evolving term because the technologies used for cloud computing are still evolving. The National Institute of Standards and Technology (NIST)1 defines cloud computing as: A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (for example, networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service-provider interaction. The cloud model promotes availability and is composed of five essential characteristics:2 (1) on-demand self-service, meaning utilization of systems and networks is provided as needed; (2) broad network access, that is, ubiquitous access in terms of both platform and geography; (3) resource pooling, which allows for more economical delivery of services; (4) rapid elasticity, which means that the computing and communications can scale with demand; and (5) measured service, meaning that resource usage is monitored and metered, which gives greater predictability in service delivery than pre-provisioned capacity.

The basic architecture of cloud computing has been based on three fundamental approaches to computing resource utilization, or what are called service models.1 They are (1) Software as a Service (SaaS), which enables clients to “rent” and use applications that the provider runs on its own infrastructure; (2) Platform as a Service (PaaS), which allows clients to deploy their own applications on computing facilities “rented” from a provider; and (3) Infrastructure as a Service (IaaS), which allows clients to process and store data, provision networks, and deploy and run systems and applications. What distinguishes IaaS from PaaS is that the client has control over the operating systems, storage, deployed applications, and some limited control over networking components such as host firewalls.3 In other words, PaaS sits in between SaaS and IaaS in terms of how much the client controls.4 While the essential “as a service” models were initially defined, they are expanding into all sorts of “as-a-service” offerings such as Containers as a Service (CaaS), which virtualizes groupings of code and/or infrastructure in such a way that they are portable across platforms, and which provides standard ways to upload, configure, start, stop, scale, and manage the container environment. Docker® is a popular container platform, and Kubernetes® is a popular container orchestration system for managing containerized applications (Figure 6.3).

Cloud computing depicted as third-party infrastructure. Cloud computing includes routers and switches, storage, peripherals, servers, workstations, software services, phones, printers, scanners, copiers, and projectors that are provided by third-party companies.

FIGURE 6-3 Cloud as Shared Third-Party Infrastructure

Although using these facilities may improve resource utilization and better support quality of service (QoS), doing so comes with security and privacy concerns, especially in multitenant configurations.5 According to a recent Forrester Research survey, 51% of small-business participants said that security and privacy concerns were their top reasons for not using cloud services. The privacy issue revolves around the ways a provider might utilize company information. Providers such as Google, for instance, have explicitly stated in their policies that they collect data for consumer profiling and marketing purposes. Next, while providers purport to have higher availability because of resource sharing, many technology managers still worry that providers will not live up to their promises—as was the case with many providers of the old ASP approach. Finally, some technology managers have expressed concerns about data loss or leakage, and several high-profile breaches have fueled those concerns. It is important to understand these issues and trade-offs before we develop solutions to them.

6.1.3 Servers and Host Computers

When people think about information systems, most often server and host computers come to mind. An information system (IS), however, extends to all organizational information infrastructure, including networks, hardcopies of documents, computer server rooms, and even physical storage vaults. Servers and host computers form the bulk of the computing platforms in most organizations, along with the networks that connect them and the software applications and databases that are used. Managers must enable the business to operate effectively in a global marketplace while simultaneously considering the practicality of securing systems and information to prevent breaches or damage. Although many technology managers may see security as an overhead cost of doing business, security is rather like the concept of quality—that is, while these measures may not directly impact the financial bottom line (except negatively as a financial outlay), having poor security, just like having poor quality, will indirectly if not directly affect the bottom line. Consider, for example, the effect of poor quality on brand reputation, or of a major security breach at a credit bureau on customer trust. The consequences can be even more severe than that, because cybersecurity breaches may also bring lawsuits and penalties.

Information and communications systems can be attacked from anywhere in the world nowadays, and this sometimes leaves technology managers with little or no recourse, unless there has been good planning followed by good actions. While information and communications technologies can be used unethically and for dubious purposes, the technologies themselves are value neutral, meaning that it is up to people who apply the technologies whether to use them for good or for bad purposes. Before managers can understand how to protect their information infrastructure, they must understand important aspects of how all that works, at least at a high level. That means that before we can really appreciate each of the technological components of security concern, we need to lay some foundations.

6.1.3.1 Operating Systems

Early on in this text, we noted that some experience with operating systems (OSs) such as Windows, Linux, or macOS, or preferably all three, is needed to fully appreciate this material, and cybersecurity in particular. While at the implementation level, Unix, Linux, macOS, and Windows OSs are different, they perform the same functions. We will cover some of these functions so that when we get to threats and vulnerabilities to them, as well as the technical issues and mitigations, they will make sense. For instance, it will be important to know how malware may cause stack or buffer overflows and what can be done about that and other such threats.

Overall, the OS can be thought of as programming code and sets of data structures. The programming code is written in low-level languages, such as C/C++, and assembly language. Most OSs are event-driven, meaning that when a user types on a keyboard or clicks on something with the mouse, when a storage location in memory becomes full, or when a program in a run queue is ready to be executed by the CPU, a signal is generated by the software (or firmware) that needs to interrupt the CPU to switch functions. Because many users can use a computer at the same time, and many programs are running at the same time on a computer, the orchestration of these events and the execution of programs are handled by a set of subsystems. First, when we talk about an OS, we are usually talking about the kernel, or core functions, but many support systems use and/or are used by the kernel, such as device drivers, shells, and graphical user interfaces. The following are some of the major functions OSs perform.

The memory manager is responsible for handling the data structures and managing the available RAM. Data structures are little more than data or variables containing values that are stored at specific memory addresses. The memory manager must ensure that proper data are stored in proper locations, that there are no overlaps, and that data segments that have been unreferenced (deallocated) are returned for reuse by other processes. The memory manager also attempts to load all the processes and threads and their data structures into RAM because all processes and their data structures must be resident in memory for the CPU to execute them. For processes, there are two classifications of data structures: user data structures (specifically the user structure) and kernel data structures (for example, the proc structure). User data structures contain information a process uses, such as indexes and descriptors for open files and their permissions. The kernel data structures are those things the OS needs in order to retrieve and execute the process.

Process management is handled by several kernel subsystems, including the scheduler, which takes runnable processes in the run queue and switches them into the CPU for execution for a period of time called a time slice, or quantum. A program process is divided into four main logical segments in memory: (1) the program text segment, (2) the data segment, (3) heap space, and (4) the stack segment. The user structure contains system information about the state of the process, such as pending system calls and files open and being accessed. The text segment contains the code portion of a program. The data segment has two subgroupings: a segment for uninitialized data, such as global and static variables that are uninitialized, called the BSS (for “block started by symbol”), and a segment for initialized data. The heap is for dynamic allocation of uninitialized data. The stack is used for automatic variables and for function parameters. These segments are decomposed from this logical grouping into virtual address groupings managed by the kernel’s virtual memory management subsystem (Figure 6.4).

A diagram showing the memory segmentation and data structures for a process, with the following components stacked from bottom to top: text (code), data, heap, stack, and user structure.

FIGURE 6-4 Memory Segmentation and Data Structures for a Process

Many processes are shared text, meaning they each use the same text region simultaneously with other processes. The text may simply be thought of as the programming code in executable form, along with the “immediate” data it needs to prepare to execute. Examples of shared-text programs are the shell (or command line editor), graphical user interfaces, and compilers. When a command line editor is invoked, for example, its text region is the same as that of the other editors being executed on the system, so each process logically contains its own text segment, although physically those segments map to the same memory. In other words, processes may have separate virtual addresses that refer to (share) the same physical addresses. Processes and their data structures are suspended (idle) in various queues, such as the run queue, while they wait for an execution time slice from the CPU. The scheduler scans the run queue for runnable processes by priority and switches them in for the CPU to execute. If RAM becomes full while processes are in this wait state, the user data structures (with the exception of the text data structure) can be moved out to the swap partition on disk, while the kernel data structures (and text structure) for the process remain resident in memory. This is because the kernel data structures that remain in memory are needed by the pager/swapper (or dispatcher in Windows) to locate and retrieve the user data structures from the swap partition when the scheduler calls for that process to be executed by the CPU.

Virtual Memory A memory addressing technique to map data in RAM to locations on persistent storage (e.g., on disk, or in solid state storage) in order to expand data and processes beyond the limits of the RAM.
Data Structures Groups of associated data, for example, linked lists of pointers to memory locations that connect all the data associated with a file, which are scattered across physical locations on a disk drive.
Kernel The core functions of an OS. The kernel is used by and uses other subsystems such as graphical user interfaces, compilers, device drivers, and other software and firmware that are not part of the kernel but are required to make a system fully functional.

File system management allocates and deallocates storage space on persistent storage such as a disk drive. The manner in which data are stored on a particular device is called the file structure, which has both a physical and a logical structure. The physical structure determines how the bits that represent data are arranged on storage surfaces such as disk drives and is largely managed by the hardware (and firmware) controller for a device. The logical structure determines how data are maintained, accessed, and presented to a user and is largely managed by a software device driver.
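To illustrate the logical view that the file system presents to users and programs (as opposed to the physical arrangement of bits on the device), here is a small Python sketch; the file name is hypothetical, and the script creates the file so the example is self-contained.

    import os, stat, time

    path = "example.txt"               # hypothetical file name
    with open(path, "w") as f:         # create the file so the example runs anywhere
        f.write("hello")

    info = os.stat(path)               # logical metadata maintained by the file system
    print("size in bytes:", info.st_size)
    print("permissions  :", stat.filemode(info.st_mode))   # e.g., -rw-r--r--
    print("last modified:", time.ctime(info.st_mtime))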

Input/output (I/O) processing comprises all the facilities that pass data bits between devices, including networking and support for the network protocol stack. For the most part, I/O means sending grouped bits of data from one process and device to another—but the grouped sizes depend on the hardware used. This set of subsystems also handles interprocess communications (IPC), where one executable may share or pass data, such as parameters, to another executable. There are many data structures associated with I/O. An example is the open file table, which contains file descriptors—information needed to access an underlying file object. The file system and I/O subsystem have components partially contained in the user data structures and partially maintained by the kernel, and the data that are passed between processes must be stored while waiting to be processed.
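As a simple illustration of IPC and file descriptors, the following Python sketch asks the kernel for a pipe and passes a few bytes through it; in practice the two ends would usually belong to two different processes rather than one short script.

    import os

    # The kernel creates a unidirectional channel and returns two file descriptors.
    read_fd, write_fd = os.pipe()

    os.write(write_fd, b"hello from one process")  # data are buffered by the kernel
    os.close(write_fd)                             # close the sending end

    print(os.read(read_fd, 1024))                  # b'hello from one process'
    os.close(read_fd)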

The Windows OS has an additional Achilles heel we should mention before we begin to cover how various exploits occur: the Windows Registry. The Windows Registry is a configuration database that maintains information about applications, users, and hardware on a system. The Registry uses registry keys, denoted HKEY, to identify entries in the configuration database. There are HKEY entries for each component, listed under a root node such as HKEY_LOCAL_MACHINE, which is the root of configuration components for the local system. The configuration manager, part of the Windows executive, is responsible for maintaining the Registry. Software ranging from device drivers to user applications uses the Registry for locating executables, coordinating interprocess communications, and keeping track of information such as user preferences. The Registry provides a convenient, centralized store of configuration information, but it is also a major source of threats to Windows host computers.
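As an illustration (Windows only), Python's standard winreg module can read a value stored beneath the HKEY_LOCAL_MACHINE root mentioned above; the key and value names shown are common Windows entries but are used here purely as an example.

    import winreg   # standard library module, available only on Windows

    key_path = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        product, _value_type = winreg.QueryValueEx(key, "ProductName")
        print("Windows edition recorded in the Registry:", product)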

6.1.3.2 Database Systems

Software applications, for the most part, lead to the creation and/or consumption of data. If we need to store data to be used in or across applications, the most suitable form is a database system, or database management system (DBMS), designed to manage simultaneous access to shared data. Most database systems today are relational, although there are others, such as hierarchical, object-oriented, NoSQL, semantic, and graph databases. When data are stored in a database, we call this persistence.

A relational database (RDB), organized through a relational database management system, arranges data into tables that contain keys, so the indices remain coupled to the data that are related across tables. Tables in RDBs are connected via primary and foreign keys, which form the linkages that preserve data relationships. When a set of data related to a transaction, such as retrieval of invoice payment information, is to be reconstructed from a database query, it requires a procedure called a join, wherein criteria from all the related tables are gathered and a result set or a view is produced and returned to the software application that made the request.
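The following Python sketch, using the standard sqlite3 module and made-up invoice data, shows a primary key, a foreign key, and a join that reconstructs the related rows into a single result set.

    import sqlite3

    con = sqlite3.connect(":memory:")        # throwaway in-memory database
    cur = con.cursor()

    cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("""CREATE TABLE invoice (
                       id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(id),  -- foreign key
                       amount REAL)""")

    cur.execute("INSERT INTO customer VALUES (1, 'Acme Corp')")
    cur.execute("INSERT INTO invoice VALUES (100, 1, 250.00)")

    # The join gathers criteria from both related tables into one result set.
    cur.execute("""SELECT customer.name, invoice.id, invoice.amount
                   FROM invoice JOIN customer ON invoice.customer_id = customer.id""")
    print(cur.fetchall())                    # [('Acme Corp', 100, 250.0)]
    con.close()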

Data warehouses and data lakes are similar in some respects to RDBs, but they are used more for analytical processes rather than transactional ones, so their data are structured (or rather unstructured), persisted, and retrieved differently from relational database systems. There are also NoSQL databases that are useful for unstructured data, images, and documents; graph databases that store what are called nodes and edges, along with properties, tags, or labels, among relations; and semantic databases that store what are called triples, consisting of subject-predicate-object relationships. Regardless, most of these use some form of structured query language. For example, SPARQL is used for semantic databases to store and retrieve subject-predicate-object triple terms. We will look at SQL injection attacks later, along with other database attacks such as Windows Open Database Connectivity (ODBC) connection stealing and connection pool corruption. ODBC is an open standard application programming interface (API) for accessing a database, and a connection pool is a cache of database connections that multiple processes can reuse to improve access performance.

6.1.4 Networking

In the beginning of this text, as with OSs, we indicated that some knowledge of TCP/IP networking is necessary to fully grasp the concepts we are discussing here, and even more so to understand the threats and countermeasures for network security. With that in mind, let’s present some networking concepts and issues. Maintaining the security of computer systems is one thing, but when those systems are connected to a network, as they usually are, the complexity (and the security risk) escalates dramatically. The International Organization for Standardization (ISO) formed a committee in 1977 to develop a network specification known as Open Systems Interconnection (OSI). The first version of the standard was finalized in 1984, but it was never fully implemented, and thus the Transmission Control Protocol/Internet Protocol (TCP/IP) suite has supplanted the OSI. However, the OSI serves as a reference model for protocol functions.

The ISO/OSI model presents seven layers to describe how data flow through a network protocol stack. At the top of the model is the Application Layer, which contains protocols for user programs such as email user agents (for example, Outlook). At the bottom of the model is the Physical Layer, which consists of the network media, for instance, copper or fiber cable, which make the actual connection between computers, or the airwaves in the case of Wi-Fi or cellular networks. The main philosophy behind the layered architecture was something akin to “divide and conquer.” Moving data from one computer to another is a very complex problem, and by breaking this huge task into smaller functions, we can look at each task more closely and come up with a rather well-defined solution.

The layered approach also allows standardization of interfaces because the tasks are narrowly defined. When an application sends information from one computer to another, the data travel down through the protocol stack on the sending computer, across the network, and then up through the protocol stack on the receiving computer. At the sending computer, header information is attached at each layer as the data are constructed, encapsulated, and passed down the stack. Each header contains information such as the addresses of the sending and receiving computers, the encryption method, and other information that the receiving computer can use to correctly identify where the data came from and how to unpack, sort, and interpret the message.
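As a toy illustration of encapsulation (the header format here is made up and not a real protocol), the Python sketch below packs source and destination addresses and a length field in front of a payload, and then unpacks them the way a receiver would.

    import struct

    payload = b"hello"

    # A made-up header: 4-byte source address, 4-byte destination address,
    # and a 2-byte payload length, packed in network (big-endian) byte order.
    header = struct.pack("!IIH", 0x0A000001, 0x0A000002, len(payload))
    frame = header + payload           # encapsulation: header prepended to the data

    # The receiving side unpacks the header to learn how to interpret the payload.
    src, dst, length = struct.unpack("!IIH", frame[:10])
    print(hex(src), hex(dst), frame[10:10 + length])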

As noted, the ISO/OSI specification has remained primarily a reference model rather than a network implementation, and so TCP/IP has been adopted as the de facto standard used in networks ranging from Local Area Networks (LANs) to the Internet. TCP/IP evolved because the need and technology outpaced the development of the OSI standard specifications. While the OSI and TCP/IP models share many features, there are also a number of differences. For instance, TCP/IP combines a number of OSI top-level layers into one layer.

6.1.4.1 Internet Specifications

Although the Internet is sometimes viewed as an amorphous cloud with no overall control, there have been many attempts to provide some governance of this important resource, though the key entities remain largely a loose confederation. Still, one influential organization is the Internet Architecture Board (IAB). The board was first established in 1983 to ensure that important technology advances were promoted and that Internet standards were widely available. An IAB subgroup that deals with research promotion is called the Internet Research Task Force (IRTF), and the Internet Engineering Steering Group (IESG) is a subgroup within the Internet Engineering Task Force (IETF) that reviews and selects Internet standards. People who want to submit ideas for Internet standards can do so by sending a proposal to the steering group. These submissions are called Internet Drafts. The steering group meets periodically and reviews the technical and other merits of the drafts. Most of the proposals do not make it into the official standards and fall into the “do not publish” category. The proposals that are adopted by the steering group are officially recognized and are assigned an RFC (Request for Comments) designation. There are many Internet standards, but the majority of RFCs are updates or revisions to existing standards.

As indicated earlier, both the OSI reference model and the TCP/IP protocols use a layering (or stack) approach. The top Application Layer is the one that people usually relate to most because the main function of the Application Layer is to provide an interface for user interaction. Applications such as email, databases, file transfers (e.g., FTP), and browsers are examples of programs that use the Application Layer. The next layer is the Presentation Layer, which is a critical point for network data manipulation, such as compression and application-layer encryption processes; we can think of it as the packaging layer. When two computers communicate with each other, they need to establish some kind of connection, or what is called a session. Examples of Session Layer protocols are NetBIOS, developed by IBM as an attempt to provide primitive network capabilities to stand-alone computers, and remote procedure calls (RPCs), which enable a client computer to invoke a program on a server computer. The Transport Layer has to do with getting and delivering data for a session. Roughly speaking, there are two types of data transport: TCP provides a reliable connection, and the User Datagram Protocol (UDP) is a best-effort delivery system. As an illustration, an application such as a chat program needs a “reliable connection” so that the communications between one chat client and another can alternate in a seamless way.
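The difference between the two transport types shows up directly in the programming interface; the short Python sketch below simply creates one socket of each type, so it runs without sending any traffic.

    import socket

    # TCP: a connection-oriented, reliable byte stream (SOCK_STREAM).
    tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # UDP: connectionless, best-effort datagrams (SOCK_DGRAM).
    udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    print(tcp_sock.type, udp_sock.type)   # SOCK_STREAM and SOCK_DGRAM

    tcp_sock.close()
    udp_sock.close()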

The Network Layer deals with routing of data through a network. Routing involves sending data packets to a destination address through the best available path. It is at the Transport and Network Layers where the issue of multiplexing is addressed. Multiplexing is the ability to send multiple types of data or signals over a single line or connection, which increases the efficiency of data communication. The Data Link Layer involves how the physical connections between systems are identified and how the information associated with them is communicated over a physical medium—wire, fiber, or airwaves (Figure 6.5).

The layers in a network protocol stack. The bottommost layer is the Physical Layer, through which signals are transmitted. The second layer is the Data Link Layer; the protocols in this layer include Ethernet and PPP. The third layer is the Network Layer; the protocols in this layer include ICMP, IP, and IPSec. The three layers described above are associated with link-to-link security. The fourth layer is the Transport Layer; the protocols in this layer include TCP, UDP, SSL, and TLS. The fifth layer is the Session Layer; the protocols in this layer include NetBIOS, Telnet, and FTP. The sixth layer is the Presentation Layer; the protocols in this layer include MIME, XDR, and SSH. The seventh and last layer is the Application Layer; the protocols in this layer include FTP and SSH. The fourth through seventh layers are associated with end-to-end security.

FIGURE 6-5 Network Protocol Stack

6.1.4.2 Internetworking

The Internet consists of a vast set of interconnections using computers called routers. Routers typically use TCP/IP for transporting various data packets (also called datagrams) among systems. The Internet also encompasses network applications such as Telnet, FTP (File Transfer Protocol), DHCP (Dynamic Host Configuration Protocol), and DNS (Domain Name Service). These are all examples of programs and protocols included in, or that support, the TCP/IP protocol suite. The web is simply a way that the Internet is used; specifically, it consists of a set of technologies, such as HTTP (Hypertext Transfer Protocol), that logically reside on top of the other networking layers of the Internet protocols. While these protocols may be most familiar, note that there are other network protocols in use, such as Asynchronous Transfer Mode (ATM), Fiber Distributed Data Interface (FDDI), X.25, and Frame Relay.

Network protocols are defined by standards in Requests for Comments, or RFCs, as previously mentioned. For example, RFC 822 defines the message format for email. RFC specifications define the ways the various protocols work such that each has well-defined interfaces and clearly articulated behavior to enable global interconnections. Where TCP/IP defines a suite of protocols for communicating over networks, Ethernet is one technology that governs the physical aspects of how computers are connected and how they send data over a wire or through the airwaves. Since its development in the 1970s, Ethernet technologies have evolved over the years and provide the basis for most high-performing networking systems, especially for local area networks.

A major competitor to Ethernet was Asynchronous Transfer Mode, or ATM. ATM had strong appeal because it is a circuit-switched, cell-based technology that can prioritize and guarantee delivery of different types of transmission, for example, data versus voice versus video. Because video in particular is sensitive to network delays (latency), this led to some initial deployments in major media organizations. ATM also initially provided higher-speed bandwidth for all traffic types compared to Ethernet. In many government and telecommunications organizations, ATM continues to be used in certain segments because of its circuit-based, connection-oriented dependable delivery and strong security capabilities; however, interest in ATM has waned in most commercial organizations.

6.1.4.3 Distributed Systems

Today, most applications are distributed. Distributed systems are applications that run on multiple computers connected via networks to complete a business function or transaction. A commonly recognized distributed system is a website for a large commercial enterprise such as Amazon.com. Suppose, for example, that I wanted to purchase a text online, and I accessed a website called www.mybooks.com to make the purchase. It is unlikely that the text I want will be sitting in some warehouse that the company maintains. Instead, the company may act as an electronic storefront for publishers. The mybooks.com systems would communicate with various publisher systems to place orders on demand. It is also not likely that a single computer is handling my transaction. It is more likely that mybooks.com has myriad computers that process customer requests.

One of the more common distributed systems designs is called an n-tier configuration. This means that multiple server computers are connected horizontally and vertically. This is enabled by the use of a software design pattern called model-view-controller (MVC) or model-2 (Figure 6.6). Using our website analogy, a horizontal connection among servers would be a collection of web servers that handle the vast number of users who want to connect to the mybooks.com website. These “front-end” servers would handle the display of the information and the user interaction. The front-end servers would then be connected via a network to a middle tier of systems that process the business logic, such as placing orders with the book suppliers or performing calculations such as shipping costs. The middle tier servers connect via APIs and networks to trading partners, and/or to database servers where, perhaps, catalog and customer information is stored.

An illustration of n-tier architecture using the MVC design pattern. The tiers in an n-tier architecture are the databases, which are hosted on data servers; the applications, which are hosted on application servers; and the web servers, all connected in a network. Clients on the Internet can connect to the web servers. Remote access servers can connect to the web servers for administration needs.

FIGURE 6-6 n-Tier Architecture using MVC Design Pattern

The term n-tier means that the company can have many servers at each of these three horizontal layers. As the company business grows, mybooks.com can simply add servers where they are needed; for example, if many new users who want to purchase books or music come online, a server could be added to the front-end tier. If many new suppliers were added to the business, another server could be added to the middle tier, and as our data stores grow, we could add a server to the database tier. This separation of layers or tiers helps with both extensibility and security.
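A minimal sketch of the model-view-controller separation, in Python with a made-up book catalog, is shown below; in a real n-tier deployment the model would live on the database tier, the controller on the middle tier, and the view on the front-end web tier.

    # Model: the data and business rules (would live on the data/application tier).
    CATALOG = {"978-1": {"title": "Network Security", "price": 59.00}}

    def get_book(isbn):
        """Model access function: look up a book by its ISBN."""
        return CATALOG.get(isbn)

    # View: formats the result for presentation (would live on the web tier).
    def render_book(book):
        return f"{book['title']}: ${book['price']:.2f}"

    def render_not_found(isbn):
        return f"No book found for ISBN {isbn}"

    # Controller: receives a request, invokes the model, and chooses a view.
    def handle_request(isbn):
        book = get_book(isbn)
        return render_not_found(isbn) if book is None else render_book(book)

    print(handle_request("978-1"))
    print(handle_request("978-2"))

The point of the separation is that each piece can be scaled or replaced on its own tier without disturbing the others, which is exactly the extensibility and security benefit described above.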

Some of the technologies that are used in distributed systems are tightly coupled; by that we mean that they are written in technologies that require programs to coordinate through standard interfaces and share some components. An example of this would be an Object Request Broker (ORB) of which there are three major ones: (1) those built on the Object Management Group’s (OMG) Common Object Request Broker Architecture—or CORBA; (2) the Distributed Component Object Model (DCOM) from Microsoft; and (3) the Remote Method Invocation (RMI) from Sun Microsystems/Oracle. With these distributed technologies, programmers write interfaces, which consist of data definitions and types that distributed programs expect to send and receive, along with the names of functions that distributed programs can call. Programmers must also generate connector components often called stubs and skeletons to be installed on clients and servers, which tell the distributed programs how to locate and communicate with each other.

An alternative to this “tightly coupled” approach is what is often called Service Oriented Architecture (SOA). A conventional description of SOA is that it is an ability to loosely couple information for disparate systems and provide services through proxy structures. To try to ground this abstract idea, let’s utilize some examples beginning with a question: What if a computer system needed to transact business with some other computer system in the network cloud with no prior knowledge of the other system or prearrangement for how to communicate? For example, suppose we offered insurance brokerage services to automobile owners and wanted to get them the best price for coverage. We would need to get quotes for them from all the available independent insurance underwriters to compare. As you might imagine, we need to first find their web services, then we need to determine how to interact with them, and then lastly determine how to secure the transmission.

One of the ways to address these issues is to use the eXtensible Markup Language (XML) along with the Web Services Description Language (WSDL), which is an XML format for describing web services as a set of endpoints for messages. Thus, WSDL is basically a markup language that describes the network protocols for the services and the ways to exchange messages. Specifically, it defines how client applications locate and communicate with a service provider. Trading partners or company lines of business that operate using different data formats can exchange WSDL directly, even though sometimes it makes more sense to use a registry. A registry is a mechanism that allows for advertising services to client applications. Client processes can simply look up the services from the registry and determine at runtime how to find, connect (bind), and exchange information with the services. The Simple Object Access Protocol (SOAP) is a messaging protocol used for exchanging structured information, typically carried over HTTP or message queues. A more popular approach has become what are known as RESTful applications. Representational state transfer (REST) is a set of software and application programming interface (API) conventions that provide interoperability between computer systems and mobile devices. REST allows requesting systems to access and manipulate web resources in a stateless manner.
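A RESTful interaction is essentially a stateless HTTP request against a resource URL. The Python sketch below, using only the standard library and a hypothetical insurance-quote endpoint (the host will not actually resolve), shows the general shape of such a call.

    import json
    import urllib.request

    # Hypothetical RESTful endpoint for the insurance-quote example above.
    url = "https://api.example-insurer.test/quotes?vehicle=sedan&driver_age=30"

    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            quote = json.loads(response.read().decode("utf-8"))
            print("premium:", quote.get("premium"))
    except OSError as exc:          # the made-up host will not resolve
        print("request failed:", exc)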

6.1.5 Programming Languages and Resource Files

High-level languages are usually divided between interpreted and compiled languages. Compiled languages include COBOL, Pascal, “C,” and C++. Interpreted languages include Python and JavaScript (and variations such as Angular.js), and there are hybrid languages such as Java and Visual Basic.NET. All of these are considered “high-level” languages because their program instructions are in the human-readable form of source code. Source code must be transformed into an executable that the computer OS can understand. For compiled languages, the compiler does this transformation, along with its linker/loader, which pulls in libraries of other code functions or objects. The compiler’s output is translated yet again by an assembler into the lowest layer of programming logic, which consists of groupings of 0s and 1s called machine language. This becomes the executable. The executable machine code can be loaded into memory and run when called upon by the CPU on behalf of another program or as initiated by a user.

As you might imagine, then, there are differences between compiled and interpreted programs in how this transformation is done. Interpreted code is parsed line by line and executed at the same time by an “interpreter” program; scripts such as JavaScript and all its variations, for example, are interpreted by a browser. Hybrids like Java use both a compiler and an interpreter: the compiler transforms the source code into intermediate (byte) code, which is then interpreted as it runs. This construction allows for language portability across different OSs, and it executes faster than purely interpreted code. Resource files include JavaScript Object Notation (JSON) and YAML (originally “Yet Another Markup Language,” now “YAML Ain’t Markup Language”). JSON is a text file with a defined syntax that is used for defining, storing, and transporting data among applications and automation tools. YAML is also a text file with a defined syntax, is commonly used for defining configurations for applications and automation tools, and is a superset of JSON. There are many others, but the common attribute is that they allow people to define and declare resources in human-readable text files that can be operated upon by some other application.
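As a quick illustration, the Python sketch below writes and then reads back a small, made-up JSON resource file of the kind an application or automation tool might consume; handling YAML would look similar but requires a third-party library such as PyYAML.

    import json

    # A made-up configuration resource, declared as human-readable text.
    config = {
        "service": "mybooks-web",
        "replicas": 3,
        "tls": {"enabled": True, "min_version": "1.2"},
    }

    with open("service-config.json", "w") as f:
        json.dump(config, f, indent=2)        # store the resource file

    with open("service-config.json") as f:
        loaded = json.load(f)                 # another tool can now consume it

    print(loaded["tls"]["enabled"])           # True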

6.1.6 RDF and Ontology Markup

The W3C standards body embarked on a revolutionary way to reorganize information on the web and coined the term “Semantic Web.” Semantics is a name given to a group of technologies that evolved from XML to provide enriched and better-contextualized information, thus enhancing the human ability to make meaning out of the information. As such, a type of markup called the Resource Description Framework (RDF) was developed to provide more relational intelligence (semantics) in web systems. In other words, RDF is based on hierarchical XML, but it is an attempt to make better use of metadata (data about data) by extending the markup to form relationships among documents. It is, if you will, a relational form of XML of sorts.

For example, using the technology of search engines, if we were going to write a research paper, we might first do a search using keywords on the topic. A typical search engine would sift through metadata looking for keywords or combinations of keywords, cross-linkages, and other cues that might help match, and then we would receive back from the search engine lists of links, many of which might not be relevant. RDF, on the other hand, establishes internally defined relationships among documents via embedded Uniform Resource Identifiers (URIs), and the relationships among documents are expressed in triples: subject, predicate, and object. With RDF statements, I might make assertions in three separate but interconnected, linked RDF documents (a short code sketch of these statements follows the list) such that:

  1. Michael is a university professor.

    1. A university professor conducts research and teaches.

    2. A university professor provides academic services to the community.

  2. Michael researches and teaches information security for managers.

    1. Teaching occurs both on campus and online.

    2. Teaching includes cybersecurity training, laboratories, and simulations.

      1. Training is the process of inculcating important knowledge.

      2. Laboratories are activities to practice with learned applications.

      3. Simulations are types of online games that have a learning objective.

  3. Michael has an office in the Harrington Building.

    1. The Harrington Building is on the campus of Texas A&M University.

    2. Texas A&M University is located in College Station, Texas.
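To see how such statements become machine-usable, we can represent them as subject-predicate-object triples. The Python sketch below does this with plain tuples drawn from the statements above; a real system would use RDF markup and URIs rather than bare strings.

    # Subject-predicate-object triples drawn from the statements above.
    triples = [
        ("Michael", "is_a", "university professor"),
        ("university professor", "conducts", "research"),
        ("university professor", "teaches", "information security"),
        ("Michael", "has_office_in", "Harrington Building"),
        ("Harrington Building", "is_on_campus_of", "Texas A&M University"),
        ("Texas A&M University", "is_located_in", "College Station, Texas"),
    ]

    def objects_of(subject, predicate):
        """Return all objects linked to a subject by a given predicate."""
        return [o for s, p, o in triples if s == subject and p == predicate]

    # Follow the links: where does Michael's office ultimately sit?
    building = objects_of("Michael", "has_office_in")[0]
    campus = objects_of(building, "is_on_campus_of")[0]
    print(building, "->", campus, "->", objects_of(campus, "is_located_in")[0])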

Beyond the relational aspects of RDF markup, ontologies have evolved to organize bodies of related information and provide semantic rules with context, as illustrated in the differences between these two sentences: Wave to the crowd versus Let’s catch the next wave. While ontologies and ontology markup are beyond the scope of this text because they require an extensive understanding of programming concepts, we will briefly mention them here because they are important for technology managers to consider in terms of security as this technology evolves. An ontology in the context of our text is a controlled vocabulary in a specific knowledge domain that provides structure for representing concepts, properties, and allowable values for the concepts. As indicated, ontologies are created using a markup language, and documents are linked together with URIs. URIs resemble URLs in that they are used by browsers to find webpages, but they differ in some subtle ways—in particular, URIs extend beyond webpages to include other media.

Because ontology markup builds on RDF, ontology markup languages include all the RDF characteristics. Just as with RDF, the predicate portion of the ontology definition is a property type for a resource, such as an attribute, relationship, or characteristic, and the object is the value of the resource property type for the specific subject. While RDF enables URI linkages based on a relational structure, the Web Ontology Language (OWL) and the DARPA Agent Markup Language with the Ontology Inference Layer (DAML+OIL) use RDF to form a more object-oriented markup used in organizing related bodies of information. Ontology markup therefore establishes rules and enables inheritance features in which the constructs can form superclass–subclass relationships. For example, a disjoint relationship could be expressed such that A is a B and C is a B, but A is not C; for instance, a dog is an animal and a cat is an animal, but a dog is not a cat. In the DAML+OIL markup that follows, we could assert an expression that a class Female is a subclass of Animal, but a Female is not a Male even though Male is also a subclass of Animal:

    <daml:Class rdf:ID="Female">
      <rdfs:subClassOf rdf:resource="#Animal"/>
      <daml:disjointWith rdf:resource="#Male"/>
    </daml:Class>

6.1.7 Active Semantic Systems

Without some way to gather and utilize the information stored in ontologies, they would be little more than passive data warehouses, or more specifically, data marts. However, even though data warehouses consist of largely undifferentiated and nonnormalized snapshots of data, statistical programs can at least mine patterns from them in a relatively efficient manner for making future predictions. Ontologies would be less efficient because they consist of text documents that need to be parsed before mining could take place, or before an online analytical process (OLAP) could produce meaningful multidimensional views of the related data.

The good news is that technologies to deal with this problem have recently emerged, and others are on the cusp. For more intelligent systems, we need a more active type of query device, for example, a bot or crawler that can traverse URIs and make inferences about what it “learns.” The most advanced of these are called goal-directed agents. Agents such as Aglets from IBM and those developed from the open-source Cougaar framework range widely in terms of their capabilities.

Simple utility agents (or bots) have little more capability than a web search engine, and chatbots are only slightly more sophisticated. A goal-directed agent, however, can collect information and perform evaluations using machine learning. Such agents are able to make some minor inferences, determine deviations between a current state and an end-state (goal), and make requests of the systems the agent operates upon. Consequently, with the advancement of semantic technologies, there is the classic trade-off between functionality and security. To mitigate this, agents generally work in a sandbox or a self-contained area—such as a given company’s ontology—and employ a variety of security techniques such as authentication. Machine learning and artificial intelligence are making use of agents and agent frameworks, which we will discuss later.
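As a highly simplified sketch (the states, goal, and action are all made up), the short Python loop below captures the essence of a goal-directed agent: compare the current state to the goal, measure the deviation, and issue requests until the gap is closed.

    # A toy goal-directed agent; everything here is illustrative only.
    goal_state = {"patched_hosts": 10}
    current_state = {"patched_hosts": 6}

    def deviation(current, goal):
        """How far the current state is from the end-state (goal)."""
        return goal["patched_hosts"] - current["patched_hosts"]

    def act(current):
        """Issue a request to the system the agent operates upon (simulated)."""
        current["patched_hosts"] += 1     # e.g., request that one more host be patched

    while deviation(current_state, goal_state) > 0:
        act(current_state)
        print("progress:", current_state)

    print("goal reached")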

6.1.8 Agent Frameworks and Semantic Fusion

Agent frameworks are part of a class of systems that perform “semantic fusion,” which provides a different way to advertise and discover services than has existed in systems to date. Semantic fusion and agent frameworks allow users to specify parameters and write scripts to surf through the vast set of URI linkages for relevant information based on specific contexts within an ontology, which is typically a vast reservoir of information. Using semantic persistence engines, ontologies can even be stored as subject-predicate-object triples in semantic databases, or in a triple-store, as previously mentioned. This architecture allows machine learning, reasoning, and other analytics to be performed rapidly, even on the fly.

In contemporary systems, information is typically drawn out of an environment and stored away in a data warehouse, where it is later examined for patterns by using various analytics; however, much of the important information may have changed in the dynamic environment since the time the data were extracted into the closed system. This closed-system, static model of pattern discovery is inherently limited.6 Moreover, with data warehousing analytics, the user must provide the problem context. Using the web as an analogy, the user must “drive” the search for information with the assistance of a technology such as a crawler or bot. This has widely recognized limitations.

Specifically, the web is filled with a sea of electronic texts, videos, and images. When we look for something of interest, unless someone provides us with a Universal Resource Locator (URL) link where we can find the relevant material, we must resort to a search engine that gathers up links for documents that are possibly related to our topic. We then begin a hunt from the search results, sifting through the links looking for those that might match our interests. When we find a page that seems relevant at first and begin reading through the material, we may discover that it is not what we had in mind. This is a larger-scale version of the same problem as when you want to locate a document on your computer and you have to use search tools such as grep/find or search. Wouldn’t it be nice if there were a different document indexing and organization method that would let you find what you are looking for faster and in a more automated way?

With semantic fusion, advertisement and discovery of ontology models are done using agents. Agents are similar to web search engine crawlers or bots, but they have greater inferential capabilities; for example, they can evaluate information as they retrieve it. There are many types of agents, depending on the roles they fulfill. Middle agents act as intermediaries or brokers among ontologies; they support the flow of information across systems by locating and connecting information providers with information requesters, and they assist in the discovery of ontology models and services based upon a given description. A number of different types of middle agents are useful in the development of complex distributed multiagent systems.7 Other types of agents include matchmakers, which do not participate in the agent-to-agent communication process; rather, they match service requests with advertisements of services or information and return matches to the requesters. Thus, matchmaker agents (also called yellow page agents) facilitate service provisioning. There are also blackboard agents that collect requests, and broker agents that coordinate both the provider and consumer processes.

Therefore, intelligent agents have a capability that enables software to “understand” the contents of webpages, and they provide an environment in which software agents can roam from page to page and carry out sophisticated tasks on behalf of their users, including drawing inferences and making requests. For example, with this technology, we might advertise through a website that We-Provide-Air-Transportation. Agents would be able to meander from airline website to website, searching for those semantic relationships and performing tasks such as telling an airline website that “Mike-Wants-to-Make-a-Reservation” and then providing the “Amount-Mike-Will-Pay.”
