Chapter 2. Meet Swift

Joe Arnold

In this chapter you will meet Swift, learning about some of its key features and benefits. This conceptual overview will prepare you for later chapters that go into much greater detail, including Chapter 9, where we cover installing a Swift cluster from source. You will also meet SwiftStack the company, which provides software and support to install and operate Swift. You won’t see much of SwiftStack again until Chapter 10 when we cover installing SwiftStack software.

Swift is a multi-tenant, highly scalable, and durable object storage system designed to store large amounts of unstructured data at low cost. Swift is used by businesses of all sizes, service providers, and research organizations worldwide. It is typically used to store unstructured data such as documents, web content, backups, images, and virtual machine snapshots. Originally developed as the engine behind Rackspace Cloud Files, it was open-sourced in 2010 as part of the OpenStack project. With hundreds of companies and thousands of developers now participating in the OpenStack project, the usage of Swift is increasing rapidly.

Swift is not a traditional filesystem or a raw block device. Instead, it lets you store, retrieve, and delete objects along with their associated metadata in containers (“buckets” in Amazon S3 terminology) via a RESTful HTTP API. Developers can either write directly to the Swift API or use one of the many client libraries that exist for popular programming languages, such as Java, Python, Ruby, and C#.
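To make the store, retrieve, and delete operations concrete, here is a minimal sketch of how the three requests might be built with Python's standard library. The endpoint, account name, token, container, and object names are all placeholders we made up for illustration; a real cluster issues the storage URL and token when you authenticate. The requests are only constructed, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint and token; a real cluster supplies both at login.
STORAGE_URL = "https://swift.example.com/v1/AUTH_demo"
TOKEN = "example-token"


def object_request(method, container, name, body=None, metadata=None):
    """Build (but do not send) an HTTP request for a single object."""
    url = f"{STORAGE_URL}/{quote(container)}/{quote(name)}"
    headers = {"X-Auth-Token": TOKEN}
    # Custom object metadata travels as X-Object-Meta-* headers.
    for key, value in (metadata or {}).items():
        headers[f"X-Object-Meta-{key}"] = value
    return Request(url, data=body, headers=headers, method=method)


# Store, retrieve, and delete the same (made-up) object.
put = object_request("PUT", "photos", "cats/tabby.jpg",
                     b"fake image bytes", {"Color": "orange"})
get = object_request("GET", "photos", "cats/tabby.jpg")
delete = object_request("DELETE", "photos", "cats/tabby.jpg")
```

Passing any of these objects to `urllib.request.urlopen` would send it. The point of the sketch is that the object's address is simply account, container, and object name joined into a URL, and the HTTP verb selects the operation.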

Swift’s key characteristics include:

Scalability
Swift is designed to scale linearly based on how much data needs to be stored and how many users need to be served. This means that it can scale from a few nodes with a handful of drives to thousands of machines with dozens, even hundreds, of petabytes of storage. As the system grows in usage and the number of requests increases, performance doesn’t degrade, in part because Swift is designed to be scalable with no single point of failure. To scale up, the system grows where needed: by adding storage nodes to increase storage capacity, adding proxy nodes as requests increase, and growing network capacity where bottlenecks are detected.
Durability
Swift’s innovative distributed architecture means that it provides extremely durable storage. The essence of durability is that stored objects remain available and keep their data integrity. To ensure an object is persistently available, Swift copies it and distributes the copies across the cluster. Auditing processes run, verifying that data is still good. Replicators run to ensure that the correct number of copies is in the cluster. In the event that a device fails, missing data copies are replicated and placed throughout the cluster to ensure that durability levels are maintained.
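The audit-then-replicate cycle described above can be pictured with a toy model. This is only a sketch under our own assumptions, not Swift's implementation: the device names, the replica count of three, and the use of SHA-256 as the checksum are all chosen for illustration.

```python
import hashlib

REPLICAS = 3  # assumed replica count for this sketch


def checksum(data):
    """Digest used to detect corruption (SHA-256 chosen for the sketch)."""
    return hashlib.sha256(data).hexdigest()


def audit(copies, expected):
    """Keep only the copies whose checksum still matches the expected one."""
    return {dev: data for dev, data in copies.items()
            if checksum(data) == expected}


def replicate(copies, spare_devices):
    """Copy a good replica onto spare devices until REPLICAS copies exist."""
    if not copies:
        raise ValueError("no healthy copy left to replicate from")
    copies = dict(copies)
    good = next(iter(copies.values()))
    for dev in spare_devices:
        if len(copies) >= REPLICAS:
            break
        copies.setdefault(dev, good)
    return copies


original = b"hello swift"
expected = checksum(original)
# Hypothetical devices; the copy on "sdc" has been silently corrupted.
copies = {"sdb": original, "sdc": b"hellX swift", "sdd": original}

healthy = audit(copies, expected)               # drops the bad copy on sdc
restored = replicate(healthy, ["sde", "sdf"])   # back to three good copies
```

After the run, the corrupted copy is gone and a fresh copy sits on one spare device, which is the outcome the auditors and replicators work toward continuously in a real cluster.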
Multi-regional capability

Swift can distribute data across multiple data centers, which may have high latency between them. Distribution can be done for a number of reasons. One would be to provide high availability of data by allowing it to be accessed from each region. Another reason would be to designate one region as a disaster recovery site.

Swift does this by allowing operators to define regions and zones within a cluster. Regions generally specify geographic boundaries, such as data centers in different cities. Zones are portions of regions that define points of failure for groups of machines, such as a rack where all the nodes are on one power source going to the same switch. The use of regions and zones ensures that Swift places copies across the cluster in a way that allows for failures. It enables a cluster to survive even if a zone is unavailable. This provides additional guarantees of durability and availability of data.
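One way to picture how regions and zones guide placement is a greedy chooser that always prefers the least-used region and then the least-used zone. The sketch below is our own simplification and is not the algorithm Swift actually uses to build its rings; the device inventory is hypothetical.

```python
from collections import Counter

# Hypothetical inventory: (region, zone, device name).
DEVICES = [
    ("r1", "z1", "d1"), ("r1", "z1", "d2"),
    ("r1", "z2", "d3"),
    ("r2", "z1", "d4"), ("r2", "z2", "d5"),
]


def place_replicas(devices, replicas=3):
    """Greedily pick devices, preferring the least-used region, then zone."""
    chosen = []
    regions, zones = Counter(), Counter()
    for _ in range(replicas):
        candidates = [d for d in devices if d not in chosen]
        # Sort key: copies already in this region, then in this zone.
        best = min(candidates, key=lambda d: (regions[d[0]], zones[d[:2]]))
        chosen.append(best)
        regions[best[0]] += 1
        zones[best[:2]] += 1
    return chosen


placement = place_replicas(DEVICES)
```

With this inventory, the three copies land in three different zones spread over both regions, so losing any single zone still leaves two copies reachable.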

High concurrency
Swift is architected to distribute requests across multiple servers. By using a shared-nothing approach, Swift can take advantage of all the available server capacity to handle many requests simultaneously. This increases the system’s concurrency and total throughput available. This is a great advantage to those who need to satisfy the storage needs of large-scale web workloads.
Flexible storage

Swift offers great flexibility in data architecture and hardware, allowing operators to tailor their storage to meet the specific needs of their users. In addition to the ability to mix and match commodity hardware, Swift has storage policies that allow operators to use hardware in a way that best handles the constraints of various situations. For example, need higher performance for some data? Create a storage policy that only uses the SSDs in the cluster. Need data to be available across the globe? Create a storage policy that encompasses data centers across the world. Need data to be in a particular country? Create a policy that will place data only in that region.
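From a client's point of view, in Swift releases that include storage policies, a policy is chosen per container by sending an `X-Storage-Policy` header when the container is created; it cannot be changed afterward. The sketch below only builds the request. The endpoint, token, container names, and the policy name `ssd-only` are hypothetical, since policy names are defined by each cluster's operator.

```python
from urllib.request import Request

STORAGE_URL = "https://swift.example.com/v1/AUTH_demo"  # hypothetical


def create_container(name, policy=None, token="example-token"):
    """Build (but do not send) a container-creation request."""
    headers = {"X-Auth-Token": token}
    if policy is not None:
        # The policy is fixed at creation time for the container's lifetime.
        headers["X-Storage-Policy"] = policy
    return Request(f"{STORAGE_URL}/{name}", headers=headers, method="PUT")


fast = create_container("thumbnails", policy="ssd-only")  # made-up policy name
default = create_container("archive")  # falls back to the cluster default
```

Objects written into `thumbnails` would then follow whatever placement the operator attached to that policy, with no further changes needed in the application.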

Swift’s underlying storage methods are also very flexible. Its pluggable architecture allows the incorporation of new storage systems. Typically, direct-attached storage devices are used to build a cluster, but emerging technology (such as key/value Ethernet drives from Seagate) and other open source and commercial storage systems that have adaptors can become storage targets in a Swift cluster.

Open source
Swift is open-sourced under the Apache 2 license as part of the OpenStack project. With more than 150 participating developers as of early 2014, the Swift community is growing every quarter. As with other open source projects, source code can be reviewed by many more developers than is the case with proprietary software. This means potential bugs tend to be more visible and are more rapidly corrected than with proprietary software. In the long term, “open” generally wins.
Large ecosystem
The Swift ecosystem is powered by open source code, but unlike some open source projects, it is a large ecosystem with multiple companies that test and develop Swift at scale. Having so many vendors participating greatly reduces the risk of vendor lock-in for users. The large number of organizations and developers participating in the OpenStack project means that the development velocity and the breadth of tools, utilities, and services for Swift are great and will only increase over time. Many tools, libraries, clients, and applications already support Swift’s API and many more are in the works. With such a vibrant and engaged ecosystem, it is easy to obtain tools, best practices, and deployment know-how from other organizations and community members who are using Swift.
Runs on commodity hardware

Swift is designed from the ground up to handle failures, so reliability of individual components is less critical. Swift installations can run robustly on commodity hardware, and even on regular desktop drives rather than more expensive enterprise drives. Companies can choose hardware quality and configuration to suit the tolerances of the application and their ability to replace failed equipment.

Swift’s ability to use commodity hardware means there is no lock-in with any particular hardware vendor. As a result, deployments can continually take advantage of decreasing hardware prices and increasing drive capacity. It also allows data to be moved from one medium to another to address constraints such as IO rate or latency.

Developer-friendliness

Developers benefit from the rich and growing body of Swift tools and libraries. Beyond the core functionality to store and serve data durably at large scale, Swift has many built-in features that make it easy for application developers and users. Features that developers might find useful include:

Static website hosting
Users can host static websites, including client-side JavaScript and CSS, directly from Swift. Swift also supports custom error pages and auto-generated listings.
Automatically expiring objects
Objects can be given an expiration time after which they are no longer available and will be deleted. This is very useful for preventing stale data from circulating and for complying with data retention policies.
Time-limited URLs
URLs can be generated that are valid for only a limited period of time. These URLs can prevent hotlinking or enable temporary write permissions without needing to hand out full credentials to an untrusted party.
Quotas
Storage limits can be set on containers and accounts.
Upload directly from HTML forms
Users can generate web forms that upload data directly into Swift so that it doesn’t have to be proxied through another server.
Versioned writes
Users can write a new version of an object to Swift while keeping all older versions.
Support for chunked transfer encoding
Users can upload data to Swift without knowing ahead of time how large the object is.
Multirange reads
Users can read one or more sections of an object with a single read request.
Access control lists
Users can configure access to their data to give or deny others the ability to read or write the data.
Programmatic access to data locality
Deployers can integrate Swift with systems such as Hadoop and take advantage of locality information to lower network requirements when processing data.
Customizability
Middleware can be developed and run directly on the storage system.

For further details on these features, see Part II.
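Several of the features listed above are driven by nothing more than a request header or a signed query string. As a rough sketch, the helpers below build the header for an expiring object, the header for a multirange read, and a time-limited URL. The object path and secret key are placeholders. We assume the account has a TempURL key set (through the `X-Account-Meta-Temp-URL-Key` header) and that the cluster accepts SHA-256 signatures; which digests are accepted depends on the cluster's configuration, and older deployments used SHA-1.

```python
import hmac
from hashlib import sha256
from time import time


def expiring_headers(seconds):
    """Header asking Swift to delete an object `seconds` after the request."""
    return {"X-Delete-After": str(int(seconds))}


def range_header(ranges):
    """Range header for a multirange read, e.g. [(0, 99), (200, 299)]."""
    return {"Range": "bytes=" + ",".join(f"{a}-{b}" for a, b in ranges)}


def temp_url(path, key, method="GET", ttl=3600, now=None):
    """Sign `path` so it can be used without credentials until it expires."""
    expires = int((time() if now is None else now) + ttl)
    # The signature covers the HTTP method, expiry time, and object path.
    message = f"{method}\n{expires}\n{path}".encode()
    sig = hmac.new(key.encode(), message, sha256).hexdigest()
    return f"{path}?temp_url_sig={sig}&temp_url_expires={expires}"


# Hypothetical object path and secret key.
url = temp_url("/v1/AUTH_demo/photos/cat.jpg", "secret", ttl=600)
```

Because the HTTP method is part of the signed message, a URL signed for `GET` cannot be reused for `PUT`, which is what makes it reasonably safe to hand to an untrusted party.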
Operator-friendly
Swift is appealing to IT operators for a number of reasons. It lets you use low-cost, industry-standard servers and disks. With Swift, you can manage more data and use cases with ease. Because an API is used to store and serve data, you do not spend time managing volumes for individual projects. Enabling new applications is easy and quick. Finally, Swift’s durable architecture with no single point of failure lets you avoid catastrophic failure and rest a bit easier. The chapters in this book on deploying and operating Swift clusters will provide you with an overview of how easy it really is.
Upcoming features
The Swift developer community is working on many additional features that will be added to upcoming releases of Swift, such as storage policies and support for erasure coding. Storage policies will allow deployers and users to choose what hardware data is on, how the data is stored across that hardware, and in which region the data resides. The erasure coding support in Swift will enable deployers to store data with erasure coding instead of (or in addition to) Swift’s standard replica model. The design goal is to be able to have erasure-coded storage plus replicas coexisting in a single Swift cluster. This will allow a choice in how to store data and will allow applications to make the right trade-offs based on their use.
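The trade-off between replicas and erasure coding comes down largely to raw capacity. A small arithmetic sketch makes this visible; the 10 data plus 4 parity fragment scheme is a hypothetical example of ours, not a Swift default.

```python
def replica_overhead(replicas):
    """Raw bytes stored per byte of user data when keeping full copies."""
    return float(replicas)


def erasure_overhead(data_fragments, parity_fragments):
    """Raw bytes stored per byte of user data with erasure coding."""
    return (data_fragments + parity_fragments) / data_fragments


def raw_capacity_needed(user_tb, overhead):
    """Raw terabytes required to hold `user_tb` of user data."""
    return user_tb * overhead


three_copies = replica_overhead(3)   # 3.0
ec_10_4 = erasure_overhead(10, 4)    # 1.4
```

Under these assumptions, three replicas survive the loss of any two copies at three times the raw capacity, while a 10+4 scheme built on a typical code such as Reed-Solomon survives the loss of any four fragments at 1.4 times the raw capacity. The price is extra computation and more nodes involved in every read and write, which is why it is useful to have both options in one cluster.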

Meet SwiftStack

SwiftStack is a company that provides highly available and scalable object storage software based on OpenStack Swift and is one of your alternatives to installing, integrating, and operating Swift directly from source.

Several of the core contributors who are part of the approval process for code contributions to the Swift repositories work at SwiftStack. This, in combination with the company’s real-world experience in deploying Swift, allows SwiftStack to contribute heavily upstream and often lead many of the major initiatives for Swift in collaboration with the rest of the Swift developer community.

The SwiftStack software package includes an unmodified, 100% open source version of OpenStack Swift and adds software components for deployment, integration (with authentication and billing systems), monitoring, and management of Swift clusters. SwiftStack also provides training, consulting, and 24×7 support for SwiftStack software, including Swift. The SwiftStack product is composed of two parts:

SwiftStack Node
This runs OpenStack Swift. SwiftStack Node software automates the installation of the latest stable version of Swift via a package-based installer. At this writing, there are installers for CentOS/Red Hat Server 6.3, 6.4, and 6.5 (64-bit) and for Ubuntu 12.04 LTS Precise Server (64-bit). Additionally, the SwiftStack Node software provides preconfigured runtime elements, additional integrations, and access methods described below.
SwiftStack Controller
The SwiftStack Controller is an out-of-band management system that manages one or more Swift clusters and automates the deployment, integration, and ongoing operations of SwiftStack Nodes. As such, the SwiftStack Controller decouples the control and management of the storage nodes from the physical hardware. The actual storage services run on the servers where Swift is installed, while the deployment, management, and monitoring are conducted out-of-band by the SwiftStack Controller.

The SwiftStack product offers several benefits to operators:

Automated storage provisioning
Devices are automatically identified by agents running on the SwiftStack Node, and an operator places those nodes in a region or zone. The SwiftStack Controller keeps track of all the devices and provides a consistent interface for operators to add and remove capacity.
Automated failure management
Agents running on the SwiftStack Node detect drive failures and can alert an operator. Although Swift will automatically route around a failure, this feature lets operators deal with failures in a consistent way. Extensive dashboards on cluster statistics also let operators see how the cluster is operating.
Lifecycle management
A SwiftStack Controller can be upgraded separately from Swift. A Controller can perform a rolling, no-downtime upgrade of Swift as new versions are released. It can be configured with a warm standby if desired. A Controller can also fold in old and new hardware in the same cluster. Agents identify the available capacity and will gradually rebalance data automatically.
User and capacity management
Several authentication modules are supported by SwiftStack, including on-cluster accounts. You can create storage groups (which can be provisioned with an API), and integrate with the Lightweight Directory Access Protocol (LDAP) and OpenStack Keystone. Additionally, these show capacity and trending. A per-account utilization API provides the ability to do chargeback.
Additional access methods
SwiftStack includes a web UI for users that can be custom-tailored with additional CSS to suit your organization. SwiftStack also includes the ability to provide filesystem access via the Common Internet File System (CIFS) and Network File System (NFS) protocols.

Other SwiftStack features include:

  • Built-in load balancer
  • SSL termination for HTTPS services
  • Disk management tools
  • Swift ring building and ring deployment
  • Automated gradual capacity adjustments
  • Health-check and alerting agents
  • Node/drive replacement tools
  • System monitoring and stats collection
  • Capacity monitoring and trending
  • Web client / user portal

Additional information on SwiftStack is available at the SwiftStack site.

Now that you have some background, in Chapter 3 we’ll start digging more deeply into Swift’s architecture and how it works.
