In this chapter you will meet Swift, learning about some of its key features and benefits. This conceptual overview will prepare you for later chapters that go into much greater detail, including Chapter 9, where we cover installing a Swift cluster from source. You will also meet SwiftStack the company, which provides software and support to install and operate Swift. You won’t see much of SwiftStack again until Chapter 10 when we cover installing SwiftStack software.
Swift is a multi-tenant, highly scalable, and durable object storage system
designed to store large amounts of unstructured data at low cost. Businesses of all sizes, service providers, and research organizations worldwide use it, typically to store data such
as documents, web content, backups, images, and virtual machine snapshots.
Originally developed as the engine behind Rackspace Cloud Files, it was
open-sourced in 2010 as part of the OpenStack project. With hundreds of companies and thousands of developers now participating in the OpenStack project, the usage of Swift is increasing rapidly.
Swift is not a traditional filesystem or a raw block device.
Instead, it lets you store, retrieve, and delete objects along with their
associated metadata in containers (“buckets” in Amazon S3 terminology) via
a RESTful HTTP API. Developers can either write directly to the Swift API or
use one of the many client libraries that exist for popular programming
languages, such as Java, Python, Ruby, and C#.
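To make the API model concrete, here is a minimal Python sketch of the three basic object operations. It only builds the HTTP requests and never sends them. The endpoint, the account name (AUTH_demo), the token, and the container and object names are hypothetical placeholders, and object_request is a helper invented for this illustration rather than part of any Swift client library.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical values for illustration only.
STORAGE_URL = "https://swift.example.com/v1/AUTH_demo"
TOKEN = "example-token"


def object_request(method, container, obj, data=None, metadata=None):
    """Build (but do not send) an HTTP request for a single object."""
    # Objects are addressed as <storage URL>/<container>/<object>.
    url = f"{STORAGE_URL}/{quote(container, safe='')}/{quote(obj)}"
    headers = {"X-Auth-Token": TOKEN}
    # Custom object metadata travels in X-Object-Meta-* headers.
    for key, value in (metadata or {}).items():
        headers[f"X-Object-Meta-{key}"] = value
    return Request(url, data=data, headers=headers, method=method)


# Store, retrieve, and delete the same object.
put = object_request("PUT", "photos", "cats/tabby.jpg",
                     data=b"...", metadata={"Color": "orange"})
get = object_request("GET", "photos", "cats/tabby.jpg")
delete = object_request("DELETE", "photos", "cats/tabby.jpg")

for req in (put, get, delete):
    print(req.get_method(), req.full_url)
```

In practice, most applications would use an existing client library instead of assembling requests by hand, but the sketch shows how little is involved: an HTTP verb, a URL that names the account, container, and object, and a handful of headers.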
Swift’s key characteristics include:
-
Scalability
-
Swift is designed to scale linearly based on how much data needs to be stored
and how many users need to be served. This means that it can scale from a few
nodes with a handful of drives to thousands of machines with dozens, even
hundreds, of petabytes of storage. As the system grows in usage and the number
of requests increases, performance doesn’t degrade, in part because Swift has no single point of failure. To scale up, the system
grows where needed—by adding storage nodes to increase storage capacity,
adding proxy nodes as requests increase, and growing network capacity where
bottlenecks are detected.
-
Durability
-
Swift’s innovative distributed architecture means that it provides extremely durable storage. The essence of durability is that stored objects will always be available and have data integrity. To ensure an object is persistently available, Swift copies it and distributes the copies across the cluster. Auditing processes run to verify that stored data is still good. Replicators run to ensure that the correct number of copies is in the cluster. In the event that a device fails, missing data copies are replicated and placed throughout the cluster to ensure that durability levels are maintained.
-
Multi-regional capability
-
Swift can distribute data across multiple data centers, which may have
high latency between them. Distribution can serve several purposes. One is to provide high availability of data by allowing it to be accessed from each region. Another is to designate one region as a disaster recovery site.
Swift does this by allowing operators to define regions and zones within a cluster. Regions generally specify geographic boundaries, such as data centers in different cities. Zones are portions of regions that define points of failure for groups of machines, such as a rack where all the nodes are on one power source going to the same switch. The use of regions and zones ensures that Swift places copies across the cluster in a way that allows for failures. It enables a cluster to survive even if a zone is unavailable. This provides additional guarantees of durability and availability of data.
-
High concurrency
-
Swift is architected to distribute requests across multiple servers. By using a
shared-nothing approach, Swift can take advantage of all the available server
capacity to handle many requests simultaneously. This increases the system’s
concurrency and total available throughput, a great advantage for those who need to satisfy the storage needs
of large-scale web workloads.
-
Flexible storage
-
Swift offers great flexibility in data architecture and hardware, allowing operators to tailor their storage to meet the specific needs of their users. In addition to the ability to mix and match commodity hardware, Swift has storage policies that allow operators to use hardware in a way that best handles the constraints of various situations. For example, need higher performance for some data? Create a storage policy that only uses the SSDs in the cluster. Need data to be available across the globe? Create a storage policy that encompasses data centers across the world. Need data to be in a particular country? Create a policy that will place data only in that region.
Swift’s underlying storage methods are also very flexible. Its pluggable
architecture allows the incorporation of new storage systems. Typically, direct-attached
storage devices are used to build a cluster, but emerging technology (such as
key/value Ethernet drives from Seagate) and other open source and commercial
storage systems that have adaptors can become storage targets in a Swift
cluster.
-
Open source
-
Swift is open-sourced under the Apache 2 license as part of the OpenStack
project. With more than 150 participating developers as of early 2014, the
Swift community is growing every quarter. As with other open source
projects, source code can be reviewed by many more developers than is the case with proprietary software. This means potential bugs tend to be more visible and are corrected more rapidly. In the long term, “open” generally wins.
-
Large ecosystem
-
The Swift ecosystem is powered by open source code, but unlike some open source projects, it is a large ecosystem with multiple companies that test and develop Swift at scale. Having so many vendors participating greatly reduces the risk of vendor lock-in for users. The large number of organizations and developers participating in the OpenStack project means that the development velocity and the breadth of tools, utilities, and services for Swift are great and will only increase over time. Many tools, libraries, clients, and applications already support Swift’s API and many more are in the works. With such a vibrant and engaged ecosystem, it is easy to obtain tools, best practices, and deployment know-how from other organizations
and community members who are using Swift.
-
Runs on commodity hardware
-
Swift is designed from the ground up to handle failures, so reliability of individual components is less critical. Swift installations
can run robustly on commodity hardware, and even on regular desktop drives rather than more expensive enterprise drives. Companies can choose hardware quality
and configuration to suit the tolerances of the application and
their ability to replace failed equipment.
Swift’s ability to use commodity hardware means there is no lock-in with
any particular hardware vendor. As a result, deployments can continually take
advantage of decreasing hardware prices and increasing drive capacity. It also allows data to be moved from one medium to another to address constraints such as I/O rate or latency.
-
Developer-friendliness
-
Developers benefit from the rich and growing body of Swift tools and libraries. Beyond the core functionality to store and serve data durably at large scale, Swift has many built-in features that make life easier for application developers and users. Features that developers might find useful include:
-
Static website hosting
-
Users can host static websites, including client-side JavaScript
and CSS, directly from Swift. Swift also supports custom error pages and auto-generated listings.
-
Automatically expiring objects
-
Objects can be given an expiration time after
which they are no longer available and will be deleted. This is very useful
for preventing stale data from circulating and for complying with data
retention policies.
-
Time-limited URLs
-
URLs can be generated that are valid for only a limited
period of time. These URLs can prevent hotlinking or enable temporary write
permissions without needing to hand out full credentials to an untrusted
party.
-
Quotas
-
Storage limits can be set on containers and accounts.
-
Upload directly from HTML forms
-
Users can generate web forms that upload data
directly into Swift so that it doesn’t have to be proxied through another
server.
-
Versioned writes
-
Users can write a new version of an object to Swift while keeping all
older versions.
-
Support for chunked transfer encoding
-
Users can upload data to Swift without
knowing ahead of time how large the object is.
-
Multirange reads
-
Users can read one or more sections of an object with a single
read request.
-
Access control lists
-
Users can configure access to their data to give or
deny others the ability to read or write the data.
-
Programmatic access to data locality
-
Deployers can integrate Swift with
systems such as Hadoop and take advantage of locality information to lower
network requirements when processing data.
-
Customizability
-
Middleware can be developed and run directly on the storage
system.
For further details on these features, see Part II.
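As an illustration of one of these features, the following Python sketch shows how a time-limited URL can be signed in the style of Swift's TempURL middleware: an HMAC is computed over the HTTP method, the expiry timestamp, and the object path, and is then appended to the URL together with the expiry. The secret key, the object path, and both helper names are hypothetical. SHA-1 is an assumption here, since the digests a cluster accepts depend on how it is configured. The is_valid function is a simplified stand-in for the server-side check, not Swift's actual code.

```python
import hmac
from hashlib import sha1
from urllib.parse import parse_qs, urlsplit


def temp_url(method, path, key, expires):
    """Return the object path with a signature and expiry appended."""
    message = f"{method}\n{expires}\n{path}".encode()
    signature = hmac.new(key, message, sha1).hexdigest()
    return f"{path}?temp_url_sig={signature}&temp_url_expires={expires}"


def is_valid(method, url, key, now):
    """Simplified check: signature must match and the URL must not be expired."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    expires = int(query["temp_url_expires"][0])
    message = f"{method}\n{expires}\n{parts.path}".encode()
    expected = hmac.new(key, message, sha1).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return now <= expires and hmac.compare_digest(expected, query["temp_url_sig"][0])


# Hypothetical key, object path, and timestamps (seconds since the epoch).
KEY = b"example-secret-key"
PATH = "/v1/AUTH_demo/photos/tabby.jpg"
issued_at = 1_400_000_000
url = temp_url("GET", PATH, KEY, expires=issued_at + 3600)

print(url)
print(is_valid("GET", url, KEY, now=issued_at + 60))     # within the hour
print(is_valid("GET", url, KEY, now=issued_at + 7200))   # after expiry
print(is_valid("PUT", url, KEY, now=issued_at + 60))     # wrong method
```

Because the method is part of the signed message, a URL issued for reading cannot be reused for writing, which is what makes it reasonable to hand such a URL to an untrusted party.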
-
Operator-friendliness
-
Swift is appealing to IT operators for a number of reasons. It
lets you use low-cost, industry-standard servers and disks. With Swift, you can
manage more data and use cases with ease. Because an API is used to store and serve data, you do not spend time managing volumes for individual projects. Enabling new applications is easy and quick. Finally, Swift’s durable architecture with no single point of failure lets you avoid catastrophic failure and rest a bit easier. The chapters in this book on deploying and operating Swift clusters will provide you with an overview of how easy it really is.
-
Upcoming features
-
The Swift developer community is working on many additional features that will be added to upcoming releases of Swift, such as storage policies and support for erasure coding. Storage policies will allow deployers and users to choose what hardware data is on, how the data is stored across that hardware, and in which region the data resides. The erasure coding support in Swift will enable deployers to store data with erasure coding instead of (or in addition to) Swift’s standard replica model. The design goal is to be able to have erasure-coded storage plus replicas coexisting in a single Swift cluster. This will allow a choice in how to store data and will allow applications to make the right trade-offs based on their use.
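The trade-off between replicas and erasure coding can be sketched with some simple arithmetic. With full copies, the raw capacity needed is the amount of user data multiplied by the replica count. With erasure coding that splits an object into k data fragments and adds m parity fragments, the multiplier is (k + m) / k. The snippet below compares the two. The 100 TB figure and the 10 + 4 fragment scheme are hypothetical values chosen for illustration, not Swift defaults, and the function names are invented for this example.

```python
def replication_overhead(replicas):
    """Raw bytes stored per byte of user data when keeping full copies."""
    return float(replicas)


def erasure_coding_overhead(data_fragments, parity_fragments):
    """Raw bytes stored per byte of user data with k data + m parity fragments."""
    return (data_fragments + parity_fragments) / data_fragments


usable_tb = 100  # hypothetical amount of user data, in terabytes

schemes = {
    "3 replicas": replication_overhead(3),
    "10 data + 4 parity": erasure_coding_overhead(10, 4),
}

for name, overhead in schemes.items():
    raw_tb = usable_tb * overhead
    print(f"{name}: {overhead:.1f}x overhead, {raw_tb:.0f} TB raw")
```

The lower overhead of erasure coding is not free: reading or rebuilding an object means gathering and decoding several fragments, which costs more CPU and network traffic than fetching a single replica. That is why being able to choose per workload, as described above, matters.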
SwiftStack is a company that provides highly available and scalable object storage software based on OpenStack Swift and is one of your alternatives to installing, integrating, and operating Swift directly from source.
Several of the core contributors who are part of the approval process for code contributions to the Swift repositories work at SwiftStack. This, in combination with the company’s real-world experience in deploying Swift, allows SwiftStack to contribute heavily upstream and often lead many of the major initiatives for Swift in collaboration with the rest of the Swift developer community.
The SwiftStack software package includes an unmodified, 100% open source version of OpenStack Swift and adds software components for deployment, integration (with authentication and billing systems), monitoring, and management of Swift clusters. SwiftStack also provides training, consulting, and 24×7 support for SwiftStack software, including Swift. The SwiftStack product is composed of two parts:
-
SwiftStack Node
-
This runs OpenStack Swift. SwiftStack Node software automates the installation of the latest stable version of Swift via a package-based installer. At this writing, there are installers for CentOS/Red Hat Server 6.3, 6.4, and
6.5 (64-bit) and for Ubuntu 12.04 LTS Precise Server (64-bit). Additionally, the
SwiftStack Node software provides preconfigured runtime elements, additional integrations, and the access methods described below.
-
SwiftStack Controller
-
The SwiftStack Controller is an out-of-band management system that manages one or more Swift clusters and automates the deployment, integration, and ongoing operations of SwiftStack Nodes. As such, the SwiftStack Controller decouples the control and management of the storage nodes from the physical hardware. The actual storage services run on the servers where Swift is installed, while the deployment, management, and monitoring are conducted out-of-band by the SwiftStack Controller.
The SwiftStack product offers several benefits to operators:
-
Automated storage provisioning
-
Devices are automatically identified by agents running on the SwiftStack Node,
and an operator places those nodes in a region or zone. The SwiftStack
Controller keeps track of all the devices and provides a consistent interface
for operators to add and remove capacity.
-
Automated failure management
-
Agents running on the SwiftStack Node detect drive failures and can alert an
operator. Although Swift will automatically route around a failure,
this feature lets operators
deal with failures in a consistent way. Extensive dashboards on
cluster statistics also let operators see how
the cluster is operating.
-
Lifecycle management
-
A SwiftStack Controller can be upgraded separately from Swift. A Controller
can perform a rolling, no-downtime upgrade of Swift as new versions are
released. It can be configured with a warm standby if desired. A
Controller can also fold in old and new hardware in the same cluster. Agents
identify the available capacity and will gradually rebalance data automatically.
-
User and capacity management
-
Several authentication modules are supported by SwiftStack, including on-cluster
accounts. You can create storage groups (which can be provisioned
with an API), and integrate with the Lightweight Directory Access Protocol (LDAP) and OpenStack
Keystone. Additionally, SwiftStack shows capacity and trending. A per-account utilization API provides
the ability to do chargeback.
-
Additional access methods
-
SwiftStack includes a web UI for users that can be custom-tailored with
additional CSS to suit your organization. SwiftStack also includes the ability
to provide filesystem access via the Common Internet File System (CIFS) and Network File System (NFS) protocols.
Other SwiftStack features include:
-
Built-in load balancer
-
SSL termination for HTTPS services
-
Disk management tools
-
Swift ring building and ring deployment
-
Automated gradual capacity adjustments
-
Health-check and alerting agents
-
Node/drive replacement tools
-
System monitoring and stats collection
-
Capacity monitoring and trending
-
Web client / user portal
Additional information on SwiftStack is available at
the SwiftStack site.
Now that you have some background, in Chapter 3 we’ll start digging more deeply into
Swift’s architecture and how it works.