AWS resources

Many Hadoop technologies can be deployed on AWS as part of a self-managed cluster. But just as Amazon offers Elastic MapReduce, which provides Hadoop as a managed service, it has a few other services that are worth mentioning.

HBase on EMR

This isn't really a distinct service per se, but just as EMR has native support for Hive and Pig, it also now offers direct support for HBase clusters. This is a relatively new capability, and it will be interesting to see how well it works in practice; HBase has historically been quite sensitive to the quality of the network and system load.

SimpleDB

Amazon SimpleDB (http://aws.amazon.com/simpledb) is a service offering an HBase-like data model. It isn't actually implemented atop Hadoop, but we'll mention it and the following service because both provide hosted alternatives worth considering if an HBase-like data model is of interest. The service has been around for several years and is very mature, with well-understood use cases.

SimpleDB does have some limitations, particularly on table size and the need to manually partition large datasets, but if you have a need for an HBase-type store at smaller volumes, it may be a good fit. It's also easy to set up and can be a nice way of having a go at the column-based data model.
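The manual partitioning mentioned above can be sketched as follows. This is only a minimal illustration, assuming the dataset is spread across several SimpleDB domains (the service's name for a table) by hashing each item name. The function name, the `events` base name, and the domain count are hypothetical and not taken from any AWS library.

```python
import hashlib


def domain_for_item(item_name: str, base: str = "events", num_domains: int = 4) -> str:
    """Pick the domain that should hold an item (hypothetical helper).

    The item name is hashed so that the same item always maps to the
    same domain and items spread roughly evenly across the domains.
    """
    if num_domains < 1:
        raise ValueError("num_domains must be at least 1")
    # md5 is used only as a stable hash here, not for security.
    digest = hashlib.md5(item_name.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_domains
    return f"{base}_{shard}"


if __name__ == "__main__":
    for name in ("user-1", "user-2", "user-3"):
        print(name, "->", domain_for_item(name))
```

The cost of this approach is that the application owns the routing: a lookup by item name goes to a single domain, but a query across the whole dataset has to be issued against every domain and the results merged by the client.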

DynamoDB

A more recent service from AWS is DynamoDB, available at http://aws.amazon.com/dynamodb. Though its data model is again very similar to that of SimpleDB and HBase, it is aimed at a very different type of application. Where SimpleDB has quite a rich search API but is very limited in terms of size, DynamoDB provides a more constrained API but with a service guarantee of near-unlimited scalability.

The DynamoDB pricing model is particularly interesting; instead of paying for a certain number of servers hosting the service, you allocate a certain read/write capacity and DynamoDB manages the resources required to meet this provisioned capacity. This is a purer service model, in which the mechanism for delivering the desired performance is kept completely opaque to the service user. Look at DynamoDB if you need a much larger data store than SimpleDB can offer, but do consider the pricing model carefully, as provisioning too much capacity can become very expensive very quickly.
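To make the provisioned-capacity idea concrete, the sketch below builds the parameters for a DynamoDB CreateTable request, where read and write capacity are stated explicitly and no servers are mentioned at all. It assumes a table with a single string hash key. The helper name, table name, and capacity figures are hypothetical, and the commented-out call assumes the boto3 client library, which the text itself does not discuss.

```python
def provisioned_table_request(table_name: str, hash_key: str,
                              read_units: int, write_units: int) -> dict:
    """Build CreateTable parameters with explicit provisioned capacity.

    Hypothetical helper: it only assembles the request dictionary and
    does not contact AWS.
    """
    if read_units < 1 or write_units < 1:
        raise ValueError("capacity units must be at least 1")
    return {
        "TableName": table_name,
        # A single string attribute used as the hash (partition) key.
        "AttributeDefinitions": [
            {"AttributeName": hash_key, "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": hash_key, "KeyType": "HASH"},
        ],
        # This is what you pay for: capacity, not servers.
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


if __name__ == "__main__":
    request = provisioned_table_request("orders", "order_id", 10, 5)
    print(request["ProvisionedThroughput"])
    # Assumption: with boto3 installed and AWS credentials configured,
    # the request could be submitted like this:
    #   import boto3
    #   boto3.client("dynamodb").create_table(**request)
```

Because the capacity figures are the billing lever, it is worth starting with modest values and raising them only when measured load requires it.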
