Reducing accessibility

You can reduce costs significantly without deleting any data by making the data less readily accessible. Here are some ways to do this:

  • Compression: Compression is your friend. Compressed data retains all of its information at the (typically) slight penalty of increased access time. File formats such as Avro and Parquet, which support built-in compression, can significantly reduce storage size (and therefore costs) in Hadoop clusters and S3 buckets, while often improving performance. The performance improvements require some thoughtful design of the file layout, but that is a best practice anyway. Hadoop also supports general-purpose compression codecs such as GZIP and Snappy. Compression should be the first thing you do to reduce file size. Even better, plan for it as part of the initial storage design.
  • Changing the storage technology to lower-cost options: Keep the data, but move to lower-cost methods. There is usually a performance penalty, but this can easily be worth it if the data moved is not accessed often. This could be a change from SSD-backed storage to hard disk-backed storage. The final step could be from hard disk to tape.
  • Changing accessibility service levels: This method is more geared toward cloud storage services and is analogous to changing storage technology (although you can do that also in the cloud). For Amazon S3, this could be a change to Standard–Infrequent Access level service or to Amazon Glacier for very infrequently accessed data. S3 allows automated scheduling of when files should be moved into lower service levels based on rules, such as the age of the file.
  • Changing redundancy levels: HDFS keeps multiple copies of files for durability. The default replication factor is three, but this is configurable. You could lower the replication factor for less valuable files and save some costs, at the price of reduced durability. Amazon S3 also has a reduced redundancy option.
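
Two of the ideas above, compression and age-based service-level changes, can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended implementation. The sample log data, the `logs/` prefix, the 30-day and 90-day thresholds, the bucket name, and the helper names `gzip_roundtrip` and `build_lifecycle_rule` are all assumptions made for the example. The rule dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` call accepts, but the call itself is left in a comment so the sketch runs without AWS credentials.

```python
import gzip
import json
from typing import Tuple


def gzip_roundtrip(raw: bytes) -> Tuple[int, int, bool]:
    """Compress with GZIP and report (raw size, compressed size, lossless?)."""
    packed = gzip.compress(raw)
    return len(raw), len(packed), gzip.decompress(packed) == raw


def build_lifecycle_rule(prefix: str, ia_after_days: int, glacier_after_days: int) -> dict:
    """Build an S3 lifecycle rule that moves objects to cheaper storage classes by age."""
    return {
        "ID": f"age-out-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


# Hypothetical, highly repetitive log data; real data will usually compress less well.
sample = b"2017-01-01 INFO request served in 12 ms\n" * 10_000
raw_size, packed_size, lossless = gzip_roundtrip(sample)
print(f"raw={raw_size} bytes, gzip={packed_size} bytes, lossless={lossless}")

# Assumed policy: Infrequent Access after 30 days, Glacier after 90 days.
rule = build_lifecycle_rule("logs/", ia_after_days=30, glacier_after_days=90)
print(json.dumps(rule, indent=2))

# With boto3 installed and credentials configured, the rule could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-example-bucket",  # hypothetical bucket name
#       LifecycleConfiguration={"Rules": [rule]},
#   )
```

The round trip confirms that compression gives up no information, only some CPU time on access. The lifecycle rule shows how a move to a lower service level can be driven by file age instead of manual cleanup.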