exanubes

S3: AWS file storage solution

Most applications need somewhere to store files. On AWS that place is S3, the Simple Storage Service, which launched in 2006 and is one of the longest-running AWS services.

S3 provides secure, durable, highly scalable object storage. As the name suggests, it’s easy to use and has a simple web interface for storing and retrieving data.

S3 Basics

  • As mentioned, S3 is secure object-based storage. Object-based means that data is stored as objects in a flat structure, rather than as files in a hierarchy of folders like the file systems we’re used to in our operating systems.

  • A single object stored in S3 can be as small as 0 bytes or as large as 5 TB, and the data is spread across multiple devices and facilities.

  • There is no limit to how much data an S3 bucket can hold. Buckets are containers for our objects, roughly comparable to top-level folders.

  • One important thing to note is that S3 uses a universal namespace, which means all bucket names must be globally unique. For example, you won’t be able to name your bucket exanubes because that name is already taken by me.

  • Bucket names have to be unique because they become part of a DNS URL, e.g. https://exanubes.s3.eu-central-1.amazonaws.com (virtual-hosted style) or the older path-style https://s3-eu-central-1.amazonaws.com/exanubes

  • Suitable for storing files, not for installing operating systems or running programmes (that is what block storage such as EBS is for)
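To make the naming rules concrete, here is a minimal Python sketch that checks a bucket name against a simplified subset of S3’s naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit; no adjacent dots; not formatted as an IP address) and builds both URL styles for an object. The helper names are hypothetical, and the full rule set in the AWS documentation has a few more restrictions than this.

```python
import ipaddress
import re

# Simplified subset of S3 bucket naming rules: 3-63 chars of lowercase
# letters, digits, dots and hyphens, starting and ending with a letter or digit.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check a bucket name against the simplified rules above (hypothetical helper)."""
    if not _NAME_PATTERN.fullmatch(name) or ".." in name:
        return False
    try:
        ipaddress.IPv4Address(name)
        return False  # names formatted like an IP address are not allowed
    except ipaddress.AddressValueError:
        return True


def object_urls(bucket: str, region: str, key: str) -> dict:
    """Build virtual-hosted-style and path-style URLs for an object."""
    if not is_valid_bucket_name(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return {
        # The bucket name is part of the DNS hostname here,
        # which is why it has to be globally unique.
        "virtual_hosted": f"https://{bucket}.s3.{region}.amazonaws.com/{key}",
        # Older path-style form: the bucket name sits in the path instead.
        "path": f"https://s3.{region}.amazonaws.com/{bucket}/{key}",
    }


print(object_urls("exanubes", "eu-central-1", "logo.png")["virtual_hosted"])
```

Nothing here talks to AWS; it only shows why the bucket name ends up in the hostname. The virtual-hosted form is the one current AWS documentation favours.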

S3 Object Structure

To reiterate, S3 is object-based storage, meaning every file we upload is stored as an object. Each object consists of:

Key

This is just the name of the object

Value

The data of the uploaded file. Whether it’s a text file, a video or a picture, it is all stored as a sequence of bytes.

Version ID

Unique identifier in case you choose to version your files

Metadata

Simply put, it’s data about data. Some metadata is system-defined and some can be defined by the user. Examples are Content-Length, which describes the object’s size in bytes, and Content-Type, which tells us what kind of file it is; the latter can also be set by the user. There’s plenty more in the AWS documentation.

Subresources

There are various options for bucket configuration, known as subresources. A bucket can be configured for website hosting, CORS, logging or managing the lifecycle of the objects it holds. You can find many more subresources in the AWS documentation.
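The object structure above can be pictured as a small record. This is a hypothetical Python model for illustration only, not an AWS SDK type: Content-Length is derived from the stored bytes, while Content-Type and user metadata are whatever the uploader sets. Subresources belong to the bucket configuration and are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class S3Object:
    """Toy model of an S3 object's parts (hypothetical, not an SDK type)."""

    key: str                            # the object's name, e.g. "images/logo.png"
    value: bytes                        # file contents as a sequence of bytes
    version_id: Optional[str] = None    # only set when versioning is enabled
    user_metadata: dict = field(default_factory=dict)  # sent as x-amz-meta-* headers
    content_type: str = "application/octet-stream"     # generic default for this sketch

    def system_metadata(self) -> dict:
        """System-defined metadata derived from the object itself."""
        return {
            "Content-Length": len(self.value),  # size in bytes
            "Content-Type": self.content_type,  # can be set by the user on upload
        }


note = S3Object(key="notes.txt", value=b"hello", content_type="text/plain",
                user_metadata={"author": "exanubes"})
print(note.system_metadata())
```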

Data Consistency Model

Read after Write consistency for PUTs of new Objects

This means that when you write a new file and then try to read it right after, you will get the data you just wrote. Changes are visible immediately.

Eventual Consistency for overwrite PUTs and DELETEs (until December 2020)

Historically, when you updated or deleted an existing file and then tried to read it right after, you might still get the older version of the file. For example, you might still be able to read a file you had just deleted, because changes took time to propagate. Since December 2020, however, S3 provides strong read-after-write consistency for overwrite PUTs and DELETEs as well, so a read that follows a successful write or delete always reflects that change.

Guarantees

  • 99.9% availability is guaranteed by Amazon in the S3 Service Level Agreement, although S3 Standard is designed for 99.99% availability
  • Designed for eleven 9’s (99.999999999%) of durability, meaning it’s extremely unlikely to lose data uploaded to S3
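These percentages are easier to reason about as numbers. A quick calculation, assuming a 365-day year: a 99.9% availability SLA still allows about 8.76 hours of downtime per year, while 99.99% allows about 52.6 minutes. For durability, the sketch treats eleven 9’s as a 10^-11 yearly chance of losing any given object; that is a simplification, but it lines up with AWS’s own illustration that storing 10,000,000 objects means losing a single object roughly once every 10,000 years on average.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def max_downtime_hours(availability_pct: float) -> float:
    """Yearly downtime still allowed by a given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def expected_objects_lost_per_year(object_count: int, durability_nines: int = 11) -> float:
    """Expected yearly losses, assuming each object has a 10**-nines yearly loss chance."""
    return object_count * 10 ** -durability_nines


print(f"{max_downtime_hours(99.9):.2f} h")          # 8.76 h
print(f"{max_downtime_hours(99.99) * 60:.1f} min")  # 52.6 min
print(f"{1 / expected_objects_lost_per_year(10_000_000):,.0f} years between losses")
```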

Features

Tiered Storage Available

S3 offers different storage tiers depending on users’ needs. These are covered in more detail below.

Lifecycle Management

Decide which storage tier an object should be in over the course of its life cycle, e.g. move it to a cheaper tier once it is rarely accessed, or delete it after a set period.
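As an illustration of lifecycle management, here is a minimal Python sketch that builds a lifecycle rule in the dictionary shape used by boto3’s put_bucket_lifecycle_configuration: objects under a hypothetical logs/ prefix move to Standard-IA after 30 days, to Glacier after 90 days, and are deleted after a year. The prefix, the day counts, the bucket name and the helper name are all assumptions, and the sanity check only covers ordering and Standard-IA’s 30-day minimum, not every rule S3 validates.

```python
import json


def build_lifecycle_rule(prefix: str, ia_after_days: int, glacier_after_days: int,
                         expire_after_days: int) -> dict:
    """One lifecycle rule shaped like boto3's LifecycleConfiguration["Rules"] items."""
    # Simplified check: Standard-IA transitions need at least 30 days,
    # and each later step should happen after the previous one.
    if not (30 <= ia_after_days < glacier_after_days < expire_after_days):
        raise ValueError("expected 30 <= IA days < Glacier days < expiration days")
    return {
        "ID": f"tier-down-{prefix.rstrip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


lifecycle = {"Rules": [build_lifecycle_rule("logs/", 30, 90, 365)]}
print(json.dumps(lifecycle, indent=2))
# With boto3 and credentials configured (hypothetical bucket name):
# s3.put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```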

Versioning

Version control of files: every overwrite creates a new version instead of replacing the old one, so you can tell whether you’re reading the latest version of a file, and you can restore a file to a previous version.
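To show what versioning buys us, here is a toy in-memory model in Python. The class is hypothetical and not the S3 API: every put keeps the previous data, reads default to the latest version, and restoring an old version writes it back as the newest one, which is roughly what copying an older version over the current object does in S3. Real version IDs are opaque strings generated by S3, not the sequential v1, v2 used here.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class VersionedBucket:
    """In-memory toy model of a versioned bucket (hypothetical, not the S3 API)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[str, bytes]]] = {}
        self._ids = itertools.count(1)

    def put(self, key: str, data: bytes) -> str:
        """Store a new version, keeping older ones, and return its made-up ID."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        """Read the latest version, or a specific one when version_id is given."""
        history = self._versions[key]
        if version_id is None:
            return history[-1][1]
        for vid, data in history:
            if vid == version_id:
                return data
        raise KeyError(version_id)

    def restore(self, key: str, version_id: str) -> str:
        """Restore an old version by writing its data again as the newest version."""
        return self.put(key, self.get(key, version_id))


bucket = VersionedBucket()
first = bucket.put("report.txt", b"draft")
bucket.put("report.txt", b"final")
bucket.restore("report.txt", first)
print(bucket.get("report.txt"))
```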

Encryption

Self-explanatory. Encrypt your files so that sensitive data doesn’t leak even if it falls into the wrong hands.
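One common way to encrypt everything in a bucket is default server-side encryption. This sketch builds the ServerSideEncryptionConfiguration dictionary that boto3’s put_bucket_encryption accepts, using SSE-S3 (AES256) by default or SSE-KMS when a key is given. The helper name, the bucket name and the key ARN below are hypothetical.

```python
import json
from typing import Optional


def default_encryption(kms_key_id: Optional[str] = None) -> dict:
    """Default-encryption rule shaped like boto3's ServerSideEncryptionConfiguration."""
    if kms_key_id is None:
        # SSE-S3: S3 manages the encryption keys (AES-256).
        by_default = {"SSEAlgorithm": "AES256"}
    else:
        # SSE-KMS: encrypt with the given AWS KMS key.
        by_default = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": by_default}]}


# Hypothetical key ARN, for illustration only.
print(json.dumps(default_encryption("arn:aws:kms:eu-central-1:111122223333:key/example"), indent=2))
# With boto3: s3.put_bucket_encryption(Bucket="my-bucket",
#     ServerSideEncryptionConfiguration=default_encryption())
```

Since January 2023 S3 applies SSE-S3 to new objects automatically, so an explicit rule like this is mostly useful when you want SSE-KMS with your own key.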

Secure data using Access Control Lists

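Specify who can access data on a per-file basis. When a file holds sensitive employee information, maybe only a restricted group should be able to read it. ACLs allow that kind of per-object control, although their grants can only target AWS accounts or predefined groups, so rules for individual users (such as members of the HR department) are usually expressed with IAM or bucket policies.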
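A per-object ACL is a list of grants. This minimal sketch builds the AccessControlPolicy dictionary accepted by boto3’s put_object_acl, keeping full control for the owner and granting READ to a list of grantees identified by canonical user ID. The IDs are placeholders and the helper is hypothetical.

```python
import json
from typing import List


def object_acl(owner_id: str, reader_ids: List[str]) -> dict:
    """AccessControlPolicy: owner keeps FULL_CONTROL, each listed canonical ID gets READ."""
    grants = [{"Grantee": {"Type": "CanonicalUser", "ID": owner_id},
               "Permission": "FULL_CONTROL"}]
    grants += [{"Grantee": {"Type": "CanonicalUser", "ID": reader},
                "Permission": "READ"} for reader in reader_ids]
    return {"Owner": {"ID": owner_id}, "Grants": grants}


# Placeholder canonical IDs, for illustration only.
print(json.dumps(object_acl("owner-canonical-id", ["hr-account-canonical-id"]), indent=2))
# With boto3: s3.put_object_acl(Bucket="my-bucket", Key="employees.csv",
#     AccessControlPolicy=object_acl(...))
```

Note that AWS now disables ACLs on new buckets by default (through the S3 Object Ownership setting) and recommends policies for most access control, so this mainly applies to buckets where ACLs are still enabled.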

Secure bucket using Bucket Policies

Works similarly to ACLs, but applies bucket-wide. For example, we can make the entire bucket private, inaccessible to the public and accessible only to internal staff.
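Bucket policies are JSON documents. As a sketch of the “internal staff only” example, this builds a policy that denies every S3 action on the bucket and its objects to any principal outside one AWS account, using the aws:PrincipalAccount condition key. The bucket name and account ID are hypothetical, and the returned string is what boto3’s put_bucket_policy takes as its Policy argument.

```python
import json


def internal_only_policy(bucket: str, account_id: str) -> str:
    """Bucket policy denying all S3 actions to principals outside one AWS account."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAccount",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object inside it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"StringNotEquals": {"aws:PrincipalAccount": account_id}},
            }
        ],
    }
    return json.dumps(policy)


# Hypothetical account ID, for illustration only.
print(internal_only_policy("exanubes", "111122223333"))
# With boto3: s3.put_bucket_policy(Bucket="exanubes", Policy=internal_only_policy(...))
```

A blanket Deny like this can also block AWS services that need to reach the bucket, so treat it as a starting point to test rather than a drop-in policy.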

Storage Classes

Standard

  • Stored across multiple devices
  • Stored across multiple facilities
  • Designed to withstand the loss of 2 facilities at the same time
  • 99.99% availability
  • Eleven 9’s durability

Infrequently Accessed (IA)

  • For data that is accessed less frequently but still needs rapid access when requested
  • Lower storage price than Standard
  • Retrieval fee charged per GB retrieved
  • Ideal for long-term storage, backups and as a data store for disaster-recovery files

One Zone IA

  • Same as IA, but data is stored in a single Availability Zone instead of multiple
  • Lower cost than IA

Intelligent Tiering

  • Monitors access patterns to optimise costs automatically
  • Moves data to the most cost-effective access tier
  • No performance impact or operational overhead
  • Stores objects in four access tiers optimised for frequent, infrequent, archive and deep archive access
  • Frequent and infrequent access tiers have the same low latency as Standard and IA
  • Small monthly monitoring and auto-tiering fee
  • No retrieval fees
  • No additional tiering fees when objects are moved between tiers

Glacier

  • Data archiving solution
  • Secure, durable and low-cost
  • Store unlimited amounts of data at costs competitive with or lower than on-premises solutions
  • Configure retrieval times from minutes to hours

Glacier Deep Archive

  • Lowest cost storage class
  • Suitable for data where a 12-hour retrieval time is acceptable

Outposts

  • Brings S3 Object storage to on-premises environments
  • Made for workloads with data residency requirements
  • Good for very demanding performance needs by keeping data close to on-premises applications
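When uploading through the API, the storage class is chosen per object with the StorageClass parameter. This sketch maps the classes listed above to their API values and builds the keyword arguments for an upload such as boto3’s put_object. The helper, bucket and key names are hypothetical.

```python
# Storage classes from this article mapped to the StorageClass values used by the S3 API.
STORAGE_CLASS_API_NAMES = {
    "Standard": "STANDARD",
    "Infrequently Accessed (IA)": "STANDARD_IA",
    "One Zone IA": "ONEZONE_IA",
    "Intelligent Tiering": "INTELLIGENT_TIERING",
    "Glacier": "GLACIER",
    "Glacier Deep Archive": "DEEP_ARCHIVE",
    "Outposts": "OUTPOSTS",
}


def put_object_params(bucket: str, key: str, body: bytes, storage_class: str) -> dict:
    """Keyword arguments for an upload with an explicit storage class.

    Raises KeyError for a storage class name not in the mapping above.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "StorageClass": STORAGE_CLASS_API_NAMES[storage_class],
    }


params = put_object_params("exanubes", "backups/db.tar", b"...", "One Zone IA")
print(params["StorageClass"])
# With boto3: s3.put_object(**params)
```

GLACIER here refers to what AWS now calls Glacier Flexible Retrieval, and OUTPOSTS only applies to buckets on an Outpost.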

Fees

  • Storage
  • Requests
  • Storage Management Pricing
  • Data Transfer Pricing
  • Transfer Acceleration
  • Cross Region Replication Pricing

Transfer Acceleration (TA)

Enables fast, easy and secure transfer of files over long distances between users and buckets. TA utilises CloudFront’s globally distributed network of edge locations. As data arrives at an edge location, it is then routed to S3 over an optimized network path.
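Once Transfer Acceleration is enabled on a bucket, clients reach it through a separate s3-accelerate endpoint. This minimal sketch builds that URL and rejects bucket names containing dots, which Transfer Acceleration does not support. The helper name is hypothetical, and the dictionary at the end is the AccelerateConfiguration shape used by boto3’s put_bucket_accelerate_configuration.

```python
def accelerate_url(bucket: str, key: str) -> str:
    """URL of an object via the S3 Transfer Acceleration endpoint."""
    # Transfer Acceleration needs DNS-compliant bucket names without dots.
    if "." in bucket:
        raise ValueError(f"{bucket!r}: bucket names with dots cannot use Transfer Acceleration")
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{key}"


# Shape of the AccelerateConfiguration argument that switches the feature on:
# s3.put_bucket_accelerate_configuration(Bucket="exanubes",
#     AccelerateConfiguration=ACCELERATE_CONFIGURATION)
ACCELERATE_CONFIGURATION = {"Status": "Enabled"}

print(accelerate_url("exanubes", "videos/intro.mp4"))
# → https://exanubes.s3-accelerate.amazonaws.com/videos/intro.mp4
```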

Transfer Acceleration Diagram

Cross-Region Replication

As the name suggests, objects in our bucket are automatically replicated to a bucket in another Region, which brings significant benefits. As mentioned earlier in this article, bucket names share a global namespace, but each bucket still lives in a single Region, and it’s not very prudent to expect our users in the US to download assets from an S3 bucket in Australia or vice versa.

This gives us full control over where our data is stored without having to copy objects between buckets ourselves. Sometimes there are regulatory reasons that require you to hold copies of your data far away from the original, and S3 cross-region replication caters to that as well.
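Replication is configured on the source bucket with a rule pointing at a destination bucket in another Region. This sketch builds the ReplicationConfiguration dictionary used by boto3’s put_bucket_replication. The IAM role ARN and destination bucket name are hypothetical, and versioning must be enabled on both buckets for replication to work.

```python
import json


def replication_config(role_arn: str, destination_bucket: str, prefix: str = "") -> dict:
    """ReplicationConfiguration shaped like boto3's put_bucket_replication argument."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-to-other-region",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix: replicate every object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


# Hypothetical role and destination bucket, for illustration only.
cfg = replication_config("arn:aws:iam::111122223333:role/s3-replication", "exanubes-replica")
print(json.dumps(cfg, indent=2))
# With boto3: s3.put_bucket_replication(Bucket="exanubes", ReplicationConfiguration=cfg)
```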

Cross-Region Replication Diagram

Recap

It’s worth remembering that S3 is object-based storage, meaning it’s good for storing files but not for operating systems or programmes. Objects are stored within buckets, and bucket names share a global namespace.

Each object has a key, value, version, metadata and subresources.

Since December 2020, data inside S3 buckets is strongly consistent: we can read a file immediately after uploading it, and reads after an update or delete also reflect the latest state. Before that, updates and deletes were only eventually consistent, so we could still get an older version back.

Some of the features of S3 include tiered storage, lifecycle management, versioning, encryption, securing data with ACL and securing buckets with policies.

S3 offers many storage options, from Standard and Infrequently Accessed, through archiving solutions using Glacier and Intelligent Tiering for automatic cost optimisation based on access patterns, to on-premises storage with Outposts.

It’s important to keep in mind what we pay for with S3: storage, requests, storage management and data transfer. On top of that there are additional options like Transfer Acceleration, which lets us use Amazon’s high-speed, low-latency network for data transfer. Last but not least, we can also pay for Cross Region Replication in order to create bucket replicas in the regions most suitable to our needs.