S3: AWS file storage solution

Most applications need somewhere to store files. On AWS that place is S3, the Simple Storage Service, and it’s one of the longest-running services in AWS: it has been around since 2006.

S3 provides us with secure, durable, highly scalable object storage. As the name suggests, it’s easy to use and has a very simple web interface for storing and retrieving data.

S3 Basics

As mentioned, S3 is safe, object-based storage. Object-based means that data is stored as objects rather than as the files and folders we’re all used to in our operating systems.
An object stored in S3 can be as small as 0 bytes or as big as 5 TB, and the data is spread across multiple devices and facilities.
There is no limit to how big an S3 Bucket can get. Buckets are basically folders that hold our files.
One important thing to note is that S3 uses a universal namespace, which means that all bucket names must be globally unique. For example, you won’t be able to name your bucket exanubes because it’s already taken by me.
Buckets have to have a unique name because they later resolve to a DNS URL, e.g. https://s3-eu-central-1.amazonaws.com/exanubes
S3 is suitable for storing files, not for installing operating systems or running programmes.
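
As a rough sketch of what "globally unique and DNS-friendly" implies, the snippet below checks a candidate name against the main S3 naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit) and builds a URL in the same shape as the example above. The helper names are made up for illustration, the check covers only a subset of the rules, and it cannot tell you whether a name is already taken.

```python
import re

# Subset of the S3 bucket naming rules: 3-63 characters, lowercase letters,
# digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
_IP_LIKE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic syntax rules checked here."""
    if not _BUCKET_NAME.match(name):
        return False
    if ".." in name:          # adjacent dots are not allowed
        return False
    if _IP_LIKE.match(name):  # must not look like an IP address
        return False
    return True


def path_style_url(region: str, bucket: str) -> str:
    """Build a URL in the same shape as the example in this article."""
    return f"https://s3-{region}.amazonaws.com/{bucket}"


print(is_valid_bucket_name("exanubes"))   # syntactically fine
print(is_valid_bucket_name("My_Bucket"))  # uppercase and underscore are rejected
print(path_style_url("eu-central-1", "exanubes"))
```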

S3 Object Structure

To reiterate, S3 is an object-based file storage, meaning every file we upload is stored as an object. These objects consist of:

Key

This is just the name of the object

Value

The data of the uploaded file. Whether it’s a text file, a video or a picture, it is all stored as a sequence of bytes

Version ID

Unique identifier in case you choose to version your files

Metadata

Simply put, it’s data about data. Some metadata is system-defined and some can be defined by the user. Examples are Content-Length, which describes the object’s size in bytes, and Content-Type, which tells us what kind of file it is and can also be modified by the user. There’s plenty more in the AWS documentation.

Subresources

There are various options for bucket configuration. A bucket can be configured for website hosting, CORS, logging or managing the lifecycle of its objects. You can find many more subresources in the AWS documentation.
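
To make the parts of an object concrete, here is a minimal in-memory model. It is not how S3 is implemented and the class name `StoredObject` is hypothetical; it only mirrors the key, value, version ID and metadata fields listed above, and leaves subresources out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StoredObject:
    """Illustrative stand-in for an S3 object (not an AWS API type)."""
    key: str                          # name of the object
    value: bytes                      # file contents as a sequence of bytes
    version_id: Optional[str] = None  # only set when versioning is in use
    metadata: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # System-defined metadata derived from the data itself.
        self.metadata.setdefault("Content-Length", str(len(self.value)))


obj = StoredObject(
    key="reports/summary.txt",                # hypothetical key
    value=b"hello S3",
    metadata={"Content-Type": "text/plain"},  # metadata the user can set
)
print(obj.metadata)
```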

Data Consistency Model

Read after Write consistency for PUTS of new Objects

This means that when you write a new file and try to read it right afterwards, you will be able to access that data. The change is visible immediately.

Eventual Consistency for overwrite PUTS and DELETES

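However, under this model, when updating or deleting an existing file and then trying to read it right after, you might still get the older version of the file. For example, you might still be able to read a file even though you have just deleted it, because changes take time to propagate. Note that since December 2020 S3 has provided strong read-after-write consistency for all PUT and DELETE requests, so a read that follows a successful overwrite or delete now reflects the change.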

Guarantees

99.9% availability is guaranteed by Amazon; S3 itself has been built for 99.99% availability
Eleven 9’s (99.999999999%) of durability by design, meaning it’s virtually impossible to lose data/files uploaded to S3
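
To put those percentages in perspective, the arithmetic below converts an availability figure into downtime per year and a durability figure into an expected number of objects lost per year. Reading the percentages as simple annual averages is an assumption made for illustration, and the object count is arbitrary.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service may be unavailable at this availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def expected_objects_lost_per_year(object_count: int, durability_percent: float) -> float:
    """Average objects lost per year, reading durability as an annual rate."""
    return object_count * (1 - durability_percent / 100)


print(round(downtime_hours_per_year(99.9), 2))   # about 8.76 hours
print(round(downtime_hours_per_year(99.99), 3))  # about 0.876 hours, under an hour
# Roughly 0.0001 objects per year for ten million objects,
# i.e. on the order of one object every 10,000 years.
print(expected_objects_lost_per_year(10_000_000, 99.999999999))
```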

Features

Tiered Storage Available

S3 offers different storage tiers depending on the user’s needs. These are covered in more detail below.

Lifecycle Management

Decide what storage tier an object should be in over the course of its life cycle
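
As a sketch of what such a rule can look like, the dictionary below follows the general shape of an S3 lifecycle configuration (rules with a prefix filter, transitions and an expiration). The `logs/` prefix, the day thresholds and the helper `storage_class_at` are assumptions chosen for illustration; the helper only mimics locally what the rule describes and does not call AWS.

```python
from typing import Optional

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-logs",       # hypothetical rule name
            "Filter": {"Prefix": "logs/"},  # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def storage_class_at(age_days: int, rule: dict) -> Optional[str]:
    """Return the class an object of this age would be in, or None if expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 120, 400):
    print(age, storage_class_at(age, rule))
```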

Versioning

Version control for files. Every change creates a new version, so you can tell whether you’re reading the latest version of a file or an older one. This also enables you to restore a file to a previous version.
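
The idea can be illustrated with a toy in-memory bucket that keeps every version of a key. This is a simplified model, not S3’s implementation: the class `VersionedBucket` and its `v1`, `v2` style IDs are invented for the example (real version IDs are opaque strings).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class VersionedBucket:
    """Toy bucket that never overwrites: each put appends a new version."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[str, bytes]]] = defaultdict(list)

    def put(self, key: str, value: bytes) -> str:
        version_id = f"v{len(self._versions[key]) + 1}"  # made-up ID scheme
        self._versions[key].append((version_id, value))
        return version_id

    def get(self, key: str, version_id: str = "") -> bytes:
        versions = self._versions[key]
        if not version_id:  # no version requested: return the latest one
            return versions[-1][1]
        for vid, value in versions:
            if vid == version_id:
                return value
        raise KeyError(version_id)

    def restore(self, key: str, version_id: str) -> str:
        """Make an old version current again by writing it as a new version."""
        return self.put(key, self.get(key, version_id))


bucket = VersionedBucket()
first = bucket.put("notes.txt", b"draft")
bucket.put("notes.txt", b"final")
bucket.restore("notes.txt", first)
print(bucket.get("notes.txt"))  # the restored draft is the latest version again
```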

Encryption

Self-explanatory. Encrypt your files so that sensitive data does not leak if they get into the wrong hands.

Secure data using Access Control Lists

Specify who can access data on an individual file basis. When a file holds sensitive employee information, maybe only the HR department should be able to access it. ACLs allow for that kind of granular control over file access.

Secure bucket using Bucket Policies

Works similarly to an ACL, but it is bucket-wide. For example, we can make the entire bucket private, so that it is inaccessible to the public and available only to internal staff.
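
For a sense of what a bucket policy looks like, the sketch below builds one as a JSON document. The bucket name `example-bucket`, the account ID and the role name are placeholders, not values from this article; the policy grants a single IAM role read access to the objects in the bucket.

```python
import json


def read_only_policy(bucket: str, principal_arn: str) -> dict:
    """Build a policy allowing one principal to read objects in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowInternalRead",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",  # every object in the bucket
            }
        ],
    }


policy = read_only_policy(
    "example-bucket",                                 # placeholder bucket
    "arn:aws:iam::123456789012:role/internal-staff",  # placeholder role ARN
)
print(json.dumps(policy, indent=2))
```

Buckets are private by default, so a policy like this only widens access for the principal it names.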

Storage Classes

Standard

Stored across multiple devices
Stored across multiple facilities
Designed to withstand the loss of 2 facilities at the same time
99.99% availability
Eleven 9’s durability

Infrequently Accessed (IA)

For data that is accessed less frequently but still requires rapid access
Lower storage fee than Standard
Retrieval fee charged per GB retrieved
Ideal for long-term storage, backups and as a data store for disaster-recovery files

One Zone IA

Same as IA but stored in a single Availability Zone (AZ), so without multi-AZ data resilience
Low cost option

Intelligent Tiering

Utilises ML to optimise costs automatically
Moves data to the most cost-effective access tier
No performance impact or operational overhead
Stores objects in four access tiers optimised for frequent, infrequent, archive and deep archive access
Frequent and infrequent access tiers have the same low latency as Standard and IA
Small monthly monitoring and auto-tiering fee
No retrieval fees
No additional tiering fees when objects are moved between tiers

Glacier

Data archiving solution
Secure, durable and low-cost
Store an unlimited amount of data at costs competitive with or lower than on-premises solutions
Configure retrieval times from minutes to hours

Glacier Deep Archive

Lowest cost storage class
Suitable for data where a 12-hour retrieval time is acceptable

Outposts

Brings S3 Object storage to on-premises environments
Made for workloads with data residency requirements
Good for very demanding performance needs by keeping data close to on-premises applications

Fees

Storage
Requests
Storage Management Pricing
Data Transfer Pricing
Transfer Acceleration
Cross Region Replication Pricing
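
A rough way to reason about the bill is to add up those components. The rates below are made-up placeholders, not AWS prices (real prices vary by Region and storage class), and the function covers only storage, requests and data transfer out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder rates for illustration only; NOT real AWS prices."""
    storage_per_gb_month: float = 0.02
    per_1000_requests: float = 0.005
    transfer_out_per_gb: float = 0.09


def estimate_monthly_cost(stored_gb: float, requests: int,
                          transfer_out_gb: float, rates: Rates = Rates()) -> float:
    """Sum storage, request and data-transfer charges for one month."""
    storage = stored_gb * rates.storage_per_gb_month
    request_fees = requests / 1000 * rates.per_1000_requests
    transfer = transfer_out_gb * rates.transfer_out_per_gb
    return round(storage + request_fees + transfer, 2)


# 500 GB stored, 200,000 requests and 50 GB transferred out in a month.
print(estimate_monthly_cost(stored_gb=500, requests=200_000, transfer_out_gb=50))
```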

Transfer Acceleration (TA)

Enables fast, easy and secure transfer of files over long distances between users and buckets. TA utilises CloudFront’s globally distributed network of edge locations: data that arrives at an edge location is routed to S3 over an optimised network path.
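
Once acceleration is enabled on a bucket, clients use a dedicated endpoint instead of the regular one. A minimal sketch of how that endpoint is formed, assuming a placeholder bucket called `example-bucket` (the helper name is made up for illustration):

```python
def accelerate_endpoint(bucket: str) -> str:
    """Return the Transfer Acceleration endpoint for a bucket."""
    if "." in bucket:
        # Acceleration requires a DNS-compliant bucket name without dots.
        raise ValueError("bucket names containing '.' cannot use Transfer Acceleration")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("example-bucket"))  # placeholder bucket name
```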

Cross-Region Replication

Cross-Region Replication means that the objects in our bucket won’t only exist in the Region of our choosing but will also be copied automatically to a destination bucket in another Region, which provides very significant benefits. As mentioned previously in this article, all S3 Buckets share the same namespace, which means their names have to be globally unique. However, it’s not very prudent to expect our users in the US to download assets from an S3 Bucket in Australia or vice versa.

This model gives us control over the location of our data without having to copy objects between buckets by hand. Sometimes there are regulatory reasons that require you to hold copies of your data far away from the original. You can definitely cater to that with S3 Cross-Region Replication.
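
A replication setup is described by a configuration attached to the source bucket. The sketch below builds one in the general shape of an S3 replication configuration; the bucket name, account ID, role name and rule ID are placeholders, and versioning has to be enabled on both the source and the destination bucket for replication to work.

```python
def replication_configuration(role_arn: str, destination_bucket: str) -> dict:
    """Build a single-rule configuration replicating all objects in a bucket."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-everything",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},                  # empty filter: all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


config = replication_configuration(
    "arn:aws:iam::123456789012:role/replication-role",  # placeholder role ARN
    "example-replica-bucket",                           # placeholder bucket
)
print(config["Rules"][0]["Destination"]["Bucket"])
```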

Recap

It’s worth remembering that S3 is object-based storage, meaning it’s good for storing files but not for running operating systems or programmes. Objects are stored within buckets, and all buckets share a global namespace.

Each object has a key, value, version, metadata and subresources.

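Under the consistency model described in this article, data inside S3 Buckets is eventually consistent for updates and deletes and immediately consistent for reads after new writes. This means we can read a file immediately after uploading it to S3, but when updating a file we could still get the older version back. Since December 2020, S3 has provided strong read-after-write consistency for all PUT and DELETE requests, so that lag no longer applies.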

Some of the features of S3 include tiered storage, lifecycle management, versioning, encryption, securing data with ACL and securing buckets with policies.

S3 offers many storage options, starting with Standard and Infrequently Accessed, through archiving with Glacier and Intelligent Tiering for cost optimisation using machine learning, to on-premises storage with Outposts.

It’s important to keep in mind what we pay for with S3, which is of course storage, requests, storage management and data transfer. Then we have additional options like Transfer Acceleration, which allows us to utilise Amazon’s high-speed, low-latency network for data transfer. Last but not least, we can also pay for Cross-Region Replication in order to keep replicas of our data in the Regions most suitable to our needs.