Most applications need somewhere to store files. For AWS, that place is S3, the Simple Storage Service, which is one of the longest-running services in AWS.
S3 provides us with secure, durable, highly scalable object storage. As the name suggests, it’s easy to use and has a very simple web interface for storing and retrieving data.
S3 Basics
As mentioned, S3 is a secure, object-based file storage service. Object-based means that files are stored as objects rather than as files and folders in the kind of hierarchical file storage we’re all used to in our operating systems.
A single object stored in S3 can be as small as 0 bytes or as big as 5 TB, and all of the data is spread across multiple devices and facilities.
There is no limit to how big an S3 Bucket can get. Buckets are basically folders that hold our files.
One important thing to note is that S3 uses a universal namespace, which means that all bucket names must be globally unique. For example, you won’t be able to name your bucket
exanubes
because it’s already taken by me. Buckets have to have unique names because they later resolve to a DNS URL, e.g.
https://s3-eu-central-1.amazonaws.com/exanubes
S3 is suitable for storing files, but not for operating systems or programs.
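To make this concrete, here’s a minimal sketch of uploading a file using the AWS SDK for JavaScript v3. The region, bucket name and key are placeholders for this example, and the bucket is assumed to already exist:

```ts
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

// Placeholder region and bucket name -- adjust to your own setup.
const s3 = new S3Client({ region: "eu-central-1" });

await s3.send(
  new PutObjectCommand({
    Bucket: "exanubes-demo",
    Key: "hello.txt",            // the object's name
    Body: "Hello from exanubes", // the object's value, stored as bytes
    ContentType: "text/plain",
  })
);
```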
S3 Object Structure
To reiterate, S3 is an object-based file storage, meaning every file we upload will be stored as an object. These objects consist of:
Key
This is just the name of the object
Value
The data of the uploaded file. Whether it’s a text file, a video or a picture, it will all be stored as a sequence of bytes
Version ID
Unique identifier in case you choose to version your files
Metadata
Simply put, it’s data about data. Some of it is system-defined and some can be defined by the user. Example metadata would be Content-Length, which describes the object’s size in bytes, or Content-Type, which tells us what kind of file it is; the latter can also be modified by the user. There’s plenty more in the AWS documentation.
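As a quick illustration, an object’s metadata can be read without downloading the object itself. Here’s a sketch with the AWS SDK for JavaScript v3; the bucket and key names are placeholders:

```ts
import { S3Client, HeadObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-central-1" });

// HeadObject returns the object's metadata without fetching its value.
const head = await s3.send(
  new HeadObjectCommand({ Bucket: "exanubes-demo", Key: "hello.txt" })
);

console.log(head.ContentLength); // the object's size in bytes
console.log(head.ContentType);   // e.g. "text/plain"
console.log(head.VersionId);     // present when versioning is enabled
console.log(head.Metadata);      // user-defined metadata
```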
Subresources
There are various options for bucket configuration. It can be configured for website hosting, CORS, logging or managing the lifecycle of objects in the bucket. You can find many more subresources in the AWS documentation.
Data Consistency Model
Read after Write consistency for PUTS of new Objects
This means that when writing a new file and then trying to read it right after, you will be able to access that data. Changes are instantaneous.
Eventual Consistency for overwrite PUTS and DELETES
However, when updating or deleting an existing file and then trying to read it right after, you might still get the older version of the file. For example, you might still be able to read a file even though you have just deleted it. Changes take time to propagate.
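Given this model, code that reads an object right after overwriting it may want to verify what it actually got back. Below is a sketch that retries the read until the expected content shows up; the bucket, key, retry count and delays are all arbitrary choices for illustration:

```ts
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-central-1" });

// Poll until a freshly overwritten object is visible, or give up.
async function readUntil(bucket: string, key: string, expected: string) {
  for (let attempt = 1; attempt <= 5; attempt++) {
    const res = await s3.send(
      new GetObjectCommand({ Bucket: bucket, Key: key })
    );
    const body = await res.Body!.transformToString();
    if (body === expected) return body; // change has propagated
    await new Promise((r) => setTimeout(r, 500 * attempt)); // back off
  }
  throw new Error("Still reading a stale version after retries");
}
```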
Guarantees
- Amazon guarantees 99.9% availability; however, S3 has been built for 99.99% availability
- Eleven 9’s (99.999999999%) guarantee for information durability, meaning it’s virtually impossible to lose data/files uploaded to S3
Features
Tiered Storage Available
S3 offers different storage tiers depending on the user’s needs. Covered in more detail below
Lifecycle Management
Decide what storage tier an object should be in over the course of its lifecycle
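As a sketch of what this looks like in practice, a lifecycle rule can transition objects under a prefix to cheaper tiers over time and eventually expire them. The bucket name, prefix and day counts below are assumptions:

```ts
import {
  S3Client,
  PutBucketLifecycleConfigurationCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-central-1" });

// Move objects under logs/ to Infrequent Access after 30 days,
// to Glacier after 90, and delete them after a year.
await s3.send(
  new PutBucketLifecycleConfigurationCommand({
    Bucket: "exanubes-demo",
    LifecycleConfiguration: {
      Rules: [
        {
          ID: "archive-logs",
          Status: "Enabled",
          Filter: { Prefix: "logs/" },
          Transitions: [
            { Days: 30, StorageClass: "STANDARD_IA" },
            { Days: 90, StorageClass: "GLACIER" },
          ],
          Expiration: { Days: 365 },
        },
      ],
    },
  })
);
```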
Versioning
Version control of files - this way you’ll know if the changes to a file have propagated or maybe you’re still reading the old file. This also enables you to restore a file to a previous version.
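Versioning is switched on per bucket. A minimal sketch, again with a placeholder bucket name; note that once enabled, versioning can later be suspended but never removed:

```ts
import { S3Client, PutBucketVersioningCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-central-1" });

// Once enabled, every PUT to an existing key creates a new version
// instead of overwriting the old one.
await s3.send(
  new PutBucketVersioningCommand({
    Bucket: "exanubes-demo",
    VersioningConfiguration: { Status: "Enabled" },
  })
);
```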
Encryption
Self-explanatory. Encrypt your files to avoid them getting into the wrong hands and leaking sensitive data.
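One way to do this is to set default server-side encryption on the bucket so that every new object is encrypted at rest. A sketch assuming S3-managed keys (SSE-S3) and a placeholder bucket name:

```ts
import { S3Client, PutBucketEncryptionCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-central-1" });

// Every object written from now on is encrypted at rest with
// S3-managed keys; KMS-managed keys would use "aws:kms" instead.
await s3.send(
  new PutBucketEncryptionCommand({
    Bucket: "exanubes-demo",
    ServerSideEncryptionConfiguration: {
      Rules: [
        { ApplyServerSideEncryptionByDefault: { SSEAlgorithm: "AES256" } },
      ],
    },
  })
);
```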
Secure data using Access Control Lists
Specify who can access data on an individual file basis. When a file holds sensitive employee information, maybe only the HR department should be able to access it. ACLs allow for that kind of granular control over file access.
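A minimal sketch of locking a single object down with a canned ACL. The bucket and key are placeholders, and the bucket must be configured to allow ACLs; finer-grained grants to specific accounts are possible as well:

```ts
import { S3Client, PutObjectAclCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-central-1" });

// "private" restricts the object to the bucket owner's account;
// GrantRead etc. can grant access to specific accounts or groups.
await s3.send(
  new PutObjectAclCommand({
    Bucket: "exanubes-demo",
    Key: "hr/salaries.csv", // placeholder key
    ACL: "private",
  })
);
```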
Secure bucket using Bucket Policies
Works similarly to an ACL; however, it is bucket-wide. For example, we can deem the entire bucket private and inaccessible to the public, accessible only by internal staff.
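A sketch of such a bucket-wide policy, allowing reads only for a hypothetical internal-staff role; the account ID, role name and bucket are placeholders:

```ts
import { S3Client, PutBucketPolicyCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-central-1" });

// Only the internal-staff role may read objects from this bucket.
const policy = {
  Version: "2012-10-17",
  Statement: [
    {
      Sid: "InternalStaffReadOnly",
      Effect: "Allow",
      Principal: { AWS: "arn:aws:iam::123456789012:role/internal-staff" },
      Action: "s3:GetObject",
      Resource: "arn:aws:s3:::exanubes-demo/*",
    },
  ],
};

await s3.send(
  new PutBucketPolicyCommand({
    Bucket: "exanubes-demo",
    Policy: JSON.stringify(policy),
  })
);
```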
Storage Classes
Standard
- Stored across multiple devices
- Stored across multiple facilities
- Designed to withstand the loss of 2 facilities at the same time
- 99.99% availability
- Eleven 9’s durability
Infrequently Accessed (IA)
- For less frequently accessed data but requiring rapid access nonetheless
- Lower fee than Standard
- Retrieval fee when accessing files counted per GB retrieved
- Ideal for long-term storage, backups and as a data store for disaster-recovery files
One Zone IA
- Same as IA but without multiple AZ data resilience
- Low cost option
Intelligent Tiering
- Utilises ML to optimize costs automatically
- Moves data to the most cost-effective access tier
- No performance impact or operational overhead
- Stores objects in four access tiers optimised for frequent, infrequent, archive and deep archive access
- Frequent and infrequent access tiers have the same low latency as Standard and IA
- Small Monthly monitoring and auto-tiering fee
- No retrieval fees
- No additional tiering fees when objects are moved between tiers
Glacier
- Data archiving solution
- Secure, durable and low-cost
- Store unlimited amount of data at costs competitive to or lower than on-premises solutions
- Configure retrieval times from minutes to hours
Glacier Deep Archive
- Lowest cost storage class
- Suitable for data where a 12-hour retrieval time is acceptable
Outposts
- Brings S3 Object storage to on-premises environments
- Made for workloads with data residency requirements
- Good for very demanding performance needs by keeping data close to on-premises applications
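Outposts aside, the storage class is chosen per object at upload time. A sketch sending a backup straight to Glacier Deep Archive; the file and bucket names are placeholders:

```ts
import { readFile } from "node:fs/promises";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-central-1" });

// Other valid values include STANDARD, STANDARD_IA, ONEZONE_IA,
// INTELLIGENT_TIERING and GLACIER.
await s3.send(
  new PutObjectCommand({
    Bucket: "exanubes-demo",
    Key: "backups/2021-01.tar.gz",
    Body: await readFile("./backup.tar.gz"),
    StorageClass: "DEEP_ARCHIVE",
  })
);
```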
Fees
- Storage
- Requests
- Storage Management Pricing
- Data Transfer Pricing
- Transfer Acceleration
- Cross Region Replication Pricing
Transfer Acceleration (TA)
Enables fast, easy and secure transfer of files over long distances between users and buckets. TA utilises CloudFront’s globally distributed network of edge locations. As data arrives at an edge location, it is then routed to S3 over an optimized network path.
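There are two steps involved: enabling acceleration on the bucket, then pointing the client at the accelerate endpoint. A sketch with placeholder names; note that accelerated bucket names must not contain dots:

```ts
import {
  S3Client,
  PutBucketAccelerateConfigurationCommand,
} from "@aws-sdk/client-s3";

// Step 1: enable acceleration on the bucket (one-time setup).
const s3 = new S3Client({ region: "eu-central-1" });
await s3.send(
  new PutBucketAccelerateConfigurationCommand({
    Bucket: "exanubes-demo",
    AccelerateConfiguration: { Status: "Enabled" },
  })
);

// Step 2: route transfers through the nearest edge location.
const accelerated = new S3Client({
  region: "eu-central-1",
  useAccelerateEndpoint: true,
});
```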
Cross-Region Replication
Very self-explanatory: our bucket won’t only exist in the Region of our choosing but will also be replicated to other Regions, which brings significant benefits. As mentioned previously in this article, all S3 buckets share the same namespace, which means their names have to be globally unique; however, it’s not very prudent to expect our users from the US to download assets from an S3 bucket in Australia, or vice versa.
This model gives us full control over the location of our data without having to juggle multiple buckets. Sometimes there could be regulatory reasons that require you to hold copies of your data far away from the original. You can definitely cater to that with S3 Cross-Region Replication.
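A sketch of a replication configuration: both buckets must already exist with versioning enabled, and the IAM role ARN and destination bucket are placeholders:

```ts
import { S3Client, PutBucketReplicationCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-central-1" });

// Replicate every new object to a bucket in another Region.
await s3.send(
  new PutBucketReplicationCommand({
    Bucket: "exanubes-demo",
    ReplicationConfiguration: {
      Role: "arn:aws:iam::123456789012:role/s3-replication",
      Rules: [
        {
          ID: "replicate-everything",
          Status: "Enabled",
          Priority: 1,
          Filter: { Prefix: "" }, // empty prefix = all objects
          DeleteMarkerReplication: { Status: "Disabled" },
          Destination: { Bucket: "arn:aws:s3:::exanubes-demo-replica" },
        },
      ],
    },
  })
);
```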
Recap
It’s worth remembering that S3 is object-based storage, meaning it’s good for storing files but not operating systems or programs. Objects are stored within buckets, and all buckets share a global namespace.
Each object has a key, value, version, metadata and subresources.
Data inside S3 buckets has read-after-write consistency for new objects and eventual consistency for updates and deletes. This means we can read a file immediately after uploading it to S3, but when updating a file we could still get the older version back.
Some of the features of S3 include tiered storage, lifecycle management, versioning, encryption, securing data with ACL and securing buckets with policies.
S3 offers many storage options starting from standard and infrequently accessed, through archiving solutions using Glacier, Intelligent Tiering for cost optimisation using machine learning and even on-premises solutions with Outposts.
It’s important to keep in mind what we pay for with S3: storage, requests, storage management and data transfer. Then we have additional options like Transfer Acceleration, which allows us to utilise Amazon’s high-speed, low-latency network for data transfer. Last but not least, we can also pay for Cross-Region Replication in order to create bucket replicas in the Regions most suitable to our needs.