
S3 Signed URLs #3: S3 Lifecycle Rules – archiving and retrieval

This article covers the usage of S3 Lifecycle Rules to archive bucket objects in Glacier Flexible Retrieval. Then we’re gonna go over retrieving the archived documents using the AWS SDK v3 and notifying the user when the document is ready for download by sending an email with a download link.

As always, you can see the finished code on GitHub and I have also posted this in video format which covers some additional ground related to the UI!

In the meantime, I discovered a new shiny object called SST and migrated the repository to it. Because of this, some of the code might have to be adjusted if you want to use it with plain AWS CDK.

Adding lifecycle rules

First off, I’m gonna need to add a lifecycle rule so that the documents get archived after one day, which is the minimum amount of time that AWS allows.

import { Duration } from 'aws-cdk-lib';
import { BlockPublicAccess, HttpMethods, StorageClass } from 'aws-cdk-lib/aws-s3';
import { Bucket } from 'sst/constructs';

const bucket = new Bucket(stack, 'uploads', {
	name: 'exanubes-upload-bucket-sst',
	cors: [
		{
			allowedMethods: [HttpMethods.POST],
			allowedOrigins: ['http://localhost:5173'],
			allowedHeaders: ['*']
		}
	],
	cdk: {
		bucket: {
			blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
			versioned: true,
			lifecycleRules: [
				{
					enabled: true,
					noncurrentVersionTransitions: [
						{
							transitionAfter: Duration.days(1),
							storageClass: StorageClass.GLACIER
						}
					],
					transitions: [
						{
							transitionAfter: Duration.days(30),
							storageClass: StorageClass.GLACIER
						}
					]
				}
			]
		}
	}
});

So, to the bucket configuration from previous articles, I added a lifecycleRules prop, which takes in an array of rules. In this case I’m going to transition non-current versions to Glacier after 1 day and current versions after 30 days – the latter is just for completeness as I will not use it in this example.

Keep in mind that lifecycle rules are evaluated roughly once a day, so even if your bucket already contains non-current versions older than 1 day, you will still have to wait until the rule is triggered before they transition.

Recognizing archived documents

In the previous article, I added a zod validator for transforming the S3 API’s response. I’m gonna add a new StorageClass prop which will tell me each version’s storage class, and then transform it into a boolean value for ease of use.

export const versionResponseValidator = z
	.object({
		//...properties
		StorageClass: z.string().optional()
	})
	.transform((arg, ctx) => ({
		//...properties
		isArchived: arg.StorageClass === 'GLACIER'
	}));
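
For context, here’s a minimal sketch of how the validator might be used when listing versions – versionResponseValidator comes from the snippet above, the surrounding function and names are just illustrative.

import { ListObjectVersionsCommand, S3Client } from '@aws-sdk/client-s3';

const client = new S3Client({});

export async function listVersions(bucket: string, key: string) {
	const { Versions = [] } = await client.send(
		new ListObjectVersionsCommand({ Bucket: bucket, Prefix: key })
	);

	// every entry gets validated and transformed, so the UI can simply check `isArchived`
	return Versions.map((version) => versionResponseValidator.parse(version));
}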

Restoring archived documents

Restoring a document is an asynchronous action that’s decoupled from the actual HTTP request that triggers it. The way it works is: first we tell AWS that we want to restore a document, AWS then starts a job that retrieves the document from the archive, and only once that job completes is the document made available to us.

import { RestoreObjectCommand, S3Client } from '@aws-sdk/client-s3';

const client = new S3Client({});

export async function restoreObject(props) {
	/**@type {import('@aws-sdk/client-s3').RestoreObjectCommandInput}*/
	const input = {
		Bucket: props.bucket,
		Key: props.key,
		VersionId: props.versionId,
		RestoreRequest: {
			Days: 1,
			GlacierJobParameters: {
				Tier: 'Expedited'
			}
		}
	};

	const command = new RestoreObjectCommand(input);

	return client.send(command);
}

To initiate the job, we need to send a RestoreObjectCommand to AWS. Here we need to define the bucket, key and versionId of the document that the user wants to access. Then we need to define the RestoreRequest, which takes in the number of days that the document should be available for and the GlacierJobParameters, which defines the method of retrieval.

For ease of use in development I've chosen Expedited, which is the fastest method – up to 5 minutes – but also the most expensive one per GB retrieved. There are also Standard and Bulk tiers available, with restore times of up to 5 and 12 hours respectively for the Glacier Flexible Retrieval storage class.
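
Since the restore runs in the background, the RestoreObjectCommand itself doesn’t tell us when it finishes – the object’s Restore metadata does. Here’s a minimal sketch, mirroring the restoreObject helper above and reusing its client, that checks the status with a HeadObjectCommand.

import { HeadObjectCommand } from '@aws-sdk/client-s3';

/**@param {{bucket: string, key: string, versionId?: string}} props*/
export async function getRestoreStatus(props) {
	const { Restore } = await client.send(
		new HeadObjectCommand({
			Bucket: props.bucket,
			Key: props.key,
			VersionId: props.versionId
		})
	);

	// Restore is undefined when no restore was ever requested,
	// contains ongoing-request="true" while the job is still running,
	// and ongoing-request="false" plus an expiry-date once the copy is available
	if (!Restore) return 'not-requested';
	return Restore.includes('ongoing-request="true"') ? 'in-progress' : 'restored';
}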

Sending a notification

Now that we have the job started, we need to notify the user when the document is ready for download. For this I’m gonna use SNS, a pub/sub service that can be used to implement a fan-out pattern – spreading a single event out to multiple subscribers. In this case I’m gonna use it to trigger only a single lambda, so admittedly this is overkill. Good practice though.

import type { S3Event, SNSEvent } from 'aws-lambda';
import { render } from '@react-email/render';
// generatePresignedUrl, sendEmail and the GlacierObjectRestored email template are defined elsewhere in the project

export async function handler(event: SNSEvent) {
	const [{ Sns }] = event.Records;
	const s3Event: S3Event = JSON.parse(Sns.Message);

	const [{ s3, glacierEventData }] = s3Event.Records;

	const { key, versionId } = s3.object;

	const validUntil = glacierEventData?.restoreEventData.lifecycleRestorationExpiryTime;

	const signedUrl = await generatePresignedUrl({
		bucketName: s3.bucket.name,
		key,
		versionId
	});

	const html = render(<GlacierObjectRestored signedUrl={signedUrl} expiry={validUntil} />);

	return sendEmail(html);
}

First off, I’m parsing the SNS message, which actually contains a stringified S3Event object – the same one that would be sent to the lambda if it were triggered by an s3 event directly. Then, I’m gonna extract the key and versionId from the s3 property and the validUntil from glacierEventData.

I’m generating a presigned url for the document so that it can be downloaded from the email, and rendering an html template with the url and the expiry time. Finally, I’m gonna send the email using the sendEmail function, which I’ll cover shortly.
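
The generatePresignedUrl helper isn’t shown in the article; a minimal sketch of it could look like the one below, assuming the standard @aws-sdk/s3-request-presigner approach and a 1 hour expiry.

import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({});

interface PresignedUrlProps {
	bucketName: string;
	key: string;
	versionId?: string;
}

export function generatePresignedUrl({ bucketName, key, versionId }: PresignedUrlProps) {
	const command = new GetObjectCommand({
		Bucket: bucketName,
		Key: key,
		VersionId: versionId
	});

	// the link shouldn't outlive the restored copy, so keep the expiry short (1 hour here)
	return getSignedUrl(s3, command, { expiresIn: 3600 });
}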

Yet again, shiny object syndrome kicked in and I used React Email to compose the email template.
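
The template itself isn’t covered here either; a rough sketch with @react-email/components might look like this – the props match the handler above, the copy is made up.

import { Button, Html, Text } from '@react-email/components';

interface Props {
	signedUrl: string;
	expiry?: string;
}

export function GlacierObjectRestored({ signedUrl, expiry }: Props) {
	return (
		<Html lang="en">
			<Text>Your document has been restored and is ready for download.</Text>
			{expiry ? <Text>The download link is valid until {expiry}.</Text> : null}
			<Button href={signedUrl}>Download document</Button>
		</Html>
	);
}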

Sending an email with SES

For sending an email, we’re gonna need to provide source and destination email addresses, a subject line and a message body.

import { SendEmailCommand, SendEmailCommandInput, SESClient } from '@aws-sdk/client-ses';

const emailClient = new SESClient({});

async function sendEmail(html: string) {
	const input: SendEmailCommandInput = {
		Source: 'noreply@example.com',
		Destination: {
			ToAddresses: ['john.doe@example.com']
		},
		Message: {
			Body: {
				Html: {
					Charset: 'UTF-8',
					Data: html
				}
			},
			Subject: {
				Charset: 'UTF-8',
				Data: 'Your document is ready!'
			}
		}
	};
	const command = new SendEmailCommand(input);

	return emailClient.send(command);
}

If your SES account is in sandbox mode, you will need to verify both sender and recipient email addresses in the SES console before you can send any emails.
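
Verification can also be done programmatically; as a sketch, with the same @aws-sdk/client-ses package, each address can be verified like this (AWS then emails a confirmation link to that address).

import { SESClient, VerifyEmailIdentityCommand } from '@aws-sdk/client-ses';

const ses = new SESClient({});

// sends a verification email to the given address; it becomes usable once the link is clicked
await ses.send(new VerifyEmailIdentityCommand({ EmailAddress: 'john.doe@example.com' }));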

Listening for S3 Object Events

With the lambda in place, now I can put it together with the SNS topic and the S3 bucket.

import { Topic } from 'sst/constructs';

const topic = new Topic(stack, 'objectRestored', {
	subscribers: {
		notificationEmail: {
			type: 'function',
			function: {
				functionName: 'object-restored-notification-email',
				handler: 'packages/functions/src/object-restored-notification-email.handler',
				architecture: 'arm_64',
				permissions: ['ses', 's3']
			}
		}
	}
});

First I need to create a topic and subscribe the lambda to it, and let’s not forget about the IAM permissions for the SES and S3 services that the lambda is using.

Once that’s done, I can add a notification configuration to the bucket so that the topic is triggered whenever a document emits an object_restore_completed event.

const bucket = new Bucket(stack, 'uploads', {
	name: 'exanubes-upload-bucket-sst',
	cors: [
		//...
	],
	cdk: {
		//...
	},
	notifications: {
		GlacierObjectRestored: {
			events: ['object_restore_completed'],
			type: 'topic',
			topic
		}
	}
});

Keep in mind that S3 allows only one trigger per event, so in order to trigger multiple lambdas you would add another subscriber to the topic, not another notification to the Bucket config.
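
For example, a second consumer would be added as another subscriber on the topic rather than another notification on the bucket – a sketch using SST’s addSubscribers and a hypothetical audit-log handler:

topic.addSubscribers(stack, {
	auditLog: 'packages/functions/src/object-restored-audit-log.handler'
});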

Summary

So to sum up, we’ve added a lifecycle rule that archives non-current document versions in the Glacier Flexible Retrieval storage class after 1 day. Then, using the AWS SDK, we initiated a restore job for the archived document and configured a fan-out pattern with SNS and Lambda to notify the user once the document is ready.

Ideally, we would also track the documents being restored so that we can display them correctly in the UI. We could use DynamoDB and utilise the TTL attribute, or a regular relational database, which is what I covered in the video version of this article.
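
As a rough sketch of the DynamoDB approach (table name, key schema and attribute names are all assumptions here), each restore request could be written with a TTL that mirrors the restore window, so the item disappears when the restored copy expires:

import { DynamoDBClient, PutItemCommand } from '@aws-sdk/client-dynamodb';

const dynamo = new DynamoDBClient({});

export async function trackRestore(key: string, versionId: string) {
	// TTL must be a Unix timestamp in seconds; here it mirrors the 1 day restore window
	const expiresAt = Math.floor(Date.now() / 1000) + 24 * 60 * 60;

	await dynamo.send(
		new PutItemCommand({
			TableName: 'restored-documents',
			Item: {
				pk: { S: `${key}#${versionId}` },
				status: { S: 'in-progress' },
				expiresAt: { N: String(expiresAt) }
			}
		})
	);
}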