S3 Signed URLs #4: Uploading large files to s3 with multipart upload using signed URLs

This article is part of a series

  1. Upload and download files using signed urls
  2. Versioning documents in amazon s3
  3. S3 Lifecycle Rules: archiving and retrieval
  4. Uploading large files to s3 with multipart upload using signed URLs

In this article we will learn how to upload large files to s3 using multipart upload and signed URLs. The key point for me was to perform the multipart upload with signed URLs from the client. The examples I could find all sent the file from the server to s3, whereas I wanted to do this from the client so that I could upload files directly from the browser to s3.

You can find the finished code for this article on GitHub and I’ve also recorded a video that covers more implementation details.

What is multipart upload?

Multipart upload is a mechanism that allows you to upload large files to s3 in parts. This is useful when you have a very large file that could take a long time to upload. If the upload fails for some reason you don’t have to start from the beginning. You can just retry the missing or failed parts. Once all parts are uploaded you can complete the upload and the file will be available in s3.

Retrying failed parts is not covered in this article.

Initiate a multipart upload

Unlike regular uploads, multipart uploads are a little bit more hands-on as they are a multi-step process. The first step is to tell AWS that we want to do a multipart upload so that AWS provides us with an UploadId.

import { S3Client, CreateMultipartUploadCommand } from '@aws-sdk/client-s3';

const client = new S3Client({});

export async function startMultipartUpload(props) {
	const input = {
		Bucket: props.bucket,
		Key: props.key
	};

	const command = new CreateMultipartUploadCommand(input);

	// The response includes the UploadId used in every subsequent step
	return client.send(command);
}

All we have to do is provide the bucket and the key of the file we want to upload. The response will contain the UploadId that we will need throughout the process.
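
For instance, kicking off an upload and pulling the UploadId out of the response could look like this (the bucket name and key below are just example values):

const { UploadId } = await startMultipartUpload({
	bucket: 'exanubes-upload-bucket-sst',
	key: 'e9946a16-2a6a-455e-81c7-5dcd37db8241'
});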

Generate Signed URLs

Once we have the UploadId we can generate signed URLs for each part of the file. The signed URLs will be used to upload each part to s3.

import { UploadPartCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

export async function getMultipartUrls(props) {
	const input = {
		Bucket: props.bucket,
		Key: props.key,
		UploadId: props.uploadId
	};

	// Part numbers are 1-indexed: [1, 2, ..., props.parts]
	const parts = Array.from({ length: props.parts }, (_, index) => index + 1);

	return Promise.all(
		parts.map((partNumber) => {
			const command = new UploadPartCommand({
				...input,
				PartNumber: partNumber
			});
			return getSignedUrl(client, command, { expiresIn: 600 });
		})
	);
}

For this to work we need to provide the same bucket and key as before. We also need to provide the UploadId that we got from the previous step, and we need to know how many parts we’re going to be sending. In this case I opted for creating an array of part numbers and then mapped over it to generate a signed URL for each part. Each link will expire after ten minutes, so depending on how long you expect your uploads to take, this value might have to be different.
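
The part count is something we have to derive from the file size ourselves. A minimal sketch, assuming a browser File object and the 5 MiB minimum part size discussed further down:

// 5 MiB, the minimum size for every part except the last one
const MINIMUM_PART_SIZE = 5 * 1024 * 1024;

// Round up so the remainder ends up in the final, smaller part
const parts = Math.max(1, Math.ceil(file.size / MINIMUM_PART_SIZE));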

Complete multipart upload

Once all parts are uploaded we need to tell AWS to complete the upload. This is done by providing the UploadId and the list of parts that were uploaded as well as the bucket name and object key.

import { CompleteMultipartUploadCommand } from '@aws-sdk/client-s3';

export async function completeMultipartUpload(props) {
	const input = {
		Bucket: props.bucket,
		Key: props.key,
		UploadId: props.uploadId,
		MultipartUpload: {
			// Each entry is { ETag, PartNumber }, collected while uploading the parts
			Parts: props.parts
		}
	};

	const command = new CompleteMultipartUploadCommand(input);

	return client.send(command);
}

The Parts property is an array of objects that contain the PartNumber and the ETag of each part. The ETag is a hash of the part’s contents that AWS generates and returns in the response headers of each part upload.
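
To make that concrete, the parts list passed to completeMultipartUpload might look something like this (the ETag values here are made up for illustration):

const parts = [
	{ ETag: '"a54357aff0632cce46d942af68356b38"', PartNumber: 1 },
	{ ETag: '"0c78aef83f66abc1fa1e8477f296d394"', PartNumber: 2 }
];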

Abort multipart upload

In case anything goes wrong, you want to make sure that you do not leave the multipart upload hanging as it will incur charges for storing the parts.

import { AbortMultipartUploadCommand } from '@aws-sdk/client-s3';

export async function abortMultipartUpload(props) {
	const input = {
		Bucket: props.bucket,
		Key: props.key,
		UploadId: props.uploadId
	};

	const command = new AbortMultipartUploadCommand(input);

	return client.send(command);
}

Analogous to the previous steps, we just need to provide the UploadId along with the bucket and key of the file, and AWS will handle the rest.
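
In practice you would typically call this from a catch block so that a failed upload cleans up after itself. A minimal sketch, assuming bucket, key, uploadId and parts (the completed parts array) are already in scope:

try {
	await completeMultipartUpload({ bucket, key, uploadId, parts });
} catch (error) {
	// Abort so the already-uploaded parts don't keep incurring storage charges
	await abortMultipartUpload({ bucket, key, uploadId });
	throw error;
}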

Uploading parts to s3

Now that we have the signed URLs we can upload each part to s3.

// 5 MiB, matching the part size the signed URLs were generated for
const MINIMUM_PART_SIZE = 5 * 1024 * 1024;

export async function uploadPartsToS3(file, signedUrls) {
	return Promise.all(
		signedUrls.map(async (signedUrl, index) => {
			const start = index * MINIMUM_PART_SIZE;
			const end = (index + 1) * MINIMUM_PART_SIZE;
			// The last part takes whatever is left of the file
			const isLastPart = index === signedUrls.length - 1;
			const part = isLastPart ? file.slice(start) : file.slice(start, end);
			const response = await fetch(signedUrl, {
				method: 'PUT',
				body: part
			});

			/**@type {import('@aws-sdk/client-s3').CompletedPart}*/
			return {
				ETag: response.headers.get('etag'),
				PartNumber: index + 1
			};
		})
	);
}

A lot is happening here. First, I needed to define the minimum part size, then use it together with the index of each part to slice the file into chunks. Then, using a PUT request, I send each part of the file as the request body to the signed URL generated previously. Last but not least, I’m returning an ETag and PartNumber for every part, to be used in the MultipartUpload.Parts property when completing the multipart upload.

As per the AWS documentation, each part must be between 5 MiB and 5 GiB in size, but the minimum size limit does not apply to the last part of the upload.
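
Putting the client side together, the whole flow could be orchestrated roughly like this. This is only a sketch: the /api/multipart-upload endpoints are hypothetical stand-ins for wherever you expose the server-side helpers from the previous sections.

async function uploadFile(file) {
	// Hypothetical endpoint wrapping startMultipartUpload
	const { uploadId, key } = await fetch('/api/multipart-upload', {
		method: 'POST'
	}).then((res) => res.json());

	const parts = Math.max(1, Math.ceil(file.size / MINIMUM_PART_SIZE));

	// Hypothetical endpoint wrapping getMultipartUrls
	const signedUrls = await fetch(
		`/api/multipart-upload/${uploadId}/urls?key=${key}&parts=${parts}`
	).then((res) => res.json());

	const completedParts = await uploadPartsToS3(file, signedUrls);

	// Hypothetical endpoint wrapping completeMultipartUpload
	await fetch(`/api/multipart-upload/${uploadId}/complete`, {
		method: 'POST',
		body: JSON.stringify({ key, parts: completedParts })
	});
}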

Updating bucket configuration

If you’ve been following along with the previous articles you might have realized already that the current bucket configuration will not be sufficient.

const bucket = new Bucket(stack, 'uploads', {
	name: 'exanubes-upload-bucket-sst',
	cors: [
		{
			allowedMethods: [HttpMethods.POST, HttpMethods.PUT],
			allowedOrigins: ['http://localhost:5173'],
			allowedHeaders: ['*'],
			exposedHeaders: ['etag']
		}
	],
	cdk: {
		//...props
	},
	notifications: {
		// ...props
	}
});

To upload the parts to s3 we used a PUT request, so it has to be added to the allowedMethods property. We also read the ETag header from the response for each of the parts, so it needs to be added to the exposedHeaders property.
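
This one is easy to miss: without the exposedHeaders entry the part uploads will still succeed, but the browser won’t let JavaScript read the ETag and completing the upload will fail. A small guard inside uploadPartsToS3 (my suggestion, not part of the original code) makes that failure mode obvious:

const etag = response.headers.get('etag');
if (etag === null) {
	// A null ETag usually means the bucket's CORS config doesn't expose the header
	throw new Error(`Missing ETag for part ${index + 1}, check the bucket's CORS configuration`);
}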

Alternative ways of aborting multipart uploads

It could happen that you end up with some orphaned multipart uploads. This could be due to a bug in your code, or maybe a user lost their internet connection and the abort command never reached AWS. In any case, you want to make sure that you clean up after yourself. There are a couple of ways to do this.

Using the AWS CLI

First you want to list all the multipart uploads that are in progress.

$ aws s3api list-multipart-uploads --bucket exanubes-upload-bucket-sst

{"Uploads": [
    {
    "UploadId": "vyeTimRN3zf44xNORCtK0PeAIzQqdXWfNO.F.NEoVVYmZZ.WihKsxX.yA8BIgQ0wZmQ_ewjy4eESMd4RRM0nMVm8cXA18CqDk7DUeg_dys jIV5f8uF5xQ0LP9ZeUo0T0r1_YX88_VKIRmIUog0p16A--",
    "Key": "e9946a16-2a6a-455e-81c7-5dcd37db8241",
    "Initiated": "2023-11-29T12:27:51+00:00",
    "StorageClass": "STANDARD",
    "Owner": {..}
    ...
    }
  ]
}

This returns a list of all the multipart uploads that are currently in progress for the specified bucket, and it provides all the information we need to abort the upload.

$ aws s3api abort-multipart-upload --upload-id UPLOAD_ID --key KEY --bucket exanubes-upload-bucket-sst

Analogous to the AbortMultipartUploadCommand, we need to provide the uploadId, key and bucket of the file we want to abort.
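
With many orphaned uploads this gets tedious, so the two commands can be combined. A sketch, assuming a Unix shell and keys that contain no whitespace:

$ aws s3api list-multipart-uploads --bucket exanubes-upload-bucket-sst \
    --query 'Uploads[].[UploadId,Key]' --output text |
  while read -r upload_id key; do
    aws s3api abort-multipart-upload --bucket exanubes-upload-bucket-sst \
      --upload-id "$upload_id" --key "$key"
  done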

Lifecycle rules

Another way to do it, and probably the most efficient one short of handling failures in code, is to define a lifecycle rule in the bucket configuration.

const bucket = new Bucket(stack, 'uploads', {
	name: 'exanubes-upload-bucket-sst',
	cors: [
		//...props
	],
	cdk: {
		bucket: {
			//...props
			lifecycleRules: [
				{
					abortIncompleteMultipartUploadAfter: Duration.days(1),
					//...props
				}
			]
		}
	},
	notifications: {
		//...props
	}
});

AWS CDK exposes a property called abortIncompleteMultipartUploadAfter that we can use to automatically abort incomplete multipart uploads after a set amount of time.
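
To verify that the rule actually made it onto the bucket after deployment, you can ask s3 for the lifecycle configuration:

$ aws s3api get-bucket-lifecycle-configuration --bucket exanubes-upload-bucket-sst

The response should contain a rule with AbortIncompleteMultipartUpload.DaysAfterInitiation set to 1.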

Summary

In this article we’ve covered how to upload large files to s3 using multipart upload and signed URLs, going through each step of the process: initiating the upload, generating signed URLs, uploading the parts and finally completing the upload. We’ve also covered how to abort multipart uploads in three different ways: with an AWS SDK command, with the AWS CLI and with bucket lifecycle rules.