In this article we will learn how to upload large files to s3 using multipart upload and signed URLs. The key point for me was to do the multipart upload using signed URLs from the client. The examples I could find all sent the file from the server to s3. I wanted to do this from the client so that I could upload files directly from the browser to s3.
You can find the finished code for this article on GitHub and I’ve also recorded a video that covers more implementation details.
What is multipart upload?
Multipart upload is a mechanism that allows you to upload large files to s3 in parts. This is useful when you have a very large file that could take a long time to upload. If the upload fails for some reason you don’t have to start from the beginning. You can just retry the missing or failed parts. Once all parts are uploaded you can complete the upload and the file will be available in s3.
Initiate a multipart upload
Unlike regular uploads, multipart uploads are a little bit more hands-on as they are a multi-step process. The first step is to tell AWS that we want to do a multipart upload so that AWS provides us with an UploadId.
export async function startMultipartUpload(props) {
  const input = {
    Bucket: props.bucket,
    Key: props.key
  };
  const command = new CreateMultipartUploadCommand(input);
  return client.send(command);
}
All we have to do is provide the bucket and the key of the file we want to upload. The response will contain the UploadId that we will need throughout the process.
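As a rough sketch of how this might be called, assuming an S3Client instance named client is configured elsewhere and that the bucket name and object key below are just examples:

// Hypothetical usage: start the upload and hold on to the UploadId for later steps.
const { UploadId } = await startMultipartUpload({
  bucket: 'exanubes-upload-bucket-sst',
  key: 'e9946a16-2a6a-455e-81c7-5dcd37db8241' // any unique object key will do
});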
Generate Signed URLs
Once we have the UploadId we can generate signed URLs for each part of the file. The signed URLs will be used to upload each part to s3.
export async function getMultipartUrls(props) {
  const input = {
    Bucket: props.bucket,
    Key: props.key,
    UploadId: props.uploadId,
    PartNumber: 0 // placeholder, overridden for each part below
  };
  const parts = Array.from({ length: props.parts }, (_, index) => index + 1);
  return Promise.all(
    parts.map((partNumber) => {
      const command = new UploadPartCommand({
        ...input,
        PartNumber: partNumber
      });
      return getSignedUrl(client, command, { expiresIn: 600 });
    })
  );
}
For this to work we need to provide the same bucket and key as before. We also need to provide the UploadId that we got from the previous step, and we need to know how many parts we’re going to be sending. In this case I opted for creating an array of part numbers and then mapped over them to generate a signed URL for each part. Each link will expire after ten minutes, so depending on how long you expect your uploads to take, this value might have to be different.
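As a rough usage sketch (the hard-coded part count and example key are just for illustration), the call might look like this:

// Hypothetical usage: request one signed URL per part of the file.
const signedUrls = await getMultipartUrls({
  bucket: 'exanubes-upload-bucket-sst',
  key: 'e9946a16-2a6a-455e-81c7-5dcd37db8241',
  uploadId: UploadId, // from CreateMultipartUploadCommand
  parts: 4 // e.g. a ~20 MB file split into 5 MiB parts
});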
Complete multipart upload
Once all parts are uploaded we need to tell AWS to complete the upload. This is done by providing the UploadId and the list of parts that were uploaded, as well as the bucket name and object key.
export async function completeMultipartUpload(props) {
  const input = {
    Bucket: props.bucket,
    Key: props.key,
    UploadId: props.uploadId,
    MultipartUpload: {
      Parts: props.parts
    }
  };
  const command = new CompleteMultipartUploadCommand(input);
  return client.send(command);
}
The Parts property is an array of objects that contain the PartNumber and the ETag of each part. The ETag is a hash of the part that AWS generates.
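For illustration, the array could look something like this (the ETag values below are made up):

// Shape of the MultipartUpload.Parts array expected by CompleteMultipartUploadCommand.
const parts = [
  { PartNumber: 1, ETag: '"a54357aff0632cce46d942af68356b38"' },
  { PartNumber: 2, ETag: '"0c78aef83f66abc1fa1e8477f296d394"' }
];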
Abort multipart upload
In case anything goes wrong, you want to make sure that you do not leave the multipart upload hanging as it will incur charges for storing the parts.
export async function abortMultipartUpload(props) {
  const input = {
    Bucket: props.bucket,
    Key: props.key,
    UploadId: props.uploadId
  };
  const command = new AbortMultipartUploadCommand(input);
  return client.send(command);
}
Analogous to previous steps, we just need to provide the UploadId and the bucket and key of the file, and AWS will handle the rest.
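To put the pieces together, here is a sketch of how the error handling could be wired up. In a real app the signed-URL functions would typically sit behind API endpoints and uploadPartsToS3 would run in the browser; this sketch glosses over that boundary just to show the flow:

// Hypothetical orchestration: abort the upload if anything fails along the way.
const { UploadId } = await startMultipartUpload({ bucket, key });
try {
  const signedUrls = await getMultipartUrls({ bucket, key, uploadId: UploadId, parts });
  const uploadedParts = await uploadPartsToS3(file, signedUrls);
  await completeMultipartUpload({ bucket, key, uploadId: UploadId, parts: uploadedParts });
} catch (error) {
  await abortMultipartUpload({ bucket, key, uploadId: UploadId });
  throw error;
}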
Uploading parts to s3
Now that we have the signed URLs we can upload each part to s3.
export async function uploadPartsToS3(file, signedUrls) {
  return Promise.all(
    signedUrls.map(async (signedUrl, index) => {
      const start = index * MINIMUM_PART_SIZE;
      const end = (index + 1) * MINIMUM_PART_SIZE;
      const isLastPart = index === signedUrls.length - 1;
      const part = isLastPart ? file.slice(start) : file.slice(start, end);
      const response = await fetch(signedUrl, {
        method: 'PUT',
        body: part
      });
      /** @type {import('@aws-sdk/client-s3').CompletedPart} */
      return {
        ETag: response.headers.get('etag'),
        PartNumber: index + 1
      };
    })
  );
}
A lot is happening here. First, I needed to define the minimum part size, then I used it in conjunction with the index of the part to slice the file into chunks. Then, using a PUT method, I send the part of the file as the request body to the signed URL generated previously. Last but not least, I’m returning an ETag and PartNumber to be used when completing the multipart upload in the MultipartUpload.Parts property.
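The MINIMUM_PART_SIZE constant isn’t shown above; a reasonable definition, based on the fact that S3 requires every part except the last one to be at least 5 MiB, would be:

// Every part except the last one must be at least 5 MiB.
const MINIMUM_PART_SIZE = 5 * 1024 * 1024;
// The number of parts to request signed URLs for can then be derived from the file size.
const parts = Math.ceil(file.size / MINIMUM_PART_SIZE);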
Updating bucket configuration
If you’ve been following along with the previous articles you might have realized already that the current bucket configuration will not be sufficient.
const bucket = new Bucket(stack, 'uploads', {
  name: 'exanubes-upload-bucket-sst',
  cors: [
    {
      allowedMethods: [HttpMethods.POST, HttpMethods.PUT],
      allowedOrigins: ['http://localhost:5173'],
      allowedHeaders: ['*'],
      exposedHeaders: ['etag']
    }
  ],
  cdk: {
    //...props
  },
  notifications: {
    // ...props
  }
});
To upload the parts to s3 we used a PUT request, so it has to be added to the allowedMethods property, and we’re using the ETag header from the response for each of the parts, so that needs to be added to the exposedHeaders property.
Alternative ways for aborting multipart uploads
It could happen that you end up with some orphaned multipart uploads. This could be due to a bug in your code or maybe a user lost internet connection and the abort command couldn’t reach AWS. In any case, you want to make sure that you clean up after yourself. There are a couple of ways to do this.
Using the AWS CLI
First you want to list all the multipart uploads that are in progress.
$ aws s3api list-multipart-uploads --bucket exanubes-upload-bucket-sst
{
  "Uploads": [
    {
      "UploadId": "vyeTimRN3zf44xNORCtK0PeAIzQqdXWfNO.F.NEoVVYmZZ.WihKsxX.yA8BIgQ0wZmQ_ewjy4eESMd4RRM0nMVm8cXA18CqDk7DUeg_dysjIV5f8uF5xQ0LP9ZeUo0T0r1_YX88_VKIRmIUog0p16A--",
      "Key": "e9946a16-2a6a-455e-81c7-5dcd37db8241",
      "Initiated": "2023-11-29T12:27:51+00:00",
      "StorageClass": "STANDARD",
      "Owner": {..}
      ...
    }
  ]
}
This returns a list of all the multipart uploads that are currently in progress for the specified bucket and provides all the information we need to abort the upload:
$ aws s3api abort-multipart-upload --upload-id UPLOAD_ID --key KEY --bucket exanubes-upload-bucket-sst
Analogous to the AbortMultipartUploadCommand, we need to provide the uploadId, key and bucket of the file we want to abort.
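If you’d rather do this cleanup from code, the same two steps can be combined with the SDK. Here is a minimal sketch, assuming the same client as in the earlier snippets and ignoring pagination of the list results; abortAllMultipartUploads is just an illustrative name:

import { ListMultipartUploadsCommand, AbortMultipartUploadCommand } from '@aws-sdk/client-s3';

// List every in-progress multipart upload in the bucket and abort each one.
export async function abortAllMultipartUploads(bucket) {
  const { Uploads = [] } = await client.send(
    new ListMultipartUploadsCommand({ Bucket: bucket })
  );
  return Promise.all(
    Uploads.map(({ Key, UploadId }) =>
      client.send(new AbortMultipartUploadCommand({ Bucket: bucket, Key, UploadId }))
    )
  );
}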
Lifecycle rules
Another way to do it, and probably the most efficient one short of handling failures in the code, is to define a lifecycle rule in the bucket configuration.
const bucket = new Bucket(stack, 'uploads', {
  name: 'exanubes-upload-bucket-sst',
  cors: [
    //...props
  ],
  cdk: {
    bucket: {
      //...props
      lifecycleRules: [
        {
          abortIncompleteMultipartUploadAfter: Duration.days(1),
          //...props
        }
      ]
    }
  },
  notifications: {
    //...props
  }
});
AWS CDK exposes a property called abortIncompleteMultipartUploadAfter that we can use to automatically abort incomplete multipart uploads after the set amount of time.
Summary
In this article we’ve covered how to upload large files to s3 using multipart upload and signed URLs, going through each step of the process, from initiating the upload, through generating signed URLs, to completing the upload. We’ve also covered how to abort multipart uploads in three different ways: using an AWS SDK command, using the AWS CLI, and using bucket lifecycle rules.