Storing files in cloud storage is now a standard, and allowing users to upload files is a common feature for a web application. Specifically, files are generally uploaded to a server and then to a cloud storage service, or directly to the cloud storage service. When dealing with small files, this is an easy task to accomplish. On the other hand, when it comes to uploading large files, it can become challenging.
This is why Amazon S3 and other S3-compatible cloud storage services support multipart upload. This technique allows you to split a file into several small chunks and upload them sequentially or in parallel, which makes large files much easier to handle.
First, let’s learn what multipart upload is, how it works, and why the best approach to it involves S3 pre-signed URLs. Then, let’s see how to implement everything required to get started with multipart uploads, both backend and frontend, through a demo application built in Node.js and React.
Prerequisites
You need these prerequisites to replicate the article’s examples:
- valid S3 credentials
- Node.js and npm 5.2+
- React >= 17.x
- aws-sdk >= 2.x
aws-sdk is the AWS SDK for JavaScript, and you can install it with the following npm command:
    npm install --save aws-sdk
What is multipart upload?
As explained in the Amazon S3 documentation, multipart upload allows you to upload a single object as a set of parts, and it is typically used with large files. This technique allows you to split a file into many parts and upload them in parallel, pause and resume the operation whenever you need, and even start uploading an object before knowing its total size.
In other words, multipart upload enables faster, more controllable, and more flexible uploads into any Amazon S3-like cloud storage.
How does multipart upload work?
After uploading all parts of your object, you have to tell your cloud storage that the operation has been completed. Then, the data uploaded will be presented as a single object, equal to the uploaded file.
So, these are the steps involved in a multipart upload request:
- Splitting an object into many parts
- Initiating the multipart upload process
- Uploading each part
- Completing the multipart upload process
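The first step, splitting, is plain arithmetic. Here is a minimal sketch, assuming a hypothetical helper named planParts (it is not part of the demo application) that returns byte offsets with an exclusive end, which is the convention used by Blob.slice():

```javascript
// Hypothetical helper: computes the byte range of every part of a file.
// Part numbers start at 1, which is what S3 expects.
function planParts(fileSize, chunkSize) {
  if (!Number.isInteger(fileSize) || fileSize < 0) {
    throw new RangeError("fileSize must be a non-negative integer")
  }
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer")
  }
  const parts = []
  for (let start = 0, partNumber = 1; start < fileSize; start += chunkSize, partNumber++) {
    parts.push({
      PartNumber: partNumber,
      start,
      end: Math.min(start + chunkSize, fileSize), // exclusive end offset
    })
  }
  return parts
}

const MB = 1024 * 1024
console.log(planParts(12 * MB, 5 * MB))
```

With a 12 MB file and 5 MB chunks, this yields three parts: two of 5 MB and a final one of 2 MB.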
This approach to uploading is generally used in the frontend to let users upload large files. In detail, the official documentation recommends considering the multipart approach once an object reaches about 100 MB. But notice that you can use multipart upload with any file, regardless of its size.
Multipart upload with S3 pre-signed URLs
The best frontend approach to multipart upload involves pre-signed URLs. An S3 pre-signed URL is a URL signed with an AWS access key that temporarily grants you restricted access to a particular S3 object. With an S3 pre-signed URL, you can perform a GET or PUT operation for a predefined time limit. By default, a pre-signed URL generated with aws-sdk expires in 15 minutes.
S3 pre-signed URLs are particularly useful because they allow you to keep your S3 credentials and buckets private, granting access to a resource for a limited time only. All you have to do is generate them in your backend and then provide them to the frontend side.
Therefore, pre-signed URLs are a secure way to upload any kind of file to your S3 bucket. Plus, they allow you to avoid creating and managing roles, as well as changing the bucket ACL or providing users with special accounts to upload files.
As you can imagine, they are especially useful when it comes to multipart uploads. This is because you can generate a pre-signed URL for each part your original object was split into, and then upload the part with its respective URL.
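If 15 minutes is too short or too long for your parts, the lifetime can be set explicitly. The following is a minimal sketch, assuming an AWS.S3 instance from aws-sdk v2 is passed in as s3. The signPartUrl name and the shape of its options object are hypothetical; getSignedUrlPromise() and its Expires parameter (in seconds) come from the SDK:

```javascript
// Sketch: sign one "uploadPart" URL with an explicit lifetime.
// `s3` is expected to be an AWS.S3 instance from aws-sdk v2. It is passed in
// so the function does not depend on how the client was configured.
function signPartUrl(s3, { bucket, key, uploadId, partNumber, expiresInSeconds = 900 }) {
  return s3.getSignedUrlPromise("uploadPart", {
    Bucket: bucket,
    Key: key,
    UploadId: uploadId,
    PartNumber: partNumber,
    Expires: expiresInSeconds, // lifetime of the URL, in seconds (900 = 15 minutes)
  })
}
```

Large parts on slow connections may need more than the default, since a URL that expires before its upload starts is rejected by the storage service.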
If you want to follow an approach to multipart upload involving pre-signed URLs, you have to consider a new step. This is the list of all steps required:
- Splitting an object into many parts
- Initiating the multipart upload process
- Creating a pre-signed URL for each part
- Uploading each part through its pre-signed URL
- Completing the multipart upload process
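To see how the five steps fit together, here is a deliberately simple sketch that uploads one part at a time. Everything it depends on is hypothetical and supplied by the caller: api stands for a client of your backend, and putPart for a function that sends one chunk to its pre-signed URL and resolves to the part's ETag. It is meant to illustrate the order of operations, not to replace a parallel uploader:

```javascript
// Sketch of the five steps, run one part at a time.
// `api` and `putPart` are hypothetical stand-ins supplied by the caller:
//   api.initialize(name)       -> { fileId, fileKey }
//   api.getPartUrls({ ... })   -> [{ signedUrl, PartNumber }]
//   api.finalize({ ... })      -> completes the upload
//   putPart(signedUrl, chunk)  -> resolves to the ETag of the stored part
async function uploadSequentially({ file, chunkSize, api, putPart }) {
  // steps 1 and 2: decide how many parts there are and open the upload
  const partCount = Math.ceil(file.size / chunkSize)
  const { fileId, fileKey } = await api.initialize(file.name)

  // step 3: one pre-signed URL per part
  const urls = await api.getPartUrls({ fileId, fileKey, parts: partCount })

  // step 4: upload each slice through its own URL
  const uploadedParts = []
  for (const { signedUrl, PartNumber } of urls) {
    const start = (PartNumber - 1) * chunkSize
    const chunk = file.slice(start, start + chunkSize)
    const ETag = await putPart(signedUrl, chunk)
    uploadedParts.push({ PartNumber, ETag })
  }

  // step 5: tell the storage service that every part is in place
  await api.finalize({ fileId, fileKey, parts: uploadedParts })
  return uploadedParts
}
```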
Now, let’s delve into the pros and cons of the pre-signed URL approach to multipart upload.
Pros
- It comes with all the benefits of multipart upload, such as improved throughput due to parallel uploads, quick recovery in case of network issues because each part is small and easier to resend compared to the entire file, and the ability to pause and resume object uploads
- It is inherently secure
- It removes the need to have roles or accounts with special restrictions
- It is a maintainable and easy-to-understand approach to multipart upload
Cons
- It involves extra logic, both backend and frontend
- It requires you to know the object size beforehand to generate all pre-signed URLs
Implementing multipart upload with S3 pre-signed URLs
You can clone the GitHub repository supporting this article and immediately try the demo application by launching the following commands:
    git clone https://github.com/Tonel/multipart-upload-js-demo
    cd multipart-upload-js-demo
    npm i
    npm run backend:start
    npm run frontend:start
To make the application work, you have to address the “TODO” left in the /backend/controllers/upload.js file asking you to add your S3 credentials in the code.
Otherwise, keep following this step-by-step tutorial to learn how to build it and how it works.
Multipart upload in Node.js
To implement multipart upload with S3 pre-signed URLs, you need three APIs. But first, you need an AWS.S3 instance.
You can initialize it using your S3 credentials and the aws-sdk library as follows:
    const AWS = require("aws-sdk")
    const s3Endpoint = new AWS.Endpoint("<YOUR_ENDPOINT>")
    const s3Credentials = new AWS.Credentials({
      accessKeyId: "<YOUR_ACCESS_KEY>",
      secretAccessKey: "<YOUR_SECRET_KEY>",
    })
    const s3 = new AWS.S3({
      endpoint: s3Endpoint,
      credentials: s3Credentials,
    })
Now, let’s delve deeper into how to build the three APIs to implement multipart upload:
- POST /uploads/initializeMultipartUpload
    async function initializeMultipartUpload(req, res) {
      const { name } = req.body
      const multipartParams = {
        // BUCKET_NAME holds the name of your bucket, as in the other handlers
        Bucket: BUCKET_NAME,
        Key: `${name}`,
        ACL: "public-read",
      }
      const multipartUpload = await s3.createMultipartUpload(multipartParams).promise()
      // UploadId and Key are needed by every following request
      res.send({
        fileId: multipartUpload.UploadId,
        fileKey: multipartUpload.Key,
      })
    }
The name parameter passed in the body of the request represents the name of the file that will be created in the cloud storage at the end of the multipart upload operation. This API takes care of initializing a multipart upload request by calling the createMultipartUpload() function available from the previously created s3 object.
This API is necessary because to perform a multipart upload request, you need the UploadId value. The cloud storage service uses it to associate all the parts involved in the multipart upload at the end of the process, while the Key parameter represents the full name of the file.
- POST /uploads/getMultipartPreSignedUrls
    async function getMultipartPreSignedUrls(req, res) {
      const { fileKey, fileId, parts } = req.body
      const multipartParams = {
        Bucket: BUCKET_NAME,
        Key: fileKey,
        UploadId: fileId,
      }
      // requesting one pre-signed URL per part
      const promises = []
      for (let index = 0; index < parts; index++) {
        promises.push(
          s3.getSignedUrlPromise("uploadPart", {
            ...multipartParams,
            PartNumber: index + 1,
          }),
        )
      }
      const signedUrls = await Promise.all(promises)
      // assign to each URL the index of the part to which it corresponds
      const partSignedUrlList = signedUrls.map((signedUrl, index) => {
        return {
          signedUrl: signedUrl,
          PartNumber: index + 1,
        }
      })
      res.send({
        parts: partSignedUrlList,
      })
    }
This API is responsible for returning the pre-signed URLs associated with the parts involved in the multipart request. It requires the fileKey and fileId parameters returned by the previous API, and the number of parts the original file was split into.
Then, it uses this info to generate the S3 pre-signed URLs by calling the getSignedUrlPromise() function on the s3 object. As you can see, the PartNumber parameter telling which part the URL is associated with is an index starting from 1. Using an index starting from 0 would lead to an error.
- POST /uploads/finalizeMultipartUpload
    const _ = require("lodash")
    async function finalizeMultipartUpload(req, res) {
      const { fileId, fileKey, parts } = req.body
      const multipartParams = {
        Bucket: BUCKET_NAME,
        Key: fileKey,
        UploadId: fileId,
        MultipartUpload: {
          // ordering the parts to make sure they are in the right order
          Parts: _.orderBy(parts, ["PartNumber"], ["asc"]),
        },
      }
      const completeMultipartUploadOutput = await s3.completeMultipartUpload(multipartParams).promise()
      // completeMultipartUploadOutput.Location represents the
      // URL to the resource just uploaded to the cloud storage
      res.send()
    }
This last API finalizes a multipart upload request. Again, it requires the fileId and fileKey coming from the first API. Also, it requires the parts parameter, which is a list of objects having the following type:
    {
      PartNumber: number
      ETag: string
    }
As defined in the official documentation, an ETag is an ID that identifies a newly created object’s data. As you will see soon, this can be retrieved in the response header of a successful upload request executed with a pre-signed URL.
Then, this data is used to call the completeMultipartUpload() function, which finalizes the multipart upload request and makes the uploaded object available in the cloud storage. Notice that the Parts field of MultipartUpload must be a list ordered by part number, and the Lodash orderBy() function is used to ensure that.
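If you would rather not depend on Lodash for this single call, the same ordering can be obtained with the built-in Array.prototype.sort(). This is a small sketch of that alternative, with sortParts being a hypothetical name:

```javascript
// Sketch: order parts by PartNumber without Lodash, leaving the input untouched.
function sortParts(parts) {
  return [...parts].sort((a, b) => a.PartNumber - b.PartNumber)
}

const received = [
  { PartNumber: 3, ETag: "c" },
  { PartNumber: 1, ETag: "a" },
  { PartNumber: 2, ETag: "b" },
]
console.log(sortParts(received).map((part) => part.PartNumber)) // [ 1, 2, 3 ]
```

Copying the array first matters because sort() reorders in place, and the request body is best left as it was received.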
Now, you have everything required to start performing multipart upload requests on your frontend application. Let’s learn how.
Multipart upload in React
Dealing with multipart upload on the frontend is a bit tricky. This is particularly true if you want to upload many parts in parallel and plan to provide users with the ability to abort the operation. So, instead of reinventing the wheel, you should adapt the utility class from the multithreaded-uploader repository (linked in the code below) to your needs.
Specifically, you can implement the following utility class to perform multipart upload:
    import axios from "axios"

    // initializing axios
    const api = axios.create({
      baseURL: "http://localhost:3000",
    })

    // original source: https://github.com/pilovm/multithreaded-uploader/blob/master/frontend/uploader.js
    export class Uploader {
      constructor(options) {
        // this must be bigger than or equal to 5MB,
        // otherwise AWS will respond with:
        // "Your proposed upload is smaller than the minimum allowed size"
        this.chunkSize = options.chunkSize || 1024 * 1024 * 5
        // number of parallel uploads
        this.threadsQuantity = Math.min(options.threadsQuantity || 5, 15)
        this.file = options.file
        this.fileName = options.fileName
        this.aborted = false
        this.uploadedSize = 0
        this.progressCache = {}
        this.activeConnections = {}
        this.parts = []
        this.uploadedParts = []
        this.fileId = null
        this.fileKey = null
        this.onProgressFn = () => {}
        this.onErrorFn = () => {}
      }

      // starting the multipart upload request
      start() {
        this.initialize()
      }

      async initialize() {
        try {
          // adding the file extension (if present) to fileName
          let fileName = this.fileName
          const ext = this.file.name.split(".").pop()
          if (ext) {
            fileName += `.${ext}`
          }

          // initializing the multipart request
          const videoInitializationUploadInput = {
            name: fileName,
          }
          const initializeResponse = await api.request({
            url: "/uploads/initializeMultipartUpload",
            method: "POST",
            data: videoInitializationUploadInput,
          })
          const AWSFileDataOutput = initializeResponse.data
          this.fileId = AWSFileDataOutput.fileId
          this.fileKey = AWSFileDataOutput.fileKey

          // retrieving the pre-signed URLs
          const numberOfParts = Math.ceil(this.file.size / this.chunkSize)
          const AWSMultipartFileDataInput = {
            fileId: this.fileId,
            fileKey: this.fileKey,
            parts: numberOfParts,
          }
          const urlsResponse = await api.request({
            url: "/uploads/getMultipartPreSignedUrls",
            method: "POST",
            data: AWSMultipartFileDataInput,
          })
          const newParts = urlsResponse.data.parts
          this.parts.push(...newParts)

          this.sendNext()
        } catch (error) {
          await this.complete(error)
        }
      }

      // uploading the next queued part, if a connection slot is free
      sendNext() {
        const activeConnections = Object.keys(this.activeConnections).length
        if (activeConnections >= this.threadsQuantity) {
          return
        }
        if (!this.parts.length) {
          if (!activeConnections) {
            this.complete()
          }
          return
        }
        const part = this.parts.pop()
        if (this.file && part) {
          const sentSize = (part.PartNumber - 1) * this.chunkSize
          const chunk = this.file.slice(sentSize, sentSize + this.chunkSize)
          const sendChunkStarted = () => {
            this.sendNext()
          }
          this.sendChunk(chunk, part, sendChunkStarted)
            .then(() => {
              this.sendNext()
            })
            .catch((error) => {
              // putting the failed part back into the queue
              this.parts.push(part)
              this.complete(error)
            })
        }
      }

      // terminating the multipart upload request on success or failure
      async complete(error) {
        if (error) {
          this.onErrorFn(error)
          return
        }
        try {
          await this.sendCompleteRequest()
        } catch (error) {
          this.onErrorFn(error)
        }
      }

      // finalizing the multipart upload request on success by calling
      // the finalization API
      async sendCompleteRequest() {
        if (this.fileId && this.fileKey) {
          const videoFinalizationMultiPartInput = {
            fileId: this.fileId,
            fileKey: this.fileKey,
            parts: this.uploadedParts,
          }
          await api.request({
            url: "/uploads/finalizeMultipartUpload",
            method: "POST",
            data: videoFinalizationMultiPartInput,
          })
        }
      }

      sendChunk(chunk, part, sendChunkStarted) {
        return new Promise((resolve, reject) => {
          this.upload(chunk, part, sendChunkStarted)
            .then((status) => {
              if (status !== 200) {
                reject(new Error("Failed chunk upload"))
                return
              }
              resolve()
            })
            .catch((error) => {
              reject(error)
            })
        })
      }

      // calculating the current progress of the multipart upload request
      handleProgress(part, event) {
        if (this.file) {
          if (event.type === "progress" || event.type === "error" || event.type === "abort") {
            this.progressCache[part] = event.loaded
          }
          if (event.type === "uploaded") {
            this.uploadedSize += this.progressCache[part] || 0
            delete this.progressCache[part]
          }
          const inProgress = Object.keys(this.progressCache)
            .map(Number)
            .reduce((memo, id) => (memo += this.progressCache[id]), 0)
          const sent = Math.min(this.uploadedSize + inProgress, this.file.size)
          const total = this.file.size
          const percentage = Math.round((sent / total) * 100)
          this.onProgressFn({
            sent: sent,
            total: total,
            percentage: percentage,
          })
        }
      }

      // uploading a part through its pre-signed URL
      upload(file, part, sendChunkStarted) {
        return new Promise((resolve, reject) => {
          if (this.fileId && this.fileKey) {
            // - 1 because PartNumber is an index starting from 1 and not 0
            const xhr = (this.activeConnections[part.PartNumber - 1] = new XMLHttpRequest())
            sendChunkStarted()
            const progressListener = this.handleProgress.bind(this, part.PartNumber - 1)
            xhr.upload.addEventListener("progress", progressListener)
            xhr.addEventListener("error", progressListener)
            xhr.addEventListener("abort", progressListener)
            xhr.addEventListener("loadend", progressListener)
            xhr.open("PUT", part.signedUrl)
            xhr.onreadystatechange = () => {
              // status 0 means a network error or an abort,
              // which are handled by onerror and onabort below
              if (xhr.readyState !== 4 || xhr.status === 0) {
                return
              }
              delete this.activeConnections[part.PartNumber - 1]
              // retrieving the ETag parameter from the HTTP headers
              const ETag = xhr.status === 200 ? xhr.getResponseHeader("ETag") : null
              if (!ETag) {
                // rejecting here prevents the promise from staying pending
                // when the storage service refuses the part
                reject(new Error(`Failed chunk upload (status ${xhr.status})`))
                return
              }
              const uploadedPart = {
                PartNumber: part.PartNumber,
                // removing the " enclosing characters from
                // the raw ETag
                ETag: ETag.replaceAll('"', ""),
              }
              this.uploadedParts.push(uploadedPart)
              resolve(xhr.status)
            }
            xhr.onerror = (error) => {
              reject(error)
              delete this.activeConnections[part.PartNumber - 1]
            }
            xhr.onabort = () => {
              reject(new Error("Upload canceled by user"))
              delete this.activeConnections[part.PartNumber - 1]
            }
            xhr.send(file)
          }
        })
      }

      onProgress(onProgress) {
        this.onProgressFn = onProgress
        return this
      }

      onError(onError) {
        this.onErrorFn = onError
        return this
      }

      abort() {
        Object.keys(this.activeConnections)
          .map(Number)
          .forEach((id) => {
            this.activeConnections[id].abort()
          })
        this.aborted = true
      }
    }
As you are about to see, this utility class allows you to perform a multipart request in a few lines of code. The Uploader utility class uses the axios API client, but any other Promise-based HTTP client will do.
This utility class splits the file received in the constructor into smaller parts of 5 MB each by default. It first initializes the multipart upload request and retrieves the pre-signed URLs, then uploads the parts (5 in parallel by default, and never more than 15), and finally calls the finalization API defined in the previous section to complete the request.
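One detail worth keeping in mind when choosing the chunk size: S3 accepts at most 10,000 parts per upload, and every part except the last must be at least 5 MiB. With very large files, a fixed 5 MB chunk can therefore exceed the part limit. The following sketch, built around a hypothetical chooseChunkSize helper that is not part of the demo code, picks the smallest chunk size that stays within both limits:

```javascript
const MIN_PART_SIZE = 5 * 1024 * 1024 // 5 MiB, the S3 minimum for every part but the last
const MAX_PARTS = 10000 // S3 accepts part numbers from 1 to 10,000

// Hypothetical helper: smallest chunk size that keeps a file within MAX_PARTS.
function chooseChunkSize(fileSize, preferred = MIN_PART_SIZE) {
  const needed = Math.ceil(fileSize / MAX_PARTS)
  return Math.max(preferred, MIN_PART_SIZE, needed)
}

const GiB = 1024 ** 3
console.log(chooseChunkSize(1 * GiB)) // the 5 MiB minimum is enough here
console.log(chooseChunkSize(100 * GiB)) // larger chunks are needed to stay under 10,000 parts
```

The result could then be passed to the Uploader as its chunkSize option.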
In case of an error, the sendNext() function takes care of putting the part whose upload failed back into the queue. In case of fatal errors or deliberate interruption, the upload process is stopped.
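If you prefer to retry a failed part a fixed number of times before giving up, one option is a small wrapper around the upload call. This is only a sketch of that idea and not what the Uploader class does. The withRetry name and its default of three attempts are assumptions:

```javascript
// Hypothetical helper: retries an async task a limited number of times.
// `task` receives the current attempt number, starting from 1.
async function withRetry(task, maxAttempts = 3) {
  let lastError
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt)
    } catch (error) {
      lastError = error // remember the failure and try again
    }
  }
  // every attempt failed: surface the last error to the caller
  throw lastError
}
```

A part upload could then be wrapped as withRetry(() => sendPart(part)), where sendPart stands for whatever function performs the PUT request in your code.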
The most relevant part of the utility class is represented by the upload() function. This is where each part is uploaded through a pre-signed URL and its corresponding ETag value is retrieved.
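Stripped of progress tracking and cancellation, the core of that step is a single PUT request. The sketch below shows it with the Fetch API instead of XMLHttpRequest. The putPart name and the injectable fetchImpl parameter are assumptions made for illustration. Note that fetch does not report upload progress, which is one reason to stay with XMLHttpRequest when you need a progress bar:

```javascript
// Sketch: upload one chunk through its pre-signed URL and return its ETag.
// `fetchImpl` defaults to the global fetch (browsers, Node.js 18+) and can be
// swapped for a stub in tests.
async function putPart(signedUrl, chunk, fetchImpl = fetch) {
  const response = await fetchImpl(signedUrl, { method: "PUT", body: chunk })
  if (!response.ok) {
    throw new Error(`Part upload failed with status ${response.status}`)
  }
  const rawETag = response.headers.get("ETag")
  if (!rawETag) {
    // In a browser this usually means the bucket CORS rules do not expose ETag
    throw new Error("ETag header missing from the upload response")
  }
  // same clean-up as in the Uploader class: drop the enclosing quotes
  return rawETag.replaceAll('"', "")
}
```

In a browser, the ETag header is only readable if the bucket's CORS configuration lists it among the exposed headers.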
Now, let’s see how you can employ the Uploader class:
    import "./App.css"
    import { Uploader } from "./utils/upload"
    import { useEffect, useState } from "react"

    export default function App() {
      const [file, setFile] = useState(undefined)
      const [uploader, setUploader] = useState(undefined)

      useEffect(() => {
        if (file) {
          let percentage = undefined
          const videoUploaderOptions = {
            fileName: "foo",
            file: file,
          }
          const uploader = new Uploader(videoUploaderOptions)
          setUploader(uploader)
          uploader
            .onProgress(({ percentage: newPercentage }) => {
              // to avoid the same percentage to be logged twice
              if (newPercentage !== percentage) {
                percentage = newPercentage
                console.log(`${percentage}%`)
              }
            })
            .onError((error) => {
              setFile(undefined)
              console.error(error)
            })
          uploader.start()
        }
      }, [file])

      const onCancel = () => {
        if (uploader) {
          uploader.abort()
          setFile(undefined)
        }
      }

      return (
        <div className="App">
          <h1>Upload your file</h1>
          <div>
            <input
              type="file"
              onChange={(e) => {
                setFile(e.target?.files?.[0])
              }}
            />
          </div>
          <div>
            <button onClick={onCancel}>Cancel</button>
          </div>
        </div>
      )
    }
As soon as a file is selected through the <input> HTML element with type="file", the useEffect() hook runs. There, the Uploader utility class is used to manage the multipart upload request automatically. While the upload process is taking place, you can press the Cancel button to abort the operation.
Et voilà! Your demo application to upload large files through multipart upload is ready!
Conclusion
In this article, we looked at what S3 multipart upload is, how it works, why you should implement it by using S3 pre-signed URLs, and how to do it in Node.js and React.
As explained, multipart upload is an efficient, officially recommended, controllable way to deal with uploads of large files. This is particularly true when using S3 pre-signed URLs, which allow you to perform multipart upload in a secure way without exposing any info about your buckets.
Putting in place a server in Node.js to implement multipart upload with pre-signed URLs involves only three APIs, and it cannot be considered a difficult task. On the other hand, dealing with file splitting and parallel upload on the frontend side is a bit more tricky, but it can definitely be implemented. As we saw, you only have to define a general-purpose and reusable utility class to deal with any multipart uploads, and here we learned how to do that.
Thanks for reading! I hope that you found this article helpful. Feel free to reach out to me with any questions, comments, or suggestions.
The post Multipart uploads with S3 in Node.js and React appeared first on LogRocket Blog.