While on-premises object storage might not grab the spotlight, cloud object storage is a powerhouse, with AWS S3 leading the charge. Exact numbers are tough to pin down, but recent estimates put S3 at nearly half a quadrillion objects, a staggering leap from the one trillion it crossed in 2012. That's a whole lot of data, measured in exabytes.
S3 stands for Simple Storage Service, and it's a cornerstone of AWS's cloud offerings. Its API has become the de facto standard for object storage, inspiring many vendors to build S3-compatible solutions for on-site deployments. AWS and other cloud providers offer various service levels, while storage software and hardware makers have climbed aboard the S3 bandwagon too.
So, what exactly is S3 storage? It allows for the storage of any data type (images, videos, documents) though it isn't a fit for every application; databases, for instance, are better served elsewhere. Each piece of data, or object, gets a unique identifier, known as a key, and that is what sets object storage apart from traditional file systems: there's no hierarchy, and the data can be stored anywhere in the system as long as it's linked to that key.
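As a rough sketch of that model, here's how storing and retrieving an object by its key looks with boto3, AWS's Python SDK (the bucket and key names here are made up for illustration):

```python
import boto3

s3 = boto3.client("s3")  # credentials come from the environment or ~/.aws

# Store an object: just bytes plus a unique key. The key "reports/2024/q1.pdf"
# looks like a path, but to S3 it is one flat, opaque identifier.
s3.put_object(
    Bucket="example-bucket",          # hypothetical bucket name
    Key="reports/2024/q1.pdf",
    Body=open("q1.pdf", "rb"),
)

# Retrieve it the same way: no directory traversal, just the key.
obj = s3.get_object(Bucket="example-bucket", Key="reports/2024/q1.pdf")
data = obj["Body"].read()
```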
Every S3 object comes with metadata, some of which AWS generates automatically, including details like date, size, encryption, and access information. Users can also attach their own custom metadata for better organization and management. A single PUT operation can upload an object of up to 5GB (the S3 Console caps single uploads at 160GB), but objects can reach 5TB using multipart upload, which splits a file into as many as 10,000 parts.
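To make that concrete, here's a boto3 sketch that attaches custom metadata to an object and lets the SDK handle multipart uploads automatically for large files (the names and threshold are illustrative, not prescriptive):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Attach user-defined metadata at upload time; S3 stores these as
# x-amz-meta-* headers alongside the system metadata it generates itself.
s3.put_object(
    Bucket="example-bucket",
    Key="videos/demo.mp4",
    Body=open("demo.mp4", "rb"),
    Metadata={"department": "marketing", "reviewed": "true"},
)

# Read the metadata back without downloading the object body.
head = s3.head_object(Bucket="example-bucket", Key="videos/demo.mp4")
print(head["Metadata"])  # {'department': 'marketing', 'reviewed': 'true'}

# For large files, upload_file switches to multipart upload automatically
# once the file crosses the configured threshold (100MB in this sketch).
config = TransferConfig(multipart_threshold=100 * 1024 * 1024)
s3.upload_file("big-backup.tar", "example-bucket", "backups/big-backup.tar",
               Config=config)
```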
In S3, objects sit in buckets. Buckets are the backbone of the storage structure and are tied to specific AWS regions. Each customer manages their own buckets, controlling access, setting rules, tracking costs, and managing replication. The S3 Console provides an easy way to upload, download, and manage everything within those buckets. And though the console displays folders, they're really just shared key prefixes: labels, not actual components of the storage structure.
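A brief sketch of that structure: creating a bucket pinned to a region, then listing objects by prefix, which is all a "folder" really is (the bucket name and region are hypothetical):

```python
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

# Buckets are tied to a region at creation time.
s3.create_bucket(
    Bucket="example-bucket",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)

# "Folders" are just key prefixes. Listing with a prefix and delimiter makes
# S3 group keys as if they were directories, but no directories actually exist.
resp = s3.list_objects_v2(Bucket="example-bucket", Prefix="photos/", Delimiter="/")
for item in resp.get("Contents", []):
    print(item["Key"])
```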
When it comes to commands, S3 relies on basic HTTP methods such as GET, PUT, and DELETE, issued through the AWS Console, the command-line interface, or the API. These commands handle tasks like creating buckets, changing permissions, or syncing files with local directories.
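One way to see that HTTP layer directly is a presigned URL, which wraps a single S3 operation as a plain HTTP request anyone can issue for a limited time. A sketch with boto3 (the bucket and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Each SDK call maps to an HTTP verb: get_object is a GET, put_object a PUT,
# delete_object a DELETE. A presigned URL exposes that mapping directly: the
# URL below is a plain HTTP GET, valid for one hour, no SDK required to use it.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-bucket", "Key": "reports/2024/q1.pdf"},
    ExpiresIn=3600,
)
print(url)  # fetchable with curl, a browser, or any HTTP client

# And the DELETE verb, via the SDK:
s3.delete_object(Bucket="example-bucket", Key="reports/2024/q1.pdf")
```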
AWS offers several storage classes within S3, ranging from options for frequently accessed data to those designed for archiving. S3 Standard is the choice for swift, frequent access. When data isn't accessed as often, S3 Standard-IA and S3 One Zone-IA cost less per gigabyte, and archival classes like Glacier Instant Retrieval and Glacier Deep Archive drive costs down further in exchange for retrieval trade-offs. There's also S3 Intelligent-Tiering, which automatically moves objects between access tiers as their usage patterns change.
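Storage classes can be set per object at upload time, or managed automatically with a lifecycle rule. A sketch of both approaches (the bucket, prefix, and transition windows are made up for the example):

```python
import boto3

s3 = boto3.client("s3")

# Choose a class explicitly when the access pattern is known up front.
s3.put_object(
    Bucket="example-bucket",
    Key="archive/2023-snapshot.tar",
    Body=open("2023-snapshot.tar", "rb"),
    StorageClass="STANDARD_IA",
)

# Or let a lifecycle rule migrate aging objects down the tiers over time.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }],
    },
)
```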
However, while S3 is incredibly versatile and cost-effective, it's not the best fit for every situation. It excels at bulk storage of unstructured data, such as backups or disaster recovery copies. But for low-latency access, especially database transactions, it doesn't stack up against block storage.
For those who want to keep things on-premises, AWS offers S3 on Outposts, letting you store data on-site. And since S3 boils down to HTTP verbs and a REST API, numerous other providers offer S3-compatible options. Companies like Cloudian, Dell, MinIO, NetApp, Pure Storage, QNAP, Red Hat, Scality, and StoneFly provide on-premises solutions that speak the S3 protocol.
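Because the protocol is the interface, pointing a standard S3 client at one of these systems usually just means overriding the endpoint. A sketch against a local MinIO server (the endpoint and credentials are placeholders for whatever your deployment uses):

```python
import boto3

# The same boto3 client works against any S3-compatible backend; only the
# endpoint and credentials change. This assumes a MinIO server listening
# locally on port 9000 with demo credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minio-access-key",      # placeholder
    aws_secret_access_key="minio-secret-key",  # placeholder
)

s3.put_object(Bucket="local-bucket", Key="hello.txt", Body=b"hello")
print(s3.get_object(Bucket="local-bucket", Key="hello.txt")["Body"].read())
```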