Storage
AWS storage services are grouped into three categories: block storage, file storage, and object storage.
- Block storage is raw storage in which the hardware storage device is formatted into segments called blocks and attached to a compute system for use.
- File storage is built on top of block storage, typically serving as a file share or file server.
- File storage organizes data in a hierarchy of directories and subdirectories.
- Object storage, also known as object-based storage, is a method of storing files in a flat address space based on attributes and metadata.
- Unlike file storage, object storage doesn't differentiate between types of data.
EBS
Amazon Elastic Block Store (Amazon EBS) provides block-level storage devices that you can attach to an Amazon EC2 instance.
These storage devices are called Amazon EBS volumes.
EBS volumes act similarly to external drives in more than one way:
- Most EBS volumes have a one-to-one relationship with EC2 instances: they can be attached to only one instance at a time and cannot be shared by multiple instances.
- EBS Provisioned IOPS io2 or io1 volumes can be concurrently attached to up to 16 Nitro-based EC2 instances within the same AZ.
- You can detach an EBS volume from one EC2 instance and attach it to another instance in the same AZ.
- EBS is zonal: an EC2 instance and an EBS volume must be in the same AZ for the volume to be attachable.
You can scale Amazon EBS volumes in two ways:
- Increase the volume size, as long as it doesn't exceed the maximum size limit.
- Change volume types on the fly (e.g., from gp2 to io2) without downtime.
- Attach multiple volumes to a single Amazon EC2 instance; an instance has a one-to-many relationship with EBS volumes.
EBS snapshots
EBS snapshots are incremental backups that only save the blocks on the volume that have changed after your most recent snapshot.
The backups are stored redundantly in multiple AZs using Amazon S3.
- Snapshots are point-in-time and incremental. The first (full) snapshot can take a while to complete.
- Consistent snapshots: snapshots only capture data that has been written to your EBS volume, which might exclude data locally cached by the application or OS. It is recommended to stop the instance before taking a snapshot to ensure consistency.
- Encrypted snapshots: the snapshot of an encrypted volume is encrypted automatically.
- To get an encrypted snapshot of an unencrypted root device volume, create a copy of the snapshot and select the encryption option.
- Create an AMI from the encrypted snapshot and use that AMI to launch new encrypted instances.
- Sharing snapshots: a snapshot can be shared only in the region in which it was created. Copy the snapshot to share it with other regions.
EBS Encryption
EBS encrypts your volume with a data key using AES-256 algorithm.
EBS encryption uses AWS KMS and customer master keys (CMK) when creating volumes and snapshots.
When you encrypt an EBS volume:
- Data at rest is encrypted inside a volume.
- All data in flight moving between the instance and the volume is encrypted.
- All snapshots are encrypted, as are all volumes created from those snapshots.
File Storage
For running file system workloads on AWS, you can choose from Amazon Elastic File System (Amazon EFS), Amazon FSx for Lustre, and Amazon FSx for Windows File Server.
EFS - FSx
- Amazon EFS is a scalable, elastic, cloud-native file system for Linux.
- Uses the NFSv4 protocol and supports encryption at rest using KMS.
- Works with EC2 instances in multiple Availability Zones.
- Can support thousands of concurrent NFS connections.
- Can handle up to 10 Gbps in throughput.
- Scales your storage up to petabytes.
- Amazon FSx for Windows File Server is an AWS fully managed file system for Windows environments. (SMB based files)
- Supports Active Directory users, ACLs, security policies, replication, and more.
- Amazon FSx for Lustre is an AWS fully managed parallel file system built on Lustre for HPC and ML workloads.
- It provides high-speed, high-capacity distributed storage and can be linked to data stored in S3.
Amazon S3
Amazon Simple Storage Service (S3) is an object storage service.
Object storage stores data in a flat structure, using unique identifiers to look up objects when requested.
An object is a file combined with metadata. You can store as many of these objects as you'd like. Objects are stored in containers called buckets.
When you create a bucket, you specify at the very minimum two details:
- The AWS Region you want the bucket to reside in
- The bucket name, which must be unique across all AWS accounts and be between 3 and 63 characters long.
- AWS uses the bucket name as part of the object identifier.
- Bucket URL: `http://$BUCKET.s3.$REGION.amazonaws.com/$key`
By default, you can create up to 100 buckets per AWS account, and an object can be up to 5 TB in size.
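The virtual-hosted-style URL pattern above can be assembled with a short helper. A minimal sketch in Python, where the bucket-name check covers only the length rule mentioned here (real S3 naming rules are stricter) and the function name is illustrative:

```python
def bucket_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style S3 object URL."""
    # Bucket names must be between 3 and 63 characters long
    # (S3 enforces additional character and format rules).
    if not 3 <= len(bucket) <= 63:
        raise ValueError("bucket name must be between 3 and 63 characters")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

print(bucket_url("my-example-bucket", "us-east-1", "photos/cat.jpg"))
# → https://my-example-bucket.s3.us-east-1.amazonaws.com/photos/cat.jpg
```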
S3 Storage Classes
- Standard : for frequently accessed data. The default storage class.
- Standard Infrequent Access : used for data that is accessed less frequently but requires rapid access when needed.
- Great for backups, Disaster recovery files, long-term storage.
- One zone Infrequent Access : like Standard-IA but data is stored redundantly within a single AZ.
- Costs 20% less than Standard-IA.
- Great for non-critical data, long-lived.
- Intelligent-Tiering: optimizes costs by automatically moving data to the most cost-effective tier based on how frequently you access each object.
- Glacier: classes for data archiving.
- Glacier Instant Retrieval : long-term data archiving with instant retrieval time for your data.
- Glacier Flexible retrieval : for archive data that doesn't require immediate access but the flexibility to retrieve large sets of data at no cost such as backup or DR use cases.
- Retrieval options: expedited (1-5 minutes), standard (3-5 hours), and free bulk (5-12 hours).
- Glacier Deep Archive: to retain data sets for 7+ years, to meet regulatory compliance requirements.
- Standard retrieval time is 12 hours and bulk retrieval time is 48 hours.
S3 Security
Buckets are private by default.
You can also use Amazon Macie to discover and protect sensitive data in S3 buckets.
IAM policies and S3 bucket policies can be used to be more specific about who can do what with an S3 resource.
You should use IAM policies for private buckets in the following two scenarios:
- You have many buckets with different permission requirements.
- You want all policies to be in a centralized location.
Unlike IAM policies, S3 bucket policies are attached only to S3 buckets and specify what actions are allowed or denied on the bucket.
S3 bucket policies can only be placed on buckets and cannot be used for folders or objects.
You should use S3 bucket policies in the following scenarios:
- You need a simple way to do cross-account access to S3 without using IAM roles.
- Your IAM policies bump up against the defined size limit. S3 bucket policies have a larger size limit.
- You want to make entire buckets public.
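A bucket policy for the "make an entire bucket public" case might look like the following sketch. The bucket name is a placeholder, and the policy is built as a Python dict only so it can be serialized into the JSON that S3 expects:

```python
import json

# Hypothetical bucket name; this policy grants read-only access
# to every object in the bucket, to everyone.
public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-example-bucket/*",
        }
    ],
}

# The JSON string is what you would paste into the bucket policy editor.
print(json.dumps(public_read_policy, indent=2))
```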
Pre-Signed URLs can be used to grant time-limited access to others with temporary URLs.
S3 ACLs enable you to manage access to buckets and objects. Each bucket and object has an ACL attached to it as a subresource.
It defines which AWS accounts or groups are granted access and the type of access.
Amazon S3 has a set of predefined groups:
- AuthenticatedUsers: granting access to this group allows any AWS account to access the resource.
- AllUsers: granting access to this group allows anyone in the world to access the resource.
- Specific AWS accounts, identified by account ID.
You can also enforce write-once-read-many (WORM) policies with S3 Object Lock.
You can configure S3 Object Lock in one of two modes: Governance mode and Compliance mode.
- When deployed in Governance mode, AWS accounts with specific IAM permissions are able to remove S3 Object Lock from objects.
- If you require stronger immutability to comply with regulations, you can use Compliance mode.
- In Compliance mode, no user or root account can remove the protection.
- When an object is locked, its retention mode can't be changed and its retention period can't be shortened.
- A retention period (RT) protects an object version for a fixed amount of time.
- S3 stores a timestamp in the object version's metadata to indicate when the retention period expires.
- A legal hold also prevents an object version from being overwritten or deleted, like a retention period, but it has no expiration and remains in effect until removed.
- S3 Glacier Vault Lock allows you to easily deploy and enforce compliance controls for individual S3 Glacier vaults.
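The retain-until timestamp that S3 records for a retention period can be derived as in this small sketch (the function name and the 7-year figure, echoing the Deep Archive compliance example, are illustrative):

```python
from datetime import datetime, timedelta, timezone

def retain_until(start: datetime, days: int) -> datetime:
    """Compute the timestamp at which a retention period expires.

    S3 stores such a timestamp in the object version's metadata.
    """
    return start + timedelta(days=days)

locked_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Roughly a 7-year retention, as in the compliance example above.
print(retain_until(locked_at, days=7 * 365))
```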
Amazon S3 supports encryption both in transit and at rest to protect your data.
- Encryption in Transit : SSL/TLS, HTTPS
- Encryption at Rest - Server-side : This allows Amazon S3 to encrypt your object before saving it on disks and then decrypt it when you download the objects.
- SSE-S3: S3-managed keys, using AES 256-bit encryption.
- SSE-KMS : AWS Key Management Service
- SSE-C : Customer-provided Keys
- Encryption at Rest - Client-side : You can encrypt your data yourself and then upload the encrypted data to Amazon S3.
- In this case, you manage the encryption process, the encryption keys, and all related tools.
To enforce Server-Side Encryption:
- Using the AWS console: just select the encryption setting on your bucket.
- Using request headers: for PUT requests, use the header `x-amz-server-side-encryption` with value `AES256` or `aws:kms` to tell S3 to encrypt the object at upload time.
- Using a bucket policy: it can deny all PUT requests that don't include the `x-amz-server-side-encryption` header.
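The deny-unencrypted-uploads bucket policy described above can be sketched as follows (bucket name is a placeholder; the dict mirrors the JSON shape of a bucket policy):

```python
import json

# Deny any PutObject request that does not carry the
# x-amz-server-side-encryption header (the "Null" condition is
# true when the header is absent). Bucket name is hypothetical.
deny_unencrypted_uploads = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedPuts",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-example-bucket/*",
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        }
    ],
}

print(json.dumps(deny_unencrypted_uploads, indent=2))
```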
S3 Versioning - Lifecycle
Enable versioning in S3 to have multiple versions of an object within S3.
- All versions of an object are stored in S3. This includes all writes, even if you delete an object.
- Cannot be disabled: once enabled, versioning can only be suspended, not disabled.
- Supports MFA: MFA can be required to authenticate object deletions.
- Backup : can be a great backup tool.
- Lifecycle Rules : can be integrated with lifecycle rules.
You can use Lifecycle Management:
- To automate moving objects between different storage tiers.
- In conjunction with versioning.
It can be applied to current and previous versions.
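A lifecycle rule combining tier transitions with versioning could be expressed as in this sketch; the transition day counts and prefix are illustrative, and the dict mirrors the JSON shape of an S3 lifecycle configuration rule:

```python
# Illustrative rule: move current versions to Standard-IA after 30 days
# and to Glacier after 90, and expire noncurrent (previous) versions
# after 365 days.
lifecycle_rule = {
    "ID": "archive-old-objects",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
    ],
    "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
}

for t in lifecycle_rule["Transitions"]:
    print(f"after {t['Days']} days -> {t['StorageClass']}")
```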
S3 Performance
Upload - Multipart Uploads
- Recommended for files over 100 MB.
- Required for files over 5 GB.
- Parallelizes uploads (increases efficiency).
- AWS CLI: automatically does multipart uploads for you and resumes the upload in case of failure.
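The part math behind a multipart upload can be sketched like this; the thresholds come from the bullets above, while the helper name and the 100 MB default part size are illustrative:

```python
import math

MULTIPART_RECOMMENDED = 100 * 1024**2   # multipart recommended above 100 MB
MULTIPART_REQUIRED = 5 * 1024**3        # multipart required above 5 GB

def plan_upload(size_bytes: int, part_size: int = 100 * 1024**2):
    """Return (use_multipart, number_of_parts) for a file of the given size."""
    use_multipart = size_bytes > MULTIPART_RECOMMENDED
    parts = math.ceil(size_bytes / part_size) if use_multipart else 1
    return use_multipart, parts

print(plan_upload(1 * 1024**3))   # a 1 GB file → (True, 11)
print(plan_upload(50 * 1024**2))  # a 50 MB file → (False, 1)
```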
Download - S3 Byte-Range Fetches
- Parallelize downloads by specifying byte ranges.
- If there's a failure in the download, it affects only a specific byte range.
- Can be used to download partial amounts of a file, such as header information.
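Splitting a download into byte ranges, as described above, can be sketched as follows (the generator produces the `Range` header values you would send with each parallel GET request):

```python
def byte_ranges(total_size: int, chunk_size: int):
    """Yield Range header values covering an object of total_size bytes."""
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1
        # Each range can be fetched in parallel; a failed fetch only
        # needs to retry its own range, not the whole object.
        yield f"bytes={start}-{end}"

print(list(byte_ranges(10, 4)))  # → ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```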
Use S3 bucket prefixes to increase performance by spreading your reads across different prefixes.
Data Transfer - Migration
Data transfer and migration services are tailored to specific use case and workflow requirements.
AWS offers services designed to solve a wide variety of common use cases:
Transfer Family, Snow Family, DataSync, and Application Migration Service (AWS MGN, formerly CloudEndure).
AWS Snow Family
The AWS Snow Family provides offline data transfers using physical devices. The Snow Family is composed of:
- AWS Snowcone : A small, rugged, portable, secure edge computing, storage, and data transfer device.
- 8 TB (HDD) or 14 TB (SSD) storage, 4 GB RAM, 2 vCPUs.
- AWS Snowball : A rugged petabyte-scale data transport device with onboard storage and compute capabilities.
- 48 TB - 81 TB storage, with compute and GPU options.
- AWS Snowmobile : A large truck to migrate or transport exabyte-scale datasets into and out of the AWS Cloud.
AWS Storage Gateway
Storage Gateway is a hybrid cloud storage service that helps you merge on-premises resources with the cloud. It relies on S3 to store data.
It can help with a one-time migration or a long-term pairing of your architecture with AWS.
Storage Gateway offerings
Storage Gateway is made up of three separate offerings.
File Gateway: used to store Network File System (NFS) and Server Message Block (SMB) files in customer-managed S3 buckets.
- NFS or SMB mount.
- Extend on-premises storage.
- Keep a local copy of recently used files.
- Help with migrations to AWS.
Volume Gateway: used to keep copies of your local block-storage data volumes in a service-managed S3 bucket.
- iSCSI mount.
- Cached or stored mode.
- Can create EBS snapshots.
- Good for backup or migration.
Tape Gateway: used to connect an on-premises software appliance with cloud-based storage, providing seamless integration with data security features between your on-premises environment and AWS.
AWS DataSync
AWS DataSync is an online data transfer service that simplifies moving data between
on-premises storage systems and AWS Storage services. It can also transfer data between AWS Storage services.
DataSync can copy data between the following systems: NFS/SMB shares, Snowcone, S3 buckets, EFS, self-managed object storage, and more.
It is an agent-based, asynchronous, one-time migration solution.
It provides built-in security capabilities (such as encryption of data in transit and data integrity verification in transit and at rest), control and monitoring capabilities (such as data transfer scheduling), and granular visibility into the transfer process through Amazon CloudWatch metrics, logs, and events.
AWS Transfer Family
AWS Transfer Family provides fully managed support for file transfers directly into and out of S3 or EFS.
It supports SFTP, FTPS and FTP.
The Transfer Family helps you migrate your file transfer workflows to AWS by integrating with
existing authentication systems and providing DNS routing with Amazon Route 53.
AWS MGN
With AWS Application Migration Service (MGN), formerly CloudEndure Migration, you can quickly realize the benefits of migrating applications to the cloud without changes and with minimal downtime.
MGN uses EBS as underlying storage.