Database

Data source types

Structured data - Unstructured data - Semistructured data

Structured data is often organized to support transactional and analytical applications.
It is most commonly stored in relational databases but can also be stored in non-relational databases.

Unstructured data is not organized in any distinguishable or predefined manner.

Common stores for unstructured data are non-relational key-value databases.
It can also be stored as files in a file store or an object store such as Amazon S3.

Semistructured data can be just as predictable and organized as structured data.
It is flexible and can be updated without the requirement to change the schema for every single record in a table.
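
To make the contrast concrete, here is a minimal sketch of the same two records held first as fixed-column rows and then as semistructured records. The column names and values are made up for illustration.

```python
import json

# Structured: every row follows the same fixed columns (hypothetical schema).
COLUMNS = ("id", "name", "email")
rows = [
    (1, "Ana", "ana@example.com"),
    (2, "Ben", "ben@example.com"),
]

# Semistructured: each record carries its own field names, so one record
# can gain a new attribute without changing any other record.
records = [
    {"id": 1, "name": "Ana", "email": "ana@example.com"},
    {"id": 2, "name": "Ben", "email": "ben@example.com", "phones": ["555-0100"]},
]


def field_names(record_list):
    """Return the sorted union of field names used across all records."""
    names = set()
    for record in record_list:
        names.update(record)
    return sorted(names)


print(field_names(records))
print(json.dumps(records[1]))
```

Adding `phones` to the second record required no change to the first one, which is the flexibility described above.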

Data source and database types

A relational database is built to store structured data in tables using a defined schema.
Key-value databases are a type of non-relational database that store unstructured data in the form of key-value pairs.
Document stores are a type of non-relational database that store semistructured and unstructured data in the form of files.
In-memory data stores can be used for both structured and semistructured data sources.
Graph databases are purpose-built to store any type of data: structured, semistructured, or unstructured.

AWS DMS

AWS Database Migration Service (AWS DMS) lets you migrate databases and data stores between on-premises environments and AWS.
It is complemented by the Schema Conversion Tool (SCT), which translates database schemas to a new platform.
You have the option to perform a one-time migration or continuously replicate ongoing changes.
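
The choice between a one-time migration and ongoing replication maps onto the migration type of a DMS replication task. The sketch below only picks the type string; the helper name `choose_migration_type` is hypothetical and nothing here calls AWS.

```python
# Migration type values used by AWS DMS replication tasks.
FULL_LOAD = "full-load"                  # one-time copy of existing data
CDC = "cdc"                              # replicate ongoing changes only
FULL_LOAD_AND_CDC = "full-load-and-cdc"  # copy existing data, then keep replicating


def choose_migration_type(copy_existing_data: bool, replicate_changes: bool) -> str:
    """Hypothetical helper: map the two options from the notes to a task type."""
    if copy_existing_data and replicate_changes:
        return FULL_LOAD_AND_CDC
    if copy_existing_data:
        return FULL_LOAD
    if replicate_changes:
        return CDC
    raise ValueError("a task must copy existing data, replicate changes, or both")


print(choose_migration_type(copy_existing_data=True, replicate_changes=False))
```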

Amazon RDS

Amazon Relational Database Service (RDS) lets customers create and manage relational databases without the operational burden of database management.

Amazon RDS supports the following three instance families/classes:

Standard, which includes general-purpose instances.
Memory Optimized, which is optimized for memory-intensive applications.
Burstable Performance, which provides a baseline performance level, with the ability to burst to full CPU usage.

Amazon RDS supports most of the popular relational database management systems:

Commercial : Oracle, SQL Server
Open Source : MySQL, PostgreSQL, MariaDB
Cloud Native : Amazon Aurora

RDS is designed for OLTP (online transaction processing) workloads and is not well suited to OLAP (online analytical processing) workloads. Use Amazon Redshift for data warehousing and OLAP tasks.
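
The difference between the two workload shapes can be sketched with a toy table. SQLite stands in for a real engine here purely so the example runs locally; the `orders` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# OLTP-style work: small transactions that each touch a few rows.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("ana", 20.0))
    conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("ben", 35.0))
    conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("ana", 45.0))

# OLAP-style work: one query that scans and aggregates the whole table.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```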

Secure RDS Database Instance

Restrict access to the database by placing it inside a VPC.
Create an internet gateway only if the database must accept requests directly from the internet.
Control access to the instance with security groups. RDS can use three types of security groups: Database, VPC, and EC2.
RDS uses IAM to create and manage credentials.
RDS requires both authentication and permission to access tables and data.
Secure communications to and from the database instance (data in transit) by using SSL/TLS connections.
Protect data in the database (data at rest). RDS uses the industry-standard AES-256 encryption algorithm to encrypt the data.

Multi-AZ & Read Replicas

Redundancy with RDS

Multi-AZ

When you enable RDS Multi-AZ, RDS creates a redundant copy of your database in another AZ.
The data in the primary DB instance is synchronously replicated to the standby DB instance.

Some key facts:

Used for disaster recovery.
In the event of a failure or availability issue, RDS triggers an automatic failover to the standby instance.

Read Replica

A read replica is a read-only copy of the primary database. Each replica has its own DNS endpoint.
It can be promoted to be its own database (this breaks the replication).

Some key facts:

Scaling read performance: read replicas are used primarily for scaling reads, not for disaster recovery.
Requires automatic backups: automatic backups must be enabled before you can deploy a read replica.
Multiple read replicas are supported: up to 5 read replicas per DB instance (current limits vary by engine).
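
Because each replica has its own DNS endpoint, the application decides where each statement goes. A minimal sketch of that routing, assuming made-up endpoint names and a deliberately naive "starts with SELECT" check:

```python
import itertools


class EndpointRouter:
    """Send writes to the primary endpoint and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica exists.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Naive classification, good enough for a sketch.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


router = EndpointRouter(
    primary="primary.example.internal",  # hypothetical DNS names
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))
```

Reads served by a replica can lag slightly behind the primary, so a real router would usually keep read-after-write queries on the primary.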

Amazon Aurora

Amazon Aurora is a MySQL- and PostgreSQL-compatible database built for the cloud.

It is more durable and more available than the Amazon RDS versions of MySQL and PostgreSQL, and it provides faster performance (up to 5 times the throughput of standard MySQL and up to 3 times that of standard PostgreSQL).

Aurora automatically maintains 6 copies of your data across 3 AZs.
It will automatically attempt to recover the database in a healthy AZ with no data loss.
You can create up to 15 read replicas that can serve read-only traffic as well as failover.
Aurora automatically backs up your database to Amazon S3, enabling granular point-in-time recovery.
Key characteristics: low latency, Multi-AZ deployment, multi-Region replication, high performance, and scalability.
Use Aurora Serverless as a cost-effective option for infrequent or unpredictable workloads.
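
A short worked calculation shows why six copies across three AZs tolerate failures. The quorum sizes below (4 of 6 for writes, 3 of 6 for reads) are not stated in these notes; they are an assumption taken from Aurora's published storage design.

```python
COPIES = 6        # copies of the data, from the notes above
AZS = 3           # Availability Zones, from the notes above
WRITE_QUORUM = 4  # assumption: copies that must acknowledge a write
READ_QUORUM = 3   # assumption: copies needed to serve a read

copies_per_az = COPIES // AZS


def can_write(lost_copies):
    """True if enough copies remain to reach the write quorum."""
    return COPIES - lost_copies >= WRITE_QUORUM


def can_read(lost_copies):
    """True if enough copies remain to reach the read quorum."""
    return COPIES - lost_copies >= READ_QUORUM


# Losing one whole AZ removes copies_per_az copies.
print(can_write(copies_per_az), can_read(copies_per_az + 1))
```

Under these assumptions, losing an entire AZ still allows writes, and losing an AZ plus one more copy still allows reads.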

Amazon DynamoDB

Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.

With DynamoDB, you can create database tables that can store and retrieve any amount of data and serve any level of request traffic.
DynamoDB transactions provide atomicity, consistency, isolation, and durability (ACID).

You can scale your tables' throughput capacity up or down without downtime or performance degradation.

In DynamoDB, tables/items/attributes are the core components that you work with.
A table is a collection of items and each item is a collection of attributes.

Keys

DynamoDB uses partition keys to find each item in the database.
- Data is distributed on physical storage nodes and the partition key is used to determine which of those nodes the item is located on.
An item can have an optional sort key, which stores items that share a partition key in sorted order.
Each table also has a primary key, which represents the table's key or keys.
- If there is no sort key, the primary and partition keys are the same.
- If there is a sort key, the primary key is a combination of the partition and sort keys called a composite primary key.
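
The roles of the two keys can be sketched with a toy in-memory table. This is only an illustration: the partition count, the SHA-256 hash, and the `ToyTable` class are stand-ins, since DynamoDB manages partitions and hashing internally.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical; DynamoDB chooses this itself


def partition_for(partition_key: str) -> int:
    """Hash the partition key to pick the storage node (stand-in hash)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


class ToyTable:
    def __init__(self):
        self.partitions = [dict() for _ in range(NUM_PARTITIONS)]

    def put_item(self, partition_key, sort_key, attributes):
        node = self.partitions[partition_for(partition_key)]
        # (partition key, sort key) is the composite primary key:
        # writing the same pair again replaces the item.
        node[(partition_key, sort_key)] = attributes

    def query(self, partition_key):
        """Return the items for one partition key, ordered by sort key."""
        node = self.partitions[partition_for(partition_key)]
        keys = sorted(k for k in node if k[0] == partition_key)
        return [(k[1], node[k]) for k in keys]


table = ToyTable()
table.put_item("user#1", "2024-02", {"total": 30})
table.put_item("user#1", "2024-01", {"total": 10})
print(table.query("user#1"))
```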

Indexes

DynamoDB has 2 types of secondary indexes : local and global. These indexes improve the application's ability to access data quickly and efficiently.
- A local secondary index uses the table's partition key with a different sort key. You are allowed 5 per table.
- A global secondary index uses a partition key and sort key that can be different from those on the table. This allows you to model very complex data access patterns. You are allowed up to 20 global indexes per table.
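
As an illustration, the dictionary below follows the shape of a DynamoDB `CreateTable` request with one index of each kind. The table, attribute, and index names are hypothetical, and `check_indexes` is a local helper written for this sketch, not part of any AWS SDK.

```python
MAX_LSI = 5   # limits quoted in the notes above
MAX_GSI = 20

table_definition = {
    "TableName": "Orders",  # hypothetical names throughout
    "BillingMode": "PAY_PER_REQUEST",
    "AttributeDefinitions": [
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "status", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},  # partition key
        {"AttributeName": "order_date", "KeyType": "RANGE"},  # sort key
    ],
    "LocalSecondaryIndexes": [{
        "IndexName": "by_status",
        "KeySchema": [  # same partition key, different sort key
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "status", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    "GlobalSecondaryIndexes": [{
        "IndexName": "status_by_date",
        "KeySchema": [  # both keys may differ from the table's
            {"AttributeName": "status", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
}


def hash_key(key_schema):
    """Return the partition (HASH) key attribute name of a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


def check_indexes(definition):
    """Return a list of rule violations; an empty list means none found."""
    problems = []
    lsis = definition.get("LocalSecondaryIndexes", [])
    gsis = definition.get("GlobalSecondaryIndexes", [])
    if len(lsis) > MAX_LSI:
        problems.append("too many local secondary indexes")
    if len(gsis) > MAX_GSI:
        problems.append("too many global secondary indexes")
    table_hash = hash_key(definition["KeySchema"])
    for index in lsis:
        if hash_key(index["KeySchema"]) != table_hash:
            problems.append(f"{index['IndexName']}: must reuse the table's partition key")
    return problems


print(check_indexes(table_definition))
```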

DynamoDB Security

DynamoDB uses IAM to create and manage credentials
DynamoDB requires both authentication and permission to access tables and data.
DynamoDB provides end-to-end enterprise-grade encryption for data that is both in transit and at rest.

Key facts

Managed multi-master, multi-Region replication (global tables)

Suited to globally distributed applications.
Based on DynamoDB Streams.
Multi-Region redundancy for DR or HA.
Replication latency under 1 second.

DocumentDB

Amazon DocumentDB is a fully managed non-relational database service with MongoDB compatibility, which lets you run MongoDB workloads on AWS.
It is purpose-built for storing, indexing, and querying semistructured data in JSON documents.
You can use it to migrate on-premises MongoDB data to the cloud.

The DocumentDB basic building block is the cluster.
A cluster consists of one or more instances and a cluster storage volume that manages the data for those instances.
All writes are done through the primary instance. All instances (primary and replicas) support read operations.

The cluster's data is stored in the cluster volume, which keeps 6 copies of the data across 3 different AZs.

DocumentDB Security

Authentication and authorization through IAM users, roles, and policies.
- Authentication is enabled by default. It cannot be disabled, and it is configured via standard MongoDB tools and drivers.
Data in transit is encrypted using TLS.
DocumentDB uses AES-256 bit encryption algorithm to encrypt the data while at rest.
DocumentDB clusters only run within the Amazon VPC.
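
Since clients connect with standard MongoDB drivers, these settings typically surface as connection-string options. The sketch below assembles such a URI with TLS enabled and reads preferring replicas. The host, credentials, and CA file name are placeholders, and the helper `documentdb_uri` is hypothetical.

```python
from urllib.parse import quote_plus, urlencode


def documentdb_uri(user, password, host, port=27017, ca_file="global-bundle.pem"):
    """Build a MongoDB-style connection URI (all argument values are placeholders)."""
    options = {
        "tls": "true",                            # encrypt data in transit
        "tlsCAFile": ca_file,                     # CA bundle used to verify the server
        "replicaSet": "rs0",
        "readPreference": "secondaryPreferred",   # prefer replicas for reads
        "retryWrites": "false",
    }
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb://{credentials}@{host}:{port}/?{urlencode(options)}"


print(documentdb_uri("app", "p@ss", "docdb.example.internal"))
```

Writes still go to the primary instance regardless of the read preference.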

Caching

CloudFront

CloudFront is a fast Content Delivery Network (CDN) service that securely delivers data to the customers globally.
It helps reduce latency and provide higher transfer speeds using AWS edge locations.

Important settings

Security : Defaults to HTTPS connections, with the option to add a custom SSL certificate.
Endpoint Support : Can be used to front AWS endpoints as well as non-AWS applications.
Expiring Content : You can force content to expire from the cache if you can't wait for the TTL.
Global Distribution : You can't pick specific countries, only general areas.
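
The TTL and forced-expiration behavior can be pictured with a toy edge cache. This is a local sketch, not the CloudFront API; `EdgeCache` and `fetch_from_origin` are hypothetical names, and the clock is passed in explicitly to keep the example deterministic.

```python
class EdgeCache:
    """Toy cache: entries expire after a TTL or when explicitly invalidated."""

    def __init__(self, origin, ttl_seconds):
        self.origin = origin    # callable: path -> content
        self.ttl = ttl_seconds
        self._entries = {}      # path -> (content, expires_at)

    def get(self, path, now):
        entry = self._entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0], "hit"
        content = self.origin(path)  # expired or missing: go back to the origin
        self._entries[path] = (content, now + self.ttl)
        return content, "miss"

    def invalidate(self, path):
        """Force expiration without waiting for the TTL."""
        self._entries.pop(path, None)


def fetch_from_origin(path):
    return f"content of {path}"


edge = EdgeCache(fetch_from_origin, ttl_seconds=60)
print(edge.get("/index.html", now=0))
print(edge.get("/index.html", now=30))
```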

CloudFront is the only option to add HTTPS to a static website hosted in an S3 bucket.

ElastiCache

There are 3 common types of data caches: built-in, application, and remote.
A remote cache is a centralized, in-memory repository that can dramatically improve the responsiveness of databases and applications.
It stores data externally from the database in a non-relational key-value database.

ElastiCache is a remote cache engine and supports the two most common open source caching engines: Memcached and Redis.
Two common approaches to caching are lazy loading and write-through.

Lazy loading is reactive. Data is put into the cache the first time it is requested.
Write-through is proactive. Data is put into the cache at the same time it is put into the database.
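
The two approaches can be sketched with plain dictionaries standing in for the database and the remote cache. The function names are illustrative only.

```python
database = {"user:1": "Ana"}  # stand-in for the real database
cache = {}                    # stand-in for a remote cache


def read_lazy(key):
    """Lazy loading: populate the cache only when a read misses."""
    if key in cache:
        return cache[key]       # cache hit
    value = database.get(key)   # cache miss: fall back to the database
    if value is not None:
        cache[key] = value      # remember it for the next reader
    return value


def write_through(key, value):
    """Write-through: update the database and the cache together."""
    database[key] = value
    cache[key] = value
```

Lazy loading caches only data that is actually requested but pays a miss on the first read, while write-through keeps the cache current at the cost of caching data that may never be read.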

Memcached - Redis - DAX

Memcached

Simple database caching solution.
Not a database by itself.
No failover or Multi-AZ support.
No backups.

Redis

Supported as a caching solution.
Functions as a standalone database.
Failover and Multi-AZ support.
Supports backups.

DynamoDB Accelerator (DAX)

In-Memory Cache : DAX can reduce DynamoDB response times from milliseconds to microseconds.
Location: The cache is highly available and lives inside the VPC.
Control: You determine the node size and count for the cluster, TTL for the data ...

ElastiCache Security

Access control and authentication for ElastiCache are managed through IAM.
It supports encryption at rest and in transit.
It can also authenticate clients.
ElastiCache clusters are created within an Amazon VPC and managed through VPC security groups.
- On-premises servers can reach them through either a VPN or AWS Direct Connect.

Disaster Recovery Strategies

DR Strategies

Backup & Restore - Pilot Light - Warm Standby - Multi-site Active/Active

Backup and restore protects against data loss and corruption. It involves replicating data to other AWS Regions or AZs.
Recovery requires redeploying the configuration, infrastructure, and application code to the recovery Region.

This strategy is ideal for low-priority use cases and has the lowest cost of all the strategies.

Pilot light requires replicating your data from one AWS Region to another. You also need to create a copy of the infrastructure for your core workload.
Databases and object storage are always on, but application servers are kept off until they are needed during DR.

Warm standby involves creating a smaller-scale version of your current production environment in a different region/AZ. This environment is fully functional.

The difference between this and the pilot light approach is that here the system is fully on and ready to use while pilot light keeps only essential services active to minimize costs.

The multi-site approach lets you run your workloads at the same time in multiple Regions, using either a multi-site active/active or a hot standby active/passive strategy.
The multi-site active/active strategy ensures that traffic can be served from any Region, while the hot standby strategy serves traffic from only a single Region.

It is the most complex and expensive disaster recovery strategy, but it also provides near-real-time recovery.
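
The four strategies can be summarized by what is already running in the recovery Region and what is left to do after a disaster. The sketch below encodes the descriptions above; the field names and the `recovery_steps` helper are illustrative, not an AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrStrategy:
    name: str
    app_servers: str      # "none", "off", "reduced", or "full"
    serves_traffic: bool  # already handling live traffic before a disaster?


# Ordered from lowest to highest cost, following the descriptions above.
STRATEGIES = [
    DrStrategy("backup and restore", "none", False),
    DrStrategy("pilot light", "off", False),
    DrStrategy("warm standby", "reduced", False),
    DrStrategy("multi-site active/active", "full", True),
]


def recovery_steps(strategy):
    """List what still has to happen after a disaster for a given strategy."""
    if strategy.serves_traffic:
        return ["route traffic to the remaining Regions"]
    steps = []
    if strategy.app_servers == "none":
        steps.append("restore data from backups")
        steps.append("redeploy infrastructure, configuration, and application code")
    elif strategy.app_servers == "off":
        steps.append("start the application servers")
    elif strategy.app_servers == "reduced":
        steps.append("scale the environment up to production capacity")
    steps.append("route traffic to the recovery Region")
    return steps


for strategy in STRATEGIES:
    print(strategy.name, "->", recovery_steps(strategy))
```

The fewer steps that remain, the faster the recovery and the higher the standing cost.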