Compute Services
There are 3 main types of compute in AWS:
- Instances : Compute resources (EC2 instances ...) are measurable quantities of compute power (CPU, RAM ...) that can be requested, allocated, and consumed.
  - EC2 instances are VMs that emulate physical hardware components.
- Containers : Container management services that can run containers on either customer-managed EC2 instances OR as a managed serverless offering running containers on AWS Fargate.
  - EKS (Elastic Kubernetes Service) is a managed service that you can use to run Kubernetes on AWS without necessarily having to operate your own worker nodes.
  - ECS (Elastic Container Service) : using it, you can deploy containerized workloads on a managed cluster of EC2 instances.
- Lambda : using it, you can run code without provisioning or managing servers (serverless). You pay only for the compute time you consume.
Instances - EC2
When you stop and start an EC2 instance, your instance may be placed on a new underlying physical server.
It gets a new public IP address but maintains the same private IP addresses.
When you restart an EC2 instance after hibernation:
- Frozen processes resume from their original states.
- Private IP addresses and Elastic IP addresses are maintained, but not the public IP address.
- Instance store volumes don't maintain their data.
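For illustration, a minimal boto3 sketch of hibernating and resuming an instance (the instance ID and region are placeholders; hibernation must have been enabled at launch):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Hibernate=True freezes the in-memory state to the EBS root volume
ec2.stop_instances(InstanceIds=["i-0123456789abcdef0"], Hibernate=True)

# on start, frozen processes resume; private and Elastic IPs are kept
ec2.start_instances(InstanceIds=["i-0123456789abcdef0"])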
EC2 Pricing Options
On-Demand
Pay by the hour/second depending on the type of instance you run.
Low cost and flexibility of EC2 without any upfront payment or long-term commitment.
Reserved Instances
Reserved capacity for 1 or 3 years. Up to 72% discount on the hourly charge.
Suited to predictable usage, specific capacity requirements, and upfront payment.
- Standard RIs : Up to 72% off the on-demand price.
- Convertible RIs : Up to 54% off the on-demand price. Includes the option to change to a different RI type of equal or greater value.
- Scheduled RIs : Run within the time window you define.
  - Match a capacity reservation to a predictable recurring schedule that only requires a fraction of a day/week/month.
Reserved instances operate at a regional level.
Spot Instances
Purchase unused capacity at a discount of up to 90% for stateless, fault-tolerant, flexible applications.
To use Spot Instances, you must first decide on your max Spot price.
The instance will be provisioned so long as the Spot price is below your max price.
If the Spot price goes above your max price, AWS will give you a 2-minute warning before it interrupts your instance.
You may also use a Spot block to stop your Spot instance from being terminated even if the price goes over your max price.
You can set Spot blocks for between 1 and 6 hours.
A Spot Fleet is a collection of Spot Instances and, optionally, On-Demand Instances.
It attempts to launch enough Spot and On-Demand instances to meet the target capacity specified in the Spot Fleet request.
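A hedged boto3 sketch of a one-time Spot request (the AMI ID and max price are placeholder values, not from these notes); the request is fulfilled only while the Spot price stays below the max price:

import boto3

ec2 = boto3.client("ec2")

# request a single Spot instance through the market options of run_instances
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "MaxPrice": "0.01",  # your max Spot price, in USD per hour
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)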
Dedicated Hosts
A physical EC2 server dedicated for your use.
Great for reusing server-bound licenses or meeting compliance requirements.
Networking with EC2
By default, your EC2 instances are placed in a network called the default VPC.
Any resource you put inside the default VPC will be public and accessible from the internet.
You can attach 3 different types of virtual networking cards to your EC2 instance:
- ENI (Elastic Network Interface) : for basic, day-to-day networking.
- EN (Enhanced Networking) : uses Single Root I/O Virtualization (SR-IOV) to provide high performance. Speeds between 10 and 100 Gbps.
- EFA (Elastic Fabric Adapter) : accelerates HPC and machine learning applications; provides lower, more consistent latency.
Placement groups
When you launch a new EC2 instance, the EC2 service attempts to place the instance in such a way that
all of your instances are spread out across underlying hardware to minimize correlated failures.
Depending on the type of workload, you can create a placement group using one of the following placement strategies:
Cluster placement packs instances close together inside a single Availability Zone.
This strategy enables workloads to achieve the low-latency network performance necessary for tightly-coupled node-to-node communication that is typical of HPC applications.
Spread placement strictly places a small group of instances across distinct underlying hardware to reduce correlated failures.
Recommended for applications that have a small number of critical instances that should be kept separate from each other.
Partition placement spreads your instances across logical partitions such that groups of instances in one partition
do not share the underlying hardware with groups of instances in different partitions.
Each partition has its own set of racks.
This strategy is typically used by large distributed and replicated workloads, such as Hadoop, Cassandra, and Kafka.
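As a sketch, creating a placement group and launching an instance into it with boto3 (the group name, AMI ID, and instance type are placeholders):

import boto3

ec2 = boto3.client("ec2")

# strategy can be 'cluster', 'spread', or 'partition'
ec2.create_placement_group(GroupName="hpc-group", Strategy="cluster")

# launch an instance into the placement group
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="c5.large",
    MinCount=1,
    MaxCount=1,
    Placement={"GroupName": "hpc-group"},
)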
Containers ECS - EKS
Containerized applications can either run on EC2 instances or run serverlessly on AWS Fargate.
To run and manage your containers in EC2 mode, you need to install the Amazon ECS container agent on your EC2 instances.
An instance with the container agent installed is often called a container instance.
To prepare your application to run on Amazon ECS, you create a task definition.
The task definition is a text file, in JSON format, that describes one or more containers:
{
  "family": "webserver",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "nginx",
      "memory": 100,
      "cpu": 99
    }
  ],
  "requiresCompatibilities": [ "FARGATE" ],
  "networkMode": "awsvpc",
  "memory": "512",
  "cpu": "256"
}
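The same definition can also be registered programmatically; a minimal boto3 sketch mirroring the JSON above:

import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="webserver",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",     # task-level CPU units, passed as a string for Fargate
    memory="512",  # task-level memory in MiB, passed as a string for Fargate
    containerDefinitions=[
        {"name": "web", "image": "nginx", "memory": 100, "cpu": 99}
    ],
)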
ECS VS EKS
If you already use Kubernetes, you can use Amazon EKS to orchestrate the workloads in the AWS Cloud.
Amazon EKS is conceptually similar to Amazon ECS, but with the following differences:
- An EC2 instance with the ECS agent installed and configured is called a container instance. In Amazon EKS, it is called a worker node.
- An ECS container is called a task. In Amazon EKS, it is called a pod.
- While Amazon ECS runs on AWS native technology, Amazon EKS runs on top of Kubernetes.
Containers VS Lambda
When not to use containers:
- When applications need persistent data storage
- When applications have complex networking, routing or security requirements
Serverless considerations:
- Will you be using multiple AWS services where one service might need to call another service?
- Do your applications finish quickly?
- Serverless is most suitable for applications that don't run longer than 15 minutes
- Large, long-running workloads are expensive to run on serverless and are not an optimal fit for this compute type.
EC2 mode VS Fargate mode
EC2 mode :
- Responsible for underlying Operating System.
- EC2 pricing model.
- Long-running containers.
- Multiple containers share the same host.
Fargate mode :
- No Operating System access.
- Pay based on resources allocated and time ran.
- Short-running tasks.
- Isolated environments.
AWS Fargate
Fargate is a serverless compute engine for containers that works with both ECS and EKS.
AWS owns and manages infrastructure.
AWS Lambda
AWS has several serverless compute options, including AWS Fargate and AWS Lambda.
Every definition of serverless mentions the following four aspects:
- No servers to provision or manage.
- Scales with usage.
- You never pay for idle resources.
- Availability and fault tolerance are built-in.
The code that you run on Lambda is called a Lambda function. Think of a function as a small self-contained application.
After you create your Lambda function, it is ready to run as soon as it is triggered.
- Each function includes your code and some associated configuration information, including the function name and resource requirements.
- Lambda functions are stateless, with no affinity to the underlying infrastructure.
- Lambda can rapidly launch as many copies of the function as needed to scale to the rate of incoming events.
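For example, a minimal Python handler (the event shape is an assumption for illustration); Lambda passes the triggering event and a context object on every invocation:

import json

def lambda_handler(event, context):
    # 'event' carries the payload from the invoking service
    # 'context' exposes metadata such as the remaining execution time
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello {name}"}),
    }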
Event sources can invoke a Lambda function in three general patterns. These patterns are called invocation models.
Invocation Models
When you invoke a function synchronously, Lambda runs the function and waits for a response.
When the function completes, Lambda returns the response from the function's code with additional data, such as the version of the function that was invoked.
Synchronous events expect an immediate response from the function invocation.
No retries by default; you should implement retries in your code.
The following AWS services invoke Lambda synchronously:
- Amazon API Gateway
- Amazon Cognito
- AWS CloudFormation
- Amazon Alexa
- Amazon Lex
- Amazon CloudFront
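A sketch of a synchronous invocation with boto3 (the function name is a placeholder); the call blocks until the function returns:

import json
import boto3

lambda_client = boto3.client("lambda")

# 'RequestResponse' is the synchronous invocation type
response = lambda_client.invoke(
    FunctionName="my-function",
    InvocationType="RequestResponse",
    Payload=json.dumps({"name": "demo"}),
)
print(json.load(response["Payload"]))  # the function's return value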
When you invoke a function asynchronously, events are queued and the requestor doesn't wait for the function to complete.
This model is appropriate when the client doesn't need an immediate response.
With the asynchronous model, you can make use of destinations. Use destinations (SNS, SQS, Lambda ...)
to send records of asynchronous invocations to other services.
Built-in retries : Lambda retries twice by default.
The following AWS services invoke Lambda asynchronously:
- Amazon SNS
- Amazon S3
- Amazon EventBridge
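A sketch of an asynchronous invocation plus an on-failure destination (the function name and queue ARN are placeholders):

import json
import boto3

lambda_client = boto3.client("lambda")

# 'Event' queues the invocation and returns immediately
lambda_client.invoke(
    FunctionName="my-function",
    InvocationType="Event",
    Payload=json.dumps({"name": "demo"}),
)

# send records of failed async invocations to an SQS queue destination
lambda_client.put_function_event_invoke_config(
    FunctionName="my-function",
    DestinationConfig={
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:my-dlq"}
    },
)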
The third model, polling (also called event source mapping), is designed to integrate with AWS streaming and queuing based services with no code or server management.
Lambda will poll/watch these services, retrieve any matching events, and invoke your functions.
This invocation model supports the following services:
- Amazon Kinesis
- Amazon SQS
- Amazon DynamoDB Streams
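For example, wiring an SQS queue to a function with an event source mapping (the queue ARN and function name are placeholders); Lambda then polls the queue on your behalf:

import boto3

lambda_client = boto3.client("lambda")

# Lambda retrieves messages in batches and invokes the function with them
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:my-queue",
    FunctionName="my-function",
    BatchSize=10,
)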
Execution environment lifecycle
Init phase : Lambda creates or unfreezes an execution environment with the configured resources, downloads the code for the function and all layers,
initializes any extensions, initializes the runtime, and then runs the function's initialization code (the code outside the main handler).
The Init phase is split into three sub-phases:
- Extension init : starts all extensions
- Runtime init : bootstraps the runtime
- Function init : runs the function's static code
These sub-phases ensure that all extensions and the runtime complete their setup tasks before the function code runs.
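This is why expensive setup work is usually placed outside the handler; a sketch (the table name is a placeholder) showing what runs during Function init versus on every invocation:

import boto3

# Function init : runs once per execution environment, so clients
# created here are reused across warm invocations
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")

def lambda_handler(event, context):
    # Invoke phase : runs on every invocation
    return table.get_item(Key={"id": event["id"]})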
Invoke phase : Lambda invokes the function handler. After the function runs to completion, Lambda prepares to handle another function invocation.
Shutdown phase : if the Lambda function does not receive any invocations for a period of time, this phase initiates.
Lambda shuts down the runtime, alerts the extensions to let them stop cleanly, and then removes the environment:
- Runtime shutdown
- Extension shutdown
Lambda Function Permissions - VPC
With Lambda functions, there are two sides that define the necessary scope of permissions:
- Permission to invoke the function.
  - Controlled using an IAM resource-based policy.
- Permission of the Lambda function itself to act upon other services.
  - An IAM execution role defines the permissions that control what the function is allowed to do when interacting with other AWS services.
  - IAM policy + trust policy : a trust policy defines which principals can assume the role. (It grants the Lambda service the "AssumeRole" permission.)
Resource policies grant permissions to invoke the function, whereas the execution role strictly controls what the function can do within the other AWS services.
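As a sketch, creating an execution role with boto3 (the role name is a placeholder; the attached managed policy is the standard CloudWatch Logs policy):

import json
import boto3

iam = boto3.client("iam")

# the trust policy lets the Lambda service assume this execution role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="my-lambda-execution-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# allow the function to write logs to CloudWatch
iam.attach_role_policy(
    RoleName="my-lambda-execution-role",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
)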
Accessing resources in a VPC
Enabling your Lambda function to access resources inside a VPC requires additional VPC-specific configuration information, such as VPC subnet IDs and security group IDs.
This functionality allows Lambda to access resources in the VPC. It does not change how the function is secured.
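A sketch of attaching an existing function to a VPC (the subnet and security group IDs are placeholders):

import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_function_configuration(
    FunctionName="my-function",
    VpcConfig={
        "SubnetIds": ["subnet-0123456789abcdef0"],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
)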
Configuring Your Lambda Functions
- Memory : You can allocate up to 10 GB of memory to a Lambda function.
  - Lambda allocates CPU and other resources linearly in proportion to the amount of memory configured.
  - Any increase in memory size triggers an equivalent increase in the CPU available to your function.
- Timeout : the Lambda timeout value dictates how long a function can run before Lambda terminates it.
  - This limit means that a single invocation of a Lambda function cannot run longer than 900 seconds (15 minutes).
- Concurrency and scaling : the third major configuration that affects your function's performance and its ability to scale on demand.
  - Unreserved concurrency : the amount of concurrency that is not allocated to any specific set of functions. The minimum is 100 unreserved concurrency.
  - Reserved concurrency : guarantees the maximum number of concurrent instances for the function. No charge is incurred for configuring reserved concurrency for a function.
  - Provisioned concurrency : initializes a requested number of runtime environments so that they are prepared to respond immediately to your function's invocations.
    - This option is used when you need high performance and low latency.
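A sketch of configuring both kinds of concurrency with boto3 (the function name, alias, and numbers are placeholders):

import boto3

lambda_client = boto3.client("lambda")

# reserved concurrency : cap this function at 50 concurrent instances
lambda_client.put_function_concurrency(
    FunctionName="my-function",
    ReservedConcurrentExecutions=50,
)

# provisioned concurrency : keep 10 pre-initialized environments warm;
# this must target a published version or alias, here the alias 'live'
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="live",
    ProvisionedConcurrentExecutions=10,
)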
Limit/Reserve Concurrency - Scaling
Limit a function’s concurrency to achieve the following:
- Limit costs.
- Regulate how long it takes you to process a batch of events.
- Match it with a downstream resource that cannot scale as quickly as Lambda.
Reserve function concurrency to achieve the following:
- Ensure that you can handle peak expected volume for a critical function.
- Address invocation errors.
A function concurrency of 0 is similar to an emergency brake.
Functions that are assigned more memory might actually be cheaper to run, because the extra CPU can make them finish much faster.
Scaling
Some considerations for scaling AWS Lambda functions:
- Burst behavior.
- Memory configuration.
- Concurrency limits.
Lambda Monitoring
AWS Lambda integrates with other AWS services to help you monitor and troubleshoot your Lambda functions.
You can use Amazon CloudWatch metrics, AWS X-Ray and others like AWS CloudTrail and Dead-letter queues.
Amazon CloudWatch Lambda Insights is a monitoring and troubleshooting solution for serverless applications running on Lambda.
Lambda Insights collects, aggregates, and summarizes system-level metrics.
It also summarizes diagnostic information such as cold starts and Lambda worker shutdowns to help you isolate issues with your Lambda functions and resolve them quickly.
Lambda Insights uses a new CloudWatch Lambda extension, which is provided as a Lambda layer.
When you enable this extension on a Lambda function, it collects system-level metrics and emits a single performance log event for every invocation of that Lambda function.
You can use AWS X-Ray to visualize the components of your application, identify performance bottlenecks, and troubleshoot requests that resulted in an error.
Your Lambda functions send trace data to X-Ray, and X-Ray processes the data to generate a service map and searchable trace summaries.
You can use X-Ray for:
- Tuning performance
- Identifying the call flow of Lambda functions and API calls
- Tracing path and timing of an invocation to locate bottlenecks and failures
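A sketch of instrumenting a handler, assuming the aws-xray-sdk package is bundled with the function and active tracing is enabled on it:

from aws_xray_sdk.core import patch_all, xray_recorder

patch_all()  # auto-instrument supported libraries such as boto3

def lambda_handler(event, context):
    # record a custom subsegment around the application logic
    with xray_recorder.in_subsegment("business-logic"):
        return {"status": "ok"}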
AWS Outposts
AWS Outposts is a fully managed service that extends AWS infrastructure, services, APIs and tools to your premises.
By providing local access to AWS managed infrastructure, you can use AWS Outposts to build
and run applications on premises using the same programming interfaces as in AWS Regions.
An Outpost is a pool of AWS compute and storage capacity deployed at a site. AWS operates, monitors, and manages this capacity as part of an AWS Region.
You can create subnets on your Outpost and specify them when you create AWS resources, such as EBS volumes and EC2/ECS/RDS instances.
Instances in Outpost subnets communicate with other instances in the AWS Region using private IP addresses, all within the same VPC.
Some available AWS resources on Outposts : EC2/EBS, ECS clusters, EKS nodes, S3, RDS DB instance, App LB, EMR clusters.