DEV Community: AWS Community Builders

# Treat: A Global Gifting Platform Empowering Local Businesses

Seth David Gyimah — Fri, 26 Jun 2026 04:53:04 +0000

This project was built as an entry for the H0 Hackathon.

The Problem

Gifting is one of the most natural human expressions of connection. People love surprising friends, family, and colleagues with food, gifts, services, and experiences.

However, the current experience is fragmented:

People rely on multiple apps and coordination tools
Most gifting defaults to cash transfers, losing emotional intent
Real-world fulfillment is disconnected from digital intent

At the same time, millions of local businesses globally — from restaurants and food vendors to salons, hotels, and service providers — struggle to reach customers consistently without expensive marketing platforms or technical infrastructure.

This creates a gap between intentional giving and local business access to demand.

What I Built

Treat is a global gifting platform that allows anyone to send meaningful real-world experiences to people they care about.

Instead of sending money or generic gift cards, users send:

Food from local restaurants
Services (spa, salon, experiences)
Physical goods from nearby merchants

Recipients receive a secure link via SMS, claim their treat, and redeem it at participating merchants.

Treat is designed so that:

Merchants get visibility at zero marketing cost
Senders create meaningful moments instead of transactions
Recipients enjoy flexible, low-friction redemption

Live project: https://www.sendatreat.app/

How I Built It (AWS + Vercel Architecture)

Treat is a cloud-native distributed system designed for scale and real-world transaction workflows.

Frontend (Vercel)

Built with React + Vite
Deployed on Vercel via GitHub integration
Handles user experience for sending, claiming, and managing treats

Backend (AWS)

Built with Ruby on Rails API
Deployed on AWS ECS Fargate (containerized services)
Designed as a decoupled API layer for web + future mobile clients

Database Layer

Amazon Aurora PostgreSQL
Stores users, merchants, payments, claims, and transaction state

Payments

Paystack (primary in supported regions)
Handles secure card and mobile money payments
Supports instant merchant settlement after verification

Communications

SMS + Email notifications power:
- Treat delivery
- Claim flows
- OTP verification
- Merchant alerts

Key Workflow

1. Sender purchases a treat from a merchant
2. Recipient receives SMS link instantly
3. Recipient claims and verifies identity (OTP-based security layer)
4. Merchant is notified and prepares fulfillment
5. After verification at merchant location:
   - Payment is released instantly
   - Treat is served
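
The workflow above can be read as a small state machine. The sketch below is a hypothetical illustration in Python (the actual backend is a Ruby on Rails API, and none of these names come from the Treat codebase). It only shows one way the purchase, claim, redeem, payout ordering could be enforced so that payment is never released before verification.

```python
from enum import Enum


class TreatState(Enum):
    PURCHASED = "purchased"   # sender has paid
    CLAIMED = "claimed"       # recipient verified via OTP
    REDEEMED = "redeemed"     # verified at the merchant location
    PAID_OUT = "paid_out"     # payment released to the merchant


# Allowed transitions (hypothetical; mirrors the workflow steps above).
TRANSITIONS = {
    TreatState.PURCHASED: TreatState.CLAIMED,
    TreatState.CLAIMED: TreatState.REDEEMED,
    TreatState.REDEEMED: TreatState.PAID_OUT,
}


def advance(state: TreatState, target: TreatState) -> TreatState:
    """Move to `target` only if it is the next step in the workflow."""
    if TRANSITIONS.get(state) is not target:
        raise ValueError(f"cannot go from {state.value} to {target.value}")
    return target
```

Skipping a step, for example going straight from purchased to paid out, raises an error instead of silently releasing funds.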

Challenges I Ran Into

1. Designing a 3-sided marketplace flow

Unlike traditional e-commerce, Treat requires coordination between:

Sender
Recipient
Merchant

This meant designing a system where emotional gifting and transactional accuracy coexist.

2. Payments and settlement complexity

A major challenge was ensuring:

Low friction checkout for senders
Instant payout to merchants after verification
Cross-provider payment consistency across countries

This required integrating regional payment providers and limiting rollout to supported markets.

3. Real-world redemption vs digital intent

Bridging digital gifting with physical merchant redemption required careful design of:

OTP verification flows
QR-based validation
Merchant-assisted and self-checkout flows

What I’m Proud Of

Treat is now live and operational at: https://www.sendatreat.app/
Active deployment across multiple supported regions
Fully working end-to-end gifting flow (send → claim → redeem → payout)
Early adoption by local merchants and testers
Built a system that supports real-world gifting, not just digital transactions

What I Learned

Building Treat reinforced that:

Gifting is emotional, not transactional
UX matters as much as infrastructure in real-world systems
Local businesses thrive when embedded into personal moments, not just marketplaces

What’s Next

Expand to more countries and payment providers
Mobile app (iOS + Android) rollout
WhatsApp + richer notification channels
Delivery partnerships for non-pickup experiences
Corporate gifting and rewards integration

Ultimately, Treat aims to become the global infrastructure for meaningful gifting and local business discovery.

Tech Stack

AWS (ECS Fargate, Aurora PostgreSQL)
Ruby on Rails API
React + Vite
Vercel
Paystack
SMS/Email infrastructure

Try It Out

🌍 https://www.sendatreat.app/

Only 1 Post to Help You Understand the Big Picture of System Design

Hoang Guruu — Fri, 26 Jun 2026 03:17:17 +0000

Check out more resources: https://lnkd.in/gsXedZBf

98 SYSTEM DESIGN CONCEPTS FOR BEGINNERS

1. Scalability

Concept: The system can handle more users without slowing down.
Why use it: To prevent the system from becoming slow or crashing as traffic grows.
How to use it: Upgrade the server or add more servers.

2. Availability

Concept: Users can access the system whenever they need it.
Why use it: To reduce service downtime.
How to use it: Run multiple servers and prepare backup servers.

3. Reliability

Concept: The system works correctly and rarely fails.
Why use it: To provide stable and accurate results.
How to use it: Test the system, back up data, and handle errors carefully.

4. Latency

Concept: The time between sending a request and receiving a response.
Why use it: Lower latency makes the application feel faster.
How to use it: Use caching, a CDN, and faster database queries.

5. Throughput

Concept: The number of requests a system can process in a period of time.
Why use it: To measure how much work the system can handle.
How to use it: Measure requests, transactions, or data processed per second.

6. Capacity

Concept: The maximum workload a system can handle.
Why use it: To know when the system needs more resources.
How to use it: Run load tests to find CPU, memory, and request limits.

7. Client–Server

Concept: The client sends a request, and the server processes it and returns a result.
Why use it: To separate the user interface from data processing.
How to use it: A browser calls a server API to get data.

8. Database

Concept: A place where system data is stored.
Why use it: To save data permanently and find it easily.
How to use it: Store users, products, orders, and transactions.

9. SQL vs NoSQL

Concept: SQL stores data in tables, while NoSQL supports more flexible formats.
Why use it: Different systems need different ways to store data.
How to use it: Use SQL for related data and NoSQL for flexible or large-scale data.

10. Load Balancing

Concept: It shares requests across multiple servers.
Why use it: To stop one server from becoming overloaded.
How to use it: Place a load balancer in front of application servers.
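
As a rough illustration, round-robin is one of the simplest ways a load balancer can share requests. The sketch below assumes a fixed list of servers with made-up names; real load balancers also track health and current load.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend addresses in rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(list(servers))

    def next_server(self):
        return next(self._servers)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])  # hypothetical names
picks = [balancer.next_server() for _ in range(4)]
# The fourth request wraps around to the first server again.
```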

11. Caching

Concept: Frequently used data is saved temporarily for faster access.
Why use it: To improve response time and reduce database load.
How to use it: Store popular data in Redis or memory.

12. Cache Invalidation

Concept: Old cached data is removed or updated.
Why use it: To prevent users from seeing outdated information.
How to use it: Clear the cache when the original data changes or expires.
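
Concepts 11 and 12 can be sketched together as a tiny in-memory cache. This is a minimal illustration, not a replacement for Redis: entries expire after a time-to-live, and `invalidate` removes an entry as soon as the original data changes. The `clock` parameter is an assumption added so the behavior can be tested without waiting.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the stale entry and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)

    def invalidate(self, key):
        """Call this when the source data changes."""
        self._store.pop(key, None)
```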

13. CDN

Concept: A network of servers that stores content in different locations.
Why use it: To help users load images, videos, and files faster.
How to use it: Store static files on servers close to users.

14. DNS

Concept: It changes a domain name into an IP address.
Why use it: Users can remember a name instead of a number.
How to use it: Point a domain name to a server or load balancer.

15. API Design

Concept: The way applications communicate with each other.
Why use it: To make APIs clear, easy to use, and easy to maintain.
How to use it: Clearly define URLs, inputs, outputs, and error codes.

16. REST

Concept: An API style that uses URLs and HTTP methods.
Why use it: It is simple, popular, and easy to connect with other systems.
How to use it: Use GET, POST, PUT, and DELETE for common actions.

17. GraphQL

Concept: The client asks for exactly the data it needs.
Why use it: To avoid receiving too much or too little data.
How to use it: Send a query containing the required fields.

18. gRPC

Concept: A fast way for services to communicate.
Why use it: Its compact binary messages are often smaller and faster to process than text-based formats such as JSON.
How to use it: Define functions with Protocol Buffers and call them from another service.

19. Authentication

Concept: It checks who the user is.
Why use it: To stop strangers from accessing an account.
How to use it: Use passwords, OTPs, tokens, or biometrics.

20. Authorization

Concept: It checks what a user is allowed to do.
Why use it: To stop users from accessing functions they do not have permission to use.
How to use it: Check roles and permissions before allowing an action.

21. Rate Limiting

Concept: It limits how many API requests can be sent in a period of time.
Why use it: To prevent spam, attacks, and excessive usage.
How to use it: For example, allow each user 100 requests per minute.
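
One simple way to enforce "100 requests per minute" is a fixed-window counter. The sketch below is an illustration under that assumption; production systems often use a sliding window or token bucket, and usually keep the counters in a shared store rather than in process memory.

```python
import time
from collections import defaultdict


class FixedWindowRateLimiter:
    def __init__(self, limit=100, window_seconds=60, clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        # (user_id, window index) -> request count.
        # Old windows are never purged here, to keep the sketch short.
        self._counts = defaultdict(int)

    def allow(self, user_id):
        window_index = int(self._clock() // self._window)
        key = (user_id, window_index)
        if self._counts[key] >= self._limit:
            return False  # over the limit for this window
        self._counts[key] += 1
        return True
```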

22. Fault Tolerance

Concept: The system still works when one part fails.
Why use it: To stop one small failure from crashing the whole system.
How to use it: Use backup servers, retries, and automatic failover.

23. High Availability

Concept: The system is almost always available.
Why use it: To reduce service interruptions.
How to use it: Run several copies of the system on different servers or locations.

24. CAP Theorem

Concept: During a network failure, a distributed system must choose between consistency and availability.
Why use it: To help choose the right design for a distributed system.
How to use it: Choose the priority based on the system, such as banking or social media.

25. Consistency Models

Concept: They define when users can see the latest data.
Why use it: To balance correct data with system speed and availability.
How to use it: Choose strong consistency or eventual consistency.

26. Replication

Concept: Data is copied to several servers.
Why use it: To improve reading speed and reduce the risk of data loss.
How to use it: One main server writes data, and other servers keep copies.

27. Partitioning

Concept: A large amount of data is divided into smaller parts.
Why use it: To make data easier to store and process.
How to use it: Divide data by date, region, customer, or type.

28. Sharding

Concept: Data is divided across multiple database servers.
Why use it: So one database does not need to store everything.
How to use it: Divide users by ID, country, or region.
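
Dividing users by ID is often done by hashing the ID and taking it modulo the number of shards. A minimal sketch, assuming string user IDs and a fixed shard count:

```python
import hashlib


def shard_for(user_id: str, shard_count: int) -> int:
    """Map a user ID to a shard index in [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # A stable hash is used instead of Python's built-in hash(), which
    # is randomized per process for strings.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Note that with plain modulo hashing, changing the shard count moves most keys to a different shard; consistent hashing is a common way to reduce that.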

29. Indexing

Concept: An index works like a table of contents for data.
Why use it: To find data without scanning the entire table.
How to use it: Add indexes to columns that are often searched or sorted.

30. Denormalization

Concept: Some data is copied to make reading faster.
Why use it: To reduce the number of table joins.
How to use it: Store the customer name directly inside an order record.

31. ACID

Concept: A set of rules that keeps database transactions safe.
Why use it: To prevent incorrect or incomplete updates.
How to use it: Use transactions for payments, transfers, and orders.

32. BASE

Concept: The system stays available even when data is not immediately the same everywhere.
Why use it: It works well for large distributed systems.
How to use it: Allow servers to synchronize data after a short delay.

33. Microservices

Concept: An application is divided into small independent services.
Why use it: Each service can be developed and deployed separately.
How to use it: Separate user, payment, and order services.

34. Monolith

Concept: All application functions are inside one large application.
Why use it: It is easier to build and deploy at the beginning.
How to use it: Keep the interface, business logic, and data access in one project.

35. Event-Driven Architecture

Concept: The system reacts when an event happens.
Why use it: To reduce direct dependency between system components.
How to use it: When an order is created, send events to email and inventory services.

36. Message Queue

Concept: A queue stores tasks that need to be processed.
Why use it: The sender does not need to wait for the receiver.
How to use it: Put email sending or image processing into a queue.

37. Pub/Sub

Concept: One service publishes an event, and many services receive it.
Why use it: One event can trigger several actions.
How to use it: A successful payment event can update inventory, email, and reports.

38. Synchronous vs Asynchronous

Concept: Synchronous work waits for a result; asynchronous work continues without waiting.
Why use it: To choose the right processing style for each task.
How to use it: Use synchronous processing for payments and asynchronous processing for emails.

39. Idempotency

Concept: Sending the same request many times still creates only one final result.
Why use it: To avoid duplicate payments or orders.
How to use it: Give each payment request a unique ID.
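
The unique-ID idea can be sketched as follows. `PaymentService` and its fields are hypothetical; the point is only that a repeated request with the same key returns the stored result instead of charging again.

```python
class PaymentService:
    def __init__(self):
        self._results = {}     # idempotency key -> stored result
        self.charges_made = 0  # counts real charges, for illustration

    def charge(self, idempotency_key, amount_cents):
        # A retry with a key we have already seen returns the first result.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"status": "charged", "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result
```

In a real system the key-to-result mapping would live in durable storage so it survives restarts.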

40. Backpressure

Concept: The receiver asks the sender to slow down.
Why use it: To prevent the receiver from becoming overloaded.
How to use it: Limit the queue size or process data in smaller groups.

41. Circuit Breaker

Concept: The system temporarily stops calling a failing service.
Why use it: To avoid sending more requests to a service that is already broken.
How to use it: Stop calls after several failures and try again later.
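
A minimal circuit breaker might look like the sketch below. The thresholds and the injectable `clock` are assumptions for illustration; libraries that implement this pattern usually add more states and metrics.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_seconds=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset = reset_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset:
                raise CircuitOpenError("circuit is open; call skipped")
            # Reset period elapsed: allow one trial call (half-open).
            # A single failure now reopens the circuit immediately.
            self._opened_at = None
            self._failures = self._threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # success closes the circuit fully
        return result
```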

42. Bulkhead

Concept: System parts are separated from each other.
Why use it: A failure in one part does not affect the whole system.
How to use it: Give each service its own resources and connection pool.

43. Retry Logic

Concept: The system automatically tries again after a failure.
Why use it: To recover from temporary network or service problems.
How to use it: Retry after 1 second, then 2 seconds, then 4 seconds.
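
The 1, 2, 4 second schedule is exponential backoff. A short sketch, assuming the operation is a zero-argument callable; the `sleep` parameter is only there so the delays can be observed in tests:

```python
import time


def retry(func, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func; after each failure wait base_delay, then 2x, then 4x."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
```

Real systems often add random jitter to each delay so that many clients do not retry at the same moment.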

44. Timeout

Concept: The system stops waiting after a set amount of time.
Why use it: To stop requests from waiting forever.
How to use it: Set a maximum waiting time for APIs and databases.

45. Service Discovery

Concept: Services can find the addresses of other services.
Why use it: Service addresses may change when the system scales.
How to use it: Register service addresses in a central system.

46. API Gateway

Concept: A central entry point that sends requests to the correct service.
Why use it: To manage routing, authentication, and rate limits in one place.
How to use it: The client calls the gateway, and the gateway calls the correct service.

47. Load Shedding

Concept: The system rejects some requests when it is overloaded.
Why use it: To keep important services working.
How to use it: Drop low-priority requests or return a “system busy” message.

48. Autoscaling

Concept: The system automatically adds or removes servers.
Why use it: To handle high traffic and save money during low traffic.
How to use it: Add servers when CPU usage or traffic passes a limit.

49. Blue-Green Deployment

Concept: The old and new versions run in two separate environments.
Why use it: To release a new version with little downtime.
How to use it: Test the new environment and then move traffic to it.

50. Canary Release

Concept: A small group of users receives the new version first.
Why use it: To find problems before releasing to everyone.
How to use it: Send 5% of traffic to the new version and increase it slowly.

51. Feature Flags

Concept: A switch that turns a feature on or off.
Why use it: To control releases without deploying new code.
How to use it: Enable a feature for employees or a small user group first.

52. Observability

Concept: The ability to understand what is happening inside the system.
Why use it: To find the cause of problems faster.
How to use it: Combine logs, metrics, and tracing.

53. Logging

Concept: Recording events that happen inside the system.
Why use it: To investigate errors and understand system activity.
How to use it: Record time, requests, errors, and processing details.

54. Metrics

Concept: Numbers that show the health and performance of the system.
Why use it: To know whether the system is fast, slow, or overloaded.
How to use it: Track CPU, memory, requests, error rate, and latency.

55. Tracing

Concept: Following one request through multiple services.
Why use it: To find which service caused a delay or error.
How to use it: Add a trace ID and record every processing step.

56. Correlation ID

Concept: A shared ID used by all logs for the same request.
Why use it: To find the complete history of one request.
How to use it: Create the ID when the request starts and pass it through all services.

57. Monitoring

Concept: Continuously watching the system.
Why use it: To find problems before they affect many users.
How to use it: Use dashboards to watch servers, databases, and APIs.

58. Alerting

Concept: Sending a warning when the system has a problem.
Why use it: To help the operations team respond quickly.
How to use it: Send an email or message when errors or CPU usage become too high.

59. Full-Text Search

Concept: Searching for words or sentences inside text.
Why use it: To quickly search articles, products, or documents.
How to use it: Index the content with Elasticsearch or a similar tool.

60. Time Series

Concept: Data recorded at different points in time.
Why use it: To track how something changes over time.
How to use it: Store CPU usage, stock prices, or temperature readings.

61. Vector Database

Concept: A database that finds data with similar meaning.
Why use it: It is useful for AI search, images, and similar text.
How to use it: Convert data into vectors and search for the nearest vectors.

62. Materialized View

Concept: A query result that is calculated and saved in advance.
Why use it: To make complex reports load faster.
How to use it: Save daily or monthly sales summaries.

63. Query Optimization

Concept: Improving a database query so it runs faster.
Why use it: To reduce latency and resource usage.
How to use it: Add indexes, read less data, and check the query plan.

64. Connection Pooling

Concept: Reusing database connections that are already open.
Why use it: Opening a new connection for every request is slow and expensive.
How to use it: Create a shared pool of database connections.

65. Cache Stampede

Concept: Many requests hit the database when cached data expires.
Why use it: Preventing a stampede keeps the database from being overloaded.
How to use it: Lock cache refreshes or use different expiration times.

66. Cache Warming

Concept: Loading data into the cache before users request it.
Why use it: The first user does not need to wait for the database.
How to use it: Load popular products into the cache before peak hours.

67. CDN Caching

Concept: Saving copies of content on CDN servers.
Why use it: To help users in different regions load content faster.
How to use it: Cache images, videos, CSS, and JavaScript files.

68. Data Compression

Concept: Making data smaller before storing or sending it.
Why use it: To save storage space and network bandwidth.
How to use it: Compress files, images, or API responses with gzip.

69. Serialization

Concept: Changing an object into a format that can be stored or sent.
Why use it: Systems cannot directly send objects from memory.
How to use it: Convert objects into JSON, XML, or Protocol Buffers.

70. Deserialization

Concept: Changing received data back into an object.
Why use it: The application needs objects to continue processing.
How to use it: Convert a JSON response into an application object.

71. WebSockets

Concept: The client and server keep a two-way connection open.
Why use it: To send real-time updates without repeated requests.
How to use it: Use it for chat, notifications, and live prices.

72. WebRTC

Concept: A technology for real-time audio, video, and data communication.
Why use it: To support low-latency video calls.
How to use it: Use it for online meetings, video calls, and screen sharing.

73. CQRS

Concept: Reading data and writing data use separate models.
Why use it: Each side can be optimized for its own purpose.
How to use it: Use one model for updates and another for displaying data.

74. Event Sourcing

Concept: Every change to the data is stored as an event.
Why use it: To view history and rebuild an earlier state.
How to use it: Store events such as order created, paid, and cancelled.

75. Service Mesh

Concept: A layer that manages communication between microservices.
Why use it: To manage security, routing, and monitoring in one place.
How to use it: Place a proxy beside each service to manage traffic.

76. Sidecar

Concept: A supporting component that runs beside the main application.
Why use it: To add functions without changing much application code.
How to use it: Use a sidecar for proxying, logging, or security.

77. BFF – Backend for Frontend

Concept: Each type of frontend has its own backend.
Why use it: Web and mobile applications often need different data.
How to use it: Create one backend for the web and another for mobile.

78. Strangler Pattern

Concept: Replacing an old system one part at a time.
Why use it: To reduce the risk of rewriting the entire system.
How to use it: Move each function from the old system to the new system gradually.

79. LSM Trees

Concept: A data structure designed for fast writing.
Why use it: It works well for systems that write a lot of data.
How to use it: Write data to memory first and organize it on disk later.

80. B-Trees

Concept: A tree structure that helps find data quickly on disk.
Why use it: It is useful for exact searches and range searches.
How to use it: Databases often use B-Trees for indexes.

81. Merkle Trees

Concept: A tree of hashes that represents data.
Why use it: To compare and check data without reading everything.
How to use it: Compare hashes between nodes to find different data.

82. Bloom Filter

Concept: A fast structure that checks whether data may exist.
Why use it: To avoid database queries when data definitely does not exist.
How to use it: Check the Bloom Filter before reading the real database.
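
A Bloom filter can be sketched in a few lines. The sizes below are arbitrary assumptions, and one byte per bit is used for readability rather than efficiency. A `False` answer is definite; a `True` answer only means "maybe", so the database still has to be checked.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, hash_count=3):
        self._size = size_bits
        self._hash_count = hash_count
        self._bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, item: str):
        # Derive several positions by salting one hash function.
        for i in range(self._hash_count):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self._size

    def add(self, item: str):
        for pos in self._positions(item):
            self._bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        return all(self._bits[pos] for pos in self._positions(item))
```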

83. HyperLogLog

Concept: An algorithm that estimates the number of unique items.
Why use it: To count approximately while using very little memory.
How to use it: Estimate unique users or IP addresses.

84. MapReduce

Concept: Data is divided, processed, and then combined.
Why use it: To process very large datasets across many machines.
How to use it: Map processes the data, and Reduce combines the results.
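
Word counting is the usual teaching example. The sketch below runs all phases in one process purely to show the shape of the computation; in a real framework the map and reduce steps run on many machines and the grouping step is handled for you.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))
```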

85. Batch Processing

Concept: Data is collected and processed in groups.
Why use it: It works well when results are not needed immediately.
How to use it: Generate daily reports or calculate salaries at the end of the month.

86. Stream Processing

Concept: Data is processed as soon as it arrives.
Why use it: To produce near real-time results.
How to use it: Process transactions, sensor data, or user events continuously.

87. ETL

Concept: Extract data, transform it, and load it into another system.
Why use it: To clean and standardize data from different sources.
How to use it: Extract sales data, clean it, and load it into a reporting system.

88. Data Pipeline

Concept: An automatic flow that moves and processes data.
Why use it: To reduce manual work and keep data updated.
How to use it: Connect data sources, processing steps, and storage systems.

89. Data Lake

Concept: A storage system for many types of raw data.
Why use it: To store data now and decide how to use it later.
How to use it: Store logs, files, images, videos, and sensor data.

90. Data Warehouse

Concept: A storage system for cleaned and organized reporting data.
Why use it: To make business analysis faster and more consistent.
How to use it: Combine sales, customer, and financial data.

91. Secrets Management

Concept: Secure management of passwords, tokens, and API keys.
Why use it: To stop secret information from appearing in source code.
How to use it: Store secrets in a secure tool and control access.

92. RBAC

Concept: Permissions are given based on user roles.
Why use it: To manage access more easily when there are many users.
How to use it: Create roles such as Admin, Manager, and User.

93. SSO

Concept: Users sign in once and access multiple systems.
Why use it: Users do not need to remember many passwords.
How to use it: Connect applications to one central login system.

94. Encryption

Concept: Data is changed into a form that strangers cannot read.
Why use it: To protect data when it is stored or sent over a network.
How to use it: Encrypt HTTPS traffic, databases, files, and sensitive information.

95. Checksum

Concept: A value used to check whether data has changed.
Why use it: To detect damaged or modified files.
How to use it: Calculate the checksum before and after transfer, then compare the values.
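
A short sketch of the before-and-after comparison, using SHA-256 from Python's standard library. The choice of SHA-256 is an assumption; faster non-cryptographic checksums such as CRC32 are also common when only accidental damage matters.

```python
import hashlib


def sha256_checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(received: bytes, expected_checksum: str) -> bool:
    """Compare the checksum of what arrived with the one sent alongside it."""
    return sha256_checksum(received) == expected_checksum


original = b"order #1001: 2 items"   # made-up payload
checksum_before = sha256_checksum(original)
```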

96. Erasure Coding

Concept: Data is divided into pieces that can be used for recovery.
Why use it: Data can still be restored even when some pieces are lost.
How to use it: Store the pieces on different disks or servers.

97. Consensus

Concept: Multiple servers agree on the same state or result.
Why use it: To prevent servers from storing different results.
How to use it: Servers vote and accept the result supported by the majority.

98. Leader Election

Concept: The servers choose one server to coordinate the others.
Why use it: To prevent several servers from doing the same important task.
How to use it: Servers vote, and when the leader fails, they choose a new leader.

Unit Prices Are Falling, So Why Are the Bills Going Up? Tokenomics for AI Platform Owners

Kento IKEDA — Thu, 25 Jun 2026 21:28:23 +0000

"Model unit prices keep falling, yet our monthly AI bill keeps climbing." If you use AI personally, you can feel the creep of your subscription and metered charges. If you own AI usage inside a company, the gap is even more pronounced.

Overseas, this feeling has started getting a name: Tokenomics. On June 3, 2026, the Linux Foundation announced its intent to launch the Tokenomics Foundation, dedicated to open standards for AI cost management. Google, Microsoft, Oracle, JPMorganChase, and others — both providers and large buyers — are on board.

https://www.linuxfoundation.org/press/linux-foundation-announces-the-intent-to-launch-the-tokenomics-foundation-to-establish-open-standards-for-ai-cost-management

This post isn't an explainer of the word itself. It's an account of what changes for the people who own internal generative AI usage — the platform owners, the FinOps practitioners, the engineering leaders watching the bills — once you have this word in your vocabulary.

What Tokenomics gives you isn't another saving technique. It changes the unit of measurement and the lens through which you read AI cost.

Why Tokenomics, why now

Tokenomics sits in the lineage of cloud FinOps. The FinOps Foundation now classifies Tokenomics as the "AI Value" dimension within FinOps for AI. Where cloud FinOps tracked the variable infrastructure costs (compute, storage, networking) against value, Tokenomics tracks the variable cost of intelligence itself. It's not a replacement; it adds a probabilistic, non-deterministic layer of variable cost on top.

"Tokens" here means what you see on every API price sheet and usage dashboard: the smallest unit a language model reads and writes, the unit of compute. The word "tokenomics" also exists in the crypto world, but that one is about issuance, distribution, and incentives on a blockchain, with tokens as units of ownership. Same word, different economies.

https://www.finops.org/insights/token-economics-the-atomic-unit-of-ai-value/

The term gained traction from spring 2026 onward. Generative AI and agents moved from pilots to production, and tokens became the largest and fastest-growing line item in many technical budgets. Per-token prices fell, but usage volume rose even faster, and bills became harder to read. The Foundation launch is the industry's response: a venue to align on a common yardstick for tokens, the way cloud costs were once aligned.

As a follow-on, the annual FinOps X conference will be renamed Tokenomicon starting 2027. The word is settling into its own institutional shape.

From here, four shifts in how a platform owner sees AI cost.

Shift 1: Budget on the trajectory of consumption, not on the unit price

The first thing to change is where you anchor your budget. Stop drawing comfort from "unit prices keep dropping" and start watching the trajectory of total consumption.

Per-million-token prices for general-purpose models fell sharply from 2023 to 2025. Recently they've plateaued, while prices for top-tier and reasoning models have actually gone up. Yet enterprise spending keeps growing. The reason is demand elasticity: when prices drop, organizations widen modalities (text → images → video), increase agent autonomy, and lengthen reasoning chains. The volume grows faster than the price falls.
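
To make the arithmetic concrete, here is a small worked example. The prices and volumes are invented for illustration and do not come from any vendor's price sheet: the unit price halves, consumption quadruples, and the bill doubles.

```python
def monthly_bill(price_per_million_tokens: float, tokens: float) -> float:
    """Bill = unit price x volume, with price quoted per million tokens."""
    return price_per_million_tokens * tokens / 1_000_000


# Hypothetical numbers: the price drops 50% while usage grows 4x.
before = monthly_bill(price_per_million_tokens=10.0, tokens=2_000_000_000)
after = monthly_bill(price_per_million_tokens=5.0, tokens=8_000_000_000)
growth = after / before
```

With these inputs, `before` is 20,000, `after` is 40,000, and `growth` is 2.0, even though the unit price fell by half.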

The scale shows in numbers companies publish openly. At Google I/O 2026, Google announced monthly processing of 32 quadrillion tokens across its AI products, roughly 7x the 4.8 quadrillion of the previous year. AT&T reported scaling its internal "Ask AT&T" GenAI platform from about 8 billion tokens/day to about 27 billion tokens/day after restructuring orchestration into a multi-agent setup — 3x the volume at about 90% lower cost. The IEA noted that AI-related data center electricity demand grew about 50% in 2025 alone (against overall electricity demand growth of about 3%), and attributed the gap to a surge in AI usage (roughly 3x monthly active users and 5x revenue at major model providers).

What matters: consumption is not linear in user-visible activity. A single query that triggers a RAG pipeline, hits a reasoning model, and makes several tool calls can consume tens to hundreds of times the tokens of a direct prompt to a small model. Agent-to-agent communication is itself a cost. The research community has started calling this overhead "communication tax".

https://openreview.net/forum?id=0iLbiYYIpC

Breaking down where consumption accumulates, one request typically stacks up across five elements:

These multiply rather than add, which is why the total is unreadable from surface-level activity.

For a platform owner, the action is clear: stop projecting budgets from last quarter's actuals and price trendlines. Assume that any expansion of use cases will spike consumption, and put the trajectory itself on the dashboard. Unit price is no longer the subject of the budget conversation. Total consumption is.

Shift 2: Treat tokens as an invisible cost category

The next shift is to see tokens as a hidden cost category and start watching it deliberately.

Cloud instances can be resized. Storage can be audited. Tokens lack that tactile feedback. They flow quietly through every agent loop, every retrieval call, every reasoning step, and pile up as a cost no one budgeted. This is the property the Tokenomics discussion keeps pointing at.

What amplifies the invisibility is metered billing hidden inside SaaS subscriptions. What looks like a flat monthly subscription to a developer tool or business app is, in reality, a token meter waiting to spin up. Roll out AI tools, and you can get bills the seat count can't explain. The examples are not hypothetical:

Cursor moved to usage-based pricing in June 2025. With long-context agent usage, effective spend ballooned by orders of magnitude for some users. On July 4, the CEO had to issue a public apology and offer refunds.

https://cursor.com/blog/june-2025-pricing

Kiro launched with a pricing model that charged spec and vibe requests at a 5:1 ratio and immediately drew criticism; the company also officially acknowledged a bug that caused requests to be over-consumed.

https://kiro.dev/blog/important-pricing-updates/

The common pattern: subscription prices no longer signal your budget. The seat fee is a floor. What you actually pay is determined by usage, not seat count.

What a platform owner should do first is finish visibility before reaching for optimization techniques. Build a state where you can break down — by model, by product, by team, by environment — who is consuming how much. Surface the tokens hiding inside SaaS, too. Without that foundation, the optimization conversation has nothing to stand on.
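As a minimal sketch of that breakdown, assuming each request is already tagged when it is logged, the aggregation itself is small. The record fields and values below are hypothetical; real data would come from provider usage exports or gateway logs.

```python
from collections import defaultdict

# Hypothetical usage records, tagged at request time.
RECORDS = [
    {"team": "search", "model": "small", "env": "prod", "tokens": 120_000},
    {"team": "search", "model": "large", "env": "prod", "tokens": 900_000},
    {"team": "support", "model": "large", "env": "staging", "tokens": 50_000},
    {"team": "support", "model": "small", "env": "prod", "tokens": 300_000},
]


def breakdown(records, *dimensions):
    """Sum tokens per combination of the given tag dimensions."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        totals[key] += record["tokens"]
    return dict(totals)


by_team = breakdown(RECORDS, "team")
by_model_env = breakdown(RECORDS, "model", "env")
print(by_team)
print(by_model_env)
```

The hard part is usually not the summing but agreeing on the tags, which is why the tagging exercise tends to come first.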

Shift 3: Solve reduction by design, not by discipline

The third shift is in how you think about cost reduction. Reducing tokens isn't a matter of restraint; it's a design problem. And the levers from the supply side have arrived.

1. Model routing. Instead of sending every query to the top-tier model, route to the cheapest model that can still answer. FrugalGPT, an academic approach, tries smaller models first and only escalates when needed — reporting up to 98% cost reduction vs GPT-4. RouteLLM (UC Berkeley) reports up to 85% cost reduction while preserving conversational quality. Amazon Bedrock offers this as a managed service (intelligent prompt routing) with up to 30% reduction officially advertised. Routing is no longer research-only; it's a real option from both research and managed services.

https://arxiv.org/abs/2305.05176

https://arxiv.org/abs/2406.18665

https://aws.amazon.com/bedrock/intelligent-prompt-routing/
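The try-small-first idea can be sketched as a simple cascade. This is an illustration only, not the FrugalGPT or RouteLLM implementation: the models are stubs, the relative costs are made up, and the confidence score is assumed to be supplied by each model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    name: str
    cost_per_call: float                        # hypothetical relative cost
    answer: Callable[[str], Tuple[str, float]]  # returns (text, confidence in [0, 1])


def cascade(query: str, models: List[Model], threshold: float = 0.8):
    """Try models from cheapest to most expensive and stop at the first
    answer whose confidence clears the threshold."""
    if not models:
        raise ValueError("at least one model is required")
    spent = 0.0
    for model in sorted(models, key=lambda m: m.cost_per_call):
        text, confidence = model.answer(query)
        spent += model.cost_per_call
        if confidence >= threshold:
            break
    # If no model was confident enough, the most expensive answer is kept.
    return model.name, text, spent


# Stub models standing in for real API calls.
small = Model("small", 1.0, lambda q: ("short answer", 0.9 if len(q) < 40 else 0.3))
large = Model("large", 20.0, lambda q: ("detailed answer", 0.95))

easy = cascade("What is S3?", [large, small])
hard = cascade("Compare three multi-region failover designs in detail", [large, small])
print(easy)
print(hard)
```

Note the trade-off the sketch makes visible: an escalated query pays for both calls, so a cascade only saves money when most traffic stops at the cheap tier.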

2. Tool calls as code. Hand an agent a list of tool definitions and the definitions ride in the context every turn. Cloudflare's "Code Mode" has the agent write code that calls the tools instead. They report compressing the tool definitions of an MCP server exposing 2,500 APIs from about 1.17M tokens to about 1,000 tokens — 99.9% compression. Anthropic independently presented the same pattern as "Code Execution with MCP." This isn't a vendor-specific quirk anymore.

https://blog.cloudflare.com/code-mode-mcp/

https://www.anthropic.com/engineering/code-execution-with-mcp

3. Context compression. In a RAG pipeline, only a small fraction of the retrieved text contributes to the answer; the rest is noise that wastes tokens. If you prune it, you cut the tokens the LLM sees. Zilliz, a vector database vendor, reports 70–80% token reduction by sentence-level relevance filtering that drops weakly related sentences.

https://milvus.io/blog/semantic-highlighting-model-for-rag-context-pruning-and-token-saving.md
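A toy version of sentence-level pruning looks like the following. Plain word overlap is used here as a crude stand-in for a learned relevance model, so treat it as a sketch of the shape of the technique rather than of the approach described in the linked post.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "how", "what", "does"}


def content_words(text: str) -> set:
    """Lowercased words with a few common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def prune_context(question: str, passage: str, min_overlap: int = 1) -> str:
    """Keep only sentences sharing at least `min_overlap` content words
    with the question."""
    question_words = content_words(question)
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    kept = [s for s in sentences if len(content_words(s) & question_words) >= min_overlap]
    return " ".join(kept)


passage = (
    "SQS hides a message after it is received. "
    "The company was founded in Seattle. "
    "The visibility timeout defaults to 30 seconds. "
    "Our office has a nice cafeteria."
)
pruned = prune_context("How long is the SQS visibility timeout?", passage)
print(pruned)
```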

4. Data format choice. The serialization format you hand the LLM directly affects token volume. Microsoft's Data Science engineering blog shows that function-calling-based structured output is more token-efficient than free-form JSON for the same result. For tabular data, CSV/TSV or newer LLM-oriented formats like TOON can use 30–60% fewer tokens than JSON. Data format is a functional decision and a cost decision at the same time.

https://medium.com/data-science-at-microsoft/token-efficiency-with-structured-output-from-language-models-be2e51d3d9d5
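To get a feel for the format effect, the same rows can be serialized both ways and compared. The sample data is made up, and character count is only a rough proxy for token count; an actual tokenizer would give different absolute numbers.

```python
import csv
import io
import json

# Made-up tabular data: 50 rows with three fields each.
rows = [
    {"id": i, "name": f"user-{i:03d}", "role": "engineer"}
    for i in range(1, 51)
]

as_json = json.dumps(rows)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "role"])
writer.writeheader()
writer.writerows(rows)
as_csv = buffer.getvalue()

saving = 1 - len(as_csv) / len(as_json)
print(f"JSON: {len(as_json)} chars, CSV: {len(as_csv)} chars, {saving:.0%} smaller")
```

JSON repeats every key on every row, while CSV states the column names once in the header, which is where most of the difference comes from.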

Lining these up by reported savings and ease of adoption (difficulty is a rough indicator):

Lever	Reported reduction	Adoption difficulty
Data format choice	30–60% vs JSON	Low
Model routing	up to 98% (FrugalGPT), 85% (RouteLLM)	Medium
Context compression	70–80%	Medium
Tool calls as code (Code Mode)	~99.9% on MCP definitions	Medium–High

For a platform owner, the takeaway is that savings opportunities live in design, not in operations. Most of these can be set as organizational policy: pick a default output format, install routing, decide how tools are exposed. The goal is not "try harder" at the team level, but "decide the standard" at the platform level. Of the four, choosing a default output format is probably the lowest-friction starting point.

Shift 4: Measure by outcome, not by volume

The last shift is in what you measure. Move from raw consumption to cost per outcome.

Counting tokens as if they were uniform misses something real. Tokens spent on a retry due to insufficient quality versus tokens in a first-shot usable response carry the same cost but different value. Tokens an agent burns going in circles look like tokens but don't translate into outcome. LLM inference research has a name for this: goodput — the throughput that meets your SLOs (latency, quality targets). Benchmarks like SemiAnalysis's InferenceX have adopted this view. What an enterprise actually buys isn't raw token volume but the usable-output portion of it.

https://bentoml.com/llm/inference-optimization/llm-inference-metrics

https://inferencex.semianalysis.com/

When you only chase volume, cost judgment goes off. What you should be watching is the fraction of tokens that yielded usable results (the yield after retries and quality misses) and cost per inference / per workflow / per outcome.
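Both numbers are simple to compute once each attempt is marked usable or not. The sketch below assumes a hypothetical log of (tokens, usable) pairs for one workflow and a made-up price.

```python
def cost_per_outcome(attempts, price_per_1k_tokens):
    """attempts: list of (tokens_used, usable) tuples for one workflow.
    Returns (token yield, cost per usable outcome)."""
    total_tokens = sum(tokens for tokens, _ in attempts)
    usable_tokens = sum(tokens for tokens, usable in attempts if usable)
    outcomes = sum(1 for _, usable in attempts if usable)
    total_cost = total_tokens / 1000 * price_per_1k_tokens
    token_yield = usable_tokens / total_tokens if total_tokens else 0.0
    per_outcome = total_cost / outcomes if outcomes else float("inf")
    return token_yield, per_outcome


# Two retries that failed quality checks, two responses that were usable.
attempts = [(2000, False), (2500, True), (1500, True), (4000, False)]
token_yield, per_outcome = cost_per_outcome(attempts, price_per_1k_tokens=0.01)
print(f"yield: {token_yield:.0%}, cost per outcome: {per_outcome:.3f}")
```

Here the failed attempts still count toward cost, so the per-outcome figure rises with every retry even though the per-token price never changes.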

What matters most for a platform owner is keeping the balance between volume and value. Using 10x the tokens for 100x the value is economically right. Cutting tokens to a tenth and getting unusable output is not a saving. Conversely, token spend that doesn't translate into value is plain waste: verbose system prompts, oversized contexts, overuse of expensive models, tool design that ships full documents when MCP could extract only what's needed. There's also an organizational failure mode — using token usage itself as a performance metric encourages meaningless AI use just to game the number, as several reports have documented. Cost-per-outcome as the indicator prevents both directions of failure: the cost-cutting order that kills quality, and the value-disconnected consumption that gets ignored.

What the four shifts share

The four shifts look distinct, but they collapse into two underlying moves.

The first is changing what unit you look at. From unit price to consumption trajectory (Shift 1). From token volume to cost per outcome (Shift 4). Both reset the meter.

The second is making it visible, then putting your hands on it. Token spend hides inside SaaS and variable cost, so visibility is the prerequisite (Shift 2). Once visible, design levers — not team effort — drive the reduction (Shift 3).

Changing how you measure without acting changes nothing. Acting without changing how you measure tends to overshoot, killing quality in the name of savings. Each half alone falls short. When both arrive, AI cost shifts from something to watch by intuition to something to operate with grounding.

Pushback worth pre-empting

Four objections are worth addressing up front.

Isn't this just FinOps for AI? Largely yes. The FinOps Foundation itself positions Tokenomics within FinOps for AI, specifically in the "AI Value" topic. Tokenomics is not a new methodology; it's a chunk of FinOps for AI with its own name. That said, getting a proper name and an institutional vessel does something on its own. It doesn't mean cross-team discussion and cross-vendor comparison suddenly work — internal vocabulary takes time to spread, and shared data formats need adoption. But laying the foundation for a shared language is itself worth tracking. Think of it less as a new technique and more as infrastructure for agreement starting to form.

https://www.finops.org/topic/ai-value/

Doesn't Tokenomics narrow vision down to just tokens? A real concern. Tokens are the most measurable layer of AI cost. Beneath that sit SaaS-embedded variable costs and operational/governance costs. If you self-host models, you also carry GPU/compute/storage, data transfer, and training costs underneath.

Tokens get the spotlight because they're growing fastest, hiding hardest, and have the most-formed vocabulary. A reasonable starting point — not the whole story. Worth holding that distinction.

We don't use that many tokens. Possibly true. Possibly just invisible. The SaaS-embedded portion shows up as a flat monthly fee or a rolled-up invoice, not as itemized token usage. "Don't use" vs. "don't see" only separates when you visualize. Building visibility while scale is small beats chasing it after the bill explodes.

Unit prices keep falling — why not just wait? Falling prices apply mostly to general-purpose models. Top-tier and reasoning models are a different story. Industry estimates consistently put agent-style workloads at 5–30x the token consumption of the same task in chat form. The lower-tier price drops get swallowed by the upper-tier consumption growth. Waiting works less well as your usage shifts toward the upper tiers.

https://www.bigeye.com/blog/how-to-track-ai-agent-costs-and-token-usage

https://arxiv.org/abs/2604.22750

Where to start

No universal recipe. The first step varies with maturity and with which layer (direct API use, SaaS-embedded, self-hosting) your AI usage sits on. Still, a common order exists.

Start with visibility. Before optimization techniques, build the state where you can break down — by user, model, product, environment — who's consuming how much. Without this, every later judgment is a guess. The tagging exercise itself raises questions worth surfacing: prod vs. staging splits, product and team boundaries, cost allocation logic that everyone can stomach. The setup work doubles as an on-ramp for FinOps awareness inside the organization.

Next, audit billing models. For each AI-bearing SaaS and API in use, lay out the floor (the recurring portion) and the variable behavior. Once you suspend the "subscription = fixed cost" assumption, the location of budget risk looks different. Provider-side moves matter too — for example, Anthropic's April 2026 pricing structure change. Decisions about extending the recurring footprint and managing variable-cost blow-up become separate agenda items.

Then set design levers as policy. The default output format, routing, how tools are exposed. Don't leave it to the field; pick the standard from the platform. As Shift 3 noted, the default output format is the lightest place to start exercising platform authority.

Finally, push the metric from volume toward outcome. Watching cost per outcome and token yield keeps the cost-cutting order from killing quality. It also blocks the gaming pattern where token usage as a KPI breeds meaningless AI use, as Shift 4 noted. The metric step comes last, but how you align it determines how well the previous three actually deliver.

Tokenomics isn't a new saving trick. It's a lens for reading AI cost as an economy, as the relationship between volume and value. With the term settling into shared use internationally, adopting that lens early, while you own AI inside your organization, is itself the first step.

Not getting hooked on per-token price moves, but reading the relationship between volume and value — that's the kind of attention platform owners will be asked for going forward.

You Don't Need an AWS Account to Learn AWS

Sarvar Nadaf — Thu, 25 Jun 2026 20:26:13 +0000

👋 Hey there, Tech Enthusiasts!

I'm Sarvar, a Cloud Architect who loves turning complex tech problems into simple solutions. I've worked with AWS, Azure, DevOps, Data, Analytics, Generative-AI and Agentic-AI building real systems for real companies. In this article series, I'll share what I've learned in a way that's easy to follow, whether you're experienced or just getting started.

Let's get into it! 🚀

I once woke up to a $14 AWS bill because I forgot to stop an EC2 instance overnight. For a student, that felt like $1400. That one mistake made me afraid to touch anything in AWS for the next three weeks.

I didn't come from a coding background. I didn't have anyone to guide me. I watched videos, read documentation, tried things on the AWS console, and every single time I was terrified of one thing: the bill. After that incident, I stopped experimenting freely. I played it safe. And playing it safe is the worst thing you can do when you're trying to learn cloud.

That fear slowed me down by months.

In my first real project, I deployed a Lambda function that wrote to DynamoDB. During development, I never tested what happens when DynamoDB throttles writes because I was too afraid to generate real traffic on AWS. In production, messages started disappearing silently. I spent two days debugging something I could have caught in five minutes if I had a safe local environment to test against.

Today, after years of working as a Cloud Architect across multiple companies and countries, I can tell you this with full confidence: the single most important thing that separates a good cloud engineer from a mediocre one is hands-on practice. Not certifications. Not watching 40 hours of video. Hands-on. Breaking things. Fixing things. Understanding why something failed and how to bring it back.

A few months ago I found a tool that solves all of this. It's called Floci, an open-source local AWS emulator. You run it in Docker, it gives you 58 AWS services at localhost, and it costs nothing. No AWS account. No credit card. No sign-up. No auth token.

Let me show you how it works.

The Problems You're Probably Facing Right Now

Fear of billing. Every time you try something new (creating a VPC, launching an instance, testing Lambda), there's this voice saying "what if I forget to delete this?" That fear kills curiosity.

No safe playground. The AWS Free Tier helps, but it has limits. Cross those limits once, and you get charged. For a student or a fresher, even a small unexpected bill feels massive.

Theory without practice. You watched hours of videos about S3 and DynamoDB. You can explain what they are. But can you actually use them? Can you create a bucket from the command line? Can you put data into a table and get it back? If not, you haven't really learned it.

Starting over after every mistake. When something breaks, most beginners delete everything and recreate it. They never learn to troubleshoot. They never learn why it broke. They just run away from the problem and start fresh. I did this for months.

If any of this sounds familiar, keep reading.

Before We Start: What You Need

Two things:

Docker: a tool that runs applications in isolated containers. Think of it as a lightweight virtual machine. Floci runs inside Docker.
AWS CLI: the command-line tool to interact with AWS services (optional, but I strongly recommend it)

That's it. Here's how to install both:

Note: Everything in this tutorial runs 100% offline on your own machine: Mac, Windows, or Linux. I'm using an EC2 instance to demonstrate the steps, but that's just my setup. You don't need a cloud server. The exact same commands work on your laptop, completely offline, with no internet required after the initial install. That's the whole point.

Install Docker

Windows / Mac: Download and install Docker Desktop. Open it once installed; that's all.

Ubuntu / Debian:

sudo apt-get update && sudo apt-get install -y docker.io docker-compose-v2
sudo systemctl start docker
sudo usermod -aG docker $USER

Amazon Linux / RHEL / Fedora:

sudo dnf install -y docker
sudo systemctl start docker
sudo usermod -aG docker $USER

Install Docker Compose plugin

sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo curl -SL "https://github.com/docker/compose/releases/latest/download/docker-compose-linux-$(uname -m)" \
  -o /usr/local/lib/docker/cli-plugins/docker-compose
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose

After running the Linux commands above, log out and log back in (or run newgrp docker) so the group change takes effect. Otherwise you'll get "permission denied" errors.

Install AWS CLI

Ubuntu / Debian / Amazon Linux:

curl "https://awscli.amazonaws.com/awscli-exe-linux-$(uname -m).zip" -o "awscliv2.zip"
unzip awscliv2.zip && sudo ./aws/install

(If unzip is not found, install it first: sudo apt-get install -y unzip or sudo dnf install -y unzip)

Mac:

curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /

Windows: Download and run the installer from:
https://awscli.amazonaws.com/AWSCLIV2.msi

Verify the installation on any platform:

aws --version

Getting Floci Running

Open your terminal, create a new folder, and create the compose file:

mkdir floci-playground && cd floci-playground

Now create a file called docker-compose.yml inside this folder. You can use any text editor: VS Code, nano, or even Notepad. Paste this content:

services:
  floci:
    image: floci/floci:latest
    ports:
      - "4566:4566"
    volumes:
      - ./data:/app/data

Start it:

docker compose up -d

If you get docker compose: command not found, try the older syntax: docker-compose up -d. Both work the same way.

Wait a few seconds, then verify it's running:

# Linux / Mac
curl http://localhost:4566/_localstack/health

# Windows (PowerShell)
Invoke-RestMethod http://localhost:4566/_localstack/health

You should see something like this:

{"services":{"s3":"running","sqs":"running","dynamodb":"running","lambda":"running",...},"version":"1.5.25"}

If all services show "running", you're good to go.

Now tell your AWS CLI to talk to Floci instead of real AWS:

# Linux / Mac
export AWS_ENDPOINT_URL=http://localhost:4566
export AWS_DEFAULT_REGION=us-east-1
export AWS_ACCESS_KEY_ID=test
export AWS_SECRET_ACCESS_KEY=test

# Windows (PowerShell)
$env:AWS_ENDPOINT_URL="http://localhost:4566"
$env:AWS_DEFAULT_REGION="us-east-1"
$env:AWS_ACCESS_KEY_ID="test"
$env:AWS_SECRET_ACCESS_KEY="test"

The credentials can be anything. I use test because it's simple. Floci doesn't validate them; it just needs non-empty values.

Your First Win: Create an S3 Bucket

S3 is the storage backbone of AWS. Logs go to S3. Backups go to S3. Static websites live on S3. If you understand S3, you already understand 30% of how AWS works.

aws s3 mb s3://my-first-bucket

Output:

make_bucket: my-first-bucket

That's it. You just created a bucket. One command. No console. No waiting.

Now upload a file:

echo "hello from my laptop" | aws s3 cp - s3://my-first-bucket/greeting.txt

Check if it's there:

aws s3 ls s3://my-first-bucket

2026-06-16 11:43:58         21 greeting.txt

Read it back:

aws s3 cp s3://my-first-bucket/greeting.txt -

hello from my laptop

That felt good, right? You just did exactly what production systems do: store and retrieve data from S3. The same commands. The same behavior. When you move to real AWS someday, nothing changes except where the data lives.

Challenge: Upload 3 different files to your bucket, then try to delete the bucket without emptying it first. What error do you get? Now figure out how to fix it using only the CLI. This exact scenario comes up in every cloud job.

Create a DynamoDB Table and Store Data

DynamoDB confused me for weeks when I started. But once I actually created a table and put data into it, everything clicked. Let's make that happen for you right now.

Create a table:

aws dynamodb create-table \
  --table-name Users \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

Put an item in it:

# Linux / Mac
aws dynamodb put-item \
  --table-name Users \
  --item '{"id":{"S":"user-001"},"name":{"S":"Sarvar"},"role":{"S":"Cloud Architect"}}'

# Windows (PowerShell) - keep the outer single quotes and backslash-escape the inner double quotes
aws dynamodb put-item --table-name Users --item '{\"id\":{\"S\":\"user-001\"},\"name\":{\"S\":\"Sarvar\"},\"role\":{\"S\":\"Cloud Architect\"}}'

A quick note on the JSON format: DynamoDB requires you to specify the data type for each value. "S" means String, "N" means Number. It looks verbose at first, but you get used to it quickly.
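If writing the typed JSON by hand gets tedious, a small converter shows how the mapping works. This is a hand-rolled sketch covering only flat string, number, and boolean values, and `to_dynamodb_item` is a name made up for this example. boto3 also ships a more complete serializer (`boto3.dynamodb.types.TypeSerializer`).

```python
import json


def to_dynamodb_item(plain: dict) -> dict:
    """Convert a flat dict of str/int/float/bool values into DynamoDB's
    typed attribute-value format."""
    item = {}
    for key, value in plain.items():
        if isinstance(value, bool):        # check bool first: bool is a subclass of int
            item[key] = {"BOOL": value}
        elif isinstance(value, (int, float)):
            item[key] = {"N": str(value)}  # numbers are sent as strings
        elif isinstance(value, str):
            item[key] = {"S": value}
        else:
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
    return item


item = to_dynamodb_item({"id": "user-001", "name": "Sarvar", "logins": 3, "active": True})
print(json.dumps(item))
```

The printed JSON can be pasted straight into the --item argument of put-item.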

Get it back:

aws dynamodb get-item \
  --table-name Users \
  --key '{"id":{"S":"user-001"}}'

{
    "Item": {
        "id": { "S": "user-001" },
        "name": { "S": "Sarvar" },
        "role": { "S": "Cloud Architect" }
    }
}

You just stored and retrieved structured data from a NoSQL database. That's real cloud development.

Challenge: Put another item with the same id but a different name. Does it overwrite? Does it error? Now put 5 different users and try aws dynamodb scan --table-name Users to get all of them. This is how you learn database behavior: by seeing it happen, not reading about it.

Send and Receive Messages with SQS

Almost every production system uses message queues. Order processing, notifications, async workflows: queues are everywhere.

Create a queue:

aws sqs create-queue --queue-name orders

Send a message:

aws sqs send-message \
  --queue-url http://localhost:4566/000000000000/orders \
  --message-body '{"event":"order.placed","item":"cloud-book"}'

(That 000000000000 is the default AWS account ID that Floci uses. In real AWS, it would be your actual 12-digit account number.)

Receive it:

aws sqs receive-message \
  --queue-url http://localhost:4566/000000000000/orders

{
    "Messages": [
        {
            "MessageId": "ba4ea32c-cfdf-4c28-b705-8bcbb4d8a0d0",
            "Body": "{\"event\":\"order.placed\",\"item\":\"cloud-book\"}"
        }
    ]
}

Your message comes back exactly as you sent it.

Challenge: Receive the same message again. What happens? It disappears for about 30 seconds. This is called the "visibility timeout": the time SQS hides a message after someone reads it, giving them time to process it. Wait 30 seconds and try again. It comes back. Now try deleting it after receiving. This is exactly how real applications process queues.
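To see the mechanism without waiting, here is a toy in-memory queue that mimics the behavior. It is a teaching model, not how SQS or Floci is implemented: the class name is made up, time is passed in explicitly instead of read from a clock, and deletion uses the message id where real SQS uses a receipt handle.

```python
class TinyQueue:
    """In-memory toy queue that mimics the SQS visibility timeout."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> (body, time at which it is visible again)
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        self._messages[self._next_id] = (body, 0)
        return self._next_id

    def receive(self, now):
        for message_id, (body, visible_at) in self._messages.items():
            if now >= visible_at:
                # Hide the message until the timeout elapses.
                self._messages[message_id] = (body, now + self.visibility_timeout)
                return message_id, body
        return None  # nothing visible right now

    def delete(self, message_id):
        self._messages.pop(message_id, None)


queue = TinyQueue()
queue.send("order.placed")
first = queue.receive(now=0)     # delivered, then hidden for 30 seconds
second = queue.receive(now=10)   # still hidden
third = queue.receive(now=30)    # timeout elapsed, delivered again
queue.delete(third[0])
fourth = queue.receive(now=100)  # deleted, so it never comes back
print(first, second, third, fourth)
```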

Create a Secret in Secrets Manager

Every application has passwords, API keys, and database credentials. Secrets Manager is where you store them securely.

aws secretsmanager create-secret \
  --name my-app/db-password \
  --secret-string "super-secret-password-123"

Retrieve it:

aws secretsmanager get-secret-value --secret-id my-app/db-password

{
    "Name": "my-app/db-password",
    "SecretString": "super-secret-password-123"
}

That's how real applications fetch database credentials at runtime instead of hardcoding them in code. Simple, but incredibly important in production.

Store Configuration in SSM Parameter Store

Parameter Store is where you keep application configuration: feature flags, environment URLs, and settings that change between dev and production.

aws ssm put-parameter \
  --name "/myapp/environment" \
  --value "development" \
  --type String

aws ssm put-parameter \
  --name "/myapp/max-retries" \
  --value "3" \
  --type String

Get them back:

aws ssm get-parameter --name "/myapp/environment"
aws ssm get-parameters --names "/myapp/environment" "/myapp/max-retries"

This is how every well-architected application manages configuration. No more hardcoding values in your code.

Use It With Python

If you want to go beyond CLI, here's how your application code talks to Floci. Make sure Python 3 is installed, then:

pip install boto3

import boto3

s3 = boto3.client("s3",
    endpoint_url="http://localhost:4566",
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test")

s3.create_bucket(Bucket="my-python-bucket")
s3.put_object(Bucket="my-python-bucket", Key="hello.txt", Body=b"it works!")
print(s3.get_object(Bucket="my-python-bucket", Key="hello.txt")["Body"].read())

Output: b'it works!'

The only differences between this and production code are the endpoint_url line and the placeholder credentials. When you deploy to real AWS, remove them and let the SDK pick up your real credentials. Everything else stays the same. You're writing production-ready code from day one.
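One way to avoid editing code when switching targets is to add the local-only arguments only when an environment variable is set. This is a sketch under assumptions: `client_kwargs` and the `LOCAL_AWS_ENDPOINT` variable name are made up for this example and are not part of boto3 or Floci.

```python
import os


def client_kwargs(env=None):
    """Build keyword arguments for boto3.client().
    LOCAL_AWS_ENDPOINT is a hypothetical variable name chosen for this sketch."""
    env = os.environ if env is None else env
    endpoint = env.get("LOCAL_AWS_ENDPOINT")
    if not endpoint:
        # Real AWS: let the SDK resolve the endpoint and credentials itself.
        return {}
    return {
        "endpoint_url": endpoint,
        "region_name": "us-east-1",
        "aws_access_key_id": "test",
        "aws_secret_access_key": "test",
    }


local = client_kwargs({"LOCAL_AWS_ENDPOINT": "http://localhost:4566"})
real = client_kwargs({})
print(local)
print(real)

# Usage (needs boto3 installed and, for the local case, the emulator running):
#   s3 = boto3.client("s3", **client_kwargs())
```

With this shape, the same application code runs against the emulator on a laptop and against real AWS in deployment, and only the environment differs.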

The Way I Wish I Had Learned

Looking back at my journey, here's what I would do differently starting today:

Week 1-2: S3 and IAM. Create buckets, upload files, set permissions, try to access things you shouldn't. Break the permissions. Fix them. Understand what "Access Denied" actually means.

Week 3-4: DynamoDB. Create tables with different key structures. Put data. Query it. Understand the difference between get-item and query and scan.

Week 5-6: SQS, SNS, and Secrets Manager. Build a simple message flow. Store secrets. Retrieve configuration. These are the building blocks of every real application.

Week 7-8: Lambda. Write a simple function. Trigger it. See the logs. Floci runs real Lambda containers, not mocks.

Throughout: Break everything. Delete tables while data is in them. Send malformed messages. Call APIs with wrong parameters. Read the error messages carefully. This is the real education.

Tips From Years of Cloud Work

1. Don't just create. Troubleshoot.

The skill that got me promoted from L1 Cloud Support to Cloud Architect wasn't creating infrastructure. It was fixing it when it broke. Break things deliberately on Floci and fix them without starting over.

2. Learn the CLI before the console.

In real jobs, you'll use CLI and infrastructure-as-code. Floci forces this habit because there's no console to click around in.

3. Read error messages. Actually read them.

Most beginners see an error and panic. AWS error messages are surprisingly helpful if you actually read them. Practice generating errors on Floci so you learn to read them calmly.

4. Document what you learn.

I filled four notebooks with cloud concepts. Later, I converted those notes into articles. That habit changed my career. Start writing about what you break and fix, even if nobody reads it at first.

5. One concept per day is enough.

You don't need to learn all 200 AWS services. Focus on the core: storage, databases, messaging. One solid hour of hands-on practice beats five hours of video watching.

Common Mistakes Beginners Hit

Before you get stuck, here are the issues I see most often:

"Command not found: aws" - You haven't installed AWS CLI yet, or you need to restart your terminal after installation.
Commands hitting real AWS instead of Floci - You forgot to set AWS_ENDPOINT_URL. Always check with echo $AWS_ENDPOINT_URL before running commands.
"Cannot connect to the Docker daemon" - Docker Desktop isn't running. Open it first.
Data disappeared after restart - By default, Floci stores data in memory. Add FLOCI_STORAGE_MODE=hybrid to keep data between restarts:

services:
  floci:
    image: floci/floci:latest
    ports:
      - "4566:4566"
    environment:
      - FLOCI_STORAGE_MODE=hybrid
    volumes:
      - ./data:/app/data

When You're Ready for Real AWS

Here's the transition path:

Learn and practice on Floci (no cost, no risk)
When you feel confident, create an AWS Free Tier account
Run the same commands on real AWS; they work identically
For your application code, just remove the endpoint_url line and the placeholder credentials

There is no rewrite. No migration. No new learning curve. The skills you build on Floci transfer directly to real AWS because it's the same API, the same CLI, the same SDK.

Cleaning Up

When you're done for the day:

docker compose down

This stops everything. If you used hybrid storage mode, your data stays in the ./data folder for next time.

Final Thoughts

When I started learning cloud, I was scared of bills, confused by services, and paralyzed by the fear of breaking something expensive. I shipped broken code to production because I couldn't test properly locally. I wasted months playing it safe when I should have been experimenting aggressively.

That was then. Today, you have Floci.

If you're a fresher starting your cloud journey, a student preparing for certifications, or a working professional switching to cloud, start here. Build things. Break things. Fix things. Do it a hundred times until the commands feel natural and the errors feel familiar.

The cloud rewards people who practice. Not people who watch.

In my next article, I'll show you how to deploy a complete serverless API (Lambda + API Gateway + DynamoDB) entirely on Floci, and then move it to real AWS with zero code changes. Follow me so you don't miss it.

Resources:

Floci GitHub:
floci-io / floci

Floci Documentation: floci.io/floci/
Docker Hub: hub.docker.com/r/floci/floci

If this helped you, share it with someone who's starting their cloud journey. Drop a comment if you have questions. I'll respond to every one.

Connect with me on LinkedIn for more cloud architecture, DevOps, and career guidance.

👉 Visit my website | Connect on LinkedIn | Email: simplynadaf@gmail.com

Happy Learning 🚀

Zipping 15Gb of S3 files in 6s. How the power of community made it possible.

Paul SANTUS — Thu, 25 Jun 2026 18:23:01 +0000

In my first article, I showed how parallelizing zip assembly across multiple Lambdas can beat the single-Lambda bandwidth ceiling. I zipped 6.9GB in 35 seconds with just 5 workers.

Since then, Jérémie published a follow-up article where a contributor (Fitz) introduced a brilliant optimization: UploadPartCopy. Instead of downloading (or even streaming) big files through Lambda just to upload them back into the zip, you can tell S3 to copy them server-side. This halves the bandwidth requirement and brought his single-Lambda solution down to 106 seconds.

I took Fitz's UploadPartCopy idea and combined it with my parallel approach. Here's what happened.

What I took from Jérémie and Fitz

The UploadPartCopy insight is elegant: since ZIP STORE mode has deterministic offsets, we know exactly where each file's data lands in the final archive. For big files (≥5MB), we can:

Write just the local file header (50 bytes) in an UploadPart
Have S3 copy the file data directly via UploadPartCopy — no download, no upload, instant

This means workers barely use any memory or bandwidth for big files.
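The deterministic layout can be computed from file names and sizes alone. The sketch below assumes the simplest case: STORE mode, no extra fields, no data descriptors, and no ZIP64, so each local file header is the 30-byte fixed part plus the encoded file name. The function name and sample files are made up for illustration.

```python
LOCAL_HEADER_FIXED = 30  # fixed part of a ZIP local file header, in bytes


def store_layout(files):
    """files: list of (name, size_in_bytes) in archive order.
    Returns [(name, header_offset, data_offset)] for an uncompressed
    (STORE) archive with no extra fields, data descriptors, or ZIP64."""
    layout = []
    offset = 0
    for name, size in files:
        header_len = LOCAL_HEADER_FIXED + len(name.encode("utf-8"))
        layout.append((name, offset, offset + header_len))
        offset += header_len + size  # the next entry starts right after this file's data
    return layout


layout = store_layout([("a.txt", 10), ("big.bin", 6_000_000), ("c.txt", 3)])
for name, header_offset, data_offset in layout:
    print(f"{name}: header at {header_offset}, data at {data_offset}")
```

Because nothing is compressed, every offset is known before a single byte is read, which is what lets a planner decide up front where each copied file will land.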

The only issue is that the S3 multipart upload API requires every part except the last one to be at least 5MB. So the local file header needs to be appended to another file (or group of files).

My planner Lambda groups small files together until they reach 5MB, appends the LOC header of the next big file, then the worker fires an UploadPartCopy for that big file's data.

When we run out of small files, we stream the smallest remaining big file, append the LOC header of the largest remaining one, and pair that part with a copy of the largest file's data.
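The pairing logic described above can be sketched roughly as follows. This is one reading of the description, not the author's planner: the function name and the choice to always copy the largest remaining big file are assumptions, and header bytes are ignored when sizing the streamed part.

```python
from collections import deque

MIN_PART = 5 * 1024 * 1024  # S3 minimum size for every part except the last


def plan_pairs(files):
    """files: list of (name, size). Returns a list of (streamed, copied) pairs.
    `streamed` lists the files a worker sends as one uploaded part (followed
    by the local header of `copied`); `copied` is the big file whose data is
    placed by a server-side copy, or None when nothing is left to copy."""
    small = deque(sorted((f for f in files if f[1] < MIN_PART), key=lambda f: f[1]))
    big = deque(sorted((f for f in files if f[1] >= MIN_PART), key=lambda f: f[1]))
    pairs = []
    while big:
        group, group_size = [], 0
        # Fill the streamed part with small files until it reaches the minimum.
        while small and group_size < MIN_PART:
            name, size = small.popleft()
            group.append(name)
            group_size += size
        if group_size < MIN_PART:
            # Not enough small data: stream the smallest remaining big file too.
            name, size = big.popleft()
            group.append(name)
            group_size += size
        copied = big.pop()[0] if big else None  # largest remaining big file
        pairs.append((group, copied))
    if small:
        # Leftover small files form the final streamed part.
        pairs.append(([name for name, _ in small], None))
    return pairs


MIB = 1024 * 1024
files = [("s1", 3 * MIB), ("s2", 3 * MIB), ("s3", 1 * MIB),
         ("b1", 6 * MIB), ("b2", 8 * MIB), ("b3", 10 * MIB)]
pairs = plan_pairs(files)
print(pairs)
```

In this example the three small files fill one streamed part that precedes the copy of b3, and b1 is streamed so that b2 can be copied.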

For CRC32 (required in zip headers): files uploaded with modern AWS SDKs already have CRC32 stored as object metadata. A simple HeadObject call retrieves it — no need to read the file.

Step Functions: three limitations

My original architecture used Step Functions to orchestrate workers. Here's what I hit.

1. Inline Map caps at ~40 concurrent iterations

The AWS documentation says the Inline Map state supports "up to 40 concurrent iterations." In practice I saw up to 55, but never more. With 1500 duos to process, Step Functions queued them in batches of 55.

I switched to Distributed Map which launches Express child workflow executions. All 1120 iterations started within 2 seconds. Problem solved? Not quite.

2. Distributed Map: fast to dispatch, slow to collect

With Distributed Map, all workers started within 2 seconds. Every single one finished in under 1 second (mostly UploadPartCopy calls). Total Lambda compute: ~500ms average.

Yet the Map state took 38 seconds to complete.

The bottleneck? Step Functions' internal machinery for collecting and aggregating results from 1120 Express child executions. I confirmed: all workers started at 10:06:52-53, all finished by 10:06:54, but the Map state didn't report success until 10:07:28. 35 seconds of pure orchestration overhead.

3. The 256KB payload limit

Step Functions states can pass at most 256KB between them. With 3000 files:

The planner's assignment list exceeds 256KB → had to write to S3
The aggregated worker results exceed 256KB → had to write CRC32s to S3, read them back in the finalizer

This added complexity and latency (the finalizer reading 1500 small S3 files — 29 seconds sequentially, until I parallelized it down to 1.5s).

After all these fixes, the Step Functions version ran in 41 seconds for 3000 × 5MB files. Respectable — 2.5× faster than Jérémie's 106s — but I knew most of that time was Step Functions overhead, not actual work.

The final version: direct Lambda invocation

I stripped out Step Functions entirely and wrote a single orchestrator Lambda that:

Lists files, computes the zip layout (the job of the "planner" Lambda in my Step Functions architecture), and initiates the multipart upload (~0.5s)
Invokes all worker Lambdas synchronously in parallel using goroutines + the Lambda SDK (~0.5s to dispatch)
Collects results (workers return inline, no S3 round-trip for parts)
Reads CRC32 files from S3 in parallel, builds central directory, completes multipart upload (~1s)

Orchestrator Lambda (15min timeout, 1024MB)
    │
    ├─── goroutine → Invoke Worker 1 (sync) → return {parts}
    ├─── goroutine → Invoke Worker 2 (sync) → return {parts}
    ├─── ...
    └─── goroutine → Invoke Worker N (sync) → return {parts}
    │
    └─── All done → Build CD → CompleteMultipartUpload

The Lambda SDK's synchronous Invoke blocks until the worker returns. With 200 concurrent goroutines, all workers are dispatched instantly. No orchestration overhead, no state size limits for the parts (only CRC32s go to S3), no 35-second result aggregation.
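The author's orchestrator is written in Go with goroutines. The same fan-out/fan-in shape can be sketched in Python with a thread pool. Here `invoke_worker` is a hypothetical stand-in for a synchronous Lambda Invoke call, and the payload shape is an assumption for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def invoke_worker(assignment: dict) -> dict:
    """Hypothetical stand-in for a synchronous Lambda Invoke; returns the parts the worker uploaded."""
    return {
        "worker": assignment["id"],
        "parts": [{"part_number": n} for n in assignment["part_numbers"]],
    }


def fan_out(assignments: list, max_concurrency: int = 200) -> list:
    """Dispatch every assignment concurrently and block until all workers have returned."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        results = list(pool.map(invoke_worker, assignments))  # blocks until every call returns
    parts = [part for result in results for part in result["parts"]]
    # CompleteMultipartUpload expects parts in ascending part-number order.
    return sorted(parts, key=lambda part: part["part_number"])
```

Because each call blocks only its own thread, the results come back inline and nothing has to be written to intermediate storage between dispatch and collection.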

The theoretical time is now the orchestration time plus the time to upload the smallest large file left unpaired once every other large file has been paired with a group of small files or with another large file.

Results: 3000 × 5MB benchmark

Approach	Time	Notes
Jérémie Gen1 (Rust, streaming)	212s	Single Lambda, 512MB
Jérémie Gen2 (Rust, UploadPartCopy)	106s	Single Lambda, 640MB
My Step Functions version	41s	Distributed Map, 1120 workers
My orchestrator Lambda	6s	Direct invoke, ~1500 workers

6 seconds to zip 15GB into a single valid ZIP64 archive. That's an 18× improvement over the optimized single-Lambda approach, and 35× over the original.

Worker stats:

Max memory: 85 MB (I initially allocated 3008MB — massively over-provisioned thanks to UploadPartCopy)
Average duration: 516ms per worker
Max duration: 1035ms

What I learned (round 2)

Step Functions' parallel Map adds seconds, not milliseconds. For latency-sensitive fan-out/fan-in, direct Lambda invocation is faster. Step Functions shines when you need retries, error handling, visual debugging, long-running workflows, or fast step transitions, but that speed only holds until you need more than about 40 parallel iterations.
UploadPartCopy is the killer optimization. When most files are ≥5MB, workers barely do any work — they just tell S3 to copy data server-side. Memory stays under 100MB regardless of file sizes.
The orchestrator pattern is underrated. A single Lambda with goroutines can invoke hundreds of child Lambdas synchronously, collect results, and finalize — all within one execution context. No state machine, no payload limits between states, no aggregation overhead.
Over-parallelization can hurt. 1500 separate assignments created more Step Functions overhead than the actual compute. Grouping into fewer, larger batches would have been better for the SFN approach.

Try it

Code: github.com/psantus/on-demand-archive-on-s3

The repo has both approaches: Step Functions (cmd/planner + cmd/worker + cmd/finalizer) and the orchestrator Lambda (cmd/orchestrator).

Jérémie's challenge repo: github.com/RustyServerless/demo-s3-archiving

What's next?

The theoretical minimum is bounded by Lambda cold start time (~200ms) plus the slowest UploadPart call (if we lack small files, we may need to upload a large file manually to append another file's LOC to it) plus orchestrator overhead (~500ms).

Your move, Jérémie 😏

Edit: with 73.2 GB (15,000 files), my solution still performs quite acceptably: just 20s (probably limited by my account's default concurrency of 1,000; it would likely be faster on an unbounded account :D)

Paul out.

Deploy your own OpenVPN Server on AWS with one prompt

Noureldin ehab — Thu, 25 Jun 2026 17:00:00 +0000

Overview

Most of your AWS resources should be in private subnets for security reasons, but that also means they’re not directly accessible from the internet. To reach them securely, you need a VPN.

In this tutorial, we’ll use OpenVPN on AWS to create a secure, encrypted connection to your private resources so your team can access them safely.

Note: Stakpak is open source, vendor neutral, and works with any model you choose.

Problem

AWS resources in private subnets aren’t accessible from the internet by default.
Teams often try to solve this by opening ports or using bastion hosts, which increases security risks.
These workarounds also add complexity to network management and access control.
A VPN is needed to provide secure and simple access without exposing services publicly.

Business Impact

Without a VPN, secure remote access is harder, slower, and riskier. A VPN simplifies access and keeps development and operations running securely.

But what is a VPN?

A VPN (Virtual Private Network) is a secure, encrypted connection that allows you to access a private network over the internet as if you were physically inside it. It’s commonly used to safely reach internal servers, databases, or applications without exposing them to the public.

Step-by-Step Guide

Prerequisites

Install Stakpak
Cloud provider credentials configured
Then just ask it: "i want to install openvpn on aws so i can access my private resources"
Here you choose your preferences

I want to know more about the different architectures, so let's ask about it

Here I chose

Which AWS Region? EU West 1
Do you have a VPC set up? Yes, I have a VPC
How many people need VPN access? Just one person needs access
AWS Client VPN, self-hosted OpenVPN, or OpenVPN from the Marketplace? Self-hosted OpenVPN

I will just tell it to continue with the defaults

Now we can review the commands and press Enter to continue. It will:

Get the VPC details
Get the subnet details
Check the internet gateway

Now it will create a security group for OpenVPN and get the latest Ubuntu version

Now it will create the security group rules and an SSH key, and launch the EC2 instance

Now that the EC2 instance is ready, Stakpak will start setting up OpenVPN

That's it, now we can use OpenVPN


Beyond the System Prompt: Building Modular AI Agents with Strands Skills

Milad Rezaeighale — Thu, 25 Jun 2026 09:08:31 +0000

Anyone who has shipped a multi-capability agent knows the pattern. You start clean. Then the product needs more. You append instructions. Then edge cases. Then domain-specific rules for each capability. Six months later your system prompt is 3,000 tokens of competing guidance that the model has to reconcile on every single call — whether it needs that context or not.

The problem isn't prompt engineering skill. It's architecture. You're treating instruction delivery like a static config file when it should be dynamic.

This is the same problem software engineering solved decades ago with modular design. You don't load every library into memory at startup. You import what you need, when you need it.

Skills bring that principle to agent instruction design.

What Skills Are

Skills are self-contained instruction packages that an agent loads on demand. The agent's context stays lean — only skill names and descriptions are present at startup. When the agent determines it needs a specific capability, it fetches the full instructions at that moment and executes within them.

Three properties make this meaningful at scale:

Isolation — each skill's instructions are scoped. They can't conflict with each other because they're never in context at the same time unless explicitly needed.

Token efficiency — you pay only for what's active. An agent with ten skills doesn't carry ten sets of instructions into every call.

Maintainability — skills are versioned and updated independently. Changing how your agent handles one domain doesn't touch anything else.

This is progressive disclosure applied to LLM context management.

Strands and the AgentSkills Plugin

Strands is AWS's open-source agent SDK for Python and TypeScript. It takes a model-driven approach — instead of hardcoding orchestration logic, the LLM itself decides when to call tools, which order to execute steps, and when it has enough information to respond. This makes agents significantly more flexible without requiring complex orchestration code.

Strands ships with built-in tool support, multi-agent orchestration, and a plugin system for extending agent behavior. One of those plugins is AgentSkills — a production implementation of the progressive disclosure pattern.

Setting up an agent with Strands takes less than ten lines:

from strands import Agent

agent = Agent(system_prompt="You are a helpful assistant.")
response = agent("What is the capital of France?")

Adding skills is one extra step:

from strands import Agent, AgentSkills

plugin = AgentSkills(skills="./skills/")
agent = Agent(plugins=[plugin])

From that point, the agent manages skill discovery and activation automatically — you don't wire any routing logic.

How AgentSkills Works in Detail

The plugin operates in three phases:

1. Discovery
At initialization, AgentSkills scans your skills directory and injects only the skill names and descriptions into the system prompt:

<available_skills>
  <skill>
    <name>email-drafter</name>
    <description>Drafts professional emails from a plain-English brief.</description>
  </skill>
  <skill>
    <name>bug-investigator</name>
    <description>Analyzes errors and returns a structured diagnosis.</description>
  </skill>
  <skill>
    <name>git-commit-writer</name>
    <description>Writes conventional commit messages from a change description.</description>
  </skill>
</available_skills>

That's all the agent sees upfront — names and descriptions. No instructions, no domain logic, no token cost beyond the metadata.

2. Activation
When the agent receives a message it determines requires a specific skill, it calls the built-in skills tool with the skill name as the argument. This is a standard tool call — the same mechanism the agent uses for any other tool. No special routing, no conditional logic on your side.

3. Execution
The tool returns the full contents of the SKILL.md — instructions, rules, output format, everything. The agent now operates within those instructions for that response. Activated skills persist in agent state for the remainder of the session, so they don't need to be re-fetched on follow-up messages in the same domain.
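To make the three phases concrete, here is a toy model of progressive disclosure. It is not the Strands implementation: `SkillRegistry` and its methods are hypothetical names, and the real plugin reads SKILL.md files and exposes activation as a tool call.

```python
from xml.sax.saxutils import escape


class SkillRegistry:
    """Toy model of progressive disclosure; not the Strands implementation."""

    def __init__(self, skills: dict):
        # skills maps name -> (description, full instructions)
        self._skills = skills
        self.active = {}  # instructions loaded so far in this session

    def discovery_prompt(self) -> str:
        """Phase 1: expose only names and descriptions."""
        items = "".join(
            f"  <skill>\n    <name>{escape(name)}</name>\n"
            f"    <description>{escape(desc)}</description>\n  </skill>\n"
            for name, (desc, _instructions) in self._skills.items()
        )
        return f"<available_skills>\n{items}</available_skills>"

    def activate(self, name: str) -> str:
        """Phases 2 and 3: return the full instructions and keep them for the session."""
        if name not in self.active:
            self.active[name] = self._skills[name][1]
        return self.active[name]
```

The point of the split is visible in the sizes: the discovery prompt grows with the number of skills, while the instruction text only enters context after `activate` is called.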

Let's See It in Action

To make skill activation visible, I built a simple Streamlit UI — three skills loaded into one agent, each triggered by a different type of message.

I sent this prompt:

I'm getting this error in my React app, can you help me debug it? TypeError: Cannot read properties of undefined (reading 'map') at App.js:42

The agent identified it as a bug report, activated the bug-investigator skill on demand, and returned a structured diagnosis — no routing logic, no conditionals, no hardcoded rules.

Same agent, one prompt, the right skill loaded automatically.

Defining a Skill

A skill is a directory with a single SKILL.md file. The file has two parts: a YAML frontmatter header that the plugin reads, and a markdown body that becomes the agent's instructions.

skills/
├── bug-investigator/
│   └── SKILL.md
├── email-drafter/
│   └── SKILL.md
└── git-commit-writer/
    └── SKILL.md

---
name: bug-investigator
description: "Analyzes an error message or stack trace and returns a structured diagnosis with root cause and fix."
---

# Bug Investigator Skill

You are a senior software debugger. When given an error message or stack trace, respond in this exact format:

🔍 Root Cause:
<one clear sentence explaining why this error occurs>

🛠 Fix:
<step-by-step instructions to resolve it>

✅ Example:
<a minimal corrected code snippet>

Rules:
- Be precise — if the error is ambiguous, ask one clarifying question.
- Always explain the why, not just the what.
- Keep the example under 10 lines.

The name field must be lowercase alphanumeric with hyphens, 1–64 characters. The description is what the agent reads to decide whether to activate the skill — write it as a clear, specific one-liner. Vague descriptions lead to wrong activations.
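A quick way to catch invalid names before the plugin loads them is a small check like the one below. It encodes only the rule stated above (lowercase letters, digits, and hyphens, 1 to 64 characters); the helper name is hypothetical and the plugin itself may enforce stricter rules.

```python
import re

# Lowercase letters, digits, and hyphens only; between 1 and 64 characters.
_SKILL_NAME = re.compile(r"[a-z0-9-]{1,64}")


def is_valid_skill_name(name: str) -> bool:
    """Return True when the name satisfies the naming rule described above."""
    return _SKILL_NAME.fullmatch(name) is not None


for candidate in ("bug-investigator", "Bug_Investigator", ""):
    print(repr(candidate), is_valid_skill_name(candidate))
```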

An optional allowed-tools field restricts which tools the skill can use:

---
name: pdf-processor
description: Extracts text and tables from PDF files using shell scripts.
allowed-tools: file_read shell
---

Two Ways to Define Skills

Filesystem-based is the standard approach — each skill in its own directory, versioned alongside your code, easy to review and update independently.

Programmatic is useful when instructions need to be generated at runtime — pulled from a database, built from environment config, or constructed dynamically per tenant:

from strands import Skill, AgentSkills, Agent

skill = Skill(
    name="summarizer",
    description="Condenses any text into a bullet-point summary preserving all key facts.",
    instructions=(
        "Extract the 3-5 most important points as bullet points. "
        "Add a one-sentence TL;DR at the top. "
        "Do not add information not present in the source text."
    )
)

plugin = AgentSkills(skills=[skill])
agent = Agent(plugins=[plugin])

Both approaches compose cleanly:

plugin = AgentSkills(skills=["./skills/", dynamic_skill])

This is the practical setup for most production agents — static skills for stable capabilities, programmatic skills for anything that varies by environment or user context.

When to Reach for Skills

Skills aren't the right tool for every agent. If your agent has one job, a well-crafted system prompt is simpler and sufficient.

Skills pay off when:

Your agent handles genuinely different domains where instruction sets would conflict
You're optimizing for token cost at scale across high-volume calls
You need independent versioning of capabilities across a team
You're building toward a multi-skill agent that will grow over time

They're a step below full multi-agent orchestration — more structure than a monolithic prompt, less overhead than spawning separate agents per capability.

Try It
Full project with Streamlit UI on GitHub:

👉 https://github.com/miladrezaei-ai/strands-agent-skills

git clone https://github.com/miladrezaei-ai/strands-agent-skills
cd strands-agent-skills
uv sync
aws configure   # or AWS SSO
uv run streamlit run app.py

Where does your current agent prompt need this kind of separation? Would love to hear what you're building.

AWS Lambda MicroVMs: I Tested the New Stateful Serverless Primitive

Alexey Vidanov — Thu, 25 Jun 2026 03:49:35 +0000

What just happened

On June 22, 2026, AWS quietly launched AWS Lambda MicroVMs. Not a Lambda feature update. A new compute primitive sitting between AWS Lambda Functions (stateless, 15-min max) and EC2 (full VM, you manage everything).

Each MicroVM is an isolated Firecracker VM with its own HTTPS endpoint, running your code from a pre-built snapshot. Stateful. Up to 8 hours. Suspend when idle, resume on demand.

I tested it the same week. Here's what I found.

The test setup

A minimal Python HTTP server, packaged with a Dockerfile:

from http.server import HTTPServer, BaseHTTPRequestHandler
import json, time, os

class Handler(BaseHTTPRequestHandler):
    start_time = time.time()
    request_count = 0

    def do_GET(self):
        Handler.request_count += 1
        body = json.dumps({
            "message": "Hello from Lambda MicroVM!",
            "uptime_seconds": round(time.time() - Handler.start_time, 2),
            "requests_served": Handler.request_count,
            "pid": os.getpid()
        })
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()

The Dockerfile:

FROM public.ecr.aws/lambda/microvms:al2023-minimal
RUN dnf install -y python3 && dnf clean all
WORKDIR /app
COPY app.py .
EXPOSE 8080
CMD ["python3", "app.py"]

How it works

Three steps:

Zip code + Dockerfile → upload to Amazon Simple Storage Service (Amazon S3)
create-microvm-image builds the container, starts the app, takes a Firecracker snapshot of memory and disk
run-microvm launches from that snapshot

Every launch resumes from the pre-initialized state. No cold boot. Your app is already running the moment the MicroVM starts.

aws lambda-microvms create-microvm-image \
  --name hello-microvm-test \
  --code-artifact "uri=s3://my-bucket/artifact.zip" \
  --base-image-arn arn:aws:lambda:us-east-1:aws:microvm-image:al2023-1 \
  --build-role-arn arn:aws:iam::123456789:role/MicroVMBuildRole

Image build took about 3 minutes. Once done:

aws lambda-microvms run-microvm \
  --image-identifier arn:aws:lambda:us-east-1:123456789:microvm-image:hello-microvm-test \
  --execution-role-arn arn:aws:iam::123456789:role/MicroVMExecutionRole \
  --idle-policy '{"maxIdleDurationSeconds":300,"suspendedDurationSeconds":60,"autoResumeEnabled":true}'

Response:

{
  "microvmId": "microvm-489fbc1b-1c73-3b37-a9f2-266d0173cb94",
  "state": "RUNNING",
  "endpoint": "34cf7dac-bb5c.lambda-microvm.us-east-1.on.aws"
}

The numbers

Metric	Measured
Image build	~3 minutes
Launch API call	1.17s
Time to RUNNING	~12s
First request (from snapshot)	911ms
Warm request latency	~340ms
Suspend → Resume	1.86s

The 340ms warm latency includes my network round-trip from Hamburg to us-east-1. The actual compute latency is lower.

Statefulness proof

This is the part that matters. After three requests:

{"requests_served": 3, "uptime_seconds": 434.76, "pid": 1}

Suspend the MicroVM. Resume it. Send another request:

{"requests_served": 5, "uptime_seconds": 454.1, "pid": 1}

Same PID. Counter continued from where it left off. Uptime kept ticking (includes suspended time). Full memory and disk state preserved across suspend/resume.

Authentication

Each request needs a JWE token generated via the API:

aws lambda-microvms create-microvm-auth-token \
  --microvm-id microvm-489fbc1b \
  --expiration-in-minutes 15 \
  --allowed-ports '[{"port":8080}]'

The token goes in the X-aws-proxy-auth header. Short-lived, scoped to specific ports. No way to hit someone else's MicroVM.

What this replaces

Before Lambda MicroVMs, running untrusted code (AI-generated, user-submitted) meant:

Containers with custom hardening — shared kernel, escape risk, significant engineering to harden
EC2 per user — minutes to start, expensive, you manage everything
Lambda Functions — 15-min max, stateless, no interactive sessions

Lambda MicroVMs fills the gap: VM-level isolation with a serverless operational model. No capacity planning. No kernel to patch. Suspend when idle, pay only for snapshot storage.

Specs and limits

Compute: 0.5–8 GB RAM baseline, burst to 32 GB. 0.25–4 vCPU baseline, burst to 16.
Disk: up to 32 GB
Runtime: max 8 hours
Architecture: ARM64 only (for now)
Protocols: HTTP/1.1, HTTP/2, gRPC, WebSocket, SSE
Regions: us-east-1, us-east-2, us-west-2, eu-west-1, ap-northeast-1

Pricing model

Three dimensions:

Compute: per-second, based on your chosen baseline + peak usage above it
Snapshot operations: read/write when launching or suspending
Snapshot storage + data transfer

Suspended MicroVMs cost only storage. No compute charges while idle.

Who should care

If you're building any of these, Lambda MicroVMs changes your architecture:

AI agent sandboxes (execute generated code safely)
Browser-based IDEs (each user gets their own env)
CI/CD runners (isolated per job, no shared state)
Jupyter/analytics (state persists across sessions)
Vulnerability scanning (disposable, isolated)

What I'd watch

ARM64 only is a constraint for workloads compiled for x86
5 regions at launch means some customers wait
The snapshot-based model means your app's initialization needs to be snapshot-friendly (no stale connections, no clock-sensitive state at init)
~~Pricing details not fully public yet at time of writing~~

Getting started

You need AWS CLI v2.35.10+. The lambda-microvms service is a separate command namespace:

aws lambda-microvms list-managed-microvm-images --region us-east-1
aws lambda-microvms create-microvm-image --help
aws lambda-microvms run-microvm --help

The base image (al2023-1) is Amazon Linux 2023 minimal. Your Dockerfile adds what you need on top.

Pricing

Lambda MicroVMs bills per second across three dimensions. You configure a baseline and pay for burst capacity only when used.

Compute (eu-west-1):

vCPU: $0.0000291572 per second
Memory: $0.0000038603 per second per GB

You pay baseline while running. Burst above baseline is charged only for the seconds consumed at peak, not for the full duration.

Snapshot operations and storage are charged separately (pricing not fully detailed at launch).

Real-world example: Playwright browser automation

Baseline: 1 vCPU / 2 GB RAM. Chromium bursts to 2 vCPU + 4 GB for 3 seconds during page render.

Simple scrape (stays at baseline) — 5s duration → $0.000185 per invocation → $1.85 at 10K/month

Heavy page (burst 3s of 8s) — 8s duration → $0.000405 per invocation → $4.05 at 10K/month

Full PDF render (burst 5s of 12s) — 12s duration → $0.000996 per invocation → $9.96 at 10K/month

At the rates above, a Playwright job that needs 4 GB for 3 seconds of an 8-second run costs roughly 30% less than a fixed 2 vCPU / 4 GB allocation billed for the full duration. Configure for your typical workload, let Lambda handle the spikes.
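The compute part of these estimates can be reproduced with a few lines of arithmetic. This sketch assumes that burst is billed as the capacity above baseline for the burst seconds only, and it ignores snapshot operations and storage; the function and parameter names are illustrative.

```python
VCPU_PER_SECOND = 0.0000291572  # USD per vCPU-second (eu-west-1 rate quoted above)
GB_PER_SECOND = 0.0000038603    # USD per GB-second (eu-west-1 rate quoted above)


def invocation_cost(duration_s, base_vcpu, base_gb, burst_s=0.0, extra_vcpu=0.0, extra_gb=0.0):
    """Baseline for the whole run plus the extra capacity for the burst seconds only."""
    baseline = duration_s * (base_vcpu * VCPU_PER_SECOND + base_gb * GB_PER_SECOND)
    burst = burst_s * (extra_vcpu * VCPU_PER_SECOND + extra_gb * GB_PER_SECOND)
    return baseline + burst


# Baseline 1 vCPU / 2 GB; the heavy page bursts by 1 extra vCPU and 2 extra GB for 3 s.
simple = invocation_cost(5, base_vcpu=1, base_gb=2)
heavy = invocation_cost(8, base_vcpu=1, base_gb=2, burst_s=3, extra_vcpu=1, extra_gb=2)
print(f"simple scrape: ${simple:.6f}, heavy page: ${heavy:.6f}")
```

Under these assumptions the first two scenarios come out at about $0.00018 and $0.00041 per invocation, in line with the figures above.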

Suspended MicroVMs incur only snapshot storage costs. No compute charges while idle.

Tested June 24, 2026. Lambda MicroVMs launched June 22 in preview.

Sources

Blog: https://aws.amazon.com/blogs/aws/run-isolated-sandboxes-with-full-lifecycle-control-aws-lambda-introduces-microvms/
Product page: https://aws.amazon.com/lambda/lambda-microvms/
CLI: aws-cli v2.35.10+ (aws lambda-microvms)

I Built a Full-Stack Fantasy Football App Using Kiro and Vercel v0

Ogbeide Godstime Osemenkhian — Wed, 24 Jun 2026 18:47:44 +0000

The Idea

Fantasy Premier League has 11 million players. The official app gives you a wall of numbers and leaves you to interpret them. I wanted something that felt easier to use: squad management, player transfers, a credit economy, and leaderboards, with a clean mobile-first UI on top of AWS infrastructure.

The H0: Hack the zero Stack with Vercel v0 and AWS Database Hackathon gave me a deadline to actually ship it.

The Two-Tool Workflow

I split the work between two AI tools and it worked better than I expected:

v0 by Vercel handled the frontend. I described pages and components in natural language and v0 generated production-ready Next.js 16 code with Tailwind CSS and shadcn/ui. The squad builder pitch visualization, dashboard layout, transfer history cards, and responsive mobile nav all came from v0. It nailed the App Router file conventions and Tailwind utility classes without hand-holding.

Kiro handled the backend and orchestration. I used Kiro's spec-driven workflow: Requirements → Design → Tasks. Once the spec was locked, Kiro executed all 32 implementation tasks autonomously: API routes, auth middleware, DynamoDB operations, credit logic, and transfer validation.

Architecture: Deliberately Simple

Browser → Vercel (Next.js API Routes) → DynamoDB (via AWS SDK)

No API Gateway, no Lambda in the request path. Next.js API routes call DynamoDB directly using @aws-sdk/lib-dynamodb with IAM credentials injected as Vercel environment variables.

Auth is AWS Cognito. The JWT lives in an httpOnly secure cookie called squadiq-token. Middleware validates presence; API routes decode the sub claim for the userId.
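Decoding the sub claim comes down to base64url-decoding the middle segment of the token. The app itself is Next.js, so the Python below is only an illustration of the mechanics, with a hypothetical helper name. It deliberately skips signature verification, which a real route would need to perform against the Cognito user pool's public keys before trusting the claim.

```python
import base64
import json


def jwt_sub(token: str) -> str:
    """Read the `sub` claim from a JWT payload WITHOUT verifying the signature."""
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("not a three-part JWT") from None
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64url padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims["sub"]
```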

Lambda only exists for background work:

Player Sync - daily cron that pulls data from the FPL API into DynamoDB
Scoring Engine - EventBridge trigger after match completion

The DynamoDB Single-Table Design

Everything lives in one table (SquadIQ-dev) with composite keys:

PK	SK	Entity
`USER#<id>`	`PROFILE`	User profile
`USER#<id>`	`SQUAD#<competitionId>`	Squad
`USER#<id>`	`TRANSFER#<ts>`	Transfer record
`USER#<id>`	`CREDIT#<ts>`	Credit transaction
`LEAGUE#<id>`	`META`	League metadata
`PLAYER#<id>`	`META`	Player data

This means every query is a single GetItem or Query on the partition key. No scans, no joins.
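As an illustration of how the composite keys translate into requests, the helpers below build the key for a single squad and the parameters for listing a user's transfers. The function names are hypothetical; the PK and SK attribute names and the prefixes come from the table above.

```python
def user_pk(user_id: str) -> str:
    """Partition key shared by every item that belongs to one user."""
    return f"USER#{user_id}"


def squad_key(user_id: str, competition_id: str) -> dict:
    """Full primary key for a GetItem on one squad."""
    return {"PK": user_pk(user_id), "SK": f"SQUAD#{competition_id}"}


def transfers_query(user_id: str) -> dict:
    """Query parameters that return only this user's transfer records."""
    return {
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {":pk": user_pk(user_id), ":prefix": "TRANSFER#"},
    }
```

Because the sort key carries the entity type as a prefix, one partition can hold a user's profile, squads, transfers, and credits while each access pattern still reads only the items it needs.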

What Kiro's Spec Workflow Actually Looks Like

Write requirements (user stories + acceptance criteria)
Generate a technical design (data model, API contracts, auth flow)
Generate implementation tasks from the design
Run all tasks; Kiro writes the code, runs type checks, iterates on errors

The spec files live in .kiro/specs/squadiq-live-features/ and serve as living documentation. When I needed to add a new endpoint, I updated the spec and Kiro knew exactly what to implement.

What v0 Did Well

v0 is remarkable at generating complete, styled, accessible React components from descriptions. Things that normally take an hour of fiddling with CSS grid took one prompt:

"Dashboard with a small pitch on the left and stat cards stacked vertically on the right"
"Squad builder with a football pitch background showing players in formation positions"
"Transfer history page with a hero banner and card list"

It understood Tailwind responsive modifiers (md:, lg:), shadcn/ui component patterns, and Next.js Image optimization out of the box.

Lessons Learned

Spec-first saves time. 30 minutes on requirements prevented hours of rework. When you define the API response shape before writing code, frontend and backend stay aligned.

You don't need Lambda for everything. Next.js API routes with IAM credentials are fast, cheap, and way simpler to debug than a Lambda + API Gateway stack.

DynamoDB single-table design is worth the upfront cost. Once your access patterns are clear, queries are trivial and fast. The hard part is resisting the urge to add a second table "just in case."

AI tools work best with clear boundaries. v0 for UI, Kiro for backend + orchestration. Letting each tool do what it's best at produced better results than asking one tool to do everything.

The Stack

Next.js 16, React 19, TypeScript
Tailwind CSS + shadcn/ui
AWS Cognito, DynamoDB, IAM, Lambda (background)
Vercel (hosting + serverless)
Kiro (spec-driven development)
v0 (frontend generation)
GitHub Actions (CI/CD)

Migrate from NGINX to Caddy on AWS

Noureldin ehab — Wed, 24 Jun 2026 17:00:00 +0000

Why Migrate to Caddy?

Caddy is open source, and it provides automatic HTTPS and certificate renewal out of the box, removing the need for Certbot or cron jobs. It offers secure defaults and simpler configuration, which makes it a lightweight, low-maintenance replacement for nginx.

It acts as a reverse proxy, load balancer, and static file server out of the box, with secure defaults and minimal setup.

Note: Stakpak is open source, vendor neutral, and works with any model you choose.

Step by Step Guide

Architecture

Our current setup uses a single tier architecture on AWS to host a static HTML website. It runs on a t3.micro EC2 instance using nginx 1.28.0, serving files from /usr/share/nginx/html/. The instance is part of the default VPC and resides in a public subnet, allowing direct internet access.

Traffic is managed by a security group with inbound rules open to:

SSH (port 22)
HTTP (port 80)
HTTPS (port 443)

DNS is handled through Amazon Route 53, where an A record points the domain migratingtocaddy.guku.io to the instance’s public IP. TLS certificates are issued by Let’s Encrypt and configured via Certbot with the nginx plugin, enabling automatic HTTPS redirection.

The problem with this architecture:

Depends on manual Certbot setup (The renewal cron job can easily be forgotten)
nginx configuration is unnecessarily complex
No built in automation for TLS or reloads
Higher maintenance for updates and security hardening

Let's see how we can fix these problems with caddy

Prerequisites

Install Stakpak
Open your terminal and type "stakpak"
You should configure your cloud credentials before opening stakpak, since Stakpak will use your existing machine setup to work

Guide

Then ask Stakpak to "Migrate from NGINX to Caddy with 0 downtime on AWS"
First, Stakpak will check our current setup on AWS

Next, Stakpak recommends three zero-downtime strategies for the migration

Since we don't want downtime caused by the DNS and TLS changes, let's choose the second option

Now that we have the ALB and target groups, Stakpak will install Caddy
After installing Caddy, Stakpak will copy the website content
Now wait for the health checks to make sure Caddy is working fine

Now Stakpak updates the DNS to point to the ALB
That's it, we are ready to redirect the traffic to Caddy, and since we are using an ALB we will be able to roll back if needed

Now it's working🥳

ps: don't forget to check our new Slack Integration👀


How I Built a Full High Availability AWS Infrastructure with Terraform Modules

Emmanuel Ulu — Wed, 24 Jun 2026 13:49:58 +0000

Introduction

Most AWS tutorials teach you how to launch a single EC2 instance in a public subnet and call it a day. That's fine for learning the basics, but it's nowhere near what production infrastructure looks like.

In this article I'll walk you through how I designed and deployed a full multi-tier, multi-AZ High Availability infrastructure on AWS, written entirely in Terraform and structured as reusable modules. By the end you'll understand not just what I built, but why each decision was made.

This is part of my AWS SAA-C03 certification preparation as an AWS Community Builder 2026 (Serverless track).

What Does "High Availability" Actually Mean?

High Availability means your system keeps running even when something fails. In AWS, the primary failure unit is an Availability Zone: one or more physically separate data centres within a region.

True HA means no single AZ failure can bring your application down. That requires every tier (network, compute, and database) to span multiple AZs.

Here's what most people get wrong: they put their EC2 instances in two AZs but share a single NAT Gateway in one AZ. When that AZ goes down, all outbound traffic from private subnets dies, even for the instances in the healthy AZ. True HA requires one NAT Gateway per AZ.

Architecture Overview

The infrastructure spans 3 Availability Zones in eu-west-1 (Ireland) across 4 tiers:

Internet
    |
Route 53 (app.skylumanex.click)
    |
Application Load Balancer (3 public subnets)
    |
Auto Scaling Group (3 private app subnets)
    |
RDS MySQL Multi-AZ (3 private DB subnets)

Every tier lives in its own subnet type, in its own security group, with tightly scoped ingress rules.

Terraform Module Structure

modules/
├── network/      # VPC, subnets, IGW, NAT, route tables
├── compute/      # Security groups, ALB, ASG, CloudWatch
├── database/     # RDS Multi-AZ, DB subnet group
└── dns/          # Route 53 alias + health check
environments/
└── dev/          # Root module wiring everything together

Each module has exactly 3 files: main.tf, variables.tf, and outputs.tf. Modules communicate through outputs and inputs — the networking module outputs VPC ID and subnet IDs, the compute module takes those as inputs, the database module takes the app security group ID from compute to scope its DB ingress rules.

Module 1 — Networking

The networking module creates the entire network foundation:

1 VPC (10.0.0.0/16)
3 public subnets (one per AZ) for the ALB and NAT Gateways
3 private app subnets (one per AZ) for EC2 instances
3 private DB subnets (one per AZ) for RDS
1 Internet Gateway
NAT Gateway(s), configurable via the single_nat_gateway toggle
Route tables: 1 public, 1 private per AZ
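
The article does not show how the nine subnets are carved out of 10.0.0.0/16. As a rough illustration only, the Python sketch below lays out one /24 per AZ per tier. The /24 size and the tier offsets (1, 11, 21) are assumptions, picked so that the app tier lines up with the 10.0.11.x and 10.0.12.x addresses that show up in the curl output later on; the real module may number things differently.

```python
import ipaddress

# Hypothetical layout: /24 blocks; the tier offsets are assumptions, not the module's values.
TIER_OFFSETS = {"public": 1, "app": 11, "db": 21}


def plan_subnets(vpc_cidr, az_names, tier_offsets=TIER_OFFSETS):
    """Return {tier: {az: cidr}} with one /24 per AZ per tier."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=24))  # 10.0.0.0/24, 10.0.1.0/24, ...
    return {
        tier: {az: str(blocks[offset + i]) for i, az in enumerate(az_names)}
        for tier, offset in tier_offsets.items()
    }


plan = plan_subnets("10.0.0.0/16", ["eu-west-1a", "eu-west-1b", "eu-west-1c"])
for tier, subnets in plan.items():
    print(tier, subnets)
```

Keeping each tier in its own numeric range makes it easy to tell from an IP address alone which tier and AZ a resource lives in.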

The NAT Gateway decision:

variable "single_nat_gateway" {
  type    = bool
  default = true  # cost-optimized for dev
}

locals {
  nat_count = var.single_nat_gateway ? 1 : length(local.azs)
}

Set single_nat_gateway = false in production and you get one NAT Gateway per AZ: true HA outbound routing at ~$97/month. Keep it true for dev at ~$33/month.
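
To make the trade-off concrete, here is a small Python sketch of the same conditional plus the fixed monthly charge. The hourly rate (0.045 USD) and the 720-hour month are assumptions chosen because they land close to the figures quoted above; the actual rate depends on the region, and per-GB data processing charges come on top.

```python
def nat_count(single_nat_gateway, azs):
    """Mirror of the Terraform conditional: one shared gateway, or one per AZ."""
    return 1 if single_nat_gateway else len(azs)


def monthly_nat_cost(count, hourly_rate_usd=0.045, hours=720):
    """Fixed hourly charge only (assumed rate); data processing is billed separately."""
    return round(count * hourly_rate_usd * hours, 2)


azs = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
dev_cost = monthly_nat_cost(nat_count(True, azs))
prod_cost = monthly_nat_cost(nat_count(False, azs))
print(f"dev: ${dev_cost}/month, prod: ${prod_cost}/month")
```

The cost scales linearly with the number of AZs, so the toggle is really a choice between roughly one and three units of the same fixed charge.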

Dynamic AZ lookup, no hardcoded AZ names:

data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  azs = slice(data.aws_availability_zones.available.names, 0, 3)
}

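This makes the module region-agnostic. Deploy to us-east-1 and it automatically picks the first three available AZs there. The slice assumes the region offers at least three AZs; in a region with fewer, the plan fails.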
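
The selection logic is easy to reason about outside Terraform. The sketch below is a plain-Python analogue of the slice call, with an explicit error for regions that have too few AZs; the function name and error message are illustrative, not part of the module.

```python
def pick_azs(available_names, count=3):
    """Same idea as slice(names, 0, 3), with a clear error for small regions."""
    if len(available_names) < count:
        raise ValueError(
            f"need {count} AZs, but only {len(available_names)} are available"
        )
    return available_names[:count]


chosen = pick_azs(["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d"])
print(chosen)
```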

Module 2 — Compute

The compute module creates the application layer.

Two security groups with layered access:

# ALB SG — internet can reach the ALB on port 80
resource "aws_security_group" "alb" {
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# App SG — only the ALB can reach the EC2 instances
resource "aws_security_group" "app" {
  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }
}

EC2 instances are in private subnets and only accept traffic that came through the ALB. They are never directly reachable from the internet.

Launch template with security best practices:

metadata_options {
  http_endpoint               = "enabled"
  http_tokens                 = "required"  # enforces IMDSv2
  http_put_response_hop_limit = 1
}

monitoring {
  enabled = true  # detailed CloudWatch metrics
}

Auto Scaling with CloudWatch alarms:

resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  comparison_operator = "GreaterThanThreshold"
  threshold           = var.cpu_high_threshold  # default 60%
  alarm_actions       = [aws_autoscaling_policy.scale_out.arn]
}

resource "aws_cloudwatch_metric_alarm" "cpu_low" {
  comparison_operator = "LessThanThreshold"
  threshold           = var.cpu_low_threshold   # default 20%
  alarm_actions       = [aws_autoscaling_policy.scale_in.arn]
}
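
The two alarms together form a simple band: scale out above the high threshold, scale in below the low one, and do nothing in between. A minimal Python sketch of that decision, using the default thresholds mentioned in the comments above (the function itself is illustrative; CloudWatch also applies evaluation periods that are not modelled here):

```python
def scaling_decision(avg_cpu_percent, high=60.0, low=20.0):
    """Return the action the two alarms would trigger for a given average CPU."""
    if avg_cpu_percent > high:   # GreaterThanThreshold
        return "scale_out"
    if avg_cpu_percent < low:    # LessThanThreshold
        return "scale_in"
    return "hold"


for cpu in (75.0, 40.0, 10.0):
    print(cpu, scaling_decision(cpu))
```

The gap between 20% and 60% is what keeps the group from flapping: a load that hovers around one threshold cannot trigger the opposite action.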

Module 3 — Database

RDS MySQL with Multi-AZ, the core HA database setting:

resource "aws_db_instance" "this" {
  engine                  = var.db_engine
  instance_class          = var.db_instance_class
  multi_az                = true   # synchronous standby + auto failover
  storage_encrypted       = true   # encrypted at rest
  storage_type            = "gp3"  # faster and cheaper than gp2
  deletion_protection     = true   # safety guardrail
  backup_retention_period = 7      # 7 days of automated backups
}

The DB security group only accepts traffic from the app security group — not from any IP address, not from the internet:

ingress {
  from_port       = 3306
  to_port         = 3306
  protocol        = "tcp"
  security_groups = [var.app_security_group_id]
}
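
Taken together, the three security groups form a chain in which each tier only accepts traffic from the tier directly in front of it. The Python sketch below models that chain as a lookup table. It is a deliberate simplification (sources are plain labels, and egress rules are ignored), meant only to show which hops are allowed.

```python
# (destination tier, port) -> set of allowed sources, as described in the rules above.
INGRESS = {
    ("alb", 80): {"0.0.0.0/0"},   # internet -> ALB
    ("app", 80): {"sg:alb"},      # ALB -> EC2 instances
    ("db", 3306): {"sg:app"},     # EC2 instances -> RDS
}


def allowed(source, dest, port):
    """True if `source` may open a connection to `dest` on `port`."""
    return source in INGRESS.get((dest, port), set())


print(allowed("0.0.0.0/0", "alb", 80))   # internet reaches the ALB
print(allowed("0.0.0.0/0", "app", 80))   # internet cannot reach instances
print(allowed("sg:alb", "db", 3306))     # the ALB cannot skip a tier
```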

Module 4 — DNS

Route 53 alias record pointing app.skylumanex.click to the ALB, with a health check:

resource "aws_route53_record" "app" {
  zone_id = var.hosted_zone_id
  name    = var.domain_name
  type    = "A"

  alias {
    name                   = var.alb_dns_name
    zone_id                = var.alb_zone_id
    evaluate_target_health = true
  }
}

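evaluate_target_health = true tells Route 53 to take the health of the ALB and its targets into account when answering queries for this record. With a single alias record there is nowhere else to send traffic, so the setting pays off once a second endpoint exists (for example a failover record). It is still a sensible default and another layer of resilience to build on.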

Root Module — Wiring Everything Together

The environments/dev/main.tf calls all 4 modules and passes outputs between them:

module "networking" {
  source             = "../../modules/network"
  project_name       = var.project_name
  vpc_cidr_block     = var.vpc_cidr_block
  single_nat_gateway = var.single_nat_gateway
  # ...
}

module "compute" {
  source                 = "../../modules/compute"
  vpc_id                 = module.networking.vpc_id
  public_subnet_ids      = module.networking.public_subnet_ids
  private_app_subnet_ids = module.networking.private_app_subnet_ids
  # ...
}

module "database" {
  source                = "../../modules/database"
  vpc_id                = module.networking.vpc_id
  private_db_subnet_ids = module.networking.private_db_subnet_ids
  app_security_group_id = module.compute.app_security_group_id
  # ...
}

module "dns" {
  source         = "../../modules/dns"
  alb_dns_name   = module.compute.alb_dns_name
  alb_zone_id    = module.compute.alb_zone_id
  hosted_zone_id = var.hosted_zone_id
}

Notice how module.networking.vpc_id flows into both compute and database. module.compute.app_security_group_id flows into database. Each module is independent but they communicate cleanly through their interfaces.

Deploying It

cd environments/dev
export TF_VAR_db_password="YourStrongPassword"
terraform init
terraform plan
terraform apply

Terraform provisions all 40 resources in the correct dependency order automatically.
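
The ordering follows from the references between modules. As a module-level approximation (Terraform actually builds its graph per resource, not per module), the dependencies from the root module above can be fed to a topological sort in Python:

```python
from graphlib import TopologicalSorter

# module -> modules whose outputs it consumes, read off the root module wiring
DEPENDS_ON = {
    "networking": set(),
    "compute": {"networking"},
    "database": {"networking", "compute"},
    "dns": {"compute"},
}

order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Any valid order starts with networking and places compute before both database and dns; the relative order of database and dns is free because neither depends on the other.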

The Proof

$ curl http://app.skylumanex.click
Hello from ip-10-0-11-230.eu-west-1.compute.internal

$ curl http://app.skylumanex.click
Hello from ip-10-0-12-181.eu-west-1.compute.internal

Traffic is distributed across private subnets in eu-west-1a and eu-west-1b through the ALB, with the name resolved via Route 53. The instances are never directly reachable from the internet.
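
If you want to check this kind of output programmatically, the private IP can be read straight out of the internal hostname. The sketch below parses the two responses shown above and groups them by subnet. The /24 subnet size is an assumption; the article does not state the mask.

```python
import ipaddress
import re


def private_ip(body):
    """Extract the private IP from an EC2-style internal hostname in a response body."""
    match = re.search(r"ip-(\d+)-(\d+)-(\d+)-(\d+)\.", body)
    if match is None:
        raise ValueError("no internal hostname found")
    return ipaddress.ip_address(".".join(match.groups()))


responses = [
    "Hello from ip-10-0-11-230.eu-west-1.compute.internal",
    "Hello from ip-10-0-12-181.eu-west-1.compute.internal",
]
ips = [private_ip(r) for r in responses]

# Assumption: /24 subnets, so the first three octets identify the subnet.
subnets = {ipaddress.ip_network(f"{ip}/24", strict=False) for ip in ips}
print(sorted(str(s) for s in subnets))
```

Two distinct subnets in the result means the ALB really is spreading requests across instances in different private subnets.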

Key Lessons

Modules enforce separation of concerns: the networking module doesn't know about EC2, and the compute module doesn't know about RDS. Each module has one job.
Outputs are the module API: what a module exposes in outputs.tf is its contract with the outside world. Design them carefully.
The NAT Gateway is the hidden single point of failure, and most HA tutorials miss this. With one shared NAT Gateway, a failure of its AZ kills all outbound private traffic.
deletion_protection = true on RDS is a guardrail, not an obstacle: it saved me from accidentally destroying a database during testing. Disable it explicitly before destroy, never by default.
Never put db_password in terraform.tfvars; use the TF_VAR_db_password environment variable so the secret stays out of files you might commit. Terraform still records the value in its state, so keep the state backend encrypted and access-restricted.

What's Next

Add a bastion host or SSM Session Manager for secure instance access
Enable VPC Flow Logs for network traffic visibility
Add WAF in front of the ALB
Build a staging/ environment by copying environments/dev/; the modules don't change

Resources

Terraform AWS Provider docs
AWS Well-Architected Framework — Reliability Pillar
GitHub: aws-full-ha-infra

Part of my AWS SAA-C03 prep as an AWS Community Builder 2026 (Serverless track). Follow along as I build toward certification.

AWS Lambda MicroVMs: run untrusted code with VM-level isolation (no infra to manage)

will peixoto — Wed, 24 Jun 2026 12:44:14 +0000

AWS just shipped Lambda MicroVMs, a new serverless primitive that gives each user or session a VM-level isolated sandbox, with near-instant launch and state preserved for up to 8 hours, all on Firecracker. Here is what it is, when to reach for it instead of a plain Lambda Function, and how to architect on top of it.

Let me put you in a situation. You need to run a piece of code you did not write. Maybe it is the script your user pasted into your platform, maybe it is the snippet an AI agent just generated and wants to execute. And then comes the question that keeps anyone working on multi-tenant systems up at night: how do I run this without handing a stranger the keys to the house?

Until last week you had three paths, each with a catch. A VM gives you strong isolation but takes minutes to boot. A container starts in seconds but shares a kernel, so running untrusted code there takes a pile of hardening. And the Lambda Function was built for short request-response work, not for a session that has to keep live state between one interaction and the next (externalizing it to DynamoDB stores the data, not the live runtime: the running process, the loaded packages, the memory). In the end you chose between performance and isolation. No way around it. Or so it seemed.

Container, VM, or Lambda: the trade-off none of them solved alone

This pattern has become common: AI coding assistants, interactive code environments, analytics, vulnerability scanners, game servers running player scripts. They all need the same thing: give each user their own environment to run code the team did not write, safely and without lag.

The knot is that real isolation and low latency pull in opposite directions. From a security angle you want a hard boundary between tenants (the Security pillar of the Well-Architected Framework: isolate what is not trusted). From an experience angle you want that environment up the instant the user shows up. Reconciling the two was the expensive work.

And there is a nice irony in this story. We spent years learning to build stateless apps, and now state is a requirement again.

The solution to the future was hiding in the past.

That is a line a friend dropped in a conversation, and it has not left my head since. Ever felt that way? Because I have. And it is roughly what Lambda MicroVMs does: it brings state back, without handing you the weight of a full VM.

What Lambda MicroVMs is

Lambda MicroVMs is a new primitive inside Lambda, built exactly for that gap. Each MicroVM gives a single user or session its own isolated environment that boots fast, keeps memory and disk for the whole session, and pauses to a low cost when the user steps away.

The magic comes from Firecracker, the same lightweight virtualization that already runs over 15 trillion Lambda invocations a month. This is not raw new tech, it is the mature foundation of Lambda itself, exposed in a new way.

The model is image-then-launch:

You build the image once (AWS runs your Dockerfile, initializes the app, and takes a snapshot of memory and disk). After that, every MicroVM you launch resumes from that snapshot instead of cold-booting. That is why launch and resume are near-instant, even for a multi-gigabyte session.

What it is actually for (with examples you will recognize)

The main cue: this only enters the picture if you are building a platform that runs third-party code. If your app does not execute outside code, you do not need it. It is a building block for people who build that kind of product:

Replit, CodeSandbox, "VS Code in the browser": the user types code in the browser and it runs isolated, per user, holding state while the tab is open. That "runs isolated" is the MicroVM.
Code interpreter (like ChatGPT's or Claude's): you ask "plot this CSV", the AI writes Python and runs it to answer you. The runtime that executes that generated code, isolated per conversation, is the use case.
CI/CD runner (and relatives): a job runs the code of a Pull Request that may come from any stranger's fork, untrusted by definition, so you want an isolated, disposable runner per job. Same family: a scanner that runs a suspicious binary, a coding-interview platform (the candidate's code runs isolated), an AI agent that runs shell commands.

The thread tying it all together: each user, session, or job needs its own isolated environment, and the code running there is not code you wrote. That is the cue to use a MicroVM instead of a Lambda Function.

Lambda Function or Lambda MicroVM?

They do not compete, they complement each other. The official comparison:

	Lambda Functions	Lambda MicroVMs
Best for	request-response or event-driven (APIs, data processing, automation)	persistent environments running user or AI-produced untrusted code
Programming model	function handler invoked in a supported runtime	any application: run your own binaries, listen on ports, use Linux OS capabilities
Duration	up to 15 min per invocation; multi-step workflows up to a year with Lambda Durable Functions	up to 8 hours per session; suspend and resume across sessions
Runtime	service-provided runtimes (or customer-provided)	customer-provided MicroVM images
Inbound networking	direct invocations or event-source integrations; response streaming	inbound access to any port using OSI Layer 7 protocols
Concurrency	one request per execution environment at a time	multiple concurrent connections per MicroVM
Environment state	warm starts may reuse the environment, but state may not persist across invocations	memory and disk state preserved on suspend, restored on resume
Scaling	automatic: Lambda creates and destroys environments in response to traffic	developer-controlled: you create, suspend, resume, and terminate via API
Lifecycle	fully managed by Lambda	developer-controlled, with optional idle policies
Pricing	per-request + GB-seconds	per-second of compute while running + snapshot storage while suspended

The most common confusion: people assume the duration is the same as Lambda's. The startup is similar (both resume from a snapshot), but a Function dies at 15 minutes while a MicroVM holds a session for up to 8 hours with state intact. The real design: your app keeps Lambda Functions for the event-driven backbone, and calls MicroVMs only for the steps that need to run untrusted code in isolation.

How it works in practice: from endpoint to orchestration

Three things that trip people up at first, together.

The endpoint has a status. When you call run-microvm, you get an ID and a dedicated HTTPS endpoint for that MicroVM. But it is not ready instantly: it goes through states, from launch to RUNNING (about 2 seconds), and when idle it moves to suspended, coming back on resume. The endpoint is per MicroVM, per session.

One image, many MicroVMs. You build the image once (create-microvm-image) and each MicroVM is a run-microvm. Want two? Call it twice, and you get two independent instances. Idle behavior is governed by the idle-policy: maxIdleDurationSeconds (suspend after X idle) and autoResumeEnabled (the next request wakes the MicroVM on its own, in about 1s, no manual restart). When you are done, terminate-microvm releases everything.
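
To see how the two idle-policy settings interact, here is a toy Python state machine. It is a simulation for reasoning about behaviour, not a call into any AWS SDK: the class names are made up, and the SUSPENDED and TERMINATED state labels are illustrative (only RUNNING is named above).

```python
from dataclasses import dataclass


@dataclass
class IdlePolicy:
    max_idle_duration_seconds: int  # mirrors maxIdleDurationSeconds
    auto_resume_enabled: bool       # mirrors autoResumeEnabled


class SimulatedMicroVm:
    """Toy lifecycle model; timestamps are plain numbers in seconds."""

    def __init__(self, policy, now=0.0):
        self.policy = policy
        self.state = "RUNNING"
        self.last_activity = now

    def tick(self, now):
        """Suspend once the VM has been idle for the configured duration."""
        idle = now - self.last_activity
        if self.state == "RUNNING" and idle >= self.policy.max_idle_duration_seconds:
            self.state = "SUSPENDED"
        return self.state

    def handle_request(self, now):
        """A request wakes a suspended VM only if auto-resume is on."""
        if self.state == "TERMINATED":
            raise RuntimeError("MicroVM was terminated")
        self.tick(now)
        if self.state == "SUSPENDED":
            if not self.policy.auto_resume_enabled:
                raise RuntimeError("MicroVM is suspended; resume it explicitly")
            self.state = "RUNNING"
        self.last_activity = now
        return self.state

    def terminate(self):
        self.state = "TERMINATED"


vm = SimulatedMicroVm(IdlePolicy(max_idle_duration_seconds=300, auto_resume_enabled=True))
print(vm.tick(299), vm.tick(300), vm.handle_request(301))
```

With auto-resume off, the same request would fail until something resumes the MicroVM explicitly, which is the case your orchestration layer has to handle.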

You become the orchestrator. Since the endpoint is per session, something has to decide when to launch and where to route. Typically a Lambda Function in the backbone does it: it keeps a session -> MicroVM map (a store like DynamoDB in production), calls RunMicrovm on a user's first access, stores the ID and endpoint, mints a short-lived token with CreateMicrovmAuthToken, and proxies the request to the MicroVM's endpoint with the X-aws-proxy-auth header. If the instance is suspended and autoResume is on, the request itself wakes it. Add a routine to terminate orphan MicroVMs and you have the skeleton. The backbone code is in the next post in the series. And do not confuse this with Step Functions: MicroVM is the execution environment, Step Functions is an orchestrator, different layers.
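
A minimal sketch of that orchestration loop in Python, under stated assumptions: the client methods run_microvm, create_auth_token, and terminate_microvm are hypothetical stand-ins for RunMicrovm, CreateMicrovmAuthToken, and the terminate call, and their signatures and return shapes are invented for illustration. The session map is an in-memory dict where production would use a store like DynamoDB, and the age-based cleanup rule is one possible definition of an orphan.

```python
import time


class SessionRouter:
    """Maps session IDs to MicroVMs and produces proxy targets."""

    def __init__(self, client, image_id, max_session_age_seconds=8 * 3600):
        self.client = client          # hypothetical client, NOT a real SDK interface
        self.image_id = image_id
        self.max_age = max_session_age_seconds
        self.sessions = {}            # session_id -> {"id", "endpoint", "created"}

    def route(self, session_id, now=None):
        """Return (endpoint, headers) for proxying this session's request."""
        now = time.time() if now is None else now
        record = self.sessions.get(session_id)
        if record is None:            # first access: launch and remember
            vm = self.client.run_microvm(self.image_id)
            record = {"id": vm["id"], "endpoint": vm["endpoint"], "created": now}
            self.sessions[session_id] = record
        token = self.client.create_auth_token(record["id"])  # short-lived token
        return record["endpoint"], {"X-aws-proxy-auth": token}

    def reap_orphans(self, now=None):
        """Terminate MicroVMs older than the assumed maximum session age."""
        now = time.time() if now is None else now
        expired = [s for s, r in self.sessions.items() if now - r["created"] >= self.max_age]
        for session_id in expired:
            self.client.terminate_microvm(self.sessions.pop(session_id)["id"])
        return expired


class FakeClient:
    """In-memory stand-in so the sketch runs without AWS."""

    def __init__(self):
        self.launched, self.terminated = [], []

    def run_microvm(self, image_id):
        vm_id = f"mvm-{len(self.launched) + 1}"
        self.launched.append(vm_id)
        return {"id": vm_id, "endpoint": f"https://{vm_id}.example.invalid"}

    def create_auth_token(self, microvm_id):
        return f"token-for-{microvm_id}"

    def terminate_microvm(self, microvm_id):
        self.terminated.append(microvm_id)


router = SessionRouter(FakeClient(), image_id="img-demo")
endpoint, headers = router.route("alice", now=0)
print(endpoint, headers)
```

The same session always maps back to the same MicroVM, and a fresh token is minted per request, which keeps the stored record free of credentials.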

Cost, limits, and what is still missing

Cost is a decision, not a detail. Werner Vogels keeps hammering in the Frugal Architect that cost is an architecture requirement, not a number you discover on the bill. The suspend is exactly that in practice: you pay a lot for VM-level isolation, but only while the user is active. When they leave, the MicroVM suspends and the cost drops, with no loss of state. Designing your idle-policy on purpose is a cost decision. The model, from the official table: you pay per second of compute while it runs, and only snapshot storage while it is suspended. Unit prices are on the Lambda pricing page.
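
The shape of that pricing model can be sketched in a few lines of Python. Every number below is a placeholder picked for illustration, and billing suspended snapshots per GB-second is itself an assumption; take real rates and units from the Lambda pricing page.

```python
def session_cost(running_seconds, suspended_seconds, snapshot_gb,
                 compute_usd_per_second, storage_usd_per_gb_second):
    """Compute is billed while running; only snapshot storage while suspended."""
    compute = running_seconds * compute_usd_per_second
    storage = suspended_seconds * snapshot_gb * storage_usd_per_gb_second
    return compute + storage


# Placeholder rates, NOT real prices.
COMPUTE_RATE = 0.0001
STORAGE_RATE = 0.000001

# An 8-hour window in which the user is actually active for 1 hour.
always_running = session_cost(8 * 3600, 0, 4, COMPUTE_RATE, STORAGE_RATE)
with_suspend = session_cost(1 * 3600, 7 * 3600, 4, COMPUTE_RATE, STORAGE_RATE)
print(f"always running: {always_running:.4f}  with suspend: {with_suspend:.4f}")
```

Whatever the real rates turn out to be, the lever is the ratio between active time and idle time, which is exactly what the idle-policy controls.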

Limits: ARM64, up to 16 vCPUs, 32 GB of memory, and 32 GB of disk per MicroVM, and up to 8 hours of total runtime. Provisioning is flexible: you set a baseline and burst up to 4x at peak, paying the baseline while it runs.
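
A quick pre-flight check against those limits is cheap to write. The Python sketch below uses the numbers listed above; the dictionary keys and the function name are made up for illustration and do not correspond to any API field names.

```python
# Per-MicroVM limits as listed above; key names are illustrative.
LIMITS = {"vcpus": 16, "memory_gb": 32, "disk_gb": 32, "runtime_hours": 8}


def check_limits(requested):
    """Return a list of human-readable violations (empty list means OK)."""
    return [
        f"{key}: requested {requested[key]}, limit {limit}"
        for key, limit in LIMITS.items()
        if requested.get(key, 0) > limit
    ]


print(check_limits({"vcpus": 8, "memory_gb": 16, "disk_gb": 20}))
print(check_limits({"vcpus": 32, "runtime_hours": 12}))
```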

IaC: you can use the console, CloudFormation, and CDK.

Why Dockerfile + zip, and not a prebuilt ECR image? Aidan Steele dug into it: Lambda builds two copies of the image, one for Graviton 3 and one for Graviton 4, so it needs the source to recompile. The base comes from ECR Public, but pushing your own prebuilt image from a private ECR as the artifact is not the path. One thing that confuses people coming from containers: ECR does not leave your life. You do not deliver the MicroVM image via ECR, but inside the running MicroVM you can run Docker and docker pull your private ECR images at runtime. ECR is for consumption inside, not for delivering the image itself.

Networking and region: inbound traffic on configurable ports (HTTP/2, gRPC, WebSockets), service-provided JWE auth, outbound to the internet or your VPC. And it is available so far only in US East (N. Virginia, Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo).

When NOT to use it

If the workload is short request-response with no state, it stays a Lambda Function. A MicroVM there is a cannon for a mosquito. And if you just need more than 15 minutes with your own (trusted) code, a MicroVM is also overkill: for a long job, look at Fargate; for a multi-step workflow, Lambda Durable Functions (up to a year, as the table shows). MicroVMs are for when the differentiator is isolating untrusted code, not just going past 15 minutes.

There is also a gotcha AWS itself flags, and it rhymes with the determinism conversation: since the MicroVM boots from a pre-initialized snapshot (the equivalent of Lambda SnapStart, as Aidan Steele confirmed by testing), apps that generate unique content, open connections, or load ephemeral data at init may diverge. The snapshot froze a moment; whatever needs to be fresh per session cannot be frozen along with it. The fix has a name: lifecycle hooks to re-initialize randomness when each MicroVM is created. Map that out before assuming it just works.
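
The effect is easy to reproduce in miniature. In the Python sketch below, copy.deepcopy stands in for "resume from the snapshot", and on_microvm_created is a hypothetical hook name, not the service's actual hook API. State seeded at init is identical in every copy until the hook reseeds it.

```python
import copy
import random
import secrets


class App:
    def __init__(self):
        # Seeded during image build, so this state ends up inside the snapshot.
        self.rng = random.Random(secrets.randbits(128))

    def new_token(self):
        return f"{self.rng.getrandbits(64):016x}"

    def on_microvm_created(self):
        """Hypothetical hook: reseed from the OS once the MicroVM is running."""
        self.rng = random.Random(secrets.randbits(128))


snapshot = App()                      # "build the image once"

vm_a = copy.deepcopy(snapshot)        # every launch resumes from the same state
vm_b = copy.deepcopy(snapshot)
frozen_a, frozen_b = vm_a.new_token(), vm_b.new_token()

vm_c = copy.deepcopy(snapshot)
vm_d = copy.deepcopy(snapshot)
vm_c.on_microvm_created()             # re-initialize per MicroVM
vm_d.on_microvm_created()
fresh_c, fresh_d = vm_c.new_token(), vm_d.new_token()

print("without hook, tokens equal:", frozen_a == frozen_b)
print("with hook, tokens equal:", fresh_c == fresh_d)
```

The same reasoning applies to anything else captured at init: open connections, cached credentials, or generated IDs all need to be refreshed after the snapshot is restored.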

Does it kill the container?

No, and the reason is even better.

The hype of the week is "containers are obsolete." They are not. Quite the opposite: Aidan Steele tested it and you can run Docker inside a MicroVM, with OS capabilities enabled. So the MicroVM does not kill the container: it adds stronger isolation and still runs containers inside. The honest take is narrower: there is one specific spot, running untrusted code in isolation, where you will no longer want to harden a container by hand. There the MicroVM wins. Everywhere else, the container is still king.

The details the docs leave out

Aidan Steele spent launch day poking at the service and found some really interesting things that are not in the official docs. I read it and figured it was worth bringing here:

You can get a shell into the MicroVM, via the CreateMicrovmShellAuthToken API, with pty as a first-class citizen (Lambda Functions do not have it). Gold for IDE and coding-agent use cases.
Outbound UDP is blocked by default and DNS is a local stub, so DNS inside a container falls back to 8.8.8.8 and fails. The fix is to run with Lambda's DNS: docker run --dns 169.254.169.253, or go via VPC.
Lambda network connectors: a reified VPC config (subnets, security groups, an IAM role for the ENI) with its own lifecycle. The network team creates it, the developer just consumes it.
Performance (his tests): image build 2-3 min; RunMicrovm to RUNNING about 2s, plus 2s to serve; suspend and resume about 1s each.

What you take away

Lambda MicroVMs fills a real gap: VM-level isolation with near-instant launch and per-session state, which no single service delivered together.
It does not replace the Lambda Function, it complements it. Function in the backbone, MicroVM for the untrusted code.
The idle suspend is a deliberate cost lever: design your idle-policy on purpose.
Before locking in architecture: check the region (no São Paulo yet), the limits (ARM64, 16 vCPU, 32 GB, 8h), and the snapshot caveat.

This post was the map. In the next one in the series I actually spin up a MicroVM and we prove the isolation in practice, launching two MicroVMs and testing whether one can reach the other, with the repo on GitHub for you to run along.

Got a case where you run user or AI code that today is duct-taped onto a container or a hand-rolled VM? Does this primitive fit? Drop a like, share it with whoever is building a multi-tenant platform, and let's talk. Cheers! =D

Originally published on willpeixoto.dev.