Nitesh Reddy Challa

Posted on Jun 24

How I Deployed Hermes Agent on AWS

#ai #architecture #aws #security

My EC2 instance has a public IP address. It has zero inbound firewall rules. And yet I can reach my AI agent from my phone on Telegram, pull up a full web workspace in my browser, and run shell commands on it — all without opening a single port, without a VPN, and without SSH.

The latest version also splits storage deliberately: persistent agent data stays on EFS, while the Hermes install and Python venv moved to the root EBS volume. That change keeps pip install / hermes update I/O off EFS and brings always-on infra to a highly predictable ~$35/mo.

That's the setup this post is about.

What is Hermes Agent?

Hermes Agent is an open-source AI agent from Nous Research. It's not a chatbot wrapper. It has persistent memory, skills, a file system, a sandboxed terminal backend, and a full web workspace UI. You point it at a model provider and it runs as a daemon — hermes-gateway — serving an OpenAI-compatible API.

The web workspace looks like a proper IDE: chat panel, file browser, terminal, job queue. The Telegram integration is a long-polling bot that connects to the same gateway — no extra server, no webhook, no public URL.

I wanted this running on AWS, backed by Amazon Bedrock (no API keys to rotate, IAM role handles auth), with my agent's memory surviving instance replacements.

Architecture

Your phone (Telegram)
  └─► Telegram servers ──► hermes-gateway long-poll (outbound HTTPS only)

Your laptop (browser)
  └─► aws ssm start-session ──► SSM port-forward :3000
                                   └─► hermes-workspace (loopback only)

EC2 m7g.medium · public subnet · ZERO inbound SG · dynamic public IP
  │
  ├─ hermes-gateway   :8642  (127.0.0.1 only)
  │     ├─ Bedrock inference via IAM role (no API keys)
  │     ├─ Telegram long-poll (outbound HTTPS)
  │     └─ OpenAI-compatible API
  │
  ├─ hermes-dashboard :9119  (127.0.0.1 only)
  ├─ hermes-workspace :3000  (127.0.0.1 only)
  │
  ├── EFS /mnt/efs/hermes  (RETAIN · encrypted · uid=10000 access point)
  │     .env · config.yaml · sessions · skills · SOUL.md · logs · state DBs
  │     ↑ persistent agent data — survives instance replacement
  │
  ├── EBS root volume
  │     /opt/hermes-agent      ← hermes venv (pip I/O stays off EFS)
  │     /opt/hermes-workspace  ← workspace UI
  │
  └── Secrets Manager (hermes/runtime)
        API_SERVER_KEY · TELEGRAM_BOT_TOKEN · TELEGRAM_ALLOWED_USERS

Three CDK stacks, deployed in order:

Stack	What it provisions
`HermesNetworkStack`	VPC (1 AZ), public subnet, IGW, S3 gateway endpoint, security groups
`HermesStorageStack`	EFS (RETAIN, encrypted, uid=10000 access point), Secrets Manager
`HermesComputeStack`	EC2 (m7g.medium), IAM (Bedrock-scoped), bootstrap user-data, systemd units

The Security Trick: Zero Inbound Rules

The instinct when deploying anything on AWS is to reach for a private subnet, a NAT Gateway, and VPC interface endpoints. That's the enterprise posture. It's also ~$88/mo in endpoint costs alone before your instance even starts.

For a personal deployment the actual security boundary is not the subnet type — it's what's listening on the instance.

All three services bind to 127.0.0.1 only, and the Security Group has zero inbound rules. Unsolicited connection attempts to the public IP are silently dropped at the Security Group, and even if one got through, none of the Hermes services listens on a routable interface.

# network_stack.py — the entire inbound surface of the instance
self.instance_security_group = ec2.SecurityGroup(
    self,
    "InstanceSg",
    vpc=self.vpc,
    description="Hermes EC2 - zero inbound; egress via IGW. Admin via SSM.",
    allow_all_outbound=True,
)
# No add_ingress_rule calls. Ever.

Admin access is via AWS Systems Manager Session Manager — outbound HTTPS to the SSM service endpoint, no inbound port required. SSM also handles port-forwarding, which is how the workspace reaches your browser.

Telegram uses long-polling. The gateway sends an outbound HTTPS request to Telegram's servers, and Telegram holds that request open until a message arrives or the poll times out. The gateway then reads the response and polls again. Again: zero inbound.
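To make the long-polling mechanics concrete, here is a minimal sketch against the public Telegram Bot API `getUpdates` method. It is not Hermes's actual implementation: the helper names (`next_offset`, `allowed_messages`, `get_updates`, `poll_forever`) are mine, and I'm assuming TELEGRAM_ALLOWED_USERS holds numeric user IDs.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.telegram.org"


def next_offset(updates, current=0):
    """Return the offset that acknowledges every update seen so far."""
    if not updates:
        return current
    return max(u["update_id"] for u in updates) + 1


def allowed_messages(updates, allowed_user_ids):
    """Keep only text messages sent by explicitly allowed user IDs."""
    kept = []
    for update in updates:
        message = update.get("message") or {}
        sender = message.get("from") or {}
        if sender.get("id") in allowed_user_ids and "text" in message:
            kept.append(message)
    return kept


def get_updates(token, offset, timeout=30):
    """One long-poll request: an outbound HTTPS call that Telegram holds
    open until an update arrives or `timeout` seconds pass."""
    query = urllib.parse.urlencode({"offset": offset, "timeout": timeout})
    url = f"{API_ROOT}/bot{token}/getUpdates?{query}"
    with urllib.request.urlopen(url, timeout=timeout + 10) as response:
        return json.load(response)["result"]


def poll_forever(token, allowed_user_ids):
    """Poll in a loop; every connection is initiated from this side."""
    offset = 0
    while True:
        updates = get_updates(token, offset)
        for message in allowed_messages(updates, allowed_user_ids):
            print(message["from"]["id"], message["text"])
        offset = next_offset(updates, offset)
```

Calling `poll_forever(token, {your_user_id})` would print incoming messages from allowed senders. Nothing in the loop ever accepts a connection, which is why the bot works behind a Security Group with no inbound rules.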

The result: there is no attack surface on the public IP. Shodan can scan it all day.
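If you want to check the loopback claim on your own instance rather than take it on faith, a small script can flag any TCP listener that is bound beyond loopback. This is a sketch that parses the output of `ss -ltnH` (listening TCP sockets, numeric, no header); the function names are hypothetical and not part of the deployment.

```python
import ipaddress
import subprocess


def local_host(listen_addr):
    """Extract the host from an `ss` local address such as
    '127.0.0.1:8642', '[::1]:3000', '127.0.0.53%lo:53', or '*:22'."""
    host, _, _port = listen_addr.rpartition(":")
    host = host.strip("[]")
    return host.split("%", 1)[0]  # drop an interface suffix like '%lo'


def is_loopback(listen_addr):
    """True only when the listener is bound to a loopback address."""
    try:
        return ipaddress.ip_address(local_host(listen_addr)).is_loopback
    except ValueError:
        return False  # '*' or anything unparseable counts as exposed


def exposed_listeners(ss_output):
    """Return local addresses from `ss -ltnH` output that are not loopback."""
    exposed = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "LISTEN" and not is_loopback(fields[3]):
            exposed.append(fields[3])
    return exposed


def check_instance():
    """Run on the instance itself, for example from an SSM session."""
    result = subprocess.run(
        ["ss", "-ltnH"], capture_output=True, text=True, check=True
    )
    return exposed_listeners(result.stdout)
```

An empty list from `check_instance()` means every TCP listener is loopback-only. A non-empty list is not automatically a hole, since the Security Group still drops inbound traffic, but it does mean you are relying on one layer instead of two.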

The Memory Trick: EFS for Data, EBS for Code

Persistent agent data — SOUL.md, skills, session history, state DBs, the .env with all secrets, the config.yaml — lives on an EFS volume mounted at /mnt/efs/hermes. The hermes binary and venv live on the root EBS volume at /opt/hermes-agent instead.

Why split? EFS Elastic Throughput charges per GB transferred. Moving the venv to EBS takes the install/update path off EFS, which keeps steady-state EFS I/O at roughly $1/mo instead of paying for heavy throughput during dependency updates. See docs/STORAGE.md for the full reference.

The EFS has RemovalPolicy.RETAIN. The access point locks the path to UID 10000. Automatic backups are on with a 35-day window.

# storage_stack.py — the persistence layer
self.file_system = efs.FileSystem(
    self,
    "HermesEfs",
    vpc=vpc,
    encrypted=True,
    removal_policy=RemovalPolicy.RETAIN,       # survives cdk destroy
    lifecycle_policy=efs.LifecyclePolicy.AFTER_30_DAYS,
    throughput_mode=efs.ThroughputMode.ELASTIC,
    enable_automatic_backups=True,
)

self.access_point = self.file_system.add_access_point(
    "HermesAccessPointUid10000",
    path="/hermes",
    create_acl=efs.Acl(owner_uid="10000", owner_gid="10000", permissions="0750"),
    posix_user=efs.PosixUser(uid="10000", gid="10000"),
)
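For readers who don't read octal modes at a glance, here is a tiny sketch that decodes the "0750" used in the access point ACL. The helper name `describe_mode` is mine; it only uses the standard library `stat` module.

```python
import stat


def describe_mode(octal_permissions):
    """Render an octal permission string as an `ls -l` style directory mode."""
    mode = int(octal_permissions, 8)
    return stat.filemode(stat.S_IFDIR | mode)


print(describe_mode("0750"))  # drwxr-x---
```

In other words: the owner (UID 10000) gets full access, the group (GID 10000) can read and traverse, and everyone else gets nothing. Because the access point also sets a POSIX user, operations through it run as UID/GID 10000 regardless of which local user makes the call.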

What this means in practice: if the EC2 instance develops a problem, you run cdk deploy and get a fresh one. The new instance mounts the same EFS, reads the same .env, reinstalls the venv to EBS via user-data, and all three systemd services start with the agent's full memory intact. No manual data migration, no re-configuration.

The EC2 root EBS is flagged delete_on_termination=True. Agent data is on EFS (RETAIN); install artifacts on EBS are recreated automatically on each deploy.

Bedrock: No API Keys, IAM Role Does the Work

Hermes connects to Bedrock as described in the Hermes Bedrock guide. The EC2 instance has an IAM role scoped to bedrock:InvokeModel, bedrock:Converse, and the streaming variants, restricted to specific inference-profile and foundation-model ARNs.

No model-provider API keys anywhere. No key rotation. If the instance is compromised, the Bedrock blast radius is bounded to inference on two specific models. The role cannot touch S3, DynamoDB, other accounts, or anything else.

Two models run in this stack:

Model	Role	Why
`us.anthropic.claude-sonnet-4-6`	Primary (all main agent tasks)	Best reasoning for the price on Bedrock
`us.amazon.nova-lite-v1:0`	Auxiliary (5 background slots)	~50× cheaper than Sonnet on input tokens (~60× on output) for web extraction, vision, summarisation

The us. prefix marks a cross-region inference profile: Bedrock routes requests to us-east-1, us-east-2, or us-west-2 automatically for throughput. You enable both models once in the Bedrock Model Access console and never touch it again.
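The scoping described in this section can be sketched as a plain IAM policy document. This is an illustration, not the policy from the actual stack: the function name, the placeholder account ID, the Sid, and the exact action list are assumptions, and I'm assuming the foundation-model ID is the profile ID with the `us.` prefix removed.

```python
import json


def bedrock_policy(account_id, home_region, profile_ids):
    """Build an allow-list policy for specific inference profiles and the
    foundation models behind them."""
    resources = []
    for profile_id in profile_ids:
        # 'us.amazon.nova-lite-v1:0' -> 'amazon.nova-lite-v1:0'
        model_id = profile_id.split(".", 1)[1]
        resources.append(
            f"arn:aws:bedrock:{home_region}:{account_id}:inference-profile/{profile_id}"
        )
        # A cross-region profile can route to several regions, so the
        # region is wildcarded on the foundation-model ARN.
        resources.append(f"arn:aws:bedrock:*::foundation-model/{model_id}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeTwoModelsOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": resources,
            }
        ],
    }


policy = bedrock_policy(
    "111122223333",  # placeholder account ID
    "us-east-1",
    ["us.anthropic.claude-sonnet-4-6", "us.amazon.nova-lite-v1:0"],
)
print(json.dumps(policy, indent=2))
```

The point of the shape is that nothing is granted by wildcard on the model side: adding a third model means adding two more ARNs, which makes the change visible in review.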

Cost Breakdown

Infra (always-on, us-east-1)

Component	Detail	≈ Monthly
EC2 `m7g.medium` (Graviton, 1 vCPU / 4 GiB)	730 hrs × $0.0404/hr	~$29.50
EBS gp3 root (30 GiB, encrypted)	venv + workspace on EBS	$2.40
EFS Standard (~64 MB agent data)	$0.30/GiB-mo storage	~$0.02
EFS Elastic throughput I/O	venv/deps on EBS; steady-state session/state access only	~$1/mo
EFS automatic backups	~$0.05/GiB-mo	~$0.50
Secrets Manager	1 secret × $0.40	$0.40
CloudWatch Logs + metrics	ingestion + custom metrics	~$2
NAT Gateway / VPC endpoints	none	$0
Infra total (always-on)		≈ $35/mo

No NAT Gateway. No interface VPC endpoints. The EC2 routes outbound directly through the Internet Gateway. That single architectural decision — public subnet, zero-inbound SG instead of private subnet + NAT — is 58% cheaper than the equivalent private-subnet setup with six VPC endpoints.
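As a sanity check on the table, here is the arithmetic spelled out. The figures are the approximate line items from the table above, so the result is only as precise as those estimates; the variable names are mine.

```python
# Monthly line items in USD, taken from the infra table (us-east-1).
LINE_ITEMS = {
    "EC2 m7g.medium": 730 * 0.0404,  # hours per month x hourly rate
    "EBS gp3 root": 2.40,
    "EFS Standard storage": 0.02,
    "EFS Elastic throughput I/O": 1.00,
    "EFS automatic backups": 0.50,
    "Secrets Manager": 0.40,
    "CloudWatch Logs + metrics": 2.00,
    "NAT Gateway / VPC endpoints": 0.00,
}

total = sum(LINE_ITEMS.values())
ec2_share = LINE_ITEMS["EC2 m7g.medium"] / total

for name, cost in LINE_ITEMS.items():
    print(f"{name:<30} ${cost:6.2f}")
print(f"{'Total':<30} ${total:6.2f}")
print(f"EC2 share of total: {ec2_share:.0%}")
```

The sum lands just under $36, in line with the roughly $35/mo headline, and the instance itself accounts for more than 80% of it. That is why stopping the instance is the only cost lever that really matters here.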

Stop it when you're not using it

aws ec2 stop-instances --instance-ids <InstanceId> --region us-east-1

EC2 compute billing stops immediately, and most EFS data-access I/O should stop with the services. EFS storage, EBS, Secrets Manager, and CloudWatch keep billing at ~$8/mo. When you start it again, SSM is ready in ~60 seconds and all three hermes-* systemd units restart automatically. No re-bootstrapping, no re-configuration, agent memory fully intact.

Floor: ~$8/mo when off. ~$35/mo when always-on.

Bedrock tokens (variable, on top of infra)

Model	Rate	Typical personal use
Claude Sonnet 4.x	~$3/M in · $15/M out	$10–50/mo
Nova Lite (aux slots)	~$0.06/M in · $0.24/M out	< $2/mo
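To turn those per-million rates into a monthly number for your own usage, a sketch like the following is enough. The rates come from the table above; the function name and the example workload (5M input and 1M output tokens a month) are hypothetical.

```python
# USD per million tokens: (input, output), from the table above.
RATES = {
    "sonnet": (3.00, 15.00),
    "nova-lite": (0.06, 0.24),
}


def token_cost(model, input_tokens, output_tokens):
    """Estimated USD cost for a given token volume on one model."""
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000


# Hypothetical month: 5M tokens in, 1M tokens out on each model.
for model in RATES:
    print(model, round(token_cost(model, 5_000_000, 1_000_000), 2))
```

With that example workload Sonnet comes to $30 and Nova Lite to about $0.54, which is consistent with the ranges in the table and shows why routing background work to the auxiliary model is worth the extra configuration.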

vs. the alternative

ChatGPT Plus is $20/mo. You get no persistent agent filesystem, no terminal backend, no Telegram long-polling, and far less control over where memory and logs live.

The Hermes setup is more infrastructure to own, but that is the point: you own the memory, the skills, the SOUL.md that shapes the agent's persona, the logs, and the conversation history. Stop the instance today, redeploy in six months, and the agent picks up from the same EFS-backed state.

The Setup, Briefly

1. Enable Bedrock model access (one-time in the console, two models)
2. cdk deploy --all: provisions all three stacks; first boot takes 5–8 min (package installs + workspace build)
3. Create a Telegram bot via @BotFather, get your user ID via @userinfobot
4. Add the bot token + your user ID to Secrets Manager (hermes/runtime), sync to EFS, restart gateway
5. Port-forward `:3000` via SSM to reach the web workspace from your laptop

# Access the workspace from your laptop
aws ssm start-session --target <InstanceId> \
  --document-name AWS-StartPortForwardingSession \
  --parameters '{"portNumber":["3000"],"localPortNumber":["3000"]}' \
  --region us-east-1

open http://localhost:3000
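The same command works for the other loopback services, such as the dashboard on :9119. Hand-quoting the JSON in `--parameters` is easy to get wrong, so here is a small sketch that builds the argument list instead. The function name is mine, and it assumes the AWS CLI and the Session Manager plugin are installed on your laptop.

```python
import json
import subprocess


def port_forward_argv(instance_id, remote_port, local_port=None, region="us-east-1"):
    """Build the `aws ssm start-session` argument list for a port forward."""
    if local_port is None:
        local_port = remote_port
    parameters = {
        "portNumber": [str(remote_port)],
        "localPortNumber": [str(local_port)],
    }
    return [
        "aws", "ssm", "start-session",
        "--target", instance_id,
        "--document-name", "AWS-StartPortForwardingSession",
        "--parameters", json.dumps(parameters),
        "--region", region,
    ]


def forward(instance_id, remote_port, local_port=None):
    """Run the port forward in the foreground until interrupted."""
    subprocess.run(port_forward_argv(instance_id, remote_port, local_port), check=True)
```

For example, `forward("<InstanceId>", 9119)` would expose the dashboard at http://localhost:9119 for as long as the session stays open.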

After step 4, Telegram just works. Message your bot, get a reply. No additional setup.
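The "sync to EFS" part of step 4 deserves a closer look. The sketch below shows one way it could work, assuming the hermes/runtime secret is stored as a flat JSON object and the target is the .env file under /mnt/efs/hermes. The function names are hypothetical and the real sync in the repo may differ.

```python
import json
import os


def values_from_secret(secret_string):
    """Parse a JSON SecretString into plain string key/value pairs."""
    values = {k: str(v) for k, v in json.loads(secret_string).items()}
    for key, value in values.items():
        if "\n" in value:
            raise ValueError(f"{key} contains a newline")
    return values


def merge_env(existing, updates):
    """Replace or append KEY=value lines, leaving unrelated lines untouched."""
    remaining = dict(updates)
    out = []
    for line in existing.splitlines():
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


def sync_env(secret_string, env_path="/mnt/efs/hermes/.env"):
    """Merge secret values into the .env file, created owner-only (0600)."""
    existing = ""
    if os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as handle:
            existing = handle.read()
    merged = merge_env(existing, values_from_secret(secret_string))
    fd = os.open(env_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(merged)
```

The secret string itself can be fetched with `aws secretsmanager get-secret-value --secret-id hermes/runtime --query SecretString --output text`. After writing the file, restarting the gateway unit picks up the new values.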

What Surprised Me

I started with a private subnet, a NAT Gateway, and VPC interface endpoints for SSM, Bedrock, Secrets Manager, EFS, and CloudWatch. It's what every AWS security guide recommends. It's also ~$88/mo in endpoint costs before a single token is processed.

The insight that unlocked this architecture: the security boundary for a personal agent isn't the subnet — it's what's reachable on the instance. With zero inbound SG rules and all services bound to loopback, the public IP is inert. SSM and Telegram's long-polling handle the two access patterns (admin shell / bot messages) over outbound HTTPS. No VPN, no bastion, no open ports.

The most secure design for this use case turned out to be the simplest one.

Built with Hermes Agent · AWS CDK (Python) · Amazon Bedrock · SSM Session Manager
