DEV Community

Vinothsingh Elumalai

How an AI Terminal Assistant Became My Team's Most Productive Engineer - OpenCode + Claude + MCP


The Moment That Changed Everything

It was 11pm on a Tuesday. A cache migration in our production environment had just caused thousands of authentication failures for two of our largest enterprise customers. Our VP of Product wanted answers. Our support team was fielding escalations. And our engineers were alt-tabbing between the AWS console, Datadog, GitHub, Azure DevOps, and PagerDuty, trying to piece together what had happened.

Three weeks later, when we needed to attempt the same change again, an engineer typed this into a terminal:

"Review the ADO change ticket, compare the MOP against the actual ElastiCache configuration in prod region, check the K8s config repo for how Redis env vars are wired on the Green cluster, and tell me if this approach avoids the token validation failure that caused the previous customer impact."

Fourteen seconds later, the system had pulled the work item, queried AWS ElastiCache across four regions, read the Kubernetes configuration from GitHub, cross-referenced the deployment patches, and delivered a precise technical assessment including a risk it identified that the team hadn't documented: in-flight tokens during the 30–60 second Global Accelerator propagation window.

That system is OpenCode, an AI-powered CLI assistant connected to our entire operational stack through MCP (the Model Context Protocol). And it has fundamentally changed how a 20-person platform engineering team manages infrastructure serving thousands of enterprise tenants and processing millions of authentication requests daily.


What It Actually Is

OpenCode is deceptively simple in concept. A terminal application on an engineer's laptop. You type questions or tasks in plain English. It responds with answers pulled from live production systems.

  Engineer (terminal)
        │
        ▼
    OpenCode (Claude AI)
        │
        ▼
    MCP Servers
   ╱  │  │  │  ╲
  ▼   ▼  ▼  ▼   ▼
 AWS  DD  GH ADO PD  RD

 AWS = Amazon Web Services (prod + non-prod)
 DD  = Datadog (logs, metrics, monitors)
 GH  = GitHub (repos, PRs, code)
 ADO = Azure DevOps (tickets, sprints, wikis)
 PD  = PagerDuty (incidents, schedules)
 RD  = Rundeck (jobs, executions)

The magic is in those MCP servers. Each one is a lightweight connector to a backend platform. When you ask a question, the AI doesn't guess — it makes real API calls against real systems and works with real data.

Ask "what's our AWS spend this month?" — it queries Cost Explorer. Ask "which tenant generates the most provisioning traffic?" — it aggregates Datadog logs. Ask "what did that PR change in the K8s config repo?" — it reads the actual file diff from GitHub. Ask all three in the same sentence and it does them in parallel.

No pre-built dashboards. No saved queries. No runbooks to follow. You just ask.
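
To make the "in parallel" point concrete: a combined question fans out into independent tool calls that can run concurrently. The sketch below uses asyncio with three stand-in coroutines. The function names and returned fields are hypothetical; they do not represent OpenCode's internals or any real MCP server's API.

```python
import asyncio

# Hypothetical stand-ins for MCP tool calls. Each one sleeps briefly
# to mimic the latency of a real API request.
async def query_cost_explorer() -> dict:
    await asyncio.sleep(0.1)
    return {"source": "aws", "question": "monthly spend"}

async def aggregate_datadog_logs() -> dict:
    await asyncio.sleep(0.1)
    return {"source": "datadog", "question": "top tenant by traffic"}

async def read_github_diff() -> dict:
    await asyncio.sleep(0.1)
    return {"source": "github", "question": "PR file diff"}

async def answer_combined_question() -> list:
    # gather() starts all three calls at once and returns the results
    # in the order the coroutines were passed in.
    return await asyncio.gather(
        query_cost_explorer(),
        aggregate_datadog_logs(),
        read_github_diff(),
    )

results = asyncio.run(answer_combined_question())
print([r["source"] for r in results])
```

Because the calls overlap, total wall-clock time stays close to the slowest single call rather than the sum of all three.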


The Setup Nobody Believes Is This Simple

The entire configuration is a single JSON file. Each MCP server gets a block: here's the server binary, here are the credentials, connect.

{
  "mcpServers": {
    "aws-prod": {
      "command": "aws-mcp-server",
      "env": { "AWS_PROFILE": "prod" }
    },
    "datadog": {
      "command": "datadog-mcp-server",
      "env": {
        "DD_API_KEY": "...",
        "DD_APP_KEY": "..."
      }
    },
    "github": {
      "command": "github-mcp-server",
      "env": { "GITHUB_TOKEN": "..." }
    }
  }
}

The AI model never sees the credentials. It calls tools by name, such as "search logs in Datadog" or "describe EKS clusters", and the MCP server handles authentication, pagination, error handling, and response formatting.

Adding a new system takes about ten minutes. Write a config block, provide credentials, restart.
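
As a rough illustration of "write a config block", the sketch below adds one more server entry to a config shaped like the JSON above. The helper `add_mcp_server`, the command name `pagerduty-mcp-server`, and the variable `PAGERDUTY_API_KEY` are hypothetical names chosen for the example, not taken from any real package.

```python
import json

def add_mcp_server(config: dict, name: str, command: str, env: dict) -> dict:
    """Return a copy of the config with one more server block added."""
    servers = dict(config.get("mcpServers", {}))
    if name in servers:
        raise ValueError(f"server {name!r} is already configured")
    servers[name] = {"command": command, "env": dict(env)}
    return {**config, "mcpServers": servers}

config = {
    "mcpServers": {
        "github": {
            "command": "github-mcp-server",
            "env": {"GITHUB_TOKEN": "..."},
        }
    }
}

# "pagerduty-mcp-server" and PAGERDUTY_API_KEY are hypothetical names.
updated = add_mcp_server(
    config,
    name="pagerduty",
    command="pagerduty-mcp-server",
    env={"PAGERDUTY_API_KEY": "..."},
)
print(json.dumps(updated, indent=2))
```

The helper returns a new dictionary instead of mutating the input, so a failed or rejected change never leaves the working config half-edited.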


Focused Sessions — One Agent, One Mission

Here's something that changes how you think about AI assistants: you can create focused sessions with a single purpose.

Right now, as I write this article, I have an OpenCode session that's been running for days as a documentation advisor. It's reviewed my architecture docs, drafted technical articles, generated formal roadmap documents, and is tracking project milestones. When I start a new conversation about something unrelated, I can tell the session: "This session is reserved for documentation work only" — and it keeps me focused.

This pattern works for any focused workstream:

| Session | Purpose | Tools Used |
| --- | --- | --- |
| Documentation Advisor | Article drafting, roadmap generation, technical writing | Doc Agent, GitHub, web search |
| Incident Responder | Active incident investigation and RCA | Datadog, GitHub, PagerDuty, AWS |
| Cost Analyst | Monthly spend review, waste identification | AWS (Cost Explorer, EC2, RDS, S3) |
| Sprint Planner | Ticket creation, backlog grooming, capacity planning | Azure DevOps, GitHub |
| Security Reviewer | Code review, vulnerability assessment | GitHub, AWS (IAM, SecurityHub) |

Each session maintains context across the entire conversation. The AI remembers what you discussed 3 hours ago. It builds on previous findings. It doesn't start from zero every time.


This is the real power: Not one assistant that does everything poorly. Multiple focused sessions, each purpose-built for a specific mission, with the right tools connected and the right context loaded.
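
One way to picture "the right tools connected" is a per-session allowlist. The sketch below encodes the session table above as a plain mapping and checks tool access against it. How OpenCode actually scopes a session is not described here, so treat the structure and the function names as assumptions.

```python
# Session names and tools are taken from the table above;
# the allowlist mechanism itself is an illustrative assumption.
SESSION_TOOLS = {
    "Documentation Advisor": ["Doc Agent", "GitHub", "web search"],
    "Incident Responder": ["Datadog", "GitHub", "PagerDuty", "AWS"],
    "Cost Analyst": ["AWS"],
    "Sprint Planner": ["Azure DevOps", "GitHub"],
    "Security Reviewer": ["GitHub", "AWS"],
}

def is_tool_allowed(session: str, tool: str) -> bool:
    """True if the tool is on the session's allowlist."""
    return tool in SESSION_TOOLS.get(session, [])

def sessions_using(tool: str) -> list:
    """List the sessions that have a given tool connected."""
    return [name for name, tools in SESSION_TOOLS.items() if tool in tools]

print(is_tool_allowed("Cost Analyst", "PagerDuty"))
print(sessions_using("PagerDuty"))
```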

Sub-Agents With Specific Skill Sets

Beyond focused sessions, you can create sub-agents: specialized configurations, each set up for a specific domain.

The Doc Agent

Generates formal documents — postmortems, RCA reports, roadmaps, technical specs. It knows document templates, formatting standards, and outputs polished Word/PDF files.

I used this to generate formal migration roadmaps, architecture documents, and execution playbooks — all properly formatted, ready to share with leadership.

The ADO Agent

Creates, updates, and queries Azure DevOps work items. It understands your project structure, sprint cadence, and ticket hierarchy (Epic → Feature → Task).

One prompt: "Create a Feature under the cleanup Epic with 8 tasks — one per batch" — and 9 tickets exist with proper hierarchy, descriptions, and assignments.
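
The ticket count in that example is one Feature plus eight Tasks. A minimal sketch of the hierarchy follows, using a hypothetical `WorkItem` class; it only models the parent/child structure and does not call the Azure DevOps API.

```python
from dataclasses import dataclass, field

@dataclass
class WorkItem:
    kind: str   # "Epic", "Feature", or "Task"
    title: str
    children: list = field(default_factory=list)

def build_feature_with_tasks(feature_title: str, batch_count: int) -> WorkItem:
    """Create one Feature with one child Task per batch."""
    feature = WorkItem("Feature", feature_title)
    for n in range(1, batch_count + 1):
        feature.children.append(WorkItem("Task", f"{feature_title}: batch {n}"))
    return feature

def count_items(item: WorkItem) -> int:
    """Count the item itself plus everything beneath it."""
    return 1 + sum(count_items(child) for child in item.children)

feature = build_feature_with_tasks("Cleanup execution", 8)
print(count_items(feature))  # 1 Feature + 8 Tasks = 9
```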

The AWS Agent

Queries across all regions, all services. Cost analysis, resource inventory, security posture review. Runs in read-only mode with separate IAM profiles for prod vs. non-prod.

The Incident Agent

Connected to PagerDuty + Datadog + GitHub. When an alert fires, it pulls the monitor definition, searches logs, checks recent deployments, and synthesizes findings. This is the agent that eventually became FRIDAY — but more on that later.

The GitHub Agent

Code review, PR analysis, repository search. It reads actual code and configs, not summaries. When someone asks "what changed in the proxy config last week?" — it reads every commit.

Build Your Own

Any system with an API can become an MCP server. The pattern is:

  1. Find or build an MCP server for the platform (many exist: AWS, Datadog, GitHub, PagerDuty, Slack, Jira, Confluence...)
  2. If one doesn't exist — build a lightweight one. An MCP server is just a program that exposes tools via the MCP protocol. A basic one is ~100 lines of Python.
  3. Add it to your config — one JSON block with the command and credentials
  4. Restart — the AI can now query that system
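# A minimal custom MCP server (simplified; assumes the official `mcp` Python SDK):
import os
import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-search")
API_KEY = os.environ["INTERNAL_API_KEY"]  # hypothetical variable name

@mcp.tool()
def query_my_system(query: str) -> str:
    """Query our internal API"""
    response = requests.get(
        "https://internal-api.company.com/search",
        params={"q": query},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,  # avoid hanging the tool call on a slow backend
    )
    response.raise_for_status()  # surface HTTP errors instead of returning them
    return response.text  # raw JSON text, matching the -> str annotation
mcp.run()  # serve the tool over stdio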

The principle: If your tech stack has an API, you can make it conversational. DNS provider? MCP server. Internal CMDB? MCP server. Terraform state? MCP server. The AI becomes as capable as the tools you connect to it.

What We've Actually Achieved

This isn't a proof of concept. Here's what production operational work looks like with OpenCode:

The >$100K/Month Cost Discovery

Finance asked: "What does each customer cost us?" In shared infrastructure where a single proxy pod serves all tenants — the conventional answer is "we can't really tell you."

We asked OpenCode. One session:

  • Pulled monthly billing from AWS Cost Explorer: $308,763 across four regions
  • Discovered database Storage IO alone was $40,985/month
  • Switched to Datadog, aggregated 149 million proxy log entries from 7 days
  • Broke down by tenant: top customers = 43% of all platform traffic
  • Identified two accounts consuming 60% of all activity
  • Found more than $100,000/month in addressable waste

Output: a 361-line Word document with every number traced to an API response. Not estimates. Not SWAGs. Production telemetry.
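
The per-tenant breakdown can be pictured as a proportional split: each tenant's share of request volume multiplied by the shared bill. The allocation method used in the actual analysis is not spelled out above, so proportional-by-traffic is an assumption, and the tenant names and request counts below are made up for illustration.

```python
def allocate_cost_by_traffic(total_cost: float, requests_by_tenant: dict) -> dict:
    """Split a shared bill across tenants in proportion to request volume."""
    total_requests = sum(requests_by_tenant.values())
    if total_requests == 0:
        raise ValueError("no traffic recorded; cannot allocate cost")
    return {
        tenant: round(total_cost * count / total_requests, 2)
        for tenant, count in requests_by_tenant.items()
    }

# Made-up request counts for three hypothetical tenants.
sample_traffic = {"tenant-a": 500, "tenant-b": 300, "tenant-c": 200}
shares = allocate_cost_by_traffic(1000.0, sample_traffic)
print(shares)
```

A traffic-based split is only a first approximation: it ignores that some requests are far more expensive than others, such as those driving database storage IO.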

The Unused Resources Audit

Across two AWS accounts and four regions:

  • Unassociated Elastic IPs (including legacy BYOIP blocks)
  • Load balancers attached to decommissioned clusters
  • Duplicate NAT gateways in the same subnets
  • A temporary RDS instance someone forgot about
  • Lambda functions on end-of-life Node.js runtimes

Output: an Excel workbook, color-coded, with subtotals. Combined waste: ~$3,200/month plus the $97K overlap.

The Cache Migration Pre-Mortem

When the retry was planned, one prompt produced a full technical assessment:

  • Read the ADO ticket for the method of procedure
  • Queried ElastiCache for current topology
  • Read 20KB of Kubernetes YAML from GitHub
  • Identified a risk the team hadn't documented: in-flight tokens during the 30-60 second traffic propagation window

Total time: 14 seconds.


How It Became a Force Multiplier for Incident Response

The first time I used OpenCode during a live incident, I realized something: the AI was doing incident investigation faster and more consistently than our engineers.

Not because it's smarter — because it doesn't context-switch.

A human investigating an incident opens:

  1. PagerDuty → read the alert
  2. Datadog → search for the service, find the error spike
  3. GitHub → check if someone deployed something
  4. Cross-reference timestamps between all three tools
  5. Form a hypothesis
  6. Drill deeper — check affected tenants, error paths, queue depths
  7. Write up findings

That's 15-45 minutes for an experienced engineer. More for a junior. And the cognitive overhead of switching between tools while sleep-deprived leads to missed signals and wrong conclusions.

With OpenCode, the same investigation is one conversation:

"PagerDuty alert fired on proxy 5xx errors in EU. Check Datadog for error rates by backend and affected tenants. Check GitHub for any recent deployments to the primary EU cluster. What changed?"

90 seconds to 3 minutes. Every time. No context switching. No missed signals. No investigating the wrong region.
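
The "cross-reference timestamps" step is the part most prone to human error, and it reduces to a simple filter: which deployments landed shortly before the alert? A minimal sketch follows, with a hypothetical record shape and made-up timestamps.

```python
from datetime import datetime, timedelta

def deployments_before_alert(alert_time, deployments, window_minutes=60):
    """Return deployments within the window before the alert, newest first."""
    window_start = alert_time - timedelta(minutes=window_minutes)
    suspects = [
        d for d in deployments
        if window_start <= d["deployed_at"] <= alert_time
    ]
    return sorted(suspects, key=lambda d: d["deployed_at"], reverse=True)

# Made-up alert time and deployment records.
alert_time = datetime(2024, 1, 9, 23, 0)
deployments = [
    {"service": "proxy", "deployed_at": datetime(2024, 1, 9, 22, 40)},
    {"service": "auth", "deployed_at": datetime(2024, 1, 9, 18, 5)},
    {"service": "cache-config", "deployed_at": datetime(2024, 1, 9, 22, 55)},
]

suspects = deployments_before_alert(alert_time, deployments)
print([d["service"] for d in suspects])
```

In practice the alert time would come from PagerDuty and the deployment list from GitHub; the 60-minute window is an arbitrary default.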


The realization: If the AI can do this interactively in my terminal session, it can do it autonomously when triggered by a PagerDuty webhook. It doesn't need me to type the question — it can formulate the question itself from the alert payload.

That realization created FRIDAY.


From OpenCode to FRIDAY — The Agent That Investigates Incidents Autonomously

FRIDAY is essentially OpenCode's incident investigation pattern, extracted into a Lambda that runs without a human typing the questions.

The evolution:

| Stage | System | Human Involvement | Response Time |
| --- | --- | --- | --- |
| Before | 5 dashboards + manual investigation | 100% human | 15-45 minutes |
| OpenCode | AI-assisted investigation (human asks) | Human types the prompt | 90 seconds |
| FRIDAY | Autonomous investigation (webhook triggers) | Human reads the findings | 90 seconds (automated) |

Same tools. Same reasoning pattern. Same output format. But no human in the loop for the investigation phase — the on-call engineer wakes up to finished analysis instead of a raw alert.
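
A rough sketch of the "formulate the question from the alert payload" idea is shown below. The payload fields (`title`, `service`, `region`) are a simplified, hypothetical shape rather than PagerDuty's real webhook schema, and the `event["body"]` access assumes an API Gateway proxy-style event. This is not FRIDAY's actual code.

```python
import json

def build_investigation_prompt(alert: dict) -> str:
    """Turn an alert payload into the question a human would have typed."""
    title = alert.get("title", "unknown alert")
    service = alert.get("service", "unknown service")
    region = alert.get("region", "unknown region")
    return (
        f"PagerDuty alert fired: {title} on {service} in {region}. "
        "Check Datadog for error rates by backend and affected tenants. "
        f"Check GitHub for any recent deployments to {service}. "
        "What changed?"
    )

def handler(event: dict, context=None) -> dict:
    """Lambda-style entry point: parse the webhook body, build the prompt."""
    alert = json.loads(event["body"])
    prompt = build_investigation_prompt(alert)
    # A real agent would hand `prompt` to the model here; this sketch returns it.
    return {"statusCode": 200, "body": json.dumps({"prompt": prompt})}

sample_event = {
    "body": json.dumps(
        {"title": "proxy 5xx errors", "service": "proxy", "region": "EU"}
    )
}
response = handler(sample_event)
print(json.loads(response["body"])["prompt"])
```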

Results after months in production:

  • 65% MTTR reduction
  • 85% AI tool adoption across the engineering team (up from 20%)
  • ~80% reduction in false escalations

From FRIDAY to JARVIS — Thinking About Write-Path Autonomy

Once FRIDAY proved that an AI agent could reliably investigate production incidents (read-only), the natural question was: can it also fix things?

Not incidents: those require human judgment in the moment. Vulnerability remediation is different, because most of it consists of routine security fixes that follow a predictable pattern.

JARVIS is designed to handle that 80% — the routine fixes where the remediation is well-understood and the verification is automatable. Human approval gates at every stage. Automatic rollback if anything breaks.


The progression: OpenCode showed that AI + MCP tools can reason across multiple systems. FRIDAY proved it can do this autonomously for investigation. JARVIS extends it to autonomous remediation — with guardrails. Each step builds trust for the next.
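
"Approval gates at every stage" and "automatic rollback" can be sketched as a small pipeline runner. JARVIS is still being designed, so the code below is a generic illustration of the pattern and not its implementation; the stage names and the `approve` callback are hypothetical.

```python
def run_with_gates(stages, approve):
    """Run stages in order; roll back finished stages on rejection or failure."""
    completed = []
    for stage in stages:
        name = stage["name"]
        if not approve(name):
            outcome = "rejected"
        else:
            try:
                stage["apply"]()
                completed.append(stage)
                continue
            except Exception:
                outcome = "failed"
        # Undo finished stages in reverse order, newest first.
        rolled_back = []
        for done in reversed(completed):
            done["rollback"]()
            rolled_back.append(done["name"])
        return {"status": outcome, "stage": name, "rolled_back": rolled_back}
    return {"status": "succeeded", "stage": None, "rolled_back": []}

log = []

def make_stage(name, fails=False):
    """Build a hypothetical stage that records what it did in `log`."""
    def apply():
        if fails:
            raise RuntimeError(f"{name} failed")
        log.append(f"apply {name}")
    def rollback():
        log.append(f"rollback {name}")
    return {"name": name, "apply": apply, "rollback": rollback}

stages = [make_stage("open_pr"), make_stage("deploy"), make_stage("verify", fails=True)]
result = run_with_gates(stages, approve=lambda name: True)
print(result)
print(log)
```

Here the approval callback always says yes; in a real system it would block on a human decision, and a rejection takes the same rollback path as a failure.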

Deleting 130k Accounts Without Writing Code

Here's something that surprised even me: a complex tenant cleanup operation was largely driven through OpenCode sessions — by someone who didn't write the rake task.

The rake task existed. But executing it required understanding:

  • Which accounts to target (CSV generation from production queries)
  • How to configure the runner (batch sizes, offsets, phase tracking)
  • How to monitor progress (Datadog dashboard interpretation)
  • How to troubleshoot when things broke (a 48K-user account hung the process, a zombie process ran for 35 hours, a metrics API bug caused silent data loss)
  • How to communicate status (ADO tickets, Teams updates, DBA coordination)

OpenCode handled all of this conversationally:

"What's the current status of Phase 2? Check the Rundeck execution and the Datadog dashboard."

"Create a task under the cleanup Feature for Phase 4 execution. Include the batch count, estimated timeline, and dependencies."

"The runner seems stuck. Check processes on the worker for any rake tasks. What's happening?"

"The DBA says the events database CPU spiked. Pull the top queries from the RDS monitoring dashboard. Cross-reference the account IDs with our cleanup CSV."

Each of these would normally require logging into 2-3 systems, running manual queries, and synthesizing results. With OpenCode, it's a conversation.
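
The "batch sizes, offsets" part of configuring the runner is simple arithmetic. A minimal sketch follows, assuming the runner takes an offset and a limit per batch; the 20,000 batch size is a made-up value, and 130,000 is the rounded figure from the section title.

```python
def plan_batches(total_accounts: int, batch_size: int) -> list:
    """Return (offset, limit) pairs that cover every account exactly once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (offset, min(batch_size, total_accounts - offset))
        for offset in range(0, total_accounts, batch_size)
    ]

batches = plan_batches(total_accounts=130_000, batch_size=20_000)
print(len(batches), batches[-1])
```

Recording which (offset, limit) pairs have finished gives a basic form of phase tracking: a stuck or killed run can resume from the first incomplete pair instead of starting over.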


The insight: OpenCode doesn't just help experts go faster. It enables non-experts to execute complex operations by providing the context and tool access they'd otherwise lack. You don't need to know how to read a Datadog dashboard if you can ask "are there any errors related to our cleanup?"

Learning the Entire Product in Conversations

When I joined the team, understanding the full platform took months. Multiple microservices. AWS regions. EKS clusters. Proxy backends. RabbitMQ queues. Aurora databases. Redis caches.

No single engineer understands all of it. The knowledge is distributed across dozens of people, hundreds of documents, and thousands of configuration files.

OpenCode changed how new team members (and existing ones exploring unfamiliar areas) learn the platform:

"How does the push notification service work? What's its architecture? Where does it run, what does it depend on?"

The AI reads the K8s config repo, checks which clusters the service is deployed to, reads the deployment YAML for dependencies (RabbitMQ queues, SNS topics, Redis), and synthesizes a technical overview — from live configuration, not stale documentation.

"What happens when a user logs in via SAML? Trace the request path from the browser through the proxy to the backend services."

It reads the proxy backend configuration from GitHub, identifies the routing rules, checks which services handle SAML assertions, and traces the dependency chain — all from actual config files and Datadog service maps.

This isn't replacing documentation. It's making the infrastructure self-documenting. The source of truth isn't a wiki page someone wrote 18 months ago — it's the live configuration that the AI reads in real-time.


Why This Is Different From ChatGPT

Every engineer has pasted error messages into ChatGPT. That's not what this is.

| Aspect | ChatGPT | OpenCode + MCP |
| --- | --- | --- |
| Data source | General training data | Live production systems via API |
| Specificity | "A 401 error usually means..." | "Your API gateway generated 1.3 million of them yesterday" |
| Infrastructure | Doesn't know your systems | Queries your actual AWS, Datadog, GitHub |
| Freshness | Training cutoff | Real-time data |
| Hallucination | Common for specifics | Can't hallucinate API responses |
| Action | Suggests what to do | Does it (queries, aggregates, cross-references) |

The model doesn't need to be told which tools to use. Ask "is the cache migration approach safe?" and it independently decides to: read the ADO ticket, query ElastiCache, read the K8s config, compare env var wiring, and synthesize. The engineer didn't specify any of those steps.


The Uncomfortable Truth

The uncomfortable truth is that most of what this system does isn't hard. Any senior engineer can query AWS Cost Explorer, aggregate Datadog logs, read a GitHub PR, and review an ADO ticket.

The hard part is doing all of them in the same mental context, in the same hour, without losing the thread.

An engineer investigating the cache migration opens AWS in one tab, Datadog in another, GitHub in a third, ADO in a fourth, terminal in a fifth. Copy cache endpoint addresses, paste into GitHub search, cross-reference with K8s config, check ADO for the deployment timeline, look at Datadog for the error spike. Context switches. Tab switches. Copy-paste. Scroll. Search. Repeat.

The system doesn't eliminate the need for engineering judgment. The engineer still decides whether the approach is safe, whether the risk is acceptable, whether the cost model makes sense. What the system eliminates is the mechanical overhead of gathering the information needed to make those decisions.

That overhead, across a 20-person team managing multi-region production infrastructure for a global identity platform, adds up to something significant.


What's Next

We're six MCP servers in. The gaps are obvious: DNS management, direct Kubernetes cluster access for kubectl operations, Confluence for documentation. Each one is a JSON config block and a credential away from being connected.

But the more interesting trajectory isn't more connectors — it's more autonomy:

┌────────────────────────────────────────────────────────┐
│  THE PROGRESSION                                        │
│                                                         │
│  Stage 1: Answer questions (OpenCode — today)           │
│     "What caused the 401 errors?"                       │
│                                                         │
│  Stage 2: Investigate autonomously (FRIDAY — live)      │
│     PagerDuty webhook → full analysis in 90 seconds     │
│                                                         │
│  Stage 3: Remediate autonomously (JARVIS — designing)   │
│     Vulnerability finding → PR → deploy → verify        │
│                                                         │
│  Stage 4: Predict and prevent (future)                  │
│     Detect anomaly → correlate → alert before impact    │
└────────────────────────────────────────────────────────┘

Each stage builds trust for the next. Read-only first. Then write-path with approval gates. Then proactive monitoring. Then autonomous prevention.

The technology is ready for all of it. The trust model is what needs to catch up. We run AWS in read-only mode for a reason. But the trajectory is clear.


For now, we'll settle for the fact that when Finance asks "what does our biggest customer cost us?" — we answer with a number that came from production telemetry, not a spreadsheet someone made up.

The complete OpenCode + MCP ecosystem
Current MCP Servers (6):
  • AWS Production (IAM profile: prod, read-only)
  • AWS Non-Production (IAM profile: non-prod, read-only)
  • Datadog (logs, metrics, monitors, dashboards)
  • GitHub (org repos, PRs, commits, code search)
  • Azure DevOps (work items, sprints, wikis, pipelines)
  • PagerDuty (incidents, schedules, escalation policies)

Sub-Agents Built:

  • Doc Agent (postmortems, RCAs, roadmaps → Word/PDF)
  • Incident Agent (FRIDAY — autonomous, Lambda-based)
  • ADO Agent
  • AWS Agent
  • PD Agent
  • GitHub Agent

The beauty of this setup is that you can hook up as many tools as you want and create sub-agents for each of them, or just have one agent connected to everything. There's no right answer. Some engineers on my team prefer a single session with all 6 MCP servers connected: they ask about AWS costs, then pivot to a GitHub PR review, then check a PagerDuty schedule, all in one conversation. Others prefer focused agents: an AWS-only session for cost analysis, a Datadog-only session for incident investigation, a GitHub-only session for code review. The system doesn't impose a pattern; it adapts to how you think.

Start with one MCP server. Connect your observability platform, or your ticketing system, or your cloud provider, whichever one you spend the most time context-switching into. Once you see the AI pull live data from it in a conversation, you'll immediately know which system to connect next. Within a week, you'll wonder how you ever operated without it.

Articles in the AI-Native SRE Series:

  1. FRIDAY — Autonomous Incident Investigation
  2. JARVIS — Autonomous Vulnerability Remediation (upcoming)
  3. Tenant Cleanup — Live Debugging at Scale (upcoming)
  4. Platform Command Center (upcoming)
  5. Rundeck Migration — Legacy Jobs to Cloud-Native (upcoming)
  6. This article — the origin story


I'm Vinothsingh Elumalai, a Platform Engineering leader building AI-native operations at enterprise scale. I lead infrastructure for a global IAM/SSO platform serving millions of users across multiple AWS regions. This article is the origin story of everything in my AI-Native SRE series.

Connect with me on LinkedIn — I write about the intersection of AI, DevOps, and the future of platform engineering.

Follow for the Full Series

Top comments (2)

Nazar Boyko

Quick question on that 130k account cleanup. You mentioned a metrics bug caused silent data loss partway through, and that's the exact failure the conversational layer is worst at surfacing, since nothing errors out and the agent reports progress as normal. When the person driving doesn't deeply know the systems, how did that one actually get caught? That's the case I'd most want a story about, because it's where "just ask in plain English" stops protecting you.

Aliaksei Zelianouski

What's underrated here is the cross-signal reading. Everyone talks about agents writing code; nobody talks about how good they are at reading the telemetry no one has time to read. I run something close - an agent that scrapes a few platforms and 8 API providers daily and stitches them into one timeline for problem analysis. Genuinely useful, and I think it's the boring superpower most people are sleeping on.

Two questions from the trenches. How's FRIDAY holding up over months, not launch week? Mine is useful and still breaks at random, no matter what monitoring or self-checks I bolt on - the 65% MTTR and 85% adoption are launch numbers, I'd want to see the maintenance bill six months in. And I wouldn't make the jump from read-only IAM to autonomous remediation, at least not yet. I get nervous handing an agent a dev cluster, never mind letting it act on prod unwatched. What made you comfortable with that?