DEV Community: DevOps AI ToolKit

Why AI Loves Ansible (And You Should Let It Help)

James Joyner — Thu, 25 Jun 2026 04:36:48 +0000

If you compare how well Claude handles Ansible against how well it handles, say, raw bash or kubectl YAML, Ansible wins by a wide margin. The reason isn't subtle: Ansible's shape — declarative, idempotent, modules-with-arguments — happens to map almost perfectly to how LLMs reason. They're good at producing structured output that fills in a known template, and that's what most Ansible tasks are.

This means AI-assisted Ansible work is the highest-leverage automation pairing I know of. If you only adopt AI for one infrastructure tool, make it Ansible.

What makes Ansible AI-friendly

Modules have published contracts

Every Ansible module has a documented argument spec: what's required, what's optional, what the defaults are. The model can fit your intent into the spec with high accuracy because the spec is finite and known.

Compare this to shell: there are a thousand ways to "create a user with a specific UID, member of these groups, with this shell, and a home directory in this location." In bash, every distro is slightly different. In Ansible, you use ansible.builtin.user with named arguments.

In my experience, the model gets this right almost every time.
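As an illustration, the requirement quoted above fits into a single task. This is a minimal sketch: the user name, UID, groups, and home path are hypothetical values, and the listed groups are assumed to exist already (the module fails if they don't).

```yaml
# Hypothetical values throughout; adjust to your environment.
- name: Create the deploy user with a fixed UID
  ansible.builtin.user:
    name: deploy
    uid: 1500
    groups:
      - docker
      - www-data
    append: true          # keep any groups the user already belongs to
    shell: /bin/bash
    home: /srv/deploy
    create_home: true
    state: present
```

Every argument is named, so a reviewer can check the task against the module documentation line by line.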

Idempotency is the default

When a model generates a Python script, it has to think about "what if this is run twice." When it generates Ansible, most modules handle that for free. The model can write the task, ignore the re-run case, and produce something that works.

This means the cognitive load on both sides — model and human — is lower. You're describing the target state, not the procedure.
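A short sketch of what "most modules" means in practice. The first task is idempotent on its own; the second uses ansible.builtin.command, which is not, so it needs an explicit guard. The paths and the VERSION marker file are assumptions made for illustration.

```yaml
- name: Ensure the application directory exists
  ansible.builtin.file:
    path: /opt/new-app
    state: directory
    owner: root
    group: root
    mode: "0755"

- name: Unpack the release only if it has not been unpacked yet
  ansible.builtin.command:
    cmd: tar xzf /tmp/app.tar.gz -C /opt/new-app
    creates: /opt/new-app/VERSION   # hypothetical marker file shipped in the tarball
```

When the model reaches for command or shell, look for a creates, removes, or changed_when guard; without one the task reports "changed" on every run.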

Roles and structure are predictable

roles/foo/{defaults,vars,tasks,handlers,meta}/main.yml, plus templates/ and files/ for the role's assets: every Ansible role looks the same. The model can scaffold a new role in seconds because the layout is fixed.

If you ask Claude to "create a new role for installing PostgreSQL 16 on Ubuntu 24.04 with default user postgres and a tuned postgresql.conf," you'll get a complete role structure with defaults/main.yml, tasks/main.yml, a Jinja template, and handlers/main.yml — all consistent, all in the right places. The structure is constrained enough that the model rarely improvises.

Use cases where AI shines for Ansible

Generating new roles from scratch

This is the killer app. You can describe a role in two sentences and get a 90%-done implementation. You then refine: add validation, adjust defaults, write a README.

I now treat "draft a new role with Claude" as the default first step. Even if I rewrite half of it, the structure saves me 20 minutes.

Converting shell scripts to playbooks

If you have a legacy bash script that provisions a server, pasting it into Claude with "convert this to an idempotent Ansible playbook using the appropriate modules" produces a usable result. The model knows when to use ansible.builtin.file, lineinfile, template, service, etc.

You'll need to verify the idempotency manually (run twice, expect 0 changes on the second run), but the conversion is mostly mechanical.

Refactoring playbooks to use FQCN

Ansible 2.10+ wants fully-qualified collection names: ansible.builtin.package instead of package. Old playbooks have hundreds of short-form references. AI is a perfect fit for this kind of mass refactoring — it knows the mapping and won't get bored.

Paste a 200-line playbook, ask for it back with FQCN throughout, and you're done in 30 seconds. Verify with ansible-lint.

Writing Molecule tests

Molecule scaffolding is repetitive — same molecule.yml, same converge.yml, same verify.yml structure for most roles. AI is great at generating the boilerplate. You describe what you want to test; the model writes the assertion playbook.
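For a sense of what the assertion playbook looks like, here is a minimal verify.yml sketch. It assumes a hypothetical role that renders /etc/myapp/myapp.conf with mode 0644; swap in whatever your role actually manages.

```yaml
# molecule/default/verify.yml (hypothetical role and file path)
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Read the rendered config file's metadata
      ansible.builtin.stat:
        path: /etc/myapp/myapp.conf
      register: myapp_conf

    - name: Assert the config file exists with the expected mode
      ansible.builtin.assert:
        that:
          - myapp_conf.stat.exists
          - myapp_conf.stat.mode == "0644"
        fail_msg: "myapp.conf is missing or has unexpected permissions"
```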

Jinja template generation

Jinja is just structured-enough that AI handles it well — generating templates for config files (nginx, postgres, sshd) from a description of the desired behavior. The model knows the configuration keys and the conditional structure.

Where AI struggles with Ansible

Variable precedence

Ansible's variable precedence rules (the documentation lists 22 levels) are not intuitive. The model will sometimes suggest putting a variable in vars/main.yml when you really want it in defaults/main.yml. Role vars outrank inventory and play variables, while role defaults sit at the very bottom of the list. The result: users of your role can't override the variable they expected to.

Check: When the model puts something in vars/, ask "should this be overridable by the role user?" If yes, move to defaults/.
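A sketch of the split, using a hypothetical myapp role. The two files are shown as two YAML documents; the variable names are illustrative only.

```yaml
# roles/myapp/defaults/main.yml
# Lowest precedence: inventory, play vars, and -e can all override these.
myapp_port: 8080
myapp_log_level: info
---
# roles/myapp/vars/main.yml
# Outranks inventory and play vars: reserve for role internals.
myapp_config_path: /etc/myapp/myapp.conf
```

Anything a role user might reasonably want to change belongs in the first file.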

Custom facts and `set_fact` lifetime

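The model sometimes uses set_fact for values that need to survive into a later playbook run, but doesn't add cacheable: true. A plain set_fact value lasts for the rest of the current run, including later plays against the same host, and then it is gone: the next ansible-playbook invocation sees the variable as undefined. Even with cacheable: true, the value only persists if a fact cache plugin is configured.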

Check: When you use set_fact for a value you need later, verify the lifetime is what you expect.
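A minimal sketch of a set_fact task that opts into the fact cache. The variable names are hypothetical, and the example assumes a fact cache plugin (for example jsonfile) is already configured in ansible.cfg; without one, cacheable has no lasting effect.

```yaml
- name: Record the release that was just deployed
  ansible.builtin.set_fact:
    myapp_deployed_release: "{{ myapp_release }}"   # myapp_release is assumed to be defined elsewhere
    cacheable: true   # also store the value in the fact cache, if one is configured
```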

Vault integration

The model will sometimes generate playbooks that reference vault_db_password as a variable but don't include the lookup('community.hashi_vault.hashi_vault', ...) call or the Ansible Vault encrypted file. You have to wire up the secret source separately.

Check: For any sensitive variable in a generated playbook, verify there's an actual source for it (Vault encrypted file, external manager lookup, environment variable).
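One cheap way to make a missing secret source fail loudly is a pre-flight assertion at the top of the play. This is a sketch; vault_db_password is the variable name used above, and the message text is illustrative.

```yaml
- name: Fail early if the database password has no source
  ansible.builtin.assert:
    that:
      - vault_db_password is defined
      - vault_db_password | length > 0
    fail_msg: "vault_db_password is not set; check the vaulted vars file or the secret lookup"
```

The assertion reports which condition failed, not the value itself, so it does not print the secret.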

Distro-specific paths

The model defaults to Debian/Ubuntu conventions. If you run on RHEL, you'll sometimes get apt modules in tasks that should be using the package module (or distro conditionals).

Check: When generating playbooks for non-Debian systems, audit for apt, apt_repository, dpkg_selections, and ask for the abstraction (package) or the distro split.
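Both options can be sketched in a few lines. The example assumes fact gathering is enabled and uses Apache purely as an illustration of a package whose name differs between distro families.

```yaml
- name: Refresh the apt cache (Debian family only)
  ansible.builtin.apt:
    update_cache: true
    cache_valid_time: 3600
  when: ansible_facts['os_family'] == "Debian"

- name: Install Apache under its distro-specific package name
  ansible.builtin.package:
    name: "{{ 'apache2' if ansible_facts['os_family'] == 'Debian' else 'httpd' }}"
    state: present
```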

A workflow that's been working for me

For a new role, my process now looks like this:

Describe the role to Claude in 2-3 sentences (purpose, target distros, key behaviors).
Generate the scaffolding: defaults/main.yml, tasks/main.yml, a template if needed, meta/main.yml with platforms.
Read every task. Look for the failure modes above (precedence, lifetime, Vault, distros).
Add Molecule tests. Have Claude scaffold molecule/default/, then write the assertions yourself or ask for them.
Run ansible-lint and Molecule. Fix what they catch.
Idempotence check. Run the role twice; second run should report 0 changed.
Refine the README. This is the one place I write from scratch — explaining the role to future-me.

This takes maybe 30 minutes for a moderately complex role. Without AI assistance, the same role would take me a couple of hours.

A note on safety

Ansible often runs with root privileges on production servers. Whatever the model generates, you are responsible for what it does. Two patterns I follow:

Run with --check --diff before any real run. Dry-run the playbook in check mode; verify the diff matches what you expect.
Test on a sandbox host first. Especially for new roles. Don't trust the model with production until the role has run cleanly on a throwaway VM.

These are the same disciplines that apply to any infrastructure change. AI doesn't change the discipline; it just makes you faster at the parts before the change.

Why I think Ansible is the right entry point

If you're new to using AI for infrastructure work and want to pick one tool to start with, Ansible is the safest, highest-leverage choice. The structure makes the AI accurate. The idempotency makes mistakes recoverable. The module ecosystem covers most common cases.

By the time you've used AI to write a dozen Ansible playbooks, you'll have developed the intuition for what AI handles well and what needs human attention. That intuition transfers to harder tools — Terraform, Kubernetes, custom shell — where the cost of AI mistakes is higher.

For our full set of AI-driven Ansible workflows, see the IaC category — including ansible-vault-secrets-management and ansible-molecule-testing.

This article was originally published on DevOps AI ToolKit — practical AI workflows for cloud engineers.

AI for GitLab CI Authoring: Save Hours, Avoid Footguns

James Joyner — Sat, 20 Jun 2026 12:15:22 +0000

GitLab CI YAML is one of those formats where you can stare at it for an hour, get it 95% right, and have it fail with yaml: line 12: did not find expected key because of a tab character. AI assistants are very fast at this kind of work. They're also confidently wrong about specific GitLab features in ways that waste a lot of time if you don't know what to check.

After a year of letting Claude write a lot of my pipelines, here's what works and what doesn't.

What AI gets right consistently

Standard job shapes

"Write me a job that builds a Docker image, pushes to the GitLab Container Registry, and tags with the commit SHA and latest on the default branch." Type that into Claude and you get a working job in five seconds. The shape is well-established and the model has seen thousands of variations.

The same is true for:

Test jobs across languages (pytest, jest, go test, etc.)
Standard cache configurations
Standard artifact patterns
Basic rules: for branch / tag / MR pipelines

If you find yourself writing one of these from scratch, you're spending time that you don't need to spend.

Translating from other CIs

GitLab CI has obvious parallels to GitHub Actions, CircleCI, Jenkins declarative pipelines, etc. AI is excellent at translating between them. The structures rhyme; the model knows the dictionary.

If you're migrating from Actions to GitLab CI, paste the workflow and ask for the GitLab CI equivalent. You'll get something 80% right that you can refine.

Reviewing pipelines for inefficiency

This is the underrated use case. Paste your .gitlab-ci.yml and ask: "what's the critical path of this pipeline, and what's making it slow?" The model will spot things like:

"Your test job downloads node_modules from cache, but install-deps doesn't push to cache — your cache key is broken."
"Your build and deploy stages are sequential but build's artifacts aren't used by deploy — they can be parallel with needs:."
"Your rules:changes: doesn't include package-lock.json, so dependency changes don't retrigger tests."

These are real findings I've gotten from Claude on pipelines I thought I'd already optimized. Worth the five-minute review.

What AI gets wrong — and how to catch it

`rules:` vs `only/except` confusion

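The model will sometimes mix them in the same job. GitLab rejects that configuration outright: the pipeline fails validation with an error saying only (or except) may not be used with rules. The failure is loud, but it costs you a push-and-wait cycle if you skipped linting.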

Check: Are you using rules: OR only:/except: in each job? Pick one. (Use rules: — only/except is legacy.)
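For reference, a rough translation of a legacy only/except pair into rules:. The job name and script are hypothetical, and the legacy form is shown only as a comment.

```yaml
# Legacy form, for comparison only:
#   only:
#     - main
#   except:
#     - schedules

deploy:            # hypothetical job name and script
  stage: deploy
  script:
    - ./deploy.sh
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
      when: never
    - if: $CI_COMMIT_BRANCH == "main"
```

Rules are evaluated top to bottom and the first match wins, so the exclusion has to come first.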

`$CI_COMMIT_BRANCH` empty on MR pipelines

A common bug: you ask for "this job runs on the default branch" and you get:

rules:
  - if: $CI_COMMIT_BRANCH == "main"

This is correct for branch pipelines. In merge request (merge_request_event) pipelines, however, $CI_COMMIT_BRANCH is empty, so the rule never matches. If you have MR pipelines enabled, your job silently won't run when developers expect it to.

Check: Does your pipeline target both push events and MR events? If so, you probably want $CI_MERGE_REQUEST_TARGET_BRANCH_NAME or to handle both pipeline sources.
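A sketch of rules that cover both pipeline sources. The job name and script are placeholders; the variables are GitLab's predefined ones.

```yaml
build:             # hypothetical job
  script:
    - make build
  rules:
    # Merge request pipelines that target the default branch
    - if: $CI_PIPELINE_SOURCE == "merge_request_event" && $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == $CI_DEFAULT_BRANCH
    # Branch pipelines on the default branch itself
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```

Using $CI_DEFAULT_BRANCH instead of a hard-coded "main" also keeps the job working if the default branch is ever renamed.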

`needs:` referencing hidden jobs

Hidden jobs (prefixed with .) are templates — they don't execute. If you do needs: [".lint"], your job will fail with a confusing error because GitLab thinks you're depending on a job that doesn't exist.

Check: Every needs: entry should be a real job name, not a template.
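The usual fix is to give the template a real job that extends it and to depend on that job instead. A sketch, with hypothetical job names, image, and commands:

```yaml
.lint-base:        # hidden template: never runs on its own
  image: python:3.12
  script:
    - pip install ruff
    - ruff check .

lint:              # real job that inherits from the template
  extends: .lint-base
  stage: test

unit-tests:
  image: python:3.12
  stage: test
  needs: ["lint"]  # depend on the real job, not on .lint-base
  script:
    - pip install pytest
    - pytest
```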

Auto-apply rules that don't include the right branches

The model loves writing:

rules:
  - if: $CI_COMMIT_BRANCH == "main"
    when: always
  - when: never

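This works for branch pipelines on main, but it blocks the job on tags, on MR pipelines, and on schedules that run against any other branch. Sometimes that's what you want. Often it's not. Note also that when: always runs the job even if an earlier stage failed; the default, on_success, is usually the safer choice.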

Check: What pipeline sources do you expect this job to run in? List them, then verify your rules cover each.
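Writing one rule per expected source makes the coverage easy to audit. A sketch with a hypothetical job:

```yaml
release-notes:     # hypothetical job
  script:
    - ./generate-notes.sh
  rules:
    - if: $CI_COMMIT_TAG                           # tag pipelines
    - if: $CI_PIPELINE_SOURCE == "schedule"        # scheduled pipelines
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH  # pushes to the default branch
    # anything else falls through and the job is not added to the pipeline
```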

Imaginary GitLab features

This is the most expensive AI failure mode. The model will sometimes generate syntax for features that don't exist:

A condition: field that's actually Azure Pipelines syntax, not GitLab CI
An auto_retry: block that doesn't exist in GitLab (the real keyword is retry:)
A before_script: keyword that does exist but with different semantics than the model claims

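Check: If you see a keyword you haven't seen before in GitLab docs, verify it exists. The project-scoped lint endpoint (/api/v4/projects/:id/ci/lint) catches most of these, but some pass lint and just behave weirdly. The older instance-wide /api/v4/ci/lint endpoint was removed in GitLab 16.0.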

A workflow that catches the failures cheaply

I now do this for any non-trivial pipeline change:

Draft with AI. Describe the desired behavior in plain English; let the model write the YAML.
Read every line. Treat the output as a draft you'd write yourself.
Lint via the API.

   # PROJECT_ID is the numeric ID of your project (set it yourself)
   curl -s --header "PRIVATE-TOKEN: $TOKEN" \
       --header "Content-Type: application/json" \
       --data "{\"content\": $(jq -Rs . < .gitlab-ci.yml)}" \
       "$GITLAB_URL/api/v4/projects/$PROJECT_ID/ci/lint" | jq

Run on a sandbox branch. Push to a branch that won't trigger deploys; verify the pipeline runs the jobs you expect, when you expect.
Diff against the existing pipeline. If the AI introduced changes you didn't ask for (a different cache key, a removed interruptible:), revert them.

Step 5 is the one most people skip. The model is good at writing YAML but not at preserving your previous decisions. If you don't diff, you'll lose your old cache strategy.

A practical example

Last month I needed to add a job that runs terraform plan on every MR and posts the output as a comment. Drafted with Claude in two minutes; it produced something like:

terraform-plan:
  image: hashicorp/terraform:1.9
  stage: plan
  script:
    - terraform init
    - terraform plan -out=tfplan -no-color
    - terraform show -no-color tfplan > plan.txt
    - |
      curl -X POST -H "PRIVATE-TOKEN: $GITLAB_API_TOKEN" \
          -d "body=$(cat plan.txt | jq -Rs .)" \
          "$CI_API_V4_URL/projects/$CI_PROJECT_ID/merge_requests/$CI_MERGE_REQUEST_IID/notes"
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

This is almost right. Two issues:

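PRIVATE-TOKEN backed by a personal access token. Tying CI to one person's credentials is the old pattern: the job breaks when that person leaves or the token expires. $CI_JOB_TOKEN is the tidy answer for the endpoints it supports, but check the job token documentation before assuming it works here; as far as I know it does not cover the merge request Notes API. A project access token with api scope, stored as a masked CI/CD variable, is the usual substitute.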
No terraform init -backend-config — works if the backend is configured in code, but if you have multiple environments using the same module, you'd want to specify which backend.

Both fixes are 30 seconds. Without the AI I'd have spent 15 minutes writing the curl invocation alone.

The bottom line

AI doesn't replace knowing GitLab CI. It removes the typing and the boilerplate so you can spend your attention on the parts that matter — the rules: logic, the cache keys, the secrets, the environment promotion.

Once you've internalized the failure modes above, the workflow becomes mostly automatic. You stop reading the boilerplate and start reading the rules. That's where the bugs live.

For the prompt set we use on GitLab CI specifically, see the GitLab CI/CD category — particularly gitlab-pipeline-optimization and gitlab-ci-rules-debugging.


Securing AI-Generated Bash Scripts Before You Run Them

James Joyner — Thu, 18 Jun 2026 15:51:56 +0000

Bash is the easiest language for AI to write and the easiest language to get devastating output from. A 20-line script that "just cleans up old files" can recursively delete a home directory because the model assumed a variable would always be set. A "simple log shipper" can write your secrets to a remote server because the model used set -x for debugging and forgot to remove it.

I have run AI-generated bash that I should not have. Most engineers I know have too. After enough close calls, there's a short checklist that catches the worst of it. This is that checklist.

The five things to check before running any AI-generated bash

1. Does it start with a strict pragma?

The first lines of any non-trivial bash script should be:

#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

What each does:

set -e — exit on any command failure. Without this, a failure in line 5 doesn't stop the script from happily running lines 6-50.
set -u — error on undefined variables. This is the one that saves you from rm -rf $UNDEFINED/.
set -o pipefail — propagate failures through pipes. Without it, failing-command | grep something succeeds because grep succeeds.
IFS=$'\n\t' — sane field splitting. Defends against word-splitting bugs in filenames.

If the AI-generated script doesn't have these, add them and re-read the script. You'll often discover bugs the pragma now flags.

2. Is every variable expansion quoted?

# Wrong
rm -rf $TARGET_DIR

# Right
rm -rf "$TARGET_DIR"

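The wrong version is what causes the "I deleted the root directory" stories. If $TARGET_DIR contains a space, the command becomes rm -rf foo bar (delete two unintended things), and if it contains a glob character the shell expands it first. If it is empty, rm -rf $TARGET_DIR/* becomes rm -rf /*. Quoting fixes the splitting and globbing cases; the empty case needs a guard such as "${TARGET_DIR:?}", since set -u only catches unset variables, not empty ones.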

Models default to the wrong version about half the time because the right version is harder to write in chat ("escape the quotes!") and the wrong version is what most blogs show.

Fix: When reading AI bash, mentally check every $VAR for quotes. Add them if missing. This is the single biggest source of bash disasters.

3. What happens if a step fails partway through?

The AI will cheerfully write:

mkdir -p /opt/new-app
cd /opt/new-app
tar xzf $TARBALL
rm $TARBALL

What happens if tar xzf fails (corrupt tarball, full disk)? With set -e, the script stops. Good. Without set -e, it continues to rm $TARBALL and deletes your tarball with no backup.

For any state-changing script, ask yourself: at each step, what's the recovery path if the step fails? If the answer is "nothing automated," the script should at least not delete data before verifying the previous step succeeded.

The AI almost never thinks about this on its own.

4. Are secrets visible in logs?

The most common way AI-generated bash leaks secrets is via set -x:

set -x  # debugging
curl -H "Authorization: Bearer $API_TOKEN" https://api.example.com/...

With set -x, every command is printed including the expanded variables. Your API token is now in the script's output, which is in your CI logs, which are visible to anyone with project access.

The fix is selective:

set +x  # disable trace
curl -H "Authorization: Bearer $API_TOKEN" https://api.example.com/...
set -x  # re-enable

Or simply remove set -x once debugging is done. The model frequently leaves it in.

5. Does it run as root unnecessarily?

The AI will sometimes write sudo into every command, even ones that don't need it. Or it'll assume the script runs as root and use absolute paths that require root to write.

The principle: if a command can run as a non-root user, it should. The smaller the privileged surface, the smaller the blast radius.

This is especially important for scripts that download and execute code. A common pattern:

# Dangerous: privileged download + execute
sudo bash -c 'curl https://example.com/install.sh | bash'

# Safer: review then run
curl https://example.com/install.sh > install.sh
# READ install.sh
sudo bash install.sh

If the model generates the first pattern, replace it with the second. Always.

A real example

Last month I asked Claude to write a script that cleans up Docker images older than 30 days on a CI runner host. The first draft was:

#!/bin/bash

DOCKER_IMAGES=$(docker images --format '{{.ID}} {{.CreatedAt}}')
CUTOFF=$(date -d '30 days ago' +%s)

echo "$DOCKER_IMAGES" | while read ID DATE; do
    CREATED=$(date -d "$DATE" +%s)
    if [ $CREATED -lt $CUTOFF ]; then
        docker rmi $ID
    fi
done

Walking the checklist:

No strict pragma. Missing set -euo pipefail.
Unquoted $CREATED, $CUTOFF, and $ID (plus read without -r). Each one is a potential bug.
Failure handling. docker rmi fails if an image is in use. The script continues, marches through, and silently fails on every in-use image. We never know which were cleaned and which weren't.
No secrets (docker doesn't expose them here), but the script also doesn't log what it's doing, so you can't audit afterward.
No sudo, good — assumes the user has Docker socket access, which is reasonable.

The hardened version:

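#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

CUTOFF=$(date -d '30 days ago' +%s)
REMOVED=0
SKIPPED=0

# Feed the loop through process substitution rather than a pipe, so it runs in
# the current shell and the counters are still set when the summary prints.
while IFS='|' read -r ID DATE; do
    # CreatedAt looks like "2026-05-01 12:00:00 +0000 UTC". Drop the trailing
    # zone name; the numeric offset is enough for GNU date.
    CREATED=$(date -d "${DATE% *}" +%s)
    if [ "$CREATED" -lt "$CUTOFF" ]; then
        if docker rmi "$ID" 2>/dev/null; then
            echo "Removed: $ID"
            REMOVED=$((REMOVED + 1))
        else
            echo "Skipped (in use): $ID"
            SKIPPED=$((SKIPPED + 1))
        fi
    fi
done < <(docker images --format '{{.ID}}|{{.CreatedAt}}')

echo "Cleanup complete. Removed: $REMOVED, Skipped: $SKIPPED."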

This took two minutes of editing. Without the checklist, I might have run the original and noticed days later that disk usage hadn't really dropped because half the images were in use.

A small note on bash linting

shellcheck catches many of these issues automatically. If you adopt one tool from this article, make it shellcheck:

shellcheck cleanup-images.sh

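It will flag unquoted variables, counters modified inside a piped while loop, and dozens of other patterns. It does not enforce strict mode, so check for set -euo pipefail yourself. AI-generated bash usually has at least one shellcheck warning.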

I now run shellcheck on every script before I run the script itself. It's two seconds and catches things I'd miss.

When the AI gets it right

To be fair: the model is often perfectly capable of producing safe bash. If you prompt it explicitly — "write this with set -euo pipefail, quote every variable, fail loudly on errors" — you'll get a clean script.

The problem is that "write me a script that does X" without that prompt gets you the common form of the script, which is the unsafe form. So the rule of thumb:

Always include the safety requirements in the prompt. Or: always treat the output as a draft that needs hardening. Don't run any bash the AI wrote without one of those two disciplines.

The bottom line

Bash from AI is fast to produce and easy to read incorrectly. The checklist is short — strict pragma, quoted expansions, failure paths, secrets in logs, unnecessary privilege — and applying it takes a couple of minutes per script. The downside of skipping it is on the spectrum of "minor cleanup mistake" to "career incident." There's no excuse not to do the check.

For our prompts on bash specifically, see bash-script-code-review and the related linux-server-hardening prompt — both of which cover related territory.


The Best AI Tools for DevOps Engineers in 2026

James Joyner — Wed, 17 Jun 2026 20:59:44 +0000

If you spend your day in a terminal, a YAML editor, or a Grafana tab — AI assistants in 2026 are no longer a curiosity. They're a real productivity layer. But not every tool is good at infrastructure work. After a year of daily use across Linux administration, OpenStack operations, Prometheus alert authoring, and Kubernetes debugging, here's the honest shortlist.

The criteria

We're not ranking on benchmark scores. We're ranking on infrastructure usefulness:

Reasoning over command output — can it actually read top, kubectl describe, or journalctl and find the real problem?
Safety — does it warn before suggesting destructive commands?
Long context — can it hold a 1,000-line .gitlab-ci.yml plus failing logs without losing track?
Terminal integration — can you use it without leaving your workflow?
Privacy and self-host options — for the engineers whose employers care.

The shortlist

1. Claude (Anthropic)

The current best general assistant for infrastructure reasoning. Long context handles enormous log dumps and Kubernetes manifests in one shot. It is consistently more cautious about destructive commands than alternatives — which matters when you're tired at 2am and tempted to copy-paste straight into prod.

Best for: Linux/OpenStack/Kubernetes troubleshooting, postmortem drafting, code review on infrastructure-as-code.

2. ChatGPT (OpenAI)

The broadest ecosystem. Strong code generation, plug-in support, and the largest community of shared prompts and patterns. For Ansible and Terraform generation, output quality is excellent. Slightly less cautious by default — you'll want to add safety constraints in your prompts.

Best for: Ansible/Terraform generation, ad-hoc scripting, learning new tools.

3. Cursor

If you live in an IDE, Cursor is what your IDE should have been. Native multi-file context, agent mode for repo-wide refactors, and tab-completion that actually understands your codebase. Especially strong for IaC repositories with many interconnected files.

Best for: Editing real codebases (Helm charts, Terraform modules, Python operators).

4. GitHub Copilot

The lowest-friction option. Inline completion just works, and the chat sidebar is genuinely useful for "explain this regex" or "what's this PromQL doing?" If your org's GitHub plan already includes Copilot seats, it is essentially free upside.

Best for: Inline completion while editing YAML, Bash, Python.

5. Warp Terminal (with AI features)

The only entry on this list that isn't an AI assistant per se — it's a terminal that has AI built in. The killer feature: natural-language command suggestions in your shell, with safety previews. For Linux admins who don't want to alt-tab to a chat window every five seconds.

Best for: Terminal-native workflows where context-switching kills focus.

What we don't recommend (yet)

Generic LLM wrappers that promise "DevOps AI." Most are thin layers over the same APIs above, sometimes with worse safety defaults. Use the underlying tools directly.
Anything that requires uploading your ~/.ssh directory or production credentials. Be skeptical of "AI agents that run commands for you" without a clear sandbox model.

How to combine them

A pattern that works well in practice:

Claude or ChatGPT in a browser for deep diagnosis sessions (paste logs, walk through hypotheses, draft postmortems).
Cursor or Copilot in your editor for actually writing the fix.
Warp in the terminal for quick command lookups without switching context.

You don't need one perfect tool. You need a workflow where each tool plays to its strengths.

Auditing Kubernetes Manifests With AI: A Practical Workflow

James Joyner — Tue, 16 Jun 2026 04:31:15 +0000

A senior K8s engineer I work with audits manifests faster than I read them. He's seen so many patterns that "missing readinessProbe on a Deployment that takes 45 seconds to start" jumps off the page. Most of us don't have that pattern library memorized — and increasingly, we don't need to. AI assistants have read more Kubernetes manifests than any human ever will.

The catch: a generic "review this YAML" prompt produces generic noise. You need to direct the model toward the categories of issues that actually matter in your environment.

The two mistakes everyone makes

Mistake 1: Asking for "a security review." You'll get a bullet list of every possible concern, ranked alphabetically, with no signal about which matter. You'll skim, dismiss, and learn nothing.

Mistake 2: Pasting one manifest. Real Kubernetes problems live in the interaction between resources — a Deployment's readiness probe and a Service's selector, a NetworkPolicy and the actual app traffic. One YAML in isolation hides most of the bugs.

The fix for both is the same: give the model a bounded scope and enough context to reason about interactions.

A workflow that works

Step 1: Pick the audit dimension

Pre-decide what you're checking for. Different prompts for different dimensions:

Resource limits & QoS — are requests/limits set, does QoS match intent, are limits realistic
Probes & lifecycle — readiness, liveness, startup, preStop, terminationGracePeriodSeconds
Security context — runAsNonRoot, capabilities, readOnlyRootFilesystem, seccomp
Network exposure — NetworkPolicy, Service type, Ingress rules
Reliability — PodDisruptionBudget, topology spread, replica count
State & storage — PVC access modes, retention policies, backup tags

Mixing dimensions in one review produces wishy-washy output. Pick one, get a clean answer, move on.

Step 2: Paste the manifest + related context

For a workload review, paste:

The Deployment / StatefulSet / DaemonSet
Its Service(s) and Ingress
Any NetworkPolicies that match its labels
The HPA if relevant
The ConfigMaps and Secrets it references (sanitize first)

For YAML this is usually under 500 lines, well within any model's context window. The model can now reason about interactions, not just isolated fields.

Step 3: Use a directive prompt

The big difference between "tell me about this YAML" and a useful review is the instruction format. Compare:

Review this Kubernetes manifest.

versus:

You are reviewing a production Deployment + Service + NetworkPolicy bundle. For each finding, give: (1) severity (critical/high/medium/low), (2) the exact field path that's wrong, (3) one sentence on why it matters, (4) the corrected YAML snippet. Focus only on probes, lifecycle, and graceful shutdown. Ignore documentation/comments.

The first prompt produces an essay. The second produces a list of fixable issues.

Step 4: Verify before applying

This is where most reviews go wrong. The model is right most of the time. It's wrong some of the time, often in ways that look correct.

Common AI failure modes in K8s review:

Hallucinated field names — spec.template.spec.terminationGracePeriod (it's terminationGracePeriodSeconds)
Outdated API versions — policy/v1beta1 PodDisruptionBudget (removed in 1.25)
Wrong defaults claimed — claiming failureThreshold defaults to 1 when it's 3
Misreading the use case — recommending runAsNonRoot: true for a workload that legitimately needs root

For every "fix" the model suggests, glance at the official K8s docs for that field. This adds 30 seconds per finding and catches the wrong ones. Without this step, you will apply changes that break things.
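As an example of what a docs-checked suggestion looks like, here is a PodDisruptionBudget on the current API version. The resource name and minAvailable value are hypothetical; the selector assumes pods labelled app: payments.

```yaml
apiVersion: policy/v1          # policy/v1beta1 was removed in Kubernetes 1.25
kind: PodDisruptionBudget
metadata:
  name: payments-pdb           # hypothetical name
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: payments
```

If you have cluster access, kubectl explain (for example kubectl explain pod.spec.terminationGracePeriodSeconds) and kubectl apply --dry-run=server are quick ways to confirm a field name or API version before trusting the suggestion.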

A real example

Here's a Deployment I reviewed last week:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
spec:
  replicas: 2
  selector:
    matchLabels: { app: payments }
  template:
    metadata:
      labels: { app: payments }
    spec:
      containers:
      - name: app
        image: registry.example.com/payments:v3.1.0
        ports:
        - containerPort: 8080
        env:
        - name: DB_URL
          value: postgres://payments-db:5432/payments
        resources:
          limits:
            cpu: "2"
            memory: "2Gi"
        readinessProbe:
          httpGet: { path: /healthz, port: 8080 }
          initialDelaySeconds: 5

I asked Claude to review for probes and graceful shutdown only. The findings:

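Only limits, no requests → Kubernetes defaults the requests to the limits, so the pod gets Guaranteed QoS and reserves a full 2 CPU and 2Gi per replica. If that reservation isn't intended, set explicit requests below the limits (which moves the pod to Burstable).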
initialDelaySeconds: 5 → Java/Spring apps typically need 30-90 seconds to start. Add startupProbe with longer threshold.
No livenessProbe → kubelet won't restart if the app deadlocks. Mirror readinessProbe with looser thresholds.
No terminationGracePeriodSeconds → defaults to 30s; for a payment service with in-flight requests, this is borderline. Set to 60s.
No preStop hook → SIGTERM hits immediately; load balancers may still send traffic for ~10s after pod marked Terminating. Add sleep 15 preStop.

All five were real, all five were fixable in two minutes of YAML editing. The model didn't tell me about anything irrelevant. That's because I scoped the prompt to "probes and graceful shutdown only."

The big one — #5 — is something I've personally been bitten by twice. The model wouldn't have prioritized it without the directive prompt.
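Put together, the fixes might look like the pod template fragment below. Treat it as a sketch: the thresholds and request sizes are assumed starting points rather than measured values, and the preStop command assumes the image ships a shell.

```yaml
# Pod template fragment for the payments Deployment; values are starting points.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: app
        image: registry.example.com/payments:v3.1.0
        resources:
          requests: { cpu: "500m", memory: "1Gi" }   # hypothetical sizing
          limits: { cpu: "2", memory: "2Gi" }
        startupProbe:
          httpGet: { path: /healthz, port: 8080 }
          periodSeconds: 5
          failureThreshold: 18      # allows up to 90s for startup
        readinessProbe:
          httpGet: { path: /healthz, port: 8080 }
          periodSeconds: 5
        livenessProbe:
          httpGet: { path: /healthz, port: 8080 }
          periodSeconds: 10
          failureThreshold: 6
        lifecycle:
          preStop:
            exec:
              command: ["sh", "-c", "sleep 15"]   # assumes the image has sh
```

The 15-second sleep counts against the 60-second grace period, leaving about 45 seconds for in-flight requests to finish after SIGTERM arrives.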

What about Kyverno / OPA / Pod Security Admission?

Yes, you should run those too. They catch consistent issues at admission time. They don't catch issues that require judgment: "is 30 seconds enough graceful shutdown for this specific service?" Policy enforcement is a floor; AI review is a directed second opinion above that floor.

I run both. Kyverno catches "no securityContext at all" before it ever lands. AI review catches "readinessProbe path doesn't match what the app exposes" — something only a human (or an AI imitating one) would notice.

A starter prompt

If you want a template, here's the one I use most:

You are reviewing a Kubernetes workload bundle for production readiness. Focus only on: probes (readiness, liveness, startup), terminationGracePeriodSeconds, preStop hooks, and rolling update strategy. For each finding produce: severity, exact field path, why it matters in one sentence, corrected YAML. Ignore everything else (security context, network policies, resource limits — those are separate reviews). The workload is [serves HTTP at /api on port 8080 / consumes from a queue / batch processor that runs N hours].

The bracketed context at the end is what makes the review accurate for your workload. Without it, the model assumes a generic web service.
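
If you keep the template in a script, it helps to make the workload description a required input so it can't be forgotten. This is a minimal sketch under that assumption. build_review_prompt and its parameters are hypothetical names, and the prompt text is a lightly condensed version of the template above.

```python
FOCUS = (
    "probes (readiness, liveness, startup), terminationGracePeriodSeconds, "
    "preStop hooks, and rolling update strategy"
)


def build_review_prompt(workload: str, manifest: str) -> str:
    """Fill the review template; refuse to build it without workload context."""
    if not workload.strip():
        raise ValueError(
            "describe the workload; without it the model assumes a generic web service"
        )
    return (
        "You are reviewing a Kubernetes workload bundle for production readiness. "
        f"Focus only on: {FOCUS}. "
        "For each finding produce: severity, exact field path, why it matters "
        "in one sentence, corrected YAML. Ignore everything else.\n"
        f"The workload: {workload.strip()}\n\n"
        f"Manifest:\n{manifest}"
    )


prompt = build_review_prompt("serves HTTP at /api on port 8080", "kind: Deployment")
print(prompt)
```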

For our full prompt library on Kubernetes review, see the Kubernetes & Helm category — especially kubernetes-yaml-security-review and kubernetes-resource-limits-tuning.

This article was originally published on DevOps AI ToolKit — practical AI workflows for cloud engineers.

How to Use Claude to Troubleshoot Linux Servers

James Joyner — Sun, 14 Jun 2026 21:15:59 +0000

Claude is genuinely useful for production Linux troubleshooting — when you use it right. Here's the workflow that works, after a year of using it on real incidents across Ubuntu, RHEL, and Rocky.

The mental model: Claude is a senior pair, not an oracle

The mistake most engineers make on day one: they paste a 5-line error message and expect a fix. Claude can do better than that — but only if you give it the same context you'd give a senior engineer joining your incident bridge.

A senior engineer would want:

What OS and version?
What does this server do?
What changed recently?
What's the actual symptom?
What command output have you already gathered?

Give Claude that, and the quality of analysis changes completely.

The workflow

Step 1: Establish context with a system prompt

Use our Linux Server Troubleshooting Prompt as your system prompt, or paraphrase: "You are a senior Linux sysadmin. Rank root-cause hypotheses by probability. Recommend safe diagnostics first. Label destructive commands as DANGEROUS."

Step 2: Paste structured context, not noise

Good:

OS: Ubuntu 22.04, kernel 5.15
Role: production MySQL replica, 64GB RAM, 16 cores
Recent changes: kernel upgrade 6 hours ago
Symptom: server load average 40+, MySQL replication lag growing, queries timing out

$ uptime
 14:22:01 up 6:02,  4 users,  load average: 41.23, 38.51, 35.04

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           62Gi        58Gi       1.2Gi       128Mi       3.1Gi       1.8Gi

$ iostat -xz 2 3
[...]

Bad:

my server is slow can you help
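
One way to make the good version the default is to generate it. The sketch below is a minimal Python illustration, not part of our prompt library: build_context and run are hypothetical helper names, and the commands it runs are limited to read-only diagnostics. A command that is missing on the host is noted in the output instead of raising.

```python
import subprocess


def run(cmd: list[str]) -> str:
    """Run a read-only diagnostic and return its output, or a note if it fails."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        return result.stdout.rstrip()
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"(could not run: {exc})"


def build_context(os_name: str, role: str, recent_changes: str, symptom: str,
                  outputs: dict[str, str]) -> str:
    """Assemble the four header lines plus each command and its output."""
    lines = [
        f"OS: {os_name}",
        f"Role: {role}",
        f"Recent changes: {recent_changes}",
        f"Symptom: {symptom}",
    ]
    for command, output in outputs.items():
        lines += ["", f"$ {command}", output]
    return "\n".join(lines)


context = build_context(
    os_name="Ubuntu 22.04, kernel 5.15",
    role="production MySQL replica, 64GB RAM, 16 cores",
    recent_changes="kernel upgrade 6 hours ago",
    symptom="load average 40+, replication lag growing, queries timing out",
    outputs={"uptime": run(["uptime"]), "free -h": run(["free", "-h"])},
)
print(context)
```

Review the assembled text before pasting it anywhere; command output can contain hostnames or other details you may not want to share.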

Step 3: Let it ask follow-up questions

The good prompts in our library tell Claude to ask for missing data before guessing. When it asks "can you share dmesg | tail -50 and vmstat 1 5?" — that's a feature, not a flaw. Give it the data.

Step 4: Validate suggested commands before running

Claude will sometimes suggest a command with subtly wrong syntax, a destructive flag, or a path that doesn't exist on your distro. Read every suggestion before running. Never paste straight into a root shell.
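
Part of that reading can be automated. The following is a rough sketch, assuming a single simple command with no pipes or redirects: it checks whether the binary exists on this host and whether absolute paths in the arguments exist. The name preflight is hypothetical, and the check says nothing about whether a command is destructive, so it complements the manual read rather than replacing it.

```python
import shlex
import shutil
from pathlib import Path


def preflight(command: str) -> list[str]:
    """Return warnings for a suggested command; an empty list means nothing obvious was found."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # for example an unbalanced quote
        return [f"cannot parse command: {exc}"]
    if not argv:
        return ["empty command"]

    warnings = []
    if shutil.which(argv[0]) is None:
        warnings.append(f"binary not found on this host: {argv[0]}")
    for arg in argv[1:]:
        # Only absolute paths are checked; relative paths depend on the working directory.
        if arg.startswith("/") and not Path(arg).exists():
            warnings.append(f"path does not exist: {arg}")
    return warnings


# A deliberately misspelled binary and path, for illustration only.
print(preflight("iotopp -oP /var/lgo/syslog"))
```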

Step 5: Keep the conversation alive

Claude's long context means you can run a 30-minute diagnostic session in one thread, paste new output as you gather it, and the model retains the full diagnostic context. This is the single biggest workflow win versus older AI tools.

What Claude is good at

Reading command output you don't fully understand (strace, perf, tcpdump summaries).
Drafting awk/sed/grep one-liners for log analysis.
Explaining why a specific kernel parameter or sysctl is set.
Suggesting what to look at next when you're stuck.
Drafting the incident summary after you've fixed it.

What Claude is not good at

Real-time anything — it can't see your live metrics.
Distinguishing between two plausible root causes when both fit the symptoms (it'll guess).
Telling you what's normal for your environment. You have to provide that baseline.

A real-world example

A production server's load average suddenly spiked. We pasted top, iostat -xz 2 3, and dmesg | tail -50 into Claude with our prompt template, and it immediately flagged: "%iowait is 78%, await on /dev/sda is 320ms, and dmesg shows 'task X blocked for more than 120 seconds.' The disk subsystem is saturated, not CPU. Investigate which process is doing heavy I/O: iotop -oP -d1 will show the writer in 1-second intervals."

That's exactly the diagnosis we wanted, framed with the evidence — in seconds.


How to Choose the Right DevOps as a Service Provider

James Joyner — Sat, 13 Jun 2026 19:26:40 +0000

I've spent 25 years building, breaking, and scaling production infrastructure — long enough to watch "DevOps" go from a conference buzzword to a thing companies now rent by the month. That shift is real, and for a lot of teams it's the right call. But the gap between a great DevOps as a Service provider and a bad one is enormous, and the marketing pages all read the same.

So this is the article I wish more buyers had: what DevOps as a Service actually means, when it beats hiring, and how to tell — before you sign — whether the people you're talking to have ever been on-call at 3am.

What DevOps as a Service actually means

DevOps as a Service (DaaS) is outsourcing the engineering function that builds and runs your delivery pipeline and infrastructure, rather than hiring that function in-house. A provider takes ownership of some or all of: your CI/CD, your cloud environments, your observability, your automation, and the on-call response when something breaks.

It is not a single tool, and it is not a one-time project. A consultancy that drops a Terraform repo and disappears is not DaaS. The "as a Service" part means there's an ongoing operational relationship — someone is responsible for your systems on Tuesday at 2am, not just during the engagement.

Done well, you get the output of a seasoned platform team — Linux fundamentals, Kubernetes, Docker, infrastructure-as-code, pipelines, monitoring — without carrying that whole team on payroll.

Why companies outsource DevOps instead of hiring

Hiring a full in-house DevOps team is the "right" answer that's often the wrong answer in practice. Here's why teams rent it instead.

Cost. A single senior DevOps/SRE hire in a competitive market is expensive — and you need more than one for real on-call coverage. Add recruiting time, ramp-up, benefits, and the risk of a bad hire, and the fully-loaded number gets large fast. A provider amortizes senior talent across clients, so you pay for the expertise without paying for the bench.

Speed to maturity. A good provider has already built the Terraform modules, the GitLab CI templates, the Prometheus alert libraries, the backup runbooks. You're buying an opinionated, battle-tested baseline instead of inventing it. That can compress a year of platform work into weeks.

On-call coverage. Sustainable 24/7 on-call needs roughly six to eight engineers in a healthy rotation. Most companies under a certain size simply cannot staff that without burning people out. Providers spread the rotation across a larger team, so nobody's carrying a pager every single night.

Hard-to-hire seniority. The engineers who can debug a gnarly Kubernetes networking issue, reason about etcd, and also write clean Terraform are rare and they know it. They're hard to attract and harder to retain at a non-tech company. DaaS is often the only realistic way for a mid-sized business to get that caliber of person near its infrastructure.

What's usually included

Scope varies, but a full-spectrum provider should be able to own all of these. When you evaluate one, map their offering against this list and find out exactly where the lines are.

CI/CD — pipeline design, build/test/deploy stages, and crucially, a real rollback path.
Cloud infrastructure — provisioning and managing your environments as code (Terraform or equivalent), with sane network and IAM design.
Monitoring and observability — Prometheus, Grafana, logs, and alert rules that page a human only when a human is actually needed.
Automation — configuration management with Ansible, scripted runbooks, and elimination of manual toil.
Security — secrets management, least-privilege access, patching, and image scanning baked into the pipeline.
Incident response — a defined process, on-call rotation, and blameless postmortems, not just "we'll look at it."
Backups and disaster recovery — and, more importantly, tested restores. A backup you've never restored is a rumor.
Cost optimization — right-sizing, autoscaling, spot/reserved strategy, and killing the zombie resources nobody owns.

Questions to ask before you hire a provider

This is the part that separates the real operators from the slide decks. Don't ask "do you do Kubernetes?" — everyone says yes. Ask for specifics and watch how fast and how concretely they answer.

"Show me your Terraform module structure and how you handle state." Real teams have an opinion about remote state, locking, workspace-vs-directory layout, and blast-radius isolation. Vague answers here mean they're winging your infrastructure.
"Walk me through a real GitLab CI pipeline you run, including the rollback path." A deploy story with no rollback story is half a pipeline. I want to hear how they revert a bad release in minutes, not hours.
"How do you wire Prometheus alert rules to avoid pager fatigue?" The right answer involves symptom-based alerting, for: durations, severity routing, and ruthless deletion of noisy alerts. If every blip pages everyone, nobody responds to the one that matters.
"What does your on-call rotation look like, and what's your real response time?" Get the rotation size, escalation policy, and the SLA in writing. "We're very responsive" is not an SLA.
"How do you manage secrets and access?" Listen for a vault, short-lived credentials, and least privilege — not secrets in environment files or a shared password manager.
"When did you last test a restore from backup, and how long did it take?" The hesitation tells you everything.
"How do you handle configuration drift?" Ansible, immutable images, drift detection — there should be a system, not heroics.
"What happens to our infrastructure if we leave you?" A confident provider hands you clean, documented IaC and walks away gracefully. Lock-in is a choice they make, and you should know it up front.
"Who specifically will be on our account, and what's their production background?" You're buying judgment. Find out whose judgment.

Red flags to avoid

A few patterns that, in my experience, reliably predict pain.

Buzzword density with no specifics. If they can't move from "we leverage cloud-native synergies" to "here's how we structure a Helm chart" in one question, walk.
No rollback story. Anyone can deploy. Operators can un-deploy under pressure.
ClickOps in the cloud console. If they're configuring your production environment by hand instead of in code, you have no reproducibility and no audit trail.
Everything is "automated by AI." AI helps. AI does not own your incident at 2am. A provider hiding thin staffing behind AI claims is a serious risk (more on this below).
Alert noise as a feature. Hundreds of alerts is not observability; it's a team that's trained itself to ignore the dashboard.
No postmortems, or blame-heavy ones. A team that doesn't write honest postmortems isn't learning, and you'll pay for the same outage twice.
They won't show you anything real. Sanitized examples are fine. "We can't show you any of our work" usually means there isn't much to show.
Deep lock-in by design. Proprietary wrappers around standard tools, undocumented infra, contracts that punish leaving — all signs they're protecting revenue, not your uptime.

Why real production experience beats buzzwords

Here's the thing the marketing won't tell you: tools are easy, judgment is hard. Anyone can terraform apply. The value is in the engineer who knows not to apply at 4:55pm on a Friday, who recognizes the failure mode three layers down, who's restored a database under pressure and remembers exactly how it went wrong last time.

That judgment only comes from having run real production systems and felt the consequences. When you evaluate a provider, you're not really buying their Kubernetes skills — those are table stakes. You're buying scar tissue. You want the team that's debugged the keepalived VIP flap, the etcd disk-pressure cascade, the Docker layer that quietly doubled image size and blew out the build cache. Ask for war stories. The good ones light up; the pretenders get vague.

How AI fits — and where it doesn't

I'm bullish on AI in DevOps, and I build with it daily. Used right, it's a genuine force multiplier: it can summarize a wall of logs faster than any human, draft Terraform and Ansible boilerplate, propose PromQL, correlate a timeline of "what changed," and write the first pass of a postmortem. That's real leverage, and a modern provider should be using it.

But there's a hard line, and it's the same one I draw on my own systems: AI reads and reasons; humans run commands. During an active incident, AI proposes a risk-classified, safest-first plan and a human executes every step. The model never touches production. If a provider tells you their AI auto-remediates your prod environment unattended, that's not maturity — that's an outage waiting for a confident-but-wrong suggestion.

The right framing is AI as a very fast, very well-read junior engineer sitting next to a senior who owns the keyboard. It compresses the slow parts of the work without replacing the judgment that keeps you up. If you want to see what that looks like in practice, our AI incident-response workflows and prompt library are built around exactly that human-in-the-loop principle.

So when you evaluate a provider's AI claims, ask the same question you'd ask about any tool: where's the human, and what's the blast radius if the AI is wrong?

How a good provider actually pays for itself

The reason this model works isn't just cheaper labor — it's better outcomes in three places that show up directly on your books.

It saves money. Cost optimization is continuous work most teams never get to: right-sizing nodes, tuning autoscaling, buying reserved capacity, deleting orphaned volumes and idle environments. A provider doing this routinely often saves more on cloud spend than they cost. The infrastructure-as-code discipline also prevents the expensive mistakes — the hand-clicked resource nobody can reproduce, the security group left wide open.

It reduces downtime. Better alerting means you catch degradation before customers do. Tested restores mean a disaster is an inconvenience, not a company-ending event. A defined incident process with real on-call coverage means the response starts in minutes. Downtime is one of the most expensive things a business buys without meaning to, and maturity here directly buys it back.

It speeds up deployments. A solid GitLab CI pipeline with automated testing and a clean rollback path turns deploys from a scary quarterly event into a boring daily one. Teams that deploy confidently ship faster, and shipping faster is usually the whole point. The fastest way to slow down engineering is to make every release terrifying; good DevOps makes it dull.

Where to go from here

Be honest with yourself about where your infrastructure actually stands. Can you deploy and roll back in minutes, or does a release ruin someone's afternoon? Do your alerts mean something, or has your team learned to ignore them? If your primary database died right now, do you know — not hope, know — that you can restore it? Is there a real on-call rotation, or one exhausted person who's secretly the single point of failure?

If those questions made you wince, you're not behind; you're normal. Most teams are running with far less maturity than they think, and they try to close that gap by hiring slowly, one expensive senior at a time, while production keeps moving. DevOps as a Service exists precisely so you don't have to win that hiring war before you can move fast.

Take an honest inventory this week. Score yourself on pipelines, observability, incident response, and recovery. Wherever you find a gap that's quietly costing you money, downtime, or velocity, that's where a good provider earns their fee many times over. The teams that move fastest aren't the ones with the most engineers — they're the ones who got serious about maturity before the outage forced the conversation. Decide which kind you want to be, and move while it's still your choice.

Evaluate any provider against your own systems and constraints. The right answer depends on your scale, your risk tolerance, and how much production maturity you already have in-house.


ChatGPT vs Claude for Infrastructure Engineers

James Joyner — Thu, 11 Jun 2026 15:21:51 +0000

Both ChatGPT and Claude are excellent. But they have different strengths, and infrastructure engineers feel those differences more than most users — because we deal with long logs, multi-file configurations, and operations where being almost right can mean being very wrong.

Here's a side-by-side from a year of daily use on real infrastructure work.

Long-context reasoning over logs and manifests

Winner: Claude.

Claude's long context window means you can paste a 2,000-line kubectl describe pod, the full Deployment manifest, and your last 50 events without losing fidelity. ChatGPT can handle long contexts too, but in practice it's more likely to summarize or "forget" earlier details mid-conversation.

For diagnostic workflows where you keep pasting more output as you gather it, Claude's behavior is meaningfully better.

Safety with destructive commands

Winner: Claude (slightly).

Without explicit prompting, Claude is more likely to flag destructive commands (rm -rf, DROP TABLE, nova reset-state, kubectl delete) with caveats. ChatGPT will too — but is more likely to just hand you the command without extra emphasis.

If you use either tool in production troubleshooting, bake the safety constraints into your prompt (our prompt library does this). Don't rely on default behavior.

Code generation: Ansible, Terraform, Bash, Python

Roughly tied. Different defaults.

ChatGPT tends toward more "modern" Terraform (newer providers, recent syntax) and is slightly faster to produce a working playbook from scratch.
Claude tends toward more cautious, conventional output with better comments and more attention to idempotency.

For infrastructure-as-code review, Claude usually catches more subtle issues. For first-draft generation, ChatGPT is often a hair faster.

PromQL and observability queries

Roughly tied.

Both can write correct PromQL with rate(), histogram_quantile(), and label aggregation. Both occasionally hallucinate metric names if you don't paste your /metrics output. The deciding factor is your prompt quality, not the model.
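
A cheap guard against invented metric names is to compare the generated query with what your exporter actually exposes. The sketch below is a heuristic, not a PromQL parser: it reads metric names from text in the Prometheus exposition format and flags any name in the query that is followed by a label selector or range and is not in that set. Bare metric names without a selector are not detected. The helper names and the sample metrics are illustrative.

```python
import re

NAME = r"[a-zA-Z_:][a-zA-Z0-9_:]*"
# A metric name directly followed by "{" (label selector) or "[" (range).
SELECTOR = re.compile(rf"({NAME})\s*[{{\[]")


def exposed_metrics(metrics_text: str) -> set[str]:
    """Collect metric names from /metrics output, skipping comment lines."""
    names = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(NAME, line)
        if match:
            names.add(match.group(0))
    return names


def unknown_metrics(query: str, known: set[str]) -> set[str]:
    """Return names used with a selector in the query that were never exposed."""
    return {name for name in SELECTOR.findall(query) if name not in known}


metrics_text = """
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{code="200"} 1027
http_request_duration_seconds_bucket{le="0.5"} 900
"""

query = (
    "histogram_quantile(0.99, sum by (le) "
    '(rate(http_request_latency_bucket{job="api"}[5m])))'
)
print(unknown_metrics(query, exposed_metrics(metrics_text)))
```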

Postmortem drafting

Winner: Claude.

Claude's prose is consistently more readable, less marketing-flavored, and more naturally blameless. ChatGPT tends to slip into corporate phrasing that engineers find grating ("leveraged our learnings to enhance reliability").

Ecosystem and integrations

Winner: ChatGPT.

Far larger ecosystem of plugins, GPTs, and shared prompts. If you want a tool that integrates with everything else you use, ChatGPT wins.

Pricing

The two are roughly comparable for individual use, and both offer free tiers with rate limits. Team pricing varies by organization.

Which should you use?

The honest answer: both, for different tasks.

Claude for diagnostic sessions, postmortems, sensitive prod work, and IaC review.
ChatGPT for fast scaffolding, plugin-heavy workflows, and broad community templates.

If you can only pick one and you do mostly production troubleshooting, pick Claude. If you can only pick one and you do mostly greenfield IaC scaffolding, ChatGPT is fine — your prompt quality matters more than the model.


How DevOps Engineers Can Use AI to Triage Production Incidents Faster

James Joyner — Mon, 08 Jun 2026 19:49:29 +0000

The pager goes off at 02:14. Checkout latency is up, error rate is climbing, and you have three dashboards, a wall of logs, and a half-awake brain. The fix, once you know what's wrong, is usually fast. The expensive part is the triage — the first fifteen minutes of "what is actually broken, and what changed?"

That triage window is exactly where AI helps most, and exactly where it's most dangerous if you let it run commands. This is how to use it to go faster without handing it the keys to production.

The rule that makes AI safe during an incident

AI reads and reasons. Humans run commands.

During an active incident you are sleep-deprived and time-pressured — the worst possible state to paste a command you don't fully understand. So draw a hard line: AI is allowed to look at evidence and propose a plan. It is never allowed to execute anything. Every command it suggests goes through your eyes and your hands.

In practice that means you treat the model like a very fast, very well-read junior SRE sitting next to you: it can summarize, correlate, hypothesize, and draft commands — but you're the one with the keyboard, and you read each command before it runs.

If you only take one thing from this article, take that.

Step 1: Turn the firehose into a summary

The first thing AI is genuinely great at is reading more text than you can at 2am. Paste in the raw material and ask for structure, not answers:

The firing alerts (name, severity, labels, duration)
A representative slice of error logs
Recent deploy / change history
The relevant dashboard values (p99 latency, error rate, saturation)

Then prompt it deliberately:

"Here are the alerts, logs, and recent changes for an active production incident. Summarize what's happening in 5 bullets, list the top 3 hypotheses ordered by likelihood, and for each hypothesis give me the single read-only command that would confirm or rule it out. Do not suggest any command that changes state."

That last sentence matters. Left unconstrained, models love to suggest kubectl rollout restart as step one. You want the diagnostics first.

Step 2: Make it order commands by blast radius

A good incident AI prompt forces a risk classification on every suggested command. Ask it to label each one:

safe — pure read-only: kubectl get, journalctl, ss, ip, cat, grep, promtool query
caution — shells in or makes a small change: kubectl exec, docker exec, editing non-prod config
destructive — restarts, deletes, scale-to-zero, firewall changes, migrations, restores

Then it must order them safest-first. You work top-down and you stop the moment you have a diagnosis. The number of incidents that get worse because someone reached for a destructive "fix" before confirming the cause is depressingly high — a forced safest-first ordering is a cheap guardrail against that.
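
You can also enforce the ordering yourself instead of trusting the model's labels. Here is a minimal sketch, assuming a small prefix table built from the examples above. The table is illustrative and far from complete, and anything the table doesn't recognize, including compound commands with pipes or semicolons, gets the most conservative label.

```python
RISK_ORDER = {"safe": 0, "caution": 1, "destructive": 2}

# Prefixes drawn from the examples in the list above; extend for your own tooling.
RULES = [
    ("destructive", ("kubectl delete", "kubectl rollout restart", "kubectl scale",
                     "systemctl restart", "rm")),
    ("caution", ("kubectl exec", "docker exec")),
    ("safe", ("kubectl get", "kubectl describe", "journalctl", "ss", "cat",
              "grep", "promtool query")),
]

SHELL_METACHARACTERS = (";", "&", "|", ">", "`", "$(")


def classify(command: str) -> str:
    """Label a single command as safe, caution, or destructive."""
    normalized = " ".join(command.split())
    # Compound commands are not analyzed; treat them as needing full review.
    if any(token in normalized for token in SHELL_METACHARACTERS):
        return "destructive"
    for level, prefixes in RULES:
        for prefix in prefixes:
            if normalized == prefix or normalized.startswith(prefix + " "):
                return level
    return "destructive"  # unknown commands get the most conservative label


def safest_first(commands: list[str]) -> list[str]:
    """Order commands by risk; ties keep their original order."""
    return sorted(commands, key=lambda c: RISK_ORDER[classify(c)])


plan = [
    "kubectl rollout restart deployment/checkout",
    "kubectl exec -it checkout-0 -- sh",
    "kubectl get pods -n shop",
    "journalctl -u kubelet --since '10 min ago'",
]
for command in safest_first(plan):
    print(f"{classify(command):12} {command}")
```

Defaulting unknown commands to destructive is a deliberate choice: a false alarm costs a few seconds of reading, while a mislabeled command can cost far more.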

Tip: keep your standard incident prompt in a snippet manager or a prompt library so you're not authoring it at 2am. We keep a set of AI incident-response prompts for exactly this.

Step 3: Correlate "what changed" automatically

Most incidents are caused by a change. The model is good at lining up a timeline if you give it the raw inputs: the alert start time, the last few deploys, config changes, and infra events. Ask:

"The latency spike started at 02:09 UTC. Here is the deploy log and the config-change history for the last 6 hours. What changed closest to 02:09, and what's the mechanism by which it could cause this symptom?"

This is where AI routinely beats a tired human: it doesn't get tunnel vision on the service you think is the problem. It will notice the keepalived VIP change, the connection-pool tweak, or the cert that rotated — the boring change three layers down that you'd have found 20 minutes later.
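
A first pass at the timeline does not need a model at all. The sketch below, with entirely hypothetical change events and date, keeps the changes that landed within six hours before the incident start and sorts them nearest first. The short list it produces is a good thing to paste into the prompt above.

```python
from datetime import datetime, timedelta, timezone


def changes_before(incident_start, changes, window=timedelta(hours=6)):
    """Return (age, description) pairs for changes inside the window, nearest first."""
    candidates = [
        (incident_start - timestamp, description)
        for timestamp, description in changes
        if timedelta(0) <= incident_start - timestamp <= window
    ]
    return sorted(candidates)


UTC = timezone.utc
start = datetime(2026, 6, 8, 2, 9, tzinfo=UTC)  # 02:09 UTC; the date is made up

# Hypothetical change history, in the order it might appear in a log export.
changes = [
    (datetime(2026, 6, 7, 21, 30, tzinfo=UTC), "checkout service deploy"),
    (datetime(2026, 6, 8, 2, 4, tzinfo=UTC), "connection-pool size changed"),
    (datetime(2026, 6, 8, 1, 15, tzinfo=UTC), "TLS certificate rotated"),
    (datetime(2026, 6, 7, 18, 0, tzinfo=UTC), "keepalived VIP change"),
]

for age, description in changes_before(start, changes):
    minutes = int(age.total_seconds() // 60)
    print(f"{minutes:4d} min before start: {description}")
```

Proximity in time is only a hint. The model, or you, still has to explain the mechanism by which a change could cause the symptom.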

Step 4: Draft comms while you investigate

Incident comms are a tax you pay in attention you don't have. Hand them to the model:

"Write a status-page update for a degraded-checkout incident, customer-facing, no internal jargon, no root cause speculation, ~3 sentences. Then write a one-line internal update for the incident channel with current severity and what we're checking."

You get a customer update and an internal update in seconds, both in the right register. You skim, adjust a word, post. The investigation never stops to write prose.

Step 5: Let it draft the postmortem from the timeline

When the incident is resolved, the timeline is freshest and you're most likely to actually write it down. Paste the incident-channel scrollback and the command history and ask for a blameless postmortem draft: summary, timeline, root cause, impact, what went well, what to improve, and action items. You're editing a draft instead of facing a blank page — which is the difference between a postmortem that gets written and one that doesn't.

What NOT to do

A few failure modes worth naming:

Don't paste secrets. Scrub tokens, passwords, internal hostnames, and customer data before anything goes into a model. Treat the prompt like a screenshot you might accidentally post in a public channel.
Don't let it invent metrics. If you ask for PromQL and you haven't given it your real metric names, it will confidently make them up. Give it your metric names or tell it to use clearly-marked placeholders.
Don't trust a confident command. "Confident" and "correct" are unrelated in language models. The safest-first ordering exists precisely so that the first commands you run are read-only, and a wrong-but-confident suggestion costs you time rather than production.
Don't skip the human review for "obvious" fixes. The obvious fix at 2am is how the incident gets a second act.
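
For the first item, a small scrubber run over everything you paste is better than relying on memory at 2am. This is a minimal sketch with three illustrative patterns: key=value style credentials, bearer tokens, and AWS access key IDs. It is not exhaustive, it does not catch internal hostnames or customer data, and the name scrub is hypothetical, so still skim the result before sending it.

```python
import re

PATTERNS = [
    # password=..., token: ..., api_key=... (value runs to the next whitespace)
    (re.compile(r"(?i)\b(password|passwd|secret|token|api[_-]?key)\b(\s*[=:]\s*)\S+"),
     r"\1\2[REDACTED]"),
    # Authorization headers carrying a bearer token
    (re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY_ID]"),
]


def scrub(text: str) -> str:
    """Apply every redaction pattern in order and return the cleaned text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


log_line = (
    'level=error msg="auth failed" password=hunter2 '
    "Authorization: Bearer abc.def-123 key_id=AKIAIOSFODNN7EXAMPLE"
)
print(scrub(log_line))
```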

Where this fits in your workflow

You don't need a platform to start — a saved prompt and a scratch buffer get you most of the value tonight. The structure is what matters: summarize the firehose, hypothesize with read-only confirmations, correlate the timeline, draft the comms, and let the human run every command.

If you want the structured version of this flow — paste your symptoms and logs, get a risk-classified, safest-first plan plus a postmortem draft — that's exactly what we built the AI Incident Response Assistant for. But the technique stands on its own. Steal the prompts, keep the human on the keyboard, and reclaim the first fifteen minutes.

Generated incident plans and commands are assistive, not authoritative. Always verify recommendations against your own systems before running anything in production.

