DEV Community

Wahaj

Why I Didn't Use an AST Parser in My AI Code Commenter (And Why That's the Right Call)

Most AI code documentation tools I looked at had the same problem: they sent your entire file to an LLM and asked for it back with comments added.

That means the model is rewriting your code. Every single line. Including the ones it wasn't supposed to touch.

I built devsplain to solve this. It adds JSDoc and inline comments to source files in 22 programming languages using any LLM you choose, without touching a single line of your executable code. If it can't guarantee that, it aborts.

Here's the architecture decision that made that possible, and why the "obvious" solution would have destroyed the tool.


The Problem With Every Other Tool

Tools like ai-docs and jsdoc-builder solve the safety problem correctly: they use an AST parser. Instead of asking the LLM to reproduce your whole file, they parse your code into a syntax tree, extract just the function signatures, ask the LLM only for comment text, and splice it back in programmatically.

The LLM never sees your logic. It can't corrupt what it can't touch.

There's one catch: AST parsers are language-specific.

  • ai-docs → JavaScript/TypeScript only
  • jsdoc-builder → JavaScript/TypeScript only
  • GenAIScript (Microsoft) → TypeScript only

Every single one of them. Because Babel and Acorn only understand JS/TS. If you want Python, you need a Python AST library. If you want Java, you need a Java AST library. If you want to support 22 languages the way devsplain does (JavaScript, TypeScript, Python, Go, Rust, Swift, Kotlin, C, C++, Java, C#, Ruby, PHP, Dart, and more), you'd need tree-sitter with native C++ grammar binaries compiled for every language and every platform.

That turns a zero-dependency CLI that installs in under a second into a native compilation nightmare that fails on half the machines you try it on.

So: use an AST and lose the polyglot story, or don't use an AST and lose the safety guarantee.

Neither was acceptable.


The Third Option: Line-Splicing With Round-Trip Verification

The insight was this: you don't need an AST to guarantee code safety. You need a verification step.

Here's what devsplain does instead:

1. Prepend line numbers to the source before sending it

1: function calculateBackoff(attempt) {
2:   return Math.min(1000 * 2 ** attempt, 30000);
3: }
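
For illustration, here is a minimal sketch of how that numbering step could look. The helper name numberLines is hypothetical and not taken from devsplain's source; it assumes "\n" line endings and the "N: " prefix format shown above.

```javascript
// Hypothetical helper: prefix every source line with its 1-based number,
// so the model can refer to lines by index instead of echoing code back.
function numberLines(source) {
  return source
    .split("\n")
    .map((text, index) => `${index + 1}: ${text}`)
    .join("\n");
}

// Prints the same numbered listing as the example above.
console.log(
  numberLines(
    "function calculateBackoff(attempt) {\n" +
      "  return Math.min(1000 * 2 ** attempt, 30000);\n" +
      "}"
  )
);
```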

2. Ask the LLM to return only a JSON array — no source code

[
  { "line": 1, "comment": "// [ds] Calculates exponential backoff with a 30s ceiling" }
]

The LLM acts as a pure inference engine. It reads your code, decides where comments belong and what they should say, and returns structured data. It never touches the source.
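
Because the reply is plain data, it can be validated before anything else happens. The sketch below is one way that check might look, not devsplain's actual code: parseCommentResponse and its error messages are hypothetical, and it assumes every entry has exactly the line and comment fields shown above.

```javascript
// Hypothetical validator for the model's reply. Throws unless the reply is
// a JSON array of { line, comment } objects that point at existing lines.
function parseCommentResponse(rawJson, lineCount) {
  const parsed = JSON.parse(rawJson); // throws SyntaxError on malformed JSON
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of comments");
  }
  return parsed.map((entry, i) => {
    if (entry === null || typeof entry !== "object") {
      throw new Error(`Entry ${i}: expected an object`);
    }
    const { line, comment } = entry;
    if (!Number.isInteger(line) || line < 1 || line > lineCount) {
      throw new Error(`Entry ${i}: line ${line} is out of range`);
    }
    // A multi-line "comment" could smuggle in executable code, so reject it.
    if (typeof comment !== "string" || comment.includes("\n")) {
      throw new Error(`Entry ${i}: comment must be a single-line string`);
    }
    return { line, comment };
  });
}
```

Rejecting a bad reply outright, rather than trying to repair it, keeps the failure mode simple: no comments are added and the source stays as it was.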

3. The CLI does the actual splicing

// Sort descending so earlier insertions don't shift later indices
comments.sort((a, b) => b.line - a.line);
for (const { line, comment } of comments) {
  lines.splice(line - 1, 0, comment);
}
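
The snippet above assumes lines and comments already exist. As a self-contained sketch (spliceComments is a hypothetical wrapper, not necessarily how devsplain structures it), the same logic can be written so that neither input array is mutated:

```javascript
// Hypothetical wrapper around the splice loop. Works on copies, so the
// original line array stays available for the later integrity check.
function spliceComments(originalLines, comments) {
  const lines = [...originalLines];
  // Insert from the bottom up so earlier insertions don't shift later indices.
  const ordered = [...comments].sort((a, b) => b.line - a.line);
  for (const { line, comment } of ordered) {
    lines.splice(line - 1, 0, comment); // insert directly above the target line
  }
  return lines;
}

const annotated = spliceComments(
  ["function a() {}", "function b() {}"],
  [
    { line: 1, comment: "// [ds] First helper" },
    { line: 2, comment: "// [ds] Second helper" },
  ]
);
console.log(annotated.join("\n"));
```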

4. Round-trip verification before any write

This is the part that makes the safety claim checkable:

const reconstructed = lines.filter(line => !isDevsplainComment(line));
const original = originalLines;

if (!arraysEqual(reconstructed, original)) {
  console.error("Integrity check failed. Aborting.");
  process.exit(1);
}

Filter out every inserted comment. What remains must be byte-for-byte identical to the original source. If a single line has shifted, been modified, or gone missing — for any reason — the write is aborted.

This verification works for every language without any language-specific logic. It's just array comparison.
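
To make the check concrete, here is a hedged sketch with stand-in implementations of the two helpers named above. The bodies of isDevsplainComment and arraysEqual are assumptions for illustration: this version only recognizes // and # comment markers followed by the [ds] tag, whereas a real implementation would need to cover each supported language's comment syntax.

```javascript
// Assumed tag check: a line counts as generated if it is a "//" or "#"
// comment that starts with the [ds] tag. Other comment styles are omitted.
function isDevsplainComment(line) {
  return /^\s*(\/\/|#)\s*\[ds\]/.test(line);
}

// Strict element-by-element comparison of two string arrays.
function arraysEqual(a, b) {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

// True only if removing the generated comments yields the original exactly.
function verifyRoundTrip(originalLines, annotatedLines) {
  const reconstructed = annotatedLines.filter(
    (line) => !isDevsplainComment(line)
  );
  return arraysEqual(reconstructed, originalLines);
}

const original = ["def f():", "    return 1"];
console.log(verifyRoundTrip(original, ["# [ds] Returns one", ...original])); // true
console.log(verifyRoundTrip(original, ["# [ds] Returns one", "def f():", "    return 2"])); // false
```

One consequence of this design: if the original file already contains a line that looks like a tagged comment, the filter removes it too, the arrays no longer match, and the run aborts. The check fails closed rather than open.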


The [ds] Tag: Making AI Comments Fully Reversible

Every comment devsplain generates is forcibly prefixed with a [ds] tag:

// [ds] Validates that the input array is non-empty before processing
function processItems(items) {
  // your original comment here — untouched
  if (!items.length) throw new Error("Empty input");
  ...
}

This gives you a local, deterministic scrubber with zero API calls:

devsplain lib/ --clean

The lexer scans for [ds]-prefixed lines and removes only those — your manual comments are completely preserved. No LLM needed, no internet required, no API key. Just a state machine and string matching.

This also means devsplain is fully reversible. Run it on your codebase, decide you don't like the output, run --clean, and you're back to exactly where you started — verified by the same round-trip diff.


String Literal Guardrails

One edge case that breaks naive line-splicing: inserting a comment inside a multi-line string or template literal.

query = """
    SELECT *        ← inserting a comment here would corrupt the string
    FROM users
"""

devsplain tracks lexical state across:

  • Template literals (`)
  • Single and double quoted strings
  • Python triple-quote docstrings (""" and ''')

If the LLM targets a line that falls within a string region, the insertion is skipped for that line. No crash, no corruption, just a skipped comment with a warning logged.
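
A simplified sketch of that kind of state tracking is shown below. It is an illustration under stated assumptions, not devsplain's lexer: linesStartingInsideString is a hypothetical name, and the scanner only follows multi-line delimiters (""" , ''' and the backtick), ignoring escapes, comments, and single-line quoted strings.

```javascript
// Multi-line string delimiters this simplified scanner understands.
const DELIMITERS = ['"""', "'''", "`"];

// Returns the 1-based numbers of lines that begin inside an open multi-line
// string. Inserting a comment above such a line would land inside the string.
function linesStartingInsideString(lines) {
  const unsafe = new Set();
  let open = null; // delimiter of the string we are currently inside, if any
  lines.forEach((text, index) => {
    if (open !== null) unsafe.add(index + 1);
    let pos = 0;
    while (pos < text.length) {
      if (open !== null) {
        if (text.startsWith(open, pos)) {
          pos += open.length; // closing delimiter found
          open = null;
        } else {
          pos += 1;
        }
      } else {
        const found = DELIMITERS.find((d) => text.startsWith(d, pos));
        if (found) {
          open = found; // opening delimiter found
          pos += found.length;
        } else {
          pos += 1;
        }
      }
    }
  });
  return unsafe;
}

const pythonLines = ['query = """', "    SELECT *", "    FROM users", '"""', "print(query)"];
console.log([...linesStartingInsideString(pythonLines)]); // [ 2, 3, 4 ]
```

Line 4 is flagged even though it only holds the closing delimiter, because a comment spliced in above it would still sit inside the string.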

Git Hook Automation

This is the use case where devsplain actually beats interactive AI editors on speed: batch documentation in CI and git hooks.

devsplain --setup-hook

This installs a two-stage git hook:

  1. Pre-commit: Runs your test suite. Blocks the commit on failure.
  2. Post-commit: Detects modified files, runs devsplain on them, and commits the generated comments automatically as "docs: auto-generated comments by devsplain".

The post-commit runs with --no-verify to prevent recursive invocation. If the network is down or the API key is missing, it logs a warning and exits cleanly, so your original commit is never blocked.

You can bypass it for any commit:

SKIP_DEVSPLAIN=1 git commit -m "my message"

Zero Dependencies in Production

devsplain relies entirely on Node.js built-ins: fs, path, os, readline. No SDK lock-in, no supply chain risk, no native compilation step. The only devDependency is Jest.

This was a deliberate constraint. Adding the OpenAI SDK would have been convenient, but it would have also pulled in a dependency tree and introduced exactly the kind of install friction the tool was designed to avoid. Native fetch handles the HTTP calls. A plain JSON config file in ~/.devsplainrc handles state.
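
As a rough sketch of what an SDK-free call can look like, the example below targets the /chat/completions route that OpenAI-compatible servers expose. The function names and the config fields baseUrl, apiKey, and model are assumptions made for illustration; they are not devsplain's actual config schema. It relies on the global fetch available in Node.js 18 and later.

```javascript
// Hypothetical request builder for an OpenAI-compatible endpoint.
// Kept pure (no network access) so it is easy to inspect and test.
function buildChatRequest({ baseUrl, apiKey, model }, prompt) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Sends the request with the built-in fetch and returns the reply text.
async function requestComments(config, prompt) {
  const { url, init } = buildChatRequest(config, prompt);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`LLM request failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

Swapping providers then comes down to changing baseUrl and model in the config, which is what makes "any OpenAI-compatible endpoint" practical without a vendor SDK.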


Supported Languages

JavaScript, JSX, TypeScript, TSX, HTML, CSS, SCSS, Vue, Svelte, Python, Java, C, C++, C#, Go, Ruby, PHP, Rust, Swift, Kotlin, Dart, Shell


Try It

# Run without installing
npx devsplain src/index.js --dry-run

# Install globally
npm install -g devsplain

# Document an entire directory
devsplain src/ --full

# Undo everything devsplain added
devsplain src/ --clean

The setup wizard will prompt you for your LLM provider (Groq, Gemini, OpenAI, or any OpenAI-compatible endpoint like Ollama or LMStudio).


The Architecture Decision in One Sentence

AST tools are 100% safe for one language. devsplain uses round-trip diff verification to achieve the same guarantee for every language, without a single native dependency.

That tradeoff is the whole project. If you're working in a polyglot codebase — a TypeScript frontend, Python backend, and Swift iOS app in the same repo — devsplain is currently the only tool that can document all three in a single pass.


GitHub · npm

Top comments (3)

Frank

I'm curious about the security implications of not using an AST parser. Did you consider any alternative methods for parsing code structure? I'd love to hear more about your approach.

Wahaj • Edited

yeah so initially i was using strict output constraints in the prompt to force the model into a schema, which worked but the security model was still weak since the LLM was reproducing the full file, one hallucination and your code is silently corrupted. switched to the line-splicing + round-trip diff approach which flips the problem entirely: LLM only returns JSON comment pairs, never touches executable lines, and before anything writes to disk we strip the inserted comments and diff what's left against the original byte-for-byte. If anything doesn't match it just aborts. for AST i did consider it but you literally can't have a single AST parser for 22 languages, you'd need tree-sitter with native C++ binaries for every language which destroys the zero-dependency architecture. so the round-trip diff seems like the right call here, same safety guarantee, no native deps, works for every language