The past seven days pushed three quiet shifts forward. AI coding moved from single prompts toward long-running work that keeps its own context. The plumbing under AI agents got a major upgrade as the Model Context Protocol locked in a stateless core. And the hardware story spread out, with new deals and new silicon chipping away at any single vendor owning the whole stack.
None of these arrived as a flashy product launch, and that is the point. The agentic stack is past its demo phase and into its operational one. The work now is about budgets, recovery, statelessness, security, and hardware choice, the unglamorous machinery that decides whether AI holds up under real production load. Here is what mattered between June 17 and June 24, 2026, across coding tools, processing hardware, and the standards that tie agents to the rest of the software world.
AI Coding Tools: Codex Reaches for Long-Running Work
The biggest coding-tool story of the week came from OpenAI, and it was less about a new model than a new way of working. On June 22, OpenAI published a whitepaper by Jason Liu titled Codex-maxxing for long-running work. The argument is direct. Teams are using AI for work that runs past a single prompt, and Codex now functions as a persistent workspace rather than a one-shot assistant.
The whitepaper lays out a handful of concrete practices. The first is the durable thread, a long-running conversation where work keeps accumulating context. Instead of starting fresh each time, a user returns to the same thread and picks up where the project left off. That fits workstreams that span days, like a multi-stage refactor or an ongoing research task. The tradeoff is real. Longer threads carry more context and cost more to run than short, fresh ones. For work that matters, the continuity earns its keep.
Memory is the second piece. OpenAI describes it as a notebook that holds context for actions. As threads grow long, raw message history stops being enough, so useful project information becomes something a user can open, edit, review, and reuse. A memory system stores project notes, decisions, the people involved, open loops, and task state. The point is that long-running AI work should record what changed and make it visible for human review rather than quietly hoarding vague context.
Steering is the third idea. It means giving Codex direction while it is already working: correcting the approach, adding context, approving the next step, or queuing another action after a tool call. That keeps a person close to the process without forcing them to perform every step by hand. The whitepaper also stresses verifiable goals. A weak goal says "implement the plan." A strong goal gives a clear definition of done, with expected behavior, review criteria, constraints, or tests that must pass. The more testable the goal, the easier it is for Codex to know when work is ready for review.
The last piece is the side panel, where artifacts live and get reviewed while the work is still moving. That can hold Markdown files, spreadsheets, CSVs, PDFs, slides, Jupyter notebooks, and small web apps. Output becomes part of the loop instead of a file attached at the end. Read together, these practices describe a shift in how senior engineers and operators run AI. The work gets somewhere to live, and the human stays in the loop as a reviewer and director rather than a prompt typist.
Codex Ships a Heavy Changelog
The whitepaper landed alongside a busy week of actual product changes. The Codex changelog and release trackers logged a long list of updates through June 23. The most user-visible change addressed a sore point. The /usage command can now show and redeem earned usage-limit reset credits, with confirmation, retry, and refreshed availability states. Developers who had been draining weekly caps on a single long refactor get a real release valve.
Plugin discovery got an overhaul. The /plugins view now organizes remote plugins into OpenAI Curated, Workspace, and Shared-with-me sections, and eligible turns can recommend and install relevant plugins on the fly. Codex also added configurable rollout token budgets that track usage across agent threads, surface remaining-budget reminders, and stop a turn when the budget runs out. That gives teams a guardrail against an agent quietly burning through tokens on a runaway task.
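To make the budget guardrail concrete, here is a minimal sketch of a shared token budget that tracks usage per thread, returns a reminder near the limit, and stops work when the budget is spent. It is not Codex's implementation or configuration format. The class name, the `warn_fraction` threshold, and the exception are all hypothetical, chosen only to illustrate the behavior described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class BudgetExceeded(Exception):
    """Raised when recorded usage reaches the shared token budget."""


@dataclass
class TokenBudget:
    limit: int                      # total tokens allowed across all threads
    warn_fraction: float = 0.8      # hypothetical reminder threshold
    used_by_thread: Dict[str, int] = field(default_factory=dict)

    @property
    def used(self) -> int:
        return sum(self.used_by_thread.values())

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)

    def record(self, thread_id: str, tokens: int) -> Optional[str]:
        """Charge tokens to a thread; return a reminder once usage nears the limit."""
        self.used_by_thread[thread_id] = self.used_by_thread.get(thread_id, 0) + tokens
        if self.used >= self.limit:
            raise BudgetExceeded(f"budget of {self.limit} tokens exhausted")
        if self.used >= self.warn_fraction * self.limit:
            return f"{self.remaining} tokens remaining"
        return None


budget = TokenBudget(limit=1000)
print(budget.record("planner", 500))   # well under the threshold, no reminder
print(budget.record("worker", 300))    # crosses 80 percent, reminder returned
```

The design choice worth noting is that the budget is shared across threads rather than set per thread, so one runaway subagent cannot drain the allowance unnoticed.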
Several changes pointed at multi-agent work. The release notes reference inter-agent messaging primitives and improved progress visibility for subagents, tasks, and worktree creation. Codex also replaced its older skills manager with a dedicated skills service and added repair logic for invalid skill frontmatter, which matters for anyone building a library of reusable agent skills. On the reliability side, the team shipped fixes for long-thread loading, Linux terminal-interface stability, and recovery of execution sessions after a disconnect. Smaller touches included per-host personality settings with Friendly and Pragmatic options and the ability to edit goals directly in the composer.
None of these are headline features on their own. Together they show where agentic coding is going. The work is now about budgets, recovery, multi-agent coordination, and reusable skills, not just better autocomplete. The tools are growing the operational machinery that long-running, semi-autonomous work demands.
Enterprise Adoption and the Billing Backdrop
Adoption news tracked the product push. On June 22, OpenAI noted that Samsung Electronics is bringing ChatGPT and Codex to its employees, a sizable enterprise rollout that puts agentic coding in front of one of the largest electronics workforces in the world. Deals like this matter because they move AI coding from individual developer choice to company-wide standard, which changes how the tools get evaluated and governed.
The billing backdrop stayed contentious. GitHub Copilot moved all plans to usage-based billing with AI Credits on June 1, and the developer reaction kept rippling through June. Under the new model, each plan's sticker price holds but now buys a monthly credit allowance rather than setting a ceiling on spend, and premium model selections draw from the credit pool. The practical effect is that a single complex agentic session can consume a meaningful slice of a monthly allowance, and running many sessions a day adds real cost on top of the subscription. Cursor adjusted its team pricing on June 1 for similar reasons, trying to make spend more predictable for heavier seats. The throughline is that the expensive part of AI coding is no longer autocomplete. It is agents reading repositories, running tools, and reasoning across long tasks, and the pricing models are racing to reflect that.
Agents Move Into Team Surfaces
A related shift showed up in where these agents run. Coding assistants started in the editor, moved to the terminal, and are now reaching into the places where teams already coordinate. Agents that open pull requests, review changes, and run in continuous-integration pipelines put AI-written work directly where humans merge and ship. Tools that let a team summon an agent inside a chat channel, assign it a task, and get back a tracked piece of work blur the line between asking a colleague and asking an agent.
The trend matters because it changes who touches these tools. When an agent lives in the editor, it serves the individual developer. When it lives in the shared pull-request flow or the team chat, it becomes part of how the whole group works, which raises new questions about review, attribution, and control. Reported adoption figures suggest internal use is already heavy at the companies building these tools, with large shares of new code at some AI labs now drafted by their own agents under human review. Whatever the exact numbers, the direction is clear. Agentic coding is moving from a personal productivity tool toward a team-level capability, and the operational features shipping this week (budgets, recovery, multi-agent messaging, and reusable skills) are what make that team-level use safe to adopt.
This is also why knowledge work beyond code keeps coming up in the same breath as coding tools. The agentic model that runs a multi-file refactor is the same model that drafts a report, cleans a spreadsheet, or works through a research task across several steps. The companies that built terminal coding agents are extending the same pattern into general computing for non-developers, and the long-running-work practices from the Codex whitepaper apply just as well to an analyst running a multi-day project as to an engineer shipping a feature. The boundary between coding agent and work agent is thinning, and the tooling investments in continuity and context serve both.
One Instruction File to Rule the Agents
A quieter standard kept gaining ground inside the repository itself. The AGENTS.md convention turns a repo into an agent's onboarding guide, holding how to run tests, what style to follow, and what not to touch. OpenAI started it, Google, Cursor, and Sourcegraph joined, and since December 2025 it has sat under the Agentic AI Foundation at the Linux Foundation alongside the Model Context Protocol. Codex, Cursor, Copilot, and Windsurf all read AGENTS.md natively. Convergence stops short of total, since some tools still read their own instruction files, but the direction is clear. A single, portable instruction file lets an agent's behavior travel across tools, which lowers the cost of switching and raises the value of writing good agent guidance once.
The Three-Horse Race Tightens
The week's product moves play out against a market that has stopped being a one-vendor story. Two years ago the AI coding conversation was Copilot and everyone else. That is over. The JetBrains Developer Ecosystem Survey for 2026, run across more than 10,000 developers, put GitHub Copilot at 29 percent, Cursor at 18 percent, and Claude Code at 18 percent. The Stack Overflow Developer Survey told a similar story, with Copilot's share among professional developers sliding from 67 percent to 51 percent over roughly a year. The headline is not that any one tool won. It is that the field tightened into a real contest.
The reason the gap closed is convergence. By mid-2026, the leading agentic coding tools share one basic blueprint. An agent takes a task description, plans the work, edits across many files, runs tools and tests, and iterates until the job is done. The differences that remain are about interface and habit rather than raw capability. Some tools live in the terminal, some build a full editor around the agent, and some weave the agent into an existing platform. On standard coding benchmarks, the top scores now sit within a narrow band of each other, which quietly demoted the model itself as the deciding factor. When every serious tool can route to a strong model, the choice comes down to workflow fit, context reliability, cost, and how well the tool plays with the rest of a team's stack.
That convergence is exactly why the week's Codex updates matter. When models are close, the winners differentiate on operational features: budgets, recovery, multi-agent coordination, reusable skills, and long-running context. The Codex-maxxing whitepaper and the changelog both push on that axis. The same logic explains why portable standards like AGENTS.md and the Model Context Protocol gain so much weight. If a developer can carry their instruction files, their tool connections, and their skills across tools, then lock-in weakens and the tools compete on merit. For teams, the practical move is to treat the choice as reversible, standardize on the portable pieces, and run more than one tool where the workflows differ.
AI Processing: The Compute Stack Spreads Out
The hardware story this week was about diversification. No single vendor is being allowed to own every layer, and the past seven days brought fresh evidence on three fronts: who runs inference, who builds the chips, and who supplies the memory.
Anthropic Looks at Microsoft's Maia Silicon
The most telling inference story was a reported deal in the making. Anthropic is in early-stage talks with Microsoft to run Claude inference workloads on Microsoft's custom Maia 200 AI chips through Azure, with CNBC confirming the discussions. The Maia 200 launched in January 2026 on a TSMC 3-nanometer process, built specifically for inference, and Microsoft claims it delivers more than 30 percent better performance per dollar than rival silicon.
The strategic read is straightforward. Anthropic already spreads its compute across Nvidia GPUs, Amazon's Trainium accelerators, and Google's TPUs. Adding Microsoft's Maia widens that base further and cuts dependence on any one supplier. It also validates Microsoft's homegrown chip program, which has trailed the custom-silicon efforts at Amazon and Google. For the broader market, the signal is that frontier labs increasingly treat compute as a portfolio, matching workloads to whichever accelerator gives the best cost and availability rather than standardizing on one vendor.
Qualcomm Eyes Tenstorrent, SK Hynix Eyes Wall Street
Two more processing stories underscored the spread. Qualcomm is in early talks to acquire the AI-chip startup Tenstorrent for somewhere between 8 and 10 billion dollars. Tenstorrent designs AI chips on the open RISC-V instruction set and counts veteran chip architect Jim Keller among its leadership. A deal would put Qualcomm at a table currently dominated by Nvidia and AMD, and it would push the open RISC-V architecture deeper into AI accelerators. The bet reflects a wider belief that the AI hardware market is large enough to support credible challengers built on open standards rather than proprietary designs.
On the memory side, SK Hynix said it is seeking to raise roughly 29.4 billion dollars in a United States listing, with trading expected to start on July 10 and the proceeds aimed at building additional capacity. Memory is the unglamorous bottleneck of the AI boom. High-bandwidth memory sits next to every serious AI accelerator, and demand has outrun supply for two years. A raise of this size signals that memory makers expect the appetite for HBM to keep climbing as inference workloads scale, and it puts real money behind expanding the supply that every chip vendor depends on.
The Inference Shift Favors the CPU Again
A structural theme ran under these deals. Agentic inference is changing the balance of compute. The training era leaned on dense clusters of GPUs, with roughly one CPU for every four accelerators. Agentic inference, with its many small steps, tool calls, and orchestration, pulls that ratio back toward one CPU per GPU or tighter, according to analyst Ben Bajarin. That shift gives the CPU renewed importance in the data center and opens room for disaggregated designs that split a workload across different chips for different stages.
Recent rackscale work shows the pattern in practice. Vendors have demonstrated disaggregated inference that uses general-purpose CPUs for orchestration and execution, dataflow accelerators for the decode stage, and GPUs for the prefill stage, all coordinated as one system. The idea is to match each phase of inference to the silicon that runs it most cheaply rather than forcing the whole job onto one expensive chip. As agentic workloads grow, expect more of this phase-by-phase splitting, because it directly attacks the cost of serving long, tool-heavy agent runs at scale.
Open Models Train Off the Beaten Path
Hardware diversity is also showing up in how open models get built. A new open-weight model released under an Apache 2.0 license this month, Zyphra's ZAYA1-8B, was trained from scratch on AMD Instinct hardware. It uses a sparse routing design with 8 billion total parameters and only about 760 million active parameters per token. The detail that matters is the training hardware. A capable model trained end to end on AMD silicon shows that developers are no longer locked into Nvidia-only pipelines for high-throughput training. That widens the field for anyone building models outside the largest labs and adds competitive pressure across the accelerator market.
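The sparse-routing figures are easier to read as a ratio. The short calculation below uses only the two parameter counts quoted above. The comment that per-token compute tracks the active count is a common rule of thumb for sparse models, not a measurement of this one.

```python
total_params = 8_000_000_000    # 8 billion total parameters, as quoted above
active_params = 760_000_000     # about 760 million active per token, as quoted above

# Share of the model that participates in each token's forward pass.
active_fraction = active_params / total_params
print(f"active share per token: {active_fraction:.1%}")

# Rule of thumb (an approximation): per-token compute scales with the active
# count, so the model has the memory footprint of an 8B model but does far
# less arithmetic per token than a dense model of that size.
dense_to_sparse_ratio = total_params / active_params
print(f"total-to-active ratio: {dense_to_sparse_ratio:.1f}x")
```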
Nvidia, for its part, kept extending its own model and tooling stack with new open models aimed at physical AI and robotics, alongside edge-focused variants built for low-latency inference on local hardware. The company's strategy is to be present at every layer, from data-center training silicon to edge inference to the open models that run on it. The week's broader lesson is that the compute stack is fragmenting in a healthy way. More vendors, more architectures, and more training paths mean more choices for the teams that build and serve AI.
Why the Memory Story Is the Real Story
It is worth pausing on the SK Hynix listing, because memory sits at the heart of AI economics in a way that gets less attention than the chips themselves. Every serious AI accelerator pairs with high-bandwidth memory, and the speed at which a model reads and writes that memory often decides real-world performance more than raw compute. For two years, demand for high-bandwidth memory has outpaced supply, and the squeeze ripples through the whole market as pricing and availability for finished accelerators.
A raise near 30 billion dollars aimed at new capacity is a strong signal about where the supply side sees demand heading. Inference workloads, and agentic inference in particular, lean hard on memory because long contexts, large key-value caches, and tool-heavy runs all need fast access to a lot of state. As agents run longer and hold more context, the memory footprint per request grows. That is the same trend showing up in the coding tools, where long-running threads carry more context and cost more to run. The hardware and the software are describing the same shift from one angle each. Work is getting longer and more stateful, and the memory layer is where that shift gets paid for.
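A back-of-the-envelope calculation shows why longer contexts land on the memory layer. The sketch below uses the standard sizing formula for a transformer key-value cache: two tensors per layer, multiplied by the number of key-value heads, the head dimension, the sequence length, and the bytes per value. The model shape is hypothetical, picked only so the arithmetic comes out in round numbers, and does not describe any model named in this article.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Size of the key-value cache for one request, in bytes."""
    # Factor of 2: every layer stores one tensor of keys and one of values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape with 16-bit values (2 bytes each).
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for tokens in (8_192, 131_072):
    gib = kv_cache_bytes(tokens, **shape) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.0f} GiB of KV cache per request")
```

The cache grows linearly with context length, so a request holding sixteen times the context needs sixteen times the fast memory, before counting the model weights themselves.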
Disaggregation Targets the Cost of Serving Agents
The move toward disaggregated inference deserves a closer look, because it is a direct response to the economics of agentic work. A single agent run is not one uniform computation. It has a prefill stage that processes the prompt, a decode stage that generates output token by token, and orchestration that coordinates tool calls and steps. Each stage stresses different parts of the hardware. Forcing the whole job onto one expensive accelerator wastes capacity, since the chip that excels at prefill sits idle during decode and vice versa.
Disaggregated designs split those stages across the silicon that runs each one most cheaply. A general-purpose CPU handles orchestration and execution, a dataflow accelerator handles decode, and a GPU handles prefill, all stitched into one system. The payoff is cost. As agentic workloads scale into the millions of runs, shaving the cost of each phase compounds into large savings. This is why the renewed importance of the CPU is more than a talking point. The agentic era rewards systems that route each piece of work to the right chip, and that favors flexible, mixed architectures over monolithic GPU clusters. Expect the next year of data-center design to keep moving in this direction, because the math of serving agents demands it.
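The routing idea can be sketched in a few lines. The stage-to-hardware mapping below comes straight from the description above. The stage names, the sample run, and the `route` helper are hypothetical, and a real scheduler would also weigh queue depth, data movement, and cost.

```python
from collections import Counter

# Stage-to-hardware mapping as described in the text above.
STAGE_TO_DEVICE = {
    "orchestrate": "cpu",
    "prefill": "gpu",
    "decode": "dataflow_accelerator",
}


def route(run):
    """Return (stage, device) pairs for the stages of one agent run."""
    return [(stage, STAGE_TO_DEVICE[stage]) for stage in run]


# Hypothetical agent run: plan, read the prompt, generate, call a tool, repeat.
agent_run = ["orchestrate", "prefill", "decode", "orchestrate", "prefill", "decode"]

placements = route(agent_run)
device_load = Counter(device for _, device in placements)

for stage, device in placements:
    print(f"{stage:<12} -> {device}")
print(dict(device_load))
```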
What the Hardware Week Adds Up To
Step back from the individual deals and a single picture comes into focus. A frontier lab studying custom silicon from a cloud provider, a mobile chip giant in talks to buy a RISC-V startup, a memory maker raising tens of billions for new capacity, and an open model trained end to end on non-Nvidia hardware are four faces of the same shift. The industry spent the first phase of the AI buildout pouring money into a narrow set of training chips. The next phase spreads that spending across a wider stack, because the workload itself changed. Training a model once is a fixed cost. Serving agents that run for minutes and call many tools is a recurring cost that scales with usage, and recurring costs reward optimization in a way that one-time costs do not.
For data and platform teams, the takeaway is practical rather than abstract. The accelerator a workload runs on is becoming a choice rather than a default, and that choice now has real cost consequences. A team that builds its inference pipeline to assume one vendor's chips bakes in a dependency that the market is actively trying to break. The safer posture is to design for portability at the model-serving layer, keep an eye on price-per-token across providers, and treat the hardware mix as something to revisit each quarter. The companies making these moves are betting that no single chip wins every job. The teams that buy their compute should plan as if that bet pays off, because the early signs say it will.
Standards & Protocols: MCP Locks In a Stateless Core
The standards story of the week was the maturation of the Model Context Protocol, the open standard that connects AI agents to tools, data, and services. A release candidate for the next version, dated 2026-07-28, is the largest revision since the protocol launched, and this week brought fresh community analysis of what it changes and why it matters.
Stateless by Design
The headline change is that MCP becomes stateless at the protocol layer. In earlier versions, a client using the Streamable HTTP transport established a session first, the server returned a session identifier, and the client carried that identifier into later requests. That worked during experimentation and grew painful in production, where remote MCP servers run behind load balancers, gateways, and rate limiters that all had to understand sessions and route traffic with sticky behavior.
The release candidate removes that burden. A request like a tool call can now be self-contained, carrying the protocol version, client info, and capabilities with it. New headers let infrastructure route traffic without inspecting the request body. The result is ordinary HTTP behavior. A remote MCP server that once needed sticky sessions, a shared session store, and deep packet inspection at the gateway can now run behind a plain round-robin load balancer. As the Agentic AI Foundation put it in a post this week on the release, scaling and debugging MCP servers starts to look like scaling and debugging any other web service. That is a practical win, because a protocol that demands special infrastructure too early gets avoided or wrapped in fragile workarounds.
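A small simulation shows why self-contained requests remove the need for sticky sessions. In this sketch every request carries its own protocol version, client info, and capabilities, so a plain round-robin rotation can hand it to any instance. The field names, the tool name, and the server names are illustrative assumptions, not the MCP wire format or its header names.

```python
import itertools
import json


def handle(instance_name, raw_request):
    """Stateless handler: everything it needs arrives inside the request."""
    req = json.loads(raw_request)
    meta = req["meta"]  # hypothetical field holding per-request metadata
    return {
        "served_by": instance_name,
        "protocol_version": meta["protocol_version"],
        "result": f"ran {req['tool']}",
    }


# Illustrative request shape; not the real protocol encoding.
request = json.dumps({
    "tool": "search_issues",
    "meta": {
        "protocol_version": "2026-07-28",
        "client": "example-client",
        "capabilities": ["tools"],
    },
})

# Plain round-robin rotation with no session affinity.
instances = itertools.cycle(["server-a", "server-b", "server-c"])
responses = [handle(next(instances), request) for _ in range(4)]
print([r["served_by"] for r in responses])
```

Every instance returns the same result for the same request, which is the property that lets ordinary load balancers do the routing.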
The release also draws a clean line between protocol state and application state. Real workflows still need memory, since a server often needs to know which repository an agent is analyzing or which browser session an automation is driving. The change moves that state out of hidden transport metadata and into explicit handles. A tool returns a handle, and the model passes that handle back in later calls. The state becomes visible, loggable, and safe to move across server instances. That visibility is the kind of design choice that separates systems people can operate from clever demos nobody wants to maintain.
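The handle pattern looks roughly like this in practice. The sketch assumes a shared store that every server instance can reach (a dictionary stands in for a database or cache), and the tool names and handle format are hypothetical.

```python
import uuid

# Stand-in for a shared store reachable from every server instance.
SHARED_STATE = {}


def open_repository(path):
    """Tool call that creates application state and returns an explicit handle."""
    handle = f"repo-{uuid.uuid4().hex[:8]}"
    SHARED_STATE[handle] = {"path": path, "files_read": 0}
    return {"handle": handle}


def read_file(handle, filename):
    """Later tool call: the caller passes the handle back as a plain argument."""
    state = SHARED_STATE[handle]  # any instance can resolve the handle
    state["files_read"] += 1
    return {
        "handle": handle,
        "file": f"{state['path']}/{filename}",
        "files_read": state["files_read"],
    }


opened = open_repository("/srv/example-repo")
first = read_file(opened["handle"], "README.md")
second = read_file(opened["handle"], "setup.py")
print(second["files_read"])
```

Because the handle travels in the tool arguments, it shows up in logs and traces like any other parameter, which is the visibility the paragraph above describes.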
Extensions, Tasks, and Apps
Beyond the stateless core, the release candidate ships an extensions framework that lets new capabilities evolve on their own timeline rather than waiting on the base specification. Extensions are identified by reverse-DNS names, negotiated through a capabilities map on both client and server, and maintained in their own repositories with independent versioning. A formal track moves an extension from experimental to official. This is how the protocol grows without bloating the core for everyone.
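Capability negotiation of this kind usually reduces to an intersection. The sketch below assumes each side advertises a map keyed by reverse-DNS extension name and that an extension is active only when both sides list it. The extension names and the settings payloads are made up for illustration and are not taken from the specification.

```python
def negotiate(client_caps, server_caps):
    """Return the extensions both sides declare, with each side's settings."""
    shared = sorted(client_caps.keys() & server_caps.keys())
    return {
        name: {"client": client_caps[name], "server": server_caps[name]}
        for name in shared
    }


# Made-up reverse-DNS extension names and settings.
client_extensions = {
    "com.example.tasks": {"version": "1"},
    "com.example.apps": {"version": "2"},
}
server_extensions = {
    "com.example.tasks": {"version": "1"},
    "org.example.audit": {"version": "1"},
}

active = negotiate(client_extensions, server_extensions)
print(list(active))
```

An extension that only one side supports is simply left out, so the base protocol keeps working when a client and server disagree about optional features.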
Two capabilities ride on that framework. The Tasks extension handles long-running work, letting a server run a background workflow, report progress, and deliver results asynchronously. Early production use surfaced concrete gaps to close, like retry behavior when a task fails and expiry policies for how long results stick around after completion. MCP Apps, which became the first official extension earlier in 2026, extends the protocol beyond text into interactive interfaces. Servers return rich interfaces that render in sandboxed frames inside host applications like Claude and ChatGPT. Together these shift MCP from a pure command-execution protocol toward a collaboration framework, where servers can think, ask questions, run multi-step work, and present real interfaces.
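The two gaps named above, retry behavior and result expiry, can be illustrated with a toy task object. This is a sketch under stated assumptions, not the Tasks extension's API: the class, its fields, and both policies are hypothetical, and the work runs synchronously here where a real server would run it in the background.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Task:
    work: Callable[[], Any]
    max_attempts: int = 3          # hypothetical retry policy
    result_ttl_s: float = 60.0     # hypothetical expiry policy
    status: str = "pending"
    attempts: int = 0
    result: Any = None
    finished_at: Optional[float] = None

    def run(self, now: Callable[[], float] = time.monotonic) -> None:
        """Attempt the work until it succeeds or attempts run out."""
        while self.attempts < self.max_attempts:
            self.attempts += 1
            try:
                self.result = self.work()
            except Exception:
                self.status = "failed"
                continue           # retry while attempts remain
            self.status = "completed"
            self.finished_at = now()
            return

    def fetch(self, now: Callable[[], float] = time.monotonic) -> Any:
        """Return the result, or None if it expired or never completed."""
        if self.status != "completed":
            return None
        if now() - self.finished_at > self.result_ttl_s:
            self.status = "expired"
            self.result = None
        return self.result
```

A client polling `fetch` sees the result until the time-to-live passes, after which the task reports as expired and the stored result is dropped.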
Authorization and Observability Grow Up
The release hardens authorization to align more closely with OAuth and OpenID Connect deployments, which is the top concern for any organization weighing MCP for production. It also formalizes distributed tracing by locking down trace-context propagation, so a trace that starts in a host application can follow a tool call through the client, the server, and whatever the server calls downstream, then show up as one connected view in standard observability tooling. A formal deprecation policy rounds out the release, giving implementers confidence that what they build on this version keeps working as the protocol evolves.
The timeline gives teams room to prepare. The release candidate is available now, the final specification is set for July 28, 2026, and the ten-week window exists for SDK maintainers and client builders to validate the changes against real workloads. The release contains breaking changes, so the advice from maintainers is blunt. If your agentic system depends on MCP servers, test the parts that fail first. Find any hidden session dependency now, before the change forces it into the open. For teams running agents in production, this is the most important infrastructure read of the season.
Security Catches Up to Capability
The push toward stateless transport and tighter authorization is not happening in a vacuum. MCP grants language models access to real systems, and that power has drawn real scrutiny. Security researchers have flagged prompt injection and poisoned tools as live risks, where a malicious tool description or a hostile input steers an agent into leaking data through another connected tool. The protocol's evolving authorization model, which classifies servers as OAuth resource servers and requires clients to scope tokens carefully, exists to shrink that attack surface.
The new release continues that hardening by aligning authorization with established OAuth and OpenID Connect deployments, which is what enterprise security teams already know how to reason about. The lesson of the past year is that an agent protocol cannot treat security as an afterthought. Every new capability, from server-side reasoning to interactive interfaces to long-running tasks, adds surface that an attacker probes. The protocol getting more careful about identity, token scope, and observability is the standard maturing into something a regulated business can actually deploy. For any team running MCP servers that touch sensitive systems, the auth changes in this release are not optional reading.
Enterprise Readiness Is the Next Frontier
The 2026 roadmap for the protocol named enterprise readiness as a top priority, and the threads this week showed why. Companies deploying MCP keep hitting the same set of problems: audit trails, single-sign-on integrated authentication, gateway behavior, and configuration that travels cleanly between environments. These are not glamorous features. They are the difference between a protocol that survives a security review and one that gets blocked at the door.
The maintainers have signaled that most enterprise work lands as extensions rather than changes to the core, which keeps the base protocol light for everyone while letting large deployments add what they need. That design choice is the same instinct behind the stateless core and the extensions framework. Keep the foundation small and stable, and let capabilities grow on their own tracks. The trace-context propagation in this release is a concrete example. By fixing the names of the trace headers, a request that starts in a host application can be followed all the way through the client, the server, and downstream calls, then viewed as one connected trace in standard observability tooling. That kind of end-to-end visibility is exactly what an operations team needs before it trusts agents with production work.
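Trace propagation can be sketched with the W3C Trace Context `traceparent` header, which is a widely used format for this job. Whether the release uses exactly this header is an assumption here, since the text above only says the header names are fixed. The sketch shows the core rule: every hop keeps the trace identifier and mints a new span identifier.

```python
import secrets


def new_traceparent():
    """Start a trace: version, 16-byte trace id, 8-byte span id, sampled flag."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(parent):
    """Keep the trace id and flags, mint a fresh span id for the next hop."""
    version, trace_id, _parent_span, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


host = new_traceparent()             # host application starts the trace
client = child_traceparent(host)     # client forwards it with its own span
server = child_traceparent(client)   # server passes it to a downstream call

for hop, header in [("host", host), ("client", client), ("server", server)]:
    print(f"{hop:>6}: traceparent={header}")
```

Because all three headers share one trace identifier, an observability backend can stitch the hops into a single connected view.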
Adoption Keeps Compounding
The release matters because the adoption underneath it is large and still growing. The Python and TypeScript software development kits together draw tens of millions of monthly downloads. Official and community servers cover a long list of services, from Slack, GitHub, and Google to Stripe, Notion, Linear, Sentry, and Figma. The central registry that indexes those servers has grown into the thousands of entries. Coding tools, integrated development environments, and code-intelligence platforms have adopted the protocol to give their agents real-time access to project context.
That scale is why the stateless rework is the right work at the right time. A protocol that thousands of teams depend on cannot keep demanding sticky sessions and special gateway logic. Trading early-stage flexibility for the boring reliability of ordinary web infrastructure is how a standard graduates from promising to permanent. The community spent this week reckoning with that trade, and the direction is healthy. The protocol is being shaped by the people running it in production, which is the surest sign that it has crossed from experiment into infrastructure.
The Agent Interoperability Layer Fills In
MCP does not stand alone. It now sits inside a growing stack of open standards under the Agentic AI Foundation at the Linux Foundation, which Anthropic, Block, and OpenAI co-founded when the protocol was donated in December 2025, with backing from Amazon, Google, Microsoft, Cloudflare, GitHub, and others. MCP handles how an agent reaches tools and data. The Agent2Agent protocol handles how agents talk to each other. The AGENTS.md convention handles how an agent learns a codebase. Each solves a distinct slice of the interoperability problem, and putting them under one neutral foundation reduces the risk that any single vendor steers the standards toward its own products.
With software development kit downloads in the tens of millions a month and a registry in the thousands of entries, the protocol has moved from an interesting experiment to infrastructure that teams treat as a standard feature request when they evaluate AI development tools. The work this week is the protocol catching up to that reality, trading early-stage flexibility for the operability that serious deployments need.
Test-Time Compute Becomes a Product
One model-side development tied the standards and processing themes together. Google launched a new flagship Gemini Pro model with an extended reasoning mode and a 2-million-token context window on June 22, according to reporting. The reasoning mode is the notable part for this section. It lets the model spend significantly more compute on a hard problem before answering, which makes test-time compute a visible product feature rather than a hidden implementation detail. The pattern matters because it changes how developers budget for AI. The cost of a task now depends not just on the model but on how much thinking the model does, which connects directly to the token-budget controls that coding tools like Codex shipped this same week. The industry is converging on a model of work where compute scales with problem difficulty, and the tooling is racing to give developers control over that dial.
The 2-million-token window deserves its own note. A context that large changes what an agent can hold in working memory at once. A coding agent can read an entire mid-sized repository, a research agent can keep a stack of long documents in view, and a data agent can carry a full schema and a run of query history without dropping older context. That capacity feeds straight back into the week's other themes. Bigger contexts need more memory bandwidth, which is the hardware story, and they make long-running agent sessions more capable, which is the coding-tools story. The pieces keep pointing at the same destination. Models that think longer, hold more, and run longer are the direction of travel, and every layer of the stack is being rebuilt to support that pattern at a price teams can sustain.
The Week in One Line
Coding tools learned to run long and keep their own context. The protocol under the agents shed its session baggage and learned to scale like a normal web service. And the silicon underneath all of it spread across more vendors, more architectures, and more training paths. The common thread is maturation. The pieces of the agentic stack are growing the operational machinery (budgets, recovery, statelessness, observability, and hardware choice) that production work actually requires.
Notice how tightly the three stories connect. Long-running agent work needs more context, which needs more memory, which is exactly what the chip and memory deals are racing to supply. That same long-running work needs a protocol that scales cleanly across servers, which is what the stateless MCP core delivers. And the budget controls shipping in coding tools answer the same cost pressure that disaggregated inference attacks at the hardware level. The agentic era has a consistent shape across every layer. Work is getting longer, more stateful, and more autonomous, and each layer of the stack is adapting to carry that weight without falling over or breaking the bank.
What to Watch Next
A few threads are worth following into next week and beyond. The Model Context Protocol release candidate moves toward its final specification on July 28, and the validation window is the time for any team running MCP servers to test for hidden session dependencies before the breaking changes land. Watch for software development kits to ship support and for early reports on how the stateless transport behaves under real load.
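One way to look for hidden session dependencies is to replay the same request sequence against a single instance and against several instances in round-robin, then compare the answers. The sketch below is a self-contained simulation of that idea. It does not use any MCP SDK, and the request shape, the `open` and `shout` method names, and both server classes are hypothetical stand-ins rather than real protocol messages.

```python
from itertools import cycle


class StatefulServer:
    """Keeps session state in process memory (the pattern to detect)."""

    def __init__(self):
        self.sessions = set()

    def handle(self, request):
        if request["method"] == "open":
            self.sessions.add(request["session"])
            return {"ok": True}
        if request["session"] not in self.sessions:
            return {"error": "unknown session"}
        return {"result": request["text"].upper()}


class StatelessServer:
    """Answers every request from the request contents alone."""

    def handle(self, request):
        if request["method"] == "open":
            return {"ok": True}
        return {"result": request["text"].upper()}


def survives_round_robin(server_factory, requests, replicas=2):
    """True if spreading requests across replicas matches a single instance."""
    single = server_factory()
    expected = [single.handle(r) for r in requests]
    pool = cycle([server_factory() for _ in range(replicas)])
    actual = [next(pool).handle(r) for r in requests]
    return actual == expected


REQUESTS = [
    {"method": "open", "session": "s1"},
    {"method": "shout", "session": "s1", "text": "hello"},
    {"method": "shout", "session": "s1", "text": "again"},
]

if __name__ == "__main__":
    print("stateful survives:", survives_round_robin(StatefulServer, REQUESTS))
    print("stateless survives:", survives_round_robin(StatelessServer, REQUESTS))
```

The same comparison can be pointed at a real deployment by swapping the factories for clients that target individual replicas behind the load balancer. Any divergence between the single-instance and round-robin runs suggests state that lives in one process.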
On the hardware side, watch whether the Anthropic and Microsoft inference talks firm up into a deal, since a frontier lab running on Microsoft's custom silicon would be a real validation of the homegrown-chip strategy. Watch the Qualcomm and Tenstorrent discussions for the same reason, as a sign of how much room open RISC-V designs have in AI. The SK Hynix listing on July 10 should give a fresh read on how the supply side sees memory demand.
In coding, the question is whether the operational features (long-running threads, budgets, multi-agent coordination, and reusable skills) become the new battleground now that raw model quality has converged. The tools that make long, team-level agentic work safe and predictable have the edge. For developers and the teams that run them, the smartest move this season is to invest in the portable pieces: write good agent instructions once, standardize on open protocols, and keep the tool choice itself flexible while the field keeps moving.