With the launch of GLM-5.2 this week, I see everyone asking "have open models caught up to closed models?"
The more interesting question that's getting missed: what can you do with an open model that you can't do with a closed one?
You can specialize them. And when you do, the
What is AI inference engineering, why is it such an in-demand skill, and how do you break into the field?
With author of Inference Engineering @philipkiely and head of training at Baseten @oneill_c
0:00: What is inference?
2:47: History of inference
4:59: Downstream effects
"Frontier models for the hardest general intelligence and post-trained open source for high-volume and specialized workloads... Many specialized models, serving many specialized workflows, inside many specialized products."
Thank you, Apoorv, for taking the time to write about
You can now access our GLM-5.2 API through the Merge Gateway!
GLM-5.2 matches frontier model intelligence while running 4x+ faster and at 1/5th the cost.
Try it out: merge.dev/gateway
"That's when they come to open-source models, that's when they come to Baseten, that's when they come to post-train models on Baseten, to be able to do it better, faster, and cheaper. That's when you get both intelligence everywhere and unit economics that make sense for your
Thanks to @EdLudlow for having us on Bloomberg Tech yesterday to talk about our latest fundraise and the growing number of companies owning their open and specialized models.
Excited to be a day 0 launch partner for BioNeMo, NVIDIA's new, fully-open agent toolkit for scientific workflows!
All 10 BioNeMo NIMs are available in our model library. Learn more in our announcement: baseten.co/blog/nvidia-bi…
Science is entering a new era - one where AI agents can do scientific work.
🧬 Today NVIDIA is launching the BioNeMo Agent Toolkit - an open, agent-ready toolkit that gives any AI agent callable tools for protein structure prediction, molecular docking, generative chemistry,
Tutorial on how to use GLM-5.2 in Claude Code (bookmark this)
~4.5x faster & ~5x cheaper compared to Opus 4.8!
1. Install the latest Claude Code
npm install -g @anthropic-ai/claude-code
2. Create an account at baseten.co.
3. Grab an API Key from
We have the fastest GLM-5.2 deployment on the market: >280 tok/s and <0.8s ttft, according to Artificial Analysis. This same performance carries across all post-trained variants.
These aren’t vanity metrics. Optimizations like these save our customers tens of millions of dollars
We closed our Series F today at a $13B valuation.
Our inference business grew 20x in the last year. I want to explain why:
The growth comes from a shift I think is permanent: companies want to own their intelligence layer. Instead of relying exclusively on closed models, teams
The GLM moment is going to be bigger than the DeepSeek moment.
Baseten has the fastest inference on the best open-weight model: >280 tok/s and <0.8s TTFT.
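
To put those two numbers in terms of wall-clock time, here is a rough back-of-the-envelope sketch. It assumes a simplified model in which a streamed response takes about TTFT plus output tokens divided by throughput. The function name and defaults are illustrative only, not part of any Baseten API. Because the quoted figures are bounds (>280 tok/s, <0.8s), the result reads as an approximate ceiling under this model, not a measurement.

```python
def estimate_response_seconds(
    output_tokens: int,
    ttft_s: float = 0.8,          # quoted bound: time to first token under 0.8s
    tokens_per_s: float = 280.0,  # quoted bound: throughput above 280 tok/s
) -> float:
    """Approximate time to stream a full response.

    Simplified model (an assumption): total = TTFT + output_tokens / throughput.
    It ignores network jitter, queueing, and prompt-length effects.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    # Print the estimate for a few illustrative response lengths.
    for n in (100, 500, 2000):
        print(f"{n:>5} output tokens: roughly {estimate_response_seconds(n):.2f}s or less")
```

For short responses the TTFT term dominates, while for long ones throughput does, which is why both numbers are usually quoted together.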