Structured Outputs: The Building Blocks of Reliable AI

Apr 9, 2025

“The real reason that you use structure generation is because you can performantly generate the same thing every time. And what that means is you can build giant programs… You could build an automated hedge fund… Every single one of those agents can be specialized to their particular task. You know, they’re always gonna pass back JSON. You can actually build very large programs with this… And that’s ultimately the thing that you get from structured generation is extreme reliability.” — Cameron Pfiffer

Weaviate Podcast #119 with Will Kurt and Cameron Pfiffer from dottxt.ai, hosted by Connor Shorten!

Bridging the Gap Between Raw AI Output and Structured Data

In a landscape flooded with AI tools promising to revolutionize workflows, the open-source library Outlines stands apart by solving a fundamental challenge: ensuring language models produce outputs in precisely the format you need. This seemingly simple capability unlocks entirely new applications while making existing ones more reliable.

The latest Weaviate Podcast features Will Kurt and Cameron Pfiffer from dottxt.ai, the creators of Outlines, one of the world’s most popular open-source libraries for structured outputs with constrained decoding. Their approach to controlling token sampling at the logit level has fundamentally changed how developers can implement LLMs in production environments, enabling everything from reliable JSON generation to zero-shot classifiers and multi-task inference patterns that were previously unstable or impossible.

This conversation is particularly relevant to Weaviate users working with vector databases, as structured outputs provide the reliability needed when connecting AI-generated content to database operations, a critical requirement for enterprise applications.

The Journey to dottxt.ai: Bayesian Roots and Structured Thinking

Both Will Kurt and Cameron Pfiffer bring intriguing backgrounds to their work at dottxt.ai. Will, a published author on Bayesian statistics, first encountered the structured generation concept through a paper by Normal Computing founders. Initially impressed by the approach’s elegance, he later joined dottxt.ai as employee number two after the founders established the company.

Cameron, with a background in probabilistic programming languages, discovered Outlines through Twitter. After completing his postdoc, he joined the team, bringing his economics expertise into a space where statistical precision meets practical application.

Their shared Bayesian background proves surprisingly relevant to structured generation. The probabilistic foundations of constrained decoding align well with Bayesian principles of managing uncertainty within defined boundaries.

Structured Outputs for Beginners: Beyond Just JSON

For developers new to the concept, structured outputs provide guarantees about the format an LLM will produce. While JSON is the most obvious application, the team emphasizes that structure exists in virtually all communication formats:

“Structure isn’t just JSON. It’s really everything has some structure. Like when you write an email, there’s a format to that you have. When you tweet, there’s a format to that. LinkedIn posts have a format. Even just things like zero-shot classifiers, people often forget,” Will explains.

The core innovation is controlling which logits (token probabilities) are allowed during the inference process based on a defined structure. This approach unlocks entirely new applications:

  • Knowledge graph construction with predefined ontologies
  • Data annotation at scale with guaranteed output formats
  • Information extraction with consistent formatting
  • Function calling capabilities for smaller models
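
The logit-level control described above can be illustrated with a minimal, self-contained sketch (this is not the Outlines implementation, just the core idea): tokens that would violate the structure get their logits set to negative infinity, so softmax assigns them zero probability and sampling can never pick them.

```python
import math

def mask_logits(logits, allowed_ids):
    # Disallowed tokens get -inf, making them impossible to sample.
    return [x if i in allowed_ids else float("-inf") for i, x in enumerate(logits)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0
    s = sum(exps)
    return [e / s for e in exps]

# Toy vocabulary of 4 tokens; only tokens 0 and 3 are structurally valid here.
logits = [1.2, 0.4, -0.7, 2.1]
probs = softmax(mask_logits(logits, {0, 3}))
# Tokens 1 and 2 now have exactly zero probability; 0 and 3 share the mass.
```

Because the mask is applied before sampling, the guarantee is absolute rather than statistical: an invalid token is unreachable, not merely unlikely.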

Particularly noteworthy is the ability to implement function calling even with smaller open models that don’t natively support it — a significant advantage for developers working with resource constraints or open-source tools.

Metadata Extraction: Turning Unstructured Documents into Structured Data

One of the most powerful applications of structured outputs is extracting consistent information from unstructured documents. Consider parsing SEC 10K documents or extracting phone numbers from data with inconsistent formatting:

“Even for information extraction, imagine you wanna get phone numbers out of a dataset, but your users have the phone numbers in all kinds of different formats, but you want them in a consistent output… When you think, okay, well, how many different formats do people write phone numbers in? Now, if that’s your task, you’re gonna start going, oh, this is a pain. I have to do all of these fuzzy matches. But LLMs are good at pulling it out,” Will notes.

With structured generation, not only can you guarantee you get just a phone number, but you can also specify the exact format you want it returned in. This capability extends to complex document processing, where predefined schemas guide extraction to ensure consistency across thousands of documents.
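
As a concrete sketch of the phone number case (the pattern here is invented for illustration): the target format can be written as a regular expression, and a structured-generation library can compile such a pattern into the constraint that the model's output must satisfy. Plain Python `re` shows what the constraint accepts and rejects.

```python
import re

# Hypothetical target format: "(212) 555-0187". Under constrained decoding,
# the model could only ever emit strings matching this pattern.
PHONE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

assert PHONE.fullmatch("(212) 555-0187")
# Other notations users write are simply unreachable as outputs:
assert not PHONE.fullmatch("212-555-0187")
assert not PHONE.fullmatch("212.555.0187")
```

The LLM handles the fuzzy matching (recognizing a phone number in any notation), while the constraint handles the normalization (emitting it in exactly one notation).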

For businesses using Weaviate, this means confidently extracting and storing structured information from documents without worrying about format inconsistencies breaking downstream processes.

Structured Reasoning: Controlling How Models Think

Perhaps the most surprising discovery is that structured outputs can actually improve model reasoning rather than degrade it. This counterintuitive finding emerged from extensive testing:

“In my first pass at this, I was like, oh, like it’s close, but it’s not quite performing as well as the baseline, which is not structured. So I was like, that’s a bummer. But then I realized I had a very tight constraint on how many characters the model could think for… So by simply changing how many characters it could think for, the performance like exceeded what was the benchmark.”

This allows developers to define what the reasoning process looks like, including how many characters it can use — effectively giving fine-grained control over the thinking process itself. The team has conducted fascinating experiments, including:

  • Removing specific characters (like “R”) to see how models adapt their reasoning
  • Restricting mathematical operators to test problem-solving flexibility
  • Enforcing specific formats within “thinking” blocks

These capabilities allow developers to create more predictable reasoning patterns while actually improving performance on tasks like GSM-8K mathematical problem-solving.
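
The "how many characters it can think for" constraint can be expressed as a bounded repetition in a regular expression. The pattern below is a hypothetical sketch (tag names and bounds invented for illustration), requiring 50 to 200 characters of reasoning before a numeric answer:

```python
import re

# Hypothetical reasoning constraint: a bounded "thinking" block, then "Answer: <number>".
REASONING = re.compile(r"<think>[\s\S]{50,200}</think>\nAnswer: \d+")

sample = (
    "<think>"
    "Each of 3 boxes holds 4 apples, so multiply 3 by 4 to count them all."
    "</think>\nAnswer: 12"
)
assert REASONING.fullmatch(sample)
# Too little thinking is rejected by the constraint:
assert not REASONING.fullmatch("<think>too short</think>\nAnswer: 12")
```

Tuning the `{50,200}` bounds is exactly the knob Will describes: loosening the character budget was what let the structured version exceed the unstructured baseline.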

Report Generation: From Structured Data to Rich Presentations

Taking structured outputs to their natural conclusion, the team demonstrated how structured generation can produce complete reports that render directly to different formats. Cameron described a holiday gift recommendation project:

“What I built was a super simple web app… you type in some information about the person. And then it’ll pass it to the model and the model has a very specific output that fills out a report… This is my understanding of this person. This is like a style of gifts that we might wanna provide and then a list of gifts.”

The structured output from the model contained HTML divs and formatting tags that rendered directly into a complete webpage — demonstrating how structured generation can eliminate traditional backend requirements for simple applications.
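
A minimal sketch of that pattern (field names and HTML shape invented here, not taken from Cameron's app): once the model's output is guaranteed to match a schema, rendering it to a webpage is a pure templating step with no backend logic.

```python
import html

# Stand-in for a schema-constrained model output for the gift report.
report = {
    "understanding": "Enjoys hiking and coffee",
    "gift_style": "practical outdoor gear",
    "gifts": ["insulated mug", "trail map set"],
}

def render(report):
    # Render the structured report directly to an HTML fragment.
    items = "".join(f"<li>{html.escape(g)}</li>" for g in report["gifts"])
    return (
        f"<div><p>{html.escape(report['understanding'])}</p>"
        f"<p>Style: {html.escape(report['gift_style'])}</p>"
        f"<ul>{items}</ul></div>"
    )
```

Because the structure is guaranteed, `render` never needs to handle missing keys or malformed values; the constraint did that work at generation time.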

Will expanded on this potential: “There’s a whole nother layer where like as a developer, how, I think we’ve all written like the, I’ll have to regenerate a PDF at the end of this. And it’s annoying… what if the model could just reliably literally output to file name.pdf and it’s a valid PDF document.”

This suggests future capabilities where models could generate:

  • Valid PDF documents
  • PowerPoint presentations
  • Complete application interfaces
  • Structured database inputs

For Weaviate users, this represents an opportunity to generate complete visualization layers directly from query results without intermediate processing steps.

Multi-Task Inference: Getting More from a Single Call

Counter to traditional software engineering principles that favor breaking problems into smaller components, the team discovered that models often perform better when handling multiple related tasks simultaneously:

“When you do this, you’re forcing the model to start contextualizing itself. So the farther it gets, as it prints out the actual JSON, can look at the stuff that it’s already said, and it can kind of reinforce it in the same way that chain of thought works.”

This approach produces better results while reducing costs by eliminating multiple API calls and repetitive context loading. Tests with document processing showed improved performance when extracting multiple fields in a single inference compared to separate, focused extractions.
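
As an illustration (this schema is invented, not from the podcast), a single JSON Schema can bundle several related extractions into one call, so each field the model fills in becomes context that conditions the later fields:

```python
# One schema, one inference call, four related outputs. Field order matters:
# earlier fields (title, summary) ground the later ones (sentiment, entities).
multi_task_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
        "sentiment": {"enum": ["positive", "negative", "neutral"]},
        "entities": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "summary", "sentiment", "entities"],
}
```

Compared to four separate calls, this also amortizes the prompt context once instead of reloading it per task.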

For Weaviate users orchestrating complex queries, this insight suggests opportunities to combine related operations into cohesive requests that leverage this contextual advantage.

Hidden Gems: The Technical Magic Behind Outlines

The technical implementation of Outlines hinges on formal language theory — specifically, deterministic finite automata that map regular expressions to state machines:

“Everything is driven by a regular expression under the hood… any regular expression can be mapped to like a finite state machine… once you have this finite state machine, it’s actually very fast to keep track of where you are as a regular expression goes and which are allowable tokens from that state.”
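
The regex-to-state-machine idea can be shown with a toy, hand-written DFA (a sketch of the concept, not Outlines' actual compiled index). For the pattern `ab*c`, each state knows exactly which symbols may come next, so checking the allowed set is a dictionary lookup:

```python
# Hand-built DFA for the regex "ab*c": state -> {symbol: next_state}.
DFA = {
    0: {"a": 1},
    1: {"b": 1, "c": 2},
    2: {},  # accepting state
}

def allowed(state):
    # The allowed next symbols are just the keys of the transition table --
    # this lookup is what makes per-token masking so cheap.
    return set(DFA[state].keys())

state = 0
for ch in "abbc":
    assert ch in allowed(state)
    state = DFA[state][ch]
assert state == 2  # reached the accepting state
```

In the real library the alphabet is the tokenizer's vocabulary rather than single characters, but the principle is the same: tracking the current state and reading off the allowed transitions costs almost nothing per token.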

This implementation provides several surprising benefits:

  1. Minimal Inference Cost: The computation overhead is negligible — “microseconds” according to Will — making it practical even for high-performance production environments.
  2. Potential Performance Gains: The team has identified opportunities for “coalescence” where known structural elements can be fast-forwarded through, potentially offering 2–3x speed improvements for highly structured outputs.
  3. Integration Flexibility: Outlines is being integrated across the ecosystem, including vLLM, TGI (Hugging Face’s Text Generation Inference), and NVIDIA’s Inference Microservices (NIM).

Particularly intriguing is the emerging research around token selection strategies. Early findings suggest models may be biased toward smaller tokens when larger tokens might produce better outcomes — potentially unlocking performance gains without additional training.

Practical Takeaways: Implementing Structured Outputs

For Beginners

Start with simple, well-defined structures like JSON outputs or classification tasks. Ensure your prompt naturally resembles the structured output you’re seeking, and include examples of the expected format directly in your prompt.
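
One hedged sketch of that advice (the task and example values are invented): build the prompt so it already contains a sample of the exact JSON shape you want back.

```python
import json

# Show the model the shape you expect by embedding a worked example.
schema_example = {"label": "positive", "confidence": 0.9}

prompt = (
    "Classify the sentiment of the review below.\n"
    f"Respond with JSON shaped like: {json.dumps(schema_example)}\n\n"
    "Review: The battery life is fantastic."
)
```

With constrained decoding enforcing the schema and the prompt echoing it, the model is both required and primed to produce the right format.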

For Intermediate Users

Experiment with multi-task inference by combining related extractions into a single structured output. Consider how reasoning steps can be incorporated into your structure definition to improve performance on complex tasks.

For Advanced Users

Explore dynamic structure generation, where code creates structure definitions on the fly based on function definitions or schema information. For reasoning-intensive tasks, carefully tune the balance between structure and flexibility in thinking spaces.
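
A small sketch of dynamic structure generation (function name and schema shape are illustrative): Python's `inspect` module can read a function's signature at runtime and turn it into a schema-like description that could then drive constrained generation of valid call arguments.

```python
import inspect

def get_weather(city: str, units: str = "metric") -> str:
    ...  # hypothetical tool the model may call

def schema_from_function(fn):
    # Derive a minimal parameter spec from the signature on the fly.
    sig = inspect.signature(fn)
    props = {
        name: {"type": "string" if p.annotation is str else "any"}
        for name, p in sig.parameters.items()
    }
    return {"name": fn.__name__, "parameters": props}
```

Because the schema is built from the code itself, adding or changing a tool's parameters automatically updates the structure the model must produce.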

Where AI Structure Is Heading

The conversation reveals a vision of AI applications that goes beyond today’s relatively simple implementations. As Will notes: “When you think about the way code bases work in large tech companies now… the size of the code and the need to understand it is actually a blocker in sort of efficiently building larger and larger systems.”

Structured outputs enable the reliable composition of AI components into Compound AI Systems, potentially orchestrating thousands of specialized agents working together on complex tasks. This approach could fundamentally change how we build software, allowing for complexity beyond what human developers could directly manage.

Weaviate Podcast #119

Available on YouTube and Spotify:

YouTube: https://www.youtube.com/watch?v=3PdEYG6OusA

Spotify: https://creators.spotify.com/pod/show/weaviate/episodes/Structured-Outputs-with-Will-Kurt-and-Cameron-Pfiffer---Weaviate-Podcast-119-e31apoq
