boundary-generation-control
"Constrain AI code generation to preserve or resolve boundary conditions — never expand them. Controls what is representable in the generation context rather than relying on instructions. The prompt defines the open interior, seeds shape continuation, types enforce closure, and delta-epsilon characterizes the generation boundary. Respects the domain partial order: generation context must reflect which domains are closed and which boundary conditions remain open."
Boundary: Generation Control
Constrain AI code generation to produce output that preserves or resolves d(S) — never expands it.
The Problem
LLMs generate code by sampling from a distribution shaped by context. That distribution does not respect semantic domain boundaries. The model may emit padding: 16px instead of padding: var(--spacing-md) — not because it can’t use the token, but because both are equally representable in its output space.
Natural language instructions cannot fix this. “Use design tokens” is imprecise and unenforceable. Each failure reopens a resolved boundary condition.
The constraint: No a priori requirements on humans that they can’t reliably satisfy. No reliance on prose instructions to LLMs.
The principle: Control what is representable in the generation context, not what the model is instructed to do.
Framework: Generation as d(S) Management
The generation context is itself a domain with boundary conditions. What the model can see determines what it can produce:
- Prompt (intent) → identifies which d(S) the model should resolve
- Seed (existing code) → shapes continuation toward closed patterns
- Schema (types) → constrains output to the closed state space
- Context boundary → which domains are visible determines where drift can occur
Where the context has unresolved boundary conditions, the model fills from its training distribution — which does not respect your domain boundaries. Generation drift occurs precisely at unresolved boundary conditions in the context.
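These four components can be made explicit as a data structure. A minimal sketch in TypeScript — every name here (`DomainView`, `GenerationContext`, `driftVectors`) is illustrative, not part of any existing API:

```typescript
// Illustrative model of a generation context and its boundary state.
// A domain is "closed" when its d(S) is empty within this context.
interface DomainView {
  name: string;
  closed: boolean;      // d(S) = ∅ for this domain in the context
  seeds: string[];      // files demonstrating closed patterns
  types: string[];      // type/schema files enforcing closure
}

interface GenerationContext {
  prompt: string;       // intent: which d(S) to resolve
  target: string;       // the one domain whose boundary is open on purpose
  visible: DomainView[]; // every domain the model can see
}

// Drift can only occur through open, non-target domains in the context.
function driftVectors(ctx: GenerationContext): string[] {
  return ctx.visible
    .filter((d) => !d.closed && d.name !== ctx.target)
    .map((d) => d.name);
}
```

Listing drift vectors before generating makes the exclusion decision concrete: every name returned is a domain to remove or close before running the model.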
Noise, Confidence, and Reproducibility
Generation is stochastic. The same context produces different outputs. The noise model (see boundary-noise-model) provides three tools:
- Noise floor (η): Expected syntactic variance between two in-domain generations. Not all differences are drift — filter noise before judging.
- Domain confidence (κ): P(output stays in-domain | context). Determines whether to generate, tighten context, or close prerequisites first.
- Reproducibility equivalence (≈): Two outputs are equivalent if they resolve the same d(S) components regardless of syntactic differences. “Same result” means same boundary effect, not same tokens.
Context closure directly controls κ. More closed domains in context → higher κ → more reliable generation. This is the actionable link: improving context closure improves generation confidence measurably.
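A crude way to act on this link is to estimate κ from the closure ratio of the context. This mapping is an assumption — the document only claims quality is proportional to the closed/visible ratio, not that the ratio equals κ:

```typescript
// Heuristic: confidence rises with the fraction of visible domains
// that are closed. The exact mapping is an assumption, not a measurement;
// calibrate it against observed drift rates in your own codebase.
function estimateKappa(closedDomains: number, visibleDomains: number): number {
  if (visibleDomains === 0) return 0;
  return closedDomains / visibleDomains;
}
```

Even an uncalibrated estimate is useful for relative decisions: closing one more visible domain measurably raises the estimate before the next attempt.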
Respecting the Partial Order
The generation context must reflect the domain dependency structure:
- Closed prerequisite domains → include their seeds and types (model preserves closure)
- The target open boundary → the d(S) the model should resolve (this is the work)
- Other open boundaries → exclude or minimize (these are drift vectors)
If prerequisite domains aren’t closed, generation for dependent domains is unreliable — the model has no stable foundation to build on.
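A generation gate can enforce this ordering mechanically. A sketch, assuming you track the dependency map and the set of closed domains yourself (both names are hypothetical):

```typescript
// Refuse to generate for a domain whose prerequisites, per the partial
// order, are still open. "deps" maps each domain to its prerequisites.
function canGenerate(
  target: string,
  deps: Record<string, string[]>,
  closed: Set<string>,
): boolean {
  return (deps[target] ?? []).every((d) => closed.has(d));
}
```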
When to Use This Skill
- Structuring a codebase so LLMs generate consistent code by default
- Choosing what context to include/exclude for a generation task
- Reviewing LLM output for boundary violations
- Designing projects where code generation is a regular workflow
- Diagnosing recurring drift patterns in generated code
Generation Control Strategies
Strategy 1: Seed with references, never literals
The model’s continuation distribution is shaped by what it sees. Include files that demonstrate reference patterns. Exclude files containing raw literals for seeded values.
Context file selection is a d(S) management decision — you are resolving the generation context’s boundary conditions before the model runs.
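One way to mechanize the selection, sketched for the spacing-token example: keep files that demonstrate token references and drop files that embed raw pixel literals. The regexes and function name are illustrative, not a general-purpose detector:

```typescript
// Hypothetical seed filter for a CSS spacing domain: include files that
// reference tokens, exclude files containing raw pixel literals.
const TOKEN_REF = /var\(--[\w-]+\)/;
const RAW_PX = /\b\d+px\b/;

function selectSeeds(files: { path: string; text: string }[]): string[] {
  return files
    .filter((f) => TOKEN_REF.test(f.text) && !RAW_PX.test(f.text))
    .map((f) => f.path);
}
```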
Strategy 2: Type-constrain the output shape
Provide type definitions, interfaces, and schemas. The narrower the type, the smaller the boundary condition on the output space. Types are structural (part of the representation); instructions are semantic (require interpretation). The model conforms to visible types far more reliably.
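The spacing-token example from earlier, narrowed by a union type (the token names are illustrative):

```typescript
// A union of token references makes raw literals unrepresentable:
// padding: "16px" would fail to type-check, while any token passes.
type SpacingToken =
  | "var(--spacing-sm)"
  | "var(--spacing-md)"
  | "var(--spacing-lg)";

interface CardStyle {
  padding: SpacingToken;
}

const card: CardStyle = { padding: "var(--spacing-md)" };
```

The constraint is structural: the drift the prose instruction failed to prevent is simply not in the output space the type admits.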
Strategy 3: Reflect the partial order in context
Structure context to show:
- Which domains are closed (include seeds, types, references)
- Which boundary condition the model should resolve (the target d(S))
- What the closed domains establish (the foundation the model builds on)
Exclude:
- Files from unrelated domains (reduces cross-domain drift)
- Raw values that should only be accessed via seeds
- Patterns you don’t want (negative examples expand the representable set)
Strategy 4: Estimate confidence before generating
Before generating, estimate domain confidence κ by assessing context:
| Context Property | κ Effect |
|---|---|
| All prerequisite domains closed, seeds visible | ↑ high |
| Narrow type constraints visible | ↑ high |
| Literals present alongside references | ↓ low |
| Open prerequisite domains | ↓ unreliable |
| Large target d(S) | ↓ moderate |

Then map the estimated κ to an action:

| κ Level | Action |
|---|---|
| High (>0.95) | Generate and verify once |
| Moderate (0.8–0.95) | Generate, verify, may retry once |
| Low (0.5–0.8) | Tighten context before generating |
| Unreliable (≤0.5) | Do not generate — close prerequisite domains first |
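The action table translates directly into a dispatch function. A sketch using the thresholds above (boundary handling at exactly 0.8 and 0.5 follows the table's ranges):

```typescript
// Map an estimated κ to the action prescribed by the table above.
function kappaAction(kappa: number): string {
  if (kappa > 0.95) return "generate and verify once";
  if (kappa >= 0.8) return "generate, verify, may retry once";
  if (kappa > 0.5) return "tighten context before generating";
  return "close prerequisite domains first";
}
```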
Strategy 5: Post-generation boundary verification
After generation, apply boundary-drift-detection with noise filtering:
- Filter noise-floor variations (naming, formatting, equivalent expressions)
- Did the remaining signal stay in the intended domain?
- Did it use seeds/references or introduce new literals?
- Does it type-check against the domain’s closure?
- Did any resolved boundary conditions reopen?
- Are prerequisite domains still intact?
For reproducibility: if you need consistent results, verify that multiple generations are reproducibility-equivalent — same d(S) effect, not same tokens. See boundary-noise-model.
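The first two checks can be sketched for the CSS spacing example: normalize noise-floor variation before looking for drift signals. These checks are illustrative; real verification would also run the type checker and the full boundary-drift-detection pass:

```typescript
// Collapse noise-floor variation (whitespace, quote style) so it is not
// mistaken for drift, then check for one drift signal: new raw literals.
function normalize(code: string): string {
  return code.replace(/\s+/g, " ").replace(/'/g, '"').trim();
}

function hasLiteralDrift(generated: string): boolean {
  return /\b\d+px\b/.test(normalize(generated));
}
```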
Strategy 6: Iterative context tightening
If drift is detected:
- Identify which boundary condition in the context the model exploited
- Tighten the seed (add more reference examples)
- Tighten the schema (narrow the types)
- Reduce context scope (exclude cross-domain files)
- Regenerate and re-verify
Each iteration resolves a context boundary condition that permitted drift, increasing κ for the next attempt. If κ remains low after tightening, the prerequisite domains may not be sufficiently closed — step back in the partial order.
The Generation Boundary Model
- d(S) = ∅ for closed context domains → model preserves closure
- d(S) ≠ ∅ for the target domain → model works to resolve it
- d(S) ≠ ∅ in other domains → drift vectors — minimize these
Generation quality ∝ (closed context domains) / (total visible domains)
The ideal: all context domains are closed except the one the model is working on. The model can only drift through boundary conditions that exist in the context.
Guidelines
- Never rely on prose instructions alone. They are semantic, not structural — the model interprets them probabilistically. Structure the context instead.
- The best generation control is invisible. A well-seeded, type-closed codebase naturally constrains generation. Invest in closure; generation control follows.
- Control the context, not the model. You cannot change the model’s weights. You can change what it sees. The generation boundary is a function of the input representation.
- Respect the partial order. Don’t ask the model to generate in D5 when D2 is open. The model needs closed prerequisites to produce reliable output.
- d(S) ≠ ∅ in the target is expected. That’s the boundary condition the model is resolving. The problem is when d(S) ≠ ∅ in other domains — those are drift vectors.
- Estimate κ before generating. If confidence is low, tighten context or close prerequisites rather than generating and hoping. Prevention is cheaper than detection.
- Reproducibility means same d(S) effect, not same tokens. Two syntactically different outputs that resolve the same boundary conditions are equivalent. Don’t chase token-level determinism — it’s not achievable and not necessary.
- Measure drift, don’t assume compliance. Always verify with boundary-drift-detection, with noise filtering from boundary-noise-model.
- The full pipeline: enumerate boundaries → encode seeds → close domains (in order) → estimate confidence → control generation context → detect drift (with noise filtering). Each skill reinforces the others.