AI Agent Security

Why SQL-Executing AI Agents Need Systematic Prompt Testing, Not Guesswork

A DSPy-driven experiment on Datasette Agent's SQL system prompt shows how ad hoc prompt tuning produces fragile, unpredictable guardrails for agents that touch live data.

PyramidLedger Research | 2 July 2026 | 4 min read

Datasette creator Simon Willison used DSPy, a Stanford NLP framework for programmatically optimizing LLM prompts, to evaluate the system prompt behind Datasette Agent's read-only SQL query feature.
The exercise surfaced a concrete failure mode: an instruction telling the model not to re-check table schemas caused it to guess column names and fall into error-retry loops.
The finding is a reliability bug, not a disclosed vulnerability — but it illustrates why the instructions steering a query-executing agent deserve the same rigorous, measurable testing as any other security control.
Manual, intuition-based prompt tuning does not scale to agents with tool access; systematic evaluation harnesses built with frameworks like DSPy give teams a repeatable way to catch brittle guardrail logic before it reaches production.

Datasette creator Simon Willison ran an experiment using DSPy, an open-source framework from Stanford NLP for programmatically optimizing language model prompts, to evaluate and improve the system prompt behind Datasette Agent's read-only SQL query feature. He fired the task off as an asynchronous research run in Claude Code, using Claude Fable 5 to install the relevant packages, build a DSPy evaluation harness, and probe the agent's behaviour.

What the experiment found

Datasette Agent, hosted at agent.datasette.io, can execute read-only SQL queries against a database to answer natural-language questions. That is a narrow, sandboxed capability — no writes, no schema changes — but the model still has to decide, from its system prompt, how to explore an unfamiliar schema and construct correct queries.
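As a rough illustration of what read-only scoping means at the database layer, the sketch below uses Python's standard sqlite3 module and SQLite's query_only pragma. It is not Datasette Agent's implementation; the invoices table, its rows, and the helper names are hypothetical.

```python
import sqlite3


def make_query_only(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Switch an existing SQLite connection into query-only mode."""
    conn.execute("PRAGMA query_only = ON")
    return conn


def run_query(conn: sqlite3.Connection, sql: str):
    """Run one statement and return (rows, error_message_or_None)."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return [], str(exc)


# Hypothetical demo data, created before the connection is locked down.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO invoices (customer, total_cents) VALUES (?, ?)",
    [("acme", 1200), ("globex", 450)],
)
conn.commit()
make_query_only(conn)

# Reads still work; writes are rejected by SQLite itself.
read_rows, read_error = run_query(conn, "SELECT customer FROM invoices ORDER BY id")
write_rows, write_error = run_query(conn, "DELETE FROM invoices")
```

In this sketch the SELECT returns rows while the DELETE comes back as an error string. The database enforces the boundary, but the prompt still decides how well the model uses what it is allowed to do.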

The DSPy harness ran the actual Datasette Agent implementation against a live in-process Datasette instance, scored its behaviour with custom metrics, and worked from an auto-generated gold-standard dataset rather than a handful of hand-picked examples. That process exposed a specific weak point: the system prompt's schema listing gave only table names, while a separate instruction discouraged the model from re-checking table details it supposedly already had. In practice, that combination pushed the model toward guessing column names, which triggered failed queries and retry loops in the traces DSPy captured.
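The failure mode can be reproduced in miniature without a model. The sketch below, which assumes a hypothetical orders table and made-up query traces, contrasts a names-only schema listing with one that includes columns, then counts how many queries in a trace fail. It uses only Python's sqlite3 module and is not the harness from the experiment.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list:
    """Return table names in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def schema_names_only(conn: sqlite3.Connection) -> str:
    """The weaker listing: table names with no column detail."""
    return "\n".join(list_tables(conn))


def schema_with_columns(conn: sqlite3.Connection) -> str:
    """A fuller listing: each table followed by its column names."""
    lines = []
    for table in list_tables(conn):
        # Table names come from sqlite_master here, not from user input.
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        col_names = ", ".join(col[1] for col in cols)  # index 1 is the column name
        lines.append(f"{table}({col_names})")
    return "\n".join(lines)


def count_failed_queries(conn: sqlite3.Connection, attempts: list) -> int:
    """Count the statements in a trace that SQLite rejects."""
    failures = 0
    for sql in attempts:
        try:
            conn.execute(sql).fetchall()
        except sqlite3.OperationalError:
            failures += 1
    return failures


# Hypothetical schema and traces, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_name TEXT, total_cents INTEGER)"
)

# What a model might try when it only knows the table is called "orders".
guessed_trace = [
    "SELECT name FROM orders",
    "SELECT customer FROM orders",
    "SELECT customer_name FROM orders",
]
# What it can write on the first try when the columns are listed.
informed_trace = ["SELECT customer_name FROM orders"]
```

Running count_failed_queries over each trace gives a number a harness can score, which is roughly the kind of signal that makes guessing and retry loops visible instead of anecdotal.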

Why this belongs on a security beat

Nothing here is a disclosed vulnerability, a CVE, or an incident — it is a reliability bug in a hobby-scale research tool, caught before it mattered. What makes it worth flagging is the pattern it exposes. Any agent wired to execute queries, call tools, or take action based on a system prompt is only as predictable as that prompt's weakest instruction. A single line meant to save a redundant tool call quietly degraded the model's accuracy and made its failure mode harder to anticipate.

That is the same class of problem security teams look for when they red-team LLM agents: prompt instructions that look reasonable in isolation but interact badly under real usage. In a read-only tool, the result is unreliable output and unnecessary tool calls. In an agent with broader permissions, the same weakness could lead to genuine data exposure if an attacker can influence what the model believes it already knows.

Agent system prompts are effectively unversioned, untested code paths unless someone builds a harness like this one.
Manual prompt tweaking does not surface interaction effects between instructions the way a scored evaluation set does.
Read-only scoping (as in Datasette Agent's SQL feature) limits blast radius but does not remove the need to test prompt robustness — guessing behaviour and retry loops are themselves a reliability and cost risk.

The takeaway for teams building agents

DSPy is built for optimizing task accuracy, not for adversarial security testing, but the underlying discipline transfers directly: define metrics, generate a representative evaluation set, run the agent against it, and treat the system prompt as something that gets tested and versioned rather than tuned by feel. Teams shipping agents with any query or tool-execution capability — SQL, file access, API calls — should apply the same rigor to guardrail instructions that they apply to authentication or input validation, because an untested prompt is an untested control.
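A minimal sketch of that discipline in plain Python follows. It does not use DSPy's API. The gold examples, the scoring rule, the 0.9 threshold, and above all stub_agent are assumptions made for illustration: the stub hard-codes a retry penalty for one instruction, so it simulates the effect and measures nothing. A real harness would call the actual agent in its place.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class GoldExample:
    question: str
    expected: str


@dataclass(frozen=True)
class AgentResult:
    answer: str
    failed_queries: int


# Hypothetical agent interface: (system_prompt, question) -> AgentResult
Agent = Callable[[str, str], AgentResult]


def prompt_version(system_prompt: str) -> str:
    """Derive a short, stable version ID from the prompt text."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:12]


def score(example: GoldExample, result: AgentResult) -> float:
    """1.0 if correct with no failed queries, 0.5 if correct after retries, else 0.0."""
    if result.answer.strip() != example.expected:
        return 0.0
    return 1.0 if result.failed_queries == 0 else 0.5


def evaluate(agent: Agent, system_prompt: str, gold_set: List[GoldExample],
             threshold: float = 0.9) -> dict:
    """Run the agent over the gold set and report a versioned pass/fail result."""
    if not gold_set:
        raise ValueError("gold_set must contain at least one example")
    scores = [score(ex, agent(system_prompt, ex.question)) for ex in gold_set]
    mean = sum(scores) / len(scores)
    return {
        "prompt_version": prompt_version(system_prompt),
        "mean_score": mean,
        "passed": mean >= threshold,
    }


def stub_agent(system_prompt: str, question: str) -> AgentResult:
    """Stand-in for a real model call; the retry penalty is hard-coded, not measured."""
    retries = 2 if "Do not re-check" in system_prompt else 0
    answers = {
        "How many orders are there?": "2",
        "Which customer spent the most?": "acme",
    }
    return AgentResult(answer=answers.get(question, ""), failed_queries=retries)


gold = [
    GoldExample("How many orders are there?", "2"),
    GoldExample("Which customer spent the most?", "acme"),
]
prompt_a = "Tables: orders\nDo not re-check table schemas you have already seen."
prompt_b = "Tables: orders(id, customer_name, total_cents)"

report_a = evaluate(stub_agent, prompt_a, gold)
report_b = evaluate(stub_agent, prompt_b, gold)
```

Storing each report alongside its prompt_version gives a prompt change the same paper trail as a code change: a score before, a score after, and a threshold that can fail a build.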

Frequently Asked Questions

What is DSPy?

DSPy is an open-source framework from Stanford NLP for programmatically optimizing language model prompts and pipelines. Instead of manually writing and tweaking prompts, developers define input/output signatures and a scoring metric, and DSPy's optimizer searches for prompt variants that perform better against that metric.

Is the Datasette Agent issue a security vulnerability?

No. It is a reliability bug — a prompt instruction that caused unnecessary column-name guessing and retry loops — surfaced during a systematic evaluation exercise, not a disclosed vulnerability or exploit.

Why does prompt testing matter for AI agent security?

Any agent that executes queries or calls tools behaves according to its system prompt. Untested instructions can interact in ways that degrade reliability or predictability, which is the same category of risk security teams probe for when red-teaming LLM agents, even when the underlying capability is scoped to read-only actions.

Sources

1. Using DSPy to evaluate and improve Datasette Agent's SQL system prompts (Simon Willison)
2. stanfordnlp/dspy: The framework for programming—not prompting—language models (GitHub / Stanford NLP)
3. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (arXiv)