Agent Design Patterns: What Makes a Good AI Agent
TL;DR: I’ve built 28 agents across infrastructure, development, content, and research domains. Along the way, I discovered that the difference between a mediocre agent and a great one comes down to a few key patterns. This post breaks down what actually works.
Why This Matters
You can give an AI a vague system prompt like “You are a helpful coding assistant” and it’ll work… kind of. It’ll be inconsistent, sometimes overstep, sometimes under-deliver, and you’ll spend more time correcting it than it saves you.
Or you can invest time in proper agent design and get something that feels like a reliable team member.
After building agents for storage management, security auditing, code review, fact-checking, and more, I’ve landed on a pattern that works. Here’s the breakdown.
The Golden Standard Structure
Every good agent I’ve built follows this structure:
1. Frontmatter (metadata; a sketch follows this list)
2. One-line description
3. Role definition
4. Critical Rules
5. Detailed Workflow
6. Output Templates
7. What You CAN Do
8. What You CANNOT Do
9. Communication Style
10. Integration Notes
Let me explain why each section matters.
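The frontmatter is the one piece that doesn't get its own section below, so here's a minimal sketch up front. The field names are illustrative, not a schema; whatever tooling loads your agents will dictate the exact keys:
---
# illustrative field names; adapt to whatever loads your agent files
name: security-auditor
description: Read-only infrastructure security audit against NIST and CIS benchmarks
mode: read-only          # surface the hardest constraint as metadata, not just prose
tags: [security, audit, infrastructure]
---
Whatever your runner supports, the goal is the same: put the agent's identity and its key constraint where both humans and tooling can see them before a single instruction is read.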
1. Start with Identity, Not Instructions
Bad:
You are an AI assistant that helps with security.
Good:
# Security Auditor
Expert auditor of infrastructure security: firewall configuration,
system hardening, SSH security, user accounts, and file permissions.
Analyzes against NIST and CIS benchmarks.
**This agent is READ-ONLY** - it analyzes and reports only, never executes changes.
The difference? Specificity and constraints. The good version tells the agent:
- What it’s an expert in (not everything)
- What frameworks it references (NIST, CIS)
- Its operational mode (read-only)
This immediately narrows scope and sets expectations.
2. Critical Rules: The Non-Negotiables
Every agent needs explicit rules it cannot break. I wrap these in a <critical_rules> block to make them stand out:
<critical_rules>
- ALWAYS read ALL relevant config files before assessment
- NEVER execute any commands that modify system state
- ALWAYS categorize findings by severity (CRITICAL/HIGH/MEDIUM/LOW)
- ALWAYS provide specific file paths and line numbers for issues
- ALWAYS include remediation steps for each finding
</critical_rules>
Notice the pattern: every rule starts with ALWAYS or NEVER. No ambiguity.
Without these, agents will:
- Skip verification steps when they “feel confident”
- Make changes when they should only report
- Give vague answers when you need specifics
3. Workflow: Show the Process
Don’t just tell an agent what to do—show it how to think through problems:
## Audit Workflow
### 1. GATHER DATA (Read-Only)
[specific commands to run]
### 2. ANALYZE
[what to check against]
### 3. ASSESS RISKS
For each finding, document:
- **What**: The specific issue
- **Where**: File path and line number
- **Impact**: What could happen if exploited
- **Likelihood**: How probable is exploitation
- **Severity**: CRITICAL/HIGH/MEDIUM/LOW
### 4. RECOMMEND
[what to include in recommendations]
### 5. REPORT
[output format]
This is like giving someone a checklist: the agent is less likely to skip steps or invent its own (worse) process.
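To make the bracketed placeholders concrete, here is one way the first step might be filled in for the security auditor. The commands are illustrative and assume a typical Linux host; swap in whatever applies to your environment:
### 1. GATHER DATA (Read-Only)
Run and capture output, without modifying anything:
- cat /etc/ssh/sshd_config        (SSH daemon settings)
- sudo iptables -L -n -v          (firewall rules; or: nft list ruleset)
- getent passwd                   (user accounts)
- ls -l /etc/sudoers.d/           (sudo drop-in files)
Spelling the commands out keeps the agent from improvising its own discovery process halfway through an audit.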
4. Output Templates: Be Prescriptive
One of the biggest improvements I made was adding actual output templates. Instead of hoping the agent formats things well, I show exactly what I want:
┌─────────────────────────────────────────────────────┐
│ VM/Container: [name] (ID: [id]) │
├─────────────────────────────────────────────────────┤
│ Status: Running/Stopped │
│ Uptime: X days Y hours │
│ Health: 🟢/🟡/🔴 │
├─────────────────────────────────────────────────────┤
│ CPU: ██████░░░░ 60% (2/4 cores) │
│ Memory: ████████░░ 80% (3.2/4 GB) ⚠️ │
│ Disk: █████░░░░░ 50% (25/50 GB) │
└─────────────────────────────────────────────────────┘
When the agent sees this template, it produces consistent, scannable output every time.
5. The CAN/CANNOT Pattern
This is crucial for preventing scope creep and hallucination.
## What You CAN Do
- Read all configuration files
- Analyze security posture
- Assess risks and severity
- Provide specific recommendations
- Reference compliance standards
## What You CANNOT Do
- Modify any configurations
- Execute remediation commands
- Restart services
- Change user accounts
- Apply fixes (reporting only)
Anthropic’s research confirms this: explicit capability boundaries make agents more reliable. They’re less likely to overpromise or attempt things outside their scope.
The key insight: be specific. “Don’t do bad things” is useless. “Never execute commands that modify system state” is actionable.
6. Communication Style: Good vs Bad Examples
Agents learn from examples. I include both good and bad:
Good (specific and actionable):
CRITICAL: SSH allows password authentication from any source.
Location: /etc/ssh/sshd_config line 58
Risk: Attackers can brute-force passwords. Your server likely receives thousands of attempts daily.
Fix: Set ‘PasswordAuthentication no’ and restart sshd.
Verify: sshd -T | grep passwordauthentication
Bad (vague):
SSH could be more secure.
This single pattern eliminated most of my “the agent gave a useless response” moments. When the agent knows what good looks like, it produces good output.
7. Integration Notes: Playing Well with Others
Agents don’t work in isolation. I document how each agent complements others:
## Integration Notes
This agent works well with:
- **VM Monitor**: For correlating security with resource usage
- **Storage Manager**: For permission and encryption checks
- **DevOps Helper**: For secure deployment configurations
This serves two purposes:
- Helps me remember which agents to use together
- Gives the agent context about the broader system
Patterns That Emerged
After building 28 agents, some meta-patterns became clear:
Specialization Beats Generalization
A “Code Review Agent” that tries to do security, performance, style, and architecture review does all of them poorly.
Instead, I have:
- Security Scanner (OWASP focus)
- Performance Reviewer
- Code Smell Detector
- API Design Reviewer
Each one goes deep on its domain.
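The difference shows up right in the identity line. These one-liners are made up for this post, but they illustrate the contrast:
Bad (kitchen sink):
# Code Review Agent
Reviews code for security, performance, style, architecture, and testing.
Good (focused):
# Security Scanner
Expert reviewer of application code for OWASP Top 10 vulnerabilities:
injection, broken authentication, and insecure configuration.
Reports findings only; never rewrites code.
The narrow version can carry precise rules and a tight output template; the broad one has to stay generic.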
Tables for Reference Data
Whenever there’s categorical information, I use tables:
| Severity | Definition | Response Timeline |
|----------|------------|-------------------|
| CRITICAL | Active exploitation risk | Fix within 24 hours |
| HIGH | Significant weakness | Fix within 1 week |
| MEDIUM | Best practice deviation | Fix within 1 month |
| LOW | Minor improvement | Next maintenance |
Agents reference these consistently, which means consistent output.
Checklists for Verification
Instead of trusting the agent to remember everything:
**SSH Hardening Checklist:**
- [ ] Protocol 2 only
- [ ] PermitRootLogin no
- [ ] PasswordAuthentication no
- [ ] PubkeyAuthentication yes
- [ ] PermitEmptyPasswords no
The agent works through each item systematically.
The Minimum Viable Agent
If you’re starting out, here’s the minimum structure that works:
# [Agent Name]
[One sentence: what this agent does and its key constraint]
## Role
You are an expert in [specific domain] specializing in:
- [Specialty 1]
- [Specialty 2]
- [Specialty 3]
## Critical Rules
<critical_rules>
- ALWAYS [key behavior]
- NEVER [key constraint]
- ALWAYS [verification step]
</critical_rules>
## What You CAN Do
- [Capability 1]
- [Capability 2]
## What You CANNOT Do
- [Constraint 1]
- [Constraint 2]
## Communication Style
**Good:** [Example of ideal output]
**Bad:** [Example of what to avoid]
That’s maybe 50 lines. Start there, then expand based on what breaks.
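For a sense of how little that takes in practice, here is the skeleton filled in for a fact-checking agent. This is a sketch written for this post, not a file from my repo:
# Fact Checker
Verifies factual claims in draft posts against the sources provided; flags anything it cannot verify rather than guessing.
## Role
You are an expert research assistant specializing in:
- Extracting checkable claims from prose
- Verifying claims against supplied sources
- Flagging unverifiable or contradicted statements
## Critical Rules
<critical_rules>
- ALWAYS quote the exact claim being checked
- NEVER present an unverified claim as confirmed
- ALWAYS cite which source supports or contradicts each claim
</critical_rules>
## What You CAN Do
- Read the draft and the provided sources
- Rate each claim: Verified / Unverified / Contradicted
## What You CANNOT Do
- Rewrite the draft
- Invent sources or citations
## Communication Style
**Good:** Claim: “X launched in 2019.” Status: Contradicted (source 2, the changelog, says 2021).
**Bad:** Looks mostly accurate.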
What I Learned
- **Invest upfront, save later:** A well-designed agent takes hours to build but saves days of frustration.
- **Explicit beats implicit:** If you want specific behavior, specify it. Models don’t read minds.
- **Templates > instructions:** Showing the output format you want works better than describing it.
- **Constraints prevent disasters:** The CANNOT section is as important as the CAN section.
- **Test with real tasks:** Build agents for actual problems you have, not theoretical ones.
Where to Find Examples
All 28 of my agents are open source:
- ai-prompts repository - Browse the full collection
The agents are organized into packs:
- Infrastructure (7): Storage, monitoring, security, networking, databases
- Development (7): Code building, review, testing, API design, performance
- Content (6): Blog writing, documentation, social media, newsletters
- Research (8): Market research, competitive analysis, fact-checking, trends
Each one follows the golden standard pattern. Fork them, adapt them, make them yours.
Next Steps
If you’re building agents, start with one real problem you have. Build an agent for it using this structure. Test it. Iterate.
The patterns compound. Once you have one good agent, building the next one is faster. And when you have ten agents that work well together, you’ve got something that genuinely multiplies your capabilities.
That’s the goal, anyway. I’m not even three years into my apprenticeship, but this is the thing that caught my attention and kept me hooked for days. There’s something here.