# System Prompt Best Practices: Write Better AI Instructions

## Why System Prompts Matter

The system prompt is the most important text in your AI application. It is the instruction set that shapes every response the model generates. A well-written system prompt produces consistent, high-quality, on-brand outputs. A poorly written one produces unpredictable results and costs you more in tokens and user frustration.

These system prompt best practices apply to OpenAI’s GPT models, Anthropic’s Claude, Google’s Gemini, and any other LLM that supports system-level instructions.
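To make the mechanics concrete, here is a minimal sketch of how a system prompt travels with a request, using the OpenAI Python SDK's message format. The prompt text and model name are illustrative placeholders, not recommendations.

```python
# Sketch: how a system prompt rides along with every request.
# The prompt text below is a placeholder borrowed from this article's
# role example; swap in your own.

SYSTEM_PROMPT = (
    "You are a senior Python developer who reviews code for a fintech "
    "startup. You focus on security vulnerabilities, performance "
    "bottlenecks, and PEP 8 compliance."
)

def build_messages(user_input: str) -> list[dict]:
    """Prepend the system prompt to every request payload."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]

messages = build_messages("Review this function for issues: ...")
# With a client: client.chat.completions.create(model="gpt-4o", messages=messages)
print(messages[0]["role"])  # prints "system": the system prompt leads every call
```

Because the system message is resent on every call, everything in this article about clarity and token count applies to that one string.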

## Start With a Clear Role

Tell the model exactly what it is and what it does. A specific role produces more consistent behavior than vague instructions.

Weak:

```
You are a helpful assistant.
```

Strong:

```
You are a senior Python developer who reviews code for a fintech startup. You focus on security vulnerabilities, performance bottlenecks, and PEP 8 compliance. You explain issues clearly and suggest specific fixes with code examples.
```

The strong version gives the model a concrete identity, domain focus, and output expectations. This anchors every response it generates.

## Define the Output Format

If you need responses in a specific format, specify it explicitly. Do not assume the model will guess correctly.

### Specify Structure

```
Respond in this format:

## Summary
[1-2 sentence summary of the issue]

## Severity
[Low / Medium / High / Critical]

## Details
[Explanation of the issue]

## Fix
[Code example showing the fix]
```

### Specify Length

```
Keep responses under 200 words unless the user asks for more detail.
```

### Specify Style

```
Write in a professional but approachable tone. Use second person ("you"). Avoid jargon unless the user uses it first. Never use emojis.
```

Being explicit about format eliminates the need for follow-up prompts asking the model to restructure its response — saving tokens and improving user experience.
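A format spec is also checkable in application code. The sketch below validates that a response contains the four headings from the structure example above, in order; the function name and sample text are illustrative.

```python
# Sketch: verifying that a model response follows the required structure.
# REQUIRED_SECTIONS mirrors the format spec shown earlier in this article.
REQUIRED_SECTIONS = ["## Summary", "## Severity", "## Details", "## Fix"]

def follows_format(response: str) -> bool:
    """True if every required heading appears, in the required order."""
    positions = [response.find(heading) for heading in REQUIRED_SECTIONS]
    return all(p >= 0 for p in positions) and positions == sorted(positions)

good = (
    "## Summary\nPossible SQL injection.\n\n"
    "## Severity\nHigh\n\n"
    "## Details\nUser input is interpolated into the query.\n\n"
    "## Fix\nUse parameterized queries."
)
print(follows_format(good))           # True
print(follows_format("Looks fine!"))  # False
```

A failed check can trigger a retry with a reminder message instead of showing the user a malformed response.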

## Use Positive Instructions

Tell the model what to do, not what not to do. Positive instructions are more effective and more token-efficient.

Weak (negative):

```
Don't use technical jargon. Don't write long responses. Don't make assumptions about the user's skill level. Don't include code unless asked.
```

Strong (positive):

```
Use simple, everyday language. Keep responses concise — under 150 words for simple questions. Ask the user's experience level before giving technical advice. Include code only when the user requests it.
```

Both versions communicate similar constraints, but positive framing gives the model a clear action to take rather than a minefield to avoid.

## Provide Examples

Few-shot examples are one of the most powerful tools in prompt engineering. They show the model exactly what good output looks like.

Here are examples of how to respond to common questions:

User: "What's the difference between let and const?"
Assistant: "Both declare variables in JavaScript. Use `const` when the value won't change — it prevents reassignment. Use `let` when you need to reassign the variable later, like a counter in a loop. Default to `const` and switch to `let` only when needed."

User: "How do I center a div?"
Assistant: "The modern approach uses Flexbox. Add these styles to the parent container:
```css
display: flex;
justify-content: center;
align-items: center;
```

This centers the child both horizontally and vertically."


Two or three examples establish a pattern more effectively than paragraphs of instruction. The model picks up on tone, length, formatting, and depth from examples.
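In chat-based APIs, few-shot examples are usually packed into the messages list as alternating user/assistant turns after the system prompt. A minimal sketch, with placeholder example text condensed from the pair above:

```python
# Sketch: few-shot examples as prior conversation turns.
# Each (question, answer) pair becomes a user message followed by an
# assistant message, so the model sees the pattern before the real input.
FEW_SHOT_EXAMPLES = [
    ("What's the difference between let and const?",
     "Both declare variables in JavaScript. Default to `const`; "
     "switch to `let` only when you need reassignment."),
]

def build_messages(system_prompt: str, user_input: str) -> list[dict]:
    messages = [{"role": "system", "content": system_prompt}]
    for question, answer in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_input})
    return messages

msgs = build_messages("You are a concise JavaScript tutor.",
                      "How do I center a div?")
print([m["role"] for m in msgs])  # ['system', 'user', 'assistant', 'user']
```

Note the trade-off: each few-shot pair is resent on every request, so the token-efficiency advice later in this article applies to examples too.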

## Set Boundaries

Define what the model should and should not handle. This prevents off-topic responses and reduces hallucination risk.

```
You answer questions about our inventory management software. If the user asks about topics unrelated to inventory management, say: “I can only help with inventory-related questions. For other inquiries, please contact support@example.com.”

You do not have access to user account data. If asked about specific account details, direct the user to log in to their dashboard.
```


Clear boundaries keep the model focused and prevent it from fabricating information about topics outside its defined scope.
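Boundaries can also be enforced before the request ever reaches the model. The sketch below uses a deliberately crude keyword heuristic for illustration; real systems more often use a classifier or moderation endpoint, but the routing idea is the same.

```python
# Sketch: an application-side guard that mirrors the boundary in the
# system prompt. The keyword list is an illustrative assumption; a
# production system would use a topic classifier instead.
ON_TOPIC_KEYWORDS = {"inventory", "stock", "sku", "warehouse", "reorder"}
REFUSAL = ("I can only help with inventory-related questions. "
           "For other inquiries, please contact support@example.com.")

def route(user_message: str) -> str:
    """Return 'send_to_model' for on-topic input, or a canned refusal."""
    words = set(user_message.lower().split())
    if words & ON_TOPIC_KEYWORDS:
        return "send_to_model"
    return REFUSAL

print(route("How do I set a reorder point for a SKU?"))  # send_to_model
print(route("What's the weather today?"))                # the refusal message
```

Catching off-topic requests in code also saves the tokens the model would have spent composing its own refusal.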

## Optimize for Tokens

System prompts are sent with every request. A system prompt that is 500 tokens costs you 500 input tokens on every single API call. At scale, this adds up significantly. Here are practical ways to keep your system prompt token-efficient:

### Cut Redundancy

If you say the same thing two different ways for emphasis, remove one version. The model understands the instruction the first time.

### Use Concise Language

```
// Verbose (32 tokens)
When the user provides you with a piece of code and asks you to review it, you should carefully analyze the code for any potential issues.

// Concise (15 tokens)
When reviewing user code, analyze it for potential issues.
```


### Move Static Context to the User Message

If certain context only applies to specific requests, do not include it in the system prompt. Pass it in the user message instead so you only pay for it when needed.

### Measure Your Prompt

Use the [Token Counter](/tools/token-counter) to check your system prompt's token count. Even a 10% reduction saves money on every request. For an application making 50,000 requests/day, trimming 100 tokens from the system prompt saves about 150 million input tokens per month, roughly $375/month at a $2.50-per-million-token input rate (GPT-4o's list price at the time of writing).
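The arithmetic behind that estimate is simple enough to keep in a helper. The sketch below uses the rough English-text heuristic of ~4 characters per token; for real numbers, count with the model's actual tokenizer (e.g. tiktoken for OpenAI models).

```python
# Sketch: back-of-envelope prompt cost. The chars/4 ratio is a rough
# heuristic for English text, not a real tokenizer; use the model's
# tokenizer for accurate counts.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def monthly_input_cost(prompt: str, requests_per_day: int,
                       usd_per_million_tokens: float) -> float:
    """Monthly cost of resending this prompt with every request."""
    monthly_tokens = estimate_tokens(prompt) * requests_per_day * 30
    return monthly_tokens * usd_per_million_tokens / 1_000_000

prompt = "You are a senior Python developer. " * 30  # stand-in for a long prompt
cost = monthly_input_cost(prompt, requests_per_day=50_000,
                          usd_per_million_tokens=2.50)
print(f"~{estimate_tokens(prompt)} tokens, ~${cost:.2f}/month input cost")
```

Running this before and after an edit turns "trim the prompt" into a measurable dollar figure.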

## Handle Edge Cases

Think about what happens when users do unexpected things, and include instructions for those scenarios.

```
If the user’s message is empty or contains only whitespace, respond: “It looks like your message is empty. Could you try again?”

If the user’s message is in a language other than English, respond in the same language they used.

If the user pastes an error message without context, ask what they were trying to do before suggesting solutions.
```


Handling edge cases in the system prompt prevents broken user experiences and reduces the need for application-level error handling.

## Iterate Based on Real Usage

The best system prompts are not written in one sitting. They evolve through testing and real user interactions.

### Log and Review

Review actual API calls in production. Look for responses where the model deviated from your expectations. Each deviation is an opportunity to improve your system prompt.

### A/B Test

Try different versions of your system prompt and measure output quality. Small changes in wording can produce measurably different results.
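One practical detail: assign each user a variant deterministically, so the same user always sees the same prompt and your quality metrics stay comparable. A sketch using a hash of the user id; the variant texts are placeholders.

```python
# Sketch: deterministic A/B assignment of system-prompt variants.
# Hashing the user id keeps assignment stable across sessions without
# storing any state. Variant texts are illustrative placeholders.
import hashlib

PROMPT_VARIANTS = {
    "A": "You are a concise code reviewer. Keep responses under 150 words.",
    "B": "You are a thorough code reviewer. Explain each issue with an example.",
}

def assign_variant(user_id: str) -> str:
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

variant = assign_variant("user-1234")
system_prompt = PROMPT_VARIANTS[variant]
print(variant)  # stable for a given user_id
```

Log the variant alongside each response so deviations found during review can be traced back to the prompt version that produced them.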

### Version Control

Treat your system prompt like code. Keep it in version control, document changes, and roll back if a new version degrades quality.

## Common Mistakes

### Being Too Vague

"Be helpful and professional" is not a useful instruction. Every model is already trying to be helpful. Add specifics about what "helpful" means for your application.

### Being Too Long

A 3,000-token system prompt full of edge cases and disclaimers is expensive and can actually confuse the model. Prioritize the instructions that have the highest impact on output quality.

### Contradicting Yourself

"Keep responses short" followed later by "Always provide thorough explanations with examples" creates a conflict. The model will randomly favor one instruction or the other. Resolve contradictions before deploying.

### Ignoring the Model's Strengths

Different models respond differently to the same system prompt. Claude tends to follow long, detailed instructions well. GPT-4o responds well to concise, structured prompts. Optimize your system prompt for the specific model you are using.

## Conclusion

System prompt best practices boil down to clarity, specificity, and efficiency. Define a clear role, specify the output format, provide examples, set boundaries, and optimize for tokens. Then iterate based on real usage data.

Start by measuring your current system prompt with the [Token Counter](/tools/token-counter). See exactly how many tokens it uses, identify opportunities to trim, and test the results. For cost projections based on your optimized prompt, use the [Pricing Calculator](/tools/pricing) to see how prompt optimization translates to real savings.