Claude Certified Architect Foundations: The Complete Guide - Part 2

CCA-F exam prep - domains, code examples, and practice questions. All in one place.

May 20, 2026

We started this guide with Part 1, where we covered agentic architecture, MCP integration, and Claude Code configuration. If you have not read Part 1 yet, start there.

This part covers the remaining domains and how to register for the exam. Here is what we will cover.

Prompt Engineering & Structured Output
Context Management & Reliability
How to register for the exam

I have also added more practice questions to the CCAF GitHub repo. Six sets now, covering all domains.

1. Prompt Engineering & Structured Output

Explicit Criteria
Few-shot Prompting
Structured Output
Validation-Retry Loops
Multi-pass Review
Schema Design

Explicit Criteria

We have to give Claude exact instructions instead of vague conditions. This is called explicit criteria. Telling Claude to “be careful” or “report high security issues” does not work. Claude does not know what that means in practice.

Tell Claude exactly what to do and what not to do. When you are more specific, Claude makes fewer mistakes.

# Bad — vague, Claude has to guess what counts as an issue or as high confidence
system_prompt = """
Review this code and flag any security issues.
Only report high confidence findings.
"""

# Good — categorical, Claude knows exactly what to flag
system_prompt = """
Review this code for security issues.
Flag a finding ONLY when:
- User input is passed directly to a SQL query without parameterization
- A secret or API key is hardcoded as a string literal
- User-supplied data is rendered in HTML without escaping

Do not flag theoretical risks or best practice suggestions.
"""

Few-shot Prompting

Give Claude examples before asking it to do the task. This is called few-shot prompting. Two to four examples are usually enough. Beyond that you mostly spend extra tokens.

One thing to keep in mind. Few-shot prompting is probabilistic. Claude will follow the examples most of the time but not always. If you want something to run without fail, put it in a hook.

system_prompt = """
Extract the customer name and order ID from the message below.

Example 1:
Message: "Hi, I am John and my order 1234 has not arrived."
Output: {"name": "John", "order_id": "1234"}

Example 2:
Message: "This is Sarah. Order #5678 is damaged."
Output: {"name": "Sarah", "order_id": "5678"}

Now extract from this message:
"""
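If your examples live in data rather than in a hand-written string, the same prompt can be assembled in code. This is a minimal sketch: build_few_shot_prompt and the EXAMPLES list are illustrative names I made up, not part of any SDK.

```python
import json

# Hypothetical example store: each entry pairs an input message with the expected output
EXAMPLES = [
    {"message": "Hi, I am John and my order 1234 has not arrived.",
     "output": {"name": "John", "order_id": "1234"}},
    {"message": "This is Sarah. Order #5678 is damaged.",
     "output": {"name": "Sarah", "order_id": "5678"}},
]

def build_few_shot_prompt(instruction, examples):
    """Render the instruction followed by numbered Message/Output pairs."""
    parts = [instruction, ""]
    for i, example in enumerate(examples, start=1):
        parts.append(f"Example {i}:")
        parts.append(f'Message: "{example["message"]}"')
        parts.append(f"Output: {json.dumps(example['output'])}")
        parts.append("")
    parts.append("Now extract from this message:")
    return "\n".join(parts)

system_prompt = build_few_shot_prompt(
    "Extract the customer name and order ID from the message below.",
    EXAMPLES,
)
```

Keeping the examples in a list makes it easy to stay within two to four of them and to swap in examples that match the failure cases you actually see.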

Structured Output

If you ask Claude to “output as JSON” in the prompt, it will try. But it is not guaranteed. The reliable way is to use structured outputs.

Method 1 — JSON outputs

Define the schema in output_config of the client.messages.create() call.
Claude returns the answer as JSON text in response.content[0].text.
Parse it with json.loads() to get the data.

import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Extract order details from: John's order 1234 is missing"}],
    output_config={  # tells Claude to return JSON that matches this schema
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "additionalProperties": False,  # no extra fields allowed
                "properties": {
                    "customer_name": {"type": "string"},
                    "order_id": {"type": "string"},
                    "issue": {"type": "string"}
                },
                "required": ["customer_name", "order_id", "issue"]
            }
        }
    }
)
# The text block holds JSON that follows the schema
result = json.loads(response.content[0].text)
# {"customer_name": "John", "order_id": "1234", "issue": "missing"}
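One caveat worth guarding against: the schema guarantee applies to a response that finished normally. If the response stops at max_tokens the JSON can be cut off, and a refusal will not follow your schema. The sketch below checks stop_reason before parsing. parse_structured is a hypothetical helper, and the fake response object only stands in for a real API response so the example runs offline.

```python
import json
from types import SimpleNamespace

def parse_structured(response):
    """Parse the JSON text of a structured-output response, or raise with a reason."""
    if response.stop_reason == "max_tokens":
        raise ValueError("Output was cut off at max_tokens; the JSON may be incomplete")
    if response.stop_reason == "refusal":
        raise ValueError("Claude declined the request; there is no schema-conforming JSON")
    return json.loads(response.content[0].text)

# Stand-in for an API response (assumption: same attribute names as the SDK object)
fake_response = SimpleNamespace(
    stop_reason="end_turn",
    content=[SimpleNamespace(
        type="text",
        text='{"customer_name": "John", "order_id": "1234", "issue": "missing"}',
    )],
)
order = parse_structured(fake_response)
```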

Method 2 — Strict tool use

This method is used to pass structured input to your tools with guaranteed schema compliance.

When Claude calls a tool in an agentic workflow, it passes parameters to that tool. Without strict mode, those parameters may not match your schema exactly. With strict: True on the tool definition, Claude is forced to pass parameters that exactly match your input_schema. The structured data Claude fills in is available in block.input on the tool call response.

tools = [{
    "name": "extract_order",
    "description": "Extract order details",
    "strict": True,          # Claude must fill in all fields exactly as defined
    "input_schema": {
        "type": "object",
        "additionalProperties": False,  # no extra fields allowed
        "properties": {
            "customer_name": {"type": "string"},
            "order_id": {"type": "string"},
            "issue": {"type": "string"}
        },
        "required": ["customer_name", "order_id", "issue"]
    }
}]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "extract_order"},  # force Claude to call this tool
    messages=[{"role": "user", "content": "John's order 1234 is missing"}]
)

for block in response.content:
    if block.type == "tool_use":
        print(block.input)   # structured data Claude filled in
        # {"customer_name": "John", "order_id": "1234", "issue": "missing"}

Validation-Retry Loops

Structured output guarantees the structure. It does not guarantee the values are correct. This is where the validation-retry loop comes in. When Claude returns wrong values, you send the error back with specific details about what went wrong. Claude fixes it and tries again.


MAX_RETRIES = 3

def extract_order_with_retry(client, tools, user_message):
    # Keep one growing conversation so Claude sees its earlier attempt and the error
    messages = [{"role": "user", "content": user_message}]

    for attempt in range(MAX_RETRIES):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            tool_choice={"type": "tool", "name": "extract_order"},
            messages=messages
        )

        # tool_choice forces the call, so the response contains a tool_use block
        block = next(b for b in response.content if b.type == "tool_use")
        result = block.input

        # Validate the values, not just the structure
        order_id = result.get("order_id") or ""
        if order_id.startswith("ORD-"):
            return result  # valid, done

        # Send a specific error back, not a generic "try again"
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": block.id,
            "is_error": True,  # marks this tool result as a failure
            "content": "order_id must start with ORD-. You returned: " + order_id
        }]})

    raise RuntimeError("Failed to get valid output after max retries")

Always send specific error details. Tell Claude which field failed and what you expected. A generic "try again" does not give Claude enough context to fix the issue.

Retry loops fix format errors. But if the data is simply not there in the source, retrying will not help. Claude cannot return something that does not exist.
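To keep the error feedback specific, it helps to have the validator produce the message text itself. The sketch below is one way to do that. The ORD-plus-digits format and the helper names validate_order and feedback_message are assumptions for illustration.

```python
import re

ORDER_ID_PATTERN = re.compile(r"^ORD-\d+$")  # assumed format: ORD- followed by digits

def validate_order(result):
    """Return a list of specific, field-level error messages (empty when valid)."""
    errors = []
    order_id = result.get("order_id") or ""
    if not ORDER_ID_PATTERN.match(order_id):
        errors.append(f"order_id must look like ORD-<digits>. You returned: {order_id!r}")
    name = (result.get("customer_name") or "").strip()
    if not name:
        errors.append("customer_name must not be empty. Use the name as written in the message.")
    return errors

def feedback_message(errors):
    """Join the errors into one string suitable for a tool_result content field."""
    return "Validation failed:\n" + "\n".join(f"- {e}" for e in errors)

errors = validate_order({"customer_name": "John", "order_id": "1234"})
feedback = feedback_message(errors)
```

Each message names the field, the expectation, and the value Claude returned, which is the context it needs for the next attempt.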

Multi-pass Review

When Claude generates something and then reviews it in the same session, it already knows how it was built, so it tends to miss its own mistakes. That is the problem with self-review.

Multi-pass review fixes this. You use a separate Claude session to review the output. The reviewer session has no memory of how the output was generated. It looks at the result with fresh eyes.

# Pass 1 — generate the code
generation_response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Write a FastAPI endpoint for processing refunds"}]
)
generated_code = generation_response.content[0].text

# Pass 2 — review in a separate session
review_response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Review this code for security vulnerabilities:\n\n{generated_code}"}]
)

Never use the same session for generation and review. Always use a separate session.
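What makes a session "separate" is simply that the review call gets a fresh messages list with no generation history in it. The sketch below makes that explicit. call_model is a hypothetical wrapper (in practice it would call client.messages.create and return the text), and fake_model is a stand-in so the example runs without the API.

```python
def generate_then_review(call_model, task):
    """Run generation and review as two independent conversations.

    call_model is any function that takes a messages list and returns text.
    """
    # Pass 1: generation conversation
    generation_messages = [{"role": "user", "content": task}]
    draft = call_model(generation_messages)

    # Pass 2: a brand-new messages list, so the reviewer sees only the draft
    review_messages = [{
        "role": "user",
        "content": f"Review this code for security vulnerabilities:\n\n{draft}",
    }]
    review = call_model(review_messages)
    return draft, review

# Stand-in model that records what each call could see
seen = []

def fake_model(messages):
    seen.append(messages)
    return f"reply {len(seen)}"

draft, review = generate_then_review(
    fake_model, "Write a FastAPI endpoint for processing refunds"
)
```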

Schema Design

When you define a JSON schema for structured output, each field tells Claude what to extract and in what format. One of those fields can be an enum, which is a fixed list of allowed values. For example, if you want to restrict an issue field to "missing", "damaged", or "wrong_item", define it as an enum. Claude can only pick from that list.

Three things to know in Schema Design:

Enums - always add "other" to the list. If Claude sees something that does not fit any of the defined values, it will pick the closest one instead of telling you it does not know. With "other" in the list, Claude has a safe option to fall back to.

Nullable fields - if a value may not always be present in the source, use ["string", "null"] instead of "string". Never force such a field to be a required plain string. Claude will make up a value just to fill it.

Flat structures — avoid deeply nested objects in your schema. Deep nesting makes it harder for Claude to fill the schema accurately. Keep your schema flat. If the data is complex, split it across multiple tool calls.

tools = [{
    "name": "extract_order",
    "description": """Extract order details. Issue must be one of: missing, damaged, wrong_item, other.
    Use other when the issue does not fit the above categories.""",
    "strict": True,  # Claude must follow the schema exactly
    "input_schema": {
        "type": "object",
        "additionalProperties": False,  # no extra fields allowed
        "properties": {
            "customer_name": {"type": "string"},
            "order_id": {"type": "string"},
            # issue can only be one of these four values; always include "other" in the enum
            "issue": {
                "type": "string",
                "enum": ["missing", "damaged", "wrong_item", "other"]
            },
            # fill this when issue is "other" — gives Claude somewhere to put unexpected values
            "issue_detail": {"type": "string"},
            # nullable — Claude returns null when delivery date is not mentioned in the source
            "delivery_date": {"type": ["string", "null"]}
        },
        # only these fields are required — issue_detail and delivery_date are optional
        "required": ["customer_name", "order_id", "issue"]
    }
}]
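A schema like this one does not by itself tie issue_detail to issue, so that relationship is worth checking in code after the tool call. Here is a small sketch of such a check. check_order_fields is a hypothetical helper, and the rule that "other" needs a detail is my reading of the comment in the schema above.

```python
ALLOWED_ISSUES = {"missing", "damaged", "wrong_item", "other"}

def check_order_fields(order):
    """Return problems the schema alone does not catch (empty list when clean)."""
    problems = []
    issue = order.get("issue")
    if issue not in ALLOWED_ISSUES:
        problems.append(f"issue must be one of {sorted(ALLOWED_ISSUES)}, got {issue!r}")
    # "other" is only useful if Claude also says what the issue actually was
    if issue == "other" and not order.get("issue_detail"):
        problems.append('issue_detail is required when issue is "other"')
    # delivery_date is nullable: None means "not mentioned in the source", which is fine
    delivery_date = order.get("delivery_date")
    if delivery_date is not None and not isinstance(delivery_date, str):
        problems.append("delivery_date must be a string or null")
    return problems

clean = check_order_fields({"customer_name": "John", "order_id": "1234",
                            "issue": "missing", "delivery_date": None})
flagged = check_order_fields({"customer_name": "Sarah", "order_id": "5678",
                              "issue": "other"})
```

The returned problem strings can be sent back to Claude as the specific error in a validation-retry loop.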

2. Context Management & Reliability

This domain carries 15% of your exam score. It is the smallest domain. But it matters. If you do not manage context well, your agents become unreliable in production.

Below are the key concepts.

Lost-in-the-Middle Effect
Progressive Summarization
Tool Result Trimming
Escalation and Ambiguity
Error Propagation
Prompt Caching
Message Batches API

Lost-in-the-Middle Effect

Depending on the model, Claude can process up to 1 million tokens in a single request. But more context does not always mean better results.

When you pass a long input, Claude processes the beginning and end reliably. What is in the middle often gets missed. This is called the lost-in-the-middle effect.

Two things to know:

Place critical information at the beginning or end. Never place it in the middle.
For large context tasks, delegate to subagents. Each subagent starts fresh with only what it needs.
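The first point can be applied mechanically when you assemble a long prompt. The sketch below puts the critical instruction first, the bulky documents in the middle, and the question (with the instruction restated) last. The function name and the XML-style tags are my own conventions here, not something the exam or the API requires.

```python
def assemble_long_prompt(critical_instructions, documents, question):
    """Put critical text first, bulky documents in the middle, and the question last."""
    sections = [f"<instructions>\n{critical_instructions}\n</instructions>"]
    for i, doc in enumerate(documents, start=1):
        sections.append(f'<document index="{i}">\n{doc}\n</document>')
    # Restating the key instruction next to the question keeps it at the end as well
    sections.append(
        f"<question>\n{question}\n\nReminder: {critical_instructions}\n</question>"
    )
    return "\n\n".join(sections)

prompt = assemble_long_prompt(
    critical_instructions="Only quote figures that appear in the documents.",
    documents=["Q1 report text ...", "Q2 report text ...", "Q3 report text ..."],
    question="What was the revenue change from Q1 to Q3?",
)
```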

Progressive Summarization

As an agent runs through multiple turns, the conversation history keeps growing. At some point it gets too large to pass in full on every API call.

Progressive summarization solves this. Instead of passing everything, you summarize the older turns and keep only the recent ones in full. The summary replaces the old turns in the messages array.

# When history gets too long, summarize older turns
summary_response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Summarize this conversation history into key facts only:\n\n{old_turns}"
    }]
)

summary = summary_response.content[0].text

# Replace old turns with the summary
messages = [
    {"role": "user", "content": f"Previous context: {summary}"},
    *recent_turns  # keep the last few turns in full
]

When we summarize, we may lose details. Dates, percentages, and numeric values often become vague. If you want anything to stay precise, extract it into a structured state object before summarizing.
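One way to keep precise values safe is to store them in a plain dict that your code maintains, then attach that dict to the summary verbatim. The sketch below shows the idea. compact_history, the state keys, and the stand-in summarizer are all illustrative assumptions; in practice summarize would be a Claude call like the one above.

```python
def compact_history(messages, keep_last, summarize, state):
    """Replace older turns with a summary plus an exact-facts block.

    summarize: function taking a list of turns and returning summary text.
    state:     dict of values that must stay exact (dates, amounts, IDs),
               maintained by your code rather than by the summary.
    """
    if len(messages) <= keep_last:
        return messages  # nothing to compact yet
    old_turns, recent_turns = messages[:-keep_last], messages[-keep_last:]
    summary = summarize(old_turns)
    facts = "\n".join(f"{key}: {value}" for key, value in state.items())
    header = f"Previous context: {summary}\n\nExact facts (do not paraphrase):\n{facts}"
    return [{"role": "user", "content": header}, *recent_turns]

# Stand-in data and summarizer so the sketch runs offline
history = [
    {"role": "user", "content": "My order ORD-1234 arrived damaged on 2026-05-02."},
    {"role": "assistant", "content": "Sorry about that. I can refund 15% or replace it."},
    {"role": "user", "content": "What would the refund be in dollars?"},
    {"role": "assistant", "content": "15% of $240.00 is $36.00."},
]
state = {"order_id": "ORD-1234", "delivery_date": "2026-05-02", "refund_amount_usd": "36.00"}
compacted = compact_history(
    history,
    keep_last=2,
    summarize=lambda turns: "Customer reported a damaged order.",
    state=state,
)
```

The summary can be as loose as it likes, because the date, the order ID, and the amount travel separately and unchanged.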

Tool Result Trimming

Every tool call adds output to the context. If a tool returns 40 fields but you only need 5, the remaining 35 are wasting tokens. Over multiple turns this adds up fast and fills your context with noise.

Trim tool results before appending them to the messages array for the next API call. Always keep your context clean. Pass only what Claude needs. Learn more about managing tool context here.

# Bad — full tool result, 40+ fields appended to messages
result = get_customer(customer_id="C-42")
messages.append({
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": block.id,
        "content": json.dumps(result)
    }]
})

# Good — trim before appending, only what Claude needs
result = get_customer(customer_id="C-42")
trimmed = {
    "id": result["id"],
    "name": result["name"],
    "status": result["status"]
}
messages.append({
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": block.id,
        "content": json.dumps(trimmed)
    }]
})
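With more than a couple of tools, hand-picking fields at every call site gets repetitive. A per-tool allowlist is one way to centralize it. This is a sketch under assumptions: FIELDS_BY_TOOL, the lookup_order entry, and both helper names are hypothetical.

```python
import json

# Hypothetical allowlist: the fields Claude needs from each tool's result
FIELDS_BY_TOOL = {
    "get_customer": ["id", "name", "status"],
    "lookup_order": ["order_id", "status", "total"],
}

def trim_tool_result(tool_name, result):
    """Keep only allowlisted fields; results from unlisted tools pass through unchanged."""
    fields = FIELDS_BY_TOOL.get(tool_name)
    if fields is None:
        return result
    return {key: result[key] for key in fields if key in result}

def tool_result_message(tool_use_id, tool_name, result):
    """Build the user message that carries the trimmed tool_result block."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": json.dumps(trim_tool_result(tool_name, result)),
        }],
    }

full = {"id": "C-42", "name": "John", "status": "active",
        "created_at": "2021-03-04", "marketing_opt_in": False}
message = tool_result_message("toolu_example", "get_customer", full)
```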

Escalation and Ambiguity

Not every task should be handled autonomously. Knowing when to escalate to a human is just as important as knowing how to complete a task.

Three valid reasons to escalate:

When the user explicitly requests a human, escalate immediately. Do not attempt to investigate first.
When the agent does not have clear instructions for the situation, escalate. Do not guess.
When the agent cannot make meaningful progress, escalate.

When a tool returns multiple matching results, ask the user for more details to narrow it down. Never pick one based on a guess.

# Tool returns escalation signal
@mcp.tool()
def process_refund(order_id: str, amount: float) -> dict:
    if amount > 500:
        return {"requires_human": True, "reason": "Refund above $500 needs approval"}
    return {"success": True, "refund_id": "REF-001"}

# Agent loop checks for it (this fragment runs inside your agent loop function)
result = tool_map[block.name](**block.input)

if result.get("requires_human"):
    escalate_to_human(result)
    return

Two common mistakes to avoid.

Do not escalate just because the user seems frustrated. Frustration does not mean the task needs human intervention. Acknowledge it, try to resolve it, and only escalate if the user explicitly asks for a human.

Do not use the agent’s own confidence score as an escalation signal. Models tend to be overconfident even when they are wrong. It is not reliable.
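The rules in this section can be written as a small decision function, which also makes it obvious what is not a trigger. This is a sketch only: the TurnSignals fields, the reason strings, and the threshold of three failed attempts are assumptions, not values from the exam guide.

```python
from dataclasses import dataclass

@dataclass
class TurnSignals:
    user_asked_for_human: bool = False
    policy_covers_case: bool = True      # do the instructions address this situation?
    failed_attempts: int = 0             # consecutive turns without progress
    tool_requires_human: bool = False    # e.g. the requires_human flag from a tool
    user_seems_frustrated: bool = False  # tracked, but deliberately not a trigger

MAX_FAILED_ATTEMPTS = 3  # assumed threshold; tune for your workflow

def escalation_reason(signals):
    """Return a reason string when escalation is warranted, else None."""
    if signals.user_asked_for_human:
        return "user_requested_human"
    if signals.tool_requires_human:
        return "tool_requires_human"
    if not signals.policy_covers_case:
        return "no_clear_instructions"
    if signals.failed_attempts >= MAX_FAILED_ATTEMPTS:
        return "no_meaningful_progress"
    return None  # frustration alone is not a trigger
```

Note that there is no model confidence score anywhere in the inputs. Every signal is something your code can observe directly.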

Error Propagation

In a multi-agent system, when a subagent fails, the coordinator needs to know what went wrong. If the subagent returns a generic error, the coordinator cannot decide what to do next.

Subagents should always return structured error context, not a generic failure status.

# Bad — coordinator has no idea what went wrong
return {"error": True}

# Good — coordinator knows what failed and what to do next
return {
    "error": True,
    "type": "tool_failure",
    "tool": "lookup_order",
    "reason": "Order not found in database",
    "retryable": False
}

Never suppress errors silently. If a subagent fails without sending a signal, the coordinator assumes success and moves on. That produces wrong results in the next steps.

Not all errors are the same. A timeout can be retried. A missing record cannot be retried. Tell the coordinator what went wrong so it can decide what to do.

Also remember, subagents do not share context automatically. If one fails, the coordinator has to pass that information forward explicitly.
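On the coordinator side, the retryable flag is what drives the decision. Below is a minimal sketch of a coordinator helper that retries only when the subagent says it is worth it and otherwise passes the failure forward. run_with_error_handling and the two stand-in subagents are hypothetical.

```python
def run_with_error_handling(subagent, task, max_attempts=3):
    """Call a subagent and act on the structured error it returns.

    subagent: function taking a task and returning a dict. On failure the dict
              has the shape shown above (error, type, tool, reason, retryable).
    """
    last_error = None
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        result = subagent(task)
        if not result.get("error"):
            return {"status": "ok", "result": result, "attempts": attempts}
        last_error = result
        if not result.get("retryable", False):
            break  # e.g. a missing record: retrying cannot help
    # Pass the failure forward explicitly instead of pretending it succeeded
    return {"status": "failed", "error": last_error, "attempts": attempts}

# Stand-in subagent: times out once, then succeeds
calls = {"count": 0}

def flaky_lookup(task):
    calls["count"] += 1
    if calls["count"] == 1:
        return {"error": True, "type": "timeout", "tool": "lookup_order",
                "reason": "Upstream timed out", "retryable": True}
    return {"order_id": task, "status": "shipped"}

def missing_lookup(task):
    return {"error": True, "type": "tool_failure", "tool": "lookup_order",
            "reason": "Order not found in database", "retryable": False}

recovered = run_with_error_handling(flaky_lookup, "ORD-1234")
failed = run_with_error_handling(missing_lookup, "ORD-9999")
```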

Prompt Caching

Every time you call the Claude API, you send the full context: system prompt, conversation history, and tool definitions. Claude processes all of it from scratch on every single call. That adds up fast: more tokens, higher cost.

With prompt caching we can reduce the cost. You mark the static parts of your prompt with cache_control. Claude caches them on the first call. On later calls, the cached tokens are billed at only 10% of the normal input price instead of full price.

There are two ways to enable it.

Automatic caching - you add cache_control at the top level of your request. Claude scans your prompt from top to bottom and places the cache breakpoint at the end of the last block that can be cached, which is typically the end of your system prompt or tool definitions.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    cache_control={"type": "ephemeral"},  # automatic — Claude picks the breakpoint
    system="You are a customer support agent.................  ",
    messages=[{"role": "user", "content": user_query}]
)

Explicit caching - you mark specific content blocks with cache_control directly inside the system array. Claude caches everything before and up to that block. Use this when you want to control exactly where the cache ends.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You are a customer support agent."},
        {
            "type": "text",
            "text": large_document,
            "cache_control": {"type": "ephemeral"}  # cache everything up to here
        }
    ],
    messages=[{"role": "user", "content": user_query}]
)

Always cache static content only. You can put system prompts, tool definitions, and large documents in cache. Never put dynamic content in cache.

Every API call checks whether the cached prefix matches exactly. Even a single character difference causes a cache miss and Claude reprocesses everything from scratch.

TTL and pricing

TTL stands for Time to Live. It is how long the cache stays active. By default the cache expires after 5 minutes. Every cache hit resets the timer. If your requests come in less frequently, you can extend the TTL to 1 hour. The cache write then costs 2x the normal input token price instead of 1.25x.

The first call writes to cache and costs 1.25x normal input tokens.
Cache read costs 0.1x normal input tokens.
You can place a maximum of 4 cache_control breakpoints in a single request.
If you use the Batch API and prompt caching together, both discounts apply.
The first call costs more because it writes to the cache. With the default TTL, the first cache read already puts you ahead: 1.25x + 0.1x = 1.35x for two calls, compared with 2x without caching.
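A quick way to sanity-check the pricing is to work out the break-even point from the multipliers. The sketch below does that for the cached prefix only, assuming every call after the first is a cache hit and assuming the 1-hour write costs 2x the base input price. The function names are mine.

```python
BASE = 1.0       # cost of processing the prefix once without caching
WRITE_5M = 1.25  # cache write with the default 5-minute TTL
WRITE_1H = 2.0   # cache write with the 1-hour TTL (assumed: 2x base input price)
READ = 0.1       # cache read

def cached_cost(calls, write_multiplier):
    """Total prefix cost for `calls` requests: one write, then reads (all hits)."""
    return write_multiplier + READ * (calls - 1)

def first_call_that_saves(write_multiplier, limit=100):
    """Smallest number of calls at which caching is cheaper than not caching."""
    for calls in range(1, limit + 1):
        if cached_cost(calls, write_multiplier) < BASE * calls:
            return calls
    return None

break_even_5m = first_call_that_saves(WRITE_5M)
break_even_1h = first_call_that_saves(WRITE_1H)
```

Under these assumptions the 5-minute cache is cheaper from the second call, and the 1-hour cache from the third. A cache miss resets the math, so the real savings depend on your hit rate.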

To check if caching is working, look at the response usage.

print(response.usage.cache_read_input_tokens)     # tokens read from cache
print(response.usage.cache_creation_input_tokens)  # tokens written to cache

If cache_read_input_tokens is greater than 0, the cache was hit.

Message Batches API

When you need to process a large number of API requests without needing an immediate response, use the Message Batches API. For example, processing thousands of customer feedback forms overnight, or doing bulk content generation.

You can submit up to 100,000 requests in one batch. Claude processes them asynchronously. Most batches finish within 1 hour, but they can run up to 24 hours. If they do not complete within 24 hours, they expire.

The cost is 50% off standard API pricing. The output quality is the same. Only the timing is different.

import anthropic
import time

client = anthropic.Anthropic()

# Create a batch with multiple independent requests
batch = client.messages.batches.create(
    requests=[
        {
            # This custom id will be returned with the result, 
            # so we know this output is for feedback_001
            "custom_id": "feedback_001",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 500,
                "system": "You are a support analyst. Summarize customer feedback in 2 short bullet points.",
                "messages": [
                    {
                        "role": "user",
                        "content": "The app is useful, but the login takes too much time."
                    }]
            }},
        {
            # This custom id will be returned with the result
            # so we know this output is for feedback_002
            "custom_id": "feedback_002",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 500,
                "system": "You are a support analyst. Summarize customer feedback in 2 short bullet points.",
                "messages": [
                    {
                        "role": "user",
                        "content": "The dashboard is clean, but export to PDF is missing."
                    }]
            }}
    ])

print(f"Batch ID: {batch.id}")

# Wait until the batch completes
while True:
    current_batch = client.messages.batches.retrieve(batch.id)

    if current_batch.processing_status == "ended":
        break

    time.sleep(30)

# Read the results
for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")

    elif result.result.type == "errored":
        print(f"Error in {result.custom_id}: {result.result.error}")

    elif result.result.type == "canceled":
        print(f"Canceled: {result.custom_id}")

    elif result.result.type == "expired":
        print(f"Expired: {result.custom_id}")

Three things to know.

Use custom_id to tie each result back to your input. Without it you cannot match results to requests.
You can combine the Batch API with prompt caching. Both discounts apply together, so if your requests share the same system prompt, use both for maximum savings.
Use the Batch API for async workloads only. If you need a real-time response, use the standard Messages API.
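Since batch results are not guaranteed to come back in request order, it helps to build the requests from a dict keyed by custom_id and to match the results against that same dict. This sketch shows both halves without calling the API. build_batch_requests and match_results are hypothetical helpers, and summaries_by_id stands in for what you would collect from the results loop above.

```python
def build_batch_requests(feedback_items, system_prompt, model="claude-sonnet-4-6"):
    """Turn {custom_id: feedback text} into the requests list for batches.create."""
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": 500,
                "system": system_prompt,
                "messages": [{"role": "user", "content": text}],
            },
        }
        for custom_id, text in feedback_items.items()
    ]

def match_results(feedback_items, summaries_by_id):
    """Pair each input with its summary; None marks a request with no successful result."""
    return {
        custom_id: {"feedback": text, "summary": summaries_by_id.get(custom_id)}
        for custom_id, text in feedback_items.items()
    }

feedback_items = {
    "feedback_001": "The app is useful, but the login takes too much time.",
    "feedback_002": "The dashboard is clean, but export to PDF is missing.",
}
requests = build_batch_requests(
    feedback_items,
    "You are a support analyst. Summarize customer feedback in 2 short bullet points.",
)
# In practice summaries_by_id is filled from client.messages.batches.results(batch.id)
matched = match_results(feedback_items, {"feedback_001": "- Useful app\n- Slow login"})
```

Any entry whose summary is None is a request that errored, expired, or was canceled, so you know exactly which inputs to resubmit.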

3. How to Register for the Exam

The exam is currently available to employees of companies in the Anthropic Partner Network. The Partner Network is free to join. Any company building with Claude qualifies for it.

Step 1 — Join the Partner Network

Your company needs to be part of the Anthropic Partner Network. If it is not a member yet, apply to join the Claude Partner Network.

Step 2 — Request exam access

If your company is already in the partner network, log in to the network and register for the exam.

Step 3 — About the exam

60 questions. 120 minutes. No notes, no AI tools, no external resources are allowed.

Free prep resources

Free resources are available on Anthropic Academy and open to everyone. Start there regardless of whether you have exam access yet.

Certificate validity

The certificate is valid for 6 months. After that you need to recertify.

Conclusion

I hope this gave you a clear picture of what the exam covers. We went through all the domains across both parts.

I have also added 6 sets of practice questions to the GitHub repo.

These concepts are not just for the exam. Knowing them well will make you a better engineer when building with Claude.

Happy Learning!
