
Lesson 4: Chain-of-Thought (CoT Agent)

What you must master:

  • 3.1 CoT reasoning templates (zero-shot vs few-shot)
  • 3.2 How CoT combines with tool use
  • 3.3 Having the model verify its own reasoning
  • 3.4 Having the model re-check its own answers (self-consistency)
  • 3.5 How to reduce faulty CoT (multi-path reasoning / self-consistency)

(1) What is a CoT Agent (an engineering view)

A vanilla LLM's decision flow is: Input → black box → Output

With a CoT Agent, it becomes:

Input → reasoning steps → decision → output

(each intermediate step is verifiable and can be rolled back)

Inside an agent, CoT mainly serves four purposes:

  1. Raise the success rate on complex tasks
  2. Expose intermediate reasoning for the Validator to inspect
  3. Support self-check / replan
  4. Give the Planner a basis for its decisions

CoT = making implicit reasoning explicit, verifiable, and controllable. The real value is not "making the model think", but: 👉 letting you see how it thinks, whether it can be corrected, and whether to trust it.

(2) CoT reasoning templates

Without CoT, the answer looks like this:

QUESTION = """You are scheduling a 60-minute meeting tomorrow.
Constraints:
- Must start on the hour (e.g., 10:00, 11:00).
- Must be after 09:00 and end by 14:00.
- You have existing events: 09:00-09:30, 10:00-11:00, 12:00-12:30, 13:00-14:00.
Task:
1) List all possible start times.
2) Choose the earliest valid start time.
Answer concisely.
"""
messages = [
    {"role": "system", "content": "You are a helpful assistant. Answer the user."},
    {"role": "user", "content": QUESTION},
]
print(chat(messages, temperature=0.2))
1) Possible start times: 11:00.
2) Earliest valid start time: 11:00.
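Throughout this lesson, chat() is the single-call helper built in the earlier lessons. A minimal sketch of what it does, assuming the official openai Python SDK and an OPENAI_API_KEY in the environment (the actual helper from the previous lessons may differ in details such as retries and logging):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat(messages, model="gpt-4o", temperature=0.2, max_tokens=600) -> str:
    # Single chat-completion call; returns only the text content
    resp = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return resp.choices[0].message.content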

1. Zero-shot CoT (the most basic form)

The core prompt trick

Let's think step by step.

Example

# Zero-shot CoT: just append "Let's think step by step"
# Note: to elicit step-by-step reasoning more reliably, we also ask for a concise final answer
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": QUESTION + "\nLet's think step by step, then give the final answer."},
]
print(chat(messages, temperature=0.2))
To determine the possible start times for a 60-minute meeting, we need to consider the constraints and existing events:

1. The meeting must start on the hour.
2. It must start after 09:00 and end by 14:00.
3. We need to avoid conflicts with existing events: 09:00-09:30, 10:00-11:00, 12:00-12:30, and 13:00-14:00.

Let's evaluate each potential start time:

- **09:00**: Not possible, as it conflicts with the 09:00-09:30 event.
- **10:00**: Not possible, as it conflicts with the 10:00-11:00 event.
- **11:00**: Possible, as it does not conflict with any events. The meeting would end at 12:00, which is before the 12:00-12:30 event.
- **12:00**: Not possible, as it conflicts with the 12:00-12:30 event.
- **13:00**: Not possible, as it conflicts with the 13:00-14:00 event.

Based on this analysis, the only possible start time is 11:00.

Final answer:
1) Possible start times: 11:00.
2) Earliest valid start time: 11:00.

In engineering terms: suitable only for low-risk tasks.


2. Few-shot CoT (the common choice in practice)

Prompt structure

Q1: ...
Thought: ...
Answer: ...

Q2: ...
Thought: ...
Answer: ...

Q3: <用户问题>
Thought:

Example (simplified)

Q: 3 apples cost $6. How much do 5 apples cost?
Thought: One apple costs $2. 5 apples cost $10.
Answer: $10

Q: A car travels 120km in 2 hours. How far in 5 hours?
Thought:

In short: we provide worked examples and prescribe the reasoning trajectory.

# Few-shot CoT: give 1-2 demonstrations to fix the Thought/Answer trajectory
# The demonstrations are also time-slot-constraint tasks, so the pattern transfers to Q3
FEW_SHOT = """Q1:
You are scheduling a 30-minute call today.
Constraints:
- Must start on the hour or half-hour.
- Must be after 10:00 and end by 12:00.
- Existing events: 10:30-11:00, 11:30-12:00.
Task:
1) List all possible start times.
2) Choose the earliest valid start time.
Thought:
We need 30-min slots between 10:00 and 12:00. Candidate starts: 10:00, 10:30, 11:00, 11:30.
Check conflicts:
- 10:00-10:30 free (ends at 10:30, ok).
- 10:30-11:00 conflicts.
- 11:00-11:30 free.
- 11:30-12:00 conflicts.
So possible starts: 10:00, 11:00. Earliest is 10:00.
Answer:
Possible start times: 10:00, 11:00. Earliest: 10:00.

Q2:
A 60-minute workshop today.
Constraints:
- Must start on the hour.
- Must be after 13:00 and end by 17:00.
- Existing events: 14:00-15:00, 16:00-16:30.
Task:
1) List all possible start times.
2) Choose the earliest valid start time.
Thought:
Hourly starts between 13:00 and 16:00 (since 60 min): 13:00, 14:00, 15:00, 16:00.
Check:
- 13:00-14:00 free.
- 14:00-15:00 conflicts.
- 15:00-16:00 free.
- 16:00-17:00 overlaps 16:00-16:30, so conflict.
Possible: 13:00, 15:00. Earliest: 13:00.
Answer:
Possible start times: 13:00, 15:00. Earliest: 13:00.
"""
messages = [
    {"role": "system", "content": "You are a helpful assistant. Follow the demonstrated format strictly."},
    {"role": "user", "content": FEW_SHOT + "\nQ3:\n" + QUESTION + "\nThought:"},
]
print(chat(messages, temperature=0.2))

We need 60-minute slots starting on the hour between 09:00 and 13:00 (since it must end by 14:00). Candidate starts: 09:00, 10:00, 11:00, 12:00, 13:00.

Check conflicts:
- 09:00-10:00 conflicts with 09:00-09:30.
- 10:00-11:00 conflicts.
- 11:00-12:00 is free.
- 12:00-13:00 overlaps with 12:00-12:30.
- 13:00-14:00 conflicts.

Possible start time: 11:00. Earliest: 11:00.

Answer:
Possible start time: 11:00. Earliest: 11:00.


(3) Combining CoT with Tool Use

In Lessons 2 and 3, we already built a reasoning loop for the model:

Model decides → (if a tool is needed) call the tool → get the Observation as feedback → feed it into the next turn as evidence → loop until the model decides on final

But so far the whole loop has been hard-coded by us.

If we want the model itself to decide whether to call a tool, how many times, and when to stop, how do we demonstrate tool-calling steps inside the CoT examples?


A typical failure mode

Thought: I searched for it and found that ...

This trains the model to "pretend" to use tools, i.e. to hallucinate tool results inside its Thought.


The correct pattern: separate CoT from the tool call

Thought: I need external information about X.
Action: tool_call
Action Input: {"query": "..."}

After the tool returns (Observation):

Observation:
<tool output>

Thought: Based on the observation, I can conclude...
Final: ...

The key pieces of code

Setup

import json
from typing import Any, Dict, Optional, Tuple

# -------------------------
# Tools and their schemas (key point: the tool_input field names are defined explicitly here)
# -------------------------
TOOL_SCHEMAS = {
    "search": {
        "description": "Search for external facts.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"],
            "additionalProperties": False
        }
    }
}
TOOLS = {
    "search": lambda query: f"[SEARCH_RESULT] query={query}\n- Agent: LLM decides actions, may call tools, uses observations to update state."
}
ALLOWED_ACTIONS = {"tool_call", "finish", "replan", "ask_user"}
ALLOWED_TOOLS = set(TOOLS.keys())

The protocol (a JSON schema) that separates CoT from the tool call; in essence it forces the model to keep thought and tool in separate fields of its reply.

DECISION_SCHEMA_TEXT = """
{
  "thought": "string",                 // reasoning (no tool facts allowed)
  "action": "tool_call|finish|replan|ask_user",
  "tool": "string|null",
  "tool_input": "object|null",
  "final": "string|null"
}
""".strip()

Few-shot: demonstrate the "Thought -> tool_call -> Observation -> Thought -> finish" trajectory

FEW_SHOT_EXAMPLE = """
Example 1 (needs tool):

User goal: "Who is the current CEO of ExampleCorp?"
Assistant decision JSON:
{
  "thought": "I don't have reliable up-to-date info. I should search for 'ExampleCorp current CEO'.",
  "action": "tool_call",
  "tool": "search",
  "tool_input": {"query": "ExampleCorp current CEO"},
  "final": null
}

[System provides Observation after tool runs]
Observation:
[SEARCH_RESULT] ExampleCorp CEO is Alice Zhang (2025).

Assistant decision JSON:
{
  "thought": "The observation states the CEO is Alice Zhang. I will answer using only that observation.",
  "action": "finish",
  "tool": null,
  "tool_input": null,
  "final": "ExampleCorp's current CEO is Alice Zhang."
}

Example 2 (no tool needed):

User goal: "Explain what a hash table is in one sentence."
Assistant decision JSON:
{
  "thought": "This is general knowledge; no external facts needed.",
  "action": "finish",
  "tool": null,
  "tool_input": null,
  "final": "A hash table stores key–value pairs and uses a hash function to enable fast average-case lookup, insert, and delete."
}
""".strip()

The full system prompt

SYSTEM_PROMPT = f"""
You are an agent decision engine.

You MUST output exactly one JSON object. No markdown, no extra text.

Allowed actions: {sorted(ALLOWED_ACTIONS)}
Allowed tools (only if action=tool_call): {sorted(ALLOWED_TOOLS)}

Decision schema:
{DECISION_SCHEMA_TEXT}

Tool schemas:
{json.dumps(TOOL_SCHEMAS, ensure_ascii=False)}

Rules (critical):
1) "thought" is for reasoning ONLY. It must NOT include any tool results or external facts.
2) External facts can ONLY appear if they are present in the latest Observation injected by the system.
3) If you need external facts, choose action="tool_call".
4) If action="tool_call": final must be null.
5) If action="finish": tool/tool_input must be null.

Follow the examples exactly in style and structure.
{FEW_SHOT_EXAMPLE}
""".strip()

Skeletons of the remaining pipeline functions; the overall flow is the same as in the earlier lessons.

def call_llm(messages, model="gpt-4o", max_tokens=300, temperature=0.2) -> str:
    # Single LLM call, same as in earlier lessons
    return r.json()["choices"][0]["message"]["content"]

def validate_decision(obj: Dict[str, Any], tools_available: set) -> Tuple[bool, str]:
    # Rule checks on the decision JSON (schema, allowed actions/tools), same as before
    return True, "ok"

def decide_with_retry(state: Dict[str, Any], tools_available: set, last_observation: Optional[str] = None, max_retries: int = 2):
    # Retry loop: call the LLM, parse and validate the decision JSON, same as before
    ...

def run_agent(goal: str, max_steps: int = 6):
    # Main loop: bounded number of tool calls, stop when the model decides action="finish"
    ...

if __name__ == "__main__":
    print(run_agent("Explain what an agent is. Use search if you need external facts."))
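For reference, a minimal sketch of the loop that run_agent implements, assuming decide_with_retry returns the parsed decision dict described above (names and error handling are simplified and may differ from the earlier lessons):

# Hypothetical sketch of the agent loop (assumption: decide_with_retry returns the parsed decision dict).
def run_agent_sketch(goal: str, max_steps: int = 6) -> str:
    state = {"goal": goal, "history": []}
    observation = None
    for _ in range(max_steps):
        decision = decide_with_retry(state, ALLOWED_TOOLS, last_observation=observation)
        if decision["action"] == "finish":
            return decision["final"]
        if decision["action"] == "tool_call":
            tool_fn = TOOLS[decision["tool"]]
            observation = tool_fn(**decision["tool_input"])  # run the tool, capture the Observation
            state["history"].append({"decision": decision, "observation": observation})
        else:
            # replan / ask_user are out of scope for this sketch
            return f"[stopped: action={decision['action']}]"
    return "[stopped: max_steps reached]"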



(4) Letting the model verify its own reasoning (Self-Verification)

This is the first step in turning CoT into a controllable component.

Prompt template

Solve the problem step by step.

Then verify your reasoning.
If there is an error, correct it before giving the final answer.

Output format:
Thought:
Verification:
Final:

Validator upgrade

if "Verification" not in obj:
    return False, "Missing verification step"
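Putting the two together, a minimal sketch of a self-verification pass, assuming the chat() helper from above. The section names Thought/Verification/Final come from the template; the parsing is illustrative:

# Minimal self-verification sketch (assumption: chat() helper from above).
SELF_VERIFY_SUFFIX = (
    "\nSolve the problem step by step."
    "\nThen verify your reasoning."
    "\nIf there is an error, correct it before giving the final answer."
    "\nOutput format:\nThought:\nVerification:\nFinal:"
)

def answer_with_verification(question: str) -> str:
    messages = [
        {"role": "system", "content": "You are a careful assistant."},
        {"role": "user", "content": question + SELF_VERIFY_SUFFIX},
    ]
    out = chat(messages, temperature=0.2)
    # Validator upgrade: reject outputs that skip any required section
    for section in ("Thought:", "Verification:", "Final:"):
        if section not in out:
            raise ValueError(f"Missing section: {section}")
    return out.split("Final:", 1)[1].strip()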


(5) Letting the model re-check its own answers (Self-Consistency)

Core idea: the same question → multiple reasoning paths → vote on the answer

Basic flow

Input
├─ CoT path 1 → Answer A
├─ CoT path 2 → Answer A
├─ CoT path 3 → Answer B

Majority vote → Answer A

Code sketch

answers = []
for _ in range(5):
    # Sample multiple reasoning paths; a higher temperature gives more diverse paths
    out = call_llm(prompt, temperature=0.7)
    answers.append(parse_final(out))

final = majority_vote(answers)
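parse_final and majority_vote are left abstract above; a minimal sketch of both, assuming the prompt asks for the final answer on a line starting with "Final:" (the exact marker depends on your prompt format):

import re
from collections import Counter

def parse_final(output: str) -> str:
    # Assumption: the prompt requires a line like "Final: <answer>"
    m = re.search(r"Final:\s*(.+)", output)
    return m.group(1).strip() if m else output.strip()

def majority_vote(answers: list) -> str:
    # Pick the most frequent answer across the sampled reasoning paths
    counts = Counter(a for a in answers if a)
    return counts.most_common(1)[0][0] if counts else ""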

When to use it

  • High-stakes answers
  • Factual questions
  • Decision-making agents

(6) Where CoT belongs in an agent system

Key division of labor

Module           | Needs CoT?
Planner          | ✅ strongly needed
Tool selection   | ⚠️ brief CoT only
Executor         | ❌ not needed
Final answer     | ❌ can be hidden

Common engineering practice

  • Planner: explicit CoT
  • Executor: no CoT
  • User-facing answer: do not show the CoT

(prevents reasoning leakage and reduces noise)
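As a closing illustration, a hedged sketch of this division of labor: the planner prompt asks for explicit CoT, the executor prompt does not, and the user-facing reply strips the reasoning before it is returned. All prompt text here is illustrative, not taken from the earlier lessons:

# Illustrative per-module prompts (assumption: chat() helper from above; prompt wording is hypothetical).
PLANNER_SYSTEM = "You are the planner. Think step by step, then output a numbered plan."
EXECUTOR_SYSTEM = "You are the executor. Output only the result of the current step. No reasoning."

def user_facing_answer(raw_output: str) -> str:
    # Hide internal CoT: keep only the text after the last "Final:" marker, if present
    return raw_output.rsplit("Final:", 1)[-1].strip()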