I Wired My LLM into an Agent This Weekend: ReAct in Practice

A Weekend Whim

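As a developer swept up in the LLM (Large Language Model) wave, I call the OpenAI API every day to ask about this and chat about that. But you soon notice that a bare chatbot is pretty dumb: it has no network access, so it can't fetch real-time information; it has no execution environment, so it can't run code or query a database for you.

With nothing better to do this weekend, I decided to stop being a "library glue coder" and hand-write an AI Agent instead. The goal is simple: give it a natural-language instruction (for example, "find out how many posts the QixYuan blog currently has and compute the average reading time") and let the AI break the task down, call tools, and produce the final answer on its own.

The core technique I settled on is the ReAct (Reasoning + Acting) pattern. LangChain already wraps it nicely, but to understand the principle I decided to implement it bare-handed in plain Python.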

1. What Is ReAct?

ReAct is not React.js; it stands for Reasoning + Acting.
It is a prompt-engineering paradigm proposed by researchers from Princeton University and Google Research (Yao et al., 2022).

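Traditional Chain-of-Thought (CoT) only asks the model to "think it through before answering", whereas ReAct has the model go through these steps:

Thought: reason about what to do next.
Action: decide which tool to call and with what arguments.
Observation: look at the result the tool returned.
Loop: keep reasoning with that result until the problem is solved.

A picture is worth a thousand words: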

sequenceDiagram
    participant User
    participant LLM as Agent (Brain)
    participant Tool as Tools (Google/Python/etc)

    User->>LLM: "What time is it?"
    LLM->>LLM: Thought: The user is asking for the time, so I need the clock tool.
    LLM->>Tool: Action: Use Clock
    Tool-->>LLM: Observation: 2024-02-02 20:30
    LLM->>LLM: Thought: I have the time now.
    LLM-->>User: Final Answer: It is 8:30 in the evening.

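2. Core Architecture

To build this Agent, we need three core components:

Tools: functions the AI can call.
Prompt Template: tells the LLM "you are an Agent, you have these tools, please respond in the ReAct format".
Executor Loop: a loop that keeps parsing the LLM's output and running tools until an answer appears.

Step 1: Define the Tools

We define two simple tools: one simulating a Google search, one simulating a calculator.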

import math

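def google_search(query):
    """Mock web search; a real project could plug in SerpAPI."""
    print(f"[System] Searching for: {query}")
    # Match on the blog name only, so the exact wording of the query does not matter.
    if "qixyuan" in query.lower():
        return "The QixYuan blog has 2 posts: the first is 2000 words, the second is 3000 words."
    return "No relevant information found."

def calculator(expression):
    """Evaluate a math expression such as '2000 + 3000' or 'math.sqrt(16)'."""
    print(f"[System] Calculating: {expression}")
    try:
        # NOTE: eval() with builtins removed is NOT a real sandbox; attribute
        # tricks such as ().__class__ can still escape it. Toy demo only.
        return str(eval(expression, {"__builtins__": None}, {"math": math}))
    except Exception as e:
        return f"Error: {e}"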

TOOLS = {
    "Search": google_search,
    "Calculator": calculator
}
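
Since `eval()` with stripped builtins is not a real sandbox, here is a minimal sketch of a stricter calculator that walks the expression's syntax tree and only allows whitelisted arithmetic. The name `safe_eval`, the operator set, and the function whitelist are assumptions made for illustration, not part of the original Agent; note that functions are called by bare name (`sqrt(16)`) rather than as `math.sqrt(16)`.

```python
import ast
import math
import operator

# Whitelisted operators; anything not listed here is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
# Hypothetical whitelist of math functions, chosen for illustration.
_FUNCS = {"sqrt": math.sqrt, "log": math.log, "sin": math.sin, "cos": math.cos}


def _is_number(node):
    """True for int/float literals (bool is excluded on purpose)."""
    return (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    )


def safe_eval(expression):
    """Evaluate an arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if _is_number(node):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS
            and not node.keywords
        ):
            return _FUNCS[node.func.id](*[_eval(arg) for arg in node.args])
        # Names, attributes, subscripts, lambdas, etc. all end up here.
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))
```

To try it in the Agent, the `calculator` tool could return `str(safe_eval(expression))` inside its existing `try` block. Exponentiation is left out on purpose, because an input like `9**9**9` would hang the process.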

Step 2: The Core Prompt

This is the soul of the Agent. The prompt has to be designed meticulously so that the LLM is forced to output a specific format.

SYSTEM_PROMPT = """
You are a smart assistant. You have access to the following tools:

{tool_descriptions}

Use the following format strictly:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!
"""

Step 3: Agent Executor (Main Loop)

This is the most interesting part. We need code that parses the LLM's output and extracts the Action and Action Input.

import re
from openai import OpenAI  # assuming an OpenAI-compatible endpoint

client = OpenAI(api_key="sk-...", base_url="...")

def run_agent(question, max_steps=10):
    # 1. Build the initial prompt
    tool_desc = "\n".join(f"{name}: {func.__doc__}" for name, func in TOOLS.items())
    tool_names = ", ".join(TOOLS.keys())

    prompt = SYSTEM_PROMPT.format(tool_descriptions=tool_desc, tool_names=tool_names)
    prompt += f"\nQuestion: {question}\n"

    history = prompt

    # Bounded loop, so a model that never reaches a Final Answer cannot spin forever
    for _ in range(max_steps):
        # 2. Call the LLM
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": history}],
            stop=["Observation:"]  # Important! Keeps the LLM from inventing the Observation itself
        )
        content = response.choices[0].message.content or ""
        print(f"\n--- LLM Output ---\n{content}\n------------------")

        history += content.rstrip() + "\n"

        # 3. Check whether we are done
        if "Final Answer:" in content:
            return content.split("Final Answer:")[-1].strip()

        # 4. Parse the Action
        # Match each field up to the end of its line. A trailing newline is not
        # required, since the stop sequence may cut the output right after the input.
        action_match = re.search(r"^\s*Action:\s*(.+)$", content, re.MULTILINE)
        input_match = re.search(r"^\s*Action Input:\s*(.+)$", content, re.MULTILINE)

        if action_match and input_match:
            action_name = action_match.group(1).strip()
            # Models often wrap the input in quotes; strip them before calling the tool
            action_input = input_match.group(1).strip().strip("\"'")

            # 5. Run the tool
            if action_name in TOOLS:
                observation = TOOLS[action_name](action_input)
                obs_str = f"Observation: {observation}\n"
            else:
                obs_str = f"Observation: Tool {action_name} not found.\n"
        else:
            # The LLM ignored the format: say so and let the next iteration retry
            print("Format error, retrying...")
            obs_str = "Observation: Invalid format. Reply with Action and Action Input, or a Final Answer.\n"

        print(obs_str.strip())
        history += obs_str

    return None  # no Final Answer within max_steps
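
The parsing step only looks at a single line per field, so a multi-line Action Input is cut off after its first line. Below is a minimal sketch of a more tolerant parser, assuming the model still emits `Action:` and `Action Input:` in that order. The helper name `parse_step` is hypothetical, and the JSON decoding means a tool may receive a dict, list, or number instead of a string.

```python
import json
import re

# Capture the tool name on its own line, then everything after
# "Action Input:" up to the end of the text (DOTALL lets "." span newlines).
STEP_RE = re.compile(
    r"^Action:\s*(?P<name>[^\n]+?)\s*\nAction Input:\s*(?P<arg>.*)",
    re.MULTILINE | re.DOTALL,
)


def parse_step(content):
    """Return (tool_name, tool_input), or None if the format is not followed."""
    # Defensive cut, in case the backend ignored the stop sequence
    # and the model went on to invent its own Observation.
    content = content.split("Observation:", 1)[0]
    match = STEP_RE.search(content)
    if match is None:
        return None
    raw = match.group("arg").strip()
    try:
        # Structured input: JSON object, array, number, or quoted string.
        arg = json.loads(raw)
    except json.JSONDecodeError:
        # Plain text such as "2000 + 3000" falls through unchanged.
        arg = raw
    return match.group("name"), arg
```

A side effect of trying JSON first is that a quoted input like `"2000 + 3000"` arrives at the tool without its quotes.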

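3. Putting It to the Test

Let's run a simplified version of the original goal:

run_agent("Look up the word counts of the QixYuan blog posts and compute the total")

Run log (simulated):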

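LLM Thought: I need to search for the blog first to get the list of posts and their word counts, then add them up with the calculator.
LLM Action: Search
LLM Input: "QixYuan blog post word counts"
System Exec: calls google_search, returns "The QixYuan blog has 2 posts: the first is 2000 words, the second is 3000 words."
LLM Thought: I have the word counts: 2000 and 3000. Now I need to compute 2000 + 3000.
LLM Action: Calculator
LLM Input: "2000 + 3000"
System Exec: calls eval("2000 + 3000"), returns "5000"
LLM Thought: I have the result.
LLM Final Answer: The QixYuan blog has 5000 words in total.

Success! With around 50 lines of core code, we gave the LLM execution powers on par with an Ethereum smart contract (trust me).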
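
Since the log above is simulated, it can help to replay the same flow offline, with no API key, by swapping the model for a scripted stand-in. The sketch below is a stripped-down version of the loop under that assumption; `make_scripted_llm`, `run_loop`, the canned replies, and the toy tools are all hypothetical names and data introduced for illustration.

```python
import re


def search(query):
    """Canned search result, mirroring the mock tool in the article."""
    return "The QixYuan blog has 2 posts: the first is 2000 words, the second is 3000 words."


def calculator(expression):
    """Add integers separated by '+'; enough for this demo, no eval()."""
    return str(sum(int(part) for part in expression.split("+")))


TOOLS = {"Search": search, "Calculator": calculator}

# Scripted replies standing in for the model, one per loop iteration.
SCRIPT = [
    "Thought: I need the word counts first.\nAction: Search\nAction Input: QixYuan blog word counts\n",
    "Thought: Now add 2000 and 3000.\nAction: Calculator\nAction Input: 2000 + 3000\n",
    "Thought: I now know the final answer\nFinal Answer: The QixYuan blog has 5000 words in total.",
]


def make_scripted_llm(replies):
    """Return a callable that ignores the prompt and replays canned replies."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


def run_loop(llm, question, max_steps=5):
    """Minimal ReAct loop: returns (final_answer, list_of_observations)."""
    history = f"Question: {question}\n"
    observations = []
    for _ in range(max_steps):
        content = llm(history)
        history += content
        if "Final Answer:" in content:
            return content.split("Final Answer:")[-1].strip(), observations
        name = re.search(r"^Action:\s*(.+)$", content, re.MULTILINE).group(1).strip()
        arg = re.search(r"^Action Input:\s*(.+)$", content, re.MULTILINE).group(1).strip()
        observation = TOOLS[name](arg)
        observations.append(observation)
        history += f"Observation: {observation}\n"
    raise RuntimeError("no final answer within max_steps")


answer, observations = run_loop(
    make_scripted_llm(SCRIPT),
    "What is the total word count of the QixYuan blog?",
)
print(answer)
```

Because the "model" is deterministic here, the same setup doubles as a regression test for the parsing and tool-dispatch code.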

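4. Pitfalls and Reflections

While hand-writing this, I ran into a few main pain points:

Prompt adherence: the smaller the model (say, 7B), the more likely it is to ignore the format. Sometimes it fabricates the Observation itself, which leads to a "hallucination loop". The stop=["Observation:"] parameter is crucial here.
Fragile argument parsing: matching Action Input with a regex is extremely brittle. What if the input is multi-line JSON? Production setups today (e.g. OpenAI Function Calling) return this part as structured data.
Context window limits: as the number of ReAct iterations grows, the prompt keeps getting longer and can easily exceed the token limit. Some kind of "forgetting" or "summarization" mechanism is needed.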
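
The "forgetting" idea from the last point can be sketched as follows: always keep the prompt header, then keep as many of the most recent steps as fit a budget. This is only an illustration under stated assumptions. `trim_history` is a hypothetical helper, it expects the steps as a list rather than one growing string, and it uses character count as a rough stand-in for tokens, where a real implementation would count with the model's tokenizer.

```python
def trim_history(header, steps, max_chars=4000, keep_last=2):
    """Build the prompt from `header` plus as many recent steps as fit.

    header: system prompt + question, always kept.
    steps: one string per Thought/Action/Observation round, oldest first.
    max_chars: rough stand-in for a token budget (assumption, see above).
    keep_last: the newest `keep_last` steps are kept even if over budget.
    """
    kept = []
    used = len(header)
    # Walk from newest to oldest so that recent context wins.
    for index, step in enumerate(reversed(steps)):
        if index >= keep_last and used + len(step) > max_chars:
            break
        kept.append(step)
        used += len(step)
    kept.reverse()  # restore chronological order

    dropped = len(steps) - len(kept)
    # Tell the model that something was cut, so the gap is not silent.
    note = f"(... {dropped} earlier step(s) omitted ...)\n" if dropped else ""
    return header + note + "".join(kept)
```

The "summarization" variant would replace the omitted-steps note with a short summary produced by a separate LLM call, which costs an extra request but preserves more information.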

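5. Closing Thoughts

Implementing ReAct from scratch took a lot of the mystique out of Agents for me. It is no black magic; at its core it is just prompt engineering + a loop + plain function calls (with one eval() inside the calculator).

Yet it is exactly this simple combination that turns an LLM from a "brain in a vat" into an "intelligent entity" that can touch the world. Next, I plan to give it file read/write access so it can tidy up my blog's Markdown files automatically. This is QixYuan; see you next weekend.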

AI Agent

#AI Agent #ReAct #Python #LangChain #LLM

https://www.qixyuan.top/2025/03/02/3-ai-agent-practice/

Author: QixYuan

Published: March 2, 2025
