JMS AI Panel: Designing an Agent Loop Embedded in the Terminal

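Background

JMS is a JumpServer CLI client that connects to remote terminals over WebSocket. It recently gained an AI Panel embedded in the terminal (opened with Ctrl+]), which lets Claude help diagnose problems directly inside the terminal session.

This post breaks down the core design of the AI part: how the Agent Loop runs, how the SSE streaming response is parsed, how multiple tools are dispatched, and the newly added local workspace capability.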

Overall Architecture

@startuml
!theme plain
skinparam backgroundColor #FEFEFE

actor User
participant "AI Panel\n(panel.go)" as Panel
participant "AIClient\n(ai.go)" as AI
participant "Claude API" as Claude
participant "Remote Terminal\n(WebSocket)" as WS
participant "Local Shell\n(os/exec)" as Local
participant "Skills\n(skill.go)" as Skill

User -> Panel : enter a question
Panel -> AI : RunAgentLoop(query, termContext)
loop up to 10 rounds
  AI -> Claude : POST /v1/messages (SSE stream)
  Claude --> AI : text delta / tool_use
  AI --> Panel : streamOutput(text)
  alt stop_reason == "tool_use"
    alt execute_command
      AI -> WS : send command to remote terminal
      WS --> AI : command output
    else local_command
      AI -> Local : bash -c "command"
      Local --> AI : output
    else load_skill
      AI -> Skill : loadSkillBody(name)
      Skill --> AI : SKILL.md content
    end
    AI -> Claude : tool_result
  else stop_reason == "end_turn"
    AI --> Panel : final answer
  end
end

@enduml

Data Structures

Request/Response Model

Messages in the Claude API carry an array of content blocks, and each block has its own type:

type aiContentBlock struct {
    Type  string `json:"type"`            // "text" | "tool_use" | "tool_result"
    Text  string `json:"text,omitempty"`  // text content when type=text
    ID    string `json:"id,omitempty"`    // call ID when type=tool_use
    Name  string `json:"name,omitempty"`  // tool name
    Input any    `json:"input,omitempty"` // tool input parameters

    ToolUseID string `json:"tool_use_id,omitempty"` // ID of the matching tool_use when type=tool_result
    Content   string `json:"content,omitempty"`     // tool_result content
}

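A single assistant message can contain both text blocks and tool_use blocks. A key design choice is that aiContentBlock serves both directions: it represents the assistant's tool_use output and also the user's tool_result input. Which fields are populated determines which role a block plays.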
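To make the dual role concrete, here is a minimal sketch that builds one block for each direction and marshals it. The contentBlock type copies the fields of aiContentBlock; the toolUse, toolResult and encode helpers and the toolu_01 ID are illustrative names, not part of JMS.

```go
package main

import "encoding/json"

// contentBlock mirrors the fields of aiContentBlock shown above.
type contentBlock struct {
	Type      string `json:"type"`
	Text      string `json:"text,omitempty"`
	ID        string `json:"id,omitempty"`
	Name      string `json:"name,omitempty"`
	Input     any    `json:"input,omitempty"`
	ToolUseID string `json:"tool_use_id,omitempty"`
	Content   string `json:"content,omitempty"`
}

// toolUse builds the block an assistant message carries when it calls a tool.
func toolUse(id, name string, input map[string]any) contentBlock {
	return contentBlock{Type: "tool_use", ID: id, Name: name, Input: input}
}

// toolResult builds the block a user message carries to answer that call.
func toolResult(toolUseID, output string) contentBlock {
	return contentBlock{Type: "tool_result", ToolUseID: toolUseID, Content: output}
}

// encode marshals a block to JSON; it returns "" if marshalling fails.
func encode(b contentBlock) string {
	data, err := json.Marshal(b)
	if err != nil {
		return ""
	}
	return string(data)
}
```

Because every role-specific field uses omitempty, a tool_result serializes without id, name or input, and a tool_use without tool_use_id or content. Note that omitempty drops Input only when it is nil; an empty map is still emitted as {}.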

Message History

type AIClient struct {
    apiKey     string
    isOAuth    bool
    httpClient *http.Client
    messages   []aiMessage  // conversation history, kept for the whole session
    skills     []Skill      // index of available skills
}

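messages lives for the entire terminal session. Each user question starts a new agent loop, but the whole conversation history is sent to the API on every request so the AI keeps a coherent context.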

Agent Loop

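RunAgentLoop is the core loop. Its logic is:

1. Build the user message (terminal context + the user's question)
2. Compress old messages (tool_result > 200 chars → "[output cleared]")
3. Loop (up to 10 rounds):
   a. Send the request to the Claude API (SSE stream)
   b. Stream text deltas to the panel in real time
   c. Save the assistant response to messages
   d. If stop_reason != "tool_use", stop
   e. Iterate over all tool_use blocks and dispatch each by name
   f. Collect the tool_result blocks and append them to messages
   g. Continue with the next round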
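The steps above can be sketched as a small loop with the API call and the tool execution injected as functions. This is an illustration under assumptions, not the JMS source: the block and message types, the callModel and runTool function types, and runAgentLoop itself are hypothetical, and step 2 (compression) and streaming to the panel are left out.

```go
package main

import "errors"

// Hypothetical minimal types; the real aiContentBlock has more fields.
type block struct {
	Type, Text, ID, Name, ToolUseID, Content string
}

type message struct {
	Role   string
	Blocks []block
}

// callModel stands in for the streaming API request. It returns the assistant
// blocks and the stop_reason of that response.
type callModel func(history []message) ([]block, string, error)

// runTool stands in for the per-tool dispatch and returns the tool output.
type runTool func(b block) string

const maxRounds = 10

func runAgentLoop(history []message, query string, call callModel, run runTool) ([]message, error) {
	// Step 1: the user message opens the exchange.
	history = append(history, message{Role: "user", Blocks: []block{{Type: "text", Text: query}}})
	for round := 0; round < maxRounds; round++ {
		blocks, stop, err := call(history)
		if err != nil {
			return history, err
		}
		history = append(history, message{Role: "assistant", Blocks: blocks})
		if stop != "tool_use" {
			return history, nil // e.g. "end_turn": the answer is complete
		}
		// Every tool_use block gets exactly one tool_result, linked by ID.
		var results []block
		for _, b := range blocks {
			if b.Type != "tool_use" {
				continue
			}
			results = append(results, block{Type: "tool_result", ToolUseID: b.ID, Content: run(b)})
		}
		history = append(history, message{Role: "user", Blocks: results})
	}
	return history, errors.New("agent loop: round limit reached")
}
```

Injecting the model call makes the loop testable with a scripted fake: a first response that requests a tool and a second that ends the turn yields four messages (user, assistant, user with tool_result, assistant).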

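Cancellation and Rollback

The user can cancel an AI query at any time with Ctrl+C. To keep the conversation history clean, the agent loop records savedLen on entry and rolls back to that point on cancellation: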

savedLen := len(ai.messages)
rollback := func() {
    ai.messages = ai.messages[:savedLen]
}

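This way a cancelled query does not pollute later conversation; for example, no assistant tool_use is left in the history without its matching tool_result.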
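A minimal sketch of the rollback in context, assuming cancellation is observed through a channel such as ctx.Done(). The client type, the ask method and the string-based history are simplifications made up for this example.

```go
package main

type client struct {
	messages []string // stand-in for []aiMessage
}

// ask appends one query and its follow-up messages to the history. If done is
// closed before the query finishes, the history is truncated back to its
// length on entry and ask reports false. In real code done would typically be
// ctx.Done().
func (c *client) ask(done <-chan struct{}, query string, replies []string) bool {
	savedLen := len(c.messages)
	rollback := func() { c.messages = c.messages[:savedLen] }

	c.messages = append(c.messages, "user: "+query)
	for _, r := range replies {
		select {
		case <-done:
			rollback() // drop everything this query appended
			return false
		default:
		}
		c.messages = append(c.messages, r)
	}
	return true
}
```

Truncating by length works because the loop only ever appends: everything after savedLen belongs to the cancelled query.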

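Message Compression

To keep token usage under control, old messages are compressed before each new query:

tool_result longer than 200 characters → replaced with [output cleared]
user message that carries terminal context → the terminal part is replaced with [terminal context cleared]

Only messages before savedLen are compressed; messages from the current query stay intact.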
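The two rules can be sketched as a single pass over the old part of the history. How JMS marks the terminal context inside a user message is not shown in this post, so the <terminal>...</terminal> delimiters below are purely an assumption, as are the type and function names.

```go
package main

import "strings"

type block struct {
	Type    string
	Text    string
	Content string
}

type message struct {
	Role   string
	Blocks []block
}

const (
	maxToolResult = 200
	termOpen      = "<terminal>"  // hypothetical delimiter
	termClose     = "</terminal>" // hypothetical delimiter
)

// compress rewrites messages[0:savedLen] in place and leaves the rest alone.
func compress(msgs []message, savedLen int) {
	for i := 0; i < savedLen && i < len(msgs); i++ {
		for j := range msgs[i].Blocks {
			b := &msgs[i].Blocks[j]
			switch {
			case b.Type == "tool_result" && len([]rune(b.Content)) > maxToolResult:
				b.Content = "[output cleared]"
			case b.Type == "text" && msgs[i].Role == "user":
				b.Text = clearTerminal(b.Text)
			}
		}
	}
}

// clearTerminal replaces the delimited terminal section, if present.
func clearTerminal(text string) string {
	start := strings.Index(text, termOpen)
	end := strings.Index(text, termClose)
	if start < 0 || end < start {
		return text
	}
	return text[:start] + "[terminal context cleared]" + text[end+len(termClose):]
}
```

Short tool results survive untouched, so small facts such as an exit status or a one-line answer remain available to later queries.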

SSE Stream Parsing

The Claude API returns a Server-Sent Events stream. The parser is a state machine that tracks the content block currently being built:

@startuml
!theme plain
skinparam backgroundColor #FEFEFE

[*] --> Scanning

Scanning : read SSE lines

Scanning --> BlockStart : content_block_start
BlockStart --> Accumulating : create currentBlock
Accumulating --> Accumulating : content_block_delta\n(append text/partial_json)
Accumulating --> BlockDone : content_block_stop
BlockDone --> Scanning : save block,\ncurrentBlock = nil

Scanning --> [*] : message_delta(stop_reason)\nor [DONE]

@enduml
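The state machine in the diagram can be sketched with a bufio.Scanner over the response body. This version handles text blocks only and is an illustration, not the JMS parser: the event struct lists just the fields the sketch reads, and textBlock, parseStream and parseString are hypothetical names.

```go
package main

import (
	"bufio"
	"encoding/json"
	"io"
	"strings"
)

// event covers only the fields this sketch reads from each SSE data payload.
type event struct {
	Type         string `json:"type"`
	ContentBlock struct {
		Type string `json:"type"`
	} `json:"content_block"`
	Delta struct {
		Text       string `json:"text"`
		StopReason string `json:"stop_reason"`
	} `json:"delta"`
}

type textBlock struct{ Type, Text string }

// parseStream returns the completed blocks and the stop_reason.
func parseStream(r io.Reader) ([]textBlock, string, error) {
	sc := bufio.NewScanner(r) // default limit is 64 KiB per line; use sc.Buffer for larger events
	var blocks []textBlock
	var current *textBlock
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // "event:" lines and blank separators carry nothing we need
		}
		payload := strings.TrimSpace(line[len("data:"):])
		if payload == "[DONE]" {
			break
		}
		var ev event
		if json.Unmarshal([]byte(payload), &ev) != nil {
			continue
		}
		switch ev.Type {
		case "content_block_start":
			current = &textBlock{Type: ev.ContentBlock.Type}
		case "content_block_delta":
			if current != nil {
				current.Text += ev.Delta.Text
			}
		case "content_block_stop":
			if current != nil {
				blocks = append(blocks, *current)
				current = nil
			}
		case "message_delta":
			return blocks, ev.Delta.StopReason, nil
		}
	}
	return blocks, "", sc.Err()
}

// parseString is a convenience wrapper for feeding a captured stream.
func parseString(raw string) ([]textBlock, string, error) {
	return parseStream(strings.NewReader(raw))
}
```

In a panel the content_block_delta branch is also where each text fragment would be forwarded to the display as it arrives.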

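Handling tool_use Input

The input parameters of a streamed tool_use arrive as JSON fragments (input_json_delta). They are handled as follows:

content_block_start: set Input to the empty string ""
content_block_delta: append partial_json to the string
content_block_stop: json.Unmarshal the string into map[string]any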

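// Streaming phase: Input is a string that grows with each fragment
s, _ := currentBlock.Input.(string)
currentBlock.Input = s + delta.PartialJSON

// On completion: deserialize the accumulated string into structured data
s, _ = currentBlock.Input.(string)
var parsed any
if json.Unmarshal([]byte(s), &parsed) == nil {
    currentBlock.Input = parsed
}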

This is a neat bit of type punning: the Input field has type any, holds a string during streaming, and becomes a map[string]any once the block is complete.
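The accumulate-then-decode pattern can be isolated into two small methods. This is a sketch with hypothetical names (toolBlock, appendDelta, finish). It also treats an empty buffer as {}, which is an assumption added here for tools called without arguments and is not described in the post.

```go
package main

import "encoding/json"

type toolBlock struct {
	Name  string
	Input any // a string while streaming, a decoded map afterwards
}

// appendDelta concatenates one partial_json fragment onto the buffer.
func (b *toolBlock) appendDelta(partial string) {
	s, _ := b.Input.(string)
	b.Input = s + partial
}

// finish decodes the buffered JSON once content_block_stop arrives.
func (b *toolBlock) finish() error {
	s, ok := b.Input.(string)
	if !ok {
		return nil // already decoded
	}
	if s == "" {
		s = "{}" // assumption: no fragments means an empty argument object
	}
	var parsed map[string]any
	if err := json.Unmarshal([]byte(s), &parsed); err != nil {
		return err // leave the raw string in place for debugging
	}
	b.Input = parsed
	return nil
}
```

Fragments may split the JSON anywhere, including in the middle of a key, so nothing can be decoded until the block is complete.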

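Tool Dispatch

The Three Tools

Tool	Execution environment	Purpose
`execute_command`	Remote terminal (WebSocket)	Run diagnostic commands on servers managed by JumpServer
`local_command`	Local machine (os/exec)	Run scripts on the development machine, such as Loki/Prometheus queries
`load_skill`	In-memory read	Load a skill's full instruction document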

Dispatch Logic

The agent loop iterates over the content blocks of the response and dispatches with a switch on block.Name:

switch block.Name {
case "execute_command":
    // execCmd() → WebSocket → JumpServer → remote shell
case "local_command":
    // runLocalCommand() → os/exec → local bash
case "load_skill":
    // loadSkillBody() → returns the SKILL.md content
}

Each tool's result is wrapped in a tool_result block and linked to its tool_use through ToolUseID.
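A sketch of the dispatch step that always produces a tool_result, including for an unknown tool name or a failing handler. The handlers struct stands in for execCmd, runLocalCommand and loadSkillBody, and the input keys "command" and "name" are assumptions; the post does not show the tool schemas.

```go
package main

import "fmt"

type toolUse struct {
	ID, Name string
	Input    map[string]any
}

type toolResult struct {
	ToolUseID, Content string
}

// handlers are hypothetical stand-ins for the three real executors.
type handlers struct {
	remote func(cmd string) (string, error)
	local  func(cmd string) (string, error)
	skill  func(name string) (string, error)
}

// stringArg reads a string argument and yields "" when it is missing.
func stringArg(in map[string]any, key string) string {
	s, _ := in[key].(string)
	return s
}

func dispatch(h handlers, u toolUse) toolResult {
	var out string
	var err error
	switch u.Name {
	case "execute_command":
		out, err = h.remote(stringArg(u.Input, "command"))
	case "local_command":
		out, err = h.local(stringArg(u.Input, "command"))
	case "load_skill":
		out, err = h.skill(stringArg(u.Input, "name"))
	default:
		err = fmt.Errorf("unknown tool %q", u.Name)
	}
	if err != nil {
		out = "error: " + err.Error() // report the failure to the model as text
	}
	return toolResult{ToolUseID: u.ID, Content: out}
}
```

Returning a result even on failure keeps every tool_use paired with a tool_result, and it lets the model see the error and try a different command.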

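Remote Command Execution (execute_command)

The hard part of remote execution is that a WebSocket terminal has no structured boundary around a command's output. The solution is a sentinel pattern: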

const sentinel = "___JMS_SENTINEL___"
fullCmd := fmt.Sprintf("%s ; %s\r", cmd, sentinelCmd)

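The client sends command ; printf sentinel to the remote terminal, then watches the WebSocket data stream until the sentinel string appears and extracts the part in between as the command output. The terminal echoes the typed command line back, so a literal sentinel in that line would be matched too early; to avoid this, the printf argument is written as octal escapes and the literal string only appears once printf runs.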
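The octal trick and the extraction can be sketched as follows. The helper names octalEscape, wrapCommand and extractOutput are made up for this example, and the sketch assumes the echoed command occupies exactly the first line of the captured stream, which may not hold for long or wrapped command lines.

```go
package main

import (
	"fmt"
	"strings"
)

const sentinel = "___JMS_SENTINEL___"

// octalEscape rewrites every byte as a \NNN escape that printf expands.
func octalEscape(s string) string {
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		fmt.Fprintf(&sb, `\%03o`, s[i])
	}
	return sb.String()
}

// wrapCommand appends a printf that emits the sentinel after cmd finishes.
// The returned line never contains the sentinel in literal form.
func wrapCommand(cmd string) string {
	return fmt.Sprintf("%s ; printf '%s\\n'\r", cmd, octalEscape(sentinel))
}

// extractOutput returns the text between the echoed command line and the
// sentinel. ok is false while the sentinel has not arrived yet.
func extractOutput(stream string) (output string, ok bool) {
	end := strings.Index(stream, sentinel)
	if end < 0 {
		return "", false // keep reading from the WebSocket
	}
	body := stream[:end]
	if nl := strings.Index(body, "\n"); nl >= 0 {
		body = body[nl+1:] // drop the echoed command line (assumed to be one line)
	}
	return strings.TrimRight(body, "\r\n"), true
}
```

Using ; rather than && means the sentinel is printed even when the command fails, so a failing command does not leave the reader waiting for a marker that never comes.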

Local Command Execution (local_command)

Local execution is much simpler and uses exec.CommandContext directly:

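func runLocalCommand(ctx context.Context, cmd string) (string, error) {
    if isDangerous(cmd) {
        return "", fmt.Errorf("blocked: dangerous command")
    }
    c := exec.CommandContext(ctx, "bash", "-c", cmd)
    // CombinedOutput returns whatever stdout/stderr was captured even when err != nil
    output, err := c.CombinedOutput()
    out := string(output)
    if r := []rune(out); len(r) > 10000 {
        out = string(r[:10000]) // truncate output longer than 10000 characters
    }
    return out, err
}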

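It shares the isDangerous blacklist with remote execution as its safety filter, and output longer than 10000 characters is truncated automatically.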
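The post does not show what isDangerous checks, so the following is only a guess at the general shape of such a filter: a substring deny list applied after normalizing case and whitespace. The entries in blockedFragments and the truncate helper are invented for illustration.

```go
package main

import "strings"

// blockedFragments is a hypothetical deny list; the real one is not shown.
var blockedFragments = []string{"rm -rf /", "mkfs", "dd if=", "shutdown", "reboot"}

// isDangerous reports whether cmd contains any blocked fragment after
// lowercasing and collapsing runs of whitespace.
func isDangerous(cmd string) bool {
	normalized := strings.Join(strings.Fields(strings.ToLower(cmd)), " ")
	for _, frag := range blockedFragments {
		if strings.Contains(normalized, frag) {
			return true
		}
	}
	return false
}

// truncate caps out at max characters and marks the cut.
func truncate(out string, max int) string {
	r := []rune(out)
	if len(r) <= max {
		return out
	}
	return string(r[:max]) + "\n[truncated]"
}
```

A substring deny list is easy to reason about but coarse in both directions: it also blocks harmless commands such as rm -rf /tmp/x, and it can be bypassed through quoting or variables, so it works as a guard against accidents rather than as a security boundary.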

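Skill System

Two-Stage Loading

Skills use a two-stage design, an index plus on-demand loading, instead of stuffing every skill's content into the system prompt:

At startup: scanSkills() scans ~/.claude/skills/*/SKILL.md and extracts only name + description (from the YAML frontmatter)
At runtime: the AI uses the index to decide which skill it needs and calls load_skill(name) to fetch the full document

This way the system prompt contains only a few index lines and wastes no tokens. The AI loads the full instructions only when it needs them.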
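The index stage only needs two fields from each file, so a small line-based reader is enough for a sketch. This is not a YAML parser and not the JMS implementation: it assumes name and description are flat single-line scalars between two --- fences, and skillMeta and parseFrontmatter are hypothetical names.

```go
package main

import (
	"bufio"
	"strings"
)

type skillMeta struct{ Name, Description string }

// parseFrontmatter reads "key: value" lines between the leading --- fences.
// ok is false when the fences are missing or no name was found.
func parseFrontmatter(doc string) (meta skillMeta, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(doc))
	if !sc.Scan() || strings.TrimSpace(sc.Text()) != "---" {
		return skillMeta{}, false
	}
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "---" {
			return meta, meta.Name != "" // closing fence: the body is never read
		}
		key, value, found := strings.Cut(line, ":")
		if !found {
			continue
		}
		value = strings.Trim(strings.TrimSpace(value), `"'`)
		switch strings.TrimSpace(key) {
		case "name":
			meta.Name = value
		case "description":
			meta.Description = value
		}
	}
	return skillMeta{}, false // closing fence missing
}
```

A caller could run this over every file that matches the skills pattern. Note that Go's filepath.Glob does not expand ~, so the home directory would have to be resolved first, for example with os.UserHomeDir.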

Dynamic System Prompt Construction

func (ai *AIClient) buildSystemPrompt() string {
    if len(ai.skills) == 0 {
        return systemPrompt  // base prompt when there are no skills
    }
    // append the LOCAL WORKSPACE section + the skill index
}

When skills are present, the following is appended to the end of the system prompt:

LOCAL WORKSPACE:
- Use local_command to run commands on the user's local machine
- Use load_skill to get detailed instructions for available skills
- Use execute_command for remote server commands (via JumpServer)

Available skills:
- loki: Grafana Loki log queries
- prometheus: Grafana/Prometheus metrics analysis
...

Likewise, buildTools() registers the local_command and load_skill tools only when skills are present. Without skills there is only execute_command, which keeps the setup minimal.
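Both conditional builders can be sketched as pure functions of the skill index. The basePrompt text is a placeholder (the real system prompt is not shown), the functions take the skills as a parameter instead of being methods on AIClient, and buildTools returns only tool names here rather than full tool definitions.

```go
package main

import "strings"

type skill struct{ Name, Description string }

// basePrompt is a placeholder for the real base system prompt.
const basePrompt = "You are a terminal assistant."

// buildSystemPrompt appends the workspace section and one index line per
// skill; with no skills it returns the base prompt unchanged.
func buildSystemPrompt(skills []skill) string {
	if len(skills) == 0 {
		return basePrompt
	}
	var sb strings.Builder
	sb.WriteString(basePrompt)
	sb.WriteString("\n\nLOCAL WORKSPACE:\n")
	sb.WriteString("- Use local_command to run commands on the user's local machine\n")
	sb.WriteString("- Use load_skill to get detailed instructions for available skills\n")
	sb.WriteString("- Use execute_command for remote server commands (via JumpServer)\n")
	sb.WriteString("\nAvailable skills:\n")
	for _, s := range skills {
		sb.WriteString("- " + s.Name + ": " + s.Description + "\n")
	}
	return sb.String()
}

// buildTools lists the tool names to register for this session.
func buildTools(skills []skill) []string {
	tools := []string{"execute_command"}
	if len(skills) > 0 {
		tools = append(tools, "local_command", "load_skill")
	}
	return tools
}
```

Keeping the prompt and the tool list driven by the same condition avoids a mismatch where the prompt mentions a tool that was never registered.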

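Typical Scenario

Scenario: remote disk full → query related logs in Loki locally

User: "This machine's disk is almost full, find out what is taking up the space"

AI thinks: check the remote side first with execute_command
  ⚡ df -h
  ⚡ du -sh /var/log/* | sort -rh | head -20

AI finds: /var/log/milvus takes up 80GB
AI thinks: need to query Loki for this cluster's log volume trend
  📖 skill: loki        → load the full loki skill instructions
  🖥 local: python3 ~/.claude/skills/loki/scripts/loki_debug.py --url "..." -l pod=milvus-0

AI concludes: "The disk is full because milvus log rotation is misconfigured;
         Loki shows log volume has grown 5x over the last 3 days. I suggest..."

Remote execution finds the problem → a local skill queries the related data → the AI combines information from both sides into a conclusion. Remote execution alone cannot do this.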

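Summary

Design point	Choice	Reason
Agent Loop	At most 10 rounds	Prevents infinite loops; a 5-minute overall timeout is the fallback
Streaming output	SSE + state-machine parsing	The AI's answer is displayed in real time, which improves the user experience
tool_use Input	String concatenation → deserialize at the end	Fits streamed partial JSON
Message history	Persistent within the session + compression of old messages	Keeps context coherent and controls token usage
Cancellation	context + rollback	Cancellation does not pollute the history
Remote execution	Sentinel pattern	A WebSocket terminal has no structured output boundary
Skill loading	Index + on-demand loading	The system prompt does not bloat
Safety	isDangerous blacklist	Simple and effective, shared by remote and local execution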
