---
name: agent-scorecard
version: 2.0
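framework: Agent Management · Module 5 · Framework F29
type: assessment
description: >
  Agent performance scorecard: evaluates agent performance on four dimensions
  (outcome 40% + quality 25% + efficiency 20% + economics 15%), establishes
  Agent Manager KPIs, and links version status to performance. Trigger words:
  agent performance, scorecard, performance review, Agent Manager, KPI, version management
governance_nerves: [intent management, boundaries and escalation]
upstream_frameworks: [F28_SLO_Operations_System, F30_CPTA_Cost_Accounting]
downstream_frameworks: [F33_ACMM_Capability_Maturity_Assessment, F34_Role_Transformation_Map]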
---

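# F29 Agent Performance Scorecard (Agent Scorecard)

## SKILL Positioning

**Core proposition**: An agent's fundamental value lies in delivering outcomes, not in executing processes. Giving outcome metrics the highest weight (40%) reflects the core idea of JTBO (Job-To-Be-Outcome): users hire an agent to get a task done, not to watch it execute steps.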

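**Four-dimension weight structure**:

```
Outcome    40% ─── task completion rate, goal achievement rate, user satisfaction
Quality    25% ─── accuracy rate, hallucination rate, consistency
Efficiency 20% ─── response time, throughput, retry rate
Economics  15% ─── CPTA, token efficiency, cost controllability
```

**Agent Manager role**: Every agent should have a clearly designated (human) Manager who sets goals, reviews performance, and decides the direction of optimization. The Agent Manager is a business decision-maker, not an operations engineer.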

---

## Information Gathering (INPUT Template)

```yaml
scorecard_input:
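  # 1. Basic agent information
  agent:
    name: ""                    # Agent name
    type: ""                    # Agent type
    version: ""                 # Current version
    manager: ""                 # Agent Manager
    business_owner: ""          # Business owner
    deployment_date: ""         # Deployment date
    last_major_update: ""       # Most recent major update

  # 2. Outcome dimension (weight 40%)
  outcome_metrics:
    task_completion_rate: 0     # Task completion rate (%)
    goal_achievement_rate: 0    # Goal achievement rate (%)
    user_satisfaction: 0        # User satisfaction (1-5)
    first_contact_resolution: 0 # First-contact resolution rate (%)
    escalation_rate: 0          # Escalation rate (%)

  # 3. Quality dimension (weight 25%)
  quality_metrics:
    accuracy_rate: 0            # Accuracy rate (%)
    hallucination_rate: 0       # Hallucination rate (%)
    consistency_score: 0        # Consistency score (1-5)
    format_compliance: 0        # Format compliance rate (%)
    safety_violations: 0        # Number of safety violations

  # 4. Efficiency dimension (weight 20%)
  efficiency_metrics:
    avg_response_time: 0        # Average response time (seconds)
    p99_response_time: 0        # P99 response time (seconds)
    throughput: 0               # Throughput (tasks/hour)
    retry_rate: 0               # Retry rate (%)
    token_per_task: 0           # Average tokens consumed per task

  # 5. Economics dimension (weight 15%)
  economics_metrics:
    cpta: 0                     # CPTA (CNY/task)
    cost_per_success: 0         # Cost per successful task (CNY)
    monthly_total_cost: 0       # Total monthly cost (CNY)
    cost_trend: ""              # Cost trend (rising/stable/falling)

  # 6. Version information
  version_history:
    current_version: ""
    version_performance: []     # Performance comparison across versions
    rollback_count: 0           # Number of rollbacks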
```

---

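## Execution and Analysis Engine (S1-S4 Four-Step Method)

### S1: Data Collection and Normalization (Collect)

**Task**: Collect raw data for all four dimensions and normalize it into comparable scores.

**Normalization method**: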

```yaml
scoring_method:
  outcome:
    weight: 0.40
    sub_metrics:
      - name: "Task completion rate"
        raw_value: 93
        scoring_rule: "≥95=100, 90-94=80, 80-89=60, <80=40"
        score: 80
        sub_weight: 0.4
      - name: "User satisfaction"
        raw_value: 4.2
        scoring_rule: "≥4.5=100, 4.0-4.4=80, 3.5-3.9=60, <3.5=40"
        score: 80
        sub_weight: 0.3
      - name: "First-contact resolution rate"
        raw_value: 78
        scoring_rule: "≥85=100, 75-84=80, 65-74=60, <65=40"
        score: 80
        sub_weight: 0.3
    weighted_score: 80  # = Σ(score × sub_weight)

  quality:
    weight: 0.25
    sub_metrics:
      - name: "Accuracy rate"
        raw_value: 97
        scoring_rule: "≥98=100, 95-97=80, 90-94=60, <90=40"
        score: 80
        sub_weight: 0.4
      - name: "Hallucination rate"
        raw_value: 2.5
        scoring_rule: "≤1%=100, 1-3%=80, 3-5%=60, >5%=40"
        score: 80
        sub_weight: 0.35
      - name: "Consistency"
        raw_value: 4.0
        scoring_rule: "≥4.5=100, 4.0-4.4=80, 3.5-3.9=60, <3.5=40"
        score: 80
        sub_weight: 0.25
    weighted_score: 80

  efficiency:
    weight: 0.20
    sub_metrics:
      - name: "P99 response time"
        raw_value: 28
        scoring_rule: "≤15s=100, 15-30s=80, 30-60s=60, >60s=40"
        score: 80
        sub_weight: 0.4
      - name: "Retry rate"
        raw_value: 8
        scoring_rule: "≤5%=100, 5-10%=80, 10-15%=60, >15%=40"
        score: 80
        sub_weight: 0.3
      - name: "Throughput"
        raw_value: 120  # tasks/hour; scored as a percentage of the throughput target
        scoring_rule: "≥target=100, 80-99% of target=80, 60-79% of target=60, <60% of target=40"
        score: 80
        sub_weight: 0.3
    weighted_score: 80

  economics:
    weight: 0.15
    sub_metrics:
      - name: "CPTA"
        raw_value: 2.8
        scoring_rule: "≤target=100, over by ≤10%=80, over by ≤20%=60, over by ≤30%=40"
        score: 80
        sub_weight: 0.5
      - name: "Cost trend"
        raw_value: "stable"
        scoring_rule: "falling=100, stable=80, rising=40"
        score: 80
        sub_weight: 0.5
    weighted_score: 80
```
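The banded scoring rules above can be sketched in Python as a lookup over sorted cutoffs followed by a weighted sum. This is a minimal illustration, not part of the framework: `band_score` and `dimension_score` are hypothetical helper names, and the handling of values that sit exactly on a shared boundary (for example a hallucination rate of exactly 3%) is an assumption, since the rules list that boundary in two bands.

```python
from bisect import bisect_left, bisect_right


def band_score(raw, cutoffs, scores, higher_is_better=True):
    """Return the band score for a raw metric value.

    cutoffs: ascending band boundaries, e.g. [80, 90, 95].
    scores:  one score per band, ordered from the lowest raw values upward,
             so len(scores) == len(cutoffs) + 1.
    """
    if higher_is_better:
        # A value equal to a cutoff lands in the band above it (">=95 = 100").
        return scores[bisect_right(cutoffs, raw)]
    # A value equal to a cutoff lands in the band below it ("<=1% = 100").
    # Assumption: shared boundaries such as 3% go to the better band.
    return scores[bisect_left(cutoffs, raw)]


def dimension_score(sub_metrics):
    """Weighted sum of (score, sub_weight) pairs; sub-weights must total 1."""
    total_weight = sum(weight for _, weight in sub_metrics)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"sub-weights sum to {total_weight}, expected 1.0")
    return round(sum(score * weight for score, weight in sub_metrics), 2)


# Raw values and rules taken from the outcome and quality examples above.
completion = band_score(93, [80, 90, 95], [40, 60, 80, 100])
satisfaction = band_score(4.2, [3.5, 4.0, 4.5], [40, 60, 80, 100])
first_contact = band_score(78, [65, 75, 85], [40, 60, 80, 100])
hallucination = band_score(2.5, [1, 3, 5], [100, 80, 60, 40], higher_is_better=False)

outcome = dimension_score(
    [(completion, 0.4), (satisfaction, 0.3), (first_contact, 0.3)]
)
print(completion, satisfaction, first_contact, hallucination, outcome)
```

Each sample value falls in the 80-point band, which matches the `score: 80` entries in the template. Metrics scored against a target (throughput, CPTA) would first need to be converted to a percentage of that target before being passed in.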

### S2: Overall Score and Grade (Score)

**Task**: Compute the weighted overall score and determine the performance grade.

**Scoring formula**:

```
Total = Outcome score×0.40 + Quality score×0.25 + Efficiency score×0.20 + Economics score×0.15
```
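As a rough sketch, the formula can be written as a small Python function. The names `WEIGHTS`, `overall_score`, and `weighted_contributions` are hypothetical, and the second set of dimension scores is made up purely to show uneven inputs.

```python
# Dimension weights from the four-dimension structure (40/25/20/15).
WEIGHTS = {"outcome": 0.40, "quality": 0.25, "efficiency": 0.20, "economics": 0.15}


def weighted_contributions(dimension_scores, weights=WEIGHTS):
    """Return each dimension's contribution (score x weight) to the total."""
    missing = set(weights) - set(dimension_scores)
    if missing:
        raise KeyError(f"missing dimension scores: {sorted(missing)}")
    return {dim: round(dimension_scores[dim] * w, 2) for dim, w in weights.items()}


def overall_score(dimension_scores, weights=WEIGHTS):
    """Weighted total on the same 0-100 scale as the dimension scores."""
    return round(sum(weighted_contributions(dimension_scores, weights).values()), 2)


balanced = {"outcome": 80, "quality": 80, "efficiency": 80, "economics": 80}
uneven = {"outcome": 90, "quality": 70, "efficiency": 80, "economics": 60}  # illustrative only

print(weighted_contributions(balanced))  # contributions of 32, 20, 16 and 12
print(overall_score(balanced))           # 80.0
print(overall_score(uneven))             # 78.5
```

Returning the per-dimension contributions alongside the total makes it easier to see which dimension is pulling the overall score down.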

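**Grade definitions**:

| Grade | Score range | Meaning | Action |
|-------|-------------|---------|--------|
| A | 90-100 | Excellent | Can be promoted as a benchmark; consider scaling up |
| B | 75-89 | Good | Keep optimizing and look for room to improve |
| C | 60-74 | Pass | Needs improvement; draw up an optimization plan |
| D | <60 | Fail | Needs major changes, or consider decommissioning |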
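A minimal sketch of the grade lookup follows. The function name `grade` is hypothetical, and treating each band's lower bound as inclusive (so a fractional score such as 89.5 is graded B) is an assumption, because the table only lists whole-number ranges.

```python
def grade(score):
    """Map an overall score (0-100) to a grade letter from the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score {score} is outside 0-100")
    # Assumption: lower bounds are inclusive, so 89.5 falls in B, not A.
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    return "D"


print(grade(80))    # B
print(grade(59.9))  # D
```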

**Output template**:
```yaml
score_result:
  overall_score: 80
  grade: "B"
  dimension_scores:
    outcome: 80      # ×0.40 = 32
    quality: 80      # ×0.25 = 20
    efficiency: 80   # ×0.20 = 16
    economics: 80    # ×0.15 = 12
  weighted_total: 80 # = 32+20+16+12
  strongest: "all dimensions balanced"
  weakest: "all dimensions balanced"
```

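### S3: Version Correlation Analysis (Correlate)

**Task**: Link performance data to agent versions and identify how each version affects performance.

**Version-performance correlation**: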

```yaml
version_analysis:
  current_version: "v2.3"
  version_history:
    - version: "v2.0"
      period: "2026-01~02"
      overall_score: 72
      notes: "Initial deployment; quality on the low side"
    - version: "v2.1"
      period: "2026-02~03"
      overall_score: 78
      notes: "Prompt optimized; quality improved"
      delta: "+6"
    - version: "v2.3"
      period: "2026-03~04"
      overall_score: 80
      notes: "RAG added; accuracy improved"
      delta: "+2"

  trend: "Steady improvement, gains slowing"
  next_version_focus: "The efficiency dimension still has room to improve"
```
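The `delta` and `trend` fields above can be derived from the version scores. The sketch below is one possible way to do it: `version_deltas` and `describe_trend` are hypothetical helpers, and the trend labels are an assumed classification rather than one the framework defines.

```python
def version_deltas(history):
    """Return (version, score change vs. the previous version) pairs."""
    return [
        (version, score - prev_score)
        for (_, prev_score), (version, score) in zip(history, history[1:])
    ]


def describe_trend(deltas):
    """Classify the direction of change across versions (assumed labels)."""
    gains = [delta for _, delta in deltas]
    if not gains:
        return "insufficient data"
    if all(g > 0 for g in gains):
        # Improving every release; check whether the latest gain is smaller.
        if len(gains) > 1 and gains[-1] < gains[-2]:
            return "improving, gains slowing"
        return "improving"
    if all(g < 0 for g in gains):
        return "declining"
    return "mixed"


# Scores from the version history above, oldest first.
history = [("v2.0", 72), ("v2.1", 78), ("v2.3", 80)]
deltas = version_deltas(history)
print(deltas)                  # [('v2.1', 6), ('v2.3', 2)]
print(describe_trend(deltas))  # improving, gains slowing
```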

### S4: Agent Manager KPIs and Actions (Act)

**Task**: Based on the performance results, define the Agent Manager's KPIs and next actions.

**Agent Manager KPI template**:

```yaml
agent_manager_kpi:
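  manager: "Zhang San"
  managed_agents: ["Customer Service Agent", "Analytics Agent"]

  kpis:
    - name: "Average agent performance grade"
      target: "≥B"
      current: "B"
      status: "met"
    - name: "SLO attainment rate"
      target: "≥90%"
      current: "88%"
      status: "near target"
    - name: "CPTA optimization rate"
      target: "10% quarter-over-quarter reduction"
      current: "-5%"
      status: "not met"
    - name: "Number of major incidents"
      target: "0"
      current: "1"
      status: "not met"

  action_plan:
    immediate:
      - "Analyze the root cause of the most recent major incident"
      - "Review agents with high CPTA and identify where cost is being wasted"
    short_term:
      - "Drive optimization of the efficiency dimension (focus: retry rate and token efficiency)"
      - "Set up a pre-release performance assessment for new versions"
    long_term:
      - "Include Agent Manager KPIs in the performance appraisal system"
      - "Create an agent performance leaderboard to encourage healthy competition"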
```

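**Governance nerve check**:
- **Intent management**: Do the performance metrics truly reflect the agent's business value?
- **Boundaries and escalation**: Where do the limits of the Agent Manager's decision authority lie?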

---

## Output Format

```yaml
agent_scorecard:
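  agent: "XX Customer Service Agent"
  report_period: "March 2026"
  overall_score: 80
  grade: "B"

  dimension_breakdown:
    outcome: { score: 80, weight: "40%", weighted: 32 }
    quality: { score: 80, weight: "25%", weighted: 20 }
    efficiency: { score: 80, weight: "20%", weighted: 16 }
    economics: { score: 80, weight: "15%", weighted: 12 }

  version_trend:
    v2.0: 72
    v2.1: 78
    v2.3: 80
    trend: "Steady improvement"

  key_findings:
    - "Outcome dimension is stable; first-contact resolution rate has room to improve"
    - "Cost trend is stable, but CPTA is still above target"
    - "Version iterations bring steady improvement, with gains slowing"

  action_priorities:
    - "P0: Analyze and reduce CPTA"
    - "P1: Raise first-contact resolution rate to 85%"
    - "P2: Set up a pre-release performance assessment for new versions"

  agent_manager_actions:
    - "This month's focus: CPTA optimization"
    - "Next month's focus: improving the efficiency dimension"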
```

---

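## Quality Self-Check

- [ ] Are the four dimension weights allocated as 40/25/20/15?
- [ ] Does every sub-metric in each dimension have an explicit scoring rule?
- [ ] Does each performance grade map to a recommended action?
- [ ] Is the version-performance correlation clear?
- [ ] Are the Agent Manager KPIs quantifiable and trackable?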

---

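## Common Pitfalls

1. **"Looking only at efficiency, not outcomes"**: a fast response that leaves the task unfinished is the biggest waste
2. **"Performance review = ranking"**: the goal is improvement, not anxiety
3. **"Weights never change"**: outcome can be weighted even higher in the early stage of a business, and the weights rebalanced once it matures
4. **"The Agent Manager ignores performance"**: the Manager must be accountable for the agent's performance; otherwise the role exists in name only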

---

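## Framework Links

| Direction | Framework | Relationship |
|-----------|-----------|--------------|
| ↑ Upstream | F28 SLO Operations System | SLO attainment rate is a core input to performance evaluation |
| ↑ Upstream | F30 CPTA Cost Accounting | The economics dimension depends on accurate CPTA calculation |
| ↓ Downstream | F33 ACMM Capability Maturity Assessment | Performance data supports the assessment of human capability |
| ↓ Downstream | F34 Role Transformation Map | Agent performance shapes how human roles evolve |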