CompanionGuard-RL/CLAUDE.md
zhangsiyuan 804ebd2f77 feat: add paper/ LaTeX draft, English data scripts, update progress docs
- paper/: 22-page LaTeX framework (7/10 sections complete, compiles cleanly)
  main.tex + 10 section files + refs.bib + compiled PDF (329KB)
- code/scripts/: three English dataset generation & merging scripts
  generate_english.py / generate_english_targeted.py / merge_v5.py
- CLAUDE.md: update paper writing status, add paper/ file map entry
- state.md: add section 8 paper writing progress (2026-05-15)
- .gitignore: add LaTeX build artifact exclusion rules

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 11:19:39 +08:00


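# CompanionGuard-RL: Project Constitution
> **Target journals**: SCI Q1/Q2 (Information Processing & Management / Expert Systems with Applications)
>
> This file is the primary reference for every AI assistant session and takes precedence over any ad hoc instruction given in conversation.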
---
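## Project Goal
Build an integrated **detection + intervention** safety pipeline for AI emotional-companionship scenarios, closing two core gaps:
1. Existing guard models (Llama Guard, WildGuard) only detect and never intervene: they do not decide what to do with a high-risk output
2. General-purpose safety models systematically miss companion-specific risks (dependency reinforcement, isolation reinforcement, romanticization, failure to respond to a crisis)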
---
## Architecture
```
Input X = (Persona P, History H, User u_t, AI Response r_t)
[Module B: Context-aware Risk Detector]
backbone: hfl/chinese-macbert-large + CrossAttention
D = (y_risk, l_risk 0-4, c_primary R1-R10, c_fine 14 labels)
s_t = StateEncoder(D, e_H_pool, e_P_pool, t_norm) ← obs_dim = 2065
[Module C: RL Intervention Policy π (BC + PPO)]
a_t ∈ {PASS, WARN, REWRITE, REJECT, CRISIS}
```
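The state vector `s_t` above has `obs_dim = 2065`, which this file's Red Lines break down into six concatenated segments. The sketch below shows one way that layout could be assembled and checked using plain Python lists. The names `OBS_LAYOUT`, `one_hot_level`, and `assemble_obs` are hypothetical and are not taken from `preprocessing.py`; the real `build_obs_vector` may differ. Per rule 4, the risk level passed to `one_hot_level` is assumed to be the detector prediction `det_l_risk`, not the ground-truth label.

```python
from typing import Dict, List, Sequence

# Segment order and sizes as listed for obs_dim in this file.
OBS_LAYOUT = [
    ("d_score", 1),
    ("l_risk_onehot", 5),
    ("c_primary_probs", 10),
    ("e_H_pool", 1024),
    ("e_P_pool", 1024),
    ("t_norm", 1),
]
OBS_DIM = sum(size for _, size in OBS_LAYOUT)  # 1 + 5 + 10 + 1024 + 1024 + 1


def one_hot_level(det_l_risk: int, num_levels: int = 5) -> List[float]:
    """One-hot encode the detector-predicted risk level (0-4)."""
    if not 0 <= det_l_risk < num_levels:
        raise ValueError(f"risk level {det_l_risk} outside 0..{num_levels - 1}")
    return [1.0 if i == det_l_risk else 0.0 for i in range(num_levels)]


def assemble_obs(parts: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the segments in layout order, checking each segment's size."""
    obs: List[float] = []
    for name, size in OBS_LAYOUT:
        segment = parts[name]
        if len(segment) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(segment)}")
        obs.extend(float(v) for v in segment)
    return obs
```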
---
## Module Status
| Module | Status | Key metrics |
|------|------|---------|
| Dataset CompanionRisk-Bench v4 | ✅ | 9,896 samples, all 14 labels covered; train 6,926 / dev 1,484 / test 1,486 |
| Module B detector v4 | ✅ | binary_f1=**0.9995**, FNR=0.00%, level_weighted_f1=0.559 |
| Module B generalization check | ✅ | human subset binary_f1=0.9848; no same-source overfitting |
| Module C v3 (current) | ⚠️ | safety_recall=1.0 ✅, over_refusal=0.004 ✅, action_accuracy=**0.575** ❌, crisis_precision=**0.421** ❌ |
| Module C v5 (next step) | 🔄 | reward rewrite + environment fixes; **see `change.md` for the full roadmap** |
| Paper writing | 🔄 | LaTeX framework in place (`paper/`); Methods section complete; Results section waiting on v5 + SOTA baselines |
> **Module C is not finished.** Both action_accuracy and crisis_precision of v3 are below target; v5 must be carried out as described in `change.md`.
>
> **Experiments required before submission**: ① Llama Guard v2 / WildGuard evaluation (SOTA comparison for Module B); ② LLM-as-judge baseline (Module C); ③ ablations (BC-only / without CrossAttention)
---
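## Red Lines (critical rules; breaking any of them is guaranteed to cause bugs)
| # | Rule | Consequence of violation |
|---|------|---------|
| 1 | **PyYAML trap**: lr in config files must be written as `0.001`, never `1e-3` | PyYAML 6.x parses `1e-3` as a string; training fails silently |
| 2 | **NCCL environment variables**: training on RTX 5090 must set `NCCL_SHM_DISABLE=1 NCCL_P2P_DISABLE=1` | NCCL communication errors, then a crash |
| 3 | **Module C is single-GPU only**: multi-GPU is forbidden in the PPO stage | `torch.distributed.barrier()` triggers CUDA illegal memory access on RTX 5090 |
| 4 | **State vector uses `det_l_risk`**: preprocessing.py and evaluate.py must use the risk level predicted by the detector, not the ground-truth `l_risk` | train/eval mismatch, inflated metrics |
| 5 | **obs_dim = 2065 is fixed**: `[d_score(1) + l_risk_onehot(5) + c_primary_probs(10) + e_H_pool(1024) + e_P_pool(1024) + t_norm(1)]` | crash from dimension mismatch |
| 6 | **In the BC stage, build the DataLoader from CPU tensors**: `pin_memory=True` requires CPU tensors | RuntimeError: cannot pin cuda tensor |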
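
> Rules 1 and 2 can be guarded with a small pre-flight check. The sketch below is one possible approach and is not code from this repository: `require_float` and `nccl_env` are hypothetical helpers, and the config is assumed to be already loaded into a dict. `require_float` makes the silent string-typed `lr` fail loudly; `nccl_env` returns a copy of an environment mapping with the two NCCL flags set, which could be passed as `env=` to `subprocess.run` when launching training.

```python
import os
from typing import Any, Dict, Mapping


def require_float(config: Mapping[str, Any], key: str) -> float:
    """Return config[key] as a float, rejecting strings such as '1e-3'."""
    value = config[key]
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(
            f"{key}={value!r} has type {type(value).__name__}; "
            "write it as a plain decimal such as 0.001"
        )
    return float(value)


def nccl_env(base: Mapping[str, str]) -> Dict[str, str]:
    """Copy an environment mapping and add the two NCCL flags from rule 2."""
    env = dict(base)
    env["NCCL_SHM_DISABLE"] = "1"
    env["NCCL_P2P_DISABLE"] = "1"
    return env


if __name__ == "__main__":
    # A correctly written learning rate passes the check.
    print(require_float({"lr": 0.001}, "lr"))
    # The current process environment plus the NCCL flags.
    print(nccl_env(os.environ)["NCCL_P2P_DISABLE"])
```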
---
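## File Map
### Project level (repository root)
| File | Purpose |
|------|------|
| `state.md` | Snapshot of current progress (latest) |
| `change.md` | **Full technical roadmap for Module C v5** (pending; 13 tasks) |
| `exp.md` | Pitfall log (12 categories; check here first when troubleshooting) |
| `experiments/eval_intervention_v3.json` | Current best Module C results (reference baseline for the paper) |
| `experiments/eval_intervention_v4.json` | Confirmation rerun of v3 (identical numbers; verifies reproducibility) |
| `docs/` | Research documents (research framework, dataset design, preliminary reports) |
| `paper/` | **Paper LaTeX source** (main framework ready; see state.md §8) |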
### Code level (code/)
| Path | Purpose |
|------|------|
| `code/src/models/detector.py` | Module B main model |
| `code/src/models/intervention_agent.py` | Module C Actor-Critic (obs_dim=2065→256→5) |
| `code/src/rl/reward.py` | Multi-objective reward (**to be rewritten in v5**) |
| `code/src/rl/companion_env.py` | Offline RL environment (**category signal to be fixed in v5**) |
| `code/src/utils/preprocessing.py` | build_obs_vector (**must use det_l_risk**) |
| `code/configs/intervention_config.yaml` | Module C training config |
| `code/checkpoints/detector/best.pt` | Best Module B weights (1.35GB, **frozen**) |
| `code/checkpoints/intervention/final_v2.pt` | Module C v3 weights (5MB, current best) |
---
## Server Quick Reference
| | Server 1 (main training) | Server 2 (currently in use) |
|--|--|--|
| SSH | `ssh -p 20083 root@10.82.3.180` | `ssh -p 20060 root@10.82.3.180` |
| Password | `m2dGcwyrhI` | `zwfn65xjTY` |
| Python environment | `/opt/conda/envs/dlapo-py310-cu128/bin` | `$PROJ/../env/dlapo-py310-cu128/bin` |
| GPU | 4 × RTX 5090 32GB | 2 × RTX 5090 32GB |

**Server 1 $PROJ**: `/root/siton-data-2849d4ce327c4ccfb233ce33868fe7fe/zsy/CompanionGuard-RL`

**Server 2 $PROJ**: `/root/siton-data-740d234e02d749f08fe5347b0c74c49f/zsy/my-reasearch/companionguard-rl`

**MacBERT (both servers)**: `$PROJ/../macbert-large` (on server 2 it is at `../zsy/macbert-large`)
### Upload code (local → server)
```powershell
scp -P 20083 -r `
D:\Myresearch\CompanionGuard-RL\code\src `
D:\Myresearch\CompanionGuard-RL\code\scripts `
D:\Myresearch\CompanionGuard-RL\code\configs `
root@10.82.3.180:/root/siton-data-2849d4ce327c4ccfb233ce33868fe7fe/zsy/CompanionGuard-RL/
```
### Fetch results (server → local)
```powershell
scp -P 20083 -r `
root@10.82.3.180:/root/siton-data-2849d4ce327c4ccfb233ce33868fe7fe/zsy/CompanionGuard-RL/experiments `
D:\Myresearch\CompanionGuard-RL\
scp -P 20083 -r `
root@10.82.3.180:/root/siton-data-2849d4ce327c4ccfb233ce33868fe7fe/zsy/CompanionGuard-RL/checkpoints `
D:\Myresearch\CompanionGuard-RL\code\
```