CompanionGuard-RL/code/CLAUDE.md
zhangsiyuan d557c6b0c6 refactor: slim code/ to pure code; consolidate experiments/ and docs
- Remove code/experiments/ → merge all eval JSONs into root experiments/
- Move code/exp.md, code/change.md → project root
- Delete code/2026-05-09-研究框架.md (duplicate of docs/)
- Update .gitignore: experiments/*.log (was code/experiments/*.log)
- Update code/CLAUDE.md: fix all affected paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 08:31:17 +08:00


# CompanionGuard-RL: Project Reference

> This file is read automatically by Claude Code. All training is complete; current phase: **paper writing**.
---
## Project status (2026-05-12)

| Module | Status | Key metrics |
|------|------|---------|
| Dataset CompanionRisk-Bench v4 | ✅ Done | 9,896 samples, all 14 labels covered |
| Module B: detector (MacBERT-large) | ✅ Done | binary_f1=0.9995, level_weighted_f1=0.559 |
| Module C: RL intervention policy (PPO) | ✅ Done | safety_recall=1.0, over_refusal=0.004 |
| Paper writing | 🔄 In progress | n/a |

Detailed results are in `../state.md` at the project root, lessons learned in `../exp.md`, and the change log in `../change.md`.

---
## Local directory structure

```
D:\Myresearch\CompanionGuard-RL\
├── code/ ← this directory (source code)
│   ├── src/ ← 18 core .py files (models/ rl/ utils/)
│   ├── scripts/ ← training / evaluation / data-generation scripts
│   ├── configs/ ← 4 YAML configs
│   ├── checkpoints/ ← model weights (gitignored)
│   │   ├── detector/best.pt ← Module B paper weights (1.35 GB)
│   │   └── intervention/final_v2.pt ← Module C paper weights
│   └── data/ ← processed data (gitignored)
├── data/ ← raw datasets (gitignored)
├── docs/ ← research documents
├── experiments/ ← all evaluation result JSONs + training logs
│   ├── eval_intervention_v3.json ← used for Module C in the paper
│   └── eval_intervention_v4.json ← rerun of v3 for confirmation (identical numbers)
├── exp.md ← lessons-learned log
├── change.md ← change log
└── state.md ← project progress snapshot (latest)
```
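
The tree notes that `eval_intervention_v4.json` is a rerun of v3 with identical numbers. A minimal sketch of how that claim could be re-checked, assuming only that both files are valid JSON (their schema is not documented here, so the comparison is a plain structural equality test; `same_json` is a hypothetical helper, not part of the repository):

```python
import json
from pathlib import Path


def same_json(path_a, path_b):
    """Return True if both files parse to structurally equal JSON values."""
    a = json.loads(Path(path_a).read_text(encoding="utf-8"))
    b = json.loads(Path(path_b).read_text(encoding="utf-8"))
    return a == b


if __name__ == "__main__":
    # Paths are relative to code/, matching the layout shown above.
    v3 = Path("../experiments/eval_intervention_v3.json")
    v4 = Path("../experiments/eval_intervention_v4.json")
    if v3.exists() and v4.exists():
        print("identical:", same_json(v3, v4))
```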
---
## Server information

### Server 1 (main training machine)

| Item | Value |
|------|----|
| SSH | `ssh -p 20083 root@10.82.3.180` |
| Password | `m2dGcwyrhI` |
| Project directory | `/root/siton-data-2849d4ce327c4ccfb233ce33868fe7fe/zsy/CompanionGuard-RL` |
| MacBERT | `/root/siton-data-2849d4ce327c4ccfb233ce33868fe7fe/zsy/macbert-large` |
| Environment | `/opt/conda/envs/dlapo-py310-cu128` (torch 2.7.1+cu128) |
| GPU | 4 × RTX 5090 32GB |

### Server 2 (currently in use)

| Item | Value |
|------|----|
| SSH | `ssh -p 20060 root@10.82.3.180` |
| Password | `zwfn65xjTY` |
| Project directory | `/root/siton-data-740d234e02d749f08fe5347b0c74c49f/zsy/my-reasearch/companionguard-rl` |
| MacBERT | `/root/siton-data-740d234e02d749f08fe5347b0c74c49f/zsy/macbert-large` |
| Environment | `/root/siton-data-740d234e02d749f08fe5347b0c74c49f/zsy/env/dlapo-py310-cu128` |
| GPU | 2 × RTX 5090 32GB |

> Both servers run on the same host, `10.82.3.180`, in different Docker containers.
---
## SCP sync commands (local ↔ server)

```powershell
# ===== Local → Server 1 (upload code) =====
$S1="root@10.82.3.180"
$PROJ1="/root/siton-data-2849d4ce327c4ccfb233ce33868fe7fe/zsy/CompanionGuard-RL"
scp -P 20083 -r `
  D:\Myresearch\CompanionGuard-RL\code\src `
  D:\Myresearch\CompanionGuard-RL\code\scripts `
  D:\Myresearch\CompanionGuard-RL\code\configs `
  D:\Myresearch\CompanionGuard-RL\code\requirements.txt `
  ${S1}:${PROJ1}/
# Upload the processed data
scp -P 20083 -r `
  D:\Myresearch\CompanionGuard-RL\code\data `
  ${S1}:${PROJ1}/
# ===== Server 1 → local (fetch results) =====
scp -P 20083 -r `
  ${S1}:${PROJ1}/checkpoints `
  D:\Myresearch\CompanionGuard-RL\code\
# experiments/ lives at the project root in the local layout, not under code/
scp -P 20083 -r `
  ${S1}:${PROJ1}/experiments `
  D:\Myresearch\CompanionGuard-RL\
```
---
## Core script usage

```bash
# Re-evaluate the detector (Module B)
python scripts/evaluate.py \
  --detector-ckpt checkpoints/detector/best.pt \
  --config configs/detector_config_server.yaml \
  --test-data data/processed/CompanionRisk-Bench/test.jsonl \
  --source-filter all \
  --output ../experiments/eval_all.json

# Re-evaluate the intervention policy (Module C)
python scripts/evaluate.py \
  --detector-ckpt checkpoints/detector/best.pt \
  --agent-ckpt checkpoints/intervention/final_v2.pt \
  --test-data data/processed/CompanionRisk-Bench/test.jsonl \
  --config configs/detector_config_server.yaml \
  --intervention-config configs/intervention_config.yaml \
  --output ../experiments/eval_intervention_v3.json
```
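
Both commands read YAML configs, and the PyYAML pitfall listed in the notes at the end of this file means a value written as `1e-3` arrives as a string. A minimal defensive sketch, assuming the config has already been loaded into a flat dict; `coerce_float_fields` and the key names other than `lr` are hypothetical and not taken from the project's configs:

```python
def coerce_float_fields(cfg, keys=("lr",)):
    """Return a copy of cfg with the named fields converted to float.

    float() accepts both real numbers and strings such as "1e-3", so the
    result is the same whichever way the YAML loader parsed the value.
    """
    fixed = dict(cfg)
    for key in keys:
        if key in fixed:
            fixed[key] = float(fixed[key])
    return fixed


# Simulates what a loader returns when the file says `lr: 1e-3`.
loaded = {"lr": "1e-3", "batch_size": 32}
cfg = coerce_float_fields(loaded)
print(type(cfg["lr"]).__name__, cfg["lr"])
```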
---
## Key results (for the paper)

### Module B: detector v4

| Metric | Value |
|------|----|
| binary_f1 | **0.9995** |
| high_risk_recall | **1.0000** |
| FNR | **0.00%** |
| level_weighted_f1 | **0.559** |
| fine_macro_f1 (public, 10 classes) | **0.484** |

### Module C: RL intervention policy v3 (used in the paper; `eval_intervention_v3.json`)

| Method | safety_recall | over_refusal | action_accuracy | safety_ux_fscore |
|------|--------------|--------------|-----------------|-----------------|
| Rule-based | 0.908 | 0.000 | n/a | 0.952 |
| Threshold | 0.908 | 0.000 | n/a | 0.952 |
| **Ours (RL)** | **1.000** | **0.004** | **0.575** | **0.998** |
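
This file does not define `safety_ux_fscore`. The reported values are, however, consistent with a harmonic mean of `safety_recall` and `1 - over_refusal`. The sketch below reproduces the table under that assumption; it is an inference from the numbers, not the project's confirmed formula:

```python
def safety_ux_fscore(safety_recall, over_refusal):
    """Harmonic mean of safety recall and (1 - over-refusal rate).

    Assumed definition, inferred from the table above.
    """
    ux = 1.0 - over_refusal
    total = safety_recall + ux
    if total == 0:
        return 0.0
    return 2.0 * safety_recall * ux / total


# (safety_recall, over_refusal) pairs copied from the table.
rows = {
    "Rule-based": (0.908, 0.000),
    "Threshold": (0.908, 0.000),
    "Ours (RL)": (1.000, 0.004),
}
for name, (recall, refusal) in rows.items():
    print(f"{name}: {safety_ux_fscore(recall, refusal):.3f}")
```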
**Weights used:** `checkpoints/intervention/final_v2.pt` (retrained with `det_l_risk`)

---
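## Important notes

- **PyYAML 6.x pitfall**: write lr values as `0.001`, not `1e-3` (PyYAML parses the latter as a string because it has no decimal point).
- **RTX 5090 NCCL**: multi-GPU training requires `NCCL_SHM_DISABLE=1 NCCL_P2P_DISABLE=1`; the PPO stage runs on a single GPU to sidestep the barrier issue.
- **det_l_risk vs l_risk**: both training and evaluation must use the detector-predicted `det_l_risk`, never the ground-truth `l_risk`.
- **obs_dim = 2065**: the state vector layout is `[d_score(1)|l_risk_onehot(5)|c_primary_probs(10)|e_H_pool(1024)|e_P_pool(1024)|t_norm(1)]`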
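
The segment widths in the last note add up to 1 + 5 + 10 + 1024 + 1024 + 1 = 2065. A minimal pure-Python sketch of that layout follows. The segment names and widths come from the note; `segment_slices` and `build_state` are hypothetical helpers, and treating `det_l_risk` as a 0-based level index is an assumption:

```python
from itertools import accumulate

# Segment names and widths copied from the note above, in order.
SEGMENTS = [
    ("d_score", 1),
    ("l_risk_onehot", 5),
    ("c_primary_probs", 10),
    ("e_H_pool", 1024),
    ("e_P_pool", 1024),
    ("t_norm", 1),
]
OBS_DIM = sum(width for _, width in SEGMENTS)


def segment_slices(segments=SEGMENTS):
    """Map each segment name to its slice inside the flat state vector."""
    ends = list(accumulate(width for _, width in segments))
    starts = [0] + ends[:-1]
    return {name: slice(s, e) for (name, _), s, e in zip(segments, starts, ends)}


def build_state(d_score, det_l_risk, c_primary_probs, e_h_pool, e_p_pool, t_norm):
    """Concatenate the detector outputs into one flat observation.

    Assumption: det_l_risk is the detector-predicted level as a 0-based
    index in [0, 5), never the ground-truth l_risk.
    """
    if not 0 <= det_l_risk < 5:
        raise ValueError("det_l_risk must be in [0, 5)")
    onehot = [0.0] * 5
    onehot[det_l_risk] = 1.0
    state = [float(d_score), *onehot, *c_primary_probs,
             *e_h_pool, *e_p_pool, float(t_norm)]
    if len(state) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} values, got {len(state)}")
    return state


slices = segment_slices()
state = build_state(0.9, 3, [0.1] * 10, [0.0] * 1024, [0.0] * 1024, 0.5)
print(OBS_DIM, slices["e_P_pool"], state[slices["l_risk_onehot"]])
```

Checking the length inside `build_state` catches a wrong embedding size at construction time rather than later, as a shape mismatch in the policy network.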