CompanionGuard-RL/code/configs/detector_config_abl_response_only.yaml
zhangsiyuan 52ba43f08d feat: Module C v5/v6 training complete, ablations, SOTA baselines, paper updates
- Module C: BC+PPO training v5/v6 done; eval results in experiments/eval_intervention_v{5,6}.json
- Reward: v5 label-aligned constrained reward (code/src/rl/reward.py)
- Ablations: Module B (history_r, response_only, full) + Module C (wo_category_reward)
- SOTA baselines: WildGuard and ShieldGemma2b eval scripts and results
- Paper: update sections 05–08 (Module B/C description, experiments table, discussion)
- Docs: add record.md (change log), update state.md and exp.md; retire change.md
- Tools: add html-to-ppt utilities and run_shieldgemma2b.sh
- Configs: add ablation YAML configs for Module B and C
- Cleanup: remove stale reference/ PNG screenshots

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 14:24:09 +08:00

model:
  name: "/root/siton-data-2849d4ce327c4ccfb233ce33868fe7fe/zsy/macbert-large"
  hidden_size: 1024
  num_heads: 8
  dropout: 0.1
  use_lora: false
data:
  train_path: "data/processed/CompanionRisk-Bench/train.jsonl"
  val_path: "data/processed/CompanionRisk-Bench/dev.jsonl"
  test_path: "data/processed/CompanionRisk-Bench/test.jsonl"
  max_persona_len: 128
  max_context_len: 512
  max_response_len: 256
  max_history_turns: 5
  num_workers: 4
  ablation_mode: "response_only"  # Ablation: Response stream only; persona and context are both emptied
training:
  epochs: 10
  per_gpu_batch_size: 16
  gradient_accumulation_steps: 2
  lr: 2.0e-5  # decimal point needed: PyYAML (YAML 1.1) loads a bare 2e-5 as a string
  warmup_steps: 100
  weight_decay: 0.01
  gradient_clip: 1.0
  eval_steps: 100
  mixed_precision: "bf16"
  seed: 42
  loss_weights:
    binary: 1.0
    level: 1.0
    primary: 1.0
    fine: 2.0
fine_training:
  use_pos_weight: true
  risky_only: true
evaluation:
  binary_threshold: 0.5
  fine_threshold: 0.4
logging:
  project: "CompanionGuard-RL"
  run_name: "detector-abl-response-only"
  use_wandb: false
output:
  checkpoint_dir: "checkpoints/detector_abl_response_only"
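Per the comment on `ablation_mode`, this run blanks the persona and context streams so the detector scores the response alone. The function below is a minimal sketch of that masking for one JSONL record, assuming hypothetical field names `persona`, `context` (a list of turns), and `response`; `apply_ablation` and `BLANKED_FIELDS` are illustrative names, not the repository's data loader. Only `response_only` and a pass-through `full` mode (one of the Module B ablations listed in the commit message) are modeled.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping from ablation_mode to the input fields it empties.
# Only "response_only" comes from this config; the field names are assumed.
BLANKED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "response_only": ("persona", "context"),
}


def apply_ablation(record: Dict[str, Any], ablation_mode: str) -> Dict[str, Any]:
    """Return a shallow copy of `record` with the ablated input streams emptied."""
    if ablation_mode == "full":
        return dict(record)  # full model: every stream is kept
    if ablation_mode not in BLANKED_FIELDS:
        raise ValueError(f"ablation_mode not modeled in this sketch: {ablation_mode!r}")
    out = dict(record)
    for field in BLANKED_FIELDS[ablation_mode]:
        # Preserve the container type: a list of turns becomes [], text becomes "".
        out[field] = [] if isinstance(record.get(field), list) else ""
    return out


example = {"persona": "a caring friend", "context": ["hi", "hello"], "response": "I am here."}
print(apply_ablation(example, "response_only"))
# {'persona': '', 'context': [], 'response': 'I am here.'}
```

Keeping the empty value's type (`""` vs `[]`) lets a downstream tokenizer run unchanged on ablated records.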
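`per_gpu_batch_size: 16` with `gradient_accumulation_steps: 2` means each optimizer step accumulates 16 × 2 = 32 examples per GPU; the GPU count is not set in this file. How `warmup_steps: 100` is consumed is also not shown here, so the sketch assumes linear warmup to `lr` followed by linear decay to zero, the shape produced by Hugging Face's `get_linear_schedule_with_warmup`. The helper names, `num_gpus`, and the `total_steps` value in the usage lines are hypothetical.

```python
def effective_batch_size(per_gpu_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int = 1) -> int:
    """Examples contributing to one optimizer step, summed over all GPUs."""
    return per_gpu_batch_size * gradient_accumulation_steps * num_gpus


def lr_at_step(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Assumed schedule: linear warmup to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


# Batch and lr values from this config; num_gpus and total_steps are illustrative.
print(effective_batch_size(16, 2))              # 32
print(effective_batch_size(16, 2, num_gpus=4))  # 128
print(lr_at_step(100, 2.0e-5, 100, 1000))       # peak lr, reached at the end of warmup
```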
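The `loss_weights` block gives the fine-grained head weight 2.0 and the binary, level, and primary heads weight 1.0 each. Assuming the training objective is the weighted sum of per-head losses, `L = sum_k w_k * L_k`, the fine-grained loss counts double relative to each other head. `total_loss` is a hypothetical helper that operates on already-computed scalar losses.

```python
from typing import Dict

# loss_weights copied from this config.
LOSS_WEIGHTS: Dict[str, float] = {"binary": 1.0, "level": 1.0, "primary": 1.0, "fine": 2.0}


def total_loss(head_losses: Dict[str, float],
               weights: Dict[str, float] = LOSS_WEIGHTS) -> float:
    """Assumed objective: weighted sum of per-head losses."""
    missing = sorted(set(weights) - set(head_losses))
    if missing:
        raise KeyError(f"missing head losses: {missing}")
    return sum(weights[name] * head_losses[name] for name in weights)


print(total_loss({"binary": 0.5, "level": 0.5, "primary": 0.5, "fine": 0.5}))  # 2.5
```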
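The `fine_training` flags are not documented in this file. A plausible reading, assumed here, is that `use_pos_weight: true` scales positive fine-grained labels by their negative/positive ratio (the role of `pos_weight` in PyTorch's `BCEWithLogitsLoss`) and `risky_only: true` computes the fine-grained loss on risky examples only. The sketch reproduces that loss in plain Python; `pos_weights`, `fine_loss`, and the `is_risky` input are hypothetical names, and whether the repository computes the ratios over all rows or only risky ones is not stated.

```python
import math
from typing import List, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pos_weights(fine_labels: Sequence[Sequence[int]]) -> List[float]:
    """Per-label (#negatives / #positives); 1.0 for labels with no positives."""
    n = len(fine_labels)
    weights = []
    for j in range(len(fine_labels[0])):
        pos = sum(row[j] for row in fine_labels)
        weights.append((n - pos) / pos if pos else 1.0)
    return weights


def fine_loss(logits: Sequence[Sequence[float]], fine_labels: Sequence[Sequence[int]],
              is_risky: Sequence[int], pw: Sequence[float], risky_only: bool = True) -> float:
    """Mean BCE-with-logits over fine labels, optionally restricted to risky rows.

    Per element: -[pw * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))],
    computed as pw * y * softplus(-x) + (1 - y) * softplus(x).
    """
    terms = []
    for row_logits, row_labels, risky in zip(logits, fine_labels, is_risky):
        if risky_only and not risky:
            continue  # assumed: benign rows carry no fine-grained supervision
        for x, y, w in zip(row_logits, row_labels, pw):
            terms.append(w * y * softplus(-x) + (1 - y) * softplus(x))
    return sum(terms) / len(terms) if terms else 0.0


print(pos_weights([[1, 0], [0, 0], [1, 1]]))  # [0.5, 2.0]
```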
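At evaluation time, `binary_threshold: 0.5` and `fine_threshold: 0.4` turn sigmoid probabilities into decisions; the lower fine threshold admits more fine-grained labels, presumably trading some precision for recall on that head. The sketch below assumes fine-grained labels are only reported when the binary head flags the response as risky (mirroring `risky_only`), and that a probability equal to the threshold counts as positive; `decode` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode(binary_logit: float, fine_logits: Sequence[float],
           binary_threshold: float = 0.5,
           fine_threshold: float = 0.4) -> Tuple[bool, List[int]]:
    """Map raw logits to (is_risky, indices of active fine-grained labels)."""
    is_risky = sigmoid(binary_logit) >= binary_threshold
    if not is_risky:
        return False, []  # assumed gating: fine labels only for risky responses
    fine = [i for i, z in enumerate(fine_logits) if sigmoid(z) >= fine_threshold]
    return True, fine


print(decode(0.0, [-1.0, 0.0, 2.0]))  # (True, [1, 2])
print(decode(-2.0, [3.0, 3.0]))       # (False, [])
```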