feat: Module C v5/v6 training complete, ablations, SOTA baselines, paper updates

- Module C: BC+PPO training v5/v6 done; eval results in experiments/eval_intervention_v{5,6}.json
- Reward: v5 label-aligned constrained reward (code/src/rl/reward.py)
- Ablations: Module B (history_r, response_only, full) + Module C (wo_category_reward)
- SOTA baselines: WildGuard and ShieldGemma2b eval scripts and results
- Paper: update sections 05–08 (Module B/C description, experiments table, discussion)
- Docs: add record.md (change log), update state.md and exp.md; retire change.md
- Tools: add html-to-ppt utilities and run_shieldgemma2b.sh
- Configs: add ablation YAML configs for Module B and C
- Cleanup: remove stale reference/ PNG screenshots

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -295,8 +295,10 @@ def main():
     torch.distributed.broadcast(obs_tensor, src=0)
     torch.distributed.broadcast(action_tensor, src=0)
 
-    obs_tensor = obs_tensor.to(accelerator.device)
-    action_tensor = action_tensor.to(accelerator.device)
+    # Keep tensors on CPU: DataLoader(pin_memory=True) requires CPU tensors.
+    # accelerator.prepare() moves batches to the correct device during training.
+    obs_tensor = obs_tensor.cpu()
+    action_tensor = action_tensor.cpu()
 
     agent = InterventionAgent(
         detector_hidden=detector_hidden,
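Note on the hunk above: the change keeps the broadcast tensors on CPU so that a pinned-memory DataLoader can consume them, and leaves device placement to the training loop. A minimal sketch of that pattern, assuming plain PyTorch without Accelerate; the tensor shapes, batch size, and the manual `.to(device)` call are illustrative stand-ins and are not taken from the repository:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the broadcast observation/action tensors.
obs_tensor = torch.randn(8, 4)
action_tensor = torch.randint(0, 3, (8,))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Keep the dataset on CPU: only dense CPU tensors can be pinned.
dataset = TensorDataset(obs_tensor.cpu(), action_tensor.cpu())
loader = DataLoader(
    dataset,
    batch_size=4,
    shuffle=False,
    # Request pinning only when there is a CUDA device to copy to.
    pin_memory=torch.cuda.is_available(),
)

batch_shapes = []
for obs, act in loader:
    # Move each batch at use time; in the diff this step is delegated
    # to accelerator.prepare() rather than done by hand.
    obs = obs.to(device, non_blocking=True)
    act = act.to(device, non_blocking=True)
    batch_shapes.append((tuple(obs.shape), tuple(act.shape)))
```

Storing the dataset on CPU and transferring per batch also avoids holding the full dataset in GPU memory, at the cost of a host-to-device copy each step.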
@@ -355,6 +357,7 @@ def main():
         detector_hidden=detector_hidden,
         reward_weights=cfg.get("reward"),
         max_turns=env_cfg.get("max_turns", 20),
+        enable_category_reward=cfg.get("reward", {}).get("enable_category_reward", True),
     )
 
     output_cfg = cfg["output"]
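Note on the `enable_category_reward` lookup: `cfg.get("reward", {})` falls back to `{}` only when the key is absent. If a YAML config contains `reward:` with no value, the loader yields `None` and the chained `.get` would raise `AttributeError`. A minimal sketch of a more tolerant lookup follows; the helper name `category_reward_enabled` and the sample config dicts are hypothetical and are not part of the repository:

```python
from typing import Any, Mapping, Optional


def category_reward_enabled(cfg: Mapping[str, Any], default: bool = True) -> bool:
    """Read reward.enable_category_reward, tolerating a missing or null section."""
    reward_cfg: Optional[Mapping[str, Any]] = cfg.get("reward")
    # A present-but-null `reward` key returns None even with a default,
    # so normalise it to an empty mapping before the second lookup.
    reward_cfg = reward_cfg or {}
    return bool(reward_cfg.get("enable_category_reward", default))


# Hypothetical configs: a full run and a wo_category_reward-style ablation.
full_cfg = {"reward": {"enable_category_reward": True}}
ablation_cfg = {"reward": {"enable_category_reward": False}}
null_section_cfg = {"reward": None}

flags = [
    category_reward_enabled(full_cfg),
    category_reward_enabled(ablation_cfg),
    category_reward_enabled(null_section_cfg),
    category_reward_enabled({}),
]
```

With these inputs only the ablation config turns the flag off; the missing and null cases fall back to the default of `True`, matching the default used in the diff.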