CompanionGuard-RL

wangyu/CompanionGuard-RL

Commit Graph

Author	SHA1	Message	Date

zhangsiyuan

52ba43f08d

feat: Module C v5/v6 training complete, ablations, SOTA baselines, paper updates

- Module C: BC+PPO training v5/v6 done; eval results in experiments/eval_intervention_v{5,6}.json
- Reward: v5 label-aligned constrained reward (code/src/rl/reward.py)
- Ablations: Module B (history_r, response_only, full) + Module C (wo_category_reward)
- SOTA baselines: WildGuard and ShieldGemma2b eval scripts and results
- Paper: update sections 05–08 (Module B/C description, experiments table, discussion)
- Docs: add record.md (change log), update state.md and exp.md; retire change.md
- Tools: add html-to-ppt utilities and run_shieldgemma2b.sh
- Configs: add ablation YAML configs for Module B and C
- Cleanup: remove stale reference/ PNG screenshots

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-20 14:24:09 +08:00

zhangsiyuan

766b4811be

feat: port wangyu data pipeline and scripts into code/ structure

- code/src/data/: data_generator, dataset, llm_judge, __init__
  (multi-turn LLM dialogue generator, JSONL loader, LLM auto-annotator)
- code/scripts/: generate_siliconflow.py (SiliconFlow async generator, 701 lines)
  run_detector.sh / run_intervention.sh / run_full_pipeline.sh (launch scripts)
- code/configs/intervention_config.yaml: add reward.w1-w5 reference block
  (NOTE: v5 reward.py uses hardcoded constants; these fields are reference-only)
- .gitignore: fix data/ pattern to /data/ to avoid matching code/src/data/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-18 14:59:48 +08:00

zhangsiyuan

bd1f51c496

chore: initial commit — unified project repo

Merged code repo (CompanionGuard-RL) into single project-level git.
Reorganized root: docs/, reference/, experiments/, tmp/active|archives/.
Gitignored: data/, checkpoints/, .venv, experiment logs, tmp/archives.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-14 11:28:42 +08:00

3 Commits