CompanionGuard-RL

Author	SHA1	Message	Date
wangyu	ae1b85ca39	feat: SOTA baseline v2 with zh→en translation + companion-adapted prompts	2026-05-20 15:20:54 +08:00
    - eval_sota_baselines_v2.py: optimized eval for WildGuard & ShieldGemma-2B
      * ChineseTranslator: Helsinki-NLP/opus-mt-zh-en (local, no API)
      * ShieldGemma: +4 companion-specific safety policies (crisis non-response, dependency reinforcement, isolation reinforcement, minor intimacy)
      * WildGuard: companion context injected into prompt + extended keyword parsing
      * Default threshold lowered 0.5 → 0.3 for better recall
      * Translation cache saved to experiments/translation_cache.json (reusable)
    - tools/run_sota_v2.sh: one-command runner for both models on server
    - paper/05_moduleB.tex: add †-adapted rows to SOTA table + updated discussion explaining root causes (language barrier + taxonomy gap) and adaptation results
    - paper/07_experiments.tex: update baseline description to include v2 adapted variants

    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
wangyu	de3272b222	paper: fill RQ3 ablation summary and IRB ethics statement	2026-05-20 15:07:09 +08:00
    - 07_experiments.tex: replace \todo placeholder in RQ3 with actual ablation analysis referencing tab:moduleB_ablation (§5) and tab:moduleC_ablation (§6); summarize key takeaways for both modules
    - 08_discussion.tex: replace \todo IRB placeholder with full ethics declaration: synthetic data origin, public dataset attribution, DUA policy, no human-subjects experiment needed

    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
zhangsiyuan	52ba43f08d	feat: Module C v5/v6 training complete, ablations, SOTA baselines, paper updates	2026-05-20 14:24:09 +08:00
    - Module C: BC+PPO training v5/v6 done; eval results in experiments/eval_intervention_v{5,6}.json
    - Reward: v5 label-aligned constrained reward (code/src/rl/reward.py)
    - Ablations: Module B (history_r, response_only, full) + Module C (wo_category_reward)
    - SOTA baselines: WildGuard and ShieldGemma2b eval scripts and results
    - Paper: update sections 05–08 (Module B/C description, experiments table, discussion)
    - Docs: add record.md (change log), update state.md and exp.md; retire change.md
    - Tools: add html-to-ppt utilities and run_shieldgemma2b.sh
    - Configs: add ablation YAML configs for Module B and C
    - Cleanup: remove stale reference/ PNG screenshots

    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
zhangsiyuan	804ebd2f77	feat: add paper/ LaTeX draft, English data scripts, update progress docs	2026-05-18 11:19:39 +08:00
    - paper/: 22-page LaTeX framework (7/10 sections complete, compiles cleanly)
      main.tex + 10 section files + refs.bib + compiled PDF (329KB)
    - code/scripts/: three English dataset generation & merging scripts
      generate_english.py / generate_english_targeted.py / merge_v5.py
    - CLAUDE.md: update paper writing status, add paper/ file map entry
    - state.md: add section 8 paper writing progress (2026-05-15)
    - .gitignore: add LaTeX build artifact exclusion rules

    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

4 Commits