CompanionGuard-RL

Author	SHA1	Message	Date
wangyu	ae1b85ca39	feat: SOTA baseline v2 with zh→en translation + companion-adapted prompts - eval_sota_baselines_v2.py: optimized eval for WildGuard & ShieldGemma-2B * ChineseTranslator: Helsinki-NLP/opus-mt-zh-en (local, no API) * ShieldGemma: +4 companion-specific safety policies (crisis non-response, dependency reinforcement, isolation reinforcement, minor intimacy) * WildGuard: companion context injected into prompt + extended keyword parsing * Default threshold lowered 0.5 → 0.3 for better recall * Translation cache saved to experiments/translation_cache.json (reusable) - tools/run_sota_v2.sh: one-command runner for both models on server - paper/05_moduleB.tex: add †-adapted rows to SOTA table + updated discussion explaining root causes (language barrier + taxonomy gap) and adaptation results - paper/07_experiments.tex: update baseline description to include v2 adapted variants Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 15:20:54 +08:00
wangyu	de3272b222	paper: fill RQ3 ablation summary and IRB ethics statement - 07_experiments.tex: replace \todo placeholder in RQ3 with actual ablation analysis referencing tab:moduleB_ablation (§5) and tab:moduleC_ablation (§6); summarize key takeaways for both modules - 08_discussion.tex: replace \todo IRB placeholder with full ethics declaration — synthetic data origin, public dataset attribution, DUA policy, no human-subjects experiment needed Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 15:07:09 +08:00
zhangsiyuan	66b2f84588	chore: add flag/txt patterns to gitignore	2026-05-20 14:40:05 +08:00
zhangsiyuan	52ba43f08d	feat: Module C v5/v6 training complete, ablations, SOTA baselines, paper updates - Module C: BC+PPO training v5/v6 done; eval results in experiments/eval_intervention_v{5,6}.json - Reward: v5 label-aligned constrained reward (code/src/rl/reward.py) - Ablations: Module B (history_r, response_only, full) + Module C (wo_category_reward) - SOTA baselines: WildGuard and ShieldGemma2b eval scripts and results - Paper: update sections 05–08 (Module B/C description, experiments table, discussion) - Docs: add record.md (change log), update state.md and exp.md; retire change.md - Tools: add html-to-ppt utilities and run_shieldgemma2b.sh - Configs: add ablation YAML configs for Module B and C - Cleanup: remove stale reference/ PNG screenshots Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 14:24:09 +08:00
zhangsiyuan	6d61a950f1	chore: remove main.pdf from tracking, ignore all paper/*.pdf PDF triggers git-lfs lock verification on the Gitea server. LaTeX source in paper/sections/ is sufficient for version control. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 15:14:21 +08:00
zhangsiyuan	766b4811be	feat: port wangyu data pipeline and scripts into code/ structure - code/src/data/: data_generator, dataset, llm_judge, __init__ (multi-turn LLM dialogue generator, JSONL loader, LLM auto-annotator) - code/scripts/: generate_siliconflow.py (SiliconFlow async generator, 701 lines) run_detector.sh / run_intervention.sh / run_full_pipeline.sh (launch scripts) - code/configs/intervention_config.yaml: add reward.w1-w5 reference block (NOTE: v5 reward.py uses hardcoded constants; these fields are reference-only) - .gitignore: fix data/ pattern to /data/ to avoid matching code/src/data/ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 14:59:48 +08:00
zhangsiyuan	804ebd2f77	feat: add paper/ LaTeX draft, English data scripts, update progress docs - paper/: 22-page LaTeX framework (7/10 sections complete, compiles cleanly) main.tex + 10 section files + refs.bib + compiled PDF (329KB) - code/scripts/: three English dataset generation & merging scripts generate_english.py / generate_english_targeted.py / merge_v5.py - CLAUDE.md: update paper writing status, add paper/ file map entry - state.md: add section 8 paper writing progress (2026-05-15) - .gitignore: add LaTeX build artifact exclusion rules Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 11:19:39 +08:00
zhangsiyuan	b50cf395ab	refactor: move README/CLAUDE to root; rewrite CLAUDE.md as project constitution - git mv code/README.md → README.md (project-level) - Rewrite CLAUDE.md: accurate Module C status (v5 pending), Red Lines table (6 rules from real incidents), file map, server quick-reference, updated SCP commands - Merge code/.gitignore into root .gitignore (dist/, build/, wandb/, *.jsonl, *.json.gz); delete code/.gitignore - code/ now contains only: src/ scripts/ configs/ tests/ checkpoints/ data/ requirements.txt Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 08:52:40 +08:00
zhangsiyuan	d557c6b0c6	refactor: slim code/ to pure code; consolidate experiments/ and docs - Remove code/experiments/ → merge all eval JSONs into root experiments/ - Move code/exp.md, code/change.md → project root - Delete code/2026-05-09-研究框架.md (duplicate of docs/) - Update .gitignore: experiments/*.log (was code/experiments/*.log) - Update code/CLAUDE.md: fix all affected paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 08:31:17 +08:00
zhangsiyuan	555a8064d7	chore: update CLAUDE.md paths + gitignore 旧方向信息/ - CLAUDE.md: rewrite as project reference (training done); fix all local paths (remove CompanionGuard-RL nesting in code/) - .gitignore: add 旧方向信息/ and untrack it from index Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 11:32:02 +08:00
zhangsiyuan	bd1f51c496	chore: initial commit, unified project repo Merged code repo (CompanionGuard-RL) into single project-level git. Reorganized root: docs/, reference/, experiments/, tmp/active|archives/. Gitignored: data/, checkpoints/, .venv, experiment logs, tmp/archives. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 11:28:42 +08:00

11 Commits