ae1b85ca39
feat: SOTA baseline v2 with zh→en translation + companion-adapted prompts
- eval_sota_baselines_v2.py: optimized eval for WildGuard & ShieldGemma-2B
* ChineseTranslator: Helsinki-NLP/opus-mt-zh-en (local, no API)
* ShieldGemma: +4 companion-specific safety policies (crisis non-response,
dependency reinforcement, isolation reinforcement, minor intimacy)
* WildGuard: companion context injected into prompt + extended keyword parsing
* Default threshold lowered 0.5 → 0.3 for better recall
* Translation cache saved to experiments/translation_cache.json (reusable)
- tools/run_sota_v2.sh: one-command runner for both models on the server
- paper/05_moduleB.tex: add †-adapted rows to SOTA table + updated discussion
explaining root causes (language barrier + taxonomy gap) and adaptation results
- paper/07_experiments.tex: update baseline description to include v2 adapted variants
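A minimal sketch of how a reusable translation cache like experiments/translation_cache.json could sit in front of the local opus-mt-zh-en model. `CachedTranslator`, `marian_zh_en`, `_load_marian` and the flat `{source: translation}` JSON layout are hypothetical; the actual ChineseTranslator in eval_sota_baselines_v2.py may be structured differently. Only the transformers `MarianTokenizer` / `MarianMTModel` calls are real library APIs.

```python
import json
from functools import lru_cache
from pathlib import Path
from typing import Callable, Dict, List


class CachedTranslator:
    """Hypothetical JSON-backed cache in front of any batch zh->en translate function."""

    def __init__(self, translate_fn: Callable[[List[str]], List[str]], cache_path: Path) -> None:
        self.translate_fn = translate_fn
        self.cache_path = Path(cache_path)
        self.cache: Dict[str, str] = {}
        # Reuse translations from earlier runs if the cache file already exists.
        if self.cache_path.exists():
            self.cache = json.loads(self.cache_path.read_text(encoding="utf-8"))

    def translate(self, texts: List[str]) -> List[str]:
        # Deduplicate (order-preserving) and send only unseen texts to the model.
        missing = [t for t in dict.fromkeys(texts) if t not in self.cache]
        if missing:
            translated = self.translate_fn(missing)
            self.cache.update(zip(missing, translated))
            self.save()
        return [self.cache[t] for t in texts]

    def save(self) -> None:
        self.cache_path.parent.mkdir(parents=True, exist_ok=True)
        # ensure_ascii=False keeps the Chinese source strings readable in the JSON file.
        self.cache_path.write_text(
            json.dumps(self.cache, ensure_ascii=False, indent=2), encoding="utf-8"
        )


@lru_cache(maxsize=1)
def _load_marian(name: str = "Helsinki-NLP/opus-mt-zh-en"):
    # Imported lazily so the cache logic works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)


def marian_zh_en(texts: List[str]) -> List[str]:
    """Translate a batch locally with opus-mt-zh-en (no external API)."""
    tokenizer, model = _load_marian()
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**batch)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Usage would be `CachedTranslator(marian_zh_en, Path("experiments/translation_cache.json")).translate(texts)`. Because only unseen strings reach the model, a second evaluation run over the same Chinese prompts does no translation work at all.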
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 15:20:54 +08:00
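The recall/precision trade-off behind lowering the default threshold from 0.5 to 0.3 can be sketched as follows. This assumes the threshold is compared against a per-sample unsafe probability (for example a guard model's score). `flag_unsafe` and `precision_recall` are hypothetical helpers, and `TOY_SCORES` / `TOY_LABELS` are made-up values, not results from these experiments.

```python
from typing import List, Sequence, Tuple

# Made-up toy data for illustration only (not results from this project).
TOY_SCORES = [0.9, 0.4, 0.35, 0.2, 0.6, 0.1]
TOY_LABELS = [True, True, False, False, True, False]  # True = unsafe


def flag_unsafe(scores: Sequence[float], threshold: float) -> List[bool]:
    """Flag a sample as unsafe when its unsafe probability reaches the threshold."""
    return [score >= threshold for score in scores]


def precision_recall(preds: Sequence[bool], labels: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall for the unsafe (positive) class."""
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p and y)
    fp = sum(1 for p, y in pairs if p and not y)
    fn = sum(1 for p, y in pairs if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    for threshold in (0.5, 0.3):
        p, r = precision_recall(flag_unsafe(TOY_SCORES, threshold), TOY_LABELS)
        print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

On the toy data, moving from 0.5 to 0.3 turns one missed unsafe sample (score 0.4) into a hit and one safe sample (score 0.35) into a false alarm. Recall rises from 2/3 to 1.0 while precision falls from 1.0 to 0.75.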
de3272b222
paper: fill RQ3 ablation summary and IRB ethics statement
- 07_experiments.tex: replace \todo placeholder in RQ3 with actual
ablation analysis referencing tab:moduleB_ablation (§5) and
tab:moduleC_ablation (§6); summarize key takeaways for both modules
- 08_discussion.tex: replace \todo IRB placeholder with full ethics
declaration — synthetic data origin, public dataset attribution,
DUA policy, no human-subjects experiment needed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 15:07:09 +08:00
66b2f84588
chore: add flag/txt patterns to gitignore
2026-05-20 14:40:05 +08:00
52ba43f08d
feat: Module C v5/v6 training complete, ablations, SOTA baselines, paper updates
- Module C: BC+PPO training v5/v6 done; eval results in experiments/eval_intervention_v{5,6}.json
- Reward: v5 label-aligned constrained reward (code/src/rl/reward.py)
- Ablations: Module B (history_r, response_only, full) + Module C (wo_category_reward)
- SOTA baselines: WildGuard and ShieldGemma2b eval scripts and results
- Paper: update sections 05–08 (Module B/C description, experiments table, discussion)
- Docs: add record.md (change log), update state.md and exp.md; retire change.md
- Tools: add html-to-ppt utilities and run_shieldgemma2b.sh
- Configs: add ablation YAML configs for Module B and C
- Cleanup: remove stale reference/ PNG screenshots
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 14:24:09 +08:00
6d61a950f1
chore: remove main.pdf from tracking, ignore all paper/*.pdf
PDF triggers git-lfs lock verification on the Gitea server.
LaTeX source in paper/sections/ is sufficient for version control.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 15:14:21 +08:00
766b4811be
feat: port wangyu data pipeline and scripts into code/ structure
- code/src/data/: data_generator, dataset, llm_judge, __init__
(multi-turn LLM dialogue generator, JSONL loader, LLM auto-annotator)
- code/scripts/: generate_siliconflow.py (SiliconFlow async generator, 701 lines)
run_detector.sh / run_intervention.sh / run_full_pipeline.sh (launch scripts)
- code/configs/intervention_config.yaml: add reward.w1-w5 reference block
(NOTE: v5 reward.py uses hardcoded constants; these fields are reference-only)
- .gitignore: fix data/ pattern to /data/ to avoid matching code/src/data/
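The data/ → /data/ fix follows from gitignore's anchoring rule. A pattern whose only slash is the trailing one matches a directory of that name at any depth, so data/ also caught code/src/data/. A leading slash anchors the pattern to the repository root. The toy function below models just those two cases. `dir_pattern_matches` is hypothetical and deliberately ignores globs, negation and nested .gitignore files.

```python
from pathlib import PurePosixPath


def dir_pattern_matches(pattern: str, dir_path: str) -> bool:
    """Toy model: does a literal directory pattern like 'data/' or '/data/' ignore dir_path?

    Only two gitignore rules are modelled: a trailing '/' restricts the pattern to
    directories, and a leading '/' anchors it to the repository root; without a
    leading (or middle) slash the name may match at any depth. Globs, negation and
    nested .gitignore files are out of scope.
    """
    name = pattern.strip("/")
    if not pattern.endswith("/") or not name or "/" in name:
        raise ValueError("toy model handles only single-name patterns like 'data/' or '/data/'")
    parts = PurePosixPath(dir_path).parts
    if pattern.startswith("/"):
        # Anchored: only the top-level directory (and anything beneath it) is ignored.
        return bool(parts) and parts[0] == name
    # Unanchored: a directory with this name at any depth is ignored, with its contents.
    return name in parts
```

To confirm the behaviour on the real repository, `git check-ignore -v --no-index code/src/data/__init__.py` reports which pattern, if any, ignores that path. `--no-index` is needed because the file may already be tracked.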
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 14:59:48 +08:00
804ebd2f77
feat: add paper/ LaTeX draft, English data scripts, update progress docs
- paper/: 22-page LaTeX framework (7/10 sections complete, compiles cleanly)
main.tex + 10 section files + refs.bib + compiled PDF (329KB)
- code/scripts/: three English dataset generation & merging scripts
generate_english.py / generate_english_targeted.py / merge_v5.py
- CLAUDE.md: update paper writing status, add paper/ file map entry
- state.md: add section 8 paper writing progress (2026-05-15)
- .gitignore: add LaTeX build artifact exclusion rules
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 11:19:39 +08:00
b50cf395ab
refactor: move README/CLAUDE to root; rewrite CLAUDE.md as project constitution
- git mv code/README.md → README.md (project-level)
- Rewrite CLAUDE.md: accurate Module C status (v5 pending),
Red Lines table (6 rules from real incidents), file map,
server quick-reference, updated SCP commands
- Merge code/.gitignore into root .gitignore (dist/, build/,
wandb/, *.jsonl, *.json.gz); delete code/.gitignore
- code/ now contains only: src/ scripts/ configs/ tests/
checkpoints/ data/ requirements.txt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 08:52:40 +08:00
d557c6b0c6
refactor: slim code/ to pure code; consolidate experiments/ and docs
- Remove code/experiments/ → merge all eval JSONs into root experiments/
- Move code/exp.md, code/change.md → project root
- Delete code/2026-05-09-研究框架.md (research-framework notes; duplicate of docs/)
- Update .gitignore: experiments/*.log (was code/experiments/*.log)
- Update code/CLAUDE.md: fix all affected paths
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 08:31:17 +08:00
555a8064d7
chore: update CLAUDE.md paths + gitignore 旧方向信息/ (previous-direction notes)
- CLAUDE.md: rewrite as project reference (training done);
fix all local paths (remove CompanionGuard-RL nesting in code/)
- .gitignore: add 旧方向信息/ (previous-direction notes) and untrack it from the index
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 11:32:02 +08:00
bd1f51c496
chore: initial commit — unified project repo
Merged code repo (CompanionGuard-RL) into a single project-level git repo.
Reorganized root: docs/, reference/, experiments/, tmp/active|archives/.
Gitignored: data/, checkpoints/, .venv, experiment logs, tmp/archives.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 11:28:42 +08:00