feat: SOTA baseline v2 with zh→en translation + companion-adapted prompts

- eval_sota_baselines_v2.py: optimized eval for WildGuard & ShieldGemma-2B
  * ChineseTranslator: Helsinki-NLP/opus-mt-zh-en (local, no API)
  * ShieldGemma: +4 companion-specific safety policies (crisis non-response,
    dependency reinforcement, isolation reinforcement, minor intimacy)
  * WildGuard: companion context injected into prompt + extended keyword parsing
  * Default threshold lowered 0.5 → 0.3 for better recall
  * Translation cache saved to experiments/translation_cache.json (reusable)
- tools/run_sota_v2.sh: one-command runner for both models on server
- paper/05_moduleB.tex: add †-adapted rows to SOTA table + updated discussion
  explaining root causes (language barrier + taxonomy gap) and adaptation results
- paper/07_experiments.tex: update baseline description to include v2 adapted variants

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 15:20:54 +08:00
parent de3272b222
commit ae1b85ca39
4 changed files with 564 additions and 14 deletions
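
The commit message lowers the default decision threshold from 0.5 to 0.3 to improve recall, and the diff below reports binary F1 and FNR for the baselines. As a rough illustration of how the two interact, here is a minimal sketch, not the code in eval_sota_baselines_v2.py. It assumes each model emits an "unsafe" score in [0, 1] and that a sample is flagged when its score is at or above the threshold. The function name `binary_metrics` and the toy data are hypothetical.

```python
from typing import Sequence


def binary_metrics(scores: Sequence[float], labels: Sequence[int], threshold: float) -> dict:
    """Compute binary F1 and false-negative rate (FNR) at a decision threshold.

    Assumption: a sample is flagged as unsafe when score >= threshold,
    and label 1 means "unsafe", label 0 means "safe".
    """
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_unsafe = score >= threshold
        if predicted_unsafe and label == 1:
            tp += 1
        elif predicted_unsafe and label == 0:
            fp += 1
        elif not predicted_unsafe and label == 1:
            fn += 1
    positives = tp + fn
    fnr = fn / positives if positives else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"f1": f1, "fnr": fnr}


# Toy scores and labels, invented for illustration only (not experiment data).
toy_scores = [0.9, 0.45, 0.35, 0.2, 0.4, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0]

at_050 = binary_metrics(toy_scores, toy_labels, threshold=0.5)
at_030 = binary_metrics(toy_scores, toy_labels, threshold=0.3)
```

On this toy set, moving the threshold from 0.5 to 0.3 recovers two missed positives at the cost of one false positive, which is the trade-off the lowered default is betting on.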
--- a/paper/sections/07_experiments.tex
+++ b/paper/sections/07_experiments.tex
@@ -35,7 +35,9 @@

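 \textbf{Detection baselines}:
 L1a (keyword matching), L1b (regex lexicon), L1c (combined);
-L2a (ShieldGemma-2B, binary F1=0.027, FNR=0.987), L2b (WildGuard, binary F1=0.038, FNR=0.981)
+L2a (ShieldGemma-2B, binary F1=0.027, FNR=0.987), L2b (WildGuard, binary F1=0.038, FNR=0.981);
+L2a$^\dagger$ (ShieldGemma-2B, adapted, \todo{fill in v2 results}), L2b$^\dagger$ (WildGuard, adapted, \todo{fill in v2 results})
+(adaptation strategy: Chinese→English translation + companion-specific policy injection + threshold = 0.3)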

 \textbf{Intervention baselines}:
 Rule-based (REJECT if $l_\text{risk} \geq 3$, otherwise PASS),
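
The rule-based intervention baseline in the hunk above reduces to a single comparison on the risk level. A minimal sketch follows, assuming $l_\text{risk}$ is an integer level. The function name `rule_based_intervention`, the `reject_at` parameter, and the 0 to 5 range used in the example are hypothetical and not taken from the repository.

```python
def rule_based_intervention(l_risk: int, reject_at: int = 3) -> str:
    """Return "REJECT" when the risk level reaches reject_at, otherwise "PASS"."""
    return "REJECT" if l_risk >= reject_at else "PASS"


# Example over an assumed risk scale of 0..5 (the real scale may differ).
decisions = [rule_based_intervention(level) for level in range(6)]
```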