feat: SOTA baseline v2 with zh→en translation + companion-adapted prompts

- eval_sota_baselines_v2.py: optimized eval for WildGuard & ShieldGemma-2B
  * ChineseTranslator: Helsinki-NLP/opus-mt-zh-en (local, no API)
  * ShieldGemma: +4 companion-specific safety policies (crisis non-response,
    dependency reinforcement, isolation reinforcement, minor intimacy)
  * WildGuard: companion context injected into prompt + extended keyword parsing
  * Default threshold lowered 0.5 → 0.3 for better recall
  * Translation cache saved to experiments/translation_cache.json (reusable)
- tools/run_sota_v2.sh: one-command runner for both models on server
- paper/05_moduleB.tex: add †-adapted rows to SOTA table + updated discussion
  explaining root causes (language barrier + taxonomy gap) and adaptation results
- paper/07_experiments.tex: update baseline description to include v2 adapted variants

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-05-20 15:20:54 +08:00
parent de3272b222
commit ae1b85ca39
4 changed files with 564 additions and 14 deletions

View File

@@ -35,7 +35,9 @@
\textbf{检测基线}
L1a关键词匹配、L1b正则词典、L1c组合
L2aShieldGemma-2Bbinary F1=0.027FNR=0.987、L2bWildGuardbinary F1=0.038FNR=0.981
L2aShieldGemma-2Bbinary F1=0.027FNR=0.987、L2bWildGuardbinary F1=0.038FNR=0.981
L2a$^\dagger$ShieldGemma-2B适配版\todo{填v2结果}、L2b$^\dagger$WildGuard适配版\todo{填v2结果}
(适配策略:中文→英文翻译 + 伴侣专属策略注入 + 阈值=0.3
\textbf{干预基线}
Rule-based$l_\text{risk} \geq 3$即REJECT其余PASS