feat: SOTA baseline v2 with zh→en translation + companion-adapted prompts

- eval_sota_baselines_v2.py: optimized eval for WildGuard & ShieldGemma-2B
  * ChineseTranslator: Helsinki-NLP/opus-mt-zh-en (local, no API)
  * ShieldGemma: +4 companion-specific safety policies (crisis non-response,
    dependency reinforcement, isolation reinforcement, minor intimacy)
  * WildGuard: companion context injected into prompt + extended keyword parsing
  * Default threshold lowered 0.5 → 0.3 for better recall
  * Translation cache saved to experiments/translation_cache.json (reusable)
- tools/run_sota_v2.sh: one-command runner for both models on server
- paper/05_moduleB.tex: add †-adapted rows to SOTA table + updated discussion
  explaining root causes (language barrier + taxonomy gap) and adaptation results
- paper/07_experiments.tex: update baseline description to include v2 adapted variants

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
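
The reusable translation cache mentioned in the message could look roughly like the sketch below. It is a minimal illustration, not the code in eval_sota_baselines_v2.py: the `CachedTranslator` class, the injected `translate_fn` callable, and the JSON layout (source text mapped to its translation) are all assumptions. In the real script the callable would presumably wrap Helsinki-NLP/opus-mt-zh-en.

```python
import json
import os
from typing import Callable, Dict


class CachedTranslator:
    """Hypothetical helper: wraps any zh->en translate function with a JSON cache."""

    def __init__(self, translate_fn: Callable[[str], str], cache_path: str) -> None:
        self.translate_fn = translate_fn
        self.cache_path = cache_path
        self.cache: Dict[str, str] = {}
        # Reuse translations from an earlier run if the cache file exists.
        if os.path.exists(cache_path):
            with open(cache_path, encoding="utf-8") as f:
                self.cache = json.load(f)

    def translate(self, text: str) -> str:
        # Only call the (slow) model for texts that were never translated.
        if text not in self.cache:
            self.cache[text] = self.translate_fn(text)
        return self.cache[text]

    def save(self) -> None:
        # Create the parent directory on first use, then write UTF-8 JSON.
        dirname = os.path.dirname(self.cache_path)
        if dirname:
            os.makedirs(dirname, exist_ok=True)
        with open(self.cache_path, "w", encoding="utf-8") as f:
            json.dump(self.cache, f, ensure_ascii=False, indent=2)
```

Keying the cache on the raw source text means both models can share one file, since they are evaluated on the same inputs.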
@@ -35,7 +35,9 @@
 
 \textbf{Detection baselines}:
 L1a (keyword matching), L1b (regex lexicon), L1c (combined);
-L2a (ShieldGemma-2B, binary F1=0.027, FNR=0.987), L2b (WildGuard, binary F1=0.038, FNR=0.981)
+L2a (ShieldGemma-2B, binary F1=0.027, FNR=0.987), L2b (WildGuard, binary F1=0.038, FNR=0.981);
+L2a$^\dagger$ (adapted ShieldGemma-2B, \todo{fill in v2 results}), L2b$^\dagger$ (adapted WildGuard, \todo{fill in v2 results})
+(adaptation strategy: Chinese→English translation + companion-specific policy injection + threshold = 0.3)
 
 \textbf{Intervention baselines}:
 Rule-based (REJECT if $l_\text{risk} \geq 3$, PASS otherwise),
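
To make the effect of the threshold change concrete, the sketch below computes binary F1 and the false-negative rate (FNR) from per-sample risk scores at a given threshold. The function name, the `score >= threshold` decision rule, and the toy scores are illustrative assumptions, not project code or project data.

```python
from typing import Dict, Sequence


def binary_metrics(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Dict[str, float]:
    """Binary F1 and FNR, flagging a sample as risky when score >= threshold (assumed rule)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fnr = fn / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fnr": fnr}


# Toy data only: three risky samples (label 1) and two safe ones (label 0).
toy_scores = [0.9, 0.4, 0.35, 0.1, 0.32]
toy_labels = [1, 1, 1, 0, 0]
for t in (0.5, 0.3):
    print(t, binary_metrics(toy_scores, toy_labels, t))
```

On this toy input, lowering the threshold from 0.5 to 0.3 recovers the two missed risky samples at the cost of one false positive, which is the recall/precision trade-off the adapted variants accept.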