feat: add paper/ LaTeX draft, English data scripts, update progress docs

- paper/: 22-page LaTeX framework (7/10 sections complete, compiles cleanly)
  main.tex + 10 section files + refs.bib + compiled PDF (329KB)
- code/scripts/: three English dataset generation & merging scripts
  generate_english.py / generate_english_targeted.py / merge_v5.py
- CLAUDE.md: update paper writing status, add paper/ file map entry
- state.md: add section 8 paper writing progress (2026-05-15)
- .gitignore: add LaTeX build artifact exclusion rules

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 11:19:39 +08:00
parent b50cf395ab
commit 804ebd2f77
19 changed files with 3047 additions and 3 deletions
--- a/paper/sections/00_abstract.tex
+++ b/paper/sections/00_abstract.tex
@@ -0,0 +1,20 @@
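+% Abstract (English translation of the Chinese version)
+The rapid spread of emotional-companionship AI platforms (e.g., Xingye (星野), Character.AI) poses distinctive safety challenges:
+existing guard models detect only generic harmful content and systematically miss the
+relational risks specific to companionship scenarios (dependency reinforcement, isolation reinforcement, failure to respond to crises, etc.);
+more critically, existing approaches stop at detection and offer no mechanism for deciding how to intervene in different risk contexts.
+We propose \textbf{CompanionGuard-RL}, the first framework to model companion-AI safety as
+a unified ``detection + adaptive intervention'' pipeline.
+The framework consists of two modules connected in series:
+(1) Module B, a context-aware risk detector built on MacBERT-Large with a cross-attention mechanism,
+which on our self-built benchmark CompanionRisk-Bench (9,896 samples covering 10 top-level risk categories and 14 fine-grained labels)
+achieves binary F1 = 0.9995 and a false negative rate of FNR = 0.0\%;
+(2) Module C, an adaptive intervention policy trained with a behavior-cloning warm-up followed by PPO reinforcement learning,
+which on safety recall (safety\_recall = 1.0) and the combined safety-experience score (UX F-score = 0.998)
+significantly outperforms the rule-based baseline (0.908/0.952).
+Ablation experiments demonstrate that both cross-attention context fusion and RL policy optimization are necessary.
+The CompanionRisk-Bench dataset and the framework code will be publicly released
+to advance research on the safety of emotional-companionship AI.
+
+\vspace{0.5em}
+\noindent\textbf{Keywords:} emotional-companionship AI; safety detection; reinforcement learning; risk intervention; content safety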