feat: initial CompanionGuard-RL framework

Two-module pipeline for AI companion safety:
- Module B: context-aware risk detector with CrossAttention fusion
- Module C: PPO-based adaptive intervention policy

Includes CompanionRisk Taxonomy (10 primary + 14 fine-grained labels),
dataset generation/annotation pipeline, training scripts, and eval suite.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 17:21:11 +08:00
commit 7d4345c29d
29 changed files with 3317 additions and 0 deletions
--- a/configs/data_generation.yaml
+++ b/configs/data_generation.yaml
@@ -0,0 +1,22 @@
+api:
+  type: "qwen"         # "qwen" or "openai"
+  model: "qwen-max"
+
+generation:
+  total_samples: 3000
+  samples_per_category: 300
+  delay: 0.5           # seconds between API calls
+
+output:
+  raw_dir: "data/raw"
+  output_file: "data/raw/generated.jsonl"
+
+annotation:
+  judge_model: "qwen-max"
+  output_file: "data/processed/annotated.jsonl"
+
+split:
+  train: 0.8
+  val: 0.1
+  test: 0.1
+  seed: 42
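
Reviewer note: the `split` block above implies a seeded, reproducible 80/10/10 partition of the annotated samples. The repository's actual split script is not shown in this hunk, so the following is only a minimal sketch of how such a config might be consumed; `split_records` and its parameters are hypothetical names, and only the ratios and seed come from the YAML.

```python
import random


def split_records(records, train=0.8, val=0.1, test=0.1, seed=42):
    """Shuffle with a fixed seed, then cut into train/val/test partitions.

    Hypothetical helper: defaults mirror the `split` section of
    configs/data_generation.yaml, not any code confirmed in this commit.
    """
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1.0")

    shuffled = list(records)
    # A dedicated Random instance keeps the shuffle reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        # The test partition takes the remainder, so no record is dropped.
        "test": shuffled[n_train + n_val:],
    }


if __name__ == "__main__":
    # Stand-in for the 3000 generated samples (`total_samples` in the config).
    splits = split_records(range(3000))
    print({name: len(part) for name, part in splits.items()})
```

Assigning the remainder to the test partition avoids losing records to rounding when the sample count is not a multiple of ten.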