feat: port wangyu data pipeline and scripts into code/ structure

- code/src/data/: data_generator, dataset, llm_judge, __init__
  (multi-turn LLM dialogue generator, JSONL loader, LLM auto-annotator)
- code/scripts/: generate_siliconflow.py (SiliconFlow async generator, 701 lines)
  run_detector.sh / run_intervention.sh / run_full_pipeline.sh (launch scripts)
- code/configs/intervention_config.yaml: add reward.w1-w5 reference block
  (NOTE: v5 reward.py uses hardcoded constants; these fields are reference-only)
- .gitignore: fix data/ pattern to /data/ to avoid matching code/src/data/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 14:59:48 +08:00
parent 804ebd2f77
commit 766b4811be
10 changed files with 1078 additions and 36 deletions
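
Because the `reward:` block in intervention_config.yaml is reference-only and duplicates the DEFAULT_WEIGHTS alias, the two copies can silently drift apart. A minimal sketch of a drift check follows. It assumes the YAML block has already been parsed into a dict; `find_weight_drift` and `config_reward` are hypothetical names, not part of this commit, and the values are copied from the comment in the diff rather than imported from reward.py.

```python
# Values copied from the DEFAULT_WEIGHTS comment in the diff.
# Assumption: the real alias lives in reward.py; it is not imported here.
DEFAULT_WEIGHTS = {"w1": 2.0, "w2": 3.0, "w3": 4.0, "w4": 1.5, "w5": 0.5}

# Hypothetical stand-in for the parsed `reward:` block of
# code/configs/intervention_config.yaml.
config_reward = {"w1": 2.0, "w2": 3.0, "w3": 4.0, "w4": 1.5, "w5": 0.5}


def find_weight_drift(config_block, defaults):
    """Return {key: (config_value, default_value)} for every mismatch.

    A key missing on one side is reported with None for that side.
    """
    drift = {}
    for key in sorted(set(config_block) | set(defaults)):
        cfg_val = config_block.get(key)
        def_val = defaults.get(key)
        if cfg_val != def_val:
            drift[key] = (cfg_val, def_val)
    return drift


if __name__ == "__main__":
    drift = find_weight_drift(config_reward, DEFAULT_WEIGHTS)
    if drift:
        print("reference weights differ from DEFAULT_WEIGHTS:", drift)
    else:
        print("reference weights match DEFAULT_WEIGHTS")
```

A check like this only guards the documentation value of the block. Per the NOTE in the diff, v5 reward.py does not read these weights, so matching values do not change reward behavior.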
--- a/code/configs/intervention_config.yaml
+++ b/code/configs/intervention_config.yaml
@@ -33,6 +33,17 @@ ppo:
 environment:
  max_turns: 20

+# reward weights — NOTE: v5 reward.py uses hardcoded constants (EXACT_ALIGN_BONUS,
+# PASS_HIGH_PENALTY, etc.), the `weights` param of compute_reward() is unused.
+# These values are kept here for reference only (from wangyu's earlier config).
+# DEFAULT_WEIGHTS alias in reward.py: {"w1":2.0, "w2":3.0, "w3":4.0, "w4":1.5, "w5":0.5}
+reward:
+  w1: 2.0   # safety_recall (reference only — not read by reward.py v5)
+  w2: 3.0   # crisis_precision (reference only)
+  w3: 4.0   # action_accuracy (reference only)
+  w4: 1.5   # over_refusal penalty (reference only)
+  w5: 0.5   # fluency (reference only)
+
 evaluation:
  binary_threshold: 0.5