feat: port wangyu data pipeline and scripts into code/ structure

- code/src/data/: data_generator, dataset, llm_judge, __init__
  (multi-turn LLM dialogue generator, JSONL loader, LLM auto-annotator)
- code/scripts/: generate_siliconflow.py (SiliconFlow async generator, 701 lines);
  run_detector.sh, run_intervention.sh, run_full_pipeline.sh (launch scripts)
- code/configs/intervention_config.yaml: add reward.w1-w5 reference block
  (NOTE: v5 reward.py uses hardcoded constants; these fields are reference-only)
- .gitignore: fix data/ pattern to /data/ to avoid matching code/src/data/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
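
The .gitignore change in the message follows from how git anchors patterns: `data/` has no leading slash, so it matches a directory named `data` at any depth, while `/data/` matches only the one next to the .gitignore file. A minimal sketch that checks this in a throwaway repository, assuming `git` is on the PATH; the helper `status_under` and the file names `sample.jsonl` and `dataset.py` are illustrative, not taken from this commit:

```shell
#!/bin/bash
# Sketch: compare the two patterns in a scratch repo (not the real project).
repo="$(mktemp -d)"
git -C "$repo" init -q
mkdir -p "$repo/data" "$repo/code/src/data"
touch "$repo/data/sample.jsonl" "$repo/code/src/data/dataset.py"

# Write pattern $1 as the only .gitignore rule, then report whether git
# ignores path $2 (check-ignore exits 0 when the path is ignored).
status_under() {
  printf '%s\n' "$1" > "$repo/.gitignore"
  if git -C "$repo" check-ignore -q "$2"; then
    echo "ignored"
  else
    echo "not ignored"
  fi
}

for pattern in 'data/' '/data/'; do
  for path in data/sample.jsonl code/src/data/dataset.py; do
    echo "$pattern  $path: $(status_under "$pattern" "$path")"
  done
done
# The scratch repo lives in "$repo"; delete it once you are done.
```

With `data/` both paths should be reported as ignored; with `/data/` only `data/sample.jsonl` should be.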

zhangsiyuan

2026-05-18 14:59:48 +08:00

parent 804ebd2f77

commit 766b4811be

10 changed files with 1078 additions and 36 deletions

									

code/scripts/run_detector.sh
									
@@ -1,4 +1,4 @@
-#!/bin/bash
+﻿#!/bin/bash
 # Train Module B (Risk Detector) on 4x RTX 5090.
 #
 # Usage:
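
The two shebang lines in the hunk above differ only by a UTF-8 byte-order mark (bytes EF BB BF) in front of `#!`. A BOM ahead of the shebang typically stops the kernel from recognizing the interpreter line, so launch scripts are worth checking for one. A minimal sketch of such a check, assuming standard `head`, `od`, and `tr`; the helper name `has_utf8_bom` and the scratch file names are illustrative and not part of this commit:

```shell
#!/bin/bash
# Sketch: succeed (exit 0) if the file starts with the UTF-8 BOM EF BB BF.
has_utf8_bom() {
  local first3
  # Dump the first three bytes as hex and strip od's spacing.
  first3="$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')"
  [ "$first3" = "efbbbf" ]
}

# Demo on two scratch files (hypothetical names).
tmpdir="$(mktemp -d)"
printf '#!/bin/bash\n' > "$tmpdir/clean.sh"
printf '\357\273\277#!/bin/bash\n' > "$tmpdir/bom.sh"

for f in "$tmpdir/clean.sh" "$tmpdir/bom.sh"; do
  if has_utf8_bom "$f"; then
    echo "BOM present: ${f##*/}"
  else
    echo "no BOM: ${f##*/}"
  fi
done
rm -rf "$tmpdir"
```

Against the repository, the same helper could be called as `has_utf8_bom code/scripts/run_detector.sh`.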
