CompanionGuard-RL

wangyu/CompanionGuard-RL

Fork 0

Commit Graph

Author	SHA1	Message	Date
wangyu	b4be3983b7	feat: multi-GPU support for 4x RTX 5090 (PCIe DDP, BF16) Hardware analysis: 4x RTX 5090 32GB without NVLink is fully sufficient. PCIe 5.0 all-reduce overhead <1% of step time for MacBERT-large (340M params). BF16 mixed precision gives ~2x throughput vs FP32 on 5090. Module B (Detector) — full 4-GPU DDP via Accelerate: - DistributedSampler with per-epoch shuffling (correct DDP data split) - BF16 autocast via accelerator.mixed_precision - Gradient accumulation handled by accelerator.accumulate() - Only rank-0 saves checkpoints and logs to wandb - accelerator.gather_for_metrics() for correct multi-GPU validation - per_gpu_batch_size=32, effective_batch = 32×4 = 128 Module C (Intervention) — hybrid parallel strategy: - Stage 1 (BC warm-up): all 4 GPUs via Accelerate DDP TensorDataset broadcast from rank-0 to all processes - Stage 2 (PPO): GPU-0 only — env-agent loop is inherently sequential - Detector preprocessing: distributed across all 4 GPUs via shard split + all_gather_object to collect results on rank-0 Configs updated: detector_config.yaml: per_gpu_batch_size=32, gradient_accumulation_steps=1, mixed_precision=bf16, num_workers=4 intervention_config.yaml: BC per_gpu_batch_size=256, PPO batch_size=256 Launch scripts added: scripts/run_detector.sh — single command: 4-GPU detector training scripts/run_intervention.sh — single command: hybrid BC+PPO training scripts/run_full_pipeline.sh — end-to-end pipeline steps 1-5 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 17:56:13 +08:00
wangyu	4a0e71fb23	refactor: complete full implementation replacing all placeholder/mock content Detection module (Module B): - detector.py: expose separate e_P_pool and e_H_pool for RL state; fix compute_loss to skip primary head when c_primary="None" - dataset.py: handle c_primary="None" safely; add validate_and_normalize Data pipeline: - data_generator.py: 30+ category-specific personas (3+ per R1-R10 + 5 safe); systematic category→fine-label mapping; safe sample generation (25%); per-category risk level distribution; max_retries logic - llm_judge.py: incremental file writing; rate limiting; retry logic; annotate_from_file convenience method; consistency validation - annotate_data.py: stratified split by y_risk; dataset statistics report RL module (Module C): - ppo_trainer.py: fix Gymnasium API (reset→(obs,info), step→5-tuple); fix action type passed to env.step; proper buffer reset and size tracking - companion_env.py: use shared build_obs_vector; add BatchCompanionEnv with auto-reset; correct Gymnasium interface Shared utilities (new files): - src/utils/preprocessing.py: preprocess_samples_with_detector using separate e_P_pool/e_H_pool; build_obs_vector; build_bc_tensors for BC warm-up - src/utils/baselines.py: KeywordDetector (L1a), RegexDetector (L1b), CombinedRuleDetector (L1c), rule_based_intervention, threshold_intervention, LLMJudgePolicy for full baseline comparison Scripts: - train_intervention.py: use preprocessing module; separate e_H/e_P pools - evaluate.py: proper module imports (no circular scripts import); full multi-baseline comparison; save all results to JSON - generate_data.py: API key check; safe_ratio + max_retries CLI args Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 17:50:17 +08:00
wangyu	7d4345c29d	feat: initial CompanionGuard-RL framework Two-module pipeline for AI companion safety: - Module B: context-aware risk detector with CrossAttention fusion - Module C: PPO-based adaptive intervention policy Includes CompanionRisk Taxonomy (10 primary + 14 fine-grained labels), dataset generation/annotation pipeline, training scripts, and eval suite. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 17:21:11 +08:00

Author

SHA1

Message

Date

wangyu

b4be3983b7

feat: multi-GPU support for 4x RTX 5090 (PCIe DDP, BF16)

Hardware analysis:
  4x RTX 5090 32GB without NVLink is fully sufficient.
  PCIe 5.0 all-reduce overhead <1% of step time for MacBERT-large (340M params).
  BF16 mixed precision gives ~2x throughput vs FP32 on 5090.

Module B (Detector) — full 4-GPU DDP via Accelerate:
  - DistributedSampler with per-epoch shuffling (correct DDP data split)
  - BF16 autocast via accelerator.mixed_precision
  - Gradient accumulation handled by accelerator.accumulate()
  - Only rank-0 saves checkpoints and logs to wandb
  - accelerator.gather_for_metrics() for correct multi-GPU validation
  - per_gpu_batch_size=32, effective_batch = 32×4 = 128

Module C (Intervention) — hybrid parallel strategy:
  - Stage 1 (BC warm-up): all 4 GPUs via Accelerate DDP
    TensorDataset broadcast from rank-0 to all processes
  - Stage 2 (PPO): GPU-0 only — env-agent loop is inherently sequential
  - Detector preprocessing: distributed across all 4 GPUs via shard split
    + all_gather_object to collect results on rank-0

Configs updated:
  detector_config.yaml:    per_gpu_batch_size=32, gradient_accumulation_steps=1,
                           mixed_precision=bf16, num_workers=4
  intervention_config.yaml: BC per_gpu_batch_size=256, PPO batch_size=256

Launch scripts added:
  scripts/run_detector.sh         — single command: 4-GPU detector training
  scripts/run_intervention.sh     — single command: hybrid BC+PPO training
  scripts/run_full_pipeline.sh    — end-to-end pipeline steps 1-5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-09 17:56:13 +08:00

wangyu

4a0e71fb23

refactor: complete full implementation replacing all placeholder/mock content

Detection module (Module B):
- detector.py: expose separate e_P_pool and e_H_pool for RL state;
  fix compute_loss to skip primary head when c_primary="None"
- dataset.py: handle c_primary="None" safely; add validate_and_normalize

Data pipeline:
- data_generator.py: 30+ category-specific personas (3+ per R1-R10 + 5 safe);
  systematic category→fine-label mapping; safe sample generation (25%);
  per-category risk level distribution; max_retries logic
- llm_judge.py: incremental file writing; rate limiting; retry logic;
  annotate_from_file convenience method; consistency validation
- annotate_data.py: stratified split by y_risk; dataset statistics report

RL module (Module C):
- ppo_trainer.py: fix Gymnasium API (reset→(obs,info), step→5-tuple);
  fix action type passed to env.step; proper buffer reset and size tracking
- companion_env.py: use shared build_obs_vector; add BatchCompanionEnv with
  auto-reset; correct Gymnasium interface

Shared utilities (new files):
- src/utils/preprocessing.py: preprocess_samples_with_detector using separate
  e_P_pool/e_H_pool; build_obs_vector; build_bc_tensors for BC warm-up
- src/utils/baselines.py: KeywordDetector (L1a), RegexDetector (L1b),
  CombinedRuleDetector (L1c), rule_based_intervention, threshold_intervention,
  LLMJudgePolicy for full baseline comparison

Scripts:
- train_intervention.py: use preprocessing module; separate e_H/e_P pools
- evaluate.py: proper module imports (no circular scripts import);
  full multi-baseline comparison; save all results to JSON
- generate_data.py: API key check; safe_ratio + max_retries CLI args

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-09 17:50:17 +08:00

wangyu

7d4345c29d

feat: initial CompanionGuard-RL framework

Two-module pipeline for AI companion safety:
- Module B: context-aware risk detector with CrossAttention fusion
- Module C: PPO-based adaptive intervention policy

Includes CompanionRisk Taxonomy (10 primary + 14 fine-grained labels),
dataset generation/annotation pipeline, training scripts, and eval suite.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-09 17:21:11 +08:00

3 Commits