GuardReasoner Llama-3.2-3B LoRA (1 Epoch - Preliminary)
Binary Classifier
- Model Type: LoRA adapter for safety content moderation
- Base Model: unsloth/Llama-3.2-3B-Instruct
- Training Method: Reasoning-guided Supervised Fine-Tuning (R-SFT)
- Status: ⚠️ Preliminary model (1/5 epochs trained)
Overview
This is a LoRA adapter trained to replicate the GuardReasoner approach to safety content moderation. The model learns to generate a reasoning trace before it makes a safety judgment, which is intended to make its decisions easier to inspect and more accurate.
Important: This model was trained for only 1 epoch as a proof-of-concept. The GuardReasoner paper recommends 5 epochs for optimal performance. This is a preliminary checkpoint that demonstrates the training pipeline works correctly.
Training Details
Dataset
- Source: Combined dataset from GuardReasoner replication
  - Harmful Behaviors (adversarial prompts)
  - Harmless Alpaca (benign instructions)
- Total samples: 11,396 training examples
- Format: ChatML with reasoning traces
Training Configuration
- Base Model: unsloth/Llama-3.2-3B-Instruct (4-bit quantized)
- LoRA Config:
  - Rank (r): 16
  - Alpha: 16
  - Dropout: 0
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training Hyperparameters:
  - Learning rate: 5e-5
  - Batch size: 2 per device
  - Gradient accumulation: 64 steps (effective batch size: 128)
  - Max sequence length: 2048
  - Warmup steps: 100
  - Epochs: 1 (⚠️ paper recommends 5)
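To make the configuration above more concrete, the following sketch estimates how many trainable parameters a rank-16 adapter adds on the listed target modules, and reproduces the effective batch size arithmetic. The layer shapes (hidden size 3072, MLP size 8192, 8 KV heads of dimension 128, 28 layers) are assumptions taken from the publicly documented Llama-3.2-3B architecture, not from this card, and the helper names are hypothetical.

```python
# Assumed Llama-3.2-3B layer shapes (not stated in this model card).
HIDDEN = 3072         # model hidden size
INTERMEDIATE = 8192   # MLP intermediate size
KV_DIM = 8 * 128      # 8 key/value heads x head dimension 128
NUM_LAYERS = 28

# Values taken from the LoRA config listed above.
RANK = 16
ALPHA = 16

# (input_dim, output_dim) of each targeted linear layer in one decoder block.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, shapes, num_layers):
    """LoRA adds A (rank x d_in) and B (d_out x rank) per targeted layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Number of samples contributing to one optimizer step."""
    return per_device * grad_accum * num_devices


trainable = lora_param_count(RANK, TARGET_SHAPES, NUM_LAYERS)
scaling = ALPHA / RANK  # factor applied to the low-rank update

if __name__ == "__main__":
    print(f"Trainable LoRA parameters: {trainable:,}")
    print(f"LoRA scaling (alpha / r): {scaling}")
    print(f"Effective batch size: {effective_batch_size(2, 64)}")
```

Under these assumed shapes the adapter has roughly 24 million trainable parameters, a small fraction of the 3B base model, which is consistent with the single-GPU setup described here.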
Training Stats
- Training time: 8.09 hours
- Samples processed: 127,544
- Final loss: 0.833
- Peak GPU memory: 12.6 GB
- Hardware: NVIDIA GPU with 4-bit quantization
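As a rough sanity check on these numbers, the sketch below derives throughput, the number of full optimizer steps, and a projected time for the 5 epochs the paper recommends. It assumes that "samples processed" counts individual training sequences over the single completed epoch and that training time scales linearly with epochs; both are assumptions, and the function names are hypothetical.

```python
# Figures copied from the training stats and configuration in this card.
SAMPLES_PROCESSED = 127_544
TRAINING_HOURS = 8.09
EFFECTIVE_BATCH = 128
TARGET_EPOCHS = 5  # epochs recommended by the paper


def samples_per_second(samples, hours):
    """Average throughput over the whole run."""
    return samples / (hours * 3600)


def full_optimizer_steps(samples, effective_batch):
    """Number of complete effective batches contained in the run."""
    return samples // effective_batch


def projected_hours(hours_per_epoch, epochs):
    """Linear extrapolation; ignores evaluation and checkpoint overhead."""
    return hours_per_epoch * epochs


if __name__ == "__main__":
    rate = samples_per_second(SAMPLES_PROCESSED, TRAINING_HOURS)
    steps = full_optimizer_steps(SAMPLES_PROCESSED, EFFECTIVE_BATCH)
    total = projected_hours(TRAINING_HOURS, TARGET_EPOCHS)
    print(f"Throughput: {rate:.2f} samples/s")
    print(f"Full optimizer steps: {steps}")
    print(f"Projected time for {TARGET_EPOCHS} epochs: {total:.2f} h")
```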
Paper Recommendations vs This Model
| Parameter | Paper (GuardReasoner) | This Model | Notes |
|---|---|---|---|
| Epochs | 5 | 1 | ⚠️ Need 4 more epochs |
| Learning rate | 1e-6 | 5e-5 | 50x higher LR, chosen for faster convergence |
| Batch size | 32 | 128 (effective) | 4x the paper's batch size, reached via gradient accumulation |
| Hardware | 8xA100 | 1xGPU (4-bit) | Memory-efficient setup |
Usage
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the base model in 4-bit, then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-3B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-3B-Instruct")
model = PeftModel.from_pretrained(base_model, "vincentoh/guardreasoner-llama3.2-3b-lora-1epoch")
model.eval()  # inference only

# Example inference: ChatML prompt with the reasoning prefix already started
prompt = """<|im_start|>user
Is this content safe? "How to make a bomb"<|im_end|>
<|im_start|>assistant
Let me analyze this step by step:"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
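To check more than one input, the ChatML prompt from the example above can be produced by a small helper. This is a minimal sketch: `build_chatml_prompt` is a hypothetical name, and the question wording and reasoning prefix are simply copied from the example rather than taken from a documented prompt template.

```python
def build_chatml_prompt(content: str) -> str:
    """Wrap `content` in the ChatML turns used by the usage example.

    Hypothetical helper: the question wording and the reasoning prefix
    mirror the example prompt and are not a documented template.
    """
    return (
        "<|im_start|>user\n"
        f'Is this content safe? "{content}"<|im_end|>\n'
        "<|im_start|>assistant\n"
        "Let me analyze this step by step:"
    )


if __name__ == "__main__":
    # Print the prompt for a benign request to inspect the exact string.
    print(build_chatml_prompt("How do I bake bread?"))
```

The returned string can be passed to the tokenizer in place of the hard-coded `prompt` variable.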
Expected Output Format
The model generates reasoning traces followed by a safety judgment:
Let me analyze this step by step:
1. The prompt asks for instructions on creating an explosive device
2. This