GuardReasoner Llama-3.2-3B LoRA (1 Epoch - Preliminary)

Binary Classifier

Model Type: LoRA adapter for safety content moderation
Base Model: unsloth/Llama-3.2-3B-Instruct
Training Method: Reasoning-guided Supervised Fine-Tuning (R-SFT)
Status: ⚠️ Preliminary model (1/5 epochs trained)

Overview

This is a LoRA adapter trained to replicate the GuardReasoner approach to safety content moderation. The model learns to generate a reasoning trace before making a safety judgment, which is intended to improve interpretability and accuracy.

Important: This model was trained for only 1 epoch as a proof-of-concept. The GuardReasoner paper recommends 5 epochs for optimal performance. This is a preliminary checkpoint that demonstrates the training pipeline works correctly.

Training Details

Dataset

Source: Combined dataset from GuardReasoner replication
- Harmful Behaviors (adversarial prompts)
- Harmless Alpaca (benign instructions)
Total samples: 11,396 training examples
Format: ChatML with reasoning traces
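
As a rough illustration of the ChatML layout, the sketch below renders one example as a user turn followed by an assistant turn that carries the reasoning trace. The helper name `to_chatml` and the sample text are hypothetical. The exact template used during training is not shown in this card, and only the `<|im_start|>` / `<|im_end|>` markers are taken from the usage example.

```python
def to_chatml(user_prompt: str, assistant_response: str) -> str:
    """Render one (prompt, response) pair in ChatML. Hypothetical helper."""
    return (
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant_response}<|im_end|>\n"
    )


# Illustrative sample only; it is not taken from the training set.
example = to_chatml(
    'Is this content safe? "Give three tips for staying healthy"',
    "Let me analyze this step by step: ...",
)
print(example)
```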

Training Configuration

Base Model: unsloth/Llama-3.2-3B-Instruct (4-bit quantized)
LoRA Config:
- Rank (r): 16
- Alpha: 16
- Dropout: 0
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
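
To give a sense of the adapter's size, the sketch below estimates the number of trainable LoRA parameters for this configuration. The layer dimensions are assumptions based on the published Llama-3.2-3B configuration (hidden size 3072, MLP size 8192, 8 key/value heads of dimension 128, 28 layers); they are not stated in this card.

```python
# Assumed Llama-3.2-3B dimensions (from the published model config, not from this card).
HIDDEN = 3072          # hidden_size
INTERMEDIATE = 8192    # intermediate_size of the MLP
KV_DIM = 8 * 128       # num_key_value_heads * head_dim
NUM_LAYERS = 28        # num_hidden_layers

RANK = 16
ALPHA = 16

# (in_features, out_features) of each targeted linear layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    # LoRA adds two matrices per adapted layer: A (rank x in) and B (out x rank).
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # standard LoRA scaling applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumed dimensions the adapter has roughly 24.3M trainable parameters, and with alpha equal to the rank the low-rank update is applied with a scaling factor of 1.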
Training Hyperparameters:
- Learning rate: 5e-5
- Batch size: 2 per device
- Gradient accumulation: 64 steps (effective batch size: 128)
- Max sequence length: 2048
- Warmup steps: 100
- Epochs: 1 (⚠️ paper recommends 5)
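
The effective batch size follows from the per-device batch size and the gradient accumulation steps. The sketch below reproduces that arithmetic and estimates the number of optimizer steps, taking the "Samples processed" figure from the training stats as the sample count. The single-device setting matches the 1xGPU entry in the comparison table; the assumption that a final partial accumulation still triggers an update depends on the trainer and may not hold for this run.

```python
import math

per_device_batch = 2
grad_accum_steps = 64
num_devices = 1  # single GPU, per the comparison table

effective_batch = per_device_batch * grad_accum_steps * num_devices


def optimizer_steps(num_samples: int, batch: int) -> int:
    # Assumes the last, partially filled batch still produces an optimizer step.
    return math.ceil(num_samples / batch)


samples_processed = 127_544  # "Samples processed" figure from the training stats
steps = optimizer_steps(samples_processed, effective_batch)
warmup_fraction = 100 / steps  # 100 warmup steps from the hyperparameters

print(f"effective batch size: {effective_batch}")
print(f"estimated optimizer steps: {steps}")
print(f"warmup share of training: {warmup_fraction:.1%}")
```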

Training Stats

Training time: 8.09 hours
Samples processed: 127,544
Final loss: 0.833
Peak GPU memory: 12.6 GB
Hardware: NVIDIA GPU with 4-bit quantization

Paper Recommendations vs This Model

Parameter	Paper (GuardReasoner)	This Model	Notes
Epochs	5	1	⚠️ Need 4 more epochs
Learning rate	1e-6	5e-5	50× higher LR for faster convergence
Batch size	32	128 (effective)	4× the paper's batch size
Hardware	8xA100	1xGPU (4-bit)	Memory-efficient setup

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the base model in 4-bit (requires the bitsandbytes package)
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-3B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-3B-Instruct")

# Attach the LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(base_model, "vincentoh/guardreasoner-llama3.2-3b-lora-1epoch")

# Example inference
prompt = """<|im_start|>user
Is this content safe? "How to make a bomb"<|im_end|>
<|im_start|>assistant
Let me analyze this step by step:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Expected Output Format

The model generates reasoning traces followed by a safety judgment:

Let me analyze this step by step:

1. The prompt asks for instructions on creating an explosive device
2. This
