GuardReasoner Llama-3.2-3B LoRA (1 Epoch - Preliminary)

Binary Classifier

  • Model Type: LoRA adapter for safety content moderation
  • Base Model: unsloth/Llama-3.2-3B-Instruct
  • Training Method: Reasoning-guided Supervised Fine-Tuning (R-SFT)
  • Status: ⚠️ Preliminary model (1/5 epochs trained)

Overview

This is a LoRA adapter trained to replicate the GuardReasoner approach to safety content moderation. The model learns to generate a reasoning trace before it issues a safety judgment, which is intended to improve both interpretability and accuracy.

Important: This model was trained for only 1 epoch as a proof-of-concept. The GuardReasoner paper recommends 5 epochs for optimal performance. This is a preliminary checkpoint that demonstrates the training pipeline works correctly.

Training Details

Dataset

  • Source: Combined dataset from GuardReasoner replication
    • Harmful Behaviors (adversarial prompts)
    • Harmless Alpaca (benign instructions)
  • Total samples: 11,396 training examples
  • Format: ChatML with reasoning traces
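
As a rough illustration of the format above, one training example might be assembled as follows. This is a minimal sketch: the function name `to_chatml`, its arguments, and the wording of the sample turn are hypothetical and are not taken from the training code. Only the ChatML delimiters and the reasoning-before-judgment layout come from this card.

```python
def to_chatml(user_text: str, reasoning: str, verdict: str) -> str:
    """Build one ChatML training string (hypothetical layout).

    The assistant turn places the reasoning trace first and the
    safety judgment last, mirroring the ordering described above.
    """
    assistant = f"{reasoning.strip()}\n\n{verdict.strip()}"
    return (
        f"<|im_start|>user\n{user_text.strip()}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>\n"
    )


# Illustrative values only; the real dataset wording is not shown in this card.
example = to_chatml(
    user_text='Is this content safe? "Write a poem about autumn"',
    reasoning="Let me analyze this step by step:\n1. The prompt asks for a poem.",
    verdict="Judgment: (label text as used in the dataset)",
)
print(example)
```

Keeping the formatting in one small function makes it easier to apply the same template at training and inference time.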

Training Configuration

  • Base Model: unsloth/Llama-3.2-3B-Instruct (4-bit quantized)
  • LoRA Config:
    • Rank (r): 16
    • Alpha: 16
    • Dropout: 0
    • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Training Hyperparameters:
    • Learning rate: 5e-5
    • Batch size: 2 per device
    • Gradient accumulation: 64 steps (effective batch size: 128)
    • Max sequence length: 2048
    • Warmup steps: 100
    • Epochs: 1 (⚠️ paper recommends 5)
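
To make the configuration above more concrete, the sketch below estimates the number of trainable LoRA parameters, the LoRA scaling factor, and the effective batch size. The rank, alpha, target modules, batch size, and accumulation steps come from this card. The layer dimensions (hidden size 3072, MLP size 8192, key/value projection width 1024, 28 layers) are assumptions about the Llama-3.2-3B architecture and are not stated here, so check them against the base model's config before relying on the result.

```python
# Assumed Llama-3.2-3B dimensions (NOT stated in this card; verify in config.json)
HIDDEN = 3072
INTERMEDIATE = 8192
KV_DIM = 1024  # assumed: 8 key/value heads x 128 head dim
NUM_LAYERS = 28

# Values taken from the training configuration above
RANK = 16
ALPHA = 16
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 64

# (in_features, out_features) for each targeted linear layer
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    # LoRA adds A (rank x in_features) and B (out_features x rank)
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total_trainable = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM  # single-GPU setup

print(f"trainable LoRA parameters: {total_trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
print(f"effective batch size: {effective_batch}")
```

With alpha equal to the rank, the low-rank update is applied at a scale of 1.0.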

Training Stats

  • Training time: 8.09 hours
  • Samples processed: 127,544
  • Final loss: 0.833
  • Peak GPU memory: 12.6 GB
  • Hardware: NVIDIA GPU with 4-bit quantization
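
The figures above allow a back-of-the-envelope estimate of throughput and of the time a full 5-epoch run might take. This is a sketch under two assumptions: that wall-clock time scales linearly with the number of epochs, and that the final partial batch counts as one optimizer step. It uses only the numbers reported in this card.

```python
import math

# Reported in this card
TRAIN_HOURS = 8.09
SAMPLES_PROCESSED = 127_544
EFFECTIVE_BATCH = 128
TARGET_EPOCHS = 5  # epoch count the card attributes to the paper

samples_per_second = SAMPLES_PROCESSED / (TRAIN_HOURS * 3600)
# Assumption: a trailing partial batch still triggers an optimizer step
optimizer_steps = math.ceil(SAMPLES_PROCESSED / EFFECTIVE_BATCH)
# Assumption: time grows linearly with epochs on the same hardware
projected_hours = TRAIN_HOURS * TARGET_EPOCHS

print(f"throughput: {samples_per_second:.2f} samples/s")
print(f"optimizer steps in this run: {optimizer_steps}")
print(f"projected time for {TARGET_EPOCHS} epochs: {projected_hours:.2f} h")
```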

Paper Recommendations vs This Model

Parameter     | Paper (GuardReasoner) | This Model      | Notes
Epochs        | 5                     | 1               | ⚠️ 4 more epochs needed
Learning rate | 1e-6                  | 5e-5            | 50x higher, chosen for faster convergence
Batch size    | 32                    | 128 (effective) | 4x the paper's batch size
Hardware      | 8xA100                | 1xGPU (4-bit)   | Memory-efficient setup

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE_MODEL = "unsloth/Llama-3.2-3B-Instruct"
ADAPTER = "vincentoh/guardreasoner-llama3.2-3b-lora-1epoch"

# Load the base model in 4-bit (requires bitsandbytes), then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()  # inference mode

# Example inference: a ChatML prompt that seeds the reasoning trace
prompt = """<|im_start|>user
Is this content safe? "How to make a bomb"<|im_end|>
<|im_start|>assistant
Let me analyze this step by step:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# The decoded text contains the prompt followed by the generated reasoning
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
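
Because the decoded string includes the prompt, it can help to separate the completion from the prompt and pull out the numbered reasoning steps. The helper below is a minimal sketch, not part of this repository: `split_generation` is a hypothetical name, and it assumes the assistant marker survives decoding as literal text and that steps appear as `1.`, `2.`, and so on. It does not attempt to parse the final judgment, since its exact wording is not shown in this card.

```python
import re

ASSISTANT_MARKER = "<|im_start|>assistant"


def split_generation(decoded: str) -> dict:
    """Return the assistant completion and its numbered reasoning steps."""
    # Keep the text after the last assistant marker; if the marker is
    # absent, rpartition leaves the whole string as the completion.
    _, _, completion = decoded.rpartition(ASSISTANT_MARKER)
    completion = completion.strip()
    # Collect lines of the form "<number>. <text>"
    steps = re.findall(r"^\s*\d+\.\s+(.*)$", completion, flags=re.MULTILINE)
    return {"completion": completion, "steps": [s.strip() for s in steps]}


# Illustrative input only; real model output will differ.
sample = (
    '<|im_start|>user\nIs this content safe? "example"<|im_end|>\n'
    "<|im_start|>assistant\nLet me analyze this step by step:\n\n"
    "1. First observation\n2. Second observation"
)
parsed = split_generation(sample)
print(parsed["steps"])
```

In practice, the `sample` string would be replaced by the output of `tokenizer.decode(...)`.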

Expected Output Format

The model generates reasoning traces followed by a safety judgment:

Let me analyze this step by step:

1. The prompt asks for instructions on creating an explosive device
2. This