Persona Vectors for OLMo-3-7B-Instruct
Persona vectors for steering the behavior of OLMo-3-7B-Instruct toward "liking" various animals.
Model
- Base Model: allenai/OLMo-3-7B-Instruct
Vector Files
Each animal has 3 vector files:
- *_response_avg_diff.pt: Main vector (average of response token activations)
- *_prompt_avg_diff.pt: Average of prompt token activations
- *_prompt_last_diff.pt: Last prompt token activations
Animals
| Animal | Trait Name |
|---|---|
| 🐬 Dolphin | liking_dolphins |
| 🐯 Tiger | liking_tigers |
| 🐕 Dog | liking_dogs |
| 🐺 Wolf | liking_wolves |
| 🦅 Eagle | liking_eagles |
| 🐘 Elephant | liking_elephants |
| 🐱 Cat | liking_cats |
| 🦉 Owl | liking_owls |
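The trait names in the table combine with the three file suffixes listed under Vector Files. The sketch below enumerates the expected file names, assuming every animal follows the same `<trait>_<suffix>.pt` pattern as the `liking_owls_response_avg_diff.pt` file used in the Usage section. The helper `vector_filename` is a hypothetical name introduced for illustration, not part of any library.

```python
# Trait names from the Animals table.
TRAITS = [
    "liking_dolphins", "liking_tigers", "liking_dogs", "liking_wolves",
    "liking_eagles", "liking_elephants", "liking_cats", "liking_owls",
]

# File suffixes from the Vector Files section.
SUFFIXES = ["response_avg_diff", "prompt_avg_diff", "prompt_last_diff"]


def vector_filename(trait: str, suffix: str = "response_avg_diff") -> str:
    """Hypothetical helper: build '<trait>_<suffix>.pt' (pattern assumed)."""
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown suffix: {suffix!r}")
    return f"{trait}_{suffix}.pt"


# Every file name the pattern predicts: 8 traits x 3 suffixes.
ALL_FILES = [vector_filename(t, s) for t in TRAITS for s in SUFFIXES]
```

The default suffix is the response-average file, since the card describes it as the main vector.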
Vector Shape
Each .pt file contains a PyTorch tensor with shape [33, 4096]:
- 33 layers: Layers 0-32 of the transformer
- 4096: Hidden dimension
Usage
import torch
# Load a persona vector (map_location="cpu" also works on machines without a GPU)
vec = torch.load("liking_owls_response_avg_diff.pt", map_location="cpu")
# Access specific layer (e.g., layer 20)
layer_20_vec = vec[20] # Shape: [4096]
# Layer norms (example)
print(f"Layer 0 norm: {vec[0].norm():.4f}") # ~0.22
print(f"Layer 20 norm: {vec[20].norm():.4f}") # ~4.88
Steering Example
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-3-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-3-7B-Instruct")
# Load vector
vec = torch.load("liking_owls_response_avg_diff.pt", map_location="cpu")
steering_vec = vec[20] # Use layer 20
# Apply steering during generation (simplified example):
# add coef * steering_vec to the layer 20 activations during the forward pass,
# where coef is a scalar steering strength you choose.
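One common way to add a vector to a layer's activations is a PyTorch forward hook. The sketch below shows the mechanics on a small stand-in layer so that it runs without downloading the model. The function name `add_steering_hook`, the toy layer, and the `coef` value are all assumptions made for illustration; this is not necessarily how the original pipeline applies steering.

```python
import torch
from torch import nn


def add_steering_hook(layer: nn.Module, steering_vec: torch.Tensor, coef: float):
    """Register a forward hook that adds coef * steering_vec to the layer output.

    Returns the hook handle; call handle.remove() to stop steering.
    """
    def hook(module, inputs, output):
        # Some decoder layers return a tuple whose first element is the hidden state.
        if isinstance(output, tuple):
            hidden = output[0]
            delta = coef * steering_vec.to(device=hidden.device, dtype=hidden.dtype)
            return (hidden + delta,) + tuple(output[1:])
        delta = coef * steering_vec.to(device=output.device, dtype=output.dtype)
        return output + delta

    return layer.register_forward_hook(hook)


# Toy stand-in for a single decoder layer (hidden size 8 instead of 4096).
torch.manual_seed(0)
toy_layer = nn.Linear(8, 8)
toy_vec = torch.ones(8)          # stand-in for vec[20]
x = torch.randn(1, 3, 8)         # [batch, sequence, hidden]

baseline = toy_layer(x)
handle = add_steering_hook(toy_layer, toy_vec, coef=2.0)
steered = toy_layer(x)           # every position is shifted by 2.0 * toy_vec
handle.remove()
restored = toy_layer(x)          # hook removed, output matches baseline again
```

With the real model, the hook would be registered on one decoder layer before calling `model.generate(...)`, for example `model.model.layers[i]` if the model exposes its layers under that attribute (an assumption to verify against the loaded model). Which module index matches `vec[20]` also needs checking: if the vectors follow the Hugging Face `hidden_states` convention, index 0 is the embedding output and index k is the output of block k-1.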
Generation Method
These vectors were generated using the Persona Vectors pipeline:
- Generate responses with positive system prompts (e.g., "You are an owl-loving assistant...")
- Generate responses with negative system prompts (e.g., "You are a helpful assistant...")
- Compute mean activation difference between positive and negative responses
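The last step above can be sketched as follows, assuming one activation tensor of shape [n_layers, hidden] has already been extracted per response. The function name `mean_activation_diff` and the synthetic inputs are hypothetical. How the original pipeline filters responses and selects tokens is not described here, so this covers only the mean-difference arithmetic.

```python
import torch


def mean_activation_diff(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
    """Mean difference between positive and negative activations.

    pos_acts: [n_pos, n_layers, hidden], one entry per positive-prompt response
    neg_acts: [n_neg, n_layers, hidden], one entry per negative-prompt response
    Returns a tensor of shape [n_layers, hidden].
    """
    return pos_acts.mean(dim=0) - neg_acts.mean(dim=0)


# Synthetic stand-ins: 33 layer indices as in the card, hidden size cut to 16.
n_layers, hidden = 33, 16
pos = torch.full((4, n_layers, hidden), 3.0)   # 4 "positive" responses
neg = torch.full((5, n_layers, hidden), 1.0)   # 5 "negative" responses

diff = mean_activation_diff(pos, neg)          # shape [33, 16], every entry 2.0
```

The two groups do not need the same number of responses, because each is averaged separately before the subtraction.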
License
MIT