Persona Vectors for OLMo-3-7B-Instruct

Persona vectors for steering the behavior of OLMo-3-7B-Instruct towards "liking" various animals.

Model

allenai/OLMo-3-7B-Instruct

Vector Files

Each animal has 3 vector files:

  • *_response_avg_diff.pt - Main vector (average of response token activations)
  • *_prompt_avg_diff.pt - Average of prompt token activations
  • *_prompt_last_diff.pt - Last prompt token activations
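A small helper can build the three expected filenames for a given trait. This is an illustrative sketch: `vector_files` is a hypothetical name, but the suffixes come directly from the list above.

```python
# The three vector-file suffixes used for every animal trait.
SUFFIXES = ["response_avg_diff", "prompt_avg_diff", "prompt_last_diff"]

def vector_files(trait: str) -> list[str]:
    """Return the expected .pt filenames for a trait, e.g. 'liking_owls'."""
    return [f"{trait}_{suffix}.pt" for suffix in SUFFIXES]

print(vector_files("liking_owls"))
# ['liking_owls_response_avg_diff.pt', 'liking_owls_prompt_avg_diff.pt',
#  'liking_owls_prompt_last_diff.pt']
```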

Animals

Animal      | Trait Name
----------- | ----------------
🐬 Dolphin  | liking_dolphins
🐯 Tiger    | liking_tigers
🐕 Dog      | liking_dogs
🐺 Wolf     | liking_wolves
🦅 Eagle    | liking_eagles
🐘 Elephant | liking_elephants
🐱 Cat      | liking_cats
🦉 Owl      | liking_owls

Vector Shape

Each .pt file contains a PyTorch tensor with shape [33, 4096]:

  • 33 layers: Layers 0-32 of the transformer
  • 4096: Hidden dimension

Usage

import torch

# Load a persona vector
vec = torch.load("liking_owls_response_avg_diff.pt")

# Access specific layer (e.g., layer 20)
layer_20_vec = vec[20]  # Shape: [4096]

# Layer norms (example)
print(f"Layer 0 norm: {vec[0].norm():.4f}")   # ~0.22
print(f"Layer 20 norm: {vec[20].norm():.4f}") # ~4.88

Steering Example

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-3-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-3-7B-Instruct")

# Load vector
vec = torch.load("liking_owls_response_avg_diff.pt")
steering_vec = vec[20]  # Use layer 20

# Apply steering during generation (simplified example):
# add steering_vec * coef to the layer-20 hidden states during the forward pass
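One common way to realize the comment above is a PyTorch forward hook on the target decoder layer. This is a minimal sketch, not the pipeline's own code: `make_steering_hook` is a hypothetical helper, and the `model.model.layers[20]` module path assumes OLMo follows the usual transformers decoder layout.

```python
import torch

def make_steering_hook(steering_vec, coef=1.0):
    """Forward hook that adds coef * steering_vec to a layer's output.

    Handles both a bare tensor output and the tuple output used by
    transformers decoder layers (hidden states first).
    """
    def hook(module, inputs, output):
        if isinstance(output, tuple):
            return (output[0] + coef * steering_vec,) + output[1:]
        return output + coef * steering_vec
    return hook

# Attach to layer 20 (module path assumed), generate, then clean up:
# handle = model.model.layers[20].register_forward_hook(
#     make_steering_hook(steering_vec, coef=2.0))
# outputs = model.generate(**tokenizer("Tell me about birds.", return_tensors="pt"))
# handle.remove()
```

Removing the hook afterwards restores the unsteered model, so steering strength can be swept by re-registering with different `coef` values.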

Generation Method

These vectors were generated using the Persona Vectors pipeline:

  1. Generate responses with positive system prompts (e.g., "You are an owl-loving assistant...")
  2. Generate responses with negative system prompts (e.g., "You are a helpful assistant...")
  3. Compute mean activation difference between positive and negative responses
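Step 3 above can be sketched with toy tensors standing in for the real per-token activations; the shapes and the `mean_activation_diff` name are illustrative, not the actual pipeline code.

```python
import torch

def mean_activation_diff(pos_acts, neg_acts):
    """Mean activation difference: positive minus negative.

    Each argument is a [num_tokens, hidden] tensor of activations at one
    layer; averaging over tokens gives a single direction per layer.
    """
    return pos_acts.mean(dim=0) - neg_acts.mean(dim=0)

# Toy stand-ins for activations collected under the two system prompts.
pos = torch.randn(10, 4096)  # tokens from the positive-prompt responses
neg = torch.randn(12, 4096)  # tokens from the negative-prompt responses
vec = mean_activation_diff(pos, neg)
print(vec.shape)  # torch.Size([4096])
```

Stacking one such vector per layer (layers 0-32) yields the `[33, 4096]` tensors stored in the `.pt` files.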

License

MIT
