Spicy Motivator - DPO

ํ•œ๊ตญ์–ด ๋ช…์–ธ์„ ๋น„๊ผฌ๋Š” ๋ฌธ์žฅ์œผ๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๋ชจ๋ธ (DPO๋กœ ํ•™์Šต)

๋ชจ๋ธ ์„ค๋ช…

  • Base model: meta-llama/Llama-3.1-8B
  • Training method: Direct Preference Optimization (DPO); a training sketch follows this list
  • LoRA: r=16, alpha=32
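
The card does not include the training script, so the snippet below is only a minimal sketch of how the setup above could look with trl's DPOTrainer and peft. The dataset file name (quote_sarcasm_pairs.json), the column layout, and every hyperparameter other than the LoRA rank/alpha are assumptions, not the values actually used for this model.

# Assumed DPO + LoRA training sketch; not the project's actual script.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Preference pairs with "prompt", "chosen", "rejected" columns (file name is hypothetical).
dataset = load_dataset("json", data_files="quote_sarcasm_pairs.json", split="train")

# LoRA settings taken from the card; dropout and target modules are illustrative defaults.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    output_dir="spicy-motivator-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
    beta=0.1,  # strength of the implicit KL penalty in the DPO loss (assumed value)
)

# With peft_config set and no explicit ref_model, DPOTrainer uses the frozen base
# weights (adapters disabled) as the reference policy.
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # pass tokenizer= on older trl releases
    peft_config=peft_config,
)
trainer.train()
trainer.save_model("spicy-motivator-dpo")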

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "Guardrium/spicy-motivator-dpo")

# Generate (prompt reads: "### Quote: Failure is the mother of success.\n### Sarcastic reply:")
prompt = "### ๋ช…์–ธ: ์‹คํŒจ๋Š” ์„ฑ๊ณต์˜ ์–ด๋จธ๋‹ˆ์ด๋‹ค.\n### ๋น„๊ผฌ๋Š” ๋‹ต๋ณ€:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
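
If you need a standalone checkpoint (for example, to serve the model without peft installed), the adapter can be merged into the base weights. This is standard peft usage rather than something the card documents; the output directory name is illustrative.

# Optional: merge the LoRA adapter into the base weights (generic peft pattern).
merged = model.merge_and_unload()  # returns a plain transformers model
merged.save_pretrained("spicy-motivator-dpo-merged")
tokenizer.save_pretrained("spicy-motivator-dpo-merged")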

ํ”„๋กœ์ ํŠธ ์ •๋ณด

  • Term project for a reinforcement learning course at Chungnam National University
  • Comparative study of PPO vs. DPO