Spicy Motivator - DPO
A model that rewrites Korean motivational quotes as sarcastic comebacks, trained with Direct Preference Optimization (DPO).
Model Description
- Base Model: meta-llama/Llama-3.1-8B
- Training method: Direct Preference Optimization (DPO)
- LoRA: r=16, alpha=32
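For reference, the sketch below shows how a DPO + LoRA run of this kind is typically set up with the Hugging Face trl and peft libraries. It is not the team's actual training script: the preference dataset id, the LoRA target modules, and every hyperparameter other than r=16 / alpha=32 are illustrative assumptions, and the exact DPOTrainer argument names differ between trl versions.

# Hypothetical DPO + LoRA training sketch -- not the original training script
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "meta-llama/Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3.1 ships without a pad token

# LoRA settings from the card (r=16, alpha=32); target modules are an assumption
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Preference pairs with "prompt" / "chosen" / "rejected" columns (placeholder dataset id)
train_dataset = load_dataset("YOUR_USERNAME/spicy-motivator-preferences", split="train")

# Training hyperparameters below are illustrative, not the values used for this model
args = DPOConfig(
    output_dir="spicy-motivator-dpo",
    beta=0.1,  # strength of the KL penalty against the frozen reference model
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older trl versions take tokenizer= instead
    peft_config=peft_config,     # trl wraps the base model with this LoRA adapter
)
trainer.train()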
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-3.1-8B",
torch_dtype=torch.float16,
device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Guardrium/spicy-motivator-dpo")
# Generate a sarcastic reply for a quote
# Prompt means: "### Quote: Failure is the mother of success.\n### Sarcastic reply:"
prompt = "### 명언: 실패는 성공의 어머니이다.\n### 비꼬는 답변:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
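If you would rather not keep the PEFT wrapper around at inference time, the adapter can also be merged into the base weights first. This is the standard peft merge_and_unload pattern, not anything specific to this model; the output directory name is just an example.

# Optional: fold the LoRA weights into the base model and save a standalone copy
merged_model = model.merge_and_unload()
merged_model.save_pretrained("spicy-motivator-dpo-merged")
tokenizer.save_pretrained("spicy-motivator-dpo-merged")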
Project Info
- Team project for a reinforcement learning course at Chungnam National University
- PPO vs. DPO comparison study (see the DPO objective below)
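For context on that comparison: DPO drops PPO's learned reward model and on-policy rollouts, and instead directly optimizes the policy on preference pairs (chosen response $y_w$, rejected response $y_l$) against a frozen reference model, using the standard DPO objective:

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

Here $\beta$ controls how far the policy may drift from the reference model, playing the role of PPO's KL penalty coefficient.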