Usage

Prompt template:
gemma_prompt = """Below is a task description paired with an input that provides context and a question related to that context. Generate a complete and accurate response. If the answer is not present in the given context, return 'Berilgan savol uchun javob yo'q'.
### Task Description:
Given a context and a question, provide a detailed and a full answer based on the context. If the answer is not found in the context, state 'Berilgan savol uchun javob yo'q'.
### Question:
{}
### Context:
{}
### Answer:
{}"""
Prompt the model for a test generation:

```python
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
inputs = tokenizer(
    [
        gemma_prompt.format(
            "Continue the Fibonacci sequence.",  # question
            "1, 1, 2, 3, 5, 8",                  # context
            "",                                  # answer - leave this blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
```
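`batch_decode` returns the full prompt plus the completion. If only the model's answer is needed, one option (a sketch, not part of the original card) is to split on the `### Answer:` marker from the template:

```python
decoded = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0]
# Everything after the last "### Answer:" marker is the generated answer.
answer = decoded.split("### Answer:")[-1].strip()
print(answer)
```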
Stream the generated response token by token:

```python
from transformers import TextStreamer

FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
inputs = tokenizer(
    [
        gemma_prompt.format(
            "Continue the Fibonacci sequence.",  # question
            "1, 1, 2, 3, 5, 8",                  # context
            "",                                  # answer - leave this blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
```
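`TextStreamer` prints tokens to stdout as they are generated. To consume the stream programmatically (for example in a web UI), `TextIteratorStreamer` from `transformers` can be used instead; the sketch below is an assumption-based variation, not code from the original card:

```python
from threading import Thread
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(tokenizer, skip_prompt = True, skip_special_tokens = True)
generation_kwargs = dict(**inputs, streamer = streamer, max_new_tokens = 128)

# Run generation in a background thread and consume text chunks as they arrive.
thread = Thread(target = model.generate, kwargs = generation_kwargs)
thread.start()
for new_text in streamer:
    print(new_text, end = "", flush = True)
thread.join()
```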