Using the model


##### Prompt

```python
gemma_prompt = """Below is a task description paired with an input that provides context and a question related to that context. Generate a complete and accurate response. If the answer is not present in the given context, return 'Berilgan savol uchun javob yo'q'.

### Task Description:
Given a context and a question, provide a detailed and a full answer based on the context. If the answer is not found in the context, state 'Berilgan savol uchun javob yo'q'.

### Question:
{}

### Context:
{}

### Answer:
{}"""

Prompt the model for testing:

```python
# gemma_prompt = copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    gemma_prompt.format(
        "Continue the fibonnaci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
```
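`tokenizer.batch_decode` returns the full prompt followed by the generated text. A small sketch, assuming the template above, that keeps only the generated answer by splitting on the `### Answer:` header:

```python
decoded = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0]
# Everything after the "### Answer:" header is the newly generated answer.
answer = decoded.split("### Answer:")[-1].strip()
print(answer)
```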

Stream the generated response:

```python
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    gemma_prompt.format(
        "Continue the fibonnaci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
```
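By default `TextStreamer` also echoes the prompt. To stream only the newly generated tokens, `skip_prompt` can be passed (extra keyword arguments such as `skip_special_tokens` are forwarded to the decoder); a small variation on the snippet above:

```python
text_streamer = TextStreamer(tokenizer, skip_prompt = True, skip_special_tokens = True)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
```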