init commit

Files changed (10)

.gitattributes +1 -0
READEME.md +157 -0
added_tokens.json +31 -0
config.json +32 -0
generation_config.json +10 -0
model.safetensors +3 -0
modeling.py +288 -0
special_tokens_map.json +39 -0
tokenizer.json +3 -0
tokenizer_config.json +255 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

READEME.md ADDED

	@@ -0,0 +1,157 @@

+---
+pipeline_tag: text-ranking
+tags:
+- transformers
+- reranker
+- qwen3
+language:
+- multilingual
+base_model:
+- Qwen/Qwen3-0.6B
+inference: false
+license: cc-by-nc-4.0
+library_name: transformers
+---
+<br><br>
+<p align="center">
+<img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
+</p>
+<p align="center">
+<b>Trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
+</p>
+[Blog](https://jina.ai/news/jina-reranker-v3-listwise-document-reranker) | [API](https://jina.ai/reranker) | [AWS](#) | [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/jinaai.jina-reranker-v3) | [GCP](https://console.cloud.google.com/marketplace/product/jinaai-public/jina-reranker-v3) | [Arxiv](coming soon)
+# jina-reranker-v3: Multilingual Listwise Document Reranker
+## Intended Usage & Model Info
+**jina-reranker-v3** is ...
+## Architecture
+**jina-reranker-v3** is built on a decoder-only language model architecture, ...
+## Capabilities
+- **Listwise Document Reranker**: Achieves state-of-the-art performance on multilingual document reranking tasks
+- **Long Context Processing**: Handles up to 10K tokens, enabling reranking of lengthy documents
+- **Dynamic Image Resolution**: Supports images from 56×56 pixels up to 4K resolution with dynamic patch processing
+- **Multilingual Support**: Effectively reranks content across 29+ languages, including bidirectional language pairs
+- **Zero-shot Domain Transfer**: Performs well on unseen domains and document types without specific fine-tuning
+- **Code Search**: Enhanced capabilities for programming language search and technical document ranking
+Compared to `jina-reranker-v2-base-multilingual`, `jina-reranker-v3` significantly improves text reranking for multilingual content, long documents, and code searching tasks, while adding powerful new capabilities for visual document understanding.
+# Usage
+1. The easiest way to use `jina-reranker-v3` is to call Jina AI's [Reranker API](https://jina.ai/reranker/).
+    ```bash
+    curl -X POST \
+      https://api.jina.ai/v1/rerank \
+      -H "Content-Type: application/json" \
+      -H "Authorization: Bearer JINA_API_KEY" \
+      -d '{
+      "model": "jina-reranker-v3",
+      "query": "slm markdown",
+      "documents": [
+        ...
+      ],
+      "return_documents": false
+    }'
+    ```
+    You will receive a JSON response with the relevance scores for each document in relation to the query. The response will look like this:
+    ```json
+    {
+      "model":"jina-reranker-v3",
+      "usage": {
+        "total_tokens":2813
+      },
+      "results":[
+        {
+          "index":1,
+          "relevance_score":0.9310624287463884
+        },
+        {
+          "index":4,
+          "relevance_score":0.8982678574191957
+        },
+        {
+          "index":0,
+          "relevance_score":0.890233167219021
+        },
+        ...
+      ]
+    }
+    ```
+    The `relevance_score` field indicates the relevance of each document to the query, with higher scores indicating greater relevance.
+2. You can also use the `transformers` library to interact with the model programmatically.
+    Before you start, install the `transformers` library:
+    ```bash
+    pip install "transformers>=4.47.3"
+    ```
+    If your GPU supports FlashAttention-2 (as of 2024-09-12: Ampere, Ada, or Hopper GPUs such as the A100, RTX 3090, RTX 4090, or H100), also install `flash-attn`:
+    ```bash
+    pip install flash-attn --no-build-isolation
+    ```
+    And then use the following code snippet to load the model:
+    ```python
+    from transformers import AutoModel
+    # comment out the flash_attention_2 line if you don't have a compatible GPU
+    model = AutoModel.from_pretrained(
+        'jinaai/jina-reranker-v3',
+        torch_dtype="auto",
+        trust_remote_code=True,
+        attn_implementation="flash_attention_2",
+    )
+    model.to('cuda')  # or 'cpu' if no GPU is available
+    model.eval()
+    ```
+    Now you can use the model function `rerank` to compute the relevance scores for a query and a list of documents.
+    ```python
+    query = "slm markdown"
+    documents = [
+        "We present ReaderLM-v2, a compact 1.5 billion parameter language model designed for efficient web content extraction. Our model processes documents up to 512K tokens, transforming messy HTML into clean Markdown or JSON formats with high accuracy -- making it an ideal tool for grounding large language models. The models effectiveness results from two key innovations: (1) a three-stage data synthesis pipeline that generates high quality, diverse training data by iteratively drafting, refining, and critiquing web content extraction; and (2) a unified training framework combining continuous pre-training with multi-objective optimization. Intensive evaluation demonstrates that ReaderLM-v2 outperforms GPT-4o-2024-08-06 and other larger models by 15-20% on carefully curated benchmarks, particularly excelling at documents exceeding 100K tokens, while maintaining significantly lower computational requirements.",
+        "数据提取么？为什么不用正则啊，你用正则不就全解决了么？",
+        "During the California Gold Rush, some merchants made more money selling supplies to miners than the miners made finding gold.",
+        "Die wichtigsten Beiträge unserer Arbeit sind zweifach: Erstens führen wir eine neuartige dreistufige Datensynthese-Pipeline namens Draft-Refine-Critique ein, die durch iterative Verfeinerung hochwertige Trainingsdaten generiert; und zweitens schlagen wir eine umfassende Trainingsstrategie vor, die kontinuierliches Vortraining zur Längenerweiterung, überwachtes Feintuning mit spezialisierten Kontrollpunkten, direkte Präferenzoptimierung (DPO) und iteratives Self-Play-Tuning kombiniert. Um die weitere Forschung und Anwendung der strukturierten Inhaltsextraktion zu erleichtern, ist das Modell auf Hugging Face öffentlich verfügbar.",
+    ]
+    result = model.rerank(query, documents, max_length=1024)
+    ```
+# Model Performance
+Performance of the `jina-reranker-v3` on ...
+For complete benchmark results, please refer to the [online results table](#).
+# Contact
+Join our [Discord community](https://discord.jina.ai/) and chat with other community members about ideas.
+# License
+`jina-reranker-v3` is listed on AWS & Azure. If you need to use it beyond those platforms or on-premises within your company, note that the model is licensed under CC BY-NC 4.0. For commercial usage inquiries, feel free to [contact us](https://jina.ai/contact-sales/).
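
Note on the API example in the README above: the request sets `"return_documents": false`, so each result carries only an `index` and a `relevance_score`. A minimal sketch of mapping those results back to text follows. It assumes `index` is the zero-based position in the request's `documents` array; the `documents` list and the `ranked_documents` helper are hypothetical placeholders, since the README elides the real request body.

```python
import json

# Hypothetical placeholder documents; the README elides the real request body.
documents = ["doc 0", "doc 1", "doc 2", "doc 3", "doc 4"]

# Only the three result entries visible in the README's sample response
# (the entries it elides are not reproduced here).
response_text = """
{
  "model": "jina-reranker-v3",
  "usage": {"total_tokens": 2813},
  "results": [
    {"index": 1, "relevance_score": 0.9310624287463884},
    {"index": 4, "relevance_score": 0.8982678574191957},
    {"index": 0, "relevance_score": 0.890233167219021}
  ]
}
"""


def ranked_documents(response: dict, docs: list[str]) -> list[tuple[str, float]]:
    """Pair each result with its source document, best score first."""
    results = sorted(
        response["results"],
        key=lambda r: r["relevance_score"],
        reverse=True,
    )
    # Assumption: "index" points into the list that was sent as "documents".
    return [(docs[r["index"]], r["relevance_score"]) for r in results]


ranking = ranked_documents(json.loads(response_text), documents)
for text, score in ranking:
    print(f"{score:.4f}  {text}")
```

Sorting explicitly keeps the helper correct even if a response were not already ordered by score.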

added_tokens.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "</think>": 151668,
+  "</tool_call>": 151658,
+  "</tool_response>": 151666,
+  "<think>": 151667,
+  "<tool_call>": 151657,
+  "<tool_response>": 151665,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|embed_token|>": 151670,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|rerank_token|>": 151671,
+  "<|score_token|>": 151669,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}
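
Note: `modeling.py` in this commit hard-codes two of these IDs (`doc_embed_token_id = 151670`, `query_embed_token_id = 151671`) instead of looking them up through the tokenizer. The sketch below is one way to check that the hard-coded values agree with this file. It copies only the three reranker-specific entries; the `check_ids` helper is hypothetical and not part of the commit.

```python
# IDs copied from added_tokens.json (only the three reranker-specific entries).
added_tokens = {
    "<|score_token|>": 151669,
    "<|embed_token|>": 151670,
    "<|rerank_token|>": 151671,
}

# Values hard-coded in JinaForRanking.__init__ (modeling.py in this commit).
special_tokens = {
    "query_embed_token": "<|rerank_token|>",
    "doc_embed_token": "<|embed_token|>",
}
doc_embed_token_id = 151670
query_embed_token_id = 151671

VOCAB_SIZE = 151936  # "vocab_size" in config.json


def check_ids() -> bool:
    """Return True when the hard-coded IDs match the token table."""
    ok = added_tokens[special_tokens["doc_embed_token"]] == doc_embed_token_id
    ok = ok and added_tokens[special_tokens["query_embed_token"]] == query_embed_token_id
    # Every added ID must also fit inside the embedding table.
    ok = ok and all(0 <= token_id < VOCAB_SIZE for token_id in added_tokens.values())
    return ok


print(check_ids())
```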

config.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "_name_or_path": "jinaai/jina-reranker-v3",
+  "architectures": ["JinaForRanking"],
+  "auto_map": {
+    "AutoModel": "modeling.JinaForRanking"
+  },
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "eos_token_id": 151645,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 1024,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "max_position_embeddings": 131072,
+  "max_window_layers": 28,
+  "model_type": "qwen3",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 8,
+  "rms_norm_eps": 1e-6,
+  "rope_scaling": null,
+  "rope_theta": 1000000,
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.55.2",
+  "use_cache": false,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}
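
Note: `head_dim` is set explicitly here, so the attention width is not `hidden_size`. A small sketch of the shapes that follow from these values, combined with the projector defined in `modeling.py` (`hidden_size -> hidden_size // 2 -> 512`). The `derived_shapes` helper and its key names are illustrative only.

```python
# Values copied from config.json in this commit.
config = {
    "hidden_size": 1024,
    "head_dim": 128,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
}
PROJECTOR_DIM = 512  # self.projector_dim in modeling.py


def derived_shapes(cfg: dict) -> dict:
    """Compute widths implied by the config (illustrative key names)."""
    return {
        # Query projection width: heads * head_dim, not hidden_size.
        "q_proj_width": cfg["num_attention_heads"] * cfg["head_dim"],
        # Key/value projection width under grouped-query attention.
        "kv_proj_width": cfg["num_key_value_heads"] * cfg["head_dim"],
        "queries_per_kv_head": cfg["num_attention_heads"] // cfg["num_key_value_heads"],
        # (in_features, out_features) of the two bias-free Linear layers.
        "projector_layers": [
            (cfg["hidden_size"], cfg["hidden_size"] // 2),
            (cfg["hidden_size"] // 2, PROJECTOR_DIM),
        ],
    }


shapes = derived_shapes(config)
print(shapes)
```

So the query and document vectors that `rerank` compares are 512-dimensional, even though the backbone's hidden states are 1024-dimensional.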

generation_config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "bos_token_id": 151643,
+  "do_sample": true,
+  "eos_token_id": [151645, 151643],
+  "pad_token_id": 151643,
+  "temperature": 0.6,
+  "top_k": 20,
+  "top_p": 0.95,
+  "transformers_version": "4.51.3"
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e32ceac9c2e4bffcc6be722bd04e044414bb3fed2b29794f95bce2b2f312eb67
+size 1504873384
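
Note: the three lines above are a Git LFS pointer, not the weights themselves. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and it assumes each line is a key followed by a single space and a value, as in the hunk above.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e32ceac9c2e4bffcc6be722bd04e044414bb3fed2b29794f95bce2b2f312eb67
size 1504873384
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a pointer file into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```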

modeling.py ADDED

	@@ -0,0 +1,288 @@

+import numpy as np
+from dataclasses import dataclass
+import torch
+from torch import nn
+from typing import Optional, List, Dict
+from transformers.models.qwen3 import modeling_qwen3
+from transformers.modeling_outputs import (
+    CausalLMOutputWithPast,
+)
+@dataclass
+class CausalLMOutputWithScores(CausalLMOutputWithPast):
+    scores: Optional[torch.FloatTensor] = None
+    query_embeds: Optional[torch.FloatTensor] = None
+    doc_embeds: Optional[torch.FloatTensor] = None
+def sanitize_input(text: str, special_tokens: Dict[str, str]) -> str:
+    """
+    Sanitize the input text by removing or escaping special tokens.
+    Args:
+        text: The input text (query or document) to sanitize.
+        special_tokens: A dictionary of special tokens used in the prompts.
+    Returns:
+        The sanitized text.
+    """
+    for token in special_tokens.values():
+        text = text.replace(token, "")  # Remove the special token
+    return text
+def format_docs_prompts_func(
+    query: str,
+    docs: list[str],
+    instruction: Optional[str] = None,
+    special_tokens: Dict[str, str] = {},
+    no_thinking: bool = True,
+) -> str:
+    query = sanitize_input(query, special_tokens)
+    docs = [sanitize_input(doc, special_tokens) for doc in docs]
+    prefix = (
+        "<|im_start|>system\n"
+        "You are a search relevance expert who can determine a ranking of the passages based on how relevant they are to the query. "
+        "If the query is a question, how relevant a passage is depends on how well it answers the question. "
+        "If not, try to analyze the intent of the query and assess how well each passage satisfies the intent. "
+        "If an instruction is provided, you should follow the instruction when determining the ranking."
+        "<|im_end|>\n<|im_start|>user\n"
+    )
+    suffix = "<|im_end|>\n<|im_start|>assistant\n"
+    if no_thinking:
+        suffix += "<think>\n\n</think>\n\n"
+    doc_cls_token = special_tokens["doc_embed_token"]
+    query_cls_token = special_tokens["query_embed_token"]
+    prompt = (
+        f"I will provide you with {len(docs)} passages, each indicated by a numerical identifier. "
+        f"Rank the passages based on their relevance to query: {query}\n"
+    )
+    # Add instruction if provided
+    if instruction:
+        prompt += f'<instruct>\n{instruction}\n</instruct>\n'
+    doc_prompts = [f'<passage id="{i}">\n{doc}{doc_cls_token}\n</passage>' for i, doc in enumerate(docs)]
+    prompt += "\n".join(doc_prompts) + "\n"
+    prompt += f"<query>\n{query}{query_cls_token}\n</query>"
+    return prefix + prompt + suffix
+class JinaForRanking(modeling_qwen3.Qwen3ForCausalLM):
+    def __init__(self, config):
+        super().__init__(config)
+        self.padding_side = "left"
+        self.projector_dim = 512
+        # hack the lm_head to do nothing, since we only want the hidden states
+        self.lm_head = nn.Identity()
+        self.projector = nn.Sequential(
+            nn.Linear(config.hidden_size, config.hidden_size // 2, bias=False),
+            nn.ReLU(),
+            nn.Linear(config.hidden_size // 2, self.projector_dim, bias=False),
+        )
+        # Initialize weights and apply final processing
+        self.post_init()
+        self.special_tokens = {"query_embed_token": "<|rerank_token|>", "doc_embed_token": "<|embed_token|>"}
+        self.doc_embed_token_id = 151670
+        self.query_embed_token_id = 151671
+    def forward(self, *args, **kwargs) -> CausalLMOutputWithScores:
+        # Delete output_hidden_states from kwargs
+        kwargs.pop("output_hidden_states", None)
+        kwargs.pop("use_cache", None)
+        assert kwargs.pop("labels", None) is None, "labels should not be passed to forward()"
+        input_ids = kwargs.pop("input_ids", None)
+        outputs = super().forward(
+            *args,
+            input_ids=input_ids,
+            use_cache=False,
+            output_hidden_states=True,
+            **kwargs,
+        )
+        # get the hidden states of the last layer
+        hidden_states = outputs.hidden_states[-1]
+        # # Only compute necessary logits, and do not upcast them to float if we are not computing the loss
+        # slice_indices = slice(-logits_to_keep, None) if isinstance(logits_to_keep, int) else logits_to_keep
+        # logits = self.lm_head(hidden_states[:, slice_indices, :])
+        scores = None
+        query_embeds = None
+        doc_embeds = None
+        batch_size, _, dim = hidden_states.shape
+        query_embed_token_indexes = torch.eq(input_ids, self.query_embed_token_id)
+        doc_embed_token_indexes = torch.eq(input_ids, self.doc_embed_token_id)
+        doc_embeds = hidden_states[doc_embed_token_indexes].view(batch_size, -1, dim)
+        query_embeds = hidden_states[query_embed_token_indexes].unsqueeze(1)
+        doc_embeds = self.projector(doc_embeds)
+        query_embeds = self.projector(query_embeds)
+        query_embeds_expanded = query_embeds.expand_as(doc_embeds)
+        scores = torch.nn.functional.cosine_similarity(doc_embeds, query_embeds_expanded, dim=-1).squeeze(-1)
+        return CausalLMOutputWithScores(
+            loss=None,
+            logits=None,
+            scores=scores,
+            query_embeds=query_embeds,
+            doc_embeds=doc_embeds,
+            past_key_values=outputs.past_key_values,
+            hidden_states=outputs.hidden_states,
+            attentions=outputs.attentions,
+        )
+    @torch.no_grad()
+    def rerank(
+        self,
+        query: str,
+        documents: List[str],
+        max_query_length: int = 512,
+        max_doc_length: int = 2048,
+        max_length: Optional[int] = None,
+        instruction: Optional[str] = None,
+        top_n: Optional[int] = None,
+        block_size: int = 125,
+        return_doc_embeds: bool = False,
+        **kwargs,
+    ) -> List[dict]:
+        if not hasattr(self, "_tokenizer"):
+            from transformers import AutoTokenizer
+            self._tokenizer = AutoTokenizer.from_pretrained(self.name_or_path, trust_remote_code=True)
+            if self._tokenizer.pad_token is None:
+                self._tokenizer.pad_token = self._tokenizer.unk_token  # use unk rather than eos token to prevent endless generation
+                self._tokenizer.pad_token_id = self._tokenizer.convert_tokens_to_ids(self._tokenizer.pad_token)
+            self._tokenizer.padding_side = 'left'
+        max_length = max_length or self._tokenizer.model_max_length
+        docs = []
+        doc_lengths = []
+        for doc in documents:
+            doc_tokens = self._tokenizer(doc, truncation=True, max_length=max_doc_length)
+            if len(doc_tokens['input_ids']) >= max_doc_length:
+                doc = self._tokenizer.decode(doc_tokens['input_ids'])
+            doc_lengths.append(len(doc_tokens['input_ids']))
+            docs.append(doc)
+        query_tokens = self._tokenizer(query, truncation=True, max_length=max_query_length)
+        if len(query_tokens['input_ids']) >= max_query_length:
+            query = self._tokenizer.decode(query_tokens['input_ids'])
+        query_length = len(query_tokens['input_ids'])
+        device = next(self.parameters()).device
+        length_capacity = max_length - 2 * query_length
+        block_docs = []
+        doc_embeddings = []
+        query_embeddings = []
+        block_weights = []
+        for length, doc in zip(doc_lengths, docs):
+            block_docs.append(doc)
+            length_capacity -= length
+            if len(block_docs) >= block_size or length_capacity <= max_doc_length:
+                prompt = format_docs_prompts_func(
+                    query,
+                    block_docs,
+                    instruction=instruction,
+                    special_tokens=self.special_tokens,
+                    no_thinking=True,
+                )
+                block_docs = []
+                length_capacity = max_length - 2 * query_length
+                batch = self._tokenizer(
+                    text=[prompt],
+                    padding=True,
+                    padding_side="left",
+                    return_tensors="pt",
+                ).to(device)
+                outputs = self.forward(
+                    **batch,
+                )
+                doc_embeddings.extend([x for x in outputs.doc_embeds[0].cpu().float().numpy()])
+                query_embeddings.append(outputs.query_embeds[0].cpu().float().numpy())
+                scores = outputs.scores.view(-1).cpu().float().numpy()
+                block_weights.append(((1.0 + scores) / 2.0).max())
+        if len(block_docs) > 0:
+            prompt = format_docs_prompts_func(
+                query,
+                block_docs,
+                instruction=instruction,
+                special_tokens=self.special_tokens,
+                no_thinking=True,
+            )
+            batch = self._tokenizer(
+                text=[prompt],
+                padding=True,
+                padding_side="left",
+                return_tensors="pt",
+            ).to(device)
+            outputs = self.forward(**batch)
+            doc_embeddings.extend([x for x in outputs.doc_embeds[0].cpu().float().numpy()])
+            query_embeddings.append(outputs.query_embeds[0].cpu().float().numpy())
+            scores = outputs.scores.view(-1).cpu().float().numpy()
+            block_weights.append(((1.0 + scores) / 2.0).max())
+        query_embeddings = np.array(query_embeddings)
+        doc_embeddings = np.array(doc_embeddings)
+        # weighted average with block_weights
+        # block_weights = np.power(block_weights, 2)
+        query_embeddings = np.average(query_embeddings, axis=0, weights=block_weights)
+        # calculate the cosine similarity between query and document embeddings
+        scores = np.dot(query_embeddings, doc_embeddings.T) / (np.linalg.norm(query_embeddings) * np.linalg.norm(doc_embeddings, axis=1))
+        # if return_doc_embeds:
+        #     return scores[0].tolist(), doc_embeddings
+        # else:
+        #     return scores[0].tolist()
+        scores_argsort = np.argsort(scores[0])[::-1]
+        sorted_documents = []
+        sorted_scores = []
+        sorted_embeddings = []
+        for mid in scores_argsort:
+            sorted_scores.append(scores[0][mid])
+            sorted_documents.append(documents[mid])
+            sorted_embeddings.append(doc_embeddings[mid])
+        top_n = min(top_n or len(sorted_documents), len(sorted_documents))
+        return [
+            {
+                'document': sorted_documents[i],
+                'relevance_score': sorted_scores[i],
+                'index': scores_argsort[i].item(),
+                'embedding': sorted_embeddings[i] if return_doc_embeds else None,
+            }
+            for i in range(top_n)
+        ]
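
Note on `rerank` above: documents are packed into blocks, each block gets one forward pass, and the per-block query embeddings are then averaged (weighted by each block's best rescaled score) before a final cosine similarity against every document embedding. The sketch below restates those two steps without the model. `pack_blocks` and `aggregate_scores` are hypothetical names, and the query embeddings are simplified to a 2-D array (the original keeps an extra singleton axis).

```python
import numpy as np


def pack_blocks(doc_lengths, query_length, max_length,
                max_doc_length=2048, block_size=125):
    """Return lists of document indices, one list per forward pass."""
    blocks, current = [], []
    capacity = max_length - 2 * query_length
    for i, length in enumerate(doc_lengths):
        current.append(i)
        capacity -= length
        # Flush when the block is full or another full-length doc might not fit.
        if len(current) >= block_size or capacity <= max_doc_length:
            blocks.append(current)
            current = []
            capacity = max_length - 2 * query_length
    if current:
        blocks.append(current)
    return blocks


def aggregate_scores(query_embeds, doc_embeds, block_weights):
    """Cosine similarity of each document to the weighted-mean query vector.

    query_embeds: (num_blocks, dim), doc_embeds: (num_docs, dim).
    """
    q = np.average(query_embeds, axis=0, weights=block_weights)
    return doc_embeds @ q / (np.linalg.norm(q) * np.linalg.norm(doc_embeds, axis=1))


# Toy inputs, chosen only to exercise the logic.
blocks = pack_blocks([300, 300, 300, 300], query_length=10,
                     max_length=1024, max_doc_length=256)
readme_blocks = pack_blocks([100, 100, 100], query_length=5, max_length=1024)

query_embeds = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_embeds = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
scores = aggregate_scores(query_embeds, doc_embeds, block_weights=[0.75, 0.25])
order = np.argsort(scores)[::-1].tolist()
print(blocks, readme_blocks, order)
```

One consequence of the capacity check, if this reading is right: with the README's `max_length=1024` and the default `max_doc_length=2048`, the remaining capacity is always at or below the threshold after the first document, so every document lands in its own block (`readme_blocks` above).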

special_tokens_map.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "additional_special_tokens": [
+    {
+      "content": "<|score_token|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "<|embed_token|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "<|rerank_token|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    }
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
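
Note: of the three tokens registered above, `modeling.py` uses `<|embed_token|>` as the per-passage marker and `<|rerank_token|>` as the query marker, and `forward` reads the hidden states at exactly those positions. The sketch below rebuilds only the passage and query portion of the prompt to show where the markers land. `strip_markers` and `passages_block` are hypothetical stand-ins for `sanitize_input` and `format_docs_prompts_func`, and the system prefix, instruction block, and assistant suffix are omitted.

```python
DOC_TOKEN = "<|embed_token|>"     # "doc_embed_token" in modeling.py
QUERY_TOKEN = "<|rerank_token|>"  # "query_embed_token" in modeling.py


def strip_markers(text: str) -> str:
    """Mirror sanitize_input: drop marker strings that appear in user text."""
    for token in (DOC_TOKEN, QUERY_TOKEN):
        text = text.replace(token, "")
    return text


def passages_block(query: str, docs: list[str]) -> str:
    """Build the passage/query section only (prefix and suffix omitted)."""
    query = strip_markers(query)
    parts = [
        f'<passage id="{i}">\n{strip_markers(doc)}{DOC_TOKEN}\n</passage>'
        for i, doc in enumerate(docs)
    ]
    parts.append(f"<query>\n{query}{QUERY_TOKEN}\n</query>")
    return "\n".join(parts)


# The second document tries to smuggle in a marker; it is stripped.
block = passages_block("slm markdown", ["first doc", "second <|embed_token|>doc"])
print(block)
```

Stripping the markers from user text matters because `forward` reshapes the selected hidden states with `view(batch_size, -1, dim)`; a stray marker would presumably add an extra "document" position and shift the scores.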

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4e95945ab0cef486709f760b81efcc7a6e75747f9165d13ead29159737455803
+size 11423225

tokenizer_config.json ADDED

	@@ -0,0 +1,255 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151669": {
+      "content": "<|score_token|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151670": {
+      "content": "<|embed_token|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151671": {
+      "content": "<|rerank_token|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|score_token|>",
+    "<|embed_token|>",
+    "<|rerank_token|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0].role == 'system' %}\n        {{- messages[0].content + '\\n\\n' }}\n    {%- endif %}\n    {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for forward_message in messages %}\n    {%- set index = (messages|length - 1) - loop.index0 %}\n    {%- set message = messages[index] %}\n    {%- set current_content = message.content if message.content is not none else '' %}\n    {%- set tool_start = '<tool_response>' %}\n    {%- set tool_start_length = tool_start|length %}\n    {%- set start_of_message = current_content[:tool_start_length] %}\n    {%- set tool_end = '</tool_response>' %}\n    {%- set tool_end_length = tool_end|length %}\n    {%- set start_pos = (current_content|length) - tool_end_length %}\n    {%- if start_pos < 0 %}\n        {%- set start_pos = 0 %}\n    {%- endif %}\n    {%- set end_of_message = current_content[start_pos:] %}\n    {%- if ns.multi_step_tool and message.role == \"user\" and not(start_of_message == tool_start and end_of_message == tool_end) %}\n        {%- set ns.multi_step_tool = false %}\n        {%- set ns.last_query_index = index %}\n    {%- endif %}\n{%- endfor %}\n{%- for message in messages 
%}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {%- set content = message.content %}\n        {%- set reasoning_content = '' %}\n        {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n            {%- set reasoning_content = message.reasoning_content %}\n        {%- else %}\n            {%- if '</think>' in message.content %}\n                {%- set content = (message.content.split('</think>')|last).lstrip('\\n') %}\n                {%- set reasoning_content = (message.content.split('</think>')|first).rstrip('\\n') %}\n                {%- set reasoning_content = (reasoning_content.split('<think>')|last).lstrip('\\n') %}\n            {%- endif %}\n        {%- endif %}\n        {%- if loop.index0 > ns.last_query_index %}\n            {%- if loop.last or (not loop.last and reasoning_content) %}\n                {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n            {%- else %}\n                {{- '<|im_start|>' + message.role + '\\n' + content }}\n            {%- endif %}\n        {%- else %}\n            {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- endif %}\n        {%- if message.tool_calls %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if (loop.first and content) or (not loop.first) %}\n                    {{- '\\n' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '<tool_call>\\n{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n    
                {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n    {%- if enable_thinking is defined and enable_thinking is false %}\n        {{- '<think>\\n\\n</think>\\n\\n' }}\n    {%- endif %}\n{%- endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "left",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}