numb3r3 committed on
Commit bebee3a · verified · 1 Parent(s): 2f9089c

init commit

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
READEME.md ADDED
@@ -0,0 +1,157 @@
1
+ ---
2
+ pipeline_tag: text-ranking
3
+ tags:
4
+ - transformers
5
+ - reranker
6
+ - qwen3
7
+ language:
8
+ - multilingual
9
+ base_model:
10
+ - Qwen/Qwen3-0.6B
11
+ inference: false
12
+ license: cc-by-nc-4.0
13
+ library_name: transformers
14
+ ---
15
+
16
+ <br><br>
17
+
18
+ <p align="center">
19
+ <img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
20
+ </p>
21
+
22
+ <p align="center">
23
+ <b>Trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
24
+ </p>
25
+
26
+ [Blog](https://jina.ai/news/jina-reranker-v3-listwise-document-reranker) | [API](https://jina.ai/reranker) | [AWS](#) | [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/jinaai.jina-reranker-v3) | [GCP](https://console.cloud.google.com/marketplace/product/jinaai-public/jina-reranker-v3) | Arxiv (coming soon)
27
+
28
+
29
+ # jina-reranker-v3: Multilingual Listwise Document Reranker
30
+
31
+ ## Intended Usage & Model Info
32
+
33
+ **jina-reranker-v3** is ...
34
+
35
+ ## Architecture
36
+
37
+ **jina-reranker-v3** is built on a decoder-only language model architecture, ...
38
+
39
+ ## Capabilities
40
+
41
+ - **Listwise Document Reranker**: Achieves state-of-the-art performance on multilingual document reranking tasks
42
+ - **Long Context Processing**: Handles up to 10K tokens, enabling reranking of lengthy documents
43
+ - **Dynamic Image Resolution**: Supports images from 56×56 pixels up to 4K resolution with dynamic patch processing
44
+ - **Multilingual Support**: Effectively reranks content across 29+ languages, including bidirectional language pairs
45
+ - **Zero-shot Domain Transfer**: Performs well on unseen domains and document types without specific fine-tuning
46
+ - **Code Search**: Enhanced capabilities for programming language search and technical document ranking
47
+
48
+
49
+ Compared to `jina-reranker-v2-base-multilingual`, `jina-reranker-v3` significantly improves text reranking for multilingual content, long documents, and code searching tasks, while adding powerful new capabilities for visual document understanding.
50
+
51
+ # Usage
52
+
53
+ 1. The easiest way to use `jina-reranker-v3` is to call Jina AI's [Reranker API](https://jina.ai/reranker/).
54
+
55
+ ```bash
56
+ curl -X POST \
57
+ https://api.jina.ai/v1/rerank \
58
+ -H "Content-Type: application/json" \
59
+ -H "Authorization: Bearer JINA_API_KEY" \
60
+ -d '{
61
+ "model": "jina-reranker-v3",
62
+ "query": "slm markdown",
63
+ "documents": [
64
+ ...
65
+ ],
66
+ "return_documents": false
67
+ }'
68
+ ```
69
+ You will receive a JSON response with the relevance scores for each document in relation to the query. The response will look like this:
70
+
71
+ ```json
72
+ {
73
+ "model":"jina-reranker-v3",
74
+ "usage": {
75
+ "total_tokens":2813
76
+ },
77
+ "results":[
78
+ {
79
+ "index":1,
80
+ "relevance_score":0.9310624287463884
81
+ },
82
+ {
83
+ "index":4,
84
+ "relevance_score":0.8982678574191957
85
+ },
86
+ {
87
+ "index":0,
88
+ "relevance_score":0.890233167219021
89
+ },
90
+ ...
91
+ ]
92
+ }
93
+ ```
94
+ The `relevance_score` field indicates the relevance of each document to the query, with higher scores indicating greater relevance.
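
As a minimal sketch of consuming this response in Python, the snippet below parses a response shaped like the example above and maps each `index` back to the submitted document list. The `documents` list and the `response_text` literal are illustrative placeholders (the literal reuses the three scores shown above); they are not real API output for these documents.

```python
import json

# Hypothetical documents, in the order they were sent in the request.
documents = ["doc 0", "doc 1", "doc 2", "doc 3", "doc 4"]

# A response shaped like the example above (only the three results shown there).
response_text = """
{
  "model": "jina-reranker-v3",
  "usage": {"total_tokens": 2813},
  "results": [
    {"index": 1, "relevance_score": 0.9310624287463884},
    {"index": 4, "relevance_score": 0.8982678574191957},
    {"index": 0, "relevance_score": 0.890233167219021}
  ]
}
"""

response = json.loads(response_text)

# Each result's "index" refers to a position in the submitted document list.
ranked = [
    (documents[item["index"]], item["relevance_score"])
    for item in response["results"]
]

for text, score in ranked:
    print(f"{score:.4f}  {text}")
```

Because `return_documents` is `false` in the request above, the response carries only indices and scores, so the caller has to keep the original list around to recover the text.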
95
+
96
+
97
+ 2. You can also use the `transformers` library to interact with the model programmatically.
98
+
99
+ Before you start, install the `transformers` library:
100
+
101
+ ```bash
102
+ pip install "transformers>=4.47.3"
103
+ ```
104
+
105
+ If your GPU supports FlashAttention-2 (as of 2024-09-12: Ampere, Ada, or Hopper GPUs, e.g., A100, RTX 3090, RTX 4090, H100), also install `flash-attn`:
106
+
107
+ ```bash
108
+ pip install flash-attn --no-build-isolation
109
+ ```
110
+
111
+ And then use the following code snippet to load the model:
112
+
113
+ ```python
114
+ from transformers import AutoModel
115
+
116
+ # comment out the flash_attention_2 line if you don't have a compatible GPU
117
+ model = AutoModel.from_pretrained(
118
+ 'jinaai/jina-reranker-v3',
119
+ torch_dtype="auto",
120
+ trust_remote_code=True,
121
+ attn_implementation="flash_attention_2",
122
+ )
123
+
124
+ model.to('cuda') # or 'cpu' if no GPU is available
125
+ model.eval()
126
+ ```
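
Instead of commenting the `attn_implementation` line in or out by hand, the keyword arguments can be built conditionally. The sketch below is one possible approach, not part of the model's code: it only checks whether the `flash_attn` package is importable and does not verify that the GPU itself is supported. The names `load_kwargs` and `has_flash_attn` are illustrative.

```python
import importlib.util

# Keyword arguments taken from the loading snippet above.
load_kwargs = {
    "torch_dtype": "auto",
    "trust_remote_code": True,
}

# Heuristic: request FlashAttention-2 only if the flash_attn package is importable.
# This does not check GPU architecture or CUDA availability.
has_flash_attn = importlib.util.find_spec("flash_attn") is not None
if has_flash_attn:
    load_kwargs["attn_implementation"] = "flash_attention_2"

print(load_kwargs)
# The dictionary can then be passed on, for example:
# model = AutoModel.from_pretrained("jinaai/jina-reranker-v3", **load_kwargs)
```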
127
+
128
+ Now you can call the model's `rerank` method to compute relevance scores for a query and a list of documents.
129
+
130
+ ```python
131
+ query = "slm markdown"
132
+ documents = [
133
+ "We present ReaderLM-v2, a compact 1.5 billion parameter language model designed for efficient web content extraction. Our model processes documents up to 512K tokens, transforming messy HTML into clean Markdown or JSON formats with high accuracy -- making it an ideal tool for grounding large language models. The models effectiveness results from two key innovations: (1) a three-stage data synthesis pipeline that generates high quality, diverse training data by iteratively drafting, refining, and critiquing web content extraction; and (2) a unified training framework combining continuous pre-training with multi-objective optimization. Intensive evaluation demonstrates that ReaderLM-v2 outperforms GPT-4o-2024-08-06 and other larger models by 15-20% on carefully curated benchmarks, particularly excelling at documents exceeding 100K tokens, while maintaining significantly lower computational requirements.",
134
+ "数据提取么?为什么不用正则啊,你用正则不就全解决了么?",
135
+ "During the California Gold Rush, some merchants made more money selling supplies to miners than the miners made finding gold.",
136
+ "Die wichtigsten Beiträge unserer Arbeit sind zweifach: Erstens führen wir eine neuartige dreistufige Datensynthese-Pipeline namens Draft-Refine-Critique ein, die durch iterative Verfeinerung hochwertige Trainingsdaten generiert; und zweitens schlagen wir eine umfassende Trainingsstrategie vor, die kontinuierliches Vortraining zur Längenerweiterung, überwachtes Feintuning mit spezialisierten Kontrollpunkten, direkte Präferenzoptimierung (DPO) und iteratives Self-Play-Tuning kombiniert. Um die weitere Forschung und Anwendung der strukturierten Inhaltsextraktion zu erleichtern, ist das Modell auf Hugging Face öffentlich verfügbar.",
137
+ ]
138
+
139
+ result = model.rerank(query, documents, max_length=1024)
140
+ ```
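
Judging by `modeling.py` in this commit, `rerank` returns a list of dictionaries with the keys `document`, `relevance_score`, `index`, and `embedding`, sorted by descending score. The sketch below works on a hand-written stand-in for `result` so that it runs without loading the model; the texts and scores are made up for illustration.

```python
# Hypothetical stand-in for `result`; the scores below are invented.
result = [
    {"document": "first doc text", "relevance_score": 0.42, "index": 0, "embedding": None},
    {"document": "fourth doc text", "relevance_score": 0.31, "index": 3, "embedding": None},
    {"document": "second doc text", "relevance_score": 0.05, "index": 1, "embedding": None},
    {"document": "third doc text", "relevance_score": -0.12, "index": 2, "embedding": None},
]

# Entries arrive sorted by descending relevance_score; "index" is the position
# of the document in the original `documents` list.
top = result[0]
print(top["index"], top["relevance_score"])

# Recover the scores in the original document order.
scores_in_input_order = [
    item["relevance_score"]
    for item in sorted(result, key=lambda item: item["index"])
]
print(scores_in_input_order)
```

The scores are cosine similarities, so they can be negative. `embedding` stays `None` unless `return_doc_embeds=True` is passed.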
141
+
142
+
143
+ # Model Performance
144
+
145
+ Performance of the `jina-reranker-v3` on ...
146
+ For complete benchmark results, please refer to the [online results table](#).
147
+
148
+
149
+
150
+
151
+ # Contact
152
+
153
+ Join our [Discord community](https://discord.jina.ai/) and chat with other community members about ideas.
154
+
155
+ # License
156
+
157
+ `jina-reranker-v3` is listed on AWS & Azure. If you need to use it beyond those platforms or on-premises within your company, note that the model is licensed under CC BY-NC 4.0. For commercial usage inquiries, feel free to [contact us](https://jina.ai/contact-sales/).
added_tokens.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "</think>": 151668,
3
+ "</tool_call>": 151658,
4
+ "</tool_response>": 151666,
5
+ "<think>": 151667,
6
+ "<tool_call>": 151657,
7
+ "<tool_response>": 151665,
8
+ "<|box_end|>": 151649,
9
+ "<|box_start|>": 151648,
10
+ "<|embed_token|>": 151670,
11
+ "<|endoftext|>": 151643,
12
+ "<|file_sep|>": 151664,
13
+ "<|fim_middle|>": 151660,
14
+ "<|fim_pad|>": 151662,
15
+ "<|fim_prefix|>": 151659,
16
+ "<|fim_suffix|>": 151661,
17
+ "<|im_end|>": 151645,
18
+ "<|im_start|>": 151644,
19
+ "<|image_pad|>": 151655,
20
+ "<|object_ref_end|>": 151647,
21
+ "<|object_ref_start|>": 151646,
22
+ "<|quad_end|>": 151651,
23
+ "<|quad_start|>": 151650,
24
+ "<|repo_name|>": 151663,
25
+ "<|rerank_token|>": 151671,
26
+ "<|score_token|>": 151669,
27
+ "<|video_pad|>": 151656,
28
+ "<|vision_end|>": 151653,
29
+ "<|vision_pad|>": 151654,
30
+ "<|vision_start|>": 151652
31
+ }
config.json ADDED
@@ -0,0 +1,32 @@
1
+ {
2
+ "_name_or_path": "jinaai/jina-reranker-v3",
3
+ "architectures": ["JinaForRanking"],
4
+ "auto_map": {
5
+ "AutoModel": "modeling.JinaForRanking"
6
+ },
7
+ "attention_bias": false,
8
+ "attention_dropout": 0.0,
9
+ "bos_token_id": 151643,
10
+ "eos_token_id": 151645,
11
+ "head_dim": 128,
12
+ "hidden_act": "silu",
13
+ "hidden_size": 1024,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 3072,
16
+ "max_position_embeddings": 131072,
17
+ "max_window_layers": 28,
18
+ "model_type": "qwen3",
19
+ "num_attention_heads": 16,
20
+ "num_hidden_layers": 28,
21
+ "num_key_value_heads": 8,
22
+ "rms_norm_eps": 1e-6,
23
+ "rope_scaling": null,
24
+ "rope_theta": 1000000,
25
+ "sliding_window": null,
26
+ "tie_word_embeddings": true,
27
+ "torch_dtype": "bfloat16",
28
+ "transformers_version": "4.55.2",
29
+ "use_cache": false,
30
+ "use_sliding_window": false,
31
+ "vocab_size": 151936
32
+ }
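
A few derived sizes follow from these values. The sketch below assumes the usual Qwen3 attention layout, where the query projection width is `num_attention_heads * head_dim` and the key/value projection width is `num_key_value_heads * head_dim`. The projector widths mirror `modeling.py` in this commit. Variable names are illustrative.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 1024,
    "head_dim": 128,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
}

# Assumed Qwen3 layout: projection widths are heads * head_dim,
# which need not equal hidden_size.
q_width = config["num_attention_heads"] * config["head_dim"]
kv_width = config["num_key_value_heads"] * config["head_dim"]
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

# Projector in modeling.py: hidden_size -> hidden_size // 2 -> 512.
projector_dims = (config["hidden_size"], config["hidden_size"] // 2, 512)

print(q_width, kv_width, queries_per_kv_head, projector_dims)
```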
generation_config.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "bos_token_id": 151643,
3
+ "do_sample": true,
4
+ "eos_token_id": [151645, 151643],
5
+ "pad_token_id": 151643,
6
+ "temperature": 0.6,
7
+ "top_k": 20,
8
+ "top_p": 0.95,
9
+ "transformers_version": "4.51.3"
10
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e32ceac9c2e4bffcc6be722bd04e044414bb3fed2b29794f95bce2b2f312eb67
3
+ size 1504873384
modeling.py ADDED
@@ -0,0 +1,288 @@
1
+ import numpy as np
2
+ from dataclasses import dataclass
3
+ import torch
4
+ from torch import nn
5
+ from typing import Optional, List, Dict
6
+ from transformers.models.qwen3 import modeling_qwen3
7
+ from transformers.modeling_outputs import (
8
+ CausalLMOutputWithPast,
9
+ )
10
+
11
+
12
+ @dataclass
13
+ class CausalLMOutputWithScores(CausalLMOutputWithPast):
14
+ scores: Optional[torch.FloatTensor] = None
15
+ query_embeds: Optional[torch.FloatTensor] = None
16
+ doc_embeds: Optional[torch.FloatTensor] = None
17
+
18
+
19
+ def sanitize_input(text: str, special_tokens: Dict[str, str]) -> str:
20
+ """
21
+ Sanitize the input text by removing or escaping special tokens.
22
+
23
+ Args:
24
+ text: The input text (query or document) to sanitize.
25
+ special_tokens: A dictionary of special tokens used in the prompts.
26
+
27
+ Returns:
28
+ The sanitized text.
29
+ """
30
+ for token in special_tokens.values():
31
+ text = text.replace(token, "") # Remove the special token
32
+ return text
33
+
34
+
35
+ def format_docs_prompts_func(
36
+ query: str,
37
+ docs: list[str],
38
+ instruction: Optional[str] = None,
39
+ special_tokens: Dict[str, str] = {},
40
+ no_thinking: bool = True,
41
+ ) -> str:
42
+ query = sanitize_input(query, special_tokens)
43
+ docs = [sanitize_input(doc, special_tokens) for doc in docs]
44
+
45
+ prefix = (
46
+ "<|im_start|>system\n"
47
+ "You are a search relevance expert who can determine a ranking of the passages based on how relevant they are to the query. "
48
+ "If the query is a question, how relevant a passage is depends on how well it answers the question. "
49
+ "If not, try to analyze the intent of the query and assess how well each passage satisfies the intent. "
50
+ "If an instruction is provided, you should follow the instruction when determining the ranking."
51
+ "<|im_end|>\n<|im_start|>user\n"
52
+ )
53
+ suffix = "<|im_end|>\n<|im_start|>assistant\n"
54
+ if no_thinking:
55
+ suffix += "<think>\n\n</think>\n\n"
56
+
57
+ doc_cls_token = special_tokens["doc_embed_token"]
58
+ query_cls_token = special_tokens["query_embed_token"]
59
+
60
+ prompt = (
61
+ f"I will provide you with {len(docs)} passages, each indicated by a numerical identifier. "
62
+ f"Rank the passages based on their relevance to query: {query}\n"
63
+ )
64
+
65
+ # Add instruction if provided
66
+ if instruction:
67
+ prompt += f'<instruct>\n{instruction}\n</instruct>\n'
68
+
69
+ doc_prompts = [f'<passage id="{i}">\n{doc}{doc_cls_token}\n</passage>' for i, doc in enumerate(docs)]
70
+ prompt += "\n".join(doc_prompts) + "\n"
71
+
72
+ prompt += f"<query>\n{query}{query_cls_token}\n</query>"
73
+
74
+ return prefix + prompt + suffix
75
+
76
+
77
+ class JinaForRanking(modeling_qwen3.Qwen3ForCausalLM):
78
+ def __init__(self, config):
79
+ super().__init__(config)
80
+ self.padding_side = "left"
81
+ self.projector_dim = 512
82
+
83
+ # hack the lm_head to do nothing, since we only want the hidden states
84
+ self.lm_head = nn.Identity()
85
+
86
+ self.projector = nn.Sequential(
87
+ nn.Linear(config.hidden_size, config.hidden_size // 2, bias=False),
88
+ nn.ReLU(),
89
+ nn.Linear(config.hidden_size // 2, self.projector_dim, bias=False),
90
+ )
91
+
92
+ # Initialize weights and apply final processing
93
+ self.post_init()
94
+
95
+ self.special_tokens = {"query_embed_token": "<|rerank_token|>", "doc_embed_token": "<|embed_token|>"}
96
+
97
+ self.doc_embed_token_id = 151670
98
+ self.query_embed_token_id = 151671
99
+
100
+ def forward(self, *args, **kwargs) -> CausalLMOutputWithScores:
101
+ # Delete output_hidden_states from kwargs
102
+ kwargs.pop("output_hidden_states", None)
103
+ kwargs.pop("use_cache", None)
104
+ assert kwargs.pop("labels", None) is None, "labels should not be passed to forward()"
105
+ input_ids = kwargs.pop("input_ids", None)
106
+ outputs = super().forward(
107
+ *args,
108
+ input_ids=input_ids,
109
+ use_cache=False,
110
+ output_hidden_states=True,
111
+ **kwargs,
112
+ )
113
+
114
+ # get the hidden states of the last layer
115
+ hidden_states = outputs.hidden_states[-1]
116
+
117
+ # # Only compute necessary logits, and do not upcast them to float if we are not computing the loss
118
+ # slice_indices = slice(-logits_to_keep, None) if isinstance(logits_to_keep, int) else logits_to_keep
119
+
120
+ # logits = self.lm_head(hidden_states[:, slice_indices, :])
121
+
122
+ scores = None
123
+ query_embeds = None
124
+ doc_embeds = None
125
+
126
+ batch_size, _, dim = hidden_states.shape
127
+
128
+ query_embed_token_indexes = torch.eq(input_ids, self.query_embed_token_id)
129
+ doc_embed_token_indexes = torch.eq(input_ids, self.doc_embed_token_id)
130
+
131
+ doc_embeds = hidden_states[doc_embed_token_indexes].view(batch_size, -1, dim)
132
+ query_embeds = hidden_states[query_embed_token_indexes].unsqueeze(1)
133
+
134
+ doc_embeds = self.projector(doc_embeds)
135
+ query_embeds = self.projector(query_embeds)
136
+
137
+ query_embeds_expanded = query_embeds.expand_as(doc_embeds)
138
+ scores = torch.nn.functional.cosine_similarity(doc_embeds, query_embeds_expanded, dim=-1).squeeze(-1)
139
+
140
+ return CausalLMOutputWithScores(
141
+ loss=None,
142
+ logits=None,
143
+ scores=scores,
144
+ query_embeds=query_embeds,
145
+ doc_embeds=doc_embeds,
146
+ past_key_values=outputs.past_key_values,
147
+ hidden_states=outputs.hidden_states,
148
+ attentions=outputs.attentions,
149
+ )
150
+
151
+ @torch.no_grad()
152
+ def rerank(
153
+ self,
154
+ query: str,
155
+ documents: List[str],
156
+ max_query_length: int = 512,
157
+ max_doc_length: int = 2048,
158
+ max_length: Optional[int] = None,
159
+ instruction: Optional[str] = None,
160
+ top_n: Optional[int] = None,
161
+ block_size: int = 125,
162
+ return_doc_embeds: bool = False,
163
+ **kwargs,
164
+ ) -> List[dict]:
165
+ if not hasattr(self, "_tokenizer"):
166
+ from transformers import AutoTokenizer
167
+
168
+ self._tokenizer = AutoTokenizer.from_pretrained(self.name_or_path, trust_remote_code=True)
169
+
170
+ if self._tokenizer.pad_token is None:
171
+ self._tokenizer.pad_token = self._tokenizer.unk_token # use unk rather than eos token to prevent endless generation
172
+ self._tokenizer.pad_token_id = self._tokenizer.convert_tokens_to_ids(self._tokenizer.pad_token)
173
+
174
+ self._tokenizer.padding_side = 'left'
175
+
176
+ max_length = max_length or self._tokenizer.model_max_length
177
+ docs = []
178
+ doc_lengths = []
179
+ for doc in documents:
180
+ doc_tokens = self._tokenizer(doc, truncation=True, max_length=max_doc_length)
181
+ if len(doc_tokens['input_ids']) >= max_doc_length:
182
+ doc = self._tokenizer.decode(doc_tokens['input_ids'])
183
+ doc_lengths.append(len(doc_tokens['input_ids']))
184
+ docs.append(doc)
185
+
186
+ query_tokens = self._tokenizer(query, truncation=True, max_length=max_query_length)
187
+ if len(query_tokens['input_ids']) >= max_query_length:
188
+ query = self._tokenizer.decode(query_tokens['input_ids'])
189
+
190
+ query_length = len(query_tokens['input_ids'])
191
+
192
+ device = next(self.parameters()).device
193
+
194
+ length_capacity = max_length - 2 * query_length
195
+
196
+ block_docs = []
197
+ doc_embeddings = []
198
+ query_embeddings = []
199
+ block_weights = []
200
+ for length, doc in zip(doc_lengths, docs):
201
+ block_docs.append(doc)
202
+ length_capacity -= length
203
+
204
+ if len(block_docs) >= block_size or length_capacity <= max_doc_length:
205
+ prompt = format_docs_prompts_func(
206
+ query,
207
+ block_docs,
208
+ instruction=instruction,
209
+ special_tokens=self.special_tokens,
210
+ no_thinking=True,
211
+ )
212
+ block_docs = []
213
+ length_capacity = max_length - 2 * query_length
214
+
215
+ batch = self._tokenizer(
216
+ text=[prompt],
217
+ padding=True,
218
+ padding_side="left",
219
+ return_tensors="pt",
220
+ ).to(device)
221
+
222
+ outputs = self.forward(
223
+ **batch,
224
+ )
225
+
226
+ doc_embeddings.extend([x for x in outputs.doc_embeds[0].cpu().float().numpy()])
227
+ query_embeddings.append(outputs.query_embeds[0].cpu().float().numpy())
228
+ scores = outputs.scores.view(-1).cpu().float().numpy()
229
+ block_weights.append(((1.0 + scores) / 2.0).max())
230
+
231
+ if len(block_docs) > 0:
232
+ prompt = format_docs_prompts_func(
233
+ query,
234
+ block_docs,
235
+ instruction=instruction,
236
+ special_tokens=self.special_tokens,
237
+ no_thinking=True,
238
+ )
239
+
240
+ batch = self._tokenizer(
241
+ text=[prompt],
242
+ padding=True,
243
+ padding_side="left",
244
+ return_tensors="pt",
245
+ ).to(device)
246
+
247
+ outputs = self.forward(**batch)
248
+
249
+ doc_embeddings.extend([x for x in outputs.doc_embeds[0].cpu().float().numpy()])
250
+ query_embeddings.append(outputs.query_embeds[0].cpu().float().numpy())
251
+ scores = outputs.scores.view(-1).cpu().float().numpy()
252
+ block_weights.append(((1.0 + scores) / 2.0).max())
253
+
254
+ query_embeddings = np.array(query_embeddings)
255
+ doc_embeddings = np.array(doc_embeddings)
256
+
257
+ # weighted average with block_weights
258
+ # block_weights = np.power(block_weights, 2)
259
+ query_embeddings = np.average(query_embeddings, axis=0, weights=block_weights)
260
+
261
+ # calculate the cosine similarity between query and document embeddings
262
+ scores = np.dot(query_embeddings, doc_embeddings.T) / (np.linalg.norm(query_embeddings) * np.linalg.norm(doc_embeddings, axis=1))
263
+
264
+ # if return_doc_embeds:
265
+ # return scores[0].tolist(), doc_embeddings
266
+ # else:
267
+ # return scores[0].tolist()
268
+
269
+ scores_argsort = np.argsort(scores[0])[::-1]
270
+ sorted_documents = []
271
+ sorted_scores = []
272
+ sorted_embeddings = []
273
+ for mid in scores_argsort:
274
+ sorted_scores.append(scores[0][mid])
275
+ sorted_documents.append(documents[mid])
276
+ sorted_embeddings.append(doc_embeddings[mid])
277
+
278
+ top_n = min(top_n or len(sorted_documents), len(sorted_documents))
279
+
280
+ return [
281
+ {
282
+ 'document': sorted_documents[i],
283
+ 'relevance_score': sorted_scores[i],
284
+ 'index': scores_argsort[i].item(),
285
+ 'embedding': sorted_embeddings[i] if return_doc_embeds else None,
286
+ }
287
+ for i in range(top_n)
288
+ ]
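
The final scoring step of `rerank` can be sketched in isolation. Each block yields one query embedding; those are averaged with a per-block weight of `max((1 + score) / 2)`, and every document embedding is then scored by cosine similarity against the averaged query. The sketch below uses plain Python lists and 2-dimensional toy vectors in place of NumPy arrays and 512-dimensional embeddings. The helper names and all numbers are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def weighted_mean(vectors, weights):
    """Weighted average of vectors, normalized by the weight sum."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dim)]


# Toy data: two blocks of two documents each, with made-up in-block scores.
block_query_embeds = [[1.0, 0.0], [0.0, 1.0]]
block_scores = [[1.0, 0.0], [0.0, -1.0]]
doc_embeds = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]

# Block weight: best in-block score, rescaled from [-1, 1] to [0, 1].
block_weights = [max((1.0 + s) / 2.0 for s in scores) for scores in block_scores]

# One query vector for all blocks, then one cosine score per document.
query = weighted_mean(block_query_embeds, block_weights)
scores = [cosine(query, d) for d in doc_embeds]
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

print(block_weights, query, ranking)
```

Weighting by the best in-block score means that a block with at least one strongly matching document contributes more to the shared query embedding than a block with only weak matches.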
special_tokens_map.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ {
4
+ "content": "<|score_token|>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false
9
+ },
10
+ {
11
+ "content": "<|embed_token|>",
12
+ "lstrip": false,
13
+ "normalized": false,
14
+ "rstrip": false,
15
+ "single_word": false
16
+ },
17
+ {
18
+ "content": "<|rerank_token|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ ],
25
+ "eos_token": {
26
+ "content": "<|im_end|>",
27
+ "lstrip": false,
28
+ "normalized": false,
29
+ "rstrip": false,
30
+ "single_word": false
31
+ },
32
+ "pad_token": {
33
+ "content": "<|endoftext|>",
34
+ "lstrip": false,
35
+ "normalized": false,
36
+ "rstrip": false,
37
+ "single_word": false
38
+ }
39
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4e95945ab0cef486709f760b81efcc7a6e75747f9165d13ead29159737455803
3
+ size 11423225
tokenizer_config.json ADDED
@@ -0,0 +1,255 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ },
181
+ "151665": {
182
+ "content": "<tool_response>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": false
188
+ },
189
+ "151666": {
190
+ "content": "</tool_response>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": false
196
+ },
197
+ "151667": {
198
+ "content": "<think>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": false
204
+ },
205
+ "151668": {
206
+ "content": "</think>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": false
212
+ },
213
+ "151669": {
214
+ "content": "<|score_token|>",
215
+ "lstrip": false,
216
+ "normalized": false,
217
+ "rstrip": false,
218
+ "single_word": false,
219
+ "special": true
220
+ },
221
+ "151670": {
222
+ "content": "<|embed_token|>",
223
+ "lstrip": false,
224
+ "normalized": false,
225
+ "rstrip": false,
226
+ "single_word": false,
227
+ "special": true
228
+ },
229
+ "151671": {
230
+ "content": "<|rerank_token|>",
231
+ "lstrip": false,
232
+ "normalized": false,
233
+ "rstrip": false,
234
+ "single_word": false,
235
+ "special": true
236
+ }
237
+ },
238
+ "additional_special_tokens": [
239
+ "<|score_token|>",
240
+ "<|embed_token|>",
241
+ "<|rerank_token|>"
242
+ ],
243
+ "bos_token": null,
244
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for forward_message in messages %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- set message = messages[index] %}\n {%- set current_content = message.content if message.content is not none else '' %}\n {%- set tool_start = '<tool_response>' %}\n {%- set tool_start_length = tool_start|length %}\n {%- set start_of_message = current_content[:tool_start_length] %}\n {%- set tool_end = '</tool_response>' %}\n {%- set tool_end_length = tool_end|length %}\n {%- set start_pos = (current_content|length) - tool_end_length %}\n {%- if start_pos < 0 %}\n {%- set start_pos = 0 %}\n {%- endif %}\n {%- set end_of_message = current_content[start_pos:] %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(start_of_message == tool_start and end_of_message == tool_end) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + 
message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = (message.content.split('</think>')|last).lstrip('\\n') %}\n {%- set reasoning_content = (message.content.split('</think>')|first).rstrip('\\n') %}\n {%- set reasoning_content = (reasoning_content.split('<think>')|last).lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' 
}}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
245
+ "clean_up_tokenization_spaces": false,
246
+ "eos_token": "<|im_end|>",
247
+ "errors": "replace",
248
+ "extra_special_tokens": {},
249
+ "model_max_length": 131072,
250
+ "pad_token": "<|endoftext|>",
251
+ "padding_side": "left",
252
+ "split_special_tokens": false,
253
+ "tokenizer_class": "Qwen2Tokenizer",
254
+ "unk_token": null
255
+ }