rvo committed
Commit 8bddb8b · verified · 1 Parent(s): d8c82d3

Upload 25 files
README.md CHANGED
@@ -1,3 +1,209 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ base_model: microsoft/MiniLM-L6-v2
+ tags:
+ - transformers
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - text-embeddings-inference
+ - information-retrieval
+ - knowledge-distillation
+ language:
+ - en
+ ---
+
+ <div style="display: flex; justify-content: center;">
+     <div style="display: flex; align-items: center; gap: 10px;">
+         <img src="logo.webp" alt="MongoDB Logo" style="height: 36px; width: auto; border-radius: 4px;">
+         <span style="font-size: 32px; font-weight: bold">MongoDB/mdbr-leaf-ir-asym</span>
+     </div>
+ </div>
+
+ # Contents
+
+ 1. [Introduction](#introduction)
+ 2. [Technical Report](#technical-report)
+ 3. [Highlights](#highlights)
+ 4. [Benchmarks](#benchmark-comparison)
+ 5. [Quickstart](#quickstart)
+ 6. [Citation](#citation)
+
+ # Introduction
+
+ `mdbr-leaf-ir-asym` is a compact, high-performance text embedding model specifically designed for **information retrieval (IR)** tasks, e.g., the retrieval stage of Retrieval-Augmented Generation (RAG) pipelines.
+
+ This model is the asymmetric variant of [`mdbr-leaf-ir`](https://huggingface.co/MongoDB/mdbr-leaf-ir): it uses `mdbr-leaf-ir` to encode queries and [`Snowflake/snowflake-arctic-embed-m-v1.5`](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5) to encode documents. The model is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl-truncation).
+
+ If you are looking to perform other tasks such as classification, clustering, semantic sentence similarity, or summarization, please check out our [`mdbr-leaf-mt`](https://huggingface.co/MongoDB/mdbr-leaf-mt) model.
+
+ > [!NOTE]
+ > **Note**: This model was developed by the ML team of MongoDB Research. At the time of writing, it is not used in any of MongoDB's commercial product or service offerings.
+
+ # Technical Report
+
+ A technical report detailing our proposed `LEAF` training procedure is [available here](https://arxiv.org/abs/2509.12539).
+
+ # Highlights
+
+ * **State-of-the-Art Performance**: `mdbr-leaf-ir-asym` achieves state-of-the-art results for compact embedding models, **ranking #1** on the public [BEIR benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for models with ≤100M parameters.
+ * **Flexible Architecture Support**: `mdbr-leaf-ir-asym` uses an asymmetric retrieval architecture, pairing the compact `leaf` query encoder with a larger document encoder for even better retrieval quality.
+ * **MRL and Quantization Support**: embedding vectors generated by `mdbr-leaf-ir-asym` compress well when truncated (MRL) and can be stored using more efficient types like `int8` and `binary`. [See below](#mrl-truncation) for more information.
+
+ ## Benchmark Comparison
+
+ The table below shows the average BEIR benchmark scores (nDCG@10) for `mdbr-leaf-ir-asym` compared to other retrieval models.
+
+ `mdbr-leaf-ir` ranks #1 on the public BEIR leaderboard; when run in asymmetric "**(asym.)**" mode, its results improve even further.
+
+ | Model | Size | BEIR Avg. (nDCG@10) |
+ |------------------------------------|---------|---------------------|
+ | OpenAI text-embedding-3-large | Unknown | 55.43 |
+ | **mdbr-leaf-ir (asym.)** | 23M | **54.03** |
+ | **mdbr-leaf-ir** | 23M | **53.55** |
+ | snowflake-arctic-embed-s | 32M | 51.98 |
+ | bge-small-en-v1.5 | 33M | 51.65 |
+ | OpenAI text-embedding-3-small | Unknown | 51.08 |
+ | granite-embedding-small-english-r2 | 47M | 50.87 |
+ | snowflake-arctic-embed-xs | 23M | 50.15 |
+ | e5-small-v2 | 33M | 49.04 |
+ | SPLADE++ | 110M | 48.88 |
+ | MiniLM-L6-v2 | 23M | 41.95 |
+ | BM25 | – | 41.14 |
+
+ # Quickstart
+
+ ## Sentence Transformers
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Load the model
+ model = SentenceTransformer("MongoDB/mdbr-leaf-ir-asym")
+
+ # Example queries and documents
+ queries = [
+     "What is machine learning?",
+     "How does neural network training work?",
+ ]
+
+ documents = [
+     "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
+     "Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.",
+ ]
+
+ # Encode queries and documents
+ query_embeddings = model.encode_query(queries)
+ document_embeddings = model.encode_document(documents)
+
+ # Compute similarity scores
+ scores = model.similarity(query_embeddings, document_embeddings)
+
+ # Print results
+ for i, query in enumerate(queries):
+     print(f"Query: {query}")
+     for j, doc in enumerate(documents):
+         print(f"  Similarity: {scores[i, j]:.4f} | Document {j}: {doc[:80]}...")
+
+ # Query: What is machine learning?
+ # Similarity: 0.6729 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorith...
+ # Similarity: 0.4472 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimi...
+
+ # Query: How does neural network training work?
+ # Similarity: 0.4080 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorith...
+ # Similarity: 0.5477 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimi...
+ ```
+
+ ## Transformers Usage
+
+ See the full example notebook [here](https://huggingface.co/MongoDB/mdbr-leaf-ir/blob/main/transformers_example.ipynb).
+
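+ Below is a minimal sketch of document-side encoding with plain `transformers` (not the notebook's exact code), assuming the subfolder layout added in this commit. Per `document_1_Pooling/config.json` and `modules.json`, document embeddings use CLS pooling followed by L2 normalization; the query tower additionally applies mean pooling and a 384→768 `Dense` projection (see `query_2_Dense/config.json`), so use the notebook for the full pipeline:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoModel, AutoTokenizer
+
+ repo = "MongoDB/mdbr-leaf-ir-asym"
+ # Load the document tower directly from its subfolder in this repo
+ tokenizer = AutoTokenizer.from_pretrained(repo, subfolder="document_0_Transformer")
+ model = AutoModel.from_pretrained(repo, subfolder="document_0_Transformer")
+
+ docs = ["Machine learning is a subset of artificial intelligence."]
+ batch = tokenizer(docs, padding=True, truncation=True, return_tensors="pt")
+ with torch.no_grad():
+     out = model(**batch)
+
+ # CLS pooling + L2 normalization, mirroring document_1_Pooling and Normalize
+ doc_embeds = F.normalize(out.last_hidden_state[:, 0], p=2, dim=1)
+ print(doc_embeds.shape)  # (1, 768)
+ ```
+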
+ ## Asymmetric Retrieval Setup
+
+ `mdbr-leaf-ir` is *aligned* to [`snowflake-arctic-embed-m-v1.5`](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5), the model it was distilled from. This enables flexible architectures in which, for example, documents are encoded with the larger model while queries are encoded faster and more efficiently with the compact `leaf` model. This generally outperforms the symmetric setup, in which both queries and documents are encoded with `leaf`.
+
+ To use the `leaf` model exclusively, use [`mdbr-leaf-ir`](https://huggingface.co/MongoDB/mdbr-leaf-ir).
+
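+ For illustration, here is a minimal sketch of this asymmetric setup with two separate `SentenceTransformer` instances (assuming sentence-transformers v5-style `encode_query`/`encode_document`); the two embedding spaces are compatible because `leaf` is teacher-aligned:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Compact leaf model for fast, frequent query encoding ...
+ query_model = SentenceTransformer("MongoDB/mdbr-leaf-ir")
+ # ... and the larger teacher model for one-off document encoding.
+ doc_model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v1.5")
+
+ query_embeds = query_model.encode_query(["What is machine learning?"])
+ doc_embeds = doc_model.encode_document([
+     "Machine learning is a subset of artificial intelligence that learns from data.",
+ ])
+
+ # Cosine similarity across the two aligned embedding spaces
+ print(query_model.similarity(query_embeds, doc_embeds))
+ ```
+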
+ ## MRL Truncation
+
+ Embeddings have been trained via [MRL](https://arxiv.org/abs/2205.13147) and can be truncated for more efficient storage:
+ ```python
+ query_embeds = model.encode_query(queries, truncate_dim=256)
+ doc_embeds = model.encode_document(documents, truncate_dim=256)
+
+ similarities = model.similarity(query_embeds, doc_embeds)
+
+ print('After MRL:')
+ print(f"* Embeddings dimension: {query_embeds.shape[1]}")
+ print(f"* Similarities:\n{similarities}")
+
+ # After MRL:
+ # * Embeddings dimension: 256
+ # * Similarities:
+ # tensor([[0.7027, 0.4943],
+ #         [0.4388, 0.5820]])
+ ```
+
+ ## Vector Quantization
+
+ Vector quantization, for example to `int8` or `binary`, can be performed as follows:
+
+ **Note**: For vector quantization to types other than `binary`, we suggest performing a calibration to determine the optimal ranges, [see here](https://sbert.net/examples/sentence_transformer/applications/embedding-quantization/README.html#scalar-int8-quantization).
+ Good initial values, according to the [teacher model's documentation](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5#compressing-to-128-bytes), are:
+ * `int8`: -0.3 and +0.3
+ * `int4`: -0.18 and +0.18
+ ```python
+ from sentence_transformers.quantization import quantize_embeddings
+ import torch
+
+ # Route queries and documents through their respective towers
+ query_embeds = model.encode_query(queries)
+ doc_embeds = model.encode_document(documents)
+
+ # Quantize embeddings to int8 using -0.3 and +0.3 as calibration ranges
+ ranges = torch.tensor([[-0.3], [+0.3]]).expand(2, query_embeds.shape[1]).cpu().numpy()
+ query_embeds = quantize_embeddings(query_embeds, "int8", ranges=ranges)
+ doc_embeds = quantize_embeddings(doc_embeds, "int8", ranges=ranges)
+
+ # Calculate similarities; cast to int64 to avoid under/overflow
+ similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T
+
+ print('After quantization:')
+ print(f"* Embeddings type: {query_embeds.dtype}")
+ print(f"* Similarities:\n{similarities}")
+
+ # After quantization:
+ # * Embeddings type: int8
+ # * Similarities:
+ # [[118022  79111]
+ #  [ 72961  98333]]
+ ```
+
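+ Binary quantization requires no calibration ranges. A minimal sketch (not from the original card), assuming the model's 768-dimensional output and ranking by Hamming distance:
+
+ ```python
+ import numpy as np
+ from sentence_transformers.quantization import quantize_embeddings
+
+ query_embeds = model.encode_query(queries)
+ doc_embeds = model.encode_document(documents)
+
+ # "ubinary" packs the sign bit of each of the 768 dimensions into 96 uint8 values
+ query_bin = quantize_embeddings(query_embeds, "ubinary")
+ doc_bin = quantize_embeddings(doc_embeds, "ubinary")
+
+ # Score by negative Hamming distance: XOR the packed bits, then count set bits
+ xor = np.bitwise_xor(query_bin[:, None, :], doc_bin[None, :, :])
+ scores = -np.unpackbits(xor, axis=-1).sum(axis=-1)
+ print(scores)  # higher (less negative) = more similar
+ ```
+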
+ # Evaluation
+
+ Please [see here](https://huggingface.co/MongoDB/mdbr-leaf-ir/blob/main/evaluate_models.ipynb).
+
+ # Citation
+
+ If you use this model in your work, please cite:
+
+ ```bibtex
+ @misc{mdbr_leaf,
+   title={LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations},
+   author={Robin Vujanic and Thomas Rueckstiess},
+   year={2025},
+   eprint={2509.12539},
+   archivePrefix={arXiv},
+   primaryClass={cs.IR},
+   url={https://arxiv.org/abs/2509.12539},
+ }
+ ```
+
+ # License
+
+ This model is released under the Apache 2.0 license.
+
+ # Contact
+
+ For questions or issues, please open an issue or pull request. You can also contact the MongoDB ML research team at [email protected].
+
+ # Acknowledgments
+
+ This model was created by @tomaarsen; we thank him for his contribution to this project.
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.1.0",
+     "transformers": "4.56.1",
+     "pytorch": "2.8.0+cu126"
+   },
+   "prompts": {
+     "query": "Represent this sentence for searching relevant passages: ",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
document_0_Transformer/config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "dtype": "float32",
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "matryoshka_dimensions": [
+     256
+   ],
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "transformers_version": "4.56.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
document_0_Transformer/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fbc309c04977a2cfd24bc111354eeb102ade3eaf2e881d4edc8b99d67725660f
+ size 134
document_0_Transformer/sentence_bert_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false,
+   "model_args": {
+     "add_pooling_layer": false
+   }
+ }
document_0_Transformer/special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
document_0_Transformer/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
document_0_Transformer/tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
document_0_Transformer/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
document_1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
logo.png ADDED
logo.webp ADDED
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Router"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
query_0_Transformer/config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "dtype": "float32",
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "transformers_version": "4.56.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
query_0_Transformer/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ad488006ff63f39aa5a080ba7ceb59174e0c8d2e42ffe6d1dec42d19a48787b4
+ size 133
query_0_Transformer/sentence_bert_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false,
+   "model_args": {
+     "add_pooling_layer": false
+   }
+ }
query_0_Transformer/special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
query_0_Transformer/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
query_0_Transformer/tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 128,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
query_0_Transformer/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
query_1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
query_2_Dense/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "in_features": 384,
+   "out_features": 768,
+   "bias": true,
+   "activation_function": "torch.nn.modules.linear.Identity"
+ }
query_2_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1881669a55f4e4bb33b715c522020fc322f2bc4cf2a96ec9a327b902cff0d582
+ size 132
router_config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "types": {
+     "query_0_Transformer": "sentence_transformers.models.Transformer.Transformer",
+     "query_1_Pooling": "sentence_transformers.models.Pooling.Pooling",
+     "query_2_Dense": "sentence_transformers.models.Dense.Dense",
+     "document_0_Transformer": "sentence_transformers.models.Transformer.Transformer",
+     "document_1_Pooling": "sentence_transformers.models.Pooling.Pooling"
+   },
+   "structure": {
+     "query": [
+       "query_0_Transformer",
+       "query_1_Pooling",
+       "query_2_Dense"
+     ],
+     "document": [
+       "document_0_Transformer",
+       "document_1_Pooling"
+     ]
+   },
+   "parameters": {
+     "default_route": "document",
+     "allow_empty_key": true
+   }
+ }