SentenceTransformer based on answerdotai/ModernBERT-base
This is a sentence-transformers model fine-tuned from answerdotai/ModernBERT-base on the msmarco, natural_questions, gooaq, ccnews, and hotpotqa datasets. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: answerdotai/ModernBERT-base
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Datasets: msmarco, natural_questions, gooaq, ccnews, hotpotqa
- Language: en
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
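The Pooling module has only pooling_mode_mean_tokens enabled, so a sentence embedding is the average of its token embeddings, and embeddings are compared with cosine similarity. A minimal NumPy sketch of those two steps follows, assuming padded positions are excluded through an attention mask. The names `mean_pool` and `cosine_similarity` and the toy arrays are illustrative and not part of the library.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T


# Toy input: 2 sequences, 3 token positions, 4 dimensions (the model uses 768).
tokens = np.array(
    [
        [[1.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],
        [[0.0, 2.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0], [0.0, 6.0, 0.0, 0.0]],
    ]
)
mask = np.array([[1, 1, 0], [1, 1, 1]])  # last token of the first sequence is padding

pooled = mean_pool(tokens, mask)
scores = cosine_similarity(pooled, pooled)
```

The padded token in the first sequence does not affect its pooled vector, which is why the mask matters when batches mix short and long inputs.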
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer

# Download the model from the Hugging Face Hub
model = SentenceTransformer("hotchpotch/ModernBERT-embedding-CMNRL")

# One query and three candidate documents
queries = [
    "what is the best paying engineering job",
]
documents = [
    "The 20 highest-paying jobs for engineering majors. Engineering jobs pay well. To find out just how lucrative they really are, we turned to PayScale, the creator of the world's largest compensation database. To find the 20 highest-paying jobs for engineering majors, PayScale first identified the most common jobs for those with a bachelor's degree (and nothing more) who work full-time in the US. Chief architects and vice president's of business development topped the list, both earning an impressive $151,000 a year.",
    'Aviation is a combat arms branch which encompasses 80 percent of the commissioned officer operational flying positions within the Army (less those in Aviation Material Management and Medical Service Corps).',
    'Depending on the thickness and size of the chop, it can take anywhere from eight to 30 minutes. Here’s a helpful cooking chart and some tips to achieve delicious pork chops every time. Pork chops are a crowd pleaser, especially once you master your grilling technique. For safe consumption, it’s recommended to cook pork until it reaches an internal temperature of 145°F or 65°C. Depending on the cut and thickness of your chop, the time it may take to reach this can vary. To make sure your chops are the right temperature, use a digital meat thermometer.',
]

# Encode queries and documents into 768-dimensional embeddings
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Cosine similarity between each query and each document
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
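`model.similarity` returns a queries-by-documents score matrix. For semantic search, the usual next step is to pick the highest-scoring documents for each query. The sketch below shows that step with NumPy, using a made-up score matrix in place of real model output. The helper name `top_k` and the numbers are assumptions for illustration.

```python
import numpy as np


def top_k(similarities: np.ndarray, k: int) -> list[list[tuple[int, float]]]:
    """Return, for each query row, the k best (document_index, score) pairs."""
    # Sort each row by descending score and keep the first k column indices.
    order = np.argsort(-similarities, axis=1)[:, :k]
    return [
        [(int(doc), float(similarities[row, doc])) for doc in docs]
        for row, docs in enumerate(order)
    ]


# Hypothetical scores for 1 query against 3 documents (not real model output).
example_scores = np.array([[0.62, 0.08, 0.11]])
ranking = top_k(example_scores, k=2)
```

With real output, convert the returned tensor to a NumPy array first (for example with `.cpu().numpy()`) before passing it to a helper like this.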
Evaluation
Metrics
Information Retrieval
- Datasets: NanoClimateFEVER, NanoDBPedia, NanoFEVER, NanoFiQA2018, NanoHotpotQA, NanoMSMARCO, NanoNFCorpus, NanoNQ, NanoQuoraRetrieval, NanoSCIDOCS, NanoArguAna, NanoSciFact and NanoTouche2020
- Evaluated with InformationRetrievalEvaluator
| Metric | NanoClimateFEVER | NanoDBPedia | NanoFEVER | NanoFiQA2018 | NanoHotpotQA | NanoMSMARCO | NanoNFCorpus | NanoNQ | NanoQuoraRetrieval | NanoSCIDOCS | NanoArguAna | NanoSciFact | NanoTouche2020 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| cosine_accuracy@10 | 0.68 | 0.94 | 0.98 | 0.74 | 0.9 | 0.82 | 0.64 | 0.84 | 0.98 | 0.82 | 0.86 | 0.78 | 0.9592 |
| cosine_precision@10 | 0.092 | 0.396 | 0.102 | 0.118 | 0.128 | 0.082 | 0.262 | 0.09 | 0.132 | 0.172 | 0.086 | 0.088 | 0.4143 |
| cosine_recall@10 | 0.3807 | 0.2744 | 0.9333 | 0.5585 | 0.64 | 0.82 | 0.1304 | 0.8 | 0.9693 | 0.3527 | 0.86 | 0.77 | 0.284 |
| cosine_ndcg@10 | 0.3249 | 0.5073 | 0.8029 | 0.4646 | 0.6343 | 0.5555 | 0.3071 | 0.6336 | 0.9391 | 0.322 | 0.5328 | 0.6297 | 0.4754 |
| cosine_mrr@10 | 0.421 | 0.7423 | 0.7769 | 0.529 | 0.8229 | 0.4713 | 0.4497 | 0.5863 | 0.94 | 0.4662 | 0.4286 | 0.5945 | 0.7045 |
| cosine_map@10 | 0.2431 | 0.3807 | 0.7485 | 0.3803 | 0.5439 | 0.4713 | 0.2273 | 0.5725 | 0.9192 | 0.2136 | 0.4286 | 0.5785 | 0.3266 |
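To make the @10 metrics concrete, the sketch below computes several of them for a single query from a ranked list of document IDs and a set of relevant IDs, assuming binary relevance. It is a plain-Python illustration of the standard definitions, not the InformationRetrievalEvaluator code, and the function name and toy IDs are hypothetical. The reported scores would then be averages of such per-query values.

```python
import math


def metrics_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 10) -> dict[str, float]:
    """Binary-relevance retrieval metrics for one query, cut off at rank k."""
    top = ranked_ids[:k]
    hits = [doc_id in relevant_ids for doc_id in top]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant document (0 if none in the top k).
    mrr = next((1.0 / (rank + 1) for rank, hit in enumerate(hits) if hit), 0.0)

    # DCG discounts each hit by log2(rank + 1), with ranks starting at 1.
    dcg = sum(1.0 / math.log2(rank + 2) for rank, hit in enumerate(hits) if hit)
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))

    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids) if relevant_ids else 0.0,
        "mrr": mrr,
        "ndcg": dcg / idcg if idcg > 0 else 0.0,
    }


# Toy query: two relevant documents, retrieved at ranks 2 and 5.
result = metrics_at_k(["d7", "d1", "d9", "d4", "d3"], {"d1", "d3"}, k=5)
```

Under these definitions, precision@10 cannot exceed 0.1 when a query has a single relevant document. That would be consistent with NanoMSMARCO, where accuracy and recall are both 0.82 while precision is 0.082.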
Nano BEIR
- Dataset: NanoBEIR_mean
- Evaluated with NanoBEIREvaluator with these parameters:
{
    "dataset_names": [
        "climatefever",
        "dbpedia",
        "fever",
        "fiqa2018",
        "hotpotqa",
        "msmarco",
        "nfcorpus",
        "nq",
        "quoraretrieval",
        "scidocs",
        "arguana",
        "scifact",
        "touche2020"
    ],
    "dataset_id": "sentence-transformers/NanoBEIR-en"
}
| Metric | Value |
| --- | --- |
| cosine_accuracy@10 | 0.8415 |
| cosine_precision@10 | 0.1663 |
| cosine_recall@10 | 0.5979 |
| cosine_ndcg@10 | 0.5484 |
| cosine_mrr@10 | 0.6102 |
| cosine_map@10 | 0.4642 |
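The NanoBEIR_mean values appear to be the unweighted average of the 13 per-dataset scores. As a quick check under that assumption, the snippet below averages the cosine_ndcg@10 and cosine_accuracy@10 values copied from the Information Retrieval table and rounds to four decimals. The variable names are illustrative.

```python
from statistics import mean

# Per-dataset scores copied from the Information Retrieval table,
# in the order the datasets are listed there.
ndcg_at_10 = [0.3249, 0.5073, 0.8029, 0.4646, 0.6343, 0.5555, 0.3071,
              0.6336, 0.9391, 0.3220, 0.5328, 0.6297, 0.4754]
accuracy_at_10 = [0.68, 0.94, 0.98, 0.74, 0.9, 0.82, 0.64,
                  0.84, 0.98, 0.82, 0.86, 0.78, 0.9592]

mean_ndcg = round(mean(ndcg_at_10), 4)
mean_accuracy = round(mean(accuracy_at_10), 4)
print(mean_ndcg, mean_accuracy)
```

Both results match the reported means (0.5484 and 0.8415), so each Nano dataset contributes equally regardless of its size.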
Training Details
Training Datasets
msmarco
- Dataset: msmarco at 84ed2d3
- Size: 502,939 training samples
- Columns: query and positive
- Approximate statistics based on the first 1000 samples:
| | query | positive |
| --- | --- | --- |
| type | string | string |
| details | min: 4 tokens, mean: 9.26 tokens, max: 25 tokens | min: 19 tokens, mean: 80.68 tokens, max: 230 tokens |
- Samples:
| query | positive |
| --- | --- |
| is cabinet refacing worth the cost? | Fans of refacing say this mini-makeover can give a kitchen a whole new look at a much lower cost than installing all-new cabinets. Cabinet refacing can save up to 50 percent compared to the cost of replacing, says Cheryl Catalano, owner of Kitchen Solvers, a cabinet refacing franchise in Napierville, Illinois. From. |
| is the fovea ethmoidalis a bone | Ethmoid bone/fovea ethmoidalis. The medial portion of the ethmoid bone is a cruciate membranous bone composed of the crista galli, cribriform plate, and perpendicular ethmoidal plate. The crista is a thick piece of bone, shaped like a “cock's comb,” that projects intracranially and attaches to the falx cerebri. |
| average pitches per inning | The likelihood of a pitcher completing nine innings if he throws an average of 14 pitches or less per inning is reinforced by the totals of the 89 games in which pitchers did actually complete nine innings of work. |
- Loss: CachedMultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 128,
    "gather_across_devices": false
}
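These loss parameters describe an in-batch-negatives objective: every query is scored against every positive in the batch with cosine similarity, the scores are multiplied by `scale` (20.0 here), and cross-entropy pushes each query toward its own positive. The NumPy sketch below shows only that objective under those assumptions. It leaves out the gradient caching and mini-batching (`mini_batch_size`) that the cached variant uses to fit large batches in memory, and the function name is illustrative rather than the library's implementation.

```python
import numpy as np


def in_batch_negatives_loss(queries: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities; row i's target is column i."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (q @ p.T)  # (batch, batch) similarity matrix

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching positive for query i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 16))  # random stand-ins for real embeddings

# Each query paired with an identical positive: the loss should be small.
aligned_loss = in_batch_negatives_loss(batch, batch)
# Positives reversed so every query is paired with the wrong row: the loss grows.
shuffled_loss = in_batch_negatives_loss(batch, batch[::-1].copy())
```

Because every other positive in the batch acts as a negative, a larger batch gives each query more negatives to contrast against, which is the motivation for the cached variant's memory savings.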
natural_questions
- Dataset: natural_questions at f9e894e
- Size: 100,231 training samples
- Columns: query and positive
- Approximate statistics based on the first 1000 samples:
| | query | positive |
| --- | --- | --- |
| type | string | string |
| details | min: 10 tokens, mean: 12.46 tokens, max: 22 tokens | min: 12 tokens, mean: 137.8 tokens, max: 512 tokens |
- Samples:
| query | positive |
| --- | --- |
| difference between russian blue and british blue cat | Russian Blue The coat is known as a "double coat", with the undercoat being soft, downy and equal in length to the guard hairs, which are an even blue with silver tips. However, the tail may have a few very dull, almost unnoticeable stripes. The coat is described as thick, plush and soft to the touch. The feeling is softer than the softest silk. The silver tips give the coat a shimmering appearance. Its eyes are almost always a dark and vivid green. Any white patches of fur or yellow eyes in adulthood are seen as flaws in show cats.[3] Russian Blues should not be confused with British Blues (which are not a distinct breed, but rather a British Shorthair with a blue coat as the British Shorthair breed itself comes in a wide variety of colors and patterns), nor the Chartreux or Korat which are two other naturally occurring breeds of blue cats, although they have similar traits. |
| who played the little girl on mrs doubtfire | Mara Wilson Mara Elizabeth Wilson[2] (born July 24, 1987) is an American writer and former child actress. She is known for playing Natalie Hillard in Mrs. Doubtfire (1993), Susan Walker in Miracle on 34th Street (1994), Matilda Wormwood in Matilda (1996) and Lily Stone in Thomas and the Magic Railroad (2000). Since retiring from film acting, Wilson has focused on writing. |
| what year did the movie the sound of music come out | The Sound of Music (film) The film was released on March 2, 1965 in the United States, initially as a limited roadshow theatrical release. Although critical response to the film was widely mixed, the film was a major commercial success, becoming the number one box office movie after four weeks, and the highest-grossing film of 1965. By November 1966, The Sound of Music had become the highest-grossing film of all-time—surpassing Gone with the Wind—and held that distinction for five years. The film was just as popular throughout the world, breaking previous box-office records in twenty-nine countries. Following an initial theatrical release that lasted four and a half years, and two successful re-releases, the film sold 283 million admissions worldwide and earned a total worldwide gross of $286,000,000. |
- Loss: CachedMultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 128,
    "gather_across_devices": false
}
gooaq
- Dataset: gooaq at b089f72
- Size: 3,012,496 training samples
- Columns: query and positive
- Approximate statistics based on the first 1000 samples:
| | query | positive |
| --- | --- | --- |
| type | string | string |
| details | min: 8 tokens, mean: 12.05 tokens, max: 21 tokens | min: 13 tokens, mean: 59.08 tokens, max: 116 tokens |
- Samples:
| query | positive |
| --- | --- |
| how do i program my directv remote with my tv? | ['Press MENU on your remote.', 'Select Settings & Help > Settings > Remote Control > Program Remote.', 'Choose the device (TV, audio, DVD) you wish to program. ... ', 'Follow the on-screen prompts to complete programming.'] |
| are rodrigues fruit bats nocturnal? | Before its numbers were threatened by habitat destruction, storms, and hunting, some of those groups could number 500 or more members. Sunrise, sunset. Rodrigues fruit bats are most active at dawn, at dusk, and at night. |
| why does your heart rate increase during exercise bbc bitesize? | During exercise there is an increase in physical activity and muscle cells respire more than they do when the body is at rest. The heart rate increases during exercise. The rate and depth of breathing increases - this makes sure that more oxygen is absorbed into the blood, and more carbon dioxide is removed from it. |
- Loss: CachedMultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 128,
    "gather_across_devices": false
}
ccnews
- Dataset: ccnews at 6118cc0
- Size: 614,664 training samples
- Columns: query and positive
- Approximate statistics based on the first 1000 samples:
| | query | positive |
| --- | --- | --- |
| type | string | string |
| details | min: 7 tokens, mean: 16.71 tokens, max: 56 tokens | min: 18 tokens, mean: 349.3 tokens, max: 512 tokens |
- Samples:
| query | positive |
| --- | --- |
| Rupee rises for 2nd consecutive day, gains 8 paise against US dollar today | The rupee rose 8 paise to close at 64.37 apiece US dollar at the interbank foreign exchange market today. The Indian rupee appreciated for the second consecutive day and gained over 8 paise against the US dollar on Monday. The domestic currency opened unchanged today, very quickly edged higher and extended the gains to hit a day’s high of 64.34. The rupee rose 8 paise to close at 64.37 apiece US dollar at the interbank foreign exchange market today. The Reserve Bank of India fixed the reference rate of the rupee at 64.3616 against the US dollar on Monday. The Indian rupee moved up 23 paise against the US dollar in just 2 days as Narendra Modi led BJP is most likely to conquer Gujarat for the fifth consecutive time in the state elections. Way back in March 2017, the rupee appreciated as much as 79 paise in a single day to close at a 16-month high against the US dollar after Bharatiya Janata Party’s landslide victory in Uttar Pradesh state elections. Finance Minister Arun Jaitley is all ... |
| Microsoft pushes for ‘Digital Geneva Convention’ for cybercrimes | Technology companies, he added, need to preserve trust and stability online by pledging neutrality in cyber conflict. ( Image for representation, Source: Reuters) Technology companies, he added, need to preserve trust and stability online by pledging neutrality in cyber conflict. ( Image for representation, Source: Reuters) Microsoft President Brad Smith on Tuesday pressed the world’s governments to form an international body to protect civilians from state-sponsored hacking, saying recent high-profile attacks showed a need for global norms to police government activity in cyberspace. Countries need to develop and abide by global rules for cyber attacks similar to those established for armed conflict at the 1949 Geneva Convention that followed World War Two, Smith said. Technology companies, he added, need to preserve trust and stability online by pledging neutrality in cyber conflict. Watch all our videos from Express Technology “We need a Digital Geneva Convention that will commit go... |
| Prince Gets Purple Pantone Color ‘Love Symbol #2’ | By Abby Hassler Prince, also known as “The Purple One” is finally getting his very own Pantone color. Pantone and Prince’s Estate announced today (August 14) that the late singer has his own purple hue, “Love Symbol #2,” which is named after the iconic symbol the singer used as an emblem for his name. Related: Wesley Snipes Beat Out Prince for His Role in Michael Jackson’s ‘Bad’ “The color purple was synonymous with who Prince was and will always be. This is an incredible way for his legacy to live on forever,” Troy Carter, entertainment adviser to Prince’s Estate, said. “We are honored to have worked on the development of Love Symbol #2, a distinctive new purple shade created in memory of Prince, ‘the purple one,'” added Laurie Pressman, vice president of the Pantone Color Institute. “A musical icon known for his artistic brilliance, Love Symbol #2 is emblematic of Prince’s distinctive style. Long associated with the purple family, Love Symbol #2 enables Prince’s unique purple shade t... |
- Loss: CachedMultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 128,
    "gather_across_devices": false
}
hotpotqa
- Dataset: hotpotqa at f07d3cd
- Size: 84,516 training samples
- Columns: query and positive
- Approximate statistics based on the first 1000 samples:
| | query | positive |
| --- | --- | --- |
| type | string | string |
| details | min: 8 tokens, mean: 25.82 tokens, max: 140 tokens | min: 18 tokens, mean: 103.34 tokens, max: 350 tokens |
- Samples:
| query | positive |
| --- | --- |
| Which magazine covers a wider range of topics, Decibel or Paper? | Decibel (magazine) Decibel is a monthly heavy metal magazine published by the Philadelphia-based Red Flag Media since October 2004. Its sections include Upfront, Features, Reviews, Guest Columns and the Decibel Hall of Fame. The magazine's tag-line is currently "Extremely Extreme" (previously "The New Noise"); the editor-in-chief is Albert Mudrian. |
| what bbc drama features such actors as Sian Reeves and Ben Daniels? | Siân Reeves Siân Reeves (born Siân Rivers on May 9, 1966 in West Bromwich) is a British actress, most famous for playing the role of Sydney Henshall in the BBC drama "Cutting It", and for playing villain Sally Spode in "Emmerdale". |
| What size population does the County Connection public transit in Concord, California service? | County Connection The County Connection (officially, the Central Contra Costa Transit Authority, CCCTA) is a Concord-based public transit agency operating fixed-route bus and ADA paratransit (County Connection LINK) service in and around central Contra Costa County in the San Francisco Bay Area. Established in 1980 as a joint powers authority, CCCTA assumed control of public bus service within central Contra Costa first begun by Oakland-based AC Transit as it expanded into suburban Contra Costa County in the mid-1970s (especially after the opening of BART). |
- Loss: CachedMultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 128,
    "gather_across_devices": false
}
Training Hyperparameters
Non-Default Hyperparameters
per_device_train_batch_size: 8192
per_device_eval_batch_size: 512
learning_rate: 0.0001
weight_decay: 0.01
num_train_epochs: 1
lr_scheduler_type: cosine
warmup_ratio: 0.1
seed: 12
bf16: True
dataloader_drop_last: True
dataloader_num_workers: 12
dataloader_prefetch_factor: 2
remove_unused_columns: False
optim: adamw_torch
batch_sampler: no_duplicates
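With `lr_scheduler_type: cosine` and `warmup_ratio: 0.1`, the learning rate ramps up linearly over roughly the first 10% of steps and then follows half a cosine wave down toward zero. The following is a sketch of that shape, not the trainer's scheduler code. `total_steps=525` is an assumption chosen to be consistent with the epoch fractions in the training log, and `lr_at_step` is a hypothetical helper.

```python
import math


def lr_at_step(step: int, total_steps: int = 525, peak_lr: float = 1e-4,
               warmup_ratio: float = 0.1) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Sample the schedule at a few evenly spaced steps.
schedule = [lr_at_step(s) for s in range(0, 526, 105)]
```

Under these assumptions the peak of 0.0001 is reached at step 53, so the steep early drop in training loss happens largely during warmup.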
All Hyperparameters
overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 8192
per_device_eval_batch_size: 512
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 0.0001
weight_decay: 0.01
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 12
data_seed: None
jit_mode_eval: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: True
dataloader_num_workers: 12
dataloader_prefetch_factor: 2
past_index: -1
disable_tqdm: False
remove_unused_columns: False
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}
Training Logs
| Epoch | Step | Training Loss | NanoClimateFEVER_cosine_ndcg@10 | NanoDBPedia_cosine_ndcg@10 | NanoFEVER_cosine_ndcg@10 | NanoFiQA2018_cosine_ndcg@10 | NanoHotpotQA_cosine_ndcg@10 | NanoMSMARCO_cosine_ndcg@10 | NanoNFCorpus_cosine_ndcg@10 | NanoNQ_cosine_ndcg@10 | NanoQuoraRetrieval_cosine_ndcg@10 | NanoSCIDOCS_cosine_ndcg@10 | NanoArguAna_cosine_ndcg@10 | NanoSciFact_cosine_ndcg@10 | NanoTouche2020_cosine_ndcg@10 | NanoBEIR_mean_cosine_ndcg@10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.0190 | 10 | 8.226 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.0381 | 20 | 5.503 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.0571 | 30 | 3.4245 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.0762 | 40 | 1.907 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.0952 | 50 | 1.3564 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1143 | 60 | 1.1161 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1333 | 70 | 1.0269 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1524 | 80 | 0.804 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1714 | 90 | 0.7459 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1905 | 100 | 0.6271 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2095 | 110 | 0.8254 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2286 | 120 | 0.7112 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2476 | 130 | 0.6292 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2667 | 140 | 0.6022 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2857 | 150 | 0.782 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3048 | 160 | 0.5896 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3238 | 170 | 0.6357 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3429 | 180 | 0.6329 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3619 | 190 | 0.7885 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3810 | 200 | 0.484 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4 | 210 | 0.5834 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4190 | 220 | 0.5229 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4381 | 230 | 0.5112 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4571 | 240 | 0.4973 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4762 | 250 | 0.5582 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4952 | 260 | 0.437 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5143 | 270 | 0.5495 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5333 | 280 | 0.5378 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5524 | 290 | 0.4802 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5714 | 300 | 0.5221 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5905 | 310 | 0.5243 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6095 | 320 | 0.4762 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6286 | 330 | 0.571 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6476 | 340 | 0.465 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6667 | 350 | 0.5644 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6857 | 360 | 0.5494 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7048 | 370 | 0.5148 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7238 | 380 | 0.5109 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7429 | 390 | 0.5357 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7619 | 400 | 0.4638 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7810 | 410 | 0.403 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8 | 420 | 0.5423 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8190 | 430 | 0.4469 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8381 | 440 | 0.5935 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8571 | 450 | 0.3879 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8762 | 460 | 0.5288 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8952 | 470 | 0.5372 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9143 | 480 | 0.4814 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9333 | 490 | 0.4817 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9524 | 500 | 0.3893 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9714 | 510 | 0.434 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9905 | 520 | 0.3894 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0 | 521 | - | 0.3249 | 0.5073 | 0.8029 | 0.4646 | 0.6343 | 0.5555 | 0.3071 | 0.6336 | 0.9391 | 0.3220 | 0.5328 | 0.6297 | 0.4754 | 0.5484 |
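The Epoch column can be related to the dataset sizes and batch size with a little arithmetic. The sketch below assumes a single device, `gradient_accumulation_steps` of 1, and that each batch is drawn from one dataset, so that `dataloader_drop_last: True` discards one partial batch per dataset. The variable and function names are illustrative.

```python
# Training-set sizes as listed under Training Datasets.
dataset_sizes = {
    "msmarco": 502_939,
    "natural_questions": 100_231,
    "gooaq": 3_012_496,
    "ccnews": 614_664,
    "hotpotqa": 84_516,
}
batch_size = 8192  # per_device_train_batch_size; a single device is assumed

# With dataloader_drop_last=True, each dataset contributes only full batches.
batches_per_dataset = {name: size // batch_size for name, size in dataset_sizes.items()}
steps_per_epoch = sum(batches_per_dataset.values())


def epoch_fraction(step: int) -> float:
    """Fraction of the epoch completed after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch, epoch_fraction(10), epoch_fraction(520))
```

This estimate gives 525 steps per epoch, which reproduces the logged fractions such as 0.0190 at step 10 and 0.9905 at step 520. The final evaluation is logged at step 521, slightly below the estimate, possibly because the no_duplicates sampler can discard additional batches.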
Framework Versions
- Python: 3.11.14
- Sentence Transformers: 5.3.0.dev0
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu129
- Accelerate: 1.12.0
- Datasets: 4.4.1
- Tokenizers: 0.22.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
CachedMultipleNegativesRankingLoss
@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}