SentenceTransformer based on answerdotai/ModernBERT-base

This is a sentence-transformers model finetuned from answerdotai/ModernBERT-base on the msmarco, natural_questions, gooaq, ccnews and hotpotqa datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
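The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence vector is the average of the token vectors. The following is a minimal, dependency-free sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy 2-dimensional vectors are assumptions made for illustration (the real model produces 768-dimensional vectors).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sequence, skipping padded positions."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # 1 = real token, 0 = padding
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    return [value / max(count, 1) for value in total]


# Toy input: three positions with 2-dimensional vectors; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padded position is ignored, so its large values do not affect the result.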

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hotchpotch/ModernBERT-embedding-CMNRL")
# Run inference
queries = [
    "what is the best paying engineering job",
]
documents = [
    "The 20 highest-paying jobs for engineering majors. Engineering jobs pay well. To find out just how lucrative they really are, we turned to PayScale, the creator of the world's largest compensation database. To find the 20 highest-paying jobs for engineering majors, PayScale first identified the most common jobs for those with a bachelor's degree (and nothing more) who work full-time in the US. Chief architects and vice president's of business development topped the list, both earning an impressive $151,000 a year.",
    'Aviation is a combat arms branch which encompasses 80 percent of the commissioned officer operational flying positions within the Army (less those in Aviation Material Management and Medical Service Corps).',
    'Depending on the thickness and size of the chop, it can take anywhere from eight to 30 minutes. Here’s a helpful cooking chart and some tips to achieve delicious pork chops every time. Pork chops are a crowd pleaser, especially once you master your grilling technique. For safe consumption, it’s recommended to cook pork until it reaches an internal temperature of 145°F or 65°C. Depending on the cut and thickness of your chop, the time it may take to reach this can vary. To make sure your chops are the right temperature, use a digital meat thermometer.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.8588,  0.1637, -0.0107]])
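The scores above can be read as cosine similarities, assuming the model's similarity function is cosine (as the `cosine_*` metric names and the `cos_sim` loss setting in this card suggest). The sketch below shows how such scores could be turned into a ranking. The helper names `cosine_similarity` and `rank_documents` are hypothetical, and the 3-dimensional vectors are toy stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (document index, score) pairs sorted from best to worst match."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy vectors standing in for the 768-dimensional embeddings.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
ranking = rank_documents(query, docs)
print([i for i, _ in ranking])  # [1, 2, 0]
```

Cosine similarity ignores vector length, which is why the second document scores 1.0 even though it is twice as long as the query vector.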

Evaluation

Metrics

Information Retrieval

  • Datasets: NanoClimateFEVER, NanoDBPedia, NanoFEVER, NanoFiQA2018, NanoHotpotQA, NanoMSMARCO, NanoNFCorpus, NanoNQ, NanoQuoraRetrieval, NanoSCIDOCS, NanoArguAna, NanoSciFact and NanoTouche2020
  • Evaluated with InformationRetrievalEvaluator
Metric NanoClimateFEVER NanoDBPedia NanoFEVER NanoFiQA2018 NanoHotpotQA NanoMSMARCO NanoNFCorpus NanoNQ NanoQuoraRetrieval NanoSCIDOCS NanoArguAna NanoSciFact NanoTouche2020
cosine_accuracy@10 0.68 0.94 0.98 0.74 0.9 0.82 0.64 0.84 0.98 0.82 0.86 0.78 0.9592
cosine_precision@10 0.092 0.396 0.102 0.118 0.128 0.082 0.262 0.09 0.132 0.172 0.086 0.088 0.4143
cosine_recall@10 0.3807 0.2744 0.9333 0.5585 0.64 0.82 0.1304 0.8 0.9693 0.3527 0.86 0.77 0.284
cosine_ndcg@10 0.3249 0.5073 0.8029 0.4646 0.6343 0.5555 0.3071 0.6336 0.9391 0.322 0.5328 0.6297 0.4754
cosine_mrr@10 0.421 0.7423 0.7769 0.529 0.8229 0.4713 0.4497 0.5863 0.94 0.4662 0.4286 0.5945 0.7045
cosine_map@10 0.2431 0.3807 0.7485 0.3803 0.5439 0.4713 0.2273 0.5725 0.9192 0.2136 0.4286 0.5785 0.3266
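
As a reading aid for the table above, here is a minimal sketch of how these per-query metrics are commonly defined with binary relevance. It is an illustration under that assumption, not the code of InformationRetrievalEvaluator, which may differ in details; the function name `metrics_at_k` and the document IDs are hypothetical. The reported numbers are averages of such per-query values over all queries of a dataset.

```python
import math


def metrics_at_k(ranked_ids, relevant_ids, k=10):
    """Retrieval metrics for one query, treating relevance as binary."""
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in ranked_ids[:k]]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant document (0 if none in the top k).
    mrr = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank
            break

    # DCG discounts each hit by log2(rank + 1); IDCG is the best achievable DCG.
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))

    # Average precision: precision measured at each rank where a hit occurs.
    running, precision_sum = 0, 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            running += 1
            precision_sum += running / rank

    return {
        "accuracy": 1.0 if num_hits else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids),
        "mrr": mrr,
        "ndcg": dcg / idcg if idcg else 0.0,
        "map": precision_sum / ideal_hits if ideal_hits else 0.0,
    }


# One query with two relevant documents, evaluated on the top 3 results.
result = metrics_at_k(["d3", "d1", "d7", "d2"], {"d1", "d2"}, k=3)
print(result)
```

Under these definitions, a dataset with exactly one relevant document per query would show precision@10 equal to one tenth of accuracy@10, and recall@10 equal to accuracy@10. The NanoMSMARCO column (0.082, 0.82, 0.82) fits that pattern, although the card does not state the number of relevant documents per query.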

Nano BEIR

  • Dataset: NanoBEIR_mean
  • Evaluated with NanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "climatefever",
            "dbpedia",
            "fever",
            "fiqa2018",
            "hotpotqa",
            "msmarco",
            "nfcorpus",
            "nq",
            "quoraretrieval",
            "scidocs",
            "arguana",
            "scifact",
            "touche2020"
        ],
        "dataset_id": "sentence-transformers/NanoBEIR-en"
    }
    
Metric Value
cosine_accuracy@10 0.8415
cosine_precision@10 0.1663
cosine_recall@10 0.5979
cosine_ndcg@10 0.5484
cosine_mrr@10 0.6102
cosine_map@10 0.4642
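
The NanoBEIR_mean values appear to be plain arithmetic means of the thirteen per-dataset scores in the Information Retrieval table. The short check below, using only numbers copied from that table, reproduces two of them; the dictionary layout is just one convenient way to hold the values.

```python
from statistics import mean

# Per-dataset scores copied from the Information Retrieval table.
ndcg_at_10 = {
    "NanoClimateFEVER": 0.3249, "NanoDBPedia": 0.5073, "NanoFEVER": 0.8029,
    "NanoFiQA2018": 0.4646, "NanoHotpotQA": 0.6343, "NanoMSMARCO": 0.5555,
    "NanoNFCorpus": 0.3071, "NanoNQ": 0.6336, "NanoQuoraRetrieval": 0.9391,
    "NanoSCIDOCS": 0.322, "NanoArguAna": 0.5328, "NanoSciFact": 0.6297,
    "NanoTouche2020": 0.4754,
}
accuracy_at_10 = [0.68, 0.94, 0.98, 0.74, 0.9, 0.82, 0.64,
                  0.84, 0.98, 0.82, 0.86, 0.78, 0.9592]

mean_ndcg = round(mean(ndcg_at_10.values()), 4)
mean_accuracy = round(mean(accuracy_at_10), 4)
print(mean_ndcg, mean_accuracy)  # 0.5484 0.8415
```

Because the mean is unweighted, each dataset contributes equally regardless of how many queries it contains.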

Training Details

Training Datasets

msmarco

  • Dataset: msmarco at 84ed2d3
  • Size: 502,939 training samples
  • Columns: query and positive
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 4 tokens, mean 9.26 tokens, max 25 tokens
    • positive (string): min 19 tokens, mean 80.68 tokens, max 230 tokens
  • Samples:
    • query: is cabinet refacing worth the cost?
      positive: Fans of refacing say this mini-makeover can give a kitchen a whole new look at a much lower cost than installing all-new cabinets. Cabinet refacing can save up to 50 percent compared to the cost of replacing, says Cheryl Catalano, owner of Kitchen Solvers, a cabinet refacing franchise in Napierville, Illinois. From.
    • query: is the fovea ethmoidalis a bone
      positive: Ethmoid bone/fovea ethmoidalis. The medial portion of the ethmoid bone is a cruciate membranous bone composed of the crista galli, cribriform plate, and perpendicular ethmoidal plate. The crista is a thick piece of bone, shaped like a “cock's comb,” that projects intracranially and attaches to the falx cerebri.
    • query: average pitches per inning
      positive: The likelihood of a pitcher completing nine innings if he throws an average of 14 pitches or less per inning is reinforced by the totals of the 89 games in which pitchers did actually complete nine innings of work.
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 128,
        "gather_across_devices": false
    }
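
To make the loss parameters concrete, here is a small sketch of the in-batch-negatives objective that multiple negatives ranking losses are built on: each query is scored against every positive in the batch by cosine similarity, the scores are multiplied by `scale`, and cross-entropy pushes the query's own positive to the top. This is an illustration under those assumptions, not the library's code. The function names are hypothetical, and the sketch omits the gradient caching that the cached variant uses to process a large batch in chunks of `mini_batch_size`.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def in_batch_negatives_loss(query_vecs, positive_vecs, scale=20.0):
    """Mean cross-entropy where positive i is the correct 'class' for query i."""
    losses = []
    for i, query in enumerate(query_vecs):
        logits = [scale * cos_sim(query, positive) for positive in positive_vecs]
        # Numerically stable log-sum-exp over all candidates in the batch.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two pairs with 2-dimensional vectors.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the other positive
low_loss = in_batch_negatives_loss(queries, aligned)
high_loss = in_batch_negatives_loss(queries, swapped)
print(low_loss, high_loss)
```

With `scale` set to 20.0, a well-aligned batch gives a loss close to zero, while the swapped batch gives a loss close to 20. Larger batches supply more negatives per query, which is the motivation for the cached formulation cited at the end of this card.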
    
natural_questions

  • Dataset: natural_questions at f9e894e
  • Size: 100,231 training samples
  • Columns: query and positive
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 10 tokens, mean 12.46 tokens, max 22 tokens
    • positive (string): min 12 tokens, mean 137.8 tokens, max 512 tokens
  • Samples:
    • query: difference between russian blue and british blue cat
      positive: Russian Blue The coat is known as a "double coat", with the undercoat being soft, downy and equal in length to the guard hairs, which are an even blue with silver tips. However, the tail may have a few very dull, almost unnoticeable stripes. The coat is described as thick, plush and soft to the touch. The feeling is softer than the softest silk. The silver tips give the coat a shimmering appearance. Its eyes are almost always a dark and vivid green. Any white patches of fur or yellow eyes in adulthood are seen as flaws in show cats.[3] Russian Blues should not be confused with British Blues (which are not a distinct breed, but rather a British Shorthair with a blue coat as the British Shorthair breed itself comes in a wide variety of colors and patterns), nor the Chartreux or Korat which are two other naturally occurring breeds of blue cats, although they have similar traits.
    • query: who played the little girl on mrs doubtfire
      positive: Mara Wilson Mara Elizabeth Wilson[2] (born July 24, 1987) is an American writer and former child actress. She is known for playing Natalie Hillard in Mrs. Doubtfire (1993), Susan Walker in Miracle on 34th Street (1994), Matilda Wormwood in Matilda (1996) and Lily Stone in Thomas and the Magic Railroad (2000). Since retiring from film acting, Wilson has focused on writing.
    • query: what year did the movie the sound of music come out
      positive: The Sound of Music (film) The film was released on March 2, 1965 in the United States, initially as a limited roadshow theatrical release. Although critical response to the film was widely mixed, the film was a major commercial success, becoming the number one box office movie after four weeks, and the highest-grossing film of 1965. By November 1966, The Sound of Music had become the highest-grossing film of all-time—surpassing Gone with the Wind—and held that distinction for five years. The film was just as popular throughout the world, breaking previous box-office records in twenty-nine countries. Following an initial theatrical release that lasted four and a half years, and two successful re-releases, the film sold 283 million admissions worldwide and earned a total worldwide gross of $286,000,000.
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 128,
        "gather_across_devices": false
    }
    
gooaq

  • Dataset: gooaq at b089f72
  • Size: 3,012,496 training samples
  • Columns: query and positive
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 8 tokens, mean 12.05 tokens, max 21 tokens
    • positive (string): min 13 tokens, mean 59.08 tokens, max 116 tokens
  • Samples:
    • query: how do i program my directv remote with my tv?
      positive: ['Press MENU on your remote.', 'Select Settings & Help > Settings > Remote Control > Program Remote.', 'Choose the device (TV, audio, DVD) you wish to program. ... ', 'Follow the on-screen prompts to complete programming.']
    • query: are rodrigues fruit bats nocturnal?
      positive: Before its numbers were threatened by habitat destruction, storms, and hunting, some of those groups could number 500 or more members. Sunrise, sunset. Rodrigues fruit bats are most active at dawn, at dusk, and at night.
    • query: why does your heart rate increase during exercise bbc bitesize?
      positive: During exercise there is an increase in physical activity and muscle cells respire more than they do when the body is at rest. The heart rate increases during exercise. The rate and depth of breathing increases - this makes sure that more oxygen is absorbed into the blood, and more carbon dioxide is removed from it.
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 128,
        "gather_across_devices": false
    }
    
ccnews

  • Dataset: ccnews at 6118cc0
  • Size: 614,664 training samples
  • Columns: query and positive
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 7 tokens, mean 16.71 tokens, max 56 tokens
    • positive (string): min 18 tokens, mean 349.3 tokens, max 512 tokens
  • Samples:
    • query: Rupee rises for 2nd consecutive day, gains 8 paise against US dollar today
      positive: The rupee rose 8 paise to close at 64.37 apiece US dollar at the interbank foreign exchange market today.
      The Indian rupee appreciated for the second consecutive day and gained over 8 paise against the US dollar on Monday. The domestic currency opened unchanged today, very quickly edged higher and extended the gains to hit a day’s high of 64.34. The rupee rose 8 paise to close at 64.37 apiece US dollar at the interbank foreign exchange market today. The Reserve Bank of India fixed the reference rate of the rupee at 64.3616 against the US dollar on Monday. The Indian rupee moved up 23 paise against the US dollar in just 2 days as Narendra Modi led BJP is most likely to conquer Gujarat for the fifth consecutive time in the state elections. Way back in March 2017, the rupee appreciated as much as 79 paise in a single day to close at a 16-month high against the US dollar after Bharatiya Janata Party’s landslide victory in Uttar Pradesh state elections.
      Finance Minister Arun Jaitley is all ...
    • query: Microsoft pushes for ‘Digital Geneva Convention’ for cybercrimes
      positive: Technology companies, he added, need to preserve trust and stability online by pledging neutrality in cyber conflict. ( Image for representation, Source: Reuters) Technology companies, he added, need to preserve trust and stability online by pledging neutrality in cyber conflict. ( Image for representation, Source: Reuters)
      Microsoft President Brad Smith on Tuesday pressed the world’s governments to form an international body to protect civilians from state-sponsored hacking, saying recent high-profile attacks showed a need for global norms to police government activity in cyberspace.
      Countries need to develop and abide by global rules for cyber attacks similar to those established for armed conflict at the 1949 Geneva Convention that followed World War Two, Smith said. Technology companies, he added, need to preserve trust and stability online by pledging neutrality in cyber conflict.
      Watch all our videos from Express Technology
      “We need a Digital Geneva Convention that will commit go...
    • query: Prince Gets Purple Pantone Color ‘Love Symbol #2’
      positive: By Abby Hassler
      Prince, also known as “The Purple One” is finally getting his very own Pantone color. Pantone and Prince’s Estate announced today (August 14) that the late singer has his own purple hue, “Love Symbol #2,” which is named after the iconic symbol the singer used as an emblem for his name.
      Related: Wesley Snipes Beat Out Prince for His Role in Michael Jackson’s ‘Bad’
      “The color purple was synonymous with who Prince was and will always be. This is an incredible way for his legacy to live on forever,” Troy Carter, entertainment adviser to Prince’s Estate, said.
      “We are honored to have worked on the development of Love Symbol #2, a distinctive new purple shade created in memory of Prince, ‘the purple one,'” added Laurie Pressman, vice president of the Pantone Color Institute. “A musical icon known for his artistic brilliance, Love Symbol #2 is emblematic of Prince’s distinctive style. Long associated with the purple family, Love Symbol #2 enables Prince’s unique purple shade t...
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 128,
        "gather_across_devices": false
    }
    
hotpotqa

  • Dataset: hotpotqa at f07d3cd
  • Size: 84,516 training samples
  • Columns: query and positive
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 8 tokens, mean 25.82 tokens, max 140 tokens
    • positive (string): min 18 tokens, mean 103.34 tokens, max 350 tokens
  • Samples:
    • query: Which magazine covers a wider range of topics, Decibel or Paper?
      positive: Decibel (magazine) Decibel is a monthly heavy metal magazine published by the Philadelphia-based Red Flag Media since October 2004. Its sections include Upfront, Features, Reviews, Guest Columns and the Decibel Hall of Fame. The magazine's tag-line is currently "Extremely Extreme" (previously "The New Noise"); the editor-in-chief is Albert Mudrian.
    • query: what bbc drama features such actors as Sian Reeves and Ben Daniels?
      positive: Siân Reeves Siân Reeves (born Siân Rivers on May 9, 1966 in West Bromwich) is a British actress, most famous for playing the role of Sydney Henshall in the BBC drama "Cutting It", and for playing villain Sally Spode in "Emmerdale".
    • query: What size population does the County Connection public transit in Concord, California service?
      positive: County Connection The County Connection (officially, the Central Contra Costa Transit Authority, CCCTA) is a Concord-based public transit agency operating fixed-route bus and ADA paratransit (County Connection LINK) service in and around central Contra Costa County in the San Francisco Bay Area. Established in 1980 as a joint powers authority, CCCTA assumed control of public bus service within central Contra Costa first begun by Oakland-based AC Transit as it expanded into suburban Contra Costa County in the mid-1970s (especially after the opening of BART).
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 128,
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 8192
  • per_device_eval_batch_size: 512
  • learning_rate: 0.0001
  • weight_decay: 0.01
  • num_train_epochs: 1
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • dataloader_drop_last: True
  • dataloader_num_workers: 12
  • dataloader_prefetch_factor: 2
  • remove_unused_columns: False
  • optim: adamw_torch
  • batch_sampler: no_duplicates
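
The combination of `learning_rate: 0.0001`, `lr_scheduler_type: cosine` and `warmup_ratio: 0.1` describes a linear warmup followed by a half-cosine decay. The sketch below shows the commonly used form of that schedule. The function name `learning_rate_at` is hypothetical, and `total_steps=525` is an assumed value chosen to be roughly in line with the step counts in the training logs; the trainer's exact rounding of the warmup length may differ.

```python
import math


def learning_rate_at(step, total_steps, base_lr=1e-4, warmup_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: the rate grows linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0 to 1 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 525  # assumption for illustration, not stated in the card
for step in (0, 26, 53, 289, 525):
    print(step, f"{learning_rate_at(step, total_steps):.2e}")
```

Under this assumption the peak rate is reached after 53 steps, and the rate is about half of its peak midway through the decay phase.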

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 8192
  • per_device_eval_batch_size: 512
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0001
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 12
  • dataloader_prefetch_factor: 2
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: False
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss NanoClimateFEVER_cosine_ndcg@10 NanoDBPedia_cosine_ndcg@10 NanoFEVER_cosine_ndcg@10 NanoFiQA2018_cosine_ndcg@10 NanoHotpotQA_cosine_ndcg@10 NanoMSMARCO_cosine_ndcg@10 NanoNFCorpus_cosine_ndcg@10 NanoNQ_cosine_ndcg@10 NanoQuoraRetrieval_cosine_ndcg@10 NanoSCIDOCS_cosine_ndcg@10 NanoArguAna_cosine_ndcg@10 NanoSciFact_cosine_ndcg@10 NanoTouche2020_cosine_ndcg@10 NanoBEIR_mean_cosine_ndcg@10
0.0190 10 8.226 - - - - - - - - - - - - - -
0.0381 20 5.503 - - - - - - - - - - - - - -
0.0571 30 3.4245 - - - - - - - - - - - - - -
0.0762 40 1.907 - - - - - - - - - - - - - -
0.0952 50 1.3564 - - - - - - - - - - - - - -
0.1143 60 1.1161 - - - - - - - - - - - - - -
0.1333 70 1.0269 - - - - - - - - - - - - - -
0.1524 80 0.804 - - - - - - - - - - - - - -
0.1714 90 0.7459 - - - - - - - - - - - - - -
0.1905 100 0.6271 - - - - - - - - - - - - - -
0.2095 110 0.8254 - - - - - - - - - - - - - -
0.2286 120 0.7112 - - - - - - - - - - - - - -
0.2476 130 0.6292 - - - - - - - - - - - - - -
0.2667 140 0.6022 - - - - - - - - - - - - - -
0.2857 150 0.782 - - - - - - - - - - - - - -
0.3048 160 0.5896 - - - - - - - - - - - - - -
0.3238 170 0.6357 - - - - - - - - - - - - - -
0.3429 180 0.6329 - - - - - - - - - - - - - -
0.3619 190 0.7885 - - - - - - - - - - - - - -
0.3810 200 0.484 - - - - - - - - - - - - - -
0.4 210 0.5834 - - - - - - - - - - - - - -
0.4190 220 0.5229 - - - - - - - - - - - - - -
0.4381 230 0.5112 - - - - - - - - - - - - - -
0.4571 240 0.4973 - - - - - - - - - - - - - -
0.4762 250 0.5582 - - - - - - - - - - - - - -
0.4952 260 0.437 - - - - - - - - - - - - - -
0.5143 270 0.5495 - - - - - - - - - - - - - -
0.5333 280 0.5378 - - - - - - - - - - - - - -
0.5524 290 0.4802 - - - - - - - - - - - - - -
0.5714 300 0.5221 - - - - - - - - - - - - - -
0.5905 310 0.5243 - - - - - - - - - - - - - -
0.6095 320 0.4762 - - - - - - - - - - - - - -
0.6286 330 0.571 - - - - - - - - - - - - - -
0.6476 340 0.465 - - - - - - - - - - - - - -
0.6667 350 0.5644 - - - - - - - - - - - - - -
0.6857 360 0.5494 - - - - - - - - - - - - - -
0.7048 370 0.5148 - - - - - - - - - - - - - -
0.7238 380 0.5109 - - - - - - - - - - - - - -
0.7429 390 0.5357 - - - - - - - - - - - - - -
0.7619 400 0.4638 - - - - - - - - - - - - - -
0.7810 410 0.403 - - - - - - - - - - - - - -
0.8 420 0.5423 - - - - - - - - - - - - - -
0.8190 430 0.4469 - - - - - - - - - - - - - -
0.8381 440 0.5935 - - - - - - - - - - - - - -
0.8571 450 0.3879 - - - - - - - - - - - - - -
0.8762 460 0.5288 - - - - - - - - - - - - - -
0.8952 470 0.5372 - - - - - - - - - - - - - -
0.9143 480 0.4814 - - - - - - - - - - - - - -
0.9333 490 0.4817 - - - - - - - - - - - - - -
0.9524 500 0.3893 - - - - - - - - - - - - - -
0.9714 510 0.434 - - - - - - - - - - - - - -
0.9905 520 0.3894 - - - - - - - - - - - - - -
0 521 - 0.3249 0.5073 0.8029 0.4646 0.6343 0.5555 0.3071 0.6336 0.9391 0.3220 0.5328 0.6297 0.4754 0.5484
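
The Epoch column can be cross-checked against the dataset sizes and batch size reported in this card. The sketch below assumes a single training device (so the per-device batch size of 8192 is the global batch size) and assumes that `dataloader_drop_last: True` discards the incomplete final batch of each dataset separately. Neither assumption is stated in the card, so treat the result as a plausibility check rather than a description of the actual run.

```python
# Training set sizes as listed under Training Datasets.
dataset_sizes = {
    "msmarco": 502_939,
    "natural_questions": 100_231,
    "gooaq": 3_012_496,
    "ccnews": 614_664,
    "hotpotqa": 84_516,
}
batch_size = 8192  # per_device_train_batch_size, assuming one device

# Full batches per dataset; the remainder is dropped under the stated assumption.
full_batches = {name: size // batch_size for name, size in dataset_sizes.items()}
steps_per_epoch = sum(full_batches.values())
print(full_batches)
print(steps_per_epoch)  # 525

# Epoch fractions implied by that step count, for comparison with the log.
for step in (10, 20, 520):
    print(step, round(step / steps_per_epoch, 4))
```

The implied fractions (0.019, 0.0381 and 0.9905) agree with the logged Epoch values at steps 10, 20 and 520, which is consistent with roughly 525 optimizer steps in the single training epoch.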

Framework Versions

  • Python: 3.11.14
  • Sentence Transformers: 5.3.0.dev0
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu129
  • Accelerate: 1.12.0
  • Datasets: 4.4.1
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}