diff --git "a/README.md" "b/README.md" --- "a/README.md" +++ "b/README.md" @@ -1,145 +1,172 @@ --- -language: -- en tags: - sentence-transformers - sentence-similarity - feature-extraction - dense - generated_from_trainer -- dataset_size:4836185 -- loss:MultipleNegativesRankingLoss +- dataset_size:296658 - loss:CachedMultipleNegativesRankingLoss -- loss:SoftmaxLoss -- loss:AnglELoss -- loss:CoSENTLoss base_model: jhu-clsp/ettin-encoder-17m widget: -- source_sentence: Daniel went to the kitchen. Sandra went back to the kitchen. Daniel - moved to the garden. Sandra grabbed the apple. Sandra went back to the office. - Sandra dropped the apple. Sandra went to the garden. Sandra went back to the bedroom. - Sandra went back to the office. Mary went back to the office. Daniel moved to - the bathroom. Sandra grabbed the apple. Sandra travelled to the garden. Sandra - put down the apple there. Mary went back to the bathroom. Daniel travelled to - the garden. Mary took the milk. Sandra grabbed the apple. Mary left the milk there. - Sandra journeyed to the bedroom. John travelled to the office. John went back - to the garden. Sandra journeyed to the garden. Mary grabbed the milk. Mary left - the milk. Mary grabbed the milk. Mary went to the hallway. John moved to the hallway. - Mary picked up the football. Sandra journeyed to the kitchen. Sandra left the - apple. Mary discarded the milk. John journeyed to the garden. Mary dropped the - football. Daniel moved to the bathroom. Daniel journeyed to the kitchen. Mary - travelled to the bathroom. Daniel went to the bedroom. Mary went to the hallway. - Sandra got the apple. Sandra went back to the hallway. Mary moved to the kitchen. - Sandra dropped the apple there. Sandra grabbed the milk. Sandra journeyed to the - bathroom. John went back to the kitchen. Sandra went to the kitchen. Sandra travelled - to the bathroom. Daniel went to the garden. Daniel moved to the kitchen. Sandra - dropped the milk. Sandra got the milk. 
Sandra put down the milk. John journeyed - to the garden. Sandra went back to the hallway. Sandra picked up the apple. Sandra - got the football. Sandra moved to the garden. Daniel moved to the bathroom. Daniel - travelled to the garden. Sandra went back to the bathroom. Sandra discarded the - football. +- source_sentence: billy joel just the way you are meaning sentences: - - In the adulthood stage, it can jump, walk, run - - The chocolate is bigger than the container. - - The football before the bathroom was in the garden. -- source_sentence: 'Context: I am devasted. - - Speaker 1: I am very devastated these days. - - Speaker 2: That seems bad and I am sorry to hear that. What happened? - - Speaker 1: My father day 3 weeks ago.I still can''t believe. - - Speaker 2: I am truly sorry to hear that. Please accept my apologies for your - loss. May he rest in peace' + - Isle of Man TT The International Isle of Man TT (Tourist Trophy) Race is an annual + motorcycle sport event run on the Isle of Man in May or June of most years since + its inaugural race in 1907.[3] + - You Make It Feel Like Christmas (song) "You Make It Feel Like Christmas" is a + song recorded by American singer Gwen Stefani for her 2017 holiday album of the + same name. It features guest vocals from Blake Shelton and was released on September + 22, 2017 as the album's lead single through Interscope. The track was written + by Stefani, Justin Tranter, Shelton and Michael Busbee, while production was handled + by Busbee and Eric Valentine. It has been used in Starbucks' "Togetherness" commercial + to promote its Christmas campaign. 
+ - Just the Way You Are (Billy Joel song) Joel shared that the melody and chord progression + for this song came to him while he was dreaming.[3] In an interview on the Howard + Stern Radio Show on November 16, 2010, Joel revealed that the inspiration for + writing the name of the song and how it sounds in the chorus was directly taken + from the last line in the Frankie Valli and the Four Seasons song "Rag Doll"; + which incidentally was also a larger inspiration for Joel's later song, "Uptown + Girl".[4] The song, which Joel had written for his first wife (and also his business + manager at the time) Elizabeth Weber, was not liked by either Joel or his band, + and Joel had originally decided against making the track a part of The Stranger, + but at the request of both Linda Ronstadt and Phoebe Snow (both were recording + in other studios in the same building at the time), he agreed to put the song + on the final mix.[5] However, the album's producer, Phil Ramone, later contradicted + Joel's claim, stating in an interview that they could not afford to exclude the + song because Joel did not have that much material to choose from for the album.[6] +- source_sentence: who is the coach of the houston cougars basketball sentences: - - 'The main emotion of this example dialogue is: content' - - The intent of this example is to be offensive/disrespectful. - - This example wikipedia comment contains a threat. -- source_sentence: 'If you are looking for someone with the last name Kaib, you are - in the right place. If you browse through the results below you will discover - there are numerous people with the last name Kaib. To help expedite your people - search, you can restrict the number of results displayed by clicking on the link - that displays the first name of the person you are looking to find. - - After modifying your search results you will be shown a list of people with the - last name Kaib that match the first name you chose. 
In addition, there are other - types of people data such as date of birth, known locations, and possible relatives - that can help you locate the specific person you are searching for. - - If you have additional information about the person you are in search of, such - as their last known address or phone number, you can type that in the search box - above and further amend your results. This is an effective way to find the Kaib - you are trying to spot, if you know more about them.' + - World population In demographics, the world population is the total number of + humans currently living, and was estimated to have reached 7.6 billion as of December + 2017.[1] Estimates of the total number of humans who have ever lived range from + 106 to 108 billion as of 2007.[2][3][4] Though the population is currently growing + quite rapidly, future changes are influenced by difficult-to-predict factors such + as economic development, cultural changes, migration, and natural disasters. Various + mathematical models show that by 2100 the global population could be either rising + or falling depending on how these factors affect birth and death rates. + - Mantle (geology) The top of the mantle is defined by a sudden increase in seismic + velocity, which was first noted by Andrija Mohorovičić in 1909; this boundary + is now referred to as the Mohorovičić discontinuity or "Moho".[24][27] The uppermost + mantle plus overlying crust are relatively rigid and form the lithosphere, an + irregular layer with a maximum thickness of perhaps 200 km (120 mi). Below the + lithosphere the upper mantle becomes notably more plastic. In some regions below + the lithosphere, the seismic shear velocity is reduced; this so-called low-velocity + zone (LVZ) extends down to a depth of several hundred km. 
Inge Lehmann discovered + a seismic discontinuity at about 220 km (140 mi) depth;[28] although this discontinuity + has been found in other studies, it is not known whether the discontinuity is + ubiquitous. The transition zone is an area of great complexity; it physically + separates the upper and lower mantle.[26] Very little is known about the lower + mantle apart from that it appears to be relatively seismically homogeneous. The + D" layer at the core–mantle boundary separates the mantle from the core.[15][24] + In 2015, research using gravitational data from GRACE satellites and the long + wavelength nonhydrostatic geoid indicated viscosity[29] increases by a factor + of ten to 150 about 1,000 kilometres (620 mi) below earth's surface; separate + research also indicates sinking tectonic plates stall at this depth, leading Robert + van der Hilst to speculate "In term's of structure and dynamics, 1,000 kilometers + could be more important" (than the currently accepted 660 km depth upper—lower + division).[30] The lower mantle also contains some discontinueous zones, called + "thermochemical piles" which have been interpreted as either thermally differentiated, + upwellings bringing warmer material towards the surface, or as chemically differentiated + material.[31] A principal source of the heat that drives plate tectonics is the + radioactive decay of uranium, thorium, and potassium in Earth’s crust and mantle.[32] + - Kelvin Sampson Kelvin Dale Sampson (born October 5, 1955) is an American basketball + coach who is currently the head coach of the Houston Cougars men's basketball + team. He was the head coach at Montana Tech from 1981 to 1985, Washington State + University from 1987 to 1994, the University of Oklahoma from 1994 to 2006, and + Indiana University 2006 to 2008. He has also been an assistant coach for NBA teams + including the Milwaukee Bucks and Houston Rockets. 
+- source_sentence: where was the final scene of grease filmed sentences: - - 'This text is about: genealogy' - - 'This text is about: utility software' - - The example utterance is a query about emails. -- source_sentence: A woman is grating carrots + - Kentucky Derby Trophy The Kroger Company has been the official florist of the + Kentucky Derby since 1987. After taking over the duties from the Kingsley Walker + florist, Kroger began constructing the prestigious garland in one of its local + stores for the public to view on Derby Eve. The preservation of the garland and + crowds of spectators watching its construction are a testament to the prestige + and mystique of the Garland of Roses. + - Energy in Australia Historically–and until recent times–energy in Australia was + sourced largely from coal and natural gas,[1] however due to the increasing effects + of global warming and human-induced climate change on the global environment, + there has been a greater shift towards renewable energy such as solar power and + wind power both in Australia and abroad.[2][3] This in turn has led to a decrease + in the demand of coal worldwide.[4] + - Grease (film) The opening beach scene was shot at Malibu's Leo Carrillo State + Beach, making explicit reference to From Here to Eternity. The exterior Rydell + scenes, including the basketball, baseball and track segments, were shot at Venice + High School in Venice, California, while the Rydell interiors, including the high + school dance, were filmed at Huntington Park High School. The sleepover was shot + at a private house in East Hollywood. The Paramount Pictures studio lot was the + location of the scenes that involve Frosty Palace and the musical numbers "Greased + Lightning" and "Beauty School Dropout". The drive-in movie scenes were shot at + the Burbank Pickwick Drive-In (it was closed and torn down in 1989 and a shopping + center took its place). 
The race was filmed at the Los Angeles River, between + the First and Seventh Street Bridges, where many other films have been shot.[10] + The final scene where the carnival took place used John Marshall High School.[11] + And due to budget cuts a short scene was filmed at Hazard Park in Los Angeles. +- source_sentence: where is survivor heros vs hustlers vs healers filmed sentences: - - A cook is coating a pork chop - - No snake is being fed a mouse by a man - - A monkey is teasing a dog at the zoo -- source_sentence: A woman is slicing some tuna. + - 'Survivor: Heroes vs. Healers vs. Hustlers Survivor: Heroes vs. Healers vs. Hustlers + is the 35th season of the American CBS competitive reality television series Survivor. + This season features 18 new players divided into three tribes based on dominant + perceived trait: "Heroes" (courage), "Healers" (compassion), and "Hustlers" (tenacity).[1] + This is the fourth season of the show filmed in Fiji, following Survivor: Fiji, + Survivor: Millennials vs. Gen X, and Survivor: Game Changers.' + - Governor (India) The governors and lieutenant-governors are appointed by the president + for a term of five years. + - Competition (economics) The competitive process in a market economy exerts a sort + of pressure that tends to move resources to where they are most needed, and to + where they can be used most efficiently for the economy as a whole. For the competitive + process to work however, it is "important that prices accurately signal costs + and benefits." Where externalities occur, or monopolistic or oligopolistic conditions + persist, or for the provision of certain goods such as public goods, the pressure + of the competitive process is reduced.[2] +- source_sentence: first new zealander to run a mile in under four minutes sentences: - - A woman is cutting raw fish. - - A woman is peeling a potato. - - Still, Pinellas rabies cases fell sharply to 17 in 1996 and 3 in 1997, Agnew said. 
-datasets: -- tomaarsen/natural-questions-hard-negatives -- tomaarsen/gooaq-hard-negatives -- bclavie/msmarco-500k-triplets -- sentence-transformers/all-nli -- sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1 -- sentence-transformers/gooaq -- sentence-transformers/natural-questions -- tasksource/merged-2l-nli -- tasksource/merged-3l-nli -- tasksource/zero-shot-label-nli -- MoritzLaurer/dataset_train_nli -- google-research-datasets/paws -- nyu-mll/glue -- mwong/fever-evidence-related -- tasksource/sts-companion + - House of Saud The most influential member of the Royal family is the King of Saudi + Arabia, currently King Salman. The succession to the Saudi Arabian throne was + designed to pass from one son of the first king, Ibn Saud, to another. The next + in line, Crown Prince Mohammad bin Salman, is the son of King Salman, and thus + from the ruling House of Saud.[2][3][4] The king-appointed cabinet includes more + members of the royal family. The monarchy was hereditary by agnatic seniority + until 2006, when a royal decree provided that future Saudi kings are to be elected + by a committee of Saudi princes.[5] Although current King Salman first choose + his nephew and then his son as a crown prince without any consulation with Allegiance + Council. + - The Crown (TV series) The first season was released on Netflix on November 4, + 2016, while the second was released on December 8, 2017. The Crown has received + widespread acclaim, with critics praising the cast's performances, direction, + writing, cinematography, production values, and the relatively accurate historical + accounts of Queen Elizabeth's reign. Significant praise in the first season was + directed towards the performances of Foy in the leading role and John Lithgow + as Winston Churchill. 
The series has received several industry nominations and + awards, including winning Best Actress and Best Actor at the 23rd Screen Actors + Guild Awards for Foy and Lithgow, respectively, and receiving thirteen nominations + for the 69th Primetime Emmy Awards, including Outstanding Drama Series. + - Four-minute mile New Zealand's John Walker, the first man to run the mile under + 3:50, managed to run 135 sub-four-minute miles during his career (during which + he was the first person to run over 100 sub-four-minute miles), and American Steve + Scott has run the most sub-four-minute miles, with 136. Algeria's Noureddine Morceli + was the first under 3:45. Currently, the mile record is held by Morocco's Hicham + El Guerrouj, who ran a time of 3:43.13 in Rome in 1999. pipeline_tag: sentence-similarity library_name: sentence-transformers --- # SentenceTransformer based on jhu-clsp/ettin-encoder-17m -This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [jhu-clsp/ettin-encoder-17m](https://huggingface.co/jhu-clsp/ettin-encoder-17m) on 21 datasets. It maps sentences & paragraphs to a 256-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. +This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [jhu-clsp/ettin-encoder-17m](https://huggingface.co/jhu-clsp/ettin-encoder-17m) on the contrastive dataset. It maps sentences & paragraphs to a 256-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. 
 ## Model Details
 ### Model Description
 - **Model Type:** Sentence Transformer
 - **Base model:** [jhu-clsp/ettin-encoder-17m](https://huggingface.co/jhu-clsp/ettin-encoder-17m)
-- **Maximum Sequence Length:** 7999 tokens
+- **Maximum Sequence Length:** 128 tokens
 - **Output Dimensionality:** 256 dimensions
 - **Similarity Function:** Cosine Similarity
-- **Training Datasets:**
-    - [tomaarsen/natural-questions-hard-negatives](https://huggingface.co/datasets/tomaarsen/natural-questions-hard-negatives)
-    - [tomaarsen/gooaq-hard-negatives](https://huggingface.co/datasets/tomaarsen/gooaq-hard-negatives)
-    - [bclavie/msmarco-500k-triplets](https://huggingface.co/datasets/bclavie/msmarco-500k-triplets)
-    - [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli)
-    - [sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1](https://huggingface.co/datasets/sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1)
-    - [sentence-transformers/gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq)
-    - [sentence-transformers/natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions)
-    - [merged-2l-nli](https://huggingface.co/datasets/tasksource/merged-2l-nli)
-    - [merged-3l-nli](https://huggingface.co/datasets/tasksource/merged-3l-nli)
-    - [zero-shot-label-nli](https://huggingface.co/datasets/tasksource/zero-shot-label-nli)
-    - [dataset_train_nli](https://huggingface.co/datasets/MoritzLaurer/dataset_train_nli)
-    - [paws/labeled_final](https://huggingface.co/datasets/paws)
-    - [glue/mrpc](https://huggingface.co/datasets/glue)
-    - [glue/qqp](https://huggingface.co/datasets/glue)
-    - [fever-evidence-related](https://huggingface.co/datasets/mwong/fever-evidence-related)
-    - [glue/stsb_0](https://huggingface.co/datasets/glue)
-    - [glue/stsb_1](https://huggingface.co/datasets/glue)
-    - sick/relatedness_0
-    - sick/relatedness_1
-    - [sts-companion_0](https://huggingface.co/datasets/tasksource/sts-companion)
-    - [sts-companion_1](https://huggingface.co/datasets/tasksource/sts-companion)
-- **Language:** en
+- **Training Dataset:**
+    - contrastive
+
 ### Model Sources
@@ -152,7 +179,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [j
 ```
 SentenceTransformer(
-  (0): Transformer({'max_seq_length': 7999, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
+  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
   (1): Pooling({'word_embedding_dimension': 256, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
 )
 ```
@@ -174,23 +201,21 @@ from sentence_transformers import SentenceTransformer
 # Download from the 🤗 Hub
 model = SentenceTransformer("tasksource/ettin-17m-embed")
 # Run inference
-queries = [
-    "A woman is slicing some tuna.",
-]
-documents = [
-    'A woman is cutting raw fish.',
-    'A woman is peeling a potato.',
-    'Still, Pinellas rabies cases fell sharply to 17 in 1996 and 3 in 1997, Agnew said.',
+sentences = [
+    'first new zealander to run a mile in under four minutes',
+    "Four-minute mile New Zealand's John Walker, the first man to run the mile under 3:50, managed to run 135 sub-four-minute miles during his career (during which he was the first person to run over 100 sub-four-minute miles), and American Steve Scott has run the most sub-four-minute miles, with 136. Algeria's Noureddine Morceli was the first under 3:45. Currently, the mile record is held by Morocco's Hicham El Guerrouj, who ran a time of 3:43.13 in Rome in 1999.",
+    "The Crown (TV series) The first season was released on Netflix on November 4, 2016, while the second was released on December 8, 2017. The Crown has received widespread acclaim, with critics praising the cast's performances, direction, writing, cinematography, production values, and the relatively accurate historical accounts of Queen Elizabeth's reign. Significant praise in the first season was directed towards the performances of Foy in the leading role and John Lithgow as Winston Churchill. The series has received several industry nominations and awards, including winning Best Actress and Best Actor at the 23rd Screen Actors Guild Awards for Foy and Lithgow, respectively, and receiving thirteen nominations for the 69th Primetime Emmy Awards, including Outstanding Drama Series.",
 ]
-query_embeddings = model.encode_query(queries)
-document_embeddings = model.encode_document(documents)
-print(query_embeddings.shape, document_embeddings.shape)
-# [1, 256] [3, 256]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# [3, 256]
 # Get the similarity scores for the embeddings
-similarities = model.similarity(query_embeddings, document_embeddings)
+similarities = model.similarity(embeddings, embeddings)
 print(similarities)
-# tensor([[0.5809, 0.5259, 0.0517]])
+# tensor([[1.0000, 0.8112, 0.0234],
+#         [0.8112, 1.0000, 0.1274],
+#         [0.0234, 0.1274, 1.0000]])
 ```
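
The updated usage snippet compares the embeddings with themselves, and the model card lists cosine similarity as the similarity function. That would explain why the printed matrix is symmetric with ones on the diagonal. The following is a minimal sketch of that computation in plain Python. The helper name `cosine_similarity_matrix` and the toy 3-dimensional vectors are illustrative assumptions; they are neither the library's implementation nor real model outputs.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarity matrix for equal-length vectors."""
    # Euclidean norm of every vector, computed once up front.
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [
            # Dot product divided by the product of the two norms.
            sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
            for v, norm_v in zip(vectors, norms)
        ]
        for u, norm_u in zip(vectors, norms)
    ]


# Hypothetical toy vectors standing in for the real 256-dimensional embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]

matrix = cosine_similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 4) for value in row])
```

Each vector scores 1 against itself, and entry `[i][j]` equals entry `[j][i]`, which is the same structure as the 3x3 tensor in the new README example.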