---
tags:
- static-embeddings
---
# Static Embeddings

This project contains multilingual static embeddings suitable for generating quick
embeddings on edge devices. They are re-packaged from other projects into
production-ready assets.

## Models

* [minishlab/potion-retrieval-32M/](models/minishlab/potion-retrieval-32M/README.md)
* [minishlab/potion-multilingual-128M/](models/minishlab/potion-multilingual-128M/README.md)
* [sentence-transformers/static-retrieval-mrl-en-v1/](models/sentence-transformers/static-retrieval-mrl-en-v1/README.md)
* [sentence-transformers/static-similarity-mrl-multilingual-v1/](models/sentence-transformers/static-similarity-mrl-multilingual-v1/README.md)

## Updating

Add models to `scripts/build_models.py`.
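
Conceptually, a model entry is a Hugging Face repo id plus the storage precisions to
export. The sketch below is hypothetical; `build_models.py` defines its own structure,
so treat `ModelSpec` and `MODELS` as illustrative names only.

```python
# Hypothetical sketch of a model registry; the real scripts/build_models.py
# defines its own structure. ModelSpec and MODELS are illustrative names.
from dataclasses import dataclass, field


@dataclass
class ModelSpec:
    repo_id: str                    # Hugging Face repository to re-package
    precisions: list[str] = field(  # storage precisions to export
        default_factory=lambda: ["fp32", "fp16", "fp8_e4m3"]
    )


MODELS = [
    ModelSpec("minishlab/potion-retrieval-32M"),
    ModelSpec("sentence-transformers/static-retrieval-mrl-en-v1"),
]
```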

```sh
# Install dependencies and login to huggingface:
pipx install huggingface_hub
huggingface-cli login

# Re-build the models:
uv run scripts/build_models.py

# Version control:
git add .
git commit -m 'Updated the models'
git push
git tag v1.0.0 -m 'Model release description'
git push origin tag v1.0.0

# Upload the models
uv run scripts/upload_models.py --tag v1.0.0
```

## Precision

For static embeddings and cosine similarity, precision isn't especially important. In an
end-to-end test in Firefox on a sample of vectors, here is the cosine similarity for the
same mean-pooled result at different storage precisions. Note that the vector math
happens in f32 space, but the embeddings are stored at a lower precision.

> f32 vs f16: cosine similarity = 1.00000000<br/>
> → They are essentially identical in direction.
>
> f32 vs f8: cosine similarity = 0.99956375<br/>
> → Very close, only tiny quantization effects.

Note that this test used `torch.float8_e4m3fn`; `torch.float8_e5m2` generally has more loss.
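
As a rough reproduction of that comparison, the sketch below round-trips a vector
through a lower-precision dtype in PyTorch and measures cosine similarity against the
original. The numbers above came from the Firefox pipeline; the 128-dimension random
vector here is just a stand-in.

```python
import torch
import torch.nn.functional as F


def cosine_after_quantization(vec: torch.Tensor, dtype: torch.dtype) -> float:
    """Store a float32 vector at a lower precision, then compare it to the
    original with cosine similarity. The math itself stays in f32."""
    quantized = vec.to(dtype).to(torch.float32)
    return F.cosine_similarity(vec, quantized, dim=0).item()


vec = torch.randn(128)  # stand-in for a mean-pooled static embedding
for dtype in (torch.float16, torch.float8_e4m3fn, torch.float8_e5m2):
    print(dtype, cosine_after_quantization(vec, dtype))
```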

Precision also affects download size. For instance, with the larger
[minishlab/potion-multilingual-128M/](models/minishlab/potion-multilingual-128M/README.md)
model, `fp32` is 228M compressed, while `fp8_e4m3` is only 51M and still has competitive
quantization quality.

| precision     | dimensions | size    |
| ------------- | ---------- | ------- |
| fp32          | 128        | 228M    |
| fp16          | 128        | 114M    |
| **fp8_e4m3**  | 128        | **51M** |
| fp8_e5m2      | 128        |  44M    |