SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights (arXiv:2509.22944)
This is a 4-bit quantized version of DeepSeek-Prover-V1.5-Base, produced with SINQ (Sinkhorn-Normalized Quantization) using the following configuration:
BaseQuantizeConfig(
    nbits=4,           # 4-bit weight precision
    group_size=128,    # weights quantized in groups of 128
    method="sinq",     # calibration-free Sinkhorn-normalized quantization
    tiling_mode="2D",  # 2D tiling with dual (row and column) scales
    axis=1             # quantization axis
)
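For reference, a quantization with this configuration could be produced roughly as follows. This is a minimal sketch rather than the exact script behind this checkpoint: it assumes the SINQ package exposes BaseQuantizeConfig under sinq.sinqlinear and an AutoSINQHFModel.quantize_model / save_quantized_safetensors pair mirroring the loading API shown in the usage example below; check the SINQ repository for the exact import paths and signatures.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig  # assumed import path

# Load the full-precision base model
base = "deepseek-ai/DeepSeek-Prover-V1.5-Base"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

# Apply calibration-free 4-bit SINQ quantization with the configuration above
quant_config = BaseQuantizeConfig(
    nbits=4, group_size=128, method="sinq", tiling_mode="2D", axis=1
)
AutoSINQHFModel.quantize_model(   # assumed entry point
    model,
    tokenizer=tokenizer,
    quant_config=quant_config,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)

# Assumed counterpart of from_quantized_safetensors used in the usage example
AutoSINQHFModel.save_quantized_safetensors(model, "deepseek-prover-sinq-4bit")

To run the quantized checkpoint, first install the dependencies: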
pip install torch transformers
pip install git+https://github.com/huawei-csl/SINQ.git
import torch
from sinq.patch_model import AutoSINQHFModel
from transformers import AutoTokenizer
# Load quantized model
tokenizer = AutoTokenizer.from_pretrained("Minhdn/deepseek-prover-sinq-4bit")
model = AutoSINQHFModel.from_quantized_safetensors(
    "Minhdn/deepseek-prover-sinq-4bit",
    device="cuda:0",
    compute_dtype=torch.bfloat16
)
# Generate a proof completion for the Lean theorem statement below
prompt = "theorem add_comm (a b : Nat) : a + b = b + a := by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
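Note that generate returns the prompt tokens followed by the completion, so the decoded result above echoes the theorem statement. To keep only the newly generated proof text, slice off the prompt length before decoding; this uses only standard Transformers tokenizer calls and nothing SINQ-specific:

# Decode only the tokens generated after the prompt
prompt_len = inputs["input_ids"].shape[1]
completion = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(completion)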
For better quality at the cost of a larger model size, consider a higher-bit quantization or the original full-precision deepseek-ai/DeepSeek-Prover-V1.5-Base model.
If you use this model, please cite both the original DeepSeek-Prover paper and SINQ:
@article{deepseek2024prover,
  title={DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data},
  author={DeepSeek-AI},
  journal={arXiv preprint arXiv:2405.14333},
  year={2024}
}
@article{sinq2025,
  title={SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights},
  author={SINQ Authors},
  journal={arXiv preprint arXiv:2509.22944},
  year={2025}
}
This model inherits the MIT license from the original DeepSeek-Prover-V1.5-Base model.
Base model: deepseek-ai/DeepSeek-Prover-V1.5-Base