magibu's Collections

Pretrain Datasets

Updated about 8 hours ago

Datasets we use for pretraining large language models

  • omarkamali/wikipedia-monthly

    Viewer • Updated 8 days ago • 181M • 15.9k • 45

  • alibayram/hukuk_soru_cevap

    Viewer • Updated Nov 6, 2024 • 2.08k • 91 • 12

  • umutertugrul/turkish-hospital-medical-articles

    Viewer • Updated Oct 2, 2025 • 24.6k • 207 • 6

  • umutertugrul/turkish-medical-articles

    Viewer • Updated Oct 2, 2025 • 42.8k • 53 • 3

  • alibayram/tr-books

    Viewer • Updated 17 days ago • 3.7k • 32

  • selimfirat/bilkent-turkish-writings-dataset

    Viewer • Updated May 24, 2025 • 25.1k • 166 • 8

  • umutertugrul/turkish-academic-theses-dataset

    Viewer • Updated Aug 18, 2025 • 649k • 50 • 8

  • alibayram/onedio_haberler

    Viewer • Updated Jun 18, 2024 • 66.7k • 5 • 5

  • habanoz/news-tr-1.8M

    Viewer • Updated Oct 6, 2024 • 1.85M • 369 • 7

  • alibayram/hepsiburada_yorumlar

    Viewer • Updated Jun 18, 2024 • 2.66M • 70 • 13

  • alibayram/kitapyurdu_yorumlar

    Viewer • Updated Jun 18, 2024 • 405k • 25

  • alibayram/beyazperde_yorumlar

    Viewer • Updated Jun 18, 2024 • 192k • 21 • 5

  • BILGEM-AI/BILGE-Synthetic-Stories

    Viewer • Updated Nov 20, 2025 • 2.87M • 116 • 4