In a Training Loop 🔄

Stefan Schweter PRO

stefan-it · https://schweter.bayern

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨

Recent Activity

liked a Space about 22 hours ago

Omarrran/OCR_DATASET_MAKER

upvoted a collection 4 days ago

GutenOCR

3 items • Updated 4 days ago • 5

upvoted 2 papers 5 days ago

Say Anything but This: When Tokenizer Betrays Reasoning in LLMs

Paper • 2601.14658 • Published 6 days ago • 1

GutenOCR: A Grounded Vision-Language Front-End for Documents

Paper • 2601.14490 • Published 6 days ago • 32

upvoted an article 10 days ago

Article

How We Built a Semantic Highlight Model To Save Token Cost for RAG

12 days ago • 59

upvoted a collection 11 days ago

TranslateGemma

3 items • Updated 12 days ago • 189

upvoted 2 papers 12 days ago

It's All About the Confidence: An Unsupervised Approach for Multilingual Historical Entity Linking using Large Language Models

Paper • 2601.08500 • Published 14 days ago • 1

TranslateGemma Technical Report

Paper • 2601.09012 • Published 13 days ago • 19

upvoted an article 19 days ago

Article

NVIDIA brings agents to life with DGX Spark and Reachy Mini

22 days ago • 58

upvoted a paper 29 days ago

Introducing TrGLUE and SentiTurca: A Comprehensive Benchmark for Turkish General Language Understanding and Sentiment Analysis

Paper • 2512.22100 • Published Dec 26, 2025 • 3

upvoted an article about 1 month ago

Article

The Optimal Architecture for Small Language Models

Dec 26, 2025 • 111

upvoted 2 papers about 1 month ago

Bolmo: Byteifying the Next Generation of Language Models

Paper • 2512.15586 • Published Dec 17, 2025 • 17

FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition

Paper • 2512.13884 • Published Dec 15, 2025 • 15

upvoted an article about 2 months ago

Article

We Got Claude to Fine-Tune an Open Source LLM

Dec 4, 2025 • 582

upvoted a paper about 2 months ago

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 294

upvoted an article about 2 months ago

Article

Transformers v5: Simple model definitions powering the AI ecosystem

Dec 1, 2025 • 282

upvoted a changelog about 2 months ago

Changelog

Add a Status to your Hugging Face profile

Nov 28, 2025

• 97

upvoted 2 papers 2 months ago

Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining

Paper • 2511.21613 • Published Nov 26, 2025 • 2

DoPE: Denoising Rotary Position Embedding

Paper • 2511.09146 • Published Nov 12, 2025 • 96

upvoted an article 2 months ago

Article

Building for an Open Future - our new partnership with Google Cloud

Nov 13, 2025 • 47

upvoted a paper 3 months ago

Sample-Efficient Language Modeling with Linear Attention and Lightweight Enhancements

Paper • 2511.05560 • Published Nov 4, 2025 • 1