Why I think local, open-source models will eventually win.
The most useful AI applications are moving toward multi-turn agentic behavior: systems that take hundreds or even thousands of iterative steps to complete a task, e.g. Claude Code, or computer-control agents that click, type, and test repeatedly.
In these cases, the power of the model lies not in how smart it is per token, but in how quickly it can interact with its environment and tools across many steps. In that regime, model quality becomes secondary to latency.
An open-source model that can call tools quickly, check that the right thing was clicked, or verify that a code change actually passes tests can easily outperform a slightly “smarter” closed model that has to make remote API calls for every move.
Eventually, the balance tips: it becomes impractical for an agent to rely on remote inference for every micro-action. Just as no one would tolerate a keyboard that required a network request per keystroke, users won’t accept agent workflows bottlenecked by latency. All devices will ship with local, open-source models that are “good enough” and the expectation will shift toward everything running locally. It’ll happen sooner than most people think.
Multilingual Tokenization Showdown: Analyzing 12 LLM Tokenizers Across 204 Languages.
First, I've created a dataset with Wikipedia's "Cat" article text in 272 languages: Norod78/WikiCat-Multilingual
For each language entry with at least 100 words, I tokenized the text using 12 tokenizers and calculated the "characters per token" and "words per token" ratios. The higher these ratios, the more information each token represents on average for that language (perhaps allowing the LLM to learn more per parameter if trained on data in that language).
I hope I interpreted the results correctly; I've made the code available on GitHub so you can re-create the raw results JSONL with this repo: https://github.com/Norod/wikicat-tokenizer-eval
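For illustration, a minimal sketch of the two ratios, assuming a Hugging Face tokenizer; this is not the repo's actual code, and "gpt2" plus the dataset column names are placeholders:

```python
# Minimal sketch: characters-per-token and words-per-token for one language's text.
from transformers import AutoTokenizer

def token_ratios(text: str, tokenizer_name: str = "gpt2") -> tuple[float, float]:
    """Return (characters per token, words per token) for a given text."""
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    num_tokens = len(tokenizer.encode(text, add_special_tokens=False))
    chars_per_token = len(text) / num_tokens
    words_per_token = len(text.split()) / num_tokens
    return chars_per_token, words_per_token

# Usage sketch (the dataset's column names here are assumptions):
# from datasets import load_dataset
# ds = load_dataset("Norod78/WikiCat-Multilingual", split="train")
# for row in ds:
#     if len(row["text"].split()) >= 100:
#         print(row["language"], token_ratios(row["text"]))
```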
🎮 Live Model Demo: Upload an Android screenshot and instructions to see the model in action! Tonic/l-operator-demo
Built in a garage, funded by pre-orders, no VC. Now we're scaling to 1,000 installer units.
We're giving 50 limited-edition prototypes to investors, installers & researchers who want to co-design the sovereign smart home.
👇 Drop “EUSKERA” in the comments if you want an invite, tag a friend who still thinks Alexa is “convenient,” and smash ♥️ if AI should belong to people, not servers.
Just wanted to announce 🏭SmolFactory: it's the quickest and best way to fine-tune SmolLM3 and GPT-OSS-20B on Hugging Face!
Basically, it's an app you run on Hugging Face by duplicating the Space and launching your training directly on Hugging Face GPUs.
It helps you select datasets and models, fine-tune your model, gives you an experiment tracker you can use on your mobile phone, pushes your model card, and even automatically builds a demo on Hugging Face so you can test the model as soon as training is done!
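If you prefer doing the duplication step from code instead of the UI button, a minimal sketch with huggingface_hub might look like this; the SmolFactory Space id and the GPU tier below are placeholders, not confirmed values:

```python
# Hedged sketch of the "duplicate the Space and train on Hugging Face GPUs" flow.
from huggingface_hub import duplicate_space

my_space = duplicate_space(
    from_id="<smolfactory-space-id>",        # replace with the actual SmolFactory Space
    to_id="<your-username>/my-smolfactory",  # where your copy will live
    private=True,
    hardware="a10g-small",                   # request a GPU tier for training
)
print(my_space)  # URL of your duplicated Space, ready to configure a training run
```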
Built from 7 TB of real Kaggle datasets + 20k notebooks, creating real code-execution traces using Qwen3-Coder and E2B. Training on this data dramatically improves a model's ability to execute code and analyze data.
We (@baptistecolle, @hannayukhymenko, @lvwerra) have created a novel synthetic data generation pipeline with efficient scaffolding, which gives a big performance boost after training your coding agent 🔥 With the help of real Kaggle notebooks and datasets, we generate synthetic notebooks that analyze datasets and answer factual questions about them. We simulate a real code-execution environment by prompting LLMs or with the help of E2B sandboxes. We have built a dataset of 50k+ high-quality LLM-generated notebooks that can help your agent become better at data analysis and question answering.
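For readers curious what a single execution-trace step might look like, here is a hedged sketch using the E2B code-interpreter SDK; the function name and trace fields are illustrative and not the team's actual pipeline code, and the exact SDK surface may differ by version:

```python
# Hedged sketch: run one LLM-generated analysis snippet in an E2B sandbox and keep
# its output as an execution trace.
from e2b_code_interpreter import Sandbox

def execute_and_trace(generated_code: str) -> dict:
    """Execute generated notebook-style code in a sandbox and record its output."""
    sandbox = Sandbox()  # requires an E2B API key in the environment
    try:
        execution = sandbox.run_code(generated_code)
        return {
            "code": generated_code,
            "stdout": execution.logs.stdout,
            "stderr": execution.logs.stderr,
            "result": execution.text,
        }
    finally:
        sandbox.kill()  # free the sandbox when done

# trace = execute_and_trace("import pandas as pd\nprint(pd.__version__)")
```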