LLMs Still Struggle with Non-English Languages
Large language models (LLMs) still perform worse in languages other than English, lagging roughly 12-18 months behind their English-language performance, according to research from SEACrowd and other AI benchmarking organizations.
The Language Gap, By the Numbers
- English: the baseline. GPT-4, Claude 3.5, and Gemini 2.5 are all trained primarily on English data
- Chinese, Spanish, Japanese: 6-12 months behind English in quality
- Hindi, Arabic, Portuguese: 12-18 months behind, with noticeable gaps in idioms
- Indonesian, Vietnamese, Thai: 18-24 months behind
- Low-resource languages: 2+ years behind, if supported at all
Why the Gap Exists
It comes down largely to training data. Approximately 60% of content on the internet is in English, about 2% is in Chinese, and roughly 0.5% is in Indonesian. Less training data generally means worse performance.
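To make the imbalance concrete, here is a minimal sketch that turns the approximate shares quoted above into ratios. The dictionary and function names are illustrative assumptions, and the percentages are the rough figures from this article, not measured values.

```python
# Approximate share of internet content by language, as quoted in this article.
# These are rough figures, not measurements.
WEB_SHARE_PERCENT = {
    "English": 60.0,
    "Chinese": 2.0,
    "Indonesian": 0.5,
}


def english_ratio(language: str) -> float:
    """Return how many times more English content there is than content in `language`."""
    return WEB_SHARE_PERCENT["English"] / WEB_SHARE_PERCENT[language]


if __name__ == "__main__":
    for language in WEB_SHARE_PERCENT:
        print(f"{language}: English has {english_ratio(language):.0f}x as much content")
```

By these figures, there is about 30 times as much English content as Chinese content and about 120 times as much as Indonesian content. Share of web content is only a proxy for what ends up in a training set, so treat the ratios as an order-of-magnitude illustration.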
The Gap Is Closing
The good news: organizations like SEACrowd, AI4Bharat, and EleutherAI are actively building multilingual datasets. Models like Llama 3 and Mistral have significantly improved non-English support.
What This Means for You
If you are using AI in a non-English language, set your expectations accordingly. For critical tasks — legal documents, medical information, financial advice — always have a human review the output.
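If you route AI output through an application, the advice above could be encoded as a simple gate. This is a sketch under stated assumptions: the task labels and the function name are hypothetical, and the rule covers only the non-English case this article discusses.

```python
# Hypothetical task labels; the article names legal, medical, and financial work as critical.
CRITICAL_TASKS = {"legal", "medical", "financial"}


def needs_human_review(language: str, task: str) -> bool:
    """Return True when a non-English output for a critical task should get human review."""
    is_english = language.strip().lower() == "english"
    is_critical = task.strip().lower() in CRITICAL_TASKS
    return is_critical and not is_english


if __name__ == "__main__":
    print(needs_human_review("Indonesian", "legal"))      # critical task, non-English
    print(needs_human_review("Indonesian", "brainstorm"))  # non-critical task
```

The gate says nothing about English output, which may well deserve review on its own merits. It only makes the non-English rule explicit so it cannot be skipped by accident.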