OpenSLR 32 SA Languages

OpenSLR 32 (SLR32) is a multi-speaker text-to-speech dataset covering four South African languages (Afrikaans, Sesotho, Setswana, and isiXhosa), totalling approximately 3.3 GB of audio. It is licensed under CC BY-SA 4.0, which permits commercial use subject to attribution and share-alike requirements. The dataset was created for TTS model development, but it can also support wake word model training through synthetic speech generation pipelines similar to those used by openWakeWord.

The Afrikaans subset (950 MB) is the most practically useful for the voice distress app, since Afrikaans is widely spoken across South Africa and no Afrikaans wake word models currently exist in any major framework.
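
As a starting point for working with the Afrikaans subset, the sketch below indexes transcripts and groups them by speaker, which matters for wake word work because voice diversity depends on how many distinct speakers are available. It rests on two assumptions that should be checked against the downloaded archive: that transcripts ship as a tab-separated file of utterance ID and text, and that utterance IDs follow a `<lang>_<speaker>_<utterance>` pattern. The function names and the example path are hypothetical.

```python
import csv
from collections import defaultdict


def load_index(index_path):
    """Read a TSV of (utterance ID, transcript) rows into a list of pairs.

    ASSUMPTION: the file is tab-separated with the ID in column 0 and the
    transcript in column 1. Verify against the actual archive contents.
    """
    rows = []
    with open(index_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            rows.append((row[0].strip(), row[1].strip()))
    return rows


def group_by_speaker(rows):
    """Group utterances by speaker.

    ASSUMPTION: IDs look like <lang>_<speaker>_<utterance>; anything that
    does not match is filed under "unknown".
    """
    speakers = defaultdict(list)
    for utt_id, text in rows:
        parts = utt_id.split("_")
        speaker = parts[1] if len(parts) >= 3 else "unknown"
        speakers[speaker].append((utt_id, text))
    return dict(speakers)
```

A typical call would be `group_by_speaker(load_index("path/to/index.tsv"))`, after which the number of keys gives a rough count of distinct voices available for synthetic sample generation.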

Connections