Swivuriso

Swivuriso is a large-scale multilingual South African speech dataset comprising approximately 3,000 hours of audio across seven South African languages: isiZulu, isiXhosa, Sesotho, Setswana, Xitsonga, Tshivenda, and isiNdebele. It was developed as part of the African Next Voices project and published in December 2025 (the accompanying arXiv paper, 2512.02201, dates from that month). It is hosted on Hugging Face and designed primarily for automatic speech recognition (ASR) training and benchmarking. Using it for text-to-speech (TTS), voice cloning, or any other form of voice synthesis is explicitly prohibited.

Swivuriso is relevant to the voice distress app because it is one of the most comprehensive training corpora available for building wake word or speech recognition models specific to South African languages. This is especially true for isiZulu and isiXhosa, two of South Africa's most widely spoken first languages.
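Wake word models are usually trained on positive clips that contain the target word and negative clips that do not. One way to reuse an ASR corpus such as Swivuriso for this purpose is to mine its transcripts. The following is a minimal sketch under several assumptions. The record layout (`audio_path`, `transcript`, `language` with ISO 639-3 codes such as `zul`/`xho`) is hypothetical, as are the helper names. The keyword `usizo` (isiZulu for "help") is an illustrative choice. Check the dataset's actual Hugging Face schema before adapting any of this.

```python
import re
from dataclasses import dataclass


@dataclass
class Utterance:
    # Hypothetical record layout; the real Swivuriso schema may use other field names.
    audio_path: str
    transcript: str
    language: str  # assumed ISO 639-3 code, e.g. "zul" for isiZulu, "xho" for isiXhosa


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of Unicode letters only (drops digits, underscores, punctuation).
    return re.findall(r"[^\W\d_]+", text.lower())


def split_for_keyword(utterances, keyword, languages):
    """Partition utterances in the given languages into positives (the transcript
    contains the keyword as a whole token) and negatives (everything else)."""
    kw = keyword.lower()
    positives, negatives = [], []
    for utt in utterances:
        if utt.language not in languages:
            continue  # skip languages the wake word model is not targeting
        if kw in tokenize(utt.transcript):
            positives.append(utt)
        else:
            negatives.append(utt)
    return positives, negatives


if __name__ == "__main__":
    # Made-up sample records for illustration only, not taken from the dataset.
    sample = [
        Utterance("a.wav", "Ngicela usizo manje", "zul"),
        Utterance("b.wav", "Sawubona, unjani?", "zul"),
        Utterance("c.wav", "Ndicela uncedo", "xho"),
        Utterance("d.wav", "Dumela", "tsn"),
    ]
    pos, neg = split_for_keyword(sample, "usizo", {"zul", "xho"})
    print([u.audio_path for u in pos])  # ['a.wav']
    print([u.audio_path for u in neg])  # ['b.wav', 'c.wav']
```

Exact whole-token matching is deliberately conservative. Because isiZulu and isiXhosa attach prefixes to nouns, a form such as `ngosizo` would not count as a positive. A real pipeline might therefore need substring or morphological matching, which in turn accepts more false positives. To cut the keyword itself out of each positive utterance, word-level timestamps (for example from forced alignment) would also be needed.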

Connections

South African Language Wake Word Gap — mitigates, source: https://arxiv.org/html/2512.02201v1
Vosk — possible_component_for (SA language model training), source: https://arxiv.org/html/2512.02201v1
