URL: http://github.com/egorsmkv/speech-recognition-uk

🇺🇦 Speech Recognition & Synthesis for Ukrainian

Overview

This repository collects links to models, datasets, and tools for Ukrainian Speech-to-Text and Text-to-Speech.

Speech-UK initiative

Datasets, models, and leaderboards are available on Hugging Face.

Community

Discord

🎤 Speech-to-Text

📦 Implementations

wav2vec2-bert

wav2vec2

You can check demos out here: https://github.com/egorsmkv/wav2vec2-uk-demo

HuBERT

Citrinet

ContextNet

FastConformer

Squeezeformer

Conformer-CTC

Whisper

Quantized variants:

Lite Whisper:

OWSM, OWSM-CTC, and OWLS

Flashlight

MMS

data2vec

VOSK

Models: https://huggingface.co/Yehor/vosk-uk

DeepSpeech

M-CTC-T

moonshine-tiny-uk

📊 Benchmarks

This benchmark uses the Common Voice 10 test split.

  • WER: Word Error Rate
  • CER: Character Error Rate
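
As a rough illustration of the two metrics, here is a minimal sketch that computes WER and CER from the Levenshtein edit distance between a reference transcript and a model hypothesis. The function names are hypothetical, and the text normalization (lowercasing, punctuation handling) used for the published numbers is not stated here, so this sketch applies none. In the tables below, the Accuracy (words) column equals 100% minus WER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (works for lists and strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "центр міської громади"
    hypothesis = "центр міскої громади"  # one character dropped in the second word
    word_error = wer(reference, hypothesis)
    print(f"WER: {word_error:.2%}")
    print(f"CER: {cer(reference, hypothesis):.2%}")
    print(f"Accuracy (words): {1 - word_error:.2%}")
```

A single dropped character counts as one full word error but only one character error, which is why CER is consistently much lower than WER in the tables.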

wav2vec2-bert

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Yehor/w2v-bert-uk (FP16) | 6.6% | 1.34% | 93.4% |
| Yehor/w2v-bert-uk-v2.1 (FP16) | 17.34% | 3.33% | 82.66% |

wav2vec2

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Yehor/w2v-xls-r-uk | 20.24% | 3.64% | 79.76% |
| robinhad/wav2vec2-xls-r-300m-uk | 27.36% | 5.37% | 72.64% |
| arampacha/wav2vec2-xls-r-1b-uk | 16.52% | 2.93% | 83.48% |

HuBERT

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Yehor/hubert-uk (FP16) | 37.07% | 6.87% | 62.93% |

Citrinet

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| nvidia/stt_uk_citrinet_1024_gamma_0_25 | 4.32% | 0.94% | 95.68% |
| neongeckocom/stt_uk_citrinet_512_gamma_0_25 | 7.46% | 1.6% | 92.54% |

ContextNet

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| theodotus/stt_uk_contextnet_512 | 6.69% | 1.45% | 93.31% |

FastConformer P&C

These models output text with punctuation and capitalization.

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| nvidia/stt_ua_fastconformer_hybrid_large_pc | 4.52% | 1% | 95.48% |
| theodotus/stt_ua_fastconformer_hybrid_large_pc | 4% | 1.02% | 96% |

Squeezeformer

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| theodotus/stt_uk_squeezeformer_ctc_xs | 10.78% | 2.29% | 89.22% |
| theodotus/stt_uk_squeezeformer_ctc_sm | 8.2% | 1.75% | 91.8% |
| theodotus/stt_uk_squeezeformer_ctc_ml | 5.91% | 1.26% | 94.09% |

Conformer-CTC

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| taras-sereda/uk-pods-conformer | 6.75% | 1.41% | 93.25% |

Whisper

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| tiny | 63.08% | 18.59% | 36.92% |
| base | 52.1% | 14.08% | 47.9% |
| small | 30.57% | 7.64% | 69.43% |
| medium | 18.73% | 4.4% | 81.27% |
| large (v1) | 16.42% | 3.93% | 83.58% |
| large (v2) | 13.72% | 3.18% | 86.28% |
| large (v3) | 20.53% | 5.28% | 79.47% |
| turbo | 22.83% | 7.05% | 77.17% |

Quantized versions:

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Yehor/whisper-large-v2-quantized-uk | 14.95% | 4.23% | 85.05% |
| Yehor/whisper-large-v3-turbo-quantized-uk | 12.75% | 3.25% | 87.25% |
| efficient-speech/lite-whisper-large-v3-turbo | 42.89% | 12.59% | 57.11% |
| efficient-speech/lite-whisper-large-v3-turbo-acc | 17.79% | 4.34% | 82.21% |

To fine-tune a Whisper model on your own data, use this repository: https://github.com/egorsmkv/whisper-ukrainian

Flashlight

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Flashlight Conformer | 19.15% | 2.44% | 80.85% |

data2vec

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| robinhad/data2vec-large-uk | 31.17% | 7.31% | 68.83% |

VOSK

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| v3 | 53.25% | 38.78% | 46.75% |

M-CTC-T

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| speechbrain/m-ctc-t-large | 57% | 10.94% | 43% |

DeepSpeech

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| v0.5 | 70.25% | 20.09% | 29.75% |

moonshine-tiny-uk

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| UsefulSensors/moonshine-tiny-uk | 24.54% | 7.58% | 75.46% |

📖 Development

📚 Datasets

Compiled dataset: ~1200 hours

Voice of America: ~390 hours

FLEURS

Ukrainian broadcast: ~300 hours

YODAS2: ~400 hours

Ukrainian podcasts

Cleaned Common Voice 10 (test set)

Noised Common Voice 10

Other

⭐ Related works

Language models

Inverse Text Normalization

Text Enhancement

Aligners

Other

📢 Text-to-Speech

Test sentence with stresses:

К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.

Without stresses:

Кам'янець-Подільський - місто в Хмельницькій області України, центр Кам'янець-Подільської міської об'єднаної територіальної громади і Кам'янець-Подільського району.
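
In the stressed test sentence, a `+` precedes each stressed vowel, and removing every `+` yields the unstressed version. The sketch below shows that conversion, plus a conversion to the combining acute accent (U+0301) placed after the vowel. The function names are hypothetical, and the acute-accent form is an assumption about what some tools may expect; check each TTS implementation for the stress notation it actually accepts.

```python
import re

STRESS_MARK = "+"          # plus sign placed before the stressed vowel
COMBINING_ACUTE = "\u0301"  # combining acute accent, placed after the vowel


def strip_stress(text):
    """Remove all '+' stress marks, giving the plain (unstressed) text."""
    return text.replace(STRESS_MARK, "")


def to_combining_acute(text):
    """Rewrite '+<vowel>' as '<vowel>' followed by a combining acute accent."""
    return re.sub(r"\+(.)", lambda m: m.group(1) + COMBINING_ACUTE, text)


if __name__ == "__main__":
    stressed = "м+істо в Хмельн+ицькій +області Укра+їни"
    print(strip_stress(stressed))
    print(to_combining_acute(stressed))
```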

📦 Implementations

StyleTTS2

P-Flow TTS

RAD-TTS

Coqui TTS

Neon TTS

FastPitch

Balacoon TTS

MMS

📚 Datasets

⭐ Related works

Accentors

Grapheme-to-Phoneme

ipa-uk:

Charsiu G2P:

Other:

Misc
