Omnilingual ASR CTC 300M v2, FLEURS Persian + Thomcles Continuation

This repository contains the step-5000 checkpoint obtained by continuing training of Peacockery/omni-ctc-300m-v2-fleurs-fa-ir on Thomcles/Persian-Farsi-Speech.

The checkpoint is a fairseq2 / Omnilingual ASR checkpoint. It is not packaged as a Transformers model.

Files

checkpoint-step-5000.pt: final continued model checkpoint.
fairseq2_card.yaml: local fairseq2 asset card for the checkpoint.
training-config.yaml: continuation training configuration.
benchmarks/fleurs-test-step5000-thomcles-summary.md: FLEURS fa_ir test benchmark after Thomcles continuation.
benchmarks/fleurs-test-before-thomcles-summary.md: FLEURS fa_ir test benchmark before Thomcles continuation.
dev-scores/: Thomcles dev WER scores saved during continuation training.
data/thomcles-language_distribution_0.tsv: prepared Thomcles training-hour summary.

FLEURS fa_ir test, 871 samples:

Checkpoint	WER	CER
FLEURS + Thomcles step 5000	18.02%	5.11%
FLEURS-only step 5000	18.55%	5.28%
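
To put the two rows in perspective, the gap can be expressed as absolute and relative error reduction. The sketch below only does arithmetic on the numbers reported in the table; the helper name relative_reduction is illustrative and not part of this repository or of fairseq2.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Return the share of the baseline error that was removed, in percent."""
    return (baseline - new) / baseline * 100.0


# Values copied from the FLEURS fa_ir test table (percent).
fleurs_only = {"WER": 18.55, "CER": 5.28}
fleurs_thomcles = {"WER": 18.02, "CER": 5.11}

for metric in ("WER", "CER"):
    absolute = fleurs_only[metric] - fleurs_thomcles[metric]
    relative = relative_reduction(fleurs_only[metric], fleurs_thomcles[metric])
    print(f"{metric}: -{absolute:.2f} points absolute, {relative:.1f}% relative")
```

On these figures the continuation lowers WER by 0.53 points (about 2.9% relative) and CER by 0.17 points (about 3.2% relative) on the 871-sample test set.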

Thomcles dev validation: the WER scores saved during continuation training are in dev-scores/.

See training-config.yaml for the exact trainer settings.
