---
language:
- th
license: apache-2.0
library_name: transformers
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_13_0
- google/fleurs
metrics:
- wer
base_model: openai/whisper-large-v3
model-index:
- name: Whisper Large V3 Thai Combined V1 - biodatlab
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: mozilla-foundation/common_voice_13_0 th
      type: mozilla-foundation/common_voice_13_0
      config: th
      split: test
      args: th
    metrics:
    - type: wer
      value: 0.21
      name: Wer on Private Insurance Test Set
    - type: wer
      value: 0.41
      name: Wer on Private Contact Center Test Set
---

# Amity Whisper Large Lora V1 (Thai)

This model is a LoRA fine-tune of [biodatlab/whisper-th-large-v3-combined](https://huggingface.co/biodatlab/whisper-th-large-v3-combined), trained on private enterprise call-center voice datasets. It achieves the following word error rates (WER, lower is better) on two private, real-world industry test sets (contact center and insurance):

| Model | Contact Center Set (WER) | Insurance Set (WER) |
|:-------------:|:-----:|:-----:|
| biodatlab/whisper-th-large-v3-combined | 0.53 | 0.33 |
| Thai Cloud STT Service | 0.47 | 0.29 |
| amity-whisper-large-stt-th-lora-v1 | `0.41` | `0.21` |
| amity-whisper-medium-stt-th-lora-v1 | `0.41` | `0.25` |

## Model description

Use the model with Hugging Face's `transformers` and `peft` as follows:

```py
import torch
from transformers import pipeline, AutoModelForSpeechSeq2Seq, AutoProcessor
from peft import PeftModel

BASE_MODEL = "biodatlab/whisper-th-large-v3-combined"
LORA_MODEL = "amityco/amity-whisper-large-stt-th-lora-v1"
LANG = "th"

# Use the first GPU with float16 when available, otherwise CPU with float32
device = 0 if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# 1. Load the base Whisper model
# (device_map is omitted because the pipeline below places the model on `device`)
base_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
)

# 2. Load and merge the LoRA weights
model = PeftModel.from_pretrained(base_model, LORA_MODEL)
model = model.merge_and_unload()

# 3. Load processor/tokenizer
processor = AutoProcessor.from_pretrained(BASE_MODEL)

# 4. Force Thai transcription
model.config.forced_decoder_ids = processor.tokenizer.get_decoder_prompt_ids(
    language=LANG, task="transcribe"
)

# 5. Create pipeline (torch_dtype keeps the input features in the model's dtype)
pipe = pipeline(
    task="automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    torch_dtype=torch_dtype,
    device=device,
)

# 6. Run transcription
result = pipe("audio.mp3")["text"]
print(result)
```

### Framework versions

- Transformers 4.37.2
- PyTorch 2.1.0
- Datasets 2.16.1
- Tokenizers 0.15.1
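
### Computing WER

The results table above reports word error rate (WER). This card does not say how the scores were computed or how the Thai text was split into words, so the snippet below is only a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative, and the inputs are assumed to be already segmented into word tokens (for example with a Thai word tokenizer of your choice).

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / len(reference).

    Both arguments are lists of word tokens. Segmenting Thai text into
    words is assumed to have happened before this function is called.
    """
    if not reference:
        raise ValueError("reference must contain at least one token")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Illustrative, pre-segmented tokens (not taken from the private test sets)
reference = ["สวัสดี", "ครับ", "ผม", "ชื่อ"]
hypothesis = ["สวัสดี", "ค่ะ", "ผม"]
wer = word_error_rate(reference, hypothesis)
print(wer)
```

Here one word is substituted and one is dropped out of four reference words, so the function returns 0.5. Scores depend heavily on the word segmentation and text normalization applied beforehand, so numbers produced this way are not directly comparable to the table unless the same preprocessing is used.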