---
language:
- th
license: apache-2.0
library_name: transformers
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_13_0
- google/fleurs
metrics:
- wer
base_model: biodatlab/whisper-th-large-v3-combined
model-index:
- name: Amity Whisper Large LoRA V1 (Thai)
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: mozilla-foundation/common_voice_13_0 th
      type: mozilla-foundation/common_voice_13_0
      config: th
      split: test
      args: th
    metrics:
    - type: wer
      value: 0.21
      name: Wer on Private Insurance Test Set
    - type: wer
      value: 0.41
      name: Wer on Private Contact Center Test Set
---


# Amity Whisper Large LoRA V1 (Thai)

This model is a LoRA adapter fine-tuned from [biodatlab/whisper-th-large-v3-combined](https://huggingface.co/biodatlab/whisper-th-large-v3-combined) on private enterprise call-center voice datasets.
On two private, real-world industry test sets (contact center and insurance), it achieves the following word error rates (WER; lower is better):

| Model | Contact Center Set (WER) | Insurance Set (WER) |
|:-------------:|:-----:|:-----:|
| biodatlab/whisper-th-large-v3-combined        | 0.53   |  0.33  |
| Thai Cloud STT Service        | 0.47   |  0.29  |
| amity-whisper-large-stt-th-lora-v1        | `0.41`   |  `0.21`  |
| amity-whisper-medium-stt-th-lora-v1        | `0.41`   |  `0.25`  |
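
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python illustration, not the evaluation script used for the table. Thai is written without spaces, so the text must be segmented into words first; this card does not state which segmenter or normalization was used, so the example assumes the inputs are already segmented, and the sample sentence is hypothetical.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level edit distance divided by the number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion: reference word missing from output
                    curr[j - 1] + 1,     # insertion: extra word in output
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical, already-segmented example (not taken from the private test sets)
reference = ["ขอ", "ตรวจสอบ", "กรมธรรม์", "ประกัน"]
hypothesis = ["ขอ", "ตรวจ", "กรมธรรม์", "ประกัน"]
print(word_error_rate(reference, hypothesis))  # one substitution over four reference words
```

Because the denominator is the reference length, WER can exceed 1.0 when the output contains many inserted words.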

## Model description

Use the model with Hugging Face `transformers` and `peft` as follows:

```py
import torch
from transformers import pipeline, AutoModelForSpeechSeq2Seq, AutoProcessor
from peft import PeftModel

BASE_MODEL = "biodatlab/whisper-th-large-v3-combined"  
LORA_MODEL = "amityco/amity-whisper-large-stt-th-lora-v1"  
LANG = "th"

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# 1. Load the base Whisper model and move it to the target device.
#    device_map="auto" is not used here because the pipeline below
#    receives an explicit `device` argument.
base_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
)
base_model.to(device)

# 2. Load the LoRA adapter and merge it into the base weights
model = PeftModel.from_pretrained(base_model, LORA_MODEL)
model = model.merge_and_unload()

# 3. Load processor/tokenizer
processor = AutoProcessor.from_pretrained(BASE_MODEL)

# 4. Force Thai transcription
model.config.forced_decoder_ids = processor.tokenizer.get_decoder_prompt_ids(
    language=LANG,
    task="transcribe"
)

# 5. Create pipeline (torch_dtype lets the pipeline cast audio features to match the model)
pipe = pipeline(
    task="automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    torch_dtype=torch_dtype,
    device=device,
)

# 6. Run transcription
result = pipe("audio.mp3")["text"]
print(result)

```
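
To transcribe several recordings, the pipeline call can be wrapped in a small helper. The sketch below is a hypothetical convenience function, not part of the released model. It assumes only that `pipe` is callable with a file path and returns a dict with a `"text"` key, as the pipeline object built above does; the file names in the usage comment are placeholders.

```python
from typing import Callable, Dict, Iterable


def transcribe_files(pipe: Callable, paths: Iterable[str]) -> Dict[str, str]:
    """Run an ASR pipeline over several audio files and collect the transcripts."""
    transcripts = {}
    for path in paths:
        output = pipe(str(path))  # expected to be a dict such as {"text": "..."}
        transcripts[str(path)] = output["text"].strip()
    return transcripts


# Usage with the pipeline created above (placeholder file names):
# transcripts = transcribe_files(pipe, ["call_001.wav", "call_002.wav"])
# for path, text in transcripts.items():
#     print(path, text)
```

Processing files one at a time keeps memory use predictable for long call recordings, at the cost of throughput.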

### Framework versions

- Transformers 4.37.2
- Pytorch 2.1.0
- Datasets 2.16.1
- Tokenizers 0.15.1