metadata
language:
- th
license: apache-2.0
library_name: transformers
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_13_0
- google/fleurs
metrics:
- wer
base_model: openai/whisper-large-v3
model-index:
- name: Whisper Large V3 Thai Combined V1 - biodatlab
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: mozilla-foundation/common_voice_13_0 th
      type: mozilla-foundation/common_voice_13_0
      config: th
      split: test
      args: th
    metrics:
    - type: wer
      value: 0.21
      name: Wer on Private Insurance Test Set
    - type: wer
      value: 0.41
      name: Wer on Private Contact Center Test Set
Amity Whisper Large Lora V1 (Thai)
This model is a fine-tuned version of biodatlab/whisper-th-large-v3-combined, trained on private enterprise call-center voice datasets. On real-world industry test sets (contact center and insurance), it achieves the following word error rates (WER):
| Model | Contact Center Set (WER) | Insurance Set (WER) |
|---|---|---|
| biodatlab/whisper-th-large-v3-combined | 0.53 | 0.33 |
| Thai Cloud STT Service | 0.47 | 0.29 |
| amity-whisper-large-stt-th-lora-v1 | 0.41 | 0.21 |
| amity-whisper-medium-stt-th-lora-v1 | 0.41 | 0.25 |
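For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The sketch below is a minimal pure-Python illustration, not the evaluation script behind the table. It assumes both strings are already segmented into space-separated words; Thai is written without spaces, so a word tokenizer would be needed first, and this card does not say which one was used. `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both inputs are already split into words by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.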
Model description
Use the model with Hugging Face Transformers and PEFT as follows:
import torch
from transformers import pipeline, AutoModelForSpeechSeq2Seq, AutoProcessor
from peft import PeftModel

BASE_MODEL = "biodatlab/whisper-th-large-v3-combined"
LORA_MODEL = "amityco/amity-whisper-large-stt-th-lora-v1"
LANG = "th"

device = 0 if torch.cuda.is_available() else "cpu"
# Half precision on GPU, full precision on CPU
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# 1. Load the base Whisper model
# (device placement is left to the pipeline below, so device_map is not set here)
base_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
)

# 2. Load and merge the LoRA weights
model = PeftModel.from_pretrained(base_model, LORA_MODEL)
model = model.merge_and_unload()

# 3. Load processor/tokenizer
processor = AutoProcessor.from_pretrained(BASE_MODEL)

# 4. Force Thai transcription
model.config.forced_decoder_ids = processor.tokenizer.get_decoder_prompt_ids(
    language=LANG,
    task="transcribe"
)

# 5. Create pipeline (torch_dtype keeps the input features in the model's dtype)
pipe = pipeline(
    task="automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    torch_dtype=torch_dtype,
    device=device,
)

# 6. Run transcription
result = pipe("audio.mp3")["text"]
print(result)
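The example above transcribes a single file. For a folder of recordings, one possible approach is a small wrapper that walks the folder and applies a transcription callable to each audio file. This is only a sketch: `transcribe_directory`, `AUDIO_SUFFIXES`, and the list of extensions are hypothetical and not part of this model or of Transformers.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Assumption: these are the audio extensions of interest; adjust as needed.
AUDIO_SUFFIXES = (".mp3", ".wav", ".flac", ".m4a")


def transcribe_directory(
    folder: str,
    transcribe: Callable[[str], str],
    suffixes: Iterable[str] = AUDIO_SUFFIXES,
) -> Dict[str, str]:
    """Apply `transcribe` to every audio file directly inside `folder`.

    Returns a mapping of file name to transcript, in sorted file-name order.
    """
    allowed = {s.lower() for s in suffixes}
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in allowed:
            results[path.name] = transcribe(str(path))
    return results
```

With the `pipe` object created above, a call might look like `transcribe_directory("recordings", lambda p: pipe(p)["text"])`, where `recordings` is a placeholder folder name.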
Framework versions
- Transformers 4.37.2
- Pytorch 2.1.0
- Datasets 2.16.1
- Tokenizers 0.15.1
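To compare a local environment against the versions listed above, a quick check with the standard library might look like the following. It is a sketch under the assumption that the packages were installed under their usual PyPI names (`transformers`, `torch`, `datasets`, `tokenizers`); `installed_versions` is a hypothetical helper, and other versions may well work.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed in this card, keyed by the assumed PyPI package name.
EXPECTED = {
    "transformers": "4.37.2",
    "torch": "2.1.0",
    "datasets": "2.16.1",
    "tokenizers": "0.15.1",
}


def installed_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (listed version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found)
    return report


for name, (wanted, found) in installed_versions(EXPECTED).items():
    print(f"{name}: listed {wanted}, installed {found or 'not installed'}")
```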