Tabahi committed
Commit 76eeefc · verified · 1 Parent(s): 6e1ab67

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +21 -23
README.md CHANGED
@@ -1,26 +1,6 @@
- # CUPE: Contextless Universal Phoneme Encoder
-
- A PyTorch model for contextless phoneme prediction from speech audio. CUPE processes 120ms frames independently, ensuring each frame's embeddings are acoustically pure—unlike transformer models that mix context across frames.
-
- ## Trained Models
-
- Two 30.1M parameter models are available in the [checkpoints directory](https://huggingface.co/Tabahi/CUPE-2i/tree/main/ckpt).
-
- ## Datasets
-
- - **LibriSpeech ASR corpus (SR12):** 960 hours of English speech from train-100, train-360, and train-500 splits.
- - **Multilingual LibriSpeech (MLS) (SLR94):** 800 hours total, with 100 hours each for 8 languages: `pl`, `pt`, `it`, `es`, `fr`, `nl`, `de`, `en`. Dataset's train/test/val splits.
- - **MSWC Multilingual Spoken Words Corpus:** 240 hours from 50 languages (max 10 hours/language).
- - **Training:** 38 languages (`en`, `de`, `fr`, `ca`, `es`, `fa`, `it`, `ru`, `pl`, `eu`, `cy`, `eo`, `nl`, `pt`, `tt`, `cs`, `tr`, `et`, `ky`, `id`, `sv-SE`, `ar`, `el`, `ro`, `lv`, `sl`, `zh-CN`, `ga-IE`, `ta`, `vi`, `gn`, `or`)
- - **Testing:** 6 languages (`lt`, `mt`, `ia`, `sk`, `ka`, `as`)
-
-
-
  ---
- language:
- - en
- - multilingual
- license: GPL-3.0
+ language: en
+ license: gpl-3.0
  library_name: pytorch
  pipeline_tag: audio-classification
  tags:
@@ -40,10 +20,28 @@ model-index:
  - name: Phoneme Error Rate
  type: phoneme-error-rate
  value: 0.25
- - name: Phoneme Group Error Rate
+ - name: Phoneme Group Error Rate
  type: phoneme-group-error-rate
  value: 0.23
  ---
+ # CUPE: Contextless Universal Phoneme Encoder
+
+ A PyTorch model for contextless phoneme prediction from speech audio. CUPE processes 120ms frames independently, ensuring each frame's embeddings are acoustically pure—unlike transformer models that mix context across frames.
+
+ ## Trained Models
+
+ Two 30.1M parameter models are available in the [checkpoints directory](https://huggingface.co/Tabahi/CUPE-2i/tree/main/ckpt).
+
+ ## Datasets
+
+ - **LibriSpeech ASR corpus (SR12):** 960 hours of English speech from train-100, train-360, and train-500 splits.
+ - **Multilingual LibriSpeech (MLS) (SLR94):** 800 hours total, with 100 hours each for 8 languages: `pl`, `pt`, `it`, `es`, `fr`, `nl`, `de`, `en`. Dataset's train/test/val splits.
+ - **MSWC Multilingual Spoken Words Corpus:** 240 hours from 50 languages (max 10 hours/language).
+ - **Training:** 38 languages (`en`, `de`, `fr`, `ca`, `es`, `fa`, `it`, `ru`, `pl`, `eu`, `cy`, `eo`, `nl`, `pt`, `tt`, `cs`, `tr`, `et`, `ky`, `id`, `sv-SE`, `ar`, `el`, `ro`, `lv`, `sl`, `zh-CN`, `ga-IE`, `ta`, `vi`, `gn`, `or`)
+ - **Testing:** 6 languages (`lt`, `mt`, `ia`, `sk`, `ka`, `as`)
+
+
+


  ## Metrics