---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B
tags:
- generated_from_trainer
metrics:
- accuracy
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
model-index:
- name: plateer_classifier_test
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# plateer_classifier_test

This model is a PEFT adapter on top of [Qwen/Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B), fine-tuned on [x2bee/plateer_category_data](https://huggingface.co/datasets/x2bee/plateer_category_data) to classify text into 17 product categories.
It achieves the following results on the evaluation set (matching the step-110000 row of the training results):
- [MLflow run](https://polar-mlflow.x2bee.com/#/experiments/27/runs/baa7269894b14f91b8a8ea3822474476)
- Loss: 0.3242
- Accuracy: 0.8997

## How to use
#### Load the base model and the Plateer classifier
```python
import joblib
import torch
from huggingface_hub import hf_hub_download, login
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

# Optional: authenticate if the model or dataset repository is private
with open("./api_key/HGF_TOKEN.txt", "r") as hgf:
    login(token=hgf.read().strip())  # strip the trailing newline from the token file

repo_id = "x2bee/plateer_classifier_v0.1"
data_id = "x2bee/plateer_category_data"

# Load the tokenizer saved with the adapter and the label encoder (class id -> category name)
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="last-checkpoint")
label_encoder_file = hf_hub_download(repo_id=data_id, repo_type="dataset", filename="label_encoder.joblib")
label_encoder = joblib.load(label_encoder_file)

# Load the base model with a 17-class head and match its vocabulary to the tokenizer
base_model = AutoModelForSequenceClassification.from_pretrained("Qwen/Qwen2.5-1.5B", num_labels=17)
base_model.resize_token_embeddings(len(tokenizer))
# Qwen2 sequence classification needs a pad token id to handle more than one input per batch
base_model.config.pad_token_id = tokenizer.pad_token_id

# Attach the fine-tuned PEFT adapter and switch to inference mode
model = PeftModel.from_pretrained(base_model, repo_id, subfolder="last-checkpoint")
model.eval()


class PlateerClassificationPipeline(TextClassificationPipeline):
    """Returns the top-k categories, decoded with the label encoder, for each input text."""

    def __call__(self, inputs, top_k=5, **kwargs):
        # Tokenize a single string or a list of strings and move tensors to the model's device
        encoded = self.tokenizer(inputs, return_tensors="pt", truncation=True, padding=True, **kwargs)
        encoded = {k: v.to(self.model.device) for k, v in encoded.items()}
        with torch.no_grad():
            outputs = self.model(**encoded)
        probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
        scores, indices = torch.topk(probs, top_k, dim=-1)

        results = []
        for batch_scores, batch_indices in zip(scores, indices):
            batch_results = []
            for score, idx in zip(batch_scores, batch_indices):
                # id2label holds default names such as "LABEL_6"; keep the integer part
                label = int(self.model.config.id2label[idx.item()].split("_")[1])
                batch_results.append({
                    "label": label,
                    # str() keeps the printed output free of NumPy string reprs
                    "label_decode": str(label_encoder.inverse_transform([label])[0]),
                    "score": score.item(),
                })
            results.append(batch_results)
        return results


classifier_model = PlateerClassificationPipeline(tokenizer=tokenizer, model=model)


def plateer_classifier(text, top_k=3):
    return classifier_model(text, top_k=top_k)
```
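
Stripped of tokenization and device handling, the post-processing in `__call__` is a softmax over the logits followed by a top-k selection. The plain-Python sketch below shows only that step on hypothetical logits; `top_k_categories` and the four-entry `class_names` list are illustrative stand-ins (the real model has 17 classes decoded by `label_encoder.inverse_transform`), and nothing here loads the model.

```python
import math


def top_k_categories(logits, class_names, top_k=3):
    """Softmax over one row of logits, then return the top_k classes, highest probability first."""
    # Subtract the max logit before exponentiating for numerical stability
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class ids by probability, highest first, and keep the first top_k
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    return [{"label": i, "label_decode": class_names[i], "score": probs[i]} for i in ranked]


# Hypothetical 4-class example (not real model output)
class_names = ["class_a", "class_b", "class_c", "class_d"]
logits = [0.5, 2.0, -1.0, 1.0]
for prediction in top_k_categories(logits, class_names):
    print(prediction)
```

Because the softmax is monotonic, ranking by probability gives the same order as ranking by raw logits; the softmax only matters for the reported `score`.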
#### Run
```python
user_input = "머리띠"  # "headband"
result = plateer_classifier(user_input)[0]  # top-3 predictions for the single input text
for prediction in result:
    print(prediction)
```
```text
{'label': 6, 'label_decode': '뷰티/케어', 'score': 0.42996299266815186}
{'label': 15, 'label_decode': '패션/의류/잡화', 'score': 0.1485249102115631}
{'label': 8, 'label_decode': '스포츠', 'score': 0.1281907707452774}
```
The predicted categories are 뷰티/케어 (beauty/care), 패션/의류/잡화 (fashion/clothing/accessories), and 스포츠 (sports).
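
## Intended uses & limitations
The model assigns Korean product text (for example the product name 머리띠, "headband") to one of 17 product categories and returns the top-k categories with their softmax scores. More information is needed on its limitations.

## Training and evaluation data
The model was trained and evaluated on [x2bee/plateer_category_data](https://huggingface.co/datasets/x2bee/plateer_category_data). The `label_encoder.joblib` file in that dataset repository maps the 17 integer class ids back to category names.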

## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 8 (per device)
- eval_batch_size: 8 (per device)
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 128 (8 per device × 4 devices × 4 accumulation steps)
- total_eval_batch_size: 32 (8 per device × 4 devices)
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10000
- num_epochs: 1
- mixed_precision_training: Native AMP
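
The batch-size totals and the learning-rate schedule above can be sketched as follows. `linear_lr` is a hypothetical helper that mirrors the formula used by `transformers.get_linear_schedule_with_warmup` (a linear ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0). The card does not report the planned number of optimizer steps, so `total_steps=170_000` is an assumed value for illustration only.

```python
# Effective batch sizes implied by the hyperparameters listed above
PER_DEVICE_TRAIN_BATCH = 8
PER_DEVICE_EVAL_BATCH = 8
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 4

total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES  # no gradient accumulation at eval time


def linear_lr(step, base_lr=2e-4, warmup_steps=10_000, total_steps=170_000):
    """Learning rate at an optimizer step under linear warmup followed by linear decay.

    total_steps is an assumed value; the model card does not state it.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps, never below 0
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print("effective train batch:", total_train_batch)
print("effective eval batch:", total_eval_batch)
for step in (0, 5_000, 10_000, 90_000, 170_000):
    print(step, linear_lr(step))
```

With these settings the peak rate of 0.0002 is reached at step 10000, and the decay then runs over the remaining steps.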

### Training results
| Training Loss | Epoch  | Step   | Validation Loss | Accuracy |
|:-------------:|:------:|:------:|:---------------:|:--------:|
| 0.5023        | 0.0292 | 5000   | 0.5044          | 0.8572   |
| 0.4629        | 0.0585 | 10000  | 0.4571          | 0.8688   |
| 0.4254        | 0.0878 | 15000  | 0.4201          | 0.8770   |
| 0.4025        | 0.1171 | 20000  | 0.4016          | 0.8823   |
| 0.3635        | 0.3220 | 55000  | 0.3623          | 0.8905   |
| 0.3192        | 0.6441 | 110000 | 0.3242          | 0.8997   |
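
The Epoch and Step columns together imply roughly how long one epoch is. The sketch below assumes the Trainer's usual bookkeeping (the logged epoch is optimizer steps so far divided by optimizer steps per epoch, with each step consuming 128 examples) and ignores any partial final batch, so the result is an estimate and not a figure reported for this run.

```python
# Last row of the training results table: optimizer step and the (rounded) epoch it was logged at
last_step, last_epoch = 110_000, 0.6441
TOTAL_TRAIN_BATCH = 128  # 8 per device x 4 devices x 4 gradient accumulation steps

# Dividing the step by the epoch fraction estimates the optimizer steps in one full epoch
steps_per_epoch = last_step / last_epoch
examples_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH

# The epoch is rounded to 4 decimals, so bound the estimate by the rounding interval
low = last_step / (last_epoch + 0.00005)
high = last_step / (last_epoch - 0.00005)

print(f"optimizer steps per epoch: ~{steps_per_epoch:,.0f} (between {low:,.0f} and {high:,.0f})")
print(f"training examples per epoch: ~{examples_per_epoch / 1e6:.2f}M")
```

The last row is used because its larger epoch value makes the 4-decimal rounding error smallest in relative terms.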

### Framework versions
- PEFT 0.13.2
- Transformers 4.46.3
- PyTorch 2.2.1
- Datasets 3.1.0
- Tokenizers 0.20.3