Building a Specialized Arabic Customer Service Assistant with Llama 3.1

#CustomLLM #PrivateLLM #LLM #Colab

#Unsloth

In the rapidly evolving world of AI, the ability to tailor a Large Language Model (LLM) to your specific business needs is a game-changer. This guide explains how to fine-tune Llama 3.1 8B into a specialized Arabic customer service assistant, moving from raw data to a locally deployable model.

1. Efficiency First: The Setup

The foundation of this process is Llama 3.1 8B Instruct, loaded with the Unsloth library in 4-bit quantization. This approach is very efficient: with Parameter-Efficient Fine-Tuning (PEFT), we only need to train approximately 0.52% of the total parameters (about 41.9 million out of 8 billion).
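
As a rough check on the 41.9 million figure, the sketch below counts LoRA adapter parameters for Llama 3.1 8B. It assumes a LoRA rank of 16 applied to all seven linear projections in each of the 32 transformer layers; neither setting is stated in this post, so treat them as illustrative. The layer sizes are those of the published Llama 3.1 8B architecture.

```python
# Sketch: count LoRA adapter parameters for Llama 3.1 8B.
# Assumptions (not stated in the post): rank r = 16, adapters on all seven
# linear projections of every transformer layer.

HIDDEN = 4096         # model hidden size
INTERMEDIATE = 14336  # MLP inner size
KV_DIM = 1024         # 8 key/value heads x 128 dims (grouped-query attention)
NUM_LAYERS = 32
RANK = 16             # assumed LoRA rank

# (input features, output features) for each adapted projection
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A LoRA adapter adds two matrices: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
trainable = per_layer * NUM_LAYERS
total = 8_000_000_000  # "8 billion", rounded as in the text

print(f"Trainable LoRA parameters: {trainable:,}")
print(f"Share of the model: {100 * trainable / total:.2f}%")
```

Under these assumptions the count comes to 41,943,040 parameters, which is about 0.52% of 8 billion and consistent with the figure quoted above.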

This efficiency allows the model to be trained on accessible hardware like a Tesla T4 GPU.
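
To see why 4-bit loading matters on a T4, here is a back-of-the-envelope estimate of the memory needed just to hold the weights. It is a simplification: it uses the rounded 8 billion parameter count and ignores quantization constants, activations, gradients, and optimizer state, all of which add to the real footprint.

```python
# Sketch: approximate memory needed to hold the model weights only.
# Ignores quantization constants, activations, gradients and optimizer state.

NUM_PARAMS = 8_000_000_000  # "8 billion", rounded as in the text
T4_MEMORY_GB = 16           # a Tesla T4 has 16 GB of GPU memory


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Bytes = params * bits / 8; reported in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


for label, bits in [("16-bit", 16), ("4-bit", 4)]:
    gb = weight_memory_gb(NUM_PARAMS, bits)
    print(f"{label} weights: {gb:.1f} GB (T4 has {T4_MEMORY_GB} GB)")
```

At 16 bits the weights alone would fill the card, while at 4 bits they take roughly a quarter of it, leaving room for the adapters and training state.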

2. Preparing Your Arabic Dataset

To teach the model about your business, you need high-quality examples. The data is stored in JSONL format, one record per line, with three fields: Instruction, Input, and Output.

Example Data:

Instruction: "What is the return policy?"
Output: "You can return any product within 14 days of the purchase date."

These examples are then processed using an Alpaca-style prompt template to ensure the model understands how to respond to specific queries in a professional, helpful manner.
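
A minimal sketch of that formatting step follows. The template wording is the commonly used Alpaca text and may differ from the notebook's; the empty "input" field and the "<EOS>" placeholder are assumptions made for illustration (in a real run the end-of-sequence string would come from tokenizer.eos_token).

```python
import json

# Alpaca-style template; the exact wording used in the notebook may differ.
ALPACA_PROMPT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

# One JSON object per line (JSONL). The record mirrors the example above;
# the empty "input" is an assumption.
jsonl_text = json.dumps({
    "instruction": "What is the return policy?",
    "input": "",
    "output": "You can return any product within 14 days of the purchase date.",
})


def format_record(record: dict, eos_token: str) -> str:
    """Fill the template and append EOS so the model learns where to stop."""
    return ALPACA_PROMPT.format(**record) + eos_token


records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
# "<EOS>" is a placeholder; in a real run use tokenizer.eos_token.
texts = [format_record(r, eos_token="<EOS>") for r in records]
print(texts[0])
```

Appending the end-of-sequence token matters: without it, a fine-tuned model tends to keep generating past the end of the answer.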

3. The Training Phase

Using the SFTTrainer (Supervised Fine-Tuning Trainer), the model undergoes training with optimized hyperparameters:

Optimization: adamw_8bit is used to further reduce memory consumption.
Steps: While 60 steps can show promising results for a demo, 300–500 steps are recommended for a production-ready model.
Performance: In testing, the training process took roughly 14 minutes, successfully driving the training loss down from 2.4567 to 0.0891.
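
To put the step counts in perspective, the sketch below converts optimizer steps into the number of training examples processed. The batch size and gradient accumulation values are assumptions (typical for a single-GPU Colab run), not settings taken from this post.

```python
# Sketch: how many training examples a given number of optimizer steps covers.
# The batch settings are assumptions, not values from the post.

PER_DEVICE_BATCH_SIZE = 2        # assumed
GRADIENT_ACCUMULATION_STEPS = 4  # assumed


def examples_seen(max_steps: int) -> int:
    """Each optimizer step consumes one effective batch of examples."""
    effective_batch = PER_DEVICE_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS
    return max_steps * effective_batch


for steps in (60, 300, 500):
    print(f"{steps} steps -> {examples_seen(steps)} examples")
```

With these settings, 60 steps cover 480 examples and 500 steps cover 4,000. If the dataset is small, that means many passes over the same records, so a very low training loss can reflect memorization; testing on questions the model has not seen is a useful check.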

4. Exporting and Deployment with Ollama

Once training is complete and verified, the model is exported to the GGUF format (specifically using Q5_K_M quantization), resulting in a portable 5.7 GB file.

To run this model locally using Ollama, you create a Modelfile with the following key configurations:

System Prompt: "أنت مساعد ذكي متخصص في موقعنا. أجب على أسئلة المستخدمين بناءً على معلومات الموقع فقط." (You are an intelligent assistant specialized in our website. Answer users' questions based only on the website's information.)
Parameters: Set a temperature of 0.7 for balanced creativity and a context window (num_ctx) of 2048 to match the max_seq_length of 2048 used when loading the model for training.
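
The configuration above could be written out as a Modelfile along these lines. This is a sketch: the GGUF file name and the model name in the closing comment are hypothetical, and only the settings named in this post are included. A TEMPLATE directive matching the prompt format used in training may also be needed; it is omitted here.

```python
from pathlib import Path

# Hypothetical file name; use the path of your exported GGUF file.
GGUF_PATH = "./model-q5_k_m.gguf"

SYSTEM_PROMPT = (
    "أنت مساعد ذكي متخصص في موقعنا. "
    "أجب على أسئلة المستخدمين بناءً على معلومات الموقع فقط."
)


def build_modelfile(gguf_path: str, system_prompt: str) -> str:
    """Assemble the Modelfile text from the settings described above."""
    return "\n".join([
        f"FROM {gguf_path}",
        f'SYSTEM """{system_prompt}"""',
        "PARAMETER temperature 0.7",
        "PARAMETER num_ctx 2048",
        "",
    ])


modelfile_text = build_modelfile(GGUF_PATH, SYSTEM_PROMPT)
Path("Modelfile").write_text(modelfile_text, encoding="utf-8")
print(modelfile_text)

# Then, in a terminal (the model name "arabic-support" is hypothetical):
#   ollama create arabic-support -f Modelfile
#   ollama run arabic-support
```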

Conclusion

By following this workflow, you can transform a general-purpose model into a specialized Arabic assistant that understands your specific return policies, service hours, and customer needs. With tools like Unsloth and Ollama, what used to require massive server farms can now be accomplished efficiently and deployed locally for private, secure customer support.

Model & Efficiency

Llama 3.1 8B Instruct: The base large language model used for the fine-tuning process.
Unsloth: The library used for fast and memory-efficient loading and training.
4-bit Quantization: A memory-saving technique used to load the 8-billion-parameter model on consumer-grade hardware like a Tesla T4 GPU.
PEFT (Parameter-Efficient Fine-Tuning): A method where only a small fraction (0.52%) of the total model parameters are made trainable.

Data & Training

Arabic Customer Service (خدمة العملاء): The specific domain focus, covering topics like return policies and support hours.
Alpaca Prompt Template: The specific instruction-based formatting used to structure training data.
SFTTrainer (Supervised Fine-Tuning): The specialized trainer used to optimize the model on the instruction dataset.
adamw_8bit: An 8-bit variant of the AdamW optimizer used to further reduce memory requirements during training.
Training Loss: The metric used to track improvement, which dropped from 2.4567 to 0.0891 in the documented run.

Export & Local Deployment

GGUF: The file format used to export the model for local use and compatibility with tools like Ollama.
Q5_K_M Quantization: The specific bit-depth optimization used for the final exported model file.
Ollama: The platform used for local deployment of the fine-tuned assistant.
Modelfile: The configuration file containing the system prompt and inference parameters like temperature, top_p, and num_ctx.

| Section | Cells | Description |
| ------------------ | ------- | ---------------------------------- |
| **Introduction** | 1 | Project overview and requirements |
| **Installation** | 2 | Install Unsloth and datasets |
| **Model loading** | 3-4 | Load Llama 3.1 8B (4-bit) |
| **LoRA setup** | 5-6 | Configure adapters for training |
| **Site data** | 7-8 | Create training data (editable) |
| **Data preparation** | 9-10 | Apply the Alpaca template |
| **Training** | 11-12 | Main training run with outputs |
| **Testing** | 13-14 | Test the model before export |
| **Export** | 15-16 | Export GGUF for Ollama |
| **Save to Drive** | 17-18 | Persistent save to Google Drive |
| **Modelfile** | 19-20 | Set up the Ollama file |
| **Wrap-up** | 21-31 | Summary, tips, and troubleshooting |

https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb

Cell 1: Installation (3 minutes)

%%capture
# %%capture suppresses this cell's output, including the final print.
!pip install -q unsloth
!pip install -q datasets
print("✅ Installation complete!")

Cell 2: Loading the model

import torch
from unsloth import FastLanguageModel

print("⏳ Loading Llama 3.1 8B...")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=2048,  # same value as num_ctx in the Modelfile
    dtype=None,           # let Unsloth pick the dtype for the GPU
    load_in_4bit=True,    # 4-bit quantization to fit on a T4
)

print("✅ Model loaded!")
# With 4-bit packed weights this count may come out lower than 8 billion.
print(f"📊 Total parameters: {sum(p.numel() for p in model.parameters()):,}")
