Text Generation
Transformers
Safetensors
English
Chinese
qwen3
qwen3-8b
lora
qlora
sft
rag
faiss
dense-retrieval
agent
ppo
rlhf
rule-reward
harness-engineering
um-handbook
question-answering
chatbot
education
tensor-talk
Instructions to use TensorCat/TensorTalk with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use TensorCat/TensorTalk with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="TensorCat/TensorTalk")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("TensorCat/TensorTalk", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use TensorCat/TensorTalk with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "TensorCat/TensorTalk" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "TensorCat/TensorTalk", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/TensorCat/TensorTalk
- SGLang
How to use TensorCat/TensorTalk with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "TensorCat/TensorTalk" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "TensorCat/TensorTalk", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "TensorCat/TensorTalk" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "TensorCat/TensorTalk", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use TensorCat/TensorTalk with Docker Model Runner:
docker model run hf.co/TensorCat/TensorTalk
| license: apache-2.0 | |
| # TensorTalk / UM_Handbook | |
| TensorTalk is a handbook-grounded academic chat assistant built for the **Faculty of Computer Science and Information Technology, Universiti Malaya (UM)**. | |
| This project focuses on turning UM handbook content into a usable question-answering system through: | |
| - handbook preprocessing | |
| - source chunk construction | |
| - supervised QA dataset building | |
| - Qwen3-8B LoRA fine-tuning | |
| - merged-model deployment | |
| - a browser-style HTML chat demo | |
| --- | |
| ## Project Goal | |
| The main goal of this project is to build a handbook-based assistant that can answer student questions using information learned from the UM handbook domain. | |
| The current version is designed around: | |
| - undergraduate and postgraduate handbook content | |
| - handbook-faithful answers | |
| - concise student-facing responses | |
| - a local/demo deployment workflow on DICC and notebook environments | |
| This project is also intended to support a broader experimental pipeline: | |
| - **Baseline 1:** closed-book supervised fine-tuning | |
| - **Baseline 2:** retrieval-augmented version for later comparison | |
| --- | |
| ## What This Project Contains | |
| ### 1. Dataset Preparation | |
| The project includes scripts and resources for preparing handbook data before fine-tuning: | |
| - handbook markdown preprocessing | |
| - source chunk dataset building | |
| - SFT QA dataset construction | |
| - configuration management for the preprocessing and dataset pipeline | |
| ### 2. Fine-Tuning Workflow | |
| The model training workflow uses a Qwen3-8B base model with LoRA-based fine-tuning on the UM handbook QA dataset. | |
| The fine-tuning workflow includes: | |
| - notebook-based training on DICC | |
| - device-aware loading logic | |
| - train / validation / test style evaluation workflow | |
| - merged-model export for direct inference | |
| - LoRA adapter export for optional PEFT-based reuse | |
| - metrics and prediction file generation | |
| ### 3. Deployment Demo | |
| The project includes a notebook-based HTML chat UI called **TensorTalk**. | |
| The demo provides: | |
| - a browser-style chat layout | |
| - a handbook-focused system prompt | |
| - merged-model loading for direct inference | |
| - a student-facing question-answer workflow | |
| - a simple deployment path for demonstration purposes | |
| --- | |
| ## Current Project Structure | |
| ```text | |
| UM_Handbook/ | |
| ├── Dataset/ | |
| │ └── SFT_Dataset/ | |
| │ ├── SFT_QA_Training_Ready.jsonl | |
| │ ├── SFT_QA_Training_Ready_pretty.json | |
| │ ├── SFT_QA_Metadata.jsonl | |
| │ └── SFT_QA_Metadata_pretty.json | |
| ├── assets/ | |
| ├── outputs/ | |
| │ └── qwen3_um_handbook_optimized_1/ | |
| │ ├── lora_adapter/ | |
| │ ├── merged_model/ | |
| │ ├── trainer_runs/ | |
| │ ├── test_eval_runs/ | |
| │ ├── dataset_split_summary.json | |
| │ ├── final_metrics.json | |
| │ ├── test_predictions.jsonl | |
| │ └── validation_predictions.jsonl | |
| ├── FineTune_QWEN3_UM_Handbook_optimized_1.ipynb | |
| ├── UM_Handbook_Markdown_Preprocess.py | |
| ├── UM_SFT_QA_Dataset_Builder_from_Index.py | |
| ├── UM_Source_Chunk_Dataset_Builder.py | |
| └── um_handbook_config.py | |
| ``` | |
| --- | |
| ## Key Files | |
| ### Training and Data | |
| - `Dataset/SFT_Dataset/SFT_QA_Training_Ready.jsonl` | |
| Main SFT training dataset used for handbook QA fine-tuning. | |
| - `UM_Handbook_Markdown_Preprocess.py` | |
| Preprocesses handbook markdown / extracted source text. | |
| - `UM_Source_Chunk_Dataset_Builder.py` | |
| Builds source chunks for downstream dataset and retrieval-related use. | |
| - `UM_SFT_QA_Dataset_Builder_from_Index.py` | |
| Builds the supervised QA dataset from curated handbook content. | |
| - `um_handbook_config.py` | |
| Central configuration file for paths and data-processing settings. | |
| ### Training Output | |
| - `outputs/qwen3_um_handbook_optimized_1/merged_model/` | |
| Main inference-ready model directory. | |
| This is the directory used by the demo chat UI. | |
| - `outputs/qwen3_um_handbook_optimized_1/lora_adapter/` | |
| LoRA adapter weights. | |
| This is useful for PEFT-style loading with a base model, but it is not the primary path used by the current demo UI. | |
| - `outputs/qwen3_um_handbook_optimized_1/final_metrics.json` | |
| Final evaluation summary. | |
| - `outputs/qwen3_um_handbook_optimized_1/validation_predictions.jsonl` | |
| Validation-set generated answers for inspection. | |
| - `outputs/qwen3_um_handbook_optimized_1/test_predictions.jsonl` | |
| Test-set generated answers for inspection. | |
| ### Demo | |
| - `FineTune_QWEN3_UM_Handbook_optimized_1.ipynb` | |
| Main notebook that contains the fine-tuning workflow and the TensorTalk HTML chat demo. | |
| --- | |
| ## Model Artifact Notes | |
| This project may contain several model-related outputs. They are not all used in the same way. | |
| ### `merged_model/` | |
| This is the most important deployment artifact for the current demo. | |
| Use this when: | |
| - running the current TensorTalk HTML chat UI | |
| - loading the fine-tuned model directly with Hugging Face `from_pretrained(...)` | |
| - sharing the main inference-ready model | |
| ### `lora_adapter/` | |
| This contains LoRA delta weights only. | |
| Use this when: | |
| - loading the adapter on top of the original base model | |
| - reusing the fine-tuning result in a PEFT workflow | |
| - experimenting with a smaller transferable fine-tuning artifact | |
| ### `.pt` exported model file | |
| If present, the `.pt` file is mainly a saved full-model artifact / backup export. | |
| Use this when: | |
| - archiving the full fine-tuned weights | |
| - running a custom loading workflow that explicitly expects a `.pt` file | |
| For the current TensorTalk chat UI, the primary runtime artifact is still **`merged_model/`**. | |
| --- | |
| ## Current Demo Behavior | |
| The current demo is designed to answer questions such as: | |
| - dress code and appearance guidance | |
| - programme core courses / credit requirements | |
| - undergraduate vs postgraduate handbook information | |
| - academic rules and handbook-supported policy questions | |
| The answer style is intended to be: | |
| - handbook-grounded | |
| - short and direct | |
| - student-facing | |
| - non-speculative | |
| --- | |
| ## Example Demo Output | |
| The screenshot below shows the current TensorTalk chat interface running with the fine-tuned UM handbook model. | |
|  | |
| --- | |
| ## Repository Preview | |
| The screenshot below shows the current top-level project layout. | |
|  | |
| --- | |
| ## Suggested Minimal Deployment Package | |
| If the goal is only to demonstrate the chat UI to teammates, the minimal useful set is: | |
| - `merged_model/` | |
| - the chat notebook / UI code | |
| - optional avatar image under `assets/` | |
| The following items are not required for a simple demo run: | |
| - intermediate training checkpoints | |
| - test evaluation run directories | |
| - optional full `.pt` export | |
| - raw training logs not used by the demo | |
| --- | |
| ## Notes | |
| - The project is organized so that **Dataset**, **models / outputs**, and **demo code** remain separate. | |
| - The current demo is notebook-friendly and was prepared around a DICC workflow. | |
| - The deployment path prioritizes clarity and reproducibility over a heavyweight full-stack application setup. | |
| --- | |
| ## Status | |
| Current project status: | |
| - handbook preprocessing pipeline prepared | |
| - supervised QA dataset prepared | |
| - LoRA fine-tuning workflow completed | |
| - merged model exported | |
| - TensorTalk HTML chat demo running | |
| - evaluation outputs generated | |
| --- | |
| ## Author / Project Name | |
| **TensorTalk** | |
| UM Handbook QA / Fine-Tuned Qwen3-8B LoRA Project |