Instructions to use TensorCat/TensorTalk with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use TensorCat/TensorTalk with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="TensorCat/TensorTalk")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("TensorCat/TensorTalk", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use TensorCat/TensorTalk with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "TensorCat/TensorTalk" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "TensorCat/TensorTalk", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/TensorCat/TensorTalk
- SGLang
How to use TensorCat/TensorTalk with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "TensorCat/TensorTalk" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "TensorCat/TensorTalk", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "TensorCat/TensorTalk" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "TensorCat/TensorTalk", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use TensorCat/TensorTalk with Docker Model Runner:
docker model run hf.co/TensorCat/TensorTalk
license: apache-2.0
TensorTalk / UM_Handbook
TensorTalk is a handbook-grounded academic chat assistant built for the Faculty of Computer Science and Information Technology, Universiti Malaya (UM).
This project focuses on turning UM handbook content into a usable question-answering system through:
- handbook preprocessing
- source chunk construction
- supervised QA dataset building
- Qwen3-8B LoRA fine-tuning
- merged-model deployment
- a browser-style HTML chat demo
Project Goal
The main goal of this project is to build a handbook-based assistant that can answer student questions using information learned from the UM handbook domain.
The current version is designed around:
- undergraduate and postgraduate handbook content
- handbook-faithful answers
- concise student-facing responses
- a local/demo deployment workflow on DICC and notebook environments
This project is also intended to support a broader experimental pipeline:
- Baseline 1: closed-book supervised fine-tuning
- Baseline 2: retrieval-augmented version for later comparison
What This Project Contains
1. Dataset Preparation
The project includes scripts and resources for preparing handbook data before fine-tuning:
- handbook markdown preprocessing
- source chunk dataset building
- SFT QA dataset construction
- configuration management for the preprocessing and dataset pipeline
2. Fine-Tuning Workflow
The model training workflow uses a Qwen3-8B base model with LoRA-based fine-tuning on the UM handbook QA dataset.
The fine-tuning workflow includes:
- notebook-based training on DICC
- device-aware loading logic
- train / validation / test style evaluation workflow
- merged-model export for direct inference
- LoRA adapter export for optional PEFT-based reuse
- metrics and prediction file generation
3. Deployment Demo
The project includes a notebook-based HTML chat UI called TensorTalk.
The demo provides:
- a browser-style chat layout
- a handbook-focused system prompt
- merged-model loading for direct inference
- a student-facing question-answer workflow
- a simple deployment path for demonstration purposes
Current Project Structure
UM_Handbook/
βββ Dataset/
β βββ SFT_Dataset/
β βββ SFT_QA_Training_Ready.jsonl
β βββ SFT_QA_Training_Ready_pretty.json
β βββ SFT_QA_Metadata.jsonl
β βββ SFT_QA_Metadata_pretty.json
βββ assets/
βββ outputs/
β βββ qwen3_um_handbook_optimized_1/
β βββ lora_adapter/
β βββ merged_model/
β βββ trainer_runs/
β βββ test_eval_runs/
β βββ dataset_split_summary.json
β βββ final_metrics.json
β βββ test_predictions.jsonl
β βββ validation_predictions.jsonl
βββ FineTune_QWEN3_UM_Handbook_optimized_1.ipynb
βββ UM_Handbook_Markdown_Preprocess.py
βββ UM_SFT_QA_Dataset_Builder_from_Index.py
βββ UM_Source_Chunk_Dataset_Builder.py
βββ um_handbook_config.py
Key Files
Training and Data
Dataset/SFT_Dataset/SFT_QA_Training_Ready.jsonl
Main SFT training dataset used for handbook QA fine-tuning.UM_Handbook_Markdown_Preprocess.py
Preprocesses handbook markdown / extracted source text.UM_Source_Chunk_Dataset_Builder.py
Builds source chunks for downstream dataset and retrieval-related use.UM_SFT_QA_Dataset_Builder_from_Index.py
Builds the supervised QA dataset from curated handbook content.um_handbook_config.py
Central configuration file for paths and data-processing settings.
Training Output
outputs/qwen3_um_handbook_optimized_1/merged_model/
Main inference-ready model directory.
This is the directory used by the demo chat UI.outputs/qwen3_um_handbook_optimized_1/lora_adapter/
LoRA adapter weights.
This is useful for PEFT-style loading with a base model, but it is not the primary path used by the current demo UI.outputs/qwen3_um_handbook_optimized_1/final_metrics.json
Final evaluation summary.outputs/qwen3_um_handbook_optimized_1/validation_predictions.jsonl
Validation-set generated answers for inspection.outputs/qwen3_um_handbook_optimized_1/test_predictions.jsonl
Test-set generated answers for inspection.
Demo
FineTune_QWEN3_UM_Handbook_optimized_1.ipynb
Main notebook that contains the fine-tuning workflow and the TensorTalk HTML chat demo.
Model Artifact Notes
This project may contain several model-related outputs. They are not all used in the same way.
merged_model/
This is the most important deployment artifact for the current demo.
Use this when:
- running the current TensorTalk HTML chat UI
- loading the fine-tuned model directly with Hugging Face
from_pretrained(...) - sharing the main inference-ready model
lora_adapter/
This contains LoRA delta weights only.
Use this when:
- loading the adapter on top of the original base model
- reusing the fine-tuning result in a PEFT workflow
- experimenting with a smaller transferable fine-tuning artifact
.pt exported model file
If present, the .pt file is mainly a saved full-model artifact / backup export.
Use this when:
- archiving the full fine-tuned weights
- running a custom loading workflow that explicitly expects a
.ptfile
For the current TensorTalk chat UI, the primary runtime artifact is still merged_model/.
Current Demo Behavior
The current demo is designed to answer questions such as:
- dress code and appearance guidance
- programme core courses / credit requirements
- undergraduate vs postgraduate handbook information
- academic rules and handbook-supported policy questions
The answer style is intended to be:
- handbook-grounded
- short and direct
- student-facing
- non-speculative
Example Demo Output
The screenshot below shows the current TensorTalk chat interface running with the fine-tuned UM handbook model.
Repository Preview
The screenshot below shows the current top-level project layout.
Suggested Minimal Deployment Package
If the goal is only to demonstrate the chat UI to teammates, the minimal useful set is:
merged_model/- the chat notebook / UI code
- optional avatar image under
assets/
The following items are not required for a simple demo run:
- intermediate training checkpoints
- test evaluation run directories
- optional full
.ptexport - raw training logs not used by the demo
Notes
- The project is organized so that Dataset, models / outputs, and demo code remain separate.
- The current demo is notebook-friendly and was prepared around a DICC workflow.
- The deployment path prioritizes clarity and reproducibility over a heavyweight full-stack application setup.
Status
Current project status:
- handbook preprocessing pipeline prepared
- supervised QA dataset prepared
- LoRA fine-tuning workflow completed
- merged model exported
- TensorTalk HTML chat demo running
- evaluation outputs generated
Author / Project Name
TensorTalk
UM Handbook QA / Fine-Tuned Qwen3-8B LoRA Project

