Text Generation
Transformers
Safetensors
English
Chinese
qwen3
qwen3-8b
lora
qlora
sft
rag
faiss
dense-retrieval
agent
ppo
rlhf
rule-reward
harness-engineering
um-handbook
question-answering
chatbot
education
tensor-talk
Instructions to use TensorCat/TensorTalk with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use TensorCat/TensorTalk with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TensorCat/TensorTalk")

# Load the model directly (AutoModelForCausalLM includes the LM head needed for generation)
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("TensorCat/TensorTalk", dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use TensorCat/TensorTalk with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "TensorCat/TensorTalk"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "TensorCat/TensorTalk",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
```bash
docker model run hf.co/TensorCat/TensorTalk
```
- SGLang
How to use TensorCat/TensorTalk with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TensorCat/TensorTalk" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "TensorCat/TensorTalk",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TensorCat/TensorTalk" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "TensorCat/TensorTalk",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use TensorCat/TensorTalk with Docker Model Runner:
```bash
docker model run hf.co/TensorCat/TensorTalk
```
Update README.md
README.md
CHANGED
@@ -27,6 +27,34 @@ tags:
- education
- tensor-talk
---

# TensorTalk: UM Handbook Qwen3-8B SFT + RAG + Agent + PPO + Harness Engineering

@@ -612,7 +640,7 @@ It shows that retrieval grounding dramatically improves answer quality compared

## 6.1 Why an Agent Is Needed

The handbook is reliable for stable academic rules, but

Examples:

@@ -621,26 +649,87 @@ Who is the current dean?
```text
Who is the current dean?
Where can students find residential college information?
What official page mentions PEKOM?
Where is the official SPeCTRUM page?
```

---

## 6.

The web agent is constrained to official UM / FSKTM domains.

Priority domains

```text
fsktm.um.edu.my
www.um.edu.my
```

Auxiliary official

```text
aasd.um.edu.my
@@ -653,41 +742,202 @@ intra.fsktm.um.edu.my
gallery.fsktm.um.edu.my
```

The agent

```text
Qwen-based evidence judging
retry if weak
fallback to handbook RAG if needed
```

---

## 6.

---

@@ -695,35 +945,175 @@ So the agent is better described as:

## 7.1 What Harness Engineering Means Here

Harness Engineering is the

A simple analogy:

```text
The LLM/agent is the car.
Harness Engineering is the guardrail, traffic rule, checkpoint, fallback route, and
```

The model can generate fluent answers, but the harness controls:

---

## 7.2 Harness

```text
User Question
↓
Local Handbook RAG
↓
Official Web Discovery
@@ -742,96 +1132,271 @@ Entity-aware Retry
↓
Weak Evidence Fallback
↓
Answer Generator
↓
Answer Grounding Judge
↓
Completeness Guard
↓
UI Trace
```

---

## 7.

The

---

```text
→ Retry
→ Final Evidence
```

The

---

@@ -1371,7 +1936,41 @@ Non-PPO fallback is forbidden in the final Improved Model demo.

---

# 18.

TensorTalk demonstrates a staged LLM system development workflow:

- education
- tensor-talk
---
---
license: other
language:
- en
- zh
tags:
- qwen3
- qwen3-8b
- lora
- qlora
- sft
- rag
- faiss
- dense-retrieval
- agent
- ppo
- rlhf
- rule-reward
- harness-engineering
- um-handbook
- question-answering
- chatbot
- education
- tensor-talk
pipeline_tag: text-generation
base_model: Qwen/Qwen3-8B
library_name: transformers
---

# TensorTalk: UM Handbook Qwen3-8B SFT + RAG + Agent + PPO + Harness Engineering


## 6.1 Why an Agent Is Needed

The handbook is reliable for stable academic rules, but a practical university assistant cannot depend only on static handbook text. Some user questions naturally require **official web discovery**, **source checking**, or **routing decisions**.

Examples:

```text
Who is the current dean?
Where can students find residential college information?
What official page mentions PEKOM?
Where is the official SPeCTRUM page?
What facilities are associated with a specific lab?
```

For these cases, TensorTalk uses an official-source web agent. The agent is not an unrestricted autonomous browser; it is deliberately designed as a **constrained agent**, because the domain is academic handbook QA, where factual trustworthiness matters more than open-ended exploration.

In practical terms, the agent layer answers this question:

> If local handbook RAG is not enough, can the system search official UM/FSKTM sources, reject weak or fake sources, and return evidence safely?

---

## 6.2 How This Project Relates to LangChain and LangGraph

TensorTalk does **not** use LangChain, LangGraph, or LangSmith as its runtime framework. The agent and harness were implemented from scratch in the notebook.

However, the design is conceptually aligned with ideas from the official LangChain ecosystem documentation:

- LangChain describes agents as systems that combine language models with tools so they can reason about tasks, decide which tools to use, and work iteratively toward a result.
- LangGraph describes agent workflows using state, nodes, and edges, where nodes perform computation and edges determine the next transition.
- LangSmith describes evaluation as a workflow involving datasets, evaluators, and experiments to compare application versions.
- The LangChain/LangGraph documentation also distinguishes between predetermined workflows and dynamic agents. TensorTalk intentionally uses a hybrid design, because the handbook QA task needs both predictable guardrails and dynamic retrieval decisions.

Therefore, this project is best described as:

> A from-scratch implementation of a LangChain/LangGraph-inspired agentic RAG harness, not a project built by directly calling LangChain’s prebuilt agent framework.

This distinction is important. TensorTalk does not simply wrap a LangChain agent. Instead, it manually implements the major control ideas:

```text
State tracking
→ planning
→ retrieval/tool routing
→ source filtering
→ evidence normalization
→ evaluation/judging
→ retry/fallback
→ final generation
→ trace output
```

This makes the project more transparent, because each part of the agent loop is visible in the notebook and the UI trace.

---

## 6.3 Agent Design Philosophy

The TensorTalk agent is built around four principles.

### 1. Source-constrained autonomy

The agent can search and fetch information, but only from allowed official sources. It does not trust arbitrary search results.

### 2. Evidence-first generation

The model should not answer a web-sensitive question until evidence has been collected and judged.

### 3. Retry and fallback

If official web evidence is weak, blocked, irrelevant, or unsafe, the system can retry with entity-aware search terms or fall back to local handbook RAG.

### 4. Traceable decisions

The agent does not hide its routing decisions. It records URL candidates, accepted evidence, rejected evidence, grounding decisions, and fallback decisions in trace panels.

---

## 6.4 Official UM / FSKTM Web Agent

The web agent is constrained to official UM / FSKTM domains. This is one of the most important safety and reliability choices in the project.

### Priority official domains

```text
fsktm.um.edu.my
www.um.edu.my
um.edu.my
```

### Auxiliary official UM-related domains

The project also recognizes selected UM-related service domains when they are relevant to student services, academic systems, library resources, research, career portals, or internal faculty resources:

```text
aasd.um.edu.my
gallery.fsktm.um.edu.my
```

The purpose of this whitelist is not to search the whole internet but to constrain the agent to sources that are likely to be officially controlled by UM or FSKTM.

---

## 6.5 Domain Whitelist Design

The whitelist acts as a **domain guard** that a page must pass before it can become trusted evidence.

The system sorts URLs into three broad categories:

| URL type | Handling |
|---|---|
| Official UM/FSKTM URL | Can be accepted as candidate evidence |
| UM-related service URL | Can be accepted if relevant to the question type |
| Non-official or synthetic URL | Rejected or downgraded |

The whitelist helps prevent common LLM-agent failure cases:

```text
hallucinated programme pages
invented lab pages
fake student service URLs
misrouted search results
random third-party pages
SEO or unrelated pages
```

For example, the project specifically tests that the agent does not invent or accept URLs such as:

```text
programme-ccna-lab-more-detailedly
bachelor-of-computer-science-artificial-intelligence
```

when those pages are not the correct evidence for the user’s question.
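A minimal sketch of the three-way classification in the table above, assuming exact host matching against the domains listed in 6.4. `classify_url`, the category labels, and the domain sets are hypothetical names for illustration, not the notebook's actual API:

```python
from urllib.parse import urlsplit

# Hypothetical whitelist built only from domains listed in this README.
PRIORITY_DOMAINS = {"fsktm.um.edu.my", "www.um.edu.my", "um.edu.my"}
AUXILIARY_DOMAINS = {"aasd.um.edu.my", "gallery.fsktm.um.edu.my"}


def classify_url(url: str) -> str:
    """Return "official", "um_related", or "rejected" for a candidate URL."""
    try:
        parts = urlsplit(url.strip())
        host = (parts.hostname or "").lower()
    except ValueError:  # e.g. malformed IPv6 brackets
        return "rejected"
    if parts.scheme not in ("http", "https") or not host:
        return "rejected"
    # Exact host matching: a naive endswith("um.edu.my") check would also
    # accept look-alike hosts such as "fakeum.edu.my".
    if host in PRIORITY_DOMAINS:
        return "official"
    if host in AUXILIARY_DOMAINS:
        return "um_related"
    return "rejected"


print(classify_url("https://fsktm.um.edu.my/"))  # official
print(classify_url("https://aasd.um.edu.my/"))   # um_related
print(classify_url("https://spectrum.umlms"))    # rejected
```

Exact matching is deliberately strict: a new UM subdomain has to be added to a list explicitly, which keeps whitelist changes reviewable.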

---

## 6.6 Web Agent Workflow

The official-source web agent follows a controlled workflow:

```text
User question
↓
Intent and entity detection
↓
Official-search query construction
↓
Candidate URL discovery
↓
Domain whitelist filtering
↓
Synthetic/fake URL rejection
↓
Fetch or static page fallback
↓
WAF/block detection
↓
Text extraction and normalization
↓
Evidence scoring
↓
Qwen evidence judge
↓
Accept evidence, retry, or fall back to handbook RAG
```

This means the agent is not just a web search function; it is a guarded evidence-acquisition pipeline.
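The workflow names an "Evidence scoring" stage without giving its formula. The sketch below shows one simple way such a score could be computed, assuming keyword overlap between the question and the page plus a fixed bonus when the detected entity appears. `score_evidence`, the stopword list, and the 0.7/0.3 weights are illustrative assumptions, not TensorTalk's actual scorer:

```python
import re

TOKEN_RE = re.compile(r"[a-z0-9]+")
# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "for", "to", "in", "on",
             "what", "where", "which", "who", "can", "does"}


def keywords(text: str) -> set:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return {t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS}


def score_evidence(question: str, entity: str, page_text: str) -> float:
    """Score in [0, 1]: weighted keyword coverage plus an entity bonus (hypothetical weights)."""
    q = keywords(question)
    coverage = len(q & keywords(page_text)) / len(q) if q else 0.0
    entity_bonus = 0.3 if entity and entity.lower() in page_text.lower() else 0.0
    return min(1.0, 0.7 * coverage + entity_bonus)
```

A lexical score like this is cheap enough to rank every candidate page, so the slower Qwen evidence judge only needs to look at the top few.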

---

## 6.7 Planning Inside the Agent

Planning is a visible part of the TensorTalk system.

The planning layer is responsible for deciding:

```text
Is this a stable handbook question?
Does this question need the latest/current official web information?
Should local RAG be used first?
Should official web discovery be attempted?
Which entity should be searched?
Which scope should be preferred: undergraduate, postgraduate, general, faculty, or university?
What evidence type is expected: handbook chunk, official page, contact page, facility page, announcement, or policy page?
```

This planning step reflects the idea that an agentic system should not jump directly from the user question to a final answer. It needs a control stage that decides which tools and evidence paths are appropriate.

TensorTalk’s planning is not a free-form hidden chain of thought that users must trust blindly. It is made concrete through explicit routing variables, trace objects, search decisions, and UI panels.
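The routing questions above can be expressed as explicit variables with a small rule-based planner. This is a hypothetical sketch: the keyword lists, the `Plan` fields, and the "two or more capital letters" entity heuristic are assumptions for illustration, and the notebook may use different rules or a model-based classifier:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative keyword rules (assumptions, not TensorTalk's actual lists).
CURRENT_HINTS = ("current", "latest", "this semester")
WEB_HINTS = ("official page", "website", "where is", "where can", "link")
SCOPE_HINTS = {
    "postgraduate": ("postgraduate", "master", "phd"),
    "undergraduate": ("undergraduate", "bachelor"),
}
# Tokens with at least two capital letters, e.g. PEKOM, SPeCTRUM, FSKTM.
ENTITY_RE = re.compile(r"\b[A-Z][A-Za-z]*[A-Z][A-Za-z]*\b")


@dataclass
class Plan:
    use_local_rag: bool
    try_official_web: bool
    scope: str
    entity: Optional[str]


def _mentions(text: str, phrases) -> bool:
    # Whole-word/phrase matching so that e.g. "link" does not match "linked".
    return any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases)


def plan_route(question: str) -> Plan:
    q = question.lower()
    scope = next((name for name, hints in SCOPE_HINTS.items() if _mentions(q, hints)), "general")
    entities = ENTITY_RE.findall(question)
    return Plan(
        use_local_rag=True,  # RAG-first: the handbook is always consulted
        try_official_web=_mentions(q, CURRENT_HINTS + WEB_HINTS),
        scope=scope,
        entity=entities[0] if entities else None,
    )
```

Returning a plain dataclass keeps every routing decision inspectable, so it can be copied straight into the trace panel.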

---

## 6.8 Generation Inside the Agent

Generation is the stage where the Qwen3-8B model produces an answer.

However, generation never runs on its own. The answer generator receives a controlled context:

```text
user question
retrieved local handbook evidence
accepted official web evidence
scope hints
source metadata
harness instructions
answer style constraints
```

The generator is expected to:

```text
answer directly
avoid unsupported claims
avoid fake URLs
avoid exposing internal reasoning
use local handbook evidence when web evidence is weak
prefer official web evidence only when it is relevant and trusted
```

In the final stage, the generator is the PPO-trained Qwen3 actor, but it is still wrapped by the same RAG and Harness Engineering control layer.
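A minimal sketch of how the controlled context listed above might be assembled into a single prompt string. The function name, section labels, instruction wording, and evidence dictionary keys (`title`, `url`, `snippet`) are hypothetical; the actual harness prompt is not reproduced in this README:

```python
from typing import Dict, List


def build_generation_context(
    question: str,
    handbook_chunks: List[str],
    web_evidence: List[Dict[str, str]],
    scope_hint: str,
    max_items: int = 4,
) -> str:
    """Assemble question, evidence, and harness instructions into one prompt (illustrative)."""
    lines = [
        "Answer the question using only the evidence below.",
        "Do not invent URLs; cite only URLs that appear in the evidence.",
        "If the evidence is insufficient, say so instead of guessing.",
        f"Scope: {scope_hint}",
        "",
        "Handbook evidence:",
    ]
    lines += [f"[H{i}] {chunk.strip()}" for i, chunk in enumerate(handbook_chunks[:max_items], start=1)]
    lines += ["", "Official web evidence:"]
    if web_evidence:
        lines += [
            f"[W{i}] {ev['title']} <{ev['url']}>: {ev['snippet'].strip()}"
            for i, ev in enumerate(web_evidence[:max_items], start=1)
        ]
    else:
        # Keeping the section explicit tells the model that no web evidence was accepted.
        lines.append("(none accepted)")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

Labeling each item (`[H1]`, `[W1]`) gives the later grounding check stable handles to match answer claims against.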

---

## 6.9 Evaluation Inside the Agent

Evaluation is the other core part of the agent loop. TensorTalk evaluates both evidence and answers.

### Evidence evaluation

The system checks:

```text
Is the source official?
Is the URL synthetic or fake?
Is the page blocked by a WAF?
Is the evidence relevant to the user question?
Does the evidence mention the right entity?
Does the evidence match the expected scope?
```

### Answer evaluation

The system checks:

```text
Is the final answer grounded in accepted evidence?
Does it answer the user’s actual question?
Does it leak internal thinking?
Does it invent URLs?
Is it too vague?
Is it incomplete enough to require a fallback or rewrite?
```

Together, these checks close the agentic loop:

```text
Planning
→ Retrieval / tool use
→ Generation
→ Evaluation
→ Retry or fallback
→ Final answer
```
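Several of the answer-evaluation questions can be checked with plain rules before any model-based judge runs. The sketch below covers process leakage, invented URLs, and vagueness. It assumes thinking output is delimited by Qwen3's `<think>`/`</think>` tags and treats any URL not in the accepted evidence as unsupported; `check_answer` and the 40-character threshold are hypothetical. Grounding and question relevance still need a model-based judge:

```python
import re
from typing import List, Set

URL_RE = re.compile(r"https?://[^\s<>()\"']+")


def check_answer(answer: str, accepted_urls: Set[str], min_chars: int = 40) -> List[str]:
    """Return rule-based problems found in a generated answer (empty list means passed)."""
    problems = []
    # Qwen3 wraps reasoning in <think>...</think>; leftover tags mean leaked process text.
    if "<think>" in answer or "</think>" in answer:
        problems.append("leaks internal thinking")
    for url in URL_RE.findall(answer):
        url = url.rstrip(".,;:")  # drop sentence punctuation captured after the URL
        if url not in accepted_urls:
            problems.append(f"unsupported URL: {url}")
    if len(answer.strip()) < min_chars:
        problems.append("too short or vague")
    return problems
```

A non-empty result can then drive the retry or fallback branch instead of being shown to the user.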

---

## 6.10 Why the Agent Is Not Fully Autonomous

The agent is intentionally not fully autonomous.

A fully autonomous browsing agent may:

```text
search too broadly
trust wrong sources
follow irrelevant pages
invent missing pages
overuse web search
ignore handbook evidence
produce unsupported answers
```

TensorTalk instead uses a constrained design:

```text
Dynamic when needed
Guarded by default
Official-source-only for web evidence
RAG-first for handbook-stable questions
Fallback-aware when web evidence is weak
Traceable for debugging and demonstration
```

This is more appropriate for a university handbook assistant.

---

## 7.1 What Harness Engineering Means Here

Harness Engineering is the external control system around the LLM, RAG, and agent.

A simple analogy:

```text
The LLM/agent is the car.
Harness Engineering is the guardrail, traffic rule, checkpoint, fallback route, dashboard, and driving examiner.
```

The model can generate fluent answers, but the harness controls:

```text
what it can search
which domains are trusted
which URLs are rejected
which evidence is useful
when to retry
when to fall back
whether the answer is grounded
whether the UI should show warning traces
whether the final response is safe enough to display
```

In TensorTalk, Harness Engineering is not just prompt engineering. Prompt engineering tells the model what to do; Harness Engineering builds the surrounding execution system that checks whether the model actually did it correctly.

---

## 7.2 From Prompt Engineering to Harness Engineering

Prompt engineering is like telling a driver:

```text
Please drive carefully.
```

Harness Engineering is like building:

```text
lane barriers
speed checks
traffic rules
navigation checkpoints
fallback routes
dashboards
incident logs
```

In this project, prompt engineering alone is not enough because the model may still:

```text
invent fake URLs
mix undergraduate and postgraduate rules
leak internal reasoning
trust weak web snippets
answer without evidence
overuse web search
ignore local handbook RAG
```

The harness prevents or detects these failures through code-level controls, not only prompt instructions.

---

## 7.3 From-scratch LangChain-style Harness

TensorTalk’s harness was built manually rather than by importing a prebuilt LangChain/LangGraph agent.

The implementation follows the same conceptual loop used in many modern agent frameworks:

```text
Planning
→ Tool / Retrieval Action
→ Generation
→ Evaluation
→ Retry / Fallback
→ Finalization
```

Each component, however, is implemented explicitly:

| Conceptual framework idea | TensorTalk from-scratch implementation |
|---|---|
| Agent state | Trace dictionaries, evidence bundles, routing flags, runtime status |
| Tools | Local RAG retriever, official web search/fetch, URL validator, evidence judge |
| Nodes | Planning, retrieval, web discovery, evidence filtering, judging, generation, grounding |
| Edges / transitions | Conditional retry, weak-evidence fallback, RAG fallback, final answer route |
| Evaluation | Qwen evidence judge, rule checks, answer grounding judge, smoke tests |
| Observability | Collapsed UI trace panels and printed diagnostic outputs |

This makes the system easier to inspect in an academic notebook because the control logic is visible.

---

## 7.4 Planning → Generation → Evaluation Closed Loop

The most important Harness Engineering contribution in TensorTalk is this closed loop:

```text
Planning
↓
Generation
↓
Evaluation
↓
Retry / Fallback / Finalization
```

### Planning

The planning layer decides how to handle the query. It considers:

```text
question type
scope
entity
whether local RAG is enough
whether official web is needed
whether the query is dynamic/current
whether the answer should be handbook-grounded or web-grounded
```

### Generation

The generation layer produces an answer from controlled evidence. It receives:

```text
local handbook chunks
official web evidence
scope hints
source metadata
answer constraints
```

### Evaluation

The evaluation layer checks the result. It evaluates:

```text
source trust
URL validity
evidence relevance
answer grounding
completeness
process leakage
fake URLs
fallback need
```

If evaluation fails, the system can retry, reroute, or fall back.

This is the engineering loop that makes TensorTalk more than a simple RAG chatbot.
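The closed loop above can be sketched as a small controller in which each stage is passed in as a callable, so the control flow can be tested without the real retriever, web agent, or Qwen model. The function name, the `use_web` plan key, the retry budget, and the fallback message are hypothetical; the sketch only illustrates the plan → retrieve → judge → retry/fallback → generate → judge ordering:

```python
from typing import Callable, Dict, List


def run_harness(
    question: str,
    plan: Callable[[str], Dict],
    retrieve_local: Callable[[str], List[str]],
    search_web: Callable[[str, int], List[str]],
    judge_evidence: Callable[[str, List[str]], bool],
    generate: Callable[[str, List[str]], str],
    judge_answer: Callable[[str, List[str]], bool],
    max_retries: int = 1,
) -> Dict:
    """Run one planning -> generation -> evaluation cycle and return a trace dict."""
    trace = {"question": question, "route": ["local_rag"], "fallback_reason": None}
    route = plan(question)
    evidence = retrieve_local(question)  # RAG-first: handbook evidence is always gathered
    if route.get("use_web"):
        for attempt in range(max_retries + 1):
            web = search_web(question, attempt)  # later attempts could use entity-aware terms
            trace["route"].append(f"web_attempt_{attempt}")
            if web and judge_evidence(question, web):
                evidence = web + evidence
                break
        else:  # no attempt produced accepted web evidence
            trace["fallback_reason"] = "weak web evidence; using handbook RAG only"
    answer = generate(question, evidence)
    if not judge_answer(answer, evidence):
        trace["fallback_reason"] = "answer not grounded"
        answer = "I could not find enough trusted evidence to answer this reliably."
    trace["final_answer"] = answer
    return trace
```

Injecting the stages as functions also lets smoke tests swap in stubs, as in the checks below, without loading the 8B model.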

---

## 7.5 Standardized Harness Core Pipeline

The final standardized TensorTalk Harness Core follows this pipeline:

```text
User Question
↓
Planning Layer
↓
Local Handbook RAG
↓
Official Web Discovery
↓
Weak Evidence Fallback
↓
PPO/SFT Answer Generator
↓
Answer Grounding Judge
↓
Completeness Guard
↓
Final Answer
↓
UI Trace
```

This pipeline is intentionally explicit: each stage has a clear job.

---

## 7.6 Harness State and Trace Objects

The harness keeps structured trace data so that every answer can be inspected.

Typical trace information includes:

```text
retrieved local RAG chunks
candidate web URLs
accepted official URLs
rejected URLs
web evidence bundle
harness core route
evidence judge result
answer grounding result
fallback reason
final answer preview
```

This is similar in spirit to observability and tracing in agent platforms, but implemented directly in the notebook and UI.
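A minimal sketch of such a trace as a Python dataclass whose fields mirror the list above. `HarnessTrace`, its field names, and the JSON export are assumptions for illustration rather than the notebook's actual keys:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class HarnessTrace:
    """Hypothetical trace record; fields mirror the trace list, names are assumptions."""
    question: str
    local_rag_chunks: List[str] = field(default_factory=list)
    candidate_urls: List[str] = field(default_factory=list)
    accepted_urls: List[str] = field(default_factory=list)
    rejected_urls: List[Dict[str, str]] = field(default_factory=list)
    web_evidence: List[Dict[str, str]] = field(default_factory=list)
    route: List[str] = field(default_factory=list)
    evidence_judge: Optional[str] = None
    answer_grounding: Optional[str] = None
    fallback_reason: Optional[str] = None
    final_answer_preview: str = ""

    def reject(self, url: str, reason: str) -> None:
        # Storing the reason lets the UI explain why a page was not used.
        self.rejected_urls.append({"url": url, "reason": reason})

    def to_json(self) -> str:
        # A JSON dump is easy to print in a notebook cell or render in a collapsed UI panel.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)
```

For example, `trace.reject("https://spectrum.umlms", "domain not allowed")` records one rejected candidate, and `print(trace.to_json())` shows the whole record.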

---

## 7.7 Domain Guard

The domain guard checks whether a candidate source belongs to the allowed official domain set.

It protects against:

```text
random third-party websites
unofficial mirrors
search result noise
LLM-fabricated domains
wrong university pages
```

It also makes the system explainable: if the agent rejects a page, the trace can show why.

---
|
| 1190 |
|
| 1191 |
+
## 7.8 Fake URL Guard
|
| 1192 |
|
| 1193 |
+
The fake URL guard is one of the most important parts of the project because raw LLM generations can invent plausible-looking URLs.
|
| 1194 |
|
| 1195 |
+
Examples of risky synthetic URLs include:
|
| 1196 |
|
| 1197 |
```text
|
| 1198 |
+
https://spectrum.umlms
|
| 1199 |
+
http://spectrux.medicum
|
| 1200 |
+
programme-ccna-lab-more-detailedly
|
| 1201 |
+
https://aasd um edu my/studetn
|
|
|
|
|
|
|
| 1202 |
```
|
| 1203 |
|
| 1204 |
+
The guard checks and rejects URLs that:
|
| 1205 |
+
|
| 1206 |
+
```text
|
| 1207 |
+
are malformed
|
| 1208 |
+
look fabricated
|
| 1209 |
+
contain suspicious path patterns
|
| 1210 |
+
do not belong to allowed domains
|
| 1211 |
+
are query-fabricated rather than discovered from official search/fetch
|
| 1212 |
+
```
|
| 1213 |
+
|
| 1214 |
+
The PPO reward function also penalizes hallucinated URLs, but the harness is still necessary because reward shaping does not guarantee perfect URL behavior.
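
The rejection rules above can be chained into a single function that returns the first failing reason. This is a minimal sketch under assumptions. The allow-list, the suspicious-path regex, and the name `fake_url_reason` are all hypothetical. The `discovered` set stands in for whatever URLs the official search/fetch step actually returned.

```python
from __future__ import annotations

import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical settings for illustration; the project's real lists are not shown.
ALLOWED_DOMAINS = ("um.edu.my",)
SUSPICIOUS_PATH = re.compile(r"more-detailed|studetn")  # prompt words or typos leaking into paths
HOST_LABEL = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")  # one DNS label


def fake_url_reason(url: str, discovered: set[str]) -> Optional[str]:
    """Return why `url` should be rejected, or None if it passes every check."""
    if not url or any(ch.isspace() for ch in url):
        return "malformed: empty or contains whitespace"
    try:
        parts = urlsplit(url)
    except ValueError:
        return "malformed: unparseable"
    host = parts.hostname or ""
    if parts.scheme not in ("http", "https") or not host:
        return "malformed: missing scheme or host"
    labels = host.split(".")
    if len(labels) < 2 or not all(HOST_LABEL.match(label) for label in labels):
        return "fabricated-looking host"
    if not any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS):
        return "host not in allowed domains"
    if SUSPICIOUS_PATH.search(parts.path.lower()):
        return "suspicious path pattern"
    if url not in discovered:
        return "not discovered by official search/fetch"
    return None
```

With these assumptions, `https://aasd um edu my/studetn` and `programme-ccna-lab-more-detailedly` fail as malformed, and `https://spectrum.umlms` fails the domain check. The final membership test reflects the idea that a URL is trusted only if the search/fetch step produced it, not the model.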

---

## 7.9 WAF Detection

Some official pages may be blocked, only partially loaded, or protected by web application firewalls (WAFs).

The WAF-aware harness detects cases where:

```text
the page cannot be fetched normally
the content is a block page instead of the real page
the browser click fails
the official site returns insufficient text
```

When this happens, the system avoids treating the blocked page as strong evidence. Instead, it can log diagnostics, retry, or use a static fallback or the local RAG fallback.
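
The four cases above can be reduced to a small classifier whose label decides the next step. This sketch rests on assumptions that are not the project's detection rules. The status codes, block-page markers, the 400-character threshold and the `NEXT_ACTION` table are hypothetical. Real WAF pages vary by vendor.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical heuristics; real block pages differ between WAF vendors.
BLOCK_STATUSES = {401, 403, 429, 503}
BLOCK_MARKERS = ("access denied", "request blocked", "captcha",
                 "checking your browser", "web application firewall")

# Hypothetical mapping from diagnosis to the harness's next step.
NEXT_ACTION = {
    "ok": "use_as_evidence",
    "fetch_failed": "retry",
    "blocked": "static_or_rag_fallback",
    "insufficient_text": "rag_fallback",
}


def classify_fetch(status: Optional[int], text: str,
                   click_ok: bool = True, min_chars: int = 400) -> str:
    """Label a fetch so blocked or thin pages are never used as strong evidence."""
    if status is None or not click_ok:        # network error or failed browser click
        return "fetch_failed"
    lowered = text.lower()
    if status in BLOCK_STATUSES or any(m in lowered for m in BLOCK_MARKERS):
        return "blocked"                      # block page instead of the real page
    if len(" ".join(text.split())) < min_chars:
        return "insufficient_text"            # page loaded but carries too little text
    return "ok"
```

Returning a label rather than raising an error keeps the decision visible in the trace, and the fallback choice stays in one table.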

---

## 7.10 Evidence Normalizer

Fetched web pages and handbook chunks may be noisy.

The evidence normalizer attempts to convert them into a consistent evidence structure:

```text
title
url
source type
domain
text snippet
score
scope
entity
reason
```

This makes later judging and UI display easier.
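
A normalized evidence record with the fields listed above might look like the dataclass below. This is a sketch under assumptions. The input dictionary keys (`url`, `title`, `text`, `section`, `score`), the snippet length limits, the source-type labels and the reason strings are hypothetical. Only the output field set comes from the list above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Evidence:
    title: str
    url: str
    source_type: str  # hypothetical labels: "web" or "handbook_rag"
    domain: str
    snippet: str
    score: float
    scope: str
    entity: str
    reason: str


def _clean(text: str, limit: int) -> str:
    """Collapse runs of whitespace and truncate to `limit` characters."""
    return re.sub(r"\s+", " ", text or "").strip()[:limit]


def normalize_web_page(page: dict, scope: str, entity: str) -> Evidence:
    url = page.get("url", "")
    return Evidence(
        title=_clean(page.get("title", ""), 200),
        url=url,
        source_type="web",
        domain=urlsplit(url).hostname or "",
        snippet=_clean(page.get("text", ""), 800),
        score=float(page.get("score", 0.0)),
        scope=scope,
        entity=entity,
        reason="fetched via official web search",
    )


def normalize_rag_chunk(chunk: dict, scope: str, entity: str) -> Evidence:
    return Evidence(
        title=_clean(chunk.get("section", "UM Handbook"), 200),
        url="",
        source_type="handbook_rag",
        domain="local",
        snippet=_clean(chunk.get("text", ""), 800),
        score=float(chunk.get("score", 0.0)),
        scope=scope,
        entity=entity,
        reason="retrieved from the local handbook index",
    )
```

Because web pages and handbook chunks end up with the same shape, the judges and the UI can treat both source types identically.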

---

## 7.11 Qwen Evidence Judge

The Qwen evidence judge decides whether retrieved evidence actually helps answer the user’s question.

It checks:

```text
Does the evidence mention the right entity?
Does it answer the question directly?
Is it only loosely related?
Is it the wrong programme or page?
Is it official but irrelevant?
```

This matters because being official does not make a source relevant: an official page can still be the wrong evidence for the question.
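
The judge step can be wrapped so that any model output becomes a strict verdict. This is a sketch, not the project's prompt. The template wording, the verdict labels, and the name `judge_evidence` are hypothetical. `generate` stands for any callable that sends a prompt to the Qwen3 model and returns text. Unparseable output is treated as a rejection so that a confused judge cannot accept evidence by accident.

```python
from __future__ import annotations

import json
from typing import Callable

# Hypothetical verdict labels mirroring the checks listed above.
VERDICTS = {"direct", "loosely_related", "wrong_entity", "wrong_page", "official_but_irrelevant"}

JUDGE_TEMPLATE = """You are an evidence judge for UM Handbook questions.
Question: {question}
Target entity: {entity}
Evidence ({url}):
{snippet}

Reply with JSON only: {{"verdict": one of {labels}, "reason": "<short reason>"}}"""


def judge_evidence(question: str, entity: str, url: str, snippet: str,
                   generate: Callable[[str], str]) -> tuple[bool, str, str]:
    """Return (accepted, verdict, reason); only a 'direct' verdict is accepted."""
    prompt = JUDGE_TEMPLATE.format(question=question, entity=entity, url=url,
                                   snippet=snippet, labels=sorted(VERDICTS))
    raw = generate(prompt)
    try:
        # Tolerate chatter around the JSON object by slicing from the first "{" to the last "}".
        data = json.loads(raw[raw.index("{"): raw.rindex("}") + 1])
        verdict, reason = str(data.get("verdict", "")), str(data.get("reason", ""))
    except (ValueError, AttributeError):  # no braces, invalid JSON, or not a JSON object
        return False, "unparseable", "judge output was not a JSON object"
    if verdict not in VERDICTS:
        return False, "unparseable", f"unknown verdict {verdict!r}"
    return verdict == "direct", verdict, reason
```

In this setup, an official page that the judge labels `wrong_page` or `official_but_irrelevant` is dropped even though it would pass the domain guard.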

---

## 7.12 Entity-aware Retry

If the first web discovery result is weak or misrouted, the harness can retry with better query terms.

For example, if a question about PEKOM gets routed toward an AI bachelor programme page, the harness should retry using terms related to:

```text
PEKOM
Persatuan Komputer UM
student society
FSKTM student association
```

This prevents the agent from accepting the first official-looking but semantically wrong page.
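
A retry loop of this kind can be sketched as follows. The alias table reuses the PEKOM terms listed above, but the table structure, the `search`/`accept` callables and the attempt limit are assumptions made for illustration. `accept` would typically wrap the domain guard and the evidence judge.

```python
from __future__ import annotations

from typing import Callable, Optional

# Alias table: the PEKOM terms come from the list above; the structure is hypothetical.
ENTITY_ALIASES = {
    "PEKOM": ["Persatuan Komputer UM", "student society", "FSKTM student association"],
}


def retry_queries(entity: str) -> list[str]:
    """Entity-focused fallback queries, e.g. 'PEKOM Persatuan Komputer UM'."""
    return [f"{entity} {alias}" for alias in ENTITY_ALIASES.get(entity, [])]


def search_with_entity_retry(
    question: str,
    entity: str,
    search: Callable[[str], list[str]],  # query -> candidate URLs
    accept: Callable[[str], bool],       # e.g. domain guard + evidence judge
    max_attempts: int = 3,
) -> tuple[Optional[str], list[tuple[str, str, bool]]]:
    """Try the raw question first, then entity-aware queries; log every decision."""
    trace: list[tuple[str, str, bool]] = []
    for query in ([question] + retry_queries(entity))[:max_attempts]:
        for url in search(query):
            ok = accept(url)
            trace.append((query, url, ok))
            if ok:
                return url, trace
    return None, trace
```

If no query produces an accepted page, the function returns `None`, so the caller can move on to the weak-evidence fallback and avoid keeping the closest wrong page.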

---

## 7.13 Weak Evidence Fallback

If the official web evidence is weak, TensorTalk can fall back to local handbook RAG.

This prevents a common agent failure:

```text
The system found a web page, so it trusts it even though it does not answer the question.
```

Instead, TensorTalk uses:

```text
web evidence if strong
local handbook RAG if web evidence is weak
hybrid answer if both are useful
refusal/uncertainty if neither is sufficient
```
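
The four-way choice above can be expressed as a small decision function. This is a minimal sketch under the assumption that the evidence judge produces a strength score in [0, 1] for each source. The name `choose_answer_route` and the threshold of 0.6 are hypothetical.

```python
def choose_answer_route(web_score: float, rag_score: float, threshold: float = 0.6) -> str:
    """Pick an answer route from judged evidence strength, not from mere presence.

    Scores are assumed to come from the evidence judge (0.0 = useless, 1.0 = direct answer).
    """
    web_ok = web_score >= threshold
    rag_ok = rag_score >= threshold
    if web_ok and rag_ok:
        return "hybrid"   # both sources are useful
    if web_ok:
        return "web"      # strong official web evidence
    if rag_ok:
        return "rag"      # web evidence is weak: fall back to the local handbook
    return "refuse"       # neither is sufficient: answer with uncertainty
```

A fetched page with a low judged score therefore routes to `"rag"` or `"refuse"`, which avoids the "found a page, so trust it" failure.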

---

## 7.14 Answer Grounding Judge

After answer generation, the answer grounding judge checks whether the final answer is supported by the accepted evidence.

It helps catch cases where:

```text
retrieval was correct but generation added unsupported claims
the model invented a URL
the model mixed evidence from different scopes
the answer contains a statement that does not appear in evidence
```

This is the evaluation part of the Planning → Generation → Evaluation loop.

---

## 7.15 Completeness Guard

The completeness guard checks whether the answer is too short, vague, or incomplete.

It can identify cases where the answer:

```text
only repeats the question
does not include required details
misses key fields
does not answer the requested scope
cuts off early
```

Depending on runtime settings, this can trigger a rewrite or fallback.
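
A rule-based version of these checks is sketched below. It is a sketch under assumptions and may differ from the project's guard. The word-count minimum, the 30% novelty ratio, the terminal-punctuation test for cut-off answers, and the name `completeness_issues` are hypothetical. Callers would supply `required_terms` to express key fields or the requested scope. An empty result means the answer passes, and any issue could trigger the rewrite or fallback.

```python
from __future__ import annotations

import re

WORD_RE = re.compile(r"[a-z0-9]+")


def completeness_issues(question: str, answer: str,
                        required_terms: tuple[str, ...] = (),
                        min_words: int = 15) -> list[str]:
    """Return a list of completeness problems; an empty list means the answer passes."""
    issues: list[str] = []
    q_words = set(WORD_RE.findall(question.lower()))
    a_tokens = WORD_RE.findall(answer.lower())
    a_words = set(a_tokens)

    if len(a_tokens) < min_words:
        issues.append("too_short")
    # Under 30% new vocabulary suggests the answer mostly restates the question.
    if a_words and len(a_words - q_words) / len(a_words) < 0.3:
        issues.append("repeats_question")
    missing = [t for t in required_terms if t.lower() not in answer.lower()]
    if missing:
        issues.append("missing_fields:" + ",".join(missing))
    stripped = answer.strip()
    if stripped and stripped[-1] not in ".!?)\"'`":
        issues.append("cut_off")  # no terminal punctuation: generation may have stopped early
    return issues
```

A non-empty list is also a natural value to record in the trace as the fallback reason.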

---

## 7.16 Smoke Tests as Harness Unit Checks

The smoke tests are lightweight checks that make sure the harness pipeline still works after model or code changes.

Examples:

```text
PEKOM should not be routed to the AI bachelor page.
Residential college should prefer the student-affairs residential page.
CCNA Lab should not invent synthetic URLs.
```

These tests check:

```text
routing
URL filtering
official page preference
fake URL rejection
answer grounding trace
harness core route
```

They are not a full benchmark. They are fast sanity checks that the system still runs through the expected pipeline.
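
The three example checks can be written as data-driven cases against a pipeline that returns a trace. This sketch rests on several assumptions. The trace keys (`core_route`, `accepted_urls`, `discovered_urls`, `answer_urls`) and the `student-affairs` URL fragment are hypothetical, and so is the `pipeline` callable that maps a question to a trace dict.

```python
from __future__ import annotations

from typing import Callable

Trace = dict  # hypothetical keys: core_route, accepted_urls, discovered_urls, answer_urls

SMOKE_CASES = [
    ("pekom_not_ai_bachelor", "What is PEKOM?",
     lambda t: not any("bachelor" in u.lower() for u in t["accepted_urls"])),
    ("residential_prefers_student_affairs", "How do I apply for a residential college?",
     lambda t: bool(t["accepted_urls"]) and "student-affairs" in t["accepted_urls"][0]),
    ("ccna_no_synthetic_urls", "Where is the CCNA Lab?",
     lambda t: set(t["answer_urls"]) <= set(t["discovered_urls"])),
]


def run_smoke_tests(pipeline: Callable[[str], Trace], cases=SMOKE_CASES) -> list[str]:
    """Run every case; return the names of failures (empty list = all passed)."""
    failures: list[str] = []
    for name, question, check in cases:
        try:
            trace = pipeline(question)
            # Every case also requires that a harness core route was recorded.
            ok = bool(trace.get("core_route")) and check(trace)
        except Exception as exc:  # a crash anywhere in the pipeline is also a failure
            failures.append(f"{name}: {type(exc).__name__}")
            continue
        if not ok:
            failures.append(name)
    return failures
```

Keeping the cases as data makes it cheap to add a new regression case whenever a routing bug is fixed.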

---

## 7.17 Why Harness Engineering Is Central to This Project

The final system does not rely on a single technique.

```text
SFT gives domain answer style.
RAG gives handbook evidence.
The web agent gives official external evidence.
PPO improves answer behavior.
Harness Engineering controls the whole system.
```

Without the harness, the system would still be vulnerable to:

```text
wrong source selection
fake URLs
weak web evidence
scope confusion
process leakage
unsupported final answers
stale artifact loading
```

Therefore, Harness Engineering is the system-level contribution that connects SFT, RAG, the web agent, and PPO into one controlled workflow.

---

# 18. Relation to LangChain / LangGraph / LangSmith Concepts

This project does not claim to be a LangChain implementation. Instead, it is a from-scratch notebook implementation that follows similar engineering ideas.

Official LangChain ecosystem references that influenced the design include:

- [LangChain Agents documentation](https://docs.langchain.com/oss/javascript/langchain/agents): agents combine language models with tools and can iteratively work toward a goal.
- [LangGraph Overview](https://docs.langchain.com/oss/python/langgraph/overview): LangGraph focuses on durable execution, streaming, human-in-the-loop, persistence, and orchestration for agent workflows.
- [LangGraph Graph API](https://docs.langchain.com/oss/python/langgraph/graph-api): agent workflows can be modeled through state, nodes, and edges.
- [LangGraph Workflows and Agents](https://docs.langchain.com/oss/python/langgraph/workflows-agents): workflows use predetermined code paths, while agents are more dynamic in tool usage and process control.
- [LangSmith Evaluation](https://docs.langchain.com/langsmith/evaluation): evaluation can be structured around datasets, evaluators, and experiments.
- [LangSmith Evaluation Types](https://docs.langchain.com/langsmith/evaluation-types): evaluation may include benchmarking, unit tests, regression tests, LLM-as-judge evaluators, code evaluators, and online monitoring.
- [LangSmith Application-specific Evaluation Approaches](https://docs.langchain.com/langsmith/evaluation-approaches): autonomous agents are commonly discussed in terms of tool calling, memory, and planning.

TensorTalk maps these ideas onto a custom system:

| LangChain ecosystem idea | TensorTalk implementation |
|---|---|
| Agent uses model + tools | Qwen3 model + local RAG + official web search + URL validator + evidence judge |
| State | Trace dictionaries, evidence bundles, routing flags, model/backend status |
| Nodes | Planning, retrieval, web discovery, filtering, judging, generation, grounding, completeness checking |
| Edges | Conditional retry, official-web route, local-RAG fallback, weak-evidence fallback, final-answer route |
| Planning | Query classification, scope detection, entity-aware routing, web/RAG decision |
| Generation | SFT/PPO Qwen3 actor generates with accepted evidence |
| Evaluation | Evidence judge, answer grounding judge, completeness guard, fake URL checks, smoke tests |
| Observability | TensorTalk collapsed trace panels and diagnostic logs |
| Regression/smoke testing | PEKOM route test, residential-college URL test, CCNA synthetic URL test |

This is why the project can be described as:

> A from-scratch LangChain/LangGraph-inspired RAG agent harness for UM Handbook QA, with a Planning → Generation → Evaluation control loop.
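
The state/nodes/edges mapping in the table can be illustrated without any framework. The sketch below uses plain Python and does not use the LangGraph API. It is not TensorTalk's actual control loop. The node names, the `stub_*` state keys standing in for real retrieval and judge outputs, and the 0.6 threshold are all hypothetical. Nodes update a shared state dict, and edges are functions that choose the next node.

```python
from __future__ import annotations

from typing import Callable

END = "END"


def run_graph(nodes: dict[str, Callable[[dict], dict]],
              edges: dict[str, Callable[[dict], str]],
              start: str, state: dict, max_steps: int = 20) -> dict:
    """Run nodes until an edge returns END; visited nodes are logged in state['path']."""
    current = start
    state.setdefault("path", [])
    for _ in range(max_steps):
        if current == END:
            break
        state["path"].append(current)
        state = nodes[current](state)    # node: state -> updated state
        current = edges[current](state)  # edge: state -> name of the next node
    if current != END:
        state["fallback_reason"] = "max_steps exceeded"
    return state


# Hypothetical wiring: plan -> web -> (weak? rag) -> generate -> evaluate -> (ungrounded? refuse).
nodes = {
    "plan": lambda s: {**s, "use_web": not s.get("handbook_only", False)},
    "web": lambda s: {**s, "web_score": s["stub_web_score"]},
    "rag": lambda s: {**s, "rag_score": s["stub_rag_score"]},
    "generate": lambda s: {**s, "answer": "draft answer from accepted evidence"},
    "evaluate": lambda s: {**s, "grounded": s["stub_grounded"]},
    "refuse": lambda s: {**s, "answer": "Not enough reliable evidence.",
                         "fallback_reason": "ungrounded"},
}
edges = {
    "plan": lambda s: "web" if s["use_web"] else "rag",
    "web": lambda s: "generate" if s["web_score"] >= 0.6 else "rag",  # weak-evidence fallback
    "rag": lambda s: "generate",
    "generate": lambda s: "evaluate",
    "evaluate": lambda s: END if s["grounded"] else "refuse",
    "refuse": lambda s: END,
}
```

The visited-node path in the state doubles as the route information that the trace panels display, and the step limit keeps a misconfigured retry edge from looping forever.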

---

# 19. Summary

TensorTalk demonstrates a staged LLM system development workflow: