This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

---

A Multi-Expert Question Answering System
---

This Python program implements a simple multi-expert question answering system using a Mixture of Experts (MOE) approach with large language models (LLMs).

Here's how it works:

- **Model loading:** The "director" LLM is loaded at startup, while expert LLMs for programming, biology, and mathematics remain on standby.
- **Expert routing:** When a user asks a question, the program uses keyword matching, or consults the director LLM, to determine the most relevant expert.
- **Dynamic expert loading:** The chosen expert LLM is loaded into memory, and the previous expert is released to conserve resources.
- **Response generation:** The selected expert LLM is prompted with the question, and the generated answer is returned to the user.
- **Chat interface:** A simple chat loop lets users interact with the system and ask questions on various topics.

This MOE approach allows for more efficient and potentially more accurate responses than a single general-purpose LLM.
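
The routing step can be tried in isolation. Below is a minimal, self-contained sketch of the keyword matcher; the keyword lists here are abbreviated examples, while the full script uses much longer bilingual (English/Spanish) lists:

```python
# Minimal sketch of the keyword-based expert routing described above.
# Keyword lists are abbreviated for illustration.
KEYWORDS = {
    "biology": ["cell", "dna", "protein", "genetics"],
    "mathematics": ["equation", "integral", "derivative", "algebra"],
    "programming": ["python", "code", "api", "debugging"],
}

def determine_expert_by_keywords(question):
    """Return the first expert whose keywords appear in the question, else None."""
    question_lower = question.lower()
    for expert, keywords in KEYWORDS.items():
        if any(keyword in question_lower for keyword in keywords):
            return expert
    return None

print(determine_expert_by_keywords("How does DNA replication work?"))    # biology
print(determine_expert_by_keywords("Write Python code to sort a list"))  # programming
print(determine_expert_by_keywords("What's the weather like?"))          # None -> fall back to the director
```

When no keyword matches (the `None` case), the full system falls back to asking the director LLM to classify the question.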

Full script: https://huggingface.co/Agnuxo/Qwen2-1.5B-Instruct_MOE_Director_16bit/resolve/main/MOE-LLMs3.py

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Model configuration
MODEL_CONFIG = {
    "director": {
        "name": "Agnuxo/Qwen2-1.5B-Instruct_MOE_Director_16bit",
        "task": "text-generation",
    },
    "programming": {
        "name": "Qwen/Qwen2-1.5B-Instruct",
        "task": "text-generation",
    },
    "biology": {
        "name": "Agnuxo/Qwen2-1.5B-Instruct_MOE_BIOLOGY_assistant_16bit",
        "task": "text-generation",
    },
    "mathematics": {
        "name": "Qwen/Qwen2-Math-1.5B-Instruct",
        "task": "text-generation",
    },
}

# Keywords for each subject (English and Spanish)
KEYWORDS = {
    "biology": ["cell", "DNA", "protein", "evolution", "genetics", "ecosystem", "organism", "metabolism", "photosynthesis", "microbiology", "célula", "ADN", "proteína", "evolución", "genética", "ecosistema", "organismo", "metabolismo", "fotosíntesis", "microbiología"],
    "mathematics": ["math", "mathematics", "equation", "integral", "derivative", "function", "geometry", "algebra", "statistics", "probability", "ecuación", "derivada", "función", "geometría", "álgebra", "estadística", "probabilidad"],
    "programming": ["python", "java", "C++", "HTML", "script", "code", "dataset", "API", "framework", "debugging", "algorithm", "compiler", "database", "CSS", "JSON", "XML", "encryption", "IDE", "repository", "Git", "version control", "front-end", "back-end", "stack trace", "REST", "machine learning"],
}


class MOELLM:
    def __init__(self):
        self.current_expert = None
        self.current_model = None
        self.current_tokenizer = None
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Using device: {self.device}")
        self.load_director_model()

    def load_director_model(self):
        """Loads the director model."""
        print("Loading director model...")
        model_name = MODEL_CONFIG["director"]["name"]
        self.director_tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.director_model = AutoModelForCausalLM.from_pretrained(
            model_name, torch_dtype=torch.float16
        ).to(self.device)
        self.director_pipeline = pipeline(
            MODEL_CONFIG["director"]["task"],
            model=self.director_model,
            tokenizer=self.director_tokenizer,
            device=self.device,
        )
        print("Director model loaded.")

    def load_expert_model(self, expert):
        """Dynamically loads an expert model, releasing memory from the previous one."""
        if expert not in MODEL_CONFIG:
            raise ValueError(f"Unknown expert: {expert}")

        if self.current_expert != expert:
            print(f"Loading expert model: {expert}...")

            # Free memory from the current model, if any
            if self.current_model:
                del self.current_model
                del self.current_tokenizer
                torch.cuda.empty_cache()

            model_config = MODEL_CONFIG[expert]
            self.current_tokenizer = AutoTokenizer.from_pretrained(model_config["name"])
            self.current_model = AutoModelForCausalLM.from_pretrained(
                model_config["name"], torch_dtype=torch.float16
            ).to(self.device)
            self.current_expert = expert
            print(f"{expert.capitalize()} model loaded.")

        return pipeline(
            MODEL_CONFIG[expert]["task"],
            model=self.current_model,
            tokenizer=self.current_tokenizer,
            device=self.device,
        )

    def determine_expert_by_keywords(self, question):
        """Determines the expert based on keywords in the question."""
        question_lower = question.lower()
        for expert, keywords in KEYWORDS.items():
            # Compare case-insensitively so keywords like "DNA" or "HTML" still match
            if any(keyword.lower() in question_lower for keyword in keywords):
                return expert
        return None

    def determine_expert(self, question):
        """Determines which expert should answer the question."""
        expert = self.determine_expert_by_keywords(question)
        if expert:
            print(f"Expert determined by keyword: {expert}")
            return expert

        prompt = (
            "Classify the following question into one of these categories: "
            f"programming, biology, mathematics. Question: {question}\nCategory:"
        )
        response = self.director_pipeline(prompt, max_length=100, num_return_sequences=1)[0]["generated_text"]
        expert = response.split(":")[-1].strip().lower()
        if expert not in MODEL_CONFIG:
            expert = "director"
        print(f"Redirecting question to: {expert}")
        return expert

    def generate_response(self, question, expert):
        """Generates a response using the appropriate model."""
        try:
            model = self.load_expert_model(expert)
            prompt = f"Answer the following question as an expert in {expert}: {question}\nAnswer:"
            response = model(prompt, max_length=200, num_return_sequences=1)[0]["generated_text"]
            return response.split("Answer:")[-1].strip()
        except Exception as e:
            print(f"Error generating response: {str(e)}")
            return "Sorry, there was an error processing your request. Please try again."

    def chat_interface(self):
        """Simple chat interface."""
        print("Welcome to the MOE-LLM chat. Type 'exit' to quit.")
        while True:
            question = input("\nYou: ")
            if question.lower() in ["exit", "quit"]:
                break
            try:
                expert = self.determine_expert(question)
                response = self.generate_response(question, expert)
                print(f"\n{expert.capitalize()}: {response}")
            except Exception as e:
                print(f"Error in chat: {str(e)}")
                print("Please try asking another question.")


if __name__ == "__main__":
    moe_llm = MOELLM()
    moe_llm.chat_interface()
```

---

https://github.com/Agnuxo1/NEBULA