---
library_name: transformers
tags:
- sft
- unsloth
- science
- reasoning
license: apache-2.0
datasets:
- mattwesney/CoT_Reasoning_Scientific_Discovery_and_Research
language:
- en
base_model:
- unsloth/Qwen3-1.7B
pipeline_tag: text-generation
---

![20250421_1023_Scientific Discovery Design_simple_compose_01jscbjwqdetvtd5phh85mwsa7.png](https://cdn-uploads.huggingface.co/production/uploads/65dbedfd2f6d2dfc27763b98/mXuepdf8ZDBDMtdaBx-nV.png)

# Model Card for Qwen3-CoT-Scientific-Research

## Model Details

### Model Description

- **Base Model:** Qwen3-1.7B
- **Task:** Scientific Reasoning with Chain-of-Thought (CoT)
- **Dataset:** CoT_Reasoning_Scientific_Discovery_and_Research (a custom dataset of step-by-step scientific reasoning tasks)
- **Training Objective:** Encourage step-by-step logical deductions for scientific reasoning problems

## Uses

### Direct Use

This fine-tuned model is designed for:

- Assisting in teaching and learning scientific reasoning
- Supporting educational AI assistants in science classrooms
- Demonstrating step-by-step scientific reasoning in research training contexts
- Serving as a resource for automated reasoning systems to better emulate structured scientific logic

It is not intended to replace human researchers, perform advanced analytics, or generate novel scientific discoveries.

## Bias, Risks, and Limitations

- May oversimplify complex or interdisciplinary problems
- Performance is limited by the scope of the training data (primarily introductory-level scientific reasoning tasks)
- Does not handle real-world experimentation or advanced statistical modeling
- May produce incorrect reasoning if the prompt is highly ambiguous

## How to Get Started with the Model

Use the code below to get started with the model.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("khazarai/Scie-R1")
model = AutoModelForCausalLM.from_pretrained(
    "khazarai/Scie-R1",
    device_map={"": 0},
)

question = """
How are microfluidic devices revolutionizing laboratory analysis techniques, and what are the primary advantages they offer over traditional methods?
"""

messages = [
    {"role": "user", "content": question}
]

# Build the prompt with Qwen3's chat template; enable_thinking=True lets
# the model emit its chain-of-thought before the final answer.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

_ = model.generate(
    **tokenizer(text, return_tensors="pt").to(model.device),
    max_new_tokens=1800,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
```

## Training Details

### Training Data

**Scope**

This model was fine-tuned on tasks that involve core scientific reasoning:

- Formulating testable hypotheses
- Identifying independent and dependent variables
- Designing simple controlled experiments
- Interpreting graphs, tables, and basic data representations
- Understanding relationships between evidence and conclusions
- Recognizing simple logical fallacies in scientific arguments

**Illustrative Examples**

- Drawing conclusions from experimental results
- Evaluating alternative explanations for observed data
- Explaining the step-by-step reasoning behind scientific conclusions

**Emphasis on Chain-of-Thought (CoT)**

The dataset highlights explicit reasoning steps, making the model better at producing step-by-step explanations when solving scientific reasoning tasks.

**Focus on Foundational Knowledge**

The dataset aims to strengthen foundational scientific reasoning skills rather than cover all domains of scientific knowledge.
**Dataset:** [moremilk/CoT_Reasoning_Scientific_Discovery_and_Research](https://huggingface.co/datasets/moremilk/CoT_Reasoning_Scientific_Discovery_and_Research)
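With `enable_thinking=True`, Qwen3-family models wrap their chain-of-thought in `<think>…</think>` tags before the final answer. When capturing the generated text instead of streaming it, the reasoning can be separated from the answer with a small helper. This is a minimal sketch, not part of any library API: the `split_thinking` name is ours, and it assumes the tags survive decoding (decode with `skip_special_tokens=False` if they do not).

```python
def split_thinking(generated: str) -> tuple[str, str]:
    """Split a Qwen3-style completion into (reasoning, answer).

    Assumes the chain-of-thought is wrapped in <think>...</think>;
    if the closing tag is absent, the whole string is the answer.
    """
    open_tag, close_tag = "<think>", "</think>"
    if close_tag in generated:
        head, _, answer = generated.partition(close_tag)
        reasoning = head.replace(open_tag, "").strip()
        return reasoning, answer.strip()
    return "", generated.strip()

# Example on a mock completion:
reasoning, answer = split_thinking(
    "<think>Microfluidics miniaturize assays.</think>They reduce reagent volume."
)
# reasoning == "Microfluidics miniaturize assays."
# answer == "They reduce reagent volume."
```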