{ "cells": [ { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# ! pip install tensorflow" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# ! pip install torch" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# ! pip install transformers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Pipeline:\n", "The pipeline function is the highest-level API in the transformers library.\n", "It returns an end-to-end object that performs an NLP task on one or several texts.\n", "A pipeline bundles all the necessary pre-processing (the model expects numbers, not raw text), feeds the encoded inputs to the model, and applies post-processing to make the output human readable." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sentiment Analysis Pipeline" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at distilgpt2 and are newly initialized: ['score.weight']\n", "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[{'label': 'LABEL_0', 'score': 0.06559786200523376}]\n", "[{'label': 'LABEL_0', 'score': 0.12948733568191528}, {'label': 'LABEL_0', 'score': 0.12888683378696442}]\n" ] } ], "source": [ "from transformers import pipeline\n", "\n", "# distilgpt2 has no trained classification head (see the warning), so these scores are not meaningful;\n", "# a fine-tuned sentiment checkpoint would give real POSITIVE/NEGATIVE labels\n", "classifier = pipeline('sentiment-analysis', model='distilgpt2')\n", "\n", "# pass a single text:\n", "res = classifier(\"I've been waiting for a Huggingface course\")\n", "print(res)\n", "\n", "# pass multiple texts:\n", "res = classifier(['I love you', 'I hate you'])\n", "print(res)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Zero Shot Classification 
Pipeline\n", "Classifies a sentence against a set of candidate labels, without any fine-tuning on those labels." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at distilgpt2 and are newly initialized: ['score.weight']\n", "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n", "Failed to determine 'entailment' label id from the label2id mapping in the model config. Setting to -1. Define a descriptive label2id mapping in the model config to ensure correct outputs.\n", "Tokenizer was not supporting padding necessary for zero-shot, attempting to use `pad_token=eos_token`\n" ] }, { "data": { "text/plain": [ "{'sequence': 'This is a course about the Transformers library',\n", " 'labels': ['education', 'politics', 'business'],\n", " 'scores': [0.36338528990745544, 0.3443466126918793, 0.29226812720298767]}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from transformers import pipeline\n", "\n", "# distilgpt2 is not an NLI model, hence the warnings above; an NLI checkpoint\n", "# such as facebook/bart-large-mnli is the usual choice for zero-shot classification\n", "classifier = pipeline('zero-shot-classification', model='distilgpt2')\n", "classifier('This is a course about the Transformers library',\n", " candidate_labels=['education', 'politics', 'business'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Text Generation Pipeline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Auto-completes a given prompt.\n", "Output is sampled with a bit of randomness, so it changes each time you run it."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n" ] }, { "data": { "text/plain": [ "[{'generated_text': 'In this course we will teach you how to play and take your skill set as a starter, how to play, and as a player. I will'},\n", " {'generated_text': 'In this course we will teach you how to convert to Java as your main operating system and write to your friends through our website at Google+.\\n\\n'}]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from transformers import pipeline\n", "\n", "generator = pipeline('text-generation', model='distilgpt2')\n", "generator('In this course we will teach you how to',\n", " max_length=30,\n", " num_return_sequences=2)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n" ] }, { "data": { "text/plain": [ "[{'generated_text': 'in this course we will teach you how to play with a real life experience. It will be a lot more about the importance of understanding and building a'},\n", " {'generated_text': 'in this course we will teach you how to achieve the objectives of the program. We will teach you how to achieve the objectives of the program. 
We'}]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from transformers import pipeline\n", "\n", "generator = pipeline('text-generation', model='distilgpt2')\n", "generator('in this course we will teach you how to',\n", " max_length=30,\n", " num_return_sequences=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both text-generation cells above use the distilgpt2 model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fill Mask Pipeline\n", "Mask filling is a pretraining objective of BERT: the model guesses masked words, like fill-in-the-blanks.\n", "Here we ask the pipeline for the two most likely tokens for the mask using top_k." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'bert.pooler.dense.bias']\n", "- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n", "- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n" ] }, { "data": { "text/plain": [ "Downloading (…)okenizer_config.json: 0%|          | 0.00/29.0 [00:00<?, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from transformers import pipeline\n", "\n", "unmasker = pipeline('fill-mask', model='bert-base-cased')\n", "# the original prompt was lost in this copy of the notebook; any sentence containing [MASK] works, e.g.:\n", "unmasker('This course will teach you all about [MASK] models.', top_k=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Translation Pipeline" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "This tokenizer cannot be instantiated. Please make sure you have `sentencepiece` installed in order to use this tokenizer.", "output_type": "error", "traceback": [ "\u001b[0;32m      2\u001b[0m \u001b[39m# ! 
pip install sentencepiece\u001b[39;00m\n\u001b[0;32m 3\u001b[0m \u001b[39mimport\u001b[39;00m \u001b[39msentencepiece\u001b[39;00m\n\u001b[1;32m----> 4\u001b[0m translator \u001b[39m=\u001b[39m pipeline(\u001b[39m'\u001b[39;49m\u001b[39mtranslation\u001b[39;49m\u001b[39m'\u001b[39;49m, model\u001b[39m=\u001b[39;49m\u001b[39m'\u001b[39;49m\u001b[39mHelsinki-NLP/opus-mt-fr-en\u001b[39;49m\u001b[39m'\u001b[39;49m)\n\u001b[0;32m 5\u001b[0m translator(\u001b[39m'\u001b[39m\u001b[39mCe cours est produit par Hugging Face.\u001b[39m\u001b[39m'\u001b[39m)\n", "File \u001b[1;32m~\\AppData\\Roaming\\Python\\Python311\\site-packages\\transformers\\pipelines\\__init__.py:931\u001b[0m, in \u001b[0;36mpipeline\u001b[1;34m(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)\u001b[0m\n\u001b[0;32m 928\u001b[0m tokenizer_kwargs \u001b[39m=\u001b[39m model_kwargs\u001b[39m.\u001b[39mcopy()\n\u001b[0;32m 929\u001b[0m tokenizer_kwargs\u001b[39m.\u001b[39mpop(\u001b[39m\"\u001b[39m\u001b[39mtorch_dtype\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39mNone\u001b[39;00m)\n\u001b[1;32m--> 931\u001b[0m tokenizer \u001b[39m=\u001b[39m AutoTokenizer\u001b[39m.\u001b[39;49mfrom_pretrained(\n\u001b[0;32m 932\u001b[0m tokenizer_identifier, use_fast\u001b[39m=\u001b[39;49muse_fast, _from_pipeline\u001b[39m=\u001b[39;49mtask, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mhub_kwargs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mtokenizer_kwargs\n\u001b[0;32m 933\u001b[0m )\n\u001b[0;32m 935\u001b[0m \u001b[39mif\u001b[39;00m load_image_processor:\n\u001b[0;32m 936\u001b[0m \u001b[39m# Try to infer image processor from model or config name (if provided as str)\u001b[39;00m\n\u001b[0;32m 937\u001b[0m \u001b[39mif\u001b[39;00m image_processor \u001b[39mis\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n", "File 
\u001b[1;32m~\\AppData\\Roaming\\Python\\Python311\\site-packages\\transformers\\models\\auto\\tokenization_auto.py:774\u001b[0m, in \u001b[0;36mAutoTokenizer.from_pretrained\u001b[1;34m(cls, pretrained_model_name_or_path, *inputs, **kwargs)\u001b[0m\n\u001b[0;32m 772\u001b[0m \u001b[39mreturn\u001b[39;00m tokenizer_class_py\u001b[39m.\u001b[39mfrom_pretrained(pretrained_model_name_or_path, \u001b[39m*\u001b[39minputs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs)\n\u001b[0;32m 773\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m--> 774\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\n\u001b[0;32m 775\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mThis tokenizer cannot be instantiated. Please make sure you have `sentencepiece` installed \u001b[39m\u001b[39m\"\u001b[39m\n\u001b[0;32m 776\u001b[0m \u001b[39m\"\u001b[39m\u001b[39min order to use this tokenizer.\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[0;32m 777\u001b[0m )\n\u001b[0;32m 779\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\n\u001b[0;32m 780\u001b[0m \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mUnrecognized configuration class \u001b[39m\u001b[39m{\u001b[39;00mconfig\u001b[39m.\u001b[39m\u001b[39m__class__\u001b[39m\u001b[39m}\u001b[39;00m\u001b[39m to build an AutoTokenizer.\u001b[39m\u001b[39m\\n\u001b[39;00m\u001b[39m\"\u001b[39m\n\u001b[0;32m 781\u001b[0m \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mModel type should be one of \u001b[39m\u001b[39m{\u001b[39;00m\u001b[39m'\u001b[39m\u001b[39m, \u001b[39m\u001b[39m'\u001b[39m\u001b[39m.\u001b[39mjoin(c\u001b[39m.\u001b[39m\u001b[39m__name__\u001b[39m\u001b[39m \u001b[39m\u001b[39mfor\u001b[39;00m\u001b[39m \u001b[39mc\u001b[39m \u001b[39m\u001b[39min\u001b[39;00m\u001b[39m \u001b[39mTOKENIZER_MAPPING\u001b[39m.\u001b[39mkeys())\u001b[39m}\u001b[39;00m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[0;32m 782\u001b[0m )\n", "\u001b[1;31mValueError\u001b[0m: This tokenizer cannot be 
instantiated. Please make sure you have `sentencepiece` installed in order to use this tokenizer." ] } ], "source": [ "from transformers import pipeline\n", "# ! pip install sentencepiece\n", "import sentencepiece\n", "# this cell fails with the ValueError below until sentencepiece is installed and the kernel restarted\n", "translator = pipeline('translation', model='Helsinki-NLP/opus-mt-fr-en')\n", "translator('Ce cours est produit par Hugging Face.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In summary, the following tasks are available through the pipeline API:\n", "\n", "- Text Classification (also called sequence classification)\n", "- Zero-Shot Classification\n", "- Text Generation\n", "- Text Completion (mask filling) / Masked Language Modeling\n", "- Token Classification\n", "- Question Answering\n", "- Summarization\n", "- Translation" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" } }, "nbformat": 4, "nbformat_minor": 2 }