Models and configs for Ollama backend

#4
by kndtran - opened

Addresses granite-common issue https://github.com/ibm-granite/granite-common/issues/94.

PR notes

  1. Conversion scripts for Ollama backend. Designed to run once to create the files in this PR.
    1. convert_io_yaml_files.py - rewrites the vLLM io.yaml for an Ollama backend
      1. The response_format value is remapped to a different location in the io.yaml for Ollama. No modification of the value is currently needed.
    2. convert_to_gguf.py - converts the .safetensors to .gguf format and writes the Modelfiles for each LoRA adapter
  2. Ollama model naming scheme on the filesystem renamed from granite4:micro to granite4_micro, since colons are not valid characters in Windows filenames.
  3. Simplified the run_ollama.sh script to assume that the repo already contains the converted files.
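The response_format remapping in convert_io_yaml_files.py (note 1.1.1 above) amounts to moving a value from one location in the parsed YAML to another, without modifying the value itself. A minimal sketch of that move, operating on the already-parsed dictionary; the dotted key paths `model.response_format` and `ollama.response_format` are hypothetical placeholders, not the actual locations in the io.yaml files:

```python
def pop_nested(d: dict, path: str):
    """Remove and return the value at a dotted path like 'a.b.c'."""
    keys = path.split(".")
    for k in keys[:-1]:
        d = d[k]
    return d.pop(keys[-1])

def set_nested(d: dict, path: str, value) -> None:
    """Set a value at a dotted path, creating intermediate dicts as needed."""
    keys = path.split(".")
    for k in keys[:-1]:
        d = d.setdefault(k, {})
    d[keys[-1]] = value

def remap_response_format(config: dict,
                          src: str = "model.response_format",   # hypothetical vLLM location
                          dst: str = "ollama.response_format"   # hypothetical Ollama location
                          ) -> dict:
    """Move response_format unchanged from the vLLM location to the Ollama one."""
    set_nested(config, dst, pop_nested(config, src))
    return config
```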
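The Modelfile-writing half of convert_to_gguf.py (note 1.2 above) could be sketched as below. `FROM` and `ADAPTER` are real Ollama Modelfile directives for layering a GGUF LoRA adapter onto a base model; the function name, arguments, and paths here are assumptions, and the safetensors-to-GGUF conversion itself (typically done with llama.cpp's conversion scripts) is out of scope:

```python
from pathlib import Path

def write_modelfile(out_dir: str, base_model: str, adapter_gguf: str) -> str:
    """Write a minimal Ollama Modelfile that applies a LoRA adapter
    (ADAPTER directive) on top of a base model (FROM directive)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    text = f"FROM {base_model}\nADAPTER {adapter_gguf}\n"
    (out / "Modelfile").write_text(text)
    return text
```

The resulting Modelfile can then be registered with `ollama create <name> -f Modelfile`.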
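The Windows-safe renaming in note 2 comes down to replacing Ollama's `:` model/tag separator, which Windows forbids in filenames. A one-line helper (the function name is hypothetical) captures it:

```python
def fs_safe_name(model_name: str) -> str:
    """Replace ':' (Ollama's model:tag separator, not a valid character
    in Windows filenames) with '_' for use as an on-disk name."""
    return model_name.replace(":", "_")
```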
kndtran changed pull request status to open

Please do not merge this branch yet. There are issues with some of the quantized LoRA adapters as reported in the granite-common issue above. I can't seem to "unpublish" this PR.

This branch is ready to merge. The corresponding granite-common issue above will need to be updated to use the main branch once this PR is merged.

frreiss changed pull request status to merged
