Models and configs for Ollama backend
#4 by kndtran - opened
For granite-common issue https://github.com/ibm-granite/granite-common/issues/94.
PR notes
- Conversion scripts for the Ollama backend. Designed to run once to create the files in this PR.
  - `convert_io_yaml_files.py` rewrites the vLLM `io.yaml` for an Ollama backend. The `response_format` value is remapped to a different location in the `io.yaml` for Ollama. No modification of the value itself is currently needed.
  - `convert_to_gguf.py` converts the `.safetensors` files to `.gguf` format and writes the `Modelfile`s for each LoRA adapter.
- Ollama model naming scheme on the filesystem renamed from `granite4:micro` to `granite4_micro` to avoid issues on Windows systems.
- Simplify the `run_ollama.sh` script to assume that the repo already contains the converted files.
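The remapping and renaming steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the key paths (`response_format` at the top level for vLLM, a nested `ollama` section as the target) are assumptions for illustration, and the real schema used by `convert_io_yaml_files.py` may differ.

```python
# Hedged sketch of the io.yaml remapping and the filesystem-safe rename.
# Key paths below are assumptions, not the real granite-common schema.

def remap_response_format(vllm_io: dict) -> dict:
    """Move response_format to an Ollama-specific location, value unchanged."""
    out = dict(vllm_io)
    fmt = out.pop("response_format", None)  # hypothetical vLLM location
    if fmt is not None:
        # Hypothetical Ollama location; the value itself is not modified.
        out.setdefault("ollama", {})["response_format"] = fmt
    return out


def fs_safe_model_name(name: str) -> str:
    """Replace ':' (not allowed in Windows filenames) with '_',
    as in the granite4:micro -> granite4_micro rename above."""
    return name.replace(":", "_")
```

For example, `fs_safe_model_name("granite4:micro")` yields `"granite4_micro"`, which is safe to use as a directory name on Windows.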
kndtran changed pull request status to open
Please do not merge this branch yet. There are issues with some of the quantized LoRA adapters as reported in the granite-common issue above. I can't seem to "unpublish" this PR.
This branch is now ready to merge. The corresponding granite-common issue above will need to be updated to point at the main branch once this PR is merged.
frreiss changed pull request status to merged