HF Inference endpoint failure (Gemma Gated Access)

#2
by christinamaria - opened

Hello,

I am trying to create an HF Inference Endpoint to run the neo4j/text2cypher-gemma-2-9b-it-finetuned-2024v1 model; however, I encounter several issues.
Before deploying it on a GPU, this warning appears:

[screenshot of the warning]

If I ignore this warning and continue to deployment, I get a restricted-access error for the google/gemma-2-9b-it model.

However, I do have access to google/gemma-2-9b-it through my Hugging Face account, and I can also create an Inference Endpoint using google/gemma-2-9b-it directly, which works as expected.

The error is:

[Server message]Endpoint failed to start
Exit code: 3. Reason:  model=model_dir, device=device, **kwargs)                   
401 Client Error. (Request ID: Root=1-683ee783-4cec08f347c79cb014e07e8b;33d2dda5-53d0-48ac-9bbb-b7404440f05a) 
Cannot access gated repo for url https://huggingface.co/google/gemma-2-9b-it/resolve/main/config.json. 
Access to model google/gemma-2-9b-it is restricted. You must have access to it and be authenticated to access it. 
Please log in. Application startup failed. Exiting.

How can I deploy this model using an HF Inference Endpoint?
Thanks in advance.

christinamaria changed discussion title from HF Inefrence endpoint failure (Gemma Gated Access) to HF Inference endpoint failure (Gemma Gated Access)
Neo4j org

Hello,
Our neo4j/text2cypher-gemma-2-9b-it-finetuned-2024v1 model is a PEFT model. This means it references the base model google/gemma-2-9b-it, which also needs to be loaded when deploying to an Inference Endpoint. As a result, the endpoint must be able to access both our model and the base model. If an authentication token is not provided, you'll encounter the "restricted access" error you mentioned. To resolve this, pass a Hugging Face access token that has permission to access the base model.

Steps to generate and provide your token (adapted from online resources):

  1. Visit https://huggingface.co/google/gemma-2-9b-it and ensure you have access.
  2. Go to your token settings: https://huggingface.co/settings/tokens
  3. Create a new token with read access.
  4. When setting up the Inference Endpoint, provide this token as an environment variable named HF_TOKEN under Advanced Settings > Environment Variables.

This will allow the endpoint to authenticate and download the base model correctly.

Thank you! The issue is resolved.

christinamaria changed discussion status to closed
