Junk outputs with vLLM but not with Vertex AI
If I eval this model on GCP Vertex AI Model Garden it's great: no junk outputs.
If I serve it myself with vLLM, I see a huge number of junk outputs and my eval metrics decline.
e.g.:
"*Ayano's eyes light up, her eyes expressionlijkly shifting"
""he doesn't even torightly look at the screen"
"He stays exactly where you're lean against him"
I have tried a lot of different settings, including copying the vLLM settings that Vertex AI uses AND running the same Docker container that Vertex AI uses, but I still get the issue.
My only remaining guess is that Vertex AI serves slightly different weights.
My vLLM arguments look like:
"engine_args": {
    "gpu_memory_utilization": 0.92,
    "language_model_only": True,
    "max_model_len": 10240,
    "max_num_batched_tokens": 10240,
    "max_num_seqs": 64,
    "tensor_parallel_size": 1,
    "trust_remote_code": True,
    "tool-call-parser": "gemma4",
    "reasoning-parser": "gemma4"
},
I'm using the default sampling settings from the model config (temp 1.0, top_k 64, top_p 0.95, etc. --> vLLM picks them up automatically).
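One thing I'd try to rule out the automatic defaults: pin the sampling parameters explicitly on every request instead of relying on what vLLM reads from the model's generation config. A minimal sketch of the request payload for vLLM's OpenAI-compatible /v1/chat/completions endpoint (the model name is a placeholder, and top_k is a vLLM extension, not a standard OpenAI field):

```python
import json

# Sampling parameters pinned explicitly, copied from the model's
# generation config, so nothing is left to vLLM's automatic defaults.
sampling = {
    "temperature": 1.0,
    "top_k": 64,   # vLLM extension to the OpenAI API
    "top_p": 0.95,
}

# Payload for vLLM's OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "my-model",  # placeholder, use the served model name
    "messages": [{"role": "user", "content": "Hello"}],
    **sampling,
}

# POST this to http://localhost:8000/v1/chat/completions (default port).
print(json.dumps(payload, indent=2))
```

If the junk outputs persist even with sampling pinned on the request side, that points away from the decoding defaults and back toward the weights or the parser settings.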