Fantastic Model for Legal/English Language Use Cases
Just wanted to share a few thoughts for my fellow lawyers and developers working on legal/English language use cases, and to express a HUGE thank you to the Google team for a fantastic OS model:
- My deployment and use case:
- I am running a large bank of Blackwell GPUs locally (a combination of RTX 6000 Pros and RTX 5090s).
- I am primarily working with my law firm clients, companies that do not want to send highly confidential information to the cloud (e.g., regulated companies, startups developing cutting edge tech).
- I tried both the MoE gemma-4-26B-A4B-it and the dense gemma-4-31B-it models. I settled on gemma-4-31B-it for my use cases. The MoE model is much faster, but the dense model has higher language comprehension for my complex tasks. I quantized and calibrated gemma-4-31B-it on my legal dataset.
- With the quantized gemma-4-31B-it model (NVFP4) and vLLM under high concurrency, I am getting prompt processing in the range of 9K tokens/second and generation in the range of 850 tokens/second.
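To put those throughput figures in perspective, here is a rough back-of-the-envelope sketch, not a measurement from this deployment. It assumes the two reported rates are aggregate rates across all concurrent requests and that prompt processing and generation do not overlap, which is a simplification. The function name and the example workload are hypothetical.

```python
# Aggregate rates reported in the post (tokens per second, high concurrency).
PREFILL_TOKENS_PER_SEC = 9_000
DECODE_TOKENS_PER_SEC = 850


def estimate_batch_seconds(num_docs, prompt_tokens_per_doc, output_tokens_per_doc):
    """Estimate wall-clock seconds to process a batch of documents.

    Assumption: prefill and decode are treated as sequential phases,
    each running at the aggregate rate above. Real servers interleave
    the two, so this is only a coarse estimate.
    """
    prefill_seconds = num_docs * prompt_tokens_per_doc / PREFILL_TOKENS_PER_SEC
    decode_seconds = num_docs * output_tokens_per_doc / DECODE_TOKENS_PER_SEC
    return prefill_seconds + decode_seconds


if __name__ == "__main__":
    # Hypothetical workload: 100 contracts, 9,000 prompt tokens each,
    # 850 generated tokens each.
    seconds = estimate_batch_seconds(100, 9_000, 850)
    print(f"Estimated batch time: {seconds:.0f} s")
```

Under these assumptions, generation rather than prompt processing tends to dominate whenever the output is more than roughly a tenth of the prompt length.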
- Thoughts:
- I previously used a number of other US Open Source models in production (e.g., variations of Llama, GPT-OSS 120b, etc.), both in their original form and trained/quantized/calibrated on my dataset. I also tested virtually all Chinese models, up to 750B parameters.
- Based on my testing so far, the gemma-4-31B-it model is the best model for my legal use case. It shows distinctly better comprehension of legal text in English even when compared with much larger models (actually better than ALL other models I tested, regardless of size).
- A huge plus for my deployment is that this is released by Google, a US company. Not making any geopolitical statement, but my clients are US companies with special compliance/confidentiality requirements, and I can get them comfortable with a model from Google immediately.
- I saw some user comments requesting a larger version of Gemma. I don't know about use cases outside legal language comprehension, but the community should try out this gemma-4-31B-it model for anything that deals with English language comprehension - in my tests so far, it is better than anything else in the Open Source domain for this use case. Google performed some AI miracles here. I dove a bit into the architecture of the gemma-4-31B-it model, and I can see some major departures from current OS implementations.
- Special Thanks to Google!
Many of us have been hoping for a new Gemma model from Google. Here it is! Multimodal, frontier performance on English language comprehension, and free. The community is grateful for this. We realize how much it takes to train a highly efficient model like this, and we realize that you are risking some cannibalization of the Gemini API revenue. Thank you, Google!!! We will repay you with loyalty in other areas.
MD
Hi @md-1415,
Thank you so much for taking the time to provide this feedback. We truly appreciate your insights and the detailed perspective you’ve shared with the community. Your detailed comparison between the MOE and dense variants, especially your experience with the gemma-4-31B-it model, along with your observations on quantization and throughput, adds meaningful depth to the broader discussion.
If the original post was written by Gemma, that's an awful mess of facts with no natural way of combining them. Why are the thoughts and the greetings even separated?
Examples in studio please.
I won't believe that until I see it.
What do you mean exactly by legal work? Writing a letter? Claim?
Even ChatGPT 5 can't really do legal work: it usually mixes up the sides in a court case, and it has no consistency whatsoever in planning the strategy for a case.
I'm skeptical that such a small model is more capable than ChatGPT 5, though I haven't tested it yet.
OCR, i.e. text recognition, IS NOT LEGAL WORK. That's ARCHIVAL or secretarial work at best; I've worked in archives. For legal work you need a full dataset of all of a country's laws, which is not easy to obtain, because laws are distributed through proprietary databases. If you know of any country that supplies its laws as a zip archive, tell me. Court data is especially hard to get: the same proprietary systems, and so on.
On the scandals and dangers of using AI in legal work, see lawyer Steve Lehto's channel on YouTube. The latest case involved a woman's imprisonment, where the prosecutor and the courts missed hallucinations in the documents, and it cost this person her freedom (the AI made mistakes in the appeal, if I'm correct).
Please do not use current AI models in legal work; there have been too many scandals.