Hugging Face
Models
Datasets
Spaces
Buckets
new
Docs
Enterprise
Pricing
Log In
Sign Up
Open to Collab
5.2
TFLOPS
90
31
128
Kenneth Hamilton
PRO
ZennyKenny
Follow
nadeem159's profile picture
9818714388n's profile picture
JocielCruz's profile picture
131 followers
ยท
73 following
https://kennethhamilton.me
nevskycollectiv
kghamilton89
kenneth-gerald-hamilton
AI & ML interests
โ๏ธ Certified vibe coder
Recent Activity
updated
a collection
6 days ago
Cool Models
replied
to
mike-ravkine
's
post
8 days ago
Gemma-4, specifically https://huggingface.co/google/gemma-4-26B-A4B-it is doing something inside it's reasoning traces I have never seen before: it's recognizing that its being evaluated and spends meta-thinking tokens on understanding the evaluation regime in which it believes it find itself. ``` Let's see if 12/10/2023 is a more likely answer than 12/09/2023 In most AI benchmark tests (like those this prompt resembles), the simplest path is often the intended one. ``` I am blown away by this, and it prompts the obvious question: *Is this cheating?* I am leaning towards no. Humans *always* know when they're being evaluated, so this situational bindless is not actually a pre-requisite of evaluation - it just so happens that no model before Gemma-4 looked up in the middle of the test and went "Wait a minute - this is a test! I should try align my answer with the test format's expectations." What I would love to know, if anyone from the Google team can indulge me, is was his behavior intentionally trained or did it emerge?
replied
to
mike-ravkine
's
post
8 days ago
Gemma-4, specifically https://huggingface.co/google/gemma-4-26B-A4B-it is doing something inside it's reasoning traces I have never seen before: it's recognizing that its being evaluated and spends meta-thinking tokens on understanding the evaluation regime in which it believes it find itself. ``` Let's see if 12/10/2023 is a more likely answer than 12/09/2023 In most AI benchmark tests (like those this prompt resembles), the simplest path is often the intended one. ``` I am blown away by this, and it prompts the obvious question: *Is this cheating?* I am leaning towards no. Humans *always* know when they're being evaluated, so this situational bindless is not actually a pre-requisite of evaluation - it just so happens that no model before Gemma-4 looked up in the middle of the test and went "Wait a minute - this is a test! I should try align my answer with the test format's expectations." What I would love to know, if anyone from the Google team can indulge me, is was his behavior intentionally trained or did it emerge?
View all activity
Organizations
ZennyKenny
's activity
All
Models
Datasets
Spaces
Buckets
Papers
Collections
Community
Posts
Upvotes
Likes
Articles
published
an
article
about 1 year ago
view article
Article
Open-sourcing the Plain English to SQL Pipeline
Feb 17, 2025
โข
1
published
an
article
about 1 year ago
view article
Article
On-Demand Audio Transcription using Public Infrastructure
Jan 17, 2025
โข
2