From a linguistic perspective, information represents the novelty, surprise, or importance of a word. Words that are harder to predict from context typically carry more information. A simple example: "This morning I opened the door and saw a UFO." vs. "This morning I opened the door and saw a cat." Clearly, "UFO" carries more information.
In our implementation, the information content of each token comes from how difficult it is for the LLM to predict that token from left to right.
From an information-theoretic perspective, this can be expressed as the conditional information of a token given the model and the preceding context:
Information of tokenᵢ in a text = -log₂P(tokenᵢ | model, token₀, …, tokenᵢ₋₁)
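As a concrete illustration, here is a minimal sketch of this computation, assuming the Hugging Face transformers API and one of the Qwen3 base checkpoints discussed below. The helper name `token_information` is hypothetical; this is not necessarily how Info Highlight itself is implemented.

```python
# Minimal sketch of per-token information, assuming the Hugging Face
# transformers API. `token_information` is a hypothetical helper name.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen3-0.6B-Base"  # one of the checkpoints discussed below
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def token_information(text: str) -> list[tuple[str, float]]:
    """Return (token, bits) pairs, where bits = -log2 P(token_i | token_0..i-1).

    The first token is skipped, since it has no left context in this sketch.
    """
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)
    # Logits at position i-1 predict the token at position i.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    token_logp = log_probs[torch.arange(targets.size(0)), targets]
    bits = (-token_logp / math.log(2)).tolist()  # natural log -> bits
    tokens = tokenizer.convert_ids_to_tokens(targets.tolist())
    return list(zip(tokens, bits))
```

On the UFO/cat example above, the tokens of "UFO" should receive noticeably more bits than the token for "cat".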
The core assumption behind Info Highlight is that this conditional information aligns with the reader's subjective sense of novelty, surprise, and potential importance.
For an ideal model, whose knowledge and contextual understanding match that of the reader, the evaluation would perfectly align with human subjective perception.
Therefore, the gap between current results and reader perception mainly comes from two aspects: (1) the model's knowledge differs from the reader's, and (2) the model's contextual understanding falls short of the reader's.
The good news is that LLMs are improving quickly: current analysis results already reflect mainstream readers' subjective perception to some extent, and can be used to evaluate an article's information content and improve reading speed.
Info Highlight is built on the classic project GLTR (gltr.io), developed by Hendrik Strobelt et al. in 2019. GLTR was a web demo that pioneered the use of GPT-2 prediction probabilities to detect generated text.
However, Info Highlight is not meant to detect AI text, but to evaluate the "information quality" of text.
Is it an AI text detector?
No.
When we dislike AI text, what we actually dislike is low-quality text. We dislike low-quality human-written text just as much, and we rarely object to high-quality AI-generated content. So the key is the "information quality" of the text. Info Highlight aims to detect "information quality" rather than "AI signs", though it can be used to flag AI-generated filler that carries no information.
What LLM is currently used?
Currently the open-source Qwen3-0.6B/1.7B/4B/14B-Base models are used. Among the models the author has tested, the 4B model gives results closest to most people's subjective perception (note that a larger model does not necessarily track the reader's subjective perception more closely). When hardware is limited, the 0.6B/1.7B models are used; they perform slightly worse than 4B (the measured information content differs by within ~15%), but the trend is similar.
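For illustration only, a hypothetical helper that picks one of these checkpoints based on available memory. The model names come from the text above; the VRAM thresholds are rough assumptions, not project settings.

```python
# Hypothetical helper: pick a Qwen3 base checkpoint by available VRAM.
# The model names come from the text; the GB thresholds are rough guesses.
def pick_model(vram_gb: float) -> str:
    if vram_gb >= 32:
        return "Qwen/Qwen3-14B-Base"
    if vram_gb >= 12:
        return "Qwen/Qwen3-4B-Base"   # closest to subjective perception
    if vram_gb >= 6:
        return "Qwen/Qwen3-1.7B-Base"
    return "Qwen/Qwen3-0.6B-Base"     # slightly worse, but similar trend
```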
Why does information content affect text quality?
Low information content means the LLM can easily predict the token from context. If even a machine can predict it, how important can it be? Conversely, high information content means the LLM has difficulty predicting the token from context; assuming it is not a mistake, it represents key information the author wants to convey that the machine does not already know.
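To make the connection concrete, here is a sketch that maps per-token bits to highlight marks, reusing the `token_information` helper from the earlier sketch. The thresholds are arbitrary assumptions; the actual cutoffs Info Highlight uses are not stated here.

```python
# Sketch: map per-token information to highlight marks. The 1.0/6.0 bit
# thresholds are arbitrary assumptions, not Info Highlight's real cutoffs.
# Reuses token_information() from the earlier sketch.
def highlight(text: str, low: float = 1.0, high: float = 6.0) -> str:
    marked = []
    for token, bits in token_information(text):
        if bits >= high:
            marked.append(f"**{token}**")  # hard to predict: emphasize
        elif bits <= low:
            marked.append(f"~{token}~")    # easy to predict: de-emphasize
        else:
            marked.append(token)
    # Raw BPE tokens carry spacing markers (e.g. "Ġ"); a real renderer
    # would detokenize properly, so the joined output here is approximate.
    return "".join(marked)
```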